**** BEGIN LOGGING AT Tue Jul 17 03:00:03 2018
Jul 17 04:23:35 e
Jul 17 12:40:50 zmatt: didn't say thank you last night, sorry
Jul 17 13:31:18 I've got an X15 with the fpga_mgr and altera_ps_spi drivers successfully loaded.
Jul 17 13:31:34 Is anyone familiar with how to get the kernel to start the transfer?
Jul 17 13:31:42 Can it be done from user space?
Jul 17 15:16:18 [
Jul 17 15:23:01 stash: I get literally zero google hits on altera_ps_spi. do you mean the spi-altera driver?
Jul 17 15:23:53 KERNEL/drivers/fpga/altera-ps-spi.c
Jul 17 15:23:58 maybe not
Jul 17 15:24:03 ok
Jul 17 15:25:03 zmatt: I get: fpga0 altera-ps-spi spi1.0 registered from dmesg, but no /dev/ devices to write to.
Jul 17 15:25:30 you created a suitable DT entry?
Jul 17 15:25:48 I guess so given that message
Jul 17 15:26:07 zmatt: yes, the drivers are happy, I'm not sure if I need an additional spidev with the same CS in the mcspi1 device
Jul 17 15:26:28 ?
Jul 17 15:26:48 what does your dt fragment look like exactly?
Jul 17 15:28:48 https://gist.github.com/stashlukj/690a1890d0482ab4c61709c286915eef
Jul 17 15:31:45 based on some quick googling it looks like there's no userspace interface. instead you're supposed to use DT to declare what needs to be loaded
Jul 17 15:31:49 https://www.kernel.org/doc/Documentation/devicetree/bindings/fpga/fpga-region.txt
Jul 17 15:32:37 I think
Jul 17 15:32:49 I just very briefly skimmed it
Jul 17 15:32:56 zmatt: I'll try that out.
Jul 17 15:33:15 I suggest you dig around more in that doc and other fpga manager docs google may find you
Jul 17 17:23:03 I have the PmodAD5 (SPI, AD7193 24bit Sigma-Delta ADC). Apart from the kernel module, do I need to compile devicetree properties?
Jul 17 17:31:59 yeah you'll need to declare the ad7193 in DT (either the main dt or an overlay)
Jul 17 17:32:36 but
Jul 17 17:34:01 it looks like the driver currently doesn't have dt support
Jul 17 17:35:29 someone submitted a patch in January
Jul 17 17:36:27 got feedback that some things needed to be fixed
Jul 17 17:36:31 I see a v2 patch
Jul 17 17:37:08 where are you looking?
Jul 17 17:37:41 linux-iio mailing list archives
Jul 17 17:37:50 https://www.spinics.net/lists/linux-iio/thrd5.html
Jul 17 17:37:59 search for ad7192
Jul 17 17:38:19 okay
Jul 17 17:42:42 some of his patches have been applied (apparently the driver had plenty of issues), but he still needed to fix some things in the dt patch...
Jul 17 17:43:11 and then he vanished. last post from him was late January
Jul 17 17:44:03 you could try emailing him
Jul 17 17:46:07 (him = alexandru ardelean)
Jul 17 17:46:55 the mailing list conceals domains of email addresses, but the html page contains the message id in comments, which includes the domain :D
Jul 17 17:57:20 One thing first: If I get it correctly, the devicetrees set "boards" properties that are needed during device registration?
Jul 17 17:59:22 the devicetree (not plural) is a datastructure passed by u-boot to the kernel that describes the hardware to the kernel
Jul 17 18:01:02 can one use adafruit library to write a pwm signal to the servo pins on the beaglebone blue
Jul 17 18:01:13 ?
Jul 17 18:01:46 Not all hardware, like register addresses, right?
Jul 17 18:02:33 CoffeeBreakfast: well it says e.g. there's this peripheral located at that address, this is its interrupt signal, that sort of stuff
Jul 17 18:02:42 enough information to be able to instantiate the driver for it
Jul 17 18:03:39 labradoodle: all I know about the servo pins is that they're controlled by pru firmware
Jul 17 18:03:44 they're not simple pwm outputs
Jul 17 18:03:58 Is there another simple pwm output on the board?
Jul 17 18:04:17 think so, lemme check
Jul 17 18:08:35 GP0.6 (ecap2), GP1.5/usr_red_led (dmtimer4), GP1.6/usr_grn_led (dmtimer7), GPS.3 (ehrpwm0a), GPS.4 (ehrpwm0b), S1.2.6 (ecap0), UT0.3 (ecap2), UT0.4 (ecap1)
Jul 17 18:08:51 those seem to be all externally accessible pins which should have hardware pwm capability
Jul 17 18:09:16 I don't know which ones can *actually* be configured to pwm, that depends a bit on the Device Tree declarations
Jul 17 18:09:28 but I can check
Jul 17 18:09:48 Oh perfect thanks so much!
Jul 17 18:10:18 zmatt: the pointed source code of analog, without devicetree support, can be used anyway?
Jul 17 18:10:26 CoffeeBreakfast: nope
Jul 17 18:10:33 well, not without a lot of hassle
Jul 17 18:10:53 you'd need to instantiate the driver using kernel code
Jul 17 18:11:43 zmatt, maybe you remember our discussion about loading the kernel module of the power monitor hardware ina2xx for access via the i2c-bus. it worked with echo ina219 0x40 > /sys/bus/i2c....., thanks for that, got it working today, even sensors works with it now
Jul 17 18:11:51 insmod is not enough?
Jul 17 18:14:05 CoffeeBreakfast: manually insmodding/modprobing a device driver is basically never useful. if the kernel has a device that needs that driver, the module gets loaded automatically. if no such device exists, manually loading the module does nothing
Jul 17 18:15:06 Hello everybody.. Have a small question: is there any way to get the BBG without connectors? I mean not populated yet. No Ethernet, no USB, no breakout pinning?
Jul 17 18:15:36 CoffeeBreakfast: the driver needs a bunch of information. this is either passed via a data structure in the kernel ("platform data"), or via device tree
Jul 17 18:15:46 (or similar mechanisms on other platforms, e.g. ACPI on x86)
Jul 17 18:16:24 labradoodle: I'm still browsing the DT btw. I'm just failing a bit at multitasking ;)
Jul 17 18:16:29 zmatt: This is my first year in touch with embedded systems. I thought that drivers were written on top of the spi device.
Jul 17 18:16:48 it's a driver for an spi device
Jul 17 18:17:14 no worries
Jul 17 18:17:19 so that's again more information it needs to know (it needs to be attached to an spi bus and a particular chip select)
Jul 17 18:19:18 I don't know if that's the correct place to ask my question 😅
Jul 17 18:19:43 zmatt: I think I'm starting to get it :P
Jul 17 18:21:29 the LDD3 says nothing about devicetree >.<
Jul 17 18:22:48 CoffeeBreakfast: but I suggest just poking the Analog guy what the status is of the dt patch
Jul 17 18:25:28 labradoodle: looks like exactly one of those pins can be configured to pwm mode when using the default device tree for the blue (in a recent kernel 4.14-ti anyway)
Jul 17 18:26:03 it's GP0 pin 6
Jul 17 18:26:41 and it looks like you'd configure it to pwm mode using "config-pin P9_28 pwm2"
Jul 17 18:26:58 which falls very much in the "..... o...okay" category
Jul 17 18:28:38 Ok thanks!
Jul 17 18:28:42 .... except it looks like they don't enable that ecap peripheral -.-
Jul 17 18:28:47 what the hell guys
Jul 17 18:29:12 zmatt: private email to him? or on the mailing list?
Jul 17 18:30:20 CoffeeBreakfast: to him
Jul 17 18:31:13 okay
Jul 17 18:33:18 labradoodle: I think I'll just make a tiny overlay for you, since this whole situation is beyond silly
Jul 17 18:33:31 labradoodle: do you have a preference which pins? how many pwms do you need?
Jul 17 18:33:47 I just need one
Jul 17 18:50:30 labradoodle: git clone https://github.com/mvduin/overlay-utils
Jul 17 18:50:48 in that dir "make bbblue-pwm.dtbo"
Jul 17 18:51:42 copy that to /lib/firmware/
Jul 17 18:52:15 in /boot/uEnv.txt configure: uboot_overlay_addr4=/lib/firmware/bbblue-pwm.dtbo
Jul 17 18:52:27 reboot, and pray I didn't fuck up the overlay
Jul 17 18:52:27 :D
Jul 17 18:52:57 Thanks so much man! I will give it a go
Jul 17 18:53:52 I'll also poke rcn, since ecap2 being disabled even though it's made available as mux option is obviously a bug in the DT
Jul 17 19:27:28 zmatt: the trick is to load a fpga-region overlay with firmware-name element set at runtime
Jul 17 19:28:05 runtime overlays are deprecated though and going to be removed eventually afaik
Jul 17 19:30:29 zmatt: I'll just hope they put in a sysfs entry for the firmware path before then.
Jul 17 19:35:46 Has anyone toyed with GCC options when compiling code for the beagleboard x15? The default options are "-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -mthumb". Does anyone have suggestions on what options to use for the Beagleboard x15? I'm looking for speed, not code size.
Jul 17 19:36:26 -mcpu=cortex-a15 -mfpu= ... ehh ... hold on
Jul 17 19:38:16 -mfpu=neon-vfpv4
Jul 17 19:39:33 and if you want the compiler to be able to use neon instructions for single-precision floating point math you will need -ffast-math (since neon's float math isn't fully IEEE 754 compliant)
Jul 17 19:41:25 Let me try. So far all the combos I've tried do not show a noticeable difference in my test code.
Jul 17 19:41:46 most code probably doesn't see a noticeable difference
Jul 17 19:43:32 my main test code is to multiply 32000 integers times a float or double constant and time the execution time.
Jul 17 19:45:06 can you share it? benchmarking can be a very tricky thing
Jul 17 19:45:40 microbenchmarks at least I mean
Jul 17 19:46:30 the median is about 0.160 msec for float or double multiplication.
Jul 17 19:46:40 that is complete nonsense
Jul 17 19:46:57 then your benchmark is broken
Jul 17 19:47:02 Sure, let me put it in a sample file.
Jul 17 19:48:21 why do you think so?
Jul 17 19:49:52 because that's 240000 cycles (@ 1.5 GHz)
Jul 17 19:50:25 a multiply takes a few cycles at most (a few dozen cycles on the cortex-a8's slowass non-neon VFP unit)
Jul 17 19:50:56 hold on, that number includes other overhead unrelated to just multiplying 32000 ints. Disregard it, it only makes sense if I compare against the same code compiled w different options.
Jul 17 19:51:29 I was not clear.
Jul 17 19:52:17 ah you mean .160 ms for 32000 multiplies... sorry, I thought that with "0.160 msec for float or double multiplication" you meant 0.16ms _per multiply_
Jul 17 19:55:45 yeah, sorry, my test includes reading 32000 ints and multiplying them by a float or double value. I'm putting together a sample test program that does the multiplication only.
Jul 17 20:13:29 here is the sample code: https://pastebin.com/kR5jrGK8
Jul 17 20:15:17 I'm honestly surprised if that yields anything useful
Jul 17 20:16:35 the optimizer should be able to notice the results of the calculations are never used, hence can be optimized away. but maybe gcc can't track that for a large array
Jul 17 20:18:15 I can save the data to disk...
Jul 17 20:18:36 nah, just put the computation in a function marked __attribute__((noinline))
Jul 17 20:27:21 I just dumped the data to stderr. The compiler optimized out the float vectors. A single result is: Float : 0.064254 ms, Double: 0.195525 ms.
Jul 17 20:29:15 I'm making a little example myself (I've recently been doing a fair bit of benchmarking of a small piece of numerical code, so I picked up a few neat tricks)
Jul 17 20:29:18 a single run w/o the machine specific options: Float : 0.129808 ms, Double: 0.135664 ms
Jul 17 20:54:52 raffo: https://pastebin.com/yx8ya3MG
Jul 17 20:57:18 maybe I should have added some comments here and there
Jul 17 21:01:49 zmatt: kernel BUG at fs/sysfs/group.c:113!
Jul 17 21:02:10 congrats?
Jul 17 21:02:18 I'm using your add-overlay utility from overlay-utils. Should I be doing something else? My FPGA is loaded and running, so thanks!
Jul 17 21:02:56 On X15
Jul 17 21:04:28 one of the reasons overlays are deprecated is because they've proven to be an endless source of bugs and kernel panics
Jul 17 21:04:50 runtime overlays I mean
Jul 17 21:05:54 raffo: by going through exactly the same code (bench()) with different values of the outer loop count, and subtracting the timings, all overhead caused by the code outside the actual computation loop is cancelled out
Jul 17 21:06:29 the purpose of n_par is to ensure performance is not limited by data-dependencies
Jul 17 21:08:02 doing something massively parallel like the out[i] = in[i] * x you did would avoid that too, but then there's a real risk the performance is limited by the large amount of loads/stores
Jul 17 21:15:06 zmatt: thank you for the sample. I'm guessing the gist of the code is to measure the number of CPU cycles per multiply. And the values are different if you don't use the machine specific options.
Jul 17 21:18:21 yeah test() is simply an attempt to get the cpu to do as many multiplies as possible, based on educated guesswork, experience, and inspecting the compiler output
Jul 17 21:20:33 I would have preferred not having to use SIMD vectors explicitly, but while gcc is actually able to "auto-vectorize" code at -O3, the code it produces is still complete garbage
Jul 17 21:22:07 although I don't really need , gcc's built-in support for vectors suffices
Jul 17 21:23:04 The application I'm working on does lots of floating point multiplies. I want to compile it with the most suitable options. I have been testing the built-in DSP too, but I'm still working on my test code.
Jul 17 21:25:19 generally speaking, the performance gains you can get from tweaking compiler flags are insignificant compared to the gains you can get from tweaking the code itself
Jul 17 21:26:16 it can also be a good idea to check for math/dsp libraries that are already optimized
Jul 17 21:26:28 (for the target of interest)
Jul 17 21:27:11 yeah, some of the code I need to optimize needlessly goes back and forth from double to float... legacy code.
Jul 17 21:28:37 I don't know how big a deal that is on the cortex-a15
Jul 17 21:29:19 on the cortex-a8 any non-neon floating point is slooooow
Jul 17 21:30:34 unfortunately ARM never published instruction timings for the A15 afaik
Jul 17 21:30:34 depending on how batchable it is, also look into GPGPU
Jul 17 21:31:23 the DSPs on the X15 make more sense for that
Jul 17 21:31:26 they support opencl
Jul 17 21:31:28 the gpu doesn't
Jul 17 21:31:49 that's good, getting rid of all the unnecessary for loops will definitely improve performance in my case.
Jul 17 21:31:51 I didn't say OpenCL, I said GPGPU
Jul 17 21:32:06 The biggest issue I see with the X15 is $$$$
Jul 17 21:33:15 ds2: Is there any sample code I can look at on using the GPGPU? on the Debian platform the beagleboard x15 uses, not the TI one.
Jul 17 21:33:25 the sgx doesn't support gpgpu in any convenient way
Jul 17 21:33:45 there is no sample code for it
Jul 17 21:34:02 raffo: other than the general stuff no... textures == data; shaders == compute
Jul 17 21:34:20 getting data out again == headache ;)
Jul 17 21:34:25 inconvenient doesn't mean useless
Jul 17 21:34:33 isn't the x15 around $150? That doesn't sound like a lot.
Jul 17 21:34:55 yes, that is the problem with GPGPU in general - xfers into and out of the gpu can be expensive... batching things helps some of that
Jul 17 21:35:07 no, but I'd prioritize using the DSPs over attempting GPGPU
Jul 17 21:35:09 raffo: compared to 1/3 to 1/4 of that for a stock Bone
Jul 17 21:35:36 for one off that's a trade off between time and cost
Jul 17 21:35:47 but if you want to do short runs, the X15 costs add up :(
Jul 17 21:37:01 it is next to trivial to slap the equiv of a 'bone on a custom board
Jul 17 21:37:18 you can do one offs of custom boards in the range of $200-$300
Jul 17 21:37:37 doing the same with an X15 can easily add a 0 to that
Jul 17 21:39:16 for what we're gonna use it, the x15 is a good fit for us.
Jul 17 21:39:53 btw the c66x dsp can do 4 float (single-precision) multiplies per cycle
Jul 17 21:48:51 I have code to perform the scalar multiplication, my rough testing shows that the scalar multiplication, like the test above, is twice as fast in the DSP. I'm still working on optimizing the code though.
Jul 17 22:06:35 optimization can be fun
Jul 17 22:07:12 for sufficiently small "hotspots" you may also want to inspect the assembly output of the compiler to see what it's doing
Jul 17 23:02:12 zmatt: getting data in and out of the DSP space takes a significant amount of time.
Jul 17 23:05:14 depends on how it's done I guess
**** ENDING LOGGING AT Wed Jul 18 03:00:01 2018
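[editor's note] The sanity check used earlier in the log, converting a measured time into CPU cycles and then into cycles per element, can be written down as a small helper. This is only a sketch of that arithmetic. The function names are hypothetical, and the 1.5 GHz clock and the 32000-element count are the figures quoted in the conversation, not values measured here.

```c
#include <assert.h>

/* Cycles elapsed during `ms` milliseconds at a clock of `cpu_hz` Hz. */
double cycles_elapsed(double ms, double cpu_hz)
{
    return ms * 1e-3 * cpu_hz;
}

/* Average cycles spent per element when `n` elements took `ms` milliseconds.
 * This counts everything inside the timed region (loads, stores, loop
 * overhead), not just the multiply instruction itself. */
double cycles_per_element(double ms, double cpu_hz, long n)
{
    assert(n > 0);
    return cycles_elapsed(ms, cpu_hz) / (double)n;
}
```

With the numbers from the log, cycles_elapsed(0.160, 1.5e9) is about 240000, the figure zmatt quoted, and cycles_per_element(0.160, 1.5e9, 32000) is about 7.5. Read as the cost of a single multiply, 240000 cycles would be absurd. Read as 32000 multiplies plus the surrounding loads, stores and conversions, 7.5 cycles per element is plausible, which is how the misunderstanding in the log was resolved.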