**** BEGIN LOGGING AT Tue Jul 17 03:00:03 2018
Jul 17 04:23:35 <kremlin> e
Jul 17 12:40:50 <CoffeeBreakfast> zmatt: didn't say thank you last night, sorry
Jul 17 13:31:18 <stash> I've got an X15 with the fpga_mgr and altera_ps_spi drivers successfully loaded.
Jul 17 13:31:34 <stash> Is anyone familiar with how to get the kernel to start the transfer?
Jul 17 13:31:42 <stash> Can it be done from user space?
Jul 17 15:16:18 <stash> [
Jul 17 15:23:01 <zmatt> stash: I get literally zero google hits on altera_ps_spi.  do you mean the spi-altera driver?
Jul 17 15:23:53 <stash> KERNEL/drivers/fpga/altera-ps-spi.c
Jul 17 15:23:58 <zmatt> maybe not
Jul 17 15:24:03 <zmatt> ok
Jul 17 15:25:03 <stash> zmatt: I get: fpga0 altera-ps-spi spi1.0 registered from dmesg, but no /dev/ devices to write to.
Jul 17 15:25:30 <zmatt> you created a suitable DT entry?
Jul 17 15:25:48 <zmatt> I guess so given that message
Jul 17 15:26:07 <stash> zmatt: yes, the drivers are happy, I'm not sure if I need an additional spidev with the same CS in the mcspi1 device
Jul 17 15:26:28 <zmatt> ?
Jul 17 15:26:48 <zmatt> what does your dt fragment look like exactly?
Jul 17 15:28:48 <stash> https://gist.github.com/stashlukj/690a1890d0482ab4c61709c286915eef
Jul 17 15:31:45 <zmatt> based on some quick googling it looks like there's no userspace interface.  instead you're supposed to use DT to declare what needs to be loaded
Jul 17 15:31:49 <zmatt> https://www.kernel.org/doc/Documentation/devicetree/bindings/fpga/fpga-region.txt
Jul 17 15:32:37 <zmatt> I think
Jul 17 15:32:49 <zmatt> I just very briefly skimmed it
Jul 17 15:32:56 <stash> zmatt: I'll try that out.
Jul 17 15:33:15 <zmatt> I suggest you dig around more in that doc and other fpga manager docs google may find you
Jul 17 17:23:03 <CoffeeBreakfast> I have the PmodAD5 (SPI, AD7193 24bit Sigma-Delta ADC). Apart from the kernel module, do I need to compile devicetree properties?
Jul 17 17:31:59 <zmatt> yeah you'll need to declare the ad7193 in DT (either the main dt or an overlay)
Jul 17 17:32:36 <zmatt> but
Jul 17 17:34:01 <zmatt> it looks like the driver currently doesn't have dt support
Jul 17 17:35:29 <zmatt> someone submitted a patch in January
Jul 17 17:36:27 <zmatt> got feedback that some things needed to be fixed
Jul 17 17:36:31 <zmatt> I see a v2 patch
Jul 17 17:37:08 <CoffeeBreakfast> where are you looking?
Jul 17 17:37:41 <zmatt> linux-iio mailing list archives
Jul 17 17:37:50 <zmatt> https://www.spinics.net/lists/linux-iio/thrd5.html
Jul 17 17:37:59 <zmatt> search for ad7192
Jul 17 17:38:19 <CoffeeBreakfast> okay
Jul 17 17:42:42 <zmatt> some of his patches have been applied (apparently the driver had plenty of issues), but he still needed to fix some things in the dt patch...
Jul 17 17:43:11 <zmatt> and then he vanished. last post from him was late january
Jul 17 17:44:03 <zmatt> you could try emailing him
Jul 17 17:46:07 <zmatt> (him = alexandru ardelean)
Jul 17 17:46:55 <zmatt> the mailing list conceals domains of email addresses, but the html page contains the message id in comments, which includes the domain :D
Jul 17 17:57:20 <CoffeeBreakfast> One thing first: If I get it correctly, the devicetrees set "boards" properties that are needed during device registration?
Jul 17 17:59:22 <zmatt> the devicetree (not plural) is a datastructure passed by u-boot to the kernel that describes the hardware to the kernel
Jul 17 18:01:02 <labradoodle> can one use adafruit library to write a pwm signal to the servo pins on the beaglebone blue
Jul 17 18:01:13 <labradoodle> ?
Jul 17 18:01:46 <CoffeeBreakfast> Not all hardware, like register addresses, right?
Jul 17 18:02:33 <zmatt> CoffeeBreakfast: well it says e.g. there's this peripheral located at that address, this is its interrupt signal, that sort of stuff
Jul 17 18:02:42 <zmatt> enough information to be able to instantiate the driver for it
Jul 17 18:03:39 <zmatt> labradoodle: all I know about the servo pins is that they're controlled by pru firmware
Jul 17 18:03:44 <zmatt> they're not simple pwm outputs
Jul 17 18:03:58 <labradoodle> Is there another simple pwm output on the board?
Jul 17 18:04:17 <zmatt> think so, lemme check
Jul 17 18:08:35 <zmatt> GP0.6 (ecap2), GP1.5/usr_red_led (dmtimer4), GP1.6/usr_grn_led (dmtimer7), GPS.3 (ehrpwm0a), GPS.4 (ehrpwm0b), S1.2.6 (ecap0), UT0.3 (ecap2), UT0.4 (ecap1)
Jul 17 18:08:51 <zmatt> those seem to be all externally accessible pins which should have hardware pwm capability
Jul 17 18:09:16 <zmatt> I don't know which ones can *actually* be configured to pwm, that depends a bit on the Device Tree declarations
Jul 17 18:09:28 <zmatt> but I can check
Jul 17 18:09:48 <labradoodle> Oh perfect thanks so much!
Jul 17 18:10:18 <CoffeeBreakfast> zmatt: can the Analog source code you pointed to be used anyway, without devicetree support?
Jul 17 18:10:26 <zmatt> CoffeeBreakfast: nope
Jul 17 18:10:33 <zmatt> well, not without a lot of hassle
Jul 17 18:10:53 <zmatt> you'd need to instantiate the driver using kernel code
Jul 17 18:11:43 <MichaelLong> zmatt, maybe you remember our discussion about loading the kernel module of the power monitor hardware ina2xx for access via the i2c-bus. it worked with echo ina219 0x40 > /sys/bus/i2c....., thanks for that, got it working today, even sensors works with it now
Jul 17 18:11:51 <CoffeeBreakfast> insmod is not enough?
Jul 17 18:14:05 <zmatt> CoffeeBreakfast: manually insmodding/modprobing a device driver is basically never useful.  if the kernel has a device that needs that driver, the module gets loaded automatically.  if no such device exists, manually loading the module does nothing
Jul 17 18:15:06 <Borhan> Hello everybody. I have a small question: is there any way to get the BBG without connectors? I mean not populated yet: no Ethernet, no USB, no breakout pins?
Jul 17 18:15:36 <zmatt> CoffeeBreakfast: the driver needs a bunch of information.  this is either passed via a data structure in the kernel ("platform data"), or via device tree
Jul 17 18:15:46 <zmatt> (or similar mechanisms on other platforms, e.g. ACPI on x86)
Jul 17 18:16:24 <zmatt> labradoodle: I'm still browsing the DT btw.  I'm just failing a bit at multitasking ;)
Jul 17 18:16:29 <CoffeeBreakfast> zmatt: This is my first year in touch with embedded systems. I thought that drivers were written on top of the spi device.
Jul 17 18:16:48 <zmatt> it's a driver for an spi device
Jul 17 18:17:14 <labradoodle> no worries
Jul 17 18:17:19 <zmatt> so that's again more information it needs to know (it needs to be attached to an spi bus and a particular chip select)
Jul 17 18:19:18 <Borhan> I don't know if that's the correct place to ask my question 😅
Jul 17 18:19:43 <CoffeeBreakfast> zmatt: I think I'm starting to get it :P
Jul 17 18:21:29 <CoffeeBreakfast> the LDD3 says nothing about devicetree >.<
Jul 17 18:22:48 <zmatt> CoffeeBreakfast: but I suggest just asking the Analog guy what the status of the dt patch is
Jul 17 18:25:28 <zmatt> labradoodle: looks like exactly one of those pins can be configured to pwm mode when using the default device tree for the blue (in a recent kernel 4.14-ti anyway)
Jul 17 18:26:03 <zmatt> it's GP0 pin 6
Jul 17 18:26:41 <zmatt> and it looks like you'd configure it to pwm mode using "config-pin P9_28 pwm2"
Jul 17 18:26:58 <zmatt> which falls very much in the "..... o...okay" category
Jul 17 18:28:38 <labradoodle> Ok thanks!
Jul 17 18:28:42 <zmatt> .... except it looks like they don't enable that ecap peripheral -.-
Jul 17 18:28:47 <zmatt> what the hell guys
Jul 17 18:29:12 <CoffeeBreakfast> zmatt: private email to him? or on the mailing list?
Jul 17 18:30:20 <zmatt> CoffeeBreakfast: to him
Jul 17 18:31:13 <CoffeeBreakfast> okay
Jul 17 18:33:18 <zmatt> labradoodle: I think I'll just make a tiny overlay for you, since this whole situation is beyond silly
Jul 17 18:33:31 <zmatt> labradoodle: do you have a preference which pins? how many pwms do you need?
Jul 17 18:33:47 <labradoodle> I just need one
Jul 17 18:50:30 <zmatt> labradoodle: git clone https://github.com/mvduin/overlay-utils
Jul 17 18:50:48 <zmatt> in that dir "make bbblue-pwm.dtbo"
Jul 17 18:51:42 <zmatt> copy that to /lib/firmware/
Jul 17 18:52:15 <zmatt> in /boot/uEnv.txt configure:  uboot_overlay_addr4=/lib/firmware/bbblue-pwm.dtbo
Jul 17 18:52:27 <zmatt> reboot, and pray I didn't fuck up the overlay
Jul 17 18:52:27 <zmatt> :D
Jul 17 18:52:57 <labradoodle> Thanks so much man! I will give it a go
Jul 17 18:53:52 <zmatt> I'll also poke rcn, since ecap2 being disabled even though it's made available as mux option is obviously a bug in the DT
Jul 17 19:27:28 <stash> zmatt: the trick is to load a fpga-region overlay with firmware-name element set at runtime
Jul 17 19:28:05 <zmatt> runtime overlays are deprecated though and going to be removed eventually afaik
Jul 17 19:30:29 <stash> zmatt: I'll just hope they put in a sysfs entry for the firmware path before then.
Jul 17 19:35:46 <raffo> Has anyone toyed with GCC options when compiling code for the beagleboard x15? The default options are "-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -mthumb". Does anyone have suggestions on what options to use for the Beagleboard x15? I'm looking for speed, not code size.
Jul 17 19:36:26 <zmatt> -mcpu=cortex-a15 -mfpu= ... ehh ... hold on
Jul 17 19:38:16 <zmatt> -mfpu=neon-vfpv4
Jul 17 19:39:33 <zmatt> and if you want the compiler to be able to use neon instructions for single-precision floating point math you will need -ffast-math (since neon's float math isn't fully IEEE 754 compliant)
Jul 17 19:41:25 <raffo> Let me try. So far all the combos I've tried do not show a noticeable difference in my test code.
Jul 17 19:41:46 <zmatt> most code probably doesn't see a noticeable difference
Jul 17 19:43:32 <raffo> my main test code multiplies 32000 integers by a float or double constant and measures the execution time.
Jul 17 19:45:06 <zmatt> can you share it?  benchmarking can be a very tricky thing
Jul 17 19:45:40 <zmatt> microbenchmarks at least I mean
Jul 17 19:46:30 <raffo> the median is about 0.160 msec for float or double multiplication.
Jul 17 19:46:40 <zmatt> that is complete nonsense
Jul 17 19:46:57 <zmatt> then your benchmark is broken
Jul 17 19:47:02 <raffo> Sure, let me put it in a sample file.
Jul 17 19:48:21 <raffo> why do you think so?
Jul 17 19:49:52 <zmatt> because that's 240000 cycles (@ 1.5 GHz)
Jul 17 19:50:25 <zmatt> a multiply takes a few cycles at most (a few dozen cycles on the cortex-a8's slowass non-neon VFP unit)
Jul 17 19:50:56 <raffo> hold on, that number includes other overhead unrelated to just multiplying 32000 ints. Disregard it, it only makes sense if I compare against the same code compiled w different options.
Jul 17 19:51:29 <raffo> I was not clear.
Jul 17 19:52:17 <zmatt> ah you mean .160 ms for 32000 multiplies... sorry, I thought that with "0.160 msec for float or double multiplication" you meant 0.16ms _per multiply_
Jul 17 19:55:45 <raffo> yeah, sorry, my test includes reading 32000 ints and multiplying them by a float or double value. I'm putting together a sample test program that does the multiplication only.
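The cycle arithmetic in this exchange can be sketched in a few lines of C. The helper names `ms_to_cycles` and `cycles_per_op` are made up for illustration; the 1.5 GHz clock, the 0.160 ms figure and the 32000 multiplies are the numbers quoted in the discussion.

```c
#include <assert.h>

/* Convert a duration in milliseconds to CPU cycles at a clock rate given in
 * GHz: 1 ms at 1 GHz is 1e6 cycles. (Hypothetical helper, not from the log.) */
double ms_to_cycles(double ms, double ghz)
{
    assert(ms >= 0.0 && ghz > 0.0);
    return ms * 1e6 * ghz;
}

/* Average number of cycles per operation when `ops` operations took `ms`. */
double cycles_per_op(double ms, double ghz, unsigned long ops)
{
    assert(ops > 0);
    return ms_to_cycles(ms, ghz) / (double)ops;
}
```

`ms_to_cycles(0.160, 1.5)` comes to roughly 240000 cycles, which is the figure zmatt objected to for a single multiply. Spread over 32000 elements, `cycles_per_op(0.160, 1.5, 32000)` is roughly 7.5 cycles per element.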
Jul 17 20:13:29 <raffo> here is the sample code: https://pastebin.com/kR5jrGK8
Jul 17 20:15:17 <zmatt> I'm honestly surprised if that yields anything useful
Jul 17 20:16:35 <zmatt> the optimizer should be able to notice the results of the calculations are never used, hence can be optimized away.  but maybe gcc can't track that for a large array
Jul 17 20:18:15 <raffo> I can save the data to disk...
Jul 17 20:18:36 <zmatt> nah, just put the computation in a function marked __attribute__((noinline))
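A minimal sketch of the noinline suggestion, assuming GCC or Clang. The function name `scale_ints` and its signature are hypothetical and are not taken from raffo's pastebin; the idea is only that the timed loop lives in a function the compiler must actually call, and writes its results through a pointer the caller owns.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply n integers by a float constant. noinline keeps the loop in its own
 * function, so the optimizer cannot merge it into the caller and then drop
 * the work because the caller never looks at the results. */
__attribute__((noinline))
void scale_ints(const int *in, float *out, size_t n, float x)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i] * x;
}
```

The timing calls would then go around the call to `scale_ints()` in the caller, rather than around an inline loop.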
Jul 17 20:27:21 <raffo> I just dumped the data to stderr. The compiler optimized out the float vectors. A single result is: Float : 0.064254 ms, Double: 0.195525 ms.
Jul 17 20:29:15 <zmatt> I'm making a little example myself (I've recently been doing a fair bit of benchmarking of a small piece of numerical code, so I picked up a few neat tricks)
Jul 17 20:29:18 <raffo> a single run w/o the machine specific options: Float : 0.129808 ms, Double: 0.135664 ms
Jul 17 20:54:52 <zmatt> raffo: https://pastebin.com/yx8ya3MG
Jul 17 20:57:18 <zmatt> maybe I should have added some comments here and there
Jul 17 21:01:49 <stash> zmatt: kernel BUG at fs/sysfs/group.c:113!
Jul 17 21:02:10 <zmatt> congrats?
Jul 17 21:02:18 <stash> I'm using your add-overlay utility from overlay-utils. Should I be doing something else?
Jul 17 21:02:30 <stash> My FPGA is loaded and running, so thanks!
Jul 17 21:02:56 <stash> On X15
Jul 17 21:04:28 <zmatt> one of the reasons overlays are deprecated is because they've proven to be an endless source of bugs and kernel panics
Jul 17 21:04:50 <zmatt> runtime overlays I mean
Jul 17 21:05:54 <zmatt> raffo: by going through exactly the same code (bench()) with different values of the outer loop count, and subtracting the timings, all overhead caused by the code outside the actual computation loop is cancelled out
Jul 17 21:06:29 <zmatt> the purpose of n_par is to ensure performance is not limited by data-dependencies
Jul 17 21:08:02 <zmatt> doing something massively parallel like the out[i] = in[i] * x you did would avoid that too, but then there's a real risk the performance is limited by the large amount of loads/stores
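The two ideas described here (subtracting timings of the same code at different loop counts, and keeping several independent multiply chains in flight) can be sketched roughly as follows. This is not the code from zmatt's pastebin: only the names `bench()`, `test()` and n_par come from the discussion, and `N_PAR = 8`, `now_sec()` and `sec_per_multiply()` are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define N_PAR 8  /* number of independent multiply chains; value is a guess */

/* Run `iters` rounds of N_PAR independent multiplies. Each chain depends only
 * on itself, so the CPU can overlap them instead of waiting on one result. */
__attribute__((noinline))
float test(unsigned long iters, float x)
{
    float acc[N_PAR];
    for (int i = 0; i < N_PAR; i++)
        acc[i] = (float)(i + 1);
    for (unsigned long it = 0; it < iters; it++)
        for (int i = 0; i < N_PAR; i++)
            acc[i] *= x;
    float sum = 0.0f;
    for (int i = 0; i < N_PAR; i++)
        sum += acc[i];              /* use every chain so none is dead code */
    return sum;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time one call of test(), including all fixed overhead around the loop. */
double bench(unsigned long iters, float x)
{
    volatile float sink;            /* keeps the result observable */
    double t0 = now_sec();
    sink = test(iters, x);
    double t1 = now_sec();
    (void)sink;
    return t1 - t0;
}

/* Seconds per multiply: subtracting two runs cancels the fixed overhead. */
double sec_per_multiply(unsigned long n_short, unsigned long n_long, float x)
{
    assert(n_long > n_short);
    double dt = bench(n_long, x) - bench(n_short, x);
    return dt / ((double)(n_long - n_short) * N_PAR);
}
```

In practice `x` should probably come from a runtime input and stay near 1.0, so the accumulators neither overflow nor get constant-folded, and each measurement should be repeated several times because a single run is noisy.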
Jul 17 21:15:06 <raffo> zmatt: thank you for the sample. I'm guessing the gist of code is to measure the number of CPU cycles per multiply. And the values are different if you don't use the machine specific options.
Jul 17 21:18:21 <zmatt> yeah test() is simply an attempt to get the cpu to do as many multiplies as possible, based on educated guesswork, experience, and inspecting the compiler output
Jul 17 21:20:33 <zmatt> I would have preferred not having to use SIMD vectors explicitly, but while gcc is actually able to "auto-vectorize" code at -O3, the code it produces is still complete garbage
Jul 17 21:22:07 <zmatt> although I don't really need <arm_neon.h>, gcc's built-in support for vectors suffices
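For reference, gcc's built-in vector support mentioned here looks roughly like this. It is a generic sketch of the `vector_size` extension (also accepted by Clang), not an excerpt from zmatt's benchmark; `v4f` and `scale_v4f` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Four packed floats, using the GCC/Clang vector extension. */
typedef float v4f __attribute__((vector_size(16)));

/* Multiply n floats in place by x, four at a time. n must be a multiple of 4
 * in this sketch; a real routine would also handle the leftover elements. */
void scale_v4f(float *data, size_t n, float x)
{
    assert(n % 4 == 0);
    const v4f vx = { x, x, x, x };
    for (size_t i = 0; i < n; i += 4) {
        v4f v;
        memcpy(&v, data + i, sizeof v);   /* load without alignment assumptions */
        v *= vx;                          /* element-wise multiply */
        memcpy(data + i, &v, sizeof v);   /* store back */
    }
}
```

With NEON enabled through the -mfpu option discussed earlier, the compiler should be able to map the multiply onto a single vector instruction, but checking the assembly output is the only way to be sure.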
Jul 17 21:23:04 <raffo> The application I'm working on does lots of floating point multiplies. I want to compile it with the most suitable options. I have been testing the built-in DSP too, but I'm still working on my test code.
Jul 17 21:25:19 <zmatt> generally speaking, the performance gains you can get from tweaking compiler flags are insignificant compared to the gains you can get from tweaking the code itself
Jul 17 21:26:16 <zmatt> it can also be a good idea to check for math/dsp libraries that are already optimized
Jul 17 21:26:28 <zmatt> (for the target of interest)
Jul 17 21:27:11 <raffo> yeah, some of the code I need to optimize needlessly goes back and forth from double to float... legacy code.
Jul 17 21:28:37 <zmatt> I don't know how big a deal that is on the cortex-a15
Jul 17 21:29:19 <zmatt> on the cortex-a8 any non-neon floating point is slooooow
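One common way such float/double round trips creep into C code is an unsuffixed floating-point constant. The two functions below are hypothetical examples, not taken from raffo's code; they only illustrate the language rule.

```c
#include <assert.h>

/* The unsuffixed constant 0.1 has type double, so `v` is converted to double,
 * multiplied in double precision, and the result converted back to float. */
float scale_mixed(float v)
{
    return v * 0.1;
}

/* With the f suffix the whole expression stays in single precision. */
float scale_single(float v)
{
    return v * 0.1f;
}

/* The type of each product, checked at compile time (C11). */
static_assert(sizeof(1.0f * 0.1) == sizeof(double), "float * double is double");
static_assert(sizeof(1.0f * 0.1f) == sizeof(float), "float * float is float");
```

How much the extra conversions cost depends on the core, so this is something to measure rather than assume.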
Jul 17 21:30:34 <zmatt> unfortunately ARM never published instruction timings for the A15 afaik
Jul 17 21:30:34 <ds2> depending on how batchable it is, also look into GPGPU
Jul 17 21:31:23 <zmatt> the DSPs on the X15 make more sense for that
Jul 17 21:31:26 <zmatt> they support opencl
Jul 17 21:31:28 <zmatt> the gpu doesn't
Jul 17 21:31:49 <raffo> that's good, getting rid of all the unnecessary for loops will definitely improve performance in my case.
Jul 17 21:31:51 <ds2> I didn't say OpenCL, I said GPGPU
Jul 17 21:32:06 <ds2> The biggest issue I see with the X15 is $$$$
Jul 17 21:33:15 <raffo> ds2: Is there any sample code I can look at on using the GPGPU? on the Debian platform the beagleboard x15 uses, not the TI one.
Jul 17 21:33:25 <zmatt> the sgx doesn't support gpgpu in any convenient way
Jul 17 21:33:45 <zmatt> there is no sample code for it
Jul 17 21:34:02 <ds2> raffo: other than the general stuff no... textures == data; shaders == compute
Jul 17 21:34:20 <zmatt> getting data out again == headache ;)
Jul 17 21:34:25 <ds2> inconvenient doesn't mean useless
Jul 17 21:34:33 <raffo> isn't the x15 around $150? That doesn't sound like a lot.
Jul 17 21:34:55 <ds2> yes, that is the problem with GPGPU in general - xfers into and out of the gpu can be expensive... batching things helps some of that
Jul 17 21:35:07 <zmatt> no, but I'd prioritize using the DSPs over attempting GPGPU
Jul 17 21:35:09 <ds2> raffo: compared to 1/3 to 1/4 of that for a stock Bone
Jul 17 21:35:36 <ds2> for one off that's a trade off between time and cost
Jul 17 21:35:47 <ds2> but if you want to do short runs, the X15 costs add up :(
Jul 17 21:37:01 <ds2> it is next to trivial to slap the equiv of a 'bone on a custom board
Jul 17 21:37:18 <ds2> you can do one offs of custom boards in the range of $200-$300
Jul 17 21:37:37 <ds2> doing the same with an X15 can easily add a 0 to that
Jul 17 21:39:16 <raffo> for what we're gonna use it, the x15 is a good fit for us.
Jul 17 21:39:53 <zmatt> btw the c66x dsp can do 4 float (single-precision) multiplies per cycle
Jul 17 21:48:51 <raffo> I have code to perform the scalar multiplication, my rough testing shows that the scalar multiplication, like the test above, is twice as fast in the DSP. I'm still working on optimizing the code though.
Jul 17 22:06:35 <zmatt> optimization can be fun
Jul 17 22:07:12 <zmatt> for sufficiently small "hotspots" you may also want to inspect the assembly output of the compiler to see what it's doing
Jul 17 23:02:12 <raffo> zmatt: getting data in and out of the DSP space takes a significant amount of time.
Jul 17 23:05:14 <zmatt> depends on how it's done I guess
**** ENDING LOGGING AT Wed Jul 18 03:00:01 2018