**** BEGIN LOGGING AT Thu Apr 26 03:00:03 2018
Apr 26 06:14:12 Hello
Apr 26 06:14:53 I want to make a camera module for ADAS purposes. I have a few questions
Apr 26 06:16:48 I want to make a camera module with the help of a BeagleBoard-xM and a CMOS image sensor..... I want to know whether we can write an algorithm for LED flicker mitigation or not?
Apr 26 06:22:45 BBxM is rather ancient hardware…
Apr 26 06:24:56 Means ?
Apr 26 06:27:54 It's a single core ARM Cortex-A8 - don't expect too much in terms of performance
Apr 26 06:28:12 Then again, hardware runs software, any software
Apr 26 06:28:57 everything is possible, you will get something to run. It's impossible to say if it will be sufficient for your needs.
Apr 26 07:19:26 Hi, when I find an OS image for beaglebone black, generally, what should I do to make it run on a beaglebone white?
Apr 26 07:20:19 fred__tv: try and see what fails? :-)
Apr 26 07:21:32 some images won't start at all (stuck leds...)
Apr 26 07:22:25 that's absolutely possible (actually one should think of the BBW as obsolete these days)
Apr 26 07:23:22 probably the fact BBW has no internal eMMC...
Apr 26 07:23:44 that's basically not a killer, unless the image tries to do something with it
Apr 26 07:24:09 I own a number of BBW to work with...
Apr 26 07:24:34 and i own a lot of intel p4 to work with....
Apr 26 07:24:43 :-)))
Apr 26 07:24:49 (still they are obsolete and i don't touch them :P)
Apr 26 07:25:40 probably sticking to *old* images with the patched 3.something kernel gives you a greater chance. if you are actually building your own stuff, i'd guess the upstream kernel+dtb should be fine
Apr 26 07:26:34 fred__tv: any image for the BBB should work on the BBW
Apr 26 07:27:14 zmatt: actually i disagree
Apr 26 07:27:52 I mean, it's possible to make an image that works on the BBB but not the BBW, but it would be worthy of a bug report
Apr 26 07:28:11 since u-boot picks the appropriate device tree based on board identification
Apr 26 08:09:51 hi zmatt. Are you here?
Apr 26 08:22:43 Can the LCD7 for the BBB be used with a RaspberryPi?
Apr 26 08:26:11 (in gpio mode)
Apr 26 08:33:16 Hi..
Apr 26 08:46:02 fred__tv: if you can accurately toggle 20 gpios at a rate of 30 MHz, sure
Apr 26 08:46:06 :P
Apr 26 08:48:13 parduz: "are you here?" is roughly as stupid a question as "is anyone here?". if you have an actual question for me to answer, ask that instead :P
Apr 26 08:48:35 (I'm sporadically here throughout the day)
Apr 26 08:50:22 Do you mean the raspberry is not capable of it, or that nothing has been done in this way? (considering the raspberry has both a dedicated lcd connection and an HDMI port...)
Apr 26 08:51:13 fred__tv: how come you expect us to explain the gpio/lcd capabilities of a rpi?
Apr 26 08:51:21 does the rpi have a parallel video output? I thought it only had something like mipi dsi
Apr 26 08:51:46 anyway, you were the one who said "in gpio mode"
Apr 26 08:53:05 also, what LetoThe2nd just said
Apr 26 08:56:43 just to know if anybody knows whether it has been done.....
Apr 26 08:57:17 well maybe ask the rpi people "hello, i have this lcd, can i use it with your board."
Apr 26 08:57:22 would make sense, right?
Apr 26 08:57:41 might have better luck in an rpi channel (they won't know what the lcd7 cape is, but the key question is whether it supports a parallel digital video output)
Apr 26 08:59:37 actually it seems it does
Apr 26 09:00:25 https://pinout.xyz/ if you click on various pins you will see mux options for "DPI" (Display Parallel Interface)
Apr 26 09:01:04 hopefully there's a more convenient table out there somewhere
Apr 26 09:02:44 hi zmatt. I was wondering if you were reading... sorry. I saw the reply from Robert Nelson, but i don't get what he meant.
Apr 26 09:03:23 Did he mean that the patch slipped out and will be implemented again?
Apr 26 09:04:54 parduz: not "will be", "has been". he said he merged it back in
Apr 26 09:05:06 i.e. hit the reply button and say Thanks!
Apr 26 09:05:07 :P
Apr 26 09:05:25 (maybe after confirming it works in the latest kernel)
Apr 26 09:05:25 yep, i'll do.
Apr 26 09:05:38 so i have to update the kernel
Apr 26 09:06:29 or do i have to install a whole new disk image?
Apr 26 09:06:38 just update the kernel
Apr 26 09:08:07 it looks like 4.9.88-ti-r111 and 4.14.35-ti-r44 have the patch
Apr 26 09:09:49 ok.... never done it before... i'm googling for a "how to".
Apr 26 09:10:50 ok... let's see if what the official wiki says works
Apr 26 09:11:40 /opt/scripts/tools/update_kernel.sh
Apr 26 09:11:44 iirc
Apr 26 09:28:07 well, it's better. Not perfect but better... at least the touchscreen is usable, now
Apr 26 12:11:01 'lo
Apr 26 12:35:07 Anyone here who has experience with the ti-sgx modules? I'm wondering why the omaplfb kernel-module isn't installed anymore?
Apr 26 12:36:18 jof1: that kernel module doesn't exist anymore with the current drivers
Apr 26 12:37:28 jof1: pvrsrvkm and tilcdc interact via the dma-buf api
Apr 26 12:38:54 I see
Apr 26 12:48:50 NishanthMenon: any idea whether the dra7 L3 clock asymmetric aging issue (i892) also applies to omap5? it's not explicitly included in the errata, but that may just be lack of maintenance
Apr 26 12:49:48 zmatt, if my memory serves me right.. (and it can be very flaky) the IP was the same or a derivative.
Apr 26 12:50:30 NishanthMenon: how serious is it? I'm assuming its nature means that the more time is spent with PD_CORE powered but the L3 clock gated, the greater the risk of instability?
Apr 26 12:53:35 zmatt, i don't remember correctly now
Apr 26 12:53:55 ok
Apr 26 12:54:04 sorry. i remember i had to keep the PD always ON and could not even do CSWR or even inactive, if i remember
Apr 26 12:54:36 yeah but dra7 doesn't support PD_CORE=OFF right? omap5 does
Apr 26 12:55:13 (at least in theory, it remains to be seen whether it actually works)
Apr 26 12:55:31 zmatt, originally they all did
Apr 26 12:55:39 i mean target OFF, if i remember right
Apr 26 12:58:33 hmm, yeah I guess if it didn't there wouldn't be a reason to have PD_CORE and PD_COREAON separate from each other
Apr 26 12:59:04 actually no that's not true, it could be just for retention
Apr 26 12:59:13 oh well
Apr 26 12:59:29 but you're saying switching PD_CORE to OFF is not sufficient to prevent the degradation?
Apr 26 13:06:30 hi
Apr 26 13:07:35 I'm interested in the beaglebone black... i'm looking for a display for it and noticed that just about everything is in the form of a capacitive touchscreen cape
Apr 26 13:07:55 that's all fine and good but i was wondering if there was a recommended ips display
Apr 26 13:08:12 i noted that the beaglebone has a mini hdmi output
Apr 26 13:09:46 beezy: it does, so you can connect basically any hdmi display
Apr 26 13:10:02 (although beware of resolution limitations)
Apr 26 13:31:25 zmatt, sorry, i am multitasking -> you should NOT switch PD_CORE to off -> if you do that some leakage issues also come into play.. i forget the details, it's been more than 3 or 4 years...
Apr 26 13:31:40 ttyl
Apr 26 14:21:37 How do you debug the PRU
Apr 26 14:22:15 google prudebug
Apr 26 14:22:24 there might also be other pru debuggers
Apr 26 14:23:38 maybe I'll eventually make one using my py-uio library... it already supports single-stepping the core
Apr 26 14:24:46 Currently starting to use the McASP with the PRU
Apr 26 14:24:53 But the learning curve is pretty steep
Apr 26 14:26:21 yep
Apr 26 14:28:27 sbbo r27, MCASP_BASE_ADDRESS, register, 4
Apr 26 14:28:37 Bela uses that to write to a McASP register
Apr 26 14:28:40 McASP can be a bit eccentric too... it may be worth starting with experimenting with mcasp from the cortex-a8 using uio, even though you probably can't keep it running stably that way (due to scheduling)
Apr 26 14:28:48 Bela ?
Apr 26 14:28:57 https://github.com/BelaPlatform/Bela
Apr 26 14:29:20 but I don't know why the first argument doesn't have &
Apr 26 14:29:34 obsolete syntax
Apr 26 14:29:45 you're allowed to omit the & even though doing so is deprecated
Apr 26 14:30:07 so best practice is to use it?
Apr 26 14:30:25 definitely
Apr 26 14:33:37 while browsing their pru code I see a lot of weird or inefficient things actually, so it might not be the greatest example to learn from
Apr 26 14:35:07 Good to know
Apr 26 14:35:57 I am trying to do the same thing as Bela to improve our latency
Apr 26 14:36:17 But as I find their code ugly, we're doing it from scratch
Apr 26 14:36:52 We also wrote our own complex multiply for the neon processor
Apr 26 14:37:07 Because libne10 doesn't have that, which we still find pretty strange
Apr 26 14:38:14 But the macros for writing the McASP are good, right?
Apr 26 14:38:27 I don't know how to do it any better
Apr 26 14:38:53 Albeit I don't understand the MCASP_REG_WRITE_EXT for writing beyond the 0xFF boundary
Apr 26 14:42:49 MCASP_REG_SET_BIT_AND_POLL is pointlessly inefficient. I would just pass the bit number (rather than the mask) as argument and do: https://pastebin.com/raw/bKpmYH90
Apr 26 14:44:53 Ah, thanks!
Apr 26 14:45:00 I think I understand the boundary problem
Apr 26 14:45:24 Registers like XBUF have an address of 0x200
Apr 26 14:45:48 the read/write reg ext macros can be simplified too: https://pastebin.com/raw/cgWstn4h
Apr 26 14:46:21 yeah most instructions only support immediate operands in range 0x00-0xff
Apr 26 14:47:24 so if you want to use an operand outside that range, you have to load that value into a register
Apr 26 14:49:27 Aha, I see
Apr 26 14:49:46 The third operand of SBBO is OP(255), which means it's limited to 8 bits
Apr 26 14:50:13 OP(255) means "either a register, or an immediate in range 0-255"
Apr 26 14:51:21 ("register" here in the general sense, i.e. including r0.b0 and such)
Apr 26 14:51:46 your help is really appreciated, thanks!
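
A minimal C sketch of the "experiment with the McASP from the cortex-a8 first" suggestion above, using /dev/mem instead of uio to keep it short. The 0x48038000 McASP0 CFG base and the 0x44 GBLCTL offset are assumptions taken from the AM335x memory map and McASP register map; verify both against the TRM, and note that the McASP module clock must already be enabled or these accesses will fault.

    /* Map the McASP0 configuration registers and dump GBLCTL, as a starting
     * point for interactive experimentation before moving logic into PRU
     * firmware.  Addresses/offsets are assumptions to check against the TRM. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define MCASP0_CFG_BASE 0x48038000u   /* assumed McASP0 CFG port (AM335x) */
    #define MCASP_MAP_SIZE  0x1000u       /* covers the 0x200 XBUF registers too */

    static volatile uint32_t *mcasp_map(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); exit(1); }
        void *p = mmap(NULL, MCASP_MAP_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, MCASP0_CFG_BASE);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }
        return (volatile uint32_t *)p;
    }

    int main(void)
    {
        volatile uint32_t *mcasp = mcasp_map();
        /* word-sized accesses, so byte offset 0x44 becomes index 0x44/4 */
        printf("GBLCTL = 0x%08x\n", mcasp[0x44 / 4]);
        return 0;
    }
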
Apr 26 14:51:58 your making a Belgian student very happy
Apr 26 14:52:02 you're*
Apr 26 14:52:41 :)
Apr 26 15:05:23 I'm really confused.. In bb-kernel (https://github.com/RobertCNelson/bb-kernel) the SGX stuff seems to have been dropped in version (branch) am33x-v4.16, but the git log doesn't say why are what I should do instead? Does anyone here know?
Apr 26 15:05:46 s/are/and/
Apr 26 15:09:13 zmatt: Can the same approach as your rewritten write_ext macro be used for the read_ext?
Apr 26 15:09:23 I assume it is possible
Apr 26 15:09:44 yes, and it occurs in more places
Apr 26 15:11:11 alright, thanks for saying
Apr 26 15:12:26 in general, https://pastebin.com/raw/v23XdtsW
Apr 26 15:12:47 wait, I didn't phrase that right
Apr 26 15:13:35 better: https://pastebin.com/raw/y2Partxv
Apr 26 15:37:19 Okay, noob here looking for some info....
Apr 26 15:38:37 Survey, open-ended question about application physical environment - what kinds of applications have you put your beagle into and how did you mitigate problems like humidity, temperature, etc.?
Apr 26 15:38:55 Was the board the limiting factor or was there something else?
Apr 26 16:14:55 lol, good luck with that survey :)
Apr 26 16:22:51 Would it be a terrible idea to always load the register offset into a register
Apr 26 16:23:13 bou4: ?
Apr 26 16:23:19 This way I got one macro that supports both, but with a little overhead for registers whose offset is smaller than 0xFF
Apr 26 16:24:45 that's up to you. I'm not sure I even agree with the use of a macro for basically just a load/store instruction
Apr 26 16:27:34 is there a negative side to using macros?
Apr 26 16:27:34 actually, now that I think of it I'm pretty sure I disagree with the use of a macro for this. it doesn't significantly save space nor make the code clearer... I'd even say it does the opposite
Apr 26 16:28:07 but initializing the mcasp requires a lot of writing to registers
Apr 26 16:28:21 using macros will significantly shorten that code
Apr 26 16:28:59 is there a negative side to using macros besides readability?
Apr 26 16:32:10 ah right I thought the macro just wrapped the store (i.e. still expected the value in a register) but I see now it's to write a constant to a register
Apr 26 16:34:11 hi everyone, question, is it normal for the signal voltage on the servo pins of a beaglebone blue to fall below 3.3v (i got around 2.1 when I probed)... or am i doing something wrong? thanks!
Apr 26 16:39:22 in that case I guess it helps... although you can also just load the values into consecutive registers and store that: https://pastebin.com/raw/iK45qzuW
Apr 26 16:39:36 john____: uhh
Apr 26 16:39:42 john____: that doesn't sound right
Apr 26 16:40:01 let me check the schematic
Apr 26 16:41:01 zmatt: all right sure.. thanks! fyi, i tried using channels 1 and 5 (since channels 5-8 are boot mode pins) and both times, the voltage i probed was a little over 2v
Apr 26 16:41:31 this is without external loading?
Apr 26 16:42:26 yes.. no loads connected..
Apr 26 16:44:39 what about 7 or 8 ?
Apr 26 16:47:07 zmatt: same result.. around 2v
Apr 26 16:47:30 what about any 3.3v supply output pin?
Apr 26 16:49:48 zmatt: sorry i'm not following, what pin do you want me to check? without the motor the servo control pin outputs the correct voltage level
Apr 26 16:50:14 around 3.3v
Apr 26 16:50:15 ... so there was actually an external load connected
Apr 26 16:51:09 zmatt: oh okay.. sorry, i was thinking you meant the load to the servo.. :D yes, there was a servo connected when i probed it to be around 2v
Apr 26 16:52:37 then it may be fine I guess, depends on the input specs of the servo
Apr 26 16:53:29 oh okay i see... well i got a cheap servo, and i couldn't find anything relevant.. http://www.electronicoscaldas.com/datasheet/MG996R_Tower-Pro.pdf
Apr 26 16:53:47 the pins on the beaglebone have 4.7 kΩ series resistors, so a 1.2V voltage drop (3.3V - 2.1V) means it's drawing 0.25 mA
Apr 26 16:54:25 maybe it's typical for control signals to a servo to use an opto-coupler? that would at least explain why they included those resistors
Apr 26 16:55:23 ohhhh... okay.. i'll keep that in mind.. well, i'm not very familiar but I guess I could give it a try.. since it's already at 2.1v without load to the servo
Apr 26 16:55:41 physical loading of the servo is irrelevant to the control signals
Apr 26 16:55:45 I meant electrical loading
Apr 26 16:56:06 oh yeah.. correct.. makes sense..
Apr 26 16:57:28 thanks! one more thing though if you don't mind, the servo that i'm using tends to move to one direction before calibrating itself to the correct position.. is that normal?
Apr 26 16:57:38 I have no idea
Apr 26 16:58:46 i read somewhere that the control signal must come first before the rail, i was trying this but somehow, i get around 1v at the control signal when the rail is off... is that normal?
Apr 26 16:58:57 it sounds plausible that it needs to calibrate itself at startup
Apr 26 16:59:22 the control signal must be low when the servo is unpowered
Apr 26 16:59:57 ohhh i see... well, that explains why i see a low voltage..
Apr 26 17:00:21 and okay, so it's not using an opto-coupler, instead they included those series resistors because they realized that people would end up accidentally driving the servo pins high while the motor is unpowered
Apr 26 17:00:22 all right.. sounds good to me.. glad to know i didn't break anything.. thanks zmatt!
Apr 26 17:00:55 which, without the series resistor, could destroy both the processor and the servo
Apr 26 17:03:30 so that's what they're for.. if you don't mind, in what way does it actually prevent the processor and servo from being destroyed?
Apr 26 17:03:40 If you write a macro with .mparam value
Apr 26 17:03:49 and you actually need &value
Apr 26 17:04:01 do you need to call the macro with &r1
Apr 26 17:04:08 or can you write &value in the macro
Apr 26 17:04:20 you can write &value in the macro
Apr 26 17:04:23 pretty sure
Apr 26 17:04:48 if I wanted to be sure I'd just do a quick test... you could have done the same :P
Apr 26 17:05:55 hahah, I don't really know how to get something out of it for debugging purposes, or will it simply not compile?
Apr 26 17:09:14 if it weren't allowed you'd get a syntax error
Apr 26 17:12:25 you can also assemble an invocation of your macro along with the expansion you're expecting and use pasm -L to inspect whether the two versions produce the same code
Apr 26 17:16:28 fantastic, thanks!
Apr 26 17:44:32 bou4: uhh, I'm surprised Bela's firmware even works... they're polling the wrong register in MCASP_REG_SET_BIT_AND_POLL
Apr 26 17:45:30 they should be reading GBLCTL instead of RGBLCTL or XGBLCTL
Apr 26 18:08:52 that is indeed part of the code i don't understand
Apr 26 18:15:18 So you recommend that I don't base my code on Bela?
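
In C, the set-then-poll sequence zmatt describes a few messages up would look roughly like the sketch below: the reset bits are written through RGBLCTL/XGBLCTL, but the McASP only reflects the change in GBLCTL once it has synchronized internally, so GBLCTL is the register to poll. The offsets (0x44 GBLCTL, 0x60 RGBLCTL, 0xA0 XGBLCTL) are assumptions from the McASP register map in the TRM, and mcasp is assumed to point at the memory-mapped CFG region from the earlier sketch.

    /* Set a bit in XGBLCTL (or RGBLCTL) and wait until GBLCTL confirms that
     * the state change has actually taken effect.  Offsets are assumptions. */
    #include <stdint.h>

    #define MCASP_GBLCTL   (0x44 / 4)
    #define MCASP_RGBLCTL  (0x60 / 4)
    #define MCASP_XGBLCTL  (0xA0 / 4)

    void mcasp_xgblctl_set_and_poll(volatile uint32_t *mcasp, unsigned bit)
    {
        uint32_t mask = 1u << bit;
        mcasp[MCASP_XGBLCTL] |= mask;            /* request the state change */
        while ((mcasp[MCASP_GBLCTL] & mask) == 0)
            ;                                    /* wait for GBLCTL to confirm it */
    }
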
Apr 26 18:15:31 Perhaps better to check everything in the datasheet
Apr 26 18:16:38 it could still be useful to check how they do certain things, but I would definitely suggest you at least understand what they're doing
Apr 26 18:17:21 That is my intention
Apr 26 18:17:59 I am also only going to drive the McASP
Apr 26 18:18:08 and not other peripherals such as the on-board ADCs and DACs
Apr 26 18:18:14 my code will be a lot smaller
Apr 26 18:18:52 Is it a good idea to use polling instead of interrupts?
Apr 26 18:19:09 The PRU will do nothing else but talk with the McASP, so I don't see the urge to use interrupts
Apr 26 18:19:38 I strongly recommend using interrupts
Apr 26 18:20:02 polling creates continuous pointless traffic on the L4 interconnect
Apr 26 18:20:37 well, maybe "strongly" is a bit too strong. but I do recommend it
Apr 26 18:22:41 eh, actually now that I think about it I'm not sure it matters all that much... the polling rate won't be that high anyway
Apr 26 18:23:32 my reasoning was that the PRU is way faster than the sampling rate of the McASP
Apr 26 18:23:49 so that polling would be faster
Apr 26 18:26:27 using an irq wouldn't be to reduce the time it takes to react to an event... I mean, it will reduce it slightly, but at most 15 pru cycles or so which isn't a big deal
Apr 26 18:26:55 it would mostly be to reduce spam on the L4 interconnect
Apr 26 18:27:44 keep in mind that you could simplify the pru firmware by doing initialization stuff from the cortex-a8 instead
Apr 26 18:27:54 such as setting up the pruss interrupt controller
Apr 26 18:31:22 why is the McASP being handled by polling or even by IRQ?
Apr 26 18:31:52 is there a reason why DMA won't work?
Apr 26 18:32:12 dma is a pointless complication in this case
Apr 26 18:32:50 probably
Apr 26 18:32:53 maybe
Apr 26 18:33:06 bou4: what exactly are you doing again?
Apr 26 18:33:37 we are doing realtime DSP with guitar signals
Apr 26 18:34:10 X15 or Bone?
Apr 26 18:34:14 but surely you're not using PRU for signal processing itself
Apr 26 18:34:20 correct
Apr 26 18:34:28 so what's the purpose of PRU here?
Apr 26 18:34:30 only to create a buffer to write to the McASP
Apr 26 18:34:45 because Xenomai lets us do very realtime processes
Apr 26 18:34:47 so you're using PRU as... a DMA controller?
Apr 26 18:34:51 correct
Apr 26 18:34:56 that sounds like creating a lot of traffic for no good reason
Apr 26 18:35:08 but Xenomai lets us only do that if we don't use code inside the kernel
Apr 26 18:35:10 so no alsa
Apr 26 18:35:19 working in the arctic and need extra heat?
Apr 26 18:35:27 perhaps a xeon would work better :D
Apr 26 18:35:42 actually if he uses an irq from mcasp it shouldn't create extra traffic
Apr 26 18:36:04 the transfer to the PRU and PRU to the McASP creates traffic
Apr 26 18:36:28 exactly the same amount as would be generated if EDMA were used instead
Apr 26 18:36:59 EDMA doesn't poll and the PRU doesn't have an "interrupt"
Apr 26 18:37:53 You think it sounds like a bad idea?
Apr 26 18:37:58 incorrect, both the tx and rx irqs of mcasp 0 and 1 are available on the pruss interrupt controller
Apr 26 18:38:06 so there is no need to poll
Apr 26 18:38:24 the PRU doesn't have the traditional concept of an interrupt
Apr 26 18:38:39 that's irrelevant
Apr 26 18:38:47 it is still a poll... but I do concede that the polling is inside the PRUSS vs on the Lx buses
Apr 26 18:39:23 if the concern is latency on ALSA ('stock' McASP drivers)... perhaps a low latency McASP driver is in order
Apr 26 18:39:26 it can actually suspend its internal clocks until the "interrupt" hits (using the sleep instruction)
Apr 26 18:39:43 so no polling at all, not even inside the subsystem
Apr 26 18:39:52 But if we write it with the PRU
Apr 26 18:40:00 though I very much doubt the polling has much relevant impact anyway
Apr 26 18:40:02 sleep sleeps the PRU
Apr 26 18:40:14 then we know exactly what its timing is going to be
Apr 26 18:40:36 but isn't polling the easier way
Apr 26 18:40:44 as i currently have no idea how interrupts work in the pru
Apr 26 18:40:53 it is easier
Apr 26 18:40:57 IMO - EDMA is simpler and less hassle
Apr 26 18:41:03 and you can always switch to using interrupts later
Apr 26 18:41:10 you will need to get the data into DRAM
Apr 26 18:41:10 I wouldn't call EDMA simple
Apr 26 18:41:29 so it is either the PRUSS writing or the EDMA
Apr 26 18:42:12 you will have jitter either way - PRUSS -> L4 jitter, PRUSS -> DRAM jitter
Apr 26 18:42:27 can you do EDMA without using the kernel?
Apr 26 18:42:34 EDMA -> DRAM jitter (all this is on top of the jitter created by the FIFO on the McASP)
Apr 26 18:42:37 yes, but you can't receive interrupts from EDMA in linux userspace, so you'd end up polling EDMA instead
Apr 26 18:42:41 EDMA is a hardware block
Apr 26 18:42:50 blah... Userspace
Apr 26 18:43:18 bou4: you *can* receive interrupts in userspace using xenomai right? I saw that Bela was using an event to signal the cortex-a8
Apr 26 18:43:22 i see where this is going. userspace vs k-space... i'll go get lunch
Apr 26 18:43:40 ds2: 20:34 < bou4> because Xenomai lets us do very realtime processes
Apr 26 18:43:49 ds2: 20:35 < bou4> but Xenomai lets us only do that if we don't use code inside the kernel
Apr 26 18:44:27 there are ways to get RT inside the kernel
Apr 26 18:44:35 ds2: "linux userspace" might actually be a slight misnomer... they're not really linux userspace threads anymore once you use xenomai
Apr 26 18:45:11 since they're scheduled outside and above linux userspace and kernel entirely
Apr 26 18:45:34 if I remember correctly how xenomai works
Apr 26 18:45:49 let me put it another way - if you want to tie your hands behind your back with the wrong tools, then this isn't going to be productive
Apr 26 18:46:18 he just wants to do math and send it to mcasp with very low latency... his approach seems perfectly fine to me
Apr 26 18:46:46 IIRC - Xenomai is limited realtime (good enough to issue commands to a driver (i.e. linuxcnc fork for arm)) but...
Apr 26 18:47:44 xenomai is hard RT but you can't use any linux kernel drivers, you're limited to the xenomai API
Apr 26 18:47:52 is low latency the sole goal? or also low jitter
Apr 26 18:48:14 jitter is irrelevant since mcasp is responsible for pacing
Apr 26 18:48:27 there's just a hard deadline
Apr 26 18:49:25 no
Apr 26 18:49:27 zmatt: we indeed have interrupts from the pru going
Apr 26 18:49:39 without using the linux kernel
Apr 26 18:49:46 say you use the PRUSS to pull data out of the McASP and accumulate the data
Apr 26 18:49:48 bou4: I'd suggest sticking to your approach, it's a good one
Apr 26 18:50:09 that path can be low latency but Linux could be reacting to it in bursts which could be very jittery
Apr 26 18:50:13 bou4: using edma instead is possible, but more complicated without any benefit really
Apr 26 18:50:40 ds2: signal processing is normally not bursty
Apr 26 18:50:54 if there are delays in L4....
Apr 26 18:50:59 the way our program works is
Apr 26 18:51:10 (or how we would like it to work)
Apr 26 18:51:22 if you can get all the processing to happen in the PRU, that's a different story
Apr 26 18:51:29 the PRU signals that new data is available
Apr 26 18:52:06 well, step 1 is that mcasp signals to the pru that new data is available
Apr 26 18:52:15 I wonder if this can be done just as well with PREEMPT_RT and a proper driver
Apr 26 18:52:18 or is mcasp output only?
Apr 26 18:52:25 mcasp input and output
Apr 26 18:53:06 Our program processes the available block as fast as possible and writes it back to the shared RAM
Apr 26 18:53:21 it sounds like the PRU is being used to work around Xenomai's crippling of the system as seen from the Cortex
Apr 26 18:53:22 the shared RAM contains two ring buffers for this purpose
Apr 26 18:54:14 the X15 or the BBX's DSP might be a simpler way of doing this
Apr 26 18:54:17 and all the processing code is already written and works with alsa
Apr 26 18:54:32 ds2: you're not actually being helpful, not even remotely
Apr 26 18:54:44 It's for my bachelor's paper
Apr 26 18:55:16 zmatt: I am trying to suggest a more appropriate tool given all the constraints
Apr 26 18:55:20 And BeagleBone Blacks are readily available in the lab
Apr 26 18:56:59 bou4: why xenomai though rather than just an -rt kernel?
Apr 26 18:59:19 Xenomai seemed faster in our eyes
Apr 26 18:59:27 in your eyes or in your tests?
Apr 26 18:59:27 As it is not running in Linux
Apr 26 18:59:35 In the tests of Bela
Apr 26 18:59:47 ok
Apr 26 18:59:50 and in our tests too
Apr 26 18:59:57 fair enough :)
Apr 26 19:00:14 we tested interrupts and the time between interrupts and the processes waking up was WAY faster
Apr 26 19:00:25 I am going to get the numbers, sec
Apr 26 19:01:04 With Xenomai we got 6 microseconds
Apr 26 19:01:10 ah
Apr 26 19:01:13 yeah
Apr 26 19:01:20 that's a pretty big difference
Apr 26 19:02:15 We'd like to have very long impulse responses convolved in realtime
Apr 26 19:02:34 So every win in time is a win in length of the impulse response
Apr 26 19:02:54 what's the -rt number?
Apr 26 19:03:26 -rt number?
Apr 26 19:03:37 interrupt latency on -rt
Apr 26 19:03:43 aha
Apr 26 19:04:14 I got 40-44 us iirc in a test I did a while ago, but of course it might be different for you depending on methodology
Apr 26 19:06:07 very long convolution... with that overlap-and-add technique using various sizes of FFTs to get FFT-like performance without the latency required for block processing?
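
Going back to the shared-RAM layout bou4 describes (the PRU fills a capture ring and drains a playback ring, the Xenomai thread does the DSP in between), a pair of single-producer/single-consumer ring buffers could be laid out roughly like the C sketch below. The struct names, sizes and sample format are purely illustrative, not Bela's or bou4's actual layout, and a real implementation still has to consider write ordering/barriers between the PRU and the Cortex-A8.

    /* Illustrative layout for two ring buffers in PRUSS shared RAM.
     * Single producer / single consumer per ring; free-running 32-bit
     * indices, buffer size a power of two so the modulo stays correct
     * across index wrap-around. */
    #include <stdint.h>

    #define RING_FRAMES 256u                 /* must be a power of two */

    struct ring {
        volatile uint32_t head;              /* written only by the producer */
        volatile uint32_t tail;              /* written only by the consumer */
        volatile int32_t  frames[RING_FRAMES];
    };

    struct pru_shared {                      /* placed at a fixed offset in shared RAM */
        struct ring capture;                 /* McASP RX -> PRU -> ARM */
        struct ring playback;                /* ARM -> PRU -> McASP TX */
    };

    int ring_push(struct ring *r, int32_t sample)
    {
        if (r->head - r->tail == RING_FRAMES)
            return 0;                        /* full */
        r->frames[r->head % RING_FRAMES] = sample;
        r->head = r->head + 1;               /* publish the new head last */
        return 1;
    }

    int ring_pop(struct ring *r, int32_t *out)
    {
        if (r->head == r->tail)
            return 0;                        /* empty */
        *out = r->frames[r->tail % RING_FRAMES];
        r->tail = r->tail + 1;               /* free the slot last */
        return 1;
    }
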
Apr 26 19:08:01 overlap-save technique
Apr 26 19:08:04 but correct
Apr 26 19:08:07 yeah that
Apr 26 19:08:10 overlap-something
Apr 26 19:08:12 overlap-add requires adding zeroes
Apr 26 19:08:25 don't expect me to remember which is which ;)
Apr 26 19:08:26 overlap-save works a little bit faster
Apr 26 19:08:40 having a course on signal processing this semester helps :p
Apr 26 19:09:32 I started working on the same thing a while back, but priorities eventually shifted to other stuff
Apr 26 19:11:12 I don't have any priorities right now, it needs to be finished in 3 weeks
Apr 26 19:11:17 other*
Apr 26 19:11:39 what i find truly strange is the lack of a complex multiply in libne10
Apr 26 19:11:49 well, here at work there are always more things that need to work urgently than time available for them ;-)
Apr 26 19:12:21 I think I looked at libne10
Apr 26 19:12:21 writing our own (instead of just writing it in C) brought the time spent complex multiplying down from 26% to 4%
Apr 26 19:12:27 I don't recall being impressed
Apr 26 19:12:42 https://github.com/thomasfaingnaert/arm-neon-complex/blob/master/src/arm_neon_complex.s
Apr 26 19:14:35 you should probably use alignment-specifiers unless you actually need to work on misaligned data
Apr 26 19:17:22 (16-byte alignment suffices, 32-byte alignment has no additional benefit)
Apr 26 19:18:01 it should save a cycle per load instruction
Apr 26 19:24:15 are we sure that the C++ compiler aligns every float to 156 bytes?
Apr 26 19:24:19 16*
Apr 26 19:24:32 certainly not, but you can ask it nicely to
Apr 26 19:24:50 what does this require, as I am not aware of alignment-specifiers
Apr 26 19:25:14 alignas(16) float data[N];
Apr 26 19:25:33 (or __attribute__((aligned(16))) but using the C++ alignas is nicer)
Apr 26 19:26:55 and for dynamically allocated memory there's aligned_alloc()
Apr 26 19:29:09 Thanksss
Apr 26 19:46:02 src/arm_neon_complex.s:25: Error: bad alignment -- `vld2.32 {d16-d19},[r1,:16]!'
Apr 26 19:48:43 bits, not bytes
Apr 26 19:48:48 so :128
Apr 26 19:51:47 thankss
Apr 26 19:55:48 in case it's useful, here are my cryptic notes on cortex-a8 instruction timing: https://pastebin.com/raw/k69CEVbK
Apr 26 19:58:02 (note: unless specified otherwise, the timings for "vadd" and such are referring to integer. timings for vector float ops are further down)
Apr 26 20:01:03 these are based on testing, since the timings in the cortex-a8 TRM are known to be wrong
Apr 26 20:01:49 (this observation, along with the actual integer timings, are originally from http://www.avison.me.uk/ben/programming/cortex-a8.html )
Apr 26 20:02:05 (although I use a different notation I consider to be more convenient)
Apr 26 20:17:07 we timed it
Apr 26 20:17:11 the different sizes
Apr 26 20:17:32 and we noticed no difference when timing 1024 * 256 multiplications
Apr 26 20:17:46 with alignment you mean?
Apr 26 20:17:54 yes
Apr 26 20:20:16 hmm, it's possible this function is completely bottlenecked by floating-point operations and improving the load/store time has no effect as a result, I didn't analyze it :P
Apr 26 20:20:28 how many cycles are you getting per loop iteration?
Apr 26 20:23:41 24 cycles/iteration would be the best possible result
Apr 26 20:32:45 how would i accurately measure cycles?
Apr 26 20:34:00 sudo perf stat -e cycles ./command
Apr 26 20:34:20 (I'm not sure if sudo is needed, I think it was for some uses of perf but not all)
Apr 26 20:35:09 in the command call your routine a certain number of times with the same arguments
Apr 26 20:36:25 compare the cycle timings for e.g. 1000 iterations of that loop vs 2000 iterations, take the difference and divide by (2000-1000)
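
The difference-of-two-runs measurement just described, written out as a tiny C harness; routine_under_test, the buffer names and the sizes are placeholders rather than anything from the actual project.

    /* Run the routine under test argv[1] times with identical arguments, e.g.
     *   sudo perf stat -e cycles ./bench 1000
     *   sudo perf stat -e cycles ./bench 2000
     * then divide the difference in cycle counts by the difference in
     * iteration counts to get cycles per call. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    enum { N = 1024 };                      /* keep the working set small */
    static float dst[2 * N], a[2 * N], b[2 * N];

    static void routine_under_test(void)
    {
        /* placeholder workload; substitute the routine you actually want to time */
        for (size_t i = 0; i < 2 * N; i++)
            dst[i] = a[i] * b[i];
    }

    int main(int argc, char **argv)
    {
        long iters = (argc > 1) ? strtol(argv[1], NULL, 10) : 1000;
        for (long i = 0; i < iters; i++)
            routine_under_test();

        /* touch the output so the compiler can't discard the buffer entirely */
        double sum = 0;
        for (size_t i = 0; i < 2 * N; i++)
            sum += dst[i];
        printf("checksum %g\n", sum);
        return 0;
    }
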
Apr 26 20:36:35 that's the most reliable way I currently have
Apr 26 20:37:15 thankssss, i'll try
Apr 26 20:37:39 I still want to put some effort into getting ETM to work to be able to obtain a detailed cycle-accurate instruction trace from the processor
Apr 26 20:38:05 some day
Apr 26 20:51:44 ola
Apr 26 20:51:49 big difference in cycles
Apr 26 20:52:00 between?
Apr 26 20:52:09 w/o alignment
Apr 26 20:53:03 oh now you're seeing a big difference? because you're operating repeatedly on the same block of data maybe? (i.e. always hits L1 cache)
Apr 26 20:53:13 eh, cache, not L1 cache... probably
Apr 26 20:53:15 maybe
Apr 26 20:53:15 85120
Apr 26 20:53:36 (depends on the prior history of the memory that was accessed... neon doesn't allocate in L1 but it does hit in L1)
Apr 26 20:53:49 https://imgur.com/a/7o2qLDs
Apr 26 20:55:02 how big were the argument vectors in this test?
Apr 26 20:55:03 ok, i calculated it
Apr 26 20:55:07 not so much of a difference
Apr 26 20:55:17 209 vs 206 cycles per iteration
Apr 26 20:56:35 holy shit that's slow... but it looks like you used way too large vectors
Apr 26 20:56:55 256*1024 was the size of the vectors
Apr 26 20:57:01 the maximum before it segfaults :p
Apr 26 20:57:59 https://imgur.com/a/Kb2CWqK
Apr 26 20:58:21 in other words you're testing DDR3 performance rather than testing the performance of your algorithm :P
Apr 26 20:59:09 use much smaller vectors if you want an actual test
Apr 26 21:00:46 no need to initialize the arrays btw
Apr 26 21:02:39 you really like the BBB it seems
Apr 26 21:04:22 and I very much appreciate your help
Apr 26 21:04:36 I wouldn't be able to do this without the community surrounding the BB
Apr 26 21:04:56 and definitely not without you
Apr 26 21:05:03 I do like it. I also just know a lot about it since I've been working with them for years
Apr 26 21:05:06 thank you :)
Apr 26 21:05:30 At the moment it's just a university project
Apr 26 21:05:42 But I will probably maintain it further
Apr 26 21:05:59 As the topic of DSP on the BBB is very interesting
Apr 26 21:06:20 Maybe one day it will get recognized :p
Apr 26 21:06:42 the neon unit of the A8 is not bad if you optimize code for it properly
Apr 26 21:08:36 we redid the measurement
Apr 26 21:08:45 40 cycles per iteration vs 48
Apr 26 21:10:46 interesting, that's actually a significantly bigger difference than I hypothesized
Apr 26 21:12:09 https://imgur.com/a/M0oyjFr
Apr 26 21:12:11 the data
Apr 26 21:12:23 sorry for the dutch :p
Apr 26 21:12:52 can you try: sudo perf stat -e cycles,r4d,r4e,r4f
Apr 26 21:13:06 no worries
Apr 26 21:16:00 will do
Apr 26 21:16:14 are you an EE?
Apr 26 21:16:38 nope
Apr 26 21:16:47 CS?
Apr 26 21:16:59 background in mathematics and computer science
Apr 26 21:17:12 huge background it seems
Apr 26 21:17:29 well I didn't learn this stuff at university
Apr 26 21:17:30 :)
Apr 26 21:17:58 here's a list of cortex-a8 PMU events btw: https://pastebin.com/raw/X0KUfifB
Apr 26 21:19:02 4e-4f is to confirm it's not missing in l2 and 4d is because I'm curious whether it might still be hitting in L1
Apr 26 21:19:36 I still don't know what the performance implications are for neon if it hits in L1... there's presumably a good reason why they don't allocate in L1 for neon
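
For reference, here is roughly what the element-wise complex multiply under discussion looks like written with NEON intrinsics in C, combined with the alignment suggestions from earlier; this is a sketch, not the linked arm_neon_complex.s, and the function name, vector sizes and build flags are assumptions.

    /* (a+bi)(c+di) = (ac-bd) + (ad+bc)i on interleaved re/im float arrays,
     * four complex elements per iteration.  Build with something like:
     *   gcc -O2 -std=gnu11 -mfpu=neon cmul.c
     * n is assumed to be a multiple of 4. */
    #include <arm_neon.h>
    #include <stdalign.h>
    #include <stdlib.h>

    void cmul(float *dst, const float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i += 4) {
            float32x4x2_t x = vld2q_f32(&a[2 * i]);  /* x.val[0]=re, x.val[1]=im */
            float32x4x2_t y = vld2q_f32(&b[2 * i]);
            float32x4x2_t r;
            r.val[0] = vmlsq_f32(vmulq_f32(x.val[0], y.val[0]), x.val[1], y.val[1]);
            r.val[1] = vmlaq_f32(vmulq_f32(x.val[0], y.val[1]), x.val[1], y.val[0]);
            vst2q_f32(&dst[2 * i], r);
        }
    }

    int main(void)
    {
        enum { N = 1024 };                           /* small enough to stay cached */
        alignas(16) static float a[2 * N], b[2 * N]; /* 16-byte aligned, as suggested */
        float *dst = aligned_alloc(16, sizeof(float) * 2 * N);  /* heap equivalent */
        cmul(dst, a, b, N);
        free(dst);
        return 0;
    }
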
Apr 26 21:20:19 zmatt: https://imgur.com/a/WCDW01r
Apr 26 21:23:18 hmm, so it's not missing in L2, but it *is* hitting L1
Apr 26 21:25:02 it sucks that the exact behaviour and timings of the caches are not really documented adequately
Apr 26 21:26:50 but anyway, if this function is currently taking only 4% of your computation time, I guess it doesn't make sense to optimize it excessively :)
Apr 26 21:28:25 Where can i find documentation about r4d, r4e and r4f?
Apr 26 21:28:28 It is not in the man page
Apr 26 21:28:52 do be mindful of the L2 cache size (256 KB). as you saw earlier, once the data set you're working with exceeds that, all performance goes down the drain
Apr 26 21:29:13 they're raw pmu event numbers, that's why I pastebinned a list of cortex-a8 pmu events
Apr 26 21:30:25 perf can record up to four different pmu events in addition to the number of cycles
Apr 26 21:31:34 ultimately this list comes from the Cortex-A8 Technical Reference Manual
**** ENDING LOGGING AT Fri Apr 27 03:00:02 2018