**** BEGIN LOGGING AT Thu Apr 26 03:00:03 2018
Apr 26 06:14:12 Hello
Apr 26 06:14:53 I want to make a camera module for ADAS purposes. I have a few questions
Apr 26 06:16:48 I want to make a camera module with the help of a BeagleBoard-xM and a CMOS image sensor..... I want to know whether we can write an algorithm for LED flicker mitigation or not?
Apr 26 06:22:45 BBxM is rather ancient hardware…
Apr 26 06:24:56 Means ?
Apr 26 06:27:54 It's a single core ARM Cortex-A8 - don't expect too much in terms of performance
Apr 26 06:28:12 Then again, hardware runs software, any software
Apr 26 06:28:57 everything is possible, you will get something to run. It's impossible to say if it will be sufficient for your needs.
Apr 26 07:19:26 Hi, when I find an OS image for beaglebone black, generally, what should I do to make it run on a beaglebone white?
Apr 26 07:20:19 fred__tv: try and see what fails? :-)
Apr 26 07:21:32 some images won't start at all (stuck leds...)
Apr 26 07:22:25 that's absolutely possible (actually one should think of the BBW as obsolete these days)
Apr 26 07:23:22 probably the fact BBW has no internal eMMC...
Apr 26 07:23:44 that's basically not a killer, unless the image tries to do something with it
Apr 26 07:24:09 I own a number of BBW to work with...
Apr 26 07:24:34 and i own a lot of intel p4 to work with....
Apr 26 07:24:43 :-)))
Apr 26 07:24:49 (still they are obsolete and i don't touch them :P)
Apr 26 07:25:40 probably sticking to *old* images with the patched 3.something kernel gives you a greater chance. if you are actually building your own stuff, i'd guess the upstream kernel+dtb should be fine
Apr 26 07:26:34 fred__tv: any image for the BBB should work on the BBW
Apr 26 07:27:14 zmatt: actually i disagree
Apr 26 07:27:52 I mean, it's possible to make an image that works on the BBB but not the BBW, but it would be worthy of a bug report
Apr 26 07:28:11 since u-boot picks the appropriate device tree based on board identification
Apr 26 08:09:51 hi zmatt. Are you here?
Apr 26 08:22:43 Can the LCD7 for the BBB be used with a RaspberryPi?
Apr 26 08:26:11 (in gpio mode)
Apr 26 08:33:16 Hi..
Apr 26 08:46:02 fred__tv: if you can accurately toggle 20 gpios at a rate of 30 MHz, sure
Apr 26 08:46:06 :P
Apr 26 08:48:13 parduz: "are you here?" is roughly as stupid a question as "is anyone here?". if you have an actual question for me to answer, ask that instead :P
Apr 26 08:48:35 (I'm sporadically here throughout the day)
Apr 26 08:50:22 Do you mean the raspberry is not capable of it, or that nothing has been done in this way? (considering the raspberry has both a dedicated lcd connection and an HDMI port...)
Apr 26 08:51:13 fred__tv: how come you expect us to explain the gpio/lcd capabilities of a rpi?
Apr 26 08:51:21 does the rpi have a parallel video output? I thought it only had something like mipi dsi
Apr 26 08:51:46 anyway, you were the one who said "in gpio mode"
Apr 26 08:53:05 also, what LetoThe2nd just said
Apr 26 08:56:43 just to know if anybody knows whether it has been done.....
Apr 26 08:57:17 well maybe ask the rpi people "hello, i have this lcd, can i use it with your board."
Apr 26 08:57:22 would make sense, right?
Apr 26 08:57:41 might have better luck in an rpi channel (they won't know what the lcd7 cape is, but the key question is whether it supports a parallel digital video output)
Apr 26 08:59:37 actually it seems it does
Apr 26 09:00:25 https://pinout.xyz/ if you click on various pins you will see mux options for "DPI" (Display Parallel Interface)
Apr 26 09:01:04 hopefully there's a more convenient table out there somewhere
Apr 26 09:02:44 hi zmatt. I was wondering if you were reading... sorry. I saw the reply from Robert Nelson, but i don't get what he meant.
Apr 26 09:03:23 Did he mean that the patch slipped out and will be implemented again?
Apr 26 09:04:54 parduz: not "will be", "has been". he said he merged it back in
Apr 26 09:05:06 i.e. hit the reply button and say Thanks!
Apr 26 09:05:07 :P
Apr 26 09:05:25 (maybe after confirming it works in the latest kernel)
Apr 26 09:05:25 yep, i'll do.
Apr 26 09:05:38 so i have to update the kernel
Apr 26 09:06:29 or do i have to install a whole new disk image?
Apr 26 09:06:38 just update the kernel
Apr 26 09:08:07 it looks like 4.9.88-ti-r111 and 4.14.35-ti-r44 have the patch
Apr 26 09:09:49 ok.... never done it before... i'm googling for a "how to".
Apr 26 09:10:50 ok... let's see if what the official wiki says works
Apr 26 09:11:40 /opt/scripts/tools/update_kernel.sh
Apr 26 09:11:44 iirc
Apr 26 09:28:07 well, it's better. Not perfect but better... at least the touchscreen is usable, now
Apr 26 12:11:01 'lo
Apr 26 12:35:07 Anyone here who has experience with the ti-sgx modules? I'm wondering why the omaplfb kernel-module isn't installed anymore?
Apr 26 12:36:18 jof1: that kernel module doesn't exist anymore with the current drivers
Apr 26 12:37:28 jof1: pvrsrvkm and tilcdc interact via the dma-buf api
Apr 26 12:38:54 I see
Apr 26 12:48:50 NishanthMenon: any idea whether the dra7 L3 clock asymmetric aging issue (i892) also applies to omap5? it's not explicitly included in the errata, but that may just be lack of maintenance
Apr 26 12:49:48 zmatt, if my memory serves me right.. (and it can be very flaky) the IP was the same or a derivative.
Apr 26 12:50:30 NishanthMenon: how serious is it? I'm assuming its nature means that the more time is spent with PD_CORE powered but the L3 clock gated, the greater the risk of instability?
Apr 26 12:53:35 zmatt, i don't remember correctly now
Apr 26 12:53:55 ok
Apr 26 12:54:04 sorry. i remember i had to keep the PD always ON and could not even do CSWR or even inactive, if i remember
Apr 26 12:54:36 yeah but dra7 doesn't support PD_CORE=OFF right? omap5 does
Apr 26 12:55:13 (at least in theory, it remains to be seen whether it actually works)
Apr 26 12:55:31 zmatt, originally they all did
Apr 26 12:55:39 i mean target OFF, if i remember right
Apr 26 12:58:33 hmm, yeah I guess if it didn't there wouldn't be a reason to have PD_CORE and PD_COREAON separate from each other
Apr 26 12:59:04 actually no that's not true, it could be just for retention
Apr 26 12:59:13 oh well
Apr 26 12:59:29 but you're saying switching PD_CORE to OFF is not sufficient to prevent the degradation?
Apr 26 13:06:30 hi
Apr 26 13:07:35 I'm interested in the beaglebone black... i'm looking for a display for it and noticed that just about everything is in the form of a capacitive touchscreen cape
Apr 26 13:07:55 that's all fine and good but i was wondering if there was a recommended ips display
Apr 26 13:08:12 i noted that the beaglebone has a mini hdmi output
Apr 26 13:09:46 beezy: it does, so you can connect basically any hdmi display
Apr 26 13:10:02 (although beware of resolution limitations)
Apr 26 13:31:25 zmatt, sorry, i am multitasking -> you should NOT switch PD_CORE to off -> if you do that some leakage issues also come into play.. i forget the details, it's been more than 3 or 4 years...
Apr 26 13:31:40 ttyl
Apr 26 14:21:37 How do you debug the PRU
Apr 26 14:22:15 google prudebug
Apr 26 14:22:24 there might also be other pru debuggers
Apr 26 14:23:38 maybe I'll eventually make one using my py-uio library... it already supports single-stepping the core
Apr 26 14:24:46 Currently starting to use the McASP with the PRU
Apr 26 14:24:53 But the learning curve is pretty steep
Apr 26 14:26:21 yep
Apr 26 14:28:27 sbbo r27, MCASP_BASE_ADDRESS, register, 4
Apr 26 14:28:37 Bela uses that to write to a McASP register
Apr 26 14:28:40 McASP can be a bit eccentric too... it may be worth starting with experimenting with mcasp from the cortex-a8 using uio, even though you probably can't keep it running stably that way (due to scheduling)
Apr 26 14:28:48 Bela ?
Apr 26 14:28:57 https://github.com/BelaPlatform/Bela
Apr 26 14:29:20 but I don't know why the first argument doesn't have &
Apr 26 14:29:34 obsolete syntax
Apr 26 14:29:45 you're allowed to omit the & even though doing so is deprecated
Apr 26 14:30:07 so best practice is to use it?
Apr 26 14:30:25 definitely
Apr 26 14:33:37 while browsing their pru code I see a lot of weird or inefficient things actually, so it might not be the greatest example to learn from
Apr 26 14:35:07 Good to know
Apr 26 14:35:57 I am trying to do the same thing as Bela to improve our latency
Apr 26 14:36:17 But as I find their code ugly, we're doing it from scratch
Apr 26 14:36:52 We also wrote our own complex multiply for the neon processor
Apr 26 14:37:07 Because libne10 doesn't have that, which we still find pretty strange
Apr 26 14:38:14 But the macros for writing the McASP are good, right?
Apr 26 14:38:27 I don't know how to do it any better
Apr 26 14:38:53 Albeit I don't understand the MCASP_REG_WRITE_EXT for writing beyond the 0xFF boundary
Apr 26 14:42:49 MCASP_REG_SET_BIT_AND_POLL is pointlessly inefficient. I would just pass the bit number (rather than the mask) as argument and do: https://pastebin.com/raw/bKpmYH90
Apr 26 14:44:53 Ah, thanks!
Apr 26 14:45:00 I think I understand the boundary problem
Apr 26 14:45:24 Registers like XBUF have an address of 0x200
Apr 26 14:45:48 the read/write reg ext macros can be simplified too: https://pastebin.com/raw/cgWstn4h
Apr 26 14:46:21 yeah most instructions only support immediate operands in range 0x00-0xff
Apr 26 14:47:24 so if you want to use an operand outside that range, you have to load that value into a register
Apr 26 14:49:27 Aha, I see
Apr 26 14:49:46 The third operand of SBBO is OP(255), which means it's limited to 8 bits
Apr 26 14:50:13 OP(255) means "either a register, or an immediate in range 0-255"
Apr 26 14:51:21 ("register" here in the general sense, i.e. including r0.b0 and such)
Apr 26 14:51:46 your help is really appreciated, thanks!
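
A minimal C sketch of the "experiment with the McASP from the cortex-a8 first" suggestion above, using /dev/mem instead of uio to keep it short. The 0x48038000 McASP0 CFG base and the 0x44 GBLCTL offset are assumptions taken from the AM335x memory map and McASP register map; verify both against the TRM, and note that the McASP module clock must already be enabled or these accesses will fault.

    /* Map the McASP0 configuration registers and dump GBLCTL, as a starting
     * point for interactive experimentation before moving logic into PRU
     * firmware.  Addresses/offsets are assumptions to check against the TRM. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define MCASP0_CFG_BASE 0x48038000u   /* assumed McASP0 CFG port (AM335x) */
    #define MCASP_MAP_SIZE  0x1000u       /* covers the 0x200 XBUF registers too */

    static volatile uint32_t *mcasp_map(void)
    {
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); exit(1); }
        void *p = mmap(NULL, MCASP_MAP_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, MCASP0_CFG_BASE);
        if (p == MAP_FAILED) { perror("mmap"); exit(1); }
        return (volatile uint32_t *)p;
    }

    int main(void)
    {
        volatile uint32_t *mcasp = mcasp_map();
        /* word-sized accesses, so byte offset 0x44 becomes index 0x44/4 */
        printf("GBLCTL = 0x%08x\n", mcasp[0x44 / 4]);
        return 0;
    }
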
Apr 26 14:51:58 your making a Belgian student very happy
Apr 26 14:52:02 you're*
Apr 26 14:52:41 :)
Apr 26 15:05:23 I'm really confused.. In bb-kernel (https://github.com/RobertCNelson/bb-kernel) the SGX stuff seems to have been dropped in version (branch) am33x-v4.16, but the git log doesn't say why are what I should do instead? Does anyone here know?
Apr 26 15:05:46 s/are/and/
Apr 26 15:09:13 zmatt: Can the same approach as your rewritten write_ext macro be used for the read_ext?
Apr 26 15:09:23 I assume it is possible
Apr 26 15:09:44 yes, and it occurs in more places
Apr 26 15:11:11 alright, thanks for saying
Apr 26 15:12:26 in general, https://pastebin.com/raw/v23XdtsW
Apr 26 15:12:47 wait, I didn't phrase that right
Apr 26 15:13:35 better: https://pastebin.com/raw/y2Partxv
Apr 26 15:37:19 Okay, noob here looking for some info....
Apr 26 15:38:37 Survey, open-ended question about application physical environment - what kinds of applications have you put your beagle into and how did you mitigate problems like humidity, temperature, etc.?
Apr 26 15:38:55 Was the board the limiting factor or was there something else?
Apr 26 16:14:55 lol, good luck with that survey :)
Apr 26 16:22:51 Would it be a terrible idea to always load the register offset into a register
Apr 26 16:23:13 bou4: ?
Apr 26 16:23:19 This way I got one macro that supports both, but with a little overhead for registers whose offset is smaller than 0xFF
Apr 26 16:24:45 that's up to you. I'm not sure I even agree with the use of a macro for basically just a load/store instruction
Apr 26 16:27:34 is there a negative side to using macros?
Apr 26 16:27:34 actually, now that I think of it I'm pretty sure I disagree with the use of a macro for this. it doesn't significantly save space nor make the code clearer... I'd even say it does the opposite
Apr 26 16:28:07 but initializing the mcasp requires a lot of writing to registers
Apr 26 16:28:21 using macros will significantly shorten that code
Apr 26 16:28:59 is there a negative side to using macros besides readability?
Apr 26 16:32:10 ah right I thought the macro just wrapped the store (i.e. still expected the value in a register) but I see now it's to write a constant to a register
Apr 26 16:34:11 hi everyone, question, is it normal for the signal voltage on the servo pins of a beaglebone blue to fall below 3.3v (i got around 2.1 when I probed)... or am i doing something wrong? thanks!
Apr 26 16:39:22 in that case I guess it helps... although you can also just load the values into consecutive registers and store that: https://pastebin.com/raw/iK45qzuW
Apr 26 16:39:36 john____: uhh
Apr 26 16:39:42 john____: that doesn't sound right
Apr 26 16:40:01 let me check the schematic
Apr 26 16:41:01 zmatt: all right sure.. thanks! fyi, i tried using channels 1 and 5 (since channels 5-8 are boot mode pins) and both times, the voltage i probed was a little over 2v
Apr 26 16:41:31 this is without external loading?
Apr 26 16:42:26 yes.. no loads connected..
Apr 26 16:44:39 what about 7 or 8 ?
Apr 26 16:47:07 zmatt: same result.. around 2v
Apr 26 16:47:30 what about any 3.3v supply output pin?
Apr 26 16:49:48 zmatt: sorry i'm not following, what pin do you want me to check? without the motor the servo control pin outputs the correct voltage level
Apr 26 16:50:14 around 3.3v
Apr 26 16:50:15 ... so there was actually an external load connected
Apr 26 16:51:09 zmatt: oh okay.. sorry, i was thinking you meant the load to the servo.. :D yes, there was a servo connected when i probed it to be around 2v
Apr 26 16:52:37 then it may be fine I guess, depends on the input specs of the servo
Apr 26 16:53:29 oh okay i see... well i got a cheap servo, and i couldn't find anything relevant.. http://www.electronicoscaldas.com/datasheet/MG996R_Tower-Pro.pdf
Apr 26 16:53:47 the pins on the beaglebone have 4.7 kΩ series resistors, so a 1.2V voltage drop (3.3V - 2.1V) means it's drawing 0.25 mA
Apr 26 16:54:25 maybe it's typical for control signals to a servo to use an opto-coupler? that would at least explain why they included those resistors
Apr 26 16:55:23 ohhhh... okay.. i'll keep that in mind.. well, i'm not very familiar but I guess I could give it a try.. since it's already at 2.1v without load to the servo
Apr 26 16:55:41 physical loading of the servo is irrelevant to the control signals
Apr 26 16:55:45 I meant electrical loading
Apr 26 16:56:06 oh yeah.. correct.. makes sense..
Apr 26 16:57:28 thanks! one more thing though if you don't mind, the servo that i'm using tends to move to one direction before calibrating itself to the correct position.. is that normal?
Apr 26 16:57:38 I have no idea
Apr 26 16:58:46 i read somewhere that the control signal must come first before the rail, i was trying this but somehow, i get around 1v at the control signal when the rail is off... is that normal?
Apr 26 16:58:57 it sounds plausible that it needs to calibrate itself at startup
Apr 26 16:59:22 the control signal must be low when the servo is unpowered
Apr 26 16:59:57 ohhh i see... well, that explains why i see a low voltage..
Apr 26 17:00:21 and okay, so it's not using an opto-coupler, instead they included those series resistors because they realized that people would end up accidentally driving the servo pins high while the motor is unpowered
Apr 26 17:00:22 all right.. sounds good to me.. glad to know i didn't break anything.. thanks zmatt!
Apr 26 17:00:55 which, without the series resistor, could destroy both the processor and the servo
Apr 26 17:03:30 so that's what they're for.. if you don't mind, in what way does it actually prevent the processor and servo from being destroyed?
Apr 26 17:03:40 If you write a macro with .mparam value
Apr 26 17:03:49 and you actually need &value
Apr 26 17:04:01 do you need to call the macro with &r1
Apr 26 17:04:08 or can you write &value in the macro
Apr 26 17:04:20 you can write &value in the macro
Apr 26 17:04:23 pretty sure
Apr 26 17:04:48 if I wanted to be sure I'd just do a quick test... you could have done the same :P
Apr 26 17:05:55 hahah, I don't really know how to get something out of it for debugging purposes, or will it simply not compile?
Apr 26 17:09:14 if it weren't allowed you'd get a syntax error
Apr 26 17:12:25 you can also assemble an invocation of your macro along with the expansion you're expecting and use pasm -L to inspect whether the two versions produce the same code
Apr 26 17:16:28 fantastic, thanks!
Apr 26 17:44:32 bou4: uhh, I'm surprised Bela's firmware even works... they're polling the wrong register in MCASP_REG_SET_BIT_AND_POLL
Apr 26 17:45:30 they should be reading GBLCTL instead of RGBLCTL or XGBLCTL
Apr 26 18:08:52 that is indeed part of the code i don't understand
Apr 26 18:15:18 So you recommend that I don't base my code on Bela?
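
In C, the set-then-poll sequence zmatt describes a few messages up would look roughly like the sketch below: the reset bits are written through RGBLCTL/XGBLCTL, but the McASP only reflects the change in GBLCTL once it has synchronized internally, so GBLCTL is the register to poll. The offsets (0x44 GBLCTL, 0x60 RGBLCTL, 0xA0 XGBLCTL) are assumptions from the McASP register map in the TRM, and mcasp is assumed to point at the memory-mapped CFG region from the earlier sketch.

    /* Set a bit in XGBLCTL (or RGBLCTL) and wait until GBLCTL confirms that
     * the state change has actually taken effect.  Offsets are assumptions. */
    #include <stdint.h>

    #define MCASP_GBLCTL   (0x44 / 4)
    #define MCASP_RGBLCTL  (0x60 / 4)
    #define MCASP_XGBLCTL  (0xA0 / 4)

    void mcasp_xgblctl_set_and_poll(volatile uint32_t *mcasp, unsigned bit)
    {
        uint32_t mask = 1u << bit;
        mcasp[MCASP_XGBLCTL] |= mask;            /* request the state change */
        while ((mcasp[MCASP_GBLCTL] & mask) == 0)
            ;                                    /* wait for GBLCTL to confirm it */
    }
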
Apr 26 18:15:31 Perhaps better to check everything in the datasheet
Apr 26 18:16:38 it could still be useful to check how they do certain things, but I would definitely suggest you at least understand what they're doing
Apr 26 18:17:21 That is my intention
Apr 26 18:17:59 I am also only going to drive the McASP
Apr 26 18:18:08 and not other peripherals such as the on-board ADCs and DACs
Apr 26 18:18:14 my code will be a lot smaller
Apr 26 18:18:52 Is it a good idea to use polling instead of interrupts?
Apr 26 18:19:09 The PRU will do nothing else but talk with the McASP, so I don't see the urge to use interrupts
Apr 26 18:19:38 I strongly recommend using interrupts
Apr 26 18:20:02 polling creates continuous pointless traffic on the L4 interconnect
Apr 26 18:20:37 well, maybe "strongly" is a bit too strong. but I do recommend it
Apr 26 18:22:41 eh, actually now that I think about it I'm not sure it matters all that much... the polling rate won't be that high anyway
Apr 26 18:23:32 my reasoning was that the PRU is way faster than the sampling rate of the McASP
Apr 26 18:23:49 so that polling would be faster
Apr 26 18:26:27 using an irq wouldn't be to reduce the time it takes to react to an event... I mean, it will reduce it slightly, but at most 15 pru cycles or so which isn't a big deal
Apr 26 18:26:55 it would mostly be to reduce spam on the L4 interconnect
Apr 26 18:27:44 keep in mind that you could simplify the pru firmware by doing initialization stuff from the cortex-a8 instead
Apr 26 18:27:54 such as setting up the pruss interrupt controller
Apr 26 18:31:22 why is the McASP being handled by polling or even by IRQ?
Apr 26 18:31:52 is there a reason why DMA won't work?
Apr 26 18:32:12 dma is a pointless complication in this case
Apr 26 18:32:50 probably
Apr 26 18:32:53 maybe
Apr 26 18:33:06 bou4: what exactly are you doing again?
Apr 26 18:33:37 we are doing realtime DSP with guitar signals
Apr 26 18:34:10 X15 or Bone?
Apr 26 18:34:14 but surely you're not using PRU for signal processing itself
Apr 26 18:34:20 correct
Apr 26 18:34:28 so what's the purpose of PRU here?
Apr 26 18:34:30 only to create a buffer to write to the McASP
Apr 26 18:34:45 because Xenomai lets us do very realtime processes
Apr 26 18:34:47 so you're using PRU as... a DMA controller?
Apr 26 18:34:51 correct
Apr 26 18:34:56 that sounds like creating a lot of traffic for no good reason
Apr 26 18:35:08 but Xenomai lets us only do that if we don't use code inside the kernel
Apr 26 18:35:10 so no alsa
Apr 26 18:35:19 working in the arctic and need extra heat?
Apr 26 18:35:27 perhaps a xeon would work better :D
Apr 26 18:35:42 actually if he uses an irq from mcasp it shouldn't create extra traffic
Apr 26 18:36:04 the transfer to the PRU and PRU to the McASP creates traffic
Apr 26 18:36:28 exactly the same amount as would be generated if EDMA were used instead
Apr 26 18:36:59 EDMA doesn't poll and the PRU doesn't have an "interrupt"
Apr 26 18:37:53 You think it sounds like a bad idea?
Apr 26 18:37:58 incorrect, both the tx and rx irqs of mcasp 0 and 1 are available on the pruss interrupt controller
Apr 26 18:38:06 so there is no need to poll
Apr 26 18:38:24 the PRU doesn't have the traditional concept of an interrupt
Apr 26 18:38:39 that's irrelevant
Apr 26 18:38:47 it is still a poll... but I do concede that the polling is inside the PRUSS vs on the Lx buses
Apr 26 18:39:23 if the concern is latency on ALSA ('stock' McASP drivers)... perhaps a low latency McASP driver is in order
Apr 26 18:39:26 it can actually suspend its internal clocks until the "interrupt" hits (using the sleep instruction)
Apr 26 18:39:43 so no polling at all, not even inside the subsystem
Apr 26 18:39:52 But if we write it with the PRU
Apr 26 18:40:00 though I very much doubt the polling has much relevant impact anyway
Apr 26 18:40:02 sleep sleeps the PRU
Apr 26 18:40:14 then we know exactly what its timing is going to be
Apr 26 18:40:36 but isn't polling the easier way
Apr 26 18:40:44 as i currently have no idea how interrupts work in the pru
Apr 26 18:40:53 it is easier
Apr 26 18:40:57 IMO - EDMA is simpler and less hassle
Apr 26 18:41:03 and you can always switch to using interrupts later
Apr 26 18:41:10 you will need to get the data into DRAM
Apr 26 18:41:10 I wouldn't call EDMA simple
Apr 26 18:41:29 so it is either the PRUSS writing or the EDMA
Apr 26 18:42:12 you will have jitter either way - PRUSS -> L4 jitter, PRUSS -> DRAM jitter
Apr 26 18:42:27 can you do EDMA without using the kernel?
Apr 26 18:42:34 EDMA -> DRAM jitter (all this is on top of the jitter created by the FIFO on the McASP)
Apr 26 18:42:37 yes, but you can't receive interrupts from EDMA in linux userspace, so you'd end up polling EDMA instead
Apr 26 18:42:41 EDMA is a hardware block
Apr 26 18:42:50 blah... Userspace
Apr 26 18:43:18 bou4: you *can* receive interrupts in userspace using xenomai right? I saw that Bela was using an event to signal the cortex-a8
Apr 26 18:43:22 i see where this is going. userspace vs k-space... i'll go get lunch
Apr 26 18:43:40 ds2: 20:34 < bou4> because Xenomai lets us do very realtime processes
Apr 26 18:43:49 ds2: 20:35 < bou4> but Xenomai lets us only do that if we don't use code inside the kernel
Apr 26 18:44:27 there are ways to get RT inside the kernel
Apr 26 18:44:35 ds2: "linux userspace" might actually be a slight misnomer... they're not really linux userspace threads anymore once you use xenomai
Apr 26 18:45:11 since they're scheduled outside and above linux userspace and kernel entirely
Apr 26 18:45:34 if I remember correctly how xenomai works
Apr 26 18:45:49 let me put it another way - if you want to tie your hands behind your back with the wrong tools, then this isn't going to be productive
Apr 26 18:46:18 he just wants to do math and send it to mcasp with very low latency... his approach seems perfectly fine to me
Apr 26 18:46:46 IIRC - Xenomai is limited realtime (good enough to issue commands to a driver (i.e. linuxcnc fork for arm)) but...
Apr 26 18:47:44 xenomai is hard RT but you can't use any linux kernel drivers, you're limited to the xenomai API
Apr 26 18:47:52 is low latency the sole goal? or also low jitter
Apr 26 18:48:14 jitter is irrelevant since mcasp is responsible for pacing
Apr 26 18:48:27 there's just a hard deadline
Apr 26 18:49:25 no
Apr 26 18:49:27 zmatt: we indeed have interrupts from the pru going
Apr 26 18:49:39 without using the linux kernel
Apr 26 18:49:46 say you use the PRUSS to pull data out of the McASP and accumulate the data
Apr 26 18:49:48 bou4: I'd suggest sticking to your approach, it's a good one
Apr 26 18:50:09 that path can be low latency but Linux could be reacting to it in bursts which could be very jittery
Apr 26 18:50:13 bou4: using edma instead is possible, but more complicated without any benefit really
Apr 26 18:50:40 ds2: signal processing is normally not bursty
Apr 26 18:50:54 if there are delays in L4....
Apr 26 18:50:59 the way our program works is
Apr 26 18:51:10 (or how we would like it to work)
Apr 26 18:51:22 if you can get all the processing to happen in the PRU, that's a different story
Apr 26 18:51:29 the PRU signals that new data is available
Apr 26 18:52:06 well, step 1 is that mcasp signals to the pru that new data is available
Apr 26 18:52:15 I wonder if this can be done just as well with PREEMPT_RT and a proper driver
Apr 26 18:52:18 or is mcasp output only?
Apr 26 18:52:25 mcasp input and output
Apr 26 18:53:06 Our program processes the available block as fast as possible and writes it back to the shared RAM
Apr 26 18:53:21 it sounds like the PRU is being used to work around Xenomai's crippling of the system as seen from the Cortex
Apr 26 18:53:22 the shared RAM contains two ring buffers for this purpose
Apr 26 18:54:14 the X15 or the BBX's DSP might be a simpler way of doing this
Apr 26 18:54:17 and all the processing code is already written and works with alsa
Apr 26 18:54:32 ds2: you're not actually being helpful, not even remotely
Apr 26 18:54:44 It's for my bachelor's paper
Apr 26 18:55:16 zmatt: I am trying to suggest a more appropriate tool given all the constraints
Apr 26 18:55:20 And BeagleBone Blacks are readily available in the lab
Apr 26 18:56:59 bou4: why xenomai though rather than just an -rt kernel?
Apr 26 18:59:19 Xenomai seemed faster in our eyes
Apr 26 18:59:27 in your eyes or in your tests?
Apr 26 18:59:27 As it is not running in Linux
Apr 26 18:59:35 In the tests of Bela
Apr 26 18:59:47 ok
Apr 26 18:59:50 and in our tests too
Apr 26 18:59:57 fair enough :)
Apr 26 19:00:14 we tested interrupts and the time between interrupts and the processes waking up was WAY faster
Apr 26 19:00:25 I am going to get the numbers, sec
Apr 26 19:01:04 With Xenomai we got 6 microseconds
Apr 26 19:01:10 ah
Apr 26 19:01:13 yeah
Apr 26 19:01:20 that's a pretty big difference
Apr 26 19:02:15 We'd like to have very long impulse responses convolved in realtime
Apr 26 19:02:34 So every win in time is a win in length of the impulse response
Apr 26 19:02:54 what's the -rt number?
Apr 26 19:03:26 -rt number?
Apr 26 19:03:37 interrupt latency on -rt
Apr 26 19:03:43 aha
Apr 26 19:04:14 I got 40-44 us iirc in a test I did a while ago, but of course it might be different for you depending on methodology
Apr 26 19:06:07 very long convolution... with that overlap-and-add technique using various sizes of FFTs to get FFT-like performance without the latency required for block processing?
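
Going back to the shared-RAM layout bou4 describes (the PRU fills a capture ring and drains a playback ring, the Xenomai thread does the DSP in between), a pair of single-producer/single-consumer ring buffers could be laid out roughly like the C sketch below. The struct names, sizes and sample format are purely illustrative, not Bela's or bou4's actual layout, and a real implementation still has to consider write ordering/barriers between the PRU and the Cortex-A8.

    /* Illustrative layout for two ring buffers in PRUSS shared RAM.
     * Single producer / single consumer per ring; free-running 32-bit
     * indices, buffer size a power of two so the modulo stays correct
     * across index wrap-around. */
    #include <stdint.h>

    #define RING_FRAMES 256u                 /* must be a power of two */

    struct ring {
        volatile uint32_t head;              /* written only by the producer */
        volatile uint32_t tail;              /* written only by the consumer */
        volatile int32_t  frames[RING_FRAMES];
    };

    struct pru_shared {                      /* placed at a fixed offset in shared RAM */
        struct ring capture;                 /* McASP RX -> PRU -> ARM */
        struct ring playback;                /* ARM -> PRU -> McASP TX */
    };

    int ring_push(struct ring *r, int32_t sample)
    {
        if (r->head - r->tail == RING_FRAMES)
            return 0;                        /* full */
        r->frames[r->head % RING_FRAMES] = sample;
        r->head = r->head + 1;               /* publish the new head last */
        return 1;
    }

    int ring_pop(struct ring *r, int32_t *out)
    {
        if (r->head == r->tail)
            return 0;                        /* empty */
        *out = r->frames[r->tail % RING_FRAMES];
        r->tail = r->tail + 1;               /* free the slot last */
        return 1;
    }
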
Apr 26 19:08:01 overlap-save technique
Apr 26 19:08:04 but correct
Apr 26 19:08:07 yeah that
Apr 26 19:08:10 overlap-something
Apr 26 19:08:12 overlap-add requires adding zeroes
Apr 26 19:08:25 don't expect me to remember which is which ;)
Apr 26 19:08:26 overlap-save works a little bit faster
Apr 26 19:08:40 having a course on signal processing this semester helps :p
Apr 26 19:09:32 I started working on the same thing a while back, but priorities eventually shifted to other stuff
Apr 26 19:11:12 I don't have any priorities right now, it needs to be finished in 3 weeks
Apr 26 19:11:17 other*
Apr 26 19:11:39 what i find truly strange is the lack of a complex multiply in libne10
Apr 26 19:11:49 well, here at work there are always more things that need to work urgently than time available for them ;-)
Apr 26 19:12:21 I think I looked at libne10
Apr 26 19:12:21 writing our own (instead of just writing it in C) brought the time spent complex multiplying down from 26% to 4%
Apr 26 19:12:27 I don't recall being impressed
Apr 26 19:12:42 https://github.com/thomasfaingnaert/arm-neon-complex/blob/master/src/arm_neon_complex.s
Apr 26 19:14:35 you should probably use alignment-specifiers unless you actually need to work on misaligned data
Apr 26 19:17:22 (16-byte alignment suffices, 32-byte alignment has no additional benefit)
Apr 26 19:18:01 it should save a cycle per load instruction
Apr 26 19:24:15 are we sure that the C++ compiler aligns every float to 156 bytes?
Apr 26 19:24:19 16*
Apr 26 19:24:32 certainly not, but you can ask it nicely to
Apr 26 19:24:50 what does this require, as I am not aware of alignment-specifiers
Apr 26 19:25:14 alignas(16) float data[N];
Apr 26 19:25:33 (or __attribute__((aligned(16))) but using the C++ alignas is nicer)
Apr 26 19:26:55 and for dynamically allocated memory there's aligned_alloc()
Apr 26 19:29:09 Thanksss
Apr 26 19:46:02 src/arm_neon_complex.s:25: Error: bad alignment -- `vld2.32 {d16-d19},[r1,:16]!'
Apr 26 19:48:43 bits, not bytes
Apr 26 19:48:48 so :128
Apr 26 19:51:47 thankss
Apr 26 19:55:48 in case it's useful, here are my cryptic notes on cortex-a8 instruction timing: https://pastebin.com/raw/k69CEVbK
Apr 26 19:58:02 (note: unless specified otherwise, the timings for "vadd" and such are referring to integer. timings for vector float ops are further down)
Apr 26 20:01:03 these are based on testing, since the timings in the cortex-a8 TRM are known to be wrong
Apr 26 20:01:49 (this observation, along with the actual integer timings, are originally from http://www.avison.me.uk/ben/programming/cortex-a8.html )
Apr 26 20:02:05 (although I use a different notation I consider to be more convenient)
Apr 26 20:17:07 we timed it
Apr 26 20:17:11 the different sizes
Apr 26 20:17:32 and we noticed no difference when timing 1024 * 256 multiplications
Apr 26 20:17:46 with alignment you mean?
Apr 26 20:17:54 yes
Apr 26 20:20:16 hmm, it's possible this function is completely bottlenecked by floating-point operations and improving the load/store time has no effect as a result, I didn't analyze it :P
Apr 26 20:20:28 how many cycles are you getting per loop iteration?
Apr 26 20:23:41 24 cycles/iteration would be the best possible result
Apr 26 20:32:45 how would i accurately measure cycles?
Apr 26 20:34:00 sudo perf stat -e cycles ./command
Apr 26 20:34:20 (I'm not sure if sudo is needed, I think it was for some uses of perf but not all)
Apr 26 20:35:09 in the command call your routine a certain number of times with the same arguments
Apr 26 20:36:25 compare the cycle timings for e.g. 1000 iterations of that loop vs 2000 iterations, take the difference and divide by (2000-1000)
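
The difference-of-two-runs measurement just described, written out as a tiny C harness; routine_under_test, the buffer names and the sizes are placeholders rather than anything from the actual project.

    /* Run the routine under test argv[1] times with identical arguments, e.g.
     *   sudo perf stat -e cycles ./bench 1000
     *   sudo perf stat -e cycles ./bench 2000
     * then divide the difference in cycle counts by the difference in
     * iteration counts to get cycles per call. */
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    enum { N = 1024 };                      /* keep the working set small */
    static float dst[2 * N], a[2 * N], b[2 * N];

    static void routine_under_test(void)
    {
        /* placeholder workload; substitute the routine you actually want to time */
        for (size_t i = 0; i < 2 * N; i++)
            dst[i] = a[i] * b[i];
    }

    int main(int argc, char **argv)
    {
        long iters = (argc > 1) ? strtol(argv[1], NULL, 10) : 1000;
        for (long i = 0; i < iters; i++)
            routine_under_test();

        /* touch the output so the compiler can't discard the buffer entirely */
        double sum = 0;
        for (size_t i = 0; i < 2 * N; i++)
            sum += dst[i];
        printf("checksum %g\n", sum);
        return 0;
    }
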
Apr 26 20:36:35 that's the most reliable way I currently have
Apr 26 20:37:15 thankssss, i'll try
Apr 26 20:37:39 I still want to put some effort into getting ETM to work to be able to obtain a detailed cycle-accurate instruction trace from the processor
Apr 26 20:38:05 some day
Apr 26 20:51:44 ola
Apr 26 20:51:49 big difference in cycles
Apr 26 20:52:00 between?
Apr 26 20:52:09 w/o alignment
Apr 26 20:53:03 oh now you're seeing a big difference? because you're operating repeatedly on the same block of data maybe? (i.e. always hits L1 cache)
Apr 26 20:53:13 eh, cache, not L1 cache... probably
Apr 26 20:53:15 maybe
Apr 26 20:53:15 85120
Apr 26 20:53:36 (depends on the prior history of the memory that was accessed... neon doesn't allocate in L1 but it does hit in L1)
Apr 26 20:53:49 https://imgur.com/a/7o2qLDs
Apr 26 20:55:02 how big were the argument vectors in this test?
Apr 26 20:55:03 ok, i calculated it
Apr 26 20:55:07 not so much of a difference
Apr 26 20:55:17 209 vs 206 cycles per iteration
Apr 26 20:56:35 holy shit that's slow... but it looks like you used way too large vectors
Apr 26 20:56:55 256*1024 was the size of the vectors
Apr 26 20:57:01 the maximum before it segfaults :p
Apr 26 20:57:59 https://imgur.com/a/Kb2CWqK
Apr 26 20:58:21 in other words you're testing DDR3 performance rather than testing the performance of your algorithm :P
Apr 26 20:59:09 use much smaller vectors if you want an actual test
Apr 26 21:00:46 no need to initialize the arrays btw
Apr 26 21:02:39 you really like the BBB it seems
Apr 26 21:04:22 and I very much appreciate your help
Apr 26 21:04:36 I wouldn't be able to do this without the community surrounding the BB
Apr 26 21:04:56 and definitely not without you
Apr 26 21:05:03 I do like it. I also just know a lot about it since I've been working with them for years
Apr 26 21:05:06 thank you :)
Apr 26 21:05:30 At the moment it's just a university project
Apr 26 21:05:42 But I will probably maintain it further
Apr 26 21:05:59 As the topic of DSP on the BBB is very interesting
Apr 26 21:06:20 Maybe one day it will get recognized :p
Apr 26 21:06:42 the neon unit of the A8 is not bad if you optimize code for it properly
Apr 26 21:08:36 we redid the measurement
Apr 26 21:08:45 40 cycles per iteration vs 48
Apr 26 21:10:46 interesting, that's actually a significantly bigger difference than I hypothesized
Apr 26 21:12:09 https://imgur.com/a/M0oyjFr
Apr 26 21:12:11 the data
Apr 26 21:12:23 sorry for the dutch :p
Apr 26 21:12:52 can you try: sudo perf stat -e cycles,r4d,r4e,r4f
Apr 26 21:13:06 no worries
Apr 26 21:16:00 will do
Apr 26 21:16:14 are you an EE?
Apr 26 21:16:38 nope
Apr 26 21:16:47 CS?
Apr 26 21:16:59 background in mathematics and computer science
Apr 26 21:17:12 huge background it seems
Apr 26 21:17:29 well I didn't learn this stuff at university
Apr 26 21:17:30 :)
Apr 26 21:17:58 here's a list of cortex-a8 PMU events btw: https://pastebin.com/raw/X0KUfifB
Apr 26 21:19:02 4e-4f is to confirm it's not missing in l2 and 4d is because I'm curious whether it might still be hitting in L1
Apr 26 21:19:36 I still don't know what the performance implications are for neon if it hits in L1... there's presumably a good reason why they don't allocate in L1 for neon
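
For reference, here is roughly what the element-wise complex multiply under discussion looks like written with NEON intrinsics in C, combined with the alignment suggestions from earlier; this is a sketch, not the linked arm_neon_complex.s, and the function name, vector sizes and build flags are assumptions.

    /* (a+bi)(c+di) = (ac-bd) + (ad+bc)i on interleaved re/im float arrays,
     * four complex elements per iteration.  Build with something like:
     *   gcc -O2 -std=gnu11 -mfpu=neon cmul.c
     * n is assumed to be a multiple of 4. */
    #include <arm_neon.h>
    #include <stdalign.h>
    #include <stdlib.h>

    void cmul(float *dst, const float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i += 4) {
            float32x4x2_t x = vld2q_f32(&a[2 * i]);  /* x.val[0]=re, x.val[1]=im */
            float32x4x2_t y = vld2q_f32(&b[2 * i]);
            float32x4x2_t r;
            r.val[0] = vmlsq_f32(vmulq_f32(x.val[0], y.val[0]), x.val[1], y.val[1]);
            r.val[1] = vmlaq_f32(vmulq_f32(x.val[0], y.val[1]), x.val[1], y.val[0]);
            vst2q_f32(&dst[2 * i], r);
        }
    }

    int main(void)
    {
        enum { N = 1024 };                           /* small enough to stay cached */
        alignas(16) static float a[2 * N], b[2 * N]; /* 16-byte aligned, as suggested */
        float *dst = aligned_alloc(16, sizeof(float) * 2 * N);  /* heap equivalent */
        cmul(dst, a, b, N);
        free(dst);
        return 0;
    }
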
Apr 26 21:20:19 zmatt: https://imgur.com/a/WCDW01r
Apr 26 21:23:18 hmm, so it's not missing in L2, but it *is* hitting L1
Apr 26 21:25:02 it sucks that the exact behaviour and timings of the caches are not really documented adequately
Apr 26 21:26:50 but anyway, if this function is currently taking only 4% of your computation time, I guess it doesn't make sense to optimize it excessively :)
Apr 26 21:28:25 Where can i find documentation about r4d, r4e and r4f?
Apr 26 21:28:28 It is not in the man page
Apr 26 21:28:52 do be mindful of the L2 cache size (256 KB). as you saw earlier, once the data set you're working with exceeds that, all performance goes down the drain
Apr 26 21:29:13 they're raw pmu event numbers, that's why I pastebinned a list of cortex-a8 pmu events
Apr 26 21:30:25 perf can record up to four different pmu events in addition to the number of cycles
Apr 26 21:31:34 ultimately this list comes from the Cortex-A8 Technical Reference Manual
**** ENDING LOGGING AT Fri Apr 27 03:00:02 2018