**** BEGIN LOGGING AT Sat Aug 27 02:59:58 2016
Aug 27 08:33:54 hello all. anyone with a BBB able to launch a 2-minute iperf3 test for me? I'd like to compare my results against a different environment with the same parameters. The test is performed over the physical ethernet device (not over usb) and the parameters are the following: http://pastebin.com/uimGt3yM. Having a BBB directly connected to the PC would be better (no switches). Thank you very much
Aug 27 09:00:45 alfatau: what kind of differences do you expect?
Aug 27 09:07:14 KotH: I don't know exactly. Right now in my test environment I have nearly 99% packet loss and the bandwidth is less than 2Mbit/sec. I'm running kernel 4.1.25-bone-rt and I would like to see if everyone gets nearly the same performance. Running that test between standard (quite old) PCs I get 24Mbit bandwidth and 0% packet loss.
Aug 27 09:10:09 sounds like a driver issue or overwhelming the bbb
Aug 27 09:10:28 instrument the kernel and the network directly to see what's going on
Aug 27 09:10:48 also make sure you don't have any cpu-hogging processes running on the bbb
Aug 27 09:15:22 KotH: yes, it could be. I already tried to analyze the dma/driver/network stack. The only setting that gives some improvement is on the driver side: ethtool -C eth0 rx-usecs 250/500. Anyway performance seems too bad.
Aug 27 09:15:54 KotH: of course I have no processes other than iperf using the bbb cpu.
Aug 27 09:18:20 KotH: I asked for a "community" test to see if someone gets better performance than mine, and then work out what the differences in dist/kernel/test-env are
Aug 27 09:18:49 i think you don't understand how the "community" works
Aug 27 09:20:33 KotH: ... Don't worry, I know :) Anyway my test takes much less time than trying to debug or replicate problems and discuss solutions
Aug 27 09:22:11 KotH: if others see the same performance, I can immediately exclude a general hw issue.
Aug 27 09:22:29 hw/drv
Aug 27 09:24:00 as i said, most likely a driver issue (ie kernel bug) or something hogging your cpu
Aug 27 10:40:35 alfatau: do you really need -rt ?
Aug 27 10:41:55 since rt kernels are generally considerably less efficient
Aug 27 10:45:32 and consider testing at least the latest 4.4 (currently 4.4.19-bone13, or if you really need it 4.4.19-bone-rt-r13 )
Aug 27 10:45:41 anyway, bbl
Aug 27 10:56:38 zmatt: well since some muppet made -rt kernels the default .. *mutters*
Aug 27 11:25:26 again?
Aug 27 11:25:44 I thought that was just a very brief lapse of judgement a decent while ago
Aug 27 11:27:04 to be fair, I'm entirely willing to believe the cpsw driver might very well be crap
Aug 27 11:27:29 iirc there were also more concrete problems with certain kernels but I thought rcn had fixed the worst of 'em
Aug 27 11:29:29 so basically you're reverting https://git.kernel.org/cgit/linux/kernel/git/tmlind/linux-omap.git/commit/drivers/gpu/drm/omapdrm/dss?h=omap-for-v4.8/fixes&id=2ab9f5879162499e1c4e48613287e3f59e593c4f
Aug 27 11:29:34 woops, ww
Aug 27 11:31:37 -shrug-
Aug 27 12:08:30 zmatt: hi, I need an rt kernel as I'm working on a low-latency udp2spi server. Anyway, I also need the PRU and pruss_uio doesn't seem to be loadable without an rt kernel (tried with 4.1.24 and 4.1.25).
Aug 27 12:12:08 zmatt: why do you think rt kernels are less efficient? my target is to receive, with loss < 0.1%, around 100K udp packets/sec where each payload is 24-48 bytes (66/90 bytes on the network interface).
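For context, a minimal sketch of the kind of receive-rate / packet-loss measurement being discussed, written in Python rather than iperf3 (the actual iperf3 parameters live in the pastebin above and are not reproduced here). It assumes the sender embeds a big-endian 32-bit sequence number at the start of each UDP payload; that framing and the port number are assumptions made for this example only:

    #!/usr/bin/env python3
    # Rough UDP receive-rate / loss counter (sketch, not the iperf3 test from the pastebin).
    # Assumes the sender puts a big-endian 32-bit sequence number in the first 4 bytes
    # of every payload -- that framing is an assumption made for this example.
    import socket, struct, time

    PORT = 5001          # hypothetical port, adjust to match the sender

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))

    received = 0
    lost = 0
    last_seq = None
    start = time.monotonic()
    try:
        while True:
            data, _ = sock.recvfrom(2048)
            seq = struct.unpack_from(">I", data)[0]
            if last_seq is not None and seq > last_seq + 1:
                lost += seq - last_seq - 1      # gap in sequence numbers = dropped packets
            last_seq = seq
            received += 1
    except KeyboardInterrupt:
        elapsed = time.monotonic() - start
        print(f"{received} packets in {elapsed:.1f}s = {received / elapsed:.0f} pkt/s, "
              f"{lost} lost ({100.0 * lost / max(received + lost, 1):.2f}%)")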
Aug 27 12:13:43 pruss_uio definitely doesn't require rt
Aug 27 12:15:02 low-latency udp2spi, what do you mean with "low latency" and how have you configured your real-time priorities so far?
Aug 27 12:16:24 rt kernels turn all irq handlers into kernel threads and all "spinlocks" (which on a single-core processor like the Cortex-A8 normally just translate to "disable irqs") into priority-inheritance mutexes
Aug 27 12:16:35 this has significant overhead, obviously
Aug 27 12:19:52 there's no way to batch stuff together into larger packets to reduce the packet rate? sending a flood of minimum-size ethernet packets is pretty much the worst-case load on the ethernet driver probably
Aug 27 12:20:12 zmatt: I'll try again with a non-rt kernel. "low latency" means that the time between udp recv and spi send (for each packet) must be less than 80us. to reduce cpu load waiting for spi "EOT"s, each packet is enqueued into a shared queue and the PRU polls EOT on SPI. So the CPU only has to recv the udp packet, extract the payload and enqueue it into the shared queue as fast as possible.
Aug 27 12:21:32 is it an option to give your packets an 803.1Q priority tag?
Aug 27 12:21:39 802.1Q
Aug 27 12:22:09 zmatt: ah, i configured my realtime priority using "chrt 1 server"
Aug 27 12:23:20 zmatt: this enables SCHED_RR
Aug 27 12:24:30 ok at least you're aware using rt requires actually *doing* something with scheduling classes and priorities... some people seem to think it's some kind of magic sauce you just pour over a system
Aug 27 12:26:11 is it an option to give your packets an 802.1Q priority tag? or a vlan tag, or anything else that CPSW can match on to redirect the packets into a separate receive fifo
Aug 27 12:26:32 then the pru could receive those without linux ever being involved even
Aug 27 12:26:45 while linux still has normal working networking
Aug 27 12:29:41 zmatt: ok. however my CPU is not full (since I don't receive enough packets to process, so I'm waiting on pselect). ethtool -S shows no DMA overruns. ifconfig shows no tcp/ip stack overruns. playing with socket buffers does not improve the receive rate. it's definitely a dilemma :(
Aug 27 12:29:58 btw, 100K packets/s ... that's one every 10us. your 80us latency limit means it can't really do more batching than 4 packets per irq or so
Aug 27 12:30:11 hmm, that's a bit odd
Aug 27 12:31:00 basically having more than 8 receive buffers in circulation at any time would be pointless
Aug 27 12:32:06 zmatt: uhm...
Aug 27 12:32:15 when I did measurements of the latency of a GPIO irq being delivered to userspace (via uio_pdrv_genirq) a while back I got 44-88us with some outliers on a non-rt kernel, not much less with an rt kernel (though the outliers seemed gone)
Aug 27 12:32:59 so I'm a bit pessimistic about your 80us requirement
Aug 27 12:33:38 since the networking stack is *definitely* not tuned for such low latency
Aug 27 12:34:33 hence my suggestion to try to segregate the traffic into a separate receive fifo and handle it outside of linux' knowledge
Aug 27 12:34:39 zmatt: that's my real fear :(
Aug 27 12:35:21 zmatt: so do you suggest handling everything using the prus?
Aug 27 12:36:12 zmatt: let's assume that I can instrument the cpsw driver to redirect data onto another queue.
Aug 27 12:36:22 no, not the driver
Aug 27 12:36:26 the hardware subsystem
Aug 27 12:37:53 zmatt: ok. so it's not needed to tag packets with a priority tag.
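On the "chrt 1 server" / SCHED_RR point above: the same real-time policy can also be set from inside the receiver process. A minimal sketch, assuming Linux and Python 3.3+ (for the os.sched_* calls); the priority value 1 simply mirrors the chrt invocation quoted in the log:

    #!/usr/bin/env python3
    # Sketch: put the current process into a real-time scheduling class, roughly
    # equivalent to launching it with "chrt 1 server" as mentioned in the log.
    # Requires root (or CAP_SYS_NICE) and a Linux kernel.
    import os

    def enter_realtime(priority: int = 1, policy: int = os.SCHED_RR) -> None:
        """Switch the calling process to a real-time policy at the given priority."""
        os.sched_setscheduler(0, policy, os.sched_param(priority))

    if __name__ == "__main__":
        enter_realtime()                      # SCHED_RR, priority 1, like "chrt 1"
        print(os.sched_getscheduler(0))       # prints 2 (SCHED_RR) on Linux
        # ... the udp receive loop would then run here at real-time priority ...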
Aug 27 12:38:13 there's an embedded managed switch with vlan and QoS support inside the am335x... unfortunately no way to use the second external ethernet port on a beaglebone, but you can still use it for filtering, prioritization, and segregated delivery
Aug 27 12:38:49 well, I'd have to check what your alternatives are
Aug 27 12:39:06 a priority tag I know for sure it can use
Aug 27 12:39:59 iirc it also supports the IPv4 differentiated services field
Aug 27 12:40:26 zmatt: i'm looking at the BBB datasheet. the embedded managed switch is very interesting...
Aug 27 12:40:52 zmatt: let me check the features
Aug 27 12:41:21 so the basic idea is: cpdma has 8 rx queues, linux afaik only uses one
Aug 27 12:43:34 bizarrely, cpsw actually has infrastructure for delivering separate irqs to up to 3 cores, but only the core 0 irqs are routed to various places such as the cortex-a8 and pruss, and the core 1 and core 2 irqs are left not-connected
Aug 27 12:44:11 i.e. pru won't be able to get an irq (without conflicting with linux), but really at this packet rate polling shouldn't be an issue
Aug 27 12:45:08 btw, I'm assuming you've already considered whether there's no alternative to using udp for something like this? :/
Aug 27 12:46:10 actually, your network had better be completely isolated since any traffic by any other party could already demolish your latency
Aug 27 12:46:50 which also means linux won't have any use for the network interface anyway, so you could also just make it exclusive to the PRU
Aug 27 12:47:25 zmatt: well, not really... afaik udp ensures the lowest latency due to minimal header overhead
Aug 27 12:47:52 that doesn't help you if there's some fatso 1500-byte packet sent by someone else ahead of it in the transmit queue of the switch
Aug 27 12:48:34 that's 120 us
Aug 27 12:48:39 boom
Aug 27 12:49:07 I actually realized I may have mixed up bits and bytes earlier, lemme recheck whether the situation is even worse than I imagined
Aug 27 12:50:08 zmatt: ok. but the real throughput of the NIC should be 100Mbps, while at that rate I'll be using only around 30Mbps
Aug 27 12:51:14 zmatt: this leads me to think that the latency due to a bigger packet will be absorbed by the otherwise idle network link
Aug 27 12:53:18 minimum packet size + IPG is 672 bit-times = 6.72 us
Aug 27 12:53:23 you want to send a packet every 10 us
Aug 27 12:53:31 that's way more than 30% utilization
Aug 27 12:54:03 and I have no idea what you meant by your last statement
Aug 27 12:56:50 even if you use managed switches able to strictly honor priority tagging, if there's a 1KB or larger packet already in the transmit queue to the BBB then it's game over, the 80us window is already gone
Aug 27 12:59:05 hence you'll have to use an isolated or at least very strictly regulated network
Aug 27 12:59:44 zmatt: ok, now I'm confused too :( to recap: I need to receive and enqueue for the pru 100K packets/sec. each packet has a 24-byte payload. The latency between udp recv and data enqueue must be less than 80/90 usec. This is only the time a packet needs to pass through the BBB CPU. I have nearly 400/500usec from when the packet is received by the BBB to when the SPI transfer has to be completed.
Aug 27 13:00:49 ok I assumed latency = latency, why would it matter *where* the latency is accumulated?
Aug 27 13:03:14 zmatt: because there are many elements involved in the project, each introducing a different latency.
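A quick worked version of the wire-time arithmetic quoted above, as a sketch for 100 Mbit/s ethernet; it only restates numbers already given in the log (minimum frame plus preamble and inter-packet gap, and the roughly 120 us cost of a full-size packet ahead of you in a queue):

    # Sketch: serialization times on 100 Mbit/s ethernet, redoing the numbers from the log.
    # On the wire a frame costs: payload padded to >= 46 bytes, + 18 bytes of ethernet
    # header/FCS, + 8 bytes preamble, + 12 bytes (96 bit-times) inter-packet gap.
    LINK_BPS = 100_000_000

    def wire_time_us(payload_bytes: int) -> float:
        frame = max(payload_bytes, 46) + 18          # pad up to the 64-byte minimum frame
        on_wire_bits = (frame + 8 + 12) * 8          # + preamble + inter-packet gap
        return on_wire_bits / LINK_BPS * 1e6

    print(wire_time_us(46))          # minimum-size frame: 672 bit-times = 6.72 us
    print(wire_time_us(1500))        # full-size 1500-byte packet: ~123 us (the "120 us" above)
    print(wire_time_us(46) / 10.0)   # fraction of the link used at one minimum-size packet per 10 us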
Aug 27 13:05:33 but you understand that if anyone sends a large packet to the BBB you're basically screwed, since it would add 80us or more of additional latency, which means the BBB has 80us less time to process it
Aug 27 13:05:38 (to meet the total latency)
Aug 27 13:05:39 zmatt: my target is to keep the BBB "element" latency small, but others are working on the other components and the real latency of each component cannot actually be measured yet. We have some requirements but they can change.
Aug 27 13:07:56 zmatt: yes I understand. but even if that packet is sent to a kernel queue, my application won't process it. Furthermore, the system does not have any other udp packet source (it's isolated, not connected to the internet).
Aug 27 13:08:12 the problem isn't the kernel's queue, it's the queue of the switch your BBB is connected to
Aug 27 13:08:25 and not just udp, any kind of large packet
Aug 27 13:08:53 any tcp connection for example will usually result in large packets
Aug 27 13:09:01 zmatt: the only UDP packets I can expect have 24-byte payloads. Other packets are due to ssh connections or ping, arp...
Aug 27 13:09:29 all of which have to go through the same ethernet connection
Aug 27 13:11:56 zmatt: ok, the switch my BBB is connected to: how many packets can it process per second?
Aug 27 13:12:55 zmatt: should I refer to the raw ethernet bandwidth?
Aug 27 13:13:11 with priority tagging and some effort you can have your udp packets be directly available to pruss without even going through any kernel queue (and you'll need something like priority tagging anyway, together with switches that understand it, to have any hope of keeping network latency reasonably bounded)
Aug 27 13:13:53 the processing ability of your switch will vary greatly depending on the switch
Aug 27 13:15:29 zmatt: ok, so if the 3-port switch is not able to manage 100K pps, even using the PRU the problem cannot be solved... correct?
Aug 27 13:16:26 I was referring to your upstream switch actually, though obviously all components along the route need to support that rate
Aug 27 13:16:30 that wasn't my point though
Aug 27 13:18:48 zmatt: anyway, the performance cannot be higher than the raw ethernet bandwidth. My earlier calculation was wrong: not 30% usage but around 50%: 100K*66bytes (24 payload + 8 (UDP) + 20 (IP) + 14 (ETH)) = 6.6Mbytes/sec, which means 52.8Mbit/s.
Aug 27 13:19:52 without prioritization, your urgent udp packets could easily get stuck behind other traffic in a queue in some switch anywhere along the route. even if prioritization is applied by all switches, then if a switch has already begun transmitting a large packet your urgent udp packet would still have to wait behind it
Aug 27 13:20:29 so if you're going to allow any other traffic on your network it would have to be examined with extreme scrutiny
Aug 27 13:21:26 and your calculation is still wrong. min ethernet packet size is 64 bytes excluding preamble and inter-packet gap
Aug 27 13:21:34 preamble is 8 bytes
Aug 27 13:22:53 minimum IPG is officially 96 bit-times, i.e. equivalent to 12 bytes
Aug 27 13:23:10 so the effective minimum packet size is 84 byte-times
Aug 27 13:23:29 672 bit-times, which is a number I also quoted earlier
Aug 27 13:23:57 so but utilization would be 67.2%
Aug 27 13:24:05 *bus
Aug 27 13:24:10 *link really, it's not a bus
Aug 27 13:27:25 basically your project sounds like a nightmare, you're using a "best-effort" networking technology in ways it was never designed for
Aug 27 13:36:02 maybe EtherCAT would be suitable, but the BBB doesn't support it
Aug 27 13:39:15 oh actually it's worse, your packet exceeds the minimum size: max(24+8+20+18, 64)+8+12 = 90 bytes = 720 bits
Aug 27 13:39:21 72% link utilization
Aug 27 13:49:24 maybe if you're lucky and there's no interfering traffic on the network, you handle the packets entirely in the PRU (no linux involvement), and you optimize everything well... it might perhaps be possible. I'm not sure
Aug 27 13:50:55 using gigabit ethernet would greatly reduce latency... the am335x supports it, but unfortunately the bbb doesn't
Aug 27 13:52:37 though that won't help with *processing* the packets fast enough obviously... I can't really say how much margin you have there, it would require more analysis than I'm inclined to put in
Aug 27 13:52:45 especially since I'm just talking to myself apparently
Aug 28 01:09:52 Yes...hello.
Aug 28 01:10:48 Is there any person using the BBB to make robots with the Motor Bridge Cape?
Aug 28 01:20:43 Without venturing around too often and without making this a "your issue" problem, can someone point me to where I can find Flask support on freenode?
Aug 28 01:28:14 In Flask, are there possibilities for me to command my application to control features on my robot?
Aug 28 01:30:07 ...<<<< on break.
Aug 28 01:37:55 Okay...this is my idea: make a small, portable R/C mower with the BBB and the Motor Bridge Cape. I have not gotten into the R/C part of the controls yet, but I wanted to make the R/C part of the mower browser-based, where I could point and click to control the two electric motors.
Aug 28 01:49:58 I should have learned more JS instead.
Aug 28 01:53:43 Is there a way to use the BBB with Python to control robots via a webserver?
Aug 28 01:54:46 I know this sounds silly. I cannot find out for some reason.
Aug 28 02:45:04 Okay...I am out for now. The day beat me once more. Aw!
**** ENDING LOGGING AT Sun Aug 28 02:59:58 2016
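On the last question in the log (controlling robot motors from a browser with Python on the BBB): a minimal Flask sketch of the idea. The set_motor() helper and the /motor routes are hypothetical placeholders introduced only for this example; real code would have to drive the Motor Bridge Cape behind them.

    #!/usr/bin/env python3
    # Minimal sketch of a browser-controllable motor server using Flask.
    # set_motor() is a hypothetical placeholder -- real code would talk to the
    # Motor Bridge Cape (e.g. over I2C/PWM) instead of just printing.
    from flask import Flask, request

    app = Flask(__name__)

    def set_motor(motor: int, speed: int) -> None:
        """Placeholder: drive motor 1 or 2 at speed -100..100 (percent)."""
        print(f"motor {motor} -> {speed}%")

    @app.route("/motor/<int:motor>", methods=["POST"])
    def motor(motor):
        speed = int(request.form.get("speed", 0))
        set_motor(motor, max(-100, min(100, speed)))
        return "ok\n"

    if __name__ == "__main__":
        # Listen on all interfaces so the routes are reachable from a browser on the LAN.
        app.run(host="0.0.0.0", port=8080)

A point-and-click page would then just POST to these routes, e.g. curl -d speed=50 http://<bbb-address>:8080/motor/1.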