**** BEGIN LOGGING AT Mon Aug 19 02:59:59 2013
Aug 19 12:35:31 cyrusff: is there a way to have odhcp6c not send a release when exiting ?
Aug 19 12:35:42 because of that my PD keeps changing :(
Aug 19 12:36:00 yes
Aug 19 12:36:16 http://wiki.openwrt.org/doc/uci/network#protocol.dhcpv6
Aug 19 12:36:28 ah great
Aug 19 12:36:34 sorry should have looked there again
Aug 19 12:36:51 now I should try to get my initial PD back :P
Aug 19 12:37:01 and then configure the norelease :)
Aug 19 12:41:01 haha
Aug 19 12:41:06 good luck
Aug 19 12:41:58 crap, someone else has it :(
Aug 19 12:42:10 we're getting a /56 from our ISP
Aug 19 12:43:15 and I initially had the first one, so the hints I defined for the different segments in my network were the only digits in that part of the address
Aug 19 13:01:46 luka r37814 trunk/package/boot/uboot-envtools/ files/ramips Makefile * [package] uboot-env: fix spurious esac within ramips uci-defaults
Aug 19 14:30:57 build #323 of rb532 is complete: Failure [failed shell_12] Build details are at http://buildbot.openwrt.org:8010/builders/rb532/builds/323
Aug 19 14:33:55 build #323 of ppc44x is complete: Failure [failed shell_12] Build details are at http://buildbot.openwrt.org:8010/builders/ppc44x/builds/323
Aug 19 15:05:52 are we getting close to a 12.09.1 release with all the fancy new IPv6 stuff, or is that wishful thinking ?
Aug 19 15:30:17 build #336 of uml is complete: Failure [failed compile_5] Build details are at http://buildbot.openwrt.org:8010/builders/uml/builds/336
Aug 19 15:34:38 build #306 of sibyte is complete: Failure [failed shell_12] Build details are at http://buildbot.openwrt.org:8010/builders/sibyte/builds/306
Aug 19 16:48:31 build #355 of orion is complete: Failure [failed compile_8] Build details are at http://buildbot.openwrt.org:8010/builders/orion/builds/355
Aug 19 17:45:03 build #316 of avr32 is complete: Failure [failed compile_5] Build details are at http://buildbot.openwrt.org:8010/builders/avr32/builds/316
Aug 19 18:22:02 hey blogic you around?
Aug 19 18:24:23 nbd: hey
Aug 19 18:24:36 nbd: i wonder if you have any ideas left
Aug 19 18:27:21 zajec_: you're using windows on one side, right?
Aug 19 18:29:55 nbd: that's right
Aug 19 18:30:06 nbd: i was also using Windows when testing it with the original firmware
Aug 19 18:31:24 blogic: in watchdog.c, wdt_frequency is too low for (at least, ppc40x book-e) which has a maximum timeout of 3. so maybe init it at 1 instead of 5
Aug 19 18:32:48 zajec_: last time i ran windows iperf, i got wildly inconsistent and bogus results from it
Aug 19 18:33:02 those were somewhat mitigated by manually specifying a bigger TCP window size
Aug 19 18:33:06 try that
Aug 19 18:41:13 nbd: i can test that, but it's unlikely :(
Aug 19 18:41:32 nbd: as I said, that was working fine when using that machines with original firmware
Aug 19 18:42:19 and i also get poor performance with "iperf -s" running on OpenWrt (yeah, I know it's not the best idea for performance testing, but it also shows some problems)
Aug 19 18:42:40 btw. did you verify that there are no unaligned accesses left?
Aug 19 18:42:52 yes
Aug 19 18:42:56 using debugfs
Aug 19 18:43:15 (and analyzing code)
Aug 19 18:43:56 nbd: i should still implement that paddings in SKBs...
Aug 19 18:44:14 but I don't see __user_copy anymore in "perf top", when running "iperf -s" on my notebook
Aug 19 18:44:24 so I guess it's not the cause of the performance problems
Aug 19 19:27:35 zajec_: maybe it's time to do a full pcap of tcp traffic and run tcptrace to analyze it
Aug 19 19:27:46 to figure out if there's any weird latency or packet loss in there that reduces tcp traffic
Aug 19 19:28:01 because it's odd that the performance is low, even though cpu load is low as well
Aug 19 19:31:33 nbd: thanks for this tip!
Aug 19 19:43:59 nbd: i also analyzed that padding thing you suggested
Aug 19 19:44:06 nbd: sent you e-mail about that
Aug 19 20:15:08 nbd: do you mean "ag71xx" by Atheros ethernet driver?
Aug 19 20:15:14 yes
Aug 19 20:15:41 why you guys hate so much mainlining stuff? ;)
Aug 19 20:15:55 we don't
Aug 19 20:16:15 is that mainlines?
Aug 19 20:16:17 whoops
Aug 19 20:16:20 I got fooled by Google results
Aug 19 20:16:25 *mainlined
Aug 19 20:16:32 it's not mainline yet
Aug 19 20:16:44 it's just that nobody had time to mainline it yet
Aug 19 20:16:58 mainly because there are large parts of the ar71xx platform not mainlined yet
Aug 19 20:17:07 and that's being held up by the lack of device tree support
Aug 19 20:17:12 which still needs to be implemented
Aug 19 20:17:37 :(
Aug 19 20:17:42 ok
Aug 19 20:17:53 i'll stare at that ag71xx a little :)
Aug 19 20:25:47 that cache magic is still not 100% clear to me
Aug 19 20:25:52 so i'll have to analyze that
Aug 19 20:26:36 I wonder if it's enough to use netdev_alloc_skb (without playing with NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info)) and just add build_skb usage
Aug 19 20:26:52 you can't mix netdev_alloc_skb and build_skb
Aug 19 20:26:57 because build_skb expects a kmalloc'd buffer
Aug 19 20:27:04 not a full skb
Aug 19 20:27:15 ah
Aug 19 20:27:17 thanks
Aug 19 20:27:43 so the cache thing is about the time that passes between the sk_buff struct being initialized and it being used
Aug 19 20:28:27 if you have a ring of 256 packets, you have to receive 255 packets after netdev_alloc_skb until the skb is passed to the network stack
Aug 19 20:28:35 which leaves a long time for it to be purged from the cache
Aug 19 20:28:50 especially because the cache is pretty small on such routers
Aug 19 20:30:34 nbd: really? why is that? I use netdev_alloc_skb very early and after receiving one packet I can already read it from the DMA mapped memory
Aug 19 20:30:50 nbd: why I should wait for other 255 packets to arrive?
Aug 19 20:31:10 that assumes that data is being copied
Aug 19 20:31:19 which we want to avoid at all costs
Aug 19 20:31:38 I don't use skb_copy_from_linear_data_offset anymore
Aug 19 20:31:45 right
Aug 19 20:31:59 and because of that, there's this delay of other packets being received first
Aug 19 20:32:09 look at it this way:
Aug 19 20:32:25 (sorry I don't understand everything so well :( )
Aug 19 20:32:28 in the rx path when receiving a packet, you're passing a filled skb to the network stack
Aug 19 20:32:35 and filling the dma ring slot with a new skb
Aug 19 20:32:41 that's right
Aug 19 20:32:47 understand it to that point :)
Aug 19 20:32:50 and that new skb will only get used as soon as the hw reaches that ring slot again
Aug 19 20:33:01 sure
Aug 19 20:33:07 which is *after* it has filled all the other slots
Aug 19 20:33:10 because it's a ring
Aug 19 20:33:13 sure
Aug 19 20:33:24 and that's the time in between that allows it to go cold in cache
Aug 19 20:33:32 ok, that makes sense
Aug 19 20:33:38 is your solution going to avoid that?
Aug 19 20:33:42 yes
Aug 19 20:33:52 because you only have kmalloc'd buffers in the ring
Aug 19 20:33:53 we can't alloc buffer *after* receiving the packets... can we? ;)
Aug 19 20:33:55 not sk_buff
Aug 19 20:34:13 build_skb allocates a skb without the data buffer
Aug 19 20:34:20 and takes the kmalloc'd pointer as its data buffer
Aug 19 20:34:36 right before that skb gets passed to the network stack
Aug 19 20:34:44 there's no ring of preallocated skbs anymore
Aug 19 20:34:47 only preallocated data buffers
Aug 19 20:34:58 sec, let me read that twice :)
Aug 19 20:35:49 nbd: ok, that makes sense... ring of preallocated data buffers and not preallocated skbs
Aug 19 20:35:52 that's clear to me
Aug 19 20:36:04 but that data buffers can still end up in the cold cache
Aug 19 20:36:12 so how does it improve anything?
Aug 19 20:36:41 is this better because sk_buff specific memory is in hot cache?
Aug 19 20:36:48 does it make so big difference?
Aug 19 20:37:07 i thought that sk_buff struct isn't that important at all
Aug 19 20:37:10 the data buffers are purged from cache already
Aug 19 20:37:15 i thought it's all about data buff
Aug 19 20:37:17 by the dma mapping ops
Aug 19 20:37:21 ah, ok
Aug 19 20:37:22 since the hw has to fill the data
Aug 19 20:37:26 that's right
Aug 19 20:37:27 which bypasses the cpu cache
Aug 19 20:37:49 so do you think that bringing sk_buff specific data is so expensive?
Aug 19 20:37:52 build #272 of ep93xx is complete: Failure [failed compile_5] Build details are at http://buildbot.openwrt.org:8010/builders/ep93xx/builds/272
Aug 19 20:38:02 i didn't think about this
Aug 19 20:38:08 I was focusing on packet data only
Aug 19 20:38:30 using build_skb vs netdev_alloc_skb is mostly optimizing for the case where the cpu is under heavy memory access pressure
Aug 19 20:38:36 i can see your point now
Aug 19 20:38:41 so basically it only makes a real difference on a loaded system
Aug 19 20:38:53 and it won't solve the problem you're debugging at the moment
Aug 19 20:39:16 nbd: thank you for being patient and explaining that to me... you definitely saved me an hour or two
Aug 19 20:39:25 nbd: ouch
Aug 19 20:39:35 ok, so about fixing the current issue
Aug 19 20:39:37 nbd: i still hopes it can be related
Aug 19 20:39:41 nbd: yeah ;)
Aug 19 20:39:53 nbd: do you still suggest tcpdump + tcptrace?
Aug 19 20:39:55 one possibility might be that there is an error in the way dma mapping is used
Aug 19 20:40:04 could you please kill all the code completely that does the copying
Aug 19 20:40:10 instead of just bypassing it
Aug 19 20:40:12 for easier review
Aug 19 20:40:14 sure
Aug 19 20:40:21 and then send me the resulting bgmac.[ch] for review
Aug 19 20:40:27 i'll see if i can spot any errors there
Aug 19 20:40:31 ok!
Aug 19 20:40:42 nbd: i'll take care of that tomorrow, ok?
Aug 19 20:40:44 sure
Aug 19 20:40:46 it's almost 11pm here
Aug 19 20:40:50 same here
Aug 19 20:40:51 not sure what's your timezone ;)
Aug 19 20:40:53 ah
Aug 19 20:40:54 ok :p
Aug 19 20:40:59 i'm in germany
Aug 19 20:41:26 oh :) it seems i know a lot of developer from Germ :)
Aug 19 20:41:31 :)
Aug 19 20:41:51 ok, I'm going to take a shower and go sleep
Aug 19 20:41:54 i know the dma stuff fairly well, so if there's a simple bug there, i'll find it
Aug 19 20:41:57 have to wake up tomorrow somehow ;)
Aug 19 20:42:07 nbd: great :)
Aug 19 20:42:09 yeah, i have to get up early tomorrow as well
Aug 19 20:42:28 nbd: good night then :) and thanks again for helping me with that
Aug 20 00:06:41 is anyone currently building for a lantiq ar9 target?
Aug 20 00:26:41 build #239 of octeon is complete: Failure [failed compile_5] Build details are at http://buildbot.openwrt.org:8010/builders/octeon/builds/239
**** ENDING LOGGING AT Tue Aug 20 02:59:59 2013
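Note on the 20:27-20:38 exchange: the point about a ring of 256 packets leaving 255 other packets between netdev_alloc_skb and delivery can be checked with a toy model. The sketch below is plain userspace Python, not driver code. The function names are hypothetical, and "age" is only an assumed stand-in for how long the sk_buff metadata sits unused (and can go cold in cache). It models nothing about real cache behaviour or the DMA mapping ops.

```python
def metadata_age_prealloc(ring_size, n_packets):
    """Ring of preallocated skbs (netdev_alloc_skb-style refill).

    Returns, per delivered packet, how many other packets were received
    between the skb being set up for its slot and that skb being handed
    to the network stack.
    """
    # allocated_at[slot] = number of packets already received when the
    # skb in that slot was allocated (0 = allocated at ring setup).
    allocated_at = [0] * ring_size
    ages = []
    for pkt in range(n_packets):
        slot = pkt % ring_size           # hardware fills slots in ring order
        ages.append(pkt - allocated_at[slot])
        allocated_at[slot] = pkt + 1     # refill the slot with a fresh skb
    return ages


def metadata_age_build_skb(ring_size, n_packets):
    """Ring of plain data buffers (build_skb-style delivery).

    The ring still holds preallocated buffers, but the skb metadata is
    created right before delivery, so its age is always zero here.
    """
    ages = []
    for pkt in range(n_packets):
        created_at = pkt                 # metadata built at delivery time
        ages.append(pkt - created_at)
    return ages


if __name__ == "__main__":
    print(metadata_age_prealloc(256, 1024)[-1])   # 255
    print(max(metadata_age_build_skb(256, 1024))) # 0
```

In steady state the preallocated-skb ring always shows an age of ring_size - 1, which matches the "255 packets" figure in the log. The data buffers age the same way in both schemes, but as noted at 20:37 they are purged from the CPU cache by the DMA mapping ops anyway, so only the metadata timing differs.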