**** BEGIN LOGGING AT Sat Nov 09 02:59:58 2013
Nov 09 09:18:51 build #7 of mvebu is complete: Failure [failed compile_4] Build details are at http://buildbot.openwrt.org:8010/builders/mvebu/builds/7
Nov 09 10:02:54 <_trine> I have been trying for more than a week now, compiling Trunk for my dockstar in different ways but it seems impossible to get a good working version. I understand this is a broad statement however it does represent the current state of trunk for the dockstar. I have now had to revert to an older version which as far as I can tell works perfectly well. Work needs to be done to iron out the faults which appear to have crept into the current Trunk for at least the kirkwood Marvell platform.
Nov 09 11:55:11 cyrus r38696 packages/lang/luasocket/ patches/0001-Add-interface-support.patch patches Makefile * luasocket: update to 3.0-rc1 + add interface support
Nov 09 15:55:51 tripolar r38697 packages/net/ git/Makefile git/patches/100-convert_builtin.patch * [packages] git: update to 1.8.4.3
Nov 09 16:48:56 DonkeyHotei: i just disabled PPP support and i think that stopped the memory leak
Nov 09 16:51:41 disabled it in the kernel?
Nov 09 16:53:07 DonkeyHotei: yeah i dropped kmod-ppp and the userspace ppp tools
Nov 09 16:53:32 userspace tools shouldn't affect it
Nov 09 16:53:50 they probably don't even work without the kmod-ppp
Nov 09 16:54:04 of course not
Nov 09 16:56:14 you know what, i left the kmod in and stopped the leak anyway, just by turning off the default pppoe in /etc/config/network
Nov 09 16:57:29 DonkeyHotei: oh yeah i see it, it has pppoe with username foo and bar
Nov 09 16:58:27 DonkeyHotei: perhaps there is a mechanism that tries to establish a connection with those ppp details and every time it does so the ppp kernel module allocates some memory without freeing it
Nov 09 16:58:37 likely
Nov 09 16:59:25 but if that's in kmod-ppp it should also be present on devices with no dsl hardware
Nov 09 16:59:35 however
Nov 09 16:59:55 br2684ctl was a factor
Nov 09 17:00:00 indeed it was
Nov 09 17:03:12 i think we have enough for a useful bug report already
Nov 09 17:04:35 when i tried openwrt on the actiontec pk5000, i don't remember this happening, so i'm still thinking it's the dsl driver
Nov 09 17:07:13 the real test would be if we could somehow run br2684ctl without dsl hardware
Nov 09 17:08:09 a dummy atm device
Nov 09 17:09:57 on the ATM on Linux howto it says "If you have no real ATM hardware, you can still exercise the API by using the ATM over TCP ``driver''. It emulates ATM devices which are directly wired to remote devices (i.e. there is no VPI/VCI swapping)."
Nov 09 17:23:06 JyZyXEL: do you have a non-lantiq device you can try that on?
Nov 09 17:23:23 no :(
Nov 09 17:26:14 i have a fon2201 i could use
Nov 09 17:31:29 root@OpenWrt:/# atmtcp create
Nov 09 17:31:30 ioctl ATMTCP_CREATE: Bad address
Nov 09 17:31:32 root@OpenWrt:/# atmtcp virtual listen
Nov 09 17:31:34 ioctl SIOCSIFATMTCP: Bad address
Nov 09 17:32:39 im not sure why this is not working
Nov 09 17:37:55 does it need kernel support?
Nov 09 17:38:18 DonkeyHotei: yeah i just noticed that it does
Nov 09 17:38:26 kmod-atmtcp
Nov 09 17:38:32 what kmod is it?
Nov 09 17:38:34 ok
Nov 09 17:56:01 JyZyXEL: are you doing it?
Nov 09 17:57:21 yeah working on it
Nov 09 17:57:40 so should i put the fon2201 away?
Nov 09 17:58:37 sure, its always better to have more results
Nov 09 17:58:54 if you build without the dsl driver and use atmtcp over loopback to run br2684ctl, i don't see the point
Nov 09 17:59:39 so far i've just gotten br2684ctl to freeze on me :)
Nov 09 18:03:27 doing atmtcp bg virtual listen; br2684ctl -b -c 0 -e 0 -p 1 -a 0.1.32
Nov 09 18:03:49 br2684ctl locks up for good
Nov 09 18:04:36 what's the -b?
Nov 09 18:04:42 background
Nov 09 18:11:28 hmm looks like if i do "atmtcp create; atmtcp virtual" and then "br2684ctl -b -c 0 -e 0 -p 1 -a 1.1.32" it works
Nov 09 18:15:54 i added config atm-bridge 'atm' and the 'wan' pppoe stuff in /etc/config/network and did a /etc/init.d/network restart
Nov 09 18:16:07 should be virtual connect, no?
Nov 09 18:16:58 ?
Nov 09 18:17:41 "atmtcp create; atmtcp virtual connect"
Nov 09 18:18:22 hmm, well i'm not sure what the best way is, but what i got seems to be working
Nov 09 18:19:00 sorry, "atmtcp create; atmtcp virtual connect 127.0.0.1"
Nov 09 18:19:11 root@OpenWrt:/# cat /proc/net/atm/br2684
Nov 09 18:19:12 dev nas0: num=4, mac=f8:1a:67:d8:b1:09 (set)
Nov 09 18:19:15 root@OpenWrt:/# cat /proc/net/atm/devices
Nov 09 18:19:17 Itf Type ESI/"MAC"addr AAL(TX,err,RX,err,drop) ... [refcnt]
Nov 09 18:19:19 0 atmtcp 000000000000 0 ( 0 0 0 0 0 ) 5 ( 0 0 0 0 0 ) [1]
Nov 09 18:19:21 1 atmtcp 000000000000 0 ( 0 0 0 0 0 ) 5 ( 86 0 0 0 0 ) [2]
Nov 09 18:19:40 and in ps: 1025 root 872 S br2684ctl -b -c 0 -e 0 -p 1 -a 1.1.32
Nov 09 18:21:11 ill leave it be for a while
Nov 09 18:31:18 DonkeyHotei: seems that slab is accumulating again :)
Nov 09 18:31:55 same rate?
Nov 09 18:32:50 i'll have to observe longer to get a sense of the rate
Nov 09 18:33:19 the increases seem to be tied with the /usr/sbin/pppd
Nov 09 18:33:51 it keeps exiting and getting started over and over again
Nov 09 18:34:29 was that happening before?
Nov 09 18:34:54 sure, thats how it works
Nov 09 18:35:23 ok, observe the slab rate
Nov 09 18:35:33 its set to try once, then exit and then loop
Nov 09 18:36:35 you left out the dsl driver, correct?
Nov 09 18:36:41 DonkeyHotei: thats right, i did
Nov 09 18:36:52 and the ltq-atm driver
Nov 09 18:37:28 it would seem there is a bug in one of the ppp kernel modules
Nov 09 18:37:53 see if the slab rate is the same
Nov 09 18:41:03 DonkeyHotei: here are my last nights results with the default config: http://pastebin.com/raw.php?i=jkMb1aQi
Nov 09 19:09:51 DonkeyHotei: rate is same
Nov 09 19:10:39 next, try ripping out all atm stuff and putting pppoe on an eth port
Nov 09 19:11:58 good idea
Nov 09 19:29:11 hauke r38698 trunk/target/linux/ generic/files/drivers/net/phy/adm6996.h generic/files/include/linux/platform_data/adm6996-gpio.h generic/files/drivers/net/phy/adm6996.c * kernel: adm6996: add support for ADM6996L and GPIO interface
Nov 09 19:29:43 build #402 of rb532 is complete: Failure [failed shell_12] Build details are at http://buildbot.openwrt.org:8010/builders/rb532/builds/402
Nov 09 19:31:29 hauke r38699 trunk/target/linux/ brcm47xx/patches-3.10/209-b44-register-adm-switch.patch brcm47xx/patches-3.10/210-b44_phy_fix.patch brcm47xx/config-3.10 * brcm47xx: register ADM6996L switch
Nov 09 19:32:07 build #402 of ppc44x is complete: Failure [failed shell_12] Build details are at http://buildbot.openwrt.org:8010/builders/ppc44x/builds/402
Nov 09 20:09:52 hauke r38700 trunk/ target/linux/brcm47xx/patches-3.10/075-MIPS-BCM47XX-fix-detection-of-some-boards.patch package/kernel/broadcom-diag/src/diag.c * brcm47xx: add detection for Linksys WRT54GS V1
Nov 09 20:11:05 hauke r38701 trunk/package/kernel/broadcom-diag/src/diag.c * broadcom-diag: fix sys button on Asus rt-n16
Nov 09 20:58:04 JyZyXEL: did you get a chance to try it?
Nov 09 20:58:48 DonkeyHotei: yeah, i disabled atm and pppoa and put the pppoe on eth0.1
Nov 09 20:59:08 slab keeps increasing
Nov 09 20:59:20 same rate?
Nov 09 20:59:27 haven't measured rate yet
Nov 09 21:02:15 ill do it now for 30 minutes
Nov 09 21:03:11 one thing i would like to try would be to speed up the pppd so that it loops faster
Nov 09 21:04:15 don't
Nov 09 21:04:33 it would probably make the leak faster
Nov 09 21:08:39 build #374 of sibyte is complete: Failure [failed shell_12] Build details are at http://buildbot.openwrt.org:8010/builders/sibyte/builds/374
Nov 09 21:18:37 time to add some debugging to the kernel
Nov 09 21:20:09 :q
Nov 09 21:20:09 l
Nov 09 21:20:13 argh
Nov 09 21:28:47 -EWIN :)
Nov 09 21:33:42 JyZyXEL: was the rate any different?
Nov 09 21:34:07 nope
Nov 09 21:34:11 still the same
Nov 09 21:34:49 so we know it's a pppoe issue, potentially affecting EVERY owrt device
Nov 09 21:35:33 yeah
Nov 09 22:22:59 build #437 of orion is complete: Failure [failed compile_8] Build details are at http://buildbot.openwrt.org:8010/builders/orion/builds/437
Nov 09 22:38:19 it doesn't look like kmemleak is working
Nov 09 22:39:53 i don't get the /sys/kernel/debug/kmemleak
Nov 09 22:45:45 JyZyXEL: have you activated it when building the kernel?
Nov 09 22:46:10 Hauke: yes
Nov 09 22:46:38 the last time I checked it worked for me with OpenWrt
Nov 09 22:46:49 perhaps it doesn't work for MIPS34k
Nov 09 22:47:19 I used it under mips32r1 (bcm47xx)
Nov 09 22:51:25 [ 0.000000] kmemleak: Kernel memory leak detector disabled
Nov 09 22:51:27 [ 0.000000] kmemleak: Early log buffer exceeded (463), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
Nov 09 22:53:31 looks like when it gets disabled its irreversible
Nov 09 23:37:17 looks like the culprit is insmod
Nov 09 23:37:46 its doing kmallocs whenever the pppd does its stuff
Nov 09 23:40:38 the commandline is
Nov 09 23:40:49 : /sbin/insmod pppox
Nov 09 23:49:37 DonkeyHotei: execute this a few times: for i in `seq 100`; do /sbin/insmod pppox; done; free
Nov 09 23:50:06 pppox?
Nov 09 23:50:15 its the ppp helper
Nov 09 23:51:24 for some reason what ever system owrt has in place for pppd, that command gets executed every time it tries to establish a ppp connection
Nov 09 23:51:50 insmod: can't insert 'pppox': File exists
Nov 09 23:52:18 you should get that 100 times, read how much free space you have and then execute it again :p
Nov 09 23:54:47 free space increases
Nov 09 23:57:23 for me the free space decreases
Nov 09 23:58:00 you can't insmod something that's already insmodded
Nov 09 23:58:50 true, but at least on my platform every time you try it, it consumes memory that you do not get back
Nov 09 23:59:30 i'm doing this on the fonera+
Nov 09 23:59:44 try it on your ar9?
Nov 10 00:00:10 JyZyXEL: you are looking at the +/- buffers/cache line, right?
Nov 10 00:00:16 lemme replug the serial console
Nov 10 00:00:34 zinx: yes
Nov 10 00:01:19 cat /proc/meminfo | grep Slab would also work
Nov 10 00:01:48 because insmod is causing kmallocs that do not get kfree'd
Nov 10 00:05:05 tried it on ar9, still decreases after a while
Nov 10 00:06:26 DonkeyHotei: which value are you talking about?
Nov 10 00:09:34 nvm you were right, on lantiq it behaves as you say
Nov 10 00:09:43 but not on atheros
Nov 10 00:10:22 phew :)
Nov 10 00:12:00 while true; do for i in `seq 100`; do /sbin/insmod pppox 2> /dev/null; done; free; done
Nov 10 00:12:13 for more dramatic effect (press ^c to stop) :)
Nov 10 00:12:20 no need
Nov 10 00:12:23 hehe
Nov 10 00:12:36 where is your memory now!
Nov 10 00:13:17 but why is the memory eaten on lantiq but not on atheros? they're both mips
Nov 10 00:14:16 perhaps because lantiq has so much of their own stuff in the kernel?
Nov 10 00:14:49 that's not meaningful
Nov 10 00:15:26 do they use a different kernel version?
Nov 10 00:16:02 at this point there might be enough information for blogic to get interested
Nov 10 00:16:13 3.8.13 on atheros
Nov 10 00:16:49 Linux version 3.10.18 (fld@HA) (gcc version 4.6.4 (OpenWrt/Linaro GCC 4.6-2013.05 r38578) ) #15 Sat Nov 9 01:17:51 EET 2013
Nov 10 00:16:50 Linux version 3.10.18 (fld@HA) (gcc version 4.6.4 (OpenWrt/Linaro GCC 4.6-2013.05 r38578) ) #15 Sat Nov 9 01:17:51 EET 2013
Nov 10 00:16:52 SoC: VR9 rev 1.2
Nov 10 00:17:06 CPU revision is: 00019556 (MIPS 34Kc)
Nov 10 00:18:20 cpu model : MIPS 4KEc V6.4
Nov 10 00:18:21 B
Nov 10 00:19:32 cpu model : MIPS 34Kc V4.12
Nov 10 00:19:37 that's ar9
Nov 10 00:23:53 found the script that does the insmod: /lib/netifd/proto/ppp.sh
Nov 10 00:24:57 we need to find out why insmod eats the memory and fix it, not just mask the issue by removing insmod
Nov 10 00:25:10 that is true
Nov 10 00:25:56 kernel boog
Nov 10 00:27:25 doesn't seem to matter what you try to insmod
Nov 10 00:27:40 interesting
Nov 10 00:28:22 nbd: are you by any chance still up?
Nov 10 00:31:04 DonkeyHotei: check out /sbin/insmod ipv6 i think it consumes even more memory
Nov 10 00:31:46 is this memory freed after you unload the module?
Nov 10 00:34:00 it is not
Nov 10 00:35:24 looks like /sbin/insmod mac80211 takes a nice 100KiB chunk every time you execute it :p
Nov 10 00:44:13 JyZyXEL: looks like i misinterpreted the output on atheros:
Nov 10 00:44:43               total   used   free  shared  buffers
Nov 10 00:44:43 Mem:          13128  12380    748       0      508
Nov 10 00:44:43 -/+ buffers:         11872   1256
Nov 10 00:44:43 Swap:             0      0      0
Nov 10 00:44:43               total   used   free  shared  buffers
Nov 10 00:44:44 Mem:          13128  12316    812       0      416
Nov 10 00:44:46 -/+ buffers:         11900   1228
Nov 10 00:44:48 Swap:             0      0      0
Nov 10 00:45:24 the buffers line was what i should have looked at
Nov 10 01:08:52 this does not appear to happen on amd64, so you might ask in #mipslinux
Nov 10 01:21:07 build #415 of uml is complete: Failure [failed compile_5] Build details are at http://buildbot.openwrt.org:8010/builders/uml/builds/415
**** ENDING LOGGING AT Sun Nov 10 02:59:58 2013
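
Editor's note: the reproduction discussed in the log (repeatedly running insmod on an already loaded module and watching memory) can be made less error-prone by reading the Slab figure from /proc/meminfo, as suggested at 00:01:19, instead of eyeballing the output of free. The following is a minimal sketch only. The function names slab_kb and insmod_slab_growth and the MEMINFO variable are hypothetical helpers introduced here, not part of OpenWrt, and the script assumes a root shell with awk available.

```shell
#!/bin/sh
# Sketch: report how much kernel slab memory grows across repeated insmod
# attempts on a module that is already loaded. The helper names below are
# illustrative assumptions, not OpenWrt tooling.

MEMINFO="${MEMINFO:-/proc/meminfo}"

# Print the Slab value (in kB) from a meminfo-style file.
slab_kb() {
    awk '/^Slab:/ { print $2 }' "${1:-$MEMINFO}"
}

# Run "insmod MODULE" COUNT times (default 100) and print the slab growth
# in kB. Needs root. "File exists" errors are expected and discarded.
insmod_slab_growth() {
    module="$1"
    count="${2:-100}"
    before=$(slab_kb)
    i=0
    while [ "$i" -lt "$count" ]; do
        /sbin/insmod "$module" 2>/dev/null || true
        i=$((i + 1))
    done
    after=$(slab_kb)
    echo "$((after - before))"
}
```

Running insmod_slab_growth pppox 100 several times in a row should show whether the figure keeps growing. Slab also changes for unrelated reasons, so a single small delta proves little; a steady increase over many runs, as reported on lantiq in the log, is the signal to look for.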