**** BEGIN LOGGING AT Tue Jan 04 02:59:58 2011
Jan 04 03:04:18 jow: getting used to gdb again, mostly
Jan 04 03:37:45 jow: the problem appears to be in fchown
Jan 04 03:42:27 build #52 of ramips is complete: Failure [failed compile_6] Build details are at http://tksite.gotdns.org:8010/builders/ramips/builds/52
Jan 04 03:54:48 jow: which is a syscall, which means it's a kernel problem
Jan 04 03:56:23 jow: interestingly it appears to only happen with a fresh overlay.
Jan 04 04:37:04 jow: the problem can be reproduced without passwd: create a new extroot and boot into it; touch test; chown 0:0 test
Jan 04 04:37:05 that hangs
Jan 04 05:22:32 build #57 of ppc40x is complete: Success [build successful] Build details are at http://tksite.gotdns.org:8010/builders/ppc40x/builds/57
Jan 04 06:06:05 build #59 of at91 is complete: Failure [failed compile_4] Build details are at http://tksite.gotdns.org:8010/builders/at91/builds/59
Jan 04 06:37:16 build #56 of ubicom32 is complete: Failure [failed compile_4] Build details are at http://tksite.gotdns.org:8010/builders/ubicom32/builds/56
Jan 04 07:04:51 build #59 of brcm47xx is complete: Failure [failed compile_4] Build details are at http://tksite.gotdns.org:8010/builders/brcm47xx/builds/59
Jan 04 07:28:42 gmorning
Jan 04 08:09:18 build #45 of ifxmips is complete: Failure [failed compile_6] Build details are at http://tksite.gotdns.org:8010/builders/ifxmips/builds/45
Jan 04 08:39:30 xMff: ping
Jan 04 08:39:59 [florian]: you around?
Jan 04 08:58:21 hi
Jan 04 08:58:31 I try to build the SDK of openwrt.
Jan 04 08:58:59 but I found there are a lot of static path in './staging_dir/host/bin/*'
Jan 04 08:59:28 if there one option configure the openwrt use the 'dynamic' path ?
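Editor's sketch of the reproduction reported at 04:37 (`touch test; chown 0:0 test` on a freshly created extroot overlay). This is a minimal Python illustration, not something posted in the channel: the function name `chown_completes` and the timeout value are hypothetical. It runs fchown() in a daemon thread so the caller can notice a call that never returns. A thread blocked inside the kernel cannot be cancelled, so on an affected system the process may not exit cleanly, and if the whole machine locks up nothing will be reported at all.

```python
import os
import threading


def chown_completes(path, uid, gid, timeout=5.0):
    """Create `path` if missing, fchown() it, and report whether the call returned.

    Returns True if fchown() finished within `timeout` seconds, False if it
    failed or is still blocked. Name and timeout are illustrative only.
    """
    finished = threading.Event()

    def worker():
        # Equivalent of `touch path; chown uid:gid path`, using fchown(2).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.fchown(fd, uid, gid)
            finished.set()
        finally:
            os.close(fd)

    # Daemon thread: a call stuck in the kernel must not block the caller.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return finished.wait(timeout)
```

Run as root on the affected image, `chown_completes("test", 0, 0)` would be expected to return False if the hang described above occurs, and True on a healthy system.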
Jan 04 12:29:39 build #55 of orion is complete: Success [build successful] Build details are at http://tksite.gotdns.org:8010/builders/orion/builds/55
Jan 04 12:37:48 https://dev.openwrt.org/browser/trunk/target/linux/ar71xx/base-files/etc/uci-defaults/wrt160nl
Jan 04 12:38:00 this file is not executable
Jan 04 12:38:19 that means that it is never executed and remains on /etc/uci-defaults
Jan 04 12:38:23 not sure where the problem is
Jan 04 12:38:36 but if I do a clean install, it will be sitting on that folder
Jan 04 12:41:46 actually the problem looks that some files at trunk/target/linux/ar71xx/base-files/etc/uci-defaults are executable and some other arent
Jan 04 13:08:05 nunojpg * r24896 /packages/utils/restorefactory/Makefile: [packages] restorefactory: added default settings
Jan 04 13:47:35 juhosg * r24897 /trunk/target/linux/ar71xx/base-files/etc/uci-defaults/ (dir-825 mzk-w04nu nbg460n_550n_550nh tl-wr1043nd wrt160nl):
Jan 04 13:47:35 ar71xx: make uci-default scripts executable
Jan 04 13:47:35 Reported-by: Nuno Gonçalves
Jan 04 13:47:37 juhosg * r24898 /trunk/target/linux/ramips/files/drivers/net/ramips_esw.c:
Jan 04 13:47:37 ramips: ramips_esw: fix typos
Jan 04 13:47:37 Patch from #8577.
Jan 04 14:23:49 jow * r24899 /packages/libs/libiconv/ (Makefile src/include/iconv.h): [packages] libiconv: declare api functions as extern "C", solves linking with C++ applications (#8529)
Jan 04 14:23:52 xMff: it looks like the problem is in the kernel after all
Jan 04 14:24:15 xMff: so now I'm trying to follow the code
Jan 04 14:26:09 unfortunately from what I've read, there's not much in the way of interactive debugging for the kernel
Jan 04 14:26:30 so passwd etc. hun at a syscall?
Jan 04 14:26:34 *hung
Jan 04 14:26:39 yep
Jan 04 14:26:44 fchown
Jan 04 14:26:49 hm
Jan 04 14:27:08 I think it's a mutex problem
Jan 04 14:27:17 because of mini_fo
Jan 04 14:28:10 http://patchwork.ozlabs.org/patch/69070/
Jan 04 14:30:56 this one looks interesting too: https://bugzilla.kernel.org/show_bug.cgi?id=14452
Jan 04 14:31:00 hey, I actually was looking at the right part of the code....cool
Jan 04 14:31:36 but is probably already fixed
Jan 04 14:31:45 confused .32 with something new
Jan 04 14:32:11 yeah, I'm on 2.6.35
Jan 04 14:32:17 cshore: there's kgdb (part of mainline since a few kernel versions iirc), but setting it up might be a bit of work
Jan 04 14:33:31 my personal suspicion is the problem is that mini_fo_setattr call update_notify with the mutex already set
Jan 04 14:34:17 because it's called by update_notify itself
Jan 04 14:34:38 sorry: notify_change
Jan 04 14:37:20 KanjiMonster: do you have any info on doing so?
Jan 04 14:39:41 cshore: not really. Never tried to use it myself, so I can only offer some google hits regarding it (e.g. http://www.linux-mips.org/wiki/Linux/MIPS_Porting_Guide#kgdb )
Jan 04 14:52:48 #8579, yet more trac spam
Jan 04 15:10:07 build #47 of ixp4xx is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/ixp4xx/builds/47
Jan 04 15:10:12 build #47 of kirkwood is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/kirkwood/builds/47
Jan 04 15:10:55 build #45 of xburst is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/xburst/builds/45
Jan 04 15:13:06 nunojpg * r24900 /packages/utils/ (restorefactory/Makefile watchcat/Makefile): [packages] restorefactory: directory fix
Jan 04 15:14:36 build #56 of orion is complete: Failure [failed compile_1] Build details are at http://tksite.gotdns.org:8010/builders/orion/builds/56
Jan 04 15:21:31 build #47 of au1000 is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/au1000/builds/47
Jan 04 15:22:24 build #51 of ar71xx is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/ar71xx/builds/51
Jan 04 15:22:37 build #61 of atheros is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/atheros/builds/61
Jan 04 15:22:38 build #48 of kirkwood is complete: Failure [failed compile_2] Build details are at http://tksite.gotdns.org:8010/builders/kirkwood/builds/48
Jan 04 15:36:48 build #52 of ps3 is complete: Failure [failed compile_3] Build details are at http://tksite.gotdns.org:8010/builders/ps3/builds/52
Jan 04 16:03:50 build #60 of at91 is complete: Exception [exception failed slave lost shell_13 compile_12] Build details are at http://tksite.gotdns.org:8010/builders/at91/builds/60
Jan 04 16:03:52 build #57 of ubicom32 is complete: Exception [exception failed slave lost shell_13 compile_12] Build details are at http://tksite.gotdns.org:8010/builders/ubicom32/builds/57
Jan 04 16:03:54 build #53 of ramips is complete: Exception [exception failed slave lost shell_13 compile_12] Build details are at http://tksite.gotdns.org:8010/builders/ramips/builds/53
Jan 04 16:03:57 build #60 of brcm47xx is complete: Exception [exception failed slave lost shell_13 compile_12] Build details are at http://tksite.gotdns.org:8010/builders/brcm47xx/builds/60
Jan 04 16:24:11 cshore: have you tried lockdep?
Jan 04 16:42:56 kaloz * r24901 /trunk/package/mac80211/Makefile: [package/mac80211/carl9170]: fix md5sum, use our mirror with a fixed-up filename to make sure we get the right fw version
Jan 04 17:10:16 PaulFerster: no, thanks for the tip
Jan 04 17:24:11 nunojpg * r24902 /packages/utils/watchcat/Makefile: [packages] watchcat: added default settings
Jan 04 17:52:50 nbd: ping
Jan 04 19:45:42 apparently the chown doesn't even get to notify_change, which is where I thought the problem was
Jan 04 19:51:22 cshore: i hoped that with lockdep-enabled kernel you'll see the reason for lockup immediately :)
Jan 04 19:52:28 I'm really amazed by the level of the debug tools offered by Linux nowadays, that's too great to be true :) All this lockdep, ftrace and other trace facilities, kgdb, etc etc.
Jan 04 19:54:00 PaulFerster: it's apparently not a locking issue....lockdep at least let me know that I was looking for the wrong problem
Jan 04 19:54:52 cshore: but you still seeing a "hanged" syscall, right?
Jan 04 19:56:10 cshore: ftrace should be able to reveal the flow.
Jan 04 19:58:51 PaulFerster: yeah, I'm trying figure out where inside the syscall the problem is...how do you enable ftrace? (I've been doing printk's)
Jan 04 20:00:04 nvm, found it
Jan 04 20:06:58 anyone else seeing problems with e2fsprogs breaking because it wants libuuid? the strange thing is that libuuid is a subtree of the e2fsprogs project, so I don't understand why that would break.
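Editor's sketch for the 19:58 question about enabling ftrace, which went unanswered in the channel. It assumes a kernel built with the function tracer and dynamic ftrace, and debugfs mounted at `/sys/kernel/debug`; the helper names `start_function_trace` and `read_trace` are hypothetical. The control files it writes (`tracing_on`, `set_ftrace_filter`, `current_tracer`) and reads (`trace`) are the standard ftrace interface, but whether a given function is traceable depends on the kernel build.

```python
import os

# Usual location when debugfs is mounted; treat this path as an assumption.
TRACING_DIR = "/sys/kernel/debug/tracing"


def _write(tracing_dir, name, value):
    # Each ftrace control is a plain file; opening with "w" replaces its value.
    with open(os.path.join(tracing_dir, name), "w") as f:
        f.write(value)


def start_function_trace(functions, tracing_dir=TRACING_DIR):
    """Trace only the named kernel functions with the `function` tracer."""
    _write(tracing_dir, "tracing_on", "0")  # pause while configuring
    _write(tracing_dir, "set_ftrace_filter", " ".join(functions))
    _write(tracing_dir, "current_tracer", "function")
    _write(tracing_dir, "tracing_on", "1")


def read_trace(tracing_dir=TRACING_DIR):
    """Return the current contents of the trace buffer as text."""
    with open(os.path.join(tracing_dir, "trace")) as f:
        return f.read()
```

For the hang discussed here, one might call `start_function_trace(["notify_change"])` as root, trigger the chown from another shell, and then inspect `read_trace()`, provided the system still responds.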
Jan 04 20:54:47 hmmm....it is notify_change that's the problem....apparently before where I was checking
Jan 04 21:16:05 nunojpg * r24903 /packages/utils/watchcat/files/initd_watchcat: [packages] watchcat: changed initd sequence
Jan 04 21:16:44 nunojpg * r24904 /packages/net/sshtunnel/files/initd_sshtunnel: [packages] sshtunnel: changed default retry time
Jan 05 01:04:35 build #49 of x86 is complete: Failure [failed compile_6] Build details are at http://tksite.gotdns.org:8010/builders/x86/builds/49
Jan 05 02:22:23 error is in security_inode_need_killpriv
Jan 05 02:23:21 you tracked it down?
Jan 05 02:23:43 That's as far as I've got so far (that's where it hangs)
Jan 05 02:24:00 in notify_change when it calls that function
Jan 05 02:26:57 odd, shouldn't that be a no-op if selinux etc. are not enabled?
Jan 05 02:29:38 yeah
Jan 05 02:29:54 but that's where it hangs
Jan 05 02:30:12 I can't even a place in 2.6.36 where it is set to something
Jan 05 02:30:15 +find
Jan 05 02:31:57 ah nvm
Jan 05 02:32:01 include/linux/security.h
Jan 05 02:32:06 is xattr stuff enabled in the kernel?
Jan 05 02:32:18 not unless it is by default
Jan 05 02:32:41 seems to boild down to cap_inode_need_killpriv in security/commoncap.c
Jan 05 02:33:25 maybe mini_fo needs an getxattr stub?
Jan 05 02:34:56 no actually it has
Jan 05 02:35:16 it does another mutex lock in there if kernel > 2.6.16
Jan 05 02:37:41 hmmmI think that second mutex_lock is wrong...probably should be an unlock
Jan 05 02:39:24 oops indeed
Jan 05 02:39:35 wtf
Jan 05 02:40:08 heh
Jan 05 02:42:38 building
Jan 05 02:42:58 now checked the default configs
Jan 05 02:43:18 2.6.35 and 2.6.37 are lacking target/linux/generic/config-2.6.36:# CONFIG_EXT4_FS_XATTR is not set
Jan 05 02:43:32 .30-34 and .36 have it
Jan 05 02:44:18 maybe this is made xattr creep into your image
Jan 05 02:44:31 which then exposed the unbalanced mutex
Jan 05 02:46:27 it looks to me like EXT4_FS_XATTR is mentioned either way in the config file; is it possible that it is enable by default by the kernel?
Jan 05 02:46:53 not sure
Jan 05 02:48:16 xMff: hmmm....in my build_dir CONFIG_EXT4_FS_XATTR=y even though I have never touched any such setting
Jan 05 02:48:24 yes its default y
Jan 05 02:48:30 in the upstream Kconfig
Jan 05 02:48:39 ok, that's why then
Jan 05 02:49:06 and you had bad luck with your target, its default config does not blacklist xattr
Jan 05 02:49:12 right
Jan 05 02:51:55 jow * r24905 /trunk/target/linux/generic/ (config-2.6.35 config-2.6.37): [generic] disable CONFIG_EXT4_FS_XATTR on kernel 2.6.35 and 2.6.37, this exposes a mutex bug in mini_fo leading to hard lockups with various filesystem operations
Jan 05 02:53:28 next time I won't be so certain I can't find the problem in the kernel
Jan 05 02:53:48 I was having a crisis of confidence
Jan 05 02:53:49 what technique did you use btw?
Jan 05 02:53:53 printk
Jan 05 02:53:56 :D
Jan 05 02:54:34 lockdep didn't detect the bug
Jan 05 02:54:49 would be cool if you could verify the mini_fo fix as well
Jan 05 02:54:59 building now
**** ENDING LOGGING AT Wed Jan 05 02:59:58 2011
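Editor's sketch of the bug pattern identified at 02:37, "that second mutex_lock is wrong...probably should be an unlock". The log does not show the mini_fo source, so this is only a userspace analogy in Python: `FakeInode`, `getxattr_unbalanced` and `getxattr_balanced` are hypothetical names, and `threading.Lock` stands in for a non-recursive kernel mutex. The timeout exists purely so the demonstration can return; a kernel `mutex_lock` in the same position would wait forever.

```python
import threading


class FakeInode:
    """Hypothetical stand-in for an object guarded by one non-recursive lock."""

    def __init__(self):
        self.mutex = threading.Lock()


def getxattr_unbalanced(inode, timeout=0.2):
    """Bug pattern: a second lock is taken where the unlock was intended."""
    inode.mutex.acquire()
    try:
        # The lock is already held by this caller, so this cannot succeed.
        # Returns False after `timeout` instead of blocking indefinitely.
        return inode.mutex.acquire(timeout=timeout)
    finally:
        inode.mutex.release()


def getxattr_balanced(inode):
    """Fixed pattern: every lock is paired with exactly one unlock."""
    inode.mutex.acquire()
    try:
        return True  # the guarded work would go here
    finally:
        inode.mutex.release()
```

Calling `getxattr_unbalanced(FakeInode())` returns False because the second acquire times out, which is the userspace equivalent of the hang; `getxattr_balanced(FakeInode())` returns True and leaves the lock free.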