**** BEGIN LOGGING AT Wed Jul 28 02:59:56 2010
Jul 28 07:15:05 Morning mythripk
Jul 28 07:17:00 morning
Jul 28 07:19:03 morning
Jul 28 07:21:07 Mornin' all
Jul 28 07:42:03 morning lag
Jul 28 09:32:37 lool, btw, my ext2 image probs were caused by the FS running out of inodes
Jul 28 09:32:57 formatting with a higher default value should solve it
Jul 28 09:34:44 ogra: Any luck?
Jul 28 09:35:03 archive is out of sync
Jul 28 09:35:12 i'm waiting for it to test the fix i have
Jul 28 09:35:24 How do you mean 'out of sync'?
Jul 28 09:35:28 Out of sync with what?
Jul 28 09:35:41 computer-janitor: Depends: python-fstab (>= 1.2) but it is not installable
Jul 28 09:35:41 libmailtools-perl: Depends: libtimedate-perl but it is not installable
Jul 28 09:35:48 ogra: it's quite surprising TBH; do you resize the fs at some point?
Jul 28 09:36:01 (I saw the chat and agree it's a number-of-inodes problem)
Jul 28 09:36:02 lool, nope, only later on first boot
Jul 28 09:36:36 i create an empty file with dd and format it, then loop mount it and cp -ax
Jul 28 09:36:44 that's all i do
Jul 28 09:37:56 it might be the way dd creates the file if you use count=0 and seek=<$imagesize>
Jul 28 09:38:24 ?
Jul 28 09:38:45 i know that creates a file with holes (which e.g. swapon complains about)
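The image-creation recipe described at 09:36 amounts to roughly the following; a minimal sketch, assuming a 2 GB ext2 image, with illustrative file names and an example inode count (not taken from the actual script) to avoid the out-of-inodes failure mentioned at 09:32:

    # create a sparse ("file with holes") image of the desired size
    dd if=/dev/zero of=rootfs.img bs=1M count=0 seek=2048
    # format it, explicitly raising the inode count so the copy doesn't run out
    mkfs.ext2 -F -N 300000 rootfs.img
    # loop-mount it and copy the built root filesystem in, preserving attributes
    mkdir -p /mnt/img
    mount -o loop rootfs.img /mnt/img
    cp -ax /path/to/rootfs/. /mnt/img/
    umount /mnt/img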
Jul 28 13:39:10 Anybody at GUADEC?
Jul 28 13:50:54 lool: nope, blew it off to get work done...
Jul 28 14:18:55 Check my logic: the qemu vm issues exist in the -static builds as well; -static + chroot works better for at least partially unknown reasons
Jul 28 14:19:45 cwillu_at_work: you mean, with rootstock?
Jul 28 14:19:59 rsalveti, yes
Jul 28 14:20:09 yep, had a huge debugging day yesterday
Jul 28 14:20:15 oh, really?
Jul 28 14:20:24 with full vm things are slower, seg faults and hangs
Jul 28 14:20:26 too bad I missed it, because the -static is really hurting me right now :)
Jul 28 14:20:42 with user emulation it sucks with programs that request info from /proc
Jul 28 14:20:49 like the stupid mono package
Jul 28 14:20:50 I'm getting mysterious "method http died" messages, and if I shuffle things around, the mysterious deaths move to pip
Jul 28 14:21:14 I've been running fine for months, and then yesterday my images just started failing
Jul 28 14:21:46 it's almost as if some update to a package I was installing broke things
Jul 28 14:22:23 cwillu_at_work: hm, what package failed?
Jul 28 14:22:42 rsalveti, as far as I can tell, no package failed
Jul 28 14:22:51 hm
Jul 28 14:22:58 it's just that some process will randomly die after aptitude finishes
Jul 28 14:23:09 cwillu_at_work: oh, ok
Jul 28 14:23:09 (installing packages in the chroot)
Jul 28 14:23:25 cwillu_at_work: what distro version are you trying to bootstrap?
Jul 28 14:23:29 lucid
Jul 28 14:23:41 debootstrap finishes fine
Jul 28 14:24:01 cwillu_at_work: I'm planning to change to user mode emulation when running as root, like what you're doing, and also add native arm support
Jul 28 14:24:03 I could push things out to first boot, but I'd really prefer not to
Jul 28 14:24:12 today, I mean
Jul 28 14:24:20 which, rootstock?
Jul 28 14:24:22 yep, that sucks
Jul 28 14:24:30 cwillu_at_work: yep
Jul 28 14:24:34 sec
Jul 28 14:25:18 full vm doesn't work, lots of bugs, and user mode emulation works fine for most of the cases
Jul 28 14:25:46 then if you still can't create the rootfs, do it on arm
Jul 28 14:27:33 so, yesterday, did you figure anything out re: triggering it?
Jul 28 14:29:18 How much of the vm issues can be attributed to issues with the kernel targets? Might we just be seeing something odd there? Especially with -updates, things could be loosely tested for VM targets.
Jul 28 14:29:32 chroot isn't using the arm kernel though
Jul 28 14:29:47 yep, just full vm
Jul 28 14:29:55 persia: with full vm I'm getting the same behavior with different kernels
Jul 28 14:29:58 but that's something I haven't checked: whether I'm using -updates as my source
Jul 28 14:30:23 cwillu_at_work: first, if you use maverick's qemu, you'll get the unsupported syscall for pselect again
Jul 28 14:30:27 and a huge backlog
Jul 28 14:30:42 I don't follow
Jul 28 14:30:48 then if you install anything related to mono, it'll hang
Jul 28 14:31:06 cwillu_at_work: this was fixed for lucid, but we have a regression for maverick
Jul 28 14:31:22 I'm not targeting maverick :p
Jul 28 14:31:30 apt-get uses pselect, and this syscall is implemented in lucid
Jul 28 14:31:40 happens if you're using maverick as the host :-)
Jul 28 14:31:46 not doing that either
Jul 28 14:32:00 cwillu_at_work: also, I get a seg fault while installing humanity-icon-theme
Jul 28 14:32:17 * cwillu_at_work repeats himself:
Jul 28 14:32:27 same package set worked fine a week ago :)
Jul 28 14:32:51 The lucid case should be fairly different from the maverick case, but the lucid VM kernels come from the linux source package, which doesn't see much careful testing on updates except for i386 and amd64, usually.
Jul 28 14:33:07 persia, we're not using the vm kernels at all though
Jul 28 14:33:14 persia, qemu-arm-static doesn't require one
Jul 28 14:33:25 cwillu_at_work: but for me rootstock is working fine for most of the basic cases
Jul 28 14:33:39 cwillu_at_work: what packages are you requesting rootstock to install?
Jul 28 14:33:46 rsalveti, it was for me too, up until a week ago :)
Jul 28 14:33:52 sec
Jul 28 14:34:05 I'm just going to pastebin my version
Jul 28 14:34:14 ok
Jul 28 14:34:18 cwillu_at_work: Sorry then: I thought the issue was a comparison of -static to the VM case. Ignore me :)
Jul 28 14:34:36 persia, well, it kinda is, but I'm focusing on the parts where -static fails in the same way :)
Jul 28 14:35:52 http://pastebin.com/1iynGS4j
Jul 28 14:35:54 line 620
Jul 28 14:36:02 you should be able to run that locally if you remove the git calls
Jul 28 14:36:34 ok, will try
Jul 28 14:36:38 if I don't download the packages first, aptitude will die while installing firefox (i.e., second invocation)
Jul 28 14:36:38 just a sec
Jul 28 14:36:49 as written, it makes it through to the pip calls
Jul 28 14:37:03 ugh, which are commented out in this version :p
Jul 28 14:37:43 you might need an empty modules.d directory in the working dir
Jul 28 14:38:05 given locally cached downloads, it takes about 20-25 minutes to run
Jul 28 14:48:52 rsalveti, it seems like anything which touches the network dies after that point
Jul 28 14:49:14 if I remove the pip calls and the aptitude update call at the end, it finishes
Jul 28 14:55:35 rsalveti, actually, there's an odd thing:
Jul 28 14:56:00 I split up the installer into multiple files as you noticed, which are each called in the same chroot, but different invocations of it
Jul 28 14:56:24 (that was done to try to isolate things after they went weird yesterday)
Jul 28 15:00:28 ....
Jul 28 15:00:47 hmm, could it be that the other arm chroot I had open to build packages was breaking things?
Jul 28 15:03:02 That should have no effect. I've routinely had multiples open (via schroot) without any apparent effect. Mind you, that sample may not be large enough to prove a negative.
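For reference, the user-mode-emulation ("-static + chroot") approach being compared with the full VM at 14:18-14:33 boils down to something like the following; a minimal sketch, assuming qemu-arm-static and binfmt support are installed on the host, with illustrative paths and a lucid armel target:

    # first stage on the host, unpacking an armel lucid tree without running anything
    debootstrap --foreign --arch=armel lucid ./rootfs http://ports.ubuntu.com/ubuntu-ports
    # copy the static qemu in so binfmt_misc can run armel binaries inside the chroot
    cp /usr/bin/qemu-arm-static ./rootfs/usr/bin/
    # second stage and any further package work then run under user-mode emulation
    chroot ./rootfs /debootstrap/debootstrap --second-stage
    chroot ./rootfs apt-get update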
Jul 28 15:03:27 persia, arm chroots?
Jul 28 15:04:18 arm-on-amd64 foreign schroots
Jul 28 15:04:26 okay
Jul 28 15:04:26 Err, armel-on-amd64
Jul 28 15:05:42 yep
Jul 28 15:06:06 well, I'm rerunning without the other chroots open
Jul 28 15:06:48 I'd run that test a few times, as there's supposed to be some separation. If you can demonstrate a convincing effect, then we clearly need to do something more advanced with LXC
Jul 28 15:07:23 cwillu_at_work: sorry, will look at it now, was doing other things
Jul 28 15:07:59 np
Jul 28 15:08:23 persia, I've run 40-50 builds in the last day, and hundreds over the last few months
Jul 28 15:08:50 I haven't established that the extra chroot was the cause though, if that's what you're asking
Jul 28 15:08:59 but whatever changed is consistent
Jul 28 15:09:12 cwillu_at_work: I figured, but I doubt you have data on which of them were run with a simultaneous build chroot active (although I'd be happy to know otherwise).
Jul 28 15:09:34 I could figure it out (both have start and end timelogs)
Jul 28 15:09:39 Are you targeting -updates? I really think that's the most likely source of regression.
Jul 28 15:09:59 I checked, I'm not
Jul 28 15:10:06 -security?
Jul 28 15:10:08 take a look at the pastebin I posted
Jul 28 15:10:10 How about on the host?
Jul 28 15:10:20 I haven't applied updates this week yet
Jul 28 15:10:23 at it worked on friday
Jul 28 15:10:30 s/at/and/
Jul 28 15:10:41 Ugh. phase-of-the-moon problem :(
Jul 28 15:10:46 :)
Jul 28 15:11:05 MIRROR="http://repository:3142/ports.ubuntu.com/ubuntu-ports"
Jul 28 15:11:05 REAL_MIRROR="http://ports.ubuntu.com/ubuntu-ports"
Jul 28 15:11:05 COMPONENTS="main universe"
Jul 28 15:11:27 Right. That should be the same as it was at release.
Jul 28 15:12:13 the reason I mention the other chroot isn't so much that I had builds running at the same time (that was one of the earlier things I checked), but rather that I never closed the chroot itself
Jul 28 15:12:34 that's the test I'm running right now
Jul 28 15:12:40 That really shouldn't have an effect
Jul 28 15:13:11 you've said this :)
Jul 28 15:13:16 I'll know in ten minutes
Jul 28 15:15:30 Well, a simultaneously active chroot could have an effect if the chroot boundaries are insufficient, leading to a need to do more with LXC, but an inactive chroot is about the same whether chroot() has been called on it or not.
Jul 28 15:18:47 it had /proc, /sys, /dev and so forth mounted inside it
Jul 28 15:19:01 But no files open, right?
Jul 28 15:19:11 whatever a shell would have, yes
Jul 28 15:19:17 here's the thing though:
Jul 28 15:19:24 (have you looked at the pastebin yet? :p)
Jul 28 15:19:38 I wouldn't expect a shell to have enough open to make a difference, but maybe
Jul 28 15:19:38 I split the script that runs in the chroot into four pieces
Jul 28 15:19:39 yes
Jul 28 15:19:40 to debug this
Jul 28 15:19:50 each script is run in a separate chroot, sequentially
Jul 28 15:20:27 Now, I can rerun rootstock, and it'll get up to the same point each time (i.e., first download works, next thing to touch the network dies)
Jul 28 15:20:37 but... the next thing to touch the network is in a different chroot
Jul 28 15:20:45 and it still dies
Jul 28 15:20:51 hm, weird
Jul 28 15:20:55 ...even though re-running rootstock doesn't
Jul 28 15:21:47 I'm pretty sure this will reduce to a config change that I forgot I made or something silly like that, but even so, I don't think I'm doing anything that _should_ be broken :)
Jul 28 15:21:49 Very odd.
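The mounts referred to at 15:18 ("/proc, /sys, /dev and so forth") are typically set up along these lines before entering the chroot; a sketch only, since the exact set rootstock mounts may differ:

    mount -t proc proc ./rootfs/proc
    mount -t sysfs sysfs ./rootfs/sys
    mount -o bind /dev ./rootfs/dev
    mount -o bind /dev/pts ./rootfs/dev/pts
    # ...and the matching teardown once the chroot is no longer needed
    umount ./rootfs/dev/pts ./rootfs/dev ./rootfs/sys ./rootfs/proc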
Jul 28 15:22:24 At least it's isolated enough that it can be debugged, so once it's known, it can be made to never happen.
Jul 28 15:23:12 I'm secretly hoping that this is the same trigger as the grief in qemu, but I'm not sure how that plays into what you said earlier about the lucid patches, rsalveti
Jul 28 15:24:15 moments away from knowing
Jul 28 15:24:33 nope, that wasn't it :p
Jul 28 15:24:34 damn
Jul 28 15:25:12 * persia is glad the LXC integration isn't actually required, as that has looked painful the last few investigations
Jul 28 15:25:22 LXC?
Jul 28 15:25:42 cwillu_at_work: I'm running it here, will let you know if it worked or not
Jul 28 15:25:42 http://lxc.sourceforge.net/
Jul 28 15:25:49 k
Jul 28 15:26:07 rsalveti, I don't think there are too many hardcoded dependencies on my environment
Jul 28 15:26:23 Basically, one can create even more segregation than with a regular chroot, which we haven't (quite) needed for anything yet, but I keep expecting it when people start talking about issues with multiple simultaneous chroots.
Jul 28 15:26:33 cwillu_at_work: I removed most of the stuff I could easily identify
Jul 28 15:26:33 ah, k
Jul 28 15:26:59 mirror, rsync, etc
Jul 28 15:27:40 the rsync might be of interest
Jul 28 15:29:48 (of modules.d, at least)
Jul 28 15:35:16 pulling up a shell inside the chroot after it starts dying
Jul 28 15:35:37 I'd like to confirm that it's network-related activity
Jul 28 15:46:15 cwillu_at_work: yep, worked fine
Jul 28 15:54:02 okay, bash prompt up
Jul 28 15:54:41 yep, definitely something weird on this box
Jul 28 15:58:10 ifconfig eth0 shows information, ifconfig dies with ": error fetching interface information: Device not found"
Jul 28 15:58:22 with no indication of which device it's looking for
Jul 28 16:00:34 host repository
Jul 28 16:00:34 qemu: Unsupported syscall: 250
Jul 28 16:00:34 errno2result.c:111: unable to convert errno to isc_result: 38: Function not implemented
Jul 28 16:00:34 socket.c:3851: epoll_create failed: Function not implemented
Jul 28 16:00:34 /usr/bin/host: isc_socketmgr_create: unexpected error
Jul 28 16:03:09 That's exceedingly annoying. Is that unique to one install, or replicable?
Jul 28 16:05:08 not sure yet; copying the files over to my desktop to try it there
Jul 28 16:10:44 strace doesn't work :D
Jul 28 16:11:23 strace-in-chroot or strace-on-qemu?
Jul 28 16:11:31 chroot
Jul 28 16:11:42 qemu can't emulate ptrace right now
Jul 28 16:11:45 and it would probably be hard
Jul 28 16:11:56 which is the package for binfmt?
Jul 28 16:11:58 Well, the issue is in qemu, if you're getting "Function not implemented". Try stracing that.
Jul 28 16:12:06 binfmt-support?
Jul 28 16:12:16 that's the base, but it's pluggable.
Jul 28 16:12:45 You want to edit the binfmt entry for armel binaries to call strace, or run host strace attaching to a PID.
Jul 28 16:17:52 attached
Jul 28 16:19:57 will be a moment, hit ctrl-c in the prompt once too many times
Jul 28 17:05:43 * cwillu_at_work cries
Jul 28 17:29:48 persia, did you want an strace of qemu when a "host google.ca" fails?
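persia's suggestion at 16:12 (attach the host's strace to the qemu process rather than tracing inside the chroot, since qemu can't emulate ptrace) looks roughly like this; the pgrep pattern and output file are illustrative:

    # find the qemu-arm-static process backing the chroot shell and attach to it
    pid=$(pgrep -f qemu-arm-static | head -n1)
    strace -f -o qemu-host.trace -p "$pid"
    # then rerun the failing command inside the chroot, e.g. "host google.ca"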
Jul 28 17:32:22 persia, rsalveti, http://pastebin.com/8agEskqT
Jul 28 17:32:51 http://pastebin.com/6EfBPFLu is the shell output
Jul 28 17:50:54 there's another thing I didn't notice before:
Jul 28 17:51:13 my build environment is an unpacked copy of the output of my rootstock
Jul 28 17:53:00 which I haven't regenerated in a few weeks
Jul 28 17:53:13 same problems in it though
Jul 28 18:01:25 er, not quite
Jul 28 18:01:34 host dies in the same way, pip and apt seem to work fine :/
Jul 28 18:16:09 and, success
Jul 28 18:16:21 bouncing everything through a local proxy on 127.0.0.1 works
**** ENDING LOGGING AT Thu Jul 29 02:59:57 2010
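For reference, the workaround noted at 18:16 amounts to pointing everything in the chroot at a proxy on the loopback interface instead of letting it hit the mirrors directly; a minimal sketch, assuming a caching proxy such as apt-cacher-ng is already listening on 127.0.0.1:3142 (the port the MIRROR setting at 15:11 uses):

    # the chroot inherits the environment, so apt and friends pick the proxy up
    export http_proxy=http://127.0.0.1:3142/
    chroot ./rootfs apt-get update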