**** BEGIN LOGGING AT Mon Oct 03 02:59:57 2011
Oct 03 08:47:37 diwic: the contextgetstate.patch worked fine.. and solved the exception bug on arm
Oct 03 08:48:02 xranby, \o/
Oct 03 08:48:25 xranby, was it enough to bring up the hajpa hajpa?
Oct 03 08:48:35 yeah at least for one of the two jvm's
Oct 03 08:48:48 the other jvm had its own classloader bug..
Oct 03 08:50:09 xranby, but that was maybe unrelated to pulseaudio?
Oct 03 08:50:36 diwic: correct the pulse-audio java layer looks bugfixed now, thanks! the only thing that can break it now are if the java virtual machine have missed to implement some part of the jvm specification
Oct 03 08:50:53 ok
Oct 03 08:52:24 diwic: and for arm we are in the situation that oracle do not provide an opensource reference implementation like on x86 :/ so it takes time to get everything super polished
Oct 03 08:53:37 xranby, ok so a lot of that reference implementation is written in platform dependent language (e g assembly?)
Oct 03 08:53:57 xranby, vaguely remember you tried to explain some of that stuff at latest UDS
Oct 03 08:55:24 diwic: yes highly platform dependent, and optimized java virtual machine port contains about 80000 lines of platformspecific code
Oct 03 08:55:37 ouch
Oct 03 11:11:54 hm, is the ARM cross compiler toolchain for x86 currently broken?
Oct 03 11:12:00 (in oneiric)
Oct 03 11:14:53 apparently the libgcc1-armel-cross package is missing
Oct 03 14:34:10 diwic: it turned out that the reference implementation did not strictly implement the jni spec :) http://icedtea.classpath.org/hg/icedtea6/rev/23b9bb41de6d
Oct 03 14:34:25 now \o/ hajpa hajpa work on arm
Oct 03 14:34:46 hajpa hajpa!
Oct 03 14:37:36 infinity, ac100-tarball-installer in moderation queue
Oct 03 16:07:04 ppisati: Can you look at Bug 865479? Thanks.
Oct 03 16:07:06 Launchpad bug 865479 in linux-ti-omap4 "wl1271: ERROR ELP wakeup timeout" [Undecided,New] https://launchpad.net/bugs/865479
Oct 03 16:07:17 Not sure that it is critical, just that it exists.
Oct 03 16:09:50 infinity: I'm still consistently getting oem-config to respawn. Looking at the log files, I believe this may be the problem: Oct 3 08:12:05 localhost ubiquity: debconf: DbDriver "config": /var/cache/debconf/config.dat is locked by another process: Resource temporarily unavailable
Oct 03 16:12:00 Hard to tell, as that message is almost a minute before the respawn message, but it is the only negative message before the next instance of oem-config respawn in syslog.
Oct 03 16:14:32 I tried to get mongodb working over the weekend but didn't make it very far
Oct 03 16:42:10 GrueMaster: I can't get this respawn thing happening at all. :/
Oct 03 16:42:32 GrueMaster: I can get the installer to explode in general based on cron.daily eating the system alive. That's about it.
Oct 03 16:42:47 (And I plan to hack around that with a kludge today)
Oct 03 16:44:27 Pfft. Figures. Most of the bugs I find are easily reproducible here, but nowhere else.
Oct 03 16:46:46 I wonder if the cron.daily stuff is blocking me. Maybe it is doing an apt-get update at the same time?
Oct 03 16:48:00 That could explain why I see it and others don't. Timing.
Oct 03 16:48:54 Oh, and the slideshow works again. :)
Oct 03 17:05:37 what steps are involved in bringing support for a new chip like the Cortex A9 to an OS
Oct 03 17:07:37 The A9 is new?
Oct 03 17:08:21 GrueMaster: The cron.daily thing can just explode in general due to system load. I think the debconf DB thing might be a red herring, though. Let me check the log on a successful install run.
Oct 03 17:10:03 GrueMaster: Yeah, I have that same debconf locking spew on a successful install.
Oct 03 17:10:19 GrueMaster: So, while it's likely a bug somewhere, it's probably not causing an issue either.
Oct 03 17:10:43 infinity: new to that os :)
Oct 03 17:10:50 I just had ubiquity fall apart when I tried to run without any networking. Really bad.
Oct 03 17:12:01 brandini: Well, things like A8->A9 are really just about building support into toolchains and kernels for shiny new features you might care about, and building support into bootloaders to bring them up.
Oct 03 17:12:08 Hmm. Something caused the filesystem to remount read-only.
Oct 03 17:12:14 brandini: But since all your binaries already run (yay backward compat), it's not much effort.
Oct 03 17:12:32 ok
Oct 03 17:13:07 GrueMaster: Read-only filesystems usually point to hardware hating you, with a 5% chance of kernel bug...
Oct 03 17:13:25 (Well, less than 5%, since we tightly control our target platforms here, and we aren't all seeing ro filesystems)
Oct 03 17:13:53 infinity: I'm the guy that nails that 5%.
Oct 03 17:14:03 Or your hardware hates you. ;)
Oct 03 17:14:30 I hate to beat the same dead horse, but when working on SD, I'd tend to blame cards for a filesystem going tits-up before anything else.
Oct 03 17:14:43 And I really think we should do most of our test installs on hard drives.
Oct 03 17:14:49 Multiple SD cards and multiple Panda's? I highly doubt I can be having a complete failure here. The odds are against it being my HW.
Oct 03 17:14:57 Except hitting a card once in a while to make sure the code paths still do what we think they do. :P
Oct 03 17:15:43 GrueMaster: The odds for hardware are still higher than the idea that a deterministic automated installer only fails in one person's house.
Oct 03 17:15:51 infinity: Can't test on hard drives. The preinstalled images are designed around SD. It would be equivalent to testing in a VM.
Oct 03 17:15:58 GrueMaster: And you've been beating on the same SD cards for a while, I'd guess.
Oct 03 17:16:21 Some are almost brand new since A2.
Oct 03 17:16:27 GrueMaster: Preinstalled would work on an HDD just fine.
Oct 03 17:16:37 (Just need to write it differently)
Oct 03 17:16:47 But yeah. I know the situation we're stuck in. I just dislike it. :P
Oct 03 17:16:58 I have 5 cards in front of me, and I only trust one of them.
Oct 03 17:17:03 And that trust won't last forever.
Oct 03 17:18:50 I have ~2 different cards per board at my disposal. Varying sizes, speeds, and brands. To see these issues across multiple SD cards on Panda A1, A2, and A3 systems is extremely rare.
Oct 03 17:19:24 Well, wait, which issues? The ro filesystem above sounded like a one-off.
Oct 03 17:19:31 Remember, I have almost 10 years in hardware validation. I know very well how to triage hardware failures.
Oct 03 17:20:01 That was the first time I saw it, but also the first time I booted w/o networking.
Oct 03 17:20:27 Correlation and causation not being the same thing. :P
Oct 03 17:20:39 And ubiquity crashed with a UBI failure of some sort. Just getting ready to look at the log.
Oct 03 17:20:53 But it's possible 'fixrtc' stopped working or some such, which could lead to network->badtime->fsbreak.
Oct 03 17:21:05 But that should break early.
Oct 03 17:21:20 The log will be useless if the filesystem is ro. Unless you're really lucky.
Oct 03 17:21:28 fixrtc only runs during boot in initrd I thought.
Oct 03 17:21:28 dmesg is more likely to be useful.
Oct 03 17:21:58 Well, it's an at-boot thing. To fix the clock...
Oct 03 17:22:05 We don't then break the clock later. :P
Oct 03 17:22:14 Since these images don't have any way to get a terminal session without first modifying the image prior to boot, there is no way to debug a live session.
Oct 03 17:22:22 So, if it's not working, then without ntpdate, your clock will be a sad panda.
Oct 03 17:23:10 Yeah, that's so obviously an ARM preinstalled bug. I noticed it with the last week of debugging. :/
Oct 03 17:23:29 Since a real live system has the "ubuntu" user, and a real oem-config system has an oem user.
Oct 03 17:23:32 But we fail to have either.
Oct 03 17:23:52 Not fixing that before release, though.
Oct 03 17:24:04 Worth having on the TODO if jasper survives.
Oct 03 17:26:15 Log files are a bust. complete corruption prior to any ubiquity info. Interesting to see network manager completely freak out though.
Oct 03 17:26:49 and I have asked for debug hooks of some sort since we started doing preinstalled images.
Oct 03 17:58:07 GrueMaster: When does the loop happen?
Oct 03 17:58:17 GrueMaster: I'm going to sit here and try to reproduce this...
Oct 03 17:58:42 Right around the time it says that it is copying log files.
Oct 03 17:58:48 GrueMaster: Does everything complete (including the removal)?
Oct 03 17:59:15 Does not start the removal.
Oct 03 17:59:33 Hrm.
Oct 03 17:59:57 I also might not be able to fix the anacron issue this cycle, the more I think about it.
Oct 03 18:00:21 The fix needs to be in ubiquity, and might be regression-inducing.
Oct 03 18:00:39 I realised I can't hack around it in jasper, because we don't actually have a sane clock yet.
Oct 03 18:00:41 Nothing (other than what I previously stated about the config.dat) indicates an issue in any log files I have.
Oct 03 18:02:34 Hmm. Doesn't appear to like me hacking in an additional debug user prior to oem-config running. It has been sitting on "Creating User" for a while now.
Oct 03 18:16:32 Grrr, Doesn't appear to like my shadow entry.
Oct 03 21:08:46 GrueMaster: *poke*
Oct 03 21:09:44 GrueMaster: Can you do some test runs on your problematic systems with 's/update-apt-xapian-index -q/update-apt-xapian-index -q -u/' in /etc/cron.daily/apt ?
Oct 03 21:09:59 GrueMaster: Seems to have made mine slightly less grumpy.
Oct 03 21:32:55 infinity: On it now.
Oct 03 21:33:06 (took an extended lunch break).
Oct 03 21:33:27 GrueMaster: All good. Food sounds like a stellar plan to me too.
Oct 03 21:51:31 Starting the test now on a freshly zero'd & flashed drive.
Even if this solves the oem-config respawn, we still have another issue where ubiquity crashes when no network available.
Oct 03 21:56:39 GrueMaster: Do you have logs for the no-network thing? That one couldn't have been silent... I hope.
Oct 03 21:56:51 GrueMaster: Or even an apport-filed bug would be nice.
Oct 03 21:58:04 Well, that was part of the problem. ubiquity crashed saying it would spawn a desktop session for debugging, but that failed to materialize. And nothing was captured in the logs as the system had gone read-only.
Oct 03 21:58:53 I'll try to reproduce it as well. I would really like to have an image that I can get to a login for testing though. Otherwise I am just spinning my wheels on useless stuff.
Oct 03 21:58:59 Oh, that was the read-only one, right.
Oct 03 21:59:11 I suspect that one might not be reproducible, but if it is, great.
Oct 03 21:59:54 If the FS isn't readonly, ubiquity crashes will spawn apport, which is good enough for filing bugs with logs, at any rate.
Oct 03 22:00:25 If jasper survives another cycle, we should remember to add the oem user, though.
Oct 03 22:00:38 I'll prep another SD to test that, but for desktop, I am kind of limited to single tasking (1 monitor & keyboard for Pandas).
Oct 03 22:00:51 Oh, wait. No. There shouldn't be one, it's deleted by oem-config-prepare before rebooting.
Oct 03 22:01:01 So, oem-config in general just has no way to debug it. Hrm.
Oct 03 22:01:41 Not that I have found useful.
Oct 03 22:01:59 I've been testing on my ac100 too. Which picks up slightly different bugs just due to the timing of having faster storage.
Oct 03 22:02:16 Panda and ac100 together seem to get most everything.. Except your respawn issue. :/
Oct 03 22:02:21 And fail. The cron.daily fix is bust.
Oct 03 22:02:40 Well, it does address *a* problem. Just not yours, apparently.
Oct 03 22:03:16 And I honestly can't get ubiquity to crash/respawn at all (or finish/respawn, or any combination thereof), so I'm kinda stumped.
Oct 03 22:03:37 A full set of logs (heck, just /var/log tarred up) from the SD might be enlightening.
Oct 03 22:05:01 Unfortunately, it isn't as enlightening as one would expect. I have yet to find a significant entry in any logs under /var/log.
Oct 03 22:05:33 The best I had was the debconf config.dat file (which you said wasn't significant).
Oct 03 22:09:56 Sigh. I hate the automount "feature". It really annoys me. And what's with the AC100 showing (and mounting) all of the SERVICEV001 partitions?
Oct 03 22:12:22 GrueMaster: It doesn't?
Oct 03 22:12:25 GrueMaster: At least, it doesn't here.
Oct 03 22:12:42 GrueMaster: You sure that's not a freshly-flashed card?
Oct 03 22:12:47 Could be because I upgraded from Natty.
Oct 03 22:12:52 (It's only SERVICEV001 after it's booted)
Oct 03 22:13:02 I'm talking about the eMMC.
Oct 03 22:13:35 Oh. yeah, I get no weirdness here.
Oct 03 22:13:44 You might want to try the recent images. :)
Oct 03 22:14:03 They actually work better than OMAP.
Oct 03 22:14:07 Which is a bit sad.
Oct 03 22:14:16 But yay for faster storage.
Oct 03 22:14:17 As I said, if there is >.01% chance of weirdness, I will hit it.
Oct 03 22:14:55 Anyhow. Going to take a late lunch.
Oct 03 22:15:03 Back later to keep banging on things.
Oct 03 22:16:14 On the looping installer thing, if you can just let it settle a bit after it loops, so it's sure to have flushed some buffers, yank the card, and give me /var/log in a tarball, maybe I'll spot something that doesn't add up compared to one here.
Oct 03 22:17:15 I usually wait until it comes up, then switch to text console and 3-finger reset so everything shuts down cleanly.
Oct 03 22:18:53 You could just alt-sysrq-s to force a sync, wait for the SD light to go out, and yank it. That has the advantage of basically being a snapshot of the problem area without reboot fluff afterward.
Oct 03 22:20:26 reboot fluff is minimal (and tarball is in my people.c.c dir (firstname)/20111003-oem-fail-logs.tgz)
Oct 03 22:20:55 Great, I'll look at it after lunch.
Oct 03 22:21:04 starting on the network-less run.
Oct 03 22:21:05 That's the re-spawn one?
Oct 03 22:21:20 Yes.
Oct 03 22:21:28 Kay.
Oct 03 22:21:41 W/o network crashes much more ugly before it even gets that far.
Oct 03 22:29:54 Grr. Network crash didn't happen this time around, but oem-config respawn did.
**** ENDING LOGGING AT Tue Oct 04 02:59:57 2011