**** BEGIN LOGGING AT Mon Mar 25 02:59:59 2013
Mar 25 06:35:51 RP2: thx for the information on Friday, will look into it later. was AFK over the weekend.
Mar 25 08:09:31 good morning
Mar 25 08:18:26 morning all
Mar 25 08:36:35 morning all
Mar 25 08:37:09 good morning
Mar 25 11:34:38 sameo_: no systemd service file for neard? i thought this was cutting edge ;)
Mar 25 12:25:07 is there a one-liner to clean all -native packages ? I'm seeing each one die on autoconf .. and doing a manual clean-all, rebuild is getting old .. fast
Mar 25 12:25:34 zeddii: the S!=B change landed, which autotools really doesn't like. easiest is to blow away your tmp.
Mar 25 12:25:55 you'll get it for all non-native packages that use B!=S
Mar 25 12:26:38 * zeddii nods.
Mar 25 12:26:55 I can live with that, no slower than what I'm currently doing :P
Mar 25 12:34:09 zeddii: FWIW, easiest way to blow away native is rm -r tmp/stamps/x86_64-linux tmp/sysroots/x86_64-linux
Mar 25 12:34:22 zeddii: but as rburton notes, it won't help in this case
Mar 25 13:58:27 hello
Mar 25 14:01:00 fray: FWIW, debugedit failed on mips too
Mar 25 14:01:34 same type of failure?
Mar 25 14:01:45 I'm thinking the problem has to do with some sections being moved around, but I'm not sure..
Mar 25 14:02:21 fray: I didn't dive into it but looks similar
Mar 25 14:02:22 the issue, at least on PPC, is that it gets to two "NOBITS" sections and they have load addresses and sizes set.. but the addresses are the same..
Mar 25 14:03:01 when the system processes, it's generating a new unique "build-id".. and it gets to these sections, and the load address comes back at '0'.. it then tries to memcpy from '0' and the returned size (> 0)
Mar 25 14:03:11 which of course immediately blows up
Mar 25 14:03:45 debugedit is such a gross hack...
Mar 25 14:04:00 what I don't get though.. I've looked at newer elfutils and newer debugedit and neither have any changes for this particular issue that I've seen..
Mar 25 14:04:26 I'm still investigating.. but something odd is certainly happening.. and I think it's due to gcc 4.7...
Mar 25 14:04:45 walters, no disagreement here.. in place editing of dwarf info is sketchy.. but it works
Mar 25 14:05:42 (I'd love an alternative to debuginfo, but I've yet to find one)
Mar 25 14:07:36 fray: does a load address of 0 make sense?
Mar 25 14:07:36 i did a bit of research recently and dumped the results here https://bugzilla.gnome.org/show_bug.cgi?id=695816
Mar 25 14:07:37 Bug 695816: was not found.
Mar 25 14:07:51 RP_, a load of 0 w/ a size > 0 doesn't make sense to me..
Mar 25 14:08:09 tl;dr: debian skips installing the source files and just ships the dwarf, which means there's no need for the "edit" part of debugedit
Mar 25 14:08:50 walters, which also means it's nearly impossible to do production debugging of sources
Mar 25 14:09:17 it's very typical in our environment that you have a runtime only system.. attach a debugger and have the debuginfo and associated sources on a remote machine..
Mar 25 14:09:27 something has to correlate those sources to the debuginfo to the original binary (on the target)
Mar 25 14:09:47 otherwise the user has to manually specify the location of everything to the debugger.. and that's not acceptable from a user standpoint
Mar 25 14:09:59 well you just find the git revisions that were used to build the target, and check those out on the client system
Mar 25 14:10:14 sorry, doesn't work that way in commercial environments for embedded
Mar 25 14:10:30 they have an archive of the -dbg packages generated alongside their sources and have that available for remote debugging
Mar 25 14:10:42 (or in some cases to load on to the production system if you are running gdb on the target itself)
Mar 25 14:10:49 apparently arjan's distro has some debug magic, but i'm not sure what he does that's special
Mar 25 14:10:53 everything has to be self-contained..
Mar 25 14:10:55 it should be on fenrus somewhere
Mar 25 14:11:44 fray, i'm certainly not debating that people rely on the system as it is today, but there's also more than one way to slice the problem
Mar 25 14:12:00 debugedit serves two purposes.. the first is to collect all dwarf referenced sources.. the second is to change the references to avoid the many intervention problem
Mar 25 14:12:19 two programs to do the same would be fine with me.. I have no love of debugedit.. but what it does is needed still
Mar 25 14:12:41 i like the microsoft solution to this problem which is a symbol server
Mar 25 14:12:52 (the adjusting the build-id is a third facet of what it does.. one I'm still not convinced is useful for embedded.. but it is used)
Mar 25 14:13:01 rather than having to install the complete debuginfo on the client *or* the target
Mar 25 14:13:14 walters, that works well if you have a singular distribution.. doesn't work when every user is creating their own custom distribution
Mar 25 14:13:22 yeah, that's nice when you have stable packages
Mar 25 14:13:39 man, fenrus uses a fuse file system to fetch debug packages on the fly
Mar 25 14:13:44 fray, not sure how it's that different to put up -dbg packages in a repo versus a server
Mar 25 14:13:46 Fedora, Ubuntu, etc could all use a central system.. but I don't see OE being able to do it.. (users of OE could set up their own and do it)
Mar 25 14:14:14 walters, there is no 1 set of oe, yp, angstrom, etc sources (or the way they are constructed)
Mar 25 14:14:38 each user sets up their own distro settings, adds their own customizations, etc.. so the debuginfo is specific to the distribution the end user has generated
Mar 25 14:14:49 i understand that yes =)
Mar 25 14:16:28 to return to the topic, i wonder if i introduced this regression by telling GCC to generate build IDs, and that code has always been buggy on MIPS
Mar 25 14:16:28 * RP_ comments out his sanity checks locally to allow builds to "work" :/
Mar 25 14:16:41 walters: and ppc?
Mar 25 14:16:54 walters: We're seeing this on mips and ppc that I know of...
Mar 25 14:16:58 from what I can tell, a change maybe a year ago happened, which also broke the prelinker in some environments.
Mar 25 14:17:14 The order of the sections changed, and loadable sections started popping up after non-loadable..
Mar 25 14:17:20 I wonder if this case is similar
Mar 25 14:17:43 RP_: hm, i guess that's unlikely since I know Red Hat Enterprise Linux definitely builds for PPC(64), although there may be a cross issue
Mar 25 14:17:46 fray: could be
Mar 25 14:19:08 fray: anyhow, just wanted to give you another datapoint. Let me know if you want me to do anything with this, otherwise I'll leave it with you
Mar 25 14:19:41 well if I check the load address against NULL, and then skip the entry it works around the problem..
Mar 25 14:19:48 but I'm still trying to figure out if that is reasonable or not
Mar 25 14:20:23 fray: why does it not fault when we run against a binary the second time?
Mar 25 14:21:53 because by that point either the binary's been manipulated already -- or it's been trashed..
Mar 25 14:21:55 I'm not sure yet
Mar 25 14:22:09 I didn't compare the bad (orig) vs the bad (adjusted/crashed)
Mar 25 14:22:32 fray: just wondering if it gives any more data points
Mar 25 14:22:55 rburton, link?
Mar 25 14:22:57 fray: is there a function in elfutils or similar we could call to compute the checksum instead?
Mar 25 14:23:02 what I'm surprised by is that RH/Fedora doesn't seem to have seen this problem..
Mar 25 14:23:02 walters: git.fenrus.org
Mar 25 14:23:10 both from google searches and just code inspection
Mar 25 14:23:30 they may not be using beecrypt though.. and libnss might have a check for the data addr != null or something
Mar 25 14:24:16 rburton, looks like http://git.fenrus.org/git/?p=projects/fenrus-debug-info.git;a=tree
Mar 25 14:24:32 walters: yeah. arjan's crazy toy distro.
Mar 25 14:24:40 and part of me wonders if it even matters, if we skip the sections we still get reproducible build-ids.. and that's all they need to be..
Mar 25 14:25:05 system("mkdir -p /usr/lib/.debug &> /dev/null"); ew ew ew
Mar 25 14:25:07 walters: not sure if the crazy is just install-on-demand or whether there's more to it than that
Mar 25 14:25:21 walters: step away whilst you can
Mar 25 14:25:39 install on demand isn't actually that crazy..
Mar 25 14:25:49 go back to walters' MS comment.. it's a reasonable idea in fact..
Mar 25 14:25:57 I'm just not sure "fuse" is the right answer
Mar 25 14:26:02 yeah
Mar 25 14:26:49 I'd rather see the debuggers, tooling (valgrind?) be adjusted in some way to know how to pull whatever they need from the server...
Mar 25 14:26:55 via a service even..
Mar 25 14:27:10 I just have an objection to fuse being used in many cases because it requires root privileges to set up
Mar 25 14:28:27 speaking of build-id.. that is something we should probably pursue in the 1.5 development.. instead of -just- the ".debug" directory.. we have a .build-id symlink as well.. that way we're sure the debuggers are finding the right version..
Mar 25 14:29:16 fray: don't they check if the hashes between the two files match?
Mar 25 14:29:51 honestly, I don't know if they do
Mar 25 14:30:13 I know if the .build-id links are available, gdb will use them in preference to the .debug files.. which could mean a quicker resolution
Mar 25 14:30:19 "could"
Mar 25 14:30:36 I've very rarely used build-id outside of the canned environments like Fedora
Mar 25 14:33:59 stupid pacific timezone.. I was hoping Khem would be alive by now
Mar 25 14:38:12 ok.. so what is happening is the hash function is being called with void * data = 0, size_t size = 64
Mar 25 14:38:25 there is a check in beecrypt for a size of '0', but not a data == NULL
Mar 25 14:38:44 so that explains why it's falling through and blowing up
Mar 25 14:38:54 I still don't know why the data is 0 though, that is what is coming out of libelf
Mar 25 14:44:06 * fray goes to figure out how the binary gets mangled during the crash
Mar 25 14:59:31 RP, so the differences between the pre segfault and post.. is that the source references have been changed, and the build-id was partially recomputed
Mar 25 14:59:38 other than that, I see no difference
Mar 25 14:59:44 -00000100 47 4e 55 00 38 c6 fa 1d 75 71 cd 0e bc 85 89 fe |GNU.8...uq......|
Mar 25 14:59:44 -00000110 97 68 22 00 98 ac 6a 0c 00 00 00 83 00 00 00 18 |.h"...j.........|
Mar 25 14:59:44 +00000100 47 4e 55 00 00 00 00 00 00 00 00 00 00 00 00 00 |GNU.............|
Mar 25 14:59:44 +00000110 00 00 00 00 00 00 00 00 00 00 00 83 00 00 00 18 |................|
Mar 25 14:59:52 that's the comparison in the build-id
Mar 25 15:00:03 looks like the latter was just zero'd
Mar 25 15:00:50 it looks to me when debugedit is run a second time, it just spits out the 0:0 debugid
Mar 25 15:01:09 doesn't look to me like it is actually processing/recomputing.. which would avoid the crash
Mar 25 15:17:32 rburton: Will fix that soon :)
Mar 25 15:33:05 Interesting.. I just noticed something
Mar 25 15:33:09 Section Headers:
Mar 25 15:33:09 [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
Mar 25 15:33:40 [22] .got PROGBITS 000239d0 0139d0 000014 04 WAX 0 0 4
Mar 25 15:33:40 [23] .plt NOBITS 000239e4 0139e4 000234 00 WAX 0 0 4
Mar 25 15:33:40 [24] .bss NOBITS 00023c18 0139e4 0001c8 00 WA 0 0 8
Mar 25 15:33:46 the address changes, but the offset doesn't
Mar 25 16:10:04 fray: so it's not updating the offsets?
Mar 25 16:12:19 that -should- be ok.. I think..
Mar 25 16:12:39 I'm working on a workaround patch..
Mar 25 16:12:55 simply checks for '0' on the data size and skips the md5sum operation
Mar 25 16:29:08 hmm, what has changed in bitbake 1.16 and above to sort FILES_${PN}-dev before FILES_${PN}?
Mar 25 16:34:07 denix: bitbake.conf changed
Mar 25 16:34:31 denix: there was discussion on the list, a lot of auditing and so on
Mar 25 16:35:04 yeah, trying to find that. what was the topic?
Mar 25 16:37:15 RP_ well, I have a workaround..
Mar 25 16:37:18 ah, duh, PACKAGES controls that order... silly me.
Mar 25 16:37:44 need more coffee I guess, can't see obvious things :)
Mar 25 16:37:56 denix: it's ok, I know the feeling ;-)
Mar 25 16:38:04 fray: the check for zero?
Mar 25 16:38:23 fray: probably worth sending out so we can at least patch up the error paths
Mar 25 16:38:58 ya, will do
Mar 25 16:39:03 I'm confident it's 'safe'..
Mar 25 16:39:11 just not sure if it's a final fix or not
Mar 25 16:41:53 RP_, sent to the list as an RFC
Mar 25 16:42:08 the buildid is being generated consistently.. so it should be ok..
Mar 25 16:42:23 fray: I'm tempted to merge this, fix the error paths and then if we need to improve on it we can
Mar 25 16:42:28 ya
Mar 25 16:42:38 I need to get someone like Khem to look at this and explain WTF is going on
Mar 25 16:43:06 I'm about 90% convinced that it's the NOBITS and offset addresses being the same that's the culprit.. but that stuff is buried deep in libelf..
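The observation at 15:33 (address changes, file offset does not) can be checked mechanically. The sketch below parses the three section-header rows quoted in the log and reports NOBITS sections that share a file offset. The helper names `parse_rows` and `nobits_sharing_offset` are made up for illustration, and the simple whitespace split assumes two-digit section numbers as in these rows.

```python
# Rows copied from the log; columns are Nr, Name, Type, Addr, Off, Size, ...
ROWS = """\
[22] .got PROGBITS 000239d0 0139d0 000014 04 WAX 0 0 4
[23] .plt NOBITS 000239e4 0139e4 000234 00 WAX 0 0 4
[24] .bss NOBITS 00023c18 0139e4 0001c8 00 WA 0 0 8
"""


def parse_rows(text):
    """Turn each row into a dict with name, type, address, file offset and size."""
    sections = []
    for line in text.splitlines():
        fields = line.split()
        sections.append({
            "name": fields[1],
            "type": fields[2],
            "addr": int(fields[3], 16),
            "offset": int(fields[4], 16),
            "size": int(fields[5], 16),
        })
    return sections


def nobits_sharing_offset(sections):
    """Group NOBITS section names by file offset; keep offsets used more than once."""
    by_offset = {}
    for sec in sections:
        if sec["type"] == "NOBITS":
            by_offset.setdefault(sec["offset"], []).append(sec["name"])
    return {off: names for off, names in by_offset.items() if len(names) > 1}


shared = nobits_sharing_offset(parse_rows(ROWS))
for off, names in shared.items():
    print(f"offset {off:#x} shared by {', '.join(names)}")
```

For these rows, .plt and .bss both sit at file offset 0x0139e4 while their addresses differ, which matches what was noticed in the channel.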
Mar 25 16:43:12 I haven't actually found the load routines yet..
Mar 25 16:55:06 fray: I've sent out my version of the error patch, I used bb.fatal
Mar 25 16:55:14 fray: and covered a few more cases
Mar 25 16:57:25 ok
Mar 25 16:57:34 RP_, talking to one of our ELF experts..
Mar 25 16:57:53 the two sections that are causing the problem .plt and .bss.. these (NOBITS) sections are dynamically allocated, even though they have a size..
Mar 25 16:58:04 so libelf is reporting no data (data pointer == NULL) but a size...
Mar 25 16:58:12 so I think my workaround might be the actual fix after all
Mar 25 16:58:52 * fray just noticed the clock on his dev machine is around 20 minutes fast.. huh
Mar 25 16:58:56 fray: makes you wonder how this ever worked...
Mar 25 16:59:18 fray: unless beecrypt just drops null or something
Mar 25 16:59:21 or used to
Mar 25 17:00:15 fray: I added more error handling and now we're seeing more errors :(
Mar 25 17:00:39 beecrypt has a check for a size of '0'.. but assumes you wouldn't pass it an address of 0
Mar 25 17:00:48 I wonder if this has been broken for over a year
Mar 25 17:01:12 remember when prelink went nuts on PPC? that was when the .plt/.bss sections got moved into the middle of the section table, instead of at the end
Mar 25 17:01:17 fray: I know I saw it in my logs a while ago but never reproduced
Mar 25 17:01:19 I suspect this is the same kind of problem
Mar 25 17:01:41 I now know I likely tried the wrong arch :/
Mar 25 17:02:24 what is interesting is only the buildid is wrong.. it happens that the crash comes after all of the other work that debugedit does.. the buildid seems one of the last steps..
Mar 25 17:02:32 it zero's it.. then calculates a new one (and crashes)
Mar 25 17:02:51 fray: right, so the impact isn't that bad...
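The guard described above (skip hashing when a section reports a size but no data) can be sketched as follows. This is not the debugedit patch itself: `SectionData` and `digest_sections` are hypothetical names, `None` stands in for a NULL data pointer from libelf, and MD5 is used only because the log mentions an md5sum step.

```python
import hashlib
from typing import NamedTuple, Optional


class SectionData(NamedTuple):
    """Hypothetical stand-in for what libelf hands back for one section."""
    name: str
    buf: Optional[bytes]  # None models a NULL data pointer
    size: int             # size recorded in the section header


def digest_sections(sections):
    """MD5 over section contents, skipping entries with a size but no data."""
    md5 = hashlib.md5()
    for sec in sections:
        if sec.buf is None or sec.size == 0:
            # Nothing to read for this section, so nothing to hash.
            continue
        md5.update(sec.buf[:sec.size])
    return md5.hexdigest()


sections = [
    SectionData(".got", b"\x00" * 0x14, 0x14),
    SectionData(".plt", None, 0x234),  # NOBITS: size set, no bytes in the file
    SectionData(".bss", None, 0x1c8),
]
print(digest_sections(sections))
```

Because the skipped sections contribute nothing, the digest stays the same from run to run for the same input, which is the property the build-id needs.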
Mar 25 17:03:25 fray: I also noticed we need to remove the debug sources list before we rerun the command or the file grows each time
Mar 25 17:03:34 interesting
Mar 25 17:03:47 fray: testing some things locally in that area
Mar 25 17:04:15 that in turn causes other commands to break it would appear
Mar 25 17:04:39 I'll have to throw this at the autobuilder to ensure we have all the silent errors fixed :/
Mar 25 17:05:10 no problem
Mar 25 17:06:03 BTW the processdebugsrc -might- legitimately come out non-zero.. I'm not sure..
Mar 25 17:06:12 there are cases where the referenced sources don't actually exist...
Mar 25 17:06:31 fray: why wouldn't they exist?
Mar 25 17:06:53 there are some internal references that look like filenames, but aren't..
Mar 25 17:06:57 things like ""
Mar 25 17:07:21 BTW our elf guy suggested instead of checking for a data == 0, instead check if the section was loadable or not
Mar 25 17:07:32 fray: I guess I'll soon find out...
Mar 25 17:08:38 either way it should work.. but what we're doing now depends on specific behavior of libelf.. checking for loadable sections is "generic"
Mar 25 17:08:55 I'd say run the autobuilder test.. I'll see if I can find the right way to check for this
Mar 25 17:08:56 fray: right, sounds sensible at least...
Mar 25 17:09:07 and then the final fix will likely be both.. :P
Mar 25 17:09:20 fray: Yes, I'll probably merge this and then we can figure out the final fix
Mar 25 17:09:33 whilst we figure out if there are any other issues in the meantime
Mar 25 17:10:55 so I'm at least confident the values being generated are sane..
Mar 25 17:12:16 denix: Hi, can you check if icecc.bbclass (patch on ML) works correctly with your external-toolchain setup?
Mar 25 17:12:34 denix: http://lists.linuxtogo.org/pipermail/openembedded-core/2013-March/037147.html
Mar 25 17:13:20 fray: You're right about the source copying, will have to drop that check
Mar 25 17:13:43 Any of them that mask the 2> /dev/null are expected to have failures
Mar 25 17:13:56 JaMa: thanks, I'll see if I get some time...
Mar 25 17:29:56 tracey needs a better mugshot: http://posscon.org/speaker/tracey-erway/
**** ENDING LOGGING AT Tue Mar 26 02:59:58 2013
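On the 17:07 suggestion to test a property of the section rather than a NULL data pointer: one possible reading is to look at the section type in the header and treat SHT_NOBITS sections as having nothing in the file to checksum. The sketch below does that for a 32-bit big-endian header (the layout used on PPC). It is an assumption about what "loadable" meant in the discussion, not the fix that was merged. `parse_shdr32` and `has_file_contents` are illustrative names, and the headers are synthetic, built from the .plt and .got rows in the log.

```python
import struct

SHT_NOBITS = 8      # section type value from the ELF specification
SHDR32_BE = ">10I"  # Elf32_Shdr: ten 32-bit words, big-endian


def parse_shdr32(raw):
    """Unpack one 32-bit big-endian section header into a dict."""
    keys = ("name", "type", "flags", "addr", "offset",
            "size", "link", "info", "addralign", "entsize")
    return dict(zip(keys, struct.unpack(SHDR32_BE, raw)))


def has_file_contents(shdr):
    """True when the section has bytes in the file that a checksum could read."""
    return shdr["type"] != SHT_NOBITS and shdr["size"] > 0


# Synthetic header from the .plt row in the log (the name index 0 is made up).
plt_raw = struct.pack(SHDR32_BE, 0, SHT_NOBITS, 0x7, 0x000239e4,
                      0x0139e4, 0x234, 0, 0, 4, 0)
# Same for .got, which is PROGBITS (type 1).
got_raw = struct.pack(SHDR32_BE, 0, 1, 0x7, 0x000239d0,
                      0x0139d0, 0x14, 0, 0, 4, 4)

print(has_file_contents(parse_shdr32(plt_raw)))
print(has_file_contents(parse_shdr32(got_raw)))
```

A type-based test like this does not depend on how a particular libelf build represents missing data, which is the sense in which the channel called the approach "generic".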