**** BEGIN LOGGING AT Tue Jun 30 02:59:57 2020
Jun 30 06:04:57 Morning!
Jun 30 06:09:56 Morning!
Jun 30 06:10:42 Tofe: flashed Tenderloin with new LXC image also stuck on pulsing LuneOS logo, so I suspect it's something with 3.4 kernels
Jun 30 06:11:02 I will check logs and lxc-checkconfig a bit later for any clues
Jun 30 06:11:13 On Hammerhead seemed container was running though
Jun 30 06:20:59 Herrie: it could be that for some reason we never get to the "android startup finished" step
Jun 30 06:27:43 Tofe: Well yes that's an option I'll debug a bit after coffee and breakfast
Jun 30 06:59:45 This is lxc-checkconfig: Some items are missing, but they worked with previous LXC still: https://paste.ubuntu.com/p/wHpSPXqTmb/
Jun 30 07:03:04 Let me try to simply enable those values and see if kernel at least compiles :P
Jun 30 07:13:28 Which is usually the biggest challenge on 3.4 with "exotic" flags :P
Jun 30 07:23:36 I don't see any big issue in the config checker either
Jun 30 07:23:58 Herrie: do you have the lxc log for our android container?
Jun 30 07:25:17 Tofe: https://paste.ubuntu.com/p/X56q52hc8r/
Jun 30 07:25:27 Well kernel compiles with the additional flags
Jun 30 07:25:32 lxc-start android 20200630055653.526 ERROR utils - utils.c:lxc_can_use_pidfd:1834 - Kernel does not support pidfds
Jun 30 07:25:45 And lxc-start android 20200630055653.608 ERROR cgfsng - cgfsng.c:cg_hybrid_get_controllers:657 - Found hierarchy not under /sys/fs/cgroup: "/dev/cpuctl rw,nosuid,nodev,noexec,relatime shared:23 - cgroup cgroup rw,cpuacct,cpu
Jun 30 07:25:54 That's probably not helping
Jun 30 07:25:58 Let me try the new kernel
Jun 30 07:27:30 I guess we need pidfds
Jun 30 07:31:08 But which flag provides it, because we have CONFIG_PID_NS
Jun 30 07:31:19 Only PID one I have in Mido that isn't in Tenderloin is CONFIG_PROC_PID_CPUSET
Jun 30 07:31:36 I'm not really sure; but maybe it's not critical ?
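[Editor's sketch, not part of the log] The flag comparison between the Mido and Tenderloin kernel configs mentioned above could be done along these lines. This is a minimal sketch: the function names and the two sample fragments are made up for illustration and are not the real device defconfigs.

```python
import re

# Matches lines such as "CONFIG_PID_NS=y" or "CONFIG_FOO=m".
_ENABLED = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)\s*$")


def enabled_options(config_text):
    """Return the set of CONFIG_ symbols set to y or m in a kernel config."""
    found = set()
    for line in config_text.splitlines():
        match = _ENABLED.match(line.strip())
        if match:
            found.add(match.group(1))
    return found


def only_in_first(first_text, second_text, keyword=""):
    """Symbols enabled in the first config but not in the second.

    The optional keyword keeps only symbols containing that substring.
    """
    missing = enabled_options(first_text) - enabled_options(second_text)
    return sorted(name for name in missing if keyword in name)


# Hypothetical fragments, for illustration only.
mido_sample = """
CONFIG_PID_NS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUPS=y
"""
tenderloin_sample = """
CONFIG_PID_NS=y
# CONFIG_PROC_PID_CPUSET is not set
CONFIG_CGROUPS=y
"""

print(only_in_first(mido_sample, tenderloin_sample, keyword="PID"))
```

With real files, the two strings would be read from the respective defconfig or .config of each device.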
Jun 30 07:32:09 Booting new kernel now, let's see if it improves things
Jun 30 07:33:04 Nope still the same
Jun 30 07:36:29 It seems somehow lunasysmgr fails
Jun 30 07:36:50 https://paste.ubuntu.com/p/26VN8DJSHH/
Jun 30 07:37:28 Jun 30 07:33:23 tenderloin systemd[1]: luna-sysmgr.service: Main process exited, code=killed, status=11/SEGV
Jun 30 07:38:38 Install the dbg and load up gdb?
Jun 30 07:43:25 https://paste.ubuntu.com/p/Xy9f8hyXsr/
Jun 30 07:43:34 Seems something Qt related somehow
Jun 30 08:17:56 in init ? that's weird.
Jun 30 08:18:36 can you install qtbase-dbg ?... it's a bit of a pain, but we'll have the exact pointer
Jun 30 08:47:49 Tofe: sure will do in a bit
Jun 30 08:51:00 Tofe: In LXC log the only difference seems to be lxc-start android 20200630073314.124 WARN start - start.c:lxc_try_preserve_ns:125 - No such file or directory - Kernel does not support preserving mnt namespaces
Jun 30 08:51:01 lxc-start android 20200630073314.124 WARN start - start.c:lxc_try_preserve_ns:125 - No such file or directory - Kernel does not support preserving pid namespaces
Jun 30 08:51:14 But these are warnings, they shouldn't be critical really
Jun 30 08:51:26 My Tenderloin also has: lxc-start android 20200630073343.172 WARN cgfsng - cgfsng.c:get_hierarchy:172 - There is no useable blkio controller
Jun 30 08:51:34 Which doesn't appear on mido
Jun 30 09:15:02 Tofe: GDB with qtbase_dbg:
Jun 30 09:15:03 https://paste.ubuntu.com/p/B5HgBPNtxM/
Jun 30 09:43:03 Herrie: that's a really weird crash there
Jun 30 09:44:00 it crashes in " inline QObject *parent() const { return d_ptr->parent; } ", so I guess d_ptr is invalid...
Jun 30 09:45:42 Herrie: are you sure of your build?
Jun 30 09:46:13 Because I also see "warning: the debug information found in "/usr/lib/.debug/libQt5Widgets.so.5.15.0" does not match "/usr/lib/libQt5Widgets.so.5" (CRC mismatch).", which shouldn't happen if everything is fine
Jun 30 10:02:11 Tofe: well it's testing with just LXC updated or at least it should be...
Jun 30 10:02:25 I can redo it though just to be sure
Jun 30 10:02:52 I built Hammerhead then Mido then Tenderloin
Jun 30 10:02:58 From same build
Jun 30 11:08:35 Tofe: Will remove tmp-glibc and redo from fresh meta-webos-ports and meta-smartphone
Jun 30 11:27:37 ok
Jun 30 12:51:53 OK busy flashing new image
Jun 30 12:52:01 Keeping the kernel changes should be fine I guess?
Jun 30 12:52:16 They don't hurt to be there anyway for LXC
Jun 30 12:52:40 https://github.com/shr-distribution/linux/commit/99a587c55ffd8a5047ca6454f60f62d0df1f3779
Jun 30 12:53:35 yes, that should be fine
Jun 30 12:56:20 Tofe: For the mnt namespaces there are some links here that might be useful in case we'd want to backport: https://github.com/topjohnwu/Magisk/issues/209
Jun 30 13:13:39 Seems the same
Jun 30 13:13:42 Let me pull some logs
Jun 30 13:15:02 LXC: https://paste.ubuntu.com/p/mZsgMjVkNf/
Jun 30 13:22:03 Seems logcat might give some clues?
Jun 30 13:23:46 https://paste.ubuntu.com/p/WbJnckMZbW/
Jun 30 13:30:41 Let me try Tenderloin logcat from working LXC maybe
Jun 30 13:33:03 See if there's any obvious differences
Jun 30 13:33:06 Also LXC log
Jun 30 13:41:06 I don't see anything really critical, maybe the "There is no useable blkio controller", but I'm not sure what it is
Jun 30 13:56:16 Interesting, latest testing image also fails but with my new kernel
Jun 30 13:56:22 Let me try with the accompanying kernel
Jun 30 14:02:33 I'll just build current testing, seems image on server is 3 weeks old already
Jun 30 15:10:32 OK back from picking up son from school, new image ready to flash. Let's see...
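[Editor's sketch, not part of the log] Comparing the LXC log of a working run against a failing one, as discussed above, is easier once timestamps are stripped so that only new WARN/ERROR messages remain. The sketch below assumes the line layout seen in the quoted lxc-start lines; the function names are hypothetical, and the split of the sample lines into a "working" and a "failing" log is invented for illustration.

```python
import re

# Layout assumed from the quoted lines:
#   <tool> <container> <YYYYMMDDhhmmss.mmm> <LEVEL> <rest of message>
_LINE = re.compile(
    r"^\S+\s+\S+\s+\d{14}\.\d+\s+(?P<level>[A-Z]+)\s+(?P<rest>.+)$"
)


def notable_messages(log_text, levels=("WARN", "ERROR")):
    """Collect (level, message) pairs, ignoring timestamps."""
    messages = set()
    for line in log_text.splitlines():
        match = _LINE.match(line.strip())
        if match and match.group("level") in levels:
            messages.add((match.group("level"), match.group("rest")))
    return messages


def new_messages(working_log, failing_log):
    """Messages that appear only in the failing log."""
    return sorted(notable_messages(failing_log) - notable_messages(working_log))


# Hypothetical arrangement of lines quoted in the chat.
working_sample = (
    "lxc-start android 20200629101500.100 WARN cgfsng - "
    "cgfsng.c:get_hierarchy:172 - There is no useable blkio controller\n"
)
failing_sample = (
    "lxc-start android 20200630073343.172 WARN cgfsng - "
    "cgfsng.c:get_hierarchy:172 - There is no useable blkio controller\n"
    "lxc-start android 20200630055653.526 ERROR utils - "
    "utils.c:lxc_can_use_pidfd:1834 - Kernel does not support pidfds\n"
)

for level, message in new_messages(working_sample, failing_sample):
    print(level, message)
```

The blkio warning is present in both samples with different timestamps, so only the pidfds error is reported as new.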
Jun 30 15:11:02 Let's first try current testing with kernel, then current testing with updated kernel defconfig and see if they behave the same
Jun 30 15:15:22 Herrie: I can also test your hammerhead image? Maybe I can spot something in adb
Jun 30 15:15:55 and my hammerhead is just taking dust these days, I don't have time to play with the mainline variant
Jun 30 15:16:07 Tofe: Well let me try this first
Jun 30 15:16:13 Then I'll upload the Hammerhead ZIP
Jun 30 15:16:26 I switched the m.2 SSD in my router, so need to re-setup things there a bit it seems
Jun 30 15:16:36 Put a 2TB instead of 1TB there now :P
Jun 30 15:17:25 There was a very good deal on Amazon for it ;)
Jun 30 15:17:58 2TB for 175 or so ;)
Jun 30 15:18:20 Tofe: Ah seems I recreated the same paths, so things should still work
Jun 30 15:18:25 I put the hammerhead image there
Jun 30 15:19:14 Herrie: ok, I hope it'll fit ! ;)
Jun 30 15:19:44 That what fits?
Jun 30 15:19:53 the image, on your SSD
Jun 30 15:20:07 Yes LOL
Jun 30 15:20:11 I put it already
Jun 30 15:20:19 ok, let me fetch that one
Jun 30 15:20:19 I still need to copy everything from the old one
Jun 30 15:20:30 That's why it's very empty for the moment
Jun 30 15:20:51 Hammerhead has some things lxc-config complains about (but lxc 2.0.8 worked with it anyway)
Jun 30 15:21:00 Fixes probably similar as for Tenderloin
Jun 30 15:21:42 ok
Jun 30 15:21:50 otherwise, it's a classic testing build ?
Jun 30 15:23:05 Tofe: Yeah it should be really
Jun 30 15:23:16 ok
Jun 30 15:23:19 I deleted my meta-wop, meta-smartphone and refetched the testing layers
Jun 30 15:23:38 Only changes should be the lxc-config in meta-smartphone and the lxc update in meta-wop
Jun 30 15:23:49 luna-sysmgr still crashes?
Jun 30 15:24:39 or is that just on tenderloin
Jun 30 15:24:45 Tofe: Yeah that's what it did for me on Hammerhead too it seems
Jun 30 15:24:50 Just get quick pulsing luneos logo
Jun 30 15:25:01 lxc-info told me container was running
Jun 30 15:28:12 ok
Jun 30 15:28:33 Once TP is done unpacking image can hook up my Hammerhead again too :P
Jun 30 15:28:34 I quite don't see why luna-sysmgr would be crashing when we upgrade lxc...
Jun 30 15:28:41 Yeah it's weird
Jun 30 15:29:22 Well luna-appmanager also crashed, but I think that's because of luna-sysmgr
Jun 30 15:29:26 Seems luna-sysmgr does first
Jun 30 15:29:35 And it's weird it's OK on Mido
Jun 30 15:31:30 Also ADB is flaky on my Hammerhead, which isn't helping
Jun 30 15:32:30 Not sure it's my specific Hammerhead or our builds
Jun 30 15:33:29 Hmmz my tenderloin testing build is also stuck with the same
Jun 30 15:33:51 So maybe JaMa somehow broke something in meta-qt5
Jun 30 15:36:19 image flashed, let's go
Jun 30 15:38:11 With LXC 2.0.8 also lunasysmgr segv
Jun 30 15:38:27 " ERROR: Library '/vendor/lib/egl/libGLESv2S3D_adreno.so" is that normal, in the end?
Jun 30 15:39:07 Tofe: Well it shouldn't be there, can be solved with a simple symlink
Jun 30 15:39:21 Ideally we fix this in Halium build, but it shouldn't be critical from what I read
Jun 30 15:39:30 That one also got my initial attention
Jun 30 15:39:58 UBPorts had similar issue: https://github.com/ubports/ubuntu-touch/issues/352
Jun 30 15:40:44 "ln -s /vendor/lib/egl/libGLESv2_adreno.so /vendor/lib/egl/libGLESv2S3D_adreno.so"
Jun 30 15:41:31 but here we can't do that, it's a mounted image
Jun 30 15:41:38 or can we
Jun 30 15:41:57 maybe if I remount rw... Anyway. The issue seems to be luna-sysmgr.
Jun 30 15:42:05 Tofe: Not so easily due to read-only
Jun 30 15:42:16 Let me find the bits in Halium and fix those right away before we forget
Jun 30 15:42:57 "QWarning: WARNING: QApplication was not created in the main() thread." --> ???
we only have one thread at that point...
Jun 30 15:43:59 how big is qtbase-dbg for hammerhead ?...
Jun 30 15:44:07 I could also download that one
Jun 30 15:44:58 We updated libhybris too, isn't it?
Jun 30 15:45:32 Let me check
Jun 30 15:47:05 libhybris not yet was just testing it
Jun 30 15:47:31 The qtbase-dbg_5.15.0+git0+f6fe4bbab7-r0.2_cortexa8t2hf-neon-halium.ipk is about 250MB
Jun 30 15:47:57 ok, that's fine, I'll download it.
Jun 30 15:48:11 Let me put it for you... I assume you want luna-sysmgr one too?
Jun 30 15:48:31 what we can try is to move https://github.com/webOS-ports/luna-sysmgr/blob/webOS-ports/master/Src/Main.cpp#L749 more to the beginning of the main() function, i.e. around line 690
Jun 30 15:48:52 Herrie: yes please :)
Jun 30 15:50:52 Both are there
Jun 30 15:50:58 Tofe: Well I can try that at my side
Jun 30 15:51:18 thanks
Jun 30 15:51:37 That's the only thing I have to propose so far
Jun 30 15:52:13 It's a bit weird this would break all of the sudden though
Jun 30 15:54:53 I agree... Though we never tested Qt 5.15 on hammerhead, or ?...
Jun 30 15:55:52 ah but the crash is for all your device tests
Jun 30 15:56:18 well, we'll see. I'm loading the dbg files.
Jun 30 15:56:25 adbs
Jun 30 15:56:29 ooops
Jun 30 15:56:36 Tofe: I thought I did with the LS2 upgrade, but they were happening about the same time
Jun 30 15:56:45 So could be somehow I tested it with 5.14 still
Jun 30 15:57:00 I tested LS2 on Tenderloin for sure
Jun 30 15:57:18 5.15 is pretty recent, all considered
Jun 30 15:57:29 Only other change it could be somehow is the Android headers change
Jun 30 15:57:59 I didn't follow, but did you bump libhybris already ?
Jun 30 15:58:50 No
Jun 30 15:58:53 ok
Jun 30 15:58:53 I didn't
Jun 30 15:59:02 But JaMa reworked the android-headers a bit
Jun 30 16:00:11 https://github.com/shr-distribution/meta-smartphone/commits/zeus
Jun 30 16:00:15 I've had build issues with android-headers in the past, when successively building tenderloin and then hammerhead
Jun 30 16:00:27 but it didn't build, so it was quite clear
Jun 30 16:01:58 Tofe: Yeah that's what I would also expect
Jun 30 16:02:18 And my pulsing logo rotates, so that means that stuff seems fairly OK ;)
Jun 30 16:03:02 Tofe: The change to luna-sysmgr didn't help by the way
Jun 30 16:03:53 https://github.com/shr-distribution/meta-smartphone/commit/d8bfbae3dbc4806abc250814e7ff810e17659cbc does that work for 5.1, where the branch was "master" ?...
Jun 30 16:04:59 ah, 5.1 branch exists also and is a copy of master. good then.
Jun 30 16:05:08 Tofe: Yeah I created the 5.1
Jun 30 16:05:14 So it could be simplified a bit ;)
Jun 30 16:05:25 Since bshah gave me full access in Halium LOL
Jun 30 16:05:35 :) I'm also admin up there
Jun 30 16:05:38 Having master and all others with version didn't make much sense to me ;)
Jun 30 16:05:44 (I think)
Jun 30 16:05:52 Tofe: Yes think so too
Jun 30 16:06:01 yes, there shouldn't be any "master" at all
Jun 30 16:07:48 Yes, but well... That's how things start out usually ;)
Jun 30 16:08:20 https://bpa.st/AT4A corrupted stack already ?...
Jun 30 16:11:27 ah, no, I just had to restart gdb to load luna-sysmgr symbols
Jun 30 16:31:48 Tofe: only other recent change I can think of that might influence luna-sysmgr is the nyx-modules-hybris bump
Jun 30 16:32:00 But that wasn't rocket science really
Jun 30 16:33:39 it actually crashes near https://github.com/qt/qtbase/blob/5.15/src/corelib/kernel/qcoreapplication.cpp#L837
Jun 30 16:33:48 I'm still investigating...
Jun 30 16:49:50 Tofe: Blame gives https://github.com/qt/qtbase/commit/782df5b41dd3ab098fd1d3233339079487e1812f
Jun 30 16:52:31 the steps I have with the debugger don't make much sense; I've put a breakpoint in the constructor of QAbstractEventDispatcher and in QThreadPool::globalInstance, but I never go there... How can the eventDispatcher already exist, in that case ?...
Jun 30 16:54:16 most intriguing is the fact that we only have one thread, so the warning is very curious
Jun 30 16:55:06 it could be that a curious bug has been introduced with that commit, yes
Jun 30 17:16:10 Herrie: there is an early call to the qtsensor framework... can you provide me qtsensors-dbg ?
Jun 30 17:22:28 ... and it looks like it's created by a nasty static global object, even before main() is called
Jun 30 17:22:52 Tofe: dinner give me 5-10 mins
Jun 30 17:23:03 take your time :)
Jun 30 17:23:09 I also saw that luna-sysmgr seems to force itself on cpu0
Jun 30 17:23:35 Not sure that might be related?
Jun 30 17:24:10 not necessarily
Jun 30 17:24:51 but here we have some Qt objects that are created and used, even before any QApplication is created. That isn't a supported scenario in Qt, so it's a good lead
Jun 30 17:28:04 Tofe: qtsensors-sensorfw-plugin-dbg_5.14.1+git0+9414e7e355-r0.1_hammerhead.ipk ?
Jun 30 17:28:14 Or sensorfw-dbg_0.11.4+git0+4f97982dd9-r0.1_hammerhead.ipk ?
Jun 30 17:28:16 Or you want both?
Jun 30 17:28:26 mmh there's nothing else for qtsensors ?
Jun 30 17:29:05 libqt5sensors-dbg
Jun 30 17:29:21 put all 3 of them
Jun 30 17:30:16 Found all 3 there
Jun 30 17:30:31 Weird this shows up now, but well
Jun 30 17:30:39 Could be the 5.15 changes somehow triggered it
Jun 30 17:30:44 Weird we didn't see it before
Jun 30 17:30:49 thanks
Jun 30 17:30:53 Could be it's somehow only showing on the 3.4 kernels
Jun 30 17:31:50 I'll redo Mido quickly as well now, just to make sure :S
Jun 30 17:33:00 if it's sensor related, it could be device specific
Jun 30 17:33:42 let's now see if I get the full stack...
Jun 30 17:37:55 Well I have all -dbg here if needed :P
Jun 30 17:38:23 https://github.com/qt/qtsensors/blob/5.15/src/sensors/qsensormanager.cpp#L56 this is built as soon as the .so of qtsensors is loaded
Jun 30 17:39:26 it's nothing new at all :/
Jun 30 17:50:00 Just flashing mido now to check
Jun 30 17:50:48 mido boots OK
Jun 30 17:51:47 Should I just try a libhybris bump on Tenderloin or Hammerhead to see if it changes anything?
Jun 30 18:00:24 Well that image is ready, let me flash it just to exclude libhybris possible causes
Jun 30 18:00:38 We were on quite an old version so things might have gotten "fixed" upstream
Jun 30 18:02:53 Easy enough to build & test
Jun 30 18:06:06 With latest libhybris it boots....
Jun 30 18:06:14 On Hammerhead
Jun 30 18:06:48 Let me do the LXC changes as well then
Jun 30 18:06:56 no crash ?
good
Jun 30 18:07:17 All seem to work with LXC 2.0.8 and latest libhybris
Jun 30 18:07:21 I get UI, sound, rotation
Jun 30 18:08:06 great
Jun 30 18:08:31 But no firstuse
Jun 30 18:08:36 Let me reflash just to be sure
Jun 30 18:15:08 OK boots, sound, UI, rotation but no FirstUse
Jun 30 18:16:46 And my ADB is flaky so not easy to debug
Jun 30 18:16:54 I guess I'll try tenderloin after I put kids to bed
Jun 30 18:17:26 Tofe: Hammerhead with newer libhybris and lxc 208 on FTP
Jun 30 18:17:32 I'll also make one with lxc 403 now
Jun 30 18:22:42 Tofe: Both on FTP
Jun 30 18:29:08 LXC 4.0.3 image seems to work on Hammerhead for me too.. Still no ADB and FirstUse though
Jun 30 19:32:14 Tofe: This is Tenderloin journal with kernel defconfig changes, latest libhybris & lxc 4.0.3: https://paste.ubuntu.com/p/j8BGPgKTBv/
Jun 30 19:33:12 FirstUse probably doesn't show due to: Jun 30 19:30:20 tenderloin LunaAppManager[1133]: terminate called after throwing an instance of 'std::bad_alloc'
Jun 30 19:33:12 Jun 30 19:30:20 tenderloin LunaAppManager[1133]: what(): std::bad_alloc
Jun 30 19:59:54 damn, what's with all these weird issues...
Jun 30 20:05:00 Tofe: Well status is slightly better
Jun 30 20:05:08 Only luna-appmanager
Jun 30 20:08:46 Luna-sysmgr behaves now it seems
Jun 30 20:12:40 How would I go about debugging this? Luna-appmanager dbg and gdb as well?
Jun 30 20:14:18 Ah this might help? https://github.com/qt/qtbase/commit/782df5b41dd3ab098fd1d3233339079487e1812f
Jun 30 20:14:36 Errm https://agateau.com/2009/tip-of-the-day-finding-the-source-of-a-bad_alloc-exception/
Jun 30 20:17:48 Tofe: well mido works OK so I suspect something in 3.4 kernel
Jun 30 20:18:08 But could be something armv7 vs aarch64
**** ENDING LOGGING AT Wed Jul 01 03:01:14 2020