**** BEGIN LOGGING AT Tue Nov 03 03:00:33 2020
Nov 03 09:28:36 Good Morning
Nov 03 10:11:03 morning
Nov 03 10:30:08 Morning!
Nov 03 10:38:55 please don't trigger any builds on bonaire, I'm retesting the speed with test-oe-build-time
Nov 03 10:57:31 ok !
Nov 03 10:59:15 Morning, JaMa: OK
Nov 03 11:08:52 We "upgraded" the builder and he had a test from before; hoping to see good performance gains.
Nov 03 11:14:20 ka6sox: What did you upgrade? Just curious?
Nov 03 11:20:56 Seems from the initial build it improved quite a bit in terms of build time
Nov 03 11:44:25 ka6sox: since when has bonaire had 20 threads available instead of 8? I noticed it in September https://github.com/webOS-ports/jenkins-jobs/commit/82c896db12005fc02bc26175f7d76a714af249d3 and just checked that the previous test-oe-build-time test was already with 20 threads, but it was definitely just 8 when I wrote jenkins-job.sh initially
Nov 03 11:47:08 probably september :)
Nov 03 11:47:44 I don't remember.
Nov 03 11:48:04 has to be earlier, the test-oe-build-time test on bonaire was in Mar 2020
Nov 03 11:48:49 it is 20 now right?
Nov 03 11:48:54 right
Nov 03 11:50:32 JaMa, if I gave you enough RAM to build in tmpfs would you do so?
Nov 03 11:51:14 we've been building in tmpfs since the beginning
Nov 03 11:51:40 oh, I didn't know... I should look at the script.
Nov 03 11:52:04 no we have 98G ram, 80G is the tmpfs mount
Nov 03 11:52:07 now
Nov 03 11:52:43 would more help you?
Nov 03 11:54:05 not with performance, only with the number of MACHINEs we can build in a row before cleaning the tmpfs
Nov 03 11:54:30 now we can barely fit 2-3 when they have different architectures
Nov 03 11:55:44 okay let me see about getting a few more GBs for you :)
Nov 03 11:56:10 but that's not a big deal, as the jenkins pipeline will just build them in reasonable chunks so it usually just works - the only exception is when someone manually sneaks in another build hoping it will still fit in the running pipeline - usually me :)
Nov 03 11:56:45 or when e.g. unstable builds still use 40G and someone triggers testing builds without cleaning the manually triggered unstable builds first
Nov 03 11:57:04 I believe that another project is no longer using the resources and I can give you more RAM
Nov 03 11:57:23 if they are unused, then sure :)
Nov 03 11:58:02 who provided khem's ftbuilders? are those from Comcast?
Nov 03 12:01:31 ka6sox: what else was upgraded on bonaire other than ubuntu? Did you upgrade the hypervisor as well or something like that? I would like to see how big the impact of the ubuntu upgrade itself was (because that's really surprising - unless it's somehow related to better handling of virtio stuff from the host)
Nov 03 12:02:22 it also might be that the other project wasn't using the physical box at the same time, so this VM is significantly faster because it runs alone
Nov 03 12:13:10 the difference is using an HVM vs a PV instance
Nov 03 12:13:19 we switched from PV to HVM
Nov 03 12:13:40 and I ran tests on OE builders to prove what was giving you the boost.
Nov 03 12:13:55 I am seeing a 2X performance boost
Nov 03 12:17:28 JaMa, are you done with your testing?
Nov 03 12:18:01 ka6sox: no, it will probably take the rest of today
Nov 03 12:18:20 okay please let me know when you are done
Nov 03 12:18:26 I will add another 72GB of RAM
Nov 03 12:24:33 interesting about PV to HVM; looks like Amazon EC2 is switching to HVM now as well
Nov 03 12:24:57 what is used on bonaire? Xen?
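The build-in-tmpfs setup mentioned above comes down to mounting a large tmpfs over the build directory, so all build I/O stays in RAM. A minimal sketch; the mount point, size and owner are assumptions here, not bonaire's actual configuration:

    # one-off mount of an 80G RAM-backed build area
    sudo mount -t tmpfs -o size=80G,uid=jenkins,gid=jenkins tmpfs /home/jenkins/build

    # or persistently via /etc/fstab
    tmpfs  /home/jenkins/build  tmpfs  rw,size=80G,uid=jenkins,gid=jenkins  0  0

Everything the build writes there counts against RAM, which is why only a few MACHINEs fit before the tmpfs has to be cleaned.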
Nov 03 12:24:58 yes, you forced us to switch by upgrading to 20.04 (known problem)
Nov 03 12:25:01 yes
Nov 03 12:25:26 but your report led us to test as well.
Nov 03 12:25:32 aha, sorry, I wasn't aware of this; all the servers I've upgraded were HW without VMs
Nov 03 12:25:32 seeing a 2X performance boost
Nov 03 12:25:44 and glad that it was worth it in the end :)
Nov 03 12:25:49 it's all good... we needed to do that.
Nov 03 12:26:02 well... a "do nothing" build for OE took an hour before.
Nov 03 12:26:08 now 19 minutes
Nov 03 12:26:54 we have been kinda dragging our feet... you helped push us over the edge and get it done.
Nov 03 12:27:13 (so we could upgrade to 20.04 as well)
Nov 03 12:27:47 I do like how quiet it is at this hour... but hard to be up this early.
Nov 03 12:28:18 Teaching Yocto "In Europe" :)
Nov 03 12:29:25 JaMa, the DataCentre the servers are in is 2 km from Kart Racing in Fremont.
Nov 03 12:32:26 I think I've been to the kart centrum only once since the kids were born :/
Nov 03 12:32:55 wait till they are old enough... you can take them!
Nov 03 12:33:03 wondering how much faster it would be on the same server running natively :)
Nov 03 12:33:15 Theoretically 4%
Nov 03 12:33:46 at LGE I was able to push against using VMs (as all slaves were managed by puppet, so it was easy to re-spin one whenever something went bad)
Nov 03 12:34:17 ka6sox: yes maybe 4% now, but maybe 104% when PV was used, right?
Nov 03 12:34:19 we are starting to roll out Ansible... so same thing.
Nov 03 12:34:31 we had ansible before as well
Nov 03 12:34:31 JaMa, agreed... too much overhead!
Nov 03 12:35:24 I can see the difference in our builds on the Grafana.
Nov 03 12:35:30 much more efficient.
Nov 03 12:36:24 is it possible that something didn't really work correctly with PV? from what I've read now, the difference shouldn't be so big
Nov 03 12:36:46 it is VERY likely it was broken for a long time.
Nov 03 12:36:58 we didn't expect what we saw.
Nov 03 12:37:14 ok
Nov 03 12:37:25 again, your report + changing Bonaire to 20.04 helped us to move forward and see this.
Nov 03 12:38:04 we are looking at implementing IceCC as well for a test of OE workloads.
Nov 03 12:38:30 let's see how it will compare with vSphere-VM-28t-55G in the benchmark now
Nov 03 12:39:04 before this change bonaire was 3-4 times slower than this
Nov 03 12:39:16 I need to benchmark mine sometime... I had the R5 1600 before
Nov 03 12:39:45 R5900 will be released in 2 days, right :)
Nov 03 12:40:00 that's what I read!
Nov 03 12:40:29 I want to do more tweaking of builds so I got the 3900X to play with.
Nov 03 12:43:13 I'll be happy to include the results from it as it's really good value (now I have results from my older 1600AF and then threadrippers; something reasonable in between was missing for a fair comparison)
Nov 03 12:43:53 okay after I finish teaching this class this week I will run the tests and report
Nov 03 12:44:06 send a pull request :)
Nov 03 14:53:23 test-oe-build-time just failed on bonaire because of memory, do you want to reboot it now or should I restart the test?
Nov 03 14:53:27 ka6sox: ^
Nov 03 15:04:29 can I add more memory before you restart?
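The "do nothing" build timing quoted above (an hour before, 19 minutes after the HVM switch) is easy to reproduce. A sketch, assuming an already-initialized OE build directory and using IMAGE as a stand-in for whatever target the job actually builds:

    # first run populates sstate and the tmpfs build area
    . ./oe-init-build-env
    time bitbake $IMAGE

    # an immediate second run is the "do nothing" case:
    # recipe parsing plus sstate/taskhash checks, no real tasks executed
    time bitbake $IMAGE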
Nov 03 15:04:41 JaMa, ^^
Nov 03 15:04:47 ka6sox: yes
Nov 03 15:04:57 if you're not in the class right now :)
Nov 03 15:05:05 on a break
Nov 03 15:10:02 okay you should have 160GB
Nov 03 15:11:43 JaMa, if you don't see it I may need to reboot it
Nov 03 15:13:31 JaMa, you do have 160GB
Nov 03 15:26:34 cool, let's see if it passes the benchmark without oomk now :)
Nov 03 15:28:45 running again
Nov 03 16:24:55 bonaire hammered, but coping well: Load average: 132.62 111.68 105.25
Nov 03 18:49:43 JaMa: the module issue seems specific to the modules themselves: I'm able to load a dunfell module with my gatesgarth kernel
Nov 03 18:54:13 yes, if I brutally overwrite all the modules, luneos boots fine with all the modules there...
Nov 03 18:55:20 also I debugged kmod, and it fails in the syscall itself, so it's a bit hard to say
Nov 03 19:11:37 I might be way out of my league, but in the working dunfell module the ".plt" ELF section is marked as "NOBITS", whereas in gatesgarth ".plt" is marked "PROGBITS"... I tried googling this, and it seems that it shouldn't change for a given arch... But again, I might just have misunderstood how it works.
Nov 03 19:17:47 I tested on hammerhead and pinephone... did we actually test on another aarch64?... let me try tissot.
Nov 03 20:26:14 this mentions the same error it seems https://marc.info/?l=linux-kernel&m=159715647514061&w=2
Nov 03 20:32:27 ng more reports about this issue, so let's just
Nov 03 20:32:47 https://marc.info/?l=linux-kernel&m=159886722025614&w=2 has a workaround
Nov 03 20:33:11 and https://marc.info/?l=linux-kernel&m=159887061926470&w=2 a better fix, checking what ended up in the kernel in the end
Nov 03 20:33:40 hey, I'm on the right track !
Nov 03 20:35:46 yes, that last patch seems a bit better
Nov 03 20:36:05 all this is pretty recent, from this summer
Nov 03 20:38:22 JaMa: seems they suggest dropping "IS_ENABLED(CONFIG_DYNAMIC_FTRACE) &&" as a minimal patch first
Nov 03 20:39:14 yes, I'm updating my kernel checkout to see if this went in in the end or if they merged only the proper patch
Nov 03 20:39:29 it's not dropped on the pinephone's, at least
Nov 03 20:39:34 but it's slow because I've left it to update all remotes :)
Nov 03 20:41:37 e0328feda79d9 arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE - from 5.9-rc4
Nov 03 20:42:14 then 596b0474d3d9b kbuild: preprocess module linker script
Nov 03 20:42:47 from 5.10-rc1, but I guess the first simple change is enough to unblock you
Nov 03 20:42:52 and more likely to apply cleanly
Nov 03 20:43:26 I'm already rebuilding
Nov 03 20:44:27 JaMa, how is it going?
Nov 03 20:44:34 you should be able to drop reversed-disable-gold-linker.patch as well
Nov 03 20:44:51 now kernel.bbclass should correctly use bfd everywhere (will check locally)
Nov 03 20:45:10 ka6sox: still in the first build
Nov 03 20:45:23 NOTE: Running task 7959 of 7999
Nov 03 20:45:26 but close
Nov 03 20:45:33 okay
Nov 03 20:45:44 designed to stress I see.
Nov 03 20:45:59 although most of my builds are like that too now that we have converted.
Nov 03 20:46:38 it took 11 hours before, now it's almost finished in ~5
Nov 03 20:47:22 I guess it might have helped a little bit :)
Nov 03 20:48:53 do you mind if I instrument up Bonaire with Netdata?
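The ".plt" observation from 19:11:37 can be checked directly with readelf; the module paths here are just examples:

    # compare section headers of the working (dunfell) and failing (gatesgarth) module
    readelf -S dunfell/wlan.ko    | grep -A1 '\.plt'
    readelf -S gatesgarth/wlan.ko | grep -A1 '\.plt'

And the minimal change suggested at 20:38:22 (dropping the IS_ENABLED(CONFIG_DYNAMIC_FTRACE) guard in module_frob_arch_sections()) looks roughly like this against a tree close to mainline; the proper upstream fix is the e0328feda79d9 commit mentioned above, and context/indentation in this sketch is approximate:

    --- a/arch/arm64/kernel/module-plts.c
    +++ b/arch/arm64/kernel/module-plts.c
    @@ module_frob_arch_sections() @@
    -        else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) &&
    -                 !strcmp(secstrings + sechdrs[i].sh_name,
    +        else if (!strcmp(secstrings + sechdrs[i].sh_name,
                              ".text.ftrace_trampoline"))
                 tramp = sechdrs + i;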
Nov 03 20:50:57 it's fine with me
Nov 03 20:51:39 just trying to see how close we are getting to OOMK
Nov 03 20:52:05 surprising behavior on bonaire, firefox was usually built last (after waiting for rust-native to finish)
Nov 03 20:52:21 but now firefox is done, while chromium-x11, qtwebengine, qtwebkit are still running
Nov 03 20:52:50 I'll bet you that there are 100 compile threads running right now
Nov 03 20:53:00 either rust-native was surprisingly fast (it doesn't parallelize very well, so on very fast machines it takes forever compared to other components)
Nov 03 20:53:05 all cc1plus
Nov 03 20:53:23 yes, there are 105
Nov 03 20:53:45 I saw that too on my last build of meta-oe
Nov 03 20:53:47 JaMa: thanks for the help, the patch works :)
Nov 03 20:53:51 we're fine with memory since you added more today
Nov 03 20:54:02 JaMa, good
Nov 03 20:54:03 before, we got 10+ OOMKs in the first 2 hours of the build
Nov 03 20:54:22 Tofe, congrats!
Nov 03 20:54:24 now the highest peak I've seen was about 3/4 (with around 60G in tmpfs)
Nov 03 20:54:43 Tofe: it's also in my jansa/gatesgarth branch with the gold removal and indentation fix :)
Nov 03 20:55:27 JaMa: ah, great, I'll just take it from there then; so far I basically modified the linux source by hand
Nov 03 20:55:33 Tofe: only build-tested on my side, but as we don't use gold for the kernel anymore I don't expect any change at runtime
Nov 03 20:55:50 https://github.com/webOS-ports/meta-pine64-luneos/commit/bd6843a21408e1ee98be043b317b919902c94cce
Nov 03 20:56:10 should I create a PR or merge it, or do you already have your own commit polished? :)
Nov 03 20:56:46 you can merge it, I don't have anything yet on my side :)
Nov 03 20:57:33 done
Nov 03 20:57:40 thanks !
Nov 03 20:59:10 I wonder if 3.18.31 used on tissot has the same issue
Nov 03 21:00:16 qtwebkit finished, last mile! 95.8G/157G
Nov 03 21:01:35 excellent
Nov 03 21:01:35 I'm flashing tissot right now
Nov 03 21:02:36 abaco is building qemuarm_64 right now... 123GB of RAM.
Nov 03 21:02:43 LA is close to 160
Nov 03 21:02:56 LA?
Nov 03 21:03:04 Load Average
Nov 03 21:06:06 JaMa: same for tissot: "insmod: ERROR: could not insert module staging/prima/wlan.ko: Invalid module format"
Nov 03 21:06:15 looks like we'll have to patch all the aarch64 kernels
Nov 03 21:06:58 I'll look at that tomorrow
Nov 03 21:15:44 Tofe: ok, checking if the same patch applies for tissot
Nov 03 21:16:45 JaMa, I expect to see essentially the same results as the dual-Xeon-E5-2670-8-channels
Nov 03 21:17:04 since that is what Bonaire really is
Nov 03 21:17:51 ka6sox: but that dual-Xeon-E5-2670-8-channels was running from HDDs while the bonaire build is in tmpfs
Nov 03 21:18:03 ah, different
Nov 03 21:18:09 still same HW
Nov 03 21:18:15 + tmpfs :)
Nov 03 21:18:22 I think bonaire should be a bit faster, but looks like it won't be
Nov 03 21:18:57 but the 4-build-all test isn't as interesting for me as e.g.
5-build-8
Nov 03 21:19:23 because with bonaire hammered really badly it won't have enough bandwidth for tmpfs to make any difference
Nov 03 21:19:39 kay
Nov 03 21:20:20 let me disable builds for Bonaire in Jenkins... you will have to bring the node back online when we are done with testing
Nov 03 21:26:19 the fix for tissot doesn't apply cleanly, as the whole arch/arm64/kernel/module-plts.c file was introduced only in v4.6-rc1 and we're on a 3.18 kernel; still looking for the correct place to change this
Nov 03 21:34:18 qtwebengine done, only the last horse is in the race now and that's chromium-x11
Nov 03 21:34:36 and looks like the tissot kernel is broken by hardknott changes already: hosttools/ld: scripts/dtc/dtc-parser.tab.o:(.bss+0x50): multiple definition of `yylloc'; scripts/dtc/dtc-lexer.lex.o:(.bss+0x0): first defined here
Nov 03 23:27:30 first build on bonaire finished, 3 hours faster (8 instead of 11 hours)
Nov 03 23:27:34 ==> 4-build-all-cores.log <==
Nov 03 23:27:34 63.35user 13.15system 7:55:45elapsed 0%CPU (0avgtext+0avgdata 28552maxresident)k
Nov 03 23:27:37 10976inputs+136outputs (2major+13755minor)pagefaults 0swaps
Nov 03 23:27:40 ==> VM-20t-98G/4-build-all-cores.log <==
Nov 03 23:27:42 71.57user 18.97system 10:58:48elapsed 0%CPU (0avgtext+0avgdata 29160maxresident)k
Nov 03 23:27:45 7824inputs+8880outputs (3major+13942minor)pagefaults 0swaps
**** ENDING LOGGING AT Wed Nov 04 02:59:57 2020
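The yylloc failure from 21:34:36 is the usual GCC 10 -fno-common breakage in the old in-tree dtc: the lexer defines yylloc a second time, duplicating the bison-generated parser's definition, and newer toolchains turn that into a link error. Upstream it was fixed by "scripts/dtc: Remove redundant YYLOC global declaration". A rough backport sketch for a kernel as old as 3.18; the exact file names are an assumption, and the pre-generated dtc-lexer.lex.c_shipped likely needs the same one-line removal:

    --- a/scripts/dtc/dtc-lexer.l
    +++ b/scripts/dtc/dtc-lexer.l
    @@ (near the top, after the dtc-parser.tab.h include) @@
    -YYLTYPE yylloc;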