**** BEGIN LOGGING AT Thu Oct 17 02:59:57 2019
Oct 17 04:43:57 hello!
Oct 17 04:47:29 I had my device run out of memory recently; rpc was taking almost all the free memory. /proc/PID/status said the memory was being used as RssAnon (around 60 MB). Do we have a memory leak? How can I get more details about what is causing it?
Oct 17 04:48:23 I think it has happened 3 times already in the last 2 days
Oct 17 04:56:51 last time I checked, tmpfs doesn't make much difference in the build time from scratch
Oct 17 05:01:13 year-old data: GCE n1-highcpu-64, 57 GB RAM, 46 min; n1-highcpu-96, 86 GB RAM, 34 min
Oct 17 05:02:53 damn, can't find the times on the sdd, but they were almost the same
Oct 17 05:02:58 s/sdd/ssd/
Oct 17 05:10:23 > all these toolchain rebuilds are getting annoying
Oct 17 05:10:23 sorry about that ;)
Oct 17 05:10:23 ynezz: good morning
Oct 17 05:10:23 ynezz: if I'd get a build device, what would I focus on?
Oct 17 06:16:37 msgito: you mean rpcd?
Oct 17 06:19:05 jow: what do you think of the luci weblate PR? is 40k lines per PR a bit much?
Oct 17 06:20:18 aparcar[m]: if it's only once, due to the new files, it is fine
Oct 17 06:20:34 aparcar[m]: is it possible to have it sign off its commits?
Oct 17 06:21:56 jow: yes, I'll edit the template so that it contains a Signed-off-by for the translation author
Oct 17 06:22:09 it is currently Apache 2, is that correct? weblate somehow detected that automatically
Oct 17 06:22:49 yep, Apache 2 sounds right
Oct 17 06:25:36 okay, I'd merge this now, and after that all further PRs should be smaller
Oct 17 06:26:09 aparcar[m]: don't forget about adding the new language codes (e.g. zh_hanc) to luci.mk
Oct 17 06:26:21 otherwise these translations will not get packaged
Oct 17 06:26:36 will do in an additional PR, okay?
Oct 17 06:26:41 yeah, sure
Oct 17 06:27:10 and does every PR need a Signed-off-by line? that would mean I close the current weblate one and edit it manually
Oct 17 06:28:38 jow: and please please look at my buildinfo prepare patch, that would finally create json files
Oct 17 06:29:33 aparcar[m]: every commit within the PR
Oct 17 06:29:38 otherwise the DCO check bot is unhappy
Oct 17 06:30:43 jow: okay, will edit it manually
Oct 17 06:36:25 jow: is the luci master branch protected so that no direct pushes are possible?
Oct 17 06:53:35 aparcar[m]: direct pushes should be allowed
Oct 17 06:53:48 it is protected against force-pushes
Oct 17 06:54:12 jow: ack, thanks
Oct 17 07:21:50 jow: I had my device run out of memory recently; rpcd was taking almost all the available memory. /proc/PID/status said the memory was being used as RssAnon (around 60 MB). Do we have a memory leak? How can I get more details about what is causing it?
Oct 17 07:23:05 jow: why are there sometimes po/templates and po/en at the same time? aren't both always the same?
Oct 17 07:23:18 msgito: there was a memory leak in some old rpcd versions. Can you tell me the exact version of rpcd installed?
Oct 17 07:24:10 aparcar[m]: technically speaking, po templates are always empty while en translations are filled out
Oct 17 07:24:36 the en translations are there for completeness and to potentially allow for overriding the source code message strings
Oct 17 07:24:43 in practice they're redundant
Oct 17 07:24:59 jow: so should we get rid of them?
Oct 17 07:25:16 hm, probably makes sense, yeah
Oct 17 07:25:24 let's see...
Oct 17 07:25:29 unless we start with things like en_US vs. en_GB
Oct 17 07:25:49 the current state likely is a mix of both
Oct 17 07:27:29 jow: Package: rpcd Version: 2019-09-09-e2a7bc4c-1
Oct 17 07:28:34 jow: I guess we want to be able to have both, else it just oscillates
Oct 17 07:29:05 msgito: hm, seems like you've discovered a yet-unknown leak then. Can you check "ubus call session list" to see if there's old session data piling up?
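The sign-off rule stated above (every commit within a PR needs one, otherwise the DCO check bot complains) can be approximated in a few lines. This Python sketch is hypothetical: the regex and the helper names are assumptions, not the DCO bot's actual code, which may apply stricter rules such as matching the commit author.

```python
from __future__ import annotations

import re

# Hypothetical approximation of the rule: every commit message in a PR must
# carry a "Signed-off-by: Name <email>" line. NOT the DCO bot's real logic.
SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>\s@]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(message: str) -> bool:
    """Return True if the commit message contains a Signed-off-by line."""
    return SIGNOFF_RE.search(message) is not None


def commits_missing_signoff(messages: list[str]) -> list[int]:
    """Return the indices of commit messages that lack a sign-off."""
    return [i for i, msg in enumerate(messages) if not has_signoff(msg)]


if __name__ == "__main__":
    pr_commits = [
        "luci-base: sync translations\n\nSigned-off-by: Jane Doe <jane@example.org>\n",
        "luci-app-foo: add new language\n",
    ]
    print(commits_missing_signoff(pr_commits))  # prints [1]
```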
Oct 17 07:29:16 msgito: if yes, then this is the cause. If not, you've found a leak
Oct 17 07:34:32 jow: weblate can automatically create .mo files, is that of any interest for simplifying the compilation of luci apps?
Oct 17 07:35:01 no, as we do not use *.mo files
Oct 17 07:35:22 LuCI uses a custom binary format, .lmo
Oct 17 07:35:42 all right
Oct 17 07:36:03 due to a wrong initial setup (not my fault) there is another massive PO PR - sorry
Oct 17 07:36:52 Hi people, I am using WSL 2 to build OpenWrt and I am having that bug again. My build fails and it is something to do with the build environment path. I will try and get a log in a min, but can anyone remember what it was before?
Oct 17 07:38:23 This:
Oct 17 07:38:24 find: The relative path 'Files/WindowsApps/CanonicalGroupLimited.UbuntuonWindows_1804.2019.521.0_x64__79rhkp1fndgsc' is
Oct 17 07:38:24 included in the PATH environment variable, which is insecure in combination with the -execdir action of find. Please re
Oct 17 07:38:24 move that entry from $PATH
Oct 17 07:38:32 Tapper: spaces in the path could be a problem
Oct 17 07:38:48 and I guess the above is exactly that, it comes from "Program Files" I suppose
Oct 17 07:38:57 jow: I don't see anything that looks like old data. Here is what I am doing now... I restart rpcd and don't log in: it gets stable at around ~150 kB. Once I log in (it jumps to around 450 kB I think, which makes sense) it goes directly to my previous page, admin/network/wireless. Without doing anything in the browser, it starts to rise little by little, some
Oct 17 07:38:58 times just 4 kB in 5 seconds, sometimes 50, or 150... touching,
Oct 17 07:39:13 I fixed it before but can't remember how lol
Oct 17 07:39:49 msgito: sounds like a genuine leak then. I guess it is caused by libiwinfo
Oct 17 07:39:59 or rpcd-mod-iwinfo
Oct 17 07:40:00 ubus call session list only shows 2 sessions, the unauthenticated one and mine
Oct 17 07:40:09 mmmm
Oct 17 07:40:41 so if you are right, then there should be no memory leak if I never open these pages? I am going to test it right now...
Oct 17 07:40:56 I guess so
Oct 17 07:41:02 let's see
Oct 17 07:41:05 the wireless overview page is a self-refreshing one
Oct 17 07:41:22 it will poll for new data every 5 seconds or so, which will trigger, among others, various iwinfo operations
Oct 17 07:41:44 could you test if e.g. the initial status overview page or the network interface overview pages trigger the same issue?
Oct 17 07:43:09 going to try a few; first flashops, that one doesn't collect any wireless data at all
Oct 17 07:47:19 msgito: I can reproduce it
Oct 17 07:48:36 interesting, we've got a memory leak :)
Oct 17 07:48:43 yes :(
Oct 17 07:49:11 I can see that some pages are stable while others are not: interfaces and wireless are not
Oct 17 07:49:36 also, when you open it, you can see that memory jumps suddenly
Oct 17 07:49:42 yeah
Oct 17 07:49:48 and then incrementally
Oct 17 07:49:58 how can we get more details about what is leaking?
Oct 17 07:50:12 I really wish to be able to help on this and fix it :)
Oct 17 07:50:20 will use valgrind under x86
Oct 17 07:50:46 I tried to use it directly on my device, but it stops
Oct 17 07:50:46 it's a bit annoying to set up and is not working well on MIPS AFAIR
Oct 17 07:50:54 it also requires debug builds of rpcd
Oct 17 07:51:05 it's the first time ever I tried to use valgrind hahaha
Oct 17 07:51:59 * russell-- is seeing a bunch of occasional SIGSEGVs on a WDR3600, from dmesg: http://sprunge.us/K6Nh4Q
Oct 17 07:52:03 I see, pity, it would take me more time to do it than we'd really want... because I need to learn it and set up the env :(
Oct 17 07:52:47 I was looking for tools like pmap to get more memory details
Oct 17 07:53:23 russell--: that does not look healthy at all
Oct 17 07:56:12 * russell-- looks at his other WDR3600s to see if there are similar segvs
Oct 17 07:56:23 russell--: gcc 8?
Oct 17 07:57:29 ynezz: OpenWrt SNAPSHOT, r11185-f690b6f472
Oct 17 07:58:26 ath79?
Oct 17 07:59:14 msgito: interestingly, rpcd on 19.07 does not appear to leak
Oct 17 07:59:15 yes
Oct 17 07:59:29 jow: FYI https://ynezz.gitlab.io/-/rpcd/-/jobs/323420230/artifacts/build/scan/2019-10-17-075741-163-1/index.html
Oct 17 08:00:09 ynezz: ah thanks, will check
Oct 17 08:01:21 didn't have time to enable CI on iwinfo yet
Oct 17 08:02:16 there appear to be false positives in the leak bug reports though
Oct 17 08:02:41 the one complaining about leaked "args" in line 904 appears to be unable to follow the fork() logic
Oct 17 08:02:46 indeed, it's possible, it's just scan-build from clang
Oct 17 08:03:44 but still, it's probably better to free() after the exec() instead of explaining why it's not an issue
Oct 17 08:03:51 we would need to add some unit tests and add valgrind to the mix (have it on my TODO)
Oct 17 08:04:20 I'm not seeing this on other WDR3600s, maybe just a hardware problem
Oct 17 08:05:06 I was testing something on ath79 with the latest snapshot yesterday, didn't hit any issues (apart from the broken sysupgrade, which is likely to be broken in 19.07 as well)
Oct 17 08:06:22 uhm... broken sysupgrade?
Oct 17 08:06:29 as in need to force, or as in bricks the device?
Oct 17 08:07:48 you need to use force, and so could brick your device
Oct 17 08:08:53 I've reported it to rmilecki already, so maybe he has some fix in the works
Oct 17 08:09:35 other thing... I need to change /usr/lib/lua/luci/sys.lua and the header.htm from the themes to add a hint to the web... this is related to a PR that I have on openwrt, but I am not sure how I should deal with this change, considering that they "are" part of the same change but live in different repositories
Oct 17 08:09:58 I am talking about PR https://github.com/openwrt/openwrt/pull/2408
Oct 17 08:10:08 jow: https://paste.ubuntu.com/p/snDxc27mX3/
Oct 17 08:10:58 jow: and as validate_firmware_image is called by sysupgrade first (which strips the metadata), the next metadata check in procd/upgraded fails
Oct 17 08:11:09 ynezz: that looks like this sysupgrade bug where the check_image stuff modifies the image
Oct 17 08:11:29 but it was working just fine a month or so ago
Oct 17 08:12:24 or in other words, I had an image based on 2019.06 master code, replaced it with a 2019.10 image where it's broken
Oct 17 08:14:45 I don't want to dig into this deeper as I expect a complete rewrite :)
Oct 17 08:15:01 I remember fixing a very similar bug in master a while back
Oct 17 08:15:31 https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=9808bd279927bcd2d3a78d19a55229b93bbbcf05
Oct 17 08:16:05 I suppose the bug got reintroduced and nobody fixed fwtool
Oct 17 08:16:17 so much for "all in a fine shape, why don't you just release"
Oct 17 08:16:56 well, I was quite nervous about those new sysupgrade backports to 19.07 :)
Oct 17 08:19:26 anyway it would still be rc0, so some breakage is expected
Oct 17 08:20:28 breaking sysupgrade is not acceptable, even for rc0
Oct 17 08:20:36 8b4109c2b4d60495d046157d1baca9b1cdbf8dc8 seems to be the culprit
Oct 17 08:20:43 it introduced fwtool -t *yet again*
Oct 17 08:20:54 it was reverted twice already
Oct 17 08:21:20 it's on ath79, I'm not using ar71xx
Oct 17 08:21:29 but yeah, I've spotted that as well
Oct 17 08:21:45 it's also totally unrelated to rafal's sysupgrade changes
Oct 17 08:22:28 the commit above got cherry-picked to 19.07, so it could be a likely candidate
Oct 17 08:22:43 seems we/I need to touch fwtool before we release
Oct 17 08:22:58 jow: thanks for finding that commit
Oct 17 08:23:31 jow: I'm leaving for the weekend soon, I won't have much time until next week and I'm not really familiar with fwtool at all
Oct 17 08:23:37 me neither
Oct 17 08:23:49 but I guess we need to fix fwtool -t to not modify its argument
Oct 17 08:23:56 will see if I can look at that later
Oct 17 08:25:35 Jow: I need to change /usr/lib/lua/luci/sys.lua and the header.htm from the themes to add a hint to the web... this is related to a PR that I have on openwrt, but I am not sure how I should deal with this change, considering that they "are" part of the same change but live in different repositories... I am talking about PR
Oct 17 08:25:36 https://github.com/openwrt/openwrt/pull/2408
Oct 17 08:27:47 jow: rmilecki: I'll retest with the images from snapshots
Oct 17 08:30:53 looking at https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=8b4109c2b4d60495d046157d1baca9b1cdbf8dc8 again, it seems like we need an "fwtool -t" mode which writes the stripped image contents to stdout
Oct 17 08:31:29 which already exists as "-T"
Oct 17 08:32:31 so instead of `fwtool -q -t -i /dev/null "$1"; dd if="$1" ...` it likely should be `fwtool -q -T -i /dev/null "$1" | dd if="$1" ...`
Oct 17 08:32:46 *dd without if=
Oct 17 08:34:05 flashing 19.07 snapshot on my c7v5
Oct 17 08:38:11 19.07 sysupgrade over 19.07 works
Oct 17 08:41:53 ar71xx 19.07 -> ath79 snapshot works as well
Oct 17 08:44:14 ath79 snapshot -> ath79 snapshot works as well
Oct 17 08:44:25 * ynezz wonders what is broken here :(
Oct 17 08:47:12 d'oh, I still have that https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=41770add03ad77a0ce41ed424ad050238f7d9272 locally
Oct 17 08:48:25 rmilecki: jow: I'm sorry for the noise, sysupgrade is not broken in snapshot/19.07; the issue is caused by my locally patched openwrt
Oct 17 09:00:35 jow: that commit affects a small number of devices which use combined-image (I didn't test sysupgrade with luci); also, -T would be fine if it didn't cut the last 16 bytes off images without metadata (the bug I mentioned previously).
Oct 17 09:01:45 tmn505: would you be able to look into the 16-byte bug?
Oct 17 09:01:59 ... finally got valgrind-massif to work
Oct 17 09:02:17 the key to tracing musl programs is --pages-as-heap=yes, without that it does not trace any allocations
Oct 17 09:04:44 jow: I would if I knew how to code, so no, that's out of my skillset.
Oct 17 09:06:33 this seems like a good candidate to start some unit tests in CI, how to reproduce it?
Oct 17 09:07:30 ynezz: do you mean me?
Oct 17 09:08:00 fwtool -q -T -i /dev/null
Oct 17 09:08:03 talking about that fwtool 16-byte bug, mentioned over and over again
Oct 17 09:08:07 jow: ^
Oct 17 09:09:05 fwtool -q -T -i /dev/null 'random_image_without_metadata' > 'striped_image'
Oct 17 09:09:17 ynezz: ^
Oct 17 09:32:23 thanks
Oct 17 09:42:25 msgito: I think I tracked down the leak
Oct 17 10:06:57 Jow: excellent news!
Oct 17 10:08:14 Jow: could you help me with my other question?
Oct 17 10:20:36 Got disconnected
Oct 17 10:20:43 Jow: I need to change /usr/lib/lua/luci/sys.lua and the header.htm from the themes to add a hint to the web... this is related to a PR that I have on openwrt, but I am not sure how I should deal with this change, considering that they "are" part of the same change but live in different repositories... I am talking about PR
Oct 17 10:20:44 https://github.com/openwrt/openwrt/pull/2408
Oct 17 10:21:17 ynezz: there is a report in #openwrt of effectively a typo in the pll-data in qca9558_netgear_ex7300.dtsi, affecting both 19.07 and master. can you push a quick fix to both?
Oct 17 10:24:13 msgito: you create a PR in the other repository and put links to the other's PR along with text like "depends on.... "
Oct 17 10:24:39 cool, thanks!
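msgito's measurements above (RssAnon from /proc/PID/status creeping up by a few kB roughly every 5 seconds while a self-refreshing page is open) can be scripted. This sketch assumes a Linux /proc that exposes the RssAnon field and a Python 3 interpreter wherever it runs, which a stock OpenWrt image may not have; on the device itself `grep RssAnon /proc/<pid>/status` reads the same number. The function names are hypothetical. Steadily positive deltas only hint at a leak; a heap profiler such as valgrind massif is still needed to find the allocation site.

```python
from __future__ import annotations

import sys
import time


def rss_anon_kb(status_text: str) -> int | None:
    """Return the RssAnon value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("RssAnon:"):
            # Typical line: "RssAnon:\t   61440 kB"
            return int(line.split()[1])
    return None


def sample_rss_anon(pid: int, interval: float = 5.0, count: int = 12) -> list[int]:
    """Read RssAnon of `pid` `count` times, sleeping `interval` seconds in between."""
    samples: list[int] = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            value = rss_anon_kb(f.read())
        if value is not None:
            samples.append(value)
        if i < count - 1:
            time.sleep(interval)
    return samples


def growth_deltas(samples: list[int]) -> list[int]:
    """Differences between consecutive samples; steady positive values hint at a leak."""
    return [b - a for a, b in zip(samples, samples[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 rss_watch.py <pid-of-rpcd>
    print(growth_deltas(sample_rss_anon(int(sys.argv[1]))))
```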
Oct 17 10:26:29 DonkeyHotei: well, I don't see a reason for any rush, please use the standard patch submission channels
Oct 17 10:27:08 OK
Oct 17 10:27:30 speaking of fixes, there's one upstream for ar9331 as well
Oct 17 10:27:51 what does it fix?
Oct 17 10:31:28 interrupt reg size in dts IIRC
Oct 17 10:32:12 sounds like it should be pushed also
Oct 17 10:33:54 MIPS: dts: ar9331: fix interrupt-controller size
Oct 17 10:34:55 0889d07f3e4b171c453b2aaf2b257f9074cdf624
Oct 17 10:36:23 but it seems like it was picked up by the stable bot already https://lore.kernel.org/lkml/20191009170558.32517-13-sashal@kernel.org/
Oct 17 10:57:47 ynezz: https://github.com/openwrt/openwrt/pull/2498
Oct 17 10:58:42 DonkeyHotei: thanks, just a small nitpick: it should be vice versa, first patch master, then ask for a backport to 19.07
Oct 17 11:06:32 does anyone know if it's possible to somehow tell GitHub (or some other external tool/service) to send the diffs of the PRs directly to my email instead of this useless summary?
Oct 17 11:17:29 ynezz: you can take the repo URL + PR number + .diff and run curl -L https://github.com/openwrt/openwrt/pull/2492.diff
Oct 17 11:19:22 but, but, but this is missing the commit description
Oct 17 11:19:38 then you can add .patch
Oct 17 11:19:38 but, thanks, got an idea
Oct 17 11:20:36 d'oh! it's even written in that PR email, down in the footer
Oct 17 11:20:44 https://github.com/openwrt/openwrt/pull/2498.patch works
Oct 17 11:21:19 it also works with multiple patches in a single MR
Oct 17 11:21:25 curl foo|git am
Oct 17 11:21:52 dhewg: I think you need curl -L
Oct 17 11:22:04 well, my use case is that I would like to see the content in my email without doing any additional step
Oct 17 11:22:36 but it's going to be just some script which would send me this as a reply to that PR email
Oct 17 11:23:07 aparcar[m]: never done that for github, worked for me so far
Oct 17 11:37:31 hmmm, that rpcd leak seems to be caused by ubus_handle_data()
Oct 17 13:57:32 tmn505: jow: fwtool -q -T -i /dev/null && use that image || fwtool failed
Oct 17 13:57:38 so it works for me
Oct 17 13:58:24 ynezz: uhm, I thought the issue was that the output of "fwtool -T" is truncated
Oct 17 13:58:45 by removing that -q you would see the `Data not found` error, which means that fwtool is unable to find the firmware metadata you're asking it to remove
Oct 17 14:02:51 so yeah, it's truncated, but it can be truncated for other reasons as well
Oct 17 14:03:22 uhm... no?
Oct 17 14:03:29 I mean it's not supposed to truncate stuff
Oct 17 14:04:09 ynezz: did you compare the size before and after?
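The URL trick discussed above (append .diff or .patch to a GitHub pull request URL; .patch keeps the commit messages and can be piped to `git am`) is easy to wrap in the kind of script ynezz describes. This is a minimal Python sketch; the helper names are hypothetical, and `patch_subjects` deliberately ignores folded (multi-line) Subject: headers.

```python
from __future__ import annotations

import urllib.request


def pr_url(owner: str, repo: str, number: int, fmt: str = "patch") -> str:
    """Build the .diff or .patch URL for a GitHub pull request."""
    if fmt not in ("diff", "patch"):
        raise ValueError("fmt must be 'diff' or 'patch'")
    return f"https://github.com/{owner}/{repo}/pull/{number}.{fmt}"


def fetch(url: str) -> str:
    """Download the URL; urllib follows HTTP redirects by default, like curl -L."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def patch_subjects(mbox_text: str) -> list[str]:
    """Return the Subject: lines of a .patch (mbox) download, one per commit.

    Simplification: folded multi-line Subject headers are not joined.
    """
    prefix = "Subject: "
    return [line[len(prefix):] for line in mbox_text.splitlines()
            if line.startswith(prefix)]
```

With network access, `patch_subjects(fetch(pr_url("openwrt", "openwrt", 2498)))` should list one subject per commit in that PR, which could then be mailed as a reply to the notification.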
Oct 17 14:04:27 I did, I can see it's truncated
Oct 17 14:04:36 weird
Oct 17 14:04:47 but you're doing it wrong
Oct 17 14:04:52 it's very simple to reproduce
Oct 17 14:05:07 root@mir3g:~# echo 12345678901234567890 > /tmp/test.img
Oct 17 14:05:07 root@mir3g:~# fwtool -i /dev/null -T /tmp/test.img
Oct 17 14:05:07 Data not found
Oct 17 14:05:07 12345root@mir3g:~#
Oct 17 14:05:26 there are several checks, and you're not supposed to use the resulting image if fwtool returns an error exit status
Oct 17 14:05:47 -> 16 bytes truncated
Oct 17 14:06:00 or it can be 51200 bytes truncated
Oct 17 14:06:00 but still it shouldn't truncate it
Oct 17 14:06:12 ynezz: I don't follow your interpretation
Oct 17 14:06:39 fwtool -i /dev/null -T /tmp/test.img && use_image || some_error
Oct 17 14:06:49 I disagree
Oct 17 14:07:15 -T: Output firmware image without extracted chunks to stdout (using -s, -i)
Oct 17 14:07:35 if no chunks are extracted, it should output the entire firmware image
Oct 17 14:07:50 the entire stdout operation mode makes no sense if it's not idempotent
Oct 17 14:08:02 how can you ensure this?
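jow's position is that `-T` should emit the whole image when no metadata is found, consistent with `-t`, which leaves the 21-byte test file intact. A toy Python model of that contract follows. The trailer layout (4-byte magic, 4-byte big-endian chunk length, 8 bytes of padding), the magic value and the function names are invented for illustration and do not describe fwtool's real on-disk format.

```python
from __future__ import annotations

import struct

# Toy container format, NOT fwtool's real layout: an optional metadata chunk
# followed by a 16-byte trailer = 4-byte magic + 4-byte big-endian chunk
# length + 8 bytes of padding.
TRAILER_LEN = 16
MAGIC = b"TOY0"  # hypothetical magic value


def make_trailer(chunk_len: int) -> bytes:
    """Build a toy trailer describing a metadata chunk of `chunk_len` bytes."""
    return MAGIC + struct.pack(">I", chunk_len) + bytes(8)


def strip_metadata(image: bytes) -> tuple[bytes, bool]:
    """Return (image without metadata, found).

    If no valid trailer is present, the input is returned unchanged, so the
    caller can always pipe the output onward without a temporary copy.
    """
    if len(image) < TRAILER_LEN:
        return image, False
    tail = image[-TRAILER_LEN:]
    if tail[:4] != MAGIC:
        return image, False
    (chunk_len,) = struct.unpack(">I", tail[4:8])
    body_end = len(image) - TRAILER_LEN
    if chunk_len > body_end:
        return image, False  # inconsistent trailer: treat as "not found"
    return image[: body_end - chunk_len], True
```

In the log's reproduction, `b"12345678901234567890\n"` has no trailer, so this model returns all 21 bytes plus `found=False` (the analogue of "Data not found") instead of a 5-byte remainder.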
Oct 17 14:08:19 imagine a memory error, or another syscall failure
Oct 17 14:08:35 c'mon
Oct 17 14:08:49 cat and dd are susceptible to "memory error or other syscall failure" too
Oct 17 14:09:45 but you can't prevent those, that's life
Oct 17 14:10:07 sure, you can now go ahead and change any fwtool -T foo | blah to (fwtool -T foo || cat foo) | blah
Oct 17 14:10:24 until the next PR resurfaces, making exactly the same mistake
Oct 17 14:11:50 idempotence is key and I strongly disagree with "fwtool -T is supposed to output the parts of the firmware image without metadata unless metadata is not present, then it will randomly return undefined chunks of data"
Oct 17 14:12:48 it's also not congruent with "fwtool -t", which will *not* do that
Oct 17 14:14:46 root@mir3g:~# echo 12345678901234567890 > /tmp/test.img; fwtool -i /dev/null -t /tmp/test.img; wc -c /tmp/test.img
Oct 17 14:14:49 Data not found
Oct 17 14:14:52 21 /tmp/test.img
Oct 17 14:14:54 root@mir3g:~# echo 12345678901234567890 > /tmp/test.img; fwtool -i /dev/null -T /tmp/test.img > /tmp/trunc.img; wc -c /tmp/trunc.img
Oct 17 14:14:57 Data not found
Oct 17 14:15:00 5 /tmp/trunc.img
Oct 17 14:15:02 root@mir3g:~#
Oct 17 14:15:44 well, it's using ftruncate (see man ftruncate) under the hood, so it can fail for whatever reason; it would make sense to handle that case rather than blindly assume that fwtool always returns a correct and flashable image
Oct 17 14:16:15 note that we're talking about using "fwtool -T" in a *read-only* testing command sequence
Oct 17 14:16:47 also, given memory constraints in tmpfs, it is often not *possible* to create a temporary copy to work around -T bugs
Oct 17 14:18:04 ok, makes sense
Oct 17 14:19:49 then this should probably be changed as well: `fwtool -q -T -s /dev/null "$1" | ucert -V -m - -c "/tmp/sysupgrade.ucert" -P /etc/opkg/keys` in package/base-files/files/lib/upgrade/fwtool.sh
Oct 17 14:20:20 I guess so
Oct 17 14:20:30 I suppose it works fine with signature & cert and truncates without
Oct 17 14:55:44 ynezz: the following change fixes the issue for me: http://sprunge.us/PRLi9V
Oct 17 14:56:20 the problem is that extract_tail() modifies the data buffer, which does not matter for "-t" mode since
Oct 17 14:56:26 it does not print the buffers
Oct 17 14:58:17 I'm stuck with unit tests :p
Oct 17 14:58:19 metadata_keep is used here: https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/system/fwtool/src/fwtool.c#l383 to print what has been extracted from the buffer previously
Oct 17 15:11:46 jow: no idea what you've fixed or how it affects me, but thanks for fixing it.
Oct 17 15:13:08 makes me wonder if those are false positives https://ynezz.gitlab.io/-/openwrt-fwtool/-/jobs/323895602/artifacts/build/scan/2019-10-17-142430-143-1/index.html
Oct 17 15:15:37 ynezz: well, the garbage ones look correct to me if one substitutes "garbage" with "untrusted external input"
Oct 17 15:16:09 which is kind of expected here since we're casting buffer data from fread() to structs and then accessing members of these structs
Oct 17 15:16:37 at least this is what I understand to be the intention of the reported logic error bug
Oct 17 15:19:45 ynezz: actually not a false positive
Oct 17 15:21:56 if I see it correctly, then a file which is an exact multiple of 31744 bytes plus a valid 16-byte firmware image trailer will trigger undefined behaviour, as "buf" will then point to uninitialized memory
Oct 17 15:23:15 luckily "validate_metadata" simply returns 0/-1 depending on the value of the version member
Oct 17 15:23:29 so it has no security implications
Oct 17 15:23:34 at least as far as I can see
Oct 17 20:30:06 jow: could you please update the buildbots to only run "buildinfo" instead of "prepare" as a last step? https://patchwork.ozlabs.org/patch/1175893/
Oct 17 22:48:33 I would call it a day https://gitlab.com/ynezz/openwrt-fwtool/blob/master/tests/shunit2/main.sh
**** ENDING LOGGING AT Fri Oct 18 02:59:58 2019
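ynezz's closing link points at shunit2-based tests for fwtool. As a rough Python analogue (not those tests), the regression jow reproduced with a 21-byte file can be phrased as "stripping an image without metadata must return it byte for byte". `passes_through` is a hypothetical helper; in the demo, `cat` stands in for a correct implementation and `head -c 5` mimics the observed 5-byte output. On a device, the command under test would presumably be `["fwtool", "-q", "-i", "/dev/null", "-T"]`, with the image path appended last as in the log.

```python
from __future__ import annotations

import os
import subprocess
import tempfile


def passes_through(cmd: list[str], data: bytes) -> bool:
    """Write `data` to a temp file, run `cmd + [path]`, and report whether
    stdout is byte-identical to the input (stderr such as "Data not found"
    is discarded)."""
    fd, path = tempfile.mkstemp(suffix=".img")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        result = subprocess.run(cmd + [path], stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, check=False)
        return result.stdout == data
    finally:
        os.remove(path)


if __name__ == "__main__":
    sample = b"12345678901234567890\n"  # the same 21-byte input as in the log
    print(passes_through(["cat"], sample))              # True
    print(passes_through(["head", "-c", "5"], sample))  # False
```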