**** BEGIN LOGGING AT Mon May 25 02:59:57 2020 May 25 05:59:13 jow: with your diff procd doesn't start properly, it hangs at: May 25 05:59:15 [ 7.180510] procd: Not starting instance ubus::instance1, command not set May 25 06:23:07 jow: switching from blob_len(attr) to blob_pad_len(attr) in the blobmsg_check_array() doesn't change a thing - both return 36 in my case May 25 07:03:32 jow: what if we use __blobmsg_for_each_attr() BUT we pass *calculated* data length to it? https://pastebin.com/3gj5NCvy May 25 07:03:34 it works for me May 25 07:04:14 s/data length/remaining blob length that should be data length/ ? May 25 07:32:09 blogic: not yet, maybe today/tomorrow May 25 07:32:20 win 37 May 25 07:34:57 ynezz: cool May 25 08:32:44 jow: i sent patch fwiw May 25 08:58:59 looks like my EdgeRouter Lite 's link started flapping once I enabled EEE on the switch May 25 09:06:25 rmilecki: okay, so if Felix thinks its okay then it probably is... May 25 09:06:44 rmilecki: and now that we successfully papered over the problem we'll never find out why it was broken i nthe first place :D May 25 09:06:51 jow: i'm looking at failing test ynezz pointed May 25 09:07:11 https://gitlab.com/openwrt/project/libubox/-/jobs/565368042 May 25 09:07:44 is it some invalid memory read by blobmsg_check_array_len() ? May 25 09:16:33 ynezz: how can I see what exactly has failed? https://pastebin.com/hW2Uxrhs May 25 09:18:07 http://sprunge.us/rYnett May 25 09:21:58 make test CTEST_OUTPUT_ON_FAILURE=1 May 25 09:22:07 for ctest output May 25 09:22:21 perfect, that's what I was looking for :) May 25 09:22:30 ah, right, /bin/sh: line 4: valgrind: command not found May 25 09:25:40 ynezz: it seems my system produces different "Aborted" message format May 25 09:25:41 https://pastebin.com/Ag1SVuLH May 25 09:25:51 "Aborted (core dumped)" vs. "/bin/sh: line 10: 4205 Aborted (core dumped) test-b64_decode 2> output.log" May 25 09:26:50 ynezz: how can I run "fuzz" tests using "make test"? May 25 09:30:17 cd build/tests/fuzz; make test May 25 09:30:37 ynezz: what happened to the layerscape 4.14 removal patch? did you just forget to update your staging tree online or has it been lost somewhere? May 25 09:31:39 rmilecki: or just run `./build/tests/fuzz/test-fuzz tests/fuzz/corpus` May 25 09:32:10 rmilecki: fuzzer-enabled-binary [optional path to the corpus input] May 25 09:33:01 rmilecki: could you please write down what was unclear to you during this CI journey? It would help me to write better documentation for this, thanks :) May 25 09:33:15 i got tests compiled without "fuzz" :| it seems "fuzz" get compiled only IF(CMAKE_C_COMPILER_ID STREQUAL "Clang") May 25 09:33:25 indeed May 25 09:34:24 you can use the same docker image (huge) as used on CI https://gitlab.com/ynezz/openwrt-ci/#usage-example (2nd example) May 25 09:34:50 otherwise you need to install clang 9 with libfuzzer May 25 09:36:05 err, 1st example, just replace `ci-native-scan-build` with `ci-native-checks` May 25 09:42:59 adrianschmutzler: yeah, I just forget to update my staging tree, feel free to push, I'll just rebase later, no problem May 25 09:46:16 okay, I'll push then. I just became aware because somebody sent a patch for 4.14 today, so he can keep it to 5.4 then. May 25 09:51:45 ok ,I got it cmake -D UNIT_TESTING=1 -D CMAKE_C_COMPILER=clang ../ May 25 09:52:05 build #127 of bcm47xx/generic is complete: Failure [failed] Build details are at http://buildbot.openwrt.org/master/images/builders/bcm47xx%2Fgeneric/builds/127 blamelist: Yangbo Lu May 25 09:59:23 huh, that is time consuming, I tried fuzz tests without my libubox patch May 25 09:59:24 3/3 Test #3: test-fuzz ........................ Passed 301.26 sec May 25 10:04:19 well, this is just 5 minutes for CI, which depends heavily on fuzzing corpus May 25 10:05:11 ideally this should be run for hours, so far I run it manually on 96 vCPU GCE from time to time May 25 10:05:30 adding other fuzzers would help as well May 25 10:10:23 good thing is I can reproduce that test failing locally now :) May 25 10:11:16 it seems that the blobmsg_check_attr_len() call is not enough to validate the payload length May 25 10:11:47 seems the crash is triggered by a blob header not having the minimum required length (>= sizeof(uint32_t)) May 25 10:16:17 fwiw fuzz test failure gets fixed by [PATCH RFC libubox] blobmsg: another attrs iteration fix for blobmsg_check_array_len() May 25 10:27:02 rmilecki: it looks very hacky though May 25 10:27:10 and I May 25 10:27:39 'm confused about the cause May 25 10:28:26 did blobmsg_check_array_len() always overread and it just occasionally blew up when an array at the end of a buffer was checked? Are we dealing with actually truncated or corrupted data and if so, where does it come from? May 25 10:37:04 Hi people. Hows it going? May 25 10:47:32 forward :) May 25 11:15:23 jow: take a look at that original report: http://lists.infradead.org/pipermail/openwrt-devel/2020-May/023514.html May 25 11:15:24 for that JSON part { "core": "unlimited" } blobmsg_check_array_len() started getting length 36 instead of 24 May 25 11:15:26 36 length includes header May 25 11:15:27 __blobmsg_for_each_attr should iterate over 24 B of data, if it reads *extra* 12 B *after* data it will read some random memory May 25 11:17:08 rmilecki: yeah but thats not an isolated testcase May 25 11:17:28 jow: what do you mean by isolated? May 25 11:17:36 `{ "core": "unlimited" }` is one of a dozen or so child attributes of a larger blob message May 25 11:17:52 ok May 25 11:18:22 ah, you're curious why other attr-s don't trigger the same bug? May 25 11:18:41 so if we'd construct a blobmsg solely consisting of "core": "unlimited" and checking this using blobmsg_check_array_len(), would it overread too? May 25 11:18:45 yes May 25 11:20:05 rmilecki: I mean if blobmsg_check_array_len() is always overreading, why isn't crashing random stuff left and right all the time? May 25 11:20:56 or is it just overreading one-element arrays and simply skipping the last element of multi-element-array maybe? May 25 11:22:12 actually I can't find any user for it in LXR May 25 11:22:33 blobmsg_check_array_len() seems to be used by blobmsg_check_attr_list_len() but I cannot find any user of that one May 25 11:22:46 apart from the fuzzing tests May 25 11:24:32 ah nvm, blobmsg_check_array(), not blobmsg_check_array_len() May 25 11:24:36 sigh May 25 11:26:13 okay, so there's no known users for the *_len() variant and one single user for the non-len blobmsg_check_array() variant and that happens to be procd's instance.c May 25 11:27:32 it seems so, I'm checking that JSON now May 25 11:27:39 that one uses blobmsg_check_array() quite frequently for whole bunch of attrs: https://lxr.openwrt.org/source/procd/service/instance.c#L1149 May 25 11:28:00 I wan't to understand the whole picture May 25 11:28:38 my assumption is that the crash only happens if one of INSTANCE_ATTR_{ENV,NETDEV,FILE,LIMITS,ERROR} is the last child element of a service blobmsg May 25 11:30:20 jow: https://pastebin.com/PUuqu161 May 25 11:31:09 that JSON doesn't have ATTR_ENV, ATTR_NETDEV, ATTR_FILE or ATTR_ERROR May 25 11:31:27 so there is only 1 instance_fill_array() and so only 1 blobmsg_check_attr_list() call May 25 11:31:37 any other call would probably fail as well May 25 11:35:14 okay, so blobmsg_check_array_len() receives an overall length which is too large, right? May 25 11:35:23 right May 25 11:35:44 it is invocated through https://lxr.openwrt.org/source/libubox/blobmsg.c#L115 May 25 11:36:34 shouldn't the blob_len() expression get fixed there instead? May 25 11:36:55 it has to be blob length that gets passed May 25 11:37:10 as blobmsg_check_array_len() needs to access header and data May 25 11:37:16 the length of the outer containg blob including header? May 25 11:37:26 *containing May 25 11:38:21 don't know May 25 11:39:31 it seems to be underspecified May 25 11:39:49 not a good sign for new security enahcned api May 25 11:39:58 https://lxr.openwrt.org/source/libubox/blobmsg.c#L115 May 25 11:40:38 line 167 seems to be imply that the length should be overall, so not payload length but attr header + payload length (and padding?) May 25 11:47:51 jow: i have simple test case May 25 11:47:54 ubus call service add '{ "name": "foo", "instances": { "foo": { "command": [ "/bin/true" ], "limits": { "core": "unlimited" }, "x": 5 } } }' May 25 11:48:09 (run it on OpenWrt without my libubox commit) May 25 11:48:47 while parsing "limits" (!!!) "x" gets read May 25 11:49:25 okay, that's what I was expecting May 25 11:49:38 blobmsg_check_attr_list() -> blobmsg_check_array() should read ONLY "core" as there are no more attrs in THAT blob May 25 11:49:45 yep May 25 11:49:46 but it reads "x" and value 5 too May 25 11:49:56 which results in finding BLOBMSG_TYPE_STRING and BLOBMSG_TYPE_INT32 May 25 11:50:05 and failed validation (2 different types) May 25 11:50:40 so your latest fix makes sense... it just looks/feels odd because it pokes at internals instead of using some kind of accessor or macro May 25 11:51:24 I'm surprised that there's no clean blob_() kind of macro May 25 11:51:29 ynezz pointer blobmsg_data_len() - let me check that May 25 11:51:41 no that one is slightly different I'm afraid May 25 11:52:07 oh my May 25 11:52:28 it does the same pointer calculation but adds the result to blob_len(attr) May 25 11:52:42 yes I see now May 25 11:53:29 what is <((uint8_t *)blobmsg_data(attr) - (uint8_t *)blob_data(attr))> calculating exactly? May 25 11:53:48 header length May 25 11:53:49 the start offset of the payload data? May 25 11:55:39 I was thinking of blobmsg_hdrlen() but that does something entirely different, of course May 25 11:56:07 lol May 25 11:56:23 my brain is saying "no" at this point :D May 25 11:58:28 however I think you can simply use instead of May 25 11:59:03 since blobmsg_data(attr) also already trusts length information in the blob attribute header May 25 11:59:33 see https://lxr.openwrt.org/source/libubox/blobmsg.h#L77 May 25 12:00:31 actually I'm not sure if that introduces yet another security issue May 25 12:03:05 an invalid headerlength would result in the pointer returned by blobmsg_data() to be incremented/decremented by -65535..65535 bytes May 25 12:03:19 *by at most May 25 12:04:26 this might result in a too large "rem" result which will then cause overreads again May 25 12:07:24 i see May 25 12:08:45 I think you need to store the results of <(uint8_t *)blobmsg_data(attr)> and <(uint8_t *)blob_data(attr)> in intermediate variables and perform further checks May 25 12:08:51 is there a check for blobmsg data length (claimed by header content) not being biggern than blob length? May 25 12:09:12 for example ensure that the result of blobmsg_data(attr) is greater than that of blob_data(attr) May 25 12:09:58 and that it is smaller than May 25 12:10:21 no, I didn't find such a macro / check function May 25 12:13:32 rmilecki: I mean something like that: https://pastebin.com/0Qq9qTCf May 25 12:41:56 is that possible to simplify that blob vs. blobmsg mess? May 25 12:43:51 I wish I knew May 25 12:44:55 I don't understand the distinction well enough to judge the feasibility May 25 12:45:17 i just learned something like blobmsg_buf_init() exists May 25 12:49:08 blobmsg_buf_init() is the only function calling blob_buf_init() with non-zero id May 25 12:49:13 blobmsg_buf_init() is used nowhere ;) May 25 12:51:56 I think a blobmsg is a structure nested into a blob May 25 12:52:42 a blob is id_len (8 bits type (aka ID), 24 bits length) + data payload May 25 12:53:27 a blobmsg follows the id_len and consists of a name_len (16bit) and a variable length name string May 25 12:53:53 any payload data then follows the variable length name (32bit aligned) May 25 12:54:37 I think the extra set of blobmsg_*() apis is needed to handle the blob attributes while somehow transparently dealing with the name TLV squeezed in front May 25 12:55:13 due to that, there's different notions of "payload" May 25 12:55:43 a blob payload is simply the data after the id_len member (pointing to the start of the blobmsg hdr) May 25 12:56:02 a blobmsg payload is the actual payload after the id_len member, the namelen member and the variable length name May 25 12:56:23 I guess that when working with data from JSON, everything is a blobmsg May 25 12:57:02 question is if there's actual low level blob (non-blobmsg) users May 25 12:57:12 maybe it makes sense to get rid of that lower level api May 25 12:59:14 but then there's also something like an extended blob (?) May 25 13:00:41 oh my, it's actually used somewhere (blobmsg_format_json_with_cb) May 25 13:01:19 ah nvm, I think BLOB_ATTR_EXTENDED is a blob flag which specifies whether a blobmsg is contained May 25 13:02:06 right, i see it now May 25 13:03:18 I find the term blobmsg confusing May 25 13:03:49 blob_with_name or blob_named or namedblob would be more clear May 25 13:04:17 blobmsg somehow implies to me that it is some kind of framing structure, like the outermost blob attribute with some additional metadata May 25 13:05:04 likewise blob_is_extended() should be blob_contains_blobmsg() or blob_is_named() or similar May 25 13:05:37 ack May 25 13:05:52 i started looking at what uses blob_for_each_attr https://lxr.openwrt.org/ident?i=blob_for_each_attr May 25 13:06:03 i noticed fstools/blockd.c which is interesting May 25 13:07:35 in ubus method handler is receives blobmsg, it copies it using memcpy(...., blob_raw_len(msg)) and then it suddenly trears it as blob (not msg)? May 25 13:07:58 i need a break, i'm going for a dinner May 25 13:11:27 I believe you can treat any blobmsg as blob May 25 13:11:37 at least as long as you do not interpret its payload May 25 13:55:11 ynezz, rmilecki: i noticed that the various length checks in the blobmsg code are quite broken May 25 13:55:25 i'm almost done cleaning up the code May 25 14:12:48 nbd: great May 25 14:20:34 jow: thanks for your blob vs. blobmsg description, it mostly confirmed what I understood May 25 14:20:53 jow: one thing I don't undesratnd is why do we need/have duplicated types (ids) May 25 14:21:21 why blobmsg uses its own types instead of just reusing blob ids? May 25 14:21:31 they are *mostly* the same May 25 14:22:12 just to separate blobmsg implementation from blob? so blob doesn't know about blobmsg specific types? May 25 14:22:23 seems like some overengineering May 25 14:25:04 rmilecki: unfortunately I don't know May 25 14:48:14 that was fun ... May 25 14:48:26 * blogic starts working on the device manager May 25 14:49:41 someone reported an ath10k-ct bug against 19.07.03, and it is using quite old firmware. Any chance someone could backport ath10k-ct to stable? I haven't made any scary changes lately, so it has mostly been bugfixes since the December 2019 image that is in stable. May 25 15:07:38 nbd: can we merge blob and blobmsg? e.g. make blob_get_*() check BLOB_ATTR_EXTENDED and handle blobmsg header if present May 25 15:08:07 jow: any opinion on that? May 25 15:09:42 (that's just some first idea to start with sth) May 25 15:12:01 not sure if that makes sense, they're used differently May 25 15:12:06 blob is indexed by id May 25 15:12:15 in blobmsg, the id specifies the type and fields are indexed by name May 25 15:18:07 nbd: speaking of id (aka types), why blobmsg has its own types? it seems it could use blob id for the same purpose\ May 25 15:19:48 blob doesn't have array/table May 25 15:20:12 which could be added I guess May 25 15:21:17 blob types are also not stored in the data, they're used for validation purposes only May 25 15:23:55 well, blobmsg also doesn't store type in *its* data May 25 15:24:42 nbd: i'm afrad it's intuitive for you only ;) May 25 15:24:50 no offence May 25 15:27:34 if i had designed it today instead of 10 years ago, i probably would have done a lot of things differently ;) May 25 15:32:55 nbd: what about refactoring it, assuming someone can find time for that? May 25 15:33:06 nbd: could we refactor existing code somehow into cleaner one? May 25 15:33:24 nbd: or should we use alternative and translate between two when needed? May 25 15:33:50 do you know any existing project that could be used in place of blob(msg)? May 25 15:33:55 i'd be fine with it if somebody picks up that task. but we need to be careful that the benefits of the cleaner code strongly outweigh the cost of API breakage, forcing people to re-learn, regressions, etc. May 25 15:35:08 i'm not aware of a suitable replacement, though i haven't looked in a long time May 25 15:35:31 nbd: a short description how improved desgin could look like? May 25 15:35:39 i'm just wondering at this point May 25 15:35:46 i'm not starting to code anything yet ;) May 25 15:36:27 maybe start with making a prioritized list of where you see the biggest flaws you'd like to see fixed May 25 15:38:16 nbd: #1 is not knowing if blob_attr is blob or blobmsg, that's why I thought about shared reading functions May 25 15:39:05 well, blobmsg is a layer around blob May 25 15:39:39 so i don't think the blob api should deal with blobmsg specifics directly May 25 15:39:42 [11:45:35 CEST] rmilecki: sorry, the blob stuff is an incomprehensible maze to me May 25 15:39:43 [11:45:52 CEST] never sure whether I am dealing with a blob, a blobmsg, what len to use etc. May 25 15:39:55 blobmsg_len functions also work on plain blobs May 25 15:39:59 same for data May 25 15:40:19 nbd: what do we need raw blob then, instead of implementing blobmsg? May 25 15:40:30 do we have some real users that can't use blobmsg? May 25 15:41:00 raw blob is for things where you want msgs to be more tightly packed without carrying a name for every field May 25 15:41:03 a more low level api May 25 15:41:19 nbd: right but do we really need that? May 25 15:41:23 is it really user anywhere? May 25 15:41:27 ubus uses it internally May 25 15:41:47 i have some other projects here and there that use it May 25 15:41:53 including as a wire format May 25 15:42:02 where adding names to the mix would add more network bloat May 25 15:42:56 the format semantics are comparable to netlink May 25 15:44:25 well, i don't see how to improve it then May 25 15:44:46 if we need raw blob for optimization purposes May 25 15:44:56 we don't really want blob to be aware of blobmsg May 25 15:45:07 that propably puts us exactly where we are May 25 15:45:22 ok, i'll probably forget about it once that validation is fixed May 25 15:46:22 i guess one simple way to improve things is to rework users of blobmsg to start using blobmsg_buf_init and blobmsg_len consistently instead of using the blob api May 25 15:46:50 to remove sources of confusion about blob vs blobmsg May 25 15:48:54 btw. i sent the validation fixes to the list May 25 15:50:20 they also make the code a bit smaller and easier to read and understand May 25 15:50:22 in my opinion May 25 15:51:03 thanks, looking at it now May 25 15:51:42 using blobmsg_ consistently is a good idea and I already started looking at code that uses raw blob api for no good reason May 25 15:54:00 nbd: regarding 1/3, we made a loop :D see https://git.openwrt.org/?p=project/libubox.git;a=commitdiff;h=cd75136b1342e1e9dabf921be13240c6653640ed and https://git.openwrt.org/?p=project/libubox.git;a=commitdiff;h=75e300aeec25e032a9778bea34c713969960d1f0 May 25 15:58:32 :) May 25 15:59:31 the use of the len fields was wildly inconsistent in the validation code May 25 15:59:38 so no wonder it got switched back and forth a few times May 25 16:02:09 i just hope your change doesn't regress ynezz's case that made him switch away from blob_raw_len() May 25 16:04:06 i ran the full test suite and the fuzzer May 25 16:04:58 the switch away from blob_raw_len was because before your change the function assumed it was getting the data length instead of the attribute length May 25 16:05:10 your change cleaned that up, but didn't update that callsite May 25 16:05:12 ah, that makes sense May 25 16:08:42 fwiw test-fuzz passed for me locally too May 25 16:10:50 interesting May 25 16:10:57 i could easily reproduce the fuzz failure May 25 16:19:08 ynezz: please let me know if the libubox changes work for you as well May 25 16:20:55 nbd: me too May 25 16:21:01 nbd: it passes for me with your patches May 25 16:49:43 look all good, thanks nbd May 25 16:49:47 *looks May 25 17:15:33 hey zorun looks like in 19.07.3 release is bugfixed FS#2848, maybe it should appear in https://openwrt.org/releases/19.07/notes-19.07.3 May 25 17:56:52 jow: what’s the issue with strlcpy() or strlcat() ? May 25 19:08:36 who has access to the buildbot configs? somebody needs to update mpc85xx/generic to mpc85xx/p1010 May 25 20:04:23 anyone knows what's up with libfuse in the packages feed? I'm getting multiple warnings like this `WARNING: Makefile 'package/feeds/packages/davfs2/Makefile' has a dependency on 'libfuse', which does not exist` May 25 20:04:35 and I've just updated and install -a May 25 20:12:28 and here we go again. build failed on wolfssl, which I did not even enabled May 25 20:12:31 ffs May 25 20:14:33 stintel: install -f May 25 20:16:11 stintel: https://github.com/openwrt/packages/issues/12291 May 25 20:17:18 https://gist.github.com/799ff6988bba58ec3773eac6e78faebb May 25 20:18:03 no idea about that. May 25 20:18:13 but it shouldn't even be built May 25 20:18:24 aclocal.real: error: aclocal: file '/home/stijn/Development/LEDE/source/staging_dir/target-x86_64_musl/host/share/aclocal/glib-gettext.m4' does not exist May 25 20:18:29 this is a plain router without wifi, so don't need it for hostapd or whatever May 25 20:18:32 missing gettext? May 25 20:19:04 and afaict there is nothing that has wolfssl in PKG_BUILD_DEPENDS May 25 20:19:17 or git grep is not showing it May 25 21:01:28 nbd: i know what I'd like to see - struct blobmsg_attr May 25 21:01:50 or maybe something like struct namedblob_attr as suggested by jow May 25 21:02:13 that would make it clear what API functions should be used May 25 21:27:55 guifipedro: thanks, added May 25 21:28:42 guifipedro: not everything can go there, you can always find all changes and bug fixed in https://openwrt.org/releases/19.07/changelog-19.07.3 May 25 23:13:39 build #128 of bcm47xx/generic is complete: Success [build successful] Build details are at http://buildbot.openwrt.org/master/images/builders/bcm47xx%2Fgeneric/builds/128 **** ENDING LOGGING AT Tue May 26 02:59:57 2020