**** BEGIN LOGGING AT Mon Dec 16 03:01:54 2019 **** BEGIN LOGGING AT Mon Dec 16 07:26:31 2019 Dec 16 09:18:48 Please kindly assist to get Certification of non-use for < Red phosphorus> Dec 16 09:18:57 MPN: BBONE-BLACK-4G Dec 16 09:19:17 nag68: ask your distributor/manufacturer Dec 16 09:19:42 please provide contact details Dec 16 09:20:26 I'm sure you can find those on your distributor/manufacturer's website Dec 16 09:20:33 that's where I'd look for them anyway Dec 16 09:21:19 this is a community support channel for the beaglebone, we don't have any particular involvement with any of the manufacturers or distributors (there are multiple) Dec 16 15:17:07 m Dec 16 15:33:35 Moanin! (here) I've been trying to use the watchdog timer on a Black, and it's working perfectly. Problem is every time I kick it the system log gets a line that says "watchdog didn't stop!". But it does kick it. I've been scouring the web for a number of hrs and haven't found much. Dec 16 15:35:07 My recollection is it didn't do that when I started down this path; in that instance I was opening a fd and using ioctl to do everything, but that left the watchdog inaccessible. I changed it to close the fd after each touch and this started to appear. Dec 16 15:43:19 This is exactly the case. I just put the code back to doing ioctl on a fd that's left open for the entire run, and nothing in the system log. Any idea why closing the fd, then accessing the watchdog later, causes this message? Dec 16 16:23:25 It doesn't seem to matter if I use ioctl or system(wdctl) to set the timeout. If what I do closes the device it work perfectly but logs stop failures each time I kick it. Dec 16 16:26:31 I'm on an older kernal b/c it's a level of effort to change that for this appliance. Looks like maybe it's up to 4.14 or so? I saw one whole ref to this being a bug in < 4.8, and comment to that ref saying >= 4.8 didn't actually fix it. Dec 16 16:47:01 Ragnorok: the watchdog fd should normally never be closed Dec 16 16:47:41 also, note that if you're using systemd, you'd normally let it manage the watchdog (if you want a watchdog on individual services, systemd offers that too) Dec 16 16:48:22 the "watchdog didn't stop!" is a warning that the watchdog fd has been closed while leaving the watchdog running Dec 16 16:49:02 Interesting. If I don't close it, it's locked by my process, which isn't a service. Dec 16 16:49:29 if your process isn't a service, how can it be responsible for the watchdog? Dec 16 16:49:42 If I never run my process (that might keep it open) then it's accessible as expected. Dec 16 16:49:59 what do you mean "accessible" ? accessible by whom, for what? Dec 16 16:50:15 normally only one process will ever touch the watchdog Dec 16 16:50:29 I dnn't see the connection. I can manage the watchdog using CLI w/o requiring a service. When I say "accessbile", I mean by CLI / anything that's not my process. Dec 16 16:51:27 that sounds like a hack, presumably it exists to offer a way to kick the watchdog before the watchdog service is running (e.g. in initramfs) or maybe for debugging, I don't know Dec 16 16:51:54 normal use for the watchdog is that one process opens the fd and kicks it periodically Dec 16 16:52:06 most commonly that process would be either systemd or watchdogd Dec 16 16:52:12 I don't either. Hours of scouring the web have only yielded info on using /dev/watchdog essentially from CLI. Nothing about services. Dec 16 16:52:29 I've never heard of it being used from the CLI, that's news to me Dec 16 16:52:47 Well ioctl or cli. I view them as similiar from my code's perspective. Dec 16 16:52:59 no they're a huge difference Dec 16 16:53:08 cli necessarily means open and close Dec 16 16:53:24 as opposed to the normal pattern of leaving it open Dec 16 16:53:28 Also nothing about close is bad wrt /dev/watchdog. Dec 16 16:53:55 not sure, iirc it depends on kernel config whether that will stop the watchdog or not Dec 16 16:54:01 and if it doesn't, you'll get that warning in the kernel log Dec 16 16:54:02 It seems like I should write a service, then, and have my process tell that service to kick it? Dec 16 16:54:46 If systemd is doing something along those lines, nothing has appeared regarding that, either. Dec 16 16:55:07 In fact systemd hasn't appeared at all regarding searching for watchdog interaction. Dec 16 16:55:13 just set RuntimeWatchdogSec=30s or something like that in /etc/systemd/system.conf Dec 16 16:55:20 Except to say it's involved at boot & shutdonw. Dec 16 16:55:30 But how do I kick it, then? Dec 16 16:55:37 systemd will kick it Dec 16 16:55:58 How do I have it reboot if my process crashes? Dec 16 16:56:08 configure in your service config file Dec 16 16:56:17 My process isn't a service. Dec 16 16:56:44 very much sounds to me like it should be Dec 16 16:56:56 otherwise it has no business rebooting the system Dec 16 16:57:23 lol Dec 16 16:58:12 if it runs as a systemd service you ensure that if your service crashes it will be restarted or (at your option) the whole system will be restarted Dec 16 16:58:17 *you can ensure Dec 16 16:58:42 I'll see how ugly is it to change it. I dno't really care if it's a service or not. It's not a service b/c scripting and debug are vastly simpler if I don't have more automagic in the way. Dec 16 16:59:18 having systemd manage the watchdog also ensure that you can have multiple services that need to be kept alive... which you can't do with the hardware watchdog Dec 16 16:59:48 what's your service written in? Dec 16 17:00:09 My process is written in C++. Dec 16 17:00:19 ah because you mentioned scripting, hence I wondered Dec 16 17:00:48 I use scripting to do everything, as was done in days of yore before all this automagic got crammed in. Dec 16 17:00:52 if it were nodejs then I would have had a handy module I wrote => https://github.com/dutchanddutch/node-sd-daemon#service-watchdog Dec 16 17:02:00 Everything ofc being "job control" or whatever you want to call it. The process is simpleton CLI application that just runs. Dec 16 17:02:38 sounds like I'm going to be a happier person the less I know about it ;) Dec 16 17:03:21 anyway, you're still free to open the watchdog manually, just don't close it Dec 16 17:04:46 so wait... if your process isn't a service, how is it started again automatically after reboot? Dec 16 17:04:50 Perhaps. Either way I'll see about doing the service thing, since that's what it takes to make it work how I want. Leaving it open doesn't really work b/c troubleshooting can be very difficult if I can't turn the watchdog off interactively. Dec 16 17:04:54 since a watchdog is pretty useless without that Dec 16 17:05:02 rc.local ofc. Dec 16 17:05:12 in other words it *is* a service, just a horrid one Dec 16 17:05:34 Horrid is a bit strong. That's how things used to be done. Dec 16 17:05:42 why not take whatever you stick into rc.local into a separate systemd service? Dec 16 17:05:52 (note that rc.local is also just another systemd service) Dec 16 17:06:00 That's what I was going to look into doing, to make a "a service". Dec 16 17:06:12 hold on, I have a nice template Dec 16 17:06:21 https://pastebin.com/KXVdTNrL Dec 16 17:06:40 Thanks! I'll work on this and see what happens. Dec 16 17:16:48 Ragnorok: https://pastebin.com/RYfr9xfh some additional useful directives for you Dec 16 17:17:02 Roit toe. Thanks again! Dec 16 17:17:29 you can also have systemd just restart your service, and fall back to rebooting the system if your service fails too many times within a configured interval Dec 16 19:08:22 What causes the limit on simultaneous (virtual) serial connections that can be open? after 6 it will throw an exception. Using Python's pyserial library and 12 devices connected on a 12-port USB hub. Dec 16 19:08:42 hmm, there shouldn't be any Dec 16 19:09:05 always happens. i'd need to juggle the connections. Dec 16 19:09:19 what's the error you get? Dec 16 19:09:32 also, it doesn't happen on the pi4. haven't tried a 3 Dec 16 19:09:36 uh, let me dig that up! Dec 16 19:13:20 will take me a while to do it again, of course i didn't put it into my notes.. Dec 16 19:14:04 when asking for help with an error, it's generally useful to have the actual error at hand :P Dec 16 19:16:12 ha, yes my bad: Dec 16 19:16:14 Traceback (most recent call last): Dec 16 19:16:15 call last): Dec 16 19:16:18 serial.serialutil.SerialException: [Errno 5] could not open port /dev/ttyACM6: [Errno 5] Input/output error: '/dev/ttyACM6' Dec 16 19:16:32 full error when hitting the 7th device Dec 16 19:16:59 import serial Dec 16 19:17:08 curious, anything in kernel log? Dec 16 19:18:38 while i consider myself knowledgeable with *nix, i don't know a lot about checking the logs! Dec 16 19:18:46 tail -f /var/log/kern.log ? Dec 16 19:19:18 dunno, I'd generally use journalctl -k -f Dec 16 19:20:15 awesome, thanks for the command. this looks like IRQ error Dec 16 19:20:25 if you have a syslogd that writes to /var/log/kern.log then you can also consult that file, but I haven't used any syslogd in many years so I'm not sure what it writes where Dec 16 19:20:39 gotcha Dec 16 19:20:39 can you share the error via pastebin.com or similar paste service? Dec 16 19:23:00 both kernel and python errors are above but i'm working on that Dec 16 19:23:17 ? Dec 16 19:23:47 https://pastebin.com/vuVKjR9f Dec 16 19:27:46 is that what you meant..? Dec 16 19:27:55 the last two lines yes Dec 16 19:28:44 okay so it's about musb running out of endpoints... which makes sense, musb is a small usb otg controller, it's not really a full-featured usb host controller. though I wonder why the rpi wouldn't be affected, I thought it also used musb? Dec 16 19:29:10 Ya that seemed curious to me. does pi4 use two of those or something? Dec 16 19:29:46 based on what I'm reading, apparently some kind of multiplexing should kick in when it runs out of endpoints, although then the question becomes why that isn't happening Dec 16 19:31:56 got a link to that kind of documentation? i'm not terribly familiar with the BBB. just realized i didn't mention it was a Black but either they're similar or you figured that out! Dec 16 19:32:32 http://e2e.ti.com/support/processors/f/791/t/778528?Linux-AM3358-MUSB-driver-and-endpoint-usage was the first google hit Dec 16 19:35:37 which kernel version are you using? Dec 16 19:37:48 uname -r Dec 16 19:38:14 ? Dec 16 19:38:37 latest stretch IoT version? Dec 16 19:38:48 you just said 20:37 < rbw> uname -r Dec 16 19:39:50 i downloaded the latest stretch IoT version to card and did a number of updates. but not sure how to verify kernel version directly Dec 16 19:40:14 with uname -r Dec 16 19:40:35 ok, so i'm trying to say i wouldn't doubt that i screwed something up.. Dec 16 19:40:39 but okay, so probably 4.14 Dec 16 19:40:53 no I see no reason to assume that Dec 16 19:41:16 I'm just trying to understand why this endpoint allocation scheme is failing Dec 16 19:42:32 though in some sense the core issue is just overburdening of the simplistic musbhdrc controller, which was never meant to handle that many devices at the same time Dec 16 19:43:38 ya, that's probably just it. i've been exploring it as an embedded controller for some usb devices. workaround may have to be to juggle the connections. Dec 16 19:45:21 still, the kernel has a workaround that should allow an arbitrary number of bulk endpoints... it just doesn't seem to be working for some reason Dec 16 19:46:30 hmm, does CDC-ACM use an interrupt endpoint? lemme check Dec 16 19:51:13 rbw: could you dump the descriptors of one of your devices with lsusb -D /dev/bus/usb/001/xxx (where xxx is the device number as listed in plain "lsusb") Dec 16 19:52:39 sure, one minute here.. Dec 16 19:53:50 zmatt can I send it to you directly? Dec 16 19:54:06 I think I understand what's happening... all available endpoint pipes are occupied by bulk endpoints, and subsequently the allocation for an interrupt endpoint fails. that also explains the workaround in that E2E thread of preemptively forcing bulk endpoints to use the shared EP1 Dec 16 19:54:15 just use pastebin or similar paste service Dec 16 19:56:07 https://pastebin.com/UJ0trYpi Dec 16 19:56:39 yeah, bulk in, bulk out, interrupt in Dec 16 19:58:22 well i'm impressed! I have much to learn.. does that mean it is expected behavior for this type of device? Dec 16 19:58:32 so the workaround in that thread might work for you too... although the guy said he ended up having to limit the endpoint-sharing to only bulk-in (so each bulk-out would still use its own dedicated musb endpoint)... so I don't know if you'd still run into problems with 12 ACM devices Dec 16 20:00:12 OK. it makes sense on some level, but it's an area i'm not too familiar with! Gives me a lot to go by though. Thanks! Dec 16 20:00:32 like I said: the core issue is just that the musbhdrc controller isn't designed to handle that many devices at the same time Dec 16 20:01:53 using more than 4 ACM devices already involves workarounds in the kernel Dec 16 20:02:44 Trying to ask intelligently, it is designed for no more than 6 USB devices? which are occupied when opening them as virtual serial devices? Dec 16 20:03:16 i know as much that there is some magic to map USB to the ACM/virtual serial devices.. Dec 16 20:03:58 it supports 15 endpoints simultaneously (excluding control endpoints). Every usb serial (CDC-ACM) device uses 3 endpoints Dec 16 20:04:18 it has nothing to do with ACM specifically, it's purely about the number of endpoints used Dec 16 20:04:32 the hub also uses an endpoint Dec 16 20:05:31 journalctl is very annoying. When I do journalctl -f it keeps dumping every log possible until it gets to the bottom, then it does --Reboot-- and shows a paltry ten lines of the end of the log. Is that normal? Dec 16 20:05:46 Ragnorok: ehh what? Dec 16 20:05:56 @zmatt ok, that makes sense. although i do get to open 6 with the hub. Anyway, this is awesome, thanks for your time!. Dec 16 20:06:16 rbw: like I said, the kernel already has workarounds to be able to support more than it otherwise could Dec 16 20:06:49 ah, just not as many as i hit.. Dec 16 20:06:51 and it is apparently possible to further raise the limit somewhat using a kernel patch (at expense of performance) Dec 16 20:07:17 gotcha. thanks again! Dec 16 20:07:24 but 12 ACM devices would really be pushing it Dec 16 20:07:42 Ragnorok: that sounds utterly bizarre Dec 16 20:07:48 Since I'm forced to use this goofy systemd thing I'm also using its log, which means I have to use journalctl to see it. It's always been a massive PITA. Unless there's a way to run a service CLI w/o using systemd start, I have to use journalctl -f to see what it's doing, and it never worked right. Dec 16 20:08:18 It also routinely fails to retain logs across boots, even though I have it configured to do so, and the dir it creates has content. Dec 16 20:08:27 Ragnorok: I mean, that's not any different from being started by rc.local Dec 16 20:08:41 weird, journal works quite well for me Dec 16 20:08:52 Logs have never worked with that launch means either. Dec 16 20:09:36 I have a logging wrapper that uses systemd for logging and it's fine for the current session. Usually. Prior boots are never complete. Dec 16 20:09:53 while doing clear reboots? Dec 16 20:09:57 *clean Dec 16 20:10:13 also, what version of systemd are you using? Dec 16 20:10:22 But once in a while it simply doesn't log things. It's not rate limit, or if it, it's not telling me. Clean being normal shutdown and power cycle? Yes. Dec 16 20:10:58 clean being software-requested shutdown yes (as opposed to unceremoniously cutting power) Dec 16 20:11:20 if it's ratelimiting it will tell you Dec 16 20:11:23 Right. I always do a clean reboot unless I have no choice. Dec 16 20:12:13 the journalctl -f thing sounds really odd to me, it should basically work just like tail -f Dec 16 20:12:21 and I've never seen it do otherwise Dec 16 20:12:26 I'm not clever enough to know what systemd version I'm using. Let me go look. Dec 16 20:12:34 I always see it do otherwise. It's nighmarish. Dec 16 20:12:38 systemctl --version Dec 16 20:13:08 I'd expect that to tell me the version of systemctl, not systemd. Dec 16 20:13:13 those are the same Dec 16 20:13:32 It's Linux. I would never assume that. Says 215. Dec 16 20:13:48 they're in one package Dec 16 20:14:14 v215 ... release july 2014, ok Dec 16 20:14:30 so, that might be a bit old :P Dec 16 20:15:24 Everything here is a bit old. It's that updating an appliance problem. I don't have three weeks to fix all the broken packages right now. I did try a kernel update earlier today and it was the expected heinous tragedy. Dec 16 20:15:40 holy cow that's old... that's not even the latest version of debian jessie Dec 16 20:15:46 *for Dec 16 20:16:07 I just need a watchdog to work on thing that been working flawlessly for months. That's it. Dec 16 20:16:28 I mean, yeah, if I strain myself to think back that far then yeah maybe journal still had issues back then Dec 16 20:16:46 But using a service, and journalctl, debugging is nightmarish. Dec 16 20:16:46 so just use the watchdog manually like you were doing... just don't close it Dec 16 20:17:09 I don't even know if service watchdogs were a thing yet in such an ancient version of systemd Dec 16 20:17:59 I mean, journalctl has worked fine for years, but maybe not *that* many years, I don't really quite remember Dec 16 20:18:00 It seems to build find. It did log a couldn't stop the one time it seemed like it loaded, but I haven't yet determined where in the code it was when that happened. Dec 16 20:18:16 couldn't stop = fd closed Dec 16 20:18:57 My memory is it was better at some point in the past, but it's never really be super great at retaining logs across boots. It gets ... fragmented. Syslog was much more reliable. Dec 16 20:19:56 you're using an ancient version, I'm inclined to blame any and all problems you're having with it on it simply being an ancient version unless you can reproduce them on an uptodate version :P Dec 16 20:20:25 If I just leave it open like I was, then if anything at all goes wrong at any time, I can't kill the process manually without the watchdog resetting the system. Dec 16 20:20:38 That is the simplest course of action. Dec 16 20:21:10 I mean, that's how a watchdog is supposed to work Dec 16 20:21:23 there's no difference between manually killing a process and it crashing Dec 16 20:21:34 Had I know it would take three days to fail to get a simple watchdog timer working I may have spend that time trying to update it. Dec 16 20:22:03 you can still avoid a system reset if you restart the service (or stop the watchdog) soon enough after killing your process Dec 16 20:22:09 If I have access to the watchdog external to the process I can do a kill ; echo 'V' and stop it from rebooting if that's what I need to happen. Dec 16 20:22:24 echo 'V' ? Dec 16 20:22:42 if you kill the process, the fd is closed hence you'll be able to access the watchdog again Dec 16 20:23:03 echo 'V' > /dev/watchdog shuts it off. My xp is that doesn't work cleanly or I wouldn't be worried about it. Dec 16 20:23:07 (do make sure you don't leak the fd to additional processes if you spawn any, make sure it's close-on-exec) Dec 16 20:23:40 Perhaps I've been opening it incorrectly, then. Dec 16 20:23:57 is the kernel configured to allow the watchdog to be stopped? Dec 16 20:24:03 It is now. Dec 16 20:25:27 Lastest madness is systemd is trying to CHDIR to my script file, not the WorkingDirectory I set. Or that's what the log looks like. Dec 16 20:25:29 anyway, I don't really have any experience with dealing with the watchdog since systemd takes care of that for me. it does evidently close and reopen it when systemd reexecs (based on the warning in kernel log), so that at least seems to work fine Dec 16 20:25:54 Ragnorok: what does the log say? Dec 16 20:26:14 you have the strangest problems... Dec 16 20:26:27 systemd[802]: Failed at step CHDIR spawning : No such file or directory. Dec 16 20:26:46 I do. It's my computer karma. I routinely have problems that no one else has heard of. Dec 16 20:26:54 okay so it's not trying to chdir to your script file Dec 16 20:27:19 it's saying it's failing to chdir() as part of spawning your script Dec 16 20:27:31 i.e. WorkingDirectory is set incorrectly Dec 16 20:27:58 don't forget to systemctl daemon-reload after changing the .service file. (current systemctl will warn you about this, but dunno if a 4.5-year old one does) Dec 16 20:28:25 5.5-year sorry Dec 16 20:28:56 WorkingDirectory= and ExecStart=/