**** BEGIN LOGGING AT Fri Sep 22 03:00:00 2017
**** BEGIN LOGGING AT Fri Sep 22 06:00:49 2017
Sep 22 20:48:30 pabs3: ping
Sep 22 20:50:46 aaah oh wow, mail trickling in
Sep 22 22:13:54 pabs3: mail.om is dog slow, something prolly hogging the system
Sep 22 22:14:30 we had same effect several years ago already
Sep 22 22:14:42 can't recall what been the culprit back when
Sep 23 02:19:59 DocScrutinizer05: looks like massive load from a lot of web processes
Sep 23 02:20:22 load average: 112.52, 109.11, 111.62
Sep 23 02:21:04 eeek
Sep 23 02:21:11 io?
Sep 23 02:21:21 or real cPU
Sep 23 02:21:27 CPU*
Sep 23 02:21:42 maybe check SMART
Sep 23 02:22:18 aha, also there is a RAID recheck going on
Sep 23 02:22:21 or stop apache temporarily?
Sep 23 02:22:37 aaah that explains it, almost suspected that
Sep 23 02:22:44 apache stopped already
Sep 23 02:23:36 I would also suspect one of the RAID drives has a nasty SMART report
Sep 23 02:25:15 though last (and only) time I had trouble with my RAID1, in the end a 50ct worth power Y-cable was the culprit
Sep 23 02:26:01 SMART seems fine
Sep 23 02:26:03 it shown as sdb having more power cycles in SAMRT than sda
Sep 23 02:26:32 nothing in syslog during last 3 days?
Sep 23 02:27:14 maybe the controller
Sep 23 02:28:04 actually, what's a recheck?
Sep 23 02:29:23 scans the raid1 to make sure both disks agree
Sep 23 02:29:46 [=>...................] check = 6.7% (72258624/1073740664) finish=403.7min speed=41336K/sec
Sep 23 02:29:52 resync=DELAYED
Sep 23 02:33:43 so a frankenstein fsck
Sep 23 02:34:28 is that stuff scheduled, or maybe after N boots like fsck in fstab?
Sep 23 02:34:42 I mean, what triggered it?
Sep 23 02:34:52 the reboot?
Sep 23 02:36:19 that was like >49h ago already, and the recheck is at 6.7%
Sep 23 02:36:27 40*
Sep 23 02:39:37 there is a cron job in mdadm and also a bunch of RAID1 devices
Sep 23 02:40:05 one more RAID1 to check after the current one too
Sep 23 02:40:29 it is possible the reboot triggered a check somehow too
Sep 23 02:41:23 aiui those checks should run in background and not slow down normal functional IO, no?
Sep 23 02:42:11 IOW the job runs only while system-IO idle
Sep 23 02:42:31 and gets suspended for all normal IO activity
Sep 23 02:43:27 which would explain why it's at 7% after 40h, *if* there was lots of normal system load like actually somebody hogging HTTP
Sep 23 02:43:28 yeah that should be how it works, not sure tho
Sep 23 02:44:03 starting apache again, lets see what happens
Sep 23 02:44:09 :-)
Sep 23 02:44:12 02:40:37 up 1 day, 18:53, 1 user, load average: 0.91, 3.08, 31.77
Sep 23 02:44:25 maybe check access logs, if there are any
Sep 23 02:47:16 bing is walking the svn web interface...
Sep 23 02:47:57 load is still okish
Sep 23 02:48:05 02:48:00 up 1 day, 19:00, 1 user, load average: 2.13, 2.24, 20.36
Sep 23 02:48:18 I'll keep an eye on it
Sep 23 02:59:51 let me check mail RTT
**** ENDING LOGGING AT Sat Sep 23 03:00:02 2017
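
Editor's note: the RAID check progress line quoted at 02:29:46 can be sanity-checked with a little arithmetic. The sketch below is not something anyone in the channel ran. It parses that one line and recomputes the percentage and the remaining time, on the assumption that the two block counts are in the same K units as the reported speed. The names `MDSTAT_LINE`, `PROGRESS_RE` and `parse_progress` are made up for illustration.

```python
import re

# Progress line copied from the log (the kind of line /proc/mdstat shows
# while an md array is being checked).
MDSTAT_LINE = ("[=>...................] check = 6.7% "
               "(72258624/1073740664) finish=403.7min speed=41336K/sec")

# Hypothetical pattern for this one line shape; other md states
# (for example "resync=DELAYED") intentionally do not match.
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_progress(line):
    """Return progress figures for a matching line, or None if it does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done = int(m.group("done"))
    total = int(m.group("total"))
    speed = int(m.group("speed"))  # K per second, as printed
    # Assumption: done/total are counted in the same K units as speed.
    remaining_min = (total - done) / speed / 60.0 if speed else float("inf")
    return {
        "action": m.group("action"),
        "percent": 100.0 * done / total,
        "reported_finish_min": float(m.group("finish")),
        "estimated_finish_min": remaining_min,
    }


if __name__ == "__main__":
    info = parse_progress(MDSTAT_LINE)
    print(info)
```

Under that assumption the recomputed remaining time lands within a fraction of a minute of the reported `finish=403.7min`. The same arithmetic puts the completed 72258624 K at roughly half an hour of work at the quoted speed, though that speed was sampled after apache had been stopped and says little about the preceding 40 hours.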