**** BEGIN LOGGING AT Thu Jul 21 02:59:56 2011
Jul 21 03:22:05 hi guys
Jul 21 03:28:40 hey jeramee
Jul 21 13:31:17 ppisati: testing the latest 3.0.0-1~dd and seeing kjournald timeouts with IO tests, unsure if they're new or not
Jul 21 13:55:37 mahmoh: how did you get it?
Jul 21 13:56:17 ppisati: he said he was running io stress tests
Jul 21 13:57:02 one thing i noticed were the usb resets found before the hung task warnings
Jul 21 13:57:12 usb hard disk?
Jul 21 13:57:19 i believe so
Jul 21 14:02:29 ppisati: yeah, it has a usb disk enclosure plugged into an externally powered hub
Jul 21 14:05:24 unfortunately it seems usb support is flaky
Jul 21 14:05:36 and it has been like this for a while
Jul 21 14:06:00 e.g. lp 709245
Jul 21 14:06:02 Launchpad bug 709245 in linux-ti-omap4 "panda: USB disk IO slow" [High,Confirmed] https://launchpad.net/bugs/709245
Jul 21 14:06:36 if you ping the board while doing I/O tests, performance improves
Jul 21 14:07:44 it's getting pinged now and still seeing the resets
Jul 21 14:11:19 can you try with a natty kernel?
Jul 21 14:12:50 natty? I guess, can you point me to a package? I think GrueMaster confirmed yesterday that he saw it with natty and maverick, the usb ping problem
Jul 21 14:13:36 ppisati: I have tested this with a natty image as well as a maverick image.
Jul 21 14:13:51 I think it makes more sense to crash or dump when the problem occurs so we can see what's actually happening, no? cmagina?
Jul 21 14:13:57 And apparently it has also been reproduced on a beagleXM.
Jul 21 14:14:47 It is a problem with either the usb chip or the driver for it. prpplague has more insight.
Jul 21 14:15:48 mahmoh: doubt it, as the crash would occur ~2 minutes after the usb reset occurred and that is the probable culprit for the hang
Jul 21 14:16:43 could we lower the timeout and attempt the crash that way?
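One way to check the observation that USB resets show up before the hung task warnings is to pair each warning with the most recent preceding reset in the kernel log. Below is a minimal, hypothetical Python helper (the names `correlate`, `HUNG_RE` and `USB_RESET_RE` are illustrative, not from any existing tool). The hung-task pattern follows the `INFO: task updatedb.mlocat:7519 blocked for more than 120 seconds.` message quoted from the pastebin in this log; the USB reset pattern is an assumption, since the exact wording varies with kernel version and host controller driver.

```python
import re
from typing import List, Optional, Tuple

# Kernel log timestamp prefix, e.g. "[27121.715026] INFO: task ..."
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
# Hung-task warning, in the form quoted from the pastebin in this log.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# ASSUMPTION: USB reset lines look like "usb <port>: reset ...".
# The exact wording differs between kernel versions and host drivers.
USB_RESET_RE = re.compile(r"\busb \S+: reset\b")


def correlate(dmesg_lines: List[str]) -> List[Tuple[str, float, Optional[float]]]:
    """For each hung-task warning, return (task, warning_time, last_reset_time).

    last_reset_time is None when no USB reset was logged before the warning.
    """
    last_reset: Optional[float] = None
    results: List[Tuple[str, float, Optional[float]]] = []
    for line in dmesg_lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a kernel timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if USB_RESET_RE.search(msg):
            last_reset = ts  # remember the most recent reset seen so far
            continue
        hung = HUNG_RE.search(msg)
        if hung:
            results.append((hung.group(1), ts, last_reset))
    return results
```

Fed the lines of a saved `dmesg` capture, it returns one tuple per hung-task warning. A reset logged roughly 120 seconds before a warning would fit the theory in the log that the reset, not the I/O load itself, is what leads to the hang.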
Jul 21 14:16:55 imo the best thing we can do is to file a new bug for it (how to reproduce it, logs, etcetc) so we can forward it to more people
Jul 21 14:17:06 TI, usb ml, arm, more people can see it, etcetc
Jul 21 14:17:15 or throw USB into debug? what's the best approach here?
Jul 21 14:17:26 but yes, the usb support for all the omap chips so far always had problems
Jul 21 14:19:29 http://pastebin.ubuntu.com/649155/
Jul 21 14:20:15 uhmmm
Jul 21 14:20:29 updatedb.mlocat? at the same time with an IO test? i bet it's stuck! :)
Jul 21 14:22:07 what's updatedb.mlocat from?
Jul 21 14:24:23 mahmoh: from your pastebin -> Started Run 1 @ 05:29:28[27121.715026] INFO: task updatedb.mlocat:7519 blocked for more than 120 seconds.
Jul 21 14:28:42 heh, missed that
Jul 21 14:31:16 Next time you guys run io tests, you should run "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" first. That will disable the timeout errors (what you are seeing).
Jul 21 14:31:19 yep, but what is "updatedb.mlocat", part of the fs?
Jul 21 14:33:04 It may have nothing to do with the test suite. Notice you also have an error in that pastebin from java.
Jul 21 14:33:39 no, updatedb.mlocate generates the index that the locate command uses
Jul 21 14:33:48 it's a cron job
Jul 21 14:33:55 you should disable it
Jul 21 14:34:24 No, leave the background tasks alone. Just run the echo command above.
Jul 21 14:35:02 it's part of the default system plus we're on a dual core board, it should be fine
Jul 21 14:35:15 it's not cpu bound, it's io bound
Jul 21 14:35:26 we're hitting generic kernel timeouts on panda. We either disable them (as above) or we change the kernel config. I suggest manually disabling them first.
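The `echo 0 > /proc/sys/kernel/hung_task_timeout_secs` suggestion, and the later idea of doubling the 120 s default, both come down to reading and writing one sysctl file. A minimal Python sketch, assuming root access on the board; the helper names are hypothetical, and the path is a parameter so the functions can also be pointed at a scratch file.

```python
from pathlib import Path

# Sysctl file named in the log; writing it requires root.
HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")


def read_timeout(path: Path = HUNG_TASK_TIMEOUT) -> int:
    """Return the current hung-task timeout in seconds (0 means disabled)."""
    return int(path.read_text().strip())


def write_timeout(seconds: int, path: Path = HUNG_TASK_TIMEOUT) -> None:
    """Set the hung-task timeout; same effect as `echo N > path`."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    path.write_text(f"{seconds}\n")


def double_timeout(path: Path = HUNG_TASK_TIMEOUT) -> int:
    """Double the current timeout (e.g. 120 -> 240) and return the new value.

    A disabled timeout (0) stays disabled, since 0 * 2 == 0.
    """
    new = read_timeout(path) * 2
    write_timeout(new, path)
    return new
```

Writing 0 silences the warnings entirely, while doubling keeps the detector active but gives slow I/O more room before it reports. Either change lasts only until the next reboot.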
Jul 21 14:35:28 as for the timeouts, those should not occur either, I want to see when they happen
Jul 21 14:35:47 the hung task timeout is just notifying you of the problem
Jul 21 14:35:52 of a problem
Jul 21 14:35:58 exactly
Jul 21 14:36:06 this could be that the system is thrashing as well
Jul 21 14:36:07 Find out what the default timeout is and try increasing it.
Jul 21 14:36:19 120s
Jul 21 14:36:42 So try doubling it and rerunning.
Jul 21 14:37:47 i would leave it, it's not the problem, what we need to know is if we are getting completely hung or if the system is just thrashing under the load and as a result some processes are not getting the time they need
Jul 21 14:38:05 I'm not tweaking the system at this point except to debug this problem
Jul 21 14:38:21 so how do we find that out?
Jul 21 14:38:26 we could enable sar data gathering and see if we are seeing high io wait times
Jul 21 14:38:35 not sure if there is an easier way
Jul 21 14:39:22 hm, it is running the threaded io tests after all, hm, and elevator=deadline
Jul 21 14:39:33 Look at your logs and see exactly what test reproduces this, then rerun just that test with different timeout settings.
Jul 21 14:43:45 ppisati: so this is with the 3.0.0-1~dd kernel, but I'm pretty sure it'll happen with stock, should I switch back or push forward? I'm going to try the same thing on two other boards
Jul 21 14:44:45 the problem is there's no schedule as to when this occurs but I may be able to force it running background IO tasks (like updatedb.mlocat)
Jul 21 14:44:45 mahmoh: I think you are beating a dead horse. We need to keep moving forward.
Jul 21 14:45:42 you may be right but if there's a fundamental problem that needs to be fixed then it should be looked at; I can dedicate one board for this and push on on the other ones
Jul 21 14:46:18 mahmoh: 252 is currently hovering around 70% io_wait
Jul 21 14:46:30 There is a fundamental problem, but we do not have the resources or time to dedicate everything to it. Others outside ubuntu are also looking at it.
Jul 21 14:46:56 One board is good. 3 is a bit much.
Jul 21 14:47:39 and by adding updatedb, it hit ~90%
Jul 21 15:10:59 persia, you there?
Jul 21 15:35:51 mahmoh: In reply to server kernel question in #u-meeting, there probably won't be a server specific kernel.
Jul 21 15:36:23 then install should add a scheduler line to boot, Daviey?
Jul 21 15:36:47 I don't know what the differences are on x86, or if they have any effect on armel. May be worth looking into after Alpha 3.
Jul 21 15:37:31 sounds like just a bug to me, where should it go if it's against the installer vs. kernel image?
Jul 21 15:38:40 It can be added to the server preinstalled images fairly easily I would think. Not sure about netinstall.
Jul 21 15:39:08 should be both
Jul 21 15:39:14 We should run some benchmarks on it to see if it helps performance on armel.
Jul 21 15:39:18 GrueMaster; What are the major differences, if any, between the desktop and the server kernel for x86?
Jul 21 15:39:31 I know it should be both, just not sure how to go about it with netinstall.
Jul 21 15:39:49 Martyn: ENOIDEA.
Jul 21 15:39:55 copy that
Jul 21 15:39:57 it should install server kernel when selecting Ubuntu Server
Jul 21 15:39:59 I'll take a look at it right now
Jul 21 15:40:05 Hence why I raised the question.
Jul 21 15:40:06 scheduler is diff.
Jul 21 15:40:12 janimo: i couldn't find any marvin24 3.x kernel on gitorious
Jul 21 15:40:37 ppisati, I think he just sends stuff to lkml or the tegra list, at least that was my impression
Jul 21 15:40:42 are you on #ac100 ?
Jul 21 15:41:02 I do not know of a 3.0 kernel of his, just saw him active upstream
Jul 21 15:44:18 yep, i'm on ac100
Jul 21 15:47:35 GrueMaster: so where should an arm-server-kernel bug request go? and where should a net-install-server bug go?
Jul 21 15:48:24 I have no idea. This is the first server work i have done with Ubuntu.
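The io_wait figures quoted above (around 70%, then ~90% once updatedb was added) can be checked without setting up sar by sampling the aggregate `cpu` line of /proc/stat twice and comparing the growth of the iowait counter against the growth of all counters. This is a minimal sketch, assuming the proc(5) field order (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields); the function names are hypothetical.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat samples.

    Assumed field order: user nice system idle iowait irq softirq steal ...
    """
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    # Only the first 8 fields are summed: guest time is already included
    # in user/nice, so adding it again would double count.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (the all-CPU summary)."""
    with open(path) as f:
        return f.readline()
```

To use it, call `sample_cpu_line()` twice a few seconds apart during a test run and pass both lines to `iowait_percent()`. A high value with processes still making progress points toward the "thrashing under load" explanation rather than a complete hang.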
Jul 21 15:49:16 it's not server related, it's either kernel image related or installer related, where do those bugs live?
Jul 21 15:49:45 In launchpad. beyond that I don't know.
Jul 21 15:50:10 File a bug on the kernel for tracking purposes.
Jul 21 15:50:44 From there, a comparison should be made on x86, and the relevant bits tested on arm.
Jul 21 17:11:07 Hello, I am using ubuntu linux on SDcard .. my root (/) filesystem is not the total size of the sdcard
Jul 21 17:11:21 my (df -h) is http://pastebin.com/fw6zeG5D
Jul 21 17:13:29 arcaico: What image is this ?
Jul 21 17:16:53 Linux FriendlyARM 2.6.28.6-FriendlyARM #2 Sat Jun 26 13:24:08 CST 2010 armv6l GNU/Linux --- by: http://www.friendlyarm.net/
Jul 21 17:21:06 I have no idea what that is or where it comes from. Not an Ubuntu linux for sure.
Jul 21 17:22:14 * GrueMaster doesn't even see any references to Ubuntu on their web page.
Jul 21 17:22:49 GrueMaster, http://www.youtube.com/watch?v=mBN0BUWoxa0
Jul 21 17:23:33 http://kanebebe.dip.jp/download/ARM11-6410-DVD/images/Ubuntu/
Jul 21 17:23:55 Unfortunately, we dropped support for 9.10 at the beginning of this release cycle, and since 10.04 have only supported armv7.
Jul 21 17:24:32 You might have better luck with debian arm.
Jul 21 17:26:14 Ok GrueMaster, I will do some research
Jul 21 17:29:11 mahmoh: Have you been able to do preseeding with netinstall yet?
Jul 21 17:29:26 not yet, should work fine though
Jul 21 17:30:04 Ok. No worries, just curious.
Jul 21 17:30:30 * GrueMaster was getting ready to reinstall on one of the pandas in the pool.
**** ENDING LOGGING AT Fri Jul 22 02:59:57 2011