[Bug 131094] Re: Heavy Disk I/O harms desktop responsiveness
It might not be good to stir up such an old bug, but it regularly gets
updated with new complaints, so maybe a new approach might help.
Let us make one thing clear: IMHO, if something overloads your machine with disk I/O, it is bound to stall it.
So the solution paths are more like this:
a) beat it with more processing / I/O HW
b) mitigate the effect as far as possible
c) avoid the overload before it starts
The issue is a common one - so I'll keep my explanations general and not
specific to trackerd or any other case that was mentioned before.
### a) beat it with more processing / I/O HW ###
There are far more expensive machines out there which can handle way more I/O without being slowed down. The reason is that they have more I/O cards, virtual functions to spread the handling across CPUs, and - at the high end - server designs with completely different I/O IRQ architectures.
We should agree that on cheap/slow or even medium machines, I/O overload just *IS* an issue for responsiveness.
But that isn't the important part - the question is what a normal user can do about it, and spending x000000 $ on a machine isn't the solution.
### b) mitigate the effect as far as possible ###
Regarding mitigation, some approaches have already come up in this bug discussion,
like using ionice and several dirty ratio tunings, but none of these prevent the I/O overload.
E.g. if you overload the system using only the "best effort" I/O class, the only difference is that "other I/O" might pass faster, but your system is still fairly busy => unresponsive.
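For completeness, this is roughly what the ionice mitigation mentioned above looks like - just a sketch, the command name and the PID are placeholders:
# start the I/O-heavy task in the "idle" I/O scheduling class (class 3)
ionice -c 3 some-io-heavy-command
# or demote an already running process to best-effort with the lowest priority (1234 is a placeholder PID)
ionice -c 2 -n 7 -p 1234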
Also, dirty ratios come down to spending the process's remaining time slice on cleaning up dirty memory as soon as a certain level is reached. While you can configure higher ratios (at the price of endangering integrity), that won't stop the burst of I/O either - instead it allows even more data to be submitted to dirty the page cache, and thereby indirectly causes more I/O overloading the system again.
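And for reference, the dirty ratio tuning referred to above is done via sysctl - the values here are purely illustrative, not a recommendation:
# show the current values
sysctl vm.dirty_background_ratio vm.dirty_ratio
# example: start background writeback earlier and cap the amount of dirty memory lower
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -w vm.dirty_ratio=10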
### c) avoid the overload before it starts ###
It must be said that, since this bug dates back to 2007 and a lot of the reports are related to I/O+*sync, various filesystem and general kernel improvements have been made just for sync & journaling. Several posts in this bug already confirm this.
Now, what I didn't see is people trying to throttle the processes that overload the system.
Throttling at => https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
Like any approach, this one has certain limitations, but it is a new way to tackle the overall issue.
It also needs certain cgroup and filesystem features (like accounting for writeback through the page cache) which might only be available in modern Ubuntu releases.
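If you want to check up front whether your kernel has what is needed, something like the following should give a rough idea (the config option names are the mainline ones):
# was the kernel built with the blkio controller and the throttling code?
grep -E 'CONFIG_BLK_CGROUP|CONFIG_BLK_DEV_THROTTLING' /boot/config-$(uname -r)
# is the blkio controller known and mounted?
grep blkio /proc/cgroups
mount | grep cgroup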
### Experiment ###
As an experiment to demonstrate the approach, I used the tools fio and latencytop to compare:
1. no background load, checking latencytop
2. running a random read/write multithreaded fio in the background, checking latencytop
3. running a throttled random read/write multithreaded fio in the background, checking latencytop
# Background Load #
A fio job file like this:
[global]
# asynchronous I/O, mixed random reads/writes with a block size mix of 1k/4k/64k
ioengine=libaio
rw=randrw
bssplit=1k/25:4k/50:64k/25
size=512m
directory=/home/paelzer/latencytest
iodepth=8
[dio]
# 8 jobs doing direct I/O (bypassing the page cache)
direct=1
numjobs=8
[pgc]
# 8 jobs going through the page cache
direct=0
numjobs=8
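For reference, this is roughly how the runs were driven - the job file saved as causelatency.fiojob, fio in one terminal and latencytop in another:
# terminal 1: generate the background load
fio causelatency.fiojob
# terminal 2: watch where tasks stall
sudo latencytop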
# Case 1 - No background load => almost no latency
Cause Maximum Percentage
Waiting for event (select) 5,0 msec 39,7 %
Waiting for event (poll) 5,0 msec 33,9 %
Userspace lock contention 4,8 msec 25,7 %
[do_wait] 2,7 msec 0,4 %
[ep_poll] 2,4 msec 0,2 %
Reading from file 0,9 msec 0,0 %
Reading EXT3 directory htree 0,2 msec 0,0 %
[hrtimer_nanosleep] 0,1 msec 0,0 %
# Case 2 - Unrestricted background load overloading the I/O subsystem shows a massive impact
The latencies now come mostly from:
- ext4 data/log writes
- memory management due to thrashing the page cache
...
=> The unthrottled workload itself runs fast:
Jobs: 16 (f=16): [m(16)] [6.7% done] [92482KB/99.50MB/0KB /s] [6302/6483/0 iops] [eta 01m:51s]
Cause Maximum Percentage
[ext4_file_write_iter] 91,8 msec 0,3 %
[wait_transaction_locked] 63,4 msec 0,1 %
Marking inode dirty 61,2 msec 0,9 %
[SyS_io_destroy] 46,3 msec 0,3 %
[lru_add_drain_all] 18,0 msec 0,1 %
[__block_write_begin] 16,8 msec 38,5 %
[__lock_page_killable] 16,2 msec 34,7 %
[read_events] 5,0 msec 21,2 %
Waiting for event (poll) 5,0 msec 1,9 %
# Case 3 - Now the same workload but contained in a blkio throttled cgroup
mkdir /sys/fs/cgroup/blkio/limitbgload
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 29,3G 0 disk
├─sda1 8:1 0 28,3G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 1021M 0 part
# Limit to 4 MB/s write and 8 MB/s read speed
echo 8:0 $((1024*1024*4)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.write_bps_device
echo 8:0 $((1024*1024*8)) > /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.read_bps_device
cgexec -g blkio:limitbgload fio causelatency.fiojob
The workload shows throttling is working:
Jobs: 16 (f=16): [m(16)] [22.0% done] [6724KB/8915KB/0KB /s] [577/598/0 iops] [eta 09m:25s]
But we can also see its desired effect of avoiding overloading the system with I/O:
Cause Maximum Percentage
[__lock_page_killable] 132,2 msec 46,5 %
[__block_write_begin] 131,4 msec 47,9 %
fsync() on a file (type 'F' for details) 30,7 msec 0,0 %
Marking inode dirty 21,5 msec 0,1 %
[ext4_file_write_iter] 5,2 msec 0,0 %
Waiting for event (select) 5,0 msec 1,4 %
Userspace lock contention 5,0 msec 1,0 %
Waiting for event (poll) 5,0 msec 1,7 %
[read_events] 4,9 msec 1,3 %
=> this shows almost only the stalls due to the throttling itself, which are intended
=> the dirtying and filesystem latencies are much smaller now
=> the system "feels" right regarding responsiveness
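If you want to try the same containment on an already running background task (an indexer like trackerd, for example), here is a rough sketch with cgroup v1 - $PID is a placeholder for the process in question and the group is the one created above:
# move the running process into the throttled group
echo $PID > /sys/fs/cgroup/blkio/limitbgload/tasks
# read the limits back to confirm they are in place
cat /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.write_bps_device
cat /sys/fs/cgroup/blkio/limitbgload/blkio.throttle.read_bps_device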
### TL;DR ###
- huge machines just beat I/O overload with more HW or a better I/O architecture
- code keeps improving to mitigate the effects, but can never be perfect for *ALL* users at once (especially in the default config)
- try throttling the processes that overload I/O if you do not require their results asap
=> Let us discuss whether that would be an option, and if so, close this bug and open a separate one requesting configurable throttling for each applicable component, like trackerd and the many other I/O-heavy background tasks.
Title:
Heavy Disk I/O harms desktop responsiveness
Status in linux package in Ubuntu:
Confirmed
Bug description:
Binary package hint: linux-source-2.6.22
When compared with 2.6.15 in feisty, heavy disk I/O causes increased
iowait times and affects desktop responsiveness in 2.6.22.
This appears to be a regression from 2.6.15, where iowait is much lower
and desktop responsiveness is unaffected with the same I/O load.
Easy to reproduce with tracker - index the same set of files with the
2.6.15 kernel and the 2.6.22 kernel and the difference in desktop
responsiveness is massive.
I have not yet confirmed whether a non-tracker process which does heavy
disk I/O (especially writing) replicates this - will do further
investigation soon.