Yay to I/O limits!
29 October 2013 @ 08:12 pm
Kir Kolyshkin

Today we are releasing a somewhat small but very important OpenVZ feature: per-container disk I/O bandwidth and IOPS limiting.

OpenVZ has had an I/O priority feature for a while; it lets you set a per-container I/O priority -- a number from 0 to 7. It works like this: if two similar containers with similar I/O patterns but different I/O priorities run on the same system, the container with a prio of 0 (lowest) gets an I/O speed about 2-3 times lower than that of the container with a prio of 7 (highest). This works for some scenarios, but not all.

So, I/O bandwidth limiting was introduced in Parallels Cloud Server, and as of today it is available in OpenVZ as well. Using the feature is very easy: you set a limit for a container (in megabytes per second) and watch the container obey it. For example, here I first try doing I/O without any limit set:

root@host# vzctl enter 777
root@CT:/# cat /dev/urandom | pv -c - >/bigfile
 88MB 0:00:10 [8.26MB/s] [         <=>      ]

Now let's set the I/O limit to 3 MB/s:

root@host# vzctl set 777 --iolimit 3M --save
UB limits were set successfully
Setting iolimit: 3145728 bytes/sec
CT configuration saved to /etc/vz/conf/777.conf
root@host# vzctl enter 777
root@CT:/# cat /dev/urandom | pv -c - >/bigfile3
39.1MB 0:00:10 [   3MB/s] [         <=>     ]

If you run it yourself, you'll notice a spike in speed at the beginning, after which it drops to the limit. This is the so-called burstable limit at work: it allows a container to exceed its limit (by up to 3x) for a short time.
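The numbers in the two transcripts can be sanity-checked with a bit of shell arithmetic. This is only a sketch: the helper name mb_to_bytes is made up for illustration, and treating 1M as 1024*1024 bytes is inferred from the "Setting iolimit: 3145728 bytes/sec" line that vzctl printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of vzctl): convert a limit given in
# megabytes to bytes, assuming 1M = 1024*1024 as the vzctl output suggests.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

limit_mb=3     # the limit set in the example
seconds=10     # how long the pv run lasted

limit_bytes=$(mb_to_bytes "$limit_mb")
steady_mb=$(( limit_mb * seconds ))   # total if the rate were a flat 3 MB/s

echo "limit: $limit_bytes bytes/sec"
echo "steady-state total over ${seconds}s: ${steady_mb} MB"
```

The limited pv run reported 39.1MB after 10 seconds, so the roughly 9 MB on top of the steady-state 30 MB is presumably what the initial burst let through.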

In the above example we tested writes. Reads work the same way, except when the data being read in fact comes from the page cache (such as when you read a file which you just wrote). In this case, no actual I/O is performed, and therefore no limiting takes place.

The second feature is a limit on I/O operations per second, or just an IOPS limit. For more info on what IOPS is, go read the Wikipedia article on IOPS -- all I can say here is that for traditional rotating disks the hardware capabilities are pretty limited (75 to 150 IOPS is a good guess, or 200 if you have high-end server-class HDDs), while for SSDs this is much less of a problem. The IOPS limit is set in the same way as iolimit (vzctl set $CTID --iopslimit NN --save), although measuring its impact is trickier.
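For scripting this, a small dry-run wrapper might look like the sketch below. The function name iops_cmd and the value 100 are illustrative assumptions (100 simply falls inside the 75 to 150 range guessed above for rotating disks); the function only prints the vzctl command so you can inspect it before running it on the host.

```shell
#!/bin/sh
# Hypothetical wrapper: validate the arguments and print (not run)
# the vzctl command that would set an IOPS limit for a container.
iops_cmd() {
    ctid=$1
    iops=$2
    case $iops in
        ''|*[!0-9]*)
            echo "IOPS limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    echo "vzctl set $ctid --iopslimit $iops --save"
}

iops_cmd 777 100   # prints: vzctl set 777 --iopslimit 100 --save
```

Printing instead of executing is a deliberate choice here: the command changes the saved container configuration, so it seems safer to look at it first and then paste it into a root shell.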

Finally, to play with this stuff, you need:

  • vzctl 4.6 (or higher)
  • Kernel 042stab084.3 (or higher)
Note that the kernel with this feature is currently still in testing -- so if you haven't done so, it's time to read about testing kernels.
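The two version requirements can be checked along these lines. This is a sketch under assumptions: version_ge is a made-up helper, it relies on "sort -V" from GNU coreutils (not POSIX), and the two version strings are sample values that you would replace with what `uname -r` and `vzctl --version` report on your host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Uses GNU "sort -V" (version sort) to find the lower of the two.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

kernel="042stab084.12"   # sample value; take the real one from `uname -r`
vzctl_ver="4.6"          # sample value; take the real one from `vzctl --version`

if version_ge "$kernel" "042stab084.3" && version_ge "$vzctl_ver" "4.6"; then
    echo "I/O limits should be available"
else
    echo "upgrade needed"
fi
```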

15 comments
tobotras on October 30th, 2013 - 05:56 am
Great! Congrats, guys! :)
Stanislav Kinsboursky on October 30th, 2013 - 06:32 am
Congrats for what? For making the feature "opensource"?
tobotras on October 30th, 2013 - 01:24 pm
(Anonymous) on October 30th, 2013 - 08:16 am
Hi, I'd like to ask - how is this implemented? Does it need IO accounting support in dcache?

We're using ZFS on Linux, which completely bypasses dcache and doesn't have custom hooks for IO accounting yet. Is that going to work?

If not, what can I do (in terms of code/where) to make it work?
Kir Kolyshkin (k001) on October 30th, 2013 - 03:16 pm
Our sysadmins say it works for ZFS, too.
(Anonymous) on May 29th, 2014 - 01:23 pm
It doesn't seem to be working on zfsonlinux.
(Anonymous) on October 30th, 2013 - 09:34 am
Awesome, excellent news thanks!
rleir on October 30th, 2013 - 12:24 pm
Thanks Kir!
Simon Boulet on October 30th, 2013 - 02:16 pm
Excellent! This works for asynchronous writes too?
Kir Kolyshkin (k001) on October 30th, 2013 - 02:58 pm
yep (and that answers your recent questions on users@ I guess)
(Anonymous) on November 1st, 2013 - 10:04 pm
This is great news and something that OpenVZ sorely needs!
Hopefully the testing goes well, and this gets released into stable soon...
Kir Kolyshkin (k001) on November 1st, 2013 - 11:30 pm
We have already tested this internally. Now more testing is to be done by you, I mean users. This is one of the reasons we are releasing those testing kernels.
Gordon P on February 16th, 2014 - 05:04 am
Any further updates on when this might be stable? I know it's a much awaited feature!
Kir Kolyshkin (k001) on March 5th, 2014 - 01:55 am
042stab084.12, which has this feature, went to stable on the 10th of December 2013.