You probably thought we had abandoned the 2.6.27 kernel branch. Well, we thought so ourselves (although it was not yet officially announced). Then, all of a sudden, kernel 2.6.27-repin.1 is released, rebasing to the latest upstream kernel (2.6.27.57) and fixing OpenVZ bug #1593.
The thing is, this kernel is named after Ilya Repin, a leading Russian painter and sculptor of the Peredvizhniki artistic school. One of his best paintings is called "Unexpected Return", and I happened to enjoy the original in the Tretyakov Gallery here in Moscow a couple of weeks ago. So here it is: the unexpected return of the 2.6.27 kernel. It took Ilya 4 years to finish the painting; it took Pavel 6 months to release the fix. Better late than never, that is.
I have added vswap configuration samples to vzctl git. Basically, you set physpages and swappages and leave every other beancounter at unlimited. For example, this is what ve-vswap-256m-conf.sample looks like:
As you can see, physpages (i.e. RAM size) is set to 256 megabytes, while swappages (i.e. swap size) is set to 512 megabytes; all the other beancounters are unlimited. Wow, it's never been easier to configure your containers!
Now we can put this to use with a RHEL6-based kernel. This is what we see from inside the container:
[root@localhost ~]# vzctl enter 103
entered into CT 103
[root@localhost /]# free
             total       used       free     shared    buffers     cached
Mem:        262144      23936     238208          0          0      10968
-/+ buffers/cache:      12968     249176
Swap:       524288          0     524288
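The totals reported by free match the configured limits. As a quick sanity check, here is a minimal Python sketch that converts megabytes into the KiB units free prints, and into page counts, which is the unit physpages and swappages are accounted in. The 4 KiB page size is an assumption (typical for x86), and the helper names are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed, typical for x86


def mb_to_kib(mb):
    """Megabytes -> KiB, the unit used in the free output."""
    return mb * 1024


def mb_to_pages(mb, page_size=PAGE_SIZE):
    """Megabytes -> number of memory pages of the given size."""
    return mb * 1024 * 1024 // page_size


for name, mb in (("physpages", 256), ("swappages", 512)):
    print(name, mb_to_kib(mb), "KiB =", mb_to_pages(mb), "pages")
```

With 256 and 512 megabytes this gives 262144 KiB and 524288 KiB, the same totals free shows for Mem and Swap.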
Hard CPU limit (the ability to specify that you don't want this container to use more than X per cent of CPU no matter what) is back in the latest RHEL6-based kernel, 042test006.1, which has just been released.
The feature was only available for the stable (i.e. RHEL4 and RHEL5-based) kernels, and was missing from all of our development kernels from 2.6.20 to 2.6.32. So while it was always there in stable branches, it feels like it's back.
In order to use the CPU limit feature, set the limit using vzctl set $CTID --cpulimit X, where X is in per cent of one single CPU. For example, if you have a single 2 GHz CPU and want container 123 to use no more than 1 GHz, use vzctl set 123 --cpulimit 50. If you have a 2 GHz quad-core system and want the container to use no more than 4 GHz, use vzctl set 123 --cpulimit 200. Well, in the second case it might be better to just use --cpus 2. Anyway, see the vzctl man page.
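The arithmetic behind those two examples can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the only assumption is the rule stated above, that the value is a percentage of one single CPU.

```python
def cpulimit_percent(target_mhz, core_mhz):
    """Return the --cpulimit value: the target as per cent of one core."""
    return round(100 * target_mhz / core_mhz)


def cpulimit_command(ctid, target_mhz, core_mhz):
    """Build the vzctl command line for the given container ID."""
    return "vzctl set %d --cpulimit %d" % (
        ctid, cpulimit_percent(target_mhz, core_mhz))


print(cpulimit_command(123, 1000, 2000))  # 1 GHz on a 2 GHz core
print(cpulimit_command(123, 4000, 2000))  # 4 GHz on 2 GHz cores
```

The first call yields --cpulimit 50 and the second --cpulimit 200, matching the examples in the text.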
We have just released a new RHEL6-based kernel, 042test005. It is shaping up pretty well: as you can see from the changelog, it's not just bug fixes but also performance improvements. If you haven't tried it yet, I suggest you do so today! Do not postpone this until 2011; after all, this is what will become the next stable OpenVZ kernel.
The RHEL6 kernel needs an appropriate (i.e. recent) Linux distribution. If you don't want the latest Fedora releases, can't afford RHEL6, and are tired of waiting for CentOS 6, I suggest you go with Scientific Linux 6 (SL6). This is yet another RHEL6 clone, developed and used by CERN, Fermilab and other similar institutions.
Yesterday a guy with his name written in Cyrillic letters ("Марк Коренберг") and a @gmail.com email address posted a kernel exploit to the Linux kernel mailing list (aka LKML). This morning one brave guy from our team tried to run it on his desktop -- and had to reboot it after a few minutes of total system unresponsiveness.
The bad news is that the exploit is pretty serious and causes Denial of Service. It looks like most kernels are indeed vulnerable.
The good news is OpenVZ is not vulnerable. Why? Because of user beancounters.
Of course, if you set all beancounters to unlimited, the exploit will work. So don't do that, unless your CT is completely trusted. Those limits are there for a reason, you know.
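One way to act on that advice is to review which beancounters of a container have been left without a limit. The following Python sketch is purely illustrative: the limits dictionary is made-up data (not read from a real container), None is used here as a stand-in for "unlimited", and the function name is hypothetical.

```python
# Hypothetical limits for one container; None stands for "unlimited".
limits = {
    "kmemsize": 14372700,
    "numproc": 240,
    "numfile": None,
    "privvmpages": None,
}


def unlimited_beancounters(limits):
    """Return the sorted names of beancounters that have no limit set."""
    return sorted(name for name, limit in limits.items() if limit is None)


for name in unlimited_beancounters(limits):
    print("no limit set for", name)
```

For an untrusted container, anything this prints is worth a second look.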
You might have noticed that we have announced a new kernel branch
named rhel5-testing a while ago (back in July,
to be more specific). The idea is pretty simple: at the same time as giving the new
kernel to our internal QA we are releasing it to rhel5-testing.
Although this change imposes some more work on me (more kernels
to release, scripts to run, changelogs to prepare), I'm pleased
to say that this model works very well. First, vendors who use
our kernels as a base for theirs (for example, OWL) now enjoy
earlier access to the sources. Second, new kernels get more
testing coverage due to OpenVZ users who choose to use this
branch. Finally, it works as a “technology preview”.
Now, let me explain why we have such strange version numbers
in the recent rhel5-testing kernels: kernels 028stab07x
are intermixed with 028stab070.y. The thing is, we still keep
updating 028stab070.y with new fixes and upstream (RHEL) updates,
while 028stab07x is a newer “sub-branch” which adds a few new
features:
- live migration of containers with NFS and AutoFS mounts
- iotop working in containers and the host system
Because of these new features, these kernels haven't reached
stability yet, so we keep releasing them in rhel5-testing.
Hopefully they will soon be stable enough, and we
will abandon 028stab070.y in favor of 028stab078 (or so).
Update: this post was mostly written yesterday. Today we released
the 028stab078.1 kernel.
I am still at the OpenVZ booth at LinuxTag 2010 in Berlin. At least two people asked me about the status of the OpenVZ kernel for the upcoming Debian Squeeze. Specifically, they said there is no openvz kernel in the "testing" repository (i.e. what will become Squeeze when it is released). My guess is that more people are interested in this, so here's the public answer.
We are working pretty closely with the Debian kernel team; you can see some traces of that on either the debian-kernel AT lists.debian.org or debian AT openvz.org mailing lists. Specifically, we work together to bring a good-quality OpenVZ kernel to Squeeze, and this was one of the main reasons for us to port to 2.6.32.
But yesterday we tried to search for openvz linux-image on packages.debian.org and it gave us no results for testing. I then emailed Max Attems (who maintains our kernels in Debian) and this is his response:
it should be there now, the switch to libata did uphold testing transition of linux-2.6 for quite some time, so testing had an outdated linux-2.6 for quite some while
Indeed, the kernel is now there. So yes, Squeeze will have an OpenVZ kernel, and I guess it can also be used by people who switched to Ubuntu 10.04.
We have just announced that we are stopping new releases for OpenVZ kernel branches 2.6.24, 2.6.26, and 2.6.18. So, from now on we only have 2.6.27, 2.6.32, RHEL4-2.6.9 and RHEL5-2.6.18. Reducing the number of parallel kernel branches we have to maintain really helps us concentrate on supporting the remaining ones and moving to mainline. I hope that doesn't affect anyone too much -- from where I stand most users run either stable (i.e. RHEL5-2.6.18) or bleeding edge (2.6.32, before it used to be 2.6.27). In any case, we are not dropping support for vendor kernels, such as OpenVZ kernels in Debian and Ubuntu -- those are still supported by us for the lifetime of the distributions that carry them, and we will help with OpenVZ bugs in those kernels through the usual channel.
As for the remaining branches: last Thursday we did an update to the 2.6.32 kernel fixing some nasty bugs found in the first public version, and today we updated the 2.6.27 kernel as well. Speaking of 2.6.27, it will eventually be dropped as well, but we will keep maintaining it for at least a few more months.
A stable kernel update (RHEL5.5-based, 028stab069...) is currently in testing, but don't expect it to be released real soon now -- previous experience tells us that .y updates are not that easy. We also expect to open a RHEL6-2.6.32 branch soon, since Red Hat has already shipped a beta of their upcoming release.
I am preparing an updated set of precreated templates; those should be ready tonight or tomorrow, available from the usual place.
In addition to a bunch of updated templates, this time we add a few new ones:
- Fedora 10 (aka Cambridge)
- openSUSE 11.1
- Ubuntu 9.04 (aka The Jaunty Jackalope)
OpenSUSE is interesting -- apparently they dropped yum (which was available in 10.3 and 11.0 but not in 11.1) and now they have something called zypper. Also note that openSUSE lacks a code name. Apparently the SUSE guys are already aware of the issue and have a plan to fix it -- the next release (openSUSE 11.2) will be codenamed Fichte, after the German XIIX century philosopher. Subsequent openSUSE releases will also be named after famous philosophers -- Rousseau, Voltaire, Lessing (although I'm not sure which Lessing they have in mind, probably Theodor). Interesting... maybe they got the naming idea from OpenVZ kernels. ;)
Also, during the next update (i.e. in about a month, not now) we are going to remove a few templates that are old and unsupported:
- Debian 3.1 "Sarge" (EOL 30 Mar 2008)
- Fedora 7 (EOL 13 Jul 2008)
- openSUSE 10.3 (EOL 19 Sep 2008)
- Fedora 8 (EOL 7 Jan 2009)
- Ubuntu 7.10 (EOL 18 Apr 2009)
Anybody who's using those distros inside containers should update to something more (r|d)ecent and supported. You have been warned.
PS For people who use our stable kernels (i.e. RHEL5 branch) -- please note that you have to update to the latest kernel (028stab062.3 at the moment) in order to use Fedora 10 in containers. This is due to a few new system calls recently added to the Linux kernel which the Fedora 10 userland expects to have in the kernel. Those syscalls were just backported to our RHEL5 branch by the OpenVZ team.
From time to time, somebody criticizes the OpenVZ kernel patch for its intrusiveness and size. Right, it is big and intrusive -- it adds a whole lot of new features into the kernel. But how big is it?
Our engineer prepared some stats on three different kernels: 1. OpenVZ stable kernel (based on 2.6.18-RHEL5); 2. OpenVZ development kernel (based on 2.6.27); 3. RHEL5.3 kernel (based on 2.6.18). You can see the results by clicking the image at the right.
Some notes for the graph. For OpenVZ kernels, we distinguish between core kernel changes and the stuff that is built as modules. For the RHEL kernel, we break the patchset down into a few categories, such as drivers, Xen, GFS, ext4 and so on; "other" means everything not covered by any other category. The numbers are thousands of lines of code added and deleted, combined. A table below the graph has some more details, like how many files were changed and how many lines were added and deleted.
Now to the conclusions. Two major points can be made: 1. Even without drivers, RHEL5 kernel patches add/delete 434 KLOCs*, which is 8.5 times bigger than the OpenVZ kernel modifications (51 KLOC). So, yes, the OpenVZ patch set is big, but not that big. 2. OpenVZ based on the mainstream 2.6.27 kernel requires 40% fewer** modifications to the kernel due to the on-going effort to integrate the functionality into mainstream.
* KLOC is a thousand lines of source code. ** we only count the core changes, omitting the modules.
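The 8.5 figure in the first conclusion is simply the ratio of the two KLOC counts quoted above. A minimal Python check, using only those two numbers (the variable names are illustrative):

```python
rhel5_kloc = 434   # RHEL5 patches without drivers, lines added + deleted
openvz_kloc = 51   # OpenVZ kernel modifications, lines added + deleted

ratio = rhel5_kloc / openvz_kloc
print("RHEL5 patch set is %.1f times bigger" % ratio)
```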