I thought Monday would be a good day to release a new version of vzctl, which we have been working on for the last six months. There are a few important changes in this release; let me go through them.
First, we have finally removed all the cron-script trickery required for CT reboot. The thing is, if the container owner issues the 'reboot' command from inside, the container just stops. Then something needs to be done on the host system to start it again. Until this release, this was achieved by a hackish combination of vzctl (which adds an initscript inside the container to set a reboot mark) and a cron script (which checks for stopped containers having that reboot mark and starts them). Yet another cron script takes care of the situation when a CT is stopped from the inside -- in this case some cleanup needs to be done on the host system, namely unmounting the CT private area and removing the routing and ARP records for the CT's IP.
There are a few problems with this cron-based approach. First, initscript handling differs between distributions, and it's really hard to support all of the distros. Second, the cron script runs every 5 minutes, which means the mean time to reboot (or to clean up network rules) is 2.5 minutes. To put it simply, it's all hackish and unreliable.
Now this hairy trickery is removed and replaced by a simple, clean daemon called vzeventd, which listens for CT stop and reboot events and runs clean, simple scripts. No more trickery, no more waiting for a reboot. The only catch is that this requires support from the kernel (which comes in the form of the vzevent kernel module).
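To give an idea of how simple such an event script can be, here is a hypothetical sketch of a reboot handler. The /etc/vz/vzevent.d/ location and the $VEID environment variable are assumptions on my part -- check vzeventd(8) on your system for the exact interface.

```shell
#!/bin/sh
# Hypothetical reboot-event handler, e.g. /etc/vz/vzevent.d/reboot
# (path and $VEID are assumptions; see vzeventd(8) for the real interface).

VZCTL=${VZCTL:-/usr/sbin/vzctl}
CTID=${VEID:?vzeventd is expected to supply the container ID}

# By the time the reboot event fires, the container has already stopped;
# all that is left to do is start it again.
exec "$VZCTL" start "$CTID"
```

Compare this with the old scheme: no initscript injected into the container, no 5-minute cron polling, just an event and an immediate reaction.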
Second, the new vzctl is able to start Fedora 14 containers on our stable (i.e. RHEL5-2.6.18) kernels. The thing is, Fedora 14 has glibc patched to check for a specific kernel version (>= 2.6.32 in this case) and refuse to work otherwise. This is done to prevent glibc from running on old kernels that lack some required features. We patch our kernels to have those features, but glibc just checks the version. So, our recent kernels are able to set the osrelease field of the uname structure to any given value for a given container. Now, vzctl 3.0.25 comes with a file (/etc/vz/osrelease.conf) which lists different distros and their required kernel versions, and which it consults during start and exec.
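The mapping itself is just a plain "distro, minimum kernel version" table. The entries below are illustrative, not copied from the shipped file -- the distro names must match whatever ostemplate/distribution names your vzctl uses, so check the actual /etc/vz/osrelease.conf from the release.

```
# /etc/vz/osrelease.conf (illustrative sketch)
# <distribution>    <osrelease to report inside the CT>
fedora-14           2.6.32
ubuntu-10.10        2.6.32
```

With such an entry in place, uname inside a Fedora 14 CT reports 2.6.32 even though the host runs a 2.6.18-based kernel, and glibc's version check passes.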
I want to briefly mention yet another feature of recent vzctl (which, again, needs kernel support) -- the ability to delegate a PCI device into a container. It is only supported on the RHEL6 kernel at the moment, and the only devices we have tried are NVidia GPUs.
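A usage sketch, assuming the option is spelled --pci_add in this vzctl release (verify against vzctl(8); the container ID and PCI address below are examples):

```shell
# Find the PCI address of the device to delegate, e.g. 01:00.0
lspci | grep -i nvidia

# Hypothetical sketch: hand the device to container 101 and restart it
# so the change takes effect (option name per vzctl(8) on your system).
vzctl set 101 --pci_add 01:00.0 --save
vzctl restart 101
```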
Besides these three big things, there are a lot of improvements, fixes, and documentation updates all over the tree. I am not aware of any regressions in this release, but I guess it's not entirely bug-free. Fortunately there's a way to handle that -- if anything really bad shows up in this version, it will be fixed by a quick 3.0.25.1 update. This worked pretty well for vzctl-3.0.24, and should work fine this time, too.
Yesterday a guy with his name written in Cyrillic letters ("Марк Коренберг") and a @gmail.com email address posted a kernel exploit to the Linux kernel mailing list (aka LKML). This morning one brave guy from our team tried to run it on his desktop -- and had to reboot it after a few minutes of total system unresponsiveness.
The bad news is that the exploit is pretty serious and causes a denial of service. It looks like most kernels are indeed vulnerable.
The good news is OpenVZ is not vulnerable. Why? Because of user beancounters.
Of course, if you set all beancounters to unlimited, the exploit will work. So don't do that, unless your CT is completely trusted. Those limits are there for a reason, you know.
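As a minimal sketch of what "don't set everything to unlimited" means in practice (the container ID and values below are illustrative examples, not tuning recommendations):

```shell
# Cap the number of processes in container 101 so a resource-exhaustion
# exploit hits its beancounter limit instead of taking down the host.
# Syntax is barrier:limit, per vzctl(8).
vzctl set 101 --numproc 500:500 --save

# Watch for hits against the limit (the failcnt column) on the host:
grep numproc /proc/user_beancounters
```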
You might have noticed that we have announced a new kernel branch
named rhel5-testing a while ago (back in July,
to be more specific). The idea is pretty simple: at the same time as we give the new
kernel to our internal QA, we release it to rhel5-testing.
Although this change imposes some more work on me (more kernels
to release, scripts to run, changelogs to prepare), I'm pleased
to say that this model works very well. First, vendors who use
our kernels as a base for theirs (for example, OWL) now enjoy
earlier access to the sources. Second, new kernels get more
testing coverage due to OpenVZ users who choose to use this
branch. Finally, it works as a “technology preview”.
Now, let me explain why we have such strange version numbers
in the recent rhel5-testing kernels — kernels 028stab07x
are intermixed with 028stab070.y. The thing is, we still keep
updating 028stab070.y with new fixes and upstream (RHEL) updates,
while 028stab07x is a newer “sub-branch” which adds a few new
features:
- live migration of containers with NFS and AutoFS mounts
- iotop working in containers and on the host system
Because of these new features, these kernels haven't reached
stability yet, so we keep releasing them in rhel5-testing.
Hopefully this branch will soon be stable enough, and we
will abandon 028stab070.y in favor of 028stab078 (or so).
Update: this post was mostly written yesterday. Today we released
the 028stab078.1 kernel.
When I have a high temperature, i.e. a fever, I am very talkative. I just measured it at 39.6°C (103.3°F). Right now I don't have anyone to talk to verbally, so I'm blogging. You have been warned. But no, this post is not contagious.
The big problem is I caught a cold during the first flight, and now I feel strange. I cannot listen to the talks (except for the keynote); I am taking pills and such, gargling a saline (NaCl) solution, and all that. So far it helps a little -- I either have a fever or feel like a slowpoke under the drugs.
My talk has been moved from Wednesday evening to Thursday morning (not because of me, and I only found out while getting my badge). I hope I will be less of a hot vegetable by that time. I need to make it, since I have already prepared most of it.
I am still at the OpenVZ booth at LinuxTag 2010 in Berlin. At least two people have asked me about the status of the OpenVZ kernel for the upcoming Debian Squeeze. Specifically, they said, there is no openvz kernel in the "testing" repository (i.e. what will become Squeeze when it is released). My guess is that more people are interested in this, so here's the public answer.
We are working pretty closely with the Debian kernel team; you can see some traces of that on either the debian-kernel AT lists.debian.org or debian AT openvz.org mailing lists. Specifically, we are working together to bring a good-quality OpenVZ kernel to Squeeze, and this was one of the main reasons for us to port to 2.6.32.
But yesterday we tried to search for an openvz linux-image on packages.debian.org, and it gave us no results for testing. I then emailed Max Attems (who maintains our kernels in Debian), and this is his response:
it should be there now, the switch to libata did uphold testing transition of linux-2.6 for quite some time, so testing had an outdated linux-2.6 for quite some while
Indeed, the kernel is now there. So yes, Squeeze will have an OpenVZ kernel, and I guess it can also be used by people who switched to Ubuntu 10.04.
I am standing here at the LinuxTag 2010 event, so if you are in Berlin this week come to our booth to say hello (and maybe recommend a local beer place to go).
One visitor asked me if it's possible to run Firefox inside a container (with the main purpose of browsing insecure sites). Yes, it is possible; there are two ways -- using Xvnc or SSH's X forwarding. I just implemented it here (using the latter), and want to share the experience, because there are a few rough edges here and there.
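The X forwarding route boils down to a couple of commands. This is a minimal sketch -- the container ID, its IP, the user name, and the package names are examples, and you also need sshd running inside the CT with X11Forwarding enabled in its sshd_config.

```shell
# Inside the container (via the host), install what X forwarding needs.
# Package names vary per distro; this assumes an RPM-based template.
vzctl exec 101 yum install -y xauth firefox openssh-server

# From the host (or any machine with a local X server),
# ssh into the CT with X forwarding and launch the browser:
ssh -X user@192.168.0.101 firefox
```

Firefox then runs inside the container, so whatever an insecure site manages to exploit stays confined to the CT.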
We have just announced that we are stopping new releases for the OpenVZ kernel branches 2.6.24, 2.6.26, and 2.6.18. So, from now on we only have 2.6.27, 2.6.32, RHEL4-2.6.9, and RHEL5-2.6.18. Reducing the number of parallel kernel branches we have to maintain really helps us concentrate on supporting the remaining ones and moving to mainline. I hope this doesn't affect anyone too much -- from where I stand, most users run either the stable branch (i.e. RHEL5-2.6.18) or the bleeding edge (2.6.32; before that it used to be 2.6.27). In any case, we are not dropping support for vendor kernels, such as the OpenVZ kernels in Debian and Ubuntu -- those are still supported by us for the lifetime of the distributions that carry them, and we will help with OpenVZ bugs in those kernels through the usual channels.
On to the remaining branches. Last Thursday we released an update to the 2.6.32 kernel fixing some nasty bugs found in the first public version, and today we updated the 2.6.27 kernel as well. Speaking of 2.6.27, it will eventually be dropped too, but we will keep maintaining it for at least a few more months.
A stable kernel update (RHEL5.5-based, 028stab069...) is currently in testing, but don't expect it to be released real soon now -- previous experience tells us that .y updates are not that easy. We also anticipate opening the RHEL6-2.6.32 branch soon, since Red Hat has already shipped a beta of their upcoming release.
Today I came across a page which compares OpenVZ to KVM to Xen. Leaving Xen aside, from that page it looks like KVM is way better: it got all the green pluses, while OpenVZ got all the dull minuses, except for a few features where it says "limited support".
For example, from the author's POV, KVM supports cool features such as "Independent kernel" and "Independent kernel modules", while OpenVZ lacks all that. I am not even mentioning "Full control on sockets and processes" -- definitely, such things as sockets and processes are completely out of control when you use OpenVZ, to the extent that you cannot distinguish between a process, a socket, and a potato! (Was that sarcasm? Yes; in fact, I have no idea what they mean by that statement...)
But such a comparison is inspiring, so I invested 15 minutes of my time and made my own, titled Car vs bike. It clearly states that a car is better than a bike -- its capacity is higher and it doesn't require lots of muscle power. After all, it has power steering (not to mention power windows) and can come with an automatic gearbox, air conditioning, and even a sunroof! A bike, on the other hand, is missing a lot of features -- it doesn't even have windshield wipers, which have been standard on every car since about 1925!
Actually, I didn't stop there and made yet another comparison, titled Bike vs car. Now it's perfectly clear that a bike is a better choice than a car, since it's cheaper, ecologically clean, and you can even take it with you on a train! A car is big and heavy, and it requires periodic refuelling and a parking spot.
Both comparisons are on the openvz wiki, so feel free to edit and add more features!
OpenVZ will have a booth at the upcoming SCALE8x conference in Los Angeles, California, USA.
I want to design a new t-shirt for the conference (and other future events). So far we have two designs (which I wrote about before here): first "container lifecycle" and then "kernel classics" (you can see both at the shop). Now I want something as geeky as the first design, which looks like a screenshot from a terminal, but this time on a dark-colored t-shirt (I think dark green will fit well).
If you have any suggestions for the design, or better yet can draw it (or a mock-up) -- please speak up here or email me (kir at openvz org). If OpenVZ takes your design, I promise to mail you two t-shirts.
Some of you may recall that last December I did an experiment where I created 638 OpenVZ containers on an HP Proliant DL380 G5 machine with dual quad-core CPUs and 32GB of RAM. I stopped there because I ran into an error. Well, one of the OpenVZ / Parallels developers suggested a fix back in July, both as a comment to my article and as a comment to the bug report... but somehow I overlooked it until I ran across it again the other day while cleaning out my email.
I finally got a chance to give it a try, and sure enough it removed the limit I had run into (the default sysctl kernel.pid_max setting being too low); I verified this by creating 700 containers.
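For anyone hitting the same wall, the fix amounts to raising pid_max on the host so the processes of many containers all get PIDs. The value below is illustrative, not the one I used:

```shell
# Check the current limit; the common default is 32768, which is easy to
# exhaust when hundreds of containers each run dozens of processes.
cat /proc/sys/kernel/pid_max

# Raise it for the running kernel (requires root):
sysctl -w kernel.pid_max=262144

# And make the change persistent across reboots:
echo 'kernel.pid_max = 262144' >> /etc/sysctl.conf
```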
At first I decided to stop there, but then I got an email from Kir asking if disk space was going to end up being my real limitation. I'm wondering if Kir has seen other experiments that go to this extreme, or if he is simply a good guesser (with some inside information). Anyway, I decided to bump it up to 1,000 containers. Sure enough, the machine is handling it just fine.
I didn't do a completely new write-up; I just added a few more comments to the original article, and you can find it here:
I tried it and was able to migrate a CentOS 7 container... but the Fedora 22 one seems to be stuck in the "started" phase. It creates a /vz/private/{ctid} dir on the destination host (with the same…
The fall semester is just around the corner... so it is impossible for me to break away for a trip to Seattle. I hope one or more of you guys can blog so I can attend vicariously.