A new 2.6.27-based OpenVZ branch is opened, and the first 2.6.27-based OpenVZ kernel is released.
The idea of using names instead of numbers for kernel releases is working for 2.6.26, and we decided to have some fun with 2.6.27 kernels, too. These kernels are [to be] named after famous Russian painters, in alphabetical order, of course.
The first 2.6.27 OpenVZ kernel is named after Ivan Konstantinovich Aivazovsky. Aivazovsky is a great painter and I'd love to add a link to some of his masterpieces here, but while trying to find a good reproduction of "The Ninth Wave" I realized that a typical notebook/PC is incapable of displaying such art. When you try to fit a 2x3 meter painting onto a 22" computer screen, nothing good is to be expected. Even with a high-resolution copy stored in a lossless format, you either see the full picture but lose the details, or you see some part of it with all the details but not the full picture. So be aware that a painting that you see online is a pathetic shadow of what you can enjoy in a real (i.e. offline) museum or art gallery.
The 2.6.27-aivazovsky kernel, on the other hand, is perfect for your PC, so enjoy.
When my colleague Pavel Emelyanov returned from the 2008 Linux kernel summit back in September he brought a small present for me -- a Gumstix Overo (every LKS participant got one for free; yet another reason to become a high-profile kernel developer!). Overo is a computer (well, actually a set of boards and cables) with a CPU board the size of a gum stick, featuring a TI OMAP3 CPU, 128 megs of RAM and a microSD slot. It also has 802.11g Wi-Fi and Bluetooth, but those happen to be completely dead as this is the first beta release of the hardware.
For the last few days I have been digging into a project to get OpenVZ running on this Overo thing. That involved patching the OpenVZ kernel to support the ARM architecture, building a vzctl package (.ipk) for ARM using bitbake, and creating a template.
It was amazingly easy to port the OpenVZ kernel to ARM; you can see here that besides a big-all-in-one-openvz-for-2.6.27 patch I only had to add 4 tiny ARM-specific patches (1, 2, 3, 4). For vzctl, it was even easier -- all I had to do was add the OpenVZ syscall numbers that were added for ARM, and create a bitbake recipe file.
Creating a template for ARM architecture was tougher but I managed to win that fight, too -- you can find a Debian Lenny template here.
Please note that all this is still at a very alpha stage -- there are errors, bugs, ugly warnings, you have to modify some things in place etc. But it's working. If someone is interested in running OpenVZ on ARM hardware, please let me know -- leave a comment here or email kir (A) openvz (.) org.
We are going to release the first 2.6.26-based kernel soon -- it went to testing today and hopefully will be released next week.
We are also changing the versioning scheme -- instead of boring numbers like 001, 002, 003 etc., every 2.6.26 OpenVZ kernel will be named after one or another great Russian writer. We will do it in alphabetical order so there will be no upgrade pain.
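The "no upgrade pain" reasoning can be illustrated with a small sketch. It assumes that whatever tool compares kernel versions falls back to a plain lexicographic comparison of the name part. The writer names below are placeholders picked for illustration, not a release schedule, and `is_upgrade` is a made-up helper.

```python
def is_upgrade(installed, candidate):
    """Return True if `candidate` sorts after `installed` lexicographically."""
    return candidate > installed


# Placeholder names, listed in the order they would hypothetically be released.
releases = ["akhmatova", "bulgakov", "chekhov"]

# Each later release compares greater than the one before it...
checks = [is_upgrade(old, new) for old, new in zip(releases, releases[1:])]
print(checks)  # [True, True]

# ...so sorting the names recovers the release order.
print(sorted(reversed(releases)) == releases)  # True
```

As long as names are handed out alphabetically, a newer kernel always compares as "greater" than an older one, just as 002 did against 001.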
Here is an example of how things work in the free software world.
We at OpenVZ use kernels from Red Hat Enterprise Linux as a base for our OpenVZ kernels. This is because vendors such as Red Hat invest a lot of work into making their kernels really stable. The usual recipe for a super-stable kernel is to pick a mainstream kernel and marinate it in QA for at least half a year (more for the best results), doing bugfixing and cherry-picking of fixes and driver updates from the mainstream. This way one has enough time to test it, plus (at least in theory) one gets new fixes but does not get new bugs slipped into one's kernel. This is what Red Hat (and other guys such as Novell/SUSE) do for their kernels, and believe me it's quite a lot of work to do, and the end result is of great value.
Here comes the beauty of free software: now everybody can use the result of Red Hat's work. Yes, this is exactly what we do. At this point you might stand up saying: all right, Red Hat invested a lot of resources into something you use for free, this does not look like a fair deal.
Fortunately I have a good answer. Here is the list of bug (i.e. software defect) reports that were fixed in Red Hat Enterprise Linux kernels thanks to the OpenVZ team (in some way): #405521, #247379, #205335, #210852, #168659, #243252, #207463, #228461, #243263, #224541, #232209, #232211, #239767, #220971, #400651, #214778, #203894, #212144, #215715, #241096, #241096, #439670. These 22 bugs are all kernel bugs, most are security-related (and therefore quite serious). Almost all the bug reports from the list include patches (i.e. changes to code to fix a problem reported), so those are not like "hey, you have a problem", but rather "you have a problem and here's the solution".
The majority of those bugs were found while testing OpenVZ kernels. This is what we contribute back. This is also a lot of work and of great value -- some of those bugs were really hard to find and/or fix.
The latest (23rd) addition to the above list is bug #454865, which is actually a regression in a new version of the RHEL4 kernel. Again, this report not only includes a clear description of what's wrong, but also a test case program which reproduces the bug, and a patch to fix it. Clear test cases are very important because they can be included in a validation test suite, to make sure bugs do not pop up a second time (which sometimes happens in the real world).
This is just one example, a close-up picture. The big picture is free software developers and users helping other developers and users. Unus pro omnibus, omnes pro uno.
Linus released 2.6.26-rc1 yesterday. Here rc1 means this is the first "release candidate" for 2.6.26, and the merge window is now closed, so for the next two months or so before the final 2.6.26 release only bugfixes will be accepted.
And I just can't resist the temptation to post my new favorite image here, so you can enjoy it too:
As you may already know, Linux kernel 2.6.25 was released today. Among many other things (see the Linux 2.6.25 changelog at kernelnewbies.org for details), it moves us one step closer to having containers in mainstream Linux. Or maybe even two steps.
The first is the memory controller. The code was submitted by Balbir Singh (of IBM), and is mostly based on earlier work by Pavel Emelyanov (of OpenVZ), Balbir and some others. It uses the "control groups" (cgroups) framework introduced earlier by Paul Menage of Google. Basically, the memory controller (in its current form) lets one control the amount of physical memory used by a group of processes (i.e. by a container). This is a vital feature for containers since all the containers are using the same RAM resource, so for containers to co-exist nicely they should not be allowed to use too much memory. Now a system administrator can set per-container memory limits. The whole technology is known as User Beancounters (or just Beancounters) in the OpenVZ world -- it's just that we have more different parameters (and thus knobs and dials) in OpenVZ.
But, in a sense, the memory controller that is now in mainstream is better than the one we have in OpenVZ. The one in mainstream limits the amount of physical (RSS) pages used by a container, and if this limit is exceeded, pages are swapped out. Well, in fact, they are not swapped out -- that would cause unnecessary disk I/O when only a container limit has been hit and there is otherwise enough memory on the system. In this case the container's memory pages are put into the swap cache. In case of a global memory shortage this swap cache will be freed, i.e. swapped out to disk. To summarize, this cool feature makes it possible to have containers with strict memory limits but decent overall system behavior.
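For the curious, here is a minimal sketch of what setting a per-container memory limit could look like from user space. It assumes the cgroup filesystem is mounted with the memory controller enabled, and that the limit is exposed as a `memory.limit_in_bytes` file and group membership as a `tasks` file (the cgroup v1 convention; names may differ on your kernel). The group name `ct101` and the `set_memory_limit` helper are made up for illustration. The demo runs against a temporary directory standing in for the real mount point, so it needs no root privileges and limits nothing.

```python
import os
import tempfile


def set_memory_limit(cgroup_root, group, limit_bytes, pids=()):
    """Create a group directory, write its memory limit, and add PIDs to it.

    Assumes cgroup v1 style file names (memory.limit_in_bytes, tasks).
    """
    group_dir = os.path.join(cgroup_root, group)
    os.makedirs(group_dir, exist_ok=True)

    with open(os.path.join(group_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    # One PID per write: reopen the file for every PID so that each
    # PID is flushed in its own write() call.
    for pid in pids:
        with open(os.path.join(group_dir, "tasks"), "a") as f:
            f.write(f"{pid}\n")

    return group_dir


# Demo against a scratch directory standing in for the cgroup mount point.
demo_root = tempfile.mkdtemp()
demo_dir = set_memory_limit(
    demo_root, "ct101", 200 * 1024 * 1024, pids=[os.getpid()]
)

with open(os.path.join(demo_dir, "memory.limit_in_bytes")) as f:
    demo_limit = int(f.read())
print(demo_limit)
```

On a real system you would pass the actual cgroup mount point instead of the scratch directory and run it as root.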
The second feature (and thus the second step) is network namespaces -- the ability for containers to have their own network stacks. This is still a work in progress. The first bits and pieces appeared in 2.6.24. A lot of network namespaces code (more than 200 changesets I guess) has now appeared in 2.6.25, and despite my earlier predictions it's still not the end of the journey. A lot more code (also about 200 changesets) is now in the net-2.6.26 tree (the networking subsystem branch maintained by David Miller), scheduled to be included in Linux 2.6.26. At the risk of being wrong for the second time, I still think that in Linux 2.6.26 we will likely have a fairly complete implementation of net namespaces. A short description of what we will try to have in 2.6.26 as far as networking goes is here.
Speaking of 2.6.26 -- looks like it will be our next base kernel. We are now maintaining a 2.6.24-based development branch (which is also used for OpenVZ-enabled Ubuntu Hardy Heron kernels), and will start porting the OpenVZ patchset to 2.6.26 soon.
Finally, here's the graph that shows how many changesets, per kernel release, our team has contributed. No need to comment on it, I guess.
Also, here's the list of the top 10 contributors to Linux 2.6.25. Our company is #7.
Top changeset contributors by employer
(None) 1188 (9.3%)
Red Hat 1181 (9.3%)
Novell 817 (6.4%)
IBM 703 (5.5%)
Intel 472 (3.7%)
Bartlomiej Zolnierkiewicz 307 (2.4%)
Parallels 278 (2.2%) <---
Oracle 255 (2.0%)
bunk@kernel.org 227 (1.8%)
(Academia) 225 (1.8%)
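As a quick sanity check on the table, each row's count and percentage should imply roughly the same total number of changesets in the release. The sketch below uses only the numbers from the table above. Because the percentages are rounded to one decimal place, the implied totals are estimates, not exact figures. `implied_total` is a made-up helper.

```python
# (changesets, percent) pairs copied from the table above.
rows = {
    "(None)": (1188, 9.3),
    "Red Hat": (1181, 9.3),
    "Novell": (817, 6.4),
    "IBM": (703, 5.5),
    "Intel": (472, 3.7),
    "Bartlomiej Zolnierkiewicz": (307, 2.4),
    "Parallels": (278, 2.2),
    "Oracle": (255, 2.0),
    "bunk@kernel.org": (227, 1.8),
    "(Academia)": (225, 1.8),
}


def implied_total(count, percent):
    """Total number of changesets implied by one row of the table."""
    return count / (percent / 100.0)


estimates = {name: implied_total(c, p) for name, (c, p) in rows.items()}
low, high = min(estimates.values()), max(estimates.values())
print(f"implied total: between {low:.0f} and {high:.0f} changesets")
```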
Pavel Emelyanov has made it into the top 10 developers.
Developers with the most changesets
Bartlomiej Zolnierkiewicz 307 (2.4%)
Adrian Bunk 234 (1.8%)
Patrick McHardy 225 (1.8%)
Ingo Molnar 213 (1.7%)
Paul Mundt 207 (1.6%)
Greg Kroah-Hartman 172 (1.4%)
Thomas Gleixner 166 (1.3%)
Jesper Nilsson 166 (1.3%)
Pavel Emelyanov 160 (1.3%) <---
Harvey Harrison 150 (1.2%)
Another prominent OpenVZ guy is Denis Lunev, who is number 26 in the list with 87 changesets. The full list of people who contributed to this release is more than 1200 lines long.
For those of you who are not yet aware, Linux kernel 2.6.24 is finally out.
OpenVZ is (and has been, for the past few years) a good contributor to the mainline kernel. But in this release we are really doing better than before: 215 patches written by OpenVZ people were submitted to the 2.6.24 kernel during the period of its development (i.e. the last 3½ months). This is about 2% of all the patches that were merged into 2.6.24.
Most of those patches are for PID namespaces, preliminary support for net namespaces (i.e. network stack virtualization for containers), and various bugfixes.
PID namespace is now almost complete and quite usable, although it's marked as "experimental" for now. For the technical description of the feature, see this lwn.net article.
Net namespace is a work-in-progress, and there are already a lot of patches stacked in Dave Miller's net-2.6.25 tree for future inclusion into the 2.6.25 mainline kernel. The feature is expected to be complete and usable by 2.6.25 kernel release, with IPv6 support coming a bit later.
Jon Corbet of LWN.net also wrote about the 2.6.24 kernel statistics (back when it was still at an RC stage) here. Note that OpenVZ's Pavel Emelyanov is number 5 in the "Most active developers" (by changeset) list, with 146 patches contributed.
One of the goals of the OpenVZ project is to integrate containers functionality into the mainstream Linux kernel. As you know, most of the new kernel code goes through Andrew Morton, the right hand of Linus Torvalds.
I just came across the video of Andrew speaking at the LinuxWorld Expo 2007. Among the other topics, he tells what is going to be in the kernel in a year or so. It is quite interesting to see what he thinks of containers -- to see that part, scroll to 40:58.
Update: here's the transcription of the relevant part, provided by dowdle.
The one prediction I am prepared to make is that over the next 1 to 2 years there'll be quite a lot of focus in the core of the Linux kernel on the project which has many names. Some people call it containerization, others will call it operating system virtualization, other people will call it resource management. It's a whole cloud of different features which have different applications.
It can be used for machine partitioning, to partition workloads amongst one machine, otherwise known as workload management.
Server consolidation. Well, you have a whole bunch of servers which are 30 percent loaded -- move all those things onto the one machine without having to tread on each other's toes.
Resource management. A number of people in the high end numerical computing want this; numerical computing area want resource management. Other people who are running world famous web search engines also want resource management in their kernel. In fact, the major, central piece of the whole containerization framework is from an engineer at Google. It's in my tree at present and I'm hoping to get it in at 2.6.24. It's just a framework for containerization. A whole lot of other stuff is going to plug in underneath it, which is under development at present.
So an example of resource management is you might have a particular group of processes, [and] you want to not let it use more than 200 MB of physical memory, and a certain amount of disk bandwidth, network bandwidth, a certain amount of CPU -- so you can just have this little blob and give it maximum amount of resources it can consume, let it run without letting it trash everything else which is running on the machine. So that is a resource management application. People also need this feature for high availability... and I'm still not really sure I understand why.
Also the OpenVZ product, which comes out of the development team in Russia -- that's a mature project that is mainly for web server virtualization, having lots and lots of different instances of the web server on one machine, not have one excessively taking resources away from another. They've been working very hard and very patiently, and with great accommodation on this project. I hope slowly we'll start moving significant parts of the OpenVZ product into the Linux kernel in a way in which it's acceptable to all the other stake holders, so that those guys don't end up carrying such a patch burden.
Here is good news for SLES users. I'm happy to report that the OpenVZ team resumed working on the SLES10-based OpenVZ kernel a few months ago, and we now have a pretty stable SLES10 OpenVZ kernel. I encourage all SLES users to try it out.
The SLES10 kernel itself is based on the Linux kernel 2.6.16, and until SLES11 comes out, it remains the most "enterprise" (read stable and supported) kernel coming from Novell/SUSE. So, what we did is we took that kernel and ported our OpenVZ patchset to it. The only feature missing is I/O priority support, which is because the disk CFQ scheduler used in 2.6.16 is way too old. Other than that, it's a pretty decent kernel, and while we haven't declared it as stable yet we will do so really soon.
Last week I went to Cambridge, UK with my colleague Pavel Emelyanov to take part in the LinuxConf Europe and the containers mini-summit, as well as the Linux Kernel Summit session devoted to containers. Pavel, who works in the OpenVZ kernel team, is now working on integrating our technology into the mainstream Linux kernel. To his credit, the memory controller and the PID namespace patch (see my recent blog post), which were integrated into -mm recently, are mostly due to him.
The first event in Cambridge was LinuxConf Europe, where we both presented our talks on containers -- mine was a general introduction to virtualization, containers, and OpenVZ, while Pavel described some intimate details of memory controller (read "beancounters") implementation.
The next day we had to skip the LinuxConf to take part in the containers mini-summit. This was an event for all the containers stakeholders to discuss what to present on the containers topic at the Kernel Summit, and how. Unfortunately, Eric Biederman (Linux Networx) and Paul Menage (Google) came later, and Balbir Singh (IBM) was busy with the VM mini-summit, so we did this mini-summit in two rounds. The first round was with Pavel (OpenVZ), Cedric Le Goater (IBM), Oren Laadan (of Zap -- a checkpointing and live migration project), Kamezawa Hiroyuki (of Fujitsu Japan, mostly interested in resource management), and Paul (who joined us over Skype). The second round was with Eric, Paul, and Balbir -- the next day in the hall. The results of this mini-summit are a few threads on the containers@ mailing list, plus a few documents here.
Finally, there was a 30-minute session at the Kernel Summit devoted to containers. Paul and Eric summarized what we have done so far, and what we are going to do next. There was not much discussion, which I think is healthy because now everybody knows about containers and why they are needed. Slides from the talk are available here. Jonathan Corbet (of Linux Weekly News) also provided a summary of the topic (this is still subscriber-only content, but since I'm a subscriber I can share a free link with you).
It feels like we are making good progress and are on the right path to a containers implementation in the Linux kernel. You can see some people helping to make this happen in this photo.