Good news for all of us on the virtualization front!

The latest prepatch for the stable Linux kernel tree, 2.6.19-rc1, now includes some pieces of OS-level virtualization from OpenVZ, IBM, and Eric Biederman. Those patches had been sitting in the -mm tree (Andrew Morton’s) for some time, and during the “2.6.19 merge window” Andrew submitted them to Linus Torvalds. So they are now part of “vanilla” Linux and will ship with the final 2.6.19 kernel release.

So, what exactly went into the Linux kernel? Essentially, three sets of patches that implement three features needed for any OS-level virtualization solution.

First is IPC virtualization, otherwise known as IPC namespace, contributed by OpenVZ’s Kirill Korotaev and Pavel Emelianov. IPC stands for inter-process communication. This is functionality that enables different processes to create shared memory segments, send messages to each other, and use semaphores. In a virtualized system, you don’t want a container (VE) to see IPC objects from another container.
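To make the isolation idea concrete, here is a toy model in Python. It is not kernel code: the class `IpcNamespace`, its `shmget` method, and the container names are hypothetical and only mimic the behavior described above, where each container looks up IPC keys in its own table.

```python
# Toy model only: hypothetical names, not the kernel implementation.
class IpcNamespace:
    """Holds the IPC objects visible to one container (VE)."""

    def __init__(self, name):
        self.name = name
        self._segments = {}  # IPC key -> shared memory segment (a bytearray here)

    def shmget(self, key, size):
        """Return the segment for `key`, creating it in this namespace if needed."""
        if key not in self._segments:
            self._segments[key] = bytearray(size)
        return self._segments[key]

    def keys(self):
        """List the IPC keys visible from inside this namespace."""
        return sorted(self._segments)


ve101 = IpcNamespace("ve101")
ve102 = IpcNamespace("ve102")

# Both containers use the same key, yet each gets its own segment.
seg_a = ve101.shmget(0x1234, 16)
seg_b = ve102.shmget(0x1234, 16)
seg_a[0] = 42

print(seg_b[0])      # ve102's segment is unaffected by the write in ve101
print(ve102.keys())  # ve102 sees only the keys it created itself
```

Without namespaces there would be one global table, and the second `shmget` call would hand back the first container's segment.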

Second is utsname() virtualization (otherwise known as the UTS namespace), contributed by Serge Hallyn from IBM. utsname() returns basic information about the running kernel (the same information displayed by uname -a), such as the kernel version/release, host and domain names, and system architecture (for example, i686). Until now there was a single utsname structure in the kernel, visible to all processes. Why do we need to virtualize it? At the very least, every virtualized system should have its own hostname. We might want to change other fields, too.
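For readers who want to see the fields in question, a small Python snippet can print them from user space. It assumes a Unix-like system and uses the standard library's `os.uname()`, which exposes five of the fields (the domain name is not among them). It only reads the values; it does not create a namespace.

```python
import os

info = os.uname()  # wraps the uname() system call (Unix only)

# These are the kind of fields a UTS namespace lets each container have its own copy of.
for field in ("sysname", "nodename", "release", "version", "machine"):
    print(f"{field:9} {getattr(info, field)}")
```

Here `nodename` is the hostname, the one field every container most obviously needs to set for itself.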

Third is the preliminary work needed to introduce the PID namespaces feature, mostly contributed by Eric W. Biederman (with some bits from Oleg Nesterov, IBM's Sukadev Bhattiprolu, and Cedric Le Goater). Every container (VE) should be able to use its own set of process IDs (PIDs), and should not see another container's PIDs. Eric's approach is to stop using the numeric pid directly in the kernel and to use a pointer to struct pid instead, a structure that could hold both the PID and the VEID (i.e. container ID). The submitted set of patches cleans up various places in the kernel that use a PID directly, switching them to struct pid.
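The idea of replacing a bare number with a structure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the kernel's actual struct pid: the `Pid` and `Task` classes, the `find_task` helper, and the VE IDs 101 and 102 are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pid:
    """Hypothetical stand-in for struct pid: a number plus its container."""
    nr: int    # the PID number as seen inside the container
    veid: int  # the container (VE) ID


class Task:
    def __init__(self, name, pid):
        self.name = name
        self.pid = pid  # a Pid object, not a bare integer


tasks = [
    Task("init", Pid(nr=1, veid=101)),
    Task("init", Pid(nr=1, veid=102)),
    Task("sshd", Pid(nr=57, veid=101)),
]


def find_task(veid, nr):
    """Look up a task by number, but only among tasks of the given container."""
    wanted = Pid(nr=nr, veid=veid)
    for task in tasks:
        if task.pid == wanted:
            return task
    return None


print(find_task(101, 1).pid)  # PID 1 of VE 101
print(find_task(102, 1).pid)  # a different PID 1, belonging to VE 102
print(find_task(102, 57))     # None: PID 57 exists only in VE 101
```

Because code holds a `Pid` rather than an integer, two containers can each have a PID 1 without any ambiguity about which process is meant.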

I am really happy that this is community work and a community process (as I said before). We see different parties bringing in code and expertise, reviewing each other's code, making suggestions, exchanging ideas, and improving things, to everybody's benefit!

These are just the first steps. Much more is needed to have full OS-level virtualization in the mainstream Linux kernel. Don’t worry: we are already working on that. A few days ago Kirill sent another iteration (v5) of the beancounter patchset for further review and possible inclusion. Beancounters can be used to implement per-VE limits and guarantees for certain resources such as memory.


( 2 comments — Leave a comment )
Oct. 12th, 2006 02:46 pm (UTC)
Q: Offtopic
Will the patch from http://forum.openvz.org/index.php?t=msg&goto=6283& (or something like it) be included?

Because nobody can run quagga (aka zebra) under openvz kernels ;(((
Oct. 12th, 2006 05:09 pm (UTC)
Re: Q: Offtopic
From the same forum thread:
> patch diff-ve-netlink-perm-20061004 queued for 2.6.9-023stab030 and 2.6.8-022stab078.23

From RHEL4 kernel 2.6.9-023stab030.1 changelog:
> diff-ve-netlink-perm-20061004

So it is already included in the RHEL4-based kernel and will be included in the next release of the 2.6.8-based kernel.

BTW, if you are using the stable kernel, you might try switching to the RHEL4-based one; it will become our next stable branch soon.