|A shiny new vzctl 4.4 was released just today. Let's take a look at its new features.|
As you know, vzctl was able to download OS templates automatically for quite some time now, when vzctl create --ostemplate was used with a template which is not available locally. Now, we have just moved this script to a standard /usr/sbin place and added a corresponding vztmpl-dl(8) man page. Note you can use the script to update your existing templates as well.
Next few features are targeted to make OpenVZ more hassle-free. Specifically, this release adds a post-install script to configure some system aspects, such as changing some parameters in /etc/sysctl.conf and disabling SELinux. This is something that has to be done manually before, so it was described in OpenVZ installation guide. Now, it's just one less manual step, and one less paragraph from the Quick installation guide.
Another "make it easier" feature is automatic namespace propagation from the host to the container. Before vzctl 4.4 there was a need to set a nameserver for each container, in order for DNS to work inside a container. So, the usual case was to check your host's /etc/resolv.conf, find out what are nameservers, and set those using something like
Now, since defaults for most container parameters can be set in global OpenVZ configuration file /etc/vz/vz.conf, if it contains a line like
Another small new feature is ploop-related. When you start (or mount) a ploop-based container, fsck for its inner filesystem is executed. This mimics the way a real server works -- it runs fsck on boot. Now, there is a 1/30 or so probability that fsck will actually do filesystem check (it does that every Nth mount, where N is about 30 and can be edited with tune2fs). For a large container, fsck could be a long operation, so when we start containers on boot from the /etc/init.d/vz initscript, we skip such check to not delay containers start-up. This is implemented as a new
Thanks to our user and contributor Mario Kleinsasser, vzmigrate is now able to migrate containers between boxes with different VE_ROOT/VE_PRIVATE values. Such as, if one server runs Debian with /var/lib/vz and another is CentOS with /vz, vzmigrate is smart enough to note that and do proper conversion. Thank you, Mario!
Another vzmigrate enhancement is option -f/--nodeps which can be used to disable some pre-migration checks. For example, in case of live migration destination CPU capabilities (such as SSE3) are cross-checked against the ones of the source server, and if some caps are missing, migration is not performed. In fact, not too many applications are optimized to use all CPU capabilities, therefore there are moderate chances that live migration can be done. This --nodeps option is exactly for such cases -- i.e. you can use it if you know what you do.
This is more or less it regarding new features. Oh, it makes sense to note that default OS template is now centos-6-x86, and NEIGHBOR_DEVS parameter is commented out by default, because this increases the chances container networking will work "as is".
Fixes? There are a few -- to vzmigrate, vzlist, vzctl convert, vzctl working on top of upstream kernel (including some fixes for CRIU-based checkpointing), and build system. Documentation (those man pages is updated to reflect all the new options and changes.
A list of contributors to this vzctl release is quite impressive, too -- more than 10 people.
As always, if you find a bug in vzctl, please report it to bugzilla.openvz.org.
|OpenVZ ploop is a wonderful technology, and I want to share more of its wonderfulness with you. We have previously covered ploop in general and it's write tracker feature to help speed up container migration in particular. This time, I'd like to talk about snapshots and backups.|
But let's start with yet another ploop feature -- it's expandable format. When you create a ploop container with say 10G of disk space, ploop image is just slightly larger than the size of actual container files. I just created centos-6-x86 container -- ploop image size is 747M, and inside CT df shows that 737M is used. Of course, for empty ploop image (with a fresh filesystem and zero files) the ratio will be worse. Now, when CT is writing data, ploop image is auto-growing up to accomodate the data size.
Now, these images can be layered, or stacked. Imagine having a single ploop image, consisting of blocks. We can add another image on top of the first one, so that new reads will fall through to the lower image (because the upper one is empty yet), while new writes will end up being written to the upper (top) image. Perhaps this image will save some more words here:
The new (top) image is now accumulating all the changes, while the old (bottom) one is in fact the read-only snapshot of the container filesystem. Such a snapshot is cheap and instant, because there is no need to copy a lot of data or do other costly operations. Of course, ploop is not limited to only two levels -- you can create much more (up to 255 if I remember correctly, which is way above any practical limit).
What can be done with such a snapshot? We can mount it and copy all the data to a backup (update: see openvz.org/Ploop/backup). Note that such backup is very fast, online and consistent. There's more to it though. A ploop snapshot, combined with a snapshot of a running container in memory (also known as a checkpoint) and a container configuration file(s), can serve as a real checkpoint to which you can roll back.
Consider the following scenario: you need to upgrade your web site backend inside your container. First, you do a container snapshot (I mean complete snapshot, including an in-memory image of a running container). Then you upgrade, and realize your web site is all messed up and broken. Horror story, is it? No. You just switch to before-upgrade snapshot and keep working as it. It's like moving back in time, and all this is done on a running container, i.e. you don't have to shut it down.
Finally, when you don't need a snapshot anymore, you can merge it back. Merging process is when changes from an upper level are written to a lower level (i.e. the one under it), then the upper level is removed. Such merging is of course not as instant as creating a snapshot, but it is online, so you can just keep working while ploop is working with merge.
All this can be performed from the command line using vzctl. For details, see vzctl(8) man page, section Snapshotting. Here's a quick howto:
Create a snapshot:
Mount a snapshot (say to copy the data to a backup):
Rollback to a snapshot:
Delete a snapshot (merging its data to a lower level image):
|Big news for every serious OpenVZ user. Finally we have it!|
Parallels, a sponsor behind OpenVZ project, is now offering an OpenVZ Maintenance Partnership program. The program provides bug resolution support and feature development to the OpenVZ community. The OpenVZ Maintenance Partnership has a small annual fee and provides two benefits to partnership members.
Partnership members will receive a support ID that will allow them to submit up to 10 high priority bugs per year. These bugs will be placed at the highest priority level in the development stack.
Partnership members will also be able to submit a feature request(s) which will be reviewed by the Parallels engineering team. They will work with you to clarify the requirements and implementation options and provide an implementation estimate and a schedule.
Learn more and join the OpenVZ Maintenance Partnership here
|For the last two weeks or so we've been working on vzstats -- a way to get some statistics about OpenVZ usage. The system consists of a server, deployed to http://stats.openvz.org/, and clients installed onto OpenVZ machines (hardware nodes). This is currently in beta testing, with 71 servers participating at the moment. If you want to participate, read http://openvz.org/vzstats and run |
So far we have some interesting results. We are not sure how representative they are -- probably they aren't, much more servers are needed to participate-- but nevertheless they are interested. Let's share a few preliminary findings.
First, it looks like almost no one is using 32-bits on the host system anymore. This is reasonable and expected. Indeed, who needs system limited to 4GB of RAM nowdays?
Second, many hosts stay on latest stable RHEL6-based OpenVZ kernel. This is pretty good and above our expectations.
Third, very few run ploop-based containers. We don't understand why. Maybe we should write more about features you get from ploop, such as instant snapshots and improved live migration.
location: Los Angeles
Current Mood: Excited
Current Music: Prince
|The OpenVZ Project has a booth at SCALE 11x. Kir and I will be staffing it. I wrote a lengthy blog post with the day 1 report where you can read more if you want:|
I talked with Kir for about a hour last night and asked him a lot of questions. I got a lot of inside information that I will be sharing in the coming days... I promise... with pictures even.
If you are in the Southern California area and can make it to SCALE, please stop by our booth. We are #93. Assuming there is a reasonable Internet connection at the booth, I also try to be "Live from SCALE" in the #openvz IRC channel. The times are Saturday 10:00 - 18:00 PST and Sunday 10:00-16:00 PST.
|You know I love to write about performance comparisons, right?|
I just came across the one I saw some time ago and forgot about it, but now it's linked from the Wikipedia article about OpenVZ. The paper (presented at OLS'08) is comparing performance of OpenVZ, Linux-VServer, Xen, KVM and QEMU with the performance of non-virtualized Linux. The results are shocking! OpenVZ is twice slower than reference system when doing bzip -9 of an iso image! Yet better, on a dbench test OpenVZ performance was about 13% of a reference system (which is almost eight times slower), while Linux-VServer gave about 98%.
Well guys, I want to tell you just one thing. If someone offers to sell you a car for 13% of the usual price — don't buy it, it's a scam and a seller is not to be trusted. If someone tells you OpenVZ is 2 (or 8) times slower than non-virtualized system — don't buy it either!
I mean, I do know how complicated it is to have a sane test results. I spent about a year leading SWsoft QA team, mostly testing Linux kernels, and trying to make sure test results are sane and reproducible (not dependent on the moon phase etc). There are lots and lots of factors involved. But the main idea is simple: if results doesn't look plausible, if you can't explain them, dig deeper and find out why. Once you will, you will know how to exclude all the bad factors and conduct a proper test.
Update: comments disabled due to spam
We have updated vzctl, ploop and vzquota recently (I wrote about vzctl here). Some changes in packaging are tricky, so let me explain why and give some hints.
For RHEL5-based kernel users (i.e. ovzkernel-2.6.18-028stabXXX) and earlier kernels
Since ploop is only supported in RHEL6 kernel, we have removed ploop dependency from vzctl-4.0 (ploop library is loaded dynamically when needed and if available). Since you have earlier vzctl version installed, you also have ploop installed. Now you can remove it, at the same time upgrading to vzctl-4.0. That "at the same time" part is done via yum shell:
# yum shell > remove ploop ploop-lib > update vzctl > run
That should fix it. In the meantime, think about upgrading your systems to RHEL6-based kernel which is better in terms of performance, features, and speed of development.
For RHEL6-based users (i.e. vzkernel-2.6.32-042stabXXX)
New ploop library (1.5) requires very recent RHEL6-based kernel kernel (version 2.6.32-042stab061.1 or later) and is not supposed to work with earlier kernels. To protect ploop from earlier kernels, its packaging says "Conflicts: vzkernel < 2.6.32-042stab061.1" which usually prevents ploop 1.5 installation on systems having those kernels.
To fix this conflict, make sure you run the latest kernel, and then remove the old ones:
Then you can run update as usual:
Hope that helps
Update: comments disabled due to spam.
OpenVZ project is 7 years old as of last month. It's hard to believe the number, but looking back, we've done a lot of things together with you, our users.
One of the main project goals was (and still is) to include the containers support upstream, i.e. to vanilla Linux kernel. In practice, OpenVZ kernel is a fork of the Linux kernel, and we don't like it that way, for a number of reasons. The main ones are:
So, we were (and still are) working hard to bring in-kernel containers support upstream, and many key pieces are already there in the kernel -- for example, PID and network namespaces, cgroups and memory controller. This is the functionality that lxc tool and libvirt library are using. We also use the features we merged into upstream, so with every new kernel branch we have to port less, and the size of our patch set decreases.
One of such features for upstream is checkpoint/restore, an ability to save running container state and then restore it. The main use of this feature is live migration, but there are other usage scenarios as well. While the feature is present in OpenVZ kernel since April 2006, it was never accepted to upstream Linux kernel (nor was the other implementation proposed by Oren Laadan).
For the last year we are working on CRIU project, which aims to reimplement most of the checkpoint/restore functionality in userspace, with bits of kernel support where required. As of now, most of the additional kernel patches needed for CRIU are already there in kernel 3.6, and a few more patches are on its way to 3.7 or 3.8. Speaking of CRIU tools, they are currently at version 0.2, released 20th of September, which already have limited support for checkpointing and restoring an upstream container. Check criu.org for more details, and give it a try. Note that this project is not only for containers -- you can checkpoint any process trees -- it's just the container is better because it is clearly separated from the rest of the system.
One of the most important things about CRIU is we are NOT developing it behind the closed doors. As usual, we have wiki and git, but most important thing is every patch is going through the public mailing list, so everyone can join the fun.
vzctl for upstream kernel
We have also released vzctl 4.0 recently (25th of September). As you can see by the number, it is a major release, and the main feature is support for non-OpenVZ kernels. Yes it's true -- now you can have a feeling of OpenVZ without installing OpenVZ kernel. Any recent 3.x kernel should work.
As with OpenVZ kernel, you can use ready container images we have for OpenVZ (so called "OS templates") or employ your own. You can create, start, stop, and delete containers, set various resource parameters such as RAM and CPU limits. Networking (aside from routed-based) is also supported -- you can either move a network interface from host system to inside container (
Having said all that, surely OpenVZ kernel is in much better shape now as it comes for containers support -- it has more features (such as live container shapshots and live migration), better resource management capabilities, and overall is more stable and secure. But the fact that the kernel is now optional makes the whole thing more appealing (or so I hope).
You can find information on how to setup and start using vzctl at vzctl for upstream kernel wiki page. The page also lists known limitations are pointers to other resources. I definitely recommend you to give it a try and share your experience! As usual, any bugs found are to be reported to OpenVZ bugzilla.
Update: comments disabled due to spam
Current Mood: amused
|In the last week or so I've had several questions from new OpenVZ users about container migration. They want to know how it works, what is the difference between live / online migration vs. offline... and how long it will take. I made a silent screencast a few years ago but it is rather dated... and since I can easily add sound now, I thought I'd make an updated screencast. I didn't really put a lot of production value into it but overall it meets its objective. I posted it to archive.org as a webm but they automatically convert it to flash and embed it... so enjoy whichever format you prefer.|
|Checkpoint/restore, or CPT for short, is a nice feature of OpenVZ, and probably the most amazing one. In a nutshell, it's a way to freeze a container and dump its complete state (processes, memory, network connections etc) to a file on disk, and then restore from that dump, resuming processes execution as if nothing happened. This opens a way to do nifty things such as live container migration, fast reboots, high availability setups etc.|
It is our ultimate goal to merge all bits and pieces of OpenVZ to the mainstream Linux kernel. It's not a big secret that we failed miserably trying to merge the checkpoint/restore functionality (and yes, we have tried more than once). The fact that everyone else failed as well soothes the pain a bit, but is not really helpful. The reason is simple: CPT code is big, complex, and touches way too many places in the kernel.
So we* came up with an idea to implement most of CPT stuff in user space, i.e. as a separate program not as a part of the Linux kernel. In practice this is impossible because some kernel trickery is still required here and there, but the whole point was to limit kernel intervention to the bare minimum.
Guess what? It worked even better that we expected. As of today, after about a year of development, up to 90% of the stuff that is needed to be in the kernel is already there, and the rest is ready and seems to be relatively easy to merge (see this table to get an idea what's in and what's not).
As for the user space stuff, we are happy to announce the release of CRtools version 0.1. Now, let me step aside and quote Pavel's announcement:
The tool can already be used for checkpointing and restoring various individual
What's next? We will be rebasing OpenVZ to Linux Kernel 3.5 (most probably) and will try to re-use CRIU for checkpoint and restore of OpenVZ containers, effectively killing a huge chunk of out-of-tree kernel code that we have in OpenVZ kernel.
* - In fact it was Pavel Emelyanov, our chief kernel architect, but it feels oh so nice to say we that we can't refrain).
Update: comments disabled due to spam.