In order to demonstrate our analyzer's diagnostic capabilities, we analyze open-source projects and write articles to discuss any interesting bugs we happen to find. We always encourage our users to suggest interesting open-source projects for analysis, and note down all the suggestions we receive via e-mail. Sometimes they come from people closely related to the project we are asked to check. In this article, we will tell you about a check of the components of the OpenVZ project we have been asked to analyze by the project manager Sergey Bronnikov.
( Read more...Collapse )
There will eventually be two distinct versions... a free version and a commercial version. So far as I can tell they currently call it Virtuozzo 7 but in a comparison wiki page they use the column names Virtuozzo 7 OpenVZ (V7O) and Virtuozzo 7 Commercial (V7C). The original OpenVZ, which is still considered the stable OpenVZ release at this time based on the EL6-based OpenVZ kernel, appears to be called OpenVZ Legacy.
Odin had previously released the source code to a number of the Virtuozzo tools and followed that up with the release of spec-like source files used by Virtuozzo's vztt OS Template build system. The plan is to migrate away from the OpenVZ specific tools (like vzctl, vzlist, vzquota, and vzmigrate) to the Virtuozzo specific tools although there will probably be some overlap for a while.
The release includes source code, binary packages and a bare-metal distro installer DVD iso.
Bare Metal Installer
I got a chance to check out the bare-metal installer today inside of a KVM virtual machine. I must admit that I'm not very familiar with previous Virtuozzo releases but I am a semi-expert when it comes to OpenVZ. Getting used to the new system is taking some effort but will all be for the better.
I didn't make any screenshots yet of the installer... I may do that later... but it is very similar to that of RHEL7 (and clones) because it is built by and based on CloudLinux... which is based on EL7.
What is CloudLinux? CloudLinux is a company that makes a commercial multi-tenant hosting product... that appears to provide container (or container-like) isolation as well as Apache and PHP enhancements specifically for multi-tenant hosting needs. CloudLinux also offers KernelCare-based reboot-less kernel updates. CloudLinux's is definitely independent from Odin and the CloudLinux products are in no way related to Virtuozzo. Odin and CloudLinux are partners however.
Why is the distro based on CloudLinux and does one need a CloudLinux subscription to use it? Well it turns out that Odin really didn't want to put forth all of the effort and time required to produce a completely new EL7-clone. CloudLinux is already an expert at that... so Odin partnered with CloudLinux to produce a EL7-based distro for Virtuozzo 7. While CloudLinux built it and (I think) there are a few underlying CloudLinux packages, everything included is FOSS (Free and Open Source Software). It DOES NOT and WILL NOT require a CloudLinux subscription to use... because it is not related to CloudLinux's product line nor does it contain any of the CloudLinux product features.
The confusion was increased when I did a yum update post-install and if failed with a yum repo error asking me to register with CloudLinux. Turns out that is a bug in this initial release and registration is NOT needed. There is a manual fix of editing a repo file in /etc/yum.repos.ed/) and replacing the incorrect base and updates URLs with a working ones. This and and other bugs that are sure to crop up will be addressed in future iso builds which are currently slated for weekly release... as well as daily package builds and updates available via yum.
More Questions, Some Answers
So this is the first effort to merge Virtuozzo and OpenVZ together... and again... me being very Virtuozzo ignorant... there is a lot to learn. How does the new system differ from OpenVZ? What are the new features coming from Virtuozzo? I don't know if I can answer every conceivable question but I was able to publicly chat with Odin's sergeyb in the #openvz IRC channel on the Freenode IRC network. I also emailed the CloudLinux folks and got a reply back. Here's what I've been able to figure out so far.
Why CloudLinux? - I mentioned that already above, but Odin didn't want to engineer their own EL7 clone so they got CloudLinux to do it for them and it was built specifically for Virtuozzo and not related to any of the CloudLinux products... and you do not need a subscription from Odin nor CloudLinux to use it.
What virtualization does it support? - Previous Virtuozzo products supported not only containers but a proprietary virtual machine hypervisor made by Odin/Parallels. In Virtuozzo 7 (both OpenVZ and Commercial so far as I can tell) the proprietary hypervisor has been replaced with the Linux kernel built-in one... KVM. See: https://openvz.org/QEMU
How about libvirt support? - Anyone familiar with EL7's default libvirtd setup for KVM will be happy to know that it is maintained. libvirtd is running by default and the network interfaces you'd expect to be there, are. virsh and virt-manager should work as expected for KVM.
Odin has been doing some libvirt development and supposedly both virsh and virt-manager should work with VZ7 containers. They are working with upstream. libvirt has supposedly supported OpenVZ for some time but there weren't any client applications that supported OpenVZ. That is changing. See: https://openvz.org/LibVirt
Command line tools? - OpenVZ's vzctl is there as is Virtuozzo's prlctl.
How about GUIs or web-based management tools? - That seems to be unclear at this time. I believe V7C will offer web-based management but I'm not sure about V7O. As mentioned in the previous question, virt-manager... which is a GUI management tool... should be usable for both containers and KVM VMs. virsh / virt-manager VZ7 container support remains to be seen but it is definitely on the roadmap.
Any other new features? - Supposedly VZ7 has a fourth-generation resource management system that I don't know much about yet. Other than the most obvious stuff (EL7-based kernel, KVM, libvirt support, Virtuozzo tools, etc), I haven't had time to absorb much yet so unfortunately I can't speak to many of the new features. I'm sure there are tons.
About OS Templates
I created a CentOS 6 container on the new system... and rather than downloading a pre-created OS Template that is a big .tar.gz file (as with OpenVZ Legacy) it downloaded individual rpm packages. It appears to build OS Templates on demand from current packages on-demand BUT it uses a caching system whereby it will hold on to previously downloaded packages in a cache directory somewhere under /vz/template/. If the desired OS Template doesn't exist already in /vz/template/cache/ the required packages are downloaded, a temporary ploop image made, the packages installed, and then the ploop disk image is compressed and added to /vz/template/cache as a pre-created OS Template. So the end result for my CentOS 6 container created /vz/template/cache/centos-6-x86_64.plain.p
The only OS Template available at time of writing was CentOS 6 but I assume they'll eventually have all of the various Linux distros available as in the past... both rpm and deb based ones. We'll just have to wait and see.
As previously mentioned, Odin has already released the source code to vztt (Virtuozzo's OS Template build system) as well as some source files for CentOS, Debian and Ubuntu template creation. They have also admitted that coming from closed source, vztt is a bit over-complicated and not easy-to-use. They plan on changing that ASAP but help from the community would definitely be appreciated.
How about KVM VMs?
I'm currently on vacation and only have access to a laptop running Fedora 22... that I'm typing this from... and didn't want to nuke it... so I installed the bare-metal distro inside of a KVM virtual machine. I didn't really want to try nested KVM. That would definitely not have been a legitimate test of the new system... but I expect libvirtd, virsh, and virt-manager to work and behave as expected.
Despite the lack of perfection in this initial release Virtuozzo 7 shows a lot of promise. While it is a bit jarring coming from OpenVZ Legacy... with all of the changes... the new features... especially KVM... really show promise and I'll be watching all of the updates as they happen. There certainly is a lot of work left to do but this is definitely a good start.
I'd love to hear from other users to find out what experiences they have. If I've made any mistakes in my analysis, please correct me immediately.
Congrats Odin and OpenVZ! I only wish I had a glass of champagne and could offer up a respectable toast... and that there were others around me to clank glasses with. :)
We are ready to announce publishing of binaries compiled from open components:
- Virtuozzo installation ISO image
- RPM packages (kernel and userspace)
- Source RPM packages (kernel and userspace)
- Debug RPM packages (kernel and userspace)
- EZ templates (CentOS 7 x86_64, CentOS 6 x86_64 etc)
FAQ (Frequently Asked Questions)
Q: Can we use binaries or Virtuozzo distribution in production?
A: No. Virtuozzo 7 is in pre-Beta stage and we strongly recommend to avoid any production use. We continue to develop new features and Virtuozzo 7 may contain serious bugs.
Q: Would it be possible to upgrade from Beta 1 to Beta 2?
A: Upgrade will be supported only for OpenVZ installed on Cloud Linux (i.e. using Virtuozzo installation image of OpenVZ installed using yum on Cloud Linux).
Q: How often you will update Virtuozzo 7 files?
A: RPM package (and yum repository) - nightly, ISO image - weekly.
Q: I don't want to use your custom kernel or distribution. How to use OpenVZ on my own Linux distribution? A: We plan to make available OpenVZ for vanilla kernels and we are working on it. If you want it - please help us with testing and contribute patches . Pay attention that using OpenVZ with vanilla kernel will have some limitations because some required kernel changes are not in upstream yet.
I updated the Fedora 22 OS Template I contributed so it was current with the release today... and for the fun of it I recorded a screencast showing how to make a Fedora 22 MATE Desktop GUI container... and how to connect to it via X2GO.
Must get moose and squirrel!
Linux Containers is an ancient technology, going back to last century. Indeed it was 1999 when our engineers started adding bits and pieces of containers technology to Linux kernel 2.2. Well, not exactly "containers", but rather "virtual environments" at that time -- as it often happens with new technologies, the terminology was different (the term "container" was coined by Sun only five years later, in 2004).
Anyway, in 2000 we ported our experimental code to kernel 2.4.0test1, and in January 2002 we already had Virtuozzo 2.0 version released. From there it went on and on, with more releases, newer kernels, improved feature set (like adding live migration capability) and so on.
It was 2005 when we finally realized we made a mistake of not employing the open source development model for the whole project from the very beginning. This is when OpenVZ was born as a separate entity, to complement commercial Virtuozzo (which was later renamed to Parallels Cloud Server, or PCS for short).
Now it's time to admit -- over the course of years OpenVZ became just a little bit too separate, essentially becoming a fork (perhaps even a stepchild) of Parallels Cloud Server. While the kernel is the same between two of them, userspace tools (notably vzctl) differ. This results in slight incompatiblities between the configuration files, command line options etc. More to say, userspace development efforts need to be doubled.
Better late than never; we are going to fix it now! We are going to merge OpenVZ and Parallels Cloud Server into a single common open source code base. The obvious benefit for OpenVZ users is, of course, more features and better tested code. There will be other much anticipated changes, rolled out in a few stages.
As a first step, we will open the git repository of RHEL7-based Virtuozzo kernel early next year (2015, that is). This has become possible as we changed the internal development process to be more git-friendly (before that we relied on lists of patches a la quilt but with home grown set of scripts). We have worked on this kernel for quite some time already, initially porting our patchset to kernel 3.6, then rebasing it to RHEL7 beta, then final RHEL7. While it is still in development, we will publish it so anyone can follow the development process.
Our kernel development mailing list will also be made public. The big advantage of this change for those who want to participate in the development process is that you'll see our proposed changes discussed on this mailing list before the maintainer adds them to the repository, not just months later when the the code is published and we'll consider any patch sent to the mailing list. This should allow the community to become full participants in development rather than mere bystanders as they were previously.
Bug tracking systems have also diverged over time. Internally, we use JIRA (this is where all those PCLIN-xxxx and PSBM-xxxx codes come from), while OpenVZ relies on Bugzilla. For the new unified product, we are going to open up JIRA which we find to me more usable than Bugzilla. Similar to what Red Hat and other major Linux vendors do, we will limit access to security-sensitive issues in order to not compromise our user base.
Last but not least, the name. We had a lot of discussions about naming, had a few good candidates, and finally unanimously agreed on this one:
Please stay tuned for more news (including more formal press release from Parallels). Feel free to ask any questions as we don't even have a FAQ yet.
Merry Christmas and a Happy New Year!
The picture describes how we develop kernel releases. It's bit more complicated than the linearity of version 1 -> version 2 -> version 3. The reason behind it is we are balancing between adding new features, fixing bugs, and rebasing to newer kernels, while trying to maintain stability for our users. This is our convoluted way of achieving all this:
As you can see, we create a new branch when rebasing to a newer upstream (i.e. RHEL6) kernel, as regressions are quite common during a rebase. At the same time, we keep maintaining the older branch in which we add stability and security fixes. Sometimes we create a new branch to add some bold feature that takes a longer time to stabilize. Stability patches are then forward-ported to the new branch, which is either eventually becoming stable or obsoleted by yet another new one.
Of course there is a lot of work behind these curtains, including rigorous internal testing of new releases. In addition to that, we usually provide those kernels to our users (in rhel6-testing repo) so they could test new stuff before it hits production servers, and we can fix more bugs earlier (more on that here). If you are not taking part in this testing, well, it's never late to start!
For many years, when people asked us how can they help the project, our typical answer was something like "just use it, file bugs, spread the word". Some people were asking specifically about how to donate money, and we had to say "currently we don't have a way to accept", so it was basically "no, we don't need your money". It was not a good answer. Even if our big sponsor is generously helping us with everything we need, that doesn't mean your money would be useless.
Today we have opened a PayPal account to accept donations. Here:
How are we going to spend your money? In general:
- Hardware for development and testing
- Travel budget for conferences, events etc.
- Accolades for active contributors
In particular, right now we need:
- About $200 to cover shipping expenses for donated hardware
- About $300 for network equipment
- About $2000 for hard drives
Donations page is created on our wiki to track donations and spending. We hope that it will see some good progress in the coming days, with a little help from you!
NOTE that if you feel like spending $500 or more, there is yet another way to spend it -- a support contract from Parallels, with access to their excellent support team and 10 free support tickets.
It's been two months since we have released vzctl 4.7 and ploop 1.11 with some vital improvements to live migration that made it about 25% faster. Believe it or not, this is another "make it faster" post, making it, well, faster. That's right, we have more surprises in store for you! Read on and be delighted.
Asynchronous ploop send
This is something that should have been implemented at the very beginning, but the initial implementation looked like this (in C):
/* _XXX_ We should use AIO. ploopcopy cannot use cached reads and * has to use O_DIRECT, which introduces large read latencies. * AIO is necessary to transfer with maximal speed. */
Why there are no cached reads (and, in general, why ploop library is using O_DIRECT, and the kernel also uses direct I/O, bypassing the cache)? Note that ploop images are files that are used as block devices containing a filesystem. That ploop block device and filesystem access is going through cache as usual, so if we'd do the same for ploop image itself, it would result in double caching and waste of RAM. Therefore, we do direct I/O on lower level (when working with ploop images), and allow usual Linux cache to be used on the upper level (when container is accessing files inside the ploop, which is the most common operation).
So, ploop image itself is not (usually) cached in memory, and other tricks like read-ahead are also not used by the kernel, and reading each block from the image takes time as it is read directly from disk. In our test lab (OpenVZ running inside KVM with a networked disk) it takes about 10 milliseconds to read each 1 MB block. Then, sending each block to the destination system over network also takes time, it's about 15 milliseconds in our setup. So, it takes 25 milliseconds to read and send a block of data, if we do it one after another. Oh wait, this is exactly how we do it!
Solution -- let's do reading and sending at the same time! In other words, while sending the first block of data we can already read the second one, and so on, bringing the time required to transfer one block down to MAX(Tread, Tsend) instead of Tread + Tsend.
Implementation uses POSIX threads. Main ploop send process spawns a separate sending thread and two buffers -- one for reading and one for sending, then they change place. This works surprisingly well for something that uses threads, maybe because it is as simple as it could ever be (one thread, one mutex, one conditional). If you happen to know something about pthreads programming, please review the appropriate commit 55e26e9606.
The new async send is used by default, you just need to have a newer ploop (1.12, not yet released). Clone and compile ploop from git if you are curious.
As for how much time we save using this, it happen to be 15 microseconds instead of 25 per block of data, or 40% faster! Now, the real savings depend on the number of blocks needs to be migrated after container is frozen, it can be 5 or 100, so overall savings can be from 0.05 to 1 second.
ploop copy with feedback
The previous post on live migration improvements described an optimization of doing fdatasync() on the receiving ploop side before suspending the container on the sending side. It also noted that implementation is sub-par:
The other problem is that sending side should wait for fsync to finish, in order to proceed with CT suspend. Unfortunately, there is no way to solve this one with a one-way pipe, so the sending side just waits for a few seconds. Ugly as it is, this is the best way possible (let us know if you can think of something better).
So, the problem is trivial -- there's a need for a bi-directional channel between ploop copy sender and receiver, so the receiver can say "All right, I have synced the freaking delta back to sender". In addition, we want to do it in a simple way, not much more complicated as it is now.
After some playing around with different approaches, it seemed that OpenSSH port forwarding, combined with tools like netcat, socat, or bash /dev/tcp feature, can do the trick of establishing a two-way pipe between ploop copy sides.
The netcat (or nc) is available in various varieties, which might or might not be available and/or compatible. socat is a bit better, but one problem with it is it ignores its child exit code, so there is no (simple) way to figure out an error. A problem with /dev/tcp feature is it is bash-specific, but bash itself might not be universally available (Debian and Ubunty users are well aware of that fact).
So, all this was replaced with a tiny home-grown vznnc ("nano-netcat) tool. It is indeed nano, as there is only only 200 lines of code in it. It can either listen or connect to a specified port at localhost (note that ssh is doing real networking for us), and run a program with its stdin/stdout (or a specified file descriptor) already connected to that TCP socket. Again, similar to netcat, but small, to the point, and hopefully bug-free.
Finally, with this in place, we can make sending side of ploop copy to wait for feedback from the receiving side, so it can suspend the container as soon as remote side finished syncing. This makes the whole migration a bit faster (by eliminating the previously hard-coded 5 seconds wait-for-sync delay), but it also helps to slash the frozen time as we suspend the container as soon as we should, so it won't be given any extra time to write some more data we'll be needing to copy while it's frozen.
It's hard to measure the practical impact of the feature, but in our tests it saves about 3.5 seconds of total migration time, and from 0 to a few seconds of frozen time, depending on container's disk I/O activity.
For the feature to work, you need the latest versions of vzctl (4.8) and ploop (1.12) on both nodes (if one node doesn't have newer tools, vzmigrate falls back to old non-feedback way of copying). Note that those new version are not yet released at the time of writing this, but you can get ones from git and compile yourself.
Using ssh connection multiplexing
It takes about 0.15 seconds to establish an ssh connection in our test lab. Your mileage may wary, but it can't be zero. Well, unless an OpenSSH feature of reusing an existing ssh connection is put to work! It's call connection miltiplexing, and when used, once a connection is established, you can have subsequent ones in practically no time. As vzmigrate is a shell script and runs a number of commands at remote (destination) side, using it might save us some time.
Unfortunately, it's a relatively new OpenSSH feature and is implemented quite ugly in a version available in CentOS 6 -- you need to keep one open ssh session running. They fixed it in a later version by adding a special daemon-like mode enabled with ControlPersist option, but alas not in CentOS 6. Therefore, we have to maintain a special "master" ssh connection for the duration of vzmigrate. For implementation details, see commits 06212ea3d and 00b9ce043.
This is still experimental, so you need to specify
flag to vzmigrate
to use it. You won't believe it (so, go test it yourself!), but this alone
slashes container frozen time by about 25% (which is great, as we fight for
every microsecond here), and improves the total time taken by vzmigrate
by up to 1 second (which is probably not that important but still nice).
Current setup used for development and testing is two OpenVZ instances
running in KVM guests on a
box. While it's convenient and oh so very space-saving, is is probably
not the best approximation of a real world OpenVZ usage.
So, we need
some better hardware:
If you are able to spend $500 to $2000 on ebay, please let us know by email to donate at openvz dot org so we can arrange it. Now, quite a few people offered hosted servers with a similar configuration. While we are very thankful to all such offers, this time we are looking for physical, not hosted, hardware.Update: Thanks to FastVPS, we got the Supermicro 4 node server. If you want to donate, see wiki: Donations.