<?xml version="1.0" encoding="utf-8"?>
<!-- If you are running a bot please visit this policy page outlining rules you must respect. https://www.livejournal.com/bots/ -->
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:lj="https://www.livejournal.com">
  <id>urn:lj:livejournal.com:atom1:openvz</id>
  <title>OpenVZ</title>
  <subtitle>OpenVZ</subtitle>
  <author>
    <name>OpenVZ</name>
  </author>
  <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/"/>
  <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom"/>
  <updated>2018-08-27T18:52:37Z</updated>
  <lj:journal userid="9392309" username="openvz" type="community"/>
  <link rel="service.feed" type="application/x.atom+xml" href="https://openvz.livejournal.com/data/atom" title="OpenVZ"/>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:52998</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/52998.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=52998"/>
    <title>An interview with OpenVZ kernel developer, from 2006</title>
    <published>2015-08-11T20:46:29Z</published>
    <updated>2015-08-11T23:35:32Z</updated>
    <category term="kernel"/>
    <category term="history"/>
    <category term="interview"/>
    <content type="html">&lt;p&gt;It was almost 10 years ago that I organized a kerneltrap.org interview with our then kernel team leader Andrey Savochkin, which was published on April 18, 2006. As the years went by, kerneltrap.org is no more, and Andrey moved on: he &lt;a href="http://www.nes.ru/en/people/catalog/s/andrei-savochkin" target="_blank" rel="nofollow"&gt;got a PhD in Economics and is now an Assistant Professor&lt;/a&gt;, while OpenVZ is still here. Read on for this great piece of memorabilia.&lt;/p&gt;
&lt;hr /&gt;
  &lt;div class=""&gt;&lt;p&gt;Andrey Savochkin leads the development of the kernel portion of OpenVZ, an operating system-level server virtualization solution.  In this interview, Andrey offers a thorough explanation of what virtualization is and how it works.  He also discusses the differences between hardware-level and operating system-level virtualization, going on to compare OpenVZ to VServer, Xen and User Mode Linux.&lt;/p&gt;

&lt;p&gt;Andrey is now working to get OpenVZ merged into the mainline Linux kernel explaining, "&lt;i&gt;virtualization makes the next step in the direction of better utilization of hardware and better management, the step that is comparable with the step between single-user and multi-user systems.&lt;/i&gt;"  The complete OpenVZ patchset weighs in at around 70,000 lines, approximately 2MB, but has been broken into smaller logical pieces to aid in discussion and to help with merging.&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: Please share a little about yourself and your background...&lt;/p&gt;
&lt;p&gt;&lt;a href="http://static.openvz.org/lj/andrey.jpg" target="_blank" rel="nofollow"&gt;&lt;img src="https://imgprx.livejournal.net/aa7304c290b344510afcffb05f32869c3b77af53/NnIMG7IBcdh6MYcj5RfB2PYwZD06lRdc91mZmr2Jl_aTh4KfP2wQc_-ikonH-4JE1neAhbQGyhwz5xFTwvLp1w" align="left" border="0" fetchpriority="high" /&gt;&lt;/a&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: I live in Moscow, Russia, and work for &lt;a href="http://www.swsoft.com/" target="_blank" rel="nofollow"&gt;SWsoft&lt;/a&gt;.  My two major interests in life are mathematics and computers, and I was unable to decide for a long time which one I preferred.&lt;/p&gt;
&lt;p&gt;I studied at Moscow State University, which has quite a strong mathematical school, and got an M.Sc. degree in 1995 and a Ph.D. in 1999. The final decision between mathematics and computers came during my postgraduate study: my Ph.D. thesis was entirely in the computer science area, exploring security aspects of operating systems and of software intended for computers with Internet access.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  What is your involvement with the OpenVZ project?&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: The OpenVZ project has kernel and userspace parts. For the kernel part, we have been using a development model close to that of the mainstream Linux kernel, and for a long time I accumulated and reviewed OpenVZ kernel patches and prepared "releases". Certainly, I've also been contributing a lot of code to OpenVZ.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  What do you mean when you say that your development model is close to the kernel development model?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: The Linux kernel development model implies that developers can't add their changes directly to the main code branch; instead they publish their changes. Other developers can review and comment on them, and, more importantly, there is a dedicated person who reviews all the changes, asks for corrections or clarifications, and finally incorporates them into the main code branch.  This model is extremely rare in commercial software development, and in the open source world only some projects use it. The Linux kernel has used this model quite effectively from the beginning.&lt;/p&gt;
&lt;p&gt;In my opinion, this model is very valuable for software that has high reliability requirements and, at the same time, is complex and difficult to debug by traditional means (such as debuggers, full state dump on failure, and so on).&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  OpenVZ is described as an "Operating System-level server virtualization solution".  What does this mean?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:  First, it is a virtualization solution: it enables multiple environments (compartments) on a single physical server, and each environment looks like, and provides the same functionality as, a dedicated server.  We call these environments Virtual Private Servers (VPSs), or Virtual Environments (VEs). VPSs on a single physical server are isolated from each other, and they are also isolated from the physical hardware. Isolation from the hardware makes it possible to implement, on top of OpenVZ, automated migration of VPSs between servers that requires no reconfiguration to run the VPSs on very different hardware. A fair and efficient resource management mechanism is also included, as one of the most important components of a virtualization solution.&lt;/p&gt;
&lt;p&gt;Second, OpenVZ is an operating system-level solution, virtualizing access to the operating system, not to the hardware. There are many well-known hardware-level virtualization solutions, but the operating system-level virtualization architecture gives many advantages over them. OpenVZ has better performance in some areas, considerably better scalability and VPS density, and provides unique management options in comparison with hardware-level virtualization solutions.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  How many VPSs can you have on one piece of hardware?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: That depends on the hardware and on the "size" of the VPSs and the applications in them. For experimental purposes OpenVZ can run hundreds of small VPSs at the same time; in a production environment, tens of VPSs. Virtuozzo has higher density and can run hundreds of production VPSs.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  When you talk about the migration of VPSs between servers, do you mean that a VPS can be running on one server and then migrate to another server where it will continue running, somewhat like a cluster?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: An OpenVZ VPS will be stopped and started again, so there will be some downtime.  But this migration doesn't require any reconfiguration or other manual intervention related to IP addresses, drivers, partitions, device names or anything else. That means, first of all, that taking hardware offline for maintenance or upgrade, replacing hardware and similar tasks become much more painless, and this is a definite advantage of virtualization. Also, since OpenVZ makes it possible to fully automate manipulations with a VPS as a whole, it makes load balancing (as well as fail-over and other clustering features) easier to implement.&lt;/p&gt;
&lt;p&gt;Virtuozzo has additional functionality called Zero-Downtime Migration.  It provides the ability to migrate a VPS from one server to another without downtime, without restart of processes and preserving network connections.  This functionality will be released as part of OpenVZ in April.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: Can you explain how the resource management mechanism works?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: In virtualization solutions, resource management has two main requirements. First, it should cover enough resources to provide good isolation and security (the isolation and security properties of resource management are one of the main differentiators between OpenVZ and VServer). Second, resource management should be flexible enough to allow high utilization of hardware when the resource demands of VPSs or virtual machines change.&lt;/p&gt;
&lt;p&gt;OpenVZ resource management operates on the following resource groups:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;CPU
&lt;/li&gt;&lt;li&gt;memory
&lt;/li&gt;&lt;li&gt;pools of various OS objects
&lt;/li&gt;&lt;li&gt;disk quota
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Each group may have multiple resources, like low memory and high memory, or disk blocks and disk inodes. Resource configuration can be specified in terms of upper limits (which may be soft or hard limits, and impose an upper boundary on the consumption of the corresponding resource), in terms of shares (or weights) for resource distribution, or in terms of guarantees (the amount of resources guaranteed no matter what other VPSs are doing).&lt;/p&gt;
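&lt;p&gt;[Ed.: as an illustration, the interplay of soft limits, hard limits and refused allocations can be sketched in a toy model. The Python below is purely illustrative, loosely inspired by the held/maxheld/barrier/limit/failcnt fields of the OpenVZ user beancounters; it is not the actual kernel code, and the class and method names are invented.]&lt;/p&gt;

```python
class Beancounter:
    """Toy model of one per-VPS resource counter (e.g. kernel memory).

    barrier -- soft limit: consumption above it is tolerated only
               temporarily and signals resource pressure.
    limit   -- hard limit: charges that would exceed it are refused.
    """

    def __init__(self, barrier, limit):
        assert barrier <= limit
        self.barrier = barrier
        self.limit = limit
        self.held = 0      # current consumption
        self.maxheld = 0   # high-water mark
        self.failcnt = 0   # number of refused charges

    def charge(self, amount):
        """Try to account `amount` units to this VPS; False on refusal."""
        if self.held + amount > self.limit:
            self.failcnt += 1
            return False
        self.held += amount
        self.maxheld = max(self.maxheld, self.held)
        return True

    def uncharge(self, amount):
        self.held = max(0, self.held - amount)

    def over_barrier(self):
        """True when the VPS is in the soft-limit 'pressure' zone."""
        return self.held > self.barrier


bc = Beancounter(barrier=80, limit=100)
print(bc.charge(70), bc.over_barrier())   # True False  (under both thresholds)
print(bc.charge(20), bc.over_barrier())   # True True   (90: above the barrier)
print(bc.charge(20), bc.failcnt)          # False 1     (would exceed the limit)
```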
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  What are some common uses of server virtualization?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Some examples are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Server consolidation -- moving the content of multiple servers into VPSs on a single server to reduce management (and hardware) costs.&lt;/p&gt;
&lt;/li&gt;&lt;li&gt;
&lt;p&gt;Disaster Recovery -- providing redundant environments for replication and fast data and application recovery.&lt;/p&gt;
&lt;/li&gt;&lt;li&gt;
&lt;p&gt;Improving server security -- by creating multiple VPSs and moving different services (HTTP, FTP, mail) into different VPSs.&lt;/p&gt;
&lt;/li&gt;&lt;li&gt;
&lt;p&gt;Creation of multiple environments and replication of environments for software testing and development.&lt;/p&gt;
&lt;/li&gt;&lt;li&gt;
&lt;p&gt;Hosting -- hosting service providers use Virtuozzo/OpenVZ to bridge the gap between shared and dedicated services, and to exceed both.  Typical Virtuozzo/OpenVZ-based hosting services include VPSs and Dynamic Servers, which provide isolation, root access, and guaranteed and burstable resources to customers.
&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  What prevents multiple operating systems running on the same server using OpenVZ from affecting each other?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
Isolation between multiple VPSs consists of&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;separation of entities such as processes, users, files and so on, and
&lt;/li&gt;&lt;li&gt;resource control.
&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Let's first speak about separation of processes and similar objects. There are two possible approaches to this separation: access control and separation of namespaces. The former means that when someone tries to access an object, the kernel checks whether they have access rights;  the latter means that objects live in completely separate spaces (for example, per-VPS lists), contain no pointers to objects outside their own space and, thus, nobody can get access to objects they aren't supposed to access.&lt;/p&gt;
&lt;p&gt;OpenVZ uses both of these approaches, choosing between them so that they neither reduce performance and efficiency nor degrade isolation.&lt;/p&gt;
&lt;p&gt;In security theory, there are strong arguments in favor of both approaches.  For a long period of time, various military and national security agencies preferred the first approach in their publications and solutions, accompanying it with logging. Many authors on different occasions advocate the second approach. In our specific task, virtualizing the Linux kernel, I believe the most important step is to identify the objects that need to be separated, and this step is exactly the same for both approaches. However, depending on the object type and data structures, the two approaches differ in performance and resource consumption. For searching long lists, for example, namespace separation is better, but for large hash tables access control is better. So, the way isolation is implemented in OpenVZ provides both safety and efficiency.&lt;/p&gt;
&lt;p&gt;Resource control is the other very important part of VPS isolation.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: When relying on namespace separation, what prevents a process in one VPS from writing to a random memory address that just happens to be used by another VPS?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Processes can't access physical memory at random addresses. They only have their virtual address space and, additionally, can get access to certain named objects: processes identified by a numeric ID, files identified by their path, and so on. The idea of namespace separation is to make sure that a process can identify only those objects that it is authorized to access.  For other objects, the process won't get a "permission denied" error; it will simply be unable to see them.&lt;/p&gt;
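&lt;p&gt;[Ed.: the contrast between the two approaches can be sketched in a toy model. The Python below is purely illustrative -- the data structures and function names are invented, and this is not how the kernel stores task lists. With access control, the lookup finds the object and is then refused; with namespace separation, the search itself is confined to the caller's VPS, so foreign objects are simply never found.]&lt;/p&gt;

```python
processes = {100: "vps1", 200: "vps2"}   # global table: pid -> owning VPS

def lookup_access_control(pid, caller_vps):
    """Access-control style: the object is found first, the check comes after."""
    owner = processes.get(pid)
    if owner is None:
        return "no such process"
    if owner != caller_vps:
        return "permission denied"   # the caller learns the pid exists
    return "ok"

namespaces = {"vps1": {100}, "vps2": {200}}   # per-VPS pid lists

def lookup_namespace(pid, caller_vps):
    """Namespace style: the search never leaves the caller's own VPS."""
    if pid in namespaces[caller_vps]:
        return "ok"
    return "no such process"   # a foreign pid is simply invisible

print(lookup_access_control(200, "vps1"))  # permission denied
print(lookup_namespace(200, "vps1"))       # no such process
print(lookup_namespace(200, "vps2"))       # ok
```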
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: Can you explain a little about how resource control provides virtual private server isolation?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Resource control is closely related to resource management.  It ensures that one VPS can't harm others through excessive use of some resource.  If one VPS were able to easily take down the whole server by exhausting some system resource, we couldn't say that VPSs are really isolated from each other.  In implementing resource control, we in OpenVZ tried to prevent not only situations where one VPS can bring down the whole server, but also the possibility of causing a significant performance drop for other VPSs.&lt;/p&gt;
&lt;p&gt;One part of resource control is the accounting and management of CPU, memory, disk quota, and other resources used by each VPS.  The other part is the virtualization of system-wide limits.  For instance, Linux provides a system-wide limit on the number of IPC shared memory segments.  For complete isolation, this limit should apply to each VPS separately - otherwise, one VPS could use all the IPC segments and the other VPSs would get nothing.  But certainly, the most difficult part of resource control is the accounting and management of resources like CPU and system memory.&lt;/p&gt;
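&lt;p&gt;[Ed.: virtualizing a system-wide limit like the IPC example can be sketched in a toy model. The Python below is purely illustrative; the constant, its small value and the class are invented. Each VPS carries its own instance of the limit, so one VPS exhausting its quota cannot starve another.]&lt;/p&gt;

```python
SHM_SEGMENTS_MAX = 4   # hypothetical per-system limit on shm segments

class VpsIpc:
    """Each VPS gets its own instance of the formerly system-wide limit."""

    def __init__(self):
        self.segments = 0

    def shmget(self):
        """Create a shared memory segment, failing at this VPS's own limit."""
        if self.segments >= SHM_SEGMENTS_MAX:
            raise OSError("segment limit reached for this VPS")
        self.segments += 1

vps1, vps2 = VpsIpc(), VpsIpc()
for _ in range(SHM_SEGMENTS_MAX):
    vps1.shmget()        # vps1 exhausts its own copy of the limit...
vps2.shmget()            # ...while vps2 is completely unaffected
print(vps1.segments, vps2.segments)   # 4 1
```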
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  How does OpenVZ improve upon other virtualization projects, such as VServer?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: First of all, OpenVZ is a completely different project from VServer and has a different code base.&lt;/p&gt;
&lt;p&gt;OpenVZ has a bigger feature set (including, for example, netfilter support inside VPSs) and significantly better isolation, Denial-of-Service protection and general reliability. The better isolation and DoS protection come from the OpenVZ resource management system, which includes a hierarchical CPU scheduler and the User Beancounter patch to control the usage of memory and internal kernel objects. Also, we've invested a lot of effort in building a quality assurance system, and now we have people who manually test OpenVZ as well as a large automated testing system.&lt;/p&gt;
&lt;p&gt;Virtuozzo, a virtualization solution built on the same core as OpenVZ, provides many more features, has better performance characteristics and includes many additional management capabilities and tools.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  What are some examples of hardware-level virtualization solutions?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: VMware, Xen, User Mode Linux.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  How does OpenVZ compare to Xen?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: OpenVZ has certain advantages over Xen.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;OpenVZ makes it possible to utilize system resources such as memory and disk space much more efficiently, and because of that it has better performance on memory-critical workloads.  OpenVZ does not run a separate kernel in each VPS, which saves the memory used for internal kernel data. However, even greater efficiency comes from dynamic resource allocation.  With Xen, you need to specify in advance the amount of memory for each virtual machine and create a disk device and filesystem for it, and your ability to change these settings later on the fly is very limited.  When running multiple VPSs, at any given moment some VPSs are handling a load burst and are busy, some are less busy and some are idle, so dynamic assignment of resources in OpenVZ can significantly improve resource utilization.  With Xen, you have to slice the server for the worst-case scenario and the maximum resource usage of each VPS;  with OpenVZ you can usually slice based on average usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;OpenVZ provides more management capabilities and management tools. To start, OpenVZ has the out-of-the-box ability to immediately create VPSs based on various Linux distributions, without preparing disk images, installing hundreds of packages and so on. But most importantly, OpenVZ makes it possible to access a VPS's files and start programs inside a VPS from the host system. This means that a damaged VPS (one that has lost network access or become unbootable) can be easily repaired from the host system, and that many operations related to management, configuration or software upgrades inside VPSs can be easily scripted and executed from the host system. In short, managing Xen virtual machines is like managing separate servers, but managing a group of VPSs on one computer is more like managing a single multi-user server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The operating system inside a Xen virtual machine is not necessarily able to use all the capabilities of the hardware;  for instance, support for SMP and for more than 4GB of RAM inside virtual machines will only appear in Xen 3.0. OpenVZ is as scalable as Linux when hardware capabilities grow: SMP and more than 4GB of RAM have been supported in OpenVZ from the very beginning.  Recently we built OpenVZ for the x86_64 platform, and it was a straightforward job that did not require going into architecture details.  So, OpenVZ is far more hardware independent than Xen, and hence is able to start using new hardware capabilities much faster.
&lt;/p&gt;&lt;/li&gt;&lt;/ol&gt;
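&lt;p&gt;[Ed.: the efficiency argument in the first point can be illustrated with a back-of-the-envelope calculation; all numbers below are hypothetical, chosen only to make the worst-case vs. average-usage contrast concrete.]&lt;/p&gt;

```python
# Worst-case slicing (Xen-style, static) vs. average-based slicing
# (OpenVZ-style, dynamic). Hypothetical numbers for illustration only.

server_ram_gb = 16
peak_per_vps_gb = 1.5   # what a single VPS needs during a load burst
avg_per_vps_gb = 0.5    # what it needs most of the time

static_slicing = int(server_ram_gb // peak_per_vps_gb)   # reserve for peaks
dynamic_slicing = int(server_ram_gb // avg_per_vps_gb)   # share the slack

print(static_slicing, dynamic_slicing)   # 10 32
```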
&lt;p&gt;There is one point where Xen will have a certain advantage over OpenVZ.  In version 3.0, Xen is going to allow running Windows virtual machines on a Linux host system (this isn't possible in the stable branch of Xen).&lt;/p&gt;
&lt;p&gt;Again, I should note that the above is my opinion about the main differences between OpenVZ and Xen. Virtuozzo has many additions to OpenVZ; for instance, there is a Virtuozzo for Windows solution.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  How does OpenVZ compare to User Mode Linux?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
What I've said before about the advantages of OpenVZ over Xen also applies when OpenVZ is compared with User Mode Linux.&lt;/p&gt;
&lt;p&gt;The unique feature of User Mode Linux is that you can run it under standard debuggers for studying Linux kernel in depth.  In other aspects, User Mode Linux does not have as many features as Xen, and Xen is superior in performance and stability.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  Is OpenVZ portable?  That is, can we expect to see the technology ported to other kernels?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Well, OpenVZ is portable between different Linux kernels (though the amount of effort needed to port between two kernels certainly depends on how different they are).  On our FTP site there are OpenVZ ports to the SLES 10 and Fedora Core 5 kernels. The ideas behind OpenVZ are broadly portable, and we even had them implemented on the FreeBSD kernel (though by now this FreeBSD port has been dropped).&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
Why was the FreeBSD port dropped?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
We decided to focus on the Linux version to implement new ideas as fast as possible.
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: How widely used is OpenVZ?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: OpenVZ in its current form has just been released to the public, but we've already got a considerable number of downloads (and questions). Virtuozzo, a superset of OpenVZ, already has a large number of installations. I'd estimate that currently 8,000+ servers, with 400,000 VPSs on them, run Virtuozzo/OpenVZ code.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  Is there any plan to try and get OpenVZ merged into the mainline Linux kernel?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Yes, we'd like to get it merged into the mainstream Linux kernel and are working in that direction.  Virtualization makes the next step in the direction of better hardware utilization and better management, a step comparable to the step between single-user and multi-user systems. Virtualization will be more in demand as hardware capabilities grow, for example with the multi-core systems currently on the Intel roadmap.  So, I believe that when OpenVZ is merged into the mainstream, Linux will instantly become more attractive and more convenient in many usage scenarios.  That's why I think OpenVZ is such an interesting project, and that's why I've invested so much of my time into it.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  How large are the changes required in the Linux kernel to support OpenVZ?  Can they be broken into small logical pieces?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: The current size of the OpenVZ kernel patch is about 2MB (70,000 lines). This size is not small, but it is less than 10% of the average size of the changes between minor versions in the 2.6 kernel branch (e.g., 2.6.12 to 2.6.13).  The OpenVZ patch split into major parts is presented here [ed: dead link].  OpenVZ code can also be viewed and downloaded from the Git repository at &lt;a href="http://git.openvz.org/" target="_blank" rel="nofollow"&gt;http://git.openvz.org/&lt;/a&gt;. One of the large parts (about 25%) is various stability fixes, which we are submitting to the mainstream.  Then comes virtualization itself, general resource management, the CPU scheduler, and so on.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
What efforts have been made so far to try and get OpenVZ merged into the kernel?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
The OpenVZ patch was split into smaller pieces, easier for us to explain and for the community to accept.  Then, in the last couple of months, some virtualization pieces have been sent to the linux-kernel mailing list and actively discussed there.&lt;/p&gt;
&lt;p&gt;The biggest argument was whether we want "partial" virtualization, where VPSs can have, for example, an isolated network but a common filesystem space.  In my personal opinion, in some perfect world such partial virtualization would be fine.  But in real life, the subsystems of the Linux kernel have many dependencies on each other: every subsystem interacts with the proc filesystem, for example.  Virtualization is cheap, so it's easier to have complete isolation, both from the implementation point of view and for the use and management of VPSs by users.&lt;/p&gt;
&lt;p&gt;The process of submitting OpenVZ patches to the mainstream continues. Also, we are working with SuSE, RedHat (RHEL and Fedora Core), Xandros, and Mandriva to include OpenVZ in their distributions and make it available and well supported for the maximum number of users.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
What do you think is the biggest obstacle that could keep OpenVZ from being merged into the mainline Linux kernel?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
I don't see any serious obstacles. The OpenVZ code is available, and its functionality has been proven to be very useful - I think it is now running on 8,000+ servers. So, it is just a matter of continuing the discussion until everyone involved agrees on what exactly we want to have in Linux and how, technically, we want to organize these new capabilities.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: You've referred to OpenVZ as a subset of Virtuozzo.  What is Virtuozzo, and what does it add over OpenVZ?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: OpenVZ is SWsoft's contribution to the community. Virtuozzo is a commercial product, built on the same core backend, with many additional features and management tools.&lt;/p&gt;
&lt;p&gt;Virtuozzo provides much more efficient resource sharing through the VZFS filesystem, and better scalability and higher per-node VPS density because of it;  new-generation resource and service level management;  a different system of OS and application templates;  tools for VPS migration between nodes and for conversion of a dedicated server into a VPS;  monitoring, statistics and traffic accounting tools;  and additional management APIs and various GUI and Web-based tools, including self-management and recovery tools for VPS users and owners.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
Are there plans to eventually release any of this additional functionality under the GPL?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: SWsoft, the company I work for, is very positive about the Open Source movement and has been contributing a lot of code to Open Source. OpenVZ is a big piece of code contributed to the community, and people working for our company have submitted many fixes to the mainstream Linux kernel unrelated to OpenVZ.  I believe it is very likely that many parts of our additional code working on top of OpenVZ will eventually be released under the GPL as well.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
Why was a subset of Virtuozzo, OpenVZ, released as open source?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
The company believes this code should belong to the community. The strength of Linux is that innovations spread fast and become available to everyone, and we want to be in line with that. Moreover, we believe that virtualization must and will become part of the OS, and we want to speed up this process.&lt;/p&gt;
&lt;p&gt;When it comes to the kernel parts of the code, the GPL license simply requires them to be released under the GPL.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: How many people from SWsoft are working on OpenVZ?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
I don't think anyone at SWsoft works on OpenVZ 100% of their time. But I'd guess 15 to 20 people at SWsoft have made significant contributions to OpenVZ.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
Do all improvements to Virtuozzo that could also benefit OpenVZ get merged into OpenVZ?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;:&lt;br /&gt;
If some improvements were made in the course of Virtuozzo development but belong to the OpenVZ part, they would certainly be released. Everything related to core functionality -- virtualization, isolation and protection between VPSs -- is immediately pushed from Virtuozzo to OpenVZ.
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:  What other kernel projects have you contributed to?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Well, I've been contributing to the Linux kernel here and there since 1996. Historically, the area where I contributed the most code was networking, including TCP, routing and other parts.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;:&lt;br /&gt;
What are some examples of the networking code that you've contributed?&lt;br /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: Many pieces here and there. I maintained the eepro100 driver for some time, I wrote the inetpeer cache, and I contributed pieces to the TCP window management algorithm and MTU discovery, to the routing code, and so on.&lt;/p&gt;
&lt;p&gt;Well, OpenVZ, and especially its resource management part, will be another of my major contributions.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: How do you enjoy spending your free time when you're not working on OpenVZ?&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Andrey Savochkin&lt;/i&gt;: I like reading and read a lot. In music, I'm very fond of the Baroque period and try to attend every such concert in Moscow. When I have time for a longer vacation, I enjoy diving.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Jeremy Andrews&lt;/i&gt;: Thanks for all your time in answering my questions!&lt;/p&gt;
&lt;/div&gt;
</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:51375</id>
    <author>
      <name>Сергей Бронников</name>
    </author>
    <lj:poster user="estetus" userid="12957684"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/51375.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=51375"/>
    <title>[Security] Important information about the latest kernel updates</title>
    <published>2015-07-23T13:40:11Z</published>
    <updated>2015-07-24T10:13:01Z</updated>
    <category term="kernel"/>
    <category term="security"/>
    <content type="html">Last time we released a few kernel updates with security fixes:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;li&gt;Critical security issue was fixed in &lt;a href="https://openvz.org/Download/kernel/rhel6/042stab108.7" target="_blank" rel="nofollow"&gt;OpenVZ kernel 2.6.32-042stab108.7&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;OpenVZ kernel team discovered security issue that allows privileged user inside&lt;br /&gt;container to get access to files on host. All kind of containers affected: simfs, ploop and vzfs. Affected all kernels since 2.6.32-042stab105.x&lt;br /&gt;&lt;br /&gt;Note: RHEL5-based kernels 2.6.18, Red Hat and mainline kernels are not affected.&lt;br /&gt;&lt;br /&gt;&lt;li&gt;8 security issues fixed in &lt;a href="https://openvz.org/Download/kernel/rhel6/042stab108.8" target="_blank" rel="nofollow"&gt;OpenVZ kernel 2.6.32-042stab108.8&lt;/a&gt;&lt;/li&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2014-3184.html" target="_blank" rel="nofollow"&gt;CVE-2014-3184&lt;/a&gt; HID: off by one error in various _report_fixup routines&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2014-3940.html" target="_blank" rel="nofollow"&gt;CVE-2014-3940&lt;/a&gt; missing check during hugepage migration&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2014-4652.html" target="_blank" rel="nofollow"&gt;CVE-2014-4652&lt;/a&gt; ALSA: control: protect user controls against races &amp; memory disclosure&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2014-8133.html" target="_blank" rel="nofollow"&gt;CVE-2014-8133&lt;/a&gt; x86: espfix(64) bypass via set_thread_area and CLONE_SETTLS&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2014-8709.html" target="_blank" rel="nofollow"&gt;CVE-2014-8709&lt;/a&gt; 
net: mac80211: plain text information leak&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2014-9683.html" target="_blank" rel="nofollow"&gt;CVE-2014-9683&lt;/a&gt; buffer overflow in eCryptfs&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2015-0239.html" target="_blank" rel="nofollow"&gt;CVE-2015-0239&lt;/a&gt; kvm: insufficient sysenter emulation when invoked from 16-bit code&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="https://www.redhat.com/security/data/cve/CVE-2015-3339.html" target="_blank" rel="nofollow"&gt;CVE-2015-3339&lt;/a&gt; kernel: race condition between chown() and execve()&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Note: RHEL5-based 2.6.18 kernels are not affected.&lt;br /&gt;&lt;br /&gt;It is quite critical to install the latest OpenVZ kernel to protect your systems.&lt;br /&gt;Please reboot your nodes into the fixed kernels or install live patches from &lt;a href="http://kernelcare.com/" target="_blank" rel="nofollow"&gt;Kernel Care&lt;/a&gt;.&lt;br /&gt;&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:49158</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/49158.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=49158"/>
    <title>OpenVZ past and future</title>
    <published>2014-12-26T23:57:50Z</published>
    <updated>2014-12-26T23:57:50Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="vzcore"/>
    <category term="virtuozzo core"/>
    <category term="git"/>
    <category term="jira"/>
    <content type="html">Looking forward to 2015, we have very exciting news to share about the future of OpenVZ. But first, let's take a quick look into OpenVZ history.&lt;br /&gt;&lt;br /&gt;Linux containers are an ancient technology, going back to the last century. Indeed, it was 1999 when our engineers started adding bits and pieces of container technology to Linux kernel 2.2. Well, not exactly "containers", but rather "virtual environments" at that time -- as often happens with new technologies, the terminology was different (the term "container" was coined by Sun only five years later, in 2004).&lt;br /&gt;&lt;br /&gt;Anyway, in 2000 we ported our experimental code to kernel 2.4.0test1, and in January 2002 we already had Virtuozzo 2.0 released. From there it went on and on, with more releases, newer kernels, an improved feature set (like adding live migration capability) and so on.&lt;br /&gt;&lt;br /&gt;It was 2005 when we finally realized we had made the mistake of not employing the open source development model for the whole project from the very beginning. This is when OpenVZ was born as a separate entity, to complement commercial Virtuozzo (which was later renamed Parallels Cloud Server, or PCS for short).&lt;br /&gt;&lt;br /&gt;Now it's time to admit -- over the years OpenVZ became just a little bit too separate, essentially becoming a fork (perhaps even a stepchild) of Parallels Cloud Server. While the kernel is the same between the two, the userspace tools (notably vzctl) differ. This results in slight incompatibilities between configuration files, command line options, etc. Worse, userspace development effort has to be duplicated.&lt;br /&gt;&lt;br /&gt;Better late than never; we are going to fix this now! &lt;b&gt;We are going to merge OpenVZ and Parallels Cloud Server into a single common open source code base.&lt;/b&gt; The obvious benefit for OpenVZ users is, of course, more features and better tested code. 
There will be other much anticipated changes, rolled out in a few stages.&lt;br /&gt;&lt;br /&gt;As a first step, &lt;b&gt;we will open the git repository of the RHEL7-based Virtuozzo kernel&lt;/b&gt; early next year (2015, that is). This has become possible as we changed our internal development process to be more git-friendly (before that we relied on lists of patches a la quilt, managed with a home-grown set of scripts). We have worked on this kernel for quite some time already, initially porting our patchset to kernel 3.6, then rebasing it to the RHEL7 beta, then to the final RHEL7. While it is still in development, we will publish it so anyone can follow the development process.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Our kernel development mailing list will also be made public.&lt;/b&gt; The big advantage of this change for those who want to participate in the development process is that you'll see our proposed changes discussed on this mailing list before the maintainer adds them to the repository, not just months later when the code is published. We will also consider any patch sent to the mailing list. This should allow the community to become full participants in development rather than mere bystanders, as they were previously.&lt;br /&gt;&lt;br /&gt;Bug tracking systems have also diverged over time. Internally, we use JIRA (this is where all those PCLIN-xxxx and PSBM-xxxx codes come from), while OpenVZ relies on Bugzilla. &lt;b&gt;For the new unified product, we are going to open up JIRA,&lt;/b&gt; which we find to be more usable than Bugzilla. Similar to what Red Hat and other major Linux vendors do, we will limit access to security-sensitive issues in order not to compromise our user base.&lt;br /&gt;&lt;br /&gt;Last but not least, the name. 
We had a lot of discussions about naming, had a few good candidates, and finally unanimously agreed on this one:&lt;br /&gt;&lt;br /&gt;&lt;big&gt;&lt;b&gt;&lt;center&gt;Virtuozzo Core&lt;/center&gt;&lt;/b&gt;&lt;/big&gt;&lt;br /&gt;&lt;br /&gt;Please stay tuned for more news (including a more formal press release from Parallels). Feel free to ask any questions, as we don't even have a FAQ yet.&lt;br /&gt;&lt;br /&gt;Merry Christmas and a Happy New Year!</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:49112</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/49112.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=49112"/>
    <title>On kernel branching</title>
    <published>2014-11-26T22:36:08Z</published>
    <updated>2014-11-27T02:58:11Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="rhel6"/>
    <content type="html">This is a topic I always wanted to write about but was afraid my explanation would end up very cumbersome. This is no longer the case, as we now have a picture that is worth a thousand words!&lt;br /&gt;&lt;br /&gt;The picture describes how we develop kernel releases. It's a bit more complicated than the linearity of version 1 -&amp;gt; version 2 -&amp;gt; version 3. The reason is that we are balancing between adding new features, fixing bugs, and rebasing to newer kernels, while trying to maintain stability for our users. This is our convoluted way of achieving all this:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://ic.pics.livejournal.com/k001/990679/928/928_original.png" target="_blank"&gt;&lt;img src="https://ic.pics.livejournal.com/k001/990679/928/928_1000.png" alt="kernel_tree-2.6.32-x" title="kernel_tree-2.6.32-x"&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As you can see, we create a new branch when rebasing to a newer upstream (i.e. RHEL6) kernel, as regressions are quite common during a rebase. At the same time, we keep maintaining the older branch, in which we add stability and security fixes. Sometimes we create a new branch to add some bold feature that takes a longer time to stabilize. Stability patches are then forward-ported to the new branch, which eventually either becomes stable or is obsoleted by yet another new one.&lt;br /&gt;&lt;br /&gt;Of course there is a lot of work behind these curtains, including rigorous internal testing of new releases. In addition to that, we usually provide those kernels to our users (in the &lt;a href="http://openvz.org/Download/kernel/rhel6-testing" target="_blank" rel="nofollow"&gt;rhel6-testing&lt;/a&gt; repo) so they can test new stuff before it hits production servers, and we can fix more bugs earlier (&lt;a href="http://openvz.livejournal.com/45010.html" target="_blank"&gt;more on that here&lt;/a&gt;). 
If you are not taking part in this testing, well, it's never too late to start!</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:48014</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/48014.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=48014"/>
    <title>An interview with ANK</title>
    <published>2014-04-15T03:13:00Z</published>
    <updated>2018-08-27T18:52:37Z</updated>
    <category term="kernel"/>
    <category term="interview"/>
    <category term="ank"/>
    <category term="linux"/>
    <content type="html">This is a rare interview with the legendary Alexey Kuznetsov (a.k.a. ANK), who happens to work for Parallels. Alan Cox once said he had long thought that "Kuznetsov" was a collective name for a secret group of Russian programmers -- because no single man could write so much code at once.&lt;br /&gt;&lt;br /&gt;The interview was conducted by &lt;a href="http://lifehacker.ru/2013/08/01/ank/" target="_blank" rel="nofollow"&gt;lifehacker.ru&lt;/a&gt; as part of its "workplaces" series. I did my best to translate it into English, but it's far from perfect. I hope it is still a very interesting read.&lt;hr&gt;&lt;br /&gt;&lt;br /&gt;&lt;img src="https://ic.pics.livejournal.com/k001/990679/1047/1047_1000.jpg" alt="" title=""&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q: Who are you and what do you do?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;From the mid-90s I was one of the Linux maintainers. Back then all the communication was done via conferences and Linux mailing lists. Pretty often I was aggressively arguing with someone there; I don't remember what about. Now it's fun to recall. Being a maintainer, I wasn't just making something on my own, but had to control others: kicking out those who were making rubbish (from my point of view), and supporting those who were making something non-rubbish. All these conflicts were exhausting me. At some point I started noticing I was becoming "bronzed" [Alexey is referring to a superiority complex -- Kir]. You said or did some crap, and then learned that this is now considered the right way, since ANK said so.&lt;br /&gt;&lt;br /&gt;I started to doubt: maybe I was just using my authority to maintain the status quo. Every single morning started with a fight with myself, then with the world. In 2003 I got fed up with it, so I withdrew from the public, and later switched to a different field of knowledge. At that time I started my first project at Parallels. 
The task was to implement live migration of containers, and it was very complicated.&lt;br /&gt;&lt;br /&gt;Now at Parallels we work on the Parallels Cloud Storage project, developing cluster file systems for storing virtual machine images. The technology itself is a few years old already; we did a release recently, and are now working on improving it.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q: What does your workplace look like?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;My workplace is a bunch of computers. But I only work on a notebook, currently a Lenovo T530. The other computers here are used for various purposes. This display standing here -- I never use it, nor this keyboard. Well, only if something breaks. Here we have different computers, including a Power Mac, an Intel and an AMD. I used those in different years for different experiments. Once I needed to create a cluster of 3 machines right here at my workplace. One machine here is really old, and its sole purpose is to manage a power switch, so I can reboot all the others when working remotely from home. Here I have two Mac Minis and a Power Mac. They are always on, but I use them rarely, only when I need to see something in Parallels Desktop.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q: What software do you use?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I don't use anything except Google Chrome. Well, an editor and a compiler, if they qualify as software. I also store some personal data and notes in Evernote.&lt;br /&gt;&lt;br /&gt;I only use a text console. For everything. In fact, on newer notebooks, as the screen gets better, the console mode works worse and worse. So I am working in a graphical environment now, running a few full-screen terminals on top of it. It looks like a good old Unix console. So this is how I live, read email, work.&lt;br /&gt;&lt;br /&gt;I do have a GMail account; I use it to read email from my phone. Sometimes it is needed. 
Or, when I see someone sent me a PDF, I have nothing to do but forward that email to somewhere I can open the PDF. Same for PPT. But this is purely theoretical; in practice I never work with PPT.&lt;br /&gt;&lt;br /&gt;I use Linux. Currently it is Fedora 13 -- not the newest one, to say the least. I always use the version that was the base for the corresponding RHEL release. Every few years a new Red Hat [Enterprise Linux] is released, so I install a new system. Then I do not change anything for a few years. Say, 5 years. I can't think of any new feature of an OS that would force me to update. I am just using the system as an editor, the same way I used it 20 years ago.&lt;br /&gt;&lt;br /&gt;I have a phone, a Motorola RAZR Maxx, running Android. I don't like iOS. You can't change anything in there. Not that I like customizations -- I like the possibility to customize. I got a Motorola because I hate Samsung. This hatred is absolutely irrational. I have had no happiness with any Samsung product; they either didn't work for me or they broke. I need a phone to make calls and check emails, and that is all I need. Everything else is disabled -- to save the battery.&lt;br /&gt;&lt;br /&gt;I am also reading news over RSS every day, like a morning paper. Now it's Feedly; before that it was Google Reader, until they closed it. I have a list of bloggers I read, I won't mention their names. I am also reading Russian and foreign media. Lenta.ru, for example. There's a nice English-language service, News 360. It adapts to what I like and gives me the relevant news. I can't say whether it works or not, but the fact is, what it shows me is really interesting to me. It was showing a lot of sports news at first, but then that disappeared.&lt;br /&gt;&lt;br /&gt;I don't use instant messengers like Skype or ICQ; it's just meaningless. If you need something, write an email. If you need it urgently, call. 
Email and phone cover everything.&lt;br /&gt;&lt;br /&gt;Speaking of social networks, I do have a Facebook account with two friends -- my wife and my sister. I view this account only when they post a picture; I don't wander there for no reason.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q: Is there a use for paper in your work?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;It's a mess. I don't have a pen, so when I need one I cannot find it. If I am close to the notebook and I need to write something down -- I write it to a file. If I don't have a notebook around, I write it to my phone. For these situations I recently started to use Google Keep, a service for storing small notes. It is convenient so far. Otherwise I use Evernote. Well, I don't have a system for that. But I do have a database of everything on my notebook: all my old emails, all the files and notes. All this stuff is indexed. The total size is about 10 gigabytes, since I don't have any graphics. Well, if we throw away all the junk from there, almost nothing will remain.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q: Is there a dream configuration?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;What I have here is more than enough for me. This last notebook is probably the best notebook I have ever had.&lt;br /&gt;&lt;br /&gt;I was getting used to it for a long time, and swore a lot. I have been using only ThinkPads for a long time. They are similar from version to version, but each next one gets bigger and heavier, physically. This is annoying. In this model they changed the keyboard. I had to get used to it, but now I realize this is the best keyboard I have ever had. In general, I am pretty satisfied with ThinkPads. Well, if it had a Retina screen and weighed just 1 kilogram less -- that would be ideal.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:45831</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/45831.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=45831"/>
    <title>Yay to I/O limits!</title>
    <published>2013-10-30T03:12:10Z</published>
    <updated>2013-10-30T03:15:34Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="iolimit"/>
    <category term="vzctl"/>
    <category term="iopslimit"/>
    <content type="html">&lt;p&gt;Today we are releasing a somewhat small but very important OpenVZ feature: &lt;b&gt;per-container disk I/O bandwidth and &lt;a href="http://en.wikipedia.org/wiki/IOPS" target="_blank" rel="nofollow"&gt;IOPS&lt;/a&gt; limiting.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;OpenVZ has had an I/O priority feature for a while, which lets one set a per-container I/O priority -- a number from 0 to 7. It works like this: if two containers with similar I/O patterns but different I/O priorities run on the same system, a container with a prio of 0 (lowest) will get about 2-3 times less I/O speed than a container with a prio of 7 (highest). This works for some scenarios, but not all.&lt;/p&gt;

&lt;p&gt;So, I/O bandwidth limiting was introduced in &lt;a href="http://www.parallels.com/products/pcs/" target="_blank" rel="nofollow"&gt;Parallels Cloud Server&lt;/a&gt;, and as of today it is available in OpenVZ as well. Using the feature is very easy: you set a limit for a container (in megabytes per second), and watch it obey the limit. For example, here I try doing I/O without any limit set first:&lt;/p&gt;

&lt;pre&gt;
root@host# vzctl enter 777
root@CT:/# cat /dev/urandom | pv -c - &amp;gt;/bigfile
 88MB 0:00:10 [8.26MB/s] [         &amp;lt;=&amp;gt;      ]
^C
&lt;/pre&gt;

&lt;p&gt;Now let's set the I/O limit to 3 MB/s:&lt;/p&gt;

&lt;pre&gt;
root@host# vzctl set 777 --iolimit 3M --save
UB limits were set successfully
Setting iolimit: 3145728 bytes/sec
CT configuration saved to /etc/vz/conf/777.conf
root@host# vzctl enter 777
root@CT:/# cat /dev/urandom | pv -c - &amp;gt;/bigfile3
39.1MB 0:00:10 [   3MB/s] [         &amp;lt;=&amp;gt;     ]
^C
&lt;/pre&gt;

&lt;p&gt;If you run it yourself, you'll notice a spike of speed at the beginning, before it goes down to the limit. This is the so-called burstable limit at work: it allows a container to over-use its limit (up to 3x) for a short time.&lt;/p&gt;
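A burstable limit of this kind is commonly modeled as a token bucket. The sketch below is plain Python, an illustration of the concept only (not the actual OpenVZ kernel code); the 3x burst factor is taken from the behavior described above. It shows why the first moments of I/O can run above the base rate:

```python
# Token-bucket sketch of a burstable I/O limit. Conceptual illustration
# only; the real limiter lives in the kernel and differs in detail.

def make_bucket(limit_bps, burst_factor=3):
    # The bucket starts full, holding burst_factor times the per-second limit.
    cap = limit_bps * burst_factor
    return {"rate": limit_bps, "cap": cap, "tokens": cap}

def refill(bucket, seconds):
    # Tokens accrue at the base rate, capped at the burst size.
    bucket["tokens"] = min(bucket["cap"],
                           bucket["tokens"] + bucket["rate"] * seconds)

def try_write(bucket, nbytes):
    # Grant the write only when nbytes does not exceed the tokens left;
    # min(tokens, nbytes) == nbytes expresses exactly that condition.
    ok = min(bucket["tokens"], nbytes) == nbytes
    if ok:
        bucket["tokens"] -= nbytes
    return ok

MB = 1024 * 1024
bucket = make_bucket(3 * MB)       # 3 MB/s, as in the vzctl example above
print(try_write(bucket, 8 * MB))   # True: initial burst above 3 MB/s
print(try_write(bucket, 8 * MB))   # False: bucket drained, writer must wait
refill(bucket, 1.0)                # one second passes
print(try_write(bucket, 3 * MB))   # True: back to the base rate
```

Once the initial burst empties the bucket, a writer can only proceed at the refill rate, which is why pv settles at the configured limit after the spike.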

&lt;p&gt;In the above example we tested writes. Reads work the same way, except when the data being read are in fact coming from the page cache (such as when you are reading a file you have just written). In that case, no actual I/O is performed, and therefore there is no limiting.&lt;/p&gt;

&lt;p&gt;The second feature is an &lt;a href="http://en.wikipedia.org/wiki/IOPS" target="_blank" rel="nofollow"&gt;I/O operations per second, or just IOPS&lt;/a&gt;, limit. For more info on what IOPS is, go read the linked Wikipedia article -- all I will say here is that for traditional rotating disks the hardware capabilities are pretty limited (75 to 150 IOPS is a good guess, or 200 if you have high-end server-class HDDs), while for SSDs this is much less of a problem. The IOPS limit is set in the same way as iolimit (&lt;code&gt;vzctl set $CTID --iopslimit NN --save&lt;/code&gt;), although measuring its impact is trickier.&lt;/p&gt;

&lt;p&gt;Finally, to play with this stuff, you need:&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openvz.org/Download/vzctl" target="_blank" rel="nofollow"&gt;vzctl&lt;/a&gt; 4.6 (or higher)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openvz.org/Download/kernel/rhel6-testing" target="_blank" rel="nofollow"&gt;Kernel&lt;/a&gt; 042stab084.3 (or higher)&lt;/li&gt;
&lt;/ul&gt;

Note that the kernel with this feature is currently still in testing -- so if you haven't done so, it's time to read about &lt;a href="http://openvz.livejournal.com/45010.html" target="_blank"&gt;testing kernels&lt;/a&gt;.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:45647</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/45647.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=45647"/>
    <title>Is OpenVZ obsoleted?</title>
    <published>2013-10-15T16:11:53Z</published>
    <updated>2013-11-07T16:32:19Z</updated>
    <category term="containers"/>
    <category term="kernel"/>
    <category term="lxc"/>
    <category term="openvz"/>
    <category term="ubuntu"/>
    <category term="rhel6"/>
    <category term="debian"/>
    <content type="html">Oh, such a provocative subject! Not really. Many people do believe that OpenVZ is obsoleted, and when I ask why, the three most popular answers are:&lt;br /&gt;&lt;br /&gt;1. The OpenVZ kernel is old and obsoleted, because it is based on 2.6.32, while everyone in 2013 runs 3.x.&lt;br /&gt;2. LXC is the future, OpenVZ is the past.&lt;br /&gt;3. OpenVZ is no longer developed; it was even removed from Debian Wheezy.&lt;br /&gt;&lt;br /&gt;Let me try to address all these misconceptions, one by one.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1. "OpenVZ kernel is old".&lt;/b&gt; Current OpenVZ kernels are based on kernels from Red Hat Enterprise Linux 6 (RHEL6 for short). This is the latest and greatest version of the enterprise Linux distribution from Red Hat, a company that is almost always at the top of the list of companies contributing to Linux kernel development (see &lt;a href="http://lwn.net/Articles/507986/" target="_blank" rel="nofollow"&gt;1&lt;/a&gt;, &lt;a href="http://lwn.net/Articles/451243/" target="_blank" rel="nofollow"&gt;2&lt;/a&gt;, &lt;a href="http://lwn.net/Articles/373405/" target="_blank" rel="nofollow"&gt;3&lt;/a&gt;, &lt;a href="http://lwn.net/Articles/222773/" target="_blank" rel="nofollow"&gt;4&lt;/a&gt; for a few random examples). While no kernel is ideal and bug-free, the RHEL6 one is a good real-world approximation of those qualities.&lt;br /&gt;&lt;br /&gt;What the people at Red Hat do for their enterprise Linux is take an upstream kernel and basically fork it, ironing out the bugs, cherry-picking security fixes, driver updates, and sometimes new features from upstream. They do so for about half a year or more before a release, so the released kernel is already "old and obsoleted" -- or so it seems if one looks only at the kernel version number. Well, don't judge a book by its cover, and don't judge a kernel by its number. Of course it is neither old nor obsoleted -- it's just more stable and secure. 
And then, after a release, it is very well maintained, with modern hardware support, regular releases, and prompt security fixes. This makes it a great base for the OpenVZ kernel. In a sense, we are standing on the shoulders of a red-hatted giant (and since this is open source, &lt;a href="http://openvz.livejournal.com/23621.html" target="_blank"&gt;they are standing just a little bit on our shoulders&lt;/a&gt;, too).&lt;br /&gt;&lt;br /&gt;RHEL7 is being worked on right now, and it will be based on some 3.x kernel (possibly 3.10). We will port the OpenVZ kernel to RHEL7 once it becomes available. In the meantime, the RHEL6-based OpenVZ kernel is the latest and greatest, so please don't be fooled by the fact that uname shows 2.6.32.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. OpenVZ vs LXC.&lt;/b&gt; The OpenVZ kernel was historically developed separately, i.e. apart from the upstream Linux kernel. This mistake was recognized in 2005, and since then we have kept working on merging OpenVZ bits and pieces into the upstream kernel. It took way longer than expected; we are still in the middle of the process, with some great stuff (like the net namespace and &lt;a href="http://criu.org/" target="_blank" rel="nofollow"&gt;CRIU&lt;/a&gt;, more than 2000 changesets in total) merged, while some other features are still on our TODO list. In the future (another eight years? who knows...) OpenVZ kernel functionality will probably be fully upstream, so OpenVZ will just be a set of tools. We are happy to see that Parallels is not the only company interested in containers for Linux, so it might happen a bit earlier. For now, though, we still rely on our organic, non-GMO, home-grown kernel (although it is &lt;a href="http://wiki.openvz.org/Vzctl_for_upstream_kernel" target="_blank" rel="nofollow"&gt;already optional&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Now what is LXC? In fact, it is just another userspace tool (not unlike vzctl) that works on top of a recent upstream kernel (again, not unlike vzctl). 
As we merge our stuff upstream, the LXC tools will start using the new features and therefore benefit from this work. So far at least half of the kernel functionality used by LXC was developed by our engineers, and while we don't work on the LXC tools themselves, it would not be an overstatement to say that Parallels is the biggest LXC contributor.&lt;br /&gt;&lt;br /&gt;So, both OpenVZ and LXC are actively developed, and both have a future. We might even merge our tools at some point; the idea was briefly discussed during the last containers mini-conf at Linux Plumbers. LXC is not a successor to OpenVZ, though; they are two different projects, although not entirely separate ones (since the OpenVZ team contributes to the kernel a lot, and both tools use the same kernel functionality). OpenVZ is essentially LXC++, because it adds some more stuff that is not (yet) available in the upstream kernel (such as stronger isolation and better resource accounting, plus some auxiliary features like ploop).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. OpenVZ is no longer developed, removed from Debian&lt;/b&gt;. The Debian kernel team decided to drop the OpenVZ (as well as a few other) kernel flavors from Debian 7 a.k.a. Wheezy. This is completely understandable: kernel maintenance takes time and other resources, and they probably don't have enough. That doesn't mean, though, that OpenVZ is not developed. It is really strange to have to argue this, but please check our &lt;a href="http://openvz.org/News/updates" target="_blank" rel="nofollow"&gt;software updates page&lt;/a&gt; (or the &lt;a href="https://lists.openvz.org/pipermail/announce/" target="_blank" rel="nofollow"&gt;announce@ mailing list archives&lt;/a&gt;). We have made about 80 software releases this year so far. That is about two releases every week. Most of those are new kernels. 
So no, it is in no way abandoned.&lt;br /&gt;&lt;br /&gt;As for Debian Wheezy, we are providing our own repository with the OpenVZ kernel and tools, &lt;a href="http://openvz.livejournal.com/45345.html" target="_blank"&gt;as announced just yesterday&lt;/a&gt;.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:45345</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/45345.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=45345"/>
    <title>Debian kernel packages</title>
    <published>2013-10-08T21:54:24Z</published>
    <updated>2013-10-08T21:55:38Z</updated>
    <category term="kernel"/>
    <category term="packages"/>
    <category term="debian"/>
    <category term="linux"/>
    <content type="html">&lt;div align="right"&gt;&lt;i&gt;Good news, everyone!&lt;/i&gt;&lt;br /&gt;Prof. Farnsworth&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Many people use OpenVZ on Debian. In fact, Debian was one of the distribution that come with OpenVZ kernel and tools. Unfortunately, it's not that way anymore, since Debian 7 "Wheezy" dropped OpenVZ kernel. A workaround was to take an RPM-packaged OpenVZ kernel and convert it to .deb using alien tool, but the process is manual and somewhat unnatural.&lt;br /&gt;&lt;br /&gt;Finally, now we have a working build system for Debian kernel packages, and a repository for Debian Wheezy with latest and greatest OpenVZ kernels, as well as tools. In fact, we have two: one for stable, one for testing kernels and tools. Kernels debs are built and released at the same time as rpms. Currently we have vzctl/vzquota/ploop in 'wheezy-test' repository only -- once we'll be sure they work as expected, we will move those into stable 'wheezy' repo.&lt;br /&gt;&lt;br /&gt;To enable these repos:&lt;br /&gt;&lt;code&gt;&lt;br /&gt;cat &amp;lt;&amp;lt; EOF &amp;gt; /etc/apt/sources.list.d/openvz.list&lt;br /&gt;deb &lt;a target='_blank' href='http://download.openvz.org/debian' rel='nofollow'&gt;http://download.openvz.org/debian&lt;/a&gt; wheezy main&lt;br /&gt;deb &lt;a target='_blank' href='http://download.openvz.org/debian' rel='nofollow'&gt;http://download.openvz.org/debian&lt;/a&gt; wheezy-test main&lt;br /&gt;EOF&lt;br /&gt;apt-get update&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;To install the kernel:&lt;br /&gt;&lt;code&gt;apt-get install linux-image-openvz-amd64&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;More info is available from &lt;a target='_blank' href='https://wiki.openvz.org/Installation_on_Debian' rel='nofollow'&gt;https://wiki.openvz.org/Installation_on_Debian&lt;/a&gt; and &lt;a target='_blank' href='http://download.openvz.org/debian/' rel='nofollow'&gt;http://download.openvz.org/debian/&lt;/a&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:45010</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/45010.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=45010"/>
    <title>on testing kernels</title>
    <published>2013-08-13T01:56:05Z</published>
    <updated>2013-08-13T13:57:49Z</updated>
    <category term="testing"/>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="process"/>
    <content type="html">Currently, our best kernel line is the one based on Red Hat Enterprise Linux 6 kernels (RHEL6 for short). This is our most feature-rich, up-to-date yet stable kernel -- i.e. the best. The second-best option is the RHEL5-based kernel -- a few years older, so it has neither &lt;a href="http://openvz.org/VSwap" target="_blank" rel="nofollow"&gt;vSwap&lt;/a&gt; nor &lt;a href="http://openvz.org/Ploop" target="_blank" rel="nofollow"&gt;ploop&lt;/a&gt;, but still good.&lt;br /&gt;&lt;br /&gt;There is a dilemma of either releasing a new kernel version earlier, or delaying it for more internal testing. We figured we can do both! Each kernel branch (RHEL6 and RHEL5) comes via two channels -- testing and stable. In terms of yum, we have four kernel repositories defined in the &lt;a href="http://download.openvz.org/openvz.repo" target="_blank" rel="nofollow"&gt;openvz.repo&lt;/a&gt; file; their names should be self-explanatory:&lt;br /&gt;&lt;br /&gt;* openvz-kernel-rhel6&lt;br /&gt;* openvz-kernel-rhel6-testing&lt;br /&gt;* openvz-kernel-rhel5&lt;br /&gt;* openvz-kernel-rhel5-testing&lt;br /&gt;&lt;br /&gt;The process of releasing kernels is the following: right after building a kernel, we push it out to the appropriate -testing repository, so it is available as soon as possible. We then do some internal QA on it (which can be either basic or thorough, depending on the amount of our changes, and on whether we did a rebase to a newer RHEL6 kernel). Based on the QA report, sometimes we do another build with a few more patches, and repeat the process. Once the kernel looks good to our QA, we promote it from testing to stable. In some rare cases (such as when we do one simple but quite important fix), new kernels go right into stable.&lt;br /&gt;&lt;br /&gt;So, our users can enjoy being stable, or being up-to-the-moment, or both. 
In fact, if you have more than a few servers running OpenVZ, we strongly suggest dedicating one or two boxes to running -testing kernels, and reporting any bugs found to &lt;a href="http://bugzilla.openvz.org/" target="_blank" rel="nofollow"&gt;OpenVZ bugzilla&lt;/a&gt;. This is good for you, because you will be able to catch bugs early and let us fix them before they hit your production systems. This is good for us, too, because no QA department is big enough to catch all possible bugs in a myriad of hardware and software configurations and use cases.&lt;br /&gt;&lt;br /&gt;Enabling the -testing repo is easy: just edit &lt;a href="http://download.openvz.org/openvz.repo" target="_blank" rel="nofollow"&gt;openvz.repo&lt;/a&gt;, setting &lt;code&gt;enabled=1&lt;/code&gt; under the appropriate &lt;code&gt;[openvz-kernel-...-testing]&lt;/code&gt; section.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:44508</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/44508.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=44508"/>
    <title>ploop snapshots and backups</title>
    <published>2013-06-11T06:40:59Z</published>
    <updated>2013-06-13T23:27:39Z</updated>
    <category term="ploop"/>
    <category term="kernel"/>
    <category term="openvz"/>
    <content type="html">OpenVZ ploop is a wonderful technology, and I want to share more of its wonderfulness with you. We have previously covered &lt;a href="http://openvz.livejournal.com/40830.html" target="_blank"&gt;ploop in general&lt;/a&gt; and its &lt;a href="http://openvz.livejournal.com/41835.html" target="_blank"&gt;write tracker feature to help speed up container migration&lt;/a&gt; in particular. This time, I'd like to talk about snapshots and backups.&lt;br /&gt;&lt;br /&gt;But let's start with yet another ploop feature -- its expandable format. When you create a ploop container with, say, 10G of disk space, the ploop image is just slightly larger than the size of the actual container files. I just created a centos-6-x86 container -- the ploop image size is 747M, and inside the CT, df shows that 737M is used. Of course, for an empty ploop image (with a fresh filesystem and zero files) the ratio will be worse. Now, as the CT writes data, the ploop image auto-grows to accommodate the data size.&lt;br /&gt;&lt;br /&gt;Now, these images can be layered, or stacked. Imagine having a single ploop image, consisting of blocks. We can add another image on top of the first one, so that reads will fall through to the lower image (because the upper one is still empty), while new writes will end up in the upper (top) image. Perhaps this picture will save some more words here:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://ic.pics.livejournal.com/k001/990679/705/705_original.png" target="_blank"&gt;&lt;img src="https://ic.pics.livejournal.com/k001/990679/705/705_900.png" alt="ploop-stacked-images" title="ploop-stacked-images" width="900" height="305" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The new (top) image now accumulates all the changes, while the old (bottom) one is in fact a read-only snapshot of the container filesystem. 
Such a snapshot is cheap and instant, because there is no need to copy a lot of data or do other costly operations. Of course, ploop is not limited to two levels -- you can create many more (up to 255, if I remember correctly, which is way above any practical limit).&lt;br /&gt;&lt;br /&gt;What can be done with such a snapshot? We can mount it and copy all the data to a backup (&lt;b&gt;update&lt;/b&gt;: see &lt;a href="https://openvz.org/Ploop/backup" target="_blank" rel="nofollow"&gt;openvz.org/Ploop/backup&lt;/a&gt;). Note that such a backup is very fast, online, and consistent. There's more to it, though. A ploop snapshot, combined with a snapshot of the running container in memory (also known as a checkpoint) and the container configuration file(s), can serve as a real checkpoint to which you can roll back.&lt;br /&gt;&lt;br /&gt;Consider the following scenario: you need to upgrade the web site backend inside your container. First, you take a container snapshot (a complete snapshot, including an in-memory image of the running container). Then you upgrade, and realize your web site is all messed up and broken. A horror story, isn't it? No. You just switch to the before-upgrade snapshot and keep working as before. It's like moving back in time, and all of this is done on a running container, i.e. you don't have to shut it down.&lt;br /&gt;&lt;br /&gt;Finally, when you don't need a snapshot anymore, you can merge it back. Merging is the process in which changes from an upper level are written to the level below it, and then the upper level is removed. Such merging is of course not as instant as creating a snapshot, but it is online, so you can just keep working while ploop handles the merge.&lt;br /&gt;&lt;br /&gt;All this can be performed from the command line using vzctl. For details, see the &lt;a href="http://openvz.org/Man/vzctl.8#Snapshotting" target="_blank" rel="nofollow"&gt;vzctl(8) man page, section Snapshotting&lt;/a&gt;. 
Here's a quick howto:&lt;br /&gt;&lt;br /&gt;Create a snapshot:&lt;br /&gt;&lt;code&gt;vzctl snapshot $CTID [--id $UUID] [--name name] [--description desc] [--skip-suspend] [--skip-config]&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Mount a snapshot (say, to copy the data to a backup):&lt;br /&gt;&lt;code&gt;vzctl snapshot-mount CTID --id uuid --target directory&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Roll back to a snapshot:&lt;br /&gt;&lt;code&gt;vzctl snapshot-switch CTID --id uuid&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;Delete a snapshot (merging its data into the lower-level image):&lt;br /&gt;&lt;code&gt;vzctl snapshot-delete CTID --id uuid&lt;/code&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:42793</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/42793.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=42793"/>
    <title>OpenVZ turns 7, gifts are available!</title>
    <published>2012-10-06T09:31:53Z</published>
    <updated>2012-12-03T10:14:11Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="vzctl"/>
    <category term="criu"/>
    <category term="crtools"/>
    <content type="html">&lt;p&gt;&lt;b&gt;OpenVZ project is 7 years old&lt;/b&gt; as of last month. It's hard to believe the number, but looking back, we've done a lot of things together with you, our users.&lt;/p&gt;

&lt;p&gt;One of the main project goals was (and still is) to include containers support upstream, i.e. in the vanilla Linux kernel. In practice, the OpenVZ kernel is a fork of the Linux kernel, and we don't like it that way, for a number of reasons. The main ones are:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;We want everyone to benefit from containers, not just those using the OpenVZ kernel. Yes to world domination!&lt;/li&gt;
&lt;li&gt;We'd like to concentrate on new features, improvements and bug fixes, rather than on forward-porting our changes to the next kernel.&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;So, we were (and still are) working hard to bring in-kernel containers support upstream, and many key pieces are already there in the kernel -- for example, PID and network namespaces, cgroups, and the memory controller. This is the functionality that the lxc tool and the libvirt library are using. We also use the features we merged upstream, so with every new kernel branch we have less to port, and the size of our patch set decreases.&lt;/p&gt;

&lt;h3&gt;CRIU&lt;/h3&gt;

&lt;p&gt;One such feature slated for upstream is checkpoint/restore, the ability to save a running container's state and then restore it. The main use of this feature is live migration, but there are other &lt;a href="http://criu.org/Usage_scenarios" target="_blank" rel="nofollow"&gt;usage scenarios&lt;/a&gt; as well. While the feature has been present in the OpenVZ kernel since April 2006, it was never accepted into the upstream Linux kernel (nor was the other implementation, proposed by Oren Laadan).&lt;/p&gt;

&lt;p&gt;For the last year we have been working on the &lt;a href="http://criu.org/" target="_blank" rel="nofollow"&gt;CRIU&lt;/a&gt; project, which aims to reimplement most of the checkpoint/restore functionality in userspace, with bits of kernel support where required. As of now, most of the additional kernel patches needed for CRIU are already in kernel 3.6, and a few more patches are on their way to 3.7 or 3.8. Speaking of the CRIU tools, they are currently at version 0.2, released on the 20th of September, which already has limited support for checkpointing and restoring an upstream container. Check &lt;a href="http://criu.org/" target="_blank" rel="nofollow"&gt;criu.org&lt;/a&gt; for more details, and give it a try. Note that this project is not only for containers -- you can checkpoint any process tree -- it's just that a container works better because it is clearly separated from the rest of the system.&lt;/p&gt;

&lt;p&gt;One of the most important things about CRIU is that we are NOT developing it behind closed doors. As usual, we have a wiki and git, but most importantly, every patch goes through the &lt;a href="http://openvz.org/pipermail/criu/" target="_blank" rel="nofollow"&gt;public mailing list&lt;/a&gt;, so everyone can join the fun.&lt;/p&gt;

&lt;h3&gt;vzctl for upstream kernel&lt;/h3&gt;

&lt;p&gt;We have also released vzctl 4.0 recently (on the 25th of September). As you can see from the number, it is a major release, and its main feature is support for non-OpenVZ kernels. Yes, it's true -- now you can get a feel of OpenVZ without installing the OpenVZ kernel. Any recent 3.x kernel should work.&lt;/p&gt;

&lt;p&gt;As with the OpenVZ kernel, you can use the ready-made container images we have for OpenVZ (so-called "OS templates") or employ your own. You can create, start, stop, and delete containers, and set various resource parameters such as RAM and CPU limits. Networking (aside from route-based) is also supported -- you can either move a network interface from the host system into a container (&lt;code&gt;--netdev_add&lt;/code&gt;) or use a bridged setup (&lt;code&gt;--netif_add&lt;/code&gt;). I personally run this stuff on my Fedora 17 desktop using the stock F17 kernel -- it just works!&lt;/p&gt;
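For the curious, a minimal first session might look like the sketch below. This assumes a host running a recent 3.x kernel with vzctl 4.0 installed, root privileges, and an OS template cache already downloaded into /vz/template/cache (the template name here is illustrative), so it is not runnable outside such a host:

```shell
# Create a container from a precreated OS template
vzctl create 101 --ostemplate fedora-17-x86_64

# Bridged networking: add a veth-based network interface
vzctl set 101 --netif_add eth0 --save

vzctl start 101
vzctl exec 101 ps ax    # poke around inside the container
vzctl stop 101
```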

&lt;p&gt;Having said all that, the OpenVZ kernel is surely in much better shape when it comes to containers support -- it has more features (such as live container snapshots and live migration), better resource management capabilities, and is overall more stable and secure. But the fact that the kernel is now optional makes the whole thing more appealing (or so I hope).&lt;/p&gt;

&lt;p&gt;You can find information on how to set up and start using vzctl at the &lt;a href="http://wiki.openvz.org/Vzctl_for_upstream_kernel" target="_blank" rel="nofollow"&gt;vzctl for upstream kernel&lt;/a&gt; wiki page. The page also lists known limitations and pointers to other resources. I definitely recommend giving it a try and sharing your experience! As usual, any bugs found are to be reported to &lt;a href="http://bugzilla.openvz.org/" target="_blank" rel="nofollow"&gt;OpenVZ bugzilla&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Update:&lt;/b&gt; comments disabled due to spam&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:42414</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/42414.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=42414"/>
    <title>[CRIU] CRtools 0.1 released!</title>
    <published>2012-07-24T13:48:59Z</published>
    <updated>2012-08-22T10:56:26Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="criu"/>
    <content type="html">Checkpoint/restore, or CPT for short, is a nice feature of OpenVZ, and probably the most amazing one. In a nutshell, it's a way to freeze a container and dump its complete state (processes, memory, network connections, etc.) to a file on disk, and then restore from that dump, resuming process execution as if nothing had happened. This opens a way to do nifty things such as live container migration, fast reboots, high availability setups, etc.&lt;br /&gt;&lt;br /&gt;It is our ultimate goal to merge all the bits and pieces of OpenVZ into the mainstream Linux kernel. It's not a big secret that we failed miserably trying to merge the checkpoint/restore functionality (and yes, we have tried more than once). The fact that everyone else failed as well soothes the pain a bit, but is not really helpful. The reason is simple: the CPT code is big, complex, and touches way too many places in the kernel.&lt;br /&gt;&lt;br /&gt;So we&lt;sup&gt;*&lt;/sup&gt; came up with an idea: implement most of the CPT stuff in user space, i.e. as a separate program, not as a part of the Linux kernel. In practice this is impossible to do entirely, because some kernel trickery is still required here and there, but the whole point was to limit the kernel intervention to the bare minimum.&lt;br /&gt;&lt;br /&gt;Guess what? It worked even better than we expected. As of today, after about a year of development, up to 90% of the stuff that needs to be in the kernel is already there, and the rest is ready and seems to be relatively easy to merge (see &lt;a href="http://criu.org/Commits" target="_blank" rel="nofollow"&gt;this table&lt;/a&gt; to get an idea of what's in and what's not).&lt;br /&gt;&lt;br /&gt;As for the user space stuff, &lt;b&gt;we are happy to announce the release of CRtools version 0.1&lt;/b&gt;. Now, let me step aside and quote Pavel's announcement:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;The tool can already be used for checkpointing and restoring various individual&lt;br /&gt;applications. 
And the greatest thing about this so far is that most of the below&lt;br /&gt;functionality has the required kernel support in the recently released v3.5!&lt;br /&gt;&lt;br /&gt;So, we support now&lt;br /&gt;&lt;br /&gt;* x86_64 architecture&lt;br /&gt;* process' linkage&lt;br /&gt;* process groups and sessions (without ttys though :\ )&lt;br /&gt;* memory mappings of any kind (shared, file, etc.)&lt;br /&gt;* threads&lt;br /&gt;* open files (shared between tasks and partially opened-and-unlinked)&lt;br /&gt;* pipes and fifos with data&lt;br /&gt;* unix sockets with packet queues contents&lt;br /&gt;* TCP and UDP sockets (TCP connections support exists, but needs polishing)&lt;br /&gt;* inotifies, eventpoll and eventfd&lt;br /&gt;* tasks' sigactions setup, credentials and itimers&lt;br /&gt;* IPC, mount and PID namespaces&lt;br /&gt;&lt;br /&gt;Though namespaces support is in there, we do not yet support an LXC container c/r,&lt;br /&gt;but we're close to it :)&lt;br /&gt;&lt;br /&gt;I'd like to thank everyone who took part in new kernel APIs discussions, the&lt;br /&gt;feedback was great! Special thanks goes to Linus for letting the kernel parts&lt;br /&gt;in early, instead of making them sit out of tree till becoming stable enough.&lt;br /&gt;&lt;br /&gt;Tarball with the tool sources is at&lt;br /&gt;  &lt;a target='_blank' href='http://download.openvz.org/criu/crtools-0.1.tar.bz2' rel='nofollow'&gt;http://download.openvz.org/criu/crtools-0.1.tar.bz2&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The git repo is at&lt;br /&gt;  &lt;a target='_blank' href='http://git.criu.org/' rel='nofollow'&gt;http://git.criu.org/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;And some sort of docs growing at&lt;br /&gt;  &lt;a target='_blank' href='http://criu.org/' rel='nofollow'&gt;http://criu.org/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There are still things for which we don't have the kernel support merged (SysVIPC&lt;br /&gt;and various anon file descriptors, i.e. inotify, eventpoll, eventfd) yet. 
We have&lt;br /&gt;the kernel branch with the stuff applied available at&lt;br /&gt;&lt;br /&gt;  &lt;a target='_blank' href='https://github.com/cyrillos/linux-2.6.git' rel='nofollow'&gt;https://github.com/cyrillos/linux-2.6.git&lt;/a&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;What's next? We will be rebasing OpenVZ to Linux kernel 3.5 (most probably) and will try to reuse CRIU for checkpoint and restore of OpenVZ containers, effectively killing a huge chunk of the out-of-tree kernel code that we carry in the OpenVZ kernel.&lt;br /&gt;&lt;br /&gt;* - In fact, it was Pavel Emelyanov, our chief kernel architect, but it feels oh so nice to say &lt;i&gt;we&lt;/i&gt; that we can't refrain.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; comments disabled due to spam.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:40830</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/40830.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=40830"/>
    <title>Introducing container in a file aka ploop</title>
    <published>2012-03-27T08:48:03Z</published>
    <updated>2012-08-20T08:45:09Z</updated>
    <category term="ploop"/>
    <category term="kernel"/>
    <category term="openvz"/>
    <content type="html">OpenVZ has just introduced kernel and tools support for container-in-a-file technology, also known as ploop. This post tries to summarize why ploop is needed, and why it is superior to what we had before.&lt;br /&gt;&lt;h3&gt;Before ploop: simfs and vzquota&lt;/h3&gt;&lt;br /&gt;First of all, a few facts about the pre-ploop era technologies and their limitations.&lt;br /&gt;&lt;br /&gt;As you are probably aware, a container file system used to be just a directory on the host, which a new container was chroot()-ed into. Although it seems like a good and natural idea, this approach has a number of limitations.&lt;br /&gt;&lt;br /&gt;Since all containers live on one and the same file system, they share the common properties of that file system (its type, block size, and other options). That means we cannot configure these properties on a per-container basis.&lt;br /&gt;&lt;br /&gt;One such property that deserves a special item in this list is the file system journal. While a journal is a good thing to have, because it helps maintain file system integrity and improve reboot times (by eliminating fsck in many cases), it is also a bottleneck for containers. If one container fills up the in-memory journal (with lots of small operations leading to file metadata updates, e.g. file truncates), all the other containers' I/O will block, waiting for the journal to be written to disk. In some extreme cases we saw up to 15 seconds of such blockage.&lt;br /&gt;&lt;br /&gt;Since many containers share the same file system with limited space, in order to limit containers' disk space we had to develop per-directory disk quotas (i.e. vzquota).&lt;br /&gt;&lt;br /&gt;Again, since many containers share the same file system, and the number of inodes on a file system is limited [for most file systems], vzquota also has to be able to limit inodes on a per-container (per-directory) basis.&lt;br /&gt;&lt;br /&gt;In order for in-container (aka second-level) disk quota (i.e. 
standard per-user and per-group UNIX disk quota) to work, we had to provide a dummy file system called simfs. Its sole purpose is to have a superblock, which is needed for disk quota to work.&lt;br /&gt;&lt;br /&gt;When doing a live migration without some sort of shared storage (like NAS or SAN), we sync the files to the destination system using rsync, which makes an exact copy of all files -- except that their inode numbers on disk will change. If there are apps that rely on files' inode numbers being constant (which is normally the case), those apps will not survive the migration.&lt;br /&gt;&lt;br /&gt;Finally, a container backup or snapshot is harder to do, because there are a lot of small files that need to be copied.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Introducing ploop&lt;/h3&gt;&lt;br /&gt;In order to address the above problems and ultimately make the world a better place, we decided to implement a container-in-a-file technology, not different from what various VM products are using, but working as efficiently as all the other container bits and pieces in OpenVZ.&lt;br /&gt;&lt;br /&gt;The main idea of ploop is to have an image file, use it as a block device, and create and use a file system on that device. Some readers will recognize that this is exactly what the Linux loop device does! Right, the only catch is that the loop device is very inefficient (for one, using it leads to double caching of data in memory) and its functionality is very limited.&lt;br /&gt;&lt;br /&gt;The ploop implementation in the kernel has a modular and layered design.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The top layer is the main ploop module&lt;/b&gt;, which provides a virtual block device to be used for the CT file system.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The middle layer is the image format module&lt;/b&gt;, which translates block device block numbers into image file block numbers. 
A simple format module called "raw" does a trivial 1:1 translation, same as the existing loop device.&lt;br /&gt;&lt;br /&gt;A more sophisticated format module keeps a translation table and is able to dynamically grow and shrink the image file. That means that if you create a container with 2GB of disk space, the image file size will not be 2GB, but less -- the size of the actual data stored in the container. It is also possible to support other image formats by writing other ploop format modules, such as one for QCOW2 (used by QEMU and KVM).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The bottom layer is the I/O module&lt;/b&gt;. Currently, modules for direct I/O on an ext4 device and for NFS are available. There are plans to also have a generic VFS module, which will be able to store images on any decent file system, but that needs an efficient direct I/O implementation in the VFS layer, which is still being worked on.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Ploop benefits&lt;/h3&gt;&lt;br /&gt;In a nutshell:&lt;ul&gt;&lt;li&gt;The file system journal is not a bottleneck anymore&lt;/li&gt;&lt;li&gt;I/O on a few large image files instead of lots of small files during management operations&lt;/li&gt;&lt;li&gt;Disk space quota can be implemented based on virtual device sizes; no need for per-directory quotas&lt;/li&gt;&lt;li&gt;The number of inodes doesn't have to be limited, because it is not a shared resource anymore (each CT has its own file system)&lt;/li&gt;&lt;li&gt;Live backup is easy and consistent&lt;/li&gt;&lt;li&gt;Live migration is reliable and efficient&lt;/li&gt;&lt;li&gt;Different containers may use file systems of different types and properties&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;In addition:&lt;ul&gt;&lt;li&gt;Efficient container creation&lt;/li&gt;&lt;li&gt;[Potential] support for QCOW2 and other image formats&lt;/li&gt;&lt;li&gt;Support for different storage types&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; comments are disabled due to spam.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:40599</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/40599.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=40599"/>
    <title>CT console</title>
    <published>2012-02-23T14:28:15Z</published>
    <updated>2013-01-10T19:13:45Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="console"/>
    <category term="feature"/>
    <category term="vzctl"/>
    <category term="container"/>
    <content type="html">Are you ready for the next cool feature? Please welcome the CT console.&lt;br /&gt;&lt;br /&gt;Available in the RHEL6-based kernel since 042stab048.1, this feature is pretty simple to use. Run &lt;code&gt;vzctl attach &lt;i&gt;CTID&lt;/i&gt;&lt;/code&gt; to attach to a container's console, and you will be able to see all the messages the CT's init writes to the console, run a getty on it, or anything else.&lt;br /&gt;&lt;br /&gt;Please note that the console is persistent, i.e. it is available even if the container is not running. That way, you can run &lt;s&gt;&lt;code&gt;vzctl attach&lt;/code&gt;&lt;/s&gt; &lt;code&gt;vzctl console&lt;/code&gt; and then (in another terminal) &lt;code&gt;vzctl start&lt;/code&gt;. That also means that if the container is stopped, vzctl attach keeps running.&lt;br /&gt;&lt;br /&gt;Press &lt;b&gt;Esc .&lt;/b&gt; to detach from the console.&lt;br /&gt;&lt;br /&gt;The feature (&lt;a href="http://git.openvz.org/?p=vzctl;a=commitdiff;h=a1f523f59a6e321ce2cc6dd42d0f5a660a712339" target="_blank" rel="nofollow"&gt;vzctl git commit&lt;/a&gt;) will be available in the upcoming vzctl-3.0.31. I have just made a nightly build of vzctl (version 3.0.30.2-18.git.a1f523f) available so you can test this. Check &lt;a target='_blank' href='http://wiki.openvz.org/Download/vzctl/nightly' rel='nofollow'&gt;http://wiki.openvz.org/Download/vzctl/nightly&lt;/a&gt; for information on how to get a nightly build.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; the feature has been renamed to &lt;code&gt;vzctl console&lt;/code&gt;.&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; comments disabled due to spam.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:39644</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/39644.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=39644"/>
    <title>On vSwap and 042stab04x kernel improvements</title>
    <published>2011-11-14T20:03:12Z</published>
    <updated>2012-07-16T15:29:57Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="vswap"/>
    <category term="rhel6"/>
    <content type="html">&lt;h3&gt;vSwap&lt;/h3&gt;The best feature of the new (RHEL6-based) 042 series of the OpenVZ kernels is definitely vSwap. The short story is, we used to have 22 user beancounter parameters which every seasoned OpenVZ user knows by heart. Each of these parameters is there for a reason, but 22 knobs are a bit too complex to manage for a mere mortal, especially bearing in mind that&lt;ul&gt;&lt;li&gt;many of them are interdependent;&lt;/li&gt;&lt;li&gt;the sum of all limits should not exceed the resources of a given physical server.&lt;/li&gt;&lt;/ul&gt;Keeping this configuration optimal (or even consistent) is quite a challenging task even for a senior OpenVZ admin (with a probable exception of an ex airline pilot). This complexity is the main reason why there are multiple articles and blog entries complaining OpenVZ is worse than Xen, or that it is not suitable for hosting Java apps. We do have some workarounds to mitigate this complexity, such as:&lt;ul&gt;&lt;li&gt;precreated &lt;a href="http://git.openvz.org/?p=vzctl;a=tree;f=etc/conf" target="_blank" rel="nofollow"&gt;container configs&lt;/a&gt; with sane defaults for beancounters;&lt;/li&gt;&lt;li&gt;some special tools (&lt;a href="http://wiki.openvz.org/Man/vzsplit.8" target="_blank" rel="nofollow"&gt;vzsplit&lt;/a&gt; and &lt;a href="http://wiki.openvz.org/Man/vzcfgvalidate.8" target="_blank" rel="nofollow"&gt;vzcfgvalidate&lt;/a&gt;);&lt;/li&gt;&lt;li&gt;the comprehensive &lt;a href="http://wiki.openvz.org/UBC" target="_blank" rel="nofollow"&gt;User Beancounters manual&lt;/a&gt;.&lt;/li&gt;&lt;/ul&gt;This is still not the way to go. While we think high of our users, we do not expect all of them to be ex airline pilots. 
To solve the complexity, the number of per-container knobs and handles should be reduced to some decent number, or at least most of these knobs should be optional.&lt;br /&gt;&lt;br /&gt;We worked on that for a few years, and the end result is called &lt;b&gt;vSwap&lt;/b&gt; (where V is for Vendetta, oh, pardon me, Virtual).&lt;br /&gt;&lt;br /&gt;The vSwap concept is as simple as a rectangle. For each container, there are only two required parameters: the memory size (known as &lt;code&gt;physpages&lt;/code&gt;) and the swap size (&lt;code&gt;swappages&lt;/code&gt;). Almost everyone (not only an admin, but even an advanced end user) knows what RAM and swap are. On a physical server, if there is not enough memory, the system starts to swap out memory pages to disk, then swap in some other pages, which results in severe performance degradation but keeps the system from failing miserably.&lt;br /&gt;&lt;br /&gt;It&amp;#39;s about the same with vSwap, except that&lt;ul&gt;&lt;li&gt;RAM and swap are configured on a per-container basis;&lt;/li&gt;&lt;li&gt;no I/O is performed until it is really necessary (this is why the swap is virtual).&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Some vSwap internals&lt;/h3&gt;Now, there are only two knobs per container on the dashboard, namely RAM and swap, and all the complexity is hidden under the hood. I am going to describe just a bit of that undercover machinery and explain what the &amp;quot;&lt;i&gt;Reworked VSwap kernel memory accounting&lt;/i&gt;&amp;quot; line from the 042stab040.1 kernel changelog stands for.&lt;br /&gt;&lt;br /&gt;The biggest problem is, RAM for containers is not just RAM. 
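(A quick practical aside before diving into the details: from the admin's point of view, setting those two knobs is a one-liner. A sketch, assuming a vSwap-capable 042 kernel and a vzctl version that understands the --ram/--swap shorthand for physpages/swappages:)

```shell
# Give container 101 512M of RAM and 1G of virtual swap
vzctl set 101 --ram 512M --swap 1G --save

# Equivalent, using the raw beancounter names, in 4K pages:
# 131072 pages = 512M, 262144 pages = 1G
vzctl set 101 --physpages 0:131072 --swappages 0:262144 --save
```

Now, back to what that single RAM figure has to cover under the hood.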
First of all, there is a need to distinguish between&lt;ul&gt;&lt;li&gt;the user memory,&lt;/li&gt;&lt;li&gt;the kernel memory,&lt;/li&gt;&lt;li&gt;the page cache,&lt;/li&gt;&lt;li&gt;and the directory entry cache.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;The user memory&lt;/b&gt; is more or less clear: it is simply the memory that programs allocate for themselves to run. It is relatively easy to account for, and it is relatively simple to limit (but read on).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The kernel memory&lt;/b&gt; is a really complex thingie. Roughly, it is the memory that the kernel allocates for itself in order for the programs in a particular container to run. This includes a lot of stuff I&amp;#39;d rather not dive into, if I want to keep this piece an article, not a tome. Having said that, two particular kernel memory types are worth explaining.&lt;br /&gt;&lt;br /&gt;First is &lt;b&gt;the&amp;nbsp;page cache&lt;/b&gt;, the kernel mechanism that caches disk contents in memory (which would otherwise be unused) to minimize I/O. When a program reads some data from disk, that data is read into the page cache first, and when a program writes to disk, the data goes to the page cache (and is then eventually written (flushed) to disk). In case of repeated disk access (which happens quite often), data is taken from the page cache, not from the real disk, which greatly improves overall system performance, since a disk is much slower than RAM. Now, some of the page cache is used on behalf of a container, and this amount must be charged into &amp;quot;RAM used by this container&amp;quot; (i.e. physpages).&lt;br /&gt;&lt;br /&gt;Second is &lt;b&gt;the directory entry cache&lt;/b&gt; (dcache for short), yet another sort of cache, and another sort of kernel memory. Disk contents form a tree of files and directories, and such a tree is quite tall and wide. 
In order to read the contents of, say, the /bin/sh file, the kernel has to read the root (/) directory, find the &amp;#39;bin&amp;#39; entry in it, read the /bin directory, find the &amp;#39;sh&amp;#39; entry in it, and finally read the file itself. Although these operations are not very complex, there is a multitude of them; they take time and are repeated often for most of the &amp;quot;popular&amp;quot; files. In order to improve performance, the kernel keeps directory entries in memory &amp;mdash; this is what the dcache is for. The memory used by the dcache should also be accounted and limited, since otherwise it&amp;#39;s easily exploitable (not only by root, but also by an ordinary user, since any user is free to change into directories and read files).&lt;br /&gt;&lt;br /&gt;Now, the physical memory of a container is the sum of its user memory, kernel memory, page cache, and dcache. Technically, the dcache is accounted into the kernel memory, and the kernel memory is accounted into the physical memory, but that&amp;#39;s not overly important.&lt;br /&gt;&lt;h3&gt;Improvements in the new 042stab04x kernels&lt;/h3&gt;&lt;h4&gt;Better reclamation and memory balancing&lt;/h4&gt;What to do if a container hits its physical memory limit? Free some pages by writing their contents to the abovementioned virtual swap. Well, not quite yet. Remember that there are also the page cache and the dcache, so the kernel can simply discard some of the pages from these caches, which is way cheaper than swapping out.&lt;br /&gt;&lt;br /&gt;The process of finding some free memory is known as reclamation. The kernel needs to decide very carefully when to start reclamation, how many and which exact pages to reclaim in every particular situation, and when it is the right time to swap out rather than discard some of the cache contents.&lt;br /&gt;&lt;br /&gt;Remember, we have four types of memory (kernel, user, dcache and page cache) and only one knob which limits the sum of all these. 
It would be easier for the kernel, but not for the user, to have separate limits for each type of memory. But for user convenience and simplicity, the kernel has only one knob for these four parameters, so it needs to balance between them. One major improvement in the 042stab040 kernel is that such balancing is now performed better.&lt;br /&gt;&lt;h4&gt;Stricter memory limit&lt;/h4&gt;During the lifetime of a container, the kernel might face a situation when it needs more kernel memory, or user memory, or perhaps more dcache entries, and the memory for the container is tight (i.e. close to the limit), so it needs to either reclaim or swap. The problem is, there are some situations when neither reclamation nor swapping is possible, so the kernel can either fail miserably (say, by killing a process) or go beyond the limit and hope that everything will be fine and mommy won&amp;#39;t notice. Another big improvement in the 042stab040 kernel is that it reduces the number of such situations; in other words, the new kernel obeys the memory limit more strictly.&lt;br /&gt;&lt;h4&gt;Polishing&lt;/h4&gt;Finally, the kernel is now in pretty good shape, so we can afford some polishing, minor optimizations, and fine tuning. Such polishing was performed in a few subsystems, including checkpointing, user beancounters, netfilter, the kernel NFS server and the VZ disk quota.&lt;br /&gt;&lt;h4&gt;Some numbers&lt;/h4&gt;In total, there are 53 new patches in 042stab040.1 compared to the previous 039 kernels. On top of that, 042stab042.1 adds another 30. We hope that the end result is improved stability and performance.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:39369</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/39369.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=39369"/>
    <title>Announcing rhel6-testing kernel branch/repo</title>
    <published>2011-10-14T21:40:02Z</published>
    <updated>2012-03-06T13:24:59Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="rhel6"/>
    <category term="qa"/>
    <content type="html">Instead of having a nice drink in a bar, I spent this Friday night splitting the RHEL6-based OpenVZ kernel branch/repository into two, so now we have 'rhel6' and 'rhel6-testing' branches/repos. Let me explain why.&lt;br /&gt;&lt;br /&gt;When we made an initial port of OpenVZ to the RHEL6 kernel and released the first kernel (in October 2010, named 042test001), I created a repository named openvz-kernel-rhel6 (or just rhel6), and this repository was marked as "development, unstable". Then, after almost a year, we announced it as "testing" and then, finally, "stable" (in August 2011, named 042stab035.1).&lt;br /&gt;&lt;br /&gt;After that, all the kernels in that repository were supposed to be stable, because they are incremental improvements of the kernel we call stable. In theory, that is. In practice, of course, there can always be new bugs (both introduced by us and by Red Hat folks releasing the kernel updates we rebase onto). Thus a kernel update from a repo which is supposed to be stable can do bad things.&lt;br /&gt;&lt;br /&gt;Better late than never, I have fixed the situation tonight by basically renaming the "rhel6" repository to "rhel6-testing" and creating a new repository called just "rhel6". For now, I put 042stab037.1 (the latest kernel that has passed our internal QA) into rhel6 (aka stable), while all the other kernels, up to and including 042stab039.3, are in the rhel6-testing repo.&lt;br /&gt;&lt;br /&gt;Now, very similar to what we do with RHEL5 kernels, all the new fresh-from-the-build-farm kernels will appear in the rhel6-testing repo, at about the same time they go to internal QA. Then, the kernels that get QA approval will appear in the rhel6 (aka stable) repo. 
What this means for you as a user is that you can now choose whether to stay at the bleeding edge and have the latest stuff, or to take a conservative approach and have less frequent and delayed updates, but be more confident about kernel quality and stability.&lt;br /&gt;&lt;br /&gt;A few links:&lt;br /&gt;* &lt;a href="http://wiki.openvz.org/Download/kernel/rhel6" target="_blank" rel="nofollow"&gt;Stable RHEL6-based OpenVZ kernels&lt;/a&gt;&lt;br /&gt;* &lt;a href="http://wiki.openvz.org/Download/kernel/rhel6-testing" target="_blank" rel="nofollow"&gt;Testing RHEL6-based OpenVZ kernels&lt;/a&gt;&lt;br /&gt;* &lt;a href="http://download.openvz.org/openvz.repo" target="_blank" rel="nofollow"&gt;OpenVZ yum repository setup file&lt;/a&gt;&lt;br /&gt;* &lt;a href="http://openvz.org/pipermail/announce/2011-October/000268.html" target="_blank" rel="nofollow"&gt;Official announcement of rhel6-testing&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; comments disabled due to spam.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:38982</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/38982.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=38982"/>
    <title>LinuxCon Europe 2011 in Prague is coming</title>
    <published>2011-09-30T11:26:07Z</published>
    <updated>2011-09-30T11:26:07Z</updated>
    <category term="linuxcon"/>
    <category term="kernel"/>
    <category term="prague"/>
    <category term="openvz"/>
    <category term="event"/>
    <content type="html">&lt;p&gt;And we are coming to Prague, too! This time, there will be as many as six people and two talks from us, plus we will held a memory cgroup controller meeting.&lt;/p&gt;

&lt;p&gt;The following OpenVZ/Parallels people are coming:
&lt;ul&gt;&lt;li&gt;&lt;b&gt;James Bottomley&lt;/b&gt;, Parallels virtualization CTO&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Kir Kolyshkin&lt;/b&gt;, OpenVZ project manager&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Pavel Emelyanov&lt;/b&gt;, OpenVZ kernel team leader (he's also taking part in Linux Kernel Summit)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Glauber Costa&lt;/b&gt;, OpenVZ kernel developer&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Maxim Patlasov&lt;/b&gt;, OpenVZ kernel developer&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Andrey Vagin&lt;/b&gt;, OpenVZ kernel developer&lt;/li&gt;&lt;/ul&gt;&lt;/p&gt;

&lt;p&gt;Two talks will be presented. Since the linuxsymposium.org site is currently down, let me quote the talk descriptions here.&lt;/p&gt;
&lt;p&gt;1. &lt;b&gt;Container in a file&lt;/b&gt; by Maxim Patlasov.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;One of the feature differences between hypervisors and containers is the ability to store a virtual machine image in a single file, since most containers exist as a chroot within the host OS rather than as fully independent entities.  However, the ability to save and restore state in a machine image file is invaluable in managing virtual machine life cycles in the data centre.&lt;/p&gt;

&lt;p&gt;This talk will début a new loopback device which gives all the advantages of virtual machine images by storing the container in a file
while preserving the benefits of sharing significant portions with the host OS.  We will compare and contrast the technology with the
traditional loopback device, and describe some changes to the ext4 filesystem which make it more friendly to new loopback device needs.&lt;/p&gt;

&lt;p&gt;This talk will be technical in nature but should be accessible to people interested in cloud, virtualisation and container technologies.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;2. &lt;b&gt;OpenVZ and Linux kernel testing&lt;/b&gt; by Andrey Vagin.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;One of the less appealing but very important parts of software development is testing. This talk tries to summarize our 10+ years of experience in Linux kernel testing (including OpenVZ and Red Hat Enterprise Linux kernels). An overall description of our test system is provided, followed by details on some of the interesting test cases developed. Finally, a few anecdotal cases of bugs found will be presented.&lt;/p&gt;

&lt;p&gt;In a sense, the talk is an answer to Andrew Morton's question from 2007: "I'm curious. For the past few months, people@openvz.org have discovered (and fixed) an ongoing stream of obscure but serious and quite long-standing bugs. How are you discovering these bugs?"&lt;/p&gt;

&lt;p&gt;Talk is of interest to those concerned about kernel quality, and in general to people doing development and testing.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Finally, there will be a &lt;b&gt;memcg meeting&lt;/b&gt;. Since LinuxCon will be right after the Kernel Summit, a number of kernel guys will still be there so anyone interested in cgroups can come. This meeting is a continuation of &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/events/LPC2011MC/tracks/105" target="_blank" rel="nofollow"&gt;our recent discussion at Linux Plumbers&lt;/a&gt; (see etherpad and presentations).&lt;/p&gt;

&lt;p&gt; See you all in Prague in less than a month!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:38801</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/38801.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=38801"/>
    <title>RHEL6 goes stable!</title>
    <published>2011-08-30T17:22:43Z</published>
    <updated>2012-11-11T19:33:26Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="rhel6"/>
    <content type="html">Guys, I am very proud to inform you that today we mark the RHEL6 kernel branch as stable. Below is a copy-paste from &lt;a href="http://openvz.org/pipermail/announce/2011-August/000250.html" target="_blank" rel="nofollow"&gt;the relevant announce@ post&lt;/a&gt;. I personally highly recommend the RHEL6-based OpenVZ kernel to everyone -- it is a major step forward compared to RHEL5.&lt;br /&gt;&lt;br /&gt;In other news, Parallels has &lt;a href="http://www.parallels.com/products/pvcl/whatsnew/" target="_blank" rel="nofollow"&gt;just released Virtuozzo Containers for Linux 4.7&lt;/a&gt;, bringing the same cool stuff (VSwap et al) to commercial customers. Despite being only a "dot" (or "minor") release, this product incorporates an impressive amount of man-hours from the best Parallels engineers.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;== Stable: RHEL6 ==&lt;br /&gt;&lt;br /&gt;This is to announce that the RHEL6-based kernel branch (starting from kernel 042stab035.1) is now marked as stable, and it is now the recommended branch to use.&lt;br /&gt;&lt;br /&gt;We are not aware of any major bugs or show-stoppers in this kernel. 
As always, we still recommend testing any new kernels before rolling them out to production.&lt;br /&gt;&lt;br /&gt;New features of the RHEL6-based kernel branch (as compared to the previous stable kernel branch, RHEL5) include better performance, better scalability (especially on high-end SMP systems), and better resource management (notably, vSwap support -- see &lt;a target='_blank' href='http://wiki.openvz.org/VSwap' rel='nofollow'&gt;http://wiki.openvz.org/VSwap&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;RHEL6 kernels can be downloaded from &lt;a target='_blank' href='http://wiki.openvz.org/Download/kernel/rhel6' rel='nofollow'&gt;http://wiki.openvz.org/Download/kernel/rhel6&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;== Frozen: 2.6.27, 2.6.32 ==&lt;br /&gt;&lt;br /&gt;Also, from now on we no longer maintain the following kernel branches:&lt;br /&gt;&lt;br /&gt;   * 2.6.27&lt;br /&gt;   * 2.6.32&lt;br /&gt;&lt;br /&gt;No more new releases of the above kernels are expected. Existing users (if any) are recommended to switch to other (maintained) branches, such as RHEL6-2.6.32 or RHEL5-2.6.18.&lt;br /&gt;&lt;br /&gt;This change does not affect vendor OpenVZ kernels (such as Debian or Ubuntu) -- those will still be supported for the lifetime of their distributions via the usual means (i.e. bugzilla.openvz.org).&lt;br /&gt;&lt;br /&gt;== Development: none ==&lt;br /&gt;&lt;br /&gt;Currently, there are no non-stable kernels in development. Eventually we will port to the Linux 3.x kernel, but it might not happen this year. Instead, we are currently focused on bringing more OpenVZ features to the mainstream Linux kernel.&lt;br /&gt;&lt;br /&gt;Regards, OpenVZ team.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt; comments disabled due to spam</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:38447</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/38447.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=38447"/>
    <title>Linux Plumbers: Containers and CGroups miniconf</title>
    <published>2011-08-24T15:16:32Z</published>
    <updated>2011-09-05T09:20:52Z</updated>
    <category term="containers"/>
    <category term="kernel"/>
    <category term="linux plumbers"/>
    <category term="mainline"/>
    <content type="html">We have finally filed &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/users/963" target="_blank" rel="nofollow"&gt;a number of proposals&lt;/a&gt; for the upcoming Containers and CGroups miniconf to be held during the &lt;a href="http://www.linuxplumbersconf.org/" target="_blank" rel="nofollow"&gt;Linux Plumbers Conference&lt;/a&gt;, 7 to 9 September 2011 in Santa Rosa, CA.&lt;br /&gt;&lt;br /&gt;From those proposals, one can clearly see what our plans are regarding mainline integration. In a few words: &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/819" target="_blank" rel="nofollow"&gt;dcache management&lt;/a&gt;, &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/825" target="_blank" rel="nofollow"&gt;memory&lt;/a&gt; and &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/861" target="_blank" rel="nofollow"&gt;CPU&lt;/a&gt; cgroup controller improvements, &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/849" target="_blank" rel="nofollow"&gt;container enter&lt;/a&gt;, &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/843" target="_blank" rel="nofollow"&gt;improved /proc virtualization&lt;/a&gt;, &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/831" target="_blank" rel="nofollow"&gt;checkpoint/restart [mostly] in userspace&lt;/a&gt; (of which I have &lt;a href="http://openvz.livejournal.com/38287.html" target="_blank"&gt;blogged recently&lt;/a&gt;), and &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/855" target="_blank" rel="nofollow"&gt;making vzctl work with mainline kernel containers&lt;/a&gt;. Oh, and the interesting &lt;a href="http://www.linuxplumbersconf.org/2011/ocw/proposals/837" target="_blank" rel="nofollow"&gt;loopback-like block device to hold a container filesystem&lt;/a&gt; (a.k.a. ploop).&lt;br /&gt;&lt;br /&gt;Quite a lot of interesting stuff, what do you think?&lt;br /&gt;&lt;br /&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:38287</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/38287.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=38287"/>
    <title>Checkpoint/restart (mostly) in user space</title>
    <published>2011-08-13T09:26:08Z</published>
    <updated>2011-08-13T09:26:08Z</updated>
    <category term="checkpointing"/>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="mainstream"/>
    <content type="html">There is a &lt;a href="http://lwn.net/Articles/452184/" target="_blank" rel="nofollow"&gt;good article at lwn.net&lt;/a&gt; describing one of our latest developments.&lt;br /&gt;&lt;br /&gt;We have had checkpoint/restart (CPT) and live migration in OpenVZ for ages (well, OK, since 2007 or so), allowing containers to be freely moved between physical servers without any service interruption. It is a great feature which is valued by our users. The problem is we can't merge it upstream, i.e. into the vanilla kernel.&lt;br /&gt;&lt;br /&gt;Various people from our team worked on that, and they all gave up. Then Oren Laadan tried very hard to merge his CPT implementation -- unfortunately, it didn't work out very well either. The thing is, checkpointing is a complex thing, and the patch implementing it is very intrusive. &lt;br /&gt;&lt;br /&gt;Recently, our kernel team leader Pavel Emelyanov came up with a new idea: move most of the checkpointing complexity out of the kernel and into user space, thus minimizing the amount of in-kernel changes needed. In about two weeks he wrote a working prototype. So far the reaction is mostly positive, and he's going to submit a second RFC version for review to lkml.&lt;br /&gt;&lt;br /&gt;For more details, read the &lt;a href="http://lwn.net/Articles/452184/" target="_blank" rel="nofollow"&gt;lwn.net article&lt;/a&gt;. After all, while I am sitting right next to Pavel, Mr. Corbet's ability to explain complex stuff in simple terms is way better than mine.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:35956</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/35956.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=35956"/>
    <title>Kernel 2.6.27 repin aka "Unexpected return"</title>
    <published>2011-01-26T15:13:42Z</published>
    <updated>2011-01-26T15:13:57Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="repin"/>
    <content type="html">You probably thought we had abandoned the 2.6.27 kernel branch. Well, we ourselves thought we did (although it was not yet officially announced). Then, all of a sudden, &lt;a href="http://wiki.openvz.org/Download/kernel/2.6.27/2.6.27-repin.1" target="_blank" rel="nofollow"&gt;kernel 2.6.27-repin.1 was released&lt;/a&gt;, rebasing to the latest upstream kernel (2.6.27.57) and fixing &lt;a href="http://bugzilla.openvz.org/show_bug.cgi?id=1593" target="_blank" rel="nofollow"&gt;OpenVZ bug #1593&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;The thing is, this kernel is named after &lt;a href="http://www.wikipedia.org/wiki/Ilya_Repin" target="_blank" rel="nofollow"&gt;Ilya Repin&lt;/a&gt;, a leading Russian painter and sculptor of the Peredvizhniki artistic school. One of his best paintings is called "Unexpected Return", and I happened to enjoy the original in the Tretyakov Gallery here in Moscow a couple of weeks ago. So here it is: the unexpected return of the 2.6.27 kernel. It took Ilya 4 years to finish the painting; it took Pavel 6 months to release the fix. Better late than never, that is.&lt;br /&gt;&lt;br /&gt;Please enjoy: &lt;b&gt;&lt;a href="http://www.tanais.info/art/en/repin40more.html" target="_blank" rel="nofollow"&gt;Ilya Repin. Unexpected return. 1884-1888&lt;/a&gt;&lt;/b&gt;.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:35628</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/35628.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=35628"/>
    <title>news from the VSwap front</title>
    <published>2011-01-25T18:44:57Z</published>
    <updated>2011-01-25T18:46:59Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="vswap"/>
    <content type="html">&lt;p&gt;I have added vswap confguration samples to vzctl git. Basically, you set physpages and swappages and leave every other beancounter at unlimited. For example, this is how ve-vswap-256m-conf.sample looks like:&lt;/p&gt;

&lt;p&gt;&lt;pre&gt;
# UBC parameters (in form of barrier:limit)
PHYSPAGES="0:256M"
SWAPPAGES="0:512M"
KMEMSIZE="unlimited"
LOCKEDPAGES="unlimited"
PRIVVMPAGES="unlimited"
SHMPAGES="unlimited"
NUMPROC="unlimited"
VMGUARPAGES="unlimited"
OOMGUARPAGES="unlimited"
NUMTCPSOCK="unlimited"
NUMFLOCK="unlimited"
NUMPTY="unlimited"
NUMSIGINFO="unlimited"
TCPSNDBUF="unlimited"
TCPRCVBUF="unlimited"
OTHERSOCKBUF="unlimited"
DGRAMRCVBUF="unlimited"
NUMOTHERSOCK="unlimited"
DCACHESIZE="unlimited"
NUMFILE="unlimited"
NUMIPTENT="unlimited"

# Disk quota parameters (in form of softlimit:hardlimit)
DISKSPACE="1G"
DISKINODES="200000"
QUOTATIME="0"

# CPU fair scheduler parameter
CPUUNITS="1000"
&lt;/pre&gt;&lt;/p&gt;&lt;a name='cutid1-end'&gt;&lt;/a&gt;

&lt;p&gt;As you can see, physpages (i.e. RAM size) is set to 256 megabytes, swappages (i.e. swap size) is set to 512 megabytes, and all the other beancounters are unlimited. Wow, it's never been easier to configure your containers!&lt;/p&gt;
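For reference, beancounters count memory in 4 KB pages, so the "0:256M" PHYSPAGES value above translates to a page count like this (plain arithmetic, nothing OpenVZ-specific):

```shell
# Convert a megabyte-sized limit into 4 KB beancounter pages:
# 256 MB * 1024 KB/MB / 4 KB-per-page = 65536 pages.
ram_mb=256
pages=$(( ram_mb * 1024 / 4 ))
echo "$pages"   # prints: 65536
```

That 65536 is the same number you will see as the physpages limit in the beancounters dump below.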

&lt;p&gt;Now, we can utilize this stuff using a RHEL6-based kernel. This is what we see from inside the container:&lt;/p&gt;

&lt;p&gt;&lt;pre&gt;
[root@localhost ~]# vzctl enter 103
entered into CT 103
[root@localhost /]# free
             total       used       free     shared    buffers     cached
Mem:        262144      23936     238208          0          0      10968
-/+ buffers/cache:      12968     249176
Swap:       524288          0     524288
&lt;/pre&gt;&lt;/p&gt;

&lt;p&gt;&lt;pre&gt;
[root@localhost /]# cat /proc/user_beancounters 
Version: 2.5
       uid  resource                     held              maxheld              barrier                limit              failcnt
      103:  kmemsize                  4722976              4853726  9223372036854775807  9223372036854775807                    0
            lockedpages                     0                    0  9223372036854775807  9223372036854775807                    0
            privvmpages                  4296                13875  9223372036854775807  9223372036854775807                    0
            shmpages                       31                   31  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0                    0                    0                    0
            numproc                        33                   33  9223372036854775807  9223372036854775807                    0
            physpages                    5984                 5985                    0                65536                    0
            vmguarpages                     0                    0  9223372036854775807  9223372036854775807                    0
            oomguarpages                 2696                 2696  9223372036854775807  9223372036854775807                    0
            numtcpsock                      4                    4  9223372036854775807  9223372036854775807                    0
            numflock                        5                    6  9223372036854775807  9223372036854775807                    0
            numpty                          1                    1  9223372036854775807  9223372036854775807                    0
            numsiginfo                     12                   18  9223372036854775807  9223372036854775807                    0
            tcpsndbuf                   69760                    0  9223372036854775807  9223372036854775807                    0
            tcprcvbuf                   65536                    0  9223372036854775807  9223372036854775807                    0
            othersockbuf                 2312                10768  9223372036854775807  9223372036854775807                    0
            dgramrcvbuf                     0                    0  9223372036854775807  9223372036854775807                    0
            numothersock                   51                   53  9223372036854775807  9223372036854775807                    0
            dcachesize                1172451              1172451  9223372036854775807  9223372036854775807                    0
            numfile                       370                  390  9223372036854775807  9223372036854775807                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            dummy                           0                    0                    0                    0                    0
            numiptent                      14                   14  9223372036854775807  9223372036854775807                    0


[root@localhost /]# cat /proc/meminfo 
MemTotal:         262144 kB
MemFree:          238208 kB
Cached:            10968 kB
Active:            16956 kB
Inactive:           1384 kB
Active(anon):       6352 kB
Inactive(anon):     1020 kB
Active(file):      10604 kB
Inactive(file):      364 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:        524288 kB
SwapFree:         524288 kB
Dirty:                 0 kB
AnonPages:          7364 kB
Mapped:             3416 kB
Shmem:               124 kB
Slab:               4012 kB
SReclaimable:       1088 kB
SUnreclaim:         2924 kB
&lt;/pre&gt;&lt;/p&gt;&lt;a name='cutid2-end'&gt;&lt;/a&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:35500</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/35500.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=35500"/>
    <title>cpulimit is back in RHEL6 based kernel</title>
    <published>2011-01-20T16:53:15Z</published>
    <updated>2011-01-20T16:53:15Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="vzctl"/>
    <content type="html">Hard CPU limit (the ability to specify that you don't want a container to use more than X per cent of the CPU no matter what) is back in the latest RHEL6-based kernel, &lt;a href="http://wiki.openvz.org/Download/kernel/rhel6/042test006.1" target="_blank" rel="nofollow"&gt;042test006.1, which has just been released&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The feature was only available in the stable (i.e. RHEL4 and RHEL5-based) kernels, and was missing from all of our development kernels from 2.6.20 to 2.6.32. So while it was always there in the stable branches, the feeling is like it's back.&lt;br /&gt;&lt;br /&gt;In order to use the CPU limit feature, set the limit using &lt;code&gt;vzctl set $CTID --cpulimit X&lt;/code&gt;, where X is in per cent of one single CPU. For example, if you have a single 2 GHz CPU and want container 123 to use no more than 1 GHz, use &lt;code&gt;vzctl set 123 --cpulimit 50&lt;/code&gt;. If you have a 2 GHz quad-core system and want the container to use no more than 4 GHz, use &lt;code&gt;vzctl set 123 --cpulimit 200&lt;/code&gt;. Well, in the second case it might be better to just use &lt;code&gt;--cpus 2&lt;/code&gt;. Anyways, see the vzctl man page.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:35207</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/35207.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=35207"/>
    <title>RHEL6-based kernel: try today not next year!</title>
    <published>2010-12-29T15:21:05Z</published>
    <updated>2010-12-29T15:21:05Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="centos"/>
    <category term="rhel6"/>
    <category term="scientific linux"/>
    <content type="html">We have just released &lt;a href="http://wiki.openvz.org/Download/kernel/rhel6/042test005.1" target="_blank" rel="nofollow"&gt;a new RHEL6-based kernel, 042test005&lt;/a&gt;. It is shaping up pretty well — as you can see from the changelog, it's not just bug fixes but also performance improvements. If you haven't tried it yet, I suggest you do it today! Do not postpone this until 2011 — after all, this is what will become the next stable OpenVZ kernel.&lt;br /&gt;&lt;br /&gt;The RHEL6 kernel needs an appropriate (i.e. recent) Linux distribution. If you don't want the latest Fedora releases, can't afford RHEL6, and are tired of waiting for CentOS 6, I suggest you go with &lt;b&gt;Scientific Linux 6 (SL6)&lt;/b&gt;. This is yet another RHEL6 clone, developed and used by CERN, Fermilab and other similar institutions.&lt;br /&gt;&lt;br /&gt;While SL6 is still in its infancy (&lt;a href="http://listserv.fnal.gov/scripts/wa.exe?A2=ind1012&amp;amp;L=scientific-linux-devel&amp;amp;T=0&amp;amp;P=2876" target="_blank" rel="nofollow"&gt;they have recently released alpha 3&lt;/a&gt; and plan to release beta 1 on Jan 7, 2011), it is worth trying since it's based on a very stable set of sources from RHEL6. Repositories and stuff are available from &lt;a target='_blank' href='http://ftp.scientificlinux.org/linux/scientific/6rolling/' rel='nofollow'&gt;http://ftp.scientificlinux.org/linux/scientific/6rolling/&lt;/a&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:openvz:34694</id>
    <author>
      <name>Kir Kolyshkin</name>
    </author>
    <lj:poster user="k001" userid="990679"/>
    <link rel="alternate" type="text/html" href="https://openvz.livejournal.com/34694.html"/>
    <link rel="self" type="text/xml" href="https://openvz.livejournal.com/data/atom/?itemid=34694"/>
    <title>On kernel exploits and OpenVZ user beancounters</title>
    <published>2010-11-26T15:37:24Z</published>
    <updated>2010-11-26T15:37:24Z</updated>
    <category term="kernel"/>
    <category term="openvz"/>
    <category term="security"/>
    <content type="html">Yesterday a guy with his name written in Cyrillic letters ("Марк Коренберг") and a @gmail.com email address &lt;a href="http://lkml.org/lkml/headers/2010/11/25/8" target="_blank" rel="nofollow"&gt;posted&lt;/a&gt; a kernel exploit to the Linux kernel mailing list (aka LKML). This morning one brave guy from our team tried to run it on his desktop -- and had to reboot it after a few minutes of total system unresponsiveness.&lt;br /&gt;&lt;br /&gt;The bad news is that the exploit is pretty serious and causes a denial of service. It looks like most kernels are indeed vulnerable.&lt;br /&gt;&lt;br /&gt;The good news is that OpenVZ is not vulnerable. Why? Because of &lt;a href="http://wiki.openvz.org/UBC" target="_blank" rel="nofollow"&gt;user beancounters&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The nature of the exploit is to create an unlimited number of sockets, thus rendering the whole system unusable, so you need to power-cycle it to bring it back to life. Now, if you run this exploit in an OpenVZ container, you will hit the &lt;code&gt;numothersock&lt;/code&gt; beancounter limit pretty soon and the script will exit. This is an excerpt from /proc/user_beancounters after the run:&lt;br /&gt;&lt;br /&gt;&lt;small&gt;&lt;pre&gt;
 cat /proc/user_beancounters 
       uid  resource                     held              maxheld              barrier                limit              failcnt
.....
            numothersock                    9                  360                  360                  360                    1
.....
&lt;/pre&gt;&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;As you can see, current usage is 9, while peak usage is 360, same as the limit. Finally, &lt;code&gt;failcnt&lt;/code&gt; is 1, meaning there was one situation when the limit was hit.&lt;br /&gt;&lt;br /&gt;I went further, set the &lt;code&gt;numothersock&lt;/code&gt; limit to 'unlimited', and re-ran the exploit. The situation is much worse in that case (don't try it at home, you've been warned): the system slows down considerably, but I was still able to log in to the physical server using ssh and kill the offending task from the host system using SIGTERM.&lt;br /&gt;&lt;br /&gt;Now, this is how &lt;code&gt;/proc/user_beancounters&lt;/code&gt; looks after the second run:&lt;br /&gt;&lt;small&gt;&lt;pre&gt;
       uid  resource                     held              maxheld              barrier                limit              failcnt
      111:  kmemsize                  1237973             14372344             14372700             14790164                   80
.....
            numothersock                    9                 2509  9223372036854775807  9223372036854775807                    1
&lt;/pre&gt;&lt;/small&gt;&lt;br /&gt;&lt;br /&gt;As you can see, even with numothersock set to unlimited, another beancounter, &lt;code&gt;kmemsize&lt;/code&gt;, is working to save the system.&lt;a name='cutid1-end'&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Of course, if you set all beancounters to unlimited, the exploit will work. So don't do that, unless your CT is completely trusted. Those limits are there for a reason, you know.</content>
  </entry>
</feed>
