There is a good article at lwn.net telling about one of our latest development.
We have checkpoint/restart (CPT) and live migration in OpenVZ for ages (well, OK, since 2007 or so), allowing for containers to be freely moved between physical servers without any service interruption. It is a great feature which is valued by our users. The problem is we can't merge it upstream, ie to vanilla kernel.
Various people from our team worked on that, and they all gave up. Then, Oren Laadan was trying very hard to merge his CPT implementation -- unfortunately it didn't worked out very well either. The thing is, checkpointing is a complex thing, and the patch implementing it is very intrusive.
Recently, our kernel team leader Pavel Emelyanov got a new idea of moving most of the checkpointing complexity out of the kernel and into user space, thus minimizing the amount of the in-kernel changes needed. In about two weeks of time he wrote a working prototype. So far the reaction is mostly positive, and he's going to submit a second RFC version for review to lkml.
For more details, read the lwn.net article. After all, while I am sitting next to Pavel, Mr. Corbet ability to explain complex stuff in simple terms is way better than mine.
We have checkpoint/restart (CPT) and live migration in OpenVZ for ages (well, OK, since 2007 or so), allowing for containers to be freely moved between physical servers without any service interruption. It is a great feature which is valued by our users. The problem is we can't merge it upstream, ie to vanilla kernel.
Various people from our team worked on that, and they all gave up. Then, Oren Laadan was trying very hard to merge his CPT implementation -- unfortunately it didn't worked out very well either. The thing is, checkpointing is a complex thing, and the patch implementing it is very intrusive.
Recently, our kernel team leader Pavel Emelyanov got a new idea of moving most of the checkpointing complexity out of the kernel and into user space, thus minimizing the amount of the in-kernel changes needed. In about two weeks of time he wrote a working prototype. So far the reaction is mostly positive, and he's going to submit a second RFC version for review to lkml.
For more details, read the lwn.net article. After all, while I am sitting next to Pavel, Mr. Corbet ability to explain complex stuff in simple terms is way better than mine.


Comments
Apparently all the attempts to merge in-kernel implementations fail. So the idea is to minimize the kernel changes, moving all that we can to userspace.