on testing kernels 12 August 2013 @ 06:56 pm
Kir Kolyshkin
k001
[openvz]
Currently, our best kernel line is the one based on Red Hat Enterprise Linux 6 kernels (RHEL6 for short). This is our most feature-rich, up-to-date yet stable kernel -- i.e. the best. The second-best option is the RHEL5-based kernel -- a few years older, so it has neither vSwap nor ploop, but it is still good.

There is a dilemma: either release a new kernel version earlier, or delay it for more internal testing. We figured we can do both! Each kernel branch (RHEL6 and RHEL5) comes via two channels -- testing and stable. In terms of yum, we have four kernel repositories defined in the openvz.repo file; their names should be self-explanatory:

* openvz-kernel-rhel6
* openvz-kernel-rhel6-testing
* openvz-kernel-rhel5
* openvz-kernel-rhel5-testing

The process of releasing kernels is the following: right after building a kernel, we push it out to the appropriate -testing repository, so it is available as soon as possible. We then do some internal QA on it (which can be either basic or thorough, depending on the amount of our changes and whether we did a rebase to a newer RHEL6 kernel). Based on the QA report, sometimes we do another build with a few more patches, and repeat the process. Once the kernel looks good to our QA, we move it from testing to stable. In some rare cases (such as when we do one simple but quite important fix), new kernels go right into stable.

So, our users can enjoy being stable, or being up-to-the-moment, or both. In fact, if you have more than a few servers running OpenVZ, we strongly suggest that you dedicate one or two boxes to running -testing kernels, and report any bugs found to the OpenVZ bugzilla. This is good for you, because you will be able to catch bugs early, and let us fix them before they hit your production systems. This is good for us, too, because no QA department is big enough to catch all possible bugs in a myriad of hardware and software configurations and use cases.

Enabling the -testing repo is easy: just edit openvz.repo, setting enabled=1 under the appropriate [openvz-kernel-...-testing] section.
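
If you would rather script that edit than do it by hand, here is a minimal sketch. It works on a temporary sample file, not the real openvz.repo: the sample contents are an assumption (the real file has more keys per section, such as name and baseurl, and its testing sections are assumed to ship with enabled=0). It also assumes GNU sed, as found on RHEL-like systems.

```shell
# Hypothetical sample standing in for the real openvz.repo;
# only the section names and the enabled= key are taken from the post.
repo_file="$(mktemp)"
cat > "$repo_file" <<'EOF'
[openvz-kernel-rhel6]
enabled=1

[openvz-kernel-rhel6-testing]
enabled=0

[openvz-kernel-rhel5-testing]
enabled=0
EOF

# Flip enabled=0 to enabled=1 only between the chosen section header
# and the next section header, leaving all other sections untouched.
sed -i '/^\[openvz-kernel-rhel6-testing\]/,/^\[/ s/^enabled=0/enabled=1/' "$repo_file"

# Show the result for review.
cat "$repo_file"
```

To apply the same edit for real, point repo_file at your actual openvz.repo (keep a backup copy first) and pick the section that matches your kernel branch.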
pavel_odintsov on August 13th, 2013 - 11:08 am
Good idea =) Killing the i386 arch is also a fine idea.
Andy Shevchenko: proudandy_shev on August 13th, 2013 - 04:43 pm
You may temporarily enable the repo by running yum as follows:
yum --enablerepo=openvz-kernel-rhel6-testing update
ext_1633627 on August 18th, 2013 - 10:02 pm
Do you have any suggestions on simulating normal VPS activity and (over)use of the resources on a -testing install? I can load it with the same various templates that we use on client systems, and look for any unexpected output while starting them and doing common administrative tasks, but I fear this was probably already done by you guys and won't serve as much of a test.

Beyond running it and waiting for it to break, is there any type of statistic generation that would be helpful in determining if and how well (in terms of stability and performance) the current -testing kernel is working vs the -stable branch? Are there common ways OpenVZ might fail on a new kernel that we can test against, and if so, where can I find this information?

Are there any existing testing tools that are used in stress testing an OpenVZ install to look for possible problems?

Thanks for your help and hope to aid in testing of the kernels,
Joe Huss
InterServer
Kir Kolyshkin: nepalk001 on August 21st, 2013 - 10:20 pm
We have an extensive test suite in house, consisting of performance, stress, and security tests; most are third-party but some are written by our QA guys. Sometimes we even find bugs in RHEL kernels (i.e. bugs not found by very serious testing inside Red Hat), so we think our test suite is pretty good and comprehensive. Surely, we don't run the full test suite for every kernel -- it is just too huge for that -- usually our developers give recommendations to the QA team as to which aspects need testing, or even which specific tests are required (or make sense, or would be good to have, etc.).

What it lacks though is diversity. No one can possibly test all the hardware combinations and usage scenarios. So what we want from our users is just to run this kernel as a normal one on a couple of their boxes. Maybe using it for internal purposes, or hosting some lower-tier or free customers, or in general having it on hosts where you can tolerate slightly higher downtime / lower stability. Chances are high you won't find any bugs at all, because these kernels are not some experimental high-risk stuff; they are intended to be stable. In fact, those are release candidates.

So, it's enough to just use these kernels as normal, and report any bugs/regressions found to bugzilla. Think of it as a way to face an occasional nasty bug earlier, before it pops up on those of your hosts you consider highly critical.