Why do we still use gzip for templates?

OpenVZ precreated OS templates are tarballs of pre-installed Linux distributions. While there are other ways to create a container, the easiest one is to take such a tarball and extract its contents. This extraction is what takes 99.9% of the vzctl create command's execution time.

To save some space and improve download speeds, those tarballs are compressed with the good ol' gzip tool. For example, the CentOS 6 template tar.gz is about 200 MB in size, while the uncompressed tar would be about 550 MB. But why don't we use more efficient compression tools, such as bzip2 or xz? Say, the same CentOS 6 tarball, compressed with xz, is as lightweight as 120 MB! Here are the exact numbers:

centos-6-x86.tar.gz: 203M
centos-6-x86.tar.xz: 122M
centos-6-x86.tar: 554M
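
As a quick sanity check of the numbers above, the compression ratios and the gzip-to-xz saving can be worked out in a few lines. This is only a sketch of the arithmetic; the sizes are copied from the listing, and the variable and function names are made up for illustration.

```python
# Sizes in MB, copied from the listing above.
sizes_mb = {"tar": 554, "tar.gz": 203, "tar.xz": 122}


def ratio(compressed: int, original: int) -> float:
    """Compressed size as a fraction of the uncompressed size."""
    return compressed / original


gz_ratio = ratio(sizes_mb["tar.gz"], sizes_mb["tar"])
xz_ratio = ratio(sizes_mb["tar.xz"], sizes_mb["tar"])
xz_saving_mb = sizes_mb["tar.gz"] - sizes_mb["tar.xz"]

print(f"gzip: {gz_ratio:.0%} of the original size")
print(f"xz:   {xz_ratio:.0%} of the original size")
print(f"xz saves {xz_saving_mb} MB over gzip")
```

So gzip brings the tarball down to roughly 37% of its original size and xz to roughly 22%, which is a difference of 81 MB per template.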

So why don't we switch to xz, which looks so much better? Well, there are criteria to optimize for besides file size and download speed. In fact, the main optimization target is container creation speed! To prove my point, I just ran a quick non-scientific test on my notebook, measuring the time it takes to run tar xf on each tarball:

time tar xf tar.gz: ~7 seconds
time tar xf tar.xz: ~13 seconds

See, it takes twice the time if we switch to xz! Note that this ratio didn't change much when I switched from a fast SSD to a (relatively slow) rotating hard disk drive:

time tar xf tar.gz: ~8 seconds
time tar xf tar.xz: ~16 seconds

Note that while I call this test non-scientific, I still ran each measurement at least three times, with proper syncs, rms and cache drops in between.
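
For anyone who wants to repeat a comparison like this, here is a minimal sketch of the same idea using only the Python standard library. It is not the procedure I used above: it builds a small synthetic tarball instead of a real OS template, it does not sync or drop the page cache, and the helper names (make_tarball, time_extract) are hypothetical. Treat the absolute numbers it prints as meaningless; it only illustrates how to time gzip versus xz extraction of the same content.

```python
import io
import os
import tarfile
import tempfile
import time


def make_tarball(path: str, mode: str, payload: bytes, copies: int = 10) -> None:
    """Write `copies` files with the same content into a tarball."""
    with tarfile.open(path, mode) as tar:
        for i in range(copies):
            info = tarfile.TarInfo(name=f"file{i}.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def time_extract(path: str, runs: int = 3) -> float:
    """Return the best wall-clock time of `runs` extractions, in seconds."""
    best = float("inf")
    for _ in range(runs):
        # A fresh destination per run plays the role of the rm between runs.
        with tempfile.TemporaryDirectory() as dest:
            start = time.perf_counter()
            with tarfile.open(path) as tar:
                tar.extractall(dest)
            best = min(best, time.perf_counter() - start)
    return best


# Synthetic, compressible payload (an assumption, not a real template).
payload = b"".join(b"line %d: some template content\n" % i for i in range(20000))

results = {}
with tempfile.TemporaryDirectory() as work:
    for suffix, mode in (("tar.gz", "w:gz"), ("tar.xz", "w:xz")):
        path = os.path.join(work, f"sample.{suffix}")
        make_tarball(path, mode, payload)
        results[suffix] = (os.path.getsize(path), time_extract(path))

for suffix, (size, seconds) in results.items():
    print(f"{suffix}: {size} bytes, best extract time {seconds:.3f} s")
```

To get numbers that mean anything, point time_extract at real template tarballs and flush the caches between runs, as described above.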

Now, do we want to double container creation time in order to save 80 MB of disk space? We sure don't!
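
Download speed is the other side of this trade-off, and it only matters the first time, when the template is not yet in the local cache. As a rough sketch, here is the bandwidth at which the smaller xz download exactly makes up for its slower extraction, using the CentOS 6 sizes and the SSD timings from above. It assumes download and extraction happen one after the other; the function names are made up for illustration.

```python
def total_seconds(size_mb: float, extract_s: float, bandwidth_mb_s: float) -> float:
    """Download time plus extraction time for a template not yet cached."""
    return size_mb / bandwidth_mb_s + extract_s


def break_even_bandwidth(gz_mb: float, xz_mb: float,
                         gz_extract_s: float, xz_extract_s: float) -> float:
    """Bandwidth (MB/s) at which both formats take equally long in total."""
    return (gz_mb - xz_mb) / (xz_extract_s - gz_extract_s)


# Sizes (MB) and SSD extraction times (s) taken from the post.
be = break_even_bandwidth(203, 122, 7, 13)
print(f"break-even bandwidth: {be} MB/s")

# At the break-even bandwidth both totals match.
gz_total = total_seconds(203, 7, be)
xz_total = total_seconds(122, 13, be)
print(f"gzip: {gz_total:.1f} s, xz: {xz_total:.1f} s")
```

On a link slower than that, xz wins on the very first create; on a faster link, and on every create after the template is cached, gzip wins.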

Comments

( 10 comments — Leave a comment )
greycat_na_kor
Dec. 28th, 2013 03:52 pm (UTC)
Well, if you were really targeting maximum decompression speed, you'd probably use LZ4, Snappy, or at least the dated-but-still-faster-than-gzip LZO.

So the most accurate answer is probably "compatibility".
k001
Dec. 28th, 2013 07:30 pm (UTC)
In fact you are right -- in our commercial product (Parallels Cloud Server, which is bare metal, so we bundle everything) we use our home-grown compression tool. OpenVZ is not bare metal and we aim to cover many different host distributions, therefore we don't want unusual dependencies. Yet another reason is we try to avoid extra packages in our repositories, because once they are there we need to maintain those. Therefore gzip.

As I said, there are multiple optimization targets, and you are right, compatibility is just one of them.
Pavel Odintsov
Feb. 20th, 2014 06:03 pm (UTC)
Hello!

Kir, could you say anything about prlcompress? What does it do that other tools can't?
k001
Mar. 5th, 2014 01:57 am (UTC)
It's just very fast compression. Not very efficient in terms of compression ratio, but very fast. That's it.
ivanz85
Dec. 28th, 2013 10:43 pm (UTC)
I think that for precreated templates it could be XZ.
k001
Dec. 30th, 2013 06:52 am (UTC)
This is exactly what I am talking about -- precreated templates.
dowdle
Dec. 29th, 2013 05:55 pm (UTC)
Reducing transfer time is the target for me... both in how long it takes me to upload the OS Templates I contribute... and in how long it takes me to download other OS Templates. Saving some disk space is a bonus.

Fedora switched to xz for their rpm packages some time ago although I'm not sure if RHEL6 is using it or not. I'm guessing RHEL7 will though... so compatibility isn't an issue there. I believe the vzctl package has xz as a dependency... so again, I don't think compatibility is much of an issue.

If you don't have the OS Template in question already downloaded when you are creating a new container, downloading an additional +80MB can slow it down too. Of course, once it is downloaded, that doesn't come into play anymore. I don't really care if it takes 30 seconds or a minute to create a container. If it becomes a concern, then I can always convert my .xz files to .gz on my own host.

When you are talking about much larger files in the multiple GB range, saving a few hundred megabytes on an .xz vs. a .gz seems like a no brainer to me.

As you probably know, kernel.org dropped .bz2 in favor of .xz but they still provide .gz. Ideally you could offer both and let the consumer decide... and then see how that plays out. I'd guess that people will generally pick the smaller download size if given a choice... which may reinforce the idea that humans are dumb and prefer short-term beneficial solutions over long-term ones.
k001
Dec. 30th, 2013 06:55 am (UTC)
Yep, I am thinking about providing both formats (as you know, vzctl can do both and vztmpl-dl is configurable). The only problem I see is a few extra GBs for mirrors (extra converting/signing etc shouldn't be a big deal though).
rleir
Dec. 30th, 2013 01:01 pm (UTC)
But gz is common, it works, and we have more important issues to talk about. See my next post.
dowdle
Jan. 1st, 2014 05:42 pm (UTC)
Just for reference here's the official Debian 7.0 64bit OS Template uncompressed, gz, and xz with sizes for comparison:

716M Dec 31 21:42 debian-7.0-x86_64.tar
290M Dec 31 21:42 debian-7.0-x86_64.tar.gz
193M Dec 31 21:42 debian-7.0-x86_64.tar.xz

97MB savings between the gz and the xz is pretty significant with regards to transfer time except of course for those with fairly fast network connections.

Unrelated to OpenVZ... on a system I use for data backups I had a directory that contained .tar.gz's of deleted account home directories... and as .gz files it was 23GB. I switched it to xz and saved 4GB. I use xz for most everything now.