
cuneiform team mailing list archive

Choosing another compression method for distribution

 

Hi,
I looked at your source package for distribution and noticed that it is not 
efficient in terms of file size and decompression time, so I would suggest 
changing the compression method. A good compressor has to be free, available 
in distributions, able to compress a file in finite time, well tested, and 
able to decompress losslessly. A really good one should also decompress 
relatively fast and have a good compression ratio.
I took the tarball from your download site and tested the following tools:
 - xz-utils 4.999.8beta-38-g94eb9ad
 - lzma-utils (predecessor of xz-utils) 4.999.8beta-31-gfd6a380
 - gzip 1.3.12
 - bzip2 1.0.5
 - lzop 1.02~rc1
 - rzip 2.1
 - lrzip 0.23
 - paq8l 20070308

Compression was always done with -9/--best. I also tested xz-utils in 
extreme mode (I added the suffix .extreme to the filename to tell the results 
apart). paq8l only supports -8 as its best level, and it is so slow that it is 
probably not a good choice at all (6 hours to compress on my machine).
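The compression commands themselves are not shown here. As a rough sketch, assuming current gzip, bzip2, xz-utils and lzop option syntax (-9, -c, -d, -e), the tools that can write to stdout could be driven like this. The function name compress_all is hypothetical; it also checks the "decompress lossless" requirement by comparing every round trip byte for byte with the original tarball. paq8l, rzip and lrzip are left out because their option syntax differs and is not assumed here.

```shell
#!/bin/sh
# Hypothetical helper: compress a tarball at -9 with each stdout-capable
# tool that is installed, then verify a lossless round trip with cmp.
# Assumes current gzip/bzip2/xz/lzop option syntax (-9, -c, -d, -e).

compress_all() {
    tarball=$1
    for tool in gzip bzip2 xz lzma lzop; do
        # Skip tools that are not installed instead of failing.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "skipping $tool (not installed)" >&2
            continue
        fi
        case $tool in
            gzip)  ext=gz ;;
            bzip2) ext=bz2 ;;
            xz)    ext=xz ;;
            lzma)  ext=lzma ;;
            lzop)  ext=lzo ;;
        esac
        "$tool" -9 -c "$tarball" > "$tarball.$ext" || return 1
        # Decompress to stdout and compare byte for byte with the input.
        if ! "$tool" -d -c "$tarball.$ext" | cmp -s - "$tarball"; then
            echo "$tool: round trip FAILED" >&2
            return 1
        fi
        echo "$tool: $(wc -c < "$tarball.$ext") bytes"
    done
    # xz extreme mode (-e): slower compression for a slightly smaller file.
    if command -v xz >/dev/null 2>&1; then
        xz -9 -e -c "$tarball" > "$tarball.xz.extreme" || return 1
    fi
}
```

For example, `compress_all cuneiform-0.6.tar` would write .gz, .bz2, .xz, .lzma, .lzo and .xz.extreme files next to the tarball, for whichever of these tools are installed.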

The machine I use is an AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ with 
4 GB RAM, so it is a machine you can currently buy without spending much 
money. It runs Debian sid amd64.

Sorted by file size (in bytes):
18031296 cuneiform-0.6.tar.paq8l
24244695 cuneiform-0.6.tar.lzma.extreme
24248508 cuneiform-0.6.tar.xz.extreme
24361132 cuneiform-0.6.tar.lzma
24364972 cuneiform-0.6.tar.xz
24894939 cuneiform-0.6.tar.lrz
27752104 cuneiform-0.6.tar.rz
28730704 cuneiform-0.6.tar.bz2
32054632 cuneiform-0.6.tar.gz
35871568 cuneiform-0.6.tar.lzo
79964160 cuneiform-0.6.tar
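To make these sizes easier to compare, each one can be expressed as a percentage of the uncompressed tarball (79964160 bytes). The shell function below is a minimal sketch; its name, ratios, is hypothetical, and it assumes input lines in the same "<bytes> <filename>" form as the list above, with the uncompressed size passed as its argument.

```shell
#!/bin/sh
# Hypothetical helper: read "<bytes> <filename>" lines on stdin and print
# each size as a percentage of the uncompressed size given as $1.
ratios() {
    awk -v base="$1" '
        NF == 2 { printf "%-32s %6.2f%%\n", $2, 100 * $1 / base }
    '
}
```

Feeding it the xz and bzip2 lines with `ratios 79964160` gives about 30.47% and 35.93%, so the xz archive is roughly 15% smaller than the bzip2 one.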

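The decompression tests were done with the following script. It reads each 
archive once beforehand, so that it is already cached in memory, and sends the 
decompressed output to /dev/null. Apart from paq8l, lrzip and rzip, which write 
the decompressed tarball to a file, nothing is read from or written to disk 
during the timed runs, so the test is CPU bound. I ran the tests multiple times 
to make sure the measurements are representative.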

#!/bin/sh
# Read each archive once so the timed run gets it from the page cache.
echo cuneiform-0.6.tar.paq8l
cat cuneiform-0.6.tar.paq8l > /dev/null
# paq8l, lrzip and rzip write cuneiform-0.6.tar to the current directory.
/usr/bin/time paq8l -d cuneiform-0.6.tar.paq8l
echo cuneiform-0.6.tar.lzma.extreme
cat cuneiform-0.6.tar.lzma.extreme > /dev/null
/usr/bin/time lzma -d -c cuneiform-0.6.tar.lzma.extreme > /dev/null
echo cuneiform-0.6.tar.xz.extreme
cat cuneiform-0.6.tar.xz.extreme > /dev/null
/usr/bin/time xz -d -c cuneiform-0.6.tar.xz.extreme > /dev/null
echo cuneiform-0.6.tar.lzma
cat cuneiform-0.6.tar.lzma > /dev/null
/usr/bin/time lzma -d -c cuneiform-0.6.tar.lzma > /dev/null
echo cuneiform-0.6.tar.xz
cat cuneiform-0.6.tar.xz > /dev/null
/usr/bin/time xz -d -c cuneiform-0.6.tar.xz > /dev/null
echo cuneiform-0.6.tar.lrz
# Remove the tarball written by the previous step so it can be recreated.
rm -f cuneiform-0.6.tar
cat cuneiform-0.6.tar.lrz > /dev/null
/usr/bin/time lrzip -d cuneiform-0.6.tar.lrz
echo cuneiform-0.6.tar.rz
rm -f cuneiform-0.6.tar
cat cuneiform-0.6.tar.rz > /dev/null
/usr/bin/time rzip -d cuneiform-0.6.tar.rz
echo cuneiform-0.6.tar.bz2
cat cuneiform-0.6.tar.bz2 > /dev/null
/usr/bin/time bzip2 -d -c cuneiform-0.6.tar.bz2 > /dev/null
echo cuneiform-0.6.tar.gz
cat cuneiform-0.6.tar.gz > /dev/null
/usr/bin/time gzip -d -c cuneiform-0.6.tar.gz > /dev/null
echo cuneiform-0.6.tar.lzo
cat cuneiform-0.6.tar.lzo > /dev/null
/usr/bin/time lzop -d -c cuneiform-0.6.tar.lzo > /dev/null
echo cuneiform-0.6.tar
# Baseline: just reading the uncompressed tarball written by rzip.
cat cuneiform-0.6.tar > /dev/null
/usr/bin/time cat cuneiform-0.6.tar > /dev/null

Results (GNU time: user and system CPU time in seconds, elapsed wall-clock 
time as [hours:]minutes:seconds):
cuneiform-0.6.tar.paq8l
21026.30user 51.69system 6:15:08elapsed 93%CPU
cuneiform-0.6.tar.lzma.extreme
3.14user 0.08system 0:03.46elapsed 92%CPU
cuneiform-0.6.tar.xz.extreme
2.67user 0.07system 0:02.80elapsed 97%CPU
cuneiform-0.6.tar.lzma
2.56user 0.06system 0:02.69elapsed 97%CPU
cuneiform-0.6.tar.xz
2.64user 0.04system 0:02.79elapsed 96%CPU
cuneiform-0.6.tar.lrz
3.35user 4.24system 0:07.78elapsed 97%CPU
cuneiform-0.6.tar.rz
6.58user 4.16system 0:10.75elapsed 99%CPU
cuneiform-0.6.tar.bz2
7.31user 0.02system 0:07.45elapsed 98%CPU
cuneiform-0.6.tar.gz
0.94user 0.01system 0:00.95elapsed 100%CPU
cuneiform-0.6.tar.lzo
0.36user 0.02system 0:00.37elapsed 100%CPU
cuneiform-0.6.tar
0.00user 0.03system 0:00.03elapsed 100%CPU
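Because GNU time prints elapsed time as [hours:]minutes:seconds, the paq8l line is hard to compare with the others at a glance. A small helper, sketched here with sed and awk, converts such a field to plain seconds; the function name elapsed_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a GNU time "elapsed" field such as
# "6:15:08elapsed" or "0:02.79elapsed" into plain seconds.
elapsed_seconds() {
    printf '%s\n' "$1" | sed 's/elapsed$//' | awk -F: '{
        s = 0
        # Each colon-separated part is worth 60 times the part after it.
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}
```

With it, `elapsed_seconds 6:15:08elapsed` prints 22508.00 and `elapsed_seconds 0:02.79elapsed` prints 2.79, so paq8l needed roughly 8000 times as long as xz to decompress the same tarball.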

As we can see, paq8l has a really good compression ratio, but 6 hours to 
compress and decompress a file of this size isn't acceptable at all. lrzip and 
rzip are slower and have a lower compression ratio than xz and lzma. gzip and 
lzop are the fastest, but they also have the lowest compression ratios, so I 
don't think they are acceptable when we are looking for something better than 
bzip2. xz and lzma both decompress faster than bzip2 (about 2.7 to 2.8 s 
instead of 7.45 s) and produce files about 15% smaller.

I would suggest using lzma-utils. They are supported by most distributions, 
and rpm and dpkg also support LZMA inside their package formats. xz-utils is 
the next generation of these tools, but it is still under development. Its new 
features include checksums, proper magic bytes and versioning of the on-disk 
format, but almost no distribution packages it yet. autotools supports .lzma 
tarballs (automake's dist-lzma option) and tar can extract them with --lzma: 
`tar xvf cuneiform-0.6.tar.lzma --lzma`. tar also supports xz with -J 
(`tar xvfJ cuneiform-0.6.tar.xz`), but since almost nobody has xz installed, 
that will not work for most users.
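The two tar invocations above can be wrapped in a small helper that also copes with a tar lacking the built-in option, by piping through the standalone decompressor instead. This is a sketch assuming GNU tar option names (--lzma, -J); the function name extract is hypothetical, and the fallback still needs lzma or xz itself to be installed, which is exactly the problem with xz today.

```shell
#!/bin/sh
# Hypothetical helper: unpack a .tar.lzma or .tar.xz archive.
# Assumes GNU tar option names (--lzma, -J) for the first attempt.
extract() {
    archive=$1
    case $archive in
        *.tar.lzma) opt=--lzma; tool=lzma ;;
        *.tar.xz)   opt=-J;     tool=xz ;;
        *) echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
    # Let tar decompress by itself; if that fails (for example an older
    # tar without the option), decompress to stdout and pipe into tar.
    tar -x "$opt" -f "$archive" 2>/dev/null ||
        "$tool" -d -c "$archive" | tar -xf -
}
```

Calling `extract cuneiform-0.6.tar.lzma` would then work both with a tar that knows --lzma and with one that does not, as long as the lzma tool is present.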
-- 
Robert Wohlrab



