simple-scan-team team mailing list archive
Message #01388
Re: [Merge] lp:~soliloque/simple-scan/fastpdf into lp:simple-scan
Let me demonstrate that (gray colorspace -> zlib compression) / (rgb
colorspace -> jpeg compression) is a safe assumption to make.
As we know, zlib compression works best when there are large areas of a
single color. It is also a better choice than jpeg for storing images
that contain text, line art and other content with sharp transitions,
where jpeg would produce visible artefacts. OTOH, jpeg is ideal for big
images and photographs.
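To illustrate the theory, here is a minimal Python sketch using only the standard zlib module. The two buffers are synthetic (a hypothetical 256x256 gray image, not one of the scans tested below): one is perfectly flat, the other is filled with pseudo-random values standing in for scanner noise. It only shows how strongly zlib depends on the data being truly uniform.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256  # hypothetical image size, chosen for illustration

# Every pixel has the same gray value: the best case for zlib.
flat = bytes([200]) * (WIDTH * HEIGHT)

# Pseudo-random pixel values stand in for scanner noise: the worst case.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))

print("flat :", len(flat), "->", flat_size, "bytes")
print("noisy:", len(noisy), "->", noisy_size, "bytes")
```

A real scan sits somewhere between these two extremes, which is why the theory has to be checked against measurements.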
That is the theory. Things may be different in practice, so I ran a
number of tests.
I. RGB colorspace.
Let's begin with this (1) comic page, whose colors are --intended-- to be
flat. zlib should theoretically perform well in this context. At 300 dpi,
I get a zlib compression time of 1,4 seconds for a size of 18 189 934 bytes
and a jpeg compression time of 0,08 seconds for a size of 2 149 009 bytes.
Jpeg wins.
The second test (2) is composed of even flatter colors, being black and
white text, but again zlib doesn't compress well. At 300 dpi, I get a
zlib compression time of 1,4 seconds for a size of 14 521 574 bytes and
a jpeg compression time of 0,06 seconds for a size of 1 583 488 bytes.
The first two tests had light colors, so the third test (3) is a very
dark book cover with a huge black area. This time, at 300 dpi, I get a
zlib compression time of 0,3 seconds for a size of 4 898 072 bytes and a
jpeg compression time of 0,02 seconds for a size of 434 849 bytes.
The conclusion from these three tests is that, in the rgb color space,
jpeg compression is roughly an order of magnitude faster and smaller
than zlib compression and can't be beaten, so it's safe to save images
in the rgb color space as jpeg without losing time trying zlib.
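The "order of magnitude" claim can be checked directly from the figures reported above. The short Python sketch below only restates those measurements and divides them; the dictionary layout and the function name are mine, chosen for illustration.

```python
# (zlib seconds, zlib bytes, jpeg seconds, jpeg bytes), as reported above at 300 dpi
rgb_results = {
    "comic page": (1.4, 18_189_934, 0.08, 2_149_009),
    "black and white text": (1.4, 14_521_574, 0.06, 1_583_488),
    "dark book cover": (0.3, 4_898_072, 0.02, 434_849),
}


def ratios(zlib_s, zlib_bytes, jpeg_s, jpeg_bytes):
    """Return (time ratio, size ratio) of zlib relative to jpeg."""
    return zlib_s / jpeg_s, zlib_bytes / jpeg_bytes


for name, row in rgb_results.items():
    time_ratio, size_ratio = ratios(*row)
    print(f"{name}: zlib takes {time_ratio:.1f}x the time, {size_ratio:.1f}x the size")
```

In all three cases zlib takes well over ten times as long and produces output roughly eight to eleven times larger.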
II. Gray colorspace.
I reused the same three examples, but this time with the setting for
scanning text, thus scanning in grayscale at 150 dpi. I now get the
following results:
Page of comics: zlib compression time of 0,2 seconds for a size of
168 618 bytes and jpeg compression time of 0,02 seconds for a size of
557 410 bytes.
Black and white text: zlib compression time of 0,09 seconds for a size
of 94 282 bytes and jpeg compression time of 0,01 seconds for a size of
438 169 bytes.
Dark book cover: zlib compression time of 0,04 seconds for a size of
26 470 bytes and jpeg compression time of 0,004 seconds for a size of
84 103 bytes.
This time zlib wins every time on size, while still being slower than
jpeg. Moreover, as "text" scanning mode was selected to produce
grayscale images, the user specifically asked to optimize the output
for text, where zlib shines in theory. In the case of grayscale, it
would thus be better to use zlib encoding directly: while it takes
longer than jpeg encoding, it is faster than doing both encodings,
zlib will most probably be smaller than jpeg, and we want an output
optimized for text.
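The rule argued for so far fits in a few lines. This is a minimal Python sketch, not the code from the branch: the Colorspace enum and the function name are hypothetical, and the returned strings are the standard PDF filter names for zlib (FlateDecode) and jpeg (DCTDecode) streams.

```python
from enum import Enum


class Colorspace(Enum):
    """Hypothetical stand-in for the scan's color space."""
    GRAY = "gray"
    RGB = "rgb"


def choose_pdf_filter(colorspace):
    """Pick one encoder up front: gray -> zlib, rgb -> jpeg."""
    if colorspace is Colorspace.GRAY:
        return "FlateDecode"  # zlib: smaller in all three gray tests
    return "DCTDecode"        # jpeg: faster and smaller in all three rgb tests


print(choose_pdf_filter(Colorspace.GRAY))
print(choose_pdf_filter(Colorspace.RGB))
```

The point of deciding from the color space alone is that each page is encoded exactly once.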
Compressing with jpeg first, then doing the zlib compression and bailing
out once we exceed the size of the jpeg, would certainly be better than
what simple-scan does right now, but considering the previous arguments,
I think encoding only once is the best solution.
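For comparison, the bail-out alternative could look roughly like the Python sketch below, using the standard zlib.compressobj streaming interface. The function name, the chunk size and the idea of passing the jpeg size as `budget` are assumptions made for illustration; this is not what simple-scan does.

```python
import zlib


def zlib_unless_larger(data, budget, chunk_size=64 * 1024):
    """Deflate `data`, giving up once the output exceeds `budget` bytes.

    `budget` would be the size of the already-encoded jpeg.
    Returns the compressed bytes, or None if zlib cannot beat the budget.
    """
    compressor = zlib.compressobj()
    parts = []
    total = 0
    for start in range(0, len(data), chunk_size):
        piece = compressor.compress(data[start:start + chunk_size])
        parts.append(piece)
        total += len(piece)
        if total > budget:
            return None  # bail out early: already bigger than the jpeg
    tail = compressor.flush()
    if total + len(tail) > budget:
        return None
    parts.append(tail)
    return b"".join(parts)


page = bytes(100_000)  # synthetic all-zero page, not a real scan
print(zlib_unless_larger(page, budget=1000) is not None)
print(zlib_unless_larger(page, budget=10) is None)
```

Even with the early exit, this still pays for a full jpeg encode plus part of a zlib encode on every page, which is the cost that encoding only once avoids.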
Further improvements
-----------------------
There are alternative ways for encoding in zlib or jpeg format. I don't
know if those are worth anything but maybe it could be a good idea to
investigate: libdeflate(4) claims to be compatible with zlib while
being faster and libjpeg-turbo(5) claims to be compatible with libjpeg
while being faster.
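Investigating these would mean timing each encoder on the same input. A minimal Python harness for that could look like the sketch below. It only exercises the standard zlib module at different levels on a synthetic buffer; no libdeflate or libjpeg-turbo binding is assumed, but any callable taking bytes and returning bytes could be added to the `candidates` dictionary (a name of my own).

```python
import time
import zlib


def time_encoder(encode, data, repeats=3):
    """Return (best wall-clock seconds, output size) for encode(data)."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        out = encode(data)
        best = min(best, time.perf_counter() - start)
        size = len(out)
    return best, size


sample = bytes(range(256)) * 4096  # synthetic 1 MiB buffer, not a real scan

candidates = {
    "zlib level 1": lambda d: zlib.compress(d, 1),
    "zlib level 6": lambda d: zlib.compress(d, 6),
    "zlib level 9": lambda d: zlib.compress(d, 9),
}

for name, encode in candidates.items():
    seconds, size = time_encoder(encode, sample)
    print(f"{name}: {seconds:.4f} s, {size} bytes")
```

Taking the best of several runs reduces the influence of other activity on the machine; real scans should of course replace the synthetic buffer before drawing any conclusion.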
(1) http://imgur.com/1dH1hRd
(2) http://imgur.com/18YChxe
(3) http://imgur.com/rNDD2DF
(4) https://github.com/ebiggers/libdeflate
(5) http://libjpeg-turbo.virtualgl.org/
--
https://code.launchpad.net/~soliloque/simple-scan/fastpdf/+merge/322610