
simple-scan-team team mailing list archive

Re: [Merge] lp:~soliloque/simple-scan/fastpdf into lp:simple-scan

 

Let me demonstrate that (gray colorspace -> zlib compression) / (rgb 
colorspace -> jpeg compression) is a safe assumption to make. 

As we know, zlib compression works best when there are large areas of a 
single color. It is also a better choice than jpeg for storing images 
that contain text, line art and other content with sharp transitions, 
where jpeg would produce visible artefacts. OTOH, jpeg is ideal for big 
images and photographs.
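
The zlib half of that claim is easy to check on synthetic data. The 
following is only a rough Python sketch, not simple-scan code: the page 
size and the two buffers are made up for the illustration, and random 
noise stands in for the worst case zlib can meet.

```python
import os
import zlib

WIDTH, HEIGHT = 512, 512  # hypothetical page size, one byte per pixel

# A page made of a single flat color versus a page of random noise.
flat = bytes([255]) * (WIDTH * HEIGHT)
noisy = os.urandom(WIDTH * HEIGHT)

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))

print(f"flat:  {len(flat)} -> {flat_size} bytes")
print(f"noisy: {len(noisy)} -> {noisy_size} bytes")
```

The flat buffer shrinks to a tiny fraction of its size while the noisy 
one barely shrinks at all. Real scans sit somewhere in between, which is 
why measurements are needed.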

This is the theory. Things may be different in practice, so I ran a 
number of tests.

I. RGB colorspace.

Let's begin with this (1) comic page with colors --intended-- to be flat. 
zlib should theoretically perform well in this context. At 300 dpi, I get 
a zlib compression time of 1.4 seconds for a size of 18 189 934 bytes and a 
jpeg compression time of 0.08 seconds for a size of 2 149 009 bytes. 
Jpeg wins.

The second test (2) is composed of even flatter colors, being black and 
white text, but again zlib doesn't compress well. At 300 dpi, I get a 
zlib compression time of 1.4 seconds for a size of 14 521 574 bytes and 
a jpeg compression time of 0.06 seconds for a size of 1 583 488 bytes.

The first two tests had light colors, so the third test (3) is a very 
dark book cover with a huge black area. This time, at 300 dpi, I get a 
zlib compression time of 0.3 seconds for a size of 4 898 072 bytes and a 
jpeg compression time of 0.02 seconds for a size of 434 849 bytes.

The conclusion from these three tests is that in the case of the rgb 
color space, jpeg compression is roughly an order of magnitude faster 
and smaller than zlib compression, so it's safe to save images with rgb 
color space in jpeg without losing time trying zlib.
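
To make the "order of magnitude" concrete, here is a small Python 
sketch that just divides the figures reported above. The numbers are 
copied from the three tests; the dictionary layout and the `ratios` 
helper are only for illustration.

```python
# (zlib_seconds, zlib_bytes, jpeg_seconds, jpeg_bytes), copied from the tests above
rgb_results = {
    "comic page": (1.4, 18_189_934, 0.08, 2_149_009),
    "black and white text": (1.4, 14_521_574, 0.06, 1_583_488),
    "dark book cover": (0.3, 4_898_072, 0.02, 434_849),
}


def ratios(zlib_s, zlib_b, jpeg_s, jpeg_b):
    """Return (time ratio, size ratio) of zlib relative to jpeg."""
    return zlib_s / jpeg_s, zlib_b / jpeg_b


for name, row in rgb_results.items():
    t, s = ratios(*row)
    print(f"{name}: zlib is {t:.1f}x slower and {s:.1f}x larger than jpeg")
```

With these figures zlib comes out roughly 8 to 11 times larger and 15 to 
23 times slower than jpeg.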

II. Gray colorspace.

I reused the same three examples, but this time with the setting for 
scanning text, thus scanning in grayscale at 150 dpi. I now get the 
following results:

Page of comics: zlib compression time of 0.2 seconds for a size of 
168 618 bytes and jpeg compression time of 0.02 seconds for a size of 
557 410 bytes.

Black and white text: zlib compression time of 0.09 seconds for a size 
of 94 282 bytes and jpeg compression time of 0.01 seconds for a size of 
438 169 bytes.

Dark book cover: zlib compression time of 0.04 seconds for a size of 
26 470 bytes and jpeg compression time of 0.004 seconds for a size of 
84 103 bytes.

This time zlib wins every time on size, while still being slower than 
jpeg. Moreover, as the "text" scanning mode was selected to produce 
grayscale images, the user specifically asked to optimize the output 
for text, where zlib theoretically shines. In the case of grayscale, it 
would thus be better to use zlib encoding directly: while slower than 
jpeg encoding, it's faster than doing both encodings, its output will 
most probably be smaller than the jpeg one, and we want an output 
optimized for text.
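
The same arithmetic can be done for the grayscale results. This Python 
sketch only reuses the measurements listed above; the `summarize` 
helper is made up for the illustration, and "both encoders" is simply 
the sum of the two measured times.

```python
# (zlib_seconds, zlib_bytes, jpeg_seconds, jpeg_bytes), copied from the results above
gray_results = {
    "page of comics": (0.2, 168_618, 0.02, 557_410),
    "black and white text": (0.09, 94_282, 0.01, 438_169),
    "dark book cover": (0.04, 26_470, 0.004, 84_103),
}


def summarize(zlib_s, zlib_b, jpeg_s, jpeg_b):
    """Return (jpeg/zlib size ratio, seconds for zlib alone, seconds for both)."""
    return jpeg_b / zlib_b, zlib_s, zlib_s + jpeg_s


for name, row in gray_results.items():
    size_ratio, zlib_only, both = summarize(*row)
    print(f"{name}: jpeg is {size_ratio:.1f}x larger; "
          f"zlib alone {zlib_only:.3f} s, both encoders {both:.3f} s")
```

On these three samples the jpeg output is roughly 3 to 5 times larger 
than the zlib one, and running only zlib saves the jpeg pass entirely.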

Compressing with jpeg first, then doing the zlib compression and bailing 
out once we exceed the size of the jpeg, would certainly be better than 
what simple-scan does right now, but considering the previous arguments, 
I think encoding only once is the best solution.
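
For reference, the bail-out variant could look something like the 
Python sketch below. It is not how simple-scan is written (that code is 
in Vala); the function name, the `limit` parameter and the chunk size 
are all hypothetical, and it only relies on the standard `zlib` module.

```python
import os
import zlib


def zlib_unless_larger(raw, limit, chunk_size=64 * 1024):
    """Deflate `raw` in chunks; return None once the output exceeds `limit` bytes."""
    compressor = zlib.compressobj()
    parts = []
    total = 0
    for start in range(0, len(raw), chunk_size):
        piece = compressor.compress(raw[start:start + chunk_size])
        parts.append(piece)
        total += len(piece)
        if total > limit:
            return None  # already bigger than the jpeg: stop early
    tail = compressor.flush()
    if total + len(tail) > limit:
        return None
    parts.append(tail)
    return b"".join(parts)


# `limit` would be the size of the jpeg produced beforehand.
flat = bytes(1_000_000)          # highly compressible stand-in for a page
noise = os.urandom(1_000_000)    # incompressible stand-in
print(zlib_unless_larger(flat, limit=10_000) is not None)
print(zlib_unless_larger(noise, limit=10_000) is None)
```

Even with the early exit, the jpeg pass still has to run first, which is 
the cost that encoding only once avoids.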

Further improvements
-----------------------

There are alternative ways for encoding in zlib or jpeg format. I don't 
know if those are worth anything but maybe it could be a good idea to 
investigate: libdeflate(4) claims to be compatible with zlib while 
being faster and libjpeg-turbo(5) claims to be compatible with libjpeg 
while being faster.



(1) http://imgur.com/1dH1hRd
(2) http://imgur.com/18YChxe
(3) http://imgur.com/rNDD2DF
(4) https://github.com/ebiggers/libdeflate
(5) http://libjpeg-turbo.virtualgl.org/
-- 
https://code.launchpad.net/~soliloque/simple-scan/fastpdf/+merge/322610
Your team Simple Scan Development Team is subscribed to branch lp:simple-scan.

