cuneiform team mailing list archive

Re: new user with questions

To: aerospace1028@xxxxxxxxxxx
From: Ben Jackson <ben@xxxxxxx>
Date: Mon, 21 Dec 2009 11:27:29 -0800
Cc: cuneiform@xxxxxxxxxxxxxxxxxxx
In-reply-to: <BAY113-W9493EE367DB20D4A4ADBAB9820@phx.gbl>
User-agent: Mutt/1.5.18 (2008-05-17)

On Mon, Dec 21, 2009 at 02:00:30PM -0500, aerospace1028@xxxxxxxxxxx wrote:
> 
> (a) Is this the version of cuneiform I should be using?  Is there a way to use git/svn/... to automatically pull the cuneiform source code as it's updated without downloading the whole tarball each time?

There's a bzr repo on launchpad.net, under the project called
cuneiform-linux.  Bazaar has built-in support for Launchpad (which is
the GitHub of bzr).
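
For the "automatically pull" part, a rough sketch along these lines might
do.  It assumes bzr is installed and that Launchpad's "lp:" shorthand
resolves the project name to its main branch; the helper names and the
checkout directory are made up for illustration.

```python
import os
import subprocess

# Assumption: the project name doubles as the Launchpad branch shorthand.
BRANCH = "lp:cuneiform-linux"

def bzr_step(checkout_dir):
    """Return (argv, cwd): 'bzr branch' the first time, 'bzr pull' afterwards."""
    if os.path.isdir(os.path.join(checkout_dir, ".bzr")):
        # Existing checkout: pull new revisions from the remembered parent.
        return ["bzr", "pull"], checkout_dir
    # No checkout yet: create one from Launchpad.
    return ["bzr", "branch", BRANCH, checkout_dir], None

def update(checkout_dir="cuneiform-linux"):
    """Fetch the source if missing, otherwise bring it up to date."""
    argv, cwd = bzr_step(checkout_dir)
    subprocess.check_call(argv, cwd=cwd)
```

Calling update() repeatedly only transfers new revisions, so there is no
need to download a whole tarball each time.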

> Currently, when scanning a book with the two facing pages, cuneiform puts the two page headers at the top followed by the contents of both pages one after the other.  E.g.

You could cut the image yourself (with netpbm or ImageMagick).  I believe
cuneiform can do what you want if you enable 'tables' (I don't know if
the '--tables' argument is in the mainline).

You could also output hOCR format, which tags every letter with its
coordinates in the original image, and split the text up that way, but
that sounds way harder than splitting the input image.
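
Splitting the input could look roughly like the sketch below.  It assumes
ImageMagick's "convert" is on your PATH and that the gutter sits in the
middle of the scan; the function names and file names are only for
illustration.

```python
import subprocess

def split_commands(image, left_out, right_out):
    """Return two ImageMagick argv lists: left half first, then right half."""
    def half(gravity, out):
        # Crop a region half as wide as the image, anchored at one edge,
        # then drop the leftover page offset with +repage.
        return ["convert", image, "-gravity", gravity,
                "-crop", "50%x100%+0+0", "+repage", out]
    return [half("West", left_out), half("East", right_out)]

def split_spread(image, left_out, right_out):
    """Write the left and right pages of a two-page scan to separate files."""
    for argv in split_commands(image, left_out, right_out):
        subprocess.check_call(argv)
```

Each half can then be fed to cuneiform as its own page, which keeps each
page header with its own text.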

> Finally, can I append the recognitions of multiple scans to the same file?  I tried "cuneiform -f rtf -o test.rtf *.tiff" on a handful of consecutively numbered image files, but the results continually overwrite the previous data and I am left with the results from the last file recognized.

There might be an rtf tool that can help you.  If you choose a format
like hOCR or text you could just concatenate the output files yourself.
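
A minimal sketch of the one-file-per-scan approach, reusing the -f and -o
flags from your command.  The format name "text" is my assumption for the
plain text output, and the function names and ".txt" suffix are made up.

```python
import subprocess

def ocr_command(image, out_path, fmt="text"):
    """Build the cuneiform argv for a single image."""
    return ["cuneiform", "-f", fmt, "-o", out_path, image]

def concatenate(parts, combined):
    """Append the files in 'parts', in order, into 'combined'."""
    with open(combined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                out.write(src.read())

def ocr_book(images, combined):
    """Recognize each image into its own file, then join the results."""
    parts = []
    # Plain lexical sort: zero-pad the page numbers in the file names,
    # otherwise page10 sorts before page2.
    for image in sorted(images):
        part = image + ".txt"
        subprocess.check_call(ocr_command(image, part))
        parts.append(part)
    concatenate(parts, combined)
```

Running cuneiform once per image avoids the overwriting, since every scan
gets its own output file before anything is joined.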

-- 
Ben Jackson AD7GD
<ben@xxxxxxx>
http://www.ben.com/

Follow ups

Re: new user with questions
From: Yury V. Zaytsev, 2009-12-22

References

new user with questions
From: aerospace1028, 2009-12-21