
cuneiform team mailing list archive

Re: new user with questions

 

On Mon, Dec 21, 2009 at 02:00:30PM -0500, aerospace1028@xxxxxxxxxxx wrote:
> 
> (a) is this the version of cuneiform I should be using?  Is there a way to use git/svn/... to automatically pull the cuneiform source code as it's updated without downloading the whole tarball each time?

There's a bzr repo on launchpad.net, under the project name cuneiform-linux.
Bazaar has built-in support for Launchpad (which is the GitHub of bzr), so you
can branch it once and afterwards pull only the new revisions.
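For example, a minimal Python sketch of that branch-once, pull-afterwards routine might look like this. It assumes `bzr` is on your PATH and that the `lp:cuneiform-linux` shorthand still resolves to the project's main branch; the helper names `update_command` and `update_checkout` are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Assumption: bzr expands this Launchpad shorthand to the project's
# development branch (project name taken from the Launchpad page).
BRANCH_URL = "lp:cuneiform-linux"


def update_command(checkout: Path) -> tuple[list[str], Path]:
    """Return (command, working directory) for fetching the latest source.

    The first run makes a full local branch; later runs only pull the
    revisions committed since the previous update.
    """
    if (checkout / ".bzr").is_dir():
        # Existing branch: bzr remembers the location it was branched from,
        # so a bare "pull" fetches just the missing revisions.
        return ["bzr", "pull"], checkout
    # No branch yet: create it inside the parent directory.
    return ["bzr", "branch", BRANCH_URL, checkout.name], checkout.parent


def update_checkout(checkout: Path) -> None:
    """Run whichever bzr command applies, raising if it fails."""
    cmd, cwd = update_command(checkout)
    subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `update_checkout(Path("cuneiform-linux").resolve())` from a cron job or before each build would keep a local tree current without re-downloading the tarball.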

> Currently, when scanning a book with two facing pages, cuneiform puts both page headers at the top, followed by the contents of both pages one after the other.  E.g.

You could cut the image in half yourself (with netpbm or ImageMagick).  I believe
cuneiform can do what you want if you enable table recognition, though I don't
know whether the '--tables' argument has made it into mainline.
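If you take the ImageMagick route, something like the Python sketch below could split each two-page scan down the middle before the halves go to cuneiform. It assumes ImageMagick's `convert` is installed and that the gutter sits near the horizontal centre of every scan: `-crop 2x1@` cuts the image into two equal-width tiles (left, then right), `+repage` drops the leftover canvas offsets, and `+adjoin` keeps the tiles in separate files. The helper names and the `_0`/`_1` naming are hypothetical choices, not something cuneiform expects.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def split_command(scan: Path) -> tuple[list[str], list[Path]]:
    """Build an ImageMagick command that cuts a two-page scan in half.

    Assumes the page boundary is at (or very near) the middle of the image.
    Returns the command plus the two files it writes:
    <stem>_0 is the left page and <stem>_1 the right page.
    """
    pattern = scan.with_name(f"{scan.stem}_%d{scan.suffix}")
    halves = [scan.with_name(f"{scan.stem}_{i}{scan.suffix}") for i in (0, 1)]
    cmd = ["convert", str(scan), "-crop", "2x1@", "+repage", "+adjoin", str(pattern)]
    return cmd, halves


def split_scan(scan: Path) -> list[Path]:
    """Run the split and return the paths of the left and right pages."""
    cmd, halves = split_command(scan)
    subprocess.run(cmd, check=True)
    return halves
```

Each half can then be recognised on its own with the same `-f`/`-o` options you are already passing to cuneiform.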

You could also output HOCR, which tags every letter with its coordinates in
the original image, and split the text up that way, but that sounds much harder
than splitting the input image.
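For the HOCR route, a rough Python sketch using only the standard library's `html.parser` could sort recognised lines onto the left or right page by where their centre falls. It assumes the output marks lines with `class="ocr_line"` and a `title="bbox x0 y0 x1 y1"` attribute, as the hOCR format describes, and that you supply the scan's pixel width yourself; `LineCollector` and `split_hocr_lines` are hypothetical names.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Assumption: line elements carry "bbox x0 y0 x1 y1" in their title attribute.
BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}


class LineCollector(HTMLParser):
    """Collect the bbox and text of every element whose class is ocr_line."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[tuple[tuple[int, ...], str]] = []
        self._current: tuple[tuple[int, ...], list[str]] | None = None
        self._depth = 0  # tags still open inside the line being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a matching end tag
        if self._current is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocr_line" in (attrs.get("class") or "").split():
            match = BBOX.search(attrs.get("title") or "")
            if match:
                self._current = (tuple(int(v) for v in match.groups()), [])
                self._depth = 1

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            bbox, parts = self._current
            self.lines.append((bbox, "".join(parts).strip()))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)


def split_hocr_lines(hocr: str, page_width: int) -> tuple[list[str], list[str]]:
    """Assign each recognised line to the left or right page by its centre x."""
    parser = LineCollector()
    parser.feed(hocr)
    parser.close()
    middle = page_width / 2
    left: list[str] = []
    right: list[str] = []
    # Sort top-to-bottom, then left-to-right, so each page reads in order.
    for (x0, _y0, x1, _y1), text in sorted(parser.lines, key=lambda l: (l[0][1], l[0][0])):
        (left if (x0 + x1) / 2 < middle else right).append(text)
    return left, right
```

Lines that straddle the gutter (a heading spanning both pages, say) land on whichever side holds their centre, which is one reason splitting the image first is the simpler option.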

> Finally, can I append the recognitions of multiple scans to the same file?  I tried "cuneiform -f rtf -o test.rtf *.tiff" on a handful of consecutively numbered image files, but each result overwrites the previous one and I am left with the output from the last file recognized.

There might be an RTF tool that can merge the files for you.  If you choose a
format like HOCR or plain text, you could write each page to its own file and
just concatenate the output files yourself.
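As a sketch of the plain-text variant in Python: run cuneiform once per image with its own `-o` file, then join the results in page order. The `-f`/`-o` options are the ones from your own command line; the format name `text`, UTF-8 output, and zero-padded file numbering (so that sorting gives page order) are assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ocr_command(image: Path, fmt: str = "text") -> tuple[list[str], Path]:
    """Build a cuneiform command that writes one output file per image.

    Assumption: "text" is the plain-text format name in your build.
    """
    out = image.with_suffix(".txt")
    return ["cuneiform", "-f", fmt, "-o", str(out), str(image)], out


def concatenate(parts: list[Path], combined: Path) -> None:
    """Append each per-page output to one file, in the order given.

    Assumes the per-page files are UTF-8 text.
    """
    with combined.open("w", encoding="utf-8") as dst:
        for part in parts:
            dst.write(part.read_text(encoding="utf-8"))
            dst.write("\n")  # keep pages from running into each other


def ocr_book(images: list[Path], combined: Path) -> None:
    """Recognise every image separately, then merge the results."""
    outputs = []
    # Zero-padded, consecutively numbered scans sort into page order.
    for image in sorted(images):
        cmd, out = ocr_command(image)
        subprocess.run(cmd, check=True)
        outputs.append(out)
    concatenate(outputs, combined)
```

Something like `ocr_book(list(Path(".").glob("*.tiff")), Path("book.txt"))` would then leave every page in `book.txt` instead of only the last one.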

-- 
Ben Jackson AD7GD
<ben@xxxxxxx>
http://www.ben.com/


