cuneiform team mailing list archive

new user with questions

To: <cuneiform@xxxxxxxxxxxxxxxxxxx>
From: <aerospace1028@xxxxxxxxxxx>
Date: Mon, 21 Dec 2009 14:00:30 -0500
Importance: Normal

Greetings,
I am interested in using cuneiform for OCR on my Linux computer. I am running Ubuntu 8.04 (Hardy Heron). I have managed to download and compile cuneiform 0.8.0 (obtained from the cuneiform Launchpad page).

(a) Is this the version of cuneiform I should be using? Is there a way to use git/svn/... to automatically pull the cuneiform source code as it's updated, without downloading the whole tarball each time?

I've been experimenting with the cuneiform I was able to build. I would like to scan books for the Bookshare project during my spare time. At the moment, I am having trouble with the page breaks when scanning a book with two facing pages in one image.

Currently, when I scan two facing pages, cuneiform puts the two page headers at the top, followed by the contents of both pages one after the other. For example:

line 1: page number and author
line 2: title and right-hand page number
line 3: typically blank
line 4-end: main text body.

As a result, I cannot separate the two original pages, which is a crucial requirement for submitting books to Bookshare. Is it possible to get cuneiform to output all the left-hand text followed by all the right-hand text? For example:

page number Author

title page number

Finally, can I append the recognition results of multiple scans to the same file? I tried "cuneiform -f rtf -o test.rtf *.tiff" on a handful of consecutively numbered image files, but each result overwrites the previous one and I am left with only the output from the last file recognized.
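
One possible workaround, sketched here under assumptions rather than as a confirmed cuneiform feature, is to run the recognizer once per image and stitch the per-page results together afterwards. The sketch below is Python; the function name `recognize_all` is hypothetical, the `-f` and `-o` options are the ones from the command above, and the `text` output format is an assumption that should be checked against the formats your build supports (plain text is used because RTF files cannot simply be concatenated).

```python
import subprocess
import tempfile
from pathlib import Path


def recognize_all(images, combined_path, ocr_cmd=("cuneiform", "-f", "text")):
    """Run the OCR command once per image and append every result to one file.

    ocr_cmd is the command prefix; "-o <file> <image>" is added for each page.
    The "text" format is an assumption; check which formats your build offers.
    """
    pages = sorted(images, key=str)  # lexical order, so zero-pad page numbers
    with open(combined_path, "w", encoding="utf-8") as combined:
        for image in pages:
            with tempfile.TemporaryDirectory() as tmp:
                page_out = Path(tmp) / "page.txt"
                # One invocation per image, each writing to its own temp file,
                # so no run can overwrite the output of another.
                subprocess.run(
                    [*ocr_cmd, "-o", str(page_out), str(image)], check=True
                )
                combined.write(
                    page_out.read_text(encoding="utf-8", errors="replace")
                )
            combined.write("\n\f\n")  # form feed marks the page boundary
    return len(pages)
```

Calling `recognize_all(Path(".").glob("*.tiff"), "book.txt")` would then leave the text of every scan in `book.txt`, in filename order, with a form feed between pages. Whether a form feed is an acceptable page separator for a given submission format is not something this sketch settles.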

Thanks in advance for any help you may provide :-)

Follow ups

Re: new user with questions
From: Taxman, 2010-01-09
Re: new user with questions
From: Ben Jackson, 2009-12-21