cuneiform team mailing list archive

Re: Hocr output status and identified improvements.

 

I created a branch; I hope I did it right:
lp:~julien-student/cuneiform-linux/hocroutput

I tested the branch and compared its output with the v0.8 release.
Visually, the bounding boxes are correct and no text is missing. The typography output (bold, italics, etc.) is exactly the same. I tried it on 4 different images.
I would say it could be merged into the main branch.
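
The comparison above was visual. As a rough sketch of how the same check might be automated, the Python below pulls the bounding box and text of every line element out of an hOCR file so two outputs can be compared. It assumes lines are marked with class "ocr_line" and carry a title of the form "bbox x0 y0 x1 y1", as the hOCR format describes; the names LineCollector and extract_lines are made up for this illustration and are not part of Cuneiform.

```python
import re
from html.parser import HTMLParser

# Assumption: line boxes appear as title="bbox x0 y0 x1 y1" on ocr_line elements.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
# Void tags never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class LineCollector(HTMLParser):
    """Collect (bbox, text) for every element whose class includes ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0      # nesting depth inside the current ocr_line
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        if self._depth:
            # Nested markup such as <b> or <i> inside a line.
            self._depth += 1
        elif "ocr_line" in (attributes.get("class") or "").split():
            match = BBOX_RE.search(attributes.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None
            self._text = []
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Line element closed: store its box and whitespace-normalized text.
            text = " ".join("".join(self._text).split())
            self.lines.append((self._bbox, text))

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)


def extract_lines(hocr):
    """Return a list of (bbox, text) tuples, one per ocr_line element."""
    collector = LineCollector()
    collector.feed(hocr)
    collector.close()
    return collector.lines
```

Running extract_lines on the hOCR produced by v0.8 and by the branch for the same image, then comparing the two lists, would flag missing text or shifted boxes. It does not check typography, since bold and italic tags are flattened into plain text here.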

I used the minimum required from the patch by Dmitry Polevoy to make this work.
It ended up being only the html.cpp file.
https://lists.launchpad.net/cuneiform/msg00269.html

Regards
Julien

________________________________________
From: Jussi Pakkanen [jpakkane@xxxxxxxxx]
Sent: 1 October 2009 13:10
To: julien
Cc: cuneiform@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Cuneiform] Hocr output status and identified improvements.

On Thu, Oct 1, 2009 at 1:58 PM, julien <julien@xxxxxxxxxxxxxxxxxxx> wrote:

> I was about to start modifying the code when I noticed there was a patch to handle ocr_line.
> https://lists.launchpad.net/cuneiform/msg00269.html
>
> However, it seems this patch was not merged into v0.8.
> Is there a reason why it was not merged into v0.8?

I was under the impression that he was going to submit an even better
version. Since nothing happened I probably just forgot about it.

> What would be the most appropriate way for me to contribute back any effort?
> (I suppose starting off from v0.8 and then, once stable, asking to have it merged?)

Check out the newest code from Bazaar and work against that. Plain
patches against bzr head are fine. You can use the fancier options
that bzr offers if you feel like it.

> As for which reference should be used to standardize the hOCR output, would it be the following?
> http://docs.google.com/View?docid=dfxcv4vc_67g844kf

I actually don't know. Anyone?


