
cuneiform team mailing list archive

Re: Patch to extend hOCR output

 

Hello, Dmitry!

On Sun, 2009-02-22 at 00:16 +0300, Dmitry Polevoy wrote:
> Patch to extend hOCR output. Lines info can be useful for OCR testing.

Looks cool! Could you please clarify a few things?

    * What's this stuff:

+/*!
+\brief \~english Put stream bufer into buffer for OCR results.
+       \~russian      
+                  .
+*/

      Are these comments in Doxygen format? (I've been using JavaDoc
for quite some time, so it's only a guess.) Also, it seems that the
Russian comments are in CP1251, which brings me to the following
questions:

        - What do you guys think about converting all of the source
files to UTF-8? The licence statement and the Russian comments can
currently be read only under a Russian edition of Windows (or another
Windows workstation set to a CP1251 locale, which is often not the
case).

They are painful to decipher on any other system (you need to run iconv
-f cp1251 -t utf8 and then recode back) and can easily be corrupted by a
non-Russian-speaking developer if the wrong encoding is set...

I think we definitely need to do this before any UTF-8-encoded material
gets in, because after that it will be quite difficult to recover.

        - Don't you think we need to introduce some commenting
guidelines? If it's Doxygen, then it's Doxygen; I don't really care
about the specific choice, but I feel we need to be consistent in this
regard.

    * Also, the encoding is only set if it's UTF-8:

+ if (gActiveCode==ROUT_CODE_UTF8) 
+ {
+         outStrm << "<meta http-equiv=\"Content-Type\""
+                        " content=\"text/html;charset=utf-8\" >" << endl;
+ }

      And if it's not? Is there a way to put the correct encoding in?
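For illustration, here is a minimal C++ sketch of the CP1251-to-UTF-8 step that `iconv -f cp1251 -t utf8` performs on the Russian comments. It is deliberately partial: it assumes only ASCII, the contiguous Cyrillic block (0xC0-0xFF) and Ё/ё need handling, and maps every other high byte to U+FFFD. `cp1251ToUtf8` and `appendUtf8` are hypothetical helpers, not part of the Cuneiform tree; an actual migration would simply run iconv over the files.

```cpp
#include <string>

// Hypothetical helper: appends one Unicode code point (up to U+FFFF)
// to `out` as UTF-8.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical, partial CP1251 -> UTF-8 decoder. Handles ASCII,
// 0xC0-0xFF (U+0410..U+044F) and 0xA8/0xB8 (U+0401/U+0451); every other
// byte in 0x80-0xBF becomes U+FFFD instead of its real CP1251 mapping.
std::string cp1251ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        char32_t cp;
        if (c < 0x80) {
            cp = c;                      // ASCII is identical in both
        } else if (c >= 0xC0) {
            cp = 0x0410 + (c - 0xC0);    // contiguous Cyrillic block
        } else if (c == 0xA8) {
            cp = 0x0401;                 // capital IO
        } else if (c == 0xB8) {
            cp = 0x0451;                 // small io
        } else {
            cp = 0xFFFD;                 // not covered by this sketch
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

The table-free approach only works because CP1251 places the basic Cyrillic alphabet in one contiguous run that lines up with Unicode's U+0410 block; the 0x80-0xBF range would need a full lookup table.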
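One possible answer to the question above, as a minimal sketch: always emit the meta tag and look up the charset from the active output code. `OutputCode` is a hypothetical stand-in for the `ROUT_CODE_*` constants; only `ROUT_CODE_UTF8` appears in the patch, so the other values and their names are assumptions.

```cpp
#include <string>

// Hypothetical stand-in for the ROUT_CODE_* output codes; only
// ROUT_CODE_UTF8 is known from the patch, the rest are assumptions.
enum class OutputCode { Utf8, Cp1251, Koi8r, Ascii };

// Maps an output code to the IANA charset name used in the meta tag.
const char* charsetName(OutputCode code) {
    switch (code) {
        case OutputCode::Utf8:   return "utf-8";
        case OutputCode::Cp1251: return "windows-1251";
        case OutputCode::Koi8r:  return "koi8-r";
        case OutputCode::Ascii:  return "us-ascii";
    }
    return "utf-8";  // not reached for valid enum values
}

// Builds the same meta tag as the patch, but for every output code.
std::string metaCharsetTag(OutputCode code) {
    return std::string("<meta http-equiv=\"Content-Type\""
                       " content=\"text/html;charset=") +
           charsetName(code) + "\" >";
}
```

Assuming `gActiveCode` can be mapped onto such an enum, the `if` in the patch would become a single `outStrm << metaCharsetTag(code) << endl;`.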
 
-- 
Sincerely yours,
Yury V. Zaytsev



