cuneiform team mailing list archive

Re: Planning a new release

To: cuneiform@xxxxxxxxxxxxxxxxxxx
From: "Yury V. Zaytsev" <yury@xxxxxxxxxx>
Date: Sat, 06 Jun 2009 14:55:40 +0400
In-reply-to: <42d23b2e0905120221n2e48edb0k6fa596b4fa81e561@mail.gmail.com>

Hi!

On Tue, 2009-05-12 at 12:21 +0300, Jussi Pakkanen wrote:

> I don't know any existing solution for this, so probably I'll have to
> do my own scripts. If someone knows a tool for this, do let me know.

While browsing for something else, I stumbled upon an interesting Perl
module:

http://search.cpan.org/~rkrimen/String-Comments-Extract-0.02/

The good thing about this module is that it uses a tokenizer to extract
the comments rather than a bunch of regular expressions, so it should
correctly handle obscure corner cases such as comment-like structures
embedded in the code.

> Actually there is a simpler way:

Sounds good to me.

I don't know any Perl, but I might try to find some time to write a
recoder using this module as a starting point. However, I am not sure
what the correct algorithm would be. Something like this?

1) Extract the comments from the file
2) Convert them from CP1251 to UTF-8 using iconv
3) For each comment, replace the old CP1251 string with a new UTF-8
string via a regular expression
4) Write the output to the file

I suspect that step 3 might be unreliable, for instance if the same byte
sequence also occurs outside a comment. Maybe a better way would be to
go through the extracted comments line by line and thus restrict each
replacement to a single line?
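
One way to avoid the search-and-replace in step 3 altogether is to recode
each comment in place, at the position where the tokenizer finds it. Below
is a minimal sketch of that idea in Python (not Perl, and not using
String::Comments::Extract). The function name recode_comments is made up
for illustration, and the sketch assumes C/C++-style sources whose comments
are the only CP1251 text that should be converted. It ignores line
continuations and raw strings, so treat it as a starting point rather than
a finished tool.

```python
def recode_comments(data: bytes, src: str = "cp1251", dst: str = "utf-8") -> bytes:
    """Recode only the comments of a C/C++ source held in `data`.

    Hypothetical helper: a tiny hand-written scanner, not a full lexer.
    Bytes outside comments (including string literals) are copied as-is.
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        two = data[i:i + 2]
        if two == b"//":
            # Line comment: runs up to (not including) the newline.
            j = data.find(b"\n", i)
            j = n if j == -1 else j
            out += data[i:j].decode(src).encode(dst)
            i = j
        elif two == b"/*":
            # Block comment: runs up to and including the closing "*/".
            j = data.find(b"*/", i + 2)
            j = n if j == -1 else j + 2
            out += data[i:j].decode(src).encode(dst)
            i = j
        elif data[i:i + 1] in (b'"', b"'"):
            # String or character literal: skip it so that comment-like
            # sequences inside it are not mistaken for comments.
            quote = data[i]
            j = i + 1
            while j < n and data[j] != quote:
                j += 2 if data[j] == 0x5C else 1  # 0x5C is a backslash
            j = min(j + 1, n)
            out += data[i:j]
            i = j
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

To use it, read the file in binary mode, pass the bytes through
recode_comments, and write the result back. Because every comment is
converted exactly where it was found, there is no need to match the old
string again afterwards. A byte that is undefined in CP1251 would raise
UnicodeDecodeError, which is probably what one wants for a one-off
conversion.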

-- 
Sincerely yours,
Yury V. Zaytsev
