cuneiform team mailing list archive

Re: Planning a new release

 

Hi!

On Tue, 2009-05-12 at 12:21 +0300, Jussi Pakkanen wrote:

> I don't know any existing solution for this, so probably I'll have to
> do my own scripts. If someone knows a tool for this, do let me know.

While browsing for something else, I stumbled upon an interesting Perl
module:

http://search.cpan.org/~rkrimen/String-Comments-Extract-0.02/

The good thing about this module is that it uses a tokenizer to extract
the comments, not just a bunch of regular expressions, so it correctly
handles even obscure corner cases, such as comment-like structures
embedded in the code.

> Actually there is a simpler way:

Sounds good to me.

I don't know any Perl, but I might try to find some time to write a
recoder using this module as a starting point. However, I am not sure
what the correct algorithm would be...

1) Extract the comments from the file
2) Convert them from CP1251 to UTF-8 using iconv
3) For each comment, replace the old CP1251 string with a new UTF-8
string via a regular expression
4) Write the output to the file

I suspect that 3) might be unreliable. Maybe a better way would be to go
through the result of the comments extraction line-by-line and thus
restrict the replacements to one line only?
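
To make the idea concrete, here is a rough sketch in Python rather than Perl. It is not based on String::Comments::Extract; it is a small hand-written scanner that assumes C/C++ style comments (`//` and `/* */`) and ordinary quoted literals. Instead of replacing strings via a regular expression as in 3), it recodes each comment span in place by position and copies everything else byte for byte, so text outside comments is not touched. The function name `recode_comments` and its parameters are made up for illustration, and the scanner ignores trigraphs, backslash-continued `//` comments, and raw string literals.

```python
def recode_comments(src: bytes, src_enc: str = "cp1251",
                    dst_enc: str = "utf-8") -> bytes:
    """Recode only the comment spans of C/C++ source bytes.

    Hypothetical helper: everything outside comments is copied verbatim.
    Bytes that are undefined in src_enc raise UnicodeDecodeError.
    """
    out = bytearray()
    i, n = 0, len(src)
    while i < n:
        two = src[i:i + 2]
        if two == b"//":
            # Line comment: runs up to (not including) the next newline.
            end = src.find(b"\n", i)
            end = n if end == -1 else end
            out += src[i:end].decode(src_enc).encode(dst_enc)
            i = end
        elif two == b"/*":
            # Block comment: runs through the closing */ (or to end of
            # input if the comment is unterminated).
            end = src.find(b"*/", i + 2)
            end = n if end == -1 else end + 2
            out += src[i:end].decode(src_enc).encode(dst_enc)
            i = end
        elif src[i:i + 1] in (b'"', b"'"):
            # String or character literal: copy untouched, skipping over
            # backslash escapes so an escaped quote does not end it.
            quote = src[i]
            j = i + 1
            while j < n and src[j] != quote:
                j += 2 if src[j] == 0x5C else 1
            j = min(j + 1, n)
            out += src[i:j]
            i = j
        else:
            # Ordinary code byte: copy as is.
            out.append(src[i])
            i += 1
    return bytes(out)
```

The file would have to be read and written in binary mode so that no implicit decoding happens. Because CP1251 is a single-byte encoding, scanning the raw bytes for quotes, slashes and backslashes is safe here; the same trick would not work for a multi-byte source encoding.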
 
-- 
Sincerely yours,
Yury V. Zaytsev



