cuneiform team mailing list archive
Message #00306
Re: Planning a new release
Hi!
On Tue, 2009-05-12 at 12:21 +0300, Jussi Pakkanen wrote:
> I don't know any existing solution for this, so probably I'll have to
> do my own scripts. If someone knows a tool for this, do let me know.
While browsing for something else I stumbled upon an interesting Perl
class:
http://search.cpan.org/~rkrimen/String-Comments-Extract-0.02/
The good thing about this class is that it uses a tokenizer to extract
the comments rather than a bunch of regular expressions, so it correctly
handles even obscure corner cases such as comment-like structures
embedded in the code.
> Actually there is a simpler way:
Sounds good to me.
I don't know any Perl, but I might try to find some time to write a
recoder using this class as a starting point. However, I am not sure what
the correct algorithm would be...
1) Extract the comments from the file
2) Convert them from CP1251 to UTF-8 using iconv
3) For each comment, replace the old CP1251 string with a new UTF-8
string via a regular expression
4) Write the output to the file
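For illustration, steps 2) and 3) above could be sketched in Python as
follows. This is only a rough sketch under assumptions: the function name
recode_comments is hypothetical, the comments are assumed to have already
been extracted as raw CP1251 byte strings by some tool, and Python's
built-in cp1251 codec stands in for iconv.

```python
import re
from typing import Iterable


def recode_comments(data: bytes, comments: Iterable[bytes]) -> bytes:
    """Convert each extracted comment to UTF-8 and substitute it back.

    `data` is the raw file content; `comments` are the raw CP1251 comment
    strings (both hypothetical inputs for this sketch).
    """
    for old in comments:
        # Step 2: the same conversion iconv would perform.
        new = old.decode("cp1251").encode("utf-8")
        if new == old:
            continue  # pure-ASCII comment, nothing to change
        # Step 3: re.escape keeps characters such as '*' literal, and a
        # function replacement avoids backslash processing in `new`.
        # count=1 replaces only the first match found in the file.
        data = re.sub(re.escape(old), lambda m, new=new: new, data, count=1)
    return data


source = "int x; /* счётчик */\n".encode("cp1251")
comments = ["/* счётчик */".encode("cp1251")]
result = recode_comments(source, comments)
```

Note that the search in step 3 looks for the comment bytes anywhere in the
file, so it cannot tell a real comment from an identical byte sequence
inside, say, a string literal.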
I suspect that 3) might be unreliable. Maybe a better way would be to go
through the result of the comments extraction line-by-line and thus
restrict the replacements to one line only?
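One possible way around the search-and-replace problem, sketched here in
Python, is to have the tokenizer report the byte offsets of each comment
and recode those spans in place, so nothing has to be searched for
afterwards. This is an assumption-laden sketch, not what the Perl class
does: comment_spans and recode are hypothetical names, only C-style
/* */ and // comments are recognised, and line continuations and raw
string literals are ignored.

```python
def comment_spans(data: bytes):
    """Yield (start, end) byte offsets of C-style comments, skipping literals."""
    i, n = 0, len(data)
    while i < n:
        two = data[i:i + 2]
        ch = data[i:i + 1]
        if two == b"//":
            end = data.find(b"\n", i)
            end = n if end == -1 else end
            yield i, end
            i = end
        elif two == b"/*":
            end = data.find(b"*/", i + 2)
            end = n if end == -1 else end + 2
            yield i, end
            i = end
        elif ch in (b'"', b"'"):
            # Skip a string or character literal, honouring backslash escapes.
            quote = ch
            i += 1
            while i < n and data[i:i + 1] != quote:
                i += 2 if data[i:i + 1] == b"\\" else 1
            i += 1
        else:
            i += 1


def recode(data: bytes) -> bytes:
    """Recode only the comment spans from CP1251 to UTF-8."""
    out, pos = [], 0
    for start, end in comment_spans(data):
        out.append(data[pos:start])  # code between comments, left untouched
        out.append(data[start:end].decode("cp1251").encode("utf-8"))
        pos = end
    out.append(data[pos:])
    return b"".join(out)


src = 'char *s = "/* тест */"; /* тест */\n'.encode("cp1251")
out = recode(src)
```

In this example the string literal keeps its original CP1251 bytes and
only the real comment is converted, which a plain search for the comment
text could not guarantee.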
--
Sincerely yours,
Yury V. Zaytsev