kicad-developers team mailing list archive

Re: UTF8 source files

To: Kicad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
From: Edwin van den Oetelaar <oetelaar.automatisering@xxxxxxxxx>
Date: Fri, 3 May 2013 10:07:30 +0200
In-reply-to: <20130503075857.GB23229@eris.logos.lan>

Thanks for your detailed explanation.

In my other code I put this (an off-topic joke),
following http://www.python.org/dev/peps/pep-0263/ :
#! /usr/bin/env python
# -*- coding: utf-8 -*-
The encoding declaration should be on the first or second line, so
that Python parsers (and editors) can find it.

I found this reference:
http://stackoverflow.com/questions/688760/how-to-create-a-utf-8-string-literal-in-visual-c-2008
Personally I do not want to go to all this trouble.
String literals are '.DATA' and should not cause trouble for '.CODE',
so escaping is fine with me.
I often work in a VT-100 terminal editor (remote support over a slow
connection), which cannot display all these strange characters.
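
To make the escaping option concrete, here is a minimal sketch (the
names kMuHex, kMuOctal, hex_dump and self_check are hypothetical, made
up for this illustration, not KiCad code). It assumes the only goal is
to keep the source file pure ASCII while the string data carries the
UTF-8 bytes 0xC2 0xB5 for the mu sign (U+00B5):

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical names, for illustration only.
// U+00B5 written by hand as its two UTF-8 bytes, 0xC2 0xB5.
const char kMuHex[]   = "This is mu:\xc2\xb5";  // hexadecimal escapes
const char kMuOctal[] = "This is mu:\302\265";  // octal escapes

// 11 ASCII bytes + 2 UTF-8 bytes + the terminating NUL.
static_assert(sizeof kMuHex == 14, "unexpected literal size");
static_assert(sizeof kMuOctal == 14, "unexpected literal size");

// Return the bytes of s as lowercase hex pairs separated by spaces.
std::string hex_dump(const char* s)
{
    std::string out;
    char buf[4];
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    for (; *p != 0; ++p) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(*p));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}

// Both spellings must produce exactly the same bytes.
void self_check()
{
    assert(std::strcmp(kMuHex, kMuOctal) == 0);
    assert(std::strlen(kMuHex) == 13);
}
```

One caveat with the hex form: a \x escape consumes every hex digit
that follows it, so a literal like "\xb5e" is not "mu followed by e".
Splitting the literal ("\xc2\xb5" "e") or using octal escapes, which
stop after three digits, avoids that.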

Greetings,
Edwin van den Oetelaar

2013/5/3 Lorenzo Marcantonio <l.marcantonio@xxxxxxxxxxxx>:
> On Fri, May 03, 2013 at 09:43:29AM +0200, Edwin van den Oetelaar wrote:
>> Flashback : It reminds me of the problems that existed with
>> web-browsers, when people pasted stuff from their text-editor (like
>> Word) into html textarea boxes to publish articles... oh the old days.
>
> More or less the same... when the 'standard' wasn't Latin-1 but some
> ANSI encoding ('smart quotes' often were the culprit)
>
>> Maybe the UTF-8 should be 'escaped' using hex notation? How do other
>> folks handle this?
>
> It's just a policy thing to decide.
>
> The current C/C++ standard allows the \U notation, but AFAIK it
> gives wide chars, not UTF8, since C is encoding agnostic. So you'd have
> to spell out the encoding yourself in the string, like
>
> "This is mu:\xc2\xb5"
>
> or, since I don't know if \x is standard or a gcc extension (need to
> check)
>
> "This is mu:\302\265"
>
> Not exactly the most convenient thing to do, but keeps the sources in
> strict ISO646 (ASCII). That is the 'multibyte' string approach.
>
> The official C way (C++ too) would be to use wchar_t (which is
> 4 bytes big under Linux :D) and use
>
> L"This is mu:\u00B5"
>
> *If* you're using C++11 then you can say (C++11 is no longer encoding
> agnostic!)
>
> u8"This is mu:\u00B5"
>
> And have a UTF8 string.
>
> So it's a choose-your-poison situation. More info here
> http://en.cppreference.com/w/cpp/language/string_literal
>
> Add to this the many ways that wx handles strings depending on
> version, build option and (probably) the current phase of the moon.
>
> --
> Lorenzo Marcantonio
> Logos Srl
