kicad-developers team mailing list archive
Re: Re: We should decide a quoting convention...
--- In kicad-devel@xxxxxxxxxxxxxxx, Manveru <manveru@...> wrote:
For example, a grave accent ("`") followed by a character with code n should
be decoded as the character with code (n ^ 0x40). So these symbols get the
following escape codes: '\n' -> '`J', '"' -> '`b', '#' -> '`c', '`' -> '` ', etc.
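A minimal Python sketch of the grave-accent scheme, for illustration only. Every example mapping listed ('\n' -> '`J', '"' -> '`b', '#' -> '`c', '`' -> '` ') corresponds to XOR with 0x40, so the sketch uses 0x40. The escape set is assumed to be exactly the four characters from those examples, and the function names are hypothetical.

```python
ESCAPE = "`"
# Assumed escape set: only the characters from the examples above.
ESCAPED_CHARS = {"\n", '"', "#", "`"}


def grave_encode(text):
    """Replace each escaped character c with '`' + chr(ord(c) ^ 0x40)."""
    out = []
    for ch in text:
        if ch in ESCAPED_CHARS:
            out.append(ESCAPE + chr(ord(ch) ^ 0x40))
        else:
            out.append(ch)
    return "".join(out)


def grave_decode(text):
    """Turn '`' + n back into chr(ord(n) ^ 0x40).

    A lone '`' at the very end of the input is kept literally.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == ESCAPE and i + 1 < len(text):
            out.append(chr(ord(text[i + 1]) ^ 0x40))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(repr(grave_encode('a "b"\n#`')))  # 'a `bb`b`J`c` '
```

Each escape costs exactly two characters, which is the space argument made further down in the thread.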
Is it reinventing the wheel? The most common approach these days in open source
projects with text formats is to support UTF-8 plus the classic C escape
sequences, \x or \dddd (where x is a letter and dddd is a numeric code). Strings
are then enclosed in " (double quotes), and a multiline string is continued with
a backslash at the end of the line (\↵).
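For comparison, a minimal Python sketch of the C-style alternative. The details are assumptions, not part of any agreed format: named escapes for newline, tab, backslash and double quote, three-digit octal \ooo for other control characters, UTF-8 text passed through unchanged, and a backslash-newline pair treated as a line continuation that the reader drops.

```python
# Named escapes: character -> its two-character escape sequence.
NAMED = {"\n": "\\n", "\t": "\\t", "\\": "\\\\", '"': '\\"'}
# Reverse table: escape letter -> character ('n' -> newline, etc.).
UNNAMED = {seq[1]: ch for ch, seq in NAMED.items()}


def c_escape(text):
    """Return text as a double-quoted, C-style escaped string."""
    out = []
    for ch in text:
        if ch in NAMED:
            out.append(NAMED[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("\\%03o" % ord(ch))  # other control chars as octal
        else:
            out.append(ch)  # printable ASCII and UTF-8 text unchanged
    return '"' + "".join(out) + '"'


def c_unescape(quoted):
    """Decode a string produced by c_escape (surrounding quotes included)."""
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        nxt = body[i + 1]
        if nxt == "\n":  # backslash-newline: continuation, dropped
            i += 2
        elif nxt in UNNAMED:
            out.append(UNNAMED[nxt])
            i += 2
        else:  # \ooo octal escape
            out.append(chr(int(body[i + 1:i + 4], 8)))
            i += 4
    return "".join(out)


print(c_escape('say "hi"\n'))  # "say \"hi\"\n"
```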
It is inventing a better wheel: \x and \nnn codes are just a waste of space, as they require twice as much memory. =) Beyond that I have nothing against it; I was just showing the principle. Still, why should we use C style? There are many fine escape-coding standards, for example URL encoding (%xx).
Also, do we really need multiline strings? The schematic/PCB format is not primarily a human-readable format. Multiline strings will only complicate parsers. If a human needs to read a schematic file, (s)he can turn on line wrapping in a text editor.
I also think that quotes (") are redundant in the file format. If spaces and linefeeds are escape-coded (%20 and %0A), the parser can simply stop reading a text string at a space or linefeed.
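The unquoted, URL-encoded variant can be sketched with Python's standard urllib.parse.quote and unquote. With safe="" every character except letters, digits and "_.-~" becomes %xx, so an encoded token never contains white space and a reader can split on white space alone. The helper names are hypothetical; this is an illustration of the proposal, not KiCad code.

```python
from urllib.parse import quote, unquote


def encode_token(value):
    # safe="" escapes spaces, linefeeds, '%', parentheses and quotes alike.
    return quote(value, safe="")


def decode_token(token):
    return unquote(token)


def split_tokens(line):
    # No quotes needed: every token simply ends at white space.
    return [decode_token(t) for t in line.split()]


line = "value " + encode_token("10k 1%\nmetal film")
print(line)                # value 10k%201%25%0Ametal%20film
print(split_tokens(line))
```

Non-ASCII text is percent-encoded byte by byte from its UTF-8 form (e.g. "é" becomes "%C3%A9"), so the file stays pure ASCII at the cost of readability.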
Keep it simple!
We will keep it simple, and I admit that there are a couple of minor holes
in the lisp-like format that we need to plug.
In general, however, my thinking is this:
Any such file is to be interpreted as a blend of ASCII sequences with
intermittent UTF8 sequences. The ASCII sequences are the keywords and the '('
and ')' delimiters; everything except a quoted string is ASCII.
The UTF8 sequences are reserved ONLY for quoted strings.
Quoted strings are required ONLY for tokens which must include
a) one of the ASCII white space characters, b) a non-ASCII
character, or c) ')' or '('.
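The quoting rule above fits in one predicate. In this hypothetical Python sketch the set of "ASCII white space characters" is assumed to be Python's string.whitespace, and a token containing a double quote is not handled here, since that is the separate problem A discussed below.

```python
import string


def needs_quotes(token):
    """True if the token contains a) ASCII white space, b) a non-ASCII
    character, or c) '(' or ')'."""
    return any(
        ch in string.whitespace or ord(ch) > 0x7F or ch in "()"
        for ch in token
    )


def format_token(token):
    # Quote only when the rule requires it; everything else stays bare ASCII.
    return '"%s"' % token if needs_quotes(token) else token


print(format_token("R1"), format_token("10 k"), format_token("\u03a9"))
```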
Within a quoted string, the text is assumed to be UTF8, no exceptions, and
therefore inherently supports all international characters (UTF8 covers all
of Unicode, not just the 16-bit range).
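One UTF-8 property makes this split between ASCII structure and UTF-8 strings safe: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a '"', '(' , ')' or white space byte can never appear inside an encoded non-ASCII character, and a byte-oriented lexer can find delimiters without decoding first. The short Python check below (helper name hypothetical) also shows that UTF-8 is not limited to 16-bit characters: U+1F600 takes four bytes.

```python
def utf8_report(ch):
    """Return (byte count, hex bytes, whether every byte is >= 0x80)."""
    data = ch.encode("utf-8")
    return len(data), data.hex(), all(b >= 0x80 for b in data)


# Characters needing 1, 2, 3 and 4 bytes; the last lies above U+FFFF.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    count, hex_bytes, high_only = utf8_report(ch)
    print("U+%04X: %d byte(s) %s, all bytes >= 0x80: %s"
          % (ord(ch), count, hex_bytes, high_only))
```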
With this understanding, the problem reduces to quoted strings, namely
A) differentiating the leading and trailing quotes from a quote character
within the quoted string, and
B) as an aid to human readability, giving some consideration to the
handling of new lines, so that they do not screw up the pretty
indenting that these files typically have.
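Problem A could be handled in the lexer along these lines. This is a hypothetical Python sketch, not DSNLEXER code: the message does not fix how an embedded quote is written, so the sketch assumes a backslash convention (\" for a quote, \\ for a backslash). Quoted strings come back as ("STRING", value) tuples so that a quoted "(" cannot be confused with a delimiter.

```python
def tokenize(text):
    """Split an s-expression into '(', ')', bare atoms and quoted strings."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            i += 1  # skip the leading quote
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted string")
                ch = text[i]
                if ch == "\\" and i + 1 < n:
                    buf.append(text[i + 1])  # assumed: \" -> ", \\ -> \
                    i += 2
                elif ch == '"':
                    i += 1  # trailing quote ends the string
                    break
                else:
                    buf.append(ch)
                    i += 1
            tokens.append(("STRING", "".join(buf)))
        else:
            start = i  # bare ASCII atom: runs until space, paren or quote
            while i < n and not text[i].isspace() and text[i] not in '()"':
                i += 1
            tokens.append(text[start:i])
    return tokens


print(tokenize('(label "say \\"hi\\"" x)'))
```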
As an example of B), a grammar could allow
(multiline_text "ABC" "DEF" )
and the parser can recombine the ABC and DEF into "ABC\nDEF" when it sees
the multiline_text keyword.
B) is up to the grammar designer, as it happens at the parser level, not
at the lexer level. Only A) is a DSNLEXER issue.
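The parser-level handling of the (multiline_text "ABC" "DEF" ) form might look like this hypothetical Python sketch. It assumes the lexer has already unquoted the strings, so the grammar alone decides that consecutive strings after multiline_text are lines; the writer side emits one quoted string per line, which keeps the file's indentation intact.

```python
def parse_multiline_text(tokens):
    """Join the string operands of a (multiline_text ...) form with newlines.

    tokens is assumed to be one form from the lexer, with quoted strings
    already unquoted, e.g. ["(", "multiline_text", "ABC", "DEF", ")"].
    """
    if tokens[:2] != ["(", "multiline_text"] or tokens[-1:] != [")"]:
        raise ValueError("not a multiline_text form")
    return "\n".join(tokens[2:-1])


def format_multiline_text(text, indent=""):
    """Writer side: one quoted string per line of text.

    Assumes no line contains a double quote (that is problem A).
    """
    parts = " ".join('"%s"' % line for line in text.split("\n"))
    return "%s(multiline_text %s )" % (indent, parts)


text = parse_multiline_text(["(", "multiline_text", "ABC", "DEF", ")"])
print(repr(text))                   # 'ABC\nDEF'
print(format_multiline_text(text))  # (multiline_text "ABC" "DEF" )
```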
Another designer might allow
And decide human readability of this file is not so important.