kicad-developers team mailing list archive

Re: Re: We should decide a quoting convention...

To: kicad-devel@xxxxxxxxxxxxxxx
From: Dick Hollenbeck <dick@...>
Date: Tue, 22 Dec 2009 11:54:24 -0600
In-reply-to: <hgqsig+eh5p@eGroups.com>
Scanners: none
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

vladimir_uryvaev wrote:

--- In kicad-devel@xxxxxxxxxxxxxxx, Manveru <manveru@...> wrote:

For example, a "`" (grave) character followed by the character with
code n should be treated as character code (n^0x40). So symbols get these
escape codes: '\n' -> '`J', '"' -> '`b', '#' -> '`c', '`' -> '` ' etc.
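
The grave-escape idea quoted above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the mask 0x40 is chosen because it is the value that reproduces the listed examples ('\n' -> '`J', '"' -> '`b', and so on), and the set of characters to escape (`needs_escape`) is an assumption.

```python
ESCAPE = "`"
MASK = 0x40  # assumption: the mask that reproduces the examples in the quoted message


def needs_escape(ch):
    # Assumption: escape control characters, the double quote, '#',
    # and the grave character itself.
    return ord(ch) < 0x20 or ch in '"#`'


def encode(text):
    """Replace each escaped character with a grave followed by (code ^ MASK)."""
    out = []
    for ch in text:
        if needs_escape(ch):
            out.append(ESCAPE + chr(ord(ch) ^ MASK))
        else:
            out.append(ch)
    return "".join(out)


def decode(text):
    """Reverse encode(): a grave means 'XOR the next character with MASK'."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == ESCAPE:
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling escape character")
            out.append(chr(ord(nxt) ^ MASK))
        else:
            out.append(ch)
    return "".join(out)
```

Each escaped character costs two bytes, which is the space argument made in the reply further down.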

Is it reinventing the wheel? What is most common these days in open source
projects having text formats is to support UTF-8 with most classic C escape
sequences, with \x or \dddd (where x are letters and dddd are codes). Then
strings are enclosed in " (double-quotes) and multiline is divided by \↵
(backslash+cr).


It is inventing a better wheel; \x and \nnn codes are just a waste of space, as they require 2 times more memory. =) But I have nothing more against this. I've just shown the principle. But why should we use C-style? There are many fine escape coding standards, for example, URL encoding (%xx).

Also, do we really need multiline? The schematic/PCB format is not primarily a human-readable format. Multiline will only complicate parsers. If a human needs to read a schematic file, (s)he can turn on line wrapping in a text editor.
I also think that quotes (") are redundant in the file format. If spaces and linefeeds are escape coded (%20 and %0A), the parser can just stop reading a text string at a space or linefeed.
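
The quote-free variant suggested above can be illustrated with Python's standard `urllib.parse` module. This is a minimal sketch under assumptions: the helper names `write_record` and `read_record` are hypothetical, and a single space is assumed to be the only field separator.

```python
from urllib.parse import quote, unquote


def write_record(fields):
    # safe="" percent-encodes every reserved character, so an encoded field
    # can never contain a literal space or linefeed.
    return " ".join(quote(field, safe="") for field in fields)


def read_record(line):
    # Because spaces inside fields were encoded as %20, splitting on a
    # space recovers the field boundaries without any quoting.
    return [unquote(token) for token in line.split(" ")]
```

For example, `write_record(["R1", "10 k\nohm"])` yields `R1 10%20k%0Aohm`, and `read_record` restores the two original fields.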

Keep it simple!

We will keep it simple, and I admit that there are a couple of minor holes in the lisp-like format that we need to plug.


In general however, my thinking is this:

Any such file is to be interpreted as a blend of ASCII sequences with intermittent UTF8 sequences. The ASCII sequences are the keywords and the '(' and ')' delimiters; everything except a quoted string is ASCII.


The UTF8 sequences are reserved ONLY for quoted strings.

Quoted strings are required ONLY for tokens which must include either a) one of the ASCII white space characters, or b) a non ASCII character, or c) ')' or '('.
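
As a rough illustration of this rule, the check a writer would perform before emitting a token might look like the Python sketch below. The function name `needs_quotes` and the exact set of ASCII white space characters are assumptions, and the sketch does not address how a quote character inside the string is handled.

```python
ASCII_WHITESPACE = " \t\n\r\f\v"  # assumption: the usual ASCII white space set


def needs_quotes(token):
    """Return True if the token falls under case a), b) or c) above."""
    return any(
        ch in ASCII_WHITESPACE   # a) ASCII white space
        or ord(ch) > 0x7F        # b) a non ASCII character
        or ch in "()"            # c) a parenthesis
        for ch in token
    )
```

Under these assumptions a keyword such as `multiline_text` is written bare, while `my net`, `Ω` and `a(b` all require quotes.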

Within a quoted string, it is assumed to be UTF8, no exceptions, and therefore inherently supports all international 16 bit characters.


With this understanding the problem is reduced to quoted strings, and

A) differentiating the leading and trailing quote from a quote character within the quoted string, and

B) as an aid for human readability, some consideration might be given to the handling of new lines, so that they do not screw up the pretty indenting that these files typically have.



For example:

(multiline_text "ABC" "DEF" )

the parser can recombine the ABC and DEF into "ABC\nDEF" when it sees T_multiline_text.
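
A toy version of that recombination step, written in Python rather than against the real DSNLEXER, might look like the following. The tokenizer regex and the function names are assumptions made for illustration, and quoted strings containing a quote character (problem A) are deliberately not handled.

```python
import re

# Assumption: tokens are '(' , ')' , a double-quoted string without embedded
# quotes, or a bare word. Problem A) is out of scope for this sketch.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text):
    """Lexer level: split the input into tokens, nothing more."""
    return TOKEN_RE.findall(text)


def parse_multiline_text(text):
    """Parser level: join the quoted strings of a multiline_text element."""
    tokens = tokenize(text)
    if tokens[:2] != ["(", "multiline_text"] or tokens[-1:] != [")"]:
        raise ValueError("expected a (multiline_text ...) expression")
    pieces = tokens[2:-1]
    if not all(p[0] == '"' and p[-1] == '"' for p in pieces):
        raise ValueError("multiline_text takes only quoted strings")
    # Strip the surrounding quotes and rejoin with newlines.
    return "\n".join(p[1:-1] for p in pieces)
```

Here the newline never appears in the file itself; it is reintroduced by the grammar, so the indentation of the file is unaffected.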

B) is up to the grammar designer, as it happens at the parser level, not at the lexer level. Only A) is a DSNLEXER issue.


Another designer might allow

"ABC
DEF"

And decide human readability of this file is not so important.

Dick

Follow ups

Re: We should decide a quoting convention...
From: vladimir_uryvaev, 2009-12-22