kicad-developers team mailing list archive
-
kicad-developers team
-
Mailing list archive
-
Message #22037
Re: [rfc] actual sexpression parsing
On Thu, Dec 17, 2015 at 10:32:10PM -0500, Mark Roszko wrote:
> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.
OK, being a lisper and having read the dragon book here's my view on the
subject:
1) the parser isn't actually bad, it's mostly a recursive descent i.e.
it expects the *actual* grammar to be generated. You take the BNF
form, process it and write the corresponding state machine. It's not
VB6, it's a state machine :D
So you tailor a parser for a specific grammar; if you do it by hand
good luck if the grammar changes (there are tools like yacc that
generate parsers but that's beyond the scope)
2) Given the point before it doesn't actually do sexps because it needs
the type in the grammar beforehand. Given:
(module smd:C0603 (layer F.Cu) (tedit 52108AB5) (tstamp 558BEA9B) ...)
The parser *must* know that module, layer, tedit and tstamp are
atoms, and where are strings and hex values placed. So the grammar
should be (ignoring the fact that tedit and tstamp forms are
optional):
(module <string> (layer <string>) (tedit <hex>) (tstamp <hex>) ...)
Any true sexp form has types self-evident (exactly how depends on the
lisp flavour you're using, in common lisp reader macros define the
behaviour), so the reader (the tokenizer in two piece parsers) can
parse it WITHOUT knowing the grammar beforehand. The common lisp way
(actually any lisp except for the hex numbers) would be:
(module "smd:C0603" (layer "F.Cu") (tedit #x52108AB5) (tstamp #x558BEA9B) ...)
...actually I adapted the pcbnew code (it's about a ten line diff:P)
to emit this
(module "smd:C0603" (layer "F.Cu") (tedit 52108AB5) (tstamp 558BEA9B) ...)
...which is fully backward compatible with 'official' kicad parser.
Just quote every string; it also get correctly font locked in emacs :D
Hex values are only used for these fields AFAIK, I didn't special
cased them in kicad but in lisp...
3) Using an XML similitude, usual sexp processing in lisp follows
something like a DOM model: 'read' (that's the actual function name)
pull up the whole sexp in memory (yes the whole file, in this case!
there is a way to do partial processing but it's quite advanced and
you need to temporarily rebind the system reader) and then you
process it with your favourite list mangler; too bad you would need a
full lisp environment to do it in pcbnew :P However something like
SAX would be quite easy to implement; events would be 'start of list'
'end of list' 'atom' 'string' 'number' and such
4) Even in lisp *you have parser generators*! it's something like 40
years they've been used for, like, everything. Kicad has something
for keywords but the grammar is still hand coded. That's the major
flaw, IMHO
As for the performance issue I think these are non-existant. It's I/O
code and it's done only once every ten minutes, if you work like me...
probably the kernel works a lot more to handle buffers and schedule the
disk I/O than pcbnew to form or decode the sexps. OTOH Dick is famous to
micromanage performance stuff (like not checking types with dynamic_cast
because it's more expensive than reading the object type tag:P), so that
would be a given :D
--
Lorenzo Marcantonio
CZ Srl - Parma
References