← Back to team overview

kicad-developers team mailing list archive

Re: [rfc] actual sexpression parsing

 

On Thu, Dec 17, 2015 at 10:32:10PM -0500, Mark Roszko wrote:
> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.

OK, being a lisper and having read the dragon book here's my view on the
subject:

1) the parser isn't actually bad, it's mostly a recursive descent i.e.
   it expects the *actual* grammar to be generated. You take the BNF
   form, process it and write the corresponding state machine. It's not
   VB6, it's a state machine :D

   So you tailor a parser for a specific grammar; if you do it by hand
   good luck if the grammar changes (there are tools like yacc that
   generate parsers but that's beyond the scope)

2) Given the point before it doesn't actually do sexps because it needs
   the type in the grammar beforehand. Given:

   (module smd:C0603 (layer F.Cu) (tedit 52108AB5) (tstamp 558BEA9B) ...)

   The parser *must* know that module, layer, tedit and tstamp are
   atoms, and where are strings and hex values placed. So the grammar
   should be (ignoring the fact that tedit and tstamp forms are
   optional):

   (module <string> (layer <string>) (tedit <hex>) (tstamp <hex>) ...)

   Any true sexp form has types self-evident (exactly how depends on the
   lisp flavour you're using, in common lisp reader macros define the
   behaviour), so the reader (the tokenizer in two piece parsers) can
   parse it WITHOUT knowing the grammar beforehand. The common lisp way
   (actually any lisp except for the hex numbers) would be:

   (module "smd:C0603" (layer "F.Cu") (tedit #x52108AB5) (tstamp #x558BEA9B) ...)

   ...actually I adapted the pcbnew code (it's about a ten line diff:P)
   to emit this

   (module "smd:C0603" (layer "F.Cu") (tedit 52108AB5) (tstamp 558BEA9B) ...)

   ...which is fully backward compatible with 'official' kicad parser.
   Just quote every string; it also get correctly font locked in emacs :D
   Hex values are only used for these fields AFAIK, I didn't special
   cased them in kicad but in lisp...

3) Using an XML similitude, usual sexp processing in lisp follows
   something like a DOM model: 'read' (that's the actual function name)
   pull up the whole sexp in memory (yes the whole file, in this case!
   there is a way to do partial processing but it's quite advanced and
   you need to temporarily rebind the system reader) and then you
   process it with your favourite list mangler; too bad you would need a
   full lisp environment to do it in pcbnew :P However something like
   SAX would be quite easy to implement; events would be 'start of list'
   'end of list' 'atom' 'string' 'number' and such

4) Even in lisp *you have parser generators*! it's something like 40
   years they've been used for, like, everything. Kicad has something
   for keywords but the grammar is still hand coded. That's the major
   flaw, IMHO

As for the performance issue I think these are non-existant. It's I/O
code and it's done only once every ten minutes, if you work like me...
probably the kernel works a lot more to handle buffers and schedule the
disk I/O than pcbnew to form or decode the sexps. OTOH Dick is famous to
micromanage performance stuff (like not checking types with dynamic_cast
because it's more expensive than reading the object type tag:P), so that
would be a given :D

-- 
Lorenzo Marcantonio
CZ Srl - Parma


References