kicad-developers team mailing list archive

Re: [rfc] actual sexpression parsing

To: Kicad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
From: Lorenzo Marcantonio <l.marcantonio@xxxxxxxxxxxxx>
Date: Fri, 18 Dec 2015 14:17:47 +0100
In-reply-to: <CAJjB1qJWb6ZFitkRP9UUq-LjsM-YtuT6mrRJTwhaW_w1fa6usQ@mail.gmail.com>
Mail-followup-to: Kicad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.24 (2015-08-30)

On Fri, Dec 18, 2015 at 07:55:07AM -0500, Mark Roszko wrote:
> Writing a new state machine for every single list and every single
> file over and over again is the part I have problems with. There
> should be a single state machine that takes the tokens and gives you a
> list. Not 500 over the whole codebase.

Also, you may want to re-read the part about parser generation. And think
about grammar changes...

> The definition of sanity is not splitting it into many data forms.

It's nonetheless a 'curious' engineering approach :D

> >3) Using an XML similitude, usual sexp processing in lisp follows
> >  something like a DOM model
> 
> Yea that was the plan when I structured my end result. Walking it
> later is trivial.

I'd suggest using a proper list/vector container instead of the cons
approach (it was meant to be a joke). Cons handling is trickier without
the lisp runtime at hand :D

In pseudo-BNF

list :- sequence-of list-element
list-element :- one-of(symbol, string, number, whatever, list)

The sequence-of could be a vector of base pointers filled with push_back;
the one-of is obviously modeled with inheritance (if it were C a union
would be fine...). As for the lexing strategy: the traditional lisp reader
has *no* lookahead and dispatches on the first character:

- '(' starts a list
- [0123456789.+-] starts a number
- '"' starts a string
- a letter starts a symbol
- whitespace is eaten
- other characters trigger specific behaviour (like the '#' main macro
  character)
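
To make the above concrete, here is a minimal C++ sketch of such a reader,
not a proposal for the actual code. All the names (Node, Symbol, String,
Number, List, Reader) are made up for illustration; it assumes numbers are
stored as double, the whole input is already in a std::string, and macro
characters like '#' are simply rejected.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node hierarchy: the "one-of" is modeled with inheritance.
struct Node   { virtual ~Node() = default; };
struct Symbol : Node { std::string name; };
struct String : Node { std::string text; };
struct Number : Node { double value = 0.0; };
// The "sequence-of" is a vector of base pointers filled with push_back.
struct List   : Node { std::vector<std::unique_ptr<Node>> items; };

class Reader {
public:
    explicit Reader(std::string src) : s_(std::move(src)) {}

    // Read one element, deciding what it is from its first character alone.
    std::unique_ptr<Node> read() {
        skipSpace();                                   // whitespace is eaten
        if (atEnd()) throw std::runtime_error("unexpected end of input");
        const char c = s_[pos_];
        if (c == '(') return readList();
        if (c == '"') return readString();
        if (std::isdigit(uc(c)) || c == '.' || c == '+' || c == '-') return readNumber();
        if (std::isalpha(uc(c))) return readSymbol();
        throw std::runtime_error(std::string("unhandled character: ") + c);
    }

private:
    static unsigned char uc(char c) { return static_cast<unsigned char>(c); }
    bool atEnd() const { return pos_ >= s_.size(); }
    void skipSpace() { while (!atEnd() && std::isspace(uc(s_[pos_]))) ++pos_; }

    // Collect characters up to the next delimiter (whitespace or a paren).
    std::string token() {
        const std::size_t start = pos_;
        while (!atEnd() && !std::isspace(uc(s_[pos_])) &&
               s_[pos_] != '(' && s_[pos_] != ')') ++pos_;
        return s_.substr(start, pos_ - start);
    }

    std::unique_ptr<Node> readList() {
        assert(s_[pos_] == '(');                       // guaranteed by the dispatch
        ++pos_;
        auto list = std::make_unique<List>();
        for (;;) {
            skipSpace();
            if (atEnd()) throw std::runtime_error("unterminated list");
            if (s_[pos_] == ')') { ++pos_; return list; }
            list->items.push_back(read());             // recurse for each element
        }
    }

    std::unique_ptr<Node> readString() {
        ++pos_;                                        // skip the opening quote
        auto str = std::make_unique<String>();
        while (!atEnd() && s_[pos_] != '"') {
            if (s_[pos_] == '\\' && pos_ + 1 < s_.size()) ++pos_;  // take escaped char as-is
            str->text += s_[pos_++];
        }
        if (atEnd()) throw std::runtime_error("unterminated string");
        ++pos_;                                        // skip the closing quote
        return str;
    }

    // Strict in this sketch: a lone "+" or "-" is an error, not a symbol.
    std::unique_ptr<Node> readNumber() {
        const std::string tok = token();
        std::size_t used = 0;
        auto num = std::make_unique<Number>();
        try { num->value = std::stod(tok, &used); }
        catch (const std::logic_error&) { used = 0; }
        if (used == 0 || used != tok.size()) throw std::runtime_error("bad number: " + tok);
        return num;
    }

    std::unique_ptr<Node> readSymbol() {
        auto sym = std::make_unique<Symbol>();
        sym->name = token();
        return sym;
    }

    std::string s_;
    std::size_t pos_ = 0;
};
```

Reader("(module R1 (at 1.5 -2))").read() would hand back a List whose third
item is itself a List; the consumer uses dynamic_cast (or a virtual visitor,
if preferred) to find out what each element is.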

*if* you want to keep string quoting optional then you can't distinguish
a string from a symbol (because that depends on the semantic grammar,
which the reader doesn't have access to). Then you have to match keywords
as strings; not elegant but doable.
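
A rough sketch of what matching keywords as strings could look like in C++.
Everything here is hypothetical: the Atom struct, the Keyword enum and its
three entries are placeholders, and treating a quoted atom as never being a
keyword is my assumption, not something the format mandates.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical atom: the reader cannot tell a symbol from an unquoted
// string, so it keeps the text and only remembers whether it was quoted.
struct Atom {
    std::string text;
    bool quoted = false;
};

// Placeholder keyword set; a real grammar would list its own.
enum class Keyword { None, Module, Layer, At };

// Called by the code that knows the grammar, at a position where a keyword
// may appear. Returns Keyword::None when the atom is plain data.
Keyword toKeyword(const Atom& a) {
    assert(a.quoted || !a.text.empty());   // an unquoted atom is never empty
    static const std::unordered_map<std::string, Keyword> table = {
        {"module", Keyword::Module},
        {"layer",  Keyword::Layer},
        {"at",     Keyword::At},
    };
    if (a.quoted) return Keyword::None;    // assumption: quoted text is always data
    const auto it = table.find(a.text);
    return it == table.end() ? Keyword::None : it->second;
}
```

The point is that the decision moves out of the reader: it only produces
atoms, and whoever walks the tree decides per position whether the text is a
keyword or a value.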

> I'm more for manual walking of the lists after the fact than trying to
> use an event based one. I don't see a benefit really and rather see it
> increase complexity with needing callback classes when manual
> unrolling should work fairly well BUT i am not exactly happy with
> manual unrolling looks so its something to play with.

Given the relatively low amount of data to process, a DOM approach is
quite feasible. Keep an iterator on the current list handy and loop
away. There are plenty of matching/binding/unifying/destructuring
methods to use when you have the whole list already in core. Personally
I would use a recursive descent driven by the tree elements (*not*
directly by the input file, as it is now); it should be the easiest to
do by hand.
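
Something along these lines, as a sketch only. The Node here is a
simplified stand-in (one struct instead of a class hierarchy, atoms kept as
text), and Pad, parsePad, parseAt, makeAtom and makeList are invented names;
the "pad"/"at" shapes are just an example grammar, not the real file format.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the in-memory tree: either an atom or a list.
struct Node {
    bool isList = false;
    std::string atom;          // used when !isList
    std::vector<Node> items;   // used when isList
};

Node makeAtom(std::string text) { Node n; n.atom = std::move(text); return n; }
Node makeList(std::vector<Node> items) {
    Node n; n.isList = true; n.items = std::move(items); return n;
}

// The first element of a list names what the list is.
const std::string& head(const Node& list) {
    if (!list.isList || list.items.empty() || list.items.front().isList)
        throw std::runtime_error("expected (keyword ...)");
    return list.items.front().atom;
}

struct Pad { std::string name; double x = 0.0, y = 0.0; };

// (at X Y)
void parseAt(const Node& list, Pad& pad) {
    assert(head(list) == "at");                 // the caller already dispatched on it
    if (list.items.size() != 3) throw std::runtime_error("at: expected two numbers");
    pad.x = std::stod(list.items[1].atom);
    pad.y = std::stod(list.items[2].atom);
}

// (pad NAME sublist...), one function per grammar rule, driven by the tree.
Pad parsePad(const Node& list) {
    if (head(list) != "pad") throw std::runtime_error("expected (pad ...)");
    Pad pad;
    auto it = list.items.begin() + 1;           // iterator on the current list
    if (it == list.items.end() || it->isList) throw std::runtime_error("pad: missing name");
    pad.name = it->atom;
    for (++it; it != list.items.end(); ++it) {
        if (!it->isList) throw std::runtime_error("pad: unexpected atom " + it->atom);
        if (head(*it) == "at") parseAt(*it, pad);
        // unknown sublists are silently skipped in this sketch
    }
    return pad;
}
```

Since the tree is already complete, a grammar change usually means touching
one of these small functions rather than the lexer or the reader.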

-- 
Lorenzo Marcantonio
CZ Srl - Parma
