← Back to team overview

kicad-developers team mailing list archive

Re: [rfc] actual sexpression parsing

 

Hi David

This discussion about binary formats was just a side comment from
someone, this is off topic. Please use your energy on something
usefull, like commenting on Roszko's patch instead. Which is about
making am more sane sexpr parser than we have in KiCad at the moment.
Please try to be constructive.

2015-12-20 4:51 GMT+01:00 David Godfrey <info@xxxxxxxxxxx>:
> Hi,
>
> I don't often post here, but I will add my ~30 years of experience with
> binary file formats.
> THEY ARE EVIL EVIL THINGS !!
>
> They tend to be extremely fragile, any change and something is likely to
> break.
> They require parsers that suck to maintain
> In most cases general purpose parsers *CAN NOT* be used, requiring a custom
> parser.
> it is *VERY* difficult to keep compatibility with an older version of the
> format.
> Essentially you need to keep multiple copies of the parser and have some
> form of VersionID that can be used to select between parsers.
> To all intents it is *IMPOSSIBLE* to handle forward compatibility.
> ie: open a file with version 1.1 that was written with version 1.1a
> A permanent and accurate copy of documentation is required to be maintained
> for *EVERY* version of the parser ever written
> Parser VersionID magic must uniquely identify the parser version and be the
> first entry in the file.
> NOTE: other locations are possible, but cause significant problems with
> extracting the version
> The more involved the protocol the worse all of these issues get.
> Documentation is hard to write, and even harder to read and understand
> sufficiently to code against.
> Debugging read/write issues is an absolute nightmare.
> You either have to provide a reference to test against by either
>
> hand encode or decode a file
> have an independent programmer write a "clean room" implementation of the
> parser
> This requires the previously mentioned documentation to be correct for this
> version
> It can introduce additional bugs (you now have 2 parsers to debug)
>
> Many potential developers of addons or core code may
>
> be scared off by the complexity
> accidentally introduce bugs or incompatibilities that don't show up until a
> user looses data
>
> Inadvertently force full or partial Vendor Lock In.
> If a protocol (file format) is
>
> too hard to implement
> too hard to keep up with changes
> too prone to breakage (fragile)
> too obscure (poorly documented or the documentation is not easily available)
>
> Then other vendors won't put in the effort to support the protocol (file
> format)
>
> I could go on with a lot more points but you get the picture.
>
> Binary file formats made a lot of sense back in the days of
>
> low speed serial communications
> very small storage devices
> expensive data links (anyone remember gprs @ 10c per kilo byte?)
>
> Expensive GPRS lead to WAP which was (roughly) a tokenised binary HTML/XML
> It was not widely used due to some of the above problems, and it also scared
> off content developers
>
> memory constrained systems like
>
> Old PC's (486 and earlier)
> old microcontrollers were short on memory and clock speed
> some modern microcontrollers can be affected here, but only the lower end
> ones
>
> slow radio links (bluetooth and wifi are not slow)
>
> They do make sense when used internally to a compression algorithm or an
> encryption scheme.
> They don't make sense in (almost) all other cases today.
> (No flames here please, I am not trolling, just making a very generalized
> statement)
>
>
> For a format like .png binary is not so bad, the format is unlikely to
> change much if ever (even so there are at least 6 different png formats out
> there, some of them ascii and some binary), but it is a simple format to
> describe and implement. Any program that works with png's needs to implement
> support for all versions of png, or clearly explain to the user what is and
> is not supported with appropriate error messages. Any png programs that
> don't support all versions tend to fall into disuse and die a slow death.
>
> Text based formats on the other hand, if well designed, are (to a fair
> extent)
>
> human readable
> self documenting (although good documentation is *ALWAYS* recommended
> general purpose parsers can be used allowing our code to focus on using the
> information, not extracting it
> tolerant of extra nodes (features)
> tolerant of missing nodes
> often a newer version of a parser can parse older versions of a file without
> problem.
> It is trivial for a newer version to cater for specific version based
> variations in a protocol.
> If the protocol is well designed new versions will only add nodes to the
> previous version, never alter the way data is stored.
>
> This allows a newer parser to parse any historical version file.
> In almost all cases an older version parser can parse a newer file, but
>
> any nodes that are unknown will just be skipped.
> Skipped nodes may or may not be a problem for the way a use sees the result,
> but there should be a lockout preventing the file from being saved.
> If the lockout can be overridden, then we should force a filename change
> Ideally any unknown nodes should be retained in the tree and written back
> when the file is saved.
> This can get complicated especially if associated nodes have been modified
>
> Easily debugged as the human readable nature allows direct inspection to
> validate what has been read/written
>
> I have been writing software for PC's and microcontrollers since the early
> 1980's much of the time dealing with communications between different pieces
> of hardware and invariably the binary protocols take up 90% of the debugging
> time and way too much of the documentation and coding time.
>
>
> I'll climb down from my soapbox now, but I hope I have assisted people's
> understanding of Wayne's refusal to use binary formats.
>
> Regards
> David G
>
>
> On 19/12/15 05:46, Wayne Stambaugh wrote:
>
> Mark,
>
> I haven't had time to look at your commit so I'm not going to comment on
> that until I do.  I don't know when I'll have time as I have a lot of
> stuff to do and I'm traveling over the holidays so my review time will
> be limited.  In the mean time I will comment on some of the things I've
> read in this thread.
>
> Binary file formats are not going happen while I'm project leader. I
> have 30 years of less than positive experience with them and I'm not
> about to start using them now.
>
> Be careful with the C++ << operators. They may not do the correct thing
> with floating point numbers. There may also be some overhead that the
> current design does not have.  I will not accept a performance hit on
> file parsing or any significant increase in formatted file size.
>
> On 12/17/2015 10:32 PM, Mark Roszko wrote:
>
> So awhile back, Wayne said to use sexpr for something I wanted to do.
> Then I looked at the sexpr parsing and said NOPE.
>
>
> Why NOPE? Because the current parsing regime is basically Visual Basic
> 6 parser written in a modern language with micro-optimizations meant
> for someone running a Windows 3.1 computer with 5.25" storage drives.
>
>
> This patch is a propsed sexpr parser that parses sepxr like its sexpr
> and not a parenthesis format. Because if you are parsing with things
> like NeedLeft() and NeedRight(), you are parsing it wrong.
>
> Especially when you see something like this:
>
>             case T_descr:
>                 if( sawDesc )
>                     in->Duplicate( tok );
>                 sawDesc = true;
>                 in->NeedSYMBOLorNUMBER();
>                 row.SetDescr( in->FromUTF8() );
>                 break;
>
>
> but then you realize that fp_lib_table has quoted strings for MOST
> descr entries. And you question why the method is called
> NeedSYMBOLorNumber when its a string with quotes. Then you go deeper
> and realize the parser may as well be a csv reader/writer. You either
> parse like sexpressions or you stop calling it sexpressions.
>
>
>
> New proposed system:
> 1. Reads and generates an in memory tree structure of data as it
> should be, i.e. lists, strings, numbers, etc.
>
> I'm ok with the memory tree structure as long as it doesn't add any
> significant parsing or conversion to internal object overhead.
>
> 2. Helpers to pull out each item as need be
>
> OK
>
> 3. Backwards compatible
>
> Backwards compatibility is not optional.
>
> 4. Doesn't do silly keywords micro optimization at compile time. You
> do a string comparison to convert the value to integer anyway, using
> if/else is no different each time. Kicad isn't parsing gigabyte sized
> files nor hundreds of files, this optimization really isn't worth the
> overhead in maintenance.
>
> This change will probably kill your performance as the file format gets
> new features which it will.  I could be wrong but there is not going to
> be much faster token look up than integers.  Before you decide this
> optimization isn't worth it, you need to get an 8 layer+ high density
> board file from someone and do some testing.
>
> 5. Generate saved files from in memory tree structures, this will
> avoid all possible formatting irregularities and differences because
> someone handwrote unrolling all the data members.
>
> I need to see how your doing this because keeping both the memory tree
> and the board objects in memory doesn't make a lot of sense to me.
>
> 6. Avoid things like " ${KIGITHUB}/Air_Coils_SML_NEOSID.pretty" being
> defined as a symbol instead of a string in the future.
> 7. Explicit definitions of symbols and strings. Strings are always
> quoted. Period. No silly auto-detection logic.
>
>
> So my first goal is to have 1:1 parity with the existing stuff for
> kicad files for both reading and writing.
>
>
> Benchmarks:
> Old fp-lib-table read: 1ms
> New fp-lib-table read: 2ms
>
> Here's the actual commit for fp-lib-table:
> https://github.com/marekr/kicad-sexpr/commit/9367f469be69962d14671411eddd6fd759ace1f2
>
> Not expecting anyone to compile it or anything, more input that
> anything. Yes its messy as its an initial proof of concept.
>
>
>
> Yes, string parsing and writing isn't escaping properly, TBD (easy, just
> lazy).
> There would probably be a smaller kicad_sexpr wrapper to implement
> common sexpr pattern helpers such as the "list key-value pair" that's
> used.
>
>
>
> Also played around with .kicad-pcb but its not committed:
> Old read: ~350ms
> New read: ~230ms
>
>
> I await the abuse.
>
> _______________________________________________
> Mailing list: https://launchpad.net/~kicad-developers
> Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~kicad-developers
> More help   : https://help.launchpad.net/ListHelp
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~kicad-developers
> Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~kicad-developers
> More help   : https://help.launchpad.net/ListHelp
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~kicad-developers
> Post to     : kicad-developers@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~kicad-developers
> More help   : https://help.launchpad.net/ListHelp
>


References