kicad-developers team mailing list archive

Re: [RFC] Symbol library file format

To: John Beard <john.j.beard@xxxxxxxxx>
From: Andrew Lutsenko <anlutsenko@xxxxxxxxx>
Date: Thu, 3 Jan 2019 11:51:34 -0800
Cc: KiCad Developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <73BA8178-D5E0-4C13-A4EF-C796A980E8AB@gmail.com>
> The important thing is what is in the file. If nothing else, S-exp is a
> concise way to express this concept during development. Exact format
> representation in disk is, right now, bikeshedding.
The important thing is a clean data model, and a formal IDL like proto helps
with that immensely. If you read through Wayne's google doc comments you
will see how tightly the discussion of the data model is coupled to the file
format, which should not be the case. That does not lend itself well to
extensibility and evolution of the data model, which is a natural part of any
actively developed software.
Thomas pointed out an excellent example of the issues with the current
approach in the other reply. That is very similar to my experience of digging
into pcbnew internals, but it applies to eeschema as well.
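
To make the decoupling concrete, here is a minimal Python sketch (not KiCad
code). The data model is defined once and two interchangeable writers
serialize it. The class names FpText and Point, the field names, and the keyed
s-expression-like output are all assumptions made up for illustration; they
mirror the fp_text example quoted further down in this thread, not any actual
KiCad format.

```python
import json
from dataclasses import dataclass, asdict


# Data model: defined once, with no knowledge of any file syntax.
# (Hypothetical classes, loosely modelled on the fp_text example.)
@dataclass
class Point:
    x: float
    y: float


@dataclass
class FpText:
    name: str
    value: str
    at: Point
    layer: str


def to_json(obj):
    """Writer 1: JSON text."""
    return json.dumps(asdict(obj), sort_keys=True)


def to_sexp(obj):
    """Writer 2: a keyed s-expression-like form (not KiCad's real syntax)."""
    def emit(key, val):
        if isinstance(val, dict):
            inner = " ".join(emit(k, v) for k, v in val.items())
            return f"({key} {inner})"
        if isinstance(val, str):
            return f'({key} "{val}")'
        return f"({key} {val})"
    return emit(type(obj).__name__, asdict(obj))


text = FpText(name="value", value="330K", at=Point(0.0, -1.75), layer="B.Fab")
print(to_json(text))
print(to_sexp(text))
```

Adding a field to FpText changes neither writer, and swapping a writer does
not touch the model, which is the separation being argued for here.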

Imagine if the pcbnew plugin API were proto based. No more SWIG quirks for devs
to take care of, no more plugin breakage on every tiny change in the C++
source, and data exchange between the main app and a plugin is straightforward
because they agree on a data model that has an actual formal definition.

> On the subject of parsers, as opposed to formats, I strongly suggest not
> to use the pcbnew one, as it is highly bound up in the data rather than the
> syntax. The beauty of sexp is the abstract nature, which lends itself well
> to extensible formats. How it is implemented in pcbnew is quite contrary to
> that aim and naturally leads to fragility of the parser (for example
> unexpected, but syntactically valid, fields can be lethal).

Or you could just use a library that takes care of parsing for you, does it
in all major languages for free, serves as the formal definition of your data
model and will not break when you add a field.
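
For comparison, this is roughly what "syntax separate from data" and "does not
break on a new field" look like when done by hand. It is a minimal Python
sketch, not the pcbnew parser and not a protobuf API: parse_sexp only
understands parentheses, atoms and quoted strings, and read_fp_text is a
hypothetical data-layer reader that picks out the keys it knows and skips the
rest. The some_future_field entry is invented to stand in for a field added by
a newer version.

```python
import re

# One token is: "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_sexp(text):
    """Syntax layer only: turn s-expression text into nested Python lists."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        elif tok.startswith('"'):
            # Strip the quotes; escape sequences are left as-is in this sketch.
            stack[-1].append(tok[1:-1])
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def read_fp_text(node):
    """Data layer: read the fields we know about, ignore everything else."""
    # node looks like ['fp_text', kind, text, [key, ...], [key, ...], ...]
    result = {"kind": node[1], "text": node[2]}
    for child in node[3:]:
        if isinstance(child, list) and child:
            if child[0] == "at":
                result["at"] = tuple(float(v) for v in child[1:3])
            elif child[0] == "layer":
                result["layer"] = child[1]
            # Unknown keys fall through here and are skipped, not fatal.
    return result


SAMPLE = '(fp_text value 330K (at 0 -1.75) (layer B.Fab) (some_future_field 42))'
tree = parse_sexp(SAMPLE)
fp = read_fp_text(tree[0])
print(fp)
```

The unknown field survives in the syntax tree and is simply skipped by the
reader. Note that this sketch does not write unknown fields back out, so it
does not address the round-trip concern raised earlier in the thread.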

And to José's point:
> the S expression parser is already implemented and it works fine, it is
> trivial to convert s-expressions to any other data representation you like,
> be it, xml, json or whatever comes up next week in NPM
Yeah, it works fine, but only in C++, and if you want to use it outside of
the KiCad codebase you will spend hours reading code to determine the actual
format. It is trivial to convert, as long as you put in the aforementioned
hours AND nothing ever changes in the format. Oh, and protos were around
before NPM was a thing and will likely outlive it too (at least one of
thrift or proto definitely will).
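
Here is a small Python sketch of where those hours go. The nested list is the
parsed form of the pad example quoted further down in this thread. Dumping it
to JSON is indeed trivial, but the result is still positional: nothing in the
file says what "2", "smd" and "rect" mean. Giving them names needs a table
like POSITIONAL below, and the names in it (number, type, shape) are my
assumption for illustration, not taken from any format document.

```python
import json

# Parsed form of the pad example from this thread (net entry left out).
pad = ["pad", "2", "smd", "rect",
       ["at", "0.95", "0", "180"],
       ["size", "0.7", "1.3"],
       ["layers", "B.Cu", "B.Paste", "B.Mask"]]

# Hypothetical names for positional atoms, keyed by node type.
# This is the knowledge that lives only in source code or docs.
POSITIONAL = {"pad": ["number", "type", "shape"]}


def to_named(node):
    """Convert one node to a dict, naming positional atoms where we can."""
    head, rest = node[0], node[1:]
    names = POSITIONAL.get(head, [])
    atoms = [x for x in rest if not isinstance(x, list)]
    children = [x for x in rest if isinstance(x, list)]
    out = dict(zip(names, atoms))
    extra = atoms[len(names):]
    if extra:
        # No known names for these, so they stay an anonymous list.
        out["values"] = extra
    for child in children:
        # Repeated child keys would overwrite each other in this sketch.
        out[child[0]] = to_named(child)
    return out


generic = json.dumps(pad)   # trivial, but still purely positional
named = to_named(pad)       # needs the POSITIONAL table to be meaningful
print(generic)
print(json.dumps(named, indent=2))
```

Without an entry in the table, "at", "size" and "layers" come out as anonymous
value lists, and any change in atom order in the format silently breaks the
mapping.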

> I think useful comments to the proposed format should see beyond the
> actual low level representation of the data and talk about the overall
> model being used to store it.
I agree completely. Protobufs help with decoupling that.

Regards,
Andrew

On Thu, Jan 3, 2019 at 11:02 AM John Beard <john.j.beard@xxxxxxxxx> wrote:

> I agree. The important thing is what is in the file. If nothing else,
> S-exp is a concise way to express this concept during development. Exact
> format representation in disk is, right now, bikeshedding.
>
> When we get to that stage, all that is required is that the format is VCS
> friendly and human readable. I suggest that we work out what will be
> represented in the v6 files, then work with a preliminary format at first
> while we implement the kicad data structure handling.
>
> Sexp is probably the simpler path as we have both the pseudo-sexp parser
> used in pcbnew, as well as a "real" one for STEP export (I can't check
> details now, I'm not at my computer).
>
> On the subject of parsers, as opposed to formats, I strongly suggest not
> to use the pcbnew one, as it is highly bound up in the data rather than the
> syntax. The beauty of sexp is the abstract nature, which lends itself well
> to extensible formats. How it is implemented in pcbnew is quite contrary to
> that aim and naturally leads to fragility of the parser (for example
> unexpected, but syntactically valid, fields can be lethal).
>
> Whatever the format, we should strive to separate the handling of the
> syntax of the file and the meaning of the data therein. By doing this, the
> format layer can be swapped and tweaked without reference to the data
> model, and the data model can be changed without touching a byte of the
> format parser/writers.
>
> Cheers,
>
> John
>
>
>
>
> On 3 January 2019 18:06:45 GMT, "José Ignacio" <jose.cyborg@xxxxxxxxx>
> wrote:
>>
>> I think all this babble about data representations to be pointless and
>> counterproductive. the S expression parser is already implemented and it
>> works fine, it is trivial to convert s-expressions to any other data
>> representation you like, be it, xml, json or whatever comes up next week in
>> NPM. The issue with the file format is really to come up with a good data
>> model to represent the objects in kicad, and neither protobufs nor any of
>> the other guys really does anything for us in that area, if anything that
>> is the input that needs to be given to whatever parser generator, or
>> manually generated parser process we choose to utilize. I think useful
>> comments to the proposed format should see beyond the actual low level
>> representation of the data and talk about the overall model being used to
>> store it.
>>
>> On Thu, Jan 3, 2019 at 3:37 AM Andrew Lutsenko <anlutsenko@xxxxxxxxx>
>> wrote:
>>
>>> Hi Martijn,
>>> My guess is that most of the complexity is in switching to new data
>>> model in eeschema, which is going to happen anyway. Since with protobufs
>>> you don't concern yourself with parsing data the only tricky thing would be
>>> to seamlessly integrate proto codegen into build process on all platforms.
>>> Since eeschema data model is going to change without backward
>>> compatibility, file compatibility is out of the window too so this is good
>>> opportunity to change underlying format to something that is better in the
>>> long run. If we decide to go this route then ideally pcbnew would switch to
>>> proto at some point too, not necessarily in v6.
>>>
>>> > Would it be feasible to support both in V6 and let the future tell us
>>> which one would prevail?
>>> What do you mean by prevailing? It's clear at this point that
>>> s-expressions are not a winner of global trends while I can guarantee that
>>> sizable portion of the data in the internet (except probably porn, lol) is
>>> exchanged in protobufs and thrift since all tech giants use them.
>>> If you mean in terms of what KiCad users prefer I don't think vast
>>> majority would care either way. Both formats are easy to edit directly and
>>> protos are much better for devs because codegen takes all the routine out
>>> of data manipulation.
>>> While having both formats coexist is theoretically possible I think it
>>> will only split the user base since people will pick one and stick to it. I
>>> think this is the case where devs should be empowered to make the choice,
>>> same as when switch to s-expressions happened.
>>> Of course new version still needs to be able to read old files or at
>>> least some conversion utility needs to be provided to migrate projects to
>>> v6 so s-expressions code will not be going away for some time.
>>>
>>> Andrew
>>>
>>> On Wed, Jan 2, 2019 at 11:53 PM Martijn Kuipers <
>>> martijn.kuipers@xxxxxxxxx> wrote:
>>>
>>>>
>>>>
>>>> On 3 Jan 2019, at 04:17, Andrew Lutsenko <anlutsenko@xxxxxxxxx> wrote:
>>>>
>>>> Wayne,
>>>>
>>>> > There are some interesting and practical concepts with protobuf but
>>>> it's
>>>> functionally a binary storage method which I am opposed to.
>>>>
>>>> That is a somewhat common misconception because protobufs are
>>>> frequently used for efficient storage/transfer in binary format. But it's
>>>> not tied to that format at all; at its core protobufs are just a way of
>>>> defining well structured data, nothing more. It comes with bells and
>>>> whistles like ability to serialize it in various ways, binary being one of
>>>> them.
>>>>
>>>> > Encoding and decoding to a text format would be an acceptable
>>>> solution.
>>>> Perfect, here is example of built in proto text encoder, it resembles
>>>> JSON in that it uses curly braces to encase submessages but doesn't abuse
>>>> punctuation marks unnecessarily.
>>>>
>>>> user_collection {
>>>>   description: "my default users"
>>>>   users {
>>>>     key: "user_1234"
>>>>     value {
>>>>       handle: "winniepoo"
>>>>       paid_membership: true
>>>>     }
>>>>   }
>>>>   users {
>>>>     key: "user_9b27"
>>>>     value {
>>>>       handle: "smokeybear"
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> > There is also the issue of learning curve and another build
>>>> dependency.
>>>> Yes, that is inevitable but benefits I outlined in my earlier email are
>>>> too significant to overlook in my opinion.
>>>>
>>>> > S-expr is not at all like XML at least not in terms of readability.
>>>> It actually is a lot like XML, just less pointy brackets. Same
>>>> arbitrary distinctions between attributes (which are not even named in
>>>> S-expr, which adds to confusion when glancing at the file) and subfields.
>>>>
>>>> Here is real example:
>>>>     (fp_text value 330K (at 0 -1.75) (layer B.Fab)
>>>>       (effects (font (size 1 1) (thickness 0.15)) (justify mirror))
>>>>     )
>>>> or
>>>>     (pad 2 smd rect (at 0.95 0 180) (size 0.7 1.3) (layers B.Cu B.Paste
>>>> B.Mask)
>>>>       (net 95 "Net-(R17-Pad2)"))
>>>>
>>>> Without reading file format docs and/or source code I have no idea why
>>>> some data is in subfields and some data is in some special fixed order on
>>>> the same level as its container.
>>>> It's also very easy to confuse "330K" being related to "value" when in
>>>> fact "value" is field name and "330K" is the value. In proto that would
>>>> look like this:
>>>>
>>>> fp_text {
>>>>   name: "value"
>>>>   value: "330K"
>>>>   at {
>>>>     x: 0
>>>>     y: -1.75
>>>>   }
>>>>   layer: "B.Fab"
>>>>   effects {
>>>>     ...
>>>>   }
>>>> }
>>>>
>>>> But I do agree on the point that all markup formats have their
>>>> downsides. For example json/yaml or proto text format would be less dense
>>>> per line, however that may be an upside when looked at it from version
>>>> control and diff-ing perspective.
>>>>
>>>> > To me self documenting means that the file format doesn't even
>>>> require a
>>>> > document to explain its contents.  It should be self evident from the
>>>> > contents of the file.
>>>> What I meant by self documenting was that the document describing the
>>>> format and document implementing the format (actual source code) was the
>>>> same.
>>>> But even as evident from example above, proto text format yields file
>>>> contents that are pretty well self documented too.
>>>>
>>>> > Changing file formats would be a
>>>> > benefit but why would we need to do that?  If we have a human readable
>>>> > file format that can be parsed easily and quickly by a computer, what
>>>> > other criteria do we need in a file format?
>>>>
>>>> See benefit #3 from my first message. By using something standard we
>>>> make it so much easier to expand on KiCad ecosystem in other languages. I
>>>> already made an example as a web viewer, here is another: someone may write
>>>> a plugin for java autorouter software that will read kicad files directly.
>>>> S-expressions was likely a good choice when it was made but today it's far
>>>> from widespread and is pretty much unsupported in most languages so the
>>>> burden is wholly on the developers. Argument can be made that while
>>>> S-expressions are both human and computer readable it excels at neither
>>>> since humans still need supporting documentation or source and computers
>>>> need custom written libraries.
>>>>
>>>> Regards,
>>>> Andrew
>>>>
>>>> and Happy New Year :)
>>>>
>>>> On Wed, Jan 2, 2019 at 8:01 AM Wayne Stambaugh <stambaughw@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> On 1/2/2019 5:24 AM, kristoffer Ödmark wrote:
>>>>> > I like the idea of using something as Protobuf and I agree fully with
>>>>> > the benefits, especially since one can add/remove fields with minimal
>>>>> > impact.
>>>>>
>>>>> There are some interesting and practical concepts with protobuf but
>>>>> it's
>>>>> functionally a binary storage method which I am opposed to.  Encoding
>>>>> and decoding to a text format would be an acceptable solution.  There
>>>>> is
>>>>> also the issue of learning curve and another build dependency.
>>>>>
>>>>> >
>>>>> > Basically the S-expression system used now is looking very much like
>>>>> a
>>>>> > reinvented XML to me anyway, and storing protobuf-defined stuff as
>>>>> XML
>>>>> > or similar seems actually nice.
>>>>>
>>>>> S-expr is not at all like XML at least not in terms of readability.
>>>>> Obviously there are an infinite number of ways to store information.  I
>>>>> do find it amusing and somewhat telling that there are so many markdown
>>>>> formats available these days.  I think the jury has spoken on the
>>>>> readability of markup formats.
>>>>>
>>>>> > There is one catch, and that is that we have to support opening a
>>>>> newer
>>>>> > file, in an old software, and then store it again, without losing
>>>>> data
>>>>> > that the software is not aware of. Or implement a way of not being
>>>>> able
>>>>> > to store values in older software, when they open something newer.
>>>>>
>>>>> This is the reason that we have not implemented this in our own file
>>>>> formats.  I don't see anyone who would be happy about someone losing
>>>>> information by saving a board file with an older version of KiCad.  We
>>>>> could always warn users when saving with a version of kicad that is
>>>>> older than the file format but even that may cause unexpected loss of
>>>>> data.
>>>>>
>>>>> >
>>>>> > There is also a middle way here, and that is to actually implement a
>>>>> > Protobuf to S-Expression decoder/encoder, with the real benefit of
>>>>> > actually defining fields in a modern well-known way, where the
>>>>> > specification and implementation does not have to manually be
>>>>> synced
>>>>> > in code, comments, and a google doc. I have yet to see anything
>>>>> actually
>>>>> > stay synchronized in such a manner over time, and many bugs manifest
>>>>> > themselves in these synchronization attempts. Anyway to avoid having to
>>>>> > change the file-format another time, or add extra files to the side,
>>>>> I
>>>>> > think that using an IDF is great next-step, mostly since the tooling,
>>>>> > libraries and workflows for these are better defined.
>>>>>
>>>>> To me self documenting means that the file format doesn't even require
>>>>> a
>>>>> document to explain its contents.  It should be self evident from the
>>>>> contents of the file.  If it isn't, you've done something wrong.  The
>>>>> only reason I published the file format is so I can get everyone's
>>>>> input
>>>>> to make sure we have everything we need for the new features we plan to
>>>>> implement during v6.  I expect over time that this document will not be
>>>>> kept up to date even though it probably should be.
>>>>>
>>>>> Writing an s-expr encoder and decoder is not likely to be a trivial
>>>>> task
>>>>> so finding someone who has the time to implement it for an IDF is
>>>>> probably low.
>>>>>
>>>>> >
>>>>> > But to be honest, I have a hard time understanding why we have to
>>>>> stick
>>>>> > to the KiCad S-Expression, when there are quite readable text-formats
>>>>> > that are widely supported already.
>>>>> >
>>>>> > I know the requirement for the file format is readability, but I have
>>>>> > yet to find an editor that actually understands the KiCad
>>>>> S-Expression
>>>>> > (I have not searched extensively), but JSON,XML,YAML are usually read
>>>>> > just fine, with syntax highlighting out of box. And an IDF would make
>>>>> > these discussions quite redundant, since changing file formats would
>>>>> be
>>>>> > a minimal change in code, and not as now, where it is actually quite
>>>>> > time-consuming.
>>>>>
>>>>> I wouldn't be opposed to JSON although I still think that it is more
>>>>> verbose than necessary.  XML was rejected by the project a long time ago
>>>>> and I've seen nothing to change my mind about that.  I am not familiar
>>>>> with YAML.
>>>>>
>>>>> I doubt using an IDF will make these discussions redundant because
>>>>> there
>>>>> will always be disagreements about file formatting regardless of how
>>>>> the information is defined internally.  Changing file formats would be
>>>>> a
>>>>> benefit but why would we need to do that?  If we have a human readable
>>>>> file format that can be parsed easily and quickly by a computer, what
>>>>> other criteria do we need in a file format?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Wayne
>>>>>
>>>>> >
>>>>> > - Kristoffer
>>>>> >
>>>>> > On 2019-01-02 01:37, Andrew Lutsenko wrote:
>>>>> >> Hi Wayne,
>>>>> >>
>>>>> >> I would like to take this opportunity to do an elevator pitch for
>>>>> idea
>>>>> >> of using one of IDL languages widely accepted in the industry like
>>>>> >> Apache Thrift or Google Protobufs to define formats in KiCad.
>>>>> >> There are few large benefits in favor of using such languages:
>>>>> >>
>>>>> >> 1. They are self documenting. No more keeping a google doc in sync
>>>>> >> with sources.
>>>>> >> 2. They are easily extensible. Just add a field, old parsers will
>>>>> >> ignore it, new ones will pick it up. Need to deprecate a field? Just
>>>>> >> add its ID to reserved list to never reuse it again.
>>>>> >> 3. They have code generators for pretty much all commonly used
>>>>> >> languages. That means anyone can pick KiCad file and just parse it
>>>>> in
>>>>> >> Java/Go/Haskell or whatever language they fancy without porting over
>>>>> >> s-expressions library or meticulously studying the file format doc.
>>>>> >> This opens lots of possibilities for third party tools to be added
>>>>> to
>>>>> >> KiCad ecosystem. Writing a web viewer for schematic/pcb would be a
>>>>> >> piece of cake for example.
>>>>> >>
>>>>> >> Other probably less impactful benefits:
>>>>> >> 4. Easy to serialize/encode in multiple formats. Need to send data
>>>>> >> over network in compact form? No problem, just serialize using
>>>>> compact
>>>>> >> binary protocol. Need to store in text file? just use text encoder.
>>>>> >> 5. Code generators will reduce amount of boilerplate in KiCad
>>>>> source.
>>>>> >> Only actual application logic needs to be added on top of generated
>>>>> >> data objects.
>>>>> >>
>>>>> >> I have personally worked extensively with both Thrift and
>>>>> Protobufs, I
>>>>> >> think for the KiCad use case proto is a better fit. Thrift has a lot more
>>>>> >> library support for client/server RPC logic and defining RPCs is
>>>>> core
>>>>> >> part of the language but we don't need any of that (at least for
>>>>> now).
>>>>> >> Proto has all of that as extensions but its core is just definition
>>>>> >> of data types and it has better support for plain text format.
>>>>> >> Here are docs for both:
>>>>> >> https://developers.google.com/protocol-buffers/
>>>>> >> https://thrift.apache.org/tutorial/
>>>>> >>
>>>>> >> Let me know if any of that sounds interesting and if you have any
>>>>> >> questions. I think this is worth investing time into and I'm willing
>>>>> >> to help if needed.
>>>>> >>
>>>>> >> Regards,
>>>>> >> Andrew
>>>>> >>
>>>>> >> On Tue, Jan 1, 2019 at 11:59 AM Wayne Stambaugh <
>>>>> stambaughw@xxxxxxxxx
>>>>> >> <mailto:stambaughw@xxxxxxxxx>> wrote:
>>>>> >>
>>>>> >>     I have updated and published the symbol file format[1] for
>>>>> comment.
>>>>> >>     Hopefully there isn't too much to change.  The only thing to
>>>>> really
>>>>> >>     finalize is the internal units.  The initial concept was
>>>>> unitless but
>>>>> >>     the more I think about it and discuss with other developers, it
>>>>> makes
>>>>> >>     more sense to use units for the following reasons:
>>>>> >>
>>>>> >>     1. It's easier to visualize in your head how the symbols on a
>>>>> >>     given page
>>>>> >>     size will layout.
>>>>> >>
>>>>> >>     2. Converting from other file formats (Eagle, Altium, etc) will
>>>>> be
>>>>> >>     easier since most if not all of them have a defined unit.
>>>>> >>
>>>>> >>     I'm thinking 10u (or possibly 100u) will make a good internal
>>>>> units
>>>>> >>     value.  Once we nail down the units, I will update the file
>>>>> format
>>>>> >>     document accordingly.
>>>>> >>
>>>>> >>     Please keep in mind that this is the symbol library file format
>>>>> >>     document
>>>>> >>     so things like constraints belong in the schematic file
>>>>> format.  I
>>>>> >>     will
>>>>> >>     be posting the schematic file format as soon as I finish
>>>>> updating it.
>>>>> >>
>>>>> >>     Cheers,
>>>>> >>
>>>>> >>     Wayne
>>>>> >>
>>>>> >>     [1]:
>>>>> >>
>>>>> >>
>>>>> https://docs.google.com/document/d/1lyL_8FWZRouMkwqLiIt84rd2Htg4v1vz8_2MzRKHRkc/edit
>>>>> >>
>>>>>
>>>>>
>>>> Although I cannot speak for the main developers at all, I remembered
>>>> when S-expressions were introduced (it seems only yesterday) and it was the
>>>> start of a big leap forward for the project (not all related to the
>>>> format). I also remember the huge amount of work put in by Dick, Wayne and
>>>> others (sorry if I did not mention anyone specifically, too lazy to look it
>>>> up).
>>>>
>>>> I perfectly understand that changing file-format is not an easy
>>>> decision. However, proto-bufs do look really clean, but I have no clue on
>>>> the amount of effort it would take to implement. I have never worked with
>>>> it before (will remember it for my next project, though).
>>>>
>>>> Andrew, how easy would it be? Would it be feasible to support both in
>>>> V6 and let the future tell us which one would prevail?
>>>>
>>>> Happy NewYear to all,
>>>> Martijn
>>>>
>>