kicad-developers team mailing list archive

Re: [RFC] Symbol library file format

To: Andrew Lutsenko <anlutsenko@xxxxxxxxx>
From: Martijn Kuipers <martijn.kuipers@xxxxxxxxx>
Date: Thu, 3 Jan 2019 07:53:39 +0000
Cc: KiCad developers <kicad-developers@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <CADn3vW1dN-UZ6fCyPguygFKX6uCLm2LK+aytBiU8Hg_K01U49w@mail.gmail.com>

> On 3 Jan 2019, at 04:17, Andrew Lutsenko <anlutsenko@xxxxxxxxx> wrote:
> 
> Wayne,
> 
> > There are some interesting and practical concepts with protobuf but it's
> functionally a binary storage method which I am opposed to. 
> 
> That is a somewhat common misconception, because protobufs are frequently used for efficient storage and transfer in binary format. But they are not tied to that format at all: at its core, protobuf is just a way of defining well-structured data, nothing more. It comes with bells and whistles such as the ability to serialize it in various ways, binary being one of them.
> 
> > Encoding and decoding to a text format would be an acceptable solution. 
> Perfect, here is an example of the built-in proto text encoder's output. It resembles JSON in that it uses curly braces to enclose submessages, but it doesn't overuse punctuation marks.
> 
> user_collection {
>   description: "my default users"
>   users {
>     key: "user_1234"
>     value {
>       handle: "winniepoo"
>       paid_membership: true
>     }
>   }
>   users {
>     key: "user_9b27"
>     value {
>       handle: "smokeybear"
>     }
>   }
> }
> > There is also the issue of learning curve and another build dependency.
> Yes, that is inevitable, but in my opinion the benefits I outlined in my earlier email are too significant to overlook.
> 
> > S-expr is not at all like XML at least not in terms of readability.
> It actually is a lot like XML, just with fewer pointy brackets. It has the same arbitrary distinction between attributes (which are not even named in S-expr, adding to the confusion when glancing at the file) and subfields.
> 
> Here is a real example:
>     (fp_text value 330K (at 0 -1.75) (layer B.Fab)
>       (effects (font (size 1 1) (thickness 0.15)) (justify mirror))
>     )
> or
>     (pad 2 smd rect (at 0.95 0 180) (size 0.7 1.3) (layers B.Cu B.Paste B.Mask)
>       (net 95 "Net-(R17-Pad2)"))
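For illustration, here is a minimal Python sketch of a reader for the subset of S-expression syntax in the two snippets above (bare atoms, double-quoted strings, nested parentheses). It is not KiCad's parser, and escape sequences inside quoted strings are deliberately not handled.

```python
import re

# One token: "(", ")", a double-quoted string, or a bare atom (leading whitespace skipped).
TOKEN_RE = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


def tokenize(text):
    """Split S-expression text into ('open' | 'close' | 'atom', value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        lparen, rparen, quoted, bare = m.groups()
        if lparen:
            tokens.append(("open", None))
        elif rparen:
            tokens.append(("close", None))
        else:
            tokens.append(("atom", quoted if quoted is not None else bare))
        pos = m.end()
    return tokens


def parse(text):
    """Parse exactly one S-expression into nested Python lists of strings."""
    stack = [[]]
    for kind, value in tokenize(text):
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(value)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one complete expression")
    return stack[0][0]


print(parse('(fp_text value 330K (at 0 -1.75) (layer B.Fab))'))
```

The parser recovers the nesting, but nothing in its output says that `value` is a field name and `330K` its content; that knowledge has to come from the format documentation or the source code, which is the point being made here.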
> 
> Without reading the file format docs and/or source code, I have no idea why some data is in subfields and some data is in a special fixed order on the same level as its container.
> It's also very easy to misread "330K" as being related to "value", when in fact "value" is the field name and "330K" is the value. In proto, that would look like this:
> 
> fp_text {
>   name: "value"
>   value: "330K"
>   at {
>     x: 0
>     y: -1.75
>   }
>   layer: "B.Fab"
>   effects {
>     ...
>   }
> }
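The reshaping described above can be sketched in Python as a function that names the positional items of an already-parsed `fp_text` node, plus a helper that renders the result in a simplified, protobuf-text-like style. The field names (`name`, `value`, `x`, `y`) come from the proto example; treating the first two atoms as the text type and the text itself is an assumption based on this single example, not on the KiCad specification, and `to_text` is a hypothetical helper, not protobuf's own text encoder.

```python
def fp_text_to_named(node):
    """Convert ['fp_text', kind, text, [sub...], ...] into a dict with named fields."""
    if not node or node[0] != "fp_text":
        raise ValueError("not an fp_text node")
    # Positional part: the two bare atoms after the keyword (assumed layout).
    result = {"name": node[1], "value": node[2]}
    for child in node[3:]:
        if not isinstance(child, list) or not child:
            raise ValueError(f"unexpected item {child!r}")
        key, args = child[0], child[1:]
        if key == "at":
            result["at"] = {"x": float(args[0]), "y": float(args[1])}
        elif key == "layer":
            result["layer"] = args[0]
        else:
            result[key] = args  # e.g. 'effects', kept as a raw list
    return result


def to_text(msg, indent=0):
    """Render nested dicts of strings and numbers in a protobuf-text-like style."""
    pad = "  " * indent
    lines = []
    for key, val in msg.items():
        if isinstance(val, dict):
            lines.append(f"{pad}{key} {{")
            lines.append(to_text(val, indent + 1))
            lines.append(f"{pad}}}")
        elif isinstance(val, str):
            lines.append(f'{pad}{key}: "{val}"')
        elif isinstance(val, (int, float)):
            lines.append(f"{pad}{key}: {val}")
        else:
            raise TypeError(f"{key}: lists are not handled in this sketch")
    return "\n".join(lines)


node = ["fp_text", "value", "330K", ["at", "0", "-1.75"], ["layer", "B.Fab"]]
print(to_text(fp_text_to_named(node)))
```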
> 
> But I do agree that all markup formats have their downsides. For example, JSON/YAML or proto text format would be less dense per line; however, that may be an upside from a version control and diffing perspective.
> 
> > To me self documenting means that the file format doesn't even require a
> > document to explain its contents.  It should be self-evident from the
> > contents of the file. 
> What I meant by self-documenting was that the document describing the format and the document implementing the format (the actual source code) would be one and the same.
> But as is evident from the example above, the proto text format yields file contents that are pretty well self-documenting too.
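A rough Python analogue of "the schema is the documentation": each field carries its own description, and the reference text is generated from the definitions, so the two cannot drift apart. The `Pin` record is hypothetical, and dataclass metadata merely stands in for the field definitions and comments a `.proto` file would hold.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Pin:
    """Hypothetical symbol pin record; descriptions live next to the fields."""
    number: str = field(metadata={"doc": "Pin number as printed on the package"})
    name: str = field(metadata={"doc": "Electrical pin name"})
    length: int = field(default=0, metadata={"doc": "Pin length in internal units"})


def describe(cls) -> str:
    """Generate a plain-text field reference directly from the schema."""
    lines = [cls.__name__]
    for f in fields(cls):
        type_name = f.type if isinstance(f.type, str) else f.type.__name__
        lines.append(f"  {f.name} ({type_name}): {f.metadata.get('doc', '')}")
    return "\n".join(lines)


print(describe(Pin))
```

Adding or renaming a field changes the generated reference automatically, which is the property the argument above relies on.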
> 
> > Changing file formats would be a
> > benefit but why would we need to do that?  If we have a human readable
> > file format that can be parsed easily and quickly by a computer, what
> > other criteria do we need in a file format?
> 
> See benefit #3 from my first message. By using something standard, we make it much easier to expand the KiCad ecosystem in other languages. I already gave a web viewer as an example; here is another: someone may write a plugin for Java autorouter software that reads KiCad files directly. S-expressions were likely a good choice when they were adopted, but today they are far from widespread and are pretty much unsupported in most languages, so the burden falls wholly on developers. An argument can be made that, while S-expressions are both human- and computer-readable, they excel at neither, since humans still need supporting documentation or source code and computers need custom-written libraries.
> 
> Regards,
> Andrew
> 
> and Happy New Year :)
> 
> On Wed, Jan 2, 2019 at 8:01 AM Wayne Stambaugh <stambaughw@xxxxxxxxx> wrote:
> On 1/2/2019 5:24 AM, kristoffer Ödmark wrote:
> > I like the idea of using something as Protobuf and I agree fully with
> > the benefits, especially since one can add/remove fields with minimal
> > impact.
> 
> There are some interesting and practical concepts with protobuf but it's
> functionally a binary storage method which I am opposed to.  Encoding
> and decoding to a text format would be an acceptable solution.  There is
> also the issue of learning curve and another build dependency.
> 
> > 
> > Basically the S-expression system used now is looking very much like a
> > reinvented XML to me anyway, and storing protobuf-defined stuff as XML
> > or similar seems actually nice.
> 
> S-expr is not at all like XML at least not in terms of readability.
> Obviously there are an infinite number of ways to store information.  I
> do find it amusing and somewhat telling that there are so many markdown
> formats available these days.  I think the jury has spoken on the
> readability of markup formats.
> 
> > There is one catch, and that is that we have to support opening a newer
> > file in older software and then saving it again without losing data
> > that the software is not aware of. Or we implement a way to prevent
> > older software from saving values when it opens something newer.
> 
> This is the reason that we have not implemented this in our own file
> formats.  I don't see anyone who would be happy about losing
> information by saving a board file with an older version of KiCad.  We
> could always warn users when saving with a version of KiCad that is
> older than the file format, but even that may cause unexpected loss of data.
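One common way to approach the round-trip concern is for the older reader to keep every item it does not recognise and write it back unchanged. The Python sketch below assumes records are already parsed into nested lists (as an S-expression reader would produce); the `KNOWN_KEYS` set and the `symbol` record are hypothetical. It only preserves data the old version does not need to understand; it does not solve the case where a newer field changes the meaning of an older one.

```python
# Hypothetical: the sub-list keywords this "older" version understands.
KNOWN_KEYS = {"property", "pin"}


def load(record):
    """Split a parsed record into known items and opaque unknown items."""
    known, unknown = [], []
    for item in record[1:]:
        if isinstance(item, list) and item and item[0] in KNOWN_KEYS:
            known.append(item)
        else:
            unknown.append(item)  # kept verbatim, never interpreted
    return {"tag": record[0], "known": known, "unknown": unknown}


def save(model):
    """Re-emit known items (possibly edited) followed by the untouched unknown items.

    Note: unknown items move to the end, so the original item order is not kept.
    """
    return [model["tag"], *model["known"], *model["unknown"]]


record = ["symbol", ["property", "Reference", "R"], ["new_field", "x"], ["pin", "1"]]
print(save(load(record)))
```

A real implementation would probably also need to remember where each unknown item appeared if item order matters to the format.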
> 
> > 
> > There is also a middle way here, and that is to actually implement a
> > Protobuf to S-expression decoder/encoder, with the real benefit of
> > defining fields in a modern, well-known way, where the
> > specification and implementation do not have to be manually synced
> > across code, comments, and a Google doc. I have yet to see anything actually
> > stay synchronized in such a manner over time, and many bugs manifest
> > themselves in these synchronization attempts. Anyway, to avoid having to
> > change the file format another time, or add extra files on the side, I
> > think that using an IDF is a great next step, mostly because the tooling,
> > libraries and workflows for these are better defined.
> 
> To me self documenting means that the file format doesn't even require a
> document to explain its contents.  It should be self-evident from the
> contents of the file.  If it isn't, you've done something wrong.  The
> only reason I published the file format is so I can get everyone's input
> to make sure we have everything we need for the new features we plan to
> implement during v6.  I expect over time that this document will not be
> kept up to date even though it probably should be.
> 
> Writing an s-expr encoder and decoder is not likely to be a trivial task,
> so the likelihood of finding someone who has the time to implement it for
> an IDF is probably low.
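For a sense of scale, the pure serialization half for a tree of nested lists can be small; the harder part is the schema mapping (which items are positional, which are sub-lists, how numbers are formatted). A minimal Python sketch, assuming atoms are strings and that any atom that is empty or contains whitespace, parentheses, quotes, or backslashes must be quoted; the backslash escaping is an assumption, not KiCad's documented rule.

```python
def needs_quotes(atom: str) -> bool:
    """Assumed rule: quote empty atoms and atoms with delimiters or whitespace."""
    return atom == "" or any(c in atom for c in ' \t\n()"\\')


def quote(atom: str) -> str:
    if not needs_quotes(atom):
        return atom
    # Assumed escaping: backslash first, then double quotes.
    escaped = atom.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def encode(node) -> str:
    """Serialize nested lists of strings to a single-line S-expression."""
    if isinstance(node, list):
        return "(" + " ".join(encode(child) for child in node) + ")"
    return quote(str(node))


print(encode(["pad", "2", "smd", "rect", ["net", "95", "Net-(R17-Pad2)"]]))
```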
> 
> > 
> > But to be honest, I have a hard time understanding why we have to stick
> > to the KiCad S-Expression, when there are quite readable text formats
> > that are widely supported already.
> > 
> > I know the requirement for the file format is readability, but I have
> > yet to find an editor that actually understands the KiCad S-Expression
> > (I have not searched extensively), whereas JSON, XML and YAML are usually
> > read just fine, with syntax highlighting out of the box. And an IDF would
> > make these discussions quite redundant, since changing file formats would
> > be a minimal change in code, and not as now, where it is actually quite
> > time-consuming.
> 
> I wouldn't be opposed to JSON although I still think that it is more
> verbose than necessary.  XML was rejected by the project a long time ago
> and I've seen nothing to change my mind about that.  I am not familiar
> with YAML.
> 
> I doubt using an IDF will make these discussions redundant because there
> will always be disagreements about file formatting regardless of how
> the information is defined internally.  Changing file formats would be a
> benefit but why would we need to do that?  If we have a human readable
> file format that can be parsed easily and quickly by a computer, what
> other criteria do we need in a file format?
> 
> Cheers,
> 
> Wayne
> 
> > 
> > - Kristoffer
> > 
> > On 2019-01-02 01:37, Andrew Lutsenko wrote:
> >> Hi Wayne,
> >>
> >> I would like to take this opportunity to make an elevator pitch for the idea
> >> of using one of the IDL languages widely accepted in the industry, like
> >> Apache Thrift or Google Protobufs, to define formats in KiCad.
> >> There are a few large benefits in favor of using such languages:
> >>
> >> 1. They are self-documenting. No more keeping a Google doc in sync
> >> with sources.
> >> 2. They are easily extensible. Just add a field: old parsers will
> >> ignore it, new ones will pick it up. Need to deprecate a field? Just
> >> add its ID to the reserved list so it is never reused.
> >> 3. They have code generators for pretty much all commonly used
> >> languages. That means anyone can take a KiCad file and parse it in
> >> Java/Go/Haskell or whatever language they fancy without porting the
> >> s-expressions library or meticulously studying the file format doc.
> >> This opens lots of possibilities for third-party tools to be added to
> >> the KiCad ecosystem. Writing a web viewer for schematics/PCBs would be a
> >> piece of cake, for example.
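Benefit 2 rests on how the protobuf binary wire format is laid out: each field is prefixed by a varint key equal to `(field_number << 3) | wire_type`, so a reader that does not know a field number can still work out its length and skip it. A minimal pure-Python sketch of that skipping logic, handling only wire types 0 (varint) and 2 (length-delimited); the field numbers in the example are hypothetical, and in practice the code generated by `protoc` would do this instead of hand-decoding.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode(buf: bytes, known_fields: set):
    """Return {field_number: [raw values]} for known fields, skipping the rest."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: strings, bytes, submessages
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known_fields:
            out.setdefault(field_number, []).append(value)
        # Unknown field numbers were already consumed above and are simply ignored.
    return out


# Field 1 = 150 (varint), hypothetical newer field 9 = 7 (varint), field 2 = "hi".
message = bytes([0x08, 0x96, 0x01, 0x48, 0x07, 0x12, 0x02]) + b"hi"
print(decode(message, {1, 2}))
```

An "old" reader that only knows field 1 still decodes the same bytes without error; it just never reports fields 2 and 9.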
> >>
> >> Other, probably less impactful benefits:
> >> 4. Easy to serialize/encode in multiple formats. Need to send data
> >> over the network in compact form? No problem, just serialize using the compact
> >> binary protocol. Need to store it in a text file? Just use the text encoder.
> >> 5. Code generators will reduce the amount of boilerplate in KiCad source.
> >> Only actual application logic needs to be added on top of the generated
> >> data objects.
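Benefit 4 in miniature: one in-memory record, two encodings. This Python sketch uses the standard library's `json` for a text form and `struct` for a compact fixed-layout binary form; it illustrates the idea only and is not protobuf's actual text or binary encoding. The `Pad` record and its fields are hypothetical.

```python
import json
import struct
from dataclasses import asdict, dataclass


@dataclass
class Pad:
    """Hypothetical pad record used only for this illustration."""
    number: int
    x: float
    y: float
    rotation: float


# Little-endian, no padding: one unsigned 32-bit int followed by three 64-bit floats.
PAD_LAYOUT = struct.Struct("<I3d")


def pad_to_json(pad: Pad) -> str:
    return json.dumps(asdict(pad), sort_keys=True)


def pad_from_json(text: str) -> Pad:
    return Pad(**json.loads(text))


def pad_to_bytes(pad: Pad) -> bytes:
    return PAD_LAYOUT.pack(pad.number, pad.x, pad.y, pad.rotation)


def pad_from_bytes(data: bytes) -> Pad:
    return Pad(*PAD_LAYOUT.unpack(data))


pad = Pad(2, 0.95, 0.0, 180.0)
print(pad_to_json(pad), len(pad_to_bytes(pad)), "bytes")
```

Both forms carry the same data, so choosing one is a decision made where the data is written, not a redesign of the format. A fixed binary layout like this one is not extensible, though; that is the gap protobuf's tagged fields are meant to close.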
> >>
> >> I have personally worked extensively with both Thrift and Protobufs, and I
> >> think proto is the better fit for KiCad's use case. Thrift has a lot more
> >> library support for client/server RPC logic, and defining RPCs is a core
> >> part of the language, but we don't need any of that (at least for now).
> >> Proto has all of that as extensions, but its core is just the definition
> >> of data types, and it has better support for a plain text format.
> >> Here are docs for both:
> >> https://developers.google.com/protocol-buffers/
> >> https://thrift.apache.org/tutorial/
> >>
> >> Let me know if any of that sounds interesting and if you have any
> >> questions. I think this is worth investing time into and I'm willing
> >> to help if needed.
> >>
> >> Regards,
> >> Andrew
> >>
> >> On Tue, Jan 1, 2019 at 11:59 AM Wayne Stambaugh <stambaughw@xxxxxxxxx> wrote:
> >>
> >>     I have updated and published the symbol file format[1] for comment.
> >>     Hopefully there isn't too much to change.  The only thing to really
> >>     finalize is the internal units.  The initial concept was unitless but
> >>     the more I think about it and discuss with other developers, it makes
> >>     more sense to use units for the following reasons:
> >>
> >>     1. It's easier to visualize in your head how the symbols on a
> >>     given page size will lay out.
> >>
> >>     2. Converting from other file formats (Eagle, Altium, etc) will be
> >>     easier since most if not all of them have a defined unit.
> >>
> >>     I'm thinking 10u (or possibly 100u) will make a good internal units
> >>     value.  Once we nail down the units, I will update the file format
> >>     document accordingly.
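To make the choice concrete, the arithmetic below checks which common grid pitches come out as whole numbers of internal units. The message does not define "u"; this sketch assumes "10u" and "100u" mean 10 µm and 100 µm, and the list of pitches is illustrative only.

```python
from fractions import Fraction

NM_PER_MM = 1_000_000
NM_PER_MIL = 25_400  # 1 mil = 0.0254 mm exactly

# Illustrative grid pitches and dimensions, in nanometres.
LENGTHS_NM = {
    "1 mil": 1 * NM_PER_MIL,
    "50 mil": 50 * NM_PER_MIL,
    "100 mil": 100 * NM_PER_MIL,
    "0.1 mm": NM_PER_MM // 10,
}


def to_iu(length_nm: int, iu_nm: int) -> Fraction:
    """Length expressed in internal units; a fractional result would need rounding."""
    return Fraction(length_nm, iu_nm)


def exactly_representable(iu_nm: int) -> dict:
    """For each length, report whether it is a whole number of internal units."""
    return {name: to_iu(nm, iu_nm).denominator == 1 for name, nm in LENGTHS_NM.items()}


# Assumption: "10u" = 10 um = 10_000 nm and "100u" = 100 um = 100_000 nm.
for label, iu_nm in (("10 um", 10_000), ("100 um", 100_000)):
    print(label, exactly_representable(iu_nm))
```

Under that assumption, a 10 µm unit represents the customary 50 mil and 100 mil symbol grids exactly (127 and 254 units), a 100 µm unit does not (12.7 and 25.4), and neither represents 1 mil exactly.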
> >>
> >>     Please keep in mind that this is the symbol library file format
> >>     document, so things like constraints belong in the schematic file
> >>     format.  I will be posting the schematic file format as soon as I
> >>     finish updating it.
> >>
> >>     Cheers,
> >>
> >>     Wayne
> >>
> >>     [1]:
> >>     https://docs.google.com/document/d/1lyL_8FWZRouMkwqLiIt84rd2Htg4v1vz8_2MzRKHRkc/edit
> >>
> 

Although I cannot speak for the main developers at all, I remember when S-expressions were introduced (it seems like only yesterday), and it was the start of a big leap forward for the project (not all of it related to the format). I also remember the huge amount of work put in by Dick, Wayne and others (sorry if I did not mention anyone specifically; too lazy to look it up).

I perfectly understand that changing the file format is not an easy decision. Protobufs do look really clean, but I have no idea how much effort they would take to implement. I have never worked with them before (I will remember them for my next project, though).

Andrew, how easy would it be? Would it be feasible to support both in V6 and let the future tell us which one would prevail?

Happy New Year to all,
Martijn
Follow ups

Re: [RFC] Symbol library file format
From: Andrew Lutsenko, 2019-01-03
References

[RFC] Symbol library file format
From: Wayne Stambaugh, 2019-01-01
Re: [RFC] Symbol library file format
From: Andrew Lutsenko, 2019-01-02
Re: [RFC] Symbol library file format
From: kristoffer Ödmark, 2019-01-02
Re: [RFC] Symbol library file format
From: Wayne Stambaugh, 2019-01-02
Re: [RFC] Symbol library file format
From: Andrew Lutsenko, 2019-01-03