
kicad-developers team mailing list archive

Re: [RFC] Symbol library file format

 

I agree. The important thing is what is in the file. If nothing else, S-exp is a concise way to express this concept during development. The exact representation on disk is, right now, bikeshedding.

When we get to that stage, all that is required is that the format is VCS friendly and human readable. I suggest that we work out what will be represented in the v6 files, then work with a preliminary format at first while we implement the kicad data structure handling. 

Sexp is probably the simpler path as we have both the pseudo-sexp parser used in pcbnew, as well as a "real" one for STEP export (I can't check details now, I'm not at my computer).

On the subject of parsers, as opposed to formats, I strongly suggest not using the pcbnew one, as it is highly bound up in the data rather than the syntax. The beauty of sexp is its abstract nature, which lends itself well to extensible formats. How it is implemented in pcbnew is quite contrary to that aim and naturally leads to fragility of the parser (for example, unexpected but syntactically valid fields can be lethal).
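
As a rough illustration of a reader that only knows about syntax, here is a minimal Python sketch. It is not the pcbnew parser or the STEP-export one; the names parse_sexp and find_child and the future_field entry are made up for the example, and escapes inside quoted strings are left unprocessed. The point is only that an unknown field ends up as one more sub-list in the tree instead of breaking the read.

```python
import re

# A token is a quoted string, a parenthesis, or a bare atom.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[()]|[^\s()"]+')


def parse_sexp(text):
    """Parse one S-expression into nested lists of strings (syntax only)."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing parenthesis
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok.startswith('"'):
            return tok[1:-1]  # drop the surrounding quotes
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def find_child(node, name):
    """Return the first sub-list whose head is `name`, or None."""
    for item in node:
        if isinstance(item, list) and item and item[0] == name:
            return item
    return None


# "future_field" is a hypothetical field this reader has never heard of.
pad = parse_sexp(
    '(pad 2 smd rect (at 0.95 0 180) (size 0.7 1.3) '
    '(future_field 42) (net 95 "Net-(R17-Pad2)"))'
)
size = find_child(pad, "size")
```

Code that cares about pads would then pull out what it understands with find_child and leave everything else in the tree untouched.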

Whatever the format, we should strive to separate the handling of the syntax of the file and the meaning of the data therein. By doing this, the format layer can be swapped and tweaked without reference to the data model, and the data model can be changed without touching a byte of the format parser/writers.
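
A small Python sketch of that separation, under loud assumptions: FpText, Position, dump_sexp and dump_json are hypothetical names, and the S-expression layout produced here names every field, so it is not the existing pcbnew layout. The data model knows nothing about either syntax, and each writer knows nothing about the model beyond a plain nested record.

```python
import json
from dataclasses import dataclass, asdict


# --- data model: no knowledge of any file syntax ---
@dataclass
class Position:
    x: float
    y: float


@dataclass
class FpText:
    name: str
    value: str
    at: Position
    layer: str


# --- format layer: only sees nested dicts of plain values ---
def _atom(value):
    """Render a scalar; strings are quoted, numbers are left bare."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return repr(value)


def dump_sexp(tag, record):
    """Write a record as (tag (key value) ...), recursing into sub-records."""
    parts = [tag]
    for key, val in record.items():
        if isinstance(val, dict):
            parts.append(dump_sexp(key, val))
        else:
            parts.append(f"({key} {_atom(val)})")
    return "(" + " ".join(parts) + ")"


def dump_json(tag, record):
    """Write the same record as JSON."""
    return json.dumps({tag: record}, sort_keys=True)


text = FpText(name="value", value="330K", at=Position(0, -1.75), layer="B.Fab")
record = asdict(text)  # neutral intermediate form shared by both writers
sexp_out = dump_sexp("fp_text", record)
json_out = dump_json("fp_text", record)
```

Swapping the on-disk syntax then means swapping dump_sexp for dump_json (or anything else), and adding a field to FpText does not touch either writer.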

Cheers, 

John




On 3 January 2019 18:06:45 GMT, "José Ignacio" <jose.cyborg@xxxxxxxxx> wrote:
>I think all this babble about data representations is pointless and
>counterproductive. The S-expression parser is already implemented and it
>works fine, and it is trivial to convert s-expressions to any other data
>representation you like, be it XML, JSON or whatever comes up next week in
>NPM. The real issue with the file format is to come up with a good data
>model to represent the objects in KiCad, and neither protobufs nor any of
>the other options really does anything for us in that area. If anything,
>that data model is the input that needs to be given to whatever parser
>generator or manually written parser we choose to use. I think useful
>comments on the proposed format should look beyond the actual low-level
>representation of the data and talk about the overall model being used to
>store it.
>
>On Thu, Jan 3, 2019 at 3:37 AM Andrew Lutsenko <anlutsenko@xxxxxxxxx> wrote:
>
>> Hi Martijn,
>> My guess is that most of the complexity is in switching to the new data
>> model in eeschema, which is going to happen anyway. Since with protobufs
>> you don't concern yourself with parsing data, the only tricky thing would
>> be to seamlessly integrate proto codegen into the build process on all
>> platforms. Since the eeschema data model is going to change without
>> backward compatibility, file compatibility is out of the window too, so
>> this is a good opportunity to change the underlying format to something
>> that is better in the long run. If we decide to go this route, then
>> ideally pcbnew would switch to proto at some point too, not necessarily
>> in v6.
>>
>> > Would it be feasible to support both in V6 and let the future tell us
>> > which one would prevail?
>> What do you mean by prevailing? It's clear at this point that
>> s-expressions are not a winner of global trends, while I can guarantee
>> that a sizable portion of the data on the internet (except probably porn,
>> lol) is exchanged in protobufs and thrift, since all tech giants use them.
>> If you mean in terms of what KiCad users prefer, I don't think the vast
>> majority would care either way. Both formats are easy to edit directly,
>> and protos are much better for devs because codegen takes all the routine
>> out of data manipulation.
>> While having both formats coexist is theoretically possible, I think it
>> will only split the user base, since people will pick one and stick to
>> it. I think this is a case where devs should be empowered to make the
>> choice, same as when the switch to s-expressions happened.
>> Of course the new version still needs to be able to read old files, or at
>> least some conversion utility needs to be provided to migrate projects to
>> v6, so the s-expressions code will not be going away for some time.
>>
>> Andrew
>>
>> On Wed, Jan 2, 2019 at 11:53 PM Martijn Kuipers <martijn.kuipers@xxxxxxxxx> wrote:
>>
>>>
>>>
>>> On 3 Jan 2019, at 04:17, Andrew Lutsenko <anlutsenko@xxxxxxxxx> wrote:
>>>
>>> Wayne,
>>>
>>> > There are some interesting and practical concepts with protobuf but
>>> > it's functionally a binary storage method which I am opposed to.
>>>
>>> That is a somewhat common misconception, because protobufs are
>>> frequently used for efficient storage/transfer in binary format. But
>>> they are not tied to that format at all; at their core protobufs are
>>> just a way of defining well-structured data, nothing more. They come
>>> with bells and whistles like the ability to serialize in various ways,
>>> binary being one of them.
>>>
>>> > Encoding and decoding to a text format would be an acceptable
>>> > solution.
>>> Perfect, here is an example of the built-in proto text encoder. It
>>> resembles JSON in that it uses curly braces to encase submessages, but
>>> it doesn't abuse punctuation marks unnecessarily.
>>>
>>> user_collection {
>>>   description: "my default users"
>>>   users {
>>>     key: "user_1234"
>>>     value {
>>>       handle: "winniepoo"
>>>       paid_membership: true
>>>     }
>>>   }
>>>   users {
>>>     key: "user_9b27"
>>>     value {
>>>       handle: "smokeybear"
>>>     }
>>>   }
>>> }
>>>
>>> > There is also the issue of learning curve and another build
>>> > dependency.
>>> Yes, that is inevitable, but the benefits I outlined in my earlier email
>>> are too significant to overlook, in my opinion.
>>>
>>> > S-expr is not at all like XML at least not in terms of readability.
>>> It actually is a lot like XML, just with fewer pointy brackets. It has
>>> the same arbitrary distinction between attributes (which are not even
>>> named in S-expr, which adds to the confusion when glancing at the file)
>>> and subfields.
>>>
>>> Here is a real example:
>>>     (fp_text value 330K (at 0 -1.75) (layer B.Fab)
>>>       (effects (font (size 1 1) (thickness 0.15)) (justify mirror))
>>>     )
>>> or
>>>     (pad 2 smd rect (at 0.95 0 180) (size 0.7 1.3) (layers B.Cu B.Paste B.Mask)
>>>       (net 95 "Net-(R17-Pad2)"))
>>>
>>> Without reading the file format docs and/or source code I have no idea
>>> why some data is in subfields and some data is in some special fixed
>>> order on the same level as its container.
>>> It's also very easy to misread how "330K" relates to "value", when in
>>> fact "value" is the field name and "330K" is its value. In proto that
>>> would look like this:
>>>
>>> fp_text {
>>>   name: "value"
>>>   value: "330K"
>>>   at {
>>>     x: 0
>>>     y: -1.75
>>>   }
>>>   layer: "B.Fab"
>>>   effects {
>>>     ...
>>>   }
>>> }
>>>
>>> But I do agree on the point that all markup formats have their
>>> downsides. For example, JSON/YAML or proto text format would be less
>>> dense per line; however, that may be an upside when looked at from a
>>> version control and diffing perspective.
>>>
>>> > To me self documenting means that the file format doesn't even
>>> > require a document to explain its contents.  It should be self
>>> > evident from the contents of the file.
>>> What I meant by self documenting was that the document describing the
>>> format and the document implementing the format (the actual source code)
>>> are one and the same.
>>> But as is evident from the example above, proto text format yields file
>>> contents that are pretty well self documented too.
>>>
>>> > Changing file formats would be a benefit but why would we need to do
>>> > that?  If we have a human readable file format that can be parsed
>>> > easily and quickly by a computer, what other criteria do we need in a
>>> > file format?
>>>
>>> See benefit #3 from my first message. By using something standard we
>>> make it so much easier to expand the KiCad ecosystem in other
>>> languages. I already gave a web viewer as an example; here is another:
>>> someone may write a plugin for Java autorouter software that reads KiCad
>>> files directly. S-expressions were likely a good choice when the choice
>>> was made, but today they are far from widespread and are pretty much
>>> unsupported in most languages, so the burden is wholly on the
>>> developers. An argument can be made that while S-expressions are both
>>> human and computer readable, they excel at neither, since humans still
>>> need supporting documentation or source and computers need
>>> custom-written libraries.
>>>
>>> Regards,
>>> Andrew
>>>
>>> and Happy New Year :)
>>>
>>> On Wed, Jan 2, 2019 at 8:01 AM Wayne Stambaugh <stambaughw@xxxxxxxxx> wrote:
>>>
>>>> On 1/2/2019 5:24 AM, kristoffer Ödmark wrote:
>>>> > I like the idea of using something like Protobuf and I agree fully
>>>> > with the benefits, especially since one can add/remove fields with
>>>> > minimal impact.
>>>>
>>>> There are some interesting and practical concepts with protobuf but
>>>> it's functionally a binary storage method which I am opposed to.
>>>> Encoding and decoding to a text format would be an acceptable solution.
>>>> There is also the issue of learning curve and another build dependency.
>>>>
>>>> >
>>>> > Basically the S-expression system used now is looking very much like
>>>> > a reinvented XML to me anyway, and storing protobuf-defined stuff as
>>>> > XML or similar seems actually nice.
>>>>
>>>> S-expr is not at all like XML at least not in terms of readability.
>>>> Obviously there are an infinite number of ways to store information.  I
>>>> do find it amusing and somewhat telling that there are so many markdown
>>>> formats available these days.  I think the jury has spoken on the
>>>> readability of markup formats.
>>>>
>>>> > There is one catch, and that is that we have to support opening a
>>>> > newer file in older software and then storing it again without
>>>> > losing data that the software is not aware of. Or we have to
>>>> > implement a way of preventing older software from saving when it
>>>> > opens something newer.
>>>>
>>>> This is the reason that we have not implemented this in our own file
>>>> formats.  I don't see anyone being happy about losing information by
>>>> saving a board file with an older version of KiCad.  We could always
>>>> warn users when saving with a version of KiCad that is older than the
>>>> file format, but even that may cause unexpected loss of data.
>>>>
>>>> >
>>>> > There is also a middle way here, and that is to actually implement a
>>>> > Protobuf to S-Expression decoder/encoder, with the real benefit of
>>>> > actually defining fields in a modern well-known way, where the
>>>> > specification and implementation do not have to be manually synced
>>>> > in code, comments, and a google doc. I have yet to see anything
>>>> > actually stay synchronized in such a manner over time, and many bugs
>>>> > manifest themselves in these synchronization attempts. Anyway, to
>>>> > avoid having to change the file format another time, or add extra
>>>> > files on the side, I think that using an IDF is a great next step,
>>>> > mostly since the tooling, libraries and workflows for these are
>>>> > better defined.
>>>>
>>>> To me self documenting means that the file format doesn't even require
>>>> a document to explain its contents.  It should be self evident from the
>>>> contents of the file.  If it isn't, you've done something wrong.  The
>>>> only reason I published the file format is so I can get everyone's
>>>> input to make sure we have everything we need for the new features we
>>>> plan to implement during v6.  I expect over time that this document
>>>> will not be kept up to date even though it probably should be.
>>>>
>>>> Writing an s-expr encoder and decoder is not likely to be a trivial
>>>> task, so the chance of finding someone who has the time to implement it
>>>> for an IDF is probably low.
>>>>
>>>> >
>>>> > But to be honest, I have a hard time understanding why we have to
>>>> > stick to the KiCad S-Expression, when there are quite readable text
>>>> > formats that are widely supported already.
>>>> >
>>>> > I know the requirement for the file format is readability, but I have
>>>> > yet to find an editor that actually understands the KiCad
>>>> > S-Expression (I have not searched extensively), whereas JSON, XML and
>>>> > YAML are usually read just fine, with syntax highlighting out of the
>>>> > box. And an IDF would make these discussions quite redundant, since
>>>> > changing file formats would be a minimal change in code, and not, as
>>>> > now, quite time-consuming.
>>>>
>>>> I wouldn't be opposed to JSON although I still think that it is more
>>>> verbose than necessary.  XML was rejected by the project a long time
>>>> ago and I've seen nothing to change my mind about that.  I am not
>>>> familiar with YAML.
>>>>
>>>> I doubt using an IDF will make these discussions redundant because
>>>> there will always be disagreements about file formatting regardless of
>>>> how the information is defined internally.  Changing file formats would
>>>> be a benefit but why would we need to do that?  If we have a human
>>>> readable file format that can be parsed easily and quickly by a
>>>> computer, what other criteria do we need in a file format?
>>>>
>>>> Cheers,
>>>>
>>>> Wayne
>>>>
>>>> >
>>>> > - Kristoffer
>>>> >
>>>> > On 2019-01-02 01:37, Andrew Lutsenko wrote:
>>>> >> Hi Wayne,
>>>> >>
>>>> >> I would like to take this opportunity to do an elevator pitch for the
>>>> >> idea of using one of the IDL languages widely accepted in the
>>>> >> industry, like Apache Thrift or Google Protobufs, to define formats
>>>> >> in KiCad. There are a few large benefits in favor of using such
>>>> >> languages:
>>>> >>
>>>> >> 1. They are self documenting. No more keeping a google doc in sync
>>>> >> with sources.
>>>> >> 2. They are easily extensible. Just add a field; old parsers will
>>>> >> ignore it, new ones will pick it up. Need to deprecate a field? Just
>>>> >> add its ID to the reserved list so it is never reused.
>>>> >> 3. They have code generators for pretty much all commonly used
>>>> >> languages. That means anyone can pick up a KiCad file and just parse
>>>> >> it in Java/Go/Haskell or whatever language they fancy without porting
>>>> >> over an s-expressions library or meticulously studying the file
>>>> >> format doc. This opens lots of possibilities for third-party tools to
>>>> >> be added to the KiCad ecosystem. Writing a web viewer for
>>>> >> schematic/pcb would be a piece of cake, for example.
>>>> >>
>>>> >> Other, probably less impactful benefits:
>>>> >> 4. Easy to serialize/encode in multiple formats. Need to send data
>>>> >> over the network in compact form? No problem, just serialize using
>>>> >> the compact binary protocol. Need to store in a text file? Just use
>>>> >> the text encoder.
>>>> >> 5. Code generators will reduce the amount of boilerplate in the KiCad
>>>> >> source. Only actual application logic needs to be added on top of
>>>> >> the generated data objects.
>>>> >>
>>>> >> I have personally worked extensively with both Thrift and Protobufs,
>>>> >> and I think for the KiCad use case proto is the better fit. Thrift
>>>> >> has a lot more library support for client/server RPC logic, and
>>>> >> defining RPCs is a core part of the language, but we don't need any
>>>> >> of that (at least for now). Proto has all of that as extensions, but
>>>> >> its core is just the definition of data types, and it has better
>>>> >> support for plain text format.
>>>> >> Here are docs for both:
>>>> >> https://developers.google.com/protocol-buffers/
>>>> >> https://thrift.apache.org/tutorial/
>>>> >>
>>>> >> Let me know if any of that sounds interesting and if you have any
>>>> >> questions. I think this is worth investing time into and I'm willing
>>>> >> to help if needed.
>>>> >>
>>>> >> Regards,
>>>> >> Andrew
>>>> >>
>>>> >> On Tue, Jan 1, 2019 at 11:59 AM Wayne Stambaugh <stambaughw@xxxxxxxxx> wrote:
>>>> >>
>>>> >>     I have updated and published the symbol file format[1] for
>>>> >>     comment.  Hopefully there isn't too much to change.  The only
>>>> >>     thing to really finalize is the internal units.  The initial
>>>> >>     concept was unitless, but the more I think about it and discuss
>>>> >>     with other developers, it makes more sense to use units for the
>>>> >>     following reasons:
>>>> >>
>>>> >>     1. It's easier to visualize in your head how the symbols on a
>>>> >>     given page size will lay out.
>>>> >>
>>>> >>     2. Converting from other file formats (Eagle, Altium, etc) will
>>>> >>     be easier since most if not all of them have a defined unit.
>>>> >>
>>>> >>     I'm thinking 10u (or possibly 100u) will make a good internal
>>>> >>     units value.  Once we nail down the units, I will update the
>>>> >>     file format document accordingly.
>>>> >>
>>>> >>     Please keep in mind that this is the symbol library file format
>>>> >>     document, so things like constraints belong in the schematic
>>>> >>     file format.  I will be posting the schematic file format as
>>>> >>     soon as I finish updating it.
>>>> >>
>>>> >>     Cheers,
>>>> >>
>>>> >>     Wayne
>>>> >>
>>>> >>     [1]: https://docs.google.com/document/d/1lyL_8FWZRouMkwqLiIt84rd2Htg4v1vz8_2MzRKHRkc/edit
>>>> >>
>>>>
>>>>
>>> Although I cannot speak for the main developers at all, I remember when
>>> S-expressions were introduced (it seems only yesterday) and it was the
>>> start of a big leap forward for the project (not all related to the
>>> format). I also remember the huge amount of work put in by Dick, Wayne
>>> and others (sorry if I did not mention anyone specifically, too lazy to
>>> look it up).
>>>
>>> I perfectly understand that changing file format is not an easy
>>> decision. Protobufs do look really clean, but I have no clue about the
>>> amount of effort it would take to implement them. I have never worked
>>> with them before (will remember them for my next project, though).
>>>
>>> Andrew, how easy would it be? Would it be feasible to support both in V6
>>> and let the future tell us which one would prevail?
>>>
>>> Happy New Year to all,
>>> Martijn
>>>
