fenics team mailing list archive

Re: Generation of docstring module

 

On 6 September 2010 22:14, Johan Hake <johan.hake@xxxxxxxxx> wrote:
> On Monday September 6 2010 08:56:13 Kristian Ølgaard wrote:
>> On 6 September 2010 17:24, Johan Hake <johan.hake@xxxxxxxxx> wrote:
>> > On Monday September 6 2010 08:13:44 Anders Logg wrote:
>> >> On Mon, Sep 06, 2010 at 08:08:10AM -0700, Johan Hake wrote:
>> >> > On Monday September 6 2010 05:47:27 Anders Logg wrote:
>> >> > > On Mon, Sep 06, 2010 at 12:19:03PM +0200, Kristian Ølgaard wrote:
>> >> > > > > Do we have any functionality in place for handling documentation
>> >> > > > > that should be automatically generated from the C++ interface
>> >> > > > > and documentation that needs to be added later?
>> >> > > >
>> >> > > > No, not really.
>> >> > >
>> >> > > ok.
>> >> > >
>> >> > > > > I assume that the documentation we write in the C++ header files
>> >> > > > > (like Mesh.h) will be the same that appears in Python using
>> >> > > > > help(Mesh)?
>> >> > > >
>> >> > > > Yes and no. The problem is that, for instance, overloaded methods
>> >> > > > will only show the last docstring.
>> >> > > > So Mesh.__init__.__doc__ will just contain the Mesh(std::string
>> >> > > > file_name) docstring.
>> >> > >
>> >> > > It would not be difficult to make the documentation extraction
>> >> > > script we have (in fenics-doc) generate the docstrings module and
>> >> > > just concatenate all constructor documentation. We are already
>> >> > > doing the parsing so spitting out class Foo: """ etc would be easy.
>> >> > > Perhaps that is an option.
>> >> >
>> >> > There might be other overloaded methods too. We might try to settle on
>> >> > a format for these methods, or make this part of the 1% we need to
>> >> > handle ourselves.
>> >>
>> >> ok. Should also be fairly easy to handle.
>
> If we choose to use Doxygen to extract the documentation, we need to figure
> out how to do this when we parse the XML information.
>
> The argument information is already handled by SWIG using:
>
>  %feature("autodoc", "1");
>
> This will generate signatures like:
>
>        """
>        __init__(self) -> Mesh
>        __init__(self, Mesh mesh) -> Mesh
>        __init__(self, string filename) -> Mesh
>
>        Create mesh from data file.
>        """
>
> The first part is ok, I think. But the comment relates to the latest parsed
> comment. We might decide to try to use the information in the xml file to
> generate all of this information. That way we can put the relevant comments
> in the correct places, like:
>
>        """
>        __init__(self) -> Mesh
>
>        Create an empty mesh.
>
>        __init__(self, Mesh mesh) -> Mesh
>
>        Copy mesh.
>
>        __init__(self, string filename) -> Mesh
>
>        Create mesh from data file.
>        """

I'd say that we should strive to make the docstring look like the one
for Mesh in docstrings/dolfin/cpp.py.
The argument information should contain links like :py:class:`Foo` if
functions/classes are used.
We can do this by storing the function docstrings, checking for
overloaded functions and concatenating them if needed.
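Something along these lines is what I have in mind (untested, and the
helper name and data layout are just made up for illustration):

    def build_docstring(overloads):
        # 'overloads' is a list of (signature, docstring) tuples collected
        # for one method while parsing the headers.
        parts = []
        for signature, doc in overloads:
            parts.append(signature)
            parts.append("")
            parts.append(doc)
            parts.append("")
        return "\n".join(parts).strip()

    overloads = [
        ("__init__(self) -> Mesh", "Create an empty mesh."),
        ("__init__(self, mesh) -> Mesh", "Copy a :py:class:`Mesh`."),
        ("__init__(self, filename) -> Mesh", "Create mesh from a data file."),
    ]
    mesh_init_doc = build_docstring(overloads)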

>> >> > > > > But in some special cases, we may want to go in and handle
>> >> > > > > documentation for special cases where the Python documentation
>> >> > > > > needs to be different from the C++ documentation. So there
>> >> > > > > should be two different sources for the documentation: one that
>> >> > > > > is generated automatically from the C++ header files, and one
>> >> > > > > that overwrites or adds documentation for special cases. Is
>> >> > > > > that the plan?
>> >> > > >
>> >> > > > The plan is currently to write the docstrings by hand for the
>> >> > > > entire dolfin module. One of the reasons is that we
>> >> > > > rename/ignore functions/classes in the *.i files, and if we
>> >> > > > try to automate the docstring generation I think we should make
>> >> > > > it fully automatic, not just part of it.
>> >> > >
>> >> > > If we can make it 99% automatic and have an extra file with special
>> >> > > cases I think that would be ok.
>> >> >
>> >> > Agree.
>>
>> Yes, but we'll need some automated testing to make sure that the 1%
>> does not go out of sync with the code.
>> Most likely, the 1% that can't be handled automatically will be relatively
>> important (definitions in *.i files etc.).
>
> Do we need to parse the different SWIG interface files? This sounds
> cumbersome. Couldn't we just automatically generate documentation from the
> parsed C++ header files? These are then used to generate the docstrings.i,
> which is used to put the documentation into cpp.py.
>
>  * Ignored methods will be ignored, even if a
>
>      %feature("docstring") somemethod
>      "some docstring"
>
>    is generated for that method.

Yes, but what about renamed functions/classes??? This is what really bugs me.

>  * Extended methods need to be handled in one of three ways:
>    1) Write the docstring directly into the foo_post.i file
>    2) Write a hook to an external docstring module for these methods

The beauty of extended methods is that we can assign to their docs
dynamically on import (unless we add a class, do we?); I do this
already. So there is no need to handle this case. All our problems arise
from what Swig does at the end of a class:

Mesh.num_vertices = new_instancemethod(_cpp.Mesh_num_vertices,None,Mesh)

which makes it impossible to assign to __doc__ --> we need to tell
Swig what docstrings to use.
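If we go down the docstrings.i route, generating the %feature directives
from the parsed docs should be straightforward. A rough, untested sketch
(the dict layout and naming scheme are just assumptions on my part):

    def write_docstrings_i(docs, filename="docstrings.i"):
        # 'docs' maps e.g. "Mesh" or "Mesh::num_vertices" to its docstring.
        # Quotes inside the docstrings would of course need escaping.
        outfile = open(filename, "w")
        outfile.write("// Auto-generated file, do not edit.\n\n")
        for name, doc in sorted(docs.items()):
            outfile.write('%feature("docstring") dolfin::' + name
                          + ' "\n' + doc + '\n";\n\n')
        outfile.close()

I have not checked how SWIG wants overloaded methods to be named in such
a directive, so that part is a guess.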

> If we choose 2) we might have something like:
>
>  %extend GenericMatrix {
>     %pythoncode %{
>     def data(self):
>         # real implementation goes here
>         pass
>     data.__doc__ = docstringmodule.cppextended.GenericMatrix.data.__doc__
>     %}
>  }
>
> in the foo_post.i files.
>
> We would then have two parts in the external docstringmodule:
>
>  1) for the extended Python layer (VariationalForm and so on)

Since we're already aiming at generating the docstrings module
(dolfin/site-packages/dolfin/docstrings), maybe we should just extract
the docs from the extended Python layer in dolfin/site-packages/dolfin
and dump them in the docstrings module? Then programmers writing the
Python layer just need to document while they're coding, where they're
coding, just like they do (or should anyway) for the C++ part.
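Something like this (untested, and the filtering is a guess) could do
the collection, since the Python layer is importable:

    import inspect
    import dolfin  # assumes the Python layer is importable when we run this

    def collect_python_layer_docs():
        # Grab __doc__ for classes/functions defined in the hand-written
        # Python layer, skipping everything that comes from the cpp module.
        docs = {}
        for name, obj in inspect.getmembers(dolfin):
            if not (inspect.isclass(obj) or inspect.isfunction(obj)):
                continue
            module = getattr(obj, "__module__", "")
            if module.startswith("dolfin") and not module.endswith("cpp"):
                docs[name] = inspect.getdoc(obj) or ""
        return docs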

>  2) for the extended Python layer in the cpp.py
>
> For the rest, and this will be the main part, we rely on parsed docstrings
> from the headers.
>
> The Python programmer's reference will then be generated based on the actual
> dolfin module using Sphinx and autodoc.

We could/should probably use either the dolfin module or the generated
docstrings module to generate the relevant reST files. Although we
might need to run some cross-checks against the Doxygen xml to get the
correct file names where the classes are defined in DOLFIN, so that
we retain the original DOLFIN source tree structure. Otherwise all our
documentation will end up in cpp.rst, which I would hate to navigate
through as a user.
I vote for using the generated docstrings module for the documentation
since it should contain all classes even if some HAS_* flag was not
switched on, which brings me to the last question: how do we handle
the case where some ifdefs result in classes not being generated in
cpp.py? They should still be documented, of course.
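For the reST generation itself I picture something like this (untested;
the directory layout and the header -> classes mapping, which should
come from the Doxygen xml location info, are assumptions):

    import os

    def write_rst(classes_by_header, outdir="source/programmers-reference"):
        # 'classes_by_header' maps e.g. "mesh/Mesh.h" to ["Mesh", "MeshEntity"].
        for header, classes in classes_by_header.items():
            rst = os.path.join(outdir, header.replace(".h", ".rst"))
            if not os.path.isdir(os.path.dirname(rst)):
                os.makedirs(os.path.dirname(rst))
            f = open(rst, "w")
            f.write(header + "\n" + "=" * len(header) + "\n\n")
            for cls in classes:
                f.write(".. autoclass:: dolfin." + cls + "\n   :members:\n\n")
            f.close()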

Another issue we need to handle is any example code in the C++ docs,
which must be translated into Python syntax, either automatically or
by some lookup in a dictionary, but that brings us right back to
something < 100% automatic.
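If we go with the dictionary, it would just be a lookup table that we
maintain by hand, something like (made-up entries):

    # Maps C++ example snippets appearing in the docstrings to their
    # Python equivalents; anything missing gets flagged for manual work.
    cpp_to_python = {
        "UnitSquare mesh(8, 8);": "mesh = UnitSquare(8, 8)",
        "info(mesh);": "info(mesh)",
    }

    def translate_example(snippet):
        return cpp_to_python.get(snippet.strip(),
                                 snippet + "  # FIXME: no Python translation")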

>> >> > > > Also, we will need to change the syntax in all *example* code in
>> >> > > > the docstrings. Maybe it can be done, but I'll need to give it
>> >> > > > some more careful thought. We've already changed the approach a
>> >> > > > few times now, so I'd really like the next try to be close to our
>> >> > > > final implementation.
>> >> > >
>> >> > > I agree. :-)
>> >> > >
>> >> > > > > Another thing to discuss is the possibility of using Doxygen to
>> >> > > > > extract the documentation. We currently have our own script
>> >> > > > > since (I assume) Doxygen does not have a C++ --> reST
>> >> > > > > converter. Is that correct?
>> >> > > >
>> >> > > > I don't think Doxygen has any such converter, but there exists a
>> >> > > > project http://github.com/michaeljones/breathe
>> >> > > > which makes it possible to use xml output from Doxygen in much the
>> >> > > > same way as we use autodoc for the Python module. I had a quick go
>> >> > > > at it but didn't like the result. No links on the index pages to
>> >> > > > functions etc. So what we do now is better, but perhaps it would
>> >> > > > be a good idea to use Doxygen to extract the docstrings for all
>> >> > > > classes and functions. I tried parsing the xml output in the
>> >> > > > test/verify_cpp_documentation.py script and it should be relatively
>> >> > > > simple to get the docstrings since these are stored as attributes
>> >> > > > of classes/functions.
>> >> > >
>> >> > > Perhaps an idea would be to use Doxygen for parsing and then have
>> >> > > our own script that works with the XML output from Doxygen?
>> >> >
>> >> > I did not know we already used Doxygen to extract information about
>> >> > class structure from the headers.
>> >>
>> >> I thought it was you who implemented the Doxygen documentation
>> >> extraction?
>> >
>> > Duh... I mean that I did not know we used it in fenics_doc, in
>> > verify_cpp_documentation.py.
>>
>> We don't. I wrote this script to be able to test the documentation in
>> *.rst files against dolfin.
>> Basically, I parse all files and keep track of the classes/functions
>> which are defined in dolfin and try to match those up against the
>> definitions in the documentation (and vice versa) to catch
>> missing/obsolete documentation.
>
> Ok, so you do not use this to extract documentation?

No, but I already visit all the class/function nodes so it should just
be a question of requesting the docstring attribute.
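Roughly like this (untested, and I'm going from memory on the element
names, so treat it as a sketch):

    from xml.etree import ElementTree

    def node_text(node):
        # Flatten a briefdescription/detaileddescription node to plain text.
        if node is None:
            return ""
        return "".join(node.itertext()).strip()

    def extract_docs(xml_file):
        # One Doxygen xml file per class, e.g. classdolfin_1_1Mesh.xml.
        docs = {}
        root = ElementTree.parse(xml_file).getroot()
        for compound in root.iter("compounddef"):
            cls = compound.findtext("compoundname")
            docs[cls] = node_text(compound.find("detaileddescription"))
            for member in compound.iter("memberdef"):
                name = member.findtext("name")
                docs[cls + "::" + name] = node_text(
                    member.find("detaileddescription"))
        return docs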

>> >> > What are the differences between using the XML from Doxygen to also
>> >> > extract the documentation, and the approach we use today?
>> >>
>> >> Pros (of using Doxygen):
>> >>
>> >>   - Doxygen is developed by people that presumably are very good at
>> >>     extracting docs from C++ code
>> >>
>> >>   - Doxygen might handle some corner cases we can't handle?
>>
>> Definitely, and we don't have to maintain it.
>>
>> >> Cons (of using Doxygen):
>> >>
>> >>   - Another dependency
>> >
>> > Which we already have.
>> >
>> >>   - We still need to write a script to parse the XML
>> >
>> > We should be able to use the xml parser in docstringgenerator.py.
>> >
>> >>   - The parsing of /// stuff from C++ code is very simple
>> >
>> > Yes, and this might be just fine. But if it grows we might consider using
>> > Doxygen.
>>
>> But some cases are already not handled correctly (nested classes etc.),
>> so I vote for Doxygen.
>
> Ok.
>
> My itch with Doxygen is its specific syntax, which makes the header files look
> ugly. But I guess we could write all our comments in reST and then just use
> Doxygen to extract them. Should this information be used to generate both the
> docstrings.i file and the C++ programmer's reference?

Yes, no Doxygen markup crap here. The docstrings defined by '///' should
already be in reST, and we currently use this for the C++ programmer's
reference.

>
> I found a potential showstopper with Doxygen. It does not preserve line
> breaks... All the comments in the header files whose formatting we need to
> preserve would have to go into a \verbatim \endverbatim block. This can
> probably be done where we need it. A good example is the table command in Mesh.h.

Bummer. The line breaks are essential, not just for the table command
but for all the definition lists too, i.e., for the entire reST
documentation (both C++ and Python). No line breaks would also make
any output from calls like:

>>> help(dolfin.Mesh)

useless. Is there no line break markup we could actually learn to live
with? Of course, we would then have to remove it later so that it
doesn't look completely stupid in the Python documentation.

Kristian

>> > Would it be possible to settle on a format for the extracted documentation
>> > which we use as input to generate the reST documentation? It would make it
>> > easier to switch to Doxygen XML whenever we figure out that this is needed,
>> > i.e., we just switch the backend of the documentation parser.
>>
>> This will probably be a good idea even if we start with Doxygen, since
>> the xml output might change format and then we can easily adapt.
>
> Ok
>
> Johan
>
>> Kristian
>>
>> > Johan
>> >
>> >> --
>> >> Anders
>


