fenics team mailing list archive

Re: Generation of docstring module

To: Kristian Ølgaard <k.b.oelgaard@xxxxxxxxx>
From: Johan Hake <johan.hake@xxxxxxxxx>
Date: Mon, 6 Sep 2010 13:14:40 -0700
Cc: fenics@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AANLkTimpmF-Cn3jh=TrJaUWuCdXHky3y-4tLGfZK8hno@mail.gmail.com>
Reply-to: johan.hake@xxxxxxxxx
User-agent: KMail/1.13.5 (Linux/2.6.32-25-generic; KDE/4.5.0; x86_64; ; )
On Monday September 6 2010 08:56:13 Kristian Ølgaard wrote:
> On 6 September 2010 17:24, Johan Hake <johan.hake@xxxxxxxxx> wrote:
> > On Monday September 6 2010 08:13:44 Anders Logg wrote:
> >> On Mon, Sep 06, 2010 at 08:08:10AM -0700, Johan Hake wrote:
> >> > On Monday September 6 2010 05:47:27 Anders Logg wrote:
> >> > > On Mon, Sep 06, 2010 at 12:19:03PM +0200, Kristian Ølgaard wrote:
> >> > > > > Do we have any functionality in place for handling documentation
> >> > > > > that should be automatically generated from the C++ interface
> >> > > > > and documentation that needs to be added later?
> >> > > > 
> >> > > > No, not really.
> >> > > 
> >> > > ok.
> >> > > 
> >> > > > > I assume that the documentation we write in the C++ header files
> >> > > > > (like Mesh.h) will be the same that appears in Python using
> >> > > > > help(Mesh)?
> >> > > > 
> >> > > > Yes and no, the problem is that for instance overloaded methods
> >> > > > will only show the last docstring.
> >> > > > So, the Mesh.__init__.__doc__ will just contain the Mesh(std::str
> >> > > > file_name) docstring.
> >> > > 
> >> > > It would not be difficult to make the documentation extraction
> >> > > script we have (in fenics-doc) generate the docstrings module and
> >> > > just concatenate all constructor documentation. We are already
> >> > > doing the parsing so spitting out class Foo: """ etc would be easy.
> >> > > Perhaps that is an option.
> >> > 
> >> > There might be other overloaded methods too. We might try to settle on
> >> > a format for these methods, or make this part of the 1% we need to
> >> > handle ourselves.
> >> 
> >> ok. Should also be fairly easy to handle.

If we choose to use Doxygen to extract the documentation, we need to figure 
out how to do this when we parse the XML information. 
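
A minimal sketch of what such a parse could look like, using only the Python
standard library. The XML sample is hand-written and simplified; it follows the
general shape of Doxygen's XML output (compounddef, memberdef,
briefdescription), but the real files carry many more elements. The function
name extract_member_docs is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the style of Doxygen's XML output.
SAMPLE_XML = """
<doxygen>
  <compounddef kind="class">
    <compoundname>dolfin::Mesh</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function">
        <name>Mesh</name>
        <argsstring>()</argsstring>
        <briefdescription><para>Create an empty mesh.</para></briefdescription>
      </memberdef>
      <memberdef kind="function">
        <name>Mesh</name>
        <argsstring>(std::string filename)</argsstring>
        <briefdescription><para>Create mesh from data file.</para></briefdescription>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>
"""


def extract_member_docs(xml_text):
    """Return {class_name: [(member_name, args, doc), ...]} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    docs = {}
    for compound in root.iter("compounddef"):
        class_name = compound.findtext("compoundname")
        members = docs.setdefault(class_name, [])
        for member in compound.iter("memberdef"):
            brief = member.find("briefdescription")
            # Flatten nested <para> elements into plain text.
            text = "".join(brief.itertext()).strip() if brief is not None else ""
            members.append(
                (member.findtext("name"), member.findtext("argsstring"), text)
            )
    return docs


member_docs = extract_member_docs(SAMPLE_XML)
```

Each overload keeps its own comment here, which is the information we would
need to place the comments correctly in the generated docstrings.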

The argument information is already handled by SWIG using:

  %feature("autodoc", "1");

This will generate signatures like:
  
        """
        __init__(self) -> Mesh
        __init__(self, Mesh mesh) -> Mesh
        __init__(self, string filename) -> Mesh

        Create mesh from data file. 
        """

The first part is ok, I think. But the comment shown is only the last one 
parsed. We might decide to use the information in the XML file to generate 
all of this. That way we can put the relevant comments in the correct places, 
like:

        """
        __init__(self) -> Mesh

        Create an empty mesh.

        __init__(self, Mesh mesh) -> Mesh

        Copy mesh.

        __init__(self, string filename) -> Mesh

        Create mesh from data file. 
        """

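Generating that layout is mostly string joining once each signature is paired
with its own comment. A rough sketch, assuming the pairs have already been
extracted (the function name build_overload_docstring is made up here):

```python
def build_overload_docstring(overloads):
    """Join (signature, comment) pairs so each comment follows its signature."""
    blocks = []
    for signature, comment in overloads:
        blocks.append(f"{signature}\n\n{comment.strip()}")
    # A blank line separates one overload from the next.
    return "\n\n".join(blocks)


mesh_init_doc = build_overload_docstring([
    ("__init__(self) -> Mesh", "Create an empty mesh."),
    ("__init__(self, Mesh mesh) -> Mesh", "Copy mesh."),
    ("__init__(self, string filename) -> Mesh", "Create mesh from data file."),
])
```
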
> >> > > > > But in some special cases, we may want to go in and handle
> >> > > > > documentation for special cases where the Python documentation
> >> > > > > needs to be different from the C++ documentation. So there
> >> > > > > should be two different sources for the documentation: one that
> >> > > > > is generated automatically from the C++ header files, and one
> >> > > > > that overwrites or adds documentation for special cases. Is
> >> > > > > that the plan?
> >> > > > 
> >> > > > The plan is currently to write the docstrings by hand for the
> >> > > > entire dolfin module. One of the reasons is that we
> >> > > > rename/ignore functions/classes in the *.i files, and if we
> >> > > > try to automate the docstring generation I think we should make
> >> > > > it fully automatic, not just part of it.
> >> > > 
> >> > > If we can make it 99% automatic and have an extra file with special
> >> > > cases I think that would be ok.
> >> > 
> >> > Agree.
> 
> Yes, but we'll need some automated testing to make sure that the 1%
> does not go out of sync with the code.
> Most likely the 1% can't be handled because it is relatively important
> (definitions in *.i files etc.).
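
Such a test could be as simple as comparing the names in the hand-written part
against what the module actually exposes. A sketch only: the GenericMatrix
class below is a stand-in defined for the example, not the wrapped one, and
the helper name and the dictionary of hand-written docstrings are made up.

```python
class GenericMatrix:
    """Stand-in for a wrapped class; defined here only for the example."""

    def data(self):
        return None

    def array(self):
        return None


# Hypothetical hand-written docstrings, keyed by method name.
handwritten_docs = {
    "data": "Hand-written docstring for data().",
    "removed_method": "Docstring for a method that no longer exists.",
}


def find_obsolete(handwritten, obj):
    """Return hand-documented names that the object no longer provides."""
    public = {name for name in dir(obj) if not name.startswith("_")}
    return sorted(set(handwritten) - public)


obsolete = find_obsolete(handwritten_docs, GenericMatrix)
```

A non-empty result would make the test fail and tell us which hand-written
entries need attention.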

Do we need to parse the different SWIG interface files? This sounds 
cumbersome. Couldn't we just automatically generate documentation from the 
parsed C++ header files? These would then be used to generate docstrings.i, 
which in turn puts the documentation into cpp.py.

  * Ignored methods will be ignored, even if a 

      %feature("docstring") somemethod 
      "some docstring" 

    is generated for that method.

  * Extended methods need to be handled in one of two ways:
    1) Write the docstring directly into the foo_post.i file
    2) Write a hook to an external docstring module for these methods

If we choose 2) we might have something like:

  %extend GenericMatrix {
     %pythoncode%{
     def data(self):
         docstringmodule.cppextended.GenericMatrix.data.__doc__
     %}
  }

in the foo_post.i files.
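
In plain Python the hook amounts to assigning the function's __doc__ from the
external module. A sketch of the mechanism, where the cppextended class below
only plays the role of docstringmodule.cppextended and the method body is a
placeholder:

```python
class cppextended:
    """Stand-in for the external docstring module (hypothetical layout)."""

    class GenericMatrix:
        def data(self):
            """Hand-written docstring for the extended data() method."""


class GenericMatrix:
    def data(self):
        return None  # placeholder body; the real method lives in the *.i file

    # Inside the class body 'data' is still a plain function, so its
    # docstring can be replaced by the one kept in the external module.
    data.__doc__ = cppextended.GenericMatrix.data.__doc__
```

With this, help(GenericMatrix.data) shows the text kept in the external module.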

We would then have two parts in the external docstringmodule:

  1) for the extended Python layer (VariationalForm and so on)
  2) for the extended Python layer in cpp.py

For the rest, and this will be the main part, we rely on parsed docstrings 
from the headers.
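
Writing the docstrings.i entries from the parsed docstrings could look roughly
like this. The helper name is made up, and the quoting rule (backslash-escaping
backslashes and double quotes) is my assumption about what SWIG expects inside
the string, so it should be checked against real input:

```python
def swig_docstring_directive(qualified_name, doc):
    """Format one %feature("docstring") entry for a SWIG interface file."""
    # Assumed escaping: protect backslashes first, then double quotes.
    escaped = doc.replace("\\", "\\\\").replace('"', '\\"')
    return f'%feature("docstring") {qualified_name} "\n{escaped}\n";'


directive = swig_docstring_directive(
    "dolfin::Mesh::Mesh", "Create mesh from data file."
)
```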

The Python programmer's reference will then be generated from the actual 
dolfin module using Sphinx and autodoc.

> >> > > > Also, we will need to change the syntax in all *example* code of
> >> > > > the docstrings. Maybe it can be done, but I'll need to give it
> >> > > > some more careful thought. We've already changed the approach a
> >> > > > few times now, so I would really like the next try to be close to
> >> > > > our final implementation.
> >> > > 
> >> > > I agree. :-)
> >> > > 
> >> > > > > Another thing to discuss is the possibility of using Doxygen to
> >> > > > > extract the documentation. We currently have our own script
> >> > > > > since (I assume) Doxygen does not have a C++ --> reST
> >> > > > > converter. Is that correct?
> >> > > > 
> >> > > > I don't think Doxygen has any such converter, but there exists a
> >> > > > project http://github.com/michaeljones/breathe
> >> > > > which makes it possible to use xml output from Doxygen in much the
> >> > > > same way as we use autodoc for the Python module. I had a quick go
> >> > > > at it but didn't like the result: no links on the index pages to
> >> > > > functions etc. So what we do now is better, but perhaps it would
> >> > > > be a good idea to use Doxygen to extract the docstrings for all
> >> > > > classes and functions. I tried parsing the xml output in the
> >> > > > test/verify_cpp_documentation.py script and it should be relatively
> >> > > > simple to get the docstrings since these are stored as attributes
> >> > > > of classes/functions.
> >> > > 
> >> > > Perhaps an idea would be to use Doxygen for parsing and then have
> >> > > our own script that works with the XML output from Doxygen?
> >> > 
> >> > I did not know we already used Doxygen to extract information about
> >> > class structure from the headers.
> >> 
> >> I thought it was you who implemented the Doxygen documentation
> >> extraction?
> > 
> > Duh... I mean that I did not know we used it in fenics_doc, in
> > verify_cpp_documentation.py.
> 
> We don't. I wrote this script to be able to test the documentation in
> *.rst files against dolfin.
> Basically, I parse all files and keep track of the classes/functions
> which are defined in dolfin and try to match those up against the
> definitions in the documentation (and vice versa) to catch
> missing/obsolete documentation.

Ok, so you do not use this to extract documentation?

> >> > What are the differences between using the XML from Doxygen to also
> >> > extract the documentation, and the approach we use today?
> >> 
> >> Pros (of using Doxygen):
> >> 
> >>   - Doxygen is developed by people that presumably are very good at
> >>     extracting docs from C++ code
> >> 
> >>   - Doxygen might handle some corner cases we can't handle?
> 
> Definitely, and we don't have to maintain it.
> 
> >> Cons (of using Doxygen):
> >> 
> >>   - Another dependency
> > 
> > Which we already have.
> > 
> >>   - We still need to write a script to parse the XML
> > 
> > We should be able to use the xml parser in docstringgenerator.py.
> > 
> >>   - The parsing of /// stuff from C++ code is very simple
> > 
> > Yes, and this might be just fine. But if it grows we might consider using
> > Doxygen.
> 
> But some cases are not handled correctly already (nested classes etc.)
> so I vote for Doxygen.

Ok.

My itch with Doxygen is its specific syntax, which makes the header files look 
ugly. But I guess we write all our comments in reST, and then just use Doxygen 
to extract these. Should this information be used to generate both the 
docstring.i file and the C++ programmer's reference?

I found a potential showstopper with Doxygen: it does not preserve line 
breaks. Any comments in the header files whose line breaks we need to keep 
would have to go into a \verbatim ... \endverbatim block. This can probably 
be done where we need it. A good example is the table command in Mesh.h.

> > Would it be possible to settle on a format of the extracted documentation
> > which we use as input to generate reST documentation? It would make it
> > easier to do a switch to Doxygen XML whenever we figure this is needed,
> > ie we just switch the backend of the documentation parser.
> 
> This will probably be a good idea, even if we start with Doxygen since
> the xml output might change in format so we can easily adapt.
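
To make the idea concrete, a rough sketch of such a backend-independent
format: a small record type that any parser fills in, and a reST writer that
only sees those records. The names DocEntry, parse_triple_slash and
render_rest are made up for this example, the /// parser is deliberately
naive, and the use of the Sphinx cpp:function directive is just one possible
output.

```python
from dataclasses import dataclass


@dataclass
class DocEntry:
    """Backend-independent record for one documented declaration."""
    signature: str
    text: str


def parse_triple_slash(header_text):
    """Naive backend: pair each run of /// comments with the next code line."""
    entries, comment = [], []
    for raw in header_text.splitlines():
        line = raw.strip()
        if line.startswith("///"):
            comment.append(line[3:].strip())
        elif line and comment:
            entries.append(DocEntry(line.rstrip(";"), "\n".join(comment)))
            comment = []
    return entries


def render_rest(entries):
    """Write reST from DocEntry records, whichever backend produced them."""
    blocks = []
    for entry in entries:
        body = "\n".join("    " + line for line in entry.text.splitlines())
        blocks.append(f".. cpp:function:: {entry.signature}\n\n{body}")
    return "\n\n".join(blocks) + "\n"


HEADER = """
/// Create mesh from data file.
Mesh(std::string filename);
"""

rest = render_rest(parse_triple_slash(HEADER))
```

A Doxygen XML backend would only need to return the same DocEntry records;
render_rest would stay untouched.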

Ok

Johan

> Kristian
> 
> > Johan
> > 
> >> --
> >> Anders