
fenics team mailing list archive

Re: Generation of docstring module

 

On 7 September 2010 07:09, Johan Hake <johan.hake@xxxxxxxxx> wrote:
> On Monday September 6 2010 15:59:47 Kristian Ølgaard wrote:
>> On 6 September 2010 22:14, Johan Hake <johan.hake@xxxxxxxxx> wrote:
>> > On Monday September 6 2010 08:56:13 Kristian Ølgaard wrote:
>> >> On 6 September 2010 17:24, Johan Hake <johan.hake@xxxxxxxxx> wrote:
>> >> > On Monday September 6 2010 08:13:44 Anders Logg wrote:
>> >> >> On Mon, Sep 06, 2010 at 08:08:10AM -0700, Johan Hake wrote:
>> >> >> > On Monday September 6 2010 05:47:27 Anders Logg wrote:
>> >> >> > > On Mon, Sep 06, 2010 at 12:19:03PM +0200, Kristian Ølgaard wrote:
>> >> >> > > > > Do we have any functionality in place for handling
>> >> >> > > > > documentation that should be automatically generated from
>> >> >> > > > > the C++ interface and documentation that needs to be added
>> >> >> > > > > later?
>> >> >> > > >
>> >> >> > > > No, not really.
>> >> >> > >
>> >> >> > > ok.
>> >> >> > >
>> >> >> > > > > I assume that the documentation we write in the C++ header
>> >> >> > > > > files (like Mesh.h) will be the same that appears in Python
>> >> >> > > > > using help(Mesh)?
>> >> >> > > >
>> >> >> > > > Yes and no, the problem is that for instance overloaded methods
>> >> >> > > > will only show the last docstring.
>> >> >> > > > So, the Mesh.__init__.__doc__ will just contain the
>> >> >> > > > Mesh(std::string file_name) docstring.
>> >> >> > >
>> >> >> > > It would not be difficult to make the documentation extraction
>> >> >> > > script we have (in fenics-doc) generate the docstrings module and
>> >> >> > > just concatenate all constructor documentation. We are already
>> >> >> > > doing the parsing so spitting out class Foo: """ etc would be
>> >> >> > > easy. Perhaps that is an option.
>> >> >> >
>> >> >> > There might be other overloaded methods too. We might try to settle
>> >> >> > on a format for these methods, or make this part of the 1% we need
>> >> >> > to handle ourselves.
>> >> >>
>> >> >> ok. Should also be fairly easy to handle.
>> >
>> > If we choose to use Doxygen to extract the documentation, we need to
>> > figure out how to do this when we parse the XML information.
>> >
>> > The argument information is already handled by SWIG using:
>> >
>> >  %feature("autodoc", "1");
>> >
>> > This will generate signatures like:
>> >
>> >        """
>> >        __init__(self) -> Mesh
>> >        __init__(self, Mesh mesh) -> Mesh
>> >        __init__(self, string filename) -> Mesh
>> >
>> >        Create mesh from data file.
>> >        """
>> >
>> > The first part is ok, I think. But the comment relates to the latest
>> > parsed comment. We might decide to try to use the information in the xml
>> > file to generate all of this information. That way we can put the
>> > relevant comments in the correct places like:
>> >
>> >        """
>> >        __init__(self) -> Mesh
>> >
>> >        Create an empty mesh.
>> >
>> >        __init__(self, Mesh mesh) -> Mesh
>> >
>> >        Copy mesh.
>> >
>> >        __init__(self, string filename) -> Mesh
>> >
>> >        Create mesh from data file.
>> >        """
>>
>> I'd say that we should strive to make the docstring look like the one
>> for Mesh in docstrings/dolfin/cpp.py.
>
> It looks nice.

Yes, I know. :)

>> The argument information should contain links like :py:class:`Foo` if
>> functions/classes are used.
>> We can do this by storing the function docstrings, checking for
>> overloaded functions and concatenating them if needed.
>
> But how do we extract the different arguments? I suppose this is collected by
> Doxygen, and we just need to parse these and output them in a correct way?

I don't think we need to parse the arguments and output them. We just
get the function name, and if we have more than one set of arguments,
i.e., a different signature, we know that we have an overloaded method
and how to handle it.
The arguments should be described in the *Arguments* section of the
individual docstring with links to classes formatted like
_MeshEntity_, which we will substitute with :py:class:`MeshEntity` or
:cpp:class:`MeshEntity` depending on which interface we document.
Although I just realised that standard C++ types like double*, which
end up as numpy.array etc., should probably be handled too.
On a related note:
int some_func()
and
const int some_func() const
are different in C++, but in Python we don't have const right?
This will simplify the documentation a lot.
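The overload handling and link substitution described above could be sketched like this in Python (function names and the input format are assumptions for illustration, not the actual fenics-doc code):

```python
import re
from collections import defaultdict

def link_substitute(doc, interface="py"):
    # Replace _Foo_ with :py:class:`Foo` (or :cpp:class:`Foo`),
    # depending on which interface we document.
    return re.sub(r"_([A-Z]\w*)_", r":%s:class:`\1`" % interface, doc)

def concatenate_overloads(parsed):
    # parsed: list of (name, signature, docstring) tuples as they come
    # out of the extraction script. More than one entry per name means
    # the method is overloaded, so its docstrings are concatenated.
    grouped = defaultdict(list)
    for name, signature, doc in parsed:
        grouped[name].append("%s\n\n%s" % (signature, link_substitute(doc)))
    return {name: "\n\n".join(docs) for name, docs in grouped.items()}

docs = concatenate_overloads([
    ("__init__", "__init__(self) -> Mesh", "Create an empty mesh."),
    ("__init__", "__init__(self, string filename) -> Mesh",
     "Create a _Mesh_ from a data file."),
])
print(docs["__init__"])
```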

>> >> >> > > > > But in some special cases, we may want to go in and handle
>> >> >> > > > > documentation for special cases where the Python
>> >> >> > > > > documentation needs to be different from the C++
>> >> >> > > > > documentation. So there should be two different sources for
>> >> >> > > > > the documentation: one that is generated automatically from
>> >> >> > > > > the C++ header files, and one that overwrites or adds
>> >> >> > > > > documentation for special cases. Is that the plan?
>> >> >> > > >
>> >> >> > > > The plan is currently to write the docstrings by hand for the
>> >> >> > > > entire dolfin module. One of the reasons is that we
>> >> >> > > > rename/ignore functions/classes in the *.i files, and if we
>> >> >> > > > try to automate the docstring generation I think we should make
>> >> >> > > > it fully automatic not just part of it.
>> >> >> > >
>> >> >> > > If we can make it 99% automatic and have an extra file with
>> >> >> > > special cases I think that would be ok.
>> >> >> >
>> >> >> > Agree.
>> >>
>> >> Yes, but we'll need some automated testing to make sure that the 1%
>> >> does not go out of sync with the code.
>> >> Most likely the 1% can't be handled automatically because it is
>> >> relatively important (definitions in *.i files etc.).
>> >
>> > Do we need to parse the different SWIG interface files? This sounds
>> > cumbersome. Couldn't we just automatically generate documentation from
>> > the parsed C++ header files. These are then used to generate the
>> > docstrings.i, which is used to put in documentation in cpp.py.
>> >
>> >  * Ignored methods will be ignored, even if a
>> >
>> >      %feature("docstring") somemethod
>> >      "some docstring"
>> >
>> >    is generated for that method.
>>
>> Yes, but what about renamed functions/classes??? This is what really bugs
>> me.
>
> This is handled by SWIG. If somemethod is a correct C++ DOLFIN type which is
> renamed, then "some docstring" will be the docstring of the renamed method.

Cool, I wasn't aware of that. Maybe it won't be impossible after all.

>> >  * Extended methods need to be handled in one of three ways:
>> >    1) Write the docstring directly into the foo_post.i file

I like this option. If this is where we have the code for a function,
then this is where the docstring should be, as it increases the
probability of the docstring being up to date.

>> >    2) Write a hook to an external docstring module for these methods
>>
>> The beauty of extended methods is that we can assign to their docs
>> dynamically on import (unless we add a class, do we?),
>
> Do not think so.
>
>> I do this already.
>
> Ok
>
>> So no need to handle this case. All our problems arise from
>> what Swig does at the end of a class:
>>
>> Mesh.num_vertices = new_instancemethod(_cpp.Mesh_num_vertices,None,Mesh)
>>
>> which makes it impossible to assign to __doc__ --> we need to tell
>> Swig which docstrings to use.
>
> Why do we need to assign to these methods? They already get their docstrings
> from the docstrings.i file. However if we want to get rid of the
> new_instancemethod assignment above, we can just remove the

Some history.
Initially, we wanted to have all docstrings separated from the DOLFIN
code and collected in the fenics-doc module. The easiest way to get
the >>> help(dolfin) docstring correct is to assign to __doc__
dynamically.
If we could do this we wouldn't even need the docstrings.i file and
things would be simple.
However, we discovered that this was not possible, and because of that
we still need to generate the docstrings.i file.
Then, still assuming we wanted to separate docs from code and keeping
docstrings in fenics-doc, I thought it would be easier to generate the
docstrings.i file from the handwritten docstrings module in
fenics-doc.
Some methods don't get their docstrings from the docstrings.i file
though, so we still need to assign to __doc__ which is the easiest
thing to do.
Just recently we decided to extract the docstrings from the C++
implementation, thus moving the docs back into DOLFIN. This makes the
docstrings module almost superfluous; its only practical use is
to hold documentation for the extended methods defined in the _post.i
files, but if we put the docstrings directly in the _post.i files we no
longer need it.
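For the pure Python layer, the dynamic assignment mentioned above is straightforward; here is a minimal sketch (the class and the docstring table are hypothetical stand-ins; the SWIG-bound methods installed via new_instancemethod are precisely the ones that reject this, which is why docstrings.i remains necessary for them):

```python
class Mesh:                       # stand-in for a class in the Python layer
    def num_vertices(self):
        pass

# In the real setup these would come from the handwritten docstrings
# module in fenics-doc; the keys here are invented for illustration.
docstrings = {
    ("Mesh", "num_vertices"): "Return the number of vertices in the mesh.",
}

# Plain Python functions accept assignment to __doc__ on import.
for (cls_name, meth_name), doc in docstrings.items():
    getattr(globals()[cls_name], meth_name).__doc__ = doc

help_text = Mesh.num_vertices.__doc__
```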

>  -fastproxy
>
> option from the SWIG call.

OK, good to know, but let's not do that.

>> > If we choose 2) we might have something like:
>> >
>> >  %extend GenericMatrix {
>> >     %pythoncode%{
>> >     def data(self):
>> >         ...
>> >     data.__doc__ = docstringmodule.cppextended.GenericMatrix.data.__doc__
>> >     %}
>> >  }
>> >
>> > in the foo_post.i files.
>> >
>> > We would then have two parts in the external docstringmodule:
>> >
>> >  1) for the extended Python layer (VariationalForm aso)
>>
>> Since we're already aiming at generating the docstrings module
>> (dolfin/site-packages/dolfin/docstrings), maybe we should just extract
>> the docs from the extended Python layer in dolfin/site-packages/dolfin
>> and dump them in the docstrings?
>
> I am confused. Do you suggest that we just document the extended Python layer
> directly in the Python module as it is today? Why should we then dump the
> docstrings in a separate docstring module? So autodoc can have something to
> chew on? Couldn't autodoc just chew on the dolfin module directly?

I'm confused too. :) I guess my head has not been properly reset
between the changes in documentation strategies.
The Sphinx autodoc can only handle one dolfin module, so we need to
either import the 'real' one or the docstrings dolfin module.
If we can completely remove the need for the docstrings module, then
we should of course include the 'real' one.

>> Then programmers writing the Python
>> layer just need to document while they're coding, where they are
>> coding, just like they do (or should anyway) for the C++ part.
>
> Still confused why we need a certain docstring module.

Maybe we don't.

>> >  2) for the extended Python layer in the cpp.py
>> >
>> > For the rest, and this will be the main part, we rely on parsed
>> > docstrings from the headers.
>> >
>> > The Python programmer's reference will then be generated based on the
>> > actual dolfin module using sphinx and autodoc.
>>
>> We could/should probably use either the dolfin module or the generated
>> docstring module to generate the relevant reST files. Although we
>> might need to run some cross-checks with the Doxygen xml to get the
>> correct file names where the classes are defined in DOLFIN such that
>> we retain the original DOLFIN source tree structure. Otherwise all our
>> documentation will end up in cpp.rst which I would hate to navigate
>> through as a user.
>
> This one got too technical for me. Are you saying that there is no way to
> split the documentation into smaller parts without relying on the C++
> module/file structure?

But how would you split it? It makes sense to keep the classes Mesh
and MeshEntity in the mesh/ part of the documentation. Unfortunately,
Swig doesn't add info to the classes in the cpp.py module about where
they were originally defined. This is why we need to pair it with info
from the xml output.
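A sketch of that pairing, assuming the usual Doxygen compound XML layout (the tag names follow Doxygen's schema; the snippet parsed below is made up):

```python
import xml.etree.ElementTree as ET

def class_locations(compound_xml):
    # Map class name -> defining header, from a Doxygen compound file,
    # so the generated reST can mirror the DOLFIN source tree (mesh/,
    # la/, ...) instead of dumping everything into one cpp.rst.
    root = ET.fromstring(compound_xml)
    locations = {}
    for compound in root.iter("compounddef"):
        name = compound.findtext("compoundname")
        location = compound.find("location")
        if name is not None and location is not None:
            locations[name] = location.get("file")
    return locations

example = """<doxygen>
  <compounddef kind="class">
    <compoundname>dolfin::Mesh</compoundname>
    <location file="dolfin/mesh/Mesh.h" line="60"/>
  </compounddef>
</doxygen>"""

print(class_locations(example))  # {'dolfin::Mesh': 'dolfin/mesh/Mesh.h'}
```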

>> I vote for using the generated docstrings module for the documentation
>> since it should contain all classes even if some HAS_* was not
>> switched on, which brings me to the last question, how do we handle
>> the case where some ifdefs result in classes not being generated in
>> cpp.py? They should still be documented of course.
>
> I think we are fine if the server that generate the documentation has all
> optional packages, so the online documentation is fully up to date.

Maybe, but I think I saw somewhere that depending on the ifdefs some
names would be different, and we need the documentation to be complete
regardless of the user's installation.

>> Another issue we need to handle is any example code in the C++ docs
>> which must be translated into Python syntax. Either automatically, or
>> by looking them up in a dictionary, but that brings us right back to
>> something < 100% automatic.
>
> Would it be possible to have just pointers to demos instead of example code? I
> know it is common to have example code in Python docstrings but I do not think
> it is equally common to have this in C++ header files.

Since when did we care about what's common in FEniCS? :) I think small
input/output examples are good even for C++; look at the Mesh class
for instance.

>> >> >> > > > Also, we will need to change the syntax in all *example* code
>> >> >> > > > of the docstrings. Maybe it can be done, but I'll need to give
>> >> >> > > > it some more careful thought. We've already changed the
>> >> >> > > > approach a few times now, so I'd really like the next try to be
>> >> >> > > > close to our final implementation.
>> >> >> > >
>> >> >> > > I agree. :-)
>> >> >> > >
>> >> >> > > > > Another thing to discuss is the possibility of using Doxygen
>> >> >> > > > > to extract the documentation. We currently have our own
>> >> >> > > > > script since (I assume) Doxygen does not have a C++ --> reST
>> >> >> > > > > converter. Is that correct?
>> >> >> > > >
>> >> >> > > > I don't think Doxygen has any such converter, but there exists
>> >> >> > > > a project http://github.com/michaeljones/breathe
>> >> >> > > > which makes it possible to use xml output from Doxygen in much
>> >> >> > > > the same way as we use autodoc for the Python module. I had a
>> >> >> > > > quick go at it but didn't like the result. No links on the
>> >> >> > > > index pages to functions etc. So what we do now is better, but
>> >> >> > > > perhaps it would be a good idea to use Doxygen to extract the
>> >> >> > > > docstrings for all classes and functions. I tried parsing the
>> >> >> > > > xml output in the test/verify_cpp_documentation.py script
>> >> >> > > > and it should be relatively
>> >> >> > > > simple to get the docstrings since these are stored as
>> >> >> > > > attributes of classes/functions.
>> >> >> > >
>> >> >> > > Perhaps an idea would be to use Doxygen for parsing and then have
>> >> >> > > our own script that works with the XML output from Doxygen?
>> >> >> >
>> >> >> > I did not know we already used Doxygen to extract information
>> >> >> > about class structure from the headers.
>> >> >>
>> >> >> I thought it was you who implemented the Doxygen documentation
>> >> >> extraction?
>> >> >
>> >> > Duh... I mean that I did not know we used it in fenics-doc, in
>> >> > verify_cpp_documentation.py.
>> >>
>> >> We don't. I wrote this script to be able to test the documentation in
>> >> *.rst files against dolfin.
>> >> Basically, I parse all files and keep track of the classes/functions
>> >> which are defined in dolfin and try to match those up against the
>> >> definitions in the documentation (and vice versa) to catch
>> >> missing/obsolete documentation.
>> >
>> > Ok, so you do not use this to extract documentation?
>>
>> No, but I already visit all the class/function nodes so it should just
>> be a question of requesting the docstring attribute.
>
> Ok.
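The cross-check described above then reduces to two set differences (the name sets below are invented for illustration):

```python
# Names defined in dolfin, as collected while visiting the
# class/function nodes in the headers (hypothetical sample).
defined = {"Mesh", "MeshEntity", "Vertex"}

# Names found in the *.rst documentation (hypothetical sample).
documented = {"Mesh", "MeshEntity", "OldClass"}

missing = defined - documented    # in the code but not documented
obsolete = documented - defined   # documented but no longer in the code

print(sorted(missing))    # ['Vertex']
print(sorted(obsolete))   # ['OldClass']
```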
>
>> >> >> > What are the differences between using the XML from Doxygen to also
>> >> >> > extract the documentation, and the approach we use today?
>> >> >>
>> >> >> Pros (of using Doxygen):
>> >> >>
>> >> >>   - Doxygen is developed by people that presumably are very good at
>> >> >>     extracting docs from C++ code
>> >> >>
>> >> >>   - Doxygen might handle some corner cases we can't handle?
>> >>
>> >> Definitely, and we don't have to maintain it.
>> >>
>> >> >> Cons (of using Doxygen):
>> >> >>
>> >> >>   - Another dependency
>> >> >
>> >> > Which we already have.
>> >> >
>> >> >>   - We still need to write a script to parse the XML
>> >> >
>> >> > We should be able to use the xml parser in docstringgenerator.py.
>> >> >
>> >> >>   - The parsing of /// stuff from C++ code is very simple
>> >> >
>> >> > Yes, and this might be just fine. But if it grows we might consider
>> >> > using Doxygen.
>> >>
>> >> But some cases are not handled correctly already (nested classes etc.)
>> >> so I vote for Doxygen.
>> >
>> > Ok.
>> >
>> > My itch with Doxygen is its specific syntax which makes the header files
>> > look ugly. But I guess we write all our comments in reST, and then just
>> > use Doxygen to extract these. Should this information be used to
>> > generate both the docstring.i file and the C++ programmer's reference?
>>
>> Yes, no Doxygen markup crap here. The docstrings defined by '///'
>> should already be in reST and
>> we currently use this for the C++ programmer's reference.
>
> Good!
>
>> > I found a potential show stopper with Doxygen. It does not preserve line
>> > breaks... We need to put all the comments in the header files, which we
>> > will need to preserve, into a \verbatim \endverbatim block. This can
>> > probably be done where we need them. A good example is the table command
>> > in Mesh.h
>>
>> Bummer. The line breaks are essential, not just for the table command
>> but for all the definition lists too, i.e., the entire reST
>> documentation (both C++ and Python). No line breaks would also make
>> any output from calls like:
>>
>> >>> help(dolfin.Mesh)
>>
>> useless. Is there no line break markup we can actually learn to live
>> with?
>
> I think the \verbatim \endverbatim is the least intrusive in the header files.
> This would then look like:
>
>    <detaileddescription><para><verbatim>
>      A _Mesh_ consists of a set of connected and numbered mesh entities.
>      ....
>
>      .. tabularcolumns:: |c|c|c|
>
>      +--------+-----------+-------------+
>      | Entity | Dimension | Codimension |
>      +========+===========+=============+
>      | Vertex |  0        |             |
>      +--------+-----------+-------------+
>      | Edge   |  1        |             |
>      +--------+-----------+-------------+
>      ....
>
>      </verbatim> </para> </detaileddescription>
>
> After some fiddling with the parser (which used textwrap to beautify the text
> NOT!) I managed to get out this:
>
> %feature("docstring") dolfin::Mesh "
>
>
>      A _Mesh_ consists of a set of connected and numbered mesh entities.
>      ....
>
>      .. tabularcolumns:: |c|c|c|
>
>      +--------+-----------+-------------+
>      | Entity | Dimension | Codimension |
>      +========+===========+=============+
>      | Vertex |  0        |             |
>      +--------+-----------+-------------+
>      | Edge   |  1        |             |
>      +--------+-----------+-------------+
>      ....
>
>
> ";
>
> Which should serve us well.

OK, looks reasonable. So it might be possible to do this after all.
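The step from the verbatim XML to the docstrings.i entry could be sketched like this (tag names follow Doxygen's XML output; the exact nesting in real output may differ, and the snippet parsed below is made up):

```python
import xml.etree.ElementTree as ET

def feature_docstring(qualified_name, detailed_xml):
    # Collect the text of every <verbatim> block, where Doxygen has
    # preserved the line breaks, and wrap it as a SWIG docstring feature.
    root = ET.fromstring(detailed_xml)
    text = "".join(v.text or "" for v in root.iter("verbatim"))
    return '%%feature("docstring") %s "\n%s\n";' % (qualified_name, text)

xml_snippet = """<detaileddescription><para><verbatim>
A _Mesh_ consists of a set of connected and numbered mesh entities.
</verbatim></para></detaileddescription>"""

entry = feature_docstring("dolfin::Mesh", xml_snippet)
print(entry)
```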

Kristian

>> But of course we would have to remove this later such that it
>> doesn't look completely stupid in the Python documentation.
>
> Shouldn't be necessary!
>
> Johan
>
>> Kristian
>>
>> >> > Would it be possible to settle on a format for the extracted
>> >> > documentation which we use as input to generate reST documentation?
>> >> > It would make it easier to switch to Doxygen XML whenever we
>> >> > figure this is needed, i.e., we just switch the backend of the
>> >> > documentation parser.
>> >>
>> >> This will probably be a good idea, even if we start with Doxygen, since
>> >> the xml output might change in format and we can then easily adapt.
>> >
>> > Ok
>> >
>> > Johan
>> >
>> >> Kristian
>> >>
>> >> > Johan
>> >> >
>> >> >> --
>> >> >> Anders
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~fenics
>> Post to     : fenics@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~fenics
>> More help   : https://help.launchpad.net/ListHelp
>


