Re: Generation of docstring module

 

On Mon, Sep 06, 2010 at 10:09:09PM -0700, Johan Hake wrote:
> On Monday September 6 2010 15:59:47 Kristian Ølgaard wrote:
> > On 6 September 2010 22:14, Johan Hake <johan.hake@xxxxxxxxx> wrote:
> > > On Monday September 6 2010 08:56:13 Kristian Ølgaard wrote:
> > >> On 6 September 2010 17:24, Johan Hake <johan.hake@xxxxxxxxx> wrote:
> > >> > On Monday September 6 2010 08:13:44 Anders Logg wrote:
> > >> >> On Mon, Sep 06, 2010 at 08:08:10AM -0700, Johan Hake wrote:
> > >> >> > On Monday September 6 2010 05:47:27 Anders Logg wrote:
> > >> >> > > On Mon, Sep 06, 2010 at 12:19:03PM +0200, Kristian Ølgaard wrote:
> > >> >> > > > > Do we have any functionality in place for handling
> > >> >> > > > > documentation that should be automatically generated from
> > >> >> > > > > the C++ interface and documentation that needs to be added
> > >> >> > > > > later?
> > >> >> > > >
> > >> >> > > > No, not really.
> > >> >> > >
> > >> >> > > ok.
> > >> >> > >
> > >> >> > > > > I assume that the documentation we write in the C++ header
> > >> >> > > > > files (like Mesh.h) will be the same that appears in Python
> > >> >> > > > > using help(Mesh)?
> > >> >> > > >
> > >> >> > > > Yes and no. The problem is that, for instance, overloaded methods
> > >> >> > > > will only show the last docstring.
> > >> >> > > > So Mesh.__init__.__doc__ will just contain the
> > >> >> > > > Mesh(std::string file_name) docstring.
> > >> >> > >
> > >> >> > > It would not be difficult to make the documentation extraction
> > >> >> > > script we have (in fenics-doc) generate the docstrings module and
> > >> >> > > just concatenate all constructor documentation. We are already
> > >> >> > > doing the parsing, so spitting out class Foo: """ etc. would be
> > >> >> > > easy. Perhaps that is an option.
> > >> >> >
> > >> >> > There might be other overloaded methods too. We might try to settle
> > >> >> > on a format for these methods, or make this part of the 1% we need
> > >> >> > to handle ourselves.
> > >> >>
> > >> >> ok. Should also be fairly easy to handle.
> > >
> > > If we choose to use Doxygen to extract the documentation, we need to
> > > figure out how to do this when we parse the XML information.
> > >
> > > The argument information is already handled by SWIG using:
> > >
> > >  %feature("autodoc", "1");
> > >
> > > This will generate signatures like:
> > >
> > >        """
> > >        __init__(self) -> Mesh
> > >        __init__(self, Mesh mesh) -> Mesh
> > >        __init__(self, string filename) -> Mesh
> > >
> > >        Create mesh from data file.
> > >        """
> > >
> > > The first part is ok, I think. But the comment relates only to the last
> > > parsed comment. We might decide to use the information in the XML
> > > file to generate all of this ourselves. That way we can put the
> > > relevant comments in the correct places, like:
> > >
> > >        """
> > >        __init__(self) -> Mesh
> > >
> > >        Create an empty mesh.
> > >
> > >        __init__(self, Mesh mesh) -> Mesh
> > >
> > >        Copy mesh.
> > >
> > >        __init__(self, string filename) -> Mesh
> > >
> > >        Create mesh from data file.
> > >        """
> >
> > I'd say that we should strive to make the docstring look like the one
> > for Mesh in docstrings/dolfin/cpp.py.
>
> It looks nice.
>
> > The argument information should contain links like :py:class:`Foo` if
> > functions/classes are used.
> > We can do this by storing the function docstrings, checking for
> > overloaded functions and concatenating them if needed.
>
> But how do we extract the different arguments? I suppose this is collected by
> Doxygen, and we just need to parse these and output them in the correct way?
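
For illustration, a minimal sketch of the concatenation step, assuming the
parser already gives us the signature and docstring of each overload (the
function name and data layout here are hypothetical):

    def overload_entry(cpp_name, overloads):
        "Build one docstrings.i entry from a list of (signature, docstring) pairs."
        body = "\n\n".join(sig + "\n\n" + doc for sig, doc in overloads)
        return '%%feature("docstring") %s "\n%s\n";\n' % (cpp_name, body)

    entry = overload_entry("dolfin::Mesh::Mesh", [
        ("__init__(self) -> Mesh", "Create an empty mesh."),
        ("__init__(self, Mesh mesh) -> Mesh", "Copy mesh."),
        ("__init__(self, string filename) -> Mesh", "Create mesh from data file."),
    ])
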
>
> > >> >> > > > > But in some special cases, we may want to go in and handle
> > >> >> > > > > documentation for special cases where the Python
> > >> >> > > > > documentation needs to be different from the C++
> > >> >> > > > > documentation. So there should be two different sources for
> > >> >> > > > > the documentation: one that is generated automatically from
> > >> >> > > > > the C++ header files, and one that overwrites or adds
> > >> >> > > > > documentation for special cases. Is that the plan?
> > >> >> > > >
> > >> >> > > > The plan is currently to write the docstrings by hand for the
> > >> >> > > > entire dolfin module. One of the reasons is that we
> > >> >> > > > rename/ignore functions/classes in the *.i files, and if we
> > >> >> > > > try to automate the docstring generation I think we should make
> > >> >> > > > it fully automatic, not just part of it.
> > >> >> > >
> > >> >> > > If we can make it 99% automatic and have an extra file with
> > >> >> > > special cases I think that would be ok.
> > >> >> >
> > >> >> > Agree.
> > >>
> > >> Yes, but we'll need some automated testing to make sure that the 1%
> > >> does not go out of sync with the code.
> > >> Most likely the 1% can't be handled automatically because it is relatively
> > >> important (definitions in *.i files etc.).
> > >
> > > Do we need to parse the different SWIG interface files? This sounds
> > > cumbersome. Couldn't we just automatically generate documentation from
> > > the parsed C++ header files? These are then used to generate
> > > docstrings.i, which is used to put the documentation into cpp.py.
> > >
> > >  * Ignored methods will be ignored, even if a
> > >
> > >      %feature("docstring") somemethod
> > >      "some docstring"
> > >
> > >    is generated for that method.
> >
> > Yes, but what about renamed functions/classes??? This is what really bugs
> > me.
>
> This is handled by SWIG. If somemethod is a correct C++ DOLFIN type which is
> renamed, then "some docstring" will be the docstring of the renamed method.
>
> > >  * Extended methods need to be handled in one of three ways:
> > >    1) Write the docstring directly into the foo_post.i file
> > >    2) Write a hook to an external docstring module for these methods
> >
> > The beauty of extended methods is that we can assign their docs
> > dynamically on import (unless we add a class, do we?),
>
> Do not think so.
>
> > I do this already.
>
> Ok
>
> > So no need to handle this case. All our problems arise from
> > what SWIG does at the end of a class:
> >
> >   Mesh.num_vertices = new_instancemethod(_cpp.Mesh_num_vertices, None, Mesh)
> >
> > which makes it impossible to assign to __doc__ --> we need to tell
> > SWIG which docstrings to use.
>
> Why do we need to assign to these methods? They already get their docstrings
> from the docstrings.i file. However, if we want to get rid of the
> new_instancemethod assignment above, we can just remove the
>
>   -fastproxy
>
> option from the SWIG call.
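
If we do drop -fastproxy, attaching the hand-written docstrings at import time
could look something like this minimal sketch (the layout of the external
docstrings module and the function name are hypothetical; it assumes the proxy
methods are plain Python functions):

    import inspect

    def attach_docstrings(cpp_module, doc_module):
        "Copy docstrings from a hand-written docstrings module onto the SWIG proxies."
        for cls_name, doc_cls in inspect.getmembers(doc_module, inspect.isclass):
            cpp_cls = getattr(cpp_module, cls_name, None)
            if cpp_cls is None:
                continue
            for name, doc_member in doc_cls.__dict__.items():
                doc = getattr(doc_member, "__doc__", None)
                cpp_member = cpp_cls.__dict__.get(name)
                if not doc or cpp_member is None:
                    continue
                func = getattr(cpp_member, "im_func", cpp_member)  # unwrap if bound
                try:
                    func.__doc__ = doc
                except (AttributeError, TypeError):
                    pass  # extension functions have a read-only __doc__
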
>
> > > If we choose 2) we might have something like:
> > >
> > >  %extend GenericMatrix {
> > >     %pythoncode %{
> > >     def data(self):
> > >         # actual implementation of the extended method goes here
> > >         pass
> > >     data.__doc__ = docstringmodule.cppextended.GenericMatrix.data.__doc__
> > >     %}
> > >  }
> > >
> > > in the foo_post.i files.
> > >
> > > We would then have two parts in the external docstringmodule:
> > >
> > >  1) for the extended Python layer (VariationalForm and so on)
> >
> > Since we're already aiming at generating the docstrings module
> > (dolfin/site-packages/dolfin/docstrings), maybe we should just extract
> > the docs from the extended Python layer in dolfin/site-packages/dolfin
> > and dump them in the docstrings?
>
> I am confused. Do you suggest that we just document the extended Python layer
> directly in the Python module as it is today? Why should we then dump the
> docstrings in a separate docstring module? So autodoc can have something to
> chew on? Couldn't autodoc just chew on the dolfin module directly?
>
> > Then programmers writing the Python
> > layer just need to document while they're coding, where they're
> > coding, just like they do (or should anyway) for the C++ part.
>
> Still confused about why we need a separate docstring module.
>
> > >  2) for the extended Python layer in the cpp.py
> > >
> > > For the rest, and this will be the main part, we rely on parsed
> > > docstrings from the headers.
> > >
> > > The Python programmer's reference will then be generated from the
> > > actual dolfin module using Sphinx and autodoc.
> >
> > We could/should probably use either the dolfin module or the generated
> > docstring module to generate the relevant reST files, although we
> > might need to run some cross-checks with the Doxygen XML to get the
> > correct file names where the classes are defined in DOLFIN, such that
> > we retain the original DOLFIN source tree structure. Otherwise all our
> > documentation will end up in cpp.rst, which I would hate to navigate
> > through as a user.
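
For illustration, a minimal sketch of that step, assuming we already have a
mapping from header file to the classes it defines (extracted from the Doxygen
XML); the function name and layout are hypothetical:

    import os

    def write_rst(classes_by_header, outdir):
        "Write one reST file per DOLFIN header, documenting its classes with autoclass."
        for header, classes in classes_by_header.items():
            name = os.path.splitext(os.path.basename(header))[0]
            with open(os.path.join(outdir, name + ".rst"), "w") as f:
                f.write(name + ".h\n" + "=" * (len(name) + 2) + "\n\n")
                for cls in sorted(classes):
                    f.write(".. autoclass:: dolfin." + cls + "\n")
                    f.write("   :members:\n\n")
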
>
> This one got too technical for me. Are you saying that there is no way to split
> the documentation into smaller parts without relying on the C++ module/file
> structure?
>
> > I vote for using the generated docstrings module for the documentation
> > since it should contain all classes even if some HAS_* was not
> > switched on, which brings me to the last question: how do we handle
> > the case where some ifdefs result in classes not being generated in
> > cpp.py? They should still be documented, of course.
>
> I think we are fine if the server that generates the documentation has all
> optional packages, so the online documentation is fully up to date.
>
> > Another issue we need to handle is any example code in the C++ docs
> > which must be translated into Python syntax. Either automatically, or
> > by some lookup in a dictionary, but that brings us right back to
> > something < 100% automatic.
>
> Would it be possible to have just pointers to demos instead of example code? I
> know it is common to have example code in Python docstrings, but I do not think
> it is equally common to have this in C++ header files.
>
> > >> >> > > > Also, we will need to change the syntax in all *example* code
> > >> >> > > > in the docstrings. Maybe it can be done, but I'll need to give
> > >> >> > > > it some more careful thought. We've already changed the
> > >> >> > > > approach a few times now, so I would really like the next try to
> > >> >> > > > be close to our final implementation.
> > >> >> > >
> > >> >> > > I agree. :-)
> > >> >> > >
> > >> >> > > > > Another thing to discuss is the possibility of using Doxygen
> > >> >> > > > > to extract the documentation. We currently have our own
> > >> >> > > > > script since (I assume) Doxygen does not have a C++ --> reST
> > >> >> > > > > converter. Is that correct?
> > >> >> > > >
> > >> >> > > > I don't think Doxygen has any such converter, but there exists a
> > >> >> > > > project, http://github.com/michaeljones/breathe,
> > >> >> > > > which makes it possible to use the XML output from Doxygen in much
> > >> >> > > > the same way as we use autodoc for the Python module. I had a
> > >> >> > > > quick go at it but didn't like the result: no links on the
> > >> >> > > > index pages to functions etc. So what we do now is better, but
> > >> >> > > > perhaps it would be a good idea to use Doxygen to extract the
> > >> >> > > > docstrings for all classes and functions. I tried parsing the
> > >> >> > > > XML output in the test/verify_cpp_documentation.py script,
> > >> >> > > > and it should be relatively
> > >> >> > > > simple to get the docstrings since these are stored as
> > >> >> > > > attributes of classes/functions.
> > >> >> > >
> > >> >> > > Perhaps an idea would be to use Doxygen for parsing and then have
> > >> >> > > our own script that works with the XML output from Doxygen?
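
For what it's worth, a minimal sketch of such a script, assuming the standard
Doxygen XML output (one class*.xml file per class); the function name is
hypothetical:

    import xml.etree.ElementTree as ET

    def extract_docs(xml_file):
        "Return {qualified name: docstring} for the compound and its members."
        docs = {}
        root = ET.parse(xml_file).getroot()
        for compound in root.iter("compounddef"):
            cls = compound.findtext("compoundname")
            detail = compound.find("detaileddescription")
            docs[cls] = "".join(detail.itertext()).strip()
            for member in compound.iter("memberdef"):
                name = member.findtext("name")
                brief = member.find("briefdescription")
                docs[cls + "::" + name] = "".join(brief.itertext()).strip()
        return docs
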
> > >> >> >
> > >> >> > I did not know we already used Doxygen to extract information
> > >> >> > about class structure from the headers.
> > >> >>
> > >> >> I thought it was you who implemented the Doxygen documentation
> > >> >> extraction?
> > >> >
> > >> > Duh... I mean that I did not know we used it in fenics-doc, in
> > >> > verify_cpp_documentation.py.
> > >>
> > >> We don't. I wrote this script to be able to test the documentation in
> > >> *.rst files against dolfin.
> > >> Basically, I parse all files and keep track of the classes/functions
> > >> which are defined in dolfin and try to match those up against the
> > >> definitions in the documentation (and vice versa) to catch
> > >> missing/obsolete documentation.
> > >
> > > Ok, so you do not use this to extract documentation?
> >
> > No, but I already visit all the class/function nodes so it should just
> > be a question of requesting the docstring attribute.
>
> Ok.
>
> > >> >> > What are the differences between using the XML from Doxygen to also
> > >> >> > extract the documentation, and the approach we use today?
> > >> >>
> > >> >> Pros (of using Doxygen):
> > >> >>
> > >> >>   - Doxygen is developed by people that presumably are very good at
> > >> >>     extracting docs from C++ code
> > >> >>
> > >> >>   - Doxygen might handle some corner cases we can't handle?
> > >>
> > >> Definitely, and we don't have to maintain it.
> > >>
> > >> >> Cons (of using Doxygen):
> > >> >>
> > >> >>   - Another dependency
> > >> >
> > >> > Which we already have.
> > >> >
> > >> >>   - We still need to write a script to parse the XML
> > >> >
> > >> > We should be able to use the XML parser in docstringgenerator.py.
> > >> >
> > >> >>   - The parsing of /// stuff from C++ code is very simple
> > >> >
> > >> > Yes, and this might be just fine. But if it grows we might consider
> > >> > using Doxygen.
> > >>
> > >> But some cases are already not handled correctly (nested classes etc.),
> > >> so I vote for Doxygen.
> > >
> > > Ok.
> > >
> > > My itch with Doxygen is its specific syntax, which makes the header files
> > > look ugly. But I guess we can write all our comments in reST and then just
> > > use Doxygen to extract them. Should this information be used to
> > > generate both the docstrings.i file and the C++ programmer's reference?
> >
> > Yes, no Doxygen markup crap here. The docstrings defined by '///'
> > should already be in reST and
> > we currently use this for the C++ programmer's reference.
>
> Good!
>
> > > I found a potential show stopper with Doxygen. It does not preserve line
> > > breaks... We need to put all the comments in the header files whose
> > > formatting we need to preserve into a \verbatim \endverbatim block. This can
> > > probably be done just where we need it. A good example is the table command
> > > in Mesh.h.
> >
> > Bummer. The line breaks are essential, not just for the table command
> > but for all the definition lists too, i.e., the entire reST
> > documentation (both C++ and Python). No line breaks would also make
> > any output from calls like:
> >
> >   >>> help(dolfin.Mesh)
> >
> > useless. Isn't there some line break markup we can actually learn to live
> > with?
>
> I think the \verbatim \endverbatim block is the least intrusive option in the
> header files. In the Doxygen XML output this would then look like:
>
>     <detaileddescription><para><verbatim>
>       A _Mesh_ consists of a set of connected and numbered mesh entities.
>       ....
>
>       .. tabularcolumns:: |c|c|c|
>
>       +--------+-----------+-------------+
>       | Entity | Dimension | Codimension |
>       +========+===========+=============+
>       | Vertex |  0        |             |
>       +--------+-----------+-------------+
>       | Edge   |  1        |             |
>       +--------+-----------+-------------+
>       ....
>
>       </verbatim> </para> </detaileddescription>
>
> After some fiddling with the parser (which used textwrap to "beautify" the text,
> NOT!) I managed to get this out:

Are you saying we need stuff like

  <detaileddescription><para><verbatim>

in the header files?

That doesn't look very nice. In that case I would cast a clear vote
for not using Doxygen. It's a mess if we have to mix reST and some
other markup language. The beauty of reST is that it looks good and
non-intrusive.

Btw, we are already using a mix since generate_cpp_doc.py adds the
extra functionality for _Mesh_ which links "Mesh" to the documentation
page for that class.

--
Anders


> %feature("docstring") dolfin::Mesh "
>
>
>       A _Mesh_ consists of a set of connected and numbered mesh entities.
>       ....
>
>       .. tabularcolumns:: |c|c|c|
>
>       +--------+-----------+-------------+
>       | Entity | Dimension | Codimension |
>       +========+===========+=============+
>       | Vertex |  0        |             |
>       +--------+-----------+-------------+
>       | Edge   |  1        |             |
>       +--------+-----------+-------------+
>       ....
>
>
> ";
>
> Which should serve us well.
>
> > But of course we would have to remove this later such that it
> > doesn't look completely stupid in the Python documentation.
>
> Shouldn't be necessary!
>
> Johan
>
> > Kristian
> >
> > >> > Would it be possible to settle on a format for the extracted
> > >> > documentation which we use as input to generate the reST documentation?
> > >> > It would make it easier to switch to Doxygen XML whenever we
> > >> > figure out this is needed, i.e. we just switch the backend of the
> > >> > documentation parser.
> > >>
> > >> This will probably be a good idea, even if we start with Doxygen, since
> > >> the XML output might change in format and we can then easily adapt.
> > >
> > > Ok
> > >
> > > Johan
> > >
> > >> Kristian
> > >>
> > >> > Johan
> > >> >
> >

--
Anders


