
fenics team mailing list archive

Re: Generation of docstring module

 

On Tuesday September 7 2010 08:19:31 Anders Logg wrote:
> On Tue, Sep 07, 2010 at 04:46:53PM +0200, Kristian Ølgaard wrote:
> > On 7 September 2010 16:13, Anders Logg <logg@xxxxxxxxx> wrote:
> > > On Tue, Sep 07, 2010 at 03:50:11PM +0200, Kristian Ølgaard wrote:
> > >> On 7 September 2010 15:02, Anders Logg <logg@xxxxxxxxx> wrote:
> > >> > On Tue, Sep 07, 2010 at 02:56:47PM +0200, Kristian Ølgaard wrote:
> > >> >> On 7 September 2010 12:37, Anders Logg <logg@xxxxxxxxx> wrote:
> > >> >> > On Tue, Sep 07, 2010 at 12:20:09PM +0200, Kristian Ølgaard wrote:
> > >> >> >> On 7 September 2010 11:04, Anders Logg <logg@xxxxxxxxx> wrote:
> > >> >> >> > On Mon, Sep 06, 2010 at 05:56:13PM +0200, Kristian Ølgaard wrote:
> > >> >> >> >> On 6 September 2010 17:24, Johan Hake <johan.hake@xxxxxxxxx> wrote:
> > >> >> >> >> > On Monday September 6 2010 08:13:44 Anders Logg wrote:
> > >> >> >> >> >> On Mon, Sep 06, 2010 at 08:08:10AM -0700, Johan Hake wrote:
> > >> >> >> >> >> > On Monday September 6 2010 05:47:27 Anders Logg wrote:
> > >> >> >> >> >> > > On Mon, Sep 06, 2010 at 12:19:03PM +0200, Kristian Ølgaard wrote:
> > >> >> >> >> >> > > > > Do we have any functionality in place for handling
> > >> >> >> >> >> > > > > documentation that should be automatically
> > >> >> >> >> >> > > > > generated from the C++ interface and
> > >> >> >> >> >> > > > > documentation that needs to be added later?
> > >> >> >> >> >> > > > 
> > >> >> >> >> >> > > > No, not really.
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > ok.
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > > > I assume that the documentation we write in the
> > >> >> >> >> >> > > > > C++ header files (like Mesh.h) will be the same
> > >> >> >> >> >> > > > > that appears in Python using help(Mesh)?
> > >> >> >> >> >> > > > 
> > >> >> >> >> >> > > > Yes and no, the problem is that for instance
> > >> >> >> >> >> > > > overloaded methods will only show the last
> > >> >> >> >> >> > > > docstring.
> > >> >> >> >> >> > > > So, the Mesh.__init__.__doc__ will just contain the
> > >> >> >> >> >> > > > Mesh(std::string file_name) docstring.
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > It would not be difficult to make the documentation
> > >> >> >> >> >> > > extraction script we have (in fenics-doc) generate
> > >> >> >> >> >> > > the docstrings module and just concatenate all
> > >> >> >> >> >> > > constructor documentation. We are already doing the
> > >> >> >> >> >> > > parsing so spitting out class Foo: """ etc would be
> > >> >> >> >> >> > > easy. Perhaps that is an option.
> > >> >> >> >> >> > 
> > >> >> >> >> >> > There might be other overloaded methods too. We might
> > >> >> >> >> >> > try to settle on a format for these methods, or make
> > >> >> >> >> >> > this part of the 1% we need to handle ourselves.
> > >> >> >> >> >> 
> > >> >> >> >> >> ok. Should also be fairly easy to handle.
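
A minimal sketch of the concatenation idea discussed above, assuming the extraction script can hand over (name, signature, docstring) tuples. The function name merge_overload_docs and the tuple layout are hypothetical, not part of fenics-doc or DOLFIN:

```python
def merge_overload_docs(entries):
    """Group (name, signature, doc) tuples by name and join their docs.

    Every overload keeps its own signature line, so the merged docstring
    lists all variants instead of only the last one.
    """
    merged = {}
    for name, signature, doc in entries:
        merged.setdefault(name, []).append(signature + "\n\n" + doc.strip())
    return {name: "\n\n".join(parts) for name, parts in merged.items()}


# Hypothetical input: two overloaded constructors of the same class
entries = [
    ("Mesh.__init__", "Mesh()", "Create an empty mesh."),
    ("Mesh.__init__", "Mesh(std::string file_name)", "Create a mesh from a file."),
]
docs = merge_overload_docs(entries)
print(docs["Mesh.__init__"])
```

The same grouping would apply to any overloaded method, not only constructors, as long as the overloads share a name.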
> > >> >> >> >> > 
> > >> >> >> >> > Ok.
> > >> >> >> >> > 
> > >> >> >> >> >> > > > > But in some special cases, we may want to go in
> > >> >> >> >> >> > > > > and handle documentation for special cases where
> > >> >> >> >> >> > > > > the Python documentation needs to be different
> > >> >> >> >> >> > > > > from the C++ documentation. So there should be
> > >> >> >> >> >> > > > > two different sources for the documentation: one
> > >> >> >> >> >> > > > > that is generated automatically from the C++
> > >> >> >> >> >> > > > > header files, and one that overwrites or adds
> > >> >> >> >> >> > > > > documentation for special cases. Is that the
> > >> >> >> >> >> > > > > plan?
> > >> >> >> >> >> > > > 
> > >> >> >> >> >> > > > The plan is currently to write the docstrings by
> > >> >> >> >> >> > > > hand for the entire dolfin module. One of the
> > >> >> >> >> >> > > > reasons is that we rename/ignore functions/classes
> > >> >> >> >> >> > > > in the *.i files, and if we try to automate the
> > >> >> >> >> >> > > > docstring generation I think we should make it
> > >> >> >> >> >> > > > fully automatic not just part of it.
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > If we can make it 99% automatic and have an extra file
> > >> >> >> >> >> > > with special cases I think that would be ok.
> > >> >> >> >> >> > 
> > >> >> >> >> >> > Agree.
> > >> >> >> >> 
> > >> >> >> >> Yes, but we'll need some automated testing to make sure that
> > >> >> >> >> the 1% does not go out of sync with the code.
> > >> >> >> >> Most likely the 1% can't be handled because it is relatively
> > >> >> >> >> important (definitions in *.i files etc.).
> > >> >> >> > 
> > >> >> >> > I imagine that "1%" will be the same as the "1%" that we have
> > >> >> >> > special treatment for in the SWIG files anyway, so it makes
> > >> >> >> > sense those need special treatment.
> > >> >> >> 
> > >> >> >> I think that we can automate that last 1% too.
> > >> >> >> 
> > >> >> >> > So the idea would be:
> > >> >> >> > 
> > >> >> >> >  1. Document the C++ code in the C++ header files
> > >> >> >> >  2. Document the extra Python code in the Python files (?)
> > >> >> >> >  3. Document the extra SWIG stuff in a special file
> > >> >> >> 
> > >> >> >> All Python docstrings should be located where the code is.
> > >> >> >> In the Python layer (like dolfin/fem.py), or in the extended
> > >> >> >> methods in the *.i files for the dolfin/cpp.py module.
> > >> >> >> 
> > >> >> >> We then need to figure out how to change the syntax/name
> > >> >> >> correctly such that std::vector, double* etc. are mapped to the
> > >> >> >> correct Python arguments/return values, and how to handle the
> > >> >> >> *example* code.
> > >> >> >> 
> > >> >> >> >> >> > > > Also, we will need to change the syntax in all
> > >> >> >> >> >> > > > *example* code of the docstrings. Maybe it can be
> > >> >> >> >> >> > > > done, but I'll need to give it some more careful
> > >> >> >> >> >> > > > thought. We've already changed the approach a few
> > >> >> >> >> >> > > > times now, so I'd really like the next try to be close
> > >> >> >> >> >> > > > to our final implementation.
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > I agree. :-)
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > > > Another thing to discuss is the possibility of
> > >> >> >> >> >> > > > > using Doxygen to extract the documentation. We
> > >> >> >> >> >> > > > > currently have our own script since (I assume)
> > >> >> >> >> >> > > > > Doxygen does not have a C++ --> reST converter.
> > >> >> >> >> >> > > > > Is that correct?
> > >> >> >> >> >> > > > 
> > >> >> >> >> >> > > > I don't think Doxygen has any such converter, but
> > >> >> >> >> >> > > > there exists a project
> > >> >> >> >> >> > > > http://github.com/michaeljones/breathe which makes
> > >> >> >> >> >> > > > it possible to use xml output from Doxygen in much
> > >> >> >> >> >> > > > the same way as we use autodoc for the Python
> > >> >> >> >> >> > > > module. I had a quick go at it but didn't like the
> > >> >> >> >> >> > > > result. No links on the index pages to function
> > >> >> >> >> >> > > > etc. So what we do now is better, but perhaps it
> > >> >> >> >> >> > > > would be a good idea to use Doxygen to extract the
> > >> >> >> >> >> > > > docstrings for all classes and functions. I tried
> > >> >> >> >> >> > > > parsing the XML output in the
> > >> >> >> >> >> > > > test/verify_cpp_documentation.py script and it should be relatively
> > >> >> >> >> >> > > > simple to get the docstrings since these are stored
> > >> >> >> >> >> > > > as attributes of classes/functions.
> > >> >> >> >> >> > > 
> > >> >> >> >> >> > > Perhaps an idea would be to use Doxygen for parsing
> > >> >> >> >> >> > > and then have our own script that works with the XML
> > >> >> >> >> >> > > output from Doxygen?
> > >> >> >> >> >> > 
> > >> >> >> >> >> > I did not know we already used Doxygen to extract
> > >> >> >> >> >> > information about class structure from the headers.
> > >> >> >> >> >> 
> > >> >> >> >> >> I thought it was you who implemented the Doxygen
> > >> >> >> >> >> documentation extraction?
> > >> >> >> >> > 
> > >> >> >> >> > Duh... I mean that I did not know we used it in fenics_doc,
> > >> >> >> >> > in verify_cpp_documentation.py.
> > >> >> >> >> 
> > >> >> >> >> We don't. I wrote this script to be able to test the
> > >> >> >> >> documentation in *.rst files against dolfin.
> > >> >> >> >> Basically, I parse all files and keep track of the
> > >> >> >> >> classes/functions which are defined in dolfin and try to
> > >> >> >> >> match those up against the definitions in the documentation
> > >> >> >> >> (and vice versa) to catch missing/obsolete documentation.
> > >> >> >> >> 
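
The matching described above boils down to two set differences. A small sketch, assuming the names defined in the code and the names found in the *.rst files have already been collected (compare_names and the sample names are hypothetical, this is not the actual verify_cpp_documentation.py):

```python
def compare_names(defined, documented):
    """Return (missing, obsolete) documentation entries.

    missing:  defined in the code but not documented
    obsolete: documented but no longer defined in the code
    """
    defined, documented = set(defined), set(documented)
    missing = sorted(defined - documented)
    obsolete = sorted(documented - defined)
    return missing, obsolete


# Hypothetical example data
missing, obsolete = compare_names(
    defined=["Mesh", "Function", "MeshPrimitive"],
    documented=["Mesh", "Function", "OldClass"],
)
print("missing:", missing)
print("obsolete:", obsolete)
```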
> > >> >> >> >> >> > What are the differences between using the XML from
> > >> >> >> >> >> > Doxygen to also extract the documentation, and the
> > >> >> >> >> >> > approach we use today?
> > >> >> >> >> >> 
> > >> >> >> >> >> Pros (of using Doxygen):
> > >> >> >> >> >> 
> > >> >> >> >> >>   - Doxygen is developed by people that presumably are
> > >> >> >> >> >> very good at extracting docs from C++ code
> > >> >> >> >> >> 
> > >> >> >> >> >>   - Doxygen might handle some corner cases we can't
> > >> >> >> >> >> handle?
> > >> >> >> >> 
> > >> >> >> >> Definitely, and we don't have to maintain it.
> > >> >> >> > 
> > >> >> >> > We would need to maintain the script that extracts data from
> > >> >> >> > the Doxygen-generated XML files.
> > >> >> >> > 
> > >> >> >> >> >> Cons (of using Doxygen):
> > >> >> >> >> >> 
> > >> >> >> >> >>   - Another dependency
> > >> >> >> >> > 
> > >> >> >> >> > Which we already have.
> > >> >> >> >> > 
> > >> >> >> >> >>   - We still need to write a script to parse the XML
> > >> >> >> >> > 
> > >> >> >> >> > We should be able to use the xml parser in
> > >> >> >> >> > docstringgenerator.py.
> > >> >> >> >> > 
> > >> >> >> >> >>   - The parsing of /// stuff from C++ code is very simple
> > >> >> >> >> > 
> > >> >> >> >> > Yes, and this might be just fine. But if it grows we might
> > >> >> >> >> > consider using Doxygen.
> > >> >> >> >> 
> > >> >> >> >> But some cases are not handled correctly already (nested
> > >> >> >> >> classes etc.) so I vote for Doxygen.
> > >> >> >> > 
> > >> >> >> > Not that I'm insisting on not using Doxygen, but isn't it
> > >> >> >> > quite rare that we use nested classes? I think we decided at
> > >> >> >> > some point that we wanted to avoid it for some other reason.
> > >> >> >> > I don't remember which but it might have been a SWIG problem.
> > >> >> >> 
> > >> >> >> Look at
> > >> >> >> http://www.fenics.org/newdoc/programmers-reference/cpp/function/Function.html
> > >> >> >> As a user, I would be confused by LocalScratch and GatherScratch.
> > >> >> > 
> > >> >> > Those can be easily fixed by letting the script stop parsing when
> > >> >> > it finds "private:".
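
A rough sketch of that fix, assuming the script scans a header line by line for "///" comments. extract_doc_comments and the sample header are made up for illustration and are not taken from generate_cpp_documentation.py:

```python
def extract_doc_comments(header_text):
    """Collect '///' comment lines, stopping at the first 'private:' label."""
    comments = []
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("private:"):
            break  # everything below is not part of the public interface
        if stripped.startswith("///"):
            comments.append(stripped[3:].strip())
    return comments


# Hypothetical header with a private nested class
header = """\
class Function
{
public:
  /// Evaluate function at a point
  void eval(double* values, const double* x) const;
  // plain comment, not documentation
private:
  /// Scratch space, not part of the public interface
  class LocalScratch {};
};
"""
public_docs = extract_doc_comments(header)
print(public_docs)
```

Note that this stops at the first "private:" and would miss a "public:" section that follows it, and it does nothing for nested classes declared in the public part.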
> > >> >> 
> > >> >> OK, and if we are sure that no other nested classes are present in
> > >> >> DOLFIN I guess things should be fine.
> > >> >> 
> > >> >> >> The documentation here is also rather confusing, yes we can fix
> > >> >> >> it, but similar cases will arise in the future.
> > >> >> >> 
> > >> >> >> http://www.fenics.org/newdoc/programmers-reference/cpp/mesh/MeshPrimitive.html
> > >> >> > 
> > >> >> > That looks strange because Andre has used an arbitrary mix of
> > >> >> > "//" and "///" in his comments. Don't blame my script for that.
> > >> >> > :-)
> > >> >> 
> > >> >> Alright alright, I'll never question the almighty
> > >> >> generate_cpp_documentation.py script again. :)
> > >> > 
> > >> > Sounds good. ;-)
> > >> > 
> > >> >> In light of the above and the Doxygen line break issue, maybe it's
> > >> >> best to use your script as a first try?
> > >> >> We just need to break it up in parsing (intermediate
> > >> >> representation), modifying (C++ and Python syntax) and writing
> > >> >> stages (dump in respective folders in the documentation) and
> > >> >> settle on the intermediate representation such that we can easily
> > >> >> switch to a Doxygen parser in case we decide to.
> > >> > 
> > >> > Sounds like a compiler to me. :-)
> > >> 
> > >> Yup.
> > >> 
> > >> > And since I anticipated your comment, it is already broken up into
> > >> > two different stages:
> > >> > 
> > >> >  generate_documentation (should maybe be extract_documentation)
> > >> >  write_documentation
> > >> 
> > >> Nice.
> > >> 
> > >> > The intermediate representation is a simple Python list with class
> > >> > names, signatures, comments etc. I'm sure it can be improved and
> > >> > simplified.
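
To make the two stages concrete, here is a simplified sketch. The function names follow the ones mentioned above, but the signatures, the (signature, comment) pairs used as intermediate representation, and the reST-like output are assumptions for illustration only; the real list also carries class names:

```python
def extract_documentation(header_text):
    """Parsing stage: return a list of (signature, comment) pairs."""
    representation = []
    pending = []  # '///' lines seen since the last declaration
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("///"):
            pending.append(stripped[3:].strip())
        elif pending and stripped:
            # First non-empty line after a comment block is the declaration
            representation.append((stripped.rstrip(";"), " ".join(pending)))
            pending = []
    return representation


def write_documentation(representation):
    """Writing stage: render the intermediate representation as text."""
    blocks = []
    for signature, comment in representation:
        blocks.append(".. cpp:function:: %s\n\n    %s\n" % (signature, comment))
    return "\n".join(blocks)


# Hypothetical header fragment
header = """\
/// Return number of vertices
uint num_vertices() const;

/// Compute
/// minimum cell diameter
double hmin() const;
"""
doc = extract_documentation(header)
text = write_documentation(doc)
print(text)
```

Because the writer only sees the list, a Doxygen-based parser could replace extract_documentation later as long as it produces the same structure.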
> > >> 
> > >> We will most likely need to refine it w.r.t. information, but I don't
> > >> think that we can simplify it, most likely it will become a little
> > >> more complex.
> > >> 
> > >> BTW, shouldn't the extract_documentation() part be in DOLFIN since we
> > >> intend to use it to generate the docstrings.i file?
> > >> 
> > >> Then write_cpp_documentation() and write_python_documentation() are
> > >> part of fenics-doc, but they'll import extract_documentation() from
> > >> DOLFIN. Otherwise we'll end up with redundant code.
> > > 
> > > Yes, that sounds like a good idea.
> > > 
> > > Perhaps we should have a module 'documentation' as part of DOLFIN:
> > > 
> > >  from dolfin import documentation

I vote to put it in dolfin_utils.

Johan

> > >  doc = documentation.extract_documentation(DOLFIN_DIR=...)
> > > 
> > > The extract_documentation function would look for the DOLFIN_DIR
> > > environment variable and if it is not set, it would need the argument
> > > to be supplied.
> > 
> > Sounds OK to me, but can't we just use os.path.abspath(__file__) to
> > get the file name, and then use our knowledge of the relative
> > location. We still need the DOLFIN_DIR for the write_*_documentation
> > scripts to work though.
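
Both lookup strategies can be sketched in a few lines. The helper names find_dolfin_dir and dir_relative_to_module are hypothetical, and the number of directory levels between the module and the source tree is an assumption that depends on where the module ends up:

```python
import os


def find_dolfin_dir(dolfin_dir=None, environ=os.environ):
    """Use the explicit argument if given, else the DOLFIN_DIR variable."""
    if dolfin_dir is not None:
        return os.path.abspath(dolfin_dir)
    if "DOLFIN_DIR" in environ:
        return os.path.abspath(environ["DOLFIN_DIR"])
    raise ValueError("DOLFIN_DIR is not set and no directory was supplied")


def dir_relative_to_module(module_file, levels_up):
    """Locate a directory from the module's own file name.

    levels_up=0 gives the directory containing the module, levels_up=1
    its parent, and so on.
    """
    path = os.path.abspath(module_file)
    for _ in range(levels_up + 1):
        path = os.path.dirname(path)
    return path


# Inside the module itself one would pass __file__ here
print(dir_relative_to_module("/a/b/c/mod.py", levels_up=1))
```

The second variant needs no environment variable, which is why it suits the extraction step; the writer scripts in fenics-doc live outside DOLFIN and would still rely on the first.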
> 
> Yes, that sounds better.
> 
> --
> Anders
> 
> > > If we make it a part of DOLFIN, I have a feeling it will be more
> > > robust since it just needs to extract the documentation, while other
> > > scripts are responsible for generating various kinds of output.
> > 
> > I agree.
> > 
> > Kristian
> 
> --
> Anders


