
fenics team mailing list archive

Re: Docstrings etc

 

On 27 August 2010 08:54, Garth N. Wells <gnw20@xxxxxxxxx> wrote:
>
>
> On 27/08/10 07:43, Anders Logg wrote:
>>
>> On Thu, Aug 26, 2010 at 10:34:02PM +0200, Kristian Ølgaard wrote:
>>>
>>> On 26 August 2010 22:13, Anders Logg<logg@xxxxxxxxx>  wrote:
>>>>
>>>> On Thu, Aug 26, 2010 at 10:09:16PM +0200, Kristian Ølgaard wrote:
>>>>>
>>>>> On 26 August 2010 22:04, Anders Logg<logg@xxxxxxxxx>  wrote:
>>>>>>
>>>>>> On Thu, Aug 26, 2010 at 09:34:01PM +0200, Kristian Ølgaard wrote:
>>>>>>>
>>>>>>> On 26 August 2010 20:35, Anders Logg<logg@xxxxxxxxx>  wrote:
>>>>>>>>
>>>>>>>> On Thu, Aug 26, 2010 at 08:16:41PM +0200, Anders Logg wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Aug 26, 2010 at 08:09:56PM +0200, Kristian Ølgaard wrote:
>>>>>>>>>>
>>>>>>>>>> On 26 August 2010 19:51, Anders Logg<logg@xxxxxxxxx>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 26, 2010 at 07:42:35PM +0200, Kristian Ølgaard wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 26 August 2010 18:22, Anders Logg<logg@xxxxxxxxx>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've thought some more on how to organize/synchronize the FEniCS
>>>>>>>>>>>>> documentation (in fenics-doc) with the documentation we have in the
>>>>>>>>>>>>> code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think it is important that
>>>>>>>>>>>>>
>>>>>>>>>>>>> (1) the strings we have in the code are the same as those that appear
>>>>>>>>>>>>> in the HTML documentation (which we write in Sphinx).
>>>>>>>>>>>>>
>>>>>>>>>>>>> (2) the strings we have in the code are short (so they don't
>>>>>>>>>>>>> clutter
>>>>>>>>>>>>> up the code)
>>>>>>>>>>>>
>>>>>>>>>>>> I disagree. The whole idea of the documentation effort was to
>>>>>>>>>>>> document
>>>>>>>>>>>> in one place
>>>>>>>>>>>> (using carefully handwritten and elaborate explanations
>>>>>>>>>>>> including
>>>>>>>>>>>> examples and links to demos etc.) and code in another.
>>>>>>>>>>>> The comments in the code should be very short and precise such
>>>>>>>>>>>> that
>>>>>>>>>>>> together with the class/function definition and type info the
>>>>>>>>>>>> developer can complete the task without looking elsewhere. These
>>>>>>>>>>>> kinds of comments, I expect, will look weird when put next to an
>>>>>>>>>>>> elaborate
>>>>>>>>>>>> explanation on how the class/function works including all the
>>>>>>>>>>>> bells
>>>>>>>>>>>> and whistles.
>>>>>>>>>>>>
>>>>>>>>>>>>> If we look at these two, it seems that (1) implies that we
>>>>>>>>>>>>> should
>>>>>>>>>>>>> write the documentation as part of the code and then extract it
>>>>>>>>>>>>> using
>>>>>>>>>>>>> some tool.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But (2) prevents that since we don't want to constrain the
>>>>>>>>>>>>> documentation for all functions to be very short.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How about the following solution.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Write short docstrings in the code
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Auto-generate all the .rst input files for the Programmer's
>>>>>>>>>>>>>  Reference using a simple Python script that looks for '///'
>>>>>>>>>>>>>
>>>>>>>>>>>>> * The script looks at the code to generate the signature of the
>>>>>>>>>>>>>  function and the text that comes immediately after.
>>>>>>>>>>>>
>>>>>>>>>>>> This might be possible for a simple
>>>>>>>>>>>> 'change-order-of-comment-and-function' script where you
>>>>>>>>>>>> manipulate the
>>>>>>>>>>>> output manually afterwards, but if you want to run this more
>>>>>>>>>>>> than once
>>>>>>>>>>>> you will have to pick up nested class/struct definitions, templates and
>>>>>>>>>>>> all kinds of crap.
>>>>>>>>>>>> I tried to write a parser like this to check if all classes and
>>>>>>>>>>>> functions were documented, but gave up and let Doxygen do the
>>>>>>>>>>>> dirty
>>>>>>>>>>>> work. (But do we want to do this just to generate 20 characters
>>>>>>>>>>>> of
>>>>>>>>>>>> docstring automatically?)
>>>>>>>>>>>>
>>>>>>>>>>>>>  But it also looks in a hand-written .rst file that contains
>>>>>>>>>>>>> any
>>>>>>>>>>>>>  additional stuff we want to put below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So for the code example in the style manual, the things that
>>>>>>>>>>>>> get
>>>>>>>>>>>>> picked up from the code are
>>>>>>>>>>>>>
>>>>>>>>>>>>>  // Return the cell which is closest to the given point
>>>>>>>>>>>>>  uint closest_cell(const Point&  point) const
>>>>>>>>>>>>>
>>>>>>>>>>>>> which gets converted to
>>>>>>>>>>>>>
>>>>>>>>>>>>> .. cpp:function:: uint closest_cell(const Point&  point) const
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Return the cell which is closest to the given point
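
A minimal sketch of the conversion step shown above, not the script discussed in this thread. It assumes every '///' comment sits directly above a declaration that fits on a single line; the function names extract_documented and to_rst are made up for illustration.

```python
def extract_documented(header_text):
    """Return (signature, docstring) pairs for '///' comments placed
    directly above a one-line declaration."""
    pairs, comment = [], []
    for raw in header_text.splitlines():
        line = raw.strip()
        if line.startswith("///"):
            # Collect consecutive '///' lines into one docstring.
            comment.append(line[3:].strip())
            continue
        if comment and line:
            # Assumption: the declaration is the single line after the comment.
            pairs.append((line.rstrip(";").strip(), " ".join(comment)))
        # A blank line or any other line ends the pending comment.
        comment = []
    return pairs


def to_rst(pairs):
    """Format the pairs as Sphinx cpp:function directives."""
    blocks = [".. cpp:function:: {}\n\n   {}\n".format(sig, doc)
              for sig, doc in pairs]
    return "\n".join(blocks)


header = """
class Mesh
{
public:
  /// Return the cell which is closest to the given point
  uint closest_cell(const Point& point) const;
};
"""
pairs = extract_documented(header)
rst = to_rst(pairs)
print(rst)
```

Nested classes, templates and multi-line declarations are deliberately ignored here, which is exactly where a line-based approach like this starts to struggle.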
>>>>>>>>>>>>>
>>>>>>>>>>>>> The script also looks in a file for "closest_cell" below which
>>>>>>>>>>>>> we have
>>>>>>>>>>>>> written all the *Arguments* stuff that will be thrown in below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Will that work?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, but the work flow is getting complex, and you'll need to
>>>>>>>>>>>> know
>>>>>>>>>>>> what you get from the source code so you don't repeat yourself.
>>>>>>>>>>>> It is much easier to have the documentation in one place.
>>>>>>>>>>>>
>>>>>>>>>>>>> Another solution would be to just write everything as part of
>>>>>>>>>>>>> the
>>>>>>>>>>>>> code, and just add some settings to our editors that will fold
>>>>>>>>>>>>> the
>>>>>>>>>>>>> extra stuff away so we don't need to see it. Maybe that is the
>>>>>>>>>>>>> most
>>>>>>>>>>>>> robust solution?
>>>>>>>>>>>>
>>>>>>>>>>>> The general consensus the last time this issue came up was not
>>>>>>>>>>>> to
>>>>>>>>>>>> clutter the code with documentation markup.
>>>>>>>>>>>>
>>>>>>>>>>>> Kristian
>>>>>>>>>>>
>>>>>>>>>>> I agree it's good to have the documentation in one place, but it
>>>>>>>>>>> would
>>>>>>>>>>> be good if we found a way to keep it in sync. Helper scripts can
>>>>>>>>>>> do
>>>>>>>>>>> some of that work, but we probably won't be able to pick up
>>>>>>>>>>> things
>>>>>>>>>>> like having
>>>>>>>>>>>
>>>>>>>>>>>  "Compute the number of neighbors"
>>>>>>>>>>>
>>>>>>>>>>> in one place and
>>>>>>>>>>>
>>>>>>>>>>>  "Return the number of neighbors"
>>>>>>>>>>>
>>>>>>>>>>> in other places. Things like this will creep in over time. It
>>>>>>>>>>> might
>>>>>>>>>>> not be a big issue but I find it a bit annoying.
>>>>>>>>>>
>>>>>>>>>> I see. A simpler approach, rather than generating docstrings, would be
>>>>>>>>>> to have a script that simply looks for '///' comments in
>>>>>>>>>> dolfin/mesh/Mesh.h and checks whether the EXACT same strings are
>>>>>>>>>> present in programmers-reference/cpp/mesh/Mesh.rst. If not, the test
>>>>>>>>>> crashes and the user figures out manually why it failed and which
>>>>>>>>>> comment/docstring should be changed.
>>>>>>>>>> This won't be completely bulletproof, but much much simpler than
>>>>>>>>>> parsing a C++ library.
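
A rough sketch of such a check, under the assumption that each '///' line must appear verbatim somewhere in the matching .rst text. The function name missing_docstrings is hypothetical and this is not the verify_cpp_documentation.py script itself.

```python
def missing_docstrings(header_text, rst_text):
    """Return the '///' comment strings from the header that do not
    appear verbatim in the rst text."""
    comments = [line.strip()[3:].strip()
                for line in header_text.splitlines()
                if line.strip().startswith("///")]
    # Empty '///' lines carry no text, so they are skipped.
    return [c for c in comments if c and c not in rst_text]


header = """
/// Compute the number of neighbors
uint num_neighbors() const;

/// Return the cell which is closest to the given point
uint closest_cell(const Point& point) const;
"""
rst = """
.. cpp:function:: uint num_neighbors() const

   Return the number of neighbors

.. cpp:function:: uint closest_cell(const Point& point) const

   Return the cell which is closest to the given point
"""
missing = missing_docstrings(header, rst)
print(missing)
```

A real script would read the two files from disk and exit with an error when the list is non-empty. Since the comparison is line by line, a comment that is wrapped differently in the .rst file would be reported as missing.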
>>>>>>>>>
>>>>>>>>> Yes, that might be a good solution.
>>>>>>>>>
>>>>>>>>>> I currently check if the docstrings of the documentation for the
>>>>>>>>>> Python interface are equal to the docstrings of the DOLFIN module
>>>>>>>>>> after
>>>>>>>>>> import so that sort of works in the same way, only in this case I
>>>>>>>>>> know
>>>>>>>>>> that the docstring I check belongs to function 'bar' of class
>>>>>>>>>> 'foo'.
>>>>>>>>>>
>>>>>>>>>> Then we use the stub-generator that you have now to give us the
>>>>>>>>>> first
>>>>>>>>>> set of *.rst files and then add the '///' comments check to the
>>>>>>>>>> verify_cpp_documentation.py script.
>>>>>>>>>
>>>>>>>>> It's almost there now, I just need to do some polishing.
>>>>>>>>>
>>>>>>>>> Sphinx is currently crashing when it generates the documentation
>>>>>>>>> from
>>>>>>>>> the .rst files I generate.
>>>>>>>>>
>>>>>>>>> Exception occurred:
>>>>>>>>>   File "/usr/lib/pymodules/python2.6/docutils/nodes.py", line 1898, in dupname
>>>>>>>>>     node['names'].remove(name)
>>>>>>>>> ValueError: list.remove(x): x not in list
>>>>>>>>>
>>>>>>>>> Any ideas what this might be?
>>>>>>>>
>>>>>>>> Looks like this happens when there are multiple functions with the
>>>>>>>> same signature.
>>>>>>>
>>>>>>> Very likely,  and that's probably because you need to extract 'const'
>>>>>>> information too, and that's just the tip of the iceberg if we proceed
>>>>>>> down this road....
>>>>>>
>>>>>> Try now.
>>>>>>
>>>>>> You need to set DOLFIN_DIR to the DOLFIN source tree.
>>>>>>
>>>>>> Then run
>>>>>>
>>>>>>  python utils/generate_cpp_doc.py
>>>>>>  make html
>>>>>>
>>>>>> The generated stuff is in
>>>>>> {source/build}/programmers-reference/test/cpp
>>>>>
>>>>> OK, I'm just finishing a DOLFIN build to test the docstrings in the
>>>>> Python interface. Will test soon.
>>>>>
>>>>>> I'll be moving it to {source/build}/programmers-reference/cpp and make
>>>>>> sure not to overwrite the Mesh and Point class documentation that you
>>>>>> have written.
>>>>>
>>>>> There is no C++ documentation for Point, only for the Python interface
>>>>> and that was just to see how some of the autodoc functions worked.
>>>>> Anyway, we can always dig it up by reverting the repo to hack away.
>>>>
>>>> I noticed that. I just remember seeing something about the Point
>>>> class.
>>>>
>>>> Anyway, it seems to work now. What is missing is to generate the
>>>> index.rst files for each module.
>>>
>>> Looks pretty good to me. Do you need to generate the index.rst files?
>>> Can't you just add the output from 'ls *.h' in the modules to the
>>> index.rst files?
>>> Once we're finished editing the *.rst files you have generated we
>>> should be able to run the script verify_cpp_documentation.py which
>>> should tell us if we missed any.
>>> BTW, I'm done for today.
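
One way the 'ls *.h' idea could look, as a sketch only. It assumes one .rst file per header with the same base name and a plain Sphinx toctree per module; the function name index_rst and the layout of the index are assumptions, not what the existing scripts do.

```python
from pathlib import Path


def index_rst(module_name, header_names):
    """Build the text of an index.rst that lists one entry per header."""
    # Keep only headers and drop the '.h' suffix, e.g. 'Mesh.h' -> 'Mesh'.
    stems = sorted(Path(name).stem for name in header_names
                   if name.endswith(".h"))
    lines = [module_name, "=" * len(module_name), "",
             ".. toctree::", "   :maxdepth: 1", ""]
    lines += ["   " + stem for stem in stems]
    return "\n".join(lines) + "\n"


index = index_rst("mesh", ["Point.h", "Mesh.h", "README"])
print(index)
```

In practice the list of names would come from something like os.listdir on the module directory in the DOLFIN source tree.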
>>
>> I think we should think hard on this one more time. Is it really that
>> bad to write the documentation as part of the code?
>>
>
> It's good to have it in the code as long as it's not too long and not full of
> markup. The thing I look for most are function declarations, so I find it
> annoying when I can't find a declaration for all the markup. It's also hard
> to get an overview of a class when only a few declarations fit on the screen
> amongst the markup with funny symbols.

The markup that we plan to use will be pretty simple (see the source
for the C++ Mesh.rst), but it will add a lot of extra lines to the
source code.
I too would find this annoying.

> I do like long docstrings in Python. Because the argument list is not
> statically typed and there's more magic in Python, a good docstring is
> essential.
>
> Garth
>
>> The stuff that you have written for the Mesh class could easily go
>> into Mesh.h without causing too much clutter (reST looks nice), and I
>> imagine it would be easy to add a folding mode to Emacs and other
>> editors that will hide all lines starting with /// except for the
>> first line.
>>
>> The simple script I wrote seems to work pretty well to extract the
>> documentation. If it breaks somewhere, we could either improve the
>> script or learn to write the code so the script does not break.
>>
>> The point here is that now the generated .rst files are in sync with
>> the code, but in a day or two someone will edit one of the .h files in
>> DOLFIN and the documentation and code will start to diverge.

Yes, but this problem is already there for the Python interface and it
won't go away.
I guess the key thing to this is that a new feature or a change in
DOLFIN source code is not complete until the documentation has been
updated.

Kristian

>> --
>> Anders
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~fenics
>> Post to     : fenics@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~fenics
>> More help   : https://help.launchpad.net/ListHelp
>


