dolfin team mailing list archive

Re: Status of parallel I/O

To: Anders Logg <logg@xxxxxxxxx>
From: "Garth N. Wells" <gnw20@xxxxxxxxx>
Date: Thu, 25 Aug 2011 07:45:49 -0400
Cc: DOLFIN Mailing List <dolfin@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20110825073306.GG2930@smaug>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11


On 25/08/11 03:33, Anders Logg wrote:
> On Wed, Aug 24, 2011 at 09:14:55AM -0400, Garth N. Wells wrote:
>>
>>
>> On 24/08/11 08:55, Anders Logg wrote:
>>> On Wed, Aug 24, 2011 at 08:43:40AM -0400, Garth N. Wells wrote:
>>>>
>>>>
>>>> On 24/08/11 08:07, Anders Logg wrote:
>>>>> On Wed, Aug 24, 2011 at 07:22:53AM -0400, Garth N. Wells wrote:
>>>>>>
>>>>>>
>>>>>> On 24/08/11 03:50, Anders Logg wrote:
>>>>>>> What is the plan for XMLLocalMeshData (using the DOM interface) vs
>>>>>>> XMLLocalMeshDataDistributed (using the SAX interface)?
>>>>>>>
>>>>>>
>>>>>> Both for the time being.
>>>>>>
>>>>>>> Reading boundary indicators is currently failing with
>>>>>>>
>>>>>>> RuntimeError: *** Error: Inconsistent state in XML reader: 6.
>>>>>>>
>>>>>>> Should this be fixed in XMLLocalMeshDataDistributed or is the plan to
>>>>>>> replace it with XMLLocalMeshData?
>>>>>>>
>>>>>>
>>>>>> In XMLLocalMeshDataDistributed.
>>>>>
>>>>> Could you elaborate? The functionality for reading and distributing
>>>>> boundary markers (in parallel) is currently broken and we want to fix
>>>>> it. But we need to know more about the design. I don't want to fix
>>>>> something if you decide to break it 5 min later.
>>>>>
>>>>
>>>> It never 'properly' worked in parallel. There were some messy ad hoc
>>>> changes made on top of functions that were planned for overhaul. I made
>>>> clear before this that parallel functionality was being sorted out (it's
>>>> not just in io, but also partitioning, etc), so the fact that it's not
>>>> working now should not be a surprise.
>>>
>>> It came as a surprise since there was a unit test for it and the unit
>>> test was removed. But never mind, the important thing now is to get it
>>> working again.
>>>
>>>> There is a lot of missing functionality in parallel, so patience is
>>>> required to get things done properly.
>>>>
>>>>> Should we continue to use libxml2? Why not use the DOM parsing all the
>>>>> way?
>>>>>
>>>>
>>>> Because I'm inclined to keep SAX parsing for meshes since meshes are the
>>>> most likely to be created externally, and need to be scalable for
>>>> reading. Other objects (e.g., vectors) are likely to be created and
>>>> written by DOLFIN, so will eventually use parallel HDF5 for scalable
>>>> parallel io.
>>>
>>> The reason I once chose to implement the XML parsing using SAX, and
>>> why Ola decided to use SAX in his rewrite 2 years back, is exactly
>>> that: scalability and efficiency. I don't see why it should be
>>> different for meshes than any other objects. Other objects can also be
>>> large. It seems messy to use both.
>>>
>>
>> XML (be it with SAX or DOM parsers) is not a scalable or efficient
>> solution. The scalable and efficient solution is binary + MPI. This will
>> appear when time permits.
> 
> Sure, but I would claim SAX scales better. 

In terms of memory, yes.

It is not sufficiently scalable to be a total solution. Plus it's too slow.

> Wouldn't it be better to
> just use one of DOM or SAX? 

Maybe. A SAX implementation is considerably more complex. The new
implementation reserves this complexity for a possibly critical case and
localises the complexity of the code. The old code was very complex and
less localised.

The locality means that it's no big deal to have a simple DOM
implementation for the majority of cases next to a more complex SAX
implementation for special cases. There is no point in the size and
complexity of a SAX parser for simple cases, e.g. reading parameter files.
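
To make the difference in complexity concrete, here is a minimal sketch. It
is not DOLFIN code: it uses Python's standard xml.dom.minidom and xml.sax
modules instead of libxml2, and the mesh layout and the names
read_vertices_dom, VertexHandler and read_vertices_sax are assumptions made
for illustration. The DOM reader is a couple of lines once the tree is built,
while the SAX reader has to track its own parsing state.

```python
import xml.dom.minidom
import xml.sax

# Assumed, simplified mesh layout used only for this illustration.
MESH_XML = """<dolfin>
  <mesh celltype="triangle" dim="2">
    <vertices size="3">
      <vertex index="0" x="0.0" y="0.0"/>
      <vertex index="1" x="1.0" y="0.0"/>
      <vertex index="2" x="0.0" y="1.0"/>
    </vertices>
  </mesh>
</dolfin>"""


def read_vertices_dom(text):
    """DOM: build the whole tree in memory, then query it."""
    doc = xml.dom.minidom.parseString(text)
    return [(float(v.getAttribute("x")), float(v.getAttribute("y")))
            for v in doc.getElementsByTagName("vertex")]


class VertexHandler(xml.sax.ContentHandler):
    """SAX: the parser calls back per element; state is tracked by hand."""

    def __init__(self):
        super().__init__()
        self.in_vertices = False
        self.vertices = []

    def startElement(self, name, attrs):
        if name == "vertices":
            self.in_vertices = True
        elif name == "vertex":
            if not self.in_vertices:
                # A vertex outside <vertices> means the file is malformed.
                raise RuntimeError("Inconsistent state in XML reader")
            self.vertices.append((float(attrs["x"]), float(attrs["y"])))

    def endElement(self, name):
        if name == "vertices":
            self.in_vertices = False


def read_vertices_sax(text):
    """SAX: stream through the document without building a tree."""
    handler = VertexHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.vertices
```

Both readers return the same list of coordinates for this input. The SAX
version never holds the full document tree, at the cost of the explicit state
handling above.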

> Either we use SAX all the way if it gives
> better performance than DOM, 

It doesn't give better performance. We discussed this before. Without
checking the archive, I recall that the DOM implementation was about 50
times faster for large data sets than the old SAX implementation.

> or we use DOM all the way as a solution
> for medium sized problems and complement with HDF5 for large scale
> problems. Having DOM + SAX + HDF5 seems messy.
>

This may happen, but the fact is that we don't have HDF5 in place yet.

>>>>> What is the difference between XMLLocalMeshData and
>>>>> XMLLocalMeshDataDistributed etc.?
>>>>>
>>>>
>>>> Initially I planned to use DOM for all, but as outlined above decided
>>>> after some testing to retain SAX for meshes (but update to SAX2, since
>>>> the libxml2 SAX parser is deprecated and has memory leaks). Hence,
>>>> XMLLocalMeshData uses DOM and XMLLocalMeshDataDistributed uses SAX. So
>>>> far I've kept the DOM version since it's easy to code and could be
>>>> useful when reading non-distributed meshes on each process (which may
>>>> differ on different processes).
>>>
>>> I don't understand the difference between XMLLocalMeshData and
>>> XMLLocalMeshDataDistributed. Is XMLLocalMeshDataDistributed doing now
>>> what XMLLocalMeshData did before?
>>>
>>
>> Yes, but updated to SAX2 (which was very painful).
>>
>> The 'new' XMLLocalMeshData is a DOM version. It could be removed.
> 
> Or kept if we will add HDF5 anyway as a more scalable solution.
>

Again, it may be desirable to keep a SAX parser for reading meshes in
parallel since a mesh is the most likely large data structure to be
created externally, and the most complex. HDF5 would require a user to
supply a binary mesh file rather than an XML file. Most other large data
sets are created internally, and then read and written. In this case,
HDF5 will be fine.
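
As a rough sketch of why streaming helps when reading a mesh in parallel:
with SAX, each process can pass over the file and store only the block of
vertices it owns. The code below is illustrative Python, not the
XMLLocalMeshDataDistributed implementation. The local_range helper, the
handler class and the mesh layout are assumptions, and the process rank and
count are passed as plain integers instead of coming from MPI.

```python
import xml.sax

# Assumed, simplified mesh layout used only for this illustration.
MESH_XML = """<dolfin>
  <mesh celltype="triangle" dim="2">
    <vertices size="5">
      <vertex index="0" x="0.0" y="0.0"/>
      <vertex index="1" x="1.0" y="0.0"/>
      <vertex index="2" x="0.0" y="1.0"/>
      <vertex index="3" x="1.0" y="1.0"/>
      <vertex index="4" x="0.5" y="0.5"/>
    </vertices>
  </mesh>
</dolfin>"""


def local_range(rank, num_processes, n):
    """Hypothetical helper: contiguous block [begin, end) of n items for rank."""
    base, rem = divmod(n, num_processes)
    begin = rank * base + min(rank, rem)
    end = begin + base + (1 if rank < rem else 0)
    return begin, end


class LocalVertexHandler(xml.sax.ContentHandler):
    """Keeps only the vertices whose index falls in this process's block."""

    def __init__(self, rank, num_processes):
        super().__init__()
        self.rank = rank
        self.num_processes = num_processes
        self.begin = 0
        self.end = 0
        self.vertices = {}

    def startElement(self, name, attrs):
        if name == "vertices":
            # The total count is known up front, so the block can be computed.
            n = int(attrs["size"])
            self.begin, self.end = local_range(self.rank, self.num_processes, n)
        elif name == "vertex":
            index = int(attrs["index"])
            if self.begin <= index < self.end:
                self.vertices[index] = (float(attrs["x"]), float(attrs["y"]))


def read_local_vertices(text, rank, num_processes):
    """Stream through the mesh and return {global index: coordinates}."""
    handler = LocalVertexHandler(rank, num_processes)
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.vertices
```

Calling read_local_vertices(MESH_XML, rank, 2) for rank 0 and rank 1 gives two
disjoint dictionaries that together cover all five vertices. Memory per
process scales with the local block, but every process still parses the whole
file, which is why this is not a complete solution for very large meshes.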

Garth


> --
> Anders
