dolfin team mailing list archive

Re: Status of parallel I/O

To: "Garth N. Wells" <gnw20@xxxxxxxxx>
From: Anders Logg <logg@xxxxxxxxx>
Date: Sat, 27 Aug 2011 09:59:22 +0200
Cc: DOLFIN Mailing List <dolfin@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <20110827074711.GA6231@smaug>
User-agent: Mutt/1.5.21 (2010-09-15)

On Sat, Aug 27, 2011 at 09:47:14AM +0200, Anders Logg wrote:
> On Fri, Aug 26, 2011 at 06:54:31PM -0700, Garth N. Wells wrote:
> >
> >
> > On 26/08/11 07:27, Anders Logg wrote:
> > > On Fri, Aug 26, 2011 at 07:11:11AM -0400, Garth N. Wells wrote:
> > >>
> > >>
> > >> On 26/08/11 02:39, Anders Logg wrote:
> > >>> On Thu, Aug 25, 2011 at 05:57:51PM -0400, Garth N. Wells wrote:
> > >>>>
> > >>>>
> > >>>> On 25/08/11 16:53, Anders Logg wrote:
> > >>>>> On Thu, Aug 25, 2011 at 09:59:44AM -0400, Garth N. Wells wrote:
> > >>>>>
> > >>>>>>>>> How about using DOM everywhere and reserving the use of SAX for an
> > >>>>>>>>> XML->HDF5 converter?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> That could be OK, but if we have to implement a SAX parser it's
> > >>>>>>>> probably easiest to have it in DOLFIN anyway. I don't see the advantage
> > >>>>>>>> over having the SAX parser with the io code.
> > >>>>>>>
> > >>>>>>> I agree we should keep it in DOLFIN, but if the only thing it needs to
> > >>>>>>> do is extract data and spit out HDF5, I imagine it can be simpler than
> > >>>>>>> the current parser since it doesn't need to be parallel. (?)
> > >>>>>>>
> > >>>>>>
> > >>>>>> To make things clearer, I've just renamed the LocalMeshData parsers to
> > >>>>>>
> > >>>>>>   XMLLocalMeshDOM (was XMLLocalMeshData)
> > >>>>>>
> > >>>>>> and
> > >>>>>>
> > >>>>>>   XMLLocalMeshSAX (was XMLLocalMeshDataDistributed)
> > >>>>>
> > >>>>> That's good.
> > >>>>>
> > >>>>>> When XMLLocalMeshSAX is complete, it may be desirable to remove
> > >>>>>> XMLLocalMeshDOM.
> > >>>>>
> > >>>>> Either way is fine for me, as long as we decide which one to use. I
> > >>>>> initially wanted to use SAX (as before) but the DOM looks easier and
> > >>>>> may be enough if we plan to use HDF5 for large-scale problems anyway.
> > >>>>> Or is it the case that DOM is a limitation even for medium sized
> > >>>>> problems?
> > >>>>>
> > >>>>
> > >>>> It works for 'medium' (very arbitrary) size problems.
> > >>>>
> > >>>>>> I don't know what you mean by parallel - the XMLLocalMeshSAX works in
> > >>>>>> the same way as the old parser (each process reading a chunk). I don't
> > >>>>>> see how it can be made simpler by reading an XML file and then converting
> > >>>>>> to HDF5. The steps that are there now will all still be required to read
> > >>>>>> the XML mesh before writing an HDF5 file.
> > >>>>>
> > >>>>> I don't know HDF, but I imagine one could write one single file and
> > >>>>> HDF will handle parallel parsing of that file later. Then the
> > >>>>> conversion script we write does not need to do anything parallel, just
> > >>>>> read line by line and convert from one format to another.
> > >>>>>
> > >>>>
> > >>>> It may not be possible to do it line-by-line (I don't know, but I
> > >>>> wouldn't want to bank on it). Even if line-by-line is technically
> > >>>> possible, it could turn out to be terribly slow. We should support reading
> > >>>> a mesh into memory (distributed) and writing it to HDF5.
> > >>>>
> > >>>> Since we'll have support for writing HDF5 meshes, if we can read a large
> > >>>> XML mesh then we can re-use the HDF5 output code to make the conversion.
> > >>>>
> > >>>> I've removed the DOM-based LocalMeshData parser - there is no point to
> > >>>> it since we can just read the mesh on one process using XMLMesh and use
> > >>>> it to construct a dolfin::XMLLocalMeshData object.
> > >>>
> > >>> ok, looks good.
> > >>>
> > >>> How should we store boundary indicators? I'm not sure whether it needs
> > >>> to be stored as part of ParallelData. Is it really "parallel data"?
> > >>> ParallelData will for sure need to be used to compute it (convert
> > >>> somehow from the input) but it seems it can then be stored
> > >>> locally.
> > >>
> > >> OK. It's not really parallel data (but perhaps ParallelData should be
> > >> renamed).
> > >>
> > >>> Each facet just needs to know its indicator value.
> > >>>
> > >>> The input is a list of triples:
> > >>>
> > >>>   (indicator, facet_cell, facet_number)
> > >>>
> > >>> This indicates that local facet number `facet_number` of the cell
> > >>> `facet_cell` should have the indicator value (sub domain number)
> > >>> `indicator`.
> > >>>
> > >>
> > >> Let's make it generic:
> > >>
> > >>   (parent_cell_index, entity_dim, local_entity_index, indicator/value)
> > >
> > > Yes, that looks good. What about the XML format? It becomes unwieldy
> > > to store it as 4 different MeshFunctions. Here's an initial sketch:
> > >
> > > <mesh>
> > >   # cells and vertices here as before
> > >   <data>
> > >     # user data here as before
> > >   </data>
> > >   <indicators dim="...">
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >   </indicators>
> > >   <indicators dim="...">
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >     <indicator cell="..." local_entity_index="..." value="..."/>
> > >   </indicators>
> > >   ...
> > > </mesh>
> > >
> >
> > That looks OK, except for the name 'indicator(s)'.
> >
> > Essentially, it is a MeshFunction that is defined only on a
> > subset of entities of a given dimension. I think we should template it
> > on the C++ side so that any data can be attached. We should then have
> >
> >    <indicators dim="..." type="...">
> >
> > (but with something other than 'indicators'). Internally, there may be
> > no need to construct a MeshFunction.
> >
> >
> > >> It could go into MeshData (possibly with what's in ParallelData), and
> > >> the current MeshData could be renamed to something like 'UserMeshData'.
> > >
> > > I think it's better to keep the name MeshData for user-defined data
> > > (and internal DOLFIN data stored there in waiting for a proper place
> > > to store it). mesh.data() is used in many places in user code.
> > >
> > > Each time we decide to amend the Mesh class with new data, it would
> > > be better to add a proper class to hold it, like ParallelData (possibly
> > > renamed). How about a new class called "MeshIndicators" to hold mesh
> > > indicators? It would need to handle initialization from various
> > > sources of input data, in particular MeshFunctions, which are then
> > > converted to some proper internal representation. The MeshIndicators
> > > class should be "parallel aware" and not need any special extras in
> > > ParallelData.
> > >
> >
> > All fine if we can find a more appropriate name than 'Indicator'.
>
> How about SubsetFunction and we template it the same way as we do
> MeshFunctions?
>
> Or SubsetIndicators.

Or perhaps it's "markers"? So we add a class MeshMarkers that holds
all markers and make it a member of the Mesh class.

And the XML format would be

  <markers dim="..." type="..." size="...">

  </markers>

I've added "size" so we can preallocate when parsing.
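
For illustration only, here is a minimal C++ sketch of how such a container might look. It is not DOLFIN's actual API: the class layout, the nested Marker struct, and the member names (add, dim, size) are all assumptions. It is templated over the value type like MeshFunction, stores one (cell, local entity index, value) entry per marked entity of a given dimension, and uses the "size" attribute to reserve storage up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not DOLFIN's actual API: values attached to a subset
// of the mesh entities of one topological dimension.
template <typename T>
class MeshMarkers
{
public:
  // One marker: local entity `local_entity` of cell `cell` carries `value`.
  struct Marker
  {
    std::size_t cell;
    std::size_t local_entity;
    T value;
  };

  // `dim` is the topological dimension of the marked entities. `size` is the
  // expected number of markers (the "size" attribute in the XML sketch) and
  // is used only to preallocate storage.
  explicit MeshMarkers(std::size_t dim, std::size_t size = 0) : _dim(dim)
  {
    _markers.reserve(size);
  }

  // Append one marker, e.g. while parsing the child elements of <markers>.
  void add(std::size_t cell, std::size_t local_entity, const T& value)
  {
    _markers.push_back(Marker{cell, local_entity, value});
  }

  std::size_t dim() const { return _dim; }

  // Number of markers actually stored (not the reserved capacity).
  std::size_t size() const { return _markers.size(); }

  const Marker& operator[](std::size_t i) const
  {
    assert(i < _markers.size());
    return _markers[i];
  }

private:
  std::size_t _dim;
  std::vector<Marker> _markers;
};
```

With this layout, a parser would construct MeshMarkers<unsigned int>(dim, size) from the attributes of the opening tag and call add() once per child element. Mapping the (cell, local entity index) pairs to actual facet or edge indices, and handling the parallel case, is left out of the sketch.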

--
Anders
