
dolfin team mailing list archive

Re: MatSetValues profiling


On Sat 2008-05-17 19:48, Murtazo Nazarov wrote:
> I think MatSetFromOptions() and MatSeqAIJSetPreallocation() are called
> after creating the PETSc matrix. I don't know where the right place to put
> them is.

Creating the matrix just creates an empty container.  The big malloc happens in
MatSeqAIJSetPreallocation (actually MatSeqAIJSetPreallocation_IMPL).  Just put
MatSetFromOptions before MatSeqAIJSetPreallocation.  If you really want to
override user options (effectively making certain options unavailable from the
command line), put those option-setting calls after MatSetFromOptions.

> I have attached log files from runs with -info and -log_summary. I use a
> 50x50x50 mesh.

  [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 397953 X 397953; storage space: 0 unneeded,17351559 used
  [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
  [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 45
  [0] Mat_CheckInode(): Found 397953 nodes out of 397953 rows. Not using Inode routines
  time 21.770000

I assume this is a scalar problem with linear elements on a tetrahedral mesh.
The preallocation is done correctly here, so you're unlikely to be able to
improve this by huge amounts.  The first assembly is more expensive because the
structure of each row is not yet known, so entries have to be shuffled around
as they arrive.  Since you know where the entries are (from the mesh), you could
do a dummy assembly: call MatSetValues once per row with the correct column
numbers and zeros for the entries, then MatAssemblyBegin/End with
MAT_FLUSH_ASSEMBLY (I think this is enough).  This should take around 2 seconds
for your matrix, similar to a finite difference assembly.  After that, all
assemblies should take the same time.  I don't think 7 seconds is terribly bad
for this matrix.  It is about 3 times longer than my quoted time for a finite
difference assembly of a matrix of similar size on a Core 2 Duo CPU T9300 @
2.50GHz.  If you want to speed up the subsequent assemblies, I suspect you will
have to take advantage of the special structure of mesh traversal to avoid
updating the same values many times, but I don't know how much this would gain.
Is the 7 second assembly a bottleneck for solving this system?  If so, the solve
must be cheaper than the assembly, so this matrix must be pretty well
conditioned.  In any case, assembly should scale well both in parallel and with
problem size.

> > Before the preallocation fix:
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 25140 X 25140; storage space: 261180 unneeded,730140 used
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 57708
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 59
> >
> > After the fix:
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 25140 X 25140; storage space: 0 unneeded,730140 used
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 59
> >
> > The time difference is around 3 orders of magnitude.
> This is excellent. I wonder whether I get the same or not. How did you do
> this profiling?

3 orders of magnitude doesn't require profiling :-)

Note that this example matrix was very small compared to yours.


