dolfin team mailing list archive

Re: DofMapSet design

To: dolfin-dev@xxxxxxxxxx
From: Niclas Jansson <njansson@xxxxxx>
Date: Fri, 19 Sep 2008 16:29:47 +0200
Delivered-to: dolfin-dev@xxxxxxxxxx
In-reply-to: <20080919111817.GA7017@bunjil.simula.no>
User-agent: Thunderbird 2.0.0.16 (X11/20080724)

Anders Logg wrote:

On Fri, Sep 19, 2008 at 11:36:28AM +0200, Niclas Jansson wrote:

I also wonder about the following in PXMLMesh::readVertices:

  const uint L = floor( (real) num_vertices / (real) num_processes);
  const uint R = num_vertices % num_processes;
  const uint num_local = (num_vertices + num_processes -
  process_number - 1) / num_processes;

  start_index = process_number * L + std::min(process_number, R);
  end_index = start_index + ( num_local - 1);

I think I can guess what it does, but does it have to be this
complicated? Isn't it enough to do something like

  const uint n = num_vertices / num_processors;
  start_index = n*process_number;
  end_index = start_index + n;

and then a fix for the last processor:

  if (process_number == num_processors - 1)
    end_index = num_vertices;

?
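For comparison, the two partitioning schemes quoted above can be written as small standalone functions. This is only a sketch: the names `Range`, `balanced_range` and `simple_range` are hypothetical and not part of DOLFIN, and both functions return a half-open range [begin, end), whereas the quoted readVertices code uses an inclusive end_index.

```cpp
#include <algorithm>
#include <cassert>

typedef unsigned int uint;

// Half-open index range [begin, end) owned by one process (hypothetical helper type).
struct Range
{
  uint begin;
  uint end;
};

// Scheme quoted from PXMLMesh::readVertices: the first R processes
// each receive one extra vertex, so local sizes differ by at most one.
Range balanced_range(uint num_vertices, uint num_processes, uint process_number)
{
  assert(num_processes > 0 && process_number < num_processes);
  const uint L = num_vertices / num_processes;  // integer division replaces floor()
  const uint R = num_vertices % num_processes;
  const uint num_local =
    (num_vertices + num_processes - process_number - 1) / num_processes;
  const uint start = process_number * L + std::min(process_number, R);
  Range r = {start, start + num_local};
  return r;
}

// Simpler scheme proposed in the thread: equal chunks of size n,
// with the whole remainder assigned to the last process.
Range simple_range(uint num_vertices, uint num_processes, uint process_number)
{
  assert(num_processes > 0 && process_number < num_processes);
  const uint n = num_vertices / num_processes;
  Range r = {n * process_number, n * process_number + n};
  if (process_number == num_processes - 1)
    r.end = num_vertices;
  return r;
}
```

With 10 vertices on 4 processes, balanced_range gives local sizes 3, 3, 2, 2, while simple_range gives 2, 2, 2, 4.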

But wouldn't that give a bad load balance, for example when N is large,
R << num_processes and (end_index - start_index) >> R?

Niclas


I don't understand, but maybe I'm missing something.

Say N = 1,000,000 and num_processes = 16. Then R = 0. With my scheme
above, then there will be 62500 vertices on each processor.

If we change N to 1,000,001, then there will be 62500 on each
processor except the last which will have 62501.

If we increase N further, we will have 62502, 62503 etc until 62515 on
the last processor, and after that 62501 on each processor etc.

But maybe I'm missing something important?

--
Anders

Ok, it was a bad example. But the point is that the extra elements must be distributed across all processors to even out the workload.

For example, if N = num_processes**2 + num_processes - 1, the last processor would get almost twice as many elements as the others (2*num_processes - 1 instead of num_processes).

And even if the last processor has only a small number of extra elements, with, say, 1024 processors the efficiency would drop, since 1023 processors would be wasting cycles waiting for the last one to finish.
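The cost of that waiting can be estimated with a rough sketch. It assumes, as a simplification, that the work per processor is proportional to its number of local elements and that every processor waits for the most loaded one. The function names below are hypothetical and only illustrate the argument.

```cpp
#include <cassert>

typedef unsigned int uint;

// Largest local count when the whole remainder goes to the last processor.
uint max_load_simple(uint N, uint P)
{
  assert(P > 0);
  return N - (P - 1) * (N / P);
}

// Largest local count when the remainder is spread one element per processor,
// i.e. ceil(N / P).
uint max_load_balanced(uint N, uint P)
{
  assert(P > 0);
  return (N + P - 1) / P;
}

// Idealized parallel efficiency: total work divided by
// (number of processors * work of the most loaded processor).
double efficiency(uint N, uint P, uint max_load)
{
  assert(P > 0 && max_load > 0);
  return static_cast<double>(N) / (static_cast<double>(P) * max_load);
}
```

For P = 1024 and N = P**2 + P - 1 = 1049599, max_load_simple returns 2047 and the estimated efficiency is about 0.50, while max_load_balanced returns 1025 and the efficiency stays above 0.999.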


Niclas
