
dolfin team mailing list archive

Re: DofMapSet design

 

Anders Logg wrote:
On Fri, Sep 19, 2008 at 11:36:28AM +0200, Niclas Jansson wrote:

I also wonder about the following in PXMLMesh::readVertices:

  const uint L = floor( (real) num_vertices / (real) num_processes);
  const uint R = num_vertices % num_processes;
  const uint num_local = (num_vertices + num_processes - process_number - 1) / num_processes;

  start_index = process_number * L + std::min(process_number, R);
  end_index = start_index + ( num_local - 1);
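
For reference, this computes the standard balanced block distribution:
L is the base block size, R is the remainder, and the first R processes
get one extra vertex each, so local sizes never differ by more than one.
A minimal standalone sketch of the same arithmetic (hypothetical names,
plain unsigned integers in place of DOLFIN's uint and real):

  // Balanced block distribution: ranks < R get L + 1 vertices, the rest get L.
  // Standalone sketch, not the DOLFIN source.
  #include <algorithm>
  #include <cstdio>

  void local_range(unsigned rank, unsigned N, unsigned P,
                   unsigned& start, unsigned& end)
  {
    const unsigned L = N / P;  // integer division; no floor or real cast needed
    const unsigned R = N % P;
    const unsigned num_local = (N + P - rank - 1) / P;  // L + 1 if rank < R, else L

    start = rank*L + std::min(rank, R);  // offset by one for each earlier rank
                                         // that holds an extra vertex
    end = start + num_local - 1;         // inclusive end index
  }

  int main()
  {
    unsigned start = 0, end = 0;
    for (unsigned rank = 0; rank < 4; ++rank)
    {
      local_range(rank, 10, 4, start, end);  // 10 vertices on 4 processes: 3, 3, 2, 2
      std::printf("process %u: [%u, %u]\n", rank, start, end);
    }
    return 0;
  }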

I think I can guess what it does, but does it have to be this
complicated? Isn't it enough to do something like

  const uint n = num_vertices / num_processes;
  start_index = n*process_number;
  end_index = start_index + n;

and then a fix for the last processor:

  if (process_number == num_processes - 1)
    end_index = num_vertices;

?

But shouldn't that give a bad load balance, for example when N is large,
R << num_processes and (end_index - start_index) >> R?

Niclas

I don't understand, but maybe I'm missing something.

Say N = 1,000,000 and num_processes = 16. Then R = 0. With my scheme
above, there will be 62500 vertices on each processor.

If we change N to 1,000,001, then there will be 62500 on each
processor except the last, which will have 62501.

If we increase N further, we will have 62502, 62503, etc. up to 62515 on
the last processor, and after that 62501 on each processor, and so on.

But maybe I'm missing something important?
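
These counts are easy to check with a small sketch of the simpler scheme
above (hypothetical standalone code, not the DOLFIN source):

  // Simpler scheme: equal blocks of n = N/P, the last process absorbs
  // the remainder N % P. Hypothetical standalone sketch.
  #include <cstdio>

  int main()
  {
    const unsigned P = 16;
    const unsigned cases[] = {1000000, 1000001, 1000015, 1000016};
    for (unsigned N : cases)
    {
      const unsigned n = N / P;
      std::printf("N = %7u: %u per process, %u on the last\n", N, n, n + N % P);
    }
    return 0;
  }

For the four cases this prints 62500, 62501, 62515 and 62501 on the last
processor, matching the counts above.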

--
Anders


Ok, it was a bad example. But the point is that the extra elements must be distributed across all processors to even out the workload.

For example, if N = num_processes**2 + num_processes - 1, the last processor would get almost twice as many elements as the others.

And even if the last processor has only a small number of extra elements, with, say, 1024 processors the efficiency would drop, since 1023 processors would be wasting cycles waiting for the last one to finish.
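
To make the worst case concrete: with num_processes = 16, N = 16**2 +
16 - 1 = 271 gives n = 16 and remainder 15, so the simple scheme leaves
31 vertices on the last processor, while the balanced scheme spreads the
remainder out as 17 on the first 15 processes and 16 on the last. A
hypothetical sketch comparing the two:

  // Compare the two schemes on the worst case N = P*P + P - 1.
  // Hypothetical standalone sketch.
  #include <cstdio>

  int main()
  {
    const unsigned P = 16;
    const unsigned N = P*P + P - 1;  // 271

    // Simple scheme: the last process takes the remainder on top of n
    const unsigned n = N / P;
    std::printf("simple:   %u per process, %u on the last\n", n, n + N % P);

    // Balanced scheme: the first R processes take one extra vertex each
    const unsigned L = N / P;
    const unsigned R = N % P;
    std::printf("balanced: %u on the first %u processes, %u on the rest\n",
                L + 1, R, L);
    return 0;
  }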

Niclas

