dolfin team mailing list archive
Message #09769
Re: DofMapSet design
On Fri, Sep 19, 2008 at 04:29:47PM +0200, Niclas Jansson wrote:
> Anders Logg wrote:
> > On Fri, Sep 19, 2008 at 11:36:28AM +0200, Niclas Jansson wrote:
> >
> >>> I also wonder about the following in PXMLMesh::readVertices:
> >>>
> >>> const uint L = floor( (real) num_vertices / (real) num_processes);
> >>> const uint R = num_vertices % num_processes;
> >>> const uint num_local = (num_vertices + num_processes -
> >>> process_number - 1) / num_processes;
> >>>
> >>> start_index = process_number * L + std::min(process_number, R);
> >>> end_index = start_index + ( num_local - 1);
> >>>
> >>> I think I can guess what it does, but does it have to be this
> >>> complicated? Isn't it enough to do something like
> >>>
> >>> const uint n = num_vertices / num_processors;
> >>> start_index = n*process_number;
> >>> end_index = start_index + n;
> >>>
> >>> and then a fix for the last processor:
> >>>
> >>> if (process_number == num_processors - 1)
> >>> end_index = num_vertices;
> >>>
> >>> ?
> >>>
> >> But shouldn't that give bad load balance, for example when N is large,
> >> R << num_processes, and (end_index - start_index) >> R?
> >>
> >> Niclas
> >
> > I don't understand, but maybe I'm missing something.
> >
> > Say N = 1,000,000 and num_processes = 16. Then R = 0. With my scheme
> > above, there will be 62500 vertices on each processor.
> >
> > If we change N to 1,000,001, there will be 62500 on each processor
> > except the last, which will have 62501.
> >
> > If we increase N further, the last processor will have 62502, 62503,
> > and so on up to 62515, and after that 62501 on each processor, etc.
> >
> > But maybe I'm missing something important?
> >
>
> Ok, it was a bad example. But the point is that the extra elements must
> be distributed across all processors to even out the workload.
>
> For example, if N = num_processes**2 + num_processes - 1, the last
> processor would get roughly twice as many elements as the others.
>
> And even if the last processor only has a small number of extra elements,
> for, say, 1024 processors, the efficiency would drop since 1023
> processors would be wasting cycles waiting on the last one to finish.
>
> Niclas
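To make the imbalance concrete, here is a small self-contained sketch of that example under the simple scheme (the constants P and N below are just one instance of Niclas' formula, not code from the thread):

#include <iostream>

int main()
{
  // Instance of Niclas' example: N = P*P + P - 1 with P = 16 processes.
  const unsigned int P = 16;
  const unsigned int N = P * P + P - 1;        // 271 vertices

  // Simple scheme: n vertices per process, the last process takes the rest.
  const unsigned int n = N / P;                // 16 (integer division)
  const unsigned int last = N - (P - 1) * n;   // 31, i.e. almost 2*n

  std::cout << "first " << P - 1 << " processes: " << n << " vertices each, "
            << "last process: " << last << " vertices" << std::endl;
  return 0;
}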
Ok, I think I understand now.
I have modified the code a bit. Take a look and see if it still makes
sense.
It now first computes the number of vertices per process (by int
division) and then distributes the remainder r with one extra vertex
on each of the first r processes.
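A minimal sketch of that scheme, assuming the same inclusive end_index convention as the original snippet (the helper name compute_range is made up here for illustration; the actual modified code is not shown in this message):

#include <algorithm>

// Each process gets n = num_vertices / num_processes vertices; the
// remainder r = num_vertices % num_processes is distributed as one
// extra vertex on each of the first r processes.
void compute_range(unsigned int num_vertices, unsigned int num_processes,
                   unsigned int process_number,
                   unsigned int& start_index, unsigned int& end_index)
{
  const unsigned int n = num_vertices / num_processes;
  const unsigned int r = num_vertices % num_processes;
  const unsigned int num_local = n + (process_number < r ? 1 : 0);

  // Process p starts after p full chunks of size n, plus one extra vertex
  // for each lower-ranked process that received part of the remainder.
  start_index = process_number * n + std::min(process_number, r);
  end_index = start_index + (num_local - 1);  // inclusive, as in the original
}

With num_vertices = 271 and num_processes = 16 this gives 17 vertices on each of the first 15 processes and 16 on the last, instead of 16 on the first 15 and 31 on the last.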
--
Anders
References
- Re: DofMapSet design, From: Anders Logg, 2008-08-29
- Re: DofMapSet design, From: Niclas Jansson, 2008-09-16
- Re: DofMapSet design, From: Garth N. Wells, 2008-09-16
- Re: DofMapSet design, From: Niclas Jansson, 2008-09-17
- Re: DofMapSet design, From: Garth N. Wells, 2008-09-18
- Re: DofMapSet design, From: Anders Logg, 2008-09-18
- Re: DofMapSet design, From: Anders Logg, 2008-09-18
- Re: DofMapSet design, From: Niclas Jansson, 2008-09-19
- Re: DofMapSet design, From: Anders Logg, 2008-09-19
- Re: DofMapSet design, From: Niclas Jansson, 2008-09-19