launchpad-dev team mailing list archive

Re: optimizing adding team members

To: Launchpad Development <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
From: Aaron Bentley <aaron@xxxxxxxxxxxxx>
Date: Tue, 24 Aug 2010 11:11:50 -0400
In-reply-to: <201008240910.37429.julian.edwards@canonical.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/24/2010 04:10 AM, Julian Edwards wrote:
> On Monday 23 August 2010 15:34:04 Aaron Bentley wrote:
>> On 08/12/2010 10:04 AM, Julian Edwards wrote:
>>> On Wednesday 11 August 2010 23:09:46 Michael Hudson wrote:
>>>> On Wed, 11 Aug 2010 16:34:27 -0500, Edwin Grubbs
>>>> <edwin.grubbs@xxxxxxxxxxxxx> wrote:
>>>>> I think this shows that if we run everything through a queue, there
>>>>> needs to be a way to run really quick tasks in parallel with really
>>>>> big tasks. You wouldn't want a single POFile export for a project to
>>>>> be stuck behind the processing of 75k files for all of ubuntu. There
>>>>> could be a slow lane and a fast lane. If the jobs have a method for
>>>>> estimating the length of time reasonably well, we could also have a
>>>>> super-fast lane, where the page decides not to queue the task at all.
>>>>
>>>> https://dev.launchpad.net/CodeTeam/ExpressLane is an idea of Aaron's in
>>>> this direction.
>>>
>>> That seems awfully similar to the build farm's scoring mechanism.
>>
>> I really don't think the express lane is similar to the build farm's
>> scoring mechanism.
>> The express lane does not require the use of scores.
>
> That's like saying a teapot doesn't use a windscreen wiper.
>
> i.e. The scoring is an implementation detail.

The express lane concept is an implementation detail, too.  Saying they
are the same is like saying a windscreen and a motorcycle helmet are the
same thing.  We are talking implementation here.

I'm not arguing that one is better.  I'm arguing that they are different.

>>  And its purpose is to ensure that tasks which can complete quickly have
>> low latency, which is not a characteristic that the build farm currently
>> has.
>
> I have discussed this to death with many people in Canonical now and that's a
> popular misconception.

Tasks which can complete quickly are not being handled with low latency.
 This is not a misconception, it is an observed fact.  For example,
source package recipe builds, many of which take very little time to
run, have been reported to take multiple days to start.

> There's a tradeoff between utilisation and latency.

No argument here.

>  If you keep a resource
> open for high-priority jobs requiring no latency then you're wasting that
> resource.

I'll have to use "fast" and "slow" as the analogue of high-priority and
low-priority in the express lane design, because it does not have
priorities per se.  (And note that it assumes that a task is "fast"
until it is proven to be "slow".)

In the express lane design, utilization is relatively high, because
every task is assumed to be "fast" until it is proven to be slow, and
therefore, it may run in the express lane.  It's only if all of the
tasks are proven to be slow that utilization becomes a question.

However, for express lanes, it is not clear that the waste would be
real.  Resources may be shared by workers.  The original idea I had for
express lanes was for handling Jobs such as generating diffs.  I assumed
that there would be multiple worker processes per machine, and that the
express lane worker would share a machine with a slow lane worker.  So
the "wasted" cycles would actually be allocated to the slow worker
through OS-level multitasking when the fast lane worker was idle.
Presumably this approach could be extended to build farms by running two
virtual machines on a single box-- one for the fast lane and one for the
slow lane.  I have no idea how feasible this is, but it is a potential
alternative.

> And once you dispatch something to that resource, and another high-
> priority job comes along, what do you do with it?

Cancel it.

Remember that the express lane design explicitly describes canceling
tasks from the express lane when they exceed a timeout, and rescheduling
them for the slow lane.  If you did decide to schedule a slow task into
the express lane when there were no "fast" tasks, the obvious extension
would be to cancel the slow task and stick it back into the slow lane
queue when the next "fast" task became available.
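
The cancel-and-reschedule behaviour described above can be sketched as a
small simulation.  This is only an illustration, not the ExpressLane
implementation: the function name, the single worker per lane, and the
assumption that a cancelled task restarts from scratch in the slow lane
are all hypothetical.

```python
from collections import deque


def run_express_lane(tasks, timeout):
    """Simulate one express-lane worker and one slow-lane worker.

    tasks: list of (name, duration) pairs, all queued at time 0.
    Every task is assumed "fast" and starts in the express lane.  A task
    that would run longer than `timeout` is cancelled when the timeout
    expires and requeued for the slow lane, where it restarts from scratch
    (an assumption of this sketch).
    Returns {name: (lane, finish_time)}.
    """
    express = deque(tasks)
    slow = deque()
    finished = {}

    clock = 0
    while express:
        name, duration = express.popleft()
        if duration <= timeout:
            clock += duration
            finished[name] = ("express", clock)
        else:
            # Proven "slow": the express lane loses at most `timeout`.
            clock += timeout
            slow.append((name, duration, clock))

    slow_clock = 0
    while slow:
        name, duration, available_at = slow.popleft()
        # The slow worker cannot start a task before it was requeued.
        slow_clock = max(slow_clock, available_at) + duration
        finished[name] = ("slow", slow_clock)
    return finished


result = run_express_lane(
    [("diff-a", 2), ("big-build", 100), ("diff-b", 3)], timeout=10)
print(result)
```

In this run "diff-b" finishes at time 15 rather than 105, because
"big-build" holds the express lane only for the 10-unit timeout before
it is moved to the slow lane.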

However, bear in mind that if the "waste" isn't real because of resource
sharing among workers, there's no advantage to reducing the "waste", so
no reason to dispatch a "slow" job to the express lane anyhow.

> Given nR resources, one will be becoming free within X seconds at any point in
> time.  X is inversely proportional to nR

Adding resources reduces the average of X, not X itself.  X is the
minimum of all the remaining durations of all the currently running
tasks, and these remaining durations are determined by the expensiveness
of the job, speed of the machine, and time the job started.

>, so we can mostly nullify the wait by
> adding resources.

Adding resources can only reduce the average X, not X itself.  If all
the tasks being run are slow tasks (someone is building X11 and
openoffice on every distro and architecture), X can still be
unreasonably high.
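
As a rough illustration of the point about X, here is a sketch with
made-up numbers.  The function name and the durations are hypothetical;
the only thing taken from the discussion is that X is the minimum of
the remaining durations of the running tasks.

```python
def time_until_a_resource_frees(remaining_durations):
    """X: time until the first busy resource becomes free.

    remaining_durations: remaining run time, in seconds, of the task
    currently running on each resource (every resource is busy).
    """
    return min(remaining_durations)


# Three builders, all running slow tasks (hypothetical figures).
three_builders = [3600, 5400, 7200]
# A fourth builder is added, but it also picks up a slow task.
four_builders = three_builders + [4000]

print(time_until_a_resource_frees(three_builders))
print(time_until_a_resource_frees(four_builders))
```

With these figures X is 3600 seconds in both cases: the extra builder
does not shorten the wait, because it is busy with a slow task too.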

Note that the express lane concept requires at least 1 resource handling
fast tasks, so the potential waste of not utilizing that resource would
be at most 1/nR, which shrinks as you increase the number of resources.

> We also make the best overall use of the resources, which
> in itself reduces latency for everything.

That approach maximizes throughput.  It does reduce average latency, but
it does not minimize actual latency.

That is not at all the same approach as the express lane design.  The
express lane design improves the latency of fast tasks at the cost of
increasing the latency of slow tasks.

> You also have to ask yourself if a high-priority job really has to dispatch
> *now*, or can it wait X seconds.  I bet most of the time it can wait, and this
> is the strategy I have with the high-priority package builds (security,
> private, OEM etc).

I don't agree that waiting is necessarily okay.  I believe that people
who manually request a build would be happiest if it started soon.

As noted above, X can be unreasonably high, but the express lane design
is meant to address such situations.  It provides a second value, Y,
which is the amount of time until the next fast task can run, and this
can never be greater than the configured timeout.
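
The bound on Y can be sketched as follows.  This assumes a single
express-lane worker with no other fast tasks queued ahead; the function
and parameter names are hypothetical.

```python
def wait_for_express_lane(elapsed, task_duration, timeout):
    """Y: time until the express-lane worker can take the next fast task.

    elapsed: how long the current task has already run in the express lane.
    task_duration: total time the current task needs (unknown in practice;
        supplied here only to drive the illustration).
    timeout: the configured express-lane timeout.
    """
    # The current task leaves the lane when it finishes or when it is
    # cancelled at the timeout, whichever comes first.
    leaves_at = min(task_duration, timeout)
    return max(0, leaves_at - elapsed)


# A two-hour task that has just entered the express lane (timeout 60s).
print(wait_for_express_lane(elapsed=0, task_duration=7200, timeout=60))
# The same task after 45 seconds.
print(wait_for_express_lane(elapsed=45, task_duration=7200, timeout=60))
```

However long the current task would take to finish, the result never
exceeds the timeout, which is the property claimed for Y above.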

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkxz4TYACgkQ0F+nu1YWqI363wCfYexg4o9og4ISQW4lOd9B6fJf
jjUAn0dnLRLTCtST12ICDUdJY+PdmqIW
=BVoo
-----END PGP SIGNATURE-----
References

optimizing adding team members
From: Edwin Grubbs, 2010-08-10
Re: optimizing adding team members
From: Julian Edwards, 2010-08-12
Re: optimizing adding team members
From: Aaron Bentley, 2010-08-23
Re: optimizing adding team members
From: Julian Edwards, 2010-08-24