launchpad-dev team mailing list archive

Re: plan for incremental code imports

To: Michael Hudson <michael.hudson@xxxxxxxxxxxxx>
From: Jelmer Vernooij <jelmer@xxxxxxxxxxxxx>
Date: Mon, 08 Feb 2010 11:25:02 +0100
Cc: Jelmer Vernooij <jelmer.vernooij@xxxxxxxxxxxxx>, Tim Penhey <tim@xxxxxxxxxxxxx>, Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4B6B893B.4030008@canonical.com>

Hi Michael,

On Fri, 2010-02-05 at 15:58 +1300, Michael Hudson wrote:
> We want to make code imports, or at least the ones done with a foreign
> branch plugin, import incrementally.  This will worm around some
> resource leaks somewhere in the import plugin or bzr and allow us to
> import really large repos like linux or firefox, but also will make
> scheduling fairer and reduce the damage done by a network blip.
> 
> This requires some infrastructure work to support an import status of
> "partially successful" and so on, but I know how to do that.  The part
> I'm a bit less sure of is how to do the "only import $N revisions" bit.
> 
> One way would be to not try too hard, and import only $N _mainline_
> revisions each time.  I think code like this could do that:
> 
> local_branch = ...
> foreign_branch = ...
> local_revno = local_branch.revno()
> foreign_revno = foreign_branch.revno()
> target_revno = min(local_revno + $N, foreign_revno)
> target_revid = foreign_branch.get_revid(target_revno)
> local_branch.pull(foreign_branch, stop_revision=target_revid)
> if target_revno == foreign_revno:
>     return SUCCESS
> else:
>     return PARTIAL_SUCCESS

> What I don't know is if this will be very efficient at all; does
> get_revid() on a mercurial or svn or git branch perform acceptably?
bzr-svn branches have this call and it's quite cheap, but it can be very
expensive for bzr-git and bzr-hg branches because we need to fetch all
data before we can look up the revno. At the moment, we don't cache the
fetched data anywhere, so we end up fetching it twice: once to look up
the revid and once to actually import it.
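
To illustrate the double fetch, here is a minimal sketch using a toy
stand-in for a foreign branch. ToyForeignBranch, pull_up_to, use_cache
and fetch_count are hypothetical names made up for this example and are
not part of bzr-git, bzr-hg or bzrlib; the only point is that a revid
lookup followed by an import costs two full fetches unless the fetched
data is kept around.

```python
class ToyForeignBranch:
    """Hypothetical stand-in for a foreign branch; not a real plugin API."""

    def __init__(self, mainline):
        self._mainline = list(mainline)  # revids, oldest first
        self._cache = None
        self.fetch_count = 0             # number of simulated full fetches

    def _fetch_all(self, use_cache):
        # Simulates the expensive step: transferring all foreign data.
        if use_cache and self._cache is not None:
            return self._cache
        self.fetch_count += 1
        data = list(self._mainline)
        if use_cache:
            self._cache = data
        return data

    def get_revid(self, revno, use_cache=False):
        # Revnos are 1-based, so revno 1 is the first mainline revision.
        return self._fetch_all(use_cache)[revno - 1]

    def pull_up_to(self, revid, use_cache=False):
        # Returns the mainline revisions up to and including revid.
        data = self._fetch_all(use_cache)
        return data[: data.index(revid) + 1]


def import_mainline(branch, revno, use_cache):
    # Look up the revid first, then import up to it.
    revid = branch.get_revid(revno, use_cache=use_cache)
    return branch.pull_up_to(revid, use_cache=use_cache)
```

Calling import_mainline() with use_cache=False leaves fetch_count at 2,
while use_cache=True leaves it at 1 for the same result.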

> It's also a bit lame in that it would be better to only import $N
> _revisions_ at a time, not mainline revisions.  But I don't know how to
> do that.  The above sketch might be good enough in any case.
The plugins should (with a trivial amount of work) be able to support an
optional argument to convert only approximately X revisions. I think
this is probably a simpler and faster solution than using get_revid(),
and it will also allow us to import only X real revisions rather
than just X mainline revisions.
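
As a rough sketch of what such a limit could look like, the toy code
below imports at most `limit` missing revisions per run, parents before
children, so merged (non-mainline) revisions count towards the limit
too. The revision graph is a plain dict and the names
missing_in_topo_order and import_batch are assumptions made for this
example; the real plugins would do this inside their own fetch code.

```python
from collections import deque


def missing_in_topo_order(parents, tip, have):
    """Return ancestors of tip that are not in have, parents first.

    parents maps each revid to a list of its parent revids (toy graph).
    """
    # Walk back from the tip to collect every revision not yet imported.
    missing = set()
    todo = [tip]
    while todo:
        rev = todo.pop()
        if rev in have or rev in missing:
            continue
        missing.add(rev)
        todo.extend(parents[rev])

    # Kahn's algorithm restricted to the missing revisions.
    pending = {r: sum(1 for p in parents[r] if p in missing) for r in missing}
    children = {r: [] for r in missing}
    for rev in missing:
        for parent in parents[rev]:
            if parent in missing:
                children[parent].append(rev)

    ready = deque(sorted(r for r, n in pending.items() if n == 0))
    order = []
    while ready:
        rev = ready.popleft()
        order.append(rev)
        for child in sorted(children[rev]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    return order


def import_batch(parents, tip, have, limit):
    """Import up to limit revisions; return (batch, complete)."""
    order = missing_in_topo_order(parents, tip, have)
    batch = order[:limit]
    have.update(batch)  # stands in for actually converting the revisions
    return batch, len(batch) == len(order)
```

Because each batch is a prefix of a parents-first ordering, every
revision imported in a run already has its parents available locally.
The caller can report PARTIAL_SUCCESS while `complete` is false and
SUCCESS once it becomes true.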

> The other thing that should be done is changing our bzr-git importer to
> preserve the git pack files between partial imports, by changing bzr-git
> to put them in a predictable location and then doing some work in the
> importer to preserve them.  I think I'd rather Jelmer look at this part,
> or at least provide me with very detailed instructions ...
Is this a requirement before the incremental imports?

Cheers,

Jelmer

Follow ups

Re: plan for incremental code imports
From: Michael Hudson, 2010-02-08

References

plan for incremental code imports
From: Michael Hudson, 2010-02-05