launchpad-dev team mailing list archive

Re: plan for incremental code imports

To: Jelmer Vernooij <jelmer@xxxxxxxxxxxxx>
From: Michael Hudson <michael.hudson@xxxxxxxxxxxxx>
Date: Tue, 09 Feb 2010 08:50:47 +1300
Cc: Jelmer Vernooij <jelmer.vernooij@xxxxxxxxxxxxx>, Tim Penhey <tim@xxxxxxxxxxxxx>, Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <1265624702.20167.78.camel@charis.vernstok.nl>
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Jelmer Vernooij wrote:
> Hi Michael,

Thanks for the reply!

> On Fri, 2010-02-05 at 15:58 +1300, Michael Hudson wrote:
>> We want to make code imports, or at least the ones done with a foreign
>> branch plugin, import incrementally.  This will work around some
>> resource leaks somewhere in the import plugin or bzr and allow us to
>> import really large repos like linux or firefox, but also will make
>> scheduling fairer and reduce the damage done by a network blip.
>>
>> This requires some infrastructure work to support an import status of
>> "partially successful" and so on, but I know how to do that.  The part
>> I'm a bit less sure of is how to do the "only import $N revisions" bit.
>>
>> One way would be to not try too hard, and import only $N _mainline_
>> revisions each time.  I think code like this could do that:
>>
>> local_branch = ...
>> foreign_branch = ...
>> local_revno = local_branch.revno()
>> foreign_revno = foreign_branch.revno()
>> target_revno = min(local_revno + $N, foreign_revno)
>> target_revid = foreign_branch.get_revid(target_revno)
>> local_branch.pull(foreign_branch, stop_revision=target_revid)
>> if target_revno == foreign_revno:
>>     return SUCCESS
>> else:
>>     return PARTIAL_SUCCESS
> 
>> What I don't know is if this will be very efficient at all; does
>> get_revid() on a mercurial or svn or git branch perform acceptably?
> bzr-svn branches have this call and it's quite cheap, but it can be very
> expensive for bzr-git and bzr-hg branches because we need to fetch all
> data before we can look up the revno. At the moment, we don't cache the
> fetched data anywhere so we end up fetching it twice - once to look up
> the revid and once to actually import it.

Right, that's what I was afraid of.
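
For the record, here is roughly how I picture the mainline-limited step,
written against in-memory stand-ins so it can be run without bzr. FakeBranch
and import_step are hypothetical names for illustration only, not bzrlib API;
the method names just mirror the sketch quoted above. The target revno is
capped with min() so that it never goes past the foreign tip.

```python
SUCCESS = "success"
PARTIAL_SUCCESS = "partial success"


class FakeBranch:
    """Hypothetical stand-in for a branch: a list of mainline revids."""

    def __init__(self, revids=()):
        self.revids = list(revids)

    def revno(self):
        return len(self.revids)

    def get_revid(self, revno):
        # revnos are 1-based, as in the quoted sketch
        return self.revids[revno - 1]

    def pull(self, other, stop_revision):
        # Copy the other branch's mainline up to and including stop_revision.
        stop = other.revids.index(stop_revision) + 1
        self.revids = other.revids[:stop]


def import_step(local_branch, foreign_branch, n):
    """Pull at most n further mainline revisions into local_branch."""
    local_revno = local_branch.revno()
    foreign_revno = foreign_branch.revno()
    target_revno = min(local_revno + n, foreign_revno)
    if target_revno > local_revno:
        target_revid = foreign_branch.get_revid(target_revno)
        local_branch.pull(foreign_branch, stop_revision=target_revid)
    if target_revno == foreign_revno:
        return SUCCESS
    return PARTIAL_SUCCESS
```

Calling import_step repeatedly returns PARTIAL_SUCCESS until the local branch
reaches the foreign tip. It says nothing about how expensive get_revid() is on
a real git or hg branch, which is the actual worry here.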

>> It's also a bit lame in that it would be better to only import $N
>> _revisions_ at a time, not mainline revisions.  But I don't know how to
>> do that.  The above sketch might be good enough in any case.
> The plugins should (with a trivial amount of work) be able to support an
> optional argument to only convert approximately X revisions. I think
> this is probably a simpler and faster solution than using get_revid(),
> and it will also allow us to import only X real revisions rather
> than just X mainline revisions.

That would be great.  When can this be done by? :-)
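
To make sure we mean the same thing by "only convert approximately X
revisions", here is a toy version of what I'd expect from the importer's point
of view. It is only a sketch under my own assumptions: import_some, parent_map
and imported are hypothetical names, the graph is a plain dict of revid to
parent revids, and "importing" is just adding to a set. The real plugins would
of course be fetching from the foreign repository.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def import_some(parent_map, imported, limit):
    """Import at most `limit` missing revisions, parents before children.

    Returns True once nothing is left to import.
    """
    # static_order() yields each revision after all of its parents.
    order = TopologicalSorter(parent_map).static_order()
    missing = [revid for revid in order if revid not in imported]
    for revid in missing[:limit]:
        imported.add(revid)  # stand-in for the actual conversion
    return len(missing) <= limit
```

The return value is what would let the importer choose between "successful"
and "partially successful". Because the limit counts every revision rather
than just mainline ones, a large merge cannot blow up the size of one step.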

>> The other thing that should be done is changing our bzr-git importer to
>> preserve the git pack files between partial imports, by changing bzr-git
>> to put them in a predictable location and then doing some work in the
>> importer to preserve them.  I think I'd rather Jelmer look at this part,
>> or at least provide me with very detailed instructions ...
> Is this a requirement before the incremental imports?

It's not strictly a requirement, but without it, for the kernel we'll
transfer 55000 revisions for the first partial import, then 54000 for
the second, then 53000, and so on, which adds up to rather a lot.
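
To put a number on "rather a lot": the figures above imply about 1000
revisions imported per partial import, which is my assumption here rather than
a decided value. If each partial import has to re-transfer everything not yet
imported, the total can be worked out like this (total_transferred is just an
illustrative helper):

```python
def total_transferred(total_revisions, step):
    """Revisions transferred if every partial import re-fetches the remainder."""
    remaining = total_revisions
    transferred = 0
    while remaining > 0:
        transferred += remaining  # this import fetches all that is left
        remaining -= step         # but only `step` revisions get imported
    return transferred


print(total_transferred(55000, 1000))  # 1540000
```

That is 55000 + 54000 + ... + 1000 = 1,540,000 revisions transferred to import
55,000, or 28 times as many as if the pack files were kept between runs.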

Tim thinks this is more important than I do, it seems.

Cheers,
mwh

References

plan for incremental code imports
From: Michael Hudson, 2010-02-05
Re: plan for incremental code imports
From: Jelmer Vernooij, 2010-02-08