launchpad-dev team mailing list archive
Message #02495
Re: plan for incremental code imports
Jelmer Vernooij wrote:
> Hi Michael,
Thanks for the reply!
> On Fri, 2010-02-05 at 15:58 +1300, Michael Hudson wrote:
>> We want to make code imports, or at least the ones done with a foreign
>> branch plugin, import incrementally. This will work around some
>> resource leaks somewhere in the import plugin or bzr and allow us to
>> import really large repos like linux or firefox, but also will make
>> scheduling fairer and reduce the damage done by a network blip.
>>
>> This requires some infrastructure work to support an import status of
>> "partially successful" and so on, but I know how to do that. The part
>> I'm a bit less sure of is how to do the "only import $N revisions" bit.
>>
>> One way would be to not try too hard, and import only $N _mainline_
>> revisions each time. I think code like this could do that:
>>
>> local_branch = ...
>> foreign_branch = ...
>> local_revno = local_branch.revno()
>> foreign_revno = foreign_branch.revno()
>> # Take at most $N more mainline revisions, but never go past the tip.
>> target_revno = min(local_revno + N, foreign_revno)
>> target_revid = foreign_branch.get_rev_id(target_revno)
>> local_branch.pull(foreign_branch, stop_revision=target_revid)
>> if target_revno == foreign_revno:
>>     return SUCCESS
>> else:
>>     return PARTIAL_SUCCESS
>
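A minimal runnable version of the loop above, for illustration only. It assumes a hypothetical in-memory FakeBranch (just a list of mainline revids) whose revno(), get_rev_id() and pull(stop_revision=...) are modelled on the bzrlib Branch methods, plus hypothetical SUCCESS/PARTIAL_SUCCESS values; the real importer would use real branches and its own status codes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical status values; the importer would define its own.
SUCCESS = "SUCCESS"
PARTIAL_SUCCESS = "PARTIAL_SUCCESS"


@dataclass
class FakeBranch:
    """Hypothetical stand-in for a branch: an ordered list of mainline revids."""
    mainline: List[str] = field(default_factory=list)

    def revno(self) -> int:
        return len(self.mainline)

    def get_rev_id(self, revno: int) -> str:
        # Revision numbers are 1-based, as in bzr.
        return self.mainline[revno - 1]

    def pull(self, source: "FakeBranch", stop_revision: str) -> None:
        # Make our mainline match source's, up to and including stop_revision.
        stop = source.mainline.index(stop_revision)
        self.mainline = list(source.mainline[: stop + 1])


def import_step(local: FakeBranch, foreign: FakeBranch, n: int) -> str:
    """Import at most n more mainline revisions from foreign into local."""
    local_revno = local.revno()
    foreign_revno = foreign.revno()
    # min(): take at most n revisions and never go past the foreign tip.
    target_revno = min(local_revno + n, foreign_revno)
    if target_revno <= local_revno:
        return SUCCESS  # nothing new to import
    target_revid = foreign.get_rev_id(target_revno)
    local.pull(foreign, stop_revision=target_revid)
    return SUCCESS if target_revno == foreign_revno else PARTIAL_SUCCESS


foreign = FakeBranch([f"rev-{i}" for i in range(1, 26)])
local = FakeBranch()
while import_step(local, foreign, 10) == PARTIAL_SUCCESS:
    print("partial import, now at revno", local.revno())
print("done at revno", local.revno())
```

Run as-is, this imports the 25-revision fake branch in three steps (10, 10, then the last 5), reporting PARTIAL_SUCCESS twice and then SUCCESS. The min() is what caps each run at N revisions while still stopping at the foreign tip.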
>> What I don't know is whether this will be efficient at all: does
>> get_rev_id() on a mercurial, svn or git branch perform acceptably?
> bzr-svn branches have this call and it's quite cheap, but it can be very
> expensive for bzr-git and bzr-hg branches because we need to fetch all
> the data before we can look up the revid for a revno. At the moment, we
> don't cache the fetched data anywhere, so we end up fetching it twice:
> once to look up the revid and once to actually import it.
Right, that's what I was afraid of.
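The double fetch Jelmer describes could be avoided by caching the fetched data for the duration of a run. A sketch, assuming the foreign mainline can be fetched as a single list of revids and kept in memory (a big simplification of what bzr-git and bzr-hg actually do); CachingForeignHistory and fetch_mainline are hypothetical names, not plugin APIs:

```python
from typing import Callable, List, Optional


class CachingForeignHistory:
    """Hypothetical wrapper that fetches the foreign history at most once.

    fetch_mainline stands in for whatever expensive network fetch the
    plugin does; here it just returns the mainline revids, oldest first.
    """

    def __init__(self, fetch_mainline: Callable[[], List[str]]) -> None:
        self._fetch_mainline = fetch_mainline
        self._mainline: Optional[List[str]] = None
        self.fetches = 0  # how many times the expensive fetch actually ran

    def _history(self) -> List[str]:
        if self._mainline is None:
            self._mainline = self._fetch_mainline()
            self.fetches += 1
        return self._mainline

    def get_rev_id(self, revno: int) -> str:
        # The first call pays for the fetch; later calls reuse the cached data.
        return self._history()[revno - 1]

    def revisions_to_import(self, local_revno: int, stop_revid: str) -> List[str]:
        # The import step reuses the same cached data instead of fetching again.
        history = self._history()
        return history[local_revno : history.index(stop_revid) + 1]


history = CachingForeignHistory(lambda: [f"git-{i}" for i in range(1, 101)])
target = history.get_rev_id(30)
batch = history.revisions_to_import(20, target)
print(len(batch), history.fetches)
```

This prints `10 1`: ten revisions to import, and a single fetch shared between the revid lookup and the import.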
>> It's also a bit lame in that it would be better to only import $N
>> _revisions_ at a time, not mainline revisions. But I don't know how to
>> do that. The above sketch might be good enough in any case.
> The plugins should (with a trivial amount of work) be able to support an
> optional argument to convert only approximately X revisions. I think
> this is probably a simpler and faster solution than using get_rev_id(),
> and it will also allow us to import only X real revisions rather than
> just X mainline revisions.
That would be great. When can this be done by? :-)
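For the "approximately X real revisions" option, one way a plugin could pick each batch is to walk the revision graph parents-first and stop after X missing revisions. A sketch, assuming the whole graph is available as a hypothetical dict mapping each revid to its parent revids (ghost parents are not handled); next_batch is an illustrative name, not an existing plugin function:

```python
from typing import Dict, List, Set, Tuple


def next_batch(parents: Dict[str, Tuple[str, ...]], tip: str,
               present: Set[str], limit: int) -> List[str]:
    """Return up to `limit` revisions missing from `present`, parents first.

    Any prefix of a parents-first order is self-contained: every revision
    in it has its parents either already present or earlier in the batch.
    """
    order: List[str] = []
    seen: Set[str] = set(present)
    stack: List[Tuple[str, bool]] = [(tip, False)]
    while stack and len(order) < limit:
        rev, parents_done = stack.pop()
        if parents_done:
            order.append(rev)  # all of rev's missing parents were emitted
            continue
        if rev in seen:
            continue
        seen.add(rev)
        stack.append((rev, True))
        # Push parents so the leftmost (mainline) parent is visited first.
        for parent in reversed(parents.get(rev, ())):
            if parent not in seen:
                stack.append((parent, False))
    return order


# m3 merges a side branch b1 <- b2 that forked from m1.
graph = {"m1": (), "m2": ("m1",), "b1": ("m1",), "b2": ("b1",),
         "m3": ("m2", "b2")}
imported: Set[str] = set()
while True:
    batch = next_batch(graph, "m3", imported, 2)
    if not batch:
        break
    print(batch)
    imported.update(batch)
```

On this five-revision graph the batches are ['m1', 'm2'], ['b1', 'b2'] and ['m3'], whereas counting only mainline revisions would see three. Because each batch is a prefix of a parents-first order, every batch can be imported on its own.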
>> The other thing that should be done is changing our bzr-git importer to
>> preserve the git pack files between partial imports, by changing bzr-git
>> to put them in a predictable location and then doing some work in the
>> importer to preserve them. I think I'd rather Jelmer look at this part,
>> or at least provide me with very detailed instructions ...
> Is this a requirement before the incremental imports?
It's not strictly a requirement, but without it, for the kernel we'll
transfer 55000 revisions for the first partial import, then 54000 for
the second, then 53000, and so on, which adds up to rather a lot.
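To put a number on "rather a lot": assuming roughly 1000 revisions per run (as the 55000, 54000, 53000 sequence suggests) and that, without preserved pack files, each run re-fetches everything not yet imported, the total is 1000 × (55 + 54 + ... + 1) = 1000 × 55 × 56 / 2 revisions. A quick check, where revisions_transferred is a hypothetical helper:

```python
def revisions_transferred(total: int, per_run: int, keep_packs: bool) -> int:
    """Total revisions fetched over an incremental import of `total` revisions.

    Without preserved pack files, each run re-fetches everything not yet
    imported; with them, each revision is fetched only once.
    """
    if keep_packs:
        return total
    transferred = 0
    remaining = total
    while remaining > 0:
        transferred += remaining  # this run fetches all remaining history
        remaining -= per_run      # ...but only imports per_run of it
    return transferred


print(revisions_transferred(55000, 1000, keep_packs=False))  # 1540000
print(revisions_transferred(55000, 1000, keep_packs=True))   # 55000
```

That is 1,540,000 revisions transferred instead of 55,000, about 28 times as much.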
Tim seems to think this is more important than I do.
Cheers,
mwh