launchpad-dev team mailing list archive

Re: First cut at recipe db-schema patch

To: Julian Edwards <julian.edwards@xxxxxxxxxxxxx>
From: Michael Hudson <michael.hudson@xxxxxxxxxxxxx>
Date: Thu, 03 Dec 2009 15:19:54 +1300
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>, James Westby <james.westby@xxxxxxxxxxxxx>
In-reply-to: <200912021806.03816.julian.edwards@canonical.com>
User-agent: Thunderbird 2.0.0.23 (X11/20090817)
Julian Edwards wrote:
> On Wednesday 02 December 2009 03:50:50 Michael Hudson wrote:
>>>  * Should SourcePackageRecipeData have an owner or created_by?
>> I don't think so.  SourcePackageRecipeData is really just a shared
>> implementation detail of SourcePackageBuild and SourcePackageRecipe, and
>> both of those should record all the interesting stuff.
> 
> Ok, so the same recipe data can be re-used in multiple recipes.  Makes sense.

No... I see that I've not explained how I envision this bit working very
well at all.

Quoting from my mail of a few weeks ago in the "Immediate plan for Build
Farm generic jobs" thread
(http://www.mail-archive.com/launchpad-dev@xxxxxxxxxxxxxxxxxxx/msg01594.html):

> Let's start with a recipe.  In the abstract, a recipe is roughly a thing
> that specifies how to combine branches into something that debuild could
> turn into a source package -- a "debianized source tree".  Recipes today
> are specified as text files that the 'bzr-builder' plugin parses and
> acts on.  An obvious open question in choosing how to represent these is
> whether to store them in a parsed form, with all the structure of a
> recipe in the schema, or unparsed, as lumps of text.
> 
> Or, and there are actually reasons for doing this, we could do something
> in between: store it mostly as text but replace the references to
> branches in the text with references to database objects (probably the
> id of entries in some RecipeBranch linking table).  This would let us
> (a) check that the branches exist at parsing time (b) keep the
> references up to date if the branch is moved or renamed (c) prevent
> branches that are referenced in Recipes from being deleted.

The only thing that has really changed about this is that I realized a
manifest needs to be stored in essentially the same way as the recipe
text.  So I should explain properly what a manifest is!

A "manifest" is part of the output of building a recipe into a
debianized source tree.  Syntactically speaking, it's a recipe, but it's
a "frozen" recipe where all the references it includes specify precisely
which revision of which branch was used in the build, so if you build
from that recipe again you will[1] get the same debianized source tree
and thus same source package again.

Neither the name nor the idea of a "manifest" is my fault :-)

[1] Modulo the "run" command being used to do evil things, I guess.
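
To make the "frozen recipe" idea concrete, here is a rough sketch in Python.  It is only an illustration: BranchRef, freeze and is_frozen are made-up names, not anything in bzr-builder or Launchpad, and a real recipe has more structure than a flat list of branch references.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BranchRef:
    """Hypothetical stand-in for one branch reference in a recipe."""
    url: str
    revision: Optional[str] = None  # None means "whatever is tip at build time"


def freeze(recipe: List[BranchRef], revisions_used: Dict[str, str]) -> List[BranchRef]:
    """Return a manifest: the same references, each pinned to the
    revision that the build actually used for that branch."""
    return [replace(ref, revision=revisions_used[ref.url]) for ref in recipe]


def is_frozen(recipe: List[BranchRef]) -> bool:
    """A manifest is a recipe in which every reference names a revision."""
    return all(ref.revision is not None for ref in recipe)


# Illustrative values only; the branch names and revision ids are invented.
recipe = [BranchRef("lp:project"), BranchRef("lp:project-packaging")]
revisions_used = {"lp:project": "revid-a", "lp:project-packaging": "revid-b"}
manifest = freeze(recipe, revisions_used)
```

Freezing a manifest again with the same revisions gives back the same manifest, which is the property that makes a rebuild reproducible.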

So instead of the pseudo-schema in the mail linked above, where the
SourcePackageRecipe table had a text column and there was a table for
linking this text with the branches it references, I have
SourcePackageRecipeData and SourcePackageRecipeDataBranch tables which
represent the text of a recipe, and link to these from the
SourcePackageRecipe and SourcePackageBuild tables.
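
A minimal sketch of that arrangement, using sqlite3 purely so that it is runnable.  The table names come from this thread, but every column name, the "{branch:N}" placeholder syntax and the render_recipe helper are assumptions made for illustration; this is not the patch itself and not real bzr-builder recipe syntax.

```python
import re
import sqlite3

# Hypothetical, simplified schema; column names are illustrative only.
SCHEMA = """
CREATE TABLE Branch (
    id INTEGER PRIMARY KEY,
    unique_name TEXT NOT NULL UNIQUE
);
CREATE TABLE SourcePackageRecipeData (
    id INTEGER PRIMARY KEY,
    -- Recipe text with branch references replaced by placeholders.
    recipe_text TEXT NOT NULL
);
CREATE TABLE SourcePackageRecipeDataBranch (
    id INTEGER PRIMARY KEY,
    recipe_data INTEGER NOT NULL REFERENCES SourcePackageRecipeData(id),
    branch INTEGER NOT NULL REFERENCES Branch(id)
);
CREATE TABLE SourcePackageRecipe (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    recipe_data INTEGER NOT NULL REFERENCES SourcePackageRecipeData(id)
);
CREATE TABLE SourcePackageBuild (
    id INTEGER PRIMARY KEY,
    recipe INTEGER NOT NULL REFERENCES SourcePackageRecipe(id),
    -- The manifest is an output of the build, stored like a recipe.
    manifest INTEGER REFERENCES SourcePackageRecipeData(id)
);
"""


def make_db():
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def render_recipe(conn, recipe_data_id):
    """Rebuild the text the user sees by substituting current branch names."""
    (text,) = conn.execute(
        "SELECT recipe_text FROM SourcePackageRecipeData WHERE id = ?",
        (recipe_data_id,),
    ).fetchone()

    def lookup(match):
        row = conn.execute(
            "SELECT Branch.unique_name FROM SourcePackageRecipeDataBranch "
            "JOIN Branch ON Branch.id = SourcePackageRecipeDataBranch.branch "
            "WHERE SourcePackageRecipeDataBranch.id = ? "
            "AND SourcePackageRecipeDataBranch.recipe_data = ?",
            (int(match.group(1)), recipe_data_id),
        ).fetchone()
        return row[0]

    return re.sub(r"\{branch:(\d+)\}", lookup, text)
```

With something like this, renaming a Branch row changes what render_recipe returns without touching the stored text, and deleting a Branch row that a recipe still references fails with an integrity error, which is what points (b) and (c) in the quoted mail ask for.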

As I start to write the model code, I'm thinking that all this will be
hidden from the Python level -- ISourcePackageRecipe will probably have
a 'recipe' field that gives you the recipe as the user will see it.

I hope this makes sense now, and I apologize for not being clear on this
before.

> So will some code attempt to factor the recipe data as much as possible into 
> as few rows as possible when the recipes are created?

No... see above.

>>>  * Why is SourcePackageRecipe a separate table to
>>> SourcePackageRecipeData? Since one references the other, the duplicated
>>> columns are not needed are they?  In fact the only additional column is
>>> the recipe text. Maybe I don't understand what you're trying to do with
>>> this?
>> Ah right, I meant to remove those duplicate columns from SPRecipeData.
>> Essentially the reason they're separate is because SPRecipes have access
>> control and a location in the UI, but SPRecipeDatas don't as many of
>> them will be read-only versions referenced from SPBuild rows.
> 
> OK.
> 
>>>  * SourcePackageRecipeDataBranch isn't really needed yet, we're not
>>> creating branches for daily builds.  It's not a problem adding it though
>>> I guess.
>> It's meant to be there to record the links between the recipe and the
>> branches it references.  I guess I don't understand what you mean.
> 
> My understanding was that we're not going to be creating branches at all, for 
> daily builds.  The package file tree will be built from the recipe and then 
> uploaded to Soyuz.
> 
> This table will be needed when we start building branches from recipes though!

I hope this part makes sense by now.

>>>   * I don't think you need "archive", this is not pertinent until the
>>> package is actually uploaded.  At least, I think we can share these
>>> across archives. Can anyone think of a reason why we would not do that?
>> Well, somehow we need to record the archive(s?) to which the resulting
>> source package will be built.  Maybe not in this table though, indeed.
>> More on this below.
> 
> OK
> 
>>>   * I see you have a column called "manifest" which references
>>> SourcePackageRecipeData.  Now I'm really confused about what that is!  I
>>> thought recipes and manifests were separate things?  If it needs to be
>>> separate why is it not SourcePackageRecipeManifest?
>> I hope this makes sense by now?  If we just stored recipe/manifests as
>> text, both SPRecipe and SPBuild would just have a text column, but we're
>> planning on being a bit more sophisticated than that.
> 
> It sorta does, but I don't get the name "manifest".

See above.

> Also, why is it
> referencing there as opposed to SourcePackageRecipe?

I hope this is clear now -- it's an output of the build, like the build log.

> Sorry if I am being thick!

The name is very far from obviously indicating what it is, so no, I
don't think so...

>>>  * BuildSourcePackageFromRecipeJob - this might as well contain
>>> everything in SourcePackageBuild unless you have a good reason for
>>> separating them?
>> Why would it make sense to duplicate the columns across the tables?
> 
> I meant that we should move them, not duplicate them.
> 
>> I thought we went over why BuildSourcePackageFromRecipeJob and SPBuild
>> are separate last week.
> 
> Maybe.

XXX

>>> I'm not sure, and replied to the other emails about this separately.
>> At some point I guess we need to just decide.
> 
> Or, we make jml decide :)

Works for me! <wink>

>>> Having said that, I would still vastly prefer a separate table linking
>> I object slightly to the 'still' -- it feels like I've been trying to
>> get a straight answer on this point for a while!  But thanks for
>> providing one (even if I still have the urge to avoid this bit of
>> complexity for now).
> 
> I've never advocated anything other than to put it on a separate table, so I'm 
> not sure why you're objecting.

Well, I thought we could avoid this for now, and asked about that.  But
never mind!

>>> SourcePackageBuild and Archives.
>> It should be easy enough to do this.  I guess the upload_log would live
>> on the SPBuildArchive link?
> 
> It should be on the SourcePackageBuildUpload as you have done in the patch.  
> Which is nice :)
> 
> BTW should we s/SourcePackageBuild/SourcePackageRecipeBuild/ everywhere?  In 
> future we'll be building from a branch as well.

I think you should talk to jml about this, because *I* thought we
weren't planning to build from a branch.  I'm mainly suggesting that you
talk to jml because he's in your timezone for now but also because there
are elements of strategy here.

>>> This allows the possibility of re-uploading
>>> the same source recipe build if we want.  I think that will be important
>>> for 2 reasons:
>>>  1. re-creating upload issues
>>>  2. re-creating genuine package bugs in a different PPA environment
>> I guess I'm not sure about some of the mechanics of this.  In the
>> standard way things will work, the buildd-manager will grab the built
>> package and upload them for building into any archives that are
>> specified at that time.  How will the upload happen for one of these
>> after-the-fact requests?
> 
> I envisage a UI that allows a rebuild/reupload of a package in the same way 
> that we can retry builds right now if there was some sort of intermittent 
> failure.
> 
>> This doesn't really affect the schema though... I guess we need some way
>> of noting on the SPBuildArchive link that the upload it represents has
>> happened.
> 
> Yes, we do.  Right now we track this via Build.buildstate.  Each "xxxBuild" 
> should have a similar thing.  It would be nice to part-refactor this into some 
> other table but I think it's too much hassle to change the Soyuz code for this 
> versus the benefit.

I guess for SPBuild we won't use that buildstate value then.

> Also one thing we lack with the rebuilds for Builds is some audit trail.  This 
> new schema will allow that for recipe builds and will be really useful for 
> tracking down problems.

Currently in my patch, SourcePackageBuildUpload doesn't have a
requester, should it?  I guess maybe it should, 'id, date_created,
registrant' are the standard fields after all...

>>> Talking of which, I noticed that this patch doesn't have anything for
>>> scheduling recipe builds.  Are you doing that separately?
>> I am consciously not thinking about that at the moment.  I guess it will
>> involve a few more fields on SPRecipe.
> 
> I think a separate table would be better, e.g. SourcePackageRecipeSchedule.  
> That's easy to leave out and do later.

Hooray for not worrying about things now.

>>> I would expect Job rows to live forever*.  At least, we've refactored the
>>> existing buildqueue stuff with that expectation.  Why would they be
>>> removed?
>> Well, because there will be really rather a lot of them if we use a Job
>> row for every branch pulled, code import updated, diff generated, daily
>> recipe built, etc etc.  I don't know if I'm being overly concerned about
>> this.
> 
> I think we need to be able to clean them up at some point, yes.
>  
>>> (* where forever == until we remove the whole set of related rows for
>>> whatever reason)
>> Do you ever delete Build rows today?
> 
> Nope - it would destroy the link between source and binary.

So there is a difference of scale here.  Build currently has 1.3 million
rows.  If we used Job rows for scheduling the puller and code imports,
just those jobs would mean about 20000 Job rows per day, or about 7
million a year.  That's probably not an impossible number -- I think
branchrevision is into the 100s of millions of rows now -- but it's
quite a lot more than Build.
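
For what it's worth, the back-of-envelope arithmetic, using only the two figures quoted above (the variable names are just for illustration):

```python
# Figures taken from the paragraph above; everything else is derived.
BUILD_ROWS = 1_300_000   # current size of the Build table
JOBS_PER_DAY = 20_000    # estimated Job rows per day for puller + code imports

jobs_per_year = JOBS_PER_DAY * 365
days_to_match_build = BUILD_ROWS / JOBS_PER_DAY

print(f"Job rows per year: {jobs_per_year:,}")
print(f"Days for Job to reach Build's current size: {days_to_match_build:.0f}")
```

So at that rate Job would pass the current size of Build in a little over two months.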

But I think I'd like to keep the Job row separate from the Build row.

> We could consider doing it when we start completely blowing away distroseries 
> (right now we just mark them as obsolete).
> 
>>>> I still don't know where to store it.  In the
>>>> existing model I think this sort of thing is tracked in the
>>>> SourcePackageRelease table (correct?)
>>> Sort of.  We record the source creator on the SourcePackageRelease and
>>> the source uploader on the PackageUpload.
>>>
>>> You can see why I keep banging on about why creation and uploading are
>>> orthogonal :)
>> Araragh, more bits of the Soyuz data model to chew on my brain.
>>
>> Would it be accurate to say that in a world where all builds are done
>> through branches and not dput that we would still have
>> SourcePackageReleases but not PackageUploads?  
> 
> Not really, we still need to model uploads.

But all the uploads would be from "inside" the system, all source
packages would be built by Soyuz as well.

>> But that today, the only
>> way you can get a SourcePackageRelease is by starting with a PackageUpload?
> 
> I think this will always be the case unless we want to re-write Soyuz. :)

OK.

>>>> which we don't have an analogue
>>>> for yet in the build from recipe world -- I don't know if we need one
>>>> though as SourcePackageRelease <-> Build is 1-to-many in a way that
>>>> doesn't apply to recipes.
>>> Also, if they're automated daily builds, who is the requester?
>> I guess the person who set it up?  
> 
> Or a celebrity?  I dunno, I'm just throwing it out there.
> 
>> This starts to lead me into the "how
>> similar are daily builds and the way we expect Ubuntu devs to use this"
>> sort of thinking ...
>>
>> Thanks for the comments!  (and sorry for all the typos in my first post).
> 
> Thanks for working on it!

Hopefully tomorrow I can spend longer writing code than emails!

Cheers,
mwh