
launchpad-dev team mailing list archive

Re: Archive deletion strategy

 

On Wednesday 04 August 2010 21:16:33 James Westby wrote:
> On Tue, 3 Aug 2010 11:08:03 +0100, Julian Edwards <julian.edwards@xxxxxxxxxxxxx> wrote:
> > I think that's pretty much it, although we need to examine exactly which
> > database rows need to be deleted and under what conditions.  I've added
> > some ON DELETE CASCADEs in the past where it's a no-brainer but it won't
> > delete everything because of the multiple publication referencing
> > package files issue.
> 
> Ok, I've been through the schema and I think this is the set of tables
> chained off archive:
> 
>    # Archive -> signing_key  <- KEEP
>    #    signing_key (GPGKey) <- ?

If it's the Person's last PPA then it should be deleted (keys are shared across 
all of a Person's PPAs).  We should also try to revoke the key and remove it 
from the keyserver.

>    # SPPH -> Archive and SPR <- DELETE
>    #    SPR -> Archive <- DELETE IF ONLY REFERENCED BY SPPH AND
>    #                       BUILD ATTACHED TO THIS ARCHIVE. what
>    #                       about packageuploadsource?

That's upload_archive, and this is one of the awkward FKs.  We use it to work 
out if a package was copied by comparing that field to the SPPH.archive.  If 
the original upload archive disappears we should probably NULL this column and 
fix the code so that it copes with that.
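
A minimal sketch of what "copes with that" could look like, assuming a NULL upload_archive only ever arises because the original upload archive was deleted.  The class and function names here are hypothetical and are not Launchpad's actual model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourcePublication:
    """Hypothetical stand-in for an SPPH row joined to its SPR."""
    archive_id: int                   # SPPH.archive
    upload_archive_id: Optional[int]  # SPR.upload_archive, None once NULLed


def was_copied(pub: SourcePublication) -> bool:
    """Return True if the package was copied into pub's archive.

    Assumption: upload_archive is only NULL when the archive that received
    the original upload has been deleted, so any surviving publication must
    live in some other archive and is therefore a copy.
    """
    if pub.upload_archive_id is None:
        return True
    return pub.upload_archive_id != pub.archive_id
```

If NULL can arise for other reasons, the function would need a third "unknown" result rather than a plain boolean.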

>    #        bugpackageinfestation -> SPR ???

Only main archives have bugs so it should not matter for COPY/PPA archives.

Deryck will have to fix this whenever they do PPA bugs :)

>    #        SPRF -> SPR <- CASCADE

>    # BPPH -> Archive and BPR <- DELETE

This should only delete the BPR if there's no more BPPHes referring to it. 
(Same as for SPR/SPPH)

>    #    BPR -> build (delete the build and it will cascade) <- DELETE
>    #                                       IF ONLY REFERENCED BY
>    #                                       BPPH AND BUILD ATTACHED
>    #                                       TO THIS ARCHIVE.

It's BinaryPackageBuild now.  It links the source to the binary.

>    #        BPF -> BPR <- CASCADE
>    # archivearch -> Archive <- DELETE
>    # archiveauthtoken -> Archive <- DELETE
>    # archivedependency -> Archive (twice) <- DELETE WHEN archive, ?
>    #                                         when dependency

This is a nightmare.  But we have to remove the dependency record, and the 
depending PPA will (potentially) just start failing its builds.  The user 
should probably get a warning listing the PPAs that depend on it before 
committing to the deletion.

>    # archivepermission -> Archive <- DELETE
>    # archivesubscriber -> Archive <- DELETE
>    # build -> Archive, SPR
>    #    buildqueue -> build

We can't delete the archive if it has outstanding builds.  That makes things 
much easier, so we can ignore BuildQueue, Job and friends.  We also have a bug 
where a builder becomes stuck if we delete its running job from LP.

The buildfarmjob and packagebuild should also be removed.

>    # distributionsourcepackagecache -> Archive <- DELETE
>    # distroseriespackagecache -> Archive <- DELETE
>    # packageupload -> Archive <- WHO KNOWS?
>    #    packageuploadcustom -> packageupload
>    #    packageuploadbuild -> packageupload, build
>    #    packageuploadsource -> packageupload, SPR

All of these should be deleted, yes.

> Firstly we have to keep archive, as you can do a build targeting your
> PPA, and then copy that to another PPA. 

I've talked about that above.

> We won't delete those builds, so
> we either have to remove the NOT NULL on the build->archive reference,
> or keep the Archive around. Either way we need to test what happens if
> you go to a build page where the build references a deleted archive.

If someone has copied with binaries we have to break this link and make the 
code deal with the NULL.  I can't even remember why archive is needed on that 
table.

> 
> Archive references signing key, we could clean this up, or leave it
> around such that a resurrected archive would get the same key.

I say delete it, as above.

> 
> I'll skip to some easy ones.
> 
> archivearch can be deleted, we don't need to record which architectures
> the archive supports. archiveauthtoken, archivepermission and
> archivesubscriber can all be deleted.
> 
> distributionsourcepackagecache and distroseriespackagecache just
> cache data for the archive, so they can be deleted.
> 
> We want to remove all PublishingHistory records that reference the
> archive, as they are Archive specific. These reference SPR and BPR
> which can be referred to from more than one PublishingHistory, and also
> from multiple builds. Therefore we would check if all the references
> were ones we were about to delete, and if so remove them too.

Right.

> 
> I'm not sure whether it would cause problems to delete buildqueue
> entries for the builds, or whether they should be left dangling. This is
> not a particularly easy thing to test either.

As I said, let's prevent deletion on archives with pending builds.

I will do another fix in the near-ish future that allows people to kill their 
pending builds.
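
The guard could be sketched roughly like this.  The status names, the exception and the function are all hypothetical, chosen for illustration only:

```python
from enum import Enum


class BuildStatus(Enum):
    """Hypothetical subset of build states, for illustration only."""
    NEEDSBUILD = "needs build"
    BUILDING = "building"
    FULLYBUILT = "fully built"
    FAILEDTOBUILD = "failed to build"


# States that count as "outstanding" for the purposes of this sketch.
OUTSTANDING = {BuildStatus.NEEDSBUILD, BuildStatus.BUILDING}


class ArchiveHasOutstandingBuilds(Exception):
    """Raised when deletion is refused because builds are still in flight."""


def check_can_delete(build_statuses):
    """Refuse deletion if any build in the archive is pending or running."""
    outstanding = [s for s in build_statuses if s in OUTSTANDING]
    if outstanding:
        raise ArchiveHasOutstandingBuilds(
            f"{len(outstanding)} build(s) still pending or running")
```

Refusing up front means the deletion code never has to touch BuildQueue or Job rows at all.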

> Each of SPR and BPR have file tables associated with them, and they
> should be set to CASCADE, which will clean them up for
> us. bugpackageinfestation also references SPR: I'm not sure what it is,
> but my guess is that CASCADE is appropriate there too.
> 
> I'm not sure what to do with packageupload, but my guess is that
> deleting them all is appropriate.

Yep, blow 'em away.
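
For the file tables mentioned above, the CASCADE behaviour can be illustrated with a small self-contained sketch.  It uses an in-memory SQLite database purely as a stand-in (Launchpad itself runs on PostgreSQL), and the two-column schema is a simplification, not the real one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to, per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE spr (id INTEGER PRIMARY KEY);
CREATE TABLE sprf (
    id INTEGER PRIMARY KEY,
    spr INTEGER NOT NULL REFERENCES spr(id) ON DELETE CASCADE
);
INSERT INTO spr (id) VALUES (1), (2);
INSERT INTO sprf (id, spr) VALUES (10, 1), (11, 1), (12, 2);
""")

# Deleting the release removes its file rows without any extra statements.
conn.execute("DELETE FROM spr WHERE id = 1")
remaining = [row[0] for row in conn.execute("SELECT id FROM sprf ORDER BY id")]
```

After the delete, only the file row belonging to the surviving release is left in `remaining`.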

> archivedependency is a little tricky as it has two FKs to archive. Where
> the dependency is specifying what the archive being deleted depends on
> it should be deleted, but what about where the archive is depended on by
> another? Given that we are removing all the packages anyway, I think it
> makes sense to delete them too, but it may lead to some surprising
> behaviour for people.

Indeed, we need to generate warnings at least for the deleter, but preferably 
also on the PPA page that was depending on the deleted archive.
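
A rough sketch of working out both the rows to remove and who to warn, assuming archivedependency can be treated as a list of (archive, dependency) pairs.  The function name and the data shape are hypothetical:

```python
def plan_dependency_cleanup(rows, doomed):
    """Plan the archivedependency cleanup for a doomed archive.

    rows: iterable of (archive, dependency) pairs.
    doomed: identifier of the archive being deleted.

    Returns (rows_to_delete, archives_to_warn), where archives_to_warn are
    the archives that currently depend on the doomed one.
    """
    rows = list(rows)
    # Both directions go: what the doomed archive depends on, and what
    # depends on the doomed archive.
    rows_to_delete = [r for r in rows if doomed in r]
    archives_to_warn = sorted(
        {archive for archive, dep in rows if dep == doomed and archive != doomed})
    return rows_to_delete, archives_to_warn
```

The second return value is what would drive the warning shown to the deleter and on the depending PPA's page.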

> I'm still not entirely sure what the logic is for detecting which
> PackageReleases and Builds can be deleted, but that's what tests are
> for, right?

:)

We need to reference count active publications, and that should be enough.
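
A minimal sketch of that reference count, assuming publications can be reduced to (archive, release) pairs.  It ignores publication status and the build references discussed earlier, and the names are hypothetical:

```python
from collections import Counter


def deletable_releases(publications, doomed_archive):
    """Return the releases only published in the archive being deleted.

    publications: iterable of (archive, release) pairs, one per
    SPPH/BPPH row.  A release is deletable when no publication
    outside the doomed archive still refers to it.
    """
    publications = list(publications)
    # Count references that will survive the archive's deletion.
    surviving = Counter(
        release for archive, release in publications
        if archive != doomed_archive)
    candidates = {
        release for archive, release in publications
        if archive == doomed_archive}
    return {release for release in candidates if surviving[release] == 0}
```

The same check would apply to SPRs via SPPH and to BPRs via BPPH.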

> Any suggestions are welcome.

I hope that helped.

Cheers.


