launchpad-dev team mailing list archive

Re: Archive deletion strategy

On Tue, 3 Aug 2010 11:08:03 +0100, Julian Edwards <julian.edwards@xxxxxxxxxxxxx> wrote:
> I think that's pretty much it, although we need to examine exactly which 
> database rows need to be deleted and under what conditions.  I've added some 
> ON DELETE CASCADEs in the past where it's a no brainer but it won't delete 
> everything because of the multiple publication referencing package files 
> issue.

Ok, I've been through the schema and I think this is the set of tables
chained off archive:

   # Archive -> signing_key  <- KEEP
   #    signing_key (GPGKey) <- ?
   # SPPH -> Archive and SPR <- DELETE
   #    SPR -> Archive <- DELETE IF ONLY REFERENCED BY SPPH AND
   #                       BUILD ATTACHED TO THIS ARCHIVE. what
   #                       about packageuploadsource?
   #        bugpackageinfestation -> SPR ???
   #        SPRF -> SPR <- CASCADE
   # BPPH -> Archive and BPR <- DELETE
   #    BPR -> build (delete the build and it will cascade) <- DELETE
   #                                       IF ONLY REFERENCED BY
   #                                       BPPH AND BUILD ATTACHED
   #                                       TO THIS ARCHIVE.              
   #        BPF -> BPR <- CASCADE                                        
   # archivearch -> Archive <- DELETE                                    
   # archiveauthtoken -> Archive <- DELETE                               
   # archivedependency -> Archive (twice) <- DELETE WHEN archive, ?      
   #                                         when dependency             
   # archivepermission -> Archive <- DELETE                              
   # archivesubscriber -> Archive <- DELETE                              
   # build -> Archive, SPR
   #    buildqueue -> build                       
   # distributionsourcepackagecache -> Archive <- DELETE                 
   # distroseriespackagecache -> Archive <- DELETE                       
   # packageupload -> Archive <- WHO KNOWS?                              
   #    packageuploadcustom -> packageupload                             
   #    packageuploadbuild -> packageupload, build
   #    packageuploadsource -> packageupload, SPR   

Firstly we have to keep archive, as you can do a build targeting your
PPA, and then copy that to another PPA. We won't delete those builds, so
we either have to remove the NOT NULL on the build->archive reference,
or keep the Archive around. Either way we need to test what happens if
you go to a build page where the build references a deleted archive.
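
To make the two options concrete, here is a minimal sketch using
Python's sqlite3 rather than the real PostgreSQL schema. The table
names build_strict and build_nullable are hypothetical, and ON DELETE
SET NULL is only one assumed way of handling a nullable reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE archive (id INTEGER PRIMARY KEY);
    -- Current situation: build -> archive is NOT NULL.
    CREATE TABLE build_strict (
        id INTEGER PRIMARY KEY,
        archive INTEGER NOT NULL REFERENCES archive(id)
    );
    -- Alternative: drop NOT NULL and null the reference on delete.
    CREATE TABLE build_nullable (
        id INTEGER PRIMARY KEY,
        archive INTEGER REFERENCES archive(id) ON DELETE SET NULL
    );
    INSERT INTO archive (id) VALUES (10), (11);
    INSERT INTO build_strict (archive) VALUES (10);
    INSERT INTO build_nullable (archive) VALUES (11);
""")

# With NOT NULL and a plain FK, the archive row cannot be removed
# while a build still points at it.
try:
    conn.execute("DELETE FROM archive WHERE id = 10")
    strict_blocked = False
except sqlite3.IntegrityError:
    strict_blocked = True

# With a nullable FK, the archive goes away and the build survives
# with an empty archive reference.
conn.execute("DELETE FROM archive WHERE id = 11")
nullable_archive = conn.execute(
    "SELECT archive FROM build_nullable"
).fetchone()[0]
```

In the nullable case any code that renders a build page would have to
cope with the archive being None, which is the case we need to test.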

Archive references a signing key. We could clean this up, or leave it
around so that a resurrected archive would get the same key.

I'll skip to some easy ones.

archivearch can be deleted, as we don't need to record which
architectures the archive supports. archiveauthtoken, archivepermission
and archivesubscriber can all be deleted.

distributionsourcepackagecache and distroseriespackagecache just
cache data for the archive, so they can be deleted.

We want to remove all PublishingHistory records that reference the
archive, as they are Archive specific. These reference SPRs and BPRs,
each of which can be referred to from more than one PublishingHistory,
and also from multiple builds. Therefore, for each SPR or BPR, we would
check whether all of its references are ones we are about to delete,
and if so remove the SPR or BPR too.
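
A rough sketch of that check, again with sqlite3 and a toy schema. The
spph and spr tables here are simplified stand-ins with assumed column
names, and the sketch only looks at publications; references from
builds would need the same treatment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spr (id INTEGER PRIMARY KEY);
    CREATE TABLE spph (
        id INTEGER PRIMARY KEY,
        archive INTEGER NOT NULL,
        spr INTEGER NOT NULL REFERENCES spr(id)
    );
    INSERT INTO spr (id) VALUES (1), (2);
    -- SPR 1 is published only in archive 10; SPR 2 was also copied
    -- to archive 20.
    INSERT INTO spph (archive, spr) VALUES (10, 1), (10, 2), (20, 2);
""")


def removable_sprs(conn, archive_id):
    """Return ids of SPRs whose every publication is in archive_id."""
    rows = conn.execute(
        """
        SELECT spr FROM spph
        GROUP BY spr
        -- Count publications outside the archive; zero means the SPR
        -- is referenced only by rows we are about to delete.
        HAVING SUM(archive != ?) = 0
        ORDER BY spr
        """,
        (archive_id,),
    )
    return [row[0] for row in rows]


# Must be computed before the spph rows themselves are deleted.
doomed = removable_sprs(conn, 10)
```

Here only SPR 1 is safe to remove with archive 10, because SPR 2 is
still published in archive 20.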

I'm not sure whether it would cause problems to delete buildqueue
entries for the builds, or whether they should be left dangling. This is
not a particularly easy thing to test either.

SPR and BPR each have file tables associated with them, and those
references should be set to CASCADE, which will clean the file rows up
for us. bugpackageinfestation also references SPR: I'm not sure what it
is, but my guess is that CASCADE is appropriate there too.
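
For illustration, this is what the CASCADE behaviour looks like in a
toy sqlite3 schema. The sprf table and its column are assumptions
standing in for the real SPRF table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to act on FKs
conn.executescript("""
    CREATE TABLE spr (id INTEGER PRIMARY KEY);
    CREATE TABLE sprf (
        id INTEGER PRIMARY KEY,
        spr INTEGER NOT NULL REFERENCES spr(id) ON DELETE CASCADE
    );
    INSERT INTO spr (id) VALUES (1), (2);
    INSERT INTO sprf (spr) VALUES (1), (1), (2);
""")

# Deleting the release removes its file rows without any extra query.
conn.execute("DELETE FROM spr WHERE id = 1")
remaining = conn.execute("SELECT spr FROM sprf").fetchall()
```

Only the file row belonging to SPR 2 is left afterwards.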

I'm not sure what to do with packageupload, but my guess is that
deleting them all is appropriate.

archivedependency is a little tricky as it has two FKs to archive. Rows
that record what the archive being deleted depends on should be
deleted, but what about rows where the archive is depended on by
another? Given that we are removing all the packages anyway, I think it
makes sense to delete those too, but it may lead to some surprising
behaviour for people.
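
A small sketch of deleting in both directions, assuming the two FK
columns are called archive and dependency. The helper
delete_dependencies is hypothetical; it returns the archives that lose
a dependency so that the surprise could at least be reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE archivedependency (
        id INTEGER PRIMARY KEY,
        archive INTEGER NOT NULL,     -- the archive that has the dependency
        dependency INTEGER NOT NULL   -- the archive it depends on
    );
    INSERT INTO archivedependency (archive, dependency) VALUES
        (10, 30),  -- the archive being deleted depends on another
        (20, 10),  -- another archive depends on the one being deleted
        (20, 30);  -- unrelated row
""")


def delete_dependencies(conn, archive_id):
    """Delete rows in both directions; return archives that depended on it."""
    dependants = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT archive FROM archivedependency "
            "WHERE dependency = ? ORDER BY archive",
            (archive_id,),
        )
    ]
    conn.execute(
        "DELETE FROM archivedependency WHERE archive = ? OR dependency = ?",
        (archive_id, archive_id),
    )
    return dependants


affected = delete_dependencies(conn, 10)
left = conn.execute(
    "SELECT archive, dependency FROM archivedependency"
).fetchall()
```

Archive 20 is the one whose owner might be surprised, since it quietly
loses its dependency on archive 10.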

I'm still not entirely sure what the logic is for detecting which
PackageReleases and Builds can be deleted, but that's what tests are
for, right?

Any suggestions are welcome.

Thanks,

James


