launchpad-dev team mailing list archive

Re: The future of downtime for rollouts?

 

Hi Rob,

Sorry, but I have to focus on this one.  There seems to be a persistent
perception that Rosetta uses tables that are somehow out of this world
compared to the rest of LP, but this simply isn't true anymore, and
hasn't been for at least ~2 years.  How do people get this perception?
(I am asking because I want to change it: around 6 months ago, I heard
a similar complaint from JamesT.)

On Wed, 15 Sep 2010 at 11:10 +1200, Robert Collins wrote:

> AIUI the big ones today are BranchRevision (An optimisation problem,
> we can address it transparently) and various rosetta tables.

You are being very broad here.  What is big, and in that context, what
are "various rosetta tables"?  We only have two which are an order of
magnitude larger than any others in Translations (TranslationMessage,
~68M rows, and POTranslation, ~28M rows; the next one is ~4.5M rows).
Considering that Lucid has something like 17M different translations,
and that the biggest table grows by roughly 4M rows every Ubuntu
release, I'd say we are not growing too fast.  We also need not worry
about removing data: duplication is very low, and old translations are
usually useful (around 49M of these are actively used in one of the
Ubuntu releases).  The rest are either "active" suggestions or really
obsolete translations that we could get rid of, but the latter amount
to a relatively small percentage.

The BranchRevision table is another order of magnitude bigger than the
biggest Rosetta table, at 600M rows.  There are also
BugNotificationRecipientArchive (~58M) and HWSubmissionDevice (~97M),
with honorable mentions for LibraryFileDownloadCount and Karma (both
~32M rows).  So "various malone/rosetta/registry tables" would be a
more appropriate qualification.  That basically includes almost all of
LP, so rosetta should not be brought up at all :)

Anyway, instead of guessing, check for yourself on staging:

  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE reltuples > 1000000 AND reltype != 0
   ORDER BY reltuples DESC;

On staging, out of the top 16 tables (>10M rows), 2 are Rosetta tables,
and out of the top 53 (>1M), 6 are Rosetta.  So why do people still
feel that Rosetta has huge tables (at least compared to other parts of
LP; I am not saying ~60M rows is not big :)?

If we want to cut down on disk usage (and even improve speed),
'relpages' might be a more interesting metric, and the picture changes
slightly there.  Other than indexes (which we are going to cut down on
in Rosetta whenever we get around to removing some unused columns), the
biggest tables end up being those with the fattest rows (MessageChunk
and BinaryPackageRelease jump closer to the top).
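
For anyone who wants to look at the 'relpages' view directly, here is a
rough sketch of such a query.  Treat it as an illustration rather than
the query behind the figures above: the relkind = 'r' filter (ordinary
tables only), the approx_bytes column and the LIMIT of 20 are my own
choices, and relpages is only an estimate that is refreshed by VACUUM
and ANALYZE.

```sql
-- Rank ordinary tables by estimated page count instead of row count.
-- approx_bytes assumes the server reports its page size through
-- current_setting('block_size') (8192 bytes on a default build).
SELECT relname,
       relpages,
       reltuples,
       relpages::bigint * current_setting('block_size')::bigint
           AS approx_bytes
  FROM pg_class
 WHERE relkind = 'r'          -- ordinary tables only, no indexes
 ORDER BY relpages DESC
 LIMIT 20;                    -- arbitrary cut-off for readability
```

Dividing approx_bytes by reltuples gives a rough bytes-per-row figure,
which is one way to spot the tables with the fattest rows.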

More useful tools to find the biggest tables on disk:

  select pg_total_relation_size('tablename'); -- indexes included
  select pg_relation_size('tablename');
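
The two size functions above can also be applied to every table at once
instead of one 'tablename' at a time.  This is only a sketch under a
few assumptions of mine: the schema filter, the column aliases and the
LIMIT are illustrative, and pg_size_pretty is used purely to make the
output readable.

```sql
-- List the largest ordinary tables by total on-disk size.
-- total_size includes indexes and TOAST data; table_size is the
-- table's main data only.
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid::regclass)) AS total_size,
       pg_size_pretty(pg_relation_size(c.oid::regclass))       AS table_size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'r'
   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
 ORDER BY pg_total_relation_size(c.oid::regclass) DESC
 LIMIT 20;                    -- arbitrary cut-off for readability
```

A large gap between total_size and table_size points at tables where
indexes (or TOAST data) account for most of the disk usage.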

Cheers,
Danilo




