launchpad-dev team mailing list archive
Message #04476
Re: DB replication lag events
On Wed, 25 Aug 2010 at 09:09 +0700, Stuart Bishop wrote:
> Lag is caused by updates happening faster than we can replicate them,
> so seeing high write activity is a symptom. Using
> utilities/report-database-stats.py tells me that just before the high
> lag, we had a spike in pofiletranslator write activity, probably from
> the pofilestats_daily or maybe poimport database user.
pofilestats_daily doesn't touch pofiletranslator (it's also very slow,
and should be disabled by now because of the DBLoopTuner problem).
So it's likely poimport, which does a lot more writes (including writes
to pofiletranslator). However, it also imports each file in a single
transaction, so any one transaction shouldn't contain too many changes.
We do have some huge files, though: ddtp-ubuntu has around 40k messages,
everything else has fewer than 10k, and only a small number of files
have 2k or more.
> postgres@wildcherry:~/launchpad/utilities$ ./report-database-stats.py
> --from='2010-08-24 01:00' --until='2010-08-24 01:45'
>
> [...]
>
> == Most Written Tables ==
>
> sl_log_1 || 130.31 tuples/sec
> sl_seqlog || 124.54 tuples/sec
> pofiletranslator || 75.73 tuples/sec
> oauthnonce || 43.48 tuples/sec
> translationmessage || 15.36 tuples/sec
> pofile || 6.18 tuples/sec
> sl_confirm || 6.08 tuples/sec
> libraryfilealias || 4.45 tuples/sec
> libraryfiledownloadcount || 3.94 tuples/sec
> branchrevision || 3.36 tuples/sec
> karma || 3.14 tuples/sec
> potranslation || 3.01 tuples/sec
> bugnotificationrecipient || 2.95 tuples/sec
> revisioncache || 2.06 tuples/sec
> databasereplicationlag || 1.97 tuples/sec
>
> == Most Active Users ==
>
> lpnet || 83.80% CPU
> xmlrpc || 76.46% CPU
> edge || 65.79% CPU
> pofilestats_daily || 52.90% CPU
> lucille || 46.64% CPU
> poimport || 32.97% CPU
> slony || 30.27% CPU
> postgres || 18.50% CPU
> fiera || 13.35% CPU
> translations_import_queue_gardener || 8.76% CPU
> distributionmirror || 6.70% CPU
> checkwatches || 4.63% CPU
> generateppahtaccess || 4.45% CPU
> lagmon || 1.40% CPU
> uploader || 0.51% CPU
>
> I'm not sure what this script is doing. Perhaps it is committing all
> its changes in a single transaction? Perhaps it is sometimes touching
> far more rows than expected?
I guess poimport is a candidate for using DBLoopTuner once DBLoopTuner
is fixed not to look at the cluster lag. If that is the case, we'd also
have to discuss how important this is.
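To make the idea concrete, here is a rough sketch of splitting one big
batch of writes into short, separately committed chunks, with the chunk
size adjusted towards a time goal. It is not the real DBLoopTuner API:
run_in_chunks, write_chunk, commit and goal_seconds are hypothetical
names used only for illustration.

```python
import time


def run_in_chunks(rows, write_chunk, commit, goal_seconds=2.0,
                  min_size=1, max_size=5000, clock=time.monotonic):
    """Write rows in separately committed chunks.

    Hypothetical helper, not Launchpad code: each chunk is resized so
    that writing it takes roughly goal_seconds.
    """
    size = min_size
    done = 0
    chunk_sizes = []
    while done < len(rows):
        chunk = rows[done:done + size]
        started = clock()
        write_chunk(chunk)
        commit()  # keep each transaction short
        elapsed = clock() - started
        done += len(chunk)
        chunk_sizes.append(len(chunk))
        # Scale the next chunk towards the time goal.
        if elapsed > 0:
            size = int(size * goal_seconds / elapsed)
        else:
            size = size * 2
        size = max(min_size, min(max_size, size))
    return chunk_sizes
```

Whether this helps depends on whether a file import can safely be split
across several transactions at all, which is part of what we'd have to
discuss.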
Note that we currently get spikes in the number of files we have to
import, due to Ubuntu[1]. By design, our implementation does not run
as fast as possible (each script run is limited to 9 minutes, with a
pause between runs). It's also already heavily optimized to do as few
writes as possible on the translationmessage table, but not on the
pofiletranslator table. pofiletranslator is maintained by a trigger
(which is overly complex and could probably be optimized).
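For illustration only, the time-limited run described above looks
roughly like the loop below. This is a simplified sketch and not the
actual poimport code: run_import_batch, import_file and budget_seconds
are made-up names, and I'm assuming the pause between runs is handled
by whatever schedules the script.

```python
import time
from collections import deque


def run_import_batch(queue, import_file, budget_seconds=9 * 60,
                     clock=time.monotonic):
    """Import queued files until the time budget is used up.

    Hypothetical sketch: import_file is expected to import one file
    and commit, so each file gets its own transaction.
    """
    deadline = clock() + budget_seconds
    imported = []
    while queue and clock() < deadline:
        entry = queue.popleft()
        import_file(entry)
        imported.append(entry)
    return imported
```

Note that the deadline is only checked between files, so one huge file
can push a run past its budget.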
Cheers,
Danilo
[1] https://lpstats.canonical.com/graphs/TranslationImportsImported/20090826/20100826/