launchpad-dev team mailing list archive

Re: DB replication lag events


On 2010-08-25 18:32, Danilo Šegan wrote:

> So, it's likely poimport that does a lot more writes (including writes
> to pofiletranslator).  However, it also does a single file import in a
> single transaction, so it shouldn't be too many changes.  We do have
> some files which are huge, though (ddtp-ubuntu has around 40k messages;
> everything else is less than 10k messages, with only a small number of
> them at least 2k).
We're also duplicating each POFileTranslator update across all sharing 
POFiles nowadays, which means that a single TranslationMessage update 
can be multiplied by the number of templates that share it.  For Ubuntu 
right now, I think that can be as many as 8.
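(To put rough numbers on that, using the ddtp-ubuntu figure above: 40k 
messages times 8 sharing templates is on the order of 320,000 
pofiletranslator writes from a single import transaction.)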

> Note that we currently get spikes in how many files we've got to do due
> to Ubuntu[1].  The nature of our implementation makes it not run
> as-fast-as-possible (each script run is limited to 9 minutes, with a
> pause between runs).  Also, it's already heavily optimized to do as few
> writes as possible on the translationmessage table, but not on the
> pofiletranslator table.  pofiletranslator is maintained by a trigger
> (which is overly complex and could probably be optimized).
Say...  I think we should do that, but until then, what about this 
short-term fix?
All the POFileTranslator records that the trigger inserts/updates during 
an import should be identical except in which TranslationMessage they 
refer to.  And which TranslationMessage they refer to is actually pretty 
arbitrary--AFAIC it doesn't _have_ to be the last updated one in the 
file.  It could just as validly be the first updated one in the file.
We could change the trigger: give the UPDATE on POFileTranslator an 
extra WHERE condition that says "date_last_touched <> now()."
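For concreteness, here's a minimal sketch of what that UPDATE might look 
like inside the trigger function.  The column names (person, pofile, 
latest_message, date_last_touched) and the target_pofile variable are my 
guesses at the shape of the schema, not copied from the real trigger.  
The trick is that now() is stable for the whole transaction, so any row 
this import has already stamped fails the condition and gets skipped:

    -- Sketch only: names are guesses, and target_pofile stands in for
    -- however the trigger identifies the POFile being updated.
    UPDATE POFileTranslator
    SET latest_message = NEW.id,
        date_last_touched = now()
    WHERE person = NEW.submitter
        AND pofile = target_pofile
        AND date_last_touched <> now();

That way each (person, pofile) row gets written once per import 
transaction instead of once per message, which should cut the write 
volume on pofiletranslator dramatically.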
Bailing out of the function just because the UPDATE hit something is no 
longer an option.  But looking at the trigger now, I see that that's a 
bug anyway.  We can't safely do that in the message-sharing model at 
all.  Instead of ignoring unique violations, we now need a WHERE 
condition that avoids duplicates.  And a "return NULL" if no rows are 
inserted.
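In plpgsql terms the shape might be something like this minimal sketch 
(same guessed names as above; it just mirrors the suggestion, not the 
real trigger):

    -- Sketch only.  The WHERE NOT EXISTS replaces the old
    -- "insert and swallow the unique violation" dance.
    INSERT INTO POFileTranslator
        (person, pofile, latest_message, date_last_touched)
    SELECT NEW.submitter, target_pofile, NEW.id, now()
    WHERE NOT EXISTS (
        SELECT 1
        FROM POFileTranslator
        WHERE person = NEW.submitter
            AND pofile = target_pofile);

    -- FOUND is false when the INSERT added no rows.
    IF NOT FOUND THEN
        RETURN NULL;
    END IF;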
Frankly I'm not even sure how we break out of that loop at all in some 
cases.  I must be missing something--it'd be ludicrous to think that we 
might be repeating the same UPDATE indefinitely.  It'd explain some of 
the problems we see now, but it wouldn't explain why things seem to be 
working normally otherwise.

Jeroen


