On 2010-08-25 18:32, Danilo Šegan wrote:
So, it's likely poimport that does a lot more writes (including writes to pofiletranslator). However, it also does each file import in a single transaction, so a single transaction shouldn't contain too many changes. We do have some files which are huge, though (ddtp-ubuntu has around 40k messages; everything else is under 10k messages, with only a small number above 2k).
We're also duplicating each POFileTranslator update across all sharing POFiles nowadays. Which means that a single TranslationMessage update can be multiplied by the number of templates that share it. I think for Ubuntu right now, that can be as many as 8.
Note that we currently get spikes in how many files we've got to process, due to Ubuntu[1]. The nature of our implementation means it doesn't run as fast as possible (a script run is limited to 9 minutes, with a pause between runs). Also, it's already heavily optimized to do as few writes as possible on the translationmessage table, but not on the pofiletranslator table. pofiletranslator is maintained by a trigger (which is overly complex and could probably be optimized).
Say... I think we should do that, but until then, what about this short-term fix?
All the POFileTranslator records that the trigger inserts/updates during an import should be identical except in which TranslationMessage they refer to. And which TranslationMessage they refer to is actually pretty arbitrary--AFAIC it doesn't _have_ to be the last updated one in the file. It could just as validly be the first updated one in the file.
We could change the trigger: give the UPDATE on POFileTranslator an extra WHERE condition that says "date_last_touched <> now()."
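Roughly like this (just a sketch; I'm going from memory on the column names, and "target_pofile" stands in for however the trigger identifies the POFile at that point):

    UPDATE POFileTranslator
    SET latest_message = NEW.id,
        date_last_touched = now()
    WHERE person = NEW.submitter
        AND pofile = target_pofile
        -- Skip rows we already stamped in this transaction.
        AND date_last_touched <> now();

Since now() is pinned to the start of the current transaction in postgres, the first touch during an import stamps the row with the transaction timestamp, and every later update of the same row within that import degrades to a no-op.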
Bailing out of the function just because the UPDATE hit something is no longer an option. But looking at the trigger now, I see that that's a bug anyway. We can't safely do that in the message-sharing model at all. Instead of ignoring unique violations, we now need a WHERE condition that avoids duplicates. And a "return NULL" if no rows are inserted.
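In plpgsql terms, something like this for the insert step (again a guessed-schema sketch, hand-waving over exactly where it sits relative to the loop over sharing POFiles):

    INSERT INTO POFileTranslator (
        person, pofile, latest_message, date_last_touched)
    SELECT NEW.submitter, target_pofile, NEW.id, now()
    WHERE NOT EXISTS (
        SELECT 1
        FROM POFileTranslator
        WHERE person = NEW.submitter AND pofile = target_pofile);
    IF NOT FOUND THEN
        -- Row already existed: either the UPDATE above got it, or we
        -- already touched it earlier in this transaction. Either way,
        -- we're done here.
        RETURN NULL;
    END IF;

The WHERE NOT EXISTS takes the place of catching unique violations, and FOUND tells us whether the INSERT actually did anything.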
Frankly I'm not even sure how we break out of that loop at all in some cases. I must be missing something--it'd be ludicrous to think that we might be repeating the same UPDATE indefinitely. It'd explain some of the problems we see now, but it wouldn't explain why things seem to be working normally otherwise.
Jeroen