launchpad-dev team mailing list archive

Re: RFC: Bug import ideas


On Wed, Mar 10, 2010 at 12:12 AM, Francis J. Lacoste <francis.lacoste@xxxxxxxxxxxxx> wrote:
On March 9, 2010, Gavin Panella wrote:
On 9 March 2010 16:32, Francis J. Lacoste <francis.lacoste@xxxxxxxxxxxxx> wrote:
> On March 9, 2010, Gavin Panella wrote:
>>  * Commit only at the end.
>>    With this we can do dry runs, and not have to faff with cache
>>    files.
> We don't want long-running transactions, especially if they load a bulk of
> data. So I think this won't work.

Generally bug imports run quite quickly, on the order of a few minutes
for several hundred bugs. Is that too long? The bug importer also
creates almost all new data, rather than updates, so won't be holding
locks that are going to affect other people, I assume.

Then it might be acceptable; the final call should be made by our great DBA!
Stuart, what do you think?

If it is just inserting new data rather than modifying existing rows, it should be OK at the moment. You say 'almost all new data' though, which is the catch. Even if it is all new data, that doesn't mean it will be fine in the future (e.g. we add an ON INSERT trigger to update some cache information). It also doesn't protect us from long-running imports, which we will kill off to avoid causing database bloat (garbage cannot be cleaned up in the database by VACUUM until it is older than the longest-running transaction).

If the goal here is to avoid writing the cache file, I'd suggest just using another method to detect an already imported bug (e.g. the bug nickname is set by the importer to allow old bug ids to map to Launchpad bug ids).
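
A minimal sketch of that nickname check, not the actual Launchpad importer: it uses an in-memory SQLite database, and the `bug` table, its `name` column, the `make_nickname` scheme and `import_bug` are all hypothetical names chosen for illustration. Each bug is committed on its own, so an interrupted run can simply be restarted and will skip what is already there.

```python
import sqlite3


def make_nickname(tracker, old_id):
    # Hypothetical naming scheme: "<tracker>-<old bug id>".
    return f"{tracker}-{old_id}"


def import_bug(conn, tracker, old_id, title):
    """Insert the bug unless its nickname exists; return (bug id, created)."""
    nickname = make_nickname(tracker, old_id)
    row = conn.execute(
        "SELECT id FROM bug WHERE name = ?", (nickname,)
    ).fetchone()
    if row is not None:
        # Already imported on an earlier run: reuse the existing bug id.
        return row[0], False
    cur = conn.execute(
        "INSERT INTO bug (name, title) VALUES (?, ?)", (nickname, title)
    )
    conn.commit()  # one short transaction per bug
    return cur.lastrowid, True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bug (id INTEGER PRIMARY KEY, name TEXT UNIQUE, title TEXT)"
)

first = import_bug(conn, "oldtracker", 42, "Crash on startup")
second = import_bug(conn, "oldtracker", 42, "Crash on startup")  # re-run
bug_count = conn.execute("SELECT COUNT(*) FROM bug").fetchone()[0]
```

The second call finds the nickname and returns the existing id without inserting, which is the job the cache file would otherwise do.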

The other points are valid rationales though. Perhaps we should import into temporary tables and, on success, move all the data from the temporary tables into the real ones. I'd suggest not worrying about these issues though; better validation of the import file before attempting the import would seem to be a better approach. For the database import to fail, you would need to violate database constraints or attempt to link to a non-existent row. There are not that many constraints to check, and I don't think there are any foreign key references that might get removed mid-run.
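
To make the two ideas above concrete, here is a rough sketch under stated assumptions: SQLite stands in for the real database, and the `bug` and `bug_staging` tables, the record fields, and the `validate` and `import_via_staging` helpers are all made up for the example. Validation runs before anything touches the database; the staging variant loads a temporary table first so that the transaction writing to the real table is a single short INSERT ... SELECT.

```python
import sqlite3


def validate(records, known_owners):
    """Return a list of problems; an empty list means nothing was found."""
    errors = []
    seen = set()
    for i, rec in enumerate(records):
        if not rec.get("title"):
            errors.append(f"record {i}: missing title")
        if rec.get("owner") not in known_owners:
            # Would be a link to a non-existent row in the real database.
            errors.append(f"record {i}: unknown owner {rec.get('owner')!r}")
        if rec.get("nickname") in seen:
            errors.append(f"record {i}: duplicate nickname")
        seen.add(rec.get("nickname"))
    return errors


def import_via_staging(conn, records):
    """Load into a temporary table, then move everything in one short step."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS bug_staging "
        "(name TEXT, title TEXT, owner TEXT)"
    )
    conn.execute("DELETE FROM bug_staging")
    conn.executemany(
        "INSERT INTO bug_staging VALUES (:nickname, :title, :owner)", records
    )
    conn.commit()
    with conn:  # commits on success, rolls back if the move fails
        conn.execute(
            "INSERT INTO bug (name, title, owner) "
            "SELECT name, title, owner FROM bug_staging"
        )
        conn.execute("DELETE FROM bug_staging")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bug (id INTEGER PRIMARY KEY, name TEXT UNIQUE, "
    "title TEXT NOT NULL, owner TEXT NOT NULL)"
)
records = [
    {"nickname": "old-1", "title": "Crash", "owner": "alice"},
    {"nickname": "old-2", "title": "", "owner": "bob"},
]
problems = validate(records, known_owners={"alice"})

good = [records[0]]
if not validate(good, known_owners={"alice"}):
    import_via_staging(conn, good)
imported = conn.execute("SELECT COUNT(*) FROM bug").fetchone()[0]
```

Whether staging helps in practice depends on the database: the point of the sketch is only that the checks happen up front and the write to the real table stays brief.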

Stuart Bishop <stuart@xxxxxxxxxxxxxxxx>
