← Back to team overview

launchpad-reviewers team mailing list archive

Re: [Merge] ~pappacena/launchpad:comment-editing-db-patch into launchpad:db-devel

 

Review: Needs Information



Diff comments:

> diff --git a/database/schema/patch-2210-31-0.sql b/database/schema/patch-2210-31-0.sql
> new file mode 100644
> index 0000000..9bcceb5
> --- /dev/null
> +++ b/database/schema/patch-2210-31-0.sql
> @@ -0,0 +1,21 @@
> +-- Copyright 2021 Canonical Ltd.  This software is licensed under the
> +-- GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +SET client_min_messages=ERROR;
> +
> +ALTER TABLE Message
> +    ADD COLUMN date_deleted timestamp without time zone,
> +    ADD COLUMN date_last_edit timestamp without time zone;

`date_last_edited`, I think.

> +
> +CREATE TABLE MessageRevision (
> +    id serial PRIMARY KEY,
> +    message integer NOT NULL REFERENCES Message,
> +    content text,

This feels more analogous to `MessageChunk` than to `Message` as such.  I expect you were mostly testing with the common case of single-chunk plain-text messages, as with comments added via the web UI, but it gets more complicated when we consider messages that arrive by email, and I think that will probably require at least some up-front consideration in the data model even if we only permit simple kinds of editing.  What happens if somebody sends an email to a bug with some text and an attached image (or for that matter an OpenPGP signature, for a different kind of problem) and then tries to edit the text part?  Or what if there are multiple text parts that get combined into a single string for display by `Message.text_contents`?

Looking at `Message.editContent` in https://code.launchpad.net/~pappacena/launchpad/+git/launchpad/+merge/401894, it looks as though you remove the old `MessageChunk` rows, create a new `MessageChunk` row with the new content and with sequence 1, and create a new `MessageRevision` with just the text content of the old message.  This all feels quite lossy and I wonder if it's possible to do better; I'm particularly unsure about the need to actually delete the old `MessageChunk` rows when making an edit despite the way this design generally tries to keep old revisions around, and it seems to me that that suggests a design error somewhere.

How would it work if we went all-in on the `Message` vs. `MessageChunk` separation, and dropped the separate `MessageRevision` table?  We could add a `revision` column to both `Message` and `MessageChunk` (and I guess also `BugMessage` etc.), adjust constraints to permit multiple revisions as needed, and add `date_deleted` columns where needed.  Editing a message would create new rows of both with a higher revision, either copying or discarding non-text parts as needed (but under the control of the application, rather than being entrenched in the data model).  We'd need to update trigger functions to keep message counts accurate, and message queries would need to do something like using window functions to select the highest-revision message at each index (or indeed message chunk at each sequence number).

Overall, this approach would require more changes when reading messages, but on the other hand it would be much more obvious that the data model is future-proof, because old message revisions would use the exact same data model as current revisions rather than a completely separate table.  What do you think?

> +    date_created timestamp without time zone,
> +    date_deleted timestamp without time zone
> +);
> +
> +CREATE UNIQUE INDEX messagerevision__message__date_created__key
> +    ON MessageRevision(message, date_created);
> +
> +INSERT INTO LaunchpadDatabaseRevision VALUES (2210, 31, 0);


-- 
https://code.launchpad.net/~pappacena/launchpad/+git/launchpad/+merge/401976
Your team Launchpad code reviewers is subscribed to branch ~pappacena/launchpad:comment-editing-db-patch.


References