← Back to team overview

launchpad-dev team mailing list archive

Re: Sorting bugs/bugtasks by patch age

 

On 29.01.2010 10:54, Jonathan Lange wrote:
> Sorry, I had meant to reply to all.
> 
> On Fri, Jan 29, 2010 at 9:53 AM, Jonathan Lange <jml@xxxxxxxxxxxxx> wrote:
>> On Thu, Jan 28, 2010 at 7:30 PM, Abel Deuring
>> <abel.deuring@xxxxxxxxxxxxx> wrote:
>>> Hi Stuart, hi Jono,
>>>
>> Hello Abel,
>>
>>> Tom and myself started to work on sorting bugs/bugtasks by the age of
>>> the youngest patch attachment (see
>>> https://bugs.edge.launchpad.net/malone/+bug/506018 ,
>>> https://bugs.edge.launchpad.net/malone/+bug/512500 ). Basically, this
>>> means to add an option "sort by creation time of youngest patch
>>> attachment" to IHasBugs.searchTasks().
>>>
>>> We are a bit concerned that a regulary query, where we sort by the
>>> youngest patch via something like
>>>
>>>    ORDER BY (SELECT max(message.datecreated)
>>>                FROM message, bugattachment
>>>                WHERE bugattachment.message=message.id AND
>>>                    bugattachment.bug=<bug.id from main query> AND
>>>                    bugattachment.type=1)
>>>
>>> in a query like "SELECT BugTask.whatever FROM BugTask,... WHERE...",
>>> might easily lead to timeouts when this sorting is used for Ubuntu
>>> bugs... (Tom wanted to write such a query for a test on staging, so we
>>> might soon have some data.)
>>>
>> Have you got data yet?

yes. This is the query: https://pastebin.canonical.com/27199/ It is a
query that is generated for an lplib call like

  ubuntu.searchTasks(has_patch=True)

but with a modified ORDER BY expression, basically the above one.

mthaddon ran an EXPLAIN ANALYSE for it on staging. The result:
https://pastebin.canonical.com/27202/ . 9 minutes run time; as I
understand it, most time is spent for sorting...

But the query might not be the best one can think of...

>>
>>> So we also considered to add a column date_created_youngest_patch to the
>>> table Bug, where this column would be updated only by triggers for
>>> changes of the table BugAttachment, like so:
>>>
>> ...
>>> In other words, a deliberate denormalisation. What do you think, should
>>> we start with a plain query without such a denormalsation, or do you
>>> think that it is better to add the "cache column"
>>> date_created_youngest_patch to the Bug table?
>>>
>> This is really Stuart's province. I don't have a strong opinion on this.
>>
>> However, it seems to me that we're getting increasingly frequent
>> requests to denormalize data in order to make querying faster. Perhaps
>> we need a general answer, or some other technology.

Perhaps, but finding a general answer is not trivial, I think. "Think
harder to find a better query" is one answer, but won't give useful
results in each case; "We don't really need such a query" is another
(though Bryce had good reasons to ask for the sort order we're
discussing); "Let's accept denormalisation after thoughtful
consideration if we find no better way" seems most reasonable to me.

Regarding other technology, I'm quite curious about suggestions.

Abel



Follow ups

References