← Back to team overview

launchpad-dev team mailing list archive

Re: performance tuesday - slow query analysis

 

On Thu, Jan 27, 2011 at 5:37 AM, John Arbash Meinel
<john@xxxxxxxxxxxxxxxxx> wrote:
> To summarize (so I'm sure I understood what your point is):
>
>  Function cost is not clearly identified in EXPLAIN [ANALYZE] queries,
>  so be on the lookout for them.

Yes.

>  If they are the overhead, you can probably look into caching the
>  result. (even generating it as an ON INSERT trigger?)

Triggers can easily generate unnecessary queries back into the
appserver as storm tries to keep things coherent; certainly we could
do a trigger (needs to be after UPDATE too).

>  Going a step further and improving the query selectivity by
>  specialized indexes can be difficult, especially when that index
>  would need to apply across tables.

Its orthogonal: if the thing can be calculated from index (with the
index containing the result of the function call), then caching the
result of the function call may be unnecessary. Range constraints
would incur a function call per range endpoint, but that may be
cheaper than the overhead of a fatter row due to the caching.


>>    -> Sort (cost=361303.95..361444.57 rows=56248 width=163) (actual
>> time=12539.223..12539.243 rows=76 loops=1)
>>          Sort Key: binarypackagename.name,
>> (debversion_sort_key(binarypackagerelease.version)),
>> binarypackagepublishinghistory.id
>>          Sort Method: top-N heapsort Memory: 46kB
>>          -> Hash Join (cost=5140.97..359265.54 rows=56248 width=163)
>> (actual time=228.492..12402.625 rows=55469 loops=1)
>
>
> ^- What is really confusing (for me) here is that the Sort Key is listed
> as part of the outer Sort call, which is actually quite cheap (12539.243
> - - 12539.223 = 0.02ms), but the Hash Join is where the function is
> actually getting called. I also really wonder why the outer Sort claims
> it is going to need to look at 56,248 rows, but the actual rows is only
> 76. That seems like a really bad estimate.

Its entirely accurate: it sorts 56248 rows from the intermediary table
and then applies the LIMIT 76 to grab the first 76 rows. Note that its
using a heapsort which means it can stop (sorting) once it has the
first 76 rows, but it needs the function value to be able to build the
initial heap.

> I can sort of see how 'this key is applied to this method', but at least
> in my head the costly key stuff would be underneath the method.
> (Thinking like lsprof layout.)

Yeah, analyze isn't a profile :(.

> If Postgresql just assigned how much time it spent calling the Sort Key,
> this probably would have been easier to notice, but I don't know what
> that would take for them to implement.

>> There are three reasons it might not be visible: it might be that we
>> can determine the order from an index, which saves evaluation of the
>> entire result set (queries that can do this are generally extremely
>> fast and efficient); or it may be that its not a significant time, or
>> its just unclear.
>>
>> In this case the latter applies: the function call is actually the
>> dominating factor for this query. Here's how to break it down.
>
> I'm a bit confused by your comments here. I think the reason the sort
> key isn't visible as a time consumer is because Postgres doesn't assign
> time to the actual function. It only sees from the outer position. I
> started doing a top-N heapsort at this time, and finished it at this
> time. (Which happened to be using this function as the sort key.)

Ah, so lets break down what happens a little more.
We can work from bottom up to get a time-ordered sequence of operations:
                    -> Seq Scan on binarypackagename
(cost=0.00..2470.03 rows=141503 width=29) (actual time=0.014..73.582
rows=141503 loops=1)

This pulls all the binarypackage names out (140K)

              -> Hash (cost=2470.03..2470.03 rows=141503 width=29)
(actual time=208.698..208.698 rows=141503 loops=1)

And then the join condition gets hashed so they can be used later

Next up is a loop, so each part of it is invoked with a containing
constraint. The output of the loop is a intermediary table - its part
of the calculation but it can't be referred to later.

              -> Nested Loop (cost=902.15..339628.83 rows=56248
width=142) (actual time=19.106..716.124 rows=55469 loops=1)

So, once through, and dealing with 55469 different constraints

                    -> Bitmap Heap Scan on
binarypackagepublishinghistory (cost=902.15..96928.16 rows=56248
width=117) (actual time=19.075..94.576 rows=55469 loops=1)
                          Recheck Cond: (archive = 2034)
                          -> Bitmap Index Scan on
securebinarypackagepublishinghistory__archive__status__idx
(cost=0.00..888.09 rows=56248 width=0) (actual time=15.623..15.623
rows=55469 loops=1)
                                Index Cond: (archive = 2034)

The above bit walks the index to pull out all bpph rows with archive 2034

                    -> Index Scan using binarypackage_pkey on
binarypackagerelease (cost=0.00..4.30 rows=1 width=29) (actual
time=0.008..0.009 rows=1 loops=55469)
                          Index Cond: (binarypackagerelease.id =
binarypackagepublishinghistory.binarypackagerelease)

and we pull out a row from binarypackagerelease for each bpph row. The
time marker here is a little different to the outer ones, I*think* its
relative in this case to the actions within that loop.

So at the end of this nested loop we have an intermediary table with
all (relevant) columns from binarypackagepublishinghistory and
binarypackagerelease. Think of it as what you'd get from
select *, debversion_sort_key(binarypackagerelease.version) from
binarypackagepublishinghistory, binarypackagerelease where
binarypackagepublishinghistory.binarypackagerelease=binarypackagerelease.id
and binarypackagepublishinghistory.archive=2304;

The function call is included in the nested loop costs because thats
where we're outputting the rows, and the table stats told postgresql
that we'd probably see 1:1 between the 'history' and 'release' tables
- what its doing is building the intermediary table row by row:
for row in binarypackagepublishinghistory.find(archive=2304):
  new_row = Row()
  new_row.extend(row)
  new_row.extend(binarypackagerelease.get(row.binarypackagerelease)
  new_row.append(debversion_sort_key(new_row['binarypackagerelease.version'])

This is why the cost doesn't show against the index lookups but rather
against the loop.

We also still have the intermediary table created from
binarypackagename up above. Now we combine them:

        -> Hash Join (cost=5140.97..359265.54 rows=56248 width=163)
(actual time=228.492..12402.625 rows=55469 loops=1)
              Hash Cond: (binarypackagerelease.binarypackagename =
binarypackagename.id)

This gives us one intermediary table with all candidate rows from all
three separate tables.
And finally we sort-and-limit.

 Limit (cost=361303.95..361304.14 rows=76 width=163) (actual
time=12539.225..12539.275 rows=76 loops=1)
  -> Sort (cost=361303.95..361444.57 rows=56248 width=163) (actual
time=12539.223..12539.243 rows=76 loops=1)
        Sort Key: binarypackagename.name,
(debversion_sort_key(binarypackagerelease.version)),
binarypackagepublishinghistory.id
        Sort Method: top-N heapsort Memory: 46kB

> You don't really know if "top-N heapsort" is the expensive part (because
> it had so many rows to sort relative to eachother), or if the Sort Key
> was the overhead (because it was so expensive to evaluate each row.)

The sort key - its clear because the sort cost is cheap, and the loop
cost was high.

>>
>> Firstly, note that the order by is coming from three different tables
>> - this means we would need to traverse several different indices to
>> generate the ordering by index. There are 140K package names in the
>> system, and 55K history rows for archive 2304 (see the row counts in
>> the plan). So postgresql is choosing to start with the 55K history
>> rows rather than the package names.
>
> I think here you switched topics, and changed to focusing on how could
> we get the whole result from a very specific indexed query?

I stayed within the broad topic of 'how can we answer this use case
efficiently' :). Yes, having dealt with why the function call was
incurring cost, I switched to alternative approaches to address the
issue.

>> What indices would we need to generate the results incrementally? Well
>> we'd need :
>>  - binarypackagename.name (indexed),
>>  - debversion_sort_key(binarypackagerelease.version)  - but we want to
>> skip all the other archives so it probably needs to be a composite
>> index:
>>  - binarypackagepublishinghistory.archive,
>> debversion_sort_key(binarypackagerelease.version,
>> -binarypackagepublishinghistory.id) - which we cannot do because that
>> is across two different tables.
>>    (if we index bpph.archive, bpph.binarypackagerelease the sort order
>> of binarypackagerelease will be a simple int: useless for our needs)
>>
>
> ^- I think you're formatting lost something here. It looks like you were
> trying to indent something?
>
> I don't see how the "but we want to skip all the other archives" applies
> to "debversion_sort_key(binarypackagerelease.version)".

Its a constraint: we have a million row data set and want to get the
first 76 rows matching archive=2304 sorted by name,
debversion_sort_key(..), id. To build a direct iterator of that we
need an index which will let us select a subrange of the index (to
filter by archive=2304) and then be sorted from there.

> I can't say that I fully understand everything going on here, but is it
> possible to define a global ordering?

Across tables? Generally no. I have a whacky idea about using
functions, but I need to check with a pgsql god whether I'm proposing
something insane or not.

> Anyway, I think this was interesting, but I'm not sure I followed all of
> it. I really don't understand the "binarypackagesort_key" that you are
> describing here. It may just be misunderstanding how the tables interrelate.

I was suggesting using a function to get a sort across tables and
store that in an index, depending on the immutability of some of our
data.

-Rob



References