launchpad-dev team mailing list archive

Re: bug notifications database utilization

Hi folks

An update on the status of this. The system appears ok *for now*.

A cowboy was applied to prod (the optimisation Gary refers to below), but
in practice it made little difference to the SQL being run. By far
the largest number of SQL statements were single selects to get
subscribers' email details and subscription details. There were hundreds
of them. These should be consolidated into bulk selects, or something
else should be done to address the issue. See bug
https://bugs.launchpad.net/launchpad/+bug/742230
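
To illustrate the kind of consolidation I mean, here is a minimal sketch. It is not the Launchpad code: it uses the standard library sqlite3 module as a stand-in for the real database layer, and the subscriber table, its columns and both function names are made up for the example.

```python
import sqlite3


def fetch_emails_one_by_one(conn, person_ids):
    """Issue one SELECT per subscriber (the pattern seen in the SQL log)."""
    emails = {}
    for person_id in person_ids:
        row = conn.execute(
            "SELECT email FROM subscriber WHERE person_id = ?",
            (person_id,),
        ).fetchone()
        if row is not None:
            emails[person_id] = row[0]
    return emails


def fetch_emails_bulk(conn, person_ids):
    """Issue a single SELECT ... IN (...) for the whole batch."""
    ids = list(person_ids)
    if not ids:
        return {}
    # One "?" placeholder per id; the values are still bound as
    # parameters, never interpolated into the SQL string.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT person_id, email FROM subscriber "
        "WHERE person_id IN (%s)" % placeholders,
        ids,
    )
    return dict(rows)


def make_demo_db():
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subscriber (person_id INTEGER PRIMARY KEY, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO subscriber VALUES (?, ?)",
        [(n, "person%d@example.com" % n) for n in range(1, 6)],
    )
    return conn
```

Both functions return the same mapping of id to email; the difference is that the second one costs one round trip to the database however many subscribers there are. A real fix would also need to cover the subscription details and would probably chunk very large id lists.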

The latest dbr report shows the bugnotification user using approx 10%
CPU over the past few hours. Bug email is going out but the CPU usage is
still far too high. Why is the system running "ok" now? I'm not sure, and
that's a worry.

Another issue I noticed in the logs is as follows. The cronjob runs
every 5 minutes. The logs over the past few days show that there are
times when notifications are still being sent when the next script
invocation occurs, e.g.:

2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
2011-03-23 15:15:08 INFO    Creating lockfile: /var/lock/launchpad-send-bug-notifications.lock
2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
...
2011-03-23 15:15:16 INFO    Notifying xxx about bug 733732.

Often the notification lines in the log are also several seconds or more
apart, which suggests that the call to sendmail() blocks for a time. So I
have 2 questions:

1. How is the new script invocation happening if the old one appears to
still be running? My theory is that the new script starts and blocks
until the old one finishes. And if the next one is slow too, then it all
compounds....

2. Why do some calls to sendmail() take so long to complete? And given
that they do, what can be done about it?
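
On the first question, if the theory is right that a new invocation waits for the old one, one option would be to have the new run give up immediately instead. The following is only a sketch of that idea using fcntl.flock from the Python standard library. I have not checked how the script's existing lockfile handling works, so the function names and the lock path here are hypothetical and nothing below describes current behaviour.

```python
import fcntl

# Hypothetical path for the example; not the production lockfile.
LOCK_PATH = "/tmp/example-send-bug-notifications.lock"


def try_acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if busy.

    LOCK_NB makes flock() fail straight away instead of blocking
    when another open file already holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release_lock(handle):
    """Drop the lock and close the underlying file."""
    fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    handle.close()


def run_once(send_notifications, path=LOCK_PATH):
    """Run the job unless a previous invocation still holds the lock.

    Returns True if the job ran, False if it was skipped.
    """
    handle = try_acquire_lock(path)
    if handle is None:
        return False
    try:
        send_notifications()
    finally:
        release_lock(handle)
    return True
```

With something like this, a slow run would cause the next cron invocation to be skipped rather than queued behind it, so delays would not compound. The trade-off is that pending notifications wait for the following 5 minute slot.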

So that's all for now. There's no definitive fix that's been applied,
but the logs have given perhaps a little more insight into where to
start looking next.
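
As a starting point for that, here is a rough sketch of how the logs could be scanned for the two symptoms above: large gaps between consecutive lines, and a lockfile being created while notifications are still going out. It assumes only the line format shown in the excerpt (timestamp, level, message); the function names and the 2 second threshold are my own choices.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+)\s+(.*)$"
)


def parse_log(lines):
    """Return (timestamp, message) pairs, skipping unparseable lines."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # e.g. "..." or a wrapped continuation line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        entries.append((stamp, match.group(3)))
    return entries


def slow_gaps(entries, threshold_seconds=2.0):
    """Return (gap_seconds, previous_message, message) for large gaps."""
    gaps = []
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if gap >= threshold_seconds:
            gaps.append((gap, msg0, msg1))
    return gaps


def overlapping_starts(entries):
    """Return timestamps where a lockfile is created mid-run.

    "Mid-run" here means the line sits directly between two
    "Notifying" lines, as in the excerpt above.
    """
    stamps = []
    for index in range(1, len(entries) - 1):
        stamp, message = entries[index]
        if (message.startswith("Creating lockfile")
                and entries[index - 1][1].startswith("Notifying")
                and entries[index + 1][1].startswith("Notifying")):
            stamps.append(stamp)
    return stamps
```

Feeding an open log file to parse_log() and then passing the result to the other two functions should give a quick count of how often each symptom occurs per day. Note that the log timestamps only have one second resolution, so short stalls will not show up.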


On 24/03/11 23:05, Gary Poster wrote:
> Thanks for the heads up, Stuart.
> 
> Stuart clarified on IRC that this is currently a scalability problem rather than a performance problem, though that could change.
> 
> This started happening after the most recent DB deploy.  The most likely cause is my own work to address bug 164196, "Quickly-undone actions shouldn't send mail notifications" (code: http://bazaar.launchpad.net/~launchpad-pqm/launchpad/devel/revision/12533#lib/lp/bugs/scripts/bugnotification.py).
> 
> There is an obvious optimization to try (get the activity record along with the notification that points to it). I expect that will reduce the database usage, but I would be very surprised if it would get us down to the previous number.  We need to look at more data than before in order to answer these questions.
> 
> I've created https://bugs.launchpad.net/launchpad/+bug/741684 for this optimization.
> 
> Gary
> 
> On Mar 24, 2011, at 4:25 AM, Stuart Bishop wrote:
> 
>> Hi.
>>
>> The database utilization report has picked up that bug notifications
>> is now chewing 17% of a master database CPU core. 2 months ago, it was
>> using <1% of a master database CPU core.
>>
>> This seems excessive, and could be a red flag to making things more
>> complex with more advanced filters.
>>
>> -- 
>> Stuart Bishop <stuart@xxxxxxxxxxxxxxxx>
>> http://www.stuartbishop.net/
> 
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~launchpad-dev
> Post to     : launchpad-dev@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~launchpad-dev
> More help   : https://help.launchpad.net/ListHelp
> 