
launchpad-dev team mailing list archive

Re: bug notifications database utilization

 

On 2011-03-26 09:04, Gary Poster wrote:

2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
2011-03-23 15:15:08 INFO    Notifying xxx about bug 736049.
...
2011-03-23 15:15:16 INFO    Notifying xxx about bug 733732.

Often also the notification lines in the log are several seconds or more
apart, indicating the call to sendmail() blocks for a time. So I have 2
questions:

1. How is the new script invocation happening if the old one appears to
still be running? My theory is that the new script starts and blocks
until the old one finishes. And if the next one is slow too, then it all
compounds....

That doesn't quite jibe with what I think we see here, but I could be wrong. The core issue does appear to be that more than one instance of a script can run at the same time, though we haven't caught that smoking gun yet.

If you just call script.run(), multiple instances of the script can run simultaneously, though of course they may still block each other out in the database or elsewhere. There's also lock_and_run, which can optionally block until the lock becomes available but does not block by default.
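To make the blocking versus non-blocking distinction concrete, here is a minimal sketch using an advisory file lock. It is not Launchpad's lock_and_run; the names acquire_script_lock, lock_and_run_sketch and AlreadyRunning are invented for the example, and it assumes a Unix system where fcntl.flock is available.

```python
import fcntl
import os
import tempfile


class AlreadyRunning(Exception):
    """Another instance holds the lock and we chose not to wait."""


def acquire_script_lock(path, blocking=False):
    """Take an exclusive advisory lock on `path` and return the open file.

    Hypothetical helper: with blocking=False (the default) it fails at once
    if the lock is held; with blocking=True it waits for the holder.
    """
    handle = open(path, "w")
    flags = fcntl.LOCK_EX
    if not blocking:
        flags |= fcntl.LOCK_NB
    try:
        fcntl.flock(handle, flags)
    except BlockingIOError:
        handle.close()
        raise AlreadyRunning(path)
    return handle


def lock_and_run_sketch(path, job, blocking=False):
    """Run `job` while holding the lock, releasing it afterwards."""
    handle = acquire_script_lock(path, blocking)
    try:
        return job()
    finally:
        fcntl.flock(handle, fcntl.LOCK_UN)
        handle.close()
```

With the non-blocking default, a second invocation that overlaps the first gives up immediately instead of queueing behind it, so slow runs cannot pile up. With blocking=True, overlapping invocations wait their turn, which is the compounding behaviour described in the theory above.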

Multiple instances blocking each other out in the database seems more likely if they eat out of the same queue. For example, an attempt to delete the record at the head of the queue will block on another transaction that has deleted the same record (but has not committed yet). If there's some other blocking lock involved as well, e.g. in sendmail itself, then the two script instances could even deadlock outside the database's field of view.
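As a rough illustration of one writer being shut out by another's uncommitted delete, here is a sketch that uses SQLite as a stand-in. This is an assumption made purely for the sake of a self-contained example: SQLite locks the whole database rather than a single row, and with a zero timeout it reports the conflict as an error instead of waiting, whereas a server database would normally make the second transaction wait on the row. The table name, column names and the function are all hypothetical.

```python
import os
import sqlite3
import tempfile


def demonstrate_blocked_delete():
    """Return (outcome_while_uncommitted, rows_deleted_after_commit)."""
    path = os.path.join(tempfile.mkdtemp(), "queue.db")

    # Create a one-item queue.
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, bug INTEGER)")
    setup.execute("INSERT INTO queue (id, bug) VALUES (1, 736049)")
    setup.commit()
    setup.close()

    # Two "script instances". isolation_level=None means we issue
    # BEGIN/COMMIT ourselves; timeout=0 means fail at once rather than wait.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)

    # The first instance deletes the head of the queue but does not commit.
    first.execute("BEGIN IMMEDIATE")
    first.execute("DELETE FROM queue WHERE id = 1")

    # The second instance tries to do the same thing.
    try:
        second.execute("BEGIN IMMEDIATE")
        second.execute("DELETE FROM queue WHERE id = 1")
        second.execute("COMMIT")
        outcome = "deleted"
    except sqlite3.OperationalError:
        outcome = "blocked"

    # Once the first instance commits, the second can proceed,
    # but the record it wanted is already gone.
    first.execute("COMMIT")
    second.execute("BEGIN IMMEDIATE")
    rows_deleted = second.execute("DELETE FROM queue WHERE id = 1").rowcount
    second.execute("COMMIT")

    first.close()
    second.close()
    return outcome, rows_deleted
```

The point of the sketch is only the ordering: the second writer cannot make progress until the first commits or rolls back, so any slowness inside the first transaction (such as a slow mail send) is inherited by the second.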

I sometimes find stone-age profiling helpful with scripts: ctrl-C the thing and see what the traceback says it was doing.
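A variant of the same trick that does not kill the script is to have it dump its stack when it receives a signal. The sketch below uses Python's standard faulthandler module rather than ctrl-C; it assumes Python 3 on Unix, and the function name install_stack_dump and the choice of SIGUSR1 are just for illustration.

```python
import faulthandler
import os
import signal
import sys
import tempfile
import time


def install_stack_dump(log_file=sys.stderr, signum=signal.SIGUSR1):
    """Dump every thread's traceback to `log_file` when `signum` arrives.

    Hypothetical helper: the script keeps running after the dump, so it
    can be sampled several times to see where it spends its time.
    `log_file` must stay open for as long as the handler is registered.
    """
    faulthandler.register(signum, file=log_file, all_threads=True)
```

After calling install_stack_dump() near the start of a script, running `kill -USR1 <pid>` a few times against the live process shows what it was doing at each moment. If the same frame keeps turning up, that is probably where the time goes.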


Jeroen


