launchpad-dev team mailing list archive

Re: Branch scans taking a really long time?

 

On 06/04/10 02:58, Aaron Bentley wrote:
On 04/02/2010 07:51 PM, William Grant wrote:
I've been noticing for a while that the Codehosting scanner has been
taking ages, but had no hard evidence. I pushed up
lp:~wgrant/launchpad/revert-soyuz-sample-data-changes this morning, and:

last_mirrored
2010-04-02 23:29:31.302992+00:00

last_scanned
2010-04-02 23:45:03.022906+00:00

That's not quite 2006 slowness, but it's close!

The same branch takes 20 seconds to scan locally.


According to our logs the branch in question was mirrored while a large
scan was already running. That meant it had to wait until that run
completed before we could consider scanning it.

The previous scanner run started at 2010-04-02 23:28:15 and completed at
2010-04-02 23:41:42, running 279 BranchScanJobs. According to my
calculations* this is 2.89 seconds per job.

The next script run began at 2010-04-02 23:42:13 and completed at
2010-04-02 23:49:44, running 103 BranchScanJobs, including
revert-soyuz-sample-data-changes.
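
For anyone who wants to check the arithmetic, here is a rough sketch in Python that uses only the start times, end times and job counts quoted above. The helper name seconds_per_job is made up for illustration and is not part of the scanner.

```python
from datetime import datetime

# Timestamp layout used in the figures quoted from the scanner log.
FMT = "%Y-%m-%d %H:%M:%S"


def seconds_per_job(start, end, jobs):
    """Average wall-clock seconds per job for one scanner run.

    Hypothetical helper: it only divides elapsed time by job count.
    """
    elapsed = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return elapsed.total_seconds() / jobs


# Run that was in progress when the branch was mirrored: 279 jobs.
first = seconds_per_job("2010-04-02 23:28:15", "2010-04-02 23:41:42", 279)
# Following run, which included revert-soyuz-sample-data-changes: 103 jobs.
second = seconds_per_job("2010-04-02 23:42:13", "2010-04-02 23:49:44", 103)

print(f"first run:  {first:.2f} s/job")
print(f"second run: {second:.2f} s/job")
```

The first figure matches the 2.89 seconds per job quoted above. Both numbers are averages over a whole run, so they say nothing about how long any individual branch took.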

2.89 seconds per job seems to be acceptable performance in absolute
terms, but perhaps it is not relative to the workload. The workload
appears to be bursty. The mean number of completed jobs per run is 5.24,
and the median is 2. I've attached a file listing all the job counts per
run for our log scan_branches.log-20100405. It covers the period of
2010-03-31 23:01:08 to 2010-04-04 23:00:41. I'm sorry I cannot attach
the actual log, but it contains private information.

*I'd* love to see the branch names, even if you can't post that to the list :-) I guess I can read the logs myself.

It looks like on some hours, we are flooded with new branch scan jobs.
02:00 being the most common. This is probably due to automated
processes; mirror branches updating, imports, upstream tarball imports,
etc.

Imports shouldn't trigger scan jobs (or even pull jobs) unless there are new revisions to import. It could be mirrored branches and it could be the package import branches, although I wouldn't expect them to be bursty from what I understand of how the system works.

...

Looking at the logs, it does seem to be mirrored branches, although I wasn't exactly scientific about it. It would be relatively easy to only create a scan job when the tip revid changes, but then we'll miss format changes. Probably the fix there is to record formats from the puller...
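
To make that idea concrete, a minimal sketch in Python, assuming the puller could record both the tip revid and the branch format after each mirror. BranchState and needs_scan are hypothetical names for illustration, not existing Launchpad code, and the format strings in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchState:
    """Hypothetical record of what the puller saw after a mirror."""
    tip_revid: Optional[str]   # None for an empty branch
    branch_format: str         # assumed to be recorded by the puller


def needs_scan(last_scanned: Optional[BranchState],
               mirrored: BranchState) -> bool:
    """Decide whether a mirror should create a scan job.

    A job is needed if the branch has never been scanned, if the tip
    moved, or if the format changed while the tip stayed the same.
    """
    if last_scanned is None:
        return True
    tip_changed = mirrored.tip_revid != last_scanned.tip_revid
    format_changed = mirrored.branch_format != last_scanned.branch_format
    return tip_changed or format_changed
```

With this check, needs_scan(BranchState("rev-1", "fmt-a"), BranchState("rev-1", "fmt-a")) is False, so a mirror run that pulled nothing new would not queue a job, while a change to either field would.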

Cheers,
mwh


