
Re: [Merge] lp:~julian-edwards/launchpad/buildd-manager-parallel-scan into lp:launchpad/devel

 

On Tuesday 27 July 2010 18:25:58 Jonathan Lange wrote:
> Hey Julian,
> 
> Thanks for doing this work. I'm very sorry that one of your first
> exposures to Twisted has been in this fashion.

Thanks for taking on the review Jono.  We've already gone through this on the 
phone so I'll summarise stuff inline.

> The review below is quite detailed and raises lots of questions, which
> is only to be expected with major changes to old code. I'm afraid I
> ran out of steam by the time it came to the tests.

So did I :(

> I look forward to getting your reply, and sincerely hope this review helps.

It did!

> > === modified file 'lib/lp/buildmaster/manager.py'
> > --- lib/lp/buildmaster/manager.py       2010-05-27 13:13:47 +0000
> > +++ lib/lp/buildmaster/manager.py       2010-07-22 15:26:12 +0000
[snip]
> > +class SlaveScanner:
> > +    """A manager for a single builder."""
> > +
> > +    SCAN_INTERVAL = 5
> > +
> > +    # These are for the benefit of tests, it seems.
> >     reset_result = ResetDispatchResult
> >     fail_result = FailDispatchResult
> 
> Please either explain how these benefit the tests, or refer to
> somewhere else that explains them.

The tests poke different result classes in so they can exercise the
callbacks.  I've added a better comment.
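
For the record, the pattern is roughly this (a sketch; the real double is
the LoggingResetResult used further down in test_manager.py):

  reset_result_calls = []

  class LoggingResetResult(BaseDispatchResult):
      # Stand-in for ResetDispatchResult that records the slave it was
      # built for, so the test can assert that the reset path was taken.
      def __init__(self, slave, info=None):
          self.slave = slave

      def __call__(self):
          reset_result_calls.append(self.slave)

  # A test then pokes the double in:
  self.manager.reset_result = LoggingResetResult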

[snip]
> > +    def __init__(self, builder_name, logger):
> > +        # XXX in the original code, it doesn't set the slave as as
> > +        # RecordingSlave until tries to dispatch a build.  This is the
> > +        # only part that's using Deferreds right now, and can be
> > +        # improved if we add more methods to RecordingSlave.  In fact we
> > +        # should use it exclusively and remove the wrapper hack.
> 
> It seems like there's quite a lot of thinking behind this comment. As
> it stands, I wouldn't be able to fix the XXX without more information.
> 
> Also, once this branch lands, no one is going to care what the
> original code did. It would be better to say clearly what the problem
> is and why it's a problem.
> 
> Could you please expand the comment? Perhaps the right thing to do is
> file a bug and refer to it?

https://bugs.edge.launchpad.net/soyuz/+bug/611258

Comment fixed! (and moved to a more appropriate place to make it)

> 
> > +        self.builder_name = builder_name
> > +        self.logger = logger
> > +        self._deferred_list = []
> > +
> 
> > +    def scheduleNextScanCycle(self):
> This needs a docstring.

Fixed.

> 
> > +        self._deferred_list = []
> > +        reactor.callLater(self.SCAN_INTERVAL, self.startCycle)
> > +
> > +    def startCycle(self):
> > +        """Main entry point for each scan cycle on this builder."""
> 
> Yes, but what does it do?
> 
> As far as I can tell, this method is the one you would call if you
> wanted to do a scan. Perhaps it should have a different name? The
> docstring should have something like:
>   """Scan the builder and dispatch results to blahdiblah"""

Fixed.

> 
> > +        self.logger.debug("Scanning builder: %s" % self.builder_name)
> > 
> >         d = defer.maybeDeferred(self.scan)
> >         d.addCallback(self.resumeAndDispatch)
> >         d.addErrback(self.scanFailed)
> 
> It's almost always a good idea to return a Deferred. That way the
> method can be called directly in tests and you can do things with the
> result.

I've fixed this method, see below.

> 
> > +    @write_transaction
> > +    def scan(self):
> > +        """Probe the builder and update/dispatch/collect as appropriate.
> > +
> > +        The whole method is wrapped in a transaction, but we do partial
> > +        commits to avoid holding locks on tables.
> > +        """
> 
> Looking further down, it seems that the return value of this function
> is significant. Could you please add documentation about the possible
> return values and what they mean?

Yep, fixed.

> > +        # We need to re-fetch the builder object on each cycle as the
> > +        # Storm store is invalidated over transaction boundaries.
> > +
> > +        # Avoid circular import.
> > +        from lp.buildmaster.interfaces.builder import IBuilderSet
> > +        builder_set = getUtility(IBuilderSet)
> > +        self.builder = builder_set[self.builder_name]
> > +
> > +        if self.builder.builderok:
> > +            self.builder.updateStatus(self.logger)
> > +            transaction.commit()
> > +
> > +        # See if we think there's an active build on the builder.
> > +        store = store = getUtility(
> > +            IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
> 
> "store =" is repeated unnecessarily.

Fixed.

> 
> > +        buildqueue = store.find(
> > +            BuildQueue,
> > +            BuildQueue.job == Job.id,
> > +            BuildQueue.builder == self.builder.id,
> > +            Job._status == JobStatus.RUNNING,
> > +            Job.date_started != None).one()
> > +
> 
> The comment makes me think there should be a method called
> IBuilder.getActiveBuild().

I've made IBuildQueueSet.getByActiveBuilder()
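
For reference, it's essentially the query above wrapped in a method (modulo
the final naming and docstring):

  def getByActiveBuilder(self, builder):
      # Return the BuildQueue for the job currently running on
      # `builder`, or None if the builder is idle.
      store = getUtility(IStoreSelector).get(MAIN_STORE, DEFAULT_FLAVOR)
      return store.find(
          BuildQueue,
          BuildQueue.job == Job.id,
          BuildQueue.builder == builder.id,
          Job._status == JobStatus.RUNNING,
          Job.date_started != None).one()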

> > +        # Scan the slave and get the logtail, or collect the build if
> > +        # it's ready.
> > +        if buildqueue is not None:
> > +            self.builder.updateBuild(buildqueue)
> > +            transaction.commit()
> > +
> 
> I don't understand how calling "updateBuild" scans the slave and gets
> the logtail. Is the comment up-to-date?

As discussed, it's fine although the method name is a bit dodgy.

> > +        # If the builder is in manual mode, don't dispatch anything.
> > +        if self.builder.manual:
> > +            self.logger.debug(
> > +            '%s is in manual mode, not dispatching.' % self.builder.name)
> 
> Technically this line should be indented by another four spaces.

Fixed.

> 
> > +            return None
> > +
> > +        # If the builder is marked unavailable, don't dispatch anything.
> > +        # Additionally see if we think there's a job running on it.  If
> > +        # there is then it means the builder was just taken out of the
> > +        # pool and we need to put the job back in the queue.
> 
> It took me a couple of tries to parse this, but I don't think it's
> because of the prose. :\

I've changed the comment in an attempt to make it clearer what's going on.

> 
> > +        if not self.builder.is_available:
> > +            job = self.builder.currentjob
> > +            if job is not None and not self.builder.builderok:
> > +                self.logger.info(
> > +                    "%s was made unavailable, resetting attached "
> > +                    "job" % self.builder.name)
> > +                job.reset()
> > +                transaction.commit()
> > +            return None
> > +
> 
> Just to clarify my understanding...
> 
> If this code were written in a more event-driven style, I would put
> the bulk of this method in a builderMadeUnavailable event handler.
> Does this square with your understanding?

Spot on.

> 
> > +        # See if there is a job we can dispatch to the builder slave.
> > +        slave = RecordingSlave(
> > +            self.builder.name, self.builder.url, self.builder.vm_host)
> > +        candidate = self.builder.findAndStartJob(buildd_slave=slave)
> > +        if self.builder.currentjob is not None:
> > +            transaction.commit()
> > +            return slave
> > +
> > +        return None
> > +
> 
> OK, now I'm confused.
> 
> AFAICT, scan() is only called in startCycle(), at which point it's
> wrapped in a maybeDeferred(). However, there's no circumstance in
> which scan() will return a Deferred, which makes maybeDeferred() a
> little pointless. What's going on?

Comment below ...

> 
> >     def scanFailed(self, error):
> >         """Deal with scanning failures."""
> > -        self.logger.info(
> > -            'Scanning failed with: %s\n%s' %
> > +        self.logger.info("Scanning failed with: %s\n%s" %
> >             (error.getErrorMessage(), error.getTraceback()))
> 
> ...
> 
> > +        # We should probably detect continuous failures here and mark
> > +        # the builder down.
> > +        self.scheduleNextScanCycle()
> > +
> 
> XXX: Would this be clearer in an addBoth / finally?
> 
> > +    def resumeAndDispatch(self, slave):
> > +        """Chain the resume and dispatching Deferreds.
> > +
> > +        `extra_callback` is only set by the test suite.
> 
> What's extra_callback?

Something that should not be there!

> 
> > +        """
> > +        if slave is not None:
> > +            if slave.resume_requested:
> > +                # The slave needs to be reset before we can dispatch to
> > +                # it (e.g. a virtual slave)
> > +                d = slave.resumeSlave()
> > +                d.addBoth(self.checkResume, slave)
> > +            else:
> > +                # No resume required, build dispatching can commence.
> > +                d = defer.succeed(None)
> > +
> > +            # Use Twisted events to dispatch the build to the slave.
> > +            d.addCallback(self.initiateDispatch, slave)
> 
> I personally don't think that comment adds much. Those proficient with
> Twisted know what addCallback does, and it doesn't give much to go on
> for those unfamiliar with Twisted.

Changed to reference that it's dispatching asynchronously.

> 
> > +            # Store this deferred so we can wait for it along with all
> > +            # the others that will be generated.
> > +            self._deferred_list.append(d)
> 
> What others? It would be nice to have something in the comment to
> indicate what would generate other Deferreds and why.

I've updated the comment.

> 
> I had written "Is there a particular reason for not scheduling a scan
> cycle here, given that you do in the other cases?", but I see that
> it's because scheduleNextScanCycle does something involving the clock
> and you only want to call it when you know the scan is done. I guess
> ideally there would be a single call to next scan cycle, given that
> it's something that you always want to do.
> 
> What's really missing here is a function you call to do a single scan
> and that returns a Deferred that only fires when the scan is done.

Yes, that would really help.  I might sort that out in a new branch.
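
Something shaped like this, I think (sketch only - it assumes
resumeAndDispatch is reworked to return its Deferred):

  def singleScan(self):
      # One scan cycle as a single Deferred that only fires when the
      # dialogue with the slave is over.  Tests (and the next-cycle
      # scheduling) can then just wait on the result.
      d = defer.maybeDeferred(self.scan)
      d.addCallback(self.resumeAndDispatch)
      return d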

> 
> > +        else:
> > +            # The scan for this builder didn't want to dispatch
> > +            # anything, so we can finish this cycle.
> > +            self.scheduleNextScanCycle()
> > +
> 
> This function doesn't return a Deferred either, and the code that
> waits on its Deferreds firing is in a totally different place.
> 
> AFAICT, this means that startCycle could be written along the following
> lines:
> 
> try:
>   slave = self.scan()
>   if slave is None:
>     self.scheduleNextScanCycle()
>   else:
>     self.resumeAndDispatch(slave)
> except:
>   self.logger.info(...)
>   self.scheduleNextScanCycle()
> 
> without changing any of the logic.
> 
> Although it's definitely a matter of opinion, I think the version here
> gives a better idea of the overall flow of a build scan.

I've changed the code to do this and....WIN.  Great suggestion, thanks.
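
For the record, startCycle now reads more or less like this (paraphrased,
not the exact committed code):

  from twisted.python import failure

  def startCycle(self):
      # Scan the builder; dispatch if there's a job, otherwise schedule
      # the next cycle.  scanFailed() logs the error and also schedules
      # the next cycle.
      self.logger.debug("Scanning builder: %s" % self.builder_name)
      try:
          slave = self.scan()
          if slave is None:
              self.scheduleNextScanCycle()
          else:
              self.resumeAndDispatch(slave)
      except Exception:
          self.scanFailed(failure.Failure())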

> 
> > +    def initiateDispatch(self, resume_result, slave):
> > +        """Start dispatching a build to a slave.
> > +
> > +        If the previous task in chain (slave resuming) has failed it will
> > +        receive a `ResetBuilderRequest` instance as 'resume_result' and
> > +        will immediately return that so the subsequent callback can collect
> > +        it.
> > +
> 
> This seems unnecessarily complex. As in, the reality, not the prose.

It was Celso's way of putting another "special" callable object on the end
of the Deferred chain so that reset/failure scenarios can be handled.  It's
not really the job of this branch to change that, so I added the docstring
to at least get it documented.

> 
> > +        If the slave resuming succeeded, it starts the XMLRPC dialogue. The
> > +        dialogue may consist of many calls to the slave before the build
> > +        starts.  Each call is done via a Deferred event, where slave calls
> > +        are sent in callSlave(), and checked in checkDispatch() which will
> > +        keep firing events via callSlave() until all the events are done or
> > +        an error occurrs.
> 
> One "r" in "occurs".

Fixed, thanks.

> 
> > +        """
> > +        if resume_result is not None:
> > +            self.waitOnDeferredList()
> 
> That method name is a little misleading. The code is not going to sit
> here and wait until the Deferred list fires. Instead, it's going to
> add a callback that gets fired when all of the Deferreds in the list
> have fired. What's the point of calling it here?

As we agreed on the call, it's renamed to slaveDialogueEnded.

> 
> > +            return resume_result
> > +
> > +        self.logger.info('Dispatching: %s' % slave)
> > +        self.callSlave(slave)
> > +
> > +    def _getProxyForSlave(self, slave):
> > +        """Return a twisted.web.xmlrpc.Proxy for the buildd slave.
> > +
> > +        Uses a protocol with timeout support, See QueryFactoryWithTimeout.
> > +        """
> > +        proxy = xmlrpc.Proxy(str(urlappend(slave.url, 'rpc')))
> > +        proxy.queryFactory = QueryFactoryWithTimeout
> > +        return proxy
> > +
> > +    def callSlave(self, slave):
> > +        """Dispatch the next XMLRPC for the given slave."""
> > +        if len(slave.calls) == 0:
> > +            # That's the end of the dialogue with the slave.
> > +            self.waitOnDeferredList()
> 
> See previous comment about waitOnDeferredList.

All fixed.

> 
> > +            return
> > +
> > +        # Get an XMPRPC proxy for the buildd slave.
> 
> Spelling mistake. XMLRPC.

Fixed.

> 
> > +        proxy = self._getProxyForSlave(slave)
> > +        method, args = slave.calls.pop(0)
> > +        d = proxy.callRemote(method, *args)
> > +        d.addBoth(self.checkDispatch, method, slave)
> > +        self._deferred_list.append(d)
> > +        self.logger.debug('%s -> %s(%s)' % (slave, method, args))
> > +
> > +    def waitOnDeferredList(self):
> > +        """After all the Deferreds are set up, wait on them."""
> > +        dl = defer.DeferredList(self._deferred_list, consumeErrors=True)
> 
> Looking at some of the code in evaluateDispatchResult, you probably
> want to use lp.services.twistedsupport.gatherResults instead of
> DeferredList.

On IRC you suggested this:

  dl = gatherResults(self._deferred_list)
  dl.addBoth(self.evaluateDispatchResult)
  return dl

and in the callback:

  dispatch_results = [
    x for x in deferred_list_results if isinstance(x, BaseDispatchResult)]

gatherResults invokes DeferredList with fireOnOneErrback=1, which is
different behaviour from the existing code.  I'd like to leave this alone
for now, until I understand the ramifications of that change.
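
To spell out the difference that worries me (illustrated with plain
twisted.internet.defer; as I understand it the twistedsupport version wraps
the same machinery):

  from twisted.internet import defer

  def wait_current(deferred_list):
      # Today: every Deferred runs to completion and the callback gets
      # a list of (status, result) tuples, failures included.
      return defer.DeferredList(deferred_list, consumeErrors=True)

  def wait_with_gather(deferred_list):
      # gatherResults wraps DeferredList(fireOnOneErrback=1): the first
      # failure errbacks the aggregate immediately, so the remaining
      # results would never reach evaluateDispatchResult.
      return defer.gatherResults(deferred_list)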

> 
> > +        dl.addBoth(self.evaluateDispatchResult)
> > +        return dl
> > +
> > +    def evaluateDispatchResult(self, deferred_list_results):
> > +        """Process the DispatchResult for this dispatch chain.
> > +
> > +        After waiting for the Deferred chain to finish, we'll have a
> > +        DispatchResult to evaluate, which deals with the result of
> > +        dispatching.
> > +        """
> > +        # The `deferred_list_results` is what we get when waiting on a
> > +        # DeferredList.  It's a list of tuples of (status, result) where
> > +        # result is what the last callback in that chain returned.
> > +
> 
> If you use gatherResults, it'll just be the results.
> 
> > +        # If the result is an instance of BaseDispatchResult we need to
> > +        # evaluate it, as there's further action required at the end of
> > +        # the dispatch chain.  None, resulting from successful chains,
> > +        # are discarded.
> > +
> > +        dispatch_results = [
> > +            result for status, result in deferred_list_results
> > +            if isinstance(result, BaseDispatchResult)]
> > +
> > +        for result in dispatch_results:
> 
> > +            if isinstance(result, BaseDispatchResult):
> You don't need this check here, since it's in the list comprehension above.

Good spot!

> 
> > +                self.logger.info("%r" % result)
> >                 result()
> > 
> > +        # At this point, we're done dispatching, so we can schedule the
> > +        # next scan cycle.
> > +        self.scheduleNextScanCycle()
> > +
> > +        # For the test suite so that it can chain callback results.
> > +        return deferred_list_results
> 
> ...
> 
> >     def checkResume(self, response, slave):
> > -        """Deal with a slave resume failure.
> > +        """Check the result of resuming a slave.
> > 
> > -        Return a corresponding `ResetDispatchResult` dispatch result,
> > -        which is chained to the next callback, dispatchBuild.
> > +        If there's a problem resuming, we return a ResetDispatchResult which
> > +        will get evaluated at the end of the scan, or None if the resume
> > +        was OK.
> >         """
> 
> Looking through the method, 'response' can also be a list of [status,
> info]. Could you please add documentation about this to the docstring.
> 
> I guess 'response' always comes from slave.resumeSlave? If so, it's
> got a very unusual behaviour for Python code, since it returns
> different types of objects depending on how successful it was, which
> practically forces a caller to do type checking, which is slow &
> brittle.

As we discussed on IRC, you missed that `response` comes from 
ProcessWithTimeout.processEnded()

> 
> >         # 'response' is the tuple that's constructed in
> >         # ProcessWithTimeout.processEnded(), or is a Failure that

Right here :)

> > @@ -395,9 +453,9 @@
> >         If it failed and it compromises the slave then return a
> >         corresponding `FailDispatchResult`, if it was a communication failure,
> >         simply reset the slave by returning a `ResetDispatchResult`.
> > -
> > -        Otherwise dispatch the next call if there are any and return
> > -        None.
> >         """
> > +        # XXX these DispatchResult classes are badly named and do the
> > +        # same thing.  We need to fix that.
> 
> I recommend just fixing it. It will be less work than trying to
> explain why they are badly named :)

As discussed on the phone - next branch.

> 
> >         self.logger.debug(
> >             '%s response for "%s": %s' % (slave, method, response))
> > 
> > @@ -405,15 +463,15 @@
> >             self.logger.warn(
> >                 '%s communication failed (%s)' %
> >                 (slave, response.getErrorMessage()))
> > -            self.slaveDone(slave)
> > +            self.waitOnDeferredList()
> >             return self.reset_result(slave)
> > 
> > -        if isinstance(response, list) and len(response) == 2 :
> 
> > +        if isinstance(response, list) and len(response) == 2:
> ...
> 
> >             if method in buildd_success_result_map.keys():
> >                 expected_status = buildd_success_result_map.get(method)
> 
> Given that we're fetching 'method' from the dict anyway, this could
> also be written as:
> 
> expected_status = buildd_success_result_map.get(method, None)
> if expected_status is not None:
> 
> Either way, 'keys()' is redundant.

keys() removed.

> 
> >                 status, info = response
> >                 if status == expected_status:
> > -                    self._mayDispatch(slave)
> > +                    self.callSlave(slave)
> >                     return None
> >             else:
> >                 info = 'Unknown slave method: %s' % method
> > @@ -423,76 +481,98 @@
> >         self.logger.error(
> >             '%s failed to dispatch (%s)' % (slave, info))
> > 
> > -        self.slaveDone(slave)
> > +        self.waitOnDeferredList()
> 
> I wonder whether each of these calls is actually needed. If they are
> all needed, how can you tell?

I can't remember what you're referring to here.

> 
> >         return self.fail_result(slave, info)
> 
> ...
> 
> > +
> > +class NewBuildersScanner:
> > +    """If new builders appear, create a scanner for them."""
> > +
> > +    # How often to check for new builders, in seconds.
> > +    SCAN_INTERVAL = 300
> > +
> > +    def __init__(self, manager):
> > +        self.manager = manager
> > +        # Avoid circular import.
> > +        from lp.buildmaster.interfaces.builder import IBuilderSet
> > +        self.current_builders = [
> > +            builder.name for builder in getUtility(IBuilderSet)]
> > +
> > +    def scheduleScan(self):
> > +        """Schedule a callback SCAN_INTERVAL seconds later."""
> > +        reactor.callLater(self.SCAN_INTERVAL, self.scan)
> > +
> 
> This reminds me: check out LoopingCall some time. It'll make this code
> a bit simpler, I reckon.

Yep - I'll look into that, thanks.
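
A minimal sketch of what I have in mind (startScanning is an invented
name):

  from twisted.internet import task

  def startScanning(self):
      # LoopingCall re-arms itself, so there's no need to call
      # scheduleScan() by hand at the end of every scan.
      self.loop = task.LoopingCall(self.scan)
      # In tests, self.loop.clock can be replaced with a task.Clock().
      return self.loop.start(self.SCAN_INTERVAL, now=False)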

> 
> > +    def scan(self):
> > +        """If a new builder appears, create a SlaveScanner for it."""
> > +        new_builders = self.checkForNewBuilders()
> > +        if new_builders is not None:
> > +            self.manager.addScanForBuilders(new_builders)
> > +        self.scheduleScan()
> > +
> > +    def checkForNewBuilders(self):
> > +        """See if any new builders were added."""
> > +        # Avoid circular import.
> > +        from lp.buildmaster.interfaces.builder import IBuilderSet
> > +        new_builders = set(
> > +            builder.name for builder in getUtility(IBuilderSet))
> > +        old_builders = set(self.current_builders)
> > +        extra_builders = new_builders.difference(old_builders)
> > +        if len(extra_builders) == 0:
> > +            # No new builders.
> > +            return None
> > +
> 
> Returning None here and then checking for None up in scan() before
> calling addScanForBuilders which does nothing in the case of an empty
> list seems unnecessarily complex.

Good point - fixed.

> 
> > +        return [builder for builder in extra_builders]
> > +
> 
> This could also be written as list(extra_builders).

What was I thinking ...!

> > === modified file 'lib/lp/buildmaster/tests/test_manager.py'
> > --- lib/lp/buildmaster/tests/test_manager.py    2010-07-18 17:43:23 +0000
> > +++ lib/lp/buildmaster/tests/test_manager.py    2010-07-22 15:26:12 +0000
[snip]
> >     def test_checkResume(self):
> > -        """`BuilddManager.checkResume` is chained after resume requests.
> > +        """`SlaveScanner.checkResume` is chained after resume requests.
> > 
> >         If the resume request succeed it returns None, otherwise it
> >         returns a `ResetBuildResult` (the one in the test context) that will be
> > @@ -355,24 +311,26 @@
> >         # Make a failing slave that is requesting a resume.
> >         slave = RecordingSlave('foo', 'http://foo.buildd:8221/', 'foo.host')
> >         slave.resume_requested = True
> > -        slave.resumeSlave = lambda: defer.fail(Failure(('out', 'err', 1)))
> > +        slave.resumeSlave = lambda: deferLater(
> > +            reactor, 0, defer.fail, Failure(('out', 'err', 1)))
> 
> Why did you need to make this change?

As explained on the call - the old defer.fail was getting executed
immediately.  I've no idea why, and I wasted two days at the Epic trying to
figure it out :/

> 
> >         # Make the manager log the reset result calls.
> >         self.manager.reset_result = LoggingResetResult
> > -        # Restore the slaveDone method. It's very relevant to this test.
> > -        self.manager.slaveDone = self._realSlaveDone
> > +
> >         # We only care about this one slave. Reset the list of manager
> >         # deferreds in case setUp did something unexpected.
> > -        self.manager._deferreds = []
> > -
> > -        self.manager.resumeAndDispatch([slave])
> > -        # Note: finishCycle isn't generally called by external users, normally
> > -        # resumeAndDispatch or slaveDone calls it. However, these calls
> > -        # swallow the Deferred that finishCycle returns, and we need that
> > -        # Deferred to make sure this test completes properly.
> > -        d = self.manager.finishCycle()
> > -        return d.addCallback(
> > -            lambda ignored: self.assertEqual([slave], reset_result_calls))
> > +        self.manager._deferred_list = []
> > +
> > +        # Here, we're patching the waitOnDeferredList method so we can
> > +        # get an extra callback at the end of it, so we can
> > +        # verify that the reset_result was really called.
> > +        def _waitOnDeferredList():
> > +            d = self._realwaitOnDeferredList()
> > +            return d.addCallback(
> > +                lambda ignored: self.assertEqual([slave], reset_result_calls))
> > +        self.manager.waitOnDeferredList = _waitOnDeferredList
> > +
> 
> What you should be doing instead is always, always, always returning
> Deferreds from things that make Deferreds or call something that
> returns a Deferred.

There's no need to tell *me* that now :)

> 
> > +        self.manager.resumeAndDispatch(slave)
> > 
> >     def test_failed_to_resume_slave_ready_for_reset(self):
> >         # When a slave fails to resume, the manager has a Deferred in its
> > @@ -385,11 +343,12 @@
> > 
> >         # We only care about this one slave. Reset the list of manager
> >         # deferreds in case setUp did something unexpected.
> > -        self.manager._deferreds = []
> > -        # Restore the slaveDone method. It's very relevant to this test.
> > -        self.manager.slaveDone = self._realSlaveDone
> > -        self.manager.resumeAndDispatch([slave])
> > -        [d] = self.manager._deferreds
> > +        self.manager._deferred_list = []
> > +        # Restore the waitOnDeferredList method. It's very relevant to
> > +        # this test.
> > +        self.manager.waitOnDeferredList = self._realwaitOnDeferredList
> 
> The diff doesn't make it clear. I guess the setUp() method mutates
> waitOnDeferredList?

Unfortunately, yes :(

[snip]
> 
> I understand a lot of effort went into getting these tests to work. I
> appreciate that, but it's very hard to have confidence in them given
> the amount of monkey-patching that's going on and given the size of
> the tests.
> 
> I don't really know what to do about that. Sorry.

Same for me and mwh.  It needs a serious code re-org to start returning 
Deferreds in the right places, and a total test re-write.

But now is not the right time :(

> 
> > +class TestBuilddManager(TrialTestCase):
> > +
> > +    layer = LaunchpadZopelessLayer
> > +
> > +    def _stub_out_scheduleNextScanCycle(self):
> > +        # stub out the code that adds a callLater, so that later tests
> > +        # don't get surprises.
> > +        self._saved_schedule = SlaveScanner.scheduleNextScanCycle
> > +        def cleanup():
> > +            SlaveScanner.scheduleNextScanCycle = self._saved_schedule
> > +        self.addCleanup(cleanup)
> > +        SlaveScanner.scheduleNextScanCycle = FakeMethod
> > +
> 
> Really, the SlaveScanner object should take an optional 'clock'
> parameter that it uses to do reactor gibberish. See task.py and
> test_task.py in our tree for examples.

So I've replaced this code with:
  self.patch(SlaveScanner, 'scheduleNextScanCycle', FakeMethod)

and changed the calls to the reactor to use a TimeoutMixin so that I can
override the clock, which the tests now manipulate directly.
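
The tests then drive time explicitly, along these lines (a sketch with
invented plumbing - in this version the clock arrives as a constructor
argument):

  import logging
  from twisted.internet import task

  def test_scheduleNextScanCycle(self):
      # task.Clock() stands in for the reactor, so the test advances
      # time deterministically instead of spinning the real reactor.
      clock = task.Clock()
      scanner = SlaveScanner('bob', logging.getLogger(), clock=clock)
      scanner.startCycle = FakeMethod()
      scanner.scheduleNextScanCycle()
      clock.advance(SlaveScanner.SCAN_INTERVAL)
      self.assertEqual(1, scanner.startCycle.call_count)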

> 
> > +    def test_addScanForBuilders(self):
> > +        # Test that addScanForBuilders generates NewBuildersScanner objects.
> > +
> > +        self._stub_out_scheduleNextScanCycle()
> > +
> > +        manager = BuilddManager()
> > +        builder_names = [builder.name for builder in
> > +                         getUtility(IBuilderSet)]
> > +        scanners = manager.addScanForBuilders(builder_names)
> > +        scanner_names = [scanner.builder_name for scanner in scanners]
> > +        for builder in builder_names:
> > +            self.assertTrue(builder in scanner_names)
> 
> This test checks that builder_names is a subset of scanner_names. Is
> this actually what you want?

Good point, I've changed it to use set comparisons.
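
i.e. roughly:

  self.assertEqual(set(builder_names), set(scanner_names))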

> > +
> > +    def test_startService_adds_NewBuildersScanner(self):
> > +        # When startService is called, the manager will start up a
> > +        # NewBuildersScanner object.
> > +        self._stub_out_scheduleNextScanCycle()
> > +        NewBuildersScanner.SCAN_INTERVAL = 0
> > +        manager = BuilddManager()
> > +
> > +        # Replace scan() with FakeMethod so we can see if it was called.
> > +        manager.new_builders_scanner.scan = FakeMethod
> > +
> > +        def assertCalled():
> > +            self.failIf(manager.new_builders_scanner.scan.call_count == 0)
> > +
> > +        manager.startService()
> > +        reactor.callLater(0, assertCalled)
> > +
> 
> Why do you need to spin the reactor here?

I've converted this to use task.Clock() as before.


> > +
> > +class TestNewBuilders(TrialTestCase):
> > +    """Test detecting of new builders."""
> > +
> > +    layer = LaunchpadZopelessLayer
> > +
> > +    def _getScanner(self, manager=None):
> > +        return NewBuildersScanner(manager=manager)
> > +
> > +    def test_init_stores_existing_builders(self):
> > +        # Make sure that NewBuildersScanner initialises itself properly
> > +        # by storing a list of existing builders.
> > +        all_builders = [builder.name for builder in getUtility(IBuilderSet)]
> > +        builder_scanner = self._getScanner()
> > +        self.assertEqual(all_builders, builder_scanner.current_builders)
> > +
> > +    def test_scheduleScan(self):
> > +        # Test that scheduleScan calls the "scan" method.
> > +        builder_scanner = self._getScanner()
> > +        builder_scanner.SCAN_INTERVAL = 0
> > +
> > +        builder_scanner.scan = FakeMethod
> > +        builder_scanner.scheduleScan()
> > +
> > +        def assertCalled():
> > +            self.failIf(
> > +                builder_scanner.scan.call_count == 0,
> > +                "scheduleScan did not schedule anything")
> > +
> > +        reactor.callLater(0, assertCalled)
> > +
> 
> Why do you need to spin the reactor here?

Same here.

> 
> > +    def test_checkForNewBuilders(self):
> > +        # Test that checkForNewBuilders() detects a new builder
> > +
> > +        # The basic case, where no builders are added.
> > +        builder_scanner = self._getScanner()
> > +        self.failUnlessIdentical(None, builder_scanner.checkForNewBuilders())
> > +
> 
> I find assertIs more succinct.

It's not in Twisted TestCase!  assertIdentical would be the nearest 
equivalent.  It's moot now anyway since I can assertEqual on the empty list.
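
i.e. the no-new-builders case becomes:

  self.assertEqual([], builder_scanner.checkForNewBuilders())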

> 
> > +        # Add two builders and ensure they're returned.
> > +        new_builders = ["scooby", "lassie"]
> > +        factory = LaunchpadObjectFactory()
> > +        for builder_name in new_builders:
> > +            factory.makeBuilder(name=builder_name)
> > +        self.failUnlessEqual(
> > +            new_builders, builder_scanner.checkForNewBuilders())
> > +
> 
> Best to be consistent with assertEqual vs failUnlessEqual.

I fixed all these failXXX calls.

> 
> > +    def test_scan(self):
> > +        # See if scan detects new builders and schedules the next scan.
> > +
> > +        # stub out the addScanForBuilders and scheduleScan methods since
> > +        # they use callLater; we only want to assert that they get
> > +        # called.
> > +        def fake_checkForNewBuilders():
> > +            return "new_builders"
> > +
> > +        def fake_addScanForBuilders(new_builders):
> > +            self.failUnlessEqual("new_builders", new_builders)
> > +
> > +        def assertCalled():
> > +            self.failIf(
> > +                builder_scanner.scheduleScan.call_count == 0,
> > +                "scheduleScan did not get called")
> > +
> > +        builder_scanner = self._getScanner(BuilddManager())
> > +        builder_scanner.checkForNewBuilders = fake_checkForNewBuilders
> > +        builder_scanner.manager.addScanForBuilders = fake_addScanForBuilders
> > +        builder_scanner.scheduleScan = FakeMethod
> > +
> > +        reactor.callLater(0, builder_scanner.scan)
> > +        reactor.callLater(0, assertCalled)
> > +
> 
> I'm not clear why you need to spin the reactor here.

Again, fixed to use task.Clock()


*phew*

Please check the last few commits on the branch to see the changes.
bzr diff -r 10911..

-- 
https://code.launchpad.net/~julian-edwards/launchpad/buildd-manager-parallel-scan/+merge/30672
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~julian-edwards/launchpad/buildd-manager-parallel-scan into lp:launchpad/devel.



References