← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~julian-edwards/launchpad/double-build-bug-705342 into lp:launchpad

 

Julian Edwards has proposed merging lp:~julian-edwards/launchpad/double-build-bug-705342 into lp:launchpad with lp:~jml/launchpad/failing-tests-bug-711209 as a prerequisite.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~julian-edwards/launchpad/double-build-bug-705342/+merge/48219

The attached bug describes how some build logs end up with two apparently separate builds in them.  This happens when a buildd-admin kills the builder's processes off but doesn't kill everything quite properly for whatever reason.  Normally the buildd-manager will handle this fine for virtual builders, as when it sees an ABORTED state it resets the builder.

However for nonvirtual builders we just try and clean the slave which is clamitous due to a bug in sbuild which won't kill the actual build process off, thus leaving the builder open to receiving a second build.

This change just fails the (nonvirtual) builder if it gets into this state, since there's nothing we can do on the manager side.
-- 
https://code.launchpad.net/~julian-edwards/launchpad/double-build-bug-705342/+merge/48219
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~julian-edwards/launchpad/double-build-bug-705342 into lp:launchpad.
=== modified file 'lib/lp/buildmaster/model/builder.py'
--- lib/lp/buildmaster/model/builder.py	2010-12-06 04:41:45 +0000
+++ lib/lp/buildmaster/model/builder.py	2011-02-01 18:20:53 +0000
@@ -338,6 +338,16 @@
         # communications with the slave and the build manager had to reset
         # the job.
         if status == 'BuilderStatus.ABORTED' and builder.currentjob is None:
+            if not builder.virtualized:
+                # We can't reset non-virtual builders reliably as the
+                # abort() function doesn't kill the actual build job,
+                # only the sbuild process!  All we can do here is fail
+                # the builder with a message indicating the problem and
+                # wait for an admin to reboot it.
+                builder.failBuilder(
+                    "Non-virtual builder in ABORTED state, requires admin to "
+                    "restart")
+                return "dummy status"
             if logger is not None:
                 logger.info(
                     "Builder '%s' being cleaned up from ABORTED" %

=== modified file 'lib/lp/buildmaster/tests/mock_slaves.py'
--- lib/lp/buildmaster/tests/mock_slaves.py	2010-11-29 14:51:07 +0000
+++ lib/lp/buildmaster/tests/mock_slaves.py	2011-02-01 18:20:53 +0000
@@ -239,11 +239,11 @@
     """A mock slave that looks like it's aborted."""
 
     def clean(self):
-        self.call_log.append('status')
+        self.call_log.append('clean')
         return defer.succeed(None)
 
     def status(self):
-        self.call_log.append('clean')
+        self.call_log.append('status')
         return defer.succeed(('BuilderStatus.ABORTED', '1-1'))
 
 

=== modified file 'lib/lp/buildmaster/tests/test_builder.py'
--- lib/lp/buildmaster/tests/test_builder.py	2010-12-20 03:21:03 +0000
+++ lib/lp/buildmaster/tests/test_builder.py	2011-02-01 18:20:53 +0000
@@ -279,9 +279,10 @@
         builder = removeSecurityProxy(self.factory.makeBuilder())
         self.assertEqual(builder.url, builder.slave.url)
 
-    def test_recovery_of_aborted_slave(self):
-        # If a slave is in the ABORTED state, rescueBuilderIfLost should
-        # clean it if we don't think it's currently building anything.
+    def test_recovery_of_aborted_virtual_slave(self):
+        # If a virtual_slave is in the ABORTED state,
+        # rescueBuilderIfLost should clean it if we don't think it's
+        # currently building anything.
         # See bug 463046.
         aborted_slave = AbortedSlave()
         builder = MockBuilder("mock_builder", aborted_slave)
@@ -291,6 +292,21 @@
             self.assertIn('clean', aborted_slave.call_log)
         return d.addCallback(check_slave_calls)
 
+    def test_recovery_of_aborted_nonvirtual_slave(self):
+        # Nonvirtual slaves in the ABORTED state cannot be reliably
+        # cleaned since the sbuild process doesn't properly kill the
+        # build job.  We test that the builder is marked failed.
+        aborted_slave = AbortedSlave()
+        builder = MockBuilder("mock_builder", aborted_slave)
+        builder.currentjob = None
+        builder.virtualized = False
+        builder.builderok = True
+        d = builder.rescueIfLost()
+        def check_failed(ignored):
+            self.assertFalse(builder.builderok)
+            self.assertNotIn('clean', aborted_slave.call_log)
+        return d.addCallback(check_failed)
+
     def test_recover_ok_slave(self):
         # An idle slave is not rescued.
         slave = OkSlave()