← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~cjwatson/launchpad/builder-version into lp:launchpad

 

Colin Watson has proposed merging lp:~cjwatson/launchpad/builder-version into lp:launchpad.

Commit message:
Show the launchpad-buildd version on the webservice and in the UI for each builder.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #680514 in Launchpad itself: "No way to discover version of launchpad-buildd installed on builders"
  https://bugs.launchpad.net/launchpad/+bug/680514

For more details, see:
https://code.launchpad.net/~cjwatson/launchpad/builder-version/+merge/192819

== Summary ==

Figuring out which version of launchpad-buildd is running involves either asking webops or guessing based on the heads of recent build logs.  I've been working on passing this information around in such a way that Launchpad can keep track of it and show it to us directly, and this is the last piece of that work.

== Proposed fix ==

Model Builder.version, export it, and show it on Builder:+index.  Fetch the builder_version entry from the builder status dictionary and update Builder.version with it as necessary.

== Implementation details ==

As a bonus, we now only ask building slaves for their status once per scan cycle rather than twice.

== LOC Rationale ==

+57.  I think this is worth it because it makes it much easier to tell what's going on when we're deploying new launchpad-buildd code.

I removed Builder:+portlet-details to try to mitigate the increase, since as far as I can tell it was unused.

== Tests ==

bin/test -vvct buildmaster -t soyuz

== Demo and Q/A ==

After deploying on dogfood and waiting a scan cycle, we should immediately see the current versions of launchpad-buildd on its non-disabled builders both on the API (Builder.version) and in the web UI (Builder:+index).  Get webops to upgrade one of the builders and check that Launchpad notices.
-- 
https://code.launchpad.net/~cjwatson/launchpad/builder-version/+merge/192819
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~cjwatson/launchpad/builder-version into lp:launchpad.
=== modified file 'lib/lp/buildmaster/interactor.py'
--- lib/lp/buildmaster/interactor.py	2013-10-09 00:02:50 +0000
+++ lib/lp/buildmaster/interactor.py	2013-10-27 14:41:29 +0000
@@ -214,7 +214,7 @@
 BuilderVitals = namedtuple(
     'BuilderVitals',
     ('name', 'url', 'virtualized', 'vm_host', 'builderok', 'manual',
-     'build_queue'))
+     'build_queue', 'version'))
 
 _BQ_UNSPECIFIED = object()
 
@@ -224,7 +224,7 @@
         build_queue = builder.currentjob
     return BuilderVitals(
         builder.name, builder.url, builder.virtualized, builder.vm_host,
-        builder.builderok, builder.manual, build_queue)
+        builder.builderok, builder.manual, build_queue, builder.version)
 
 
 class BuilderInteractor(object):
@@ -248,7 +248,8 @@
 
     @classmethod
     @defer.inlineCallbacks
-    def rescueIfLost(cls, vitals, slave, expected_cookie, logger=None):
+    def rescueIfLost(cls, vitals, slave, slave_status, expected_cookie,
+                     logger=None):
         """Reset the slave if its job information doesn't match the DB.
 
         This checks the build ID reported in the slave status against
@@ -261,9 +262,8 @@
             False otherwise.
         """
         # Determine the slave's current build cookie.
-        status_dict = yield slave.status_dict()
-        status = status_dict['builder_status']
-        slave_cookie = status_dict.get('build_id')
+        status = slave_status['builder_status']
+        slave_cookie = slave_status.get('build_id')
 
         if slave_cookie == expected_cookie:
             # The master and slave agree about the current job. Continue.
@@ -423,13 +423,13 @@
         defer.returnValue(candidate)
 
     @staticmethod
-    def extractBuildStatus(status_dict):
+    def extractBuildStatus(slave_status):
         """Read build status name.
 
-        :param status_dict: build status dict from BuilderSlave.status_dict.
+        :param slave_status: build status dict from BuilderSlave.status_dict.
         :return: the unqualified status name, e.g. "OK".
         """
-        status_string = status_dict['build_status']
+        status_string = slave_status['build_status']
         lead_string = 'BuildStatus.'
         assert status_string.startswith(lead_string), (
             "Malformed status string: '%s'" % status_string)
@@ -437,7 +437,8 @@
 
     @classmethod
     @defer.inlineCallbacks
-    def updateBuild(cls, vitals, slave, builder_factory, behavior_factory):
+    def updateBuild(cls, vitals, slave, slave_status, builder_factory,
+                    behavior_factory):
         """Verify the current build job status.
 
         Perform the required actions for each state.
@@ -448,8 +449,7 @@
         # impossible to get past rescueIfLost unless the slave matches
         # the DB, and this method isn't called unless the DB says
         # there's a job.
-        status = yield slave.status_dict()
-        builder_status = status['builder_status']
+        builder_status = slave_status['builder_status']
         if builder_status == 'BuilderStatus.BUILDING':
             # Build still building, collect the logtail.
             if vitals.build_queue.job.status != JobStatus.RUNNING:
@@ -458,7 +458,7 @@
                 raise AssertionError(
                     "Job not running when assigned and slave building.")
             vitals.build_queue.logtail = encoding.guess(
-                str(status.get('logtail')))
+                str(slave_status.get('logtail')))
             transaction.commit()
         elif builder_status == 'BuilderStatus.ABORTING':
             # Build is being aborted.
@@ -470,7 +470,8 @@
             builder = builder_factory[vitals.name]
             behavior = behavior_factory(vitals.build_queue, builder, slave)
             yield behavior.handleStatus(
-                vitals.build_queue, cls.extractBuildStatus(status), status)
+                vitals.build_queue, cls.extractBuildStatus(slave_status),
+                slave_status)
         else:
             raise AssertionError("Unknown status %s" % builder_status)
 

=== modified file 'lib/lp/buildmaster/interfaces/builder.py'
--- lib/lp/buildmaster/interfaces/builder.py	2013-10-02 05:47:26 +0000
+++ lib/lp/buildmaster/interfaces/builder.py	2013-10-27 14:41:29 +0000
@@ -160,6 +160,10 @@
         title=_('Failure Count'), required=False, default=0,
        description=_("Number of consecutive failures for this builder.")))
 
+    version = exported(Text(
+        title=_('Version'), required=False,
+        description=_('The version of launchpad-buildd on the slave.')))
+
     def gotFailure():
         """Increment failure_count on the builder."""
 

=== modified file 'lib/lp/buildmaster/manager.py'
--- lib/lp/buildmaster/manager.py	2013-10-10 07:54:44 +0000
+++ lib/lp/buildmaster/manager.py	2013-10-27 14:41:29 +0000
@@ -385,6 +385,13 @@
             self._cached_build_queue = vitals.build_queue
         return self._cached_build_cookie
 
+    def updateVersion(self, vitals, slave_status):
+        """Update the DB's record of the slave version if necessary."""
+        version = slave_status.get("builder_version")
+        if version != vitals.version:
+            self.builder_factory[self.builder_name].version = version
+            transaction.commit()
+
     @defer.inlineCallbacks
     def scan(self):
         """Probe the builder and update/dispatch/collect as appropriate.
@@ -395,6 +402,11 @@
         vitals = self.builder_factory.getVitals(self.builder_name)
         interactor = self.interactor_factory()
         slave = self.slave_factory(vitals)
+        if vitals.builderok:
+            self.logger.debug("Scanning %s." % self.builder_name)
+            slave_status = yield slave.status_dict()
+        else:
+            slave_status = None
 
         # Confirm that the DB and slave sides are in a valid, mutually
         # agreeable state.
@@ -402,12 +414,14 @@
         if not vitals.builderok:
             lost_reason = '%s is disabled' % vitals.name
         else:
-            self.logger.debug("Scanning %s." % self.builder_name)
+            self.updateVersion(vitals, slave_status)
             cancelled = yield self.checkCancellation(vitals, slave, interactor)
             if cancelled:
                 return
+            assert slave_status is not None
             lost = yield interactor.rescueIfLost(
-                vitals, slave, self.getExpectedCookie(vitals), self.logger)
+                vitals, slave, slave_status, self.getExpectedCookie(vitals),
+                self.logger)
             if lost:
                 lost_reason = '%s is lost' % vitals.name
 
@@ -430,8 +444,10 @@
         if vitals.build_queue is not None:
             # Scan the slave and get the logtail, or collect the build
             # if it's ready.  Yes, "updateBuild" is a bad name.
+            assert slave_status is not None
             yield interactor.updateBuild(
-                vitals, slave, self.builder_factory, self.behavior_factory)
+                vitals, slave, slave_status, self.builder_factory,
+                self.behavior_factory)
         elif vitals.manual:
             # If the builder is in manual mode, don't dispatch anything.
             self.logger.debug(

=== modified file 'lib/lp/buildmaster/model/builder.py'
--- lib/lp/buildmaster/model/builder.py	2013-10-10 06:31:30 +0000
+++ lib/lp/buildmaster/model/builder.py	2013-10-27 14:41:29 +0000
@@ -83,6 +83,7 @@
     vm_host = StringCol(dbName='vm_host')
     active = BoolCol(dbName='active', notNull=True, default=True)
     failure_count = IntCol(dbName='failure_count', default=0, notNull=True)
+    version = StringCol(dbName='version')
 
     # The number of times a builder can consecutively fail before we
     # reset its current job.

=== modified file 'lib/lp/buildmaster/tests/mock_slaves.py'
--- lib/lp/buildmaster/tests/mock_slaves.py	2013-10-08 11:56:24 +0000
+++ lib/lp/buildmaster/tests/mock_slaves.py	2013-10-27 14:41:29 +0000
@@ -47,7 +47,8 @@
     """Emulates a IBuilder class."""
 
     def __init__(self, name='mock-builder', builderok=True, manual=False,
-                 virtualized=True, vm_host=None, url='http://fake:0000'):
+                 virtualized=True, vm_host=None, url='http://fake:0000',
+                 version=None):
         self.currentjob = None
         self.builderok = builderok
         self.manual = manual
@@ -56,6 +57,7 @@
         self.virtualized = virtualized
         self.vm_host = vm_host
         self.failnotes = None
+        self.version = version
 
     def failBuilder(self, reason):
         self.builderok = False
@@ -69,12 +71,16 @@
 
     The architecture tag can be customised during initialization."""
 
-    def __init__(self, arch_tag=I386_ARCHITECTURE_NAME):
+    def __init__(self, arch_tag=I386_ARCHITECTURE_NAME, version=None):
         self.call_log = []
         self.arch_tag = arch_tag
+        self.version = version
 
     def status_dict(self):
-        return defer.succeed({'builder_status': 'BuilderStatus.IDLE'})
+        slave_status = {'builder_status': 'BuilderStatus.IDLE'}
+        if self.version is not None:
+            slave_status['builder_version'] = self.version
+        return defer.succeed(slave_status)
 
     def ensurepresent(self, sha1, url, user=None, password=None):
         self.call_log.append(('ensurepresent', url, user, password))

=== modified file 'lib/lp/buildmaster/tests/test_interactor.py'
--- lib/lp/buildmaster/tests/test_interactor.py	2013-10-09 00:02:50 +0000
+++ lib/lp/buildmaster/tests/test_interactor.py	2013-10-27 14:41:29 +0000
@@ -149,20 +149,20 @@
         slave = BuilderInteractor.makeSlaveFromVitals(vitals)
         self.assertEqual(5, slave.timeout)
 
+    @defer.inlineCallbacks
     def test_rescueIfLost_aborts_lost_and_broken_slave(self):
         # A slave that's 'lost' should be aborted; when the slave is
         # broken then abort() should also throw a fault.
         slave = LostBuildingBrokenSlave()
-        d = BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), slave, 'trivial')
-
-        def check_slave_status(failure):
+        slave_status = yield slave.status_dict()
+        try:
+            yield BuilderInteractor.rescueIfLost(
+                extract_vitals_from_db(MockBuilder()), slave, slave_status,
+                'trivial')
+        except xmlrpclib.Fault:
             self.assertIn('abort', slave.call_log)
-            # 'Fault' comes from the LostBuildingBrokenSlave, this is
-            # just testing that the value is passed through.
-            self.assertIsInstance(failure.value, xmlrpclib.Fault)
-
-        return d.addBoth(check_slave_status)
+        else:
+            self.fail("xmlrpclib.Fault not raised")
 
     @defer.inlineCallbacks
     def test_recover_idle_slave(self):
@@ -170,8 +170,10 @@
         # idle. SlaveScanner.scan() will clean up the DB side, because
         # we still report that it's lost.
         slave = OkSlave()
+        slave_status = yield slave.status_dict()
         lost = yield BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), slave, 'trivial')
+            extract_vitals_from_db(MockBuilder()), slave, slave_status,
+            'trivial')
         self.assertTrue(lost)
         self.assertEqual([], slave.call_log)
 
@@ -179,8 +181,9 @@
     def test_recover_ok_slave(self):
         # An idle slave that's meant to be idle is not rescued.
         slave = OkSlave()
+        slave_status = yield slave.status_dict()
         lost = yield BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), slave, None)
+            extract_vitals_from_db(MockBuilder()), slave, slave_status, None)
         self.assertFalse(lost)
         self.assertEqual([], slave.call_log)
 
@@ -189,8 +192,10 @@
         # rescueIfLost does not attempt to abort or clean a builder that is
         # WAITING.
         waiting_slave = WaitingSlave(build_id='trivial')
+        slave_status = yield waiting_slave.status_dict()
         lost = yield BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), waiting_slave, 'trivial')
+            extract_vitals_from_db(MockBuilder()), waiting_slave, slave_status,
+            'trivial')
         self.assertFalse(lost)
         self.assertEqual(['status_dict'], waiting_slave.call_log)
 
@@ -202,8 +207,10 @@
         # builder is reset for a new build, and the corrupt build is
         # discarded.
         waiting_slave = WaitingSlave(build_id='non-trivial')
+        slave_status = yield waiting_slave.status_dict()
         lost = yield BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), waiting_slave, 'trivial')
+            extract_vitals_from_db(MockBuilder()), waiting_slave, slave_status,
+            'trivial')
         self.assertTrue(lost)
         self.assertEqual(['status_dict', 'clean'], waiting_slave.call_log)
 
@@ -212,8 +219,10 @@
         # rescueIfLost does not attempt to abort or clean a builder that is
         # BUILDING.
         building_slave = BuildingSlave(build_id='trivial')
+        slave_status = yield building_slave.status_dict()
         lost = yield BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), building_slave, 'trivial')
+            extract_vitals_from_db(MockBuilder()), building_slave,
+            slave_status, 'trivial')
         self.assertFalse(lost)
         self.assertEqual(['status_dict'], building_slave.call_log)
 
@@ -222,8 +231,10 @@
         # If a slave is BUILDING with a build id we don't recognize, then we
         # abort the build, thus stopping it in its tracks.
         building_slave = BuildingSlave(build_id='non-trivial')
+        slave_status = yield building_slave.status_dict()
         lost = yield BuilderInteractor.rescueIfLost(
-            extract_vitals_from_db(MockBuilder()), building_slave, 'trivial')
+            extract_vitals_from_db(MockBuilder()), building_slave,
+            slave_status, 'trivial')
         self.assertTrue(lost)
         self.assertEqual(['status_dict', 'abort'], building_slave.call_log)
 

=== modified file 'lib/lp/buildmaster/tests/test_manager.py'
--- lib/lp/buildmaster/tests/test_manager.py	2013-10-11 12:08:28 +0000
+++ lib/lp/buildmaster/tests/test_manager.py	2013-10-27 14:41:29 +0000
@@ -451,6 +451,31 @@
         self.assertEqual(build.status, BuildStatus.NEEDSBUILD)
 
     @defer.inlineCallbacks
+    def test_update_slave_version(self):
+        # If the reported slave version differs from the DB's record of it,
+        # then scanning the builder updates the DB.
+        slave = OkSlave(version="100")
+        builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
+        builder.version = "99"
+        self._resetBuilder(builder)
+        self.patch(BuilderSlave, 'makeBuilderSlave', FakeMethod(slave))
+        scanner = self._getScanner()
+        yield scanner.scan()
+        self.assertEqual("100", builder.version)
+
+    def test_updateVersion_no_op(self):
+        # If the slave version matches the DB, then updateVersion does not
+        # touch the DB.
+        builder = getUtility(IBuilderSet)[BOB_THE_BUILDER_NAME]
+        builder.version = "100"
+        transaction.commit()
+        vitals = extract_vitals_from_db(builder)
+        scanner = self._getScanner()
+        with StormStatementRecorder() as recorder:
+            scanner.updateVersion(vitals, {"builder_version": "100"})
+        self.assertThat(recorder, HasQueryCount(Equals(0)))
+
+    @defer.inlineCallbacks
     def test_cancelling_a_build(self):
         # When scanning an in-progress build, if its state is CANCELLING
         # then the build should be aborted, and eventually stopped and moved

=== modified file 'lib/lp/soyuz/browser/configure.zcml'
--- lib/lp/soyuz/browser/configure.zcml	2013-09-11 06:05:44 +0000
+++ lib/lp/soyuz/browser/configure.zcml	2013-10-27 14:41:29 +0000
@@ -430,9 +430,6 @@
         <browser:page
             template="../templates/builder-index.pt"
             name="+index"/>
-        <browser:page
-            template="../templates/builder-portlet-details.pt"
-            name="+portlet-details"/>
     </browser:pages>
     <browser:pages
         for="lp.buildmaster.interfaces.builder.IBuilder"

=== modified file 'lib/lp/soyuz/stories/soyuz/xx-builder-page.txt'
--- lib/lp/soyuz/stories/soyuz/xx-builder-page.txt	2013-09-27 04:13:23 +0000
+++ lib/lp/soyuz/stories/soyuz/xx-builder-page.txt	2013-10-27 14:41:29 +0000
@@ -1,9 +1,18 @@
-= Builder page =
+Builder page
+============
 
 An anonymous user visits the +builds page. He can see a summary of the
 builder state. In the sampledata, the builder 'bob' is building
 'mozilla-firefox'.
 
+    >>> from zope.component import getUtility
+    >>> from lp.buildmaster.interfaces.builder import IBuilderSet
+
+    >>> login('foo.bar@xxxxxxxxxxxxx')
+    >>> bob_builder = getUtility(IBuilderSet)['bob']
+    >>> bob_builder.version = "100"
+    >>> logout()
+
     >>> anon_browser.mech_browser.set_handle_equiv(False)
     >>> anon_browser.open("http://launchpad.dev/builders";)
     >>> anon_browser.getLink("bob").click()
@@ -51,13 +60,15 @@
     Owner: Launchpad Buildd Admins
     Mode: This builder is in auto-mode and accepting jobs from the
     auto-build system.
+    Version: 100
 
 The builder status is now displayed as normal text and not as as a
 notification alert.
 
     >>> print_feedback_messages(anon_browser.contents)
 
-== Builder Actions ==
+Builder Actions
+---------------
 
 All builder actions require authorization, some launchpad.Edit, some
 launchpad.Admin, both are only granted to members of
@@ -144,6 +155,7 @@
     Owner: Launchpad Buildd Admins
     Mode: This builder is in auto-mode and accepting jobs from the
     auto-build system. Switch to manual-mode
+    Version: 100
 
 He clicks on the Toggle mode button to put the builder into manual mode.
 
@@ -158,6 +170,7 @@
     ...
     Mode: This builder is in manual-mode and not accepting jobs from the
     auto-build system. Switch to auto-mode
+    ...
 
 And a relevant notification is displayed after the mode toggle.
 
@@ -184,7 +197,8 @@
     The builder &quot;Bob The Builder&quot; was updated successfully.
 
 
-== Marking a builder as inactive ==
+Marking a builder as inactive
+-----------------------------
 
 The builder administrators can hide a builder from the public list
 when they judge it convenient, for instance, when the builder present
@@ -237,7 +251,8 @@
     Bob The Builder : Build Farm
 
 
-== Actions permissions ==
+Actions permissions
+-------------------
 
 Normal users, such as No Privileges Person are not shown the
 Change details link.

=== modified file 'lib/lp/soyuz/stories/webservice/xx-builders.txt'
--- lib/lp/soyuz/stories/webservice/xx-builders.txt	2012-05-29 22:34:12 +0000
+++ lib/lp/soyuz/stories/webservice/xx-builders.txt	2013-10-27 14:41:29 +0000
@@ -35,6 +35,7 @@
     name
     title
     url
+    version
     virtualized
     vm_host
 

=== modified file 'lib/lp/soyuz/templates/builder-index.pt'
--- lib/lp/soyuz/templates/builder-index.pt	2012-03-10 13:48:37 +0000
+++ lib/lp/soyuz/templates/builder-index.pt	2013-10-27 14:41:29 +0000
@@ -70,6 +70,10 @@
                   Switch to x-mode</button>
               </form>
             </dd>
+            <tal:version tal:condition="context/version">
+              <dt>Version:</dt>
+              <dd tal:content="context/version" />
+            </tal:version>
           </dl>
         </div><!-- portlet -->
       </div><!-- yui-u -->

=== removed file 'lib/lp/soyuz/templates/builder-portlet-details.pt'
--- lib/lp/soyuz/templates/builder-portlet-details.pt	2012-07-06 06:02:33 +0000
+++ lib/lp/soyuz/templates/builder-portlet-details.pt	1970-01-01 00:00:00 +0000
@@ -1,24 +0,0 @@
-<div
-  xmlns:tal="http://xml.zope.org/namespaces/tal";
-  xmlns:metal="http://xml.zope.org/namespaces/metal";
-  xmlns:i18n="http://xml.zope.org/namespaces/i18n";
-  class="portlet" id="portlet-details">
-   <h2><span tal:replace="context/title">BUILDER</span> details</h2>
-
-     <b>Name</b>: <span tal:replace="context/title"/><br/>
-
-     <b>Architecture</b>: <span tal:replace="context/processor/title"/><br/>
-
-     <b>Location</b>: <span tal:replace="context/url"/><br/>
-
-     <b>Virtual</b>: <span tal:replace="context/virtualized"/><br/>
-
-     <b>Description</b>
-     <div tal:content="structure context/description/fmt:text-to-html"/>
-
-     <b>Status</b>
-     <div tal:content="structure context/status/fmt:text-to-html"/>
-
-     <b>Owner</b>
-     <a tal:replace="structure context/owner/fmt:link">OWNER</a>
-</div>


Follow ups