← Back to team overview

launchpad-reviewers team mailing list archive

Re: [Merge] lp:~maxiberta/launchpad/bing-search into lp:launchpad

 

Review: Needs Fixing

This looks like a good start, thanks.  This is just a first-pass review; as you say, it could use a good deal of code deduplication, probably in some number of preparatory branches, and I think it needs some thought about how we'll deploy the switch.

Diff comments:

> 
> === modified file 'configs/development/launchpad-lazr.conf'
> --- configs/development/launchpad-lazr.conf	2018-02-02 15:29:38 +0000
> +++ configs/development/launchpad-lazr.conf	2018-03-17 20:34:09 +0000
> @@ -86,6 +86,15 @@
>  [google_test_service]
>  launch: True
>  
> +[bing]
> +# Development and the testrunner should use the stub service be default.

s/be/by/

> +site: http://launchpad.dev:8093/bingcustomsearch/v7.0/search
> +subscription_key: abcdef01234567890abcdef012345678
> +custom_config_id: 1234567890
> +
> +[bing_test_service]
> +launch: True
> +
>  [gpghandler]
>  host: keyserver.launchpad.dev
>  public_host: keyserver.launchpad.dev
> 
> === modified file 'lib/lp/app/browser/root.py'
> --- lib/lp/app/browser/root.py	2018-03-16 14:50:01 +0000
> +++ lib/lp/app/browser/root.py	2018-03-17 20:34:09 +0000
> @@ -510,24 +517,24 @@
>      def searchPages(self, query_terms, start=0):
>          """Return the up to 20 pages that match the query_terms, or None.
>  
> -        :param query_terms: The unescaped terms to query Google.
> +        :param query_terms: The unescaped terms to query for.
>          :param start: The index of the page that starts the set of pages.
> -        :return: A GooglBatchNavigator or None.
> +        :return: A SearchResultsBatchNavigator or None.
>          """
>          if query_terms in [None, '']:
>              return None
> -        google_search = getUtility(ISearchService)
> +        site_search = getUtility(ISearchService)

AIUI this allows the Bing implementation to be swapped in by changing lib/lp/services/sitesearch/configure.zcml - is that right?  I think ideally we'd register both utilities with name="google" and name="bing", and then have a feature flag to switch our preferred implementation; that way we could flip the switch at run-time.

>          try:
> -            page_matches = google_search.search(
> +            page_matches = site_search.search(
>                  terms=query_terms, start=start)
> -        except GoogleResponseError:
> -            # There was a connectivity or Google service issue that means
> +        except (BingResponseError, GoogleResponseError):

Perhaps introduce a superclass - SiteSearchResponseError or something?

> +            # There was a connectivity or the search service issue that means
>              # there is no data available at this moment.
>              self.has_page_service = False
>              return None
>          if len(page_matches) == 0:
>              return None
> -        navigator = GoogleBatchNavigator(
> +        navigator = SearchResultsBatchNavigator(
>              page_matches, self.request, start=start)
>          navigator.setHeadings(*self.batch_heading)
>          return navigator
> @@ -589,7 +596,7 @@
>          return self.start + len(self.list._window)
>  
>  
> -class GoogleBatchNavigator(BatchNavigator):
> +class SearchResultsBatchNavigator(BatchNavigator):

This seems to be SiteSearchBatchNavigator in other places, and I like that name better (since SearchResults is overly-generic).

>      """A batch navigator with a fixed size of 20 items per batch."""
>  
>      _batch_factory = WindowedListBatch
> 
> === modified file 'lib/lp/app/browser/tests/test_views.py'
> --- lib/lp/app/browser/tests/test_views.py	2011-12-28 17:03:06 +0000
> +++ lib/lp/app/browser/tests/test_views.py	2018-03-17 20:34:09 +0000
> @@ -11,7 +11,7 @@
>  
>  from lp.testing.layers import (
>      DatabaseFunctionalLayer,
> -    GoogleLaunchpadFunctionalLayer,
> +    BingLaunchpadFunctionalLayer,

Sort.

>      )
>  from lp.testing.systemdocs import (
>      LayeredDocFileSuite,
> @@ -26,7 +26,7 @@
>  # that require something special like the librarian or mailman must run
>  # on a layer that sets those services up.
>  special_test_layer = {
> -    'launchpad-search-pages.txt': GoogleLaunchpadFunctionalLayer,
> +    'launchpad-search-pages.txt': BingLaunchpadFunctionalLayer,

I kind of feel that it would be nice to continue testing both until we remove the Google support, but if that's an excessively large amount of work it probably isn't strictly necessary.

>      }
>  
>  
> 
> === modified file 'lib/lp/services/config/schema-lazr.conf'
> --- lib/lp/services/config/schema-lazr.conf	2018-03-16 14:02:16 +0000
> +++ lib/lp/services/config/schema-lazr.conf	2018-03-17 20:34:09 +0000
> @@ -794,6 +794,42 @@
>  # Example: help.launchpad.net login.launchapd.net
>  url_rewrite_exceptions: help.launchpad.net
>  
> +[bing_test_service]
> +# Run a web service stub that simulates the Bing search service.
> +
> +# Where are our canned JSON responses stored?
> +canned_response_directory: lib/lp/services/sitesearch/tests/data/
> +
> +# Which file maps service URLs to the JSON that the server returns?
> +mapfile: lib/lp/services/sitesearch/tests/data/bingsearchservice-mapping.txt
> +
> +# Where should the service log files live?
> +log: logs/bing-stub.log
> +
> +# Do we actually want to run the service?
> +launch: False
> +
> +[bing]
> +# site is the host and path that search requests are made to.
> +# eg. https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search
> +# datatype: string, a url to a host
> +site: https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search
> +
> +# subscription_key is the Cognitive Services subscription key for
> +# Bing Custom Search API.
> +# datatype: string
> +subscription_key:
> +
> +# custom_config_id is the id that identifies the custom search instance.
> +# datatype: string
> +custom_config_id:
> +
> +# url_rewrite_exceptions is a list of launchpad.net domains that must
> +# not be rewritten.
> +# datatype: string of space separated domains
> +# Example: help.launchpad.net login.launchapd.net

s/launchapd/launchpad/

(Feel free to correct the same typo in the Google section above, of course.)

> +url_rewrite_exceptions: help.launchpad.net
> +
>  [gpghandler]
>  # Should we allow uploading keys to the keyserver?
>  # datatype: boolean
> 
> === modified file 'lib/lp/services/sitesearch/__init__.py'
> --- lib/lp/services/sitesearch/__init__.py	2018-03-16 14:02:16 +0000
> +++ lib/lp/services/sitesearch/__init__.py	2018-03-17 20:34:09 +0000
> @@ -337,3 +340,161 @@
>              raise GoogleWrongGSPVersion(
>                  "Could not get any PageMatches from the GSP XML response.")
>          return PageMatches(page_matches, start, total)
> +
> +
> +@implementer(ISearchService)
> +class BingSearchService:
> +    """See `ISearchService`.
> +
> +    A search service that search Bing for launchpad.net pages.

"that searches Bing"

> +    """
> +
> +    _default_values = {
> +        'customConfig': None,
> +        'mkt': 'en-US',  # FIXME: set language based on the current request?

Please use XXX for comments like this, and sign them (https://dev.launchpad.net/PolicyandProcess/XXXPolicy).

> +        'count': 20,
> +        'offset': 0,
> +        'q': None,
> +        }
> +
> +    @property
> +    def subscription_key(self):
> +        """The subscription key issued by Bing Custom Search."""
> +        return config.bing.subscription_key
> +
> +    @property
> +    def custom_config_id(self):
> +        """The custom search instance as configured in Bing Custom Search."""
> +        return config.bing.custom_config_id
> +
> +    @property
> +    def site(self):
> +        """The URL to the Bing Custom Search service.
> +
> +        The URL is probably
> +        https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search.
> +        """
> +        return config.bing.site
> +
> +    def search(self, terms, start=0):
> +        """See `ISearchService`.
> +
> +        The `subscription_key` and `custom_config_id` are used in the
> +        search request. Search returns 20 or fewer results for each query.
> +        For terms that match more than 20 results, the start param can be
> +        used over multiple queries to get successive sets of results.
> +
> +        :return: `ISearchResults` (PageMatches).
> +        :raise: `BingResponseError` if the json response is incomplete or
> +            cannot be parsed.
> +        """
> +        search_url = self.create_search_url(terms, start=start)
> +        search_headers = self.create_search_headers()
> +        request = get_current_browser_request()
> +        timeline = get_request_timeline(request)
> +        action = timeline.start("bing-search-api", search_url)
> +        try:
> +            response = urlfetch(search_url, headers=search_headers)
> +        except (TimeoutError, requests.RequestException) as error:
> +            raise BingResponseError("The response errored: %s" % str(error))
> +        finally:
> +            action.finish()
> +        page_matches = self._parse_bing_response(response.content, start)
> +        return page_matches
> +
> +    def _checkParameter(self, name, value, is_int=False):
> +        """Check that a parameter value is not None or an empty string."""
> +        if value in (None, ''):
> +            raise AssertionError("Missing value for parameter '%s'." % name)
> +        if is_int:
> +            try:
> +                int(value)
> +            except ValueError:
> +                raise AssertionError(
> +                    "Value for parameter '%s' is not an int." % name)

Isn't this user-submitted data?  I don't think that should result in an AssertionError (which is for programming errors).

> +
> +    def create_search_url(self, terms, start=0):
> +        """Return a Bing Custom Search search url."""
> +        self._checkParameter('q', terms)
> +        self._checkParameter('offset', start, is_int=True)
> +        self._checkParameter('customConfig', self.custom_config_id)
> +        safe_terms = urllib.quote_plus(terms.encode('utf8'))
> +        search_params = dict(self._default_values)
> +        search_params['q'] = safe_terms
> +        search_params['offset'] = start
> +        search_params['customConfig'] = self.custom_config_id
> +        search_param_list = [
> +            '%s=%s' % (name, value)
> +            for name, value in sorted(search_params.items())]
> +        query_string = '&'.join(search_param_list)

This looks like a longhand version of `urllib.urlencode` (and using that would also avoid the need to manually `quote_plus` things).

> +        return self.site + '?' + query_string
> +
> +    def create_search_headers(self):
> +        """Return a dict with Bing Custom Search compatible request headers."""
> +        self._checkParameter('subscription_key', self.subscription_key)
> +        return {
> +            'Ocp-Apim-Subscription-Key': self.subscription_key,
> +            }
> +
> +    def _parse_bing_response(self, bing_json, start=0):
> +        """Return a `PageMatches` object.
> +
> +        :param bing_json: A string containing Bing Custom Search API v7 JSON.
> +        :return: `ISearchResults` (PageMatches).
> +        :raise: `BingResponseError` if the json response is incomplete or
> +            cannot be parsed.
> +        """
> +        try:
> +            bing_doc = json.loads(bing_json)
> +        except (TypeError, ValueError):
> +            raise BingResponseError("The response was incomplete, no JSON.")
> +
> +        try:
> +            response_type = bing_doc['_type']
> +        except (AttributeError, KeyError, ValueError):
> +            raise BingResponseError(
> +                "Could not get the '_type' from the Bing JSON response.")
> +
> +        if response_type == 'ErrorResponse':
> +            try:
> +                errors = [error['message'] for error in bing_doc['errors']]
> +                raise BingResponseError(
> +                    "Error response from Bing: %s" % '; '.join(errors))
> +            except (AttributeError, KeyError, TypeError, ValueError):
> +                raise BingResponseError(
> +                    "Unable to parse the Bing JSON error response.")
> +        elif response_type != 'SearchResponse':
> +            raise BingResponseError(
> +                "Unknown Bing JSON response type: '%s'." % response_type)
> +
> +        page_matches = []
> +        total = 0
> +        try:
> +            results = bing_doc['webPages']['value']
> +        except (AttributeError, KeyError, ValueError):
> +            # Bing did not match any pages. Return an empty PageMatches.
> +            return PageMatches(page_matches, start, total)
> +
> +        try:
> +            total = int(bing_doc['webPages']['totalEstimatedMatches'])
> +        except (AttributeError, KeyError, ValueError):
> +            # The datatype is not what PageMatches requires.
> +            raise BingResponseError(
> +                "Could not get the total from the Bing JSON response.")
> +        if total < 0:
> +            # See bug 683115.
> +            total = 0
> +        for result in results:
> +            url = result.get('url')
> +            title = result.get('name')
> +            summary = result.get('snippet')
> +            if None in (url, title, summary):
> +                # There is not enough data to create a PageMatch object.
> +                # This can be caused by an empty title or summary which
> +                # has been observed for pages that are from vhosts that
> +                # should not be indexed.
> +                continue
> +            summary = summary.replace('<br>', '')
> +            page_matches.append(PageMatch(title, url, summary))
> +
> +        return PageMatches(page_matches, start, total)
> 
> === added file 'lib/lp/services/sitesearch/bingtestservice.py'
> --- lib/lp/services/sitesearch/bingtestservice.py	1970-01-01 00:00:00 +0000
> +++ lib/lp/services/sitesearch/bingtestservice.py	2018-03-17 20:34:09 +0000
> @@ -0,0 +1,271 @@
> +#!/usr/bin/python
> +#
> +# Copyright 2018 Canonical Ltd.  This software is licensed under the
> +# GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +"""
> +This script runs a simple HTTP server. The server returns JSON files
> +when given certain user-configurable URLs.
> +"""
> +
> +
> +from BaseHTTPServer import (
> +    BaseHTTPRequestHandler,
> +    HTTPServer,
> +    )

Might be worth using `from six.moves.BaseHTTPServer import ...` while you're here, to avoid having to port this later.

> +import errno
> +import logging
> +import os
> +import signal
> +import socket
> +import subprocess
> +import time
> +
> +from lp.services.config import config
> +from lp.services.osutils import ensure_directory_exists
> +from lp.services.pidfile import (
> +    get_pid,
> +    make_pidfile,
> +    pidfile_path,
> +    )
> +from lp.services.webapp.url import urlsplit
> +
> +# Set up basic logging.
> +log = logging.getLogger(__name__)
> +
> +# The default service name, used by the Launchpad service framework.
> +service_name = 'bing-webservice'
> +
> +
> +class BingRequestHandler(BaseHTTPRequestHandler):
> +    """Return an JSON file depending on the requested URL."""
> +
> +    default_content_type = 'application/json; charset=utf-8'
> +
> +    def do_GET(self):
> +        """See BaseHTTPRequestHandler in the Python Standard Library."""
> +        urlmap = url_to_json_map()
> +        if self.path in urlmap:
> +            self.return_file(urlmap[self.path])
> +        else:
> +            # Return our default route.
> +            self.return_file(urlmap['*'])
> +
> +    def return_file(self, filename):
> +        """Return a HTTP response with 'filename' for content.
> +
> +        :param filename: The file name to find in the canned-data
> +            storage location.
> +        """
> +        self.send_response(200)
> +        self.send_header('Content-Type', self.default_content_type)
> +        self.end_headers()
> +
> +        content_dir = config.bing_test_service.canned_response_directory
> +        filepath = os.path.join(content_dir, filename)
> +        content_body = file(filepath).read()
> +        self.wfile.write(content_body)
> +
> +    def log_message(self, format, *args):
> +        """See `BaseHTTPRequestHandler.log_message()`."""
> +        # Substitute the base class's logger with the Python Standard
> +        # Library logger.
> +        message = ("%s - - [%s] %s" %
> +                   (self.address_string(),
> +                    self.log_date_time_string(),
> +                    format % args))
> +        log.info(message)
> +
> +
> +def url_to_json_map():
> +    """Return our URL-to-JSON mapping as a dictionary."""
> +    mapfile = config.bing_test_service.mapfile
> +    mapping = {}
> +    for line in file(mapfile):
> +        if line.startswith('#') or len(line.strip()) == 0:
> +            # Skip comments and blank lines.
> +            continue
> +        url, fname = line.split()
> +        mapping[url.strip()] = fname.strip()
> +
> +    return mapping
> +
> +
> +def get_service_endpoint():
> +    """Return the host and port that the service is running on."""
> +    return hostpair(config.bing.site)
> +
> +
> +def service_is_available(timeout=2.0):
> +    """Return True if the service is up and running.
> +
> +    :param timeout: BLOCK execution for at most 'timeout' seconds
> +        before returning False.
> +    """
> +    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> +    sock.settimeout(timeout)  # Block for 'timeout' seconds.
> +    host, port = get_service_endpoint()
> +    try:
> +        try:
> +            sock.connect((host, port))
> +        except socket.error:
> +            return False
> +        else:
> +            return True
> +    finally:
> +        sock.close()  # Clean up.
> +
> +
> +def wait_for_service(timeout=15.0):
> +    """Poll the service and BLOCK until we can connect to it.
> +
> +    :param timeout: The socket should timeout after this many seconds.
> +        Refer to the socket module documentation in the Standard Library
> +        for possible timeout values.
> +    """
> +    host, port = get_service_endpoint()
> +    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> +    sock.settimeout(timeout)  # Block for at most X seconds.
> +
> +    start = time.time()  # Record when we started polling.
> +    try:
> +        while True:
> +            try:
> +                sock.connect((host, port))
> +            except socket.error as err:
> +                if err.args[0] in [errno.ECONNREFUSED, errno.ECONNABORTED]:
> +                    elapsed = (time.time() - start)
> +                    if elapsed > timeout:
> +                        raise RuntimeError("Socket poll time exceeded.")
> +                else:
> +                    raise
> +            else:
> +                break
> +            time.sleep(0.1)
> +    finally:
> +        sock.close()  # Clean up.
> +
> +
> +def wait_for_service_shutdown(seconds_to_wait=10.0):
> +    """Poll the service until it shuts down.
> +
> +    Raises a RuntimeError if the service doesn't shut down within the allotted
> +    time, under normal operation.  It may also raise various socket errors if
> +    there are issues connecting to the service (host lookup, etc.)
> +
> +    :param seconds_to_wait: The number of seconds to wait for the socket to
> +        open up.
> +    """
> +    host, port = get_service_endpoint()
> +
> +    start = time.time()  # Record when we started polling.
> +    try:
> +        while True:
> +            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> +            sock.settimeout(5.0)  # Block for at most X seconds.
> +            try:
> +                sock.connect((host, port))
> +                sock.close()
> +            except socket.error as err:
> +                if err.args[0] == errno.ECONNREFUSED:
> +                    # Success!  The socket is closed.
> +                    return
> +                else:
> +                    raise
> +            else:
> +                elapsed = (time.time() - start)
> +                if elapsed > seconds_to_wait:
> +                    raise RuntimeError(
> +                        "The service did not shut down in the allotted time.")
> +            time.sleep(0.1)
> +    finally:
> +        sock.close()  # Clean up.
> +
> +
> +def hostpair(url):
> +    """Parse the host and port number out of a URL string."""
> +    parts = urlsplit(url)
> +    host, port = parts[1].split(':')
> +    port = int(port)
> +    return (host, port)
> +
> +
> +def start_as_process():
> +    """Run this file as a stand-alone Python script.
> +
> +    Returns a subprocess.Popen object. (See the `subprocess` module in
> +    the Python Standard Library for details.)
> +    """
> +    script = os.path.join(
> +        os.path.dirname(__file__),
> +        os.pardir, os.pardir, os.pardir, os.pardir, 'bin',
> +        'bingtestservice')
> +    # Make sure we aren't using the parent stdin and stdout to avoid spam
> +    # and have fewer things that can go wrong shutting down the process.
> +    proc = subprocess.Popen(
> +        script, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
> +        stderr=subprocess.STDOUT)
> +    proc.stdin.close()
> +    return proc
> +
> +
> +def kill_running_process():
> +    """Find and kill any running web service processes."""
> +    global service_name
> +    try:
> +        pid = get_pid(service_name)
> +    except IOError:
> +        # We could not find an existing pidfile.
> +        return
> +    except ValueError:
> +        # The file contained a mangled and invalid PID number, so we should
> +        # clean the file up.
> +        safe_unlink(pidfile_path(service_name))
> +    else:
> +        if pid is not None:
> +            try:
> +                os.kill(pid, signal.SIGTERM)
> +                # We need to use a busy-wait to find out when the socket
> +                # becomes available.  Failing to do so causes a race condition
> +                # between freeing the socket in the killed process, and
> +                # opening it in the current one.
> +                wait_for_service_shutdown()
> +            except os.error as err:
> +                if err.errno == errno.ESRCH:
> +                    # Whoops, we got a 'No such process' error. The PID file
> +                    # is probably stale, so we'll remove it to prevent trash
> +                    # from lying around in the test environment.
> +                    # See bug #237086.
> +                    safe_unlink(pidfile_path(service_name))
> +                else:
> +                    raise

This all feels rather duplicative of lp.services.pidfile.  I don't know if you can use that exactly, but worth checking.

> +
> +
> +def safe_unlink(filepath):
> +    """Unlink a file, but don't raise an error if the file is missing."""
> +    try:
> +        os.unlink(filepath)
> +    except os.error as err:
> +        if err.errno != errno.ENOENT:
> +            raise

That's lp.services.osutils.remove_if_exists.

> +
> +
> +def main():
> +    """Run the HTTP server."""
> +    # Redirect our service output to a log file.
> +    global log
> +    ensure_directory_exists(os.path.dirname(config.bing_test_service.log))
> +    filelog = logging.FileHandler(config.bing_test_service.log)
> +    log.addHandler(filelog)
> +    log.setLevel(logging.DEBUG)
> +
> +    # To support service shutdown we need to create a PID file that is
> +    # understood by the Launchpad services framework.
> +    global service_name
> +    make_pidfile(service_name)
> +
> +    host, port = get_service_endpoint()
> +    server = HTTPServer((host, port), BingRequestHandler)
> +
> +    log.info("Starting HTTP Bing webservice server on port %s", port)
> +    server.serve_forever()
> 
> === added file 'lib/lp/services/sitesearch/tests/bingserviceharness.py'
> --- lib/lp/services/sitesearch/tests/bingserviceharness.py	1970-01-01 00:00:00 +0000
> +++ lib/lp/services/sitesearch/tests/bingserviceharness.py	2018-03-17 20:34:09 +0000
> @@ -0,0 +1,107 @@
> +# Copyright 2018 Canonical Ltd.  This software is licensed under the
> +# GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +"""
> +Fixtures for running the Bing test webservice.
> +"""
> +
> +__metaclass__ = type
> +
> +__all__ = ['BingServiceTestSetup']
> +
> +
> +import errno
> +import os
> +import signal
> +
> +from lp.services.sitesearch import bingtestservice
> +
> +
> +class BingServiceTestSetup:
> +    """Set up the Bing web service stub for use in functional tests.
> +    """
> +
> +    # XXX gary 2008-12-06 bug=305858: Spurious test failures discovered on
> +    # buildbot, builds 40 and 43. The locations of the failures are marked
> +    # below with " # SPURIOUS FAILURE". To reinstate, add the text below back
> +    # to the docstring above.  Note that the test that uses this setup,
> +    # bing-service-stub.txt, is also disabled.  See test_doc.py.
> +    """
> +    >>> from lp.services.sitesearch.bingtestservice import (
> +    ...     service_is_available)
> +    >>> from lp.services.config import config

It'd be super-nice to refactor all this as unit tests while you're here, and maybe that would make it feasible to see why things are going wrong sometimes (if they indeed still are).

> +
> +    >>> assert not service_is_available()  # Sanity check. # SPURIOUS FAILURE
> +
> +    >>> BingServiceTestSetup().setUp()
> +
> +    After setUp is called, a Bing test service instance is running.
> +
> +    >>> assert service_is_available()
> +    >>> assert BingServiceTestSetup.service is not None
> +
> +    After tearDown is called, the service is shut down.
> +
> +    >>> BingServiceTestSetup().tearDown()
> +
> +    >>> assert not service_is_available()
> +    >>> assert BingServiceTestSetup.service is None
> +
> +    The fixture can be started and stopped multiple time in succession:
> +
> +    >>> BingServiceTestSetup().setUp()
> +    >>> assert service_is_available()
> +
> +    Having a service instance already running doesn't prevent a new
> +    service from starting.  The old instance is killed off and replaced
> +    by the new one.
> +
> +    >>> old_pid = BingServiceTestSetup.service.pid
> +    >>> BingServiceTestSetup().setUp() # SPURIOUS FAILURE
> +    >>> BingServiceTestSetup.service.pid != old_pid
> +    True
> +
> +    Tidy up.
> +
> +    >>> BingServiceTestSetup().tearDown()
> +    >>> assert not service_is_available()
> +
> +    """
> +
> +    service = None  # A reference to our running service.
> +
> +    def setUp(self):
> +        self.startService()
> +
> +    def tearDown(self):
> +        self.stopService()
> +
> +    @classmethod
> +    def startService(cls):
> +        """Start the webservice."""
> +        bingtestservice.kill_running_process()
> +        cls.service = bingtestservice.start_as_process()
> +        assert cls.service, "The Search service process did not start."
> +        try:
> +            bingtestservice.wait_for_service()
> +        except RuntimeError:
> +            # The service didn't start itself soon enough.  We must
> +            # make sure to kill any errant processes that may be
> +            # hanging around.
> +            cls.stopService()
> +            raise
> +
> +    @classmethod
> +    def stopService(cls):
> +        """Shut down the webservice instance."""
> +        if cls.service:
> +            try:
> +                os.kill(cls.service.pid, signal.SIGTERM)
> +            except OSError as error:
> +                if error.errno != errno.ESRCH:
> +                    raise
> +                # The process with the given pid doesn't exist, so there's
> +                # nothing to kill or wait for.
> +            else:
> +                cls.service.wait()
> +        cls.service = None
> 
> === added file 'lib/lp/services/sitesearch/tests/test_bing.py'
> --- lib/lp/services/sitesearch/tests/test_bing.py	1970-01-01 00:00:00 +0000
> +++ lib/lp/services/sitesearch/tests/test_bing.py	2018-03-17 20:34:09 +0000
> @@ -0,0 +1,133 @@
> +# Copyright 2011-2018 Canonical Ltd.  This software is licensed under the
> +# GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +"""Test the bing search service."""
> +
> +__metaclass__ = type
> +
> +from contextlib import contextmanager
> +
> +from requests.exceptions import (
> +    ConnectionError,
> +    HTTPError,
> +    )
> +
> +from lp.services.sitesearch import BingSearchService
> +from lp.services.sitesearch.interfaces import BingResponseError
> +from lp.services.timeout import TimeoutError
> +from lp.testing import TestCase
> +from lp.testing.layers import (
> +    BingLaunchpadFunctionalLayer,
> +    LaunchpadFunctionalLayer,
> +    )
> +
> +
> +@contextmanager
> +def urlfetch_exception(test_error, *args):
> +    """Raise an error during the execution of urlfetch.
> +
> +    This function replaces urlfetch() with a function that
> +    raises an error.
> +    """
> +
> +    def raise_exception(url):
> +        raise test_error(*args)
> +
> +    from lp.services import timeout
> +    original_urlfetch = timeout.urlfetch
> +    timeout.urlfetch = raise_exception
> +    try:
> +        yield
> +    finally:
> +        timeout.urlfetch = original_urlfetch

Should no doubt be common with the Google tests, and maybe this should just be `from fixtures import MockPatch; self.useFixture(MockPatch('lp.services.timeout.urlfetch', side_effect=test_error(*args)))` or something like that.

> +
> +
> +class TestBingSearchService(TestCase):
> +    """Test GoogleSearchService."""

:-)

> +
> +    layer = LaunchpadFunctionalLayer
> +
> +    def setUp(self):
> +        super(TestBingSearchService, self).setUp()
> +        self.search_service = BingSearchService()
> +
> +    def test_search_converts_HTTPError(self):
> +        # The method converts HTTPError to BingResponseError.
> +        args = ('url', 500, 'oops', {}, None)
> +        with urlfetch_exception(HTTPError, *args):
> +            self.assertRaises(
> +                BingResponseError, self.search_service.search, 'fnord')
> +
> +    def test_search_converts_ConnectionError(self):
> +        # The method converts ConnectionError to BingResponseError.
> +        with urlfetch_exception(ConnectionError, 'oops'):
> +            self.assertRaises(
> +                BingResponseError, self.search_service.search, 'fnord')
> +
> +    def test_search_converts_TimeoutError(self):
> +        # The method converts TimeoutError to BingResponseError.
> +        with urlfetch_exception(TimeoutError, 'oops'):
> +            self.assertRaises(
> +                BingResponseError, self.search_service.search, 'fnord')
> +
> +    def test_parse_bing_reponse_TypeError(self):

s/reponse/response/ (same below).

> +        # The method converts TypeError to BingResponseError.
> +        self.assertRaises(
> +            BingResponseError,
> +            self.search_service._parse_bing_response, None)
> +
> +    def test_parse_bing_reponse_ValueError(self):
> +        # The method converts ValueError to BingResponseError.
> +        self.assertRaises(
> +            BingResponseError,
> +            self.search_service._parse_bing_response, '')
> +
> +    def test_parse_bing_reponse_KeyError(self):
> +        # The method converts KeyError to BingResponseError.
> +        self.assertRaises(
> +            BingResponseError,
> +            self.search_service._parse_bing_response, '{}')
> +
> +
> +class FunctionalTestBingSearchService(TestCase):
> +    """Test GoogleSearchService."""
> +
> +    layer = BingLaunchpadFunctionalLayer
> +
> +    def setUp(self):
> +        super(FunctionalTestBingSearchService, self).setUp()
> +        self.search_service = BingSearchService()
> +
> +    def test_search_with_results(self):
> +        matches = self.search_service.search('bug')
> +        self.assertEqual(0, matches.start)
> +        self.assertEqual(87000, matches.total)
> +        self.assertEqual(20, len(matches))
> +
> +    def test_search_with_results_and_offset(self):
> +        matches = self.search_service.search('bug', start=20)
> +        self.assertEqual(20, matches.start)
> +        self.assertEqual(87000, matches.total)
> +        self.assertEqual(15, len(matches))
> +
> +    def test_search_no_results(self):
> +        matches = self.search_service.search('fnord')
> +        self.assertEqual(0, matches.start)
> +        self.assertEqual(0, matches.total)
> +        self.assertEqual(0, len(matches))
> +
> +    def test_search_no_meaningful_results(self):
> +        matches = self.search_service.search('no-meaningful')
> +        self.assertEqual(0, matches.start)
> +        self.assertEqual(0, matches.total)
> +        self.assertEqual(0, len(matches))
> +
> +    def test_search_incomplete_response(self):
> +        self.assertRaises(
> +            BingResponseError,
> +            self.search_service.search, 'gnomebaker')
> +
> +    def test_search_error_response(self):
> +        self.assertRaises(
> +            BingResponseError,
> +            self.search_service.search, 'errors-please')
> 
> === added file 'lib/lp/services/sitesearch/tests/test_bingservice.py'
> --- lib/lp/services/sitesearch/tests/test_bingservice.py	1970-01-01 00:00:00 +0000
> +++ lib/lp/services/sitesearch/tests/test_bingservice.py	2018-03-17 20:34:09 +0000
> @@ -0,0 +1,53 @@
> +# Copyright 2009-2018 Canonical Ltd.  This software is licensed under the
> +# GNU Affero General Public License version 3 (see the file LICENSE).
> +
> +"""
> +Unit tests for the Bing test service stub.
> +"""
> +
> +__metaclass__ = type
> +
> +
> +import errno
> +import os
> +import unittest
> +
> +from lp.services.pidfile import pidfile_path
> +from lp.services.sitesearch import bingtestservice
> +
> +
> +class TestServiceUtilities(unittest.TestCase):
> +    """Test the service's supporting functions."""
> +
> +    def test_stale_pid_file_cleanup(self):
> +        """The service should be able to clean up invalid PID files."""
> +        bogus_pid = 9999999
> +        self.assertFalse(
> +            process_exists(bogus_pid),
> +            "There is already a process with PID '%d'." % bogus_pid)
> +
> +        # Create a stale/bogus PID file.
> +        filepath = pidfile_path(bingtestservice.service_name)
> +        pidfile = file(filepath, 'w')
> +        pidfile.write(str(bogus_pid))
> +        pidfile.close()

Use a context manager.

> +
> +        # The PID clean-up code should silently remove the file and return.
> +        bingtestservice.kill_running_process()
> +        self.assertFalse(
> +            os.path.exists(filepath),
> +            "The PID file '%s' should have been deleted." % filepath)
> +
> +
> +def process_exists(pid):
> +    """Return True if the specified process already exists."""
> +    try:
> +        os.kill(pid, 0)
> +    except os.error as err:
> +        if err.errno == errno.ESRCH:
> +            # All is well - the process doesn't exist.
> +            return False
> +        else:
> +            # We got a strange OSError, which we'll pass upwards.
> +            raise
> +    return True

We have similar code in a few places; I think this should go into lp.services.osutils, and all the existing users refactored.  (You should probably do this in a preparatory branch.)



-- 
https://code.launchpad.net/~maxiberta/launchpad/bing-search/+merge/341549
Your team Launchpad code reviewers is subscribed to branch lp:launchpad.


References