← Back to team overview

launchpad-reviewers team mailing list archive

[Merge] lp:~jcsackett/launchpad/page-match-rewrite-url into lp:launchpad

 

j.c.sackett has proposed merging lp:~jcsackett/launchpad/page-match-rewrite-url into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)
Related bugs:
  Bug #784273 in Launchpad itself: "InvalidURIError generated by unescaped search on frontpage"
  https://bugs.launchpad.net/launchpad/+bug/784273

For more details, see:
https://code.launchpad.net/~jcsackett/launchpad/page-match-rewrite-url/+merge/61643

Summary
=======
An error occurs on our search page because bad url data is passed in from the googleservice that our PageMatch class doesn't handle well.

This branch adds a facility to the PageMatch to deal with bad URIs, and adds nofollow noindex to our search pages because they shouldn't be cached by google in the first place.

Preimplemenation
================
Spoke with Curtis Hovey

Implementation
==============
lib/lp/app/templates/launchpad-search.pt
----------------------------------------
Added nofollow, noindex to the header.

lib/lp/services/googlesearch/__init__.py
lib/lp/services/googlesearch/tests/test_pagematch.py
----------------------------------------------------
Sanitizing code and the tests.

Tests
=====
bin/test -t rewrite_url

QA
==
Follow the link on the bug report; it shouldn't OOPs.

Lint
====

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  lib/lp/app/templates/launchpad-search.pt
  lib/lp/services/googlesearch/__init__.py
  lib/lp/services/googlesearch/tests/test_pagematch.py

-- 
https://code.launchpad.net/~jcsackett/launchpad/page-match-rewrite-url/+merge/61643
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jcsackett/launchpad/page-match-rewrite-url into lp:launchpad.
=== modified file 'lib/lp/app/templates/launchpad-search.pt'
--- lib/lp/app/templates/launchpad-search.pt	2010-04-23 13:50:57 +0000
+++ lib/lp/app/templates/launchpad-search.pt	2011-05-19 19:11:42 +0000
@@ -6,7 +6,11 @@
   metal:use-macro="view/macro:page/searchless"
   i18n:domain="launchpad"
 >
-
+  <head>
+    <tal:head_epilogue metal:fill-slot="head_epilogue">
+      <meta name="robots" content="noindex,nofollow" />
+    </tal:head_epilogue>
+  </head>
   <body>
     <tal:heading metal:fill-slot="heading">
       <h1 tal:content="view/page_heading">

=== modified file 'lib/lp/services/googlesearch/__init__.py'
--- lib/lp/services/googlesearch/__init__.py	2011-03-07 16:32:12 +0000
+++ lib/lp/services/googlesearch/__init__.py	2011-05-19 19:11:42 +0000
@@ -16,7 +16,10 @@
 import xml.etree.cElementTree as ET
 import urllib
 import urllib2
-from urlparse import urlunparse
+from urlparse import (
+    urlunparse,
+    parse_qsl,
+    )
 
 from lazr.restful.utils import get_current_browser_request
 from lazr.uri import URI
@@ -80,6 +83,15 @@
         self.summary = summary
         self.url = self._rewrite_url(url)
 
+    def _sanitize_query_string(self, url):
+        """Escapes invalid urls."""
+        parts = urlparse(url)
+        querydata = parse_qsl(parts.query)
+        querystring = urllib.urlencode(querydata)
+        urldata = list(parts)
+        urldata[-2] = querystring
+        return urlunparse(urldata)
+
     def _strip_trailing_slash(self, url):
         """Return the url without a trailing slash."""
         uri = URI(url).ensureNoSlash()
@@ -96,6 +108,7 @@
             launchpad environment.
         :return: A URL str.
         """
+        url = self._sanitize_query_string(url)
         if self.url_rewrite_hostname == 'launchpad.net':
             # Do not rewrite the url is the hostname is the public hostname.
             return self._strip_trailing_slash(url)

=== added file 'lib/lp/services/googlesearch/tests/test_pagematch.py'
--- lib/lp/services/googlesearch/tests/test_pagematch.py	1970-01-01 00:00:00 +0000
+++ lib/lp/services/googlesearch/tests/test_pagematch.py	2011-05-19 19:11:42 +0000
@@ -0,0 +1,26 @@
+# Copyright 2011 Canonical Ltd.  This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the search result PageMatch class."""
+
+__metaclass__ = type
+
+from canonical.testing.layers import DatabaseFunctionalLayer
+from lp.services.googlesearch import PageMatch
+from lp.testing import TestCaseWithFactory
+
+
+class TestPageMatchURLHandling(TestCaseWithFactory):
+
+    layer = DatabaseFunctionalLayer
+
+    def test_rewrite_url_handles_invalid_data(self):
+        # Given a bad url, pagematch can get a valid one.
+        bad_url = ("http://launchpad.dev/+search?";
+                   "field.text=WUSB54GC+ karmic&"
+                   "field.actions.search=Search")
+        p = PageMatch('Bad,', bad_url, 'Bad data')
+        expected = ("http://launchpad.dev/+search?";
+                   "field.text=WUSB54GC++karmic&"
+                   "field.actions.search=Search")
+        self.assertEqual(expected, p.url)