launchpad-reviewers team mailing list archive
-
launchpad-reviewers team
-
Mailing list archive
-
Message #03685
[Merge] lp:~jcsackett/launchpad/page-match-rewrite-url into lp:launchpad
j.c.sackett has proposed merging lp:~jcsackett/launchpad/page-match-rewrite-url into lp:launchpad.
Requested reviews:
Launchpad code reviewers (launchpad-reviewers)
Related bugs:
Bug #784273 in Launchpad itself: "InvalidURIError generated by unescaped search on frontpage"
https://bugs.launchpad.net/launchpad/+bug/784273
For more details, see:
https://code.launchpad.net/~jcsackett/launchpad/page-match-rewrite-url/+merge/61643
Summary
=======
An error occurs on our search page because bad url data is passed in from the googleservice that our PageMatch class doesn't handle well.
This branch adds a facility to the PageMatch to deal with bad URIs, and adds nofollow noindex to our search pages because they shouldn't be cached by google in the first place.
Preimplemenation
================
Spoke with Curtis Hovey
Implementation
==============
lib/lp/app/templates/launchpad-search.pt
----------------------------------------
Added nofollow, noindex to the header.
lib/lp/services/googlesearch/__init__.py
lib/lp/services/googlesearch/tests/test_pagematch.py
----------------------------------------------------
Sanitizing code and the tests.
Tests
=====
bin/test -t rewrite_url
QA
==
Follow the link on the bug report; it shouldn't OOPs.
Lint
====
= Launchpad lint =
Checking for conflicts and issues in changed files.
Linting changed files:
lib/lp/app/templates/launchpad-search.pt
lib/lp/services/googlesearch/__init__.py
lib/lp/services/googlesearch/tests/test_pagematch.py
--
https://code.launchpad.net/~jcsackett/launchpad/page-match-rewrite-url/+merge/61643
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~jcsackett/launchpad/page-match-rewrite-url into lp:launchpad.
=== modified file 'lib/lp/app/templates/launchpad-search.pt'
--- lib/lp/app/templates/launchpad-search.pt 2010-04-23 13:50:57 +0000
+++ lib/lp/app/templates/launchpad-search.pt 2011-05-19 19:11:42 +0000
@@ -6,7 +6,11 @@
metal:use-macro="view/macro:page/searchless"
i18n:domain="launchpad"
>
-
+ <head>
+ <tal:head_epilogue metal:fill-slot="head_epilogue">
+ <meta name="robots" content="noindex,nofollow" />
+ </tal:head_epilogue>
+ </head>
<body>
<tal:heading metal:fill-slot="heading">
<h1 tal:content="view/page_heading">
=== modified file 'lib/lp/services/googlesearch/__init__.py'
--- lib/lp/services/googlesearch/__init__.py 2011-03-07 16:32:12 +0000
+++ lib/lp/services/googlesearch/__init__.py 2011-05-19 19:11:42 +0000
@@ -16,7 +16,10 @@
import xml.etree.cElementTree as ET
import urllib
import urllib2
-from urlparse import urlunparse
+from urlparse import (
+ urlunparse,
+ parse_qsl,
+ )
from lazr.restful.utils import get_current_browser_request
from lazr.uri import URI
@@ -80,6 +83,15 @@
self.summary = summary
self.url = self._rewrite_url(url)
+ def _sanitize_query_string(self, url):
+ """Escapes invalid urls."""
+ parts = urlparse(url)
+ querydata = parse_qsl(parts.query)
+ querystring = urllib.urlencode(querydata)
+ urldata = list(parts)
+ urldata[-2] = querystring
+ return urlunparse(urldata)
+
def _strip_trailing_slash(self, url):
"""Return the url without a trailing slash."""
uri = URI(url).ensureNoSlash()
@@ -96,6 +108,7 @@
launchpad environment.
:return: A URL str.
"""
+ url = self._sanitize_query_string(url)
if self.url_rewrite_hostname == 'launchpad.net':
# Do not rewrite the url is the hostname is the public hostname.
return self._strip_trailing_slash(url)
=== added file 'lib/lp/services/googlesearch/tests/test_pagematch.py'
--- lib/lp/services/googlesearch/tests/test_pagematch.py 1970-01-01 00:00:00 +0000
+++ lib/lp/services/googlesearch/tests/test_pagematch.py 2011-05-19 19:11:42 +0000
@@ -0,0 +1,26 @@
+# Copyright 2011 Canonical Ltd. This software is licensed under the
+# GNU Affero General Public License version 3 (see the file LICENSE).
+
+"""Test the search result PageMatch class."""
+
+__metaclass__ = type
+
+from canonical.testing.layers import DatabaseFunctionalLayer
+from lp.services.googlesearch import PageMatch
+from lp.testing import TestCaseWithFactory
+
+
+class TestPageMatchURLHandling(TestCaseWithFactory):
+
+ layer = DatabaseFunctionalLayer
+
+ def test_rewrite_url_handles_invalid_data(self):
+ # Given a bad url, pagematch can get a valid one.
+ bad_url = ("http://launchpad.dev/+search?"
+ "field.text=WUSB54GC+ karmic&"
+ "field.actions.search=Search")
+ p = PageMatch('Bad,', bad_url, 'Bad data')
+ expected = ("http://launchpad.dev/+search?"
+ "field.text=WUSB54GC++karmic&"
+ "field.actions.search=Search")
+ self.assertEqual(expected, p.url)