
launchpad-reviewers team mailing list archive

[Merge] lp:~flacoste/launchpad/ppr-enhancements into lp:launchpad

 

Francis J. Lacoste has proposed merging lp:~flacoste/launchpad/ppr-enhancements into lp:launchpad.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~flacoste/launchpad/ppr-enhancements/+merge/59700

This branch contains a whole bunch of enhancements to the page-performance
report. It's a big branch and should probably have been developed using a
pipeline; I'm sorry.

 * It improves the regular expressions used for the categories. Many
 categories turned up empty because the regexps weren't appropriate. It turns
 out that the logged URL is normalized and usually contains the view name
 (so +index instead of /, etc.).
 * I added a Web category (non-operational, non-XML-RPC).
 * It's now possible to define a set of categories forming a partition of the
 data (mutually exclusive categories that together should contain all of the
 data). The defined partition is (Operational, API, Public XML-RPC, Private
 XML-RPC, Web and Other). Other is for things like redirects and other early
 returns which for some reason don't get a URL with a hostname.
 * In the metrics file (used for import into Tuolumne), the mean is removed
 (we don't really look at it) and the raw hit count is logged instead, which
 will make an interesting graph.
 * The total SQL time and statement counts are dropped and replaced with
 estimated 99%-under metrics.
 * As a drive-by, I fixed the CSS class names to be valid (by replacing
 underscores with hyphens).
 * The bulk of the code changes relate to adding support for a configurable
 histogram resolution (and changing the default from 1s to 0.5s). To do
 this, I consolidated all the histogram logic into its own class. Because the
 merging algorithm needs to merge histograms with different resolutions and
 sizes, I added support for that.
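For reference, the histogram-merging idea above can be sketched roughly like
this (a standalone illustration; merge_bins and its signature are invented
for the example and are not the branch's actual Histogram API):

```python
import math

def merge_bins(bins_a, size_a, bins_b, size_b):
    """Merge the frequencies of histogram b into a copy of histogram a.

    bins_* are lists of per-bin frequencies, size_* the bin widths.
    The bin sizes must differ by an N:1 or 1:N ratio.
    """
    ratio = float(size_b) / size_a
    assert int(ratio) == ratio or int(1 / ratio) == 1 / ratio
    width = max(len(bins_a), int(math.ceil(len(bins_b) * ratio)))
    merged = list(bins_a) + [0] * (width - len(bins_a))
    if ratio >= 1:
        # Coarse into fine: assume an even distribution and spread each
        # coarse bin's frequency over the finer bins it covers.
        for i, freq in enumerate(bins_b):
            for j in range(int(i * ratio), int(i * ratio + ratio)):
                merged[j] += freq / ratio
    else:
        # Fine into coarse: aggregate each fine bin into its enclosing bin.
        for i, freq in enumerate(bins_b):
            merged[int(i * ratio)] += freq
    return merged

# Merging a 1s-resolution histogram into a 0.5s-resolution one:
print(merge_bins([1, 3, 1, 1], 0.5, [2, 4], 1))  # [2.0, 4.0, 3.0, 3.0]
```

The real class in the branch also tracks the total count and the bin start
values; this only shows how the frequencies get redistributed.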

Let me know if you have any questions.
-- 
https://code.launchpad.net/~flacoste/launchpad/ppr-enhancements/+merge/59700
Your team Launchpad code reviewers is requested to review the proposed merge of lp:~flacoste/launchpad/ppr-enhancements into lp:launchpad.
=== modified file 'lib/canonical/launchpad/xmlrpc/application.py'
--- lib/canonical/launchpad/xmlrpc/application.py	2010-11-08 12:52:43 +0000
+++ lib/canonical/launchpad/xmlrpc/application.py	2011-05-02 20:36:45 +0000
@@ -38,6 +38,8 @@
 from lp.registry.interfaces.person import ISoftwareCenterAgentApplication
 
 
+# NOTE: If you add a traversal here, you should update
+# the regular expression in utilities/page-performance-report.ini
 class PrivateApplication:
     implements(IPrivateApplication)
 

=== modified file 'lib/lp/scripts/utilities/pageperformancereport.py'
--- lib/lp/scripts/utilities/pageperformancereport.py	2011-02-16 02:24:33 +0000
+++ lib/lp/scripts/utilities/pageperformancereport.py	2011-05-02 20:36:45 +0000
@@ -62,6 +62,7 @@
         self.title = title
         self.regexp = regexp
         self._compiled_regexp = re.compile(regexp, re.I | re.X)
+        self.partition = False
 
     def match(self, request):
         """Return true when the request match this category."""
@@ -275,12 +276,20 @@
         return self.mean + 3*self.std
 
     @property
-    def relative_histogram(self):
-        """Return an histogram where the frequency is relative."""
-        if self.histogram:
-            return [[x, float(f)/self.total_hits] for x, f in self.histogram]
-        else:
-            return None
+    def ninetyninth_percentile_sqltime(self):
+        """SQL time under which 99% of requests are rendered.
+
+        This is estimated as 3 std deviations from the mean.
+        """
+        return self.mean_sqltime + 3*self.std_sqltime
+
+    @property
+    def ninetyninth_percentile_sqlstatements(self):
+        """Number of SQL statements under which 99% of requests are rendered.
+
+        This is estimated as 3 std deviations from the mean.
+        """
+        return self.mean_sqlstatements + 3*self.std_sqlstatements
 
     def text(self):
         """Return a textual version of the stats."""
@@ -304,15 +313,14 @@
     with minimum storage space.
     """
 
-    def __init__(self, histogram_width):
+    def __init__(self, histogram_width, histogram_resolution):
         self.time_stats = OnlineStatsCalculator()
         self.time_median_approximate = OnlineApproximateMedian()
         self.sql_time_stats = OnlineStatsCalculator()
         self.sql_time_median_approximate = OnlineApproximateMedian()
         self.sql_statements_stats = OnlineStatsCalculator()
         self.sql_statements_median_approximate = OnlineApproximateMedian()
-        self._histogram = [
-            [x, 0] for x in range(histogram_width)]
+        self.histogram = Histogram(histogram_width, histogram_resolution)
 
     @property
     def total_hits(self):
@@ -366,13 +374,6 @@
     def std_sqlstatements(self):
         return self.sql_statements_stats.std
 
-    @property
-    def histogram(self):
-        if self.time_stats.count:
-            return self._histogram
-        else:
-            return None
-
     def update(self, request):
         """Update the stats based on request."""
         self.time_stats.update(request.app_seconds)
@@ -381,9 +382,7 @@
         self.sql_time_median_approximate.update(request.sql_seconds)
         self.sql_statements_stats.update(request.sql_statements)
         self.sql_statements_median_approximate.update(request.sql_statements)
-
-        idx = int(min(len(self.histogram)-1, request.app_seconds))
-        self.histogram[idx][1] += 1
+        self.histogram.update(request.app_seconds)
 
     def __add__(self, other):
         """Merge another OnlineStats with this one."""
@@ -396,11 +395,120 @@
         results.sql_statements_stats += other.sql_statements_stats
         results.sql_statements_median_approximate += (
             other.sql_statements_median_approximate)
-        for i, (n, f) in enumerate(other._histogram):
-            results._histogram[i][1] += f
+        results.histogram = self.histogram + other.histogram
         return results
 
 
+class Histogram:
+    """A simple object to compute histogram of a value."""
+
+    @staticmethod
+    def from_bins_data(data):
+        """Create an histogram from existing bins data."""
+        assert data[0][0] == 0, "First bin should start at zero."
+
+        hist = Histogram(len(data), data[1][0])
+        for idx, bin in enumerate(data):
+            hist.count += bin[1]
+            hist.bins[idx][1] = bin[1]
+
+        return hist
+
+    def __init__(self, bins_count, bins_size):
+        """Create a new histogram.
+
+        The histogram will count the frequency of values in bins_count bins
+        of bins_size each.
+        """
+        self.count = 0
+        self.bins_count = bins_count
+        self.bins_size = bins_size
+        self.bins = []
+        for x in range(bins_count):
+            self.bins.append([x*bins_size, 0])
+
+    @property
+    def bins_relative(self):
+        """Return the bins with the frequency expressed as a ratio."""
+        return [[x, float(f)/self.count] for x, f in self.bins]
+
+    def update(self, value):
+        """Update the histogram for this value.
+
+        All values higher than the last bin minimum are counted in that last
+        bin.
+        """
+        self.count += 1
+        idx = int(min(self.bins_count-1, value / self.bins_size))
+        self.bins[idx][1] += 1
+
+    def __repr__(self):
+        """A string representation of this histogram."""
+        return "<Histogram %s>" % self.bins
+
+    def __eq__(self, other):
+        """Two histogram are equals if they have the same bins content."""
+        if not isinstance(other, Histogram):
+            return False
+
+        if self.bins_count != other.bins_count:
+            return False
+
+        if self.bins_size != other.bins_size:
+            return False
+
+        for idx, other_bin in enumerate(other.bins):
+            if self.bins[idx][1] != other_bin[1]:
+                return False
+
+        return True
+
+    def __add__(self, other):
+        """Add the frequency of the other histogram to this one.
+
+        The resulting histogram has the same bins_size as this one.
+        If the other one has a bigger bins_size, we'll assume an even
+        distribution and distribute the frequency across the smaller bins. If
+        it has a smaller bins_size, we'll aggregate its bins into the larger
+        ones. We only support different bins_size values if the ratio can be
+        expressed as the ratio between 1 and an integer.
+
+        The resulting histogram is as wide as the widest one.
+        """
+        ratio = float(other.bins_size) / self.bins_size
+        bins_count = max(self.bins_count, math.ceil(other.bins_count * ratio))
+        total = Histogram(int(bins_count), self.bins_size)
+        total.count = self.count + other.count
+
+        # Copy our bins into the total
+        for idx, bin in enumerate(self.bins):
+            total.bins[idx][1] = bin[1]
+
+        assert int(ratio) == ratio or int(1/ratio) == 1/ratio, (
+            "We only support different bins sizes when the ratio is an "
+            "integer to 1: %s"
+            % ratio)
+
+        if ratio >= 1:
+            # We distribute the frequency across the bins.
+            # For example, if the ratio is 3:1, we'll add a third
+            # of the lower resolution bin to 3 of the higher one.
+            for other_idx, bin in enumerate(other.bins):
+                f = bin[1] / ratio
+                start = int(math.floor(other_idx * ratio))
+                end = int(start + ratio)
+                for idx in range(start, end):
+                    total.bins[idx][1] += f
+        else:
+            # We need to collect the higher resolution bins into the
+            # corresponding lower one.
+            for other_idx, bin in enumerate(other.bins):
+                idx = int(other_idx * ratio)
+                total.bins[idx][1] += bin[1]
+
+        return total
+
+
 class RequestTimes:
     """Collect statistics from requests.
 
@@ -431,29 +539,45 @@
         # distribution become very different than what it currently is.
         self.top_urls_cache_size = self.top_urls * 50
 
-        # Histogram has a bin per second up to 1.5 our timeout.
-        self.histogram_width = int(options.timeout*1.5)
+        # Histogram has a bin per resolution up to our timeout
+        # (and an extra bin).
+        self.histogram_resolution = float(options.resolution)
+        self.histogram_width = int(
+            options.timeout / self.histogram_resolution) + 1
         self.category_times = [
-            (category, OnlineStats(self.histogram_width))
+            (category, OnlineStats(
+                self.histogram_width, self.histogram_resolution))
             for category in categories]
         self.url_times = {}
         self.pageid_times = {}
 
     def add_request(self, request):
         """Add request to the set of requests we collect stats for."""
+        matched = []
         for category, stats in self.category_times:
             if category.match(request):
                 stats.update(request)
+                if category.partition:
+                    matched.append(category.title)
+
+        if len(matched) > 1:
+            log.warning(
+                "Multiple partition categories matched by %s (%s)",
+                request.url, ", ".join(matched))
+        elif not matched:
+            log.warning("%s isn't part of the partition", request.url)
 
         if self.by_pageids:
             pageid = request.pageid or 'Unknown'
             stats = self.pageid_times.setdefault(
-                pageid, OnlineStats(self.histogram_width))
+                pageid, OnlineStats(
+                    self.histogram_width, self.histogram_resolution))
             stats.update(request)
 
         if self.top_urls:
             stats = self.url_times.setdefault(
-                request.url, OnlineStats(self.histogram_width))
+                request.url, OnlineStats(
+                    self.histogram_width, self.histogram_resolution))
             stats.update(request)
             #  Whenever we have more URLs than we need to, discard 10%
             # that is less likely to end up in the top.
@@ -551,14 +675,22 @@
         help="Output reports in DIR directory")
     parser.add_option(
         "--timeout", dest="timeout",
-        # Default to 12: the staging timeout.
-        default=12, type="int",
-        help="The configured timeout value : determines high risk page ids.")
+        # Default to 9: our production timeout.
+        default=9, type="int", metavar="SECONDS",
+        help="The configured timeout value: used to determine high risk " +
+        "page ids. That would be pages which 99% under render time is "
+        "greater than timeoout - 2s. Default is %defaults.")
+    parser.add_option(
+        "--histogram-resolution", dest="resolution",
+        # Default to 0.5s
+        default=0.5, type="float", metavar="SECONDS",
+        help="The resolution of the histogram bin width. Detault to "
+        "%defaults.")
     parser.add_option(
         "--merge", dest="merge",
         default=False, action='store_true',
-        help="Files are interpreted as pickled stats and are aggregated for" +
-        "the report.")
+        help="Files are interpreted as pickled stats and are aggregated " +
+        "for the report.")
 
     options, args = parser.parse_args()
 
@@ -602,6 +734,17 @@
     if len(categories) == 0:
         parser.error("No data in [categories] section of configuration.")
 
+    # Determine the categories making a partition of the requests
+    for option in script_config.options('partition'):
+        for category in categories:
+            if category.title == option:
+                category.partition = True
+                break
+        else:
+            log.warning(
+                "In partition definition: %s isn't a defined category",
+                option)
+
     times = RequestTimes(categories, options)
 
     if options.merge:
@@ -629,33 +772,41 @@
     if options.categories:
         report_filename = _report_filename('categories.html')
         log.info("Generating %s", report_filename)
-        html_report(open(report_filename, 'w'), category_times, None, None)
+        html_report(
+            open(report_filename, 'w'), category_times, None, None,
+            histogram_resolution=options.resolution)
 
     # Pageid only report.
     if options.pageids:
         report_filename = _report_filename('pageids.html')
         log.info("Generating %s", report_filename)
-        html_report(open(report_filename, 'w'), None, pageid_times, None)
+        html_report(
+            open(report_filename, 'w'), None, pageid_times, None,
+            histogram_resolution=options.resolution)
 
     # Top URL only report.
     if options.top_urls:
         report_filename = _report_filename('top%d.html' % options.top_urls)
         log.info("Generating %s", report_filename)
-        html_report(open(report_filename, 'w'), None, None, url_times)
+        html_report(
+            open(report_filename, 'w'), None, None, url_times,
+            histogram_resolution=options.resolution)
 
     # Combined report.
     if options.categories and options.pageids:
         report_filename = _report_filename('combined.html')
         html_report(
             open(report_filename, 'w'),
-            category_times, pageid_times, url_times)
+            category_times, pageid_times, url_times, 
+            histogram_resolution=options.resolution)
 
     # Report of likely timeout candidates
     report_filename = _report_filename('timeout-candidates.html')
     log.info("Generating %s", report_filename)
     html_report(
         open(report_filename, 'w'), None, pageid_times, None,
-        options.timeout - 2)
+        options.timeout - 2, 
+        histogram_resolution=options.resolution)
 
     # Save the times cache for later merging.
     report_filename = _report_filename('stats.pck.bz2')
@@ -679,7 +830,7 @@
                 writer.writerows([
                     ("%s_99" % option, "%f@%d" % (
                         stats.ninetyninth_percentile_time, date)),
-                    ("%s_mean" % option, "%f@%d" % (stats.mean, date))])
+                    ("%s_hits" % option, "%d@%d" % (stats.total_hits, date))])
                 break
         else:
             log.warning("Can't find category %s for metric %s" % (
@@ -828,7 +979,7 @@
 
 def html_report(
     outf, category_times, pageid_times, url_times,
-    ninetyninth_percentile_threshold=None):
+    ninetyninth_percentile_threshold=None, histogram_resolution=0.5):
     """Write an html report to outf.
 
     :param outf: A file object to write the report to.
@@ -838,6 +989,7 @@
     :param ninetyninth_percentile_threshold: Lower threshold for inclusion of
         pages in the pageid section; pages where 99 percent of the requests are
         served under this threshold will not be included.
+    :param histogram_resolution: used as the histogram bar width.
     """
 
     print >> outf, dedent('''\
@@ -858,7 +1010,8 @@
         <style type="text/css">
             h3 { font-weight: normal; font-size: 1em; }
             thead th { padding-left: 1em; padding-right: 1em; }
-            .category-title { text-align: right; padding-right: 2em; }
+            .category-title { text-align: right; padding-right: 2em;
+                              max-width: 25em; }
             .regexp { font-size: x-small; font-weight: normal; }
             .mean { text-align: right; padding-right: 1em; }
             .median { text-align: right; padding-right: 1em; }
@@ -878,8 +1031,8 @@
                 padding: 1em;
                 }
             .clickable { cursor: hand; }
-            .total_hits, .histogram, .median_sqltime,
-            .median_sqlstatements { border-right: 1px dashed #000000; }
+            .total-hits, .histogram, .median-sqltime,
+            .median-sqlstatements { border-right: 1px dashed #000000; }
         </style>
         </head>
         <body>
@@ -905,12 +1058,12 @@
             <th class="clickable">Median Time (secs)</th>
             <th class="sorttable_nosort">Time Distribution</th>
 
-            <th class="clickable">Total SQL Time (secs)</th>
+            <th class="clickable">99% Under SQL Time (secs)</th>
             <th class="clickable">Mean SQL Time (secs)</th>
             <th class="clickable">SQL Time Standard Deviation</th>
             <th class="clickable">Median SQL Time (secs)</th>
 
-            <th class="clickable">Total SQL Statements</th>
+            <th class="clickable">99% Under SQL Statements</th>
             <th class="clickable">Mean SQL Statements</th>
             <th class="clickable">SQL Statement Standard Deviation</th>
             <th class="clickable">Median SQL Statements</th>
@@ -925,28 +1078,28 @@
     histograms = []
 
     def handle_times(html_title, stats):
-        histograms.append(stats.relative_histogram)
+        histograms.append(stats.histogram)
         print >> outf, dedent("""\
             <tr>
             <th class="category-title">%s</th>
-            <td class="numeric total_hits">%d</td>
-            <td class="numeric total_time">%.2f</td>
-            <td class="numeric 99pc_under">%.2f</td>
-            <td class="numeric mean_time">%.2f</td>
-            <td class="numeric std_time">%.2f</td>
-            <td class="numeric median_time">%.2f</td>
+            <td class="numeric total-hits">%d</td>
+            <td class="numeric total-time">%.2f</td>
+            <td class="numeric 99pc-under-time">%.2f</td>
+            <td class="numeric mean-time">%.2f</td>
+            <td class="numeric std-time">%.2f</td>
+            <td class="numeric median-time">%.2f</td>
             <td>
                 <div class="histogram" id="histogram%d"></div>
             </td>
-            <td class="numeric total_sqltime">%.2f</td>
-            <td class="numeric mean_sqltime">%.2f</td>
-            <td class="numeric std_sqltime">%.2f</td>
-            <td class="numeric median_sqltime">%.2f</td>
+            <td class="numeric 99pc-under-sqltime">%.2f</td>
+            <td class="numeric mean-sqltime">%.2f</td>
+            <td class="numeric std-sqltime">%.2f</td>
+            <td class="numeric median-sqltime">%.2f</td>
 
-            <td class="numeric total_sqlstatements">%.f</td>
-            <td class="numeric mean_sqlstatements">%.2f</td>
-            <td class="numeric std_sqlstatements">%.2f</td>
-            <td class="numeric median_sqlstatements">%.2f</td>
+            <td class="numeric 99pc-under-sqlstatement">%.f</td>
+            <td class="numeric mean-sqlstatements">%.2f</td>
+            <td class="numeric std-sqlstatements">%.2f</td>
+            <td class="numeric median-sqlstatements">%.2f</td>
             </tr>
             """ % (
                 html_title,
@@ -954,9 +1107,10 @@
                 stats.ninetyninth_percentile_time,
                 stats.mean, stats.std, stats.median,
                 len(histograms) - 1,
-                stats.total_sqltime, stats.mean_sqltime,
+                stats.ninetyninth_percentile_sqltime, stats.mean_sqltime,
                 stats.std_sqltime, stats.median_sqltime,
-                stats.total_sqlstatements, stats.mean_sqlstatements,
+                stats.ninetyninth_percentile_sqlstatements,
+                stats.mean_sqlstatements,
                 stats.std_sqlstatements, stats.median_sqlstatements))
 
     # Table of contents
@@ -1003,10 +1157,9 @@
         $(function () {
             var options = {
                 series: {
-                    bars: {show: true}
+                    bars: {show: true, barWidth: %s}
                     },
                 xaxis: {
-                    tickDecimals: 0,
                     tickFormatter: function (val, axis) {
                         return val.toFixed(axis.tickDecimals) + "s";
                         }
@@ -1022,7 +1175,7 @@
                         },
                     tickDecimals: 1,
                     tickFormatter: function (val, axis) {
-                        return (val * 100).toFixed(axis.tickDecimals) + "%";
+                        return (val * 100).toFixed(axis.tickDecimals) + "%%";
                         },
                     ticks: [0.001,0.01,0.10,0.50,1.0]
                     },
@@ -1031,10 +1184,10 @@
                     labelMargin: 15
                     }
                 };
-        """)
+        """ % histogram_resolution)
 
     for i, histogram in enumerate(histograms):
-        if histogram is None:
+        if histogram.count == 0:
             continue
         print >> outf, dedent("""\
             var d = %s;
@@ -1043,7 +1196,7 @@
                 $("#histogram%d"),
                 [{data: d}], options);
 
-            """ % (json.dumps(histogram), i))
+            """ % (json.dumps(histogram.bins_relative), i))
 
     print >> outf, dedent("""\
             });

=== modified file 'lib/lp/scripts/utilities/tests/test_pageperformancereport.py'
--- lib/lp/scripts/utilities/tests/test_pageperformancereport.py	2010-12-02 16:13:51 +0000
+++ lib/lp/scripts/utilities/tests/test_pageperformancereport.py	2011-05-02 20:36:45 +0000
@@ -9,6 +9,7 @@
 
 from lp.scripts.utilities.pageperformancereport import (
     Category,
+    Histogram,
     OnlineApproximateMedian,
     OnlineStats,
     OnlineStatsCalculator,
@@ -19,10 +20,11 @@
 
 
 class FakeOptions:
-    timeout = 4
+    timeout = 5
     db_file = None
     pageids = True
     top_urls = 3
+    resolution = 1
 
     def __init__(self, **kwargs):
         """Assign all arguments as attributes."""
@@ -75,7 +77,8 @@
         median_sqlstatements=56, std_sqlstatements=208.94,
         histogram=[[0, 2], [1, 2], [2, 2], [3, 1], [4, 2], [5, 3]],
         )),
-    (Category('Test', ''), FakeStats()),
+    (Category('Test', ''), FakeStats(
+        histogram=[[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]])),
     (Category('Bugs', ''), FakeStats(
         total_hits=6, total_time=51.70, mean=8.62, median=4.5, std=6.90,
         total_sqltime=33.40, mean_sqltime=5.57, median_sqltime=3,
@@ -180,7 +183,8 @@
             self.assertEquals(expected[idx][1].text(), results[idx][1].text(),
                 "Wrong stats for results %d (%s)" % (idx, key))
             self.assertEquals(
-                expected[idx][1].histogram, results[idx][1].histogram,
+                Histogram.from_bins_data(expected[idx][1].histogram),
+                results[idx][1].histogram,
                 "Wrong histogram for results %d (%s)" % (idx, key))
 
     def test_get_category_times(self):
@@ -217,19 +221,20 @@
         self.assertEquals(1, results.url_times['/bugs'].total_hits)
         self.assertEquals(1, results.url_times['/bugs/1'].total_hits)
 
-
-class TestStats(TestCase):
-    """Tests for the Stats class."""
-
-    def test_relative_histogram(self):
-        # Test that relative histogram gives an histogram using
-        # relative frequency.
-        stats = Stats()
-        stats.total_hits = 100
-        stats.histogram = [[0, 50], [1, 10], [2, 33], [3, 0], [4, 0], [5, 7]]
-        self.assertEquals(
-            [[0, 0.5], [1, .1], [2, .33], [3, 0], [4, 0], [5, .07]],
-            stats.relative_histogram)
+    def test_histogram_init_with_resolution(self):
+        # Test that the resolution parameter increases the number of bins.
+        db = RequestTimes(
+            self.categories, FakeOptions(timeout=4, resolution=1))
+        self.assertEquals(5, db.histogram_width)
+        self.assertEquals(1, db.histogram_resolution)
+        db = RequestTimes(
+            self.categories, FakeOptions(timeout=4, resolution=0.5))
+        self.assertEquals(9, db.histogram_width)
+        self.assertEquals(0.5, db.histogram_resolution)
+        db = RequestTimes(
+            self.categories, FakeOptions(timeout=4, resolution=2))
+        self.assertEquals(3, db.histogram_width)
+        self.assertEquals(2, db.histogram_resolution)
 
 
 class TestOnlineStats(TestCase):
@@ -237,9 +242,9 @@
 
     def test___add__(self):
         # Ensure that adding two OnlineStats merge all their constituencies.
-        stats1 = OnlineStats(4)
+        stats1 = OnlineStats(4, 1)
         stats1.update(FakeRequest('/', 2.0, 5, 1.5))
-        stats2 = OnlineStats(4)
+        stats2 = OnlineStats(4, 1)
         stats2.update(FakeRequest('/', 1.5, 2, 3.0))
         stats2.update(FakeRequest('/', 5.0, 2, 2.0))
         results = stats1 + stats2
@@ -249,7 +254,9 @@
         self.assertEquals(2, results.median_sqlstatements)
         self.assertEquals(6.5, results.total_sqltime)
         self.assertEquals(2.0, results.median_sqltime)
-        self.assertEquals([[0, 0], [1, 1], [2, 1], [3, 1]], results.histogram)
+        self.assertEquals(
+            Histogram.from_bins_data([[0, 0], [1, 1], [2, 1], [3, 1]]),
+            results.histogram)
 
 
 class TestOnlineStatsCalculator(TestCase):
@@ -365,5 +372,118 @@
         self.assertEquals([[1, 3], [6], [3, 7], [4]], results.buckets)
 
 
+class TestHistogram(TestCase):
+    """Test the histogram computation."""
+
+    def test__init__(self):
+        hist = Histogram(4, 1)
+        self.assertEquals(4, hist.bins_count)
+        self.assertEquals(1, hist.bins_size)
+        self.assertEquals([[0, 0], [1, 0], [2, 0], [3, 0]], hist.bins)
+
+    def test__init__bins_size_float(self):
+        hist = Histogram(9, 0.5)
+        self.assertEquals(9, hist.bins_count)
+        self.assertEquals(0.5, hist.bins_size)
+        self.assertEquals(
+            [[0, 0], [0.5, 0], [1.0, 0], [1.5, 0],
+             [2.0, 0], [2.5, 0], [3.0, 0], [3.5, 0], [4.0, 0]], hist.bins)
+
+    def test_update(self):
+        hist = Histogram(4, 1)
+        hist.update(1)
+        self.assertEquals(1, hist.count)
+        self.assertEquals([[0, 0], [1, 1], [2, 0], [3, 0]], hist.bins)
+
+        hist.update(1.3)
+        self.assertEquals(2, hist.count)
+        self.assertEquals([[0, 0], [1, 2], [2, 0], [3, 0]], hist.bins)
+
+    def test_update_float_bin_size(self):
+        hist = Histogram(4, 0.5)
+        hist.update(1.3)
+        self.assertEquals([[0, 0], [0.5, 0], [1.0, 1], [1.5, 0]], hist.bins)
+        hist.update(0.5)
+        self.assertEquals([[0, 0], [0.5, 1], [1.0, 1], [1.5, 0]], hist.bins)
+        hist.update(0.6)
+        self.assertEquals([[0, 0], [0.5, 2], [1.0, 1], [1.5, 0]], hist.bins)
+
+    def test_update_max_goes_in_last_bin(self):
+        hist = Histogram(4, 1)
+        hist.update(9)
+        self.assertEquals([[0, 0], [1, 0], [2, 0], [3, 1]], hist.bins)
+
+    def test_bins_relative(self):
+        hist = Histogram(4, 1)
+        for x in range(4):
+            hist.update(x)
+        self.assertEquals(
+            [[0, 0.25], [1, 0.25], [2, 0.25], [3, 0.25]], hist.bins_relative)
+
+    def test_from_bins_data(self):
+        hist = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        self.assertEquals(4, hist.bins_count)
+        self.assertEquals(1, hist.bins_size)
+        self.assertEquals(6, hist.count)
+        self.assertEquals([[0, 1], [1, 3], [2, 1], [3, 1]], hist.bins)
+
+    def test___repr__(self):
+        hist = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        self.assertEquals(
+            "<Histogram [[0, 1], [1, 3], [2, 1], [3, 1]]>", repr(hist))
+
+    def test___eq__(self):
+        hist1 = Histogram(4, 1)
+        hist2 = Histogram(4, 1)
+        self.assertEquals(hist1, hist2)
+
+    def test__eq___with_data(self):
+        hist1 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        hist2 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        self.assertEquals(hist1, hist2)
+
+    def test___add__(self):
+        hist1 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        hist2 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        hist3 = Histogram.from_bins_data([[0, 2], [1, 6], [2, 2], [3, 2]])
+        total = hist1 + hist2
+        self.assertEquals(hist3, total)
+        self.assertEquals(12, total.count)
+
+    def test___add___uses_widest(self):
+        # Make sure that the resulting histogram is as wide as the widest one.
+        hist1 = Histogram.from_bins_data([[0, 1], [1, 3], [2, 1], [3, 1]])
+        hist2 = Histogram.from_bins_data(
+            [[0, 1], [1, 3], [2, 1], [3, 1], [4, 2], [5, 3]])
+        hist3 = Histogram.from_bins_data(
+            [[0, 2], [1, 6], [2, 2], [3, 2], [4, 2], [5, 3]])
+        self.assertEquals(hist3, hist1 + hist2)
+
+    def test___add___interpolate_lower_resolution(self):
+        # Make sure that when the other histogram has a bigger bin_size
+        # the frequency is correctly split across the different bins.
+        hist1 = Histogram.from_bins_data(
+            [[0, 1], [0.5, 3], [1.0, 1], [1.5, 1]])
+        hist2 = Histogram.from_bins_data(
+            [[0, 1], [1, 2], [2, 3], [3, 1], [4, 1]])
+
+        hist3 = Histogram.from_bins_data(
+            [[0, 1.5], [0.5, 3.5], [1.0, 2], [1.5, 2],
+            [2.0, 1.5], [2.5, 1.5], [3.0, 0.5], [3.5, 0.5], 
+            [4.0, 0.5], [4.5, 0.5]])
+        self.assertEquals(hist3, hist1 + hist2)
+
+    def test___add___higher_resolution(self):
+        # Make sure that when the other histogram has a smaller bin_size
+        # the frequency is correctly added.
+        hist1 = Histogram.from_bins_data([[0, 1], [1, 2], [2, 3]])
+        hist2 = Histogram.from_bins_data(
+            [[0, 1], [0.5, 3], [1.0, 1], [1.5, 1], [2.0, 3], [2.5, 1],
+             [3, 4], [3.5, 2]])
+
+        hist3 = Histogram.from_bins_data([[0, 5], [1, 4], [2, 7], [3, 6]])
+        self.assertEquals(hist3, hist1 + hist2)
+
+
 def test_suite():
     return unittest.TestLoader().loadTestsFromName(__name__)

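For readers following along without the branch, the merge semantics these tests pin down (the result keeps the left operand's `bin_size` and spans the wider of the two ranges; a coarser right operand has each bin's frequency split evenly across the finer bins it covers) can be sketched roughly as follows. This is a hypothetical standalone reimplementation, not the branch's actual `Histogram` class, and `from_bins_data` is approximated by the constructor:

```python
# Hypothetical sketch of histogram merging with mixed resolutions;
# not the actual page-performance-report Histogram class.

class Histogram:
    def __init__(self, bins):
        # bins: [lower_bound, frequency] pairs with uniform spacing.
        self.bins = [list(b) for b in bins]
        self.bin_size = bins[1][0] - bins[0][0] if len(bins) > 1 else 1

    @property
    def count(self):
        return sum(freq for _, freq in self.bins)

    def __add__(self, other):
        # The result keeps self's bin_size and spans the wider range.
        size = self.bin_size
        top = max(h.bins[-1][0] + h.bin_size for h in (self, other))
        result = [[i * size, 0.0] for i in range(int(round(top / size)))]
        for lower, freq in self.bins:
            result[int(round(lower / size))][1] += freq
        if other.bin_size >= size:
            # Other is coarser: split each of its bins' frequency
            # evenly across the finer bins it covers.
            span = int(round(other.bin_size / size))
            for lower, freq in other.bins:
                start = int(round(lower / size))
                for i in range(start, start + span):
                    result[i][1] += freq / span
        else:
            # Other is finer: each of its bins lands inside one of ours.
            for lower, freq in other.bins:
                result[int(lower // size)][1] += freq
        return Histogram(result)
```

Note the asymmetry: `fine + coarse` yields a fine-resolution result while `coarse + fine` yields a coarse one, which is exactly what the interpolation and higher-resolution tests above assert.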
=== modified file 'utilities/page-performance-report.ini'
--- utilities/page-performance-report.ini	2011-04-21 05:19:48 +0000
+++ utilities/page-performance-report.ini	2011-05-02 20:36:45 +0000
@@ -5,49 +5,74 @@
 All Launchpad=.
 All Launchpad except operational pages=(?<!\+opstats|\+haproxy)$
 
-Launchpad Frontpage=^https?://launchpad\.[^/]+/?$
+API=(^https?://api\.|/\+access-token$)
+Operational=(\+opstats|\+haproxy)$
+Web (Non API/non operational/non XML-RPC)=^https?://(?!api\.)
+    [^/]+($|/
+     (?!\+haproxy|\+opstats|\+access-token
+      |((authserver|bugs|bazaar|codehosting|
+         codeimportscheduler|mailinglists|softwarecenteragent)/\w+$)))
+Other=^/
+
+Launchpad Frontpage=^https?://launchpad\.[^/]+(/index\.html)?$
 
 # Note that the bug text dump is served on the main launchpad domain
# and we need to exclude it from the registry stats.
-Registry=^https?://launchpad\.(?<!/\+text)$
-Registry - Person Index=^https?://launchpad\.[^/]+/(~|%7E)[^/]+$
-Registry - Pillar Index=^https?://launchpad\.[^/]+/\w[^/]*$
+Registry=^https?://launchpad\..*(?<!/\+text)(?<!/\+access-token)$
+Registry - Person Index=^https?://launchpad\.[^/]+/%7E[^/]+(/\+index)?$
+Registry - Pillar Index=^https?://launchpad\.[^/]+/\w[^/]*(/\+index)?$
 
 Answers=^https?://answers\.
-Answers - Front page=^https?://answers\.[^/]+/?$
+Answers - Front page=^https?://answers\.[^/]+(/questions/\+index)?$
 
 Blueprints=^https?://blueprints\.
-Blueprints - Front page=^https?://blueprints\.[^/]+/?$
+Blueprints - Front page=^https?://blueprints\.[^/]+(/specs/\+index)?$
 
 # Note that the bug text dump is not served on the bugs domain,
 # probably for hysterical reasons. This is why the bugs regexp is
 # confusing.
 Bugs=^https?://(bugs\.|.+/bugs/\d+/\+text$)
-Bugs - Front page=^https?://bugs\.[^/]+/?$
-Bugs - Bug Page=^https?://bugs\.[^/]+/.+/\+bug/\d+$
-Bugs - Pillar Index=^https?://bugs\.[^/]+/\w[^/]*$
-Bugs - Search=^https?://bugs\.[^/]+/.+/\+bugs\?.*field.searchtext=
+Bugs - Front page=^https?://bugs\.[^/]+(/bugs/\+index)?$
+Bugs - Bug Page=^https?://bugs\.[^/]+/.+/\+bug/\d+(/\+index)?$
+Bugs - Pillar Index=^https?://bugs\.[^/]+/\w[^/]*(/\+bugs-index)?$
+Bugs - Search=^https?://bugs\.[^/]+/.+/\+bugs$
 Bugs - Text Dump=^https?://launchpad\..+/\+text$
 
 Code=^https?://code\.
-Code - Front page=^https?://code\.[^/]+/?$
-Code - PPA Index=^https?://code\.[^/]+/.+/\+archive/[^/]+$
-Code - Pillar Branches=^https?://code\.[^/]+/\w[^/]*$
-Code - Branch Page=^https?://code\.[^/]+/(~|%7E)[^/]+/[^/]+/[^/]+$
-Code - Merge Proposal=^https?://code\.[^/]+/.+/\+merge/\d+$
+Code - Front page=^https?://code\.[^/]+(/\+code/\+index)?$
+Code - Pillar Branches=^https?://code\.[^/]+/\w[^/]*(/\+code-index)?$
+Code - Branch Page=^https?://code\.[^/]+/%7E[^/]+/[^/]+/[^/]+(/\+index)?$
+Code - Merge Proposal=^https?://code\.[^/]+/.+/\+merge/\d+(/\+index)?$
+
+Soyuz - PPA Index=^https?://launchpad\.[^/]+/.+/\+archive/[^/]+(/\+index)?$
 
 Translations=^https?://translations\.
-Translations - Front page=^https?://translations\.[^/]+/?$
-Translations - Overview=^https?://translations\..*/\+lang/\w+$
+Translations - Front page=^https?://translations\.[^/]+/translations/\+index$
+Translations - Overview=^https?://translations\..*/\+lang/\w+(/\+index)?$
 
-API=^https?://api\.
-Public XML-RPC=^https?://xmlrpc\.
-Private XML-RPC=^https?://xmlrpc-private\.
+Public XML-RPC=^https://(launchpad|xmlrpc)[^/]+/bazaar/\w+$
+Private XML-RPC=^https://(launchpad|xmlrpc)[^/]+/
+    (authserver|bugs|codehosting|
+     codeimportscheduler|mailinglists|
+     softwarecenteragent)/\w+$
 
 [metrics]
 ppr_all=All Launchpad except operational pages
+ppr_web=Web (Non API/non operational/non XML-RPC)
+ppr_operational=Operational
 ppr_bugs=Bugs
 ppr_api=API
 ppr_code=Code
+ppr_public_xmlrpc=Public XML-RPC
+ppr_private_xmlrpc=Private XML-RPC
 ppr_translations=Translations
 ppr_registry=Registry
+ppr_other=Other
+
+[partition]
+API=
+Operational=
+Private XML-RPC=
+Public XML-RPC=
+Web (Non API/non operational/non XML-RPC)=
+Other=
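A rough way to sanity-check the new [partition] section is to compile each category's pattern and confirm that a sample URL lands in exactly one bucket. The sketch below is hypothetical: the category names are abbreviated, and `re.X` stands in for however the report's config parser joins the multi-line .ini values (with `re.X` the layout whitespace is simply ignored):

```python
import re

# Hypothetical partition checker; patterns copied (abridged names)
# from the [categories] section above.
PARTITION = {
    'API': r'(^https?://api\.|/\+access-token$)',
    'Operational': r'(\+opstats|\+haproxy)$',
    'Private XML-RPC': r'''^https://(launchpad|xmlrpc)[^/]+/
        (authserver|bugs|codehosting|
         codeimportscheduler|mailinglists|
         softwarecenteragent)/\w+$''',
    'Public XML-RPC': r'^https://(launchpad|xmlrpc)[^/]+/bazaar/\w+$',
    'Web': r'''^https?://(?!api\.)
        [^/]+($|/
         (?!\+haproxy|\+opstats|\+access-token
          |((authserver|bugs|bazaar|codehosting|
             codeimportscheduler|mailinglists|softwarecenteragent)/\w+$)))''',
    'Other': r'^/',
}

def categorise(url):
    """Return the partition categories whose regexp matches url."""
    return [name for name, pattern in PARTITION.items()
            if re.search(pattern, url, re.X)]
```

For a true partition, `categorise` should return a single-element list for every logged URL; a result of length zero or more than one would point at a gap or an overlap in the category regexps.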