
launchpad-dev team mailing list archive

Announce: oops, oops_wsgi, oops_datedir_repo projects

 

I've recently pulled apart our monolithic OOPS system into reusable,
orthogonal components. There is more pulling to do, but it's going to
need to wait a couple of weeks :)

However, what's been done so far is extremely useful and promising, so
much so that I thought I'd use up some of your precious bandwidth
telling you about it!

Firstly a teaser: This is enough to OOPS-enable a WSGI microservice
(e.g. loggerhead):
    from optparse import OptionParser

    import oops
    import oops_wsgi
    from oops_datedir_repo import DateDirRepo

    # Command-line options: where to write reports and how to label them.
    parser = OptionParser()
    parser.add_option("--oops-root", dest="oops_dir",
        help="The root directory to write OOPS reports to.", default=".")
    parser.add_option("--oops-instance", dest="oops_prefix",
        help=("The instance id for this service, should uniquely "
              "identify the running instance."), default="GPGVERIFY")
    options, args = parser.parse_args()
    # Publish reports to an on-disk repository.
    oops_config = oops.Config()
    oops_repo = DateDirRepo(options.oops_dir, options.oops_prefix)
    oops_config.publishers.append(oops_repo.publish)
    oops_wsgi.install_hooks(oops_config)
    # 'app' is the existing WSGI application being wrapped.
    app = oops_wsgi.make_app(app, oops_config)

These packages are hosted on Launchpad, naturally :) - lp:python-oops,
lp:python-oops-wsgi and lp:python-oops-datedir-repo. As part of doing
this I've streamlined and documented the project setup -
https://dev.launchpad.net/CreatingNewProjects - which should now be
fairly straightforward for anyone involved in Launchpad development
who wants to create a new project in the LP project group. The goal
is to make creating a package low enough friction that we sensibly
split things up rather than having overly large dependencies from our
components. E.g. only python-oops-storm should import storm, so that
the oops system is friendly to folk using the django ORM (and the same
in reverse ;)).

https://bazaar.launchpad.net/~canonical-launchpad-branches/python-oops/trunk
https://bazaar.launchpad.net/~canonical-launchpad-branches/python-oops-wsgi/trunk
https://bazaar.launchpad.net/~canonical-launchpad-branches/python-oops-datedir-repo/trunk

Launchpad itself has had its IErrorHandlingUtility rewritten to use
the components for error reporting, so we're *live* with these
facilities: anyone that uses them is using the same code Launchpad is,
and we can collaborate on features going forward.

For Loggerhead, the Launchpad-Loggerhead glue code no longer uses the
zope ErrorHandlingUtility - it now uses the oops_wsgi module directly,
avoiding a bunch of friction and confusion.

Future work for this extraction involves extracting our logging.error
/ logging.warning -> OOPS code out into lp:python-oops, and extracting
our twisted error -> OOPS code into a new lp:python-oops-tx package.
Finally, the timelines we use to get db statements have already been
extracted, but I haven't extracted the glue needed to hook storm up to
the timeline, nor the how-to-find-the-timeline glue code (because
currently that is totally LP specific - we need to make up a wsgi
friendly way of doing it for lp:python-oops-wsgi, and a twisted way
for lp:python-oops-tx).

The design centers around an oops.Config - this object maintains the
state needed to create a report, decide if a report should be
published, and where to publish a report. The normal workflow, once
a config has been created, is:
    report = config.create(context)
    ids = config.publish(report)
    if ids:
        # tell the user about the report somehow
        print("OOPS %s" % report['id'])

'context' here is a simple dict with information like the exc_info for
an exception, a url, or a wsgi environ.  The config.on_create
callbacks examine the context to populate the report.
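
As a rough illustration of how a context dict might feed a report, here
is a minimal standalone sketch. It does not import the oops package:
create_report, attach_exception and attach_url are hypothetical names,
and the (report, context) callback signature is an assumption made for
the example rather than a statement of the library's API.

```python
import sys


def attach_exception(report, context):
    # Hypothetical callback: copy exception details out of the context.
    exc_info = context.get('exc_info')
    if exc_info is not None:
        exc_type, exc_value, _traceback = exc_info
        report['type'] = exc_type.__name__
        report['value'] = str(exc_value)


def attach_url(report, context):
    # Hypothetical callback: record the url if the caller supplied one.
    if 'url' in context:
        report['url'] = context['url']


def create_report(context, on_create):
    # Stand-in for config.create(): run each callback over an empty report.
    report = {}
    for callback in on_create:
        callback(report, context)
    return report


try:
    {}['missing']
except KeyError:
    context = {'exc_info': sys.exc_info(), 'url': 'http://example.com/page'}

report = create_report(context, [attach_exception, attach_url])
print("%s at %s" % (report['type'], report['url']))
```

Each callback only looks at the keys it understands, so a context that
lacks exc_info or a url simply leaves those fields out of the report.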

The publish method first filters the report - this calls every handler
in config.filters, and if any of them returns True, it stops publishing
and returns None to the caller of 'publish'.
If the report was not filtered, it is then published to every
publisher in config.publishers - each returns the id allocated
by that publisher if it published the report. Multiple publications
are possible (e.g. rabbit + local disk). The last id returned is the
one in report['id'] - this is a bit arbitrary but it seems to work
well so far ;).
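
The filter-then-publish flow described above can be sketched as a toy
model. This is not the real oops.Config: SketchConfig and
make_publisher are hypothetical, and details such as stopping at the
first filter that returns True are assumptions.

```python
class SketchConfig(object):
    """Toy model of the publish flow; not the real oops.Config."""

    def __init__(self):
        self.filters = []
        self.publishers = []

    def publish(self, report):
        # Any filter returning True suppresses the report entirely.
        for report_filter in self.filters:
            if report_filter(report):
                return None
        ids = []
        for publisher in self.publishers:
            report_id = publisher(report)
            if report_id:
                # The last publisher to return an id wins report['id'].
                report['id'] = report_id
                ids.append(report_id)
        return ids


def make_publisher(prefix, store):
    # Hypothetical publisher: keeps reports in a list, allocates PREFIX-N ids.
    def publish(report):
        store.append(dict(report))
        return "%s-%d" % (prefix, len(store))
    return publish


disk, rabbit = [], []
config = SketchConfig()
config.filters.append(lambda report: report.get('type') == 'NotFound')
config.publishers.append(make_publisher('DISK', disk))
config.publishers.append(make_publisher('RABBIT', rabbit))

filtered = config.publish({'type': 'NotFound'})  # dropped by the filter
report = {'type': 'KeyError'}
ids = config.publish(report)  # goes to both publishers
```

Here the NotFound report is never handed to a publisher, while the
KeyError report is published twice and ends up carrying the id from the
second publisher.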

And that's it - there's a lot more in e.g. the oops-datedir-repo package
(which has our existing on-disk storage), or the oops-wsgi package
which has wsgi friendly glue for adding things to the oops context or
even directly to the oops report.

The exciting thing for me is that we're now one python-oops-rabbitmq
away from just plugging in a queue based handler: all it needs to do is
bson encode the report, shove it over rabbit, and on the rabbit worker
side, bson decode then shove it at a local oops-datedir-repo
instance's publish() method to cause it to be written to disk where
oops-tools can find it.
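
To make the round trip concrete, here is a small sketch under heavy
assumptions: json stands in for bson, an in-process queue.Queue stands
in for rabbit, and fake_repo_publish stands in for a datedir repo's
publish() method. All of the function names are hypothetical; no such
package exists yet.

```python
import json
import queue


def make_queue_publisher(channel):
    # Producer side: serialise the report and hand it to the queue.
    def publish(report):
        channel.put(json.dumps(report).encode('utf-8'))
        # Returns any id already on the report; this sketch leaves id
        # allocation to the worker side.
        return report.get('id')
    return publish


def drain(channel, local_publish):
    # Worker side: decode each message and pass it to a local publisher.
    ids = []
    while not channel.empty():
        report = json.loads(channel.get().decode('utf-8'))
        ids.append(local_publish(report))
    return ids


written = []


def fake_repo_publish(report):
    # Stand-in for a datedir repo's publish(): store and allocate an id.
    written.append(report)
    return "OOPS-%d" % len(written)


channel = queue.Queue()
send = make_queue_publisher(channel)
send({'type': 'KeyError', 'url': 'http://example.com/page'})
worker_ids = drain(channel, fake_repo_publish)
```

The producer only needs a callable with the publisher shape, so it could
sit in config.publishers alongside a local disk publisher.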

Win.

-Rob

