
launchpad-dev team mailing list archive

persistence layer sketch/strawman

 

https://dev.launchpad.net/LEP/PersistenceLayer sketches out the top
level constraints for the persistence layer project.

I wanted to get some thoughts out about the more technical aspects.

Firstly, by adding a new layer we're essentially partitioning our
code; so what should go where?

A starting point to answer this is design principles.

One major principle I have is that on-demand loading is actively
harmful in high performance software: while doing without it is less
convenient for ad hoc scripts, with it it's very hard to reliably avoid
poor performance due to object traversal triggering expensive (e.g.
3-4ms) queries thousands of times in a single web request.
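
To make the cost concrete, here is a minimal sketch that only counts
round trips. FakeStore and its methods are invented for illustration
and are not Launchpad or Storm APIs; the data is made up too.

```python
# Hypothetical illustration: FakeStore is invented here and only counts
# round trips; it does not talk to a real database.
class FakeStore:
    def __init__(self, owners_by_bug):
        self._owners = owners_by_bug
        self.queries = 0

    def owner_of(self, bug_id):
        # One round trip per call: the on-demand pattern.
        self.queries += 1
        return self._owners[bug_id]

    def owners_of(self, bug_ids):
        # One round trip for the whole batch.
        self.queries += 1
        return {bug_id: self._owners[bug_id] for bug_id in bug_ids}


bug_ids = list(range(1000))
data = {bug_id: "person-%d" % (bug_id % 7) for bug_id in bug_ids}

# On-demand: traversing each bug triggers its own query.
lazy = FakeStore(data)
lazy_owners = [lazy.owner_of(bug_id) for bug_id in bug_ids]

# Up front: everything needed is requested in one go.
eager = FakeStore(data)
batch = eager.owners_of(bug_ids)
eager_owners = [batch[bug_id] for bug_id in bug_ids]

print(lazy.queries, eager.queries)
```

Both paths produce the same owners, but at 3-4ms per query the
on-demand path spends 3-4 seconds on round trips alone for these 1000
bugs.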

This means we need to be able to get everything needed to satisfy Zope
security checks and so forth as part of the initial lookup
description. To avoid repeating ourselves in code that uses the
persistence layer, it's probably best to have any additional lookups
(e.g. 'is member of admins') happen as part of the persistence layer
(which may itself be two or three layers deep).

Actual query code should go in/under the persistence layer. I imagine
we'll have some general code and some code specific to the backend
stores that we have (today, the three pg stores: session, launchpad,
launchpad_slave). I include collection size estimates in 'actual query
code'. It would be nice to enable systematic use of size estimates in
this layer, though it's not a deliberately scoped task.

Code that *requests* a partial object graph should become a consumer
of the persistence layer.

Code that works on objects must live above the persistence layer.

For instance, code that sends mail or chooses what to render in a
template lives above the layer. This code can assume that objects
returned from the persistence layer are all disclosable, and that all
relevant objects are already in memory.
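
One way to hold code above the layer to that assumption is to have
result objects fail loudly rather than load on demand. This is only a
sketch of that idea: Loaded and NotLoaded are hypothetical names, not
part of any existing Launchpad code.

```python
# Hypothetical sketch: a result object that never goes back to the
# database. Anything not fetched up front is an error, not a lazy load.
class NotLoaded(Exception):
    """Raised when code touches data that was never requested."""


class Loaded:
    def __init__(self, **attributes):
        self._attributes = dict(attributes)

    def __getattr__(self, name):
        # Only reached when normal lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._attributes[name]
        except KeyError:
            raise NotLoaded("%r was not part of the request" % name)


person = Loaded(name="rob", bugs=[Loaded(title="crash on startup")])
print(person.bugs[0].title)  # fine: bugs were requested up front

try:
    person.teams  # never requested, so no silent query happens
except NotLoaded as error:
    print("refused:", error)
```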

How, then, shall we describe which objects and which operations we
want from the persistence layer?

The foundations folk are working on a similar problem in the
webservice layer (but simplified: no transactions).
https://dev.launchpad.net/LEP/WebservicePerformance has their draft
efforts. Riffing on that work, I spent some time exploring the space of
prepping a query language in Python, for Python objects.

It seems to me that having a mutable query object which is itself a
graph, whose nodes and edges represent the object-level nodes and
edges to retrieve, offers great flexibility and clear code.

For instance:
Assume that request startup creates a transaction object for us, and
that we have a nominal root node that represents the system as a whole.
We could write some code like this to implement Person:+commentedbugs.
The view is constructed with the Person object, so:
>>> query = launchpad.people.query()
>>> query
<'Person' search filter=None>
>>> query.filter = Id(self.context.id)
>>> query
<'Person' search filter=Id(2)>
The bugs relation is all possible related bugs - a huge set.
We want comments by the person we're looking up
>>> query.bugs.filter = CommentBy(self.context)
We could write that in a more generic fashion, referencing some
collection the query itself includes:
>>> query.bugs.filter = CommentBy(query)
We want to slice the returned bugs
>>> query.bugs.slice(0,50)
And we want the total as well - we tell the layer we want that up
front so that when it can be optimised, it is.
>>> query.bugs.stats = Count()
And now transform the query into a result
>>> result = query.execute()
>>> len(result)
1
>>> len(result.bugs)
50
>>> result.bugs.stats
{'count': 254}

Relations that are not traversed are not queried; we can select down
to individual attributes in a similar fashion to the .filter attribute,
using a .get or .retrieve attribute.
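
As a rough sketch of how such a mutable query graph could record what
was asked for, here is a toy version. QueryNode, traversed() and
bounds are names I made up, and Id, CommentBy and Count are stand-ins
for the filters in the example above. It only builds the description;
it does not execute anything.

```python
# A sketch only: every class here is a hypothetical stand-in.
class Id:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "Id(%r)" % (self.value,)


class CommentBy:
    def __init__(self, who):
        self.who = who


class Count:
    pass


class QueryNode:
    """One node of the request graph; attribute access adds an edge."""

    def __init__(self, kind):
        self.kind = kind
        self.filter = None
        self.stats = None
        self.bounds = None
        self._relations = {}

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for relation names.
        if name.startswith("_"):
            raise AttributeError(name)
        return self._relations.setdefault(name, QueryNode(name))

    def slice(self, start, stop):
        self.bounds = (start, stop)

    def traversed(self):
        """Names of the relations that would be fetched."""
        return sorted(self._relations)

    def __repr__(self):
        return "<%r search filter=%r>" % (self.kind, self.filter)


query = QueryNode("Person")
query.filter = Id(2)
query.bugs.filter = CommentBy(query)
query.bugs.slice(0, 50)
query.bugs.stats = Count()
print(query)
print(query.traversed())
```

Only 'bugs' shows up as traversed, so only that relation would be
queried. One obvious wart in this toy: a typo in a relation name
silently adds an edge, so a real version would want to check names
against the model.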

What do you think? Does this sound nice to use?

-Rob


