notifications - an implementation straw man (warning explicit discussion of services follows)

 

So notifications, dashboards, walls, timelines - these things have
been spoken about recently.

I'd love to see LP gain a really fantastic story in these areas, and I
think the plumbing is about 1 week's yak shaving + 1 week's actual
development if done as a service using django (or a still leaner
framework like flask, web.py, webob or even just wsgiref) +
postgresql.

It seems to me that there are a few core questions like 'how much
history do we support' and 'if someone leaves a team do notifications
from that team disappear from their timeline' which can be answered
pessimistically ('all history') without making the problem
particularly hard to solve.

Ah, but which problem, you may be asking?

Well, I humbly suggest that there are two common problems that need
solving to implement customisable notifications, dashboards/walls, and
timelines:
 - 'who is interested in a given event'
 - 'who was told about a given event'
(Note that the tense is very deliberate here).

Determining who gets sent mail about a bug, merge proposal, branch,
package upload etc. is all a case of determining who is interested in
that event. Determining whose timelines an event should turn up in is
determining who /was/ told about it; determining if an event is
relevant to a project overview is also the same problem [structural
subscriptions can be considered a multiplier of events relevant to the
project overview].

While Cassandra or other NoSQL DBs offer massive scalability for
per-user data structures, I don't think we need them to solve this
core problem.

Imagine the following API (a strawman, I don't claim it's right :)):
---------
notify(subject_template, body_template, summary_template, event_tags,
topics, subject_tags, participants)
"""Notify subscribers about an event.

Here the three templates are hopefully obvious.
event_tags is a set that would have things like 'bug', 'project', 'branch'.
topics is a set of SOA object ids(*).
subject_tags is a set of tags on the object itself. E.g. a bug's tags
would go in subject_tags.
participants is a list of subscribables which are direct participants
in the event (and thus should be notified even if the data in LP
doesn't show them as interested).
"""

subscribe(recipient, subscription_tags, event_tags,
exclude_event_tags, topics, exclude_topics, subject_tags,
exclude_subject_tags)
"""subscribe an object to an event

The foo and exclude_foo sets provide filtering for the subscription -
an entry in foo requires that entry; an entry in exclude_ will reject
notification if the entry is matched.

The subscription_tags set allows the subscription to be
categorised. For instance, the implicit subscription of a bug assignee
to the bug would be represented by a subscription(bug object id,
['assignee'], None, None, None, None, None, None).
"""

subscriptions(object id):
"""Return the subscription ids that affect object id."""

unsubscribe(object id, subscription id)
"""Drops a subscription."""

events(subscribers, topics, batch_endpoint):
"""Return the events for subscribers, topics from batch_endpoint.

:param subscribers: None or a list of subscribers.
:param topics: None or a list of topics.

If both subscribers and topics are supplied, events are limited to
those common to both.
"""

------------
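
For concreteness, here's how LP might drive that API (a sketch in
Python-ish pseudocode only - every identifier, template and object id
below is made up, and I'm not claiming these calls are right either):
---------
# The implicit 'assignee' subscription from the subscribe docstring,
# using the <type>:<row id> object-id form from the footnote(*).
subscribe('Bug:1234', ['assignee'], None, None, None, None, None, None)

# Emitting an event when that bug changes.
notify(
    subject_template='[Bug 1234] {title}',
    body_template='{actor} changed the status from {old} to {new}',
    summary_template='status {old} -> {new}',
    event_tags={'bug', 'status'},
    topics={'Bug:1234', 'Project:foo'},
    subject_tags={'regression'},     # the bug's own tags
    participants=['Person:42'],      # e.g. the person the bug is assigned to
)

# One batch of a person's timeline ('who /was/ told').
batch = events(subscribers=['Person:42'], topics=None, batch_endpoint=None)

# One batch of events relevant to the project overview.
batch = events(subscribers=None, topics=['Project:foo'], batch_endpoint=None)
---------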

I think this API would be sufficient to (efficiently):
 - replace structural subscriptions
 - replace bug subscriptions
 - replace package upload notifications
 - provide RSS feeds for per-user notifications
 - provide RSS feeds for object changes
 - provide per user timelines

This API would call back to LP to perform subscription expansion and
then structural subscriptions would be one per team in the service.
(Or we could maintain an expanded cache in the API, but I don't think
that's needed).

Implementation-wise, we need to determine what queries are needed. I'm
framing this as a /temporal/ service - it depends on knowing the state
of team expansions etc. after they happen.

Performance-wise, a holy grail would be delivering low-ms responses
from memory, and <1s responses from disk. This depends on
very high selectivity on queries.

A fact table like the following:
subscriber, event, date
would trivially provide highly efficient queries to deliver a timeline
(given a supporting event table with the summary, ... tag metadata
etc).

Similarly
topic, event, date
will deliver events relevant to a given object in LP (a bug, a
project, a project group) very efficiently.

We could either run with two separate fact tables, or one fact table
with nullable subscriber|topic.
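
To make that concrete, here's a rough sketch of the schema and the
timeline query it supports (PostgreSQL, carried as Python string
constants the way a small service might) - every table, column and
index name here is illustrative only, not a proposal:
---------
SCHEMA = """
CREATE TABLE event (
    id         serial PRIMARY KEY,
    summary    text NOT NULL,
    event_tags text[] NOT NULL DEFAULT '{}',
    created_at timestamptz NOT NULL DEFAULT now()
);

-- 'who was told': drives per-user timelines.
CREATE TABLE subscriber_event (
    subscriber text NOT NULL,               -- e.g. 'Person:1234'
    event      integer NOT NULL REFERENCES event(id),
    date       timestamptz NOT NULL
);
CREATE INDEX subscriber_event_date ON subscriber_event (subscriber, date DESC);

-- per-topic variant: drives object/project timelines.
CREATE TABLE topic_event (
    topic      text NOT NULL,               -- e.g. 'Bug:1234', 'Project:foo'
    event      integer NOT NULL REFERENCES event(id),
    date       timestamptz NOT NULL
);
CREATE INDEX topic_event_date ON topic_event (topic, date DESC);
"""

# One pageful of a user's timeline: an index range scan on the fact
# table plus a join to the event metadata.
TIMELINE_PAGE = """
SELECT e.summary, e.event_tags, se.date
  FROM subscriber_event se
  JOIN event e ON e.id = se.event
 WHERE se.subscriber = %(subscriber)s
 ORDER BY se.date DESC
 LIMIT %(page_size)s
"""
---------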

Walls and dashboards are AIUI a combination of a topic timeline and
topic TODO queues (e.g. merge proposals to review etc.). This can be
efficiently served by querying for one pageful of each such thing,
asking for the timeline similarly and then combining in the appserver.
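
Roughly, that appserver-side combination could be as simple as this
(a sketch only; the fetch_* callables stand in for the service calls,
and I'm assuming each feed comes back newest-first):
---------
import heapq
from itertools import islice

def dashboard_page(topic, page_size, fetch_timeline, fetch_todo_queues):
    """Hypothetical appserver-side merge for a wall/dashboard.

    fetch_timeline and fetch_todo_queues stand in for calls to the
    service; each returns (date, item) tuples, newest first.
    """
    streams = [fetch_timeline(topic, page_size)]
    streams.extend(fetch_todo_queues(topic, page_size))
    # heapq.merge interleaves the already-sorted streams by date
    # without re-sorting everything; reverse keeps newest-first order.
    merged = heapq.merge(*streams, key=lambda pair: pair[0], reverse=True)
    return list(islice(merged, page_size))
---------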

Sending notifications becomes just an API call, and super
performance isn't required there.

Querying who is subscribed to an object needs to be fast, however. A fact table:
topic, event_tag, exclude_event_tag, subscriber
can give subscription lists for topics very easily, still relying on
in-LP expansion of teams. (A topic being e.g. 'bugs for project foo' -
a structural subscription).
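
The lookup itself is then a single query against that table -
something like the following sketch (table and column names made up,
and the result may still contain teams, which LP expands as today):
---------
SUBSCRIBERS_FOR_EVENT = """
SELECT DISTINCT subscriber
  FROM topic_subscription
 WHERE topic = %(topic)s
   AND (event_tag IS NULL OR event_tag = ANY(%(event_tags)s))
   AND (exclude_event_tag IS NULL
        OR NOT (exclude_event_tag = ANY(%(event_tags)s)))
"""
---------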

I'm estimating a week to do fiddling around like extracting our
schema management code for reuse (needed for slony deploys etc.), then
a week to put together a basic implementation of this schema with a
simple private JSON API for it. (Actually I think a bare-bones thing is
a couple of days... but double and double again :P)
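
To give a feel for how small the bare-bones thing is, the private JSON
API could start as little more than this (flask used purely as an
example, with in-memory stand-ins for the real tables just to keep the
sketch self-contained):
---------
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory stand-in for the PostgreSQL-backed tables described above.
EVENTS = []

@app.route('/notify', methods=['POST'])
def notify_view():
    event = request.get_json()
    # A real implementation would insert into the event table and fan
    # rows out into the subscriber/topic fact tables.
    EVENTS.append(event)
    return jsonify({'event': len(EVENTS) - 1}), 201

@app.route('/events', methods=['GET'])
def events_view():
    subscriber = request.args.get('subscriber')
    matches = [e for e in EVENTS
               if subscriber is None
               or subscriber in e.get('participants', [])]
    return jsonify({'events': matches})
---------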

It's my hope this email can be a template for folk interested in
bootstrapping better notifications.

*: object ids are something we haven't pinned down yet, but one
typical form is <type>:<row id> - e.g. Person:1234.

