launchpad-dev team mailing list archive

riptano 0-60

 

Gary and I are attending a course on Cassandra, which several teams in
Canonical are evaluating for use.

We have two more days during which we get to pick the teacher's brain
- mdennis - on modelling, operational issues and so forth.

The first day was the 0-60 course that Riptano offered, the next two
will be more freeform.

Gary and I traded notes after the course and it seemed to me that many
of you will have concerns or opinions that could be very relevant.

So here's my (from memory - all errors are mine) summary of Cassandra
and how I see it as being relevant to Launchpad today. I've skipped
most of the ops-specific stuff because, while it's interesting, I
think it's not hugely relevant from a development perspective.

Cassandra is essentially a highly parallel - at the unit of 'row' -
database server. It doesn't model relationships in any declarative
form - instead you get the basic primitives: a thing called a
ColumnFamily (CF), which contains rows, and rows contain N columns with
one value per column. Typically one would have a ColumnFamily per
single sort of object in the system - User, EmailPart, Email, Bug,
BugSubscription etc. Cassandra itself has no single-point-of-failure
components. Each row can have up to billions of columns - and the
number is *per-row*, so it's possible to have one row with (say) 2
columns and the next with 1000. A common pattern is to transform
1-to-many join tables into very long rows in a shorter CF.
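
To make the shape of that concrete, here's roughly how I picture it -
plain Python only, no Cassandra API involved, and the Bug /
BugSubscription columns are invented for the example:

  # A ColumnFamily maps row keys to rows; each row is its own map of
  # column name -> value, and rows need not share columns.
  bug_cf = {
      "1": {"title": "crash on startup", "status": "New"},   # 2 columns
      "2": {"title": "timeout rendering +bugs", "status": "Triaged",
            "tag:performance": "", "tag:timeout": ""},        # 4 columns
  }

  # A 1-to-many join table (bug -> subscribers) becomes one wide row
  # per bug, with a column per subscriber:
  bugsubscription_cf = {
      "1": {"alice@example.com": "", "bob@example.com": ""},
      "2": {"carol@example.com": ""},
  }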

The schema is dynamic: at the database level new columns can be
defined without downtime. But the database has no transactions: the
strongest guarantee you can get is that a single group of writes to a
single partition will all go into the write-ahead-log together.
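
For illustration only - this uses the DataStax python driver and CQL
rather than anything we saw in the course, and the keyspace/table are
invented - a batch of writes sharing one partition key looks like:

  from cassandra.cluster import Cluster
  from cassandra.query import BatchStatement, SimpleStatement

  session = Cluster(["10.0.0.1"]).connect("lp")  # keyspace name invented

  # Both inserts share the partition key bugid=1, so they are applied
  # together via a single write-ahead-log write; there is still no
  # rollback if a later, unrelated write fails.
  batch = BatchStatement()
  batch.add(SimpleStatement(
      "INSERT INTO bugsubscription (bugid, email) "
      "VALUES (1, 'alice@example.com')"))
  batch.add(SimpleStatement(
      "INSERT INTO bugsubscription (bugid, email) "
      "VALUES (1, 'bob@example.com')"))
  session.execute(batch)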

Cassandra is less efficient than a single-server database like
PostgreSQL or MySQL, but you can get greater performance out of it due
to near-linear scaling as machines are added: both reads and writes
become more efficient. It's better at writes than reads, because it
has an append-only store (which does automatic compaction - rather
like bzr). If our system fits on a single DB server *and we expect it
to do so indefinitely* then staying in a relational single-server
model is ideal. (We've outgrown a single server for reads, but not for
writes - and we have headroom there.)

Clients talk to any arbitrary Cassandra node; whichever node they
reach coordinates the request with the rest of the cluster, so there
is no special master to connect to.
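
In driver terms that looks something like this (the DataStax python
driver again, hostnames invented) - you hand it a few contact points
and whichever node answers does the coordinating:

  from cassandra.cluster import Cluster

  # Any couple of known nodes will do as contact points; there is no
  # master to single out.
  cluster = Cluster(["cassandra1.internal", "cassandra2.internal"])
  session = cluster.connect()
  print(session.execute(
      "SELECT release_version FROM system.local").one())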

Counting - assigning numbers based on data in the database - is
tricky, and there are a few techniques to do it. Running a counting
service - a single point of failure that manages a lock and can issue
numbers - is something we'd probably need to do to allocate bugids,
were we to migrate to Cassandra.
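
A toy sketch of what the core of such a counting service might look
like - entirely hypothetical, single process, in memory, ignoring
persistence and failover:

  import threading

  class BugIdAllocator:
      """Single-process allocator: the lock plus being the only
      instance is what makes the ids unique."""

      def __init__(self, start):
          self._next = start
          self._lock = threading.Lock()

      def next_id(self):
          with self._lock:
              value = self._next
              self._next += 1
              return value

  allocator = BugIdAllocator(start=700000)  # starting point invented
  print(allocator.next_id())  # 700000
  print(allocator.next_id())  # 700001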

In Cassandra, most indexes are just another CF: one whose row keys are
the key [or some named value] from another CF, and whose values are
keys into another CF. E.g. BugSubscription might have a key of bugid,
and in every row a column called 'emailaddress' whose value is the
email address subscribed to that bug. I chose this deliberately to
emphasise how we might denormalise to make calculating notifications
absolutely trivial. When someone changes their email address, we'd
find their subscribed bugs (via a secondary index which would index
the emailaddress column in BugSubscription) and update those
subscriptions.
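
Spelled out as data (plain Python again, all names invented), the
denormalised layout and the email-change flow look like:

  # BugSubscription CF: row key = bugid, 'emailaddress' column per row.
  bugsubscription = {
      "1": {"emailaddress": "alice@example.com"},
      "7": {"emailaddress": "alice@example.com"},
  }

  # Secondary index on the emailaddress column: email -> subscribed bugs.
  bugs_by_email = {
      "alice@example.com": ["1", "7"],
  }

  def change_email(old, new):
      # Find the subscriptions via the index, rewrite each row, then
      # move the index entry - notification fan-out stays a single row
      # read per bug.
      for bugid in bugs_by_email.pop(old, []):
          bugsubscription[bugid]["emailaddress"] = new
          bugs_by_email.setdefault(new, []).append(bugid)

  change_email("alice@example.com", "alice@new.example.com")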

Costs of using Cassandra:
 - more servers are needed vs the existing thing being replaced
[because it's less efficient and needs parallelism]
 - we'd need to write supporting software of some sort to automate
things that are simple SQL now, like creating indexes [change the
schema, generate an automated script to populate the index, update our
data definition to cause writes to the index]
 - writes need to change from ACID - where we roll back in the event
of error - to BASE - where everything we write is correct as far as it
goes and things get made sensible eventually. (Eventually might be
milliseconds, but it's not instant.)
 - it's a pain to package, so we'd need to gain some Java glue in buildout.
 - more operational complexity than we have today (JVM vs CPython)

Potential benefits of using Cassandra:
 - highly available, scalable platform
 - real twisted support, should we want that - native async library support
 - parallelism within single queries
 - online schema changes [no downtime!]

Places where Cassandra may make sense for us [short term]:
 - librarian storage [nb most folk doing s3-like things use simple
files on N disks for the backing store, metadata in Cassandra: in
that model we'd just stay with pgsql]
 - a backend for solr/lucene, the search engine at the top of my list
for fixing our search story (LEP/Search)
 - our OOPS system would be a decent place to experiment if we want to
learn about bringing Cassandra higher up the foodchain.
 - Session storage would fit Cassandra very neatly: run with a low
(e.g. 2) replication factor (the number of copies of each row) and
insert with a TTL - it will automatically clean up after itself (rough
sketch after this list).
 - librarian token storage would also fit very very nicely - set a 24
hour TTL, it supports very high volume writes, the tokens are
naturally write-once
 - could replace memcached, which would give us a higher hit rate
(because we would be sharing one effective cache)
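
Here's the session-storage sketch promised above - the DataStax python
driver again, with an invented keyspace/table and a 24-hour TTL picked
purely for illustration:

  from cassandra.cluster import Cluster

  session = Cluster(["cassandra1.internal"]).connect("lp_session")

  # The TTL means the row expires and is compacted away on its own;
  # no cleanup cron job needed.
  session.execute(
      "INSERT INTO sessions (session_id, client_data) VALUES (%s, %s) "
      "USING TTL 86400",
      ("abc123", "pickled-session-payload"))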

What about Cassandra for the main Launchpad system? Or even
significant components?

There are, broadly speaking, three challenges I see here: how can we
model what we need in Cassandra? How would we migrate incrementally?
How would we prevent a nightmare mix of SQL, domain and Cassandra
logic in our classes?

In terms of modelling, please throw specific use cases (queries) at me
and I'll discuss them here. mdennis is convinced that other users have
not had trouble modelling stuff (once they wrapped their heads around
the basics) - and I believe him. Still, skill needs to be acquired.

For incremental migration there are three broad approaches:
 - migrate isolated sets of tables at a time, joining in the
appservers by querying multiple sources. This needs manageable
temporary result sizes: we can't depend on major filtering happening
in tables on the other side of the SQL/Cassandra split. We have few
such things, but - as a for instance - the live data about builds, or
importd status, are good examples.
 - double-write to both systems from the appservers and, once
particular tables are completely in sync and not needed on the source
system, start dropping them (rough sketch after this list).
 - use a write-through approach: write a pgsql->cassandra
trigger-based system, or, vice versa, a cassandra->pgsql thing that
watches for new data and inserts into pgsql.
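
For the double-write option, the appserver-side shape I have in mind
is roughly this (both store classes and their method are
hypothetical):

  class DoubleWriteStore:
      """Write to both backends while a table is being migrated;
      PostgreSQL stays authoritative until we decide to drop it."""

      def __init__(self, pgsql_store, cassandra_store):
          self._pgsql = pgsql_store
          self._cassandra = cassandra_store

      def add_bug_subscription(self, bugid, email):
          self._pgsql.add_bug_subscription(bugid, email)
          try:
              self._cassandra.add_bug_subscription(bugid, email)
          except Exception:
              # In a real version: log it and reconcile later rather
              # than failing the user's request.
              pass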

The biggest concerns I have, though, are around keeping sane: our code
base is very hectic at the moment, with query logic and processing
logic all intertwined. Migrating anything in that environment is hard
because there isn't a single inflection point where we can add any of
these approaches and be confident that things would be comprehensively
covered.

At this point I think Cassandra would be beneficial in some or all of
the short term items I listed earlier - and *if* at the end of the
week Gary and I think Cassandra is worth exploring in significantly
more depth for things currently in PostgreSQL, then I'd really want us
to clean up our persistence layer - that is, to have one - before we
start working directly on Cassandra (outside of the short term items).

P.S. send me your use cases!

-Rob


