
launchpad-dev team mailing list archive

Re: riptano 0-60


Hi Rob,

Sorry for taking a while.  I'm doing a few things at the same time and
have written this over the past few hours, so I'm sure I haven't been
entirely coherent throughout.

On Wed, 17 Nov 2010 at 02:57 +1300, Robert Collins wrote:
> On Wed, Nov 17, 2010 at 1:13 AM, Danilo Šegan <danilo@xxxxxxxxxxxxx> wrote:
> > Heya Rob,
> >
> > On Tue, 16 Nov 2010 at 17:37 +1300, Robert Collins wrote:
> >> It's better at writes vs reads (because it has an append-only store
> >> (which does automatic compaction - rather like bzr)). If we fit our
> >> system on a single DB server *and expect to do so indefinitely* then
> >> staying in a relational single-server model is ideal. (We've outgrown
> >> a single server for reads, but not for writes - and we have headroom
> >> there).
> >
> > When you say "better at writes vs reads", I wonder if that includes
> > updates: with a fully "pre-joined" data set, I can imagine it being even
> > slower than reads if it doesn't simply "deprecate" the old row
> > ("append-only" suggests it does).  How does it actually work?
> 
> For further reading you could look at the bigtable, dynamo papers.

Thanks for the references and the sketch write-up.

> Reads then are:
>  - query all relevant nodes (some queries can be targeted to a few
> specific nodes, others have to go to many to satisfy - e.g. scans).
>  - compare the results depending on the consistency level desired - 1:
> any result, quorum: more than half the data holders agree, all: all
> data holders agree.
> 
> So reads have to do more work (they have to compare) but do also
> parallelise across nodes.

Parallelising helps a lot if it includes data-set partitioning as well.
In the case of Cassandra it seems to just ensure a no-worse-than status
instead, especially if you always want a definite state.
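
Just to check my understanding of the read side, here is a toy sketch
of what I picture happening (my own names, nothing like Cassandra's
actual API): every replica that holds the row answers with a
(timestamp, value) pair, we need at least `needed` answers, and the
newest timestamp wins the comparison.

def quorum_read(replicas, key, needed):
    # Each replica is modelled as a dict: key -> (timestamp, value).
    answers = [replica[key] for replica in replicas if key in replica]
    if len(answers) < needed:
        raise RuntimeError("not enough replicas answered")
    # The newest write wins; replicas that lag simply lose the comparison.
    return max(answers, key=lambda pair: pair[0])[1]

So the per-row comparison is cheap, but every read still has to touch
several nodes.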

> When data is replaced, the read on a data holding node will just serve
> the row from the memtable, or the newest sstable that has the row.

Right, so updates are fast.  With a denormalised model we'd still have
to do an order of magnitude more updates than today, so I doubt we
could see a win even there.
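
If I follow the memtable/sstable part correctly, an update never
rewrites the old row; it just appends a newer version, and a read walks
from the memtable to the newest sstable that has the key.  A toy
version, purely to check my understanding (the names are mine):

class ToyNode:
    def __init__(self):
        self.memtable = {}   # most recent, in-memory writes
        self.sstables = []   # older, immutable tables, newest first

    def write(self, key, value):
        # An update is just another write; the old row is never touched.
        self.memtable[key] = value

    def flush(self):
        # Freeze the memtable into an sstable (compaction would later
        # merge these, a bit like bzr repacking).
        self.sstables.insert(0, dict(self.memtable))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:   # newest first
            if key in table:
                return table[key]
        return None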

> > That relates to a specific use-case I have in mind: translations sharing
> > that we do.  With our current model, updating a single translation in
> > one place updates it for a dozen or so "contexts" (i.e. in both Ubuntu
> > Lucid and Ubuntu Maverick).  It means we'd have to do a dozen updates to
> > replicate the functionality with a fully denormalized model, and if
> > updates are slower (they basically include a read, right?) then we'd hit
> > a lot of trouble.
> 
> updates are writes - they don't (by default at least) need to read the
> old data at all. And if the db servers were to read the old data, that
> would be localised per-node holding the result, so - let's say we had 6
> nodes (which is what a loose discussion with mdennis suggested we'd
> need), then a write of a row would:
>  - on 3 machines add a row to the memtable
>  - on the coordinator, wait for 2 machines to ack that they had done the write
>  - return
> Writing two rows would be the same as one row, twice - but the three
> machines would be different: not the other three, but a
> three-per-row-key hash.
> 
> If we in the appserver needed to read-then-write, that would be a
> little different - but it's also a bit of an anti-pattern in Cassandra,
> apparently.

Right, but I can't imagine an application like LP Translations working
in any other way: you are constantly dealing with small bits of data
(short English strings like "Open file..." and their translations).
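
To make that concrete for myself, here is roughly how I picture the
difference between the blind write you describe and the read-then-write
shape our code has (toy code again, my own names):

def blind_write(replicas, key, value, timestamp, needed):
    # Append the new version on each replica holding the row key;
    # we are done as soon as `needed` of them have acknowledged.
    acks = 0
    for replica in replicas:
        replica[key] = (timestamp, value)   # no read of the old row
        acks += 1
        if acks >= needed:
            return True
    return False

def read_then_write(replicas, key, compute_new_value, timestamp, needed):
    # Our pattern today: look at the current state first, then decide
    # what to write.
    current = replicas[0].get(key)          # (timestamp, value) or None
    current_value = current[1] if current else None
    new_value = compute_new_value(current_value)
    return blind_write(replicas, key, new_value, timestamp, needed)

The second shape is what most of our translation submission code ends
up looking like.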

For examples of the queries we do prior to very simple writes, you can
check out the getPOTMsgSet*() methods in
lib/lp/translations/model/pofile.py.

I've also run into this article about Cassandra updates:
http://maxgrinev.com/2010/07/12/update-idempotency-why-it-is-important-in-cassandra-applications-2/
That is an interesting observation and it further stresses the need for
denormalisation (it's very hard to make idempotent updates to a
normalised model), imho.
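
The idempotency point in a nutshell, as I understand it: a blind
overwrite can be replayed safely, while a read-modify-write style
update cannot (illustrative counters only):

store = {"suggestions": 41}

def set_suggestion_count(value):
    # Idempotent: replaying this write (e.g. after a retry) changes
    # nothing.
    store["suggestions"] = value

def bump_suggestion_count():
    # Not idempotent: a replayed update double-counts.
    store["suggestions"] = store["suggestions"] + 1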

> > I can elaborate if you are interested in exploring this use case, but
> > it's probably best done through a live chat.
> >
> > OTOH, if update performance is very good, the read performance for the
> > other "direction" (where we collate all translations for a particular
> > English string) would be more interesting.  Basically, it's a simplistic
> > "translation memory" feature: go through entire DB of different contexts
> > and fetch translations for a particular language for that particular
> > English string.  That's a feature that's causing us mild issues with
> > Postgres atm, and if reads are comparatively slower, we'd be even worse
> > off.
> 
> Paraphrasing, is it:
> result = defaultdict(set)
> for language in all_languages:
>     for product in products:
>         result[language].add(product.translations[language][english_string])
> ?

Basically, yes, except that you'd only do it for one language at a time.
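
In other words, the shape of the query is roughly this (hypothetical
structures, one language at a time):

def translation_memory(products, language, english_string):
    # Collect every existing translation of this English string into
    # the given language, across all the contexts we know about.
    suggestions = set()
    for product in products:
        translations = product.translations.get(language, {})
        if english_string in translations:
            suggestions.add(translations[english_string])
    return suggestions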

Also, our model is structured slightly differently because of our
existing use cases (though, with *relational* DBs, that doesn't make
much difference).  See below.

> I can imagine storing that normalised and ready to use all the time :)

Well, 'normalised' probably means a different thing to us in this
context.  We do more things with this data; e.g. a normal usage pattern
is:

result = []
product = product1
for english_string in product.english_strings:
    result.append((english_string, product.translations[language][english_string]))

(or, more simply,
product.english_strings[english_string].translations[language], which is
roughly how our model looks today, and which is why we are having some
issues with the above queries).

So, our entry points are both product.english_strings and
product.translations.  And 'normalised' for us today means that when
these translations are repeated between product1 and product2 (still
paraphrasing), then
product1.translations[ANY-LANGUAGE][english_string] is equivalent to
product2.translations[ANY-LANGUAGE][english_string] (or, translated to
our model, we've got "product1.english_strings[string].translations ===
product2.english_strings[string].translations").

When a translation is updated on product1, it needs to be automatically
updated on product2 as well.  That invalidates the option of having the
data set normalised by language, or at least makes it hard.
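
A tiny sketch of what that sharing means (hypothetical structures
again): with the shared, normalised model a single update is visible
from every context, while a fully denormalised copy per context turns
the same change into one write per context.

class SharedTranslation:
    def __init__(self, text):
        self.text = text

shared = SharedTranslation("translation-v1")
product1_strings = {"Open file...": shared}
product2_strings = {"Open file...": shared}

# Normalised/shared: one update, immediately visible from both contexts.
shared.text = "translation-v2"
assert product2_strings["Open file..."].text == "translation-v2"

# Denormalised: each context holds its own copy, so the same logical
# change fans out into as many writes as there are sharing contexts.
denormalised = {
    ("product1", "Open file..."): "translation-v1",
    ("product2", "Open file..."): "translation-v1",
}
for context, string in list(denormalised):
    if string == "Open file...":
        denormalised[(context, string)] = "translation-v2"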

Cheers,
Danilo




