openstack team mailing list archive

Re: Database stuff

To: Jay Pipes <jaypipes@xxxxxxxxx>
From: Soren Hansen <soren@xxxxxxxxxxx>
Date: Tue, 29 Nov 2011 22:09:01 +0100
Cc: openstack@xxxxxxxxxxxxxxxxxxx
In-reply-to: <CAAE6tVYRV8xvkKStkD3Jbk2xsB+vKJKxKngRU9tH3XsKo+xi=Q@mail.gmail.com>

2011/11/29 Jay Pipes <jaypipes@xxxxxxxxx>:
> On Tue, Nov 29, 2011 at 3:43 PM, Soren Hansen <soren@xxxxxxxxxxx> wrote:
>> 2011/11/29 Jay Pipes <jaypipes@xxxxxxxxx>:
>>>> Besides, we don't really use transactions. I could easily read the
>>>> same data from two separate nodes, make different (irreconcilable)
>>>> changes on both nodes, and write them back, and the last one to write
>>>> simply wins.
>>> Sure, but using a KV store doesn't solve this problem...
>>
>> I'm not suggesting it will. My point is simply that using a KV store
>> wouldn't lose us anything in that respect.
> I see your point. But then again, it comes down to whether we care
> about referential integrity or transactional safety.

...and right now we have neither (by choice, not by limitations imposed
by the data store). Would you not agree?
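
To make the lost-update scenario quoted above concrete, here is a minimal
sketch. The in-memory `store` dict and the `read`/`write` helpers are
hypothetical stand-ins for a shared data store, not Nova's actual DB API;
the point is only that without a transaction or a version check, the second
write silently discards the first.

```python
import copy

# Hypothetical in-memory stand-in for the shared data store.
store = {"instance-1": {"state": "building", "host": None}}

def read(key):
    # Each node gets its own private copy of the record.
    return copy.deepcopy(store[key])

def write(key, record):
    # No transaction, no version check: whoever writes last wins.
    store[key] = record

# Two nodes read the same record...
node_a = read("instance-1")
node_b = read("instance-1")

# ...make different, irreconcilable changes...
node_a["state"] = "active"
node_a["host"] = "node-a"
node_b["state"] = "error"

# ...and write them back. Node A's update is lost without any error.
write("instance-1", node_a)
write("instance-1", node_b)

final = store["instance-1"]
```

After both writes, `final` holds only node B's view of the record; the host
assignment made by node A is gone.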

> If we don't, then we're just building a distributed system that has
> unreliable persistent storage built into it, and that, IMHO, is a
> bigger problem than the as-yet-unproven assertions around scalability
> of a relational database in a distributed system. (more below)

Yes. This is what we have now. And it sucks.

>>> As soon as someone can demonstrate the performance, scalability, and
>>> robustness advantages of rewriting the data layer to use a
>>> non-relational data store, I'm all ears. Until that point, I remain
>>> unconvinced that the relational database is the source of major
>>> bottlenecks.
>> I understand that MySQL (and the other backends supported by
>> SQLAlchemy, too) scales very well. Vertically. I doubt they'll be
>> bottlenecks. Heck, they're even well-understood enough that people
>> have built very decent HA setups using them. I just don't think
>> they're a particularly good fit for a distributed system. You can
>> have a highly available datastore all you want, but I'd sleep better
>> knowing that our data is stored in a distributed system that is
>> designed to handle network partitions well.
> I guess I don't understand this. How do you sleep at night TODAY
> knowing that the data Nova stores in its persistent storage is wide
> open to referential integrity problems and transactional state
> inconsistencies?

Not very well at all. If I thought everything was in good shape, I
wouldn't have bothered with all of this :)

> What's the point of having a data store that "understands network
> partitions" if we don't care enough to protect the integrity of the
> data we're putting in the data store in the first place? :(

None at all. I hope I haven't said anything to suggest otherwise.

MySQL simply was not designed to be distributed. Generally speaking, if
you do end up in a situation where there's been a network partition and
your master is on one side and you have a slave on the other side, a
couple of things can happen:

1. You can automatically promote the slave to master, thus letting both
sides of the partition keep going.

2. You can leave the slave as it is and let its entire side of the
partition run in read-only mode.

I think the usual case is 1, since MySQL HA setups are usually designed
to handle the case where the master dies rather than handling network
partitions. Would you agree with this assertion?

If both have acted as master, what happens when the network is
joined again? Hell breaks loose, because MySQL wasn't designed for this
sort of thing.
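
As a rough illustration of why the rejoin is painful, here is a toy model.
The `Node` class is hypothetical and only mimics a table with an
auto-increment primary key; it is not how MySQL replication is implemented.
Both sides start from the same replicated state, both accept inserts during
the partition, and both hand out the same key for different rows.

```python
class Node:
    """Hypothetical toy model of a master with an auto-increment key."""

    def __init__(self, rows=None, next_id=1):
        self.rows = dict(rows or {})
        self.next_id = next_id

    def insert(self, value):
        row_id = self.next_id
        self.rows[row_id] = value
        self.next_id += 1
        return row_id

    def promoted_copy(self):
        # The slave is promoted with whatever state it had replicated so far.
        return Node(self.rows, self.next_id)

master = Node()
master.insert("instance-a")          # id 1, replicated before the partition

slave = master.promoted_copy()       # partition happens, slave becomes master

master.insert("instance-b")          # id 2 on one side of the partition
slave.insert("instance-c")           # id 2 on the other side

# When the network is joined again, the same key names two different rows.
conflicts = {
    key
    for key in master.rows.keys() & slave.rows.keys()
    if master.rows[key] != slave.rows[key]
}
```

Here `conflicts` contains key 2, and nothing in the two row sets says which
version should survive.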

Something like Riak, on the other hand, is designed to excel in exactly
this sort of situation. It makes no attempt to resolve these conflicts
itself (unless you explicitly tell it to just let the last write win). If
there are conflicts, you get to handle them in your application in whatever
way makes sense.
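
A minimal sketch of what application-side handling could look like. Riak
calls such conflicting versions siblings; the code below does not use a Riak
client at all and simply takes the siblings as a list of dicts. The record
layout, the `STATE_PRIORITY` table and the merge rule are all assumptions
made up for the example, not anything Nova or Riak prescribes.

```python
# Hypothetical rule: when states disagree, the "worst" state wins.
STATE_PRIORITY = {"building": 0, "active": 1, "error": 2}

def resolve_siblings(siblings):
    """Merge conflicting versions of one record into a single value."""
    # Pick the state with the highest priority among all siblings.
    state = max((s["state"] for s in siblings), key=STATE_PRIORITY.__getitem__)
    # Attached volumes are merged as a set union, so no attachment is lost.
    volumes = set().union(*(s["volumes"] for s in siblings))
    return {"state": state, "volumes": sorted(volumes)}

# Two versions written on different sides of a partition.
siblings = [
    {"state": "active", "volumes": ["vol-1"]},
    {"state": "error", "volumes": ["vol-1", "vol-2"]},
]

merged = resolve_siblings(siblings)
```

The merged record would then be written back as the single resolved value.
A different application might well pick a different rule; the point is that
the choice is made explicitly rather than by whoever wrote last.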

-- 
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/
