openstack team mailing list archive

Re: iSCSI target service on the top of swift


On Thu, Sep 16, 2010 at 2:26 AM, FUJITA Tomonori

> 2010/9/14 Gregory Holt <gholt@xxxxxxxxxxxxx>:
> >> Read-your-writes consistency works nicely for the iSCSI service. We
> >> could live with weaker consistency models though.
> >
> > Swift has the small possibility you'd read older data, even just after
> > writing newer data with the same HTTP Keep-Alive connection.
> >
> > Example scenario: PUT obj(v1) reaches all three desired replica nodes
> > (1-2-3), so reads are no problem. Then PUT obj(v2) times out on the
> > first replica node (x-2-3) but still succeeds, because two of the three
> > nodes saved the data. A later read served by node 1 will return obj(v1).
> >
> > We have discussed making reads hit all known replicas and return the
> > greatest version, but we have to test the impact of that at scale first.
>
> Can we support that optionally? E.g., selecting a consistency model
> per container or object?

In theory, but that could get complicated.
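
To make the quoted scenario concrete, here is a minimal Python sketch. It is
only a toy model: the Replica class and the put_object, read_first and
read_greatest helpers are hypothetical names invented for illustration, not
Swift's API, and versions are plain integers rather than real timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one object server; not Swift's API."""
    name: str
    reachable: bool = True
    store: dict = field(default_factory=dict)  # key -> (version, data)

    def put(self, key, version, data):
        if not self.reachable:
            return False  # models the PUT timing out on this node
        self.store[key] = (version, data)
        return True

    def get(self, key):
        if not self.reachable:
            return None
        return self.store.get(key)


def put_object(replicas, key, version, data):
    """Write to every replica; succeed if a majority stored the data."""
    acks = sum(r.put(key, version, data) for r in replicas)
    return acks > len(replicas) // 2


def read_first(replicas, key):
    """Return the first successful read, whatever its version."""
    for r in replicas:
        result = r.get(key)
        if result is not None:
            return result
    return None


def read_greatest(replicas, key):
    """Ask every replica and keep the greatest version that came back."""
    results = [r.get(key) for r in replicas]
    results = [res for res in results if res is not None]
    return max(results, key=lambda res: res[0]) if results else None


nodes = [Replica("1"), Replica("2"), Replica("3")]
put_object(nodes, "obj", 1, "v1")        # lands on 1-2-3

nodes[0].reachable = False               # node 1 times out during the next PUT
ok = put_object(nodes, "obj", 2, "v2")   # lands on x-2-3, still a success
nodes[0].reachable = True                # node 1 is back, still holding v1

stale = read_first(nodes, "obj")         # served by node 1
fresh = read_greatest(nodes, "obj")      # compares all three replicas
print(ok, stale, fresh)
```

In this model read_first returns obj(v1) from node 1 while read_greatest
returns obj(v2), at the cost of contacting every replica on each read.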

> > Even with greatest version support, there always is a chance that only
> > one node could be reached on read, and that node might have older data.
>
> Yeah, in such a case (nodes having the latest data are down, etc.), it
> would be better if the client got an explicit I/O error. But I don't
> think that it's easy to guarantee that (we did it for the Sheepdog
> storage system). Getting old data is kind of silent data corruption,
> which could happen even with a real disk.
> I think that we could live with that if the possibility is small (as
> you know, some file systems can handle such failures).

This is the heart of the CAP theorem. In the event of partitions (failures), no
distributed system can guarantee that it will respond with the correct data.
Choosing 'availability' means there is some chance that the data is
stale or incorrect.

When consistency is a higher requirement than availability, then
eventually consistent storage, like Swift, is probably not the best choice.
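
As a rough illustration of what choosing consistency over availability looks
like on the read path, here is a small Python sketch that returns an explicit
I/O error when too few replicas answer. It is not how Swift behaves; the
quorum_read function and the NotEnoughReplicas exception are hypothetical, and
the sketch assumes every successful write reached at least write_quorum
replicas and that versions are totally ordered.

```python
class NotEnoughReplicas(IOError):
    """Raised instead of returning data that might be stale."""


def quorum_read(responses, total_replicas, write_quorum):
    """Return the newest value, or fail if too few replicas answered.

    responses: one entry per replica, a (version, data) tuple, or None
    for a replica that could not be reached.
    """
    # A read set larger than N - W must overlap the last successful write.
    needed = total_replicas - write_quorum + 1
    answered = [r for r in responses if r is not None]
    if len(answered) < needed:
        raise NotEnoughReplicas(
            f"{len(answered)} of {total_replicas} replicas answered, "
            f"need {needed}"
        )
    return max(answered, key=lambda r: r[0])


# Three replicas, writes acknowledged by at least two of them.
N, W = 3, 2

# Node 1 answers with the stale v1, nodes 2 and 3 with v2: the newest wins.
print(quorum_read([(1, "v1"), (2, "v2"), (2, "v2")], N, W))

# Only the stale node answers: fail loudly rather than return v1.
try:
    quorum_read([(1, "v1"), None, None], N, W)
except NotEnoughReplicas as exc:
    print("I/O error:", exc)
```

The price is exactly the trade-off described above: when fewer than the
required number of replicas can be reached, the read fails even though some
copy of the object is still available.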

The key is understanding what the choices are and making an appropriate
choice based on system requirements.