openstack team mailing list archive

Re: Swift Consistency Guarantees?


Hi Chuck,

Thanks for the detailed explanation! That pretty much answers all of my
questions. I think this can (and should) be placed as-is somewhere in the
Swift documentation and/or the wiki.


On 01/20/2012 04:58 PM, Chuck Thier wrote:
> Some general notes for consistency and swift (all of the below assumes
> 3 replicas):
> Objects:
>   When swift PUTs an object, it attempts to write to all 3 replicas
> and only returns success if 2 or more replicas were written
> successfully.  When a new object is created, it has a fairly strong
> consistency for read after create.  The only case where this would not
> be true is if all of the devices that hold the object are unavailable.
>  When an object is PUT on top of another object, more eventual
> consistency can come into play in failure scenarios.  This is very
> similar to S3's consistency model.  It is also important to note that
> if a failure leaves a device unavailable for a new replica, swift will
> attempt to write that replica to a handoff node instead.
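
To make the write rule above concrete, here is a minimal sketch of it in
Python. It is only a toy model, not Swift's actual code: the function name
put_object, the boolean lists and the quorum parameter are all made up for
illustration, and it assumes 3 replicas with a quorum of 2 as described.

```python
def put_object(primaries, handoffs, quorum=2):
    """Toy model of a PUT (hypothetical helper, not Swift's API).

    primaries: one boolean per primary device, True if it accepted the write.
    handoffs:  booleans for handoff nodes, tried in order when a primary fails.
    Returns True if at least `quorum` replicas were written.
    """
    handoff_results = iter(handoffs)
    written = 0
    for primary_ok in primaries:
        if primary_ok:
            written += 1
        elif next(handoff_results, False):
            # The primary was unavailable, but a handoff node took the replica.
            written += 1
    return written >= quorum


print(put_object([True, True, False], []))       # two primaries succeed
print(put_object([True, False, False], [True]))  # one primary plus one handoff
print(put_object([True, False, False], []))      # only one replica written
```

The first two calls report success and the last one reports failure, which
matches the "2 or more replicas" rule quoted above.
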
>   When swift GETs an object, by default it will return the first
> copy it finds among the available replicas.  Using the X-Newest
> header requires swift to compare the timestamps and serve only a
> replica that has the most recent timestamp.  If the only available
> replica holds an older version of the object, that version will be
> returned, but in practice this would be quite an edge case.
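
The difference between a default GET and a GET with X-Newest can be
sketched as follows. Again this is a simplified model under my own
assumptions: the Replica class and get_object function are hypothetical
names, and an unavailable replica is represented as None.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replica:
    timestamp: float  # time of the PUT that produced this copy
    body: bytes


def get_object(replicas: Sequence[Optional[Replica]],
               newest: bool = False) -> Optional[Replica]:
    """Toy model of a GET; None entries stand for unavailable replicas."""
    available = [r for r in replicas if r is not None]
    if not available:
        return None  # no replica reachable at all
    if newest:
        # X-Newest behaviour: compare timestamps, serve the most recent copy.
        return max(available, key=lambda r: r.timestamp)
    # Default behaviour: serve the first copy found.
    return available[0]


copies = [Replica(1.0, b"old"), None, Replica(2.0, b"new")]
print(get_object(copies).body)
print(get_object(copies, newest=True).body)
```

With only the stale copy reachable, newest=True still returns the old
version, which is the edge case mentioned above.
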
> Container Listings:
>   When an object is PUT into swift, each object server that a replica
> is written to is also assigned one of the container servers to
> update.  On the object server, after the replica is successfully
> written, an attempt will be made to update the listing of its assigned
> container server.  If that update fails, it is queued locally (which
> is called an async pending), to be updated out of band by another
> process.  The container updater process continually looks for these
> async pendings and will attempt to make the update, and will remove it
> from the queue when successful.  There are many reasons that a
> container update can fail (failed device, timeout, heavily used
> container, etc.).  Thus container listings are eventually consistent
> in all cases (which is also very similar to S3).
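
The async pending mechanism described above could be modelled roughly
like this. The class names ObjectServer and FlakyContainer and their
methods are invented for the sketch; the real object server and container
updater are separate processes and persist the queue on disk, which this
in-memory toy does not attempt.

```python
from collections import deque


class FlakyContainer:
    """Stand-in for a container server that may be unavailable."""

    def __init__(self):
        self.up = False
        self.listing = []

    def update(self, name):
        if not self.up:
            return False  # e.g. failed device, timeout, overloaded container
        self.listing.append(name)
        return True


class ObjectServer:
    def __init__(self, container):
        self.container = container
        self.async_pendings = deque()  # updates waiting to be retried

    def put(self, name):
        # The replica write itself is not modelled; only the listing update.
        if not self.container.update(name):
            self.async_pendings.append(name)

    def run_updater_once(self):
        """One pass of the updater: retry each queued update exactly once."""
        for _ in range(len(self.async_pendings)):
            name = self.async_pendings.popleft()
            if not self.container.update(name):
                self.async_pendings.append(name)  # still failing, keep queued
```

Until the updater pass succeeds, the object exists but is missing from
the listing, which is the eventual consistency described above.
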
> Consistency Window:
> For objects, the biggest factor that determines the consistency window
> is object replication time.  In general this is pretty quick for even
> large clusters, and we are always working on making this better.  If
> you want to limit consistency windows for objects, then you want to
> make sure you isolate the chances of failure as much as possible.  By
> setting up your zones to be as isolated as possible (separate power,
> network, physical locality, etc.) you minimize the chance that there
> will be a consistency window.
> For containers, the biggest factor that determines the consistency
> window is disk IO for the sqlite databases.  In recent testing, basic
> SATA hardware can handle somewhere in the range of 100 PUTs per second
> (for smaller containers) to around 10 PUTs per second for very large
> containers (millions of objects) before async pendings start stacking
> up and you begin to see consistency issues.  With better hardware (for
> example RAID 10 of SSD drives), it is easy to get 400-500 PUTs per
> second with containers that have a billion objects in them.  It is also
> a good idea to run your container/account servers on hardware separate
> from the object servers.  After that, the same advice given for object
> servers also applies to the container servers.
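
As a rough back-of-the-envelope aid for the figures above, the backlog of
async pendings can be estimated with a very simple linear model. This
assumes constant PUT and update rates, which real clusters will not have;
the function name and parameters are my own, and only the 100 and 10 PUTs
per second figures come from the numbers quoted above.

```python
def async_pending_backlog(put_rate, container_capacity, seconds):
    """Estimate queued container updates after `seconds` of sustained load.

    put_rate:           incoming PUTs per second against one container
    container_capacity: updates per second the container server can absorb
    """
    surplus = put_rate - container_capacity
    return max(0, surplus * seconds)


# A small container on basic SATA hardware (about 100 PUTs per second),
# hit with 150 PUTs per second for one minute:
print(async_pending_backlog(150, 100, 60))

# A very large container (about 10 PUTs per second) under the same load:
print(async_pending_backlog(150, 10, 60))
```

Anything above zero means listings lag behind the objects until the
updater catches up, so the result is only a hint at how long the
consistency window might last.
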
> All that said, please don't just take my word for it, and test it for
> yourself :)
> --
> Chuck


 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C