openstack team mailing list archive

Re: Several questions about HOW SWIFT WORKS

 

Answers inline.

On Jan 3, 2012, at 11:32 AM, Alejandro Comisario wrote:

> 
> So, let's get down to business.
> 
> # 1 We have the memcache service running on each proxy. As far as we know, memcache caches Keystone tokens and object paths as requests (PUT, GET) enter the proxy. But if, for example, we restart one proxy server so that its memcached is empty, will the restarted proxy go to a neighbor's memcache on the next request, look up what it needs, and cache the answer locally so that the next query is resolved locally?

Memcache works as a distributed lookup, so the keys that were stored on the server that was restarted are no longer cached. The proxies share a memcache pool (at least in the example proxy config), so cached values are fetched from that pool. Each key lives on one server in the pool, so the restarted proxy does not copy entries into its own memcache; the lost keys are simply cache misses until they are repopulated. Since the keys are balanced across the entire memcache pool, roughly 1/N of memcache requests will be local (where N == the number of proxy servers).
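
To make the 1/N figure concrete, here is a minimal Python sketch of a distributed cache lookup using consistent hashing. It is a simplified illustration, not Swift's actual memcache client; the class name, the server addresses, and `points_per_server` are assumptions. Every proxy builds the same pool from the same server list, so all proxies agree on which server owns a given key.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a large integer with MD5 (used for placement, not security).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class MemcachePoolSketch:
    """Hypothetical consistent-hash pool: each key is owned by exactly one server."""

    def __init__(self, servers, points_per_server=100):
        # Give every server many points on the ring so keys spread evenly.
        self._ring = sorted(
            (_hash(f"{server}-{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, key: str) -> str:
        # The owner is the first ring point at or after the key's hash (wrapping around).
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Four proxies, each also running memcached (hypothetical addresses).
pool = MemcachePoolSketch([f"proxy{n}:11211" for n in range(1, 5)])
keys = [f"token/{i}" for i in range(10_000)]
local_fraction = sum(pool.server_for(k) == "proxy1:11211" for k in keys) / len(keys)
print(f"share of keys owned by proxy1: {local_fraction:.2f}")  # close to 1/4
```

If proxy1 restarts, its keys still hash to proxy1; they are just misses until the next request for each key stores the value there again.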

> 
> # 2 The documentation says: "For each request, it will look up the location of the account, container, or object in the ring (see below) and route the request accordingly." How does the proxy actually look up WHERE an object or container is in the cluster? Does it connect to some data node and ask for the object's location? Does the proxy store any data locally?

The proxy does not store any data locally (not even to buffer reads or writes), and it does not ask any storage node where data lives. The proxy uses the ring to determine how to handle the read or write. The ring is a mapping of the storage volumes that, given an account, container, and object, provides the final location of where the data is to be stored. The proxy then uses this information to either read or write the object.
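
The lookup is pure computation, with no network round trip. As a rough Python sketch loosely modeled on how Swift's ring hashes a path into a partition (real Swift also mixes a cluster-specific hash path suffix into the hash, omitted here), the proxy hashes the path, keeps the top bits as the partition number, and reads that partition's devices from a replica table. `PART_POWER`, `DEVICES`, and the rotation used to fill the table are hypothetical.

```python
import hashlib
import struct

PART_POWER = 8                 # hypothetical; real rings typically use a larger value
PART_SHIFT = 32 - PART_POWER
REPLICAS = 3
# Hypothetical storage volumes (device index -> "ip:port/device").
DEVICES = [f"10.0.0.{n}:6000/sdb1" for n in range(1, 7)]

# Toy replica -> partition -> device table. A real ring builder fills this
# based on device weights and zones; here we simply rotate through the devices.
REPLICA2PART2DEV = [
    [(part + r) % len(DEVICES) for part in range(2 ** PART_POWER)]
    for r in range(REPLICAS)
]


def get_partition(account, container=None, obj=None):
    # Every node hashes the same path the same way, so all get the same partition.
    path = "/" + "/".join(p for p in (account, container, obj) if p is not None)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # First 4 bytes as a big-endian unsigned int; keep only the top PART_POWER bits.
    return struct.unpack_from(">I", digest)[0] >> PART_SHIFT


def get_nodes(account, container=None, obj=None):
    part = get_partition(account, container, obj)
    return part, [DEVICES[REPLICA2PART2DEV[r][part]] for r in range(REPLICAS)]


part, nodes = get_nodes("AUTH_test", "photos", "cat.jpg")
print(part, nodes)
```

Because placement depends only on the path and the ring file, data nodes (question #3) can compute it in exactly the same way.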

> 
> # 3 This may be related to the previous question, but does every data node know everything that is stored in the cluster (container service), or only the objects it holds itself and the replicas of those objects?

Data is placed in Swift deterministically, so data nodes don't track where everything is stored, but they know how to compute where it should be stored (i.e., the ring).

> 
> # 4 We are building a production cluster of 24 data nodes with 6 drives each (144 drives initially). We know that a good default is 100 partitions per drive, so the math for creating the ring would be (24 nodes * 6 drives * 100 partitions). But by the end of the year, the number of data nodes (and drives) could be 2x or 3x larger. So, for the initial setup, can we build the ring with our 144 drives and 100 partitions per drive, then modify the ring / partitions later and rebalance? Or is it safer to plan for future infrastructure growth and build the ring with those numbers in mind?

Your partition power should take into account the largest size your cluster can be. The ring has 2^(partition power) partitions, so pick the power that gives a sensible number of partitions per drive at that largest size. You cannot change the partition power after you deploy the ring unless you migrate everything in your cluster (a manual process of GET from the old ring and PUT to the new ring), so it is important to select the proper partition power up front.
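
A worked version of that sizing for the cluster in question, assuming the "100 partitions per drive" rule of thumb from the question and 3x growth by year end. Because the partition count is a power of two, the target is rounded up to the next power. The function name is illustrative.

```python
def partition_power(max_drives: int, parts_per_drive: int = 100) -> int:
    """Smallest p such that 2**p >= max_drives * parts_per_drive."""
    target = max_drives * parts_per_drive
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)).
    return max(1, (target - 1).bit_length())


current_drives = 24 * 6          # 144 drives at launch
max_drives = current_drives * 3  # assumed 3x growth by year end

for drives in (current_drives, max_drives):
    p = partition_power(drives)
    print(drives, p, 2 ** p, 2 ** p // drives)
# 144 14 16384 113
# 432 16 65536 151
```

Sizing for today's 144 drives would give a partition power of 14, which leaves only 16384 // 432 = 37 partitions per drive once the cluster triples; sizing for 432 drives keeps it above 100. The chosen value is the part power given when creating the ring, e.g. `swift-ring-builder object.builder create 16 3 1` (partition power, replicas, min_part_hours).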

> 
> # 5 When we put a new object into the cluster, the proxy decides where to write it (is that done in a round-robin manner?). Does the proxy return a "Created" response when the first replica is actually written and recorded in the account and container SQLite databases? Or is there an OK only once the OBJECT service has actually written the data to disk?

The proxy sends the write to the 3 object servers that the ring assigns to the object's partition (placement comes from the ring, not round-robin). The object servers write to disk and then send a request to the container servers to update the container listing. The object servers then return success to the proxy. After 2 object servers have returned success, the proxy can return success to the client.
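
The "2 of 3" rule is a majority quorum. A minimal Python sketch of that decision, assuming the proxy has already collected one HTTP status per object server; the function names and the use of 503 on failure are illustrative assumptions, and the sketch ignores timeouts, handoff nodes, and streaming of the request body.

```python
def quorum_size(replicas: int) -> int:
    # Simple majority: 2 of 3, 3 of 5.
    return replicas // 2 + 1


def client_status_for_put(backend_statuses):
    """Hypothetical: pick the client-facing status from the object servers' replies."""
    successes = sum(1 for status in backend_statuses if 200 <= status < 300)
    if successes >= quorum_size(len(backend_statuses)):
        return 201  # Created: a majority of replicas are on disk
    return 503      # Service Unavailable: not enough replicas were written


print(client_status_for_put([201, 201, 500]))  # 201
print(client_status_for_put([201, 503, 500]))  # 503
```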

> 
> Hope you can shed some light on these doubts.

There are obviously some details I've glossed over in the short answers above. Much of the complexity in swift comes from failure scenarios. Please ask if you need more detail.


--John
