openstack team mailing list archive

Re: [OpenStack] SWIFT Object Store spanning multiple data centers

 

The global clusters feature in Swift is very new. We are finishing up the last part of it now and will have it completed in our next release (tentatively scheduled for June 27). The last part is write affinity (i.e., don't write to a WAN region).
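To make the write-affinity idea concrete, here is a simplified sketch of the node-selection rule it implies: the proxy prefers nodes in its own region (primaries first, then handoffs), so a write normally completes without crossing the WAN, and replication is assumed to move the data to its remote primaries afterwards. This is not Swift's proxy code; the `Node` type, the `write_targets` function, and the IP addresses are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical model of a storage node: region, zone and address."""
    region: int
    zone: int
    ip: str


def write_targets(primaries: List[Node], handoffs: List[Node],
                  local_region: int, replicas: int) -> List[Node]:
    """Pick the nodes a proxy would write to under a simple write-affinity rule.

    Nodes in the proxy's own region (primaries first, then handoffs) are
    preferred, so a write normally completes without crossing the WAN.
    Remote primaries are used only if there are not enough local nodes;
    the replicator is assumed to move data to its remote primaries later.
    """
    local = [n for n in primaries + handoffs if n.region == local_region]
    remote = [n for n in primaries if n.region != local_region]
    return (local + remote)[:replicas]


# Hypothetical layout: DC-A is region 1, DC-B is region 2.
a1, a2, a3 = Node(1, 1, "10.0.0.1"), Node(1, 2, "10.0.0.2"), Node(1, 1, "10.0.0.3")
b1 = Node(2, 3, "10.1.0.1")
targets = write_targets(primaries=[a1, a2, b1], handoffs=[a3],
                        local_region=1, replicas=3)
print([n.ip for n in targets])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

The trade-off in this model is that, until replication catches up, every copy of a newly written object sits in the writing DC.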

The regions concept is exactly as you have described it: use it for separate DCs. With 3 replicas, you'll have 2 replicas in one DC and one in the other. In your case, because of the way you have configured zones, it looks like you'll always have 2 replicas in A and one in B. Note that this is not a requirement of the system: you should set up your zones to match your failure domains.
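The two-and-one split follows from spreading replicas "as unique as possible": first across regions, then across zones within a region. The toy model below assigns each replica to the unused device whose region, and then zone, currently holds the fewest copies. It is a simplified sketch, not the actual algorithm in Swift's ring builder (which also accounts for device weights and assigns replicas per partition); the device tuples and `place_replicas` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical device record: (region, zone, device name).
Device = Tuple[int, int, str]


def place_replicas(devices: List[Device], replicas: int) -> List[Device]:
    """Toy 'as unique as possible' placement: balance regions first, then zones."""
    if replicas > len(devices):
        raise ValueError("need at least one device per replica")
    chosen: List[Device] = []
    region_load: Counter = Counter()
    zone_load: Counter = Counter()
    for _ in range(replicas):
        candidates = [d for d in devices if d not in chosen]
        # Pick the device whose region, then zone, currently holds the fewest
        # replicas; ties go to the first device in the list.
        best = min(candidates,
                   key=lambda d: (region_load[d[0]], zone_load[(d[0], d[1])]))
        chosen.append(best)
        region_load[best[0]] += 1
        zone_load[(best[0], best[1])] += 1
    return chosen


# The layout from the question: DC-A (region 1) has zones 1 and 2,
# DC-B (region 2) has zone 3.
devices = [(1, 1, "dc-a-node1/sdb1"), (1, 2, "dc-a-node2/sdb1"),
           (2, 3, "dc-b-node1/sdb1")]
placement = place_replicas(devices, replicas=3)
print(Counter(region for region, _, _ in placement))  # Counter({1: 2, 2: 1})
```

In this toy model, adding a second zone to DC-B still yields a two/one split, because three replicas cannot be spread evenly across two regions; only which zones receive the copies changes.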

Are you using the separate replication network feature (it's not required, but it may allow you some more control over the cross-DC replication)?

What is the latency between your DCs?
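One rough way to answer the latency question is to time TCP connection setup from a node in one DC to a service in the other, for example the rsync daemon on a storage node. A TCP handshake takes about one round trip, so the median connect time approximates the RTT between the two hosts. The helper `connect_latency_ms` is a hypothetical name, and the sketch assumes the target host and port are reachable.

```python
import socket
import statistics
import time


def connect_latency_ms(host: str, port: int, samples: int = 5,
                       timeout: float = 5.0) -> float:
    """Return the median TCP connect time to host:port in milliseconds.

    A TCP handshake takes roughly one round trip, so this approximates
    the RTT between the two hosts (plus a little local overhead).
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection raises OSError if the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

For example, running `connect_latency_ms("<DC-B storage node>", 873)` from a DC-A node would time connections to the rsync daemon's default port.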

--John



On Jun 16, 2013, at 9:58 PM, Balamurugan V G <balamuruganvg@xxxxxxxxx> wrote:

> Hi,
> 
> I am exploring setting up a SWIFT Object Store across two data
> centers. Let's say I have DC-A and DC-B. I have set up a swift-proxy and
> two swift-storage nodes in DC-A, and one storage node in DC-B. This is
> just an experimental setup; if it works well, I will add more storage
> nodes and proxy nodes in each DC. I have added the storage nodes in
> DC-A to Zone1 and Zone2, and the storage node in DC-B to Zone3. The
> replication count has been set to 3. My goal is to set up a multi-site
> OpenStack deployment, and I am exploring using SWIFT to store the
> images so that they can be shared across the DCs.
> 
> Here are my questions:
> 
>   1. There seems to be a concept of regions. How do I use that with
> SWIFT in this case? I can't find any good documentation on it.
>   2. In my current setup explained above, I can see that the
> partitions are getting copied (and synced) fine between the nodes in
> DC-A, as confirmed by the used size returned by 'df -h /srv/node/sdb1'
> (I know it's crude, but it's good enough). The node in DC-B behaves
> differently: the partitions are copied to this node and then removed
> again, continuously. It never settles. For example, if I have 5 GB of
> content stored in the system, the DC-A nodes show that 5 GB is used,
> but the DC-B node shows usage increasing to 5 GB, then dropping to,
> say, 2 GB, then increasing to 5 GB again, and so forth. The rsyncd
> logs show a few errors, as shown below:
> 
> 2013/06/17 04:38:54 [6965] receiving file list
> 2013/06/17 04:53:52 [6963] rsync: connection unexpectedly closed (731405377 bytes received so far) [receiver]
> 2013/06/17 04:53:52 [6963] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [receiver=3.0.9]
> 2013/06/17 04:53:52 [6963] rsync: connection unexpectedly closed (87 bytes received so far) [generator]
> 2013/06/17 04:53:52 [6963] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
> 2013/06/17 04:53:54 [6965] rsync: connection unexpectedly closed (716563202 bytes received so far) [receiver]
> 2013/06/17 04:53:54 [6965] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [receiver=3.0.9]
> 2013/06/17 04:53:54 [6965] rsync: connection unexpectedly closed (87 bytes received so far) [generator]
> 2013/06/17 04:53:54 [6965] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [generator=3.0.9]
> 2013/06/16 21:54:24 [6996] name lookup failed for 10.5.64.47: Temporary failure in name resolution
> 2013/06/16 21:54:24 [6996] connect from UNKNOWN (10.5.64.47)
> 2013/06/16 21:54:24 [6997] name lookup failed for 10.5.64.48: Temporary failure in name resolution
> 2013/06/16 21:54:24 [6997] connect from UNKNOWN (10.5.64.48)
> 2013/06/17 04:54:25 [6996] rsync to object/sdb1/objects/189659 from UNKNOWN (10.5.64.47)
> 2013/06/17 04:54:25 [6996] receiving file list
> 2013/06/17 04:54:25 [6997] rsync to object/sdb1/objects/189659 from UNKNOWN (10.5.64.48)
> 2013/06/17 04:54:25 [6997] receiving file list
> 
>    I even tried increasing the rsync timeout from the default 30 s to
> 600 s, but I still see the same issue. What could be wrong here?
> 
> 
> Any help will be greatly appreciated. Also, any pointers to good
> documentation on how to set up a multi-site OpenStack deployment would
> be very helpful. While there is good documentation for getting up and
> running with a single-node or 3-node OpenStack setup, there is not much
> on how to deploy a large-scale multi-site OpenStack deployment :-(
> 
> Regards,
> Balu
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp


