openstack team mailing list archive

Re: [OpenStack][Swift] Some questions about the performance of swift .

 

Have you monitored the CPU utilization of the proxy server and the storage
nodes? I did similar tests with Swift, and the proxy server exhausted its
capacity with only a few concurrent requests for very small objects.
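
One way to check this on a Linux host is to sample the aggregate "cpu" line of /proc/stat twice and compare the counters. The following is a minimal Python 3 sketch under that assumption; the function names are hypothetical and not part of Swift.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (idle, total) jiffies."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values[:8])  # guest time is already counted in user/nice
    return idle, total


def cpu_utilization(before, after):
    """Return the busy fraction (0.0 to 1.0) between two 'cpu' line samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / elapsed


def sample_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()
```

Calling sample_cpu_line() on the proxy and on each storage node, a few seconds apart during a benchmark round, and passing both samples to cpu_utilization() shows which tier saturates first.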

If you notice object servers are not overloaded, but proxy is overloaded, a
solution might be to have more proxy servers if you hav

This looks like an overload problem, since there are only 4 servers in the
system and a high level of concurrency. Have you tried slowly increasing
the concurrency to find the point where the problem starts? That
point may be the capacity of your system.
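
The ramp-up can be sketched as a small search loop. This is only an illustration in Python 3: measure is a hypothetical callable that runs one benchmark round at the given concurrency and returns the observed PUT rate, and the 5% gain threshold is an arbitrary assumption.

```python
def find_saturation_point(measure, levels, min_gain=0.05):
    """Return (level, rate) for the last concurrency level that still improved
    throughput by at least min_gain over the previous best.

    measure: hypothetical callable, concurrency -> requests per second.
    levels:  increasing concurrency levels to try, e.g. [10, 20, 40, 80].
    """
    best_level = None
    best_rate = 0.0
    for level in levels:
        rate = measure(level)
        # Stop as soon as extra concurrency no longer buys extra throughput.
        if best_level is not None and rate < best_rate * (1.0 + min_gain):
            return best_level, best_rate
        best_level, best_rate = level, rate
    return best_level, best_rate
```

If the returned level is well below the 200 used in the test, the later rounds are probably running past the capacity of the system.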

Also, are you using persistent connections to the proxy server to send the
objects? If so, maybe try renewing them once in a while.
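
Renewing a connection periodically could look like the sketch below. It is written for Python 3 with the standard http.client module; the class name, the max_requests value and the factory parameter are illustrative assumptions, not something swift-bench provides.

```python
import http.client


class RecyclingConnection:
    """Keep a persistent HTTP connection, but reopen it every max_requests."""

    def __init__(self, host, port, max_requests=1000,
                 factory=http.client.HTTPConnection):
        self._host = host
        self._port = port
        self._max_requests = max_requests
        self._factory = factory  # injectable so the class can be tested offline
        self._conn = None
        self._served = 0

    def _connection(self):
        # Open a fresh connection on first use or once the quota is used up.
        if self._conn is None or self._served >= self._max_requests:
            self.close()
            self._conn = self._factory(self._host, self._port)
            self._served = 0
        return self._conn

    def request(self, method, path, body=None, headers=None):
        conn = self._connection()
        conn.request(method, path, body=body, headers=headers or {})
        response = conn.getresponse()
        data = response.read()  # drain the body so the connection can be reused
        self._served += 1
        return response.status, data

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Each client worker would hold one such object and send its PUT requests through request(), so that no single connection lives for the whole test.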

Cheers,

Paulo

2012/7/20 Kuo Hugo <tonytkdk@xxxxxxxxx>

> Hi Sam, and all OpenStackers,
>
> This is Hugo. I'm facing a performance *degradation* issue with
> Swift.
> I have been trying to track down the cause of this issue over the last few
> days.
>
> Environment :
> Swift version : master branch . latest code.
> Tried on Ubuntu 12.04/11.10
> 1 Swift-proxy : 32GB-ram / CPU 4*2 / 1Gb NIC*2
> 3 Storage-nodes : each for 32GB-ram / CPU 4*2 / 2TB*7 / 1Gb NIC*2
>
> The storage nodes run only the main workers (object-server, container-server,
> account-server).
>
> I'm testing with 4K objects using swift-bench.
>
> Per round bench.conf
> object_size = 4096
> Concurrency : 200
> Object number: 200000
> Containers : 200
> no delete objects ..
>
> At the beginning, everything works fine in my environment. The average PUT
> rate reaches 1200/s.
> After several rounds of testing, I found that the rate drops to
> 300~400/s.
> After more rounds, failures appear, with ERROR entries in the proxy's log as
> follows:
>
> Jul 20 18:44:54 angryman-proxy-01 proxy-server ERROR with Object server
> 192.168.100.101:36000/DISK5 re: Trying to get final status of PUT to
> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_51/87b48a3474c7485c95aeef95c6911afb:
> Timeout (10s) (txn: txb4465d895c9345be95d81632db9729af) (client_ip:
> 172.168.1.2)
> Jul 20 18:44:54 angryman-proxy-01 proxy-server ERROR with Object server
> 192.168.100.101:36000/DISK4 re: Trying to get final status of PUT to
> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_50/7405e5824cff411f8bb3ecc7c52ffd5a:
> Timeout (10s) (txn: txe0efab51f99945a7a09fa664b821777f) (client_ip:
> 172.168.1.2)
> Jul 20 18:44:55 angryman-proxy-01 proxy-server ERROR with Object server
> 192.168.100.101:36000/DISK5 re: Trying to get final status of PUT to
> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_33/f322f4c08b124666bf7903812f4799fe:
> Timeout (10s) (txn: tx8282ecb118434f828b9fb269f0fb6bd0) (client_ip:
> 172.168.1.2)
>
>
> I traced the code of the object server (swift/obj/server.py) and inserted a
> timer at
> https://github.com/openstack/swift/blob/master/swift/obj/server.py#L591
>
>
> for chunk in iter(lambda: reader(self.network_chunk_size), ''):
>
>
> It seems that the reader sometimes takes a long time to receive data from
> wsgi.input. It does not happen on every request; it seems to occur periodically.
>
> So I checked the history of Swift and saw your commit
> https://github.com/openstack/swift/commit/783f16035a8e251d2138eb5bbaa459e9e4486d90
> . That's the only one that looks close to my issue, so I hope you
> have some suggestions for me.
>
> My considerations :
>
> 1. Could it be caused by a greenio switch?
>
> 2. Is it related to the number of objects existing on the storage disks?
>
> 3. Has anyone tested Swift with small objects and fast client requests?
>
> 4. I found that the performance never returns to 1200/s. The only way
> to recover is to flush all data from the disks. Once the disks are cleaned, the
> performance returns to its best level.
>
> 5. I re-read the entire workflow the object server uses to handle a PUT request, and I
> don't understand why the number of objects would affect
> reading wsgi.input data. With 4K objects, there is no need for chunking as far as I
> know.
>
>
> The time consumed by *reader(self.network_chunk_size)*
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.001391
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.001839
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.00164
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.002786
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 2.716707
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 1.005659
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.055982
>
> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.002205
>
>
> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.000968
>
> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001328
>
> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 10.003368
>
> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001243
>
> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001562
>
>
> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 0.001067
>
> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 13.804413
>
> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 5.301166
>
> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 0.001167
>
>
>
>
> Could it be a bug in eventlet or Swift? Please let me know
> whether I should file a bug for Swift.
>
> Appreciate ~
>
> --
> +Hugo Kuo+
> tonytkdk@xxxxxxxxx
> +886 935004793
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Paulo Ricardo

-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com
