
Re: [OpenStack][Swift] Some questions about the performance of Swift.


2012/7/21 Paulo Ricardo Motta Gomes <pauloricardomg@xxxxxxxxx>

> Have you monitored the CPU utilization of the proxy server and the storage
> nodes? I did similar tests with Swift and the proxy server exhausted its
> capacity with only a few concurrent requests for very small objects.


The proxy's CPU utilization reaches 100% on all cores at the beginning, and
60~70% on the storage nodes. It rises and falls periodically. I applied
several system tunings (sysctl, ulimit, etc.; a rough example is below), and
with those the request concurrency can reach 500+ for 4 KB objects.
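
For reference, the tunings were along these lines (values here are
illustrative, not a recommendation):

    # /etc/sysctl.conf (illustrative values)
    net.core.somaxconn = 1024
    net.ipv4.ip_local_port_range = 1024 65000
    net.ipv4.tcp_tw_reuse = 1

    # /etc/security/limits.conf
    * soft nofile 65536
    * hard nofile 65536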


> If you notice object servers are not overloaded, but proxy is overloaded,
> a solution might be to have more proxy servers if you hav
>

The result is still the same with multiple proxy servers (2-4), each with its
own powerful swift-bench client. I also tested with swift-bench's direct
client function against a particular storage node and saw a similar result.
Once more objects have been uploaded, the loop

for chunk in iter(lambda: reader(self.network_chunk_size), ''):

takes a lot of time periodically (a sketch of the timer I used is below).
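
Roughly what my timer looks like (my own instrumentation inside the object
server's PUT handling, not upstream code; it produces the "Reader:" timings
quoted further down):

    # sketch only: inside the PUT method of swift/obj/server.py
    import time

    last = time.time()
    for chunk in iter(lambda: reader(self.network_chunk_size), ''):
        # how long this single read from wsgi.input took
        self.logger.info('Reader: %f' % (time.time() - last))
        upload_size += len(chunk)
        # ... existing etag update / disk write code unchanged ...
        last = time.time()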



>
> It seems a problem of overload, since there are only 4 servers in the
> system and a large level of concurrency. Have you tried slowly increasing
> the number of concurrency to find the point where the problem starts? This
> point may be the capacity of your system.
>

Last week I got more servers from another hardware provider, with more
CPU/RAM/disks: 12 disks in each storage node. This Swift cluster kept up
better performance for a longer time, but unfortunately, after about
15,000,000 objects, the performance dropped to half and the failures
appeared again.
I'm concerned that the ratio of total objects to the number of disks may
cause this effect in large deployments (e.g. cloud storage providers,
telecom, banks, etc.); a rough calculation of that ratio is below.

Really confusing  ......
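
Just to make the ratio concrete, a back-of-the-envelope calculation (the node
count here is hypothetical; 12 disks per node is from my setup, and I assume
the default 3 replicas):

    # hypothetical numbers, for illustration only
    nodes = 3                     # assumed node count
    disks = nodes * 12            # 12 disks per storage node
    objects = 15000000
    replicas = 3                  # Swift default replica count
    print(objects * replicas / disks)   # ~1,250,000 object files per disk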


>
> Also, are you using persistent connections to the proxy server to send the
> object? If so, maybe try to renew them once in a while.
>

As far as I know, swift-bench renews connections for each round.

swift-bench creates a connection pool with concurrency = x connections, and
I think those connections are renewed in every round; a rough sketch of the
pattern I mean is below.
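
A rough sketch of the per-round renewal pattern I mean (illustrative only,
not the actual swift-bench code):

    import httplib  # Python 2 era; http.client on Python 3

    def new_pool(host, port, concurrency):
        # a fresh set of connections for every benchmark round
        return [httplib.HTTPConnection(host, port) for _ in range(concurrency)]

    pool = new_pool('proxy.example.com', 8080, 200)
    # ... run this round's PUTs over the pooled connections ...
    for conn in pool:
        conn.close()  # nothing persists into the next round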

Something strange is that the performance goes back to the initial level
whenever I flush all data on the storage nodes (whether by reformatting the
disks or by rm).


>
> Cheers,
>
> Paulo
>
Thanks for your reply

>
> 2012/7/20 Kuo Hugo <tonytkdk@xxxxxxxxx>
>
>> Hi Sam, and all OpenStackers,
>>
>> This is Hugo. I'm facing an issue with the performance *degradation* of
>> Swift, and I've been trying to figure out its cause over the past few
>> days.
>>
>> Environment:
>> Swift version: master branch, latest code
>> OS: Ubuntu 12.04/11.10
>> 1 Swift proxy: 32 GB RAM / 4*2 CPUs / 2x 1 Gb NICs
>> 3 storage nodes: each with 32 GB RAM / 4*2 CPUs / 7x 2 TB disks / 2x 1 Gb NICs
>>
>> The storage nodes run only the main workers (object-server,
>> container-server, account-server).
>>
>> I'm testing with 4 KB objects using swift-bench.
>>
>> Per-round bench.conf:
>> object_size = 4096
>> concurrency = 200
>> number of objects = 200000
>> number of containers = 200
>> no object deletion
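>>
>> (For reference, that corresponds roughly to this bench.conf; the option
>> names follow the swift-bench sample config as far as I recall, with
>> auth/user/key omitted:)
>>
>> [bench]
>> concurrency = 200
>> object_size = 4096
>> num_objects = 200000
>> num_containers = 200
>> delete = no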
>>
>> At the beginning, everything works fine in my environment; the average
>> PUT rate reaches 1200/s.
>> After several rounds of testing, the performance drops to 300~400/s.
>> After more rounds, failures appear, with ERRORs like the following in the
>> proxy's log:
>>
>> Jul 20 18:44:54 angryman-proxy-01 proxy-server ERROR with Object server
>> 192.168.100.101:36000/DISK5 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_51/87b48a3474c7485c95aeef95c6911afb:
>> Timeout (10s) (txn: txb4465d895c9345be95d81632db9729af) (client_ip:
>> 172.168.1.2)
>> Jul 20 18:44:54 angryman-proxy-01 proxy-server ERROR with Object server
>> 192.168.100.101:36000/DISK4 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_50/7405e5824cff411f8bb3ecc7c52ffd5a:
>> Timeout (10s) (txn: txe0efab51f99945a7a09fa664b821777f) (client_ip:
>> 172.168.1.2)
>> Jul 20 18:44:55 angryman-proxy-01 proxy-server ERROR with Object server
>> 192.168.100.101:36000/DISK5 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_33/f322f4c08b124666bf7903812f4799fe:
>> Timeout (10s) (txn: tx8282ecb118434f828b9fb269f0fb6bd0) (client_ip:
>> 172.168.1.2)
>>
>>
>> After tracing the object server code (swift/obj/server.py) and inserting
>> a timer around
>> https://github.com/openstack/swift/blob/master/swift/obj/server.py#L591
>>
>>
>> for chunk in iter(lambda: reader(self.network_chunk_size), ''):
>>
>>
>> it seems that the reader sometimes takes a long time to receive data from
>> wsgi.input. It doesn't happen on every request; it looks periodic.
>>
>> So I checked the history of Swift and saw your commit
>> https://github.com/openstack/swift/commit/783f16035a8e251d2138eb5bbaa459e9e4486d90
>> which is the only one that seems close to my issue, so I hope you have
>> some suggestions for me.
>>
>> My questions:
>>
>> 1. Could this be caused by greenio (greenthread) switching?
>>
>> 2. Is it related to the number of objects already on the storage disks?
>>
>> 3. Has anyone tested Swift with small objects and fast client requests?
>>
>> 4. I found that the performance never goes back to 1200/s. The only fix
>> is to flush all data from the disks; once the disks are cleaned, the
>> performance returns to its best.
>>
>> 5. I re-read the entire object-server workflow for handling a PUT
>> request, and I don't understand why the number of objects would affect
>> reading the wsgi.input data. With 4 KB objects there should be no need
>> for chunking as far as I know (see the small sketch below).
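>>
>> (A tiny sketch of what I mean by "no chunking needed"; the 65536 default
>> for network_chunk_size is from memory, so please correct me if wrong:)
>>
>> from StringIO import StringIO   # stand-in for the real wsgi.input
>>
>> network_chunk_size = 65536      # Swift's default, as far as I know
>> reader = StringIO('x' * 4096).read
>> chunks = [c for c in iter(lambda: reader(network_chunk_size), '')]
>> print(len(chunks), len(chunks[0]))   # -> (1, 4096): a 4 KB body is a single read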
>>
>>
>> The time consumed by *reader(self.network_chunk_size)*
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.001391
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.001839
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.00164
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.002786
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 2.716707
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 1.005659
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.055982
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.002205
>>
>>
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.000968
>>
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001328
>>
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 10.003368
>>
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001243
>>
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001562
>>
>>
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 0.001067
>>
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 13.804413
>>
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 5.301166
>>
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 0.001167
>>
>>
>>
>>
>> Could this be a bug in eventlet or Swift? Please feel free to let me
>> know whether I should file a bug against Swift.
>>
>> Appreciate ~
>>
>> --
>> +Hugo Kuo+
>> tonytkdk@xxxxxxxxx
>> +886 935004793
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
> Paulo Ricardo
>
> --
> European Master in Distributed Computing
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com
>
>


-- 
+Hugo Kuo+
tonytkdk@xxxxxxxxx
+886 935004793
