Re: Swift performance for very small objects

 

Remember that when an object is written to swift, it's not written
just to the object server; the container and account servers are
updated as well: the container for object listings (and timestamps)
and the account for overall statistics. Also, the proxy ensures a
quorum for the newly written object: at least 2 of the 3 replicas
must be written before the request is ack'd to the client.
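
As a rough illustration of the quorum rule above, here is a minimal
sketch that assumes a simple majority rule. The function names are
made up for this example, and the exact rule used by the proxy may
differ from what is shown here.

```python
def quorum_size(replica_count):
    """Smallest number of replicas that forms a majority (assumed rule)."""
    return replica_count // 2 + 1


def write_acknowledged(successful_writes, replica_count=3):
    """True once enough replica writes succeeded to ack the client."""
    return successful_writes >= quorum_size(replica_count)
```

With 3 replicas this gives a quorum of 2, so a PUT waits for the
slower of the two fastest replica writes, not just one disk.
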

If you're trying to find ways to optimize swift for performance,
especially on large clusters, I'd probably focus on performance
optimization of the account and container servers.

A few more thoughts:
 * swift is designed to scale out very well - both across machines and
across disks. You effectively defeat that scaling when you use
loopback devices, since you force all the disk activity onto the same
physical disk.
 * you might want to "prime" your environment before your
performance tests: things like ARP caches and DNS name resolution.
Also, make sure to "prime" your accounts and containers, so that
they are not created as part of the test.
 * there are some caches in swift that run around 128K entries (in the
account / container servers). You might want to run larger tests, to
make sure you get those flushed once in a while.
 * once you have real disks, you might want to play around with disk
to zone ratios. Replicas are guaranteed to go to different zones, so
the number of disk spindles in a zone will affect the overall
performance of your cluster.
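
To make the "prime" suggestion above concrete, here is a minimal
sketch of a test harness that keeps setup outside the timed region.
Everything in it is hypothetical: FakeCluster is an in-memory
stand-in, not a swift client, and run_benchmark is a name made up for
this example. In a real test the prime step would create the account
and containers and touch each storage host once.

```python
import time


def run_benchmark(prime, put_object, count):
    """Run prime() untimed, then time `count` calls of put_object(i)."""
    prime()  # setup cost is deliberately excluded from the timings
    timings = []
    for i in range(count):
        start = time.perf_counter()
        put_object(i)
        timings.append(time.perf_counter() - start)
    return timings


class FakeCluster:
    """Hypothetical in-memory stand-in for a storage cluster."""

    def __init__(self):
        self.containers = {}

    def create_container(self, name):
        self.containers.setdefault(name, {})

    def put_object(self, container, name, data):
        # Raises KeyError if the container was not created beforehand.
        self.containers[container][name] = data


def demo(count=1000):
    cluster = FakeCluster()
    timings = run_benchmark(
        prime=lambda: cluster.create_container("bench"),
        put_object=lambda i: cluster.put_object("bench", "obj%d" % i, b"x" * 190),
        count=count,
    )
    return cluster, timings
```

Picking a count well above 128K (131072) would also push past the
cache sizes mentioned above.
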

It will be interesting to hear more about your results!

Oh... persistent connections. I believe the Python httplib will
auto-negotiate persistent connections, so no app-level code is
required (good thought, though ;)
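
A small way to check the persistent-connection behaviour, as a
sketch only: it uses http.client (the Python 3 name for httplib) and
a throwaway local test server, not swift. The handler and function
names are made up for this example. It only shows that reusing one
HTTPConnection object keeps one socket open across requests; it says
nothing about how the swift proxy manages its own connections.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PutHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def put_many(host, port, count, body=b"x" * 190):
    """Send `count` PUTs over one HTTPConnection; record local ports used."""
    conn = http.client.HTTPConnection(host, port)
    statuses, local_ports = [], set()
    for i in range(count):
        conn.request("PUT", "/obj%d" % i, body=body)
        # A reconnect would show up here as a different local port.
        local_ports.add(conn.sock.getsockname()[1])
        resp = conn.getresponse()
        resp.read()  # drain the response so the connection can be reused
        statuses.append(resp.status)
    conn.close()
    return statuses, local_ports


def demo(count=5):
    server = HTTPServer(("127.0.0.1", 0), PutHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return put_many("127.0.0.1", server.server_address[1], count)
    finally:
        server.shutdown()
        server.server_close()
```

If every request used a fresh HTTPConnection object instead, each one
would open a new socket, so the reuse depends on how the calling code
holds on to its connection objects.
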


On Sat, May 19, 2012 at 9:34 PM, Paulo Ricardo Motta Gomes
<pauloricardomg@xxxxxxxxx> wrote:
> Hello,
>
> I'm doing some experiments in a Swift cluster testbed of 9 nodes/devices and
> 3 zones (3 nodes on each zone).
>
> In one of my tests, I noticed that PUTs of very small objects are extremely
> inefficient.
>
> - 5000 PUTs of objects with an average size of 40K - total of 195MB - took
> 67s (avg time per request: 0.0135s)
> - 5000 PUTs of objects with an average size of 190 bytes - total of 930KB -
> took 60s (avg time per request: 0.0123s)
>
> I plotted object size vs request time and found that there is significant
> difference in request times only after 200KB. When objects are smaller than
> this PUT requests have a minimum execution time of 0.01s, no matter the
> object size.
>
> I suppose swift is not optimized for such small objects, but I wonder what
> is the main cause for this, if it's the HTTP overhead or disk writing. I
> checked the log of the object servers and requests are taking an average of
> 0.006s, whether objects are 40K or 190 bytes, which indicates part of the
> bottleneck could be at the disk. Currently I'm using a loopback device for
> storage.
>
> I thought that maybe this could be improved a bit if the proxy server
> maintained persistent connections to the storage nodes instead of opening a
> new one for each request?
>
> It would be great if you could share your thoughts on this and on how the
> performance of this special case could be improved.
>
> Cheers,
>
> Paulo
>
> --
> European Master in Distributed Computing
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com

