openstack team mailing list archive

Re: Caching strategies in Nova ...


Working on this independently, I created a branch that adds simple
performance logging around nova-api as a whole, and individually
around glance, nova.db, and nova.rpc calls. (Sorry, I only have a
local copy, it's on a different computer right now, and it probably
needs a rebase. I will rebase and publish it on GitHub tomorrow.)
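
For readers who want to try something similar before the branch is
published, here is a minimal sketch of this kind of per-call timing.
It is not the code from the branch: the `timed` decorator, the
"perf" logger name, the log line format, and the stand-in
`instance_update` body are all assumptions made for illustration.

```python
import functools
import logging
import time

# Hypothetical logger name; the real branch may log elsewhere.
LOG = logging.getLogger("perf")


def timed(key):
    """Log the wall time of every call to the wrapped function under `key`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are timed too.
                elapsed = time.perf_counter() - start
                LOG.info("perf key=%s elapsed=%.3f", key, elapsed)
        return wrapper
    return decorator


@timed("nova.db.api.instance_update")
def instance_update(instance_id, values):
    # Stand-in body; a real implementation would talk to the database.
    return dict(values, id=instance_id)
```

Wrapping the db, image, and rpc entry points this way gives one log
line per call, which is enough input for a per-request profile as
long as each line can be tied back to a request id.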

With this logging, I could get some simple profiling that I found
very useful. Here is a GH project with the analysis code as well
as some nova-api logs I was using as input. 


With these tools, you can get a wall-time profile for individual
requests. For example, here is one server create request (you can
run this directly from the checkout, as the logs are saved there):

markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
key                                        count    avg
nova.api.openstack.wsgi.POST                   1  0.657
nova.db.api.instance_update                    1  0.191
nova.image.show                                1  0.179
nova.db.api.instance_add_security_group        1  0.082
nova.rpc.cast                                  1  0.059
nova.db.api.instance_get_all_by_filters        1  0.034
nova.db.api.security_group_get_by_name         2  0.029
nova.db.api.instance_create                    1  0.011
nova.db.api.quota_get_all_by_project           3  0.003
nova.db.api.instance_data_get_for_project      1  0.003

key                      count  total
nova.api.openstack.wsgi      1  0.657
nova.db.api                 10  0.388
nova.image                   1  0.179
nova.rpc                     1  0.059

All times are in seconds. The nova.rpc time is probably inflated
because this was the first call after a server restart, so it
likely includes the connection handshake. The data is also about
1.5 months stale.
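
The second table is a roll-up of the first: each key is grouped by
dropping its last component, and count * avg is summed per group.
The sketch below shows that arithmetic on the rows from the per-call
table. It is an illustration only, not the contents of
profile-request.py; the `ROWS` list and the `rollup` function are
hypothetical names.

```python
from collections import defaultdict

# (key, count, avg seconds) rows copied from the per-call table.
ROWS = [
    ("nova.api.openstack.wsgi.POST", 1, 0.657),
    ("nova.db.api.instance_update", 1, 0.191),
    ("nova.image.show", 1, 0.179),
    ("nova.db.api.instance_add_security_group", 1, 0.082),
    ("nova.rpc.cast", 1, 0.059),
    ("nova.db.api.instance_get_all_by_filters", 1, 0.034),
    ("nova.db.api.security_group_get_by_name", 2, 0.029),
    ("nova.db.api.instance_create", 1, 0.011),
    ("nova.db.api.quota_get_all_by_project", 3, 0.003),
    ("nova.db.api.instance_data_get_for_project", 1, 0.003),
]


def rollup(rows):
    """Return {prefix: (call count, total seconds)} grouped by key prefix."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for key, count, avg in rows:
        prefix = key.rsplit(".", 1)[0]  # drop the final component
        counts[prefix] += count
        totals[prefix] += count * avg
    return {p: (counts[p], round(totals[p], 3)) for p in counts}
```

Calling `rollup(ROWS)` gives 10 calls and 0.388 seconds for
nova.db.api, matching the summary table, so db calls account for
more than half of the 0.657 second request.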

The conclusion I reached from this profiling is that we just plain
overuse the db (and we might do the same in glance). For example,
whenever we do an update, we first re-retrieve the item from the
database, update its dictionary, and then save it. That is twice
the cost it needs to be. We also handle updates that span tables
inefficiently, where they could be handled in a single database
round trip.
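
To make the update pattern concrete, here is a small sketch that
contrasts a read-modify-write update with a single UPDATE statement.
It uses the standard-library sqlite3 module as a stand-in for the
real database layer, and the `instances` table, its columns, and
both function names are illustrative assumptions rather than the
actual nova.db code.

```python
import sqlite3

# Illustrative column whitelist; column names are never taken from raw input.
COLUMNS = ("vm_state", "task_state")


def make_db():
    """Create an in-memory database with one example row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances "
        "(id INTEGER PRIMARY KEY, vm_state TEXT, task_state TEXT)"
    )
    conn.execute("INSERT INTO instances VALUES (1, 'building', 'spawning')")
    return conn


def update_with_refetch(conn, instance_id, values):
    """Two statements: SELECT the row, merge in Python, write it all back."""
    row = conn.execute(
        "SELECT vm_state, task_state FROM instances WHERE id = ?",
        (instance_id,),
    ).fetchone()
    merged = dict(zip(COLUMNS, row))
    merged.update(values)
    conn.execute(
        "UPDATE instances SET vm_state = ?, task_state = ? WHERE id = ?",
        (merged["vm_state"], merged["task_state"], instance_id),
    )


def update_direct(conn, instance_id, values):
    """One statement: UPDATE only the columns that changed."""
    cols = [c for c in COLUMNS if c in values]
    if not cols:
        return  # nothing to change, so skip the database entirely
    assignments = ", ".join(f"{c} = ?" for c in cols)
    params = [values[c] for c in cols] + [instance_id]
    conn.execute(f"UPDATE instances SET {assignments} WHERE id = ?", params)
```

Both functions leave the row in the same state, but the second one
reaches it with half the statements and without shipping the
unchanged columns back and forth.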

In particular, in the case of server listings, extensions are just
rough on performance. Most extensions hit the database again
at least once. This isn't really so bad, but it clearly is an area
where we should improve, since these are the most frequent api
calls.

I just see a ton of specific performance problems that are easier
to address one by one, rather than diving into a general (albeit
obvious) solution such as caching.

"Sandy Walsh" <sandy.walsh@xxxxxxxxxxxxx> said:

> We're doing tests to find out where the bottlenecks are, caching is the
> most obvious solution, but there may be others. Tools like memcache do a
> really good job of sharing memory across servers so we don't have to
> reinvent the wheel or hit the db at all.
> In addition to looking into caching technologies/approaches we're gluing
> together some tools for finding those bottlenecks. Our first step will
> be finding them, then squashing them ... however.
> -S
> On 03/22/2012 06:25 PM, Mark Washenberger wrote:
>> What problems are caching strategies supposed to solve?
>> On the nova compute side, it seems like streamlining db access and
>> api-view tables would solve any performance problems caching would
>> address, while keeping the stale data management problem small.
