
openstack team mailing list archive

Re: Caching strategies in Nova ...

 

Great suggestions, guys ... we'll give some thought to how the community
can share and compare performance measurements in a consistent way.

-S

On 03/23/2012 07:26 PM, Joe Gordon wrote:
> +1
> 
> Documenting these findings would be nice too.
> 
> 
> best,
> Joe
> 
> On Fri, Mar 23, 2012 at 2:15 PM, Justin Santa Barbara
> <justin@xxxxxxxxxxxx> wrote:
> 
>     This is great: hard numbers are exactly what we need.  I would love
>     to see a statement-by-statement SQL log with timings from someone
>     that has a performance issue.  I'm happy to look into any DB
>     problems that demonstrates.
> 
>     The nova database is small enough that it should always be in-memory
>     (if you're running a million VMs, I don't think asking for one
>     gigabyte of RAM on your DB is unreasonable!)
> 
>     If it isn't hitting disk, PostgreSQL or MySQL with InnoDB can serve
>     10k 'indexed' requests per second through SQL on a low-end (<$1000)
>     box.  With tuning you can get 10x that.  Using one of the SQL bypass
>     engines (e.g. MySQL HandlerSocket) can supposedly give you 10x
>     again.  Throwing money at the problem in the form of multi-processor
>     boxes (or disks if you're I/O bound) can probably get you 10x again.
> 
>     However, if you put a DB on a remote host, you'll have to wait for a
>     network round-trip per query.  If your ORM is doing a 1+N query, the
>     total read time will be slow.  If your DB is doing a sync on every
>     write, writes will be slow.  If the DB isn't tuned with a sensible
>     amount of cache (at least as big as the DB size), it will be
>     slow(er).  Each of these has a very simple fix for OpenStack.
> 
>     Relational databases have very efficient caching mechanisms built
>     in.  Any out-of-process cache will have a hard time beating them.
>     Let's make sure the bottleneck is the DB, and not (for example)
>     RabbitMQ, before we go off on a huge rearchitecture.
> 
>     Justin
> 
> 
> 
> 
>     On Thu, Mar 22, 2012 at 7:53 PM, Mark Washenberger
>     <mark.washenberger@xxxxxxxxxxxxx> wrote:
> 
>         Working on this independently, I created a branch with some simple
>         performance logging around the nova-api, and individually around
>         glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
>         copy and it's on a different computer right now, and it probably
>         needs a rebase. I will rebase and publish it on GitHub tomorrow.)
> 
>         With this logging, I could get some simple profiling that I found
>         very useful. Here is a GH project with the analysis code as well
>         as some nova-api logs I was using as input.
> 
>         https://github.com/markwash/nova-perflog
> 
>         With these tools, you can get a wall-time profile for individual
>         requests. For example, looking at one server create request (and
>         you can run this directly from the checkout as the logs are saved
>         there):
> 
>         markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log |
>             python profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
>         key                                        count    avg
>         nova.api.openstack.wsgi.POST                   1  0.657
>         nova.db.api.instance_update                    1  0.191
>         nova.image.show                                1  0.179
>         nova.db.api.instance_add_security_group        1  0.082
>         nova.rpc.cast                                  1  0.059
>         nova.db.api.instance_get_all_by_filters        1  0.034
>         nova.db.api.security_group_get_by_name         2  0.029
>         nova.db.api.instance_create                    1  0.011
>         nova.db.api.quota_get_all_by_project           3  0.003
>         nova.db.api.instance_data_get_for_project      1  0.003
> 
>         key                      count  total
>         nova.api.openstack.wsgi      1  0.657
>         nova.db.api                 10  0.388
>         nova.image                   1  0.179
>         nova.rpc                     1  0.059
> 
>         All times are in seconds. The nova.rpc time is probably high
>         since this was the first call since server restart, so the
>         connection handshake is probably included. This is also probably
>         1.5 months stale.
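
For anyone comparing numbers, the second table can be derived from the first. The sketch below is not the code from the nova-perflog repository. It assumes that a group is the key with its last dotted component dropped, and that a group's total is the sum of count times avg over its rows. The rows are copied from the per-call table above.

```python
from collections import defaultdict

# (key, count, avg seconds) rows copied from the per-call table above.
PER_CALL = [
    ("nova.api.openstack.wsgi.POST", 1, 0.657),
    ("nova.db.api.instance_update", 1, 0.191),
    ("nova.image.show", 1, 0.179),
    ("nova.db.api.instance_add_security_group", 1, 0.082),
    ("nova.rpc.cast", 1, 0.059),
    ("nova.db.api.instance_get_all_by_filters", 1, 0.034),
    ("nova.db.api.security_group_get_by_name", 2, 0.029),
    ("nova.db.api.instance_create", 1, 0.011),
    ("nova.db.api.quota_get_all_by_project", 3, 0.003),
    ("nova.db.api.instance_data_get_for_project", 1, 0.003),
]


def group_key(key):
    """Assumed grouping rule: drop the last dotted component of the key."""
    return key.rsplit(".", 1)[0]


def summarize(rows):
    """Return {group: (call count, total seconds)} for the given rows."""
    totals = defaultdict(lambda: [0, 0.0])
    for key, count, avg in rows:
        entry = totals[group_key(key)]
        entry[0] += count
        entry[1] += count * avg
    return {group: (count, round(total, 3))
            for group, (count, total) in totals.items()}


if __name__ == "__main__":
    summary = summarize(PER_CALL)
    for group in sorted(summary, key=lambda g: -summary[g][1]):
        count, total = summary[group]
        print("%-26s %5d  %.3f" % (group, count, total))
```

Under those assumptions the grouped counts and totals match the second table, including the 10 nova.db.api calls adding up to 0.388 seconds.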
> 
>         The conclusion I reached from this profiling is that we just plain
>         overuse the db (and we might do the same in glance). For example,
>         whenever we do updates, we actually re-retrieve the item from the
>         database, update its dictionary, and save it. This is double the
>         cost it needs to be. We also handle updates for data across tables
>         inefficiently, where they could be handled in a single database
>         round trip.
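
The update pattern described here can be sketched with Python's built-in sqlite3 module. The table, its columns, and both function names are hypothetical stand-ins, not Nova's actual DB layer. The point is only that the first style issues a SELECT plus an UPDATE, while the second issues one UPDATE touching just the changed columns.

```python
import sqlite3

# Hypothetical table for illustration only; not Nova's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances "
             "(id INTEGER PRIMARY KEY, vm_state TEXT, task_state TEXT)")
conn.execute("INSERT INTO instances VALUES (1, 'building', 'scheduling')")

COLUMNS = ("vm_state", "task_state")  # whitelist of updatable columns
statements = []  # every statement issued through run() is recorded here


def run(sql, params=()):
    """Execute one statement and record it (one round trip on a remote DB)."""
    statements.append(sql)
    return conn.execute(sql, params)


def update_by_refetch(instance_id, values):
    """Read the row, merge the new values, write every column back."""
    row = run("SELECT vm_state, task_state FROM instances WHERE id = ?",
              (instance_id,)).fetchone()
    merged = dict(zip(COLUMNS, row))
    merged.update(values)
    run("UPDATE instances SET vm_state = ?, task_state = ? WHERE id = ?",
        (merged["vm_state"], merged["task_state"], instance_id))


def update_direct(instance_id, values):
    """Write only the changed columns in a single UPDATE."""
    cols = [c for c in COLUMNS if c in values]
    if not cols:
        return  # nothing to change, so no statement is issued
    assignments = ", ".join("%s = ?" % c for c in cols)
    run("UPDATE instances SET %s WHERE id = ?" % assignments,
        [values[c] for c in cols] + [instance_id])


if __name__ == "__main__":
    update_by_refetch(1, {"vm_state": "active"})
    print("refetch style issued %d statements" % len(statements))
    del statements[:]
    update_direct(1, {"task_state": "spawning"})
    print("direct style issued %d statement(s)" % len(statements))
```

The direct form gives up the freshly loaded row that the refetch style has on hand, so callers that need the updated record back would have to get it some other way.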
> 
>         In particular, in the case of server listings, extensions are just
>         rough on performance. Most extensions hit the database again
>         at least once. This isn't really so bad, but it clearly is an area
>         where we should improve, since these are the most frequent api
>         queries.
> 
>         I just see a ton of specific performance problems that are easier
>         to address one by one, rather than diving into a general (albeit
>         obvious) solution such as caching.
> 
> 
>         "Sandy Walsh" <sandy.walsh@xxxxxxxxxxxxx> said:
> 
>         > We're doing tests to find out where the bottlenecks are,
>         > caching is the most obvious solution, but there may be
>         > others. Tools like memcache do a really good job of sharing
>         > memory across servers so we don't have to reinvent the wheel
>         > or hit the db at all.
>         >
>         > In addition to looking into caching technologies/approaches
>         > we're gluing together some tools for finding those
>         > bottlenecks. Our first step will be finding them, then
>         > squashing them ... however.
>         >
>         > -S
>         >
>         > On 03/22/2012 06:25 PM, Mark Washenberger wrote:
>         >> What problems are caching strategies supposed to solve?
>         >>
>         >> On the nova compute side, it seems like streamlining db
>         >> access and api-view tables would solve any performance
>         >> problems caching would address, while keeping the stale data
>         >> management problem small.
>         >>
>         >
>         > _______________________________________________
>         > Mailing list: https://launchpad.net/~openstack
>         > Post to     : openstack@xxxxxxxxxxxxxxxxxxx
>         <mailto:openstack@xxxxxxxxxxxxxxxxxxx>
>         > Unsubscribe : https://launchpad.net/~openstack
>         > More help   : https://help.launchpad.net/ListHelp
>         >