openstack team mailing list archive: Message #09078
Re: Caching strategies in Nova ...
+1
Documenting these findings would be nice too.
best,
Joe
On Fri, Mar 23, 2012 at 2:15 PM, Justin Santa Barbara
<justin@xxxxxxxxxxxx> wrote:
> This is great: hard numbers are exactly what we need. I would love to see
> a statement-by-statement SQL log with timings from someone who has a
> performance issue. I'm happy to look into any DB problems that it
> demonstrates.
>
> The nova database is small enough that it should always be in-memory (if
> you're running a million VMs, I don't think asking for one gigabyte of RAM
> on your DB is unreasonable!)
>
> If it isn't hitting disk, PostgreSQL or MySQL with InnoDB can serve 10k
> 'indexed' requests per second through SQL on a low-end (<$1000) box. With
> tuning you can get 10x that. Using one of the SQL bypass engines (e.g.
> MySQL HandlerSocket) can supposedly give you 10x again. Throwing money at
> the problem in the form of multi-processor boxes (or disks if you're I/O
> bound) can probably get you 10x again.
>
> However, if you put a DB on a remote host, you'll have to wait for a
> network round-trip per query. If your ORM is doing a 1+N query, the total
> read time will be slow. If your DB is doing a sync on every write, writes
> will be slow. If the DB isn't tuned with a sensible amount of cache (at
> least as big as the DB size), it will be slow(er). Each of these has a
> very simple fix for OpenStack.
>
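> (To make the 1+N case concrete, here's a rough sketch; it assumes a generic
> SQLAlchemy model called Instance with a security_groups relationship and an
> open session, not the real nova schema:)
>
>     # 1+N: one query for the list, then one extra query per row
>     instances = session.query(Instance).all()
>     for inst in instances:
>         print(inst.security_groups)  # lazy load -> one more round trip each
>
>     # the usual fix: eager-load the relationship in the same query
>     from sqlalchemy.orm import joinedload
>     instances = (session.query(Instance)
>                  .options(joinedload(Instance.security_groups))
>                  .all())
>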
> Relational databases have very efficient caching mechanisms built in. Any
> out-of-process cache will have a hard time beating them. Let's make sure the
> bottleneck is the DB, and not (for example) RabbitMQ, before we go off on a
> huge rearchitecture.
>
> Justin
>
>
>
>
> On Thu, Mar 22, 2012 at 7:53 PM, Mark Washenberger <
> mark.washenberger@xxxxxxxxxxxxx> wrote:
>
>> Working on this independently, I created a branch with some simple
>> performance logging around the nova-api, and individually around
>> glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
>> copy and it's on a different computer right now, and it probably needs
>> a rebase. I will rebase and publish it on GitHub tomorrow.)
>>
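>> (The logging itself is just a thin wall-clock timing wrapper; roughly
>> something like this sketch, simplified and with made-up names:)
>>
>>     import functools
>>     import logging
>>     import time
>>
>>     LOG = logging.getLogger('perflog')
>>
>>     def timed(key):
>>         # Log the wall-clock duration of the wrapped call under `key`.
>>         def decorator(func):
>>             @functools.wraps(func)
>>             def wrapper(*args, **kwargs):
>>                 start = time.time()
>>                 try:
>>                     return func(*args, **kwargs)
>>                 finally:
>>                     LOG.info('%s %.3f', key, time.time() - start)
>>             return wrapper
>>         return decorator
>>
>>     # e.g. timed('nova.db.api.instance_update')(instance_update)
>>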
>> With this logging, I could get some simple profiling that I found
>> very useful. Here is a GH project with the analysis code as well
>> as some nova-api logs I was using as input.
>>
>> https://github.com/markwash/nova-perflog
>>
>> With these tools, you can get a wall-time profile for individual
>> requests. For example, looking at one server create request (and
>> you can run this directly from the checkout as the logs are saved
>> there):
>>
>> markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python
>> profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
>> key                                        count    avg
>> nova.api.openstack.wsgi.POST                   1  0.657
>> nova.db.api.instance_update                    1  0.191
>> nova.image.show                                1  0.179
>> nova.db.api.instance_add_security_group       1  0.082
>> nova.rpc.cast                                  1  0.059
>> nova.db.api.instance_get_all_by_filters       1  0.034
>> nova.db.api.security_group_get_by_name        2  0.029
>> nova.db.api.instance_create                   1  0.011
>> nova.db.api.quota_get_all_by_project          3  0.003
>> nova.db.api.instance_data_get_for_project     1  0.003
>>
>> key                                        count  total
>> nova.api.openstack.wsgi                        1  0.657
>> nova.db.api                                   10  0.388
>> nova.image                                     1  0.179
>> nova.rpc                                       1  0.059
>>
>> All times are in seconds. The nova.rpc time is probably high
>> since this was the first call since server restart, so the
>> connection handshake is probably included. This is also probably
>> 1.5 months stale.
>>
>> The conclusion I reached from this profiling is that we just plain
>> overuse the db (and we might do the same in glance). For example,
>> whenever we do updates, we actually re-retrieve the item from the
>> database, update its dictionary, and save it. This is double the
>> cost it needs to be. We also handle updates for data across tables
>> inefficiently, where they could be handled in a single database
>> round trip.
>>
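>> (Roughly, it's the difference between a fetch-then-save and a single
>> UPDATE; a sketch in generic SQLAlchemy terms, with a made-up Instance
>> model and session, not the real nova code:)
>>
>>     # two round trips: SELECT the row, mutate it in memory, then UPDATE
>>     instance = session.query(Instance).get(instance_id)
>>     instance.vm_state = 'active'
>>     session.commit()
>>
>>     # one round trip: issue the UPDATE directly
>>     (session.query(Instance)
>>         .filter_by(id=instance_id)
>>         .update({'vm_state': 'active'}))
>>     session.commit()
>>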
>> In particular, in the case of server listings, extensions are just
>> rough on performance. Most extensions hit the database again
>> at least once. This isn't really so bad, but it clearly is an area
>> where we should improve, since these are the most frequent api
>> queries.
>>
>> I just see a ton of specific performance problems that are easier
>> to address one by one, rather than diving into a general (albeit
>> obvious) solution such as caching.
>>
>>
>> "Sandy Walsh" <sandy.walsh@xxxxxxxxxxxxx> said:
>>
>> > We're doing tests to find out where the bottlenecks are, caching is the
>> > most obvious solution, but there may be others. Tools like memcache do a
>> > really good job of sharing memory across servers so we don't have to
>> > reinvent the wheel or hit the db at all.
>> >
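>> > (For example, a cache-aside pattern with python-memcached; just a sketch
>> > with a hypothetical key scheme and TTL, assuming nova.db.api is imported
>> > as db and the value serializes cleanly:)
>> >
>> >     import memcache
>> >
>> >     mc = memcache.Client(['127.0.0.1:11211'])
>> >
>> >     def cached_instance_get(context, instance_id):
>> >         key = 'instance-%s' % instance_id
>> >         data = mc.get(key)              # shared across api nodes
>> >         if data is None:
>> >             data = db.instance_get(context, instance_id)  # fall back to the db
>> >             mc.set(key, data, time=60)  # short TTL bounds staleness
>> >         return data
>> >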
>> > In addition to looking into caching technologies/approaches, we're gluing
>> > together some tools for finding those bottlenecks. Our first step will
>> > be finding them, then squashing them ... however.
>> >
>> > -S
>> >
>> > On 03/22/2012 06:25 PM, Mark Washenberger wrote:
>> >> What problems are caching strategies supposed to solve?
>> >>
>> >> On the nova compute side, it seems like streamlining db access and
>> >> api-view tables would solve any performance problems caching would
>> >> address, while keeping the stale data management problem small.
>> >>
>> >