Re: Caching strategies in Nova ...
Alas, I let my patch get too stale to rebase properly. However, the
approach I took is fairly "dumb" and can be demonstrated just from the
patch itself. In any case, I think the approach you're taking, profiling
based on Tach, is going to be better in the long run and more shareable
within the community.
+ 1 gazillion to getting good metrics!
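For anyone who doesn't want to read the whole diff: the trick in the
patch is just a proxy object that writes one log line on the way into
every call and another on the way out; the request ids and timestamps
already present in the log output then give you wall times for free.
A stripped-down sketch of that shape (illustrative only, not the
literal patch code, which is appended below):

    import functools
    import logging

    LOG = logging.getLogger(__name__)


    class LogEverything(object):
        """Proxy that logs entry and exit of every callable attribute
        of the wrapped object; timings come from the timestamps the
        logging formatter already adds."""

        def __init__(self, wrapped):
            self.wrapped = wrapped

        def __getattr__(self, name):
            original = getattr(self.wrapped, name)
            if not callable(original):
                return original

            @functools.wraps(original)
            def logged(*args, **kwargs):
                LOG.info("start %s", name)
                try:
                    return original(*args, **kwargs)
                finally:
                    LOG.info("finish %s", name)
            return logged

In the patch this wraps the module-level IMPL objects in nova.db and
nova.rpc and the image service instances returned by nova.image, so no
call sites have to change.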
"Sandy Walsh" <sandy.walsh@xxxxxxxxxxxxx> said:
> (resent to list as I realized I just did a Reply)
>
> Cool! This is great stuff. Look forward to seeing the branch.
>
> I started working on a similar tool that takes the data collected by
> Tach and fetches it from Graphite to look at the performance
> issues (no changes to nova trunk required, since Tach is awesome).
>
> It's only a shell of an idea so far, but the basics work:
> https://github.com/ohthree/novaprof
>
> But if there is something already existing, I'm happy to kill it off.
>
> I don't doubt for a second the db is the culprit for many of our woes.
>
> The thing I like about internal caching using established tools is that
> it works for db issues too without having to resort to custom tables.
> SQL query optimization, I'm sure, will go equally far.
>
> Thanks again for the great feedback ... keep it comin'!
>
> -S
>
>
> On 03/22/2012 11:53 PM, Mark Washenberger wrote:
>> Working on this independently, I created a branch with some simple
>> performance logging around the nova-api, and individually around
>> glance, nova.db, and nova.rpc calls. (Sorry, I only have a local
>> copy and it's on a different computer right now, and it probably needs
>> a rebase. I will rebase and publish it on GitHub tomorrow.)
>>
>> With this logging, I could get some simple profiling that I found
>> very useful. Here is a GH project with the analysis code as well
>> as some nova-api logs I was using as input.
>>
>> https://github.com/markwash/nova-perflog
>>
>> With these tools, you can get a wall-time profile for individual
>> requests. For example, looking at one server create request (and
>> you can run this directly from the checkout as the logs are saved
>> there):
>>
>> markw@poledra:perflogs$ cat nova-api.vanilla.1.5.10.log | python
>> profile-request.py req-3cc0fe84-e736-4441-a8d6-ef605558f37f
>> key count avg
>> nova.api.openstack.wsgi.POST 1 0.657
>> nova.db.api.instance_update 1 0.191
>> nova.image.show 1 0.179
>> nova.db.api.instance_add_security_group 1 0.082
>> nova.rpc.cast 1 0.059
>> nova.db.api.instance_get_all_by_filters 1 0.034
>> nova.db.api.security_group_get_by_name 2 0.029
>> nova.db.api.instance_create 1 0.011
>> nova.db.api.quota_get_all_by_project 3 0.003
>> nova.db.api.instance_data_get_for_project 1 0.003
>>
>> key count total
>> nova.api.openstack.wsgi 1 0.657
>> nova.db.api 10 0.388
>> nova.image 1 0.179
>> nova.rpc 1 0.059
>>
>> All times are in seconds. The nova.rpc time is probably high
>> since this was the first call since server restart, so the
>> connection handshake is probably included. This is also probably
>> 1.5 months stale.
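The numbers above come straight out of the log timestamps: each wrapped
call writes an "MJW <name>" line on entry and an "MJW finish <name>"
line on exit, and per request id the analysis only has to subtract one
from the other. A toy version of that pairing, assuming a default
timestamped/context log line shape and in no way the actual
profile-request.py:

    import re
    import sys
    from collections import defaultdict
    from datetime import datetime

    # Assumed line shape (a guess, sketch only):
    #   2012-03-22 23:53:01.123 INFO nova.db.api [req-... user proj] MJW instance_update
    #   2012-03-22 23:53:01.314 INFO nova.db.api [req-... user proj] MJW finish instance_update
    LINE = re.compile(r'^(\S+ \S+) .*?\[(req-\S+) .*?MJW (finish )?(\S+)')

    def ts(value):
        return datetime.strptime(value, '%Y-%m-%d %H:%M:%S.%f')

    wanted = sys.argv[1]                   # request id to profile
    starts, totals, counts = {}, defaultdict(float), defaultdict(int)
    for line in sys.stdin:
        m = LINE.match(line)
        if not m or m.group(2) != wanted:
            continue
        key = m.group(4)
        if not m.group(3):                 # "MJW <name>": remember the start time
            starts[key] = ts(m.group(1))
        elif key in starts:                # "MJW finish <name>": accumulate the delta
            totals[key] += (ts(m.group(1)) - starts.pop(key)).total_seconds()
            counts[key] += 1

    for key in sorted(totals, key=totals.get, reverse=True):
        print("%-45s %5d %8.3f" % (key, counts[key], totals[key]))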
>>
>> The conclusion I reached from this profiling is that we just plain
>> overuse the db (and we might do the same in glance). For example,
>> whenever we do updates, we actually re-retrieve the item from the
>> database, update its dictionary, and save it. This is double the
>> cost it needs to be. We also handle updates for data across tables
>> inefficiently, where they could be handled in a single database round
>> trip.
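To make that concrete, the update path is a read-modify-write where a
plain UPDATE would do. Roughly (a hypothetical SQLAlchemy-style sketch,
not the actual nova.db implementation; the model name is made up):

    # What the update path effectively does today (illustrative only):
    def instance_update_slow(session, instance_id, values):
        instance = session.query(Instance).get(instance_id)   # round trip 1: re-read the row
        for key, value in values.items():
            setattr(instance, key, value)
        session.add(instance)
        session.flush()                                        # round trip 2: write it back

    # A single-round-trip alternative:
    def instance_update_fast(session, instance_id, values):
        session.query(Instance).filter_by(id=instance_id).update(values)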
>>
>> In particular, in the case of server listings, extensions are just
>> rough on performance. Most extensions hit the database again
>> at least once. This isn't really so bad, but it clearly is an area
>> where we should improve, since these are the most frequent api
>> queries.
>>
>> I just see a ton of specific performance problems that are easier
>> to address one by one, rather than diving into a general (albeit
>> obvious) solution such as caching.
>>
>>
>> "Sandy Walsh" <sandy.walsh@xxxxxxxxxxxxx> said:
>>
>>> We're doing tests to find out where the bottlenecks are; caching is the
>>> most obvious solution, but there may be others. Tools like memcache do a
>>> really good job of sharing memory across servers so we don't have to
>>> reinvent the wheel or hit the db at all.
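The memcache pattern being described here is essentially cache-aside:
check memcached first, fall back to the db on a miss, then populate the
cache. A minimal sketch with python-memcached (the names and the db
lookup are stand-ins, purely illustrative):

    import memcache

    MC = memcache.Client(['127.0.0.1:11211'])

    def instance_get_from_db(context, instance_id):
        # Stand-in for the real db lookup.
        raise NotImplementedError

    def instance_get_cached(context, instance_id, ttl=60):
        """Cache-aside: memcached first, db only on a miss."""
        key = 'instance-%s' % instance_id
        instance = MC.get(key)
        if instance is None:
            instance = instance_get_from_db(context, instance_id)
            MC.set(key, instance, time=ttl)
        return instance

The obvious cost is the window of stale data between db writes and
cache expiry, which is exactly the trade-off under discussion here.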
>>>
>>> In addition to looking into caching technologies/approaches, we're gluing
>>> together some tools for finding those bottlenecks. Our first step will
>>> be finding them, then squashing them ... however.
>>>
>>> -S
>>>
>>> On 03/22/2012 06:25 PM, Mark Washenberger wrote:
>>>> What problems are caching strategies supposed to solve?
>>>>
>>>> On the nova compute side, it seems like streamlining db access and
>>>> api-view tables would solve any performance problems caching would
>>>> address, while keeping the stale data management problem small.
>>>>
>>>
>
diff --git nova/api/openstack/wsgi.py nova/api/openstack/wsgi.py
index af62f9b..70f4474 100644
--- nova/api/openstack/wsgi.py
+++ nova/api/openstack/wsgi.py
@@ -769,6 +769,9 @@ class Resource(wsgi.Application):
LOG.info("%(method)s %(url)s" % {"method": request.method,
"url": request.url})
+ LOG.info("MJW %(method)s %(url)s" % {"method": request.method,
+ "url": request.url},
+ context=request.environ['nova.context'])
# Identify the action, its arguments, and the requested
# content type
@@ -782,8 +785,12 @@ class Resource(wsgi.Application):
# function. If we try to audit __call__(), we can
# run into troubles due to the @webob.dec.wsgify()
# decorator.
- return self._process_stack(request, action, action_args,
- content_type, body, accept)
+ ret = self._process_stack(request, action, action_args,
+ content_type, body, accept)
+ LOG.info("MJW finish %(method)s %(url)s" % {"method": request.method,
+ "url": request.url},
+ context=request.environ['nova.context'])
+ return ret
def _process_stack(self, request, action, action_args,
content_type, body, accept):
diff --git nova/db/api.py nova/db/api.py
index a886945..63bdf69 100644
--- nova/db/api.py
+++ nova/db/api.py
@@ -42,11 +42,13 @@ these objects be simple dictionaries.
pool of available hardware (Default: True)
"""
+import functools
from nova import exception
from nova import flags
from nova.openstack.common import cfg
from nova import utils
+from nova import log as logging
db_opts = [
@@ -73,8 +75,34 @@ db_opts = [
FLAGS = flags.FLAGS
FLAGS.add_options(db_opts)
-IMPL = utils.LazyPluggable(FLAGS['db_backend'],
- sqlalchemy='nova.db.sqlalchemy.api')
+
+LOG = logging.getLogger('nova.db.api')
+
+
+class LogEverything(object):
+
+ def __init__(self, wrapped):
+ self.wrapped = wrapped
+
+ def __getattr__(self, name):
+ original = getattr(self.wrapped, name)
+ if callable(original):
+ return self._log_everything(original, name)
+ else:
+ return original
+
+ def _log_everything(self, fun, name):
+ @functools.wraps(fun)
+ def logged_fun(context, *args, **kwargs):
+ LOG.info("MJW %s" % name, context=context)
+ ret = fun(context, *args, **kwargs)
+ LOG.info("MJW finish %s" % name, context=context)
+ return ret
+ return logged_fun
+
+
+IMPL = LogEverything(utils.LazyPluggable(FLAGS['db_backend'],
+ sqlalchemy='nova.db.sqlalchemy.api'))
class NoMoreNetworks(exception.Error):
diff --git nova/image/__init__.py nova/image/__init__.py
index c5ca6f3..ac047fc 100644
--- nova/image/__init__.py
+++ nova/image/__init__.py
@@ -14,19 +14,47 @@
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
-
+import functools
import nova
from nova import utils
from nova import flags
from nova.image import glance
+from nova import log as logging
FLAGS = flags.FLAGS
+LOG = logging.getLogger('nova.image')
+
+
+class LogEverything(object):
+
+ def __init__(self, *args, **kwargs):
+ if not hasattr(self, 'already_inited'):
+ self.wrapped = args[0]
+ self.already_inited = True
+ else:
+ self.wrapped.__init__(*args, **kwargs)
+
+ def __getattr__(self, name):
+ original = getattr(self.wrapped, name)
+ if callable(original):
+ return self._log_everything(original, name)
+ else:
+ return original
+
+ def _log_everything(self, fun, name):
+ @functools.wraps(fun)
+ def logged_fun(context, *args, **kwargs):
+ LOG.info("MJW %s" % name, context=context)
+ ret = fun(context, *args, **kwargs)
+ LOG.info("MJW finish %s" % name, context=context)
+ return ret
+ return logged_fun
def get_default_image_service():
ImageService = utils.import_class(FLAGS.image_service)
- return ImageService()
+ return LogEverything(ImageService())
def get_image_service(context, image_href):
@@ -48,4 +76,4 @@ def get_image_service(context, image_href):
(glance_client, image_id) = glance.get_glance_client(context,
image_href)
image_service = nova.image.glance.GlanceImageService(glance_client)
- return (image_service, image_id)
+ return (LogEverything(image_service), image_id)
diff --git nova/log.py nova/log.py
index c0bc256..00a203f 100644
--- nova/log.py
+++ nova/log.py
@@ -30,6 +30,7 @@ It also allows setting of formatting information through flags.
"""
import cStringIO
+import eventlet
import inspect
import json
import logging
@@ -49,6 +50,7 @@ from nova import version
log_opts = [
cfg.StrOpt('logging_context_format_string',
default='%(asctime)s %(levelname)s %(name)s [%(request_id)s '
+ '%(eventlet_id)x '
'%(user_id)s %(project_id)s] %(instance)s'
'%(message)s',
help='format string to use for log messages with context'),
@@ -177,6 +179,8 @@ class NovaLogger(logging.Logger):
params['extra'] = {}
extra = params['extra']
+ extra['eventlet_id'] = id(eventlet.getcurrent())
+
context = None
if 'context' in params:
context = params['context']
diff --git nova/rpc/__init__.py nova/rpc/__init__.py
index 7c6ed29..80a7a7c 100644
--- nova/rpc/__init__.py
+++ nova/rpc/__init__.py
@@ -16,11 +16,13 @@
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.
+import functools
from nova import flags
from nova.openstack.common import cfg
from nova.rpc.common import RemoteError, LOG
from nova.utils import import_object
+from nova import log as logging
rpc_backend_opt = \
@@ -30,6 +32,29 @@ rpc_backend_opt = \
FLAGS = flags.FLAGS
FLAGS.add_option(rpc_backend_opt)
+LOG = logging.getLogger('nova.rpc')
+
+
+class LogEverything(object):
+
+ def __init__(self, wrapped):
+ self.wrapped = wrapped
+
+ def __getattr__(self, name):
+ original = getattr(self.wrapped, name)
+ if callable(original):
+ return self._log_everything(original, name)
+ else:
+ return original
+
+ def _log_everything(self, fun, name):
+ @functools.wraps(fun)
+ def logged_fun(context, *args, **kwargs):
+ LOG.info("MJW %s" % name, context=context)
+ ret = fun(context, *args, **kwargs)
+ LOG.info("MJW finish %s" % name, context=context)
+ return ret
+ return logged_fun
def create_connection(new=True):
@@ -169,5 +194,5 @@ def _get_impl():
"""Delay import of rpc_backend until FLAGS are loaded."""
global _RPCIMPL
if _RPCIMPL is None:
- _RPCIMPL = import_object(FLAGS.rpc_backend)
+ _RPCIMPL = LogEverything(import_object(FLAGS.rpc_backend))
return _RPCIMPL