openstack team mailing list archive
-
openstack team
-
Mailing list archive
-
Message #14473
Re: [nova] mysql connection optimization
-
To:
openstack@xxxxxxxxxxxxxxxxxxx
-
From:
Monty Taylor <mordred@xxxxxxxxxxxx>
-
Date:
Wed, 11 Jul 2012 14:52:16 -0500
-
In-reply-to:
<CAExZKErNShcgPPY=58ZPy_ZYfYRGo+YPhQpsB39wH1Hhv+WMLA@mail.gmail.com>
-
User-agent:
Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
On 07/11/2012 12:33 PM, Devananda van der Veen wrote:
> Hi all,
>
> I've been taking a look at the way Nova uses its MySQL database. Having
> done MySQL performance audits for years as a consultant, a few things
> jumped out right away at me. First is the way that SQLAlchemy is
> wrapping nearly every query in an unnecessary "ping check" and rollback,
> eg.:
>
> select 1;
> select ... from ... where ...;
> rollback;
> select 1;
> update ... where ...;
> commit;
> rollback;
>
>
> You can find a complete sample here: http://paste.openstack.org/show/18731/
>
> I think I understand the reason for both the "select 1" and the
> "rollback" statements. However, in the interest of performance for
> large-scale deployments, I feel pretty strongly that they need to go away.
>
> As I see it, there are three factors here.
>
> (I) Most of the code in db/sqlalchemy/api.py abstracts a "unit of work"
> to a very low level, generally a single database read or write, and does
> not currently support transactions spanning multiple consistent writes.
> This is why "select 1" and "rollback" appear around almost every query
> -- most functions in api.py is checking out a session object, doing one
> or two queries, and then releasing the session. This actually creates a
> much larger issue about transactional atomicity for larger operations,
> such as the open bug about network creation here, and is probably better
> for another discussion.
> https://bugs.launchpad.net/nova/+bug/755138.
>
> (II) Connections are tested by the MySQLPingListener function, using
> "SELECT 1" statements, every time a connection / session is checked out
> of the pool. Since connections are usually OK, this adds overhead
> unnecessarily. It would be more efficient to handle the errors when
> connections aren't OK. I've opened a bug with a description of one
> possible way to fix this issue. There are probably other viable
> solutions as well.
> https://bugs..launchpad.net/nova/+bug/1007027
> <https://bugs.launchpad.net/nova/+bug/1007027>
>
> My understanding is that this was implemented to prevent "Database has
> gone away" errors from occurring every time that the database is
> restarted, or when connections are closed by the database after being
> idle for long periods. In my opinion, a better solution to these
> problems is to:
> * wrap queries in retry logic, which will catch disconnect errors and
> attempt to reconnect to the database. I have a patch for this, if folks
> are interested.
> * set Nova's sql_idle_timeout to less than MySQL's wait_timeout, and set
> them both to a reasonably short interval (eg, 1 minute), rather than the
> default (which I think is 8 hours).
>
> (III) Transaction state is reset when connections are returned to the
> pool, even if no transaction was in progress, or the
> transaction-in-progress already committed. This is completely wasteful,
> and easy to disable.
> https://bugs.launchpad.net/nova/+bug/1007038
>
> Caveat here is that this reset-on-return functionality *is* useful when
> code doesn't properly clean up its own transaction state. When I turned
> it off, it exposed a few bugs, which I haven't tracked down yet.
> Lowering the sql_idle_timeout will provide the same "solution" as the
> current reset-on-return behavior in that it will prevent long-running
> idle transactions that tie up resources unnecessarily.
>
>
> In summary, I'd like to see Nova stop spamming the database with "SELECT
> 1" and "ROLLBACK", and think this should be pretty easy to do. Testing
> these two changes in my devstack seems to work fine, but I'd like to
> know what others think before I propose a changeset with this kind of
> potential impact.
I agree and welcome work in that direction. Should we be looking at
patches to sqlalchemy to increase its support for dealing appropriately
with mysql-gone-away errors?
Monty
References