
Re: Nova DB Connection Pooling

 


On 09/25/2011 07:44 PM, Brian Lamar wrote:
> Hey Monty/All,
> 
> The original goal of my eventlet connection pooling patch was to
> increase overall throughput of the OpenStack API. By itself,
> SQLAlchemy provides a lot of nifty features such as connection
> pooling and connection timeout limits, but all of these were being
> lost on us because of our use of eventlet in the API.
> 
> To elaborate, when multiple users query the API simultaneously, what
> we'd expect to happen is for a greenthread to be created for each
> request. This happens as expected; however, since SQLAlchemy uses the
> MySQLdb module by default, all database calls block *all*
> greenthreads. This is because MySQLdb is written in C and thus can't
> be monkey patched by eventlet.

Awesome. Yes - this makes perfect sense. (although now my evil brain is
wondering about doing evil things inside of the MySQLdb C module ...)
WAIT - diversion - have you tried using Geert's MySQL Connector/Python?
Pure-python MySQL lib - would be interesting to see if eventlet could
monkeypatch it in useful ways...
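
For illustration - a quick sketch (placeholder connection params, not code
from the patch) of the behaviour Brian describes: with MySQLdb, five
greenthreads each running SELECT SLEEP(1) finish one after another, because
the driver's C-level network I/O never touches the socket module that
eventlet monkeypatches.

    # Sketch: why MySQLdb serializes greenthreads (placeholder credentials).
    import time

    import eventlet
    eventlet.monkey_patch()   # patches socket/time/etc. for pure-python code only

    import MySQLdb            # C extension - bypasses the patched socket module


    def slow_query(i, start):
        conn = MySQLdb.connect(host="127.0.0.1", user="nova", passwd="", db="nova")
        try:
            cur = conn.cursor()
            cur.execute("SELECT SLEEP(1)")   # blocks the whole process, not just this greenthread
        finally:
            conn.close()
        print("query %d finished at %.1fs" % (i, time.time() - start))


    start = time.time()
    pool = eventlet.GreenPool()
    for i in range(5):
        pool.spawn(slow_query, i, start)
    pool.waitall()
    # With MySQLdb the queries finish at roughly 1s, 2s, 3s, 4s, 5s (serial).
    # A pure-python driver going through the patched sockets could overlap them.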

> As a result, the API basically does all SQL queries in serial and
> we're obviously (?) going to have to support multiple concurrent
> connections in the API. It was very evident in load testing of the
> API that something needed to change.

I'm almost in total agreement with you - although I'm not _100%_
convinced that we'll have to support multiple concurrent connections ...
but that's purely because I like to do empirical testing. ;) So just
consider that me being ornery and stubborn. Let's assume for now that
you're right...

> The patch introduced db_pool
> (http://eventlet.net/doc/modules/db_pool.html), which more or less
> uses threads to overcome the limitation of using MySQLdb in
> conjunction with eventlet.
> 
> Long story short, my patch ended up creating a lot of issues and I
> absolutely agree that something needs to change ASAP.
> 
> Monty, I can try to answer your questions/concerns:
> 
>> Running the script a single time caused 16 connection threads to
>> be spawned. This seems a bit excessive.
> 
> There are two flags that define how many connections should be
> maintained per service. These flags are 'sql_min_pool_size' and
> 'sql_max_pool_size'. Unfortunately I set both of these to 10 by
> default. When I ran the test script provided by Vish I saw 15
> connections created. What seems to be happening is that initially 10
> connections are pooled and 5 additional connections are being created
> (one per greenlet it's spawning). Long story short here is that
> something is wrong because at most you should be seeing 10
> connections, so that's one outstanding issue.

Cool - that makes sense - I mean - it now makes sense why there were
sometimes 15 connections per run and sometimes not. Bugs have a way of
doing that sort of thing. :)
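
For reference, the db_pool usage in question boils down to roughly this
(a sketch, not a copy of the patch - credentials and values are placeholders,
and min_size/max_size are presumably what the sql_min_pool_size /
sql_max_pool_size flags feed):

    # Rough sketch of eventlet's db_pool wrapping MySQLdb. The pool proxies
    # blocking driver calls to an OS threadpool so greenthreads aren't blocked.
    import MySQLdb
    from eventlet import db_pool

    pool = db_pool.ConnectionPool(MySQLdb,
                                  host="127.0.0.1", user="nova", passwd="", db="nova",
                                  min_size=10,   # sql_min_pool_size default in the patch
                                  max_size=10)   # sql_max_pool_size default in the patch

    conn = pool.get()        # waits (greenthread-friendly) until a connection is free
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
    finally:
        pool.put(conn)       # hand the connection back instead of closing it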

>> When I tried spawning five copies at the same time, I wound up with
>> 60 connected threads, plus some connect timeouts, plus some of the 
>> tracebacks you're seeing below.
> 
> Absolutely. With the current defaults you'll hit limits quickly.
> 
>> More vexing is that all of these queries were doing select ... for 
>> update (makes sense why) - but that just means that we're stacking
>> up on locks in the db trying to get these resources... one of those
>> situations where greater parallelism actually isn't better.
> 
> Absolutely, a fix for this could be to place a nova.utils
> 'synchronized' lock on methods like these to make sure that they're
> never run in parallel. While this might be "vexing", that's exactly
> what this script was designed to show...a worst case (SELECT ... FOR
> UPDATE).

Great. Well, it's doing a great job!
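
Something along these lines, presumably (a sketch only - the lock name and
the signature here are illustrative, taken from the calls in Vish's
traceback rather than from any actual patch):

    # Sketch: serialize the contended SELECT ... FOR UPDATE path with nova's
    # synchronized decorator so requests don't pile up on row locks in MySQL.
    from nova import utils


    @utils.synchronized('fixed_ip_associate')
    def fixed_ip_associate_pool(context, network_id, instance_id=None, host=None):
        """Grab a free fixed ip - only one greenthread at a time per lock name."""
        # ... existing SELECT ... FOR UPDATE / UPDATE logic would go here ...
        pass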

>> Without really knowing more, my vote would be not to have
>> app-level connection pooling by default. MySQL specifically doesn't
>> really like having a bunch of unused connections sitting around for
>> long lifecycles (with a few exceptions - the exceptions always
>> prove the rule, of course)
> 
> I'm not a MySQL guru by any means, but can you explain this to me?
> I've never read anywhere that MySQL "doesn't really like having a
> bunch of unused connection sitting around for long lifecycles". It
> seems pretty logical to me to have at least 2 persistent connections
> to the database to avoid being completely blocked on database calls.

So I should probably phrase that a little differently - and some of it
is a question of scale. 2 connections, yes - 15, probably not. Obviously
many things have to do with workload, but the default scheduler in the
versions of MySQL that folks are likely to be running creates an
OS-level thread for every connection to the db - which means if you have
500 compute nodes each with 2 long-lived connections to the db, you're
going to have 1000 OS threads that the MySQL scheduler is going to
(poorly) deal with. (We're using extremes here for illustration
purposes.) On the other hand, the MySQL connection protocol is extremely
cheap (there's an extraneous round trip, but we can live with it) - so
depending on the exact architecture, you could actually see better
performance if each of your nodes connected when it needed to talk and
disconnected when it was done - keeping the scheduler pressure down at
the db side. (Still - it all depends, so it really just has to be looked
at/tuned - the exact opposite could also be true. :)

Additionally - per-thread memory that's allocated in a connection
context does not get freed until the thread is reaped (which means
falling all of the way out of the thread_cache) - so depending on what's
going on, you could have memory pressure issues if you had 1000 threads
that stuck around forever: if any of the queries on any of those
connections happened to max out that connection's sort_buffer, for
instance, you're just going to keep that allocated sort_buffer lying
around for a while.
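
If you want to watch this happening on the server side, something like this
(a sketch - credentials are placeholders) shows the thread picture:

    # Sketch: Threads_connected tracks open connections 1:1, and Threads_created
    # climbing means connections are missing the thread_cache and forcing the
    # server to spin up new OS threads.
    import MySQLdb

    conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="")
    cur = conn.cursor()
    cur.execute("SHOW GLOBAL STATUS LIKE 'Threads%'")
    for name, value in cur.fetchall():
        print("%-20s %s" % (name, value))
    cur.execute("SHOW VARIABLES LIKE 'thread_cache_size'")
    print("%-20s %s" % cur.fetchone())
    conn.close()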

In any case - as with all things it's about a happy medium - so
technically solving the greenthreads/MySQL connection issue is important
- but also consider figuring out ways to go ahead and disconnect if
you detect that you're going to be idle for a while. Alternately -
make sure that the db_pool gracefully handles server-initiated
disconnects ... because that way you could just have a normal pool setup
for small to medium sized installations, and for larger ones the dba
could set the server's idle timeout (wait_timeout) to something low,
like 2 seconds, and idle connections would get reaped automatically.
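
One way to handle that idle-timeout/reconnect dance at the app level is
SQLAlchemy's own pool settings - a sketch, not the db_pool fix itself (the
URL is a placeholder and the numbers would need tuning):

    # Sketch: keep a small steady-state pool and recycle connections before the
    # server's idle timeout kills them, so we don't trip over server-initiated
    # disconnects ("MySQL server has gone away") on the next checkout.
    from sqlalchemy import create_engine

    engine = create_engine("mysql://nova:password@127.0.0.1/nova",
                           pool_size=2,       # small steady-state footprint
                           max_overflow=5,    # allow short bursts, then shrink back
                           pool_recycle=1)    # recycle just under a 2s idle timeout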

I am honestly curious to see how https://launchpad.net/myconnpy fares.
SQLAlchemy has support for it - I think you add +myconnpy to the db url
- so something like mysql+myconnpy:// ... blah.
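
If anyone wants to try it, a sketch (credentials are placeholders; the
dialect is registered in SQLAlchemy as mysqlconnector, so the '+myconnpy'
spelling above is probably not quite right):

    # Sketch: point SQLAlchemy at the pure-python MySQL Connector/Python driver.
    # Because the driver is pure python, eventlet's monkeypatched sockets apply.
    from sqlalchemy import create_engine

    engine = create_engine("mysql+mysqlconnector://nova:password@127.0.0.1/nova")
    conn = engine.connect()
    conn.execute("SELECT 1")   # sanity check
    conn.close()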

Monty
> 
> -----Original Message-----
> From: "Monty Taylor" <mordred@xxxxxxxxxxxx>
> Sent: Sunday, September 25, 2011 8:48pm
> To: openstack@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Openstack] Nova DB Connection Pooling
> 
> What was the intent of the connection pooling? That is, what was it 
> trying to fix?
> 
> Running the script a single time caused 16 connection threads to be 
> spawned. This seems a bit excessive.
> 
> When I tried spawning five copies at the same time, I wound up with
> 60 connected threads, plus some connect timeouts, plus some of the 
> tracebacks you're seeing below.
> 
> Increasing the thread_cache_size from the default on ubuntu (which is
> 8) helped things, but things still seemed to be going to the bad
> place.
> 
> More vexing is that all of these queries were doing select ... for 
> update (makes sense why) - but that just means that we're stacking up
> on locks in the db trying to get these resources... one of those
> situations where greater parallelism actually isn't better.
> 
> Without really knowing more, my vote would be not to have app-level 
> connection pooling by default. MySQL specifically doesn't really
> like having a bunch of unused connections sitting around for long
> lifecycles (with a few exceptions - the exceptions always prove the
> rule, of course)
> 
> Of course- I could be wrong... which is why I'd like to know more
> about what the issue was that incited connection pooling.
> 
> Monty
> 
> On 09/25/2011 01:53 PM, Vishvananda Ishaya wrote:
>> Hey everyone,
>> 
>> I'm a bit concerned with the connection pooling in the db.  It
>> seems that things are not getting cleaned up properly.  I have a
>> repro-case that causes failures that we have seen before.  If I
>> revert nova/db/sqlalchemy/session.py to before the eventlet db
>> pool was added I get no failures.  If you want to see the issue,
>> try the attached code.  You will need to run from the nova
>> directory or do python setup.py develop.  You will also need to
>> create a mysql database called test and edit the sql_connection
>> string if you have a mysql password, etc.  Please check this code.
>> If we can't come up with a fix, I think we need to revert back to
>> no connection pooling.
>> 
>> Run the attached script at least 3 times.  The code below runs fine
>> the first couple of times, then it starts to fail with the
>> following error:
>> 
>> 2011-09-24 12:51:02,799 INFO sqlalchemy.engine.base.Engine.0x...36d0 [-] ROLLBACK
>> Traceback (most recent call last):
>>   File "/Library/Python/2.7/site-packages/eventlet/hubs/hub.py", line 336, in fire_timers
>>     timer()
>>   File "/Library/Python/2.7/site-packages/eventlet/hubs/timer.py", line 56, in __call__
>>     cb(*args, **kw)
>>   File "/Library/Python/2.7/site-packages/eventlet/event.py", line 163, in _do_send
>>     waiter.switch(result)
>>   File "/Library/Python/2.7/site-packages/eventlet/greenthread.py", line 192, in main
>>     result = function(*args, **kwargs)
>>   File "dbrepro.py", line 44, in associate
>>     ip = db.fixed_ip_associate_pool(ctxt, 1, instance_id=val)
>>   File "/Users/vishvananda/os/nova/nova/db/api.py", line 352, in fixed_ip_associate_pool
>>     instance_id, host)
>>   File "/Users/vishvananda/os/nova/nova/db/sqlalchemy/api.py", line 102, in wrapper
>>     return f(*args, **kwargs)
>>   File "/Users/vishvananda/os/nova/nova/db/sqlalchemy/api.py", line 725, in fixed_ip_associate_pool
>>     filter_by(host=None).\
>>   File "/Library/Python/2.7/site-packages/sqlalchemy/orm/query.py", line 1496, in first
>>     ret = list(self[0:1])
>>   File "/Library/Python/2.7/site-packages/sqlalchemy/orm/query.py", line 1405, in __getitem__
>>     return list(res)
>>   File "/Library/Python/2.7/site-packages/sqlalchemy/orm/query.py", line 1669, in instances
>>     fetch = cursor.fetchall()
>>   File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 2383, in fetchall
>>     l = self.process_rows(self._fetchall_impl())
>>   File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 2366, in process_rows
>>     keymap = metadata._keymap
>> AttributeError: 'NoneType' object has no attribute '_keymap'

