
openstack team mailing list archive

Re: eventlet weirdness

 

> By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. the way Swift (and now Glance) works? Or are you referring to a different approach entirely?

Yep - following your posting here pointing to the glance changes, we back-ported that into our Diablo API server. We're now running each API server with 20 OS processes and 20 EC2 processes, and the world looks a lot happier. I thought the same changes were being made in parallel in Essex by someone in the community?
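
For anyone following along, the pattern is roughly this (a minimal sketch, not the actual Glance/Swift code - the port and pool size here are made up):

    import os
    import eventlet
    from eventlet import wsgi

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return ['hello\n']

    sock = eventlet.listen(('0.0.0.0', 8774))   # bind once, before forking

    for _ in range(20):                         # 20 worker processes
        if os.fork() == 0:
            # child: serve on the shared socket with its own greenthread pool
            wsgi.server(sock, app, custom_pool=eventlet.GreenPool(1000))
            os._exit(0)

    os.waitpid(-1, 0)                           # parent just waits on the children

Because the socket is bound before the fork, the kernel load-balances accepts across the workers, so each process only has to cope with a fraction of the requests in its greenthread pool.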

> Curious... do you have a list of all the places where sleep(0) calls were inserted in the HP Nova code? I can turn that into a bug report and get to work on adding them... 

So far the only two cases where we've done this are in _sync_power_state and in the security group refresh handling (libvirt/firewall/do_refresh_security_group_rules), which we modified to refresh only for instances in the group, adding a sleep in the loop (I still need to finish writing the bug report for that one).
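
The shape of both changes is roughly this (an illustrative sketch, not the actual patch - the helper names are made up):

    from eventlet import greenthread

    def do_refresh_security_group_rules(self, security_group):
        # refresh only the instances in this group, yielding as we go
        for instance in instances_in_group(security_group):   # made-up helper
            self.refresh_instance_rules(instance)             # made-up helper
            greenthread.sleep(0)   # give other greenthreads a turn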

I have contemplated doing something similar in the image code when reading chunks from glance, but I'm slightly worried that in this case the only thing that currently stops two creates for the same image from making separate requests to glance might be that one gets queued behind the other. It would be nice to do the same thing on snapshot (as this can also be a real hog), but there the transfer is handled completely within the glance client. A more radical approach would be to split the image handling code out of the compute manager into a separate (co-hosted) image_manager, so that at least only commands which need interaction with glance would block each other.
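
If we did the glance read change, it would be something of this shape (hypothetical - the chunked image_service.get() iterator is assumed):

    from eventlet import greenthread

    def fetch_image_chunks(context, image_service, image_id, out_file):
        for chunk in image_service.get(context, image_id):
            out_file.write(chunk)
            greenthread.sleep(0)   # yield between chunks rather than hogging the hub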

Phil




-----Original Message-----
From: openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx [mailto:openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of Jay Pipes
Sent: 02 March 2012 15:17
To: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] eventlet weirdness

On 03/02/2012 05:34 AM, Day, Phil wrote:
> In our experience (running clusters of several hundred nodes) the DB performance is not generally the significant factor, so making its calls non-blocking gives only a very small increase in processing capacity, and creates other side effects in terms of slowing all eventlets down as they wait for their turn to run.

Yes, I believe I said that this was the case at the last design summit
-- or rather, I believe I said "is there any evidence that the database is a performance or scalability problem at all"?

> That shouldn't really be surprising given that the Nova DB is pretty small and MySQL is a pretty good DB - throw reasonable hardware at the DB server and give it a bit of TLC from a DBA (remove deleted entries from the DB, add indexes where the slow query log tells you to, etc.) and it shouldn't be the bottleneck in the system for performance or scalability.

++

> We use the python driver and have experimented with allowing the eventlet code to make the db calls non-blocking (it's not the default setting), and it works, but it didn't give us any significant advantage.

Yep, identical results to the work that Mark Washenberger did on the same subject.

> For example in the API server (before we made it properly 
> multi-threaded)

By "properly multi-threaded" are you instead referring to making the nova-api server multi-*processed* with eventlet greenthread pools in each process? i.e. the way Swift (and now Glance) works? Or are you referring to a different approach entirely?

> with blocking db calls the server was essentially a serial processing queue - each request was fully processed before the next. With non-blocking db calls we got a lot more apparent concurrency, but only at the expense of making all of the requests equally bad.

Yep, not surprising.

> Consider a request that takes 10 seconds, where after 5 seconds there is a call to the DB which takes 1 second, and three such requests are started at the same time:
>
> Blocking:
> 0 - Request 1 starts
> 10 - Request 1 completes, request 2 starts
> 20 - Request 2 completes, request 3 starts
> 30 - Request 3 completes
> Request 1 completes in 10 seconds
> Request 2 completes in 20 seconds
> Request 3 completes in 30 seconds
> Ave time: 20 sec
>
> Non-blocking
> 0 - Request 1 Starts
> 5 - Request 1 gets to db call, request 2 starts
> 10 - Request 2 gets to db call, request 3 starts
> 15 - Request 3 gets to db call, request 1 resumes
> 19 - Request 1 completes, request 2 resumes
> 23 - Request 2 completes, request 3 resumes
> 27 - Request 3 completes
>
> Request 1 completes in 19 seconds (+ 9 seconds)
> Request 2 completes in 23 seconds (+ 3 seconds)
> Request 3 completes in 27 seconds (- 3 seconds)
> Ave time: 23 sec
>
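(Quick sanity check of those numbers - a sketch assuming 5s of CPU, a 1s DB call, then 4s of CPU per request, with greenthreads yielding only at the DB call so the CPU work never overlaps:)

    CPU1, DB, CPU2, N = 5, 1, 4, 3

    # Blocking: fully serial, one request after another.
    blocking = [(i + 1) * (CPU1 + DB + CPU2) for i in range(N)]

    # Non-blocking: the first CPU bursts run back to back; each request's
    # DB call overlaps the others' CPU work, then the second bursts queue up.
    db_done = [(i + 1) * CPU1 + DB for i in range(N)]   # 6, 11, 16
    finish, clock = [], N * CPU1                        # first bursts all done at 15
    for i in range(N):
        clock = max(clock, db_done[i]) + CPU2           # resume once the DB result is in
        finish.append(clock)

    print(blocking)   # [10, 20, 30] -> average 20s
    print(finish)     # [19, 23, 27] -> average 23s
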
> So instead of worrying about making db calls non-blocking we've been working to make certain eventlets non-blocking - i.e. add sleep(0) calls to long-running iteration loops - which IMO has a much bigger impact on the apparent latency of the system.

Yep, and I think adding a few sleep(0) calls in various places in the Nova codebase (as was recently added in the _sync_power_states() periodic task) is an easy and simple win with pretty much no ill side-effects. :)

Curious... do you have a list of all the places where sleep(0) calls were inserted in the HP Nova code? I can turn that into a bug report and get to work on adding them...

All the best,
-jay

> Phil
>
>
>
> -----Original Message-----
> From: openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx 
> [mailto:openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx] On 
> Behalf Of Brian Lamar
> Sent: 01 March 2012 21:31
> To: openstack@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Openstack] eventlet weirdness
>
>>> How is MySQL access handled in eventlet? Presumably it's an external C
>>> library so it's not going to be monkey patched. Does that make every 
>>> db access call a blocking call? Thanks,
>
>> Nope, it goes through a thread pool.
>
> I feel like this might be an over-simplification. If the question is:
>
> "How is MySQL access handled in nova?"
>
> The answer would be that we use SQLAlchemy, which can load any number of SQL drivers. These drivers can be either pure-Python or C-based. In the case of pure-Python drivers, monkey patching can occur and db calls are non-blocking. In the case of drivers which contain C code (or perhaps other blocking calls), db calls will most likely be blocking.
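>
> For example (a sketch - pymysql is just one pure-Python driver):
>
> import eventlet
> eventlet.monkey_patch()   # patches socket, thread, time, select, ...
>
> import pymysql            # pure Python: its socket I/O is now green, non-blocking
> # import MySQLdb          # C extension: its I/O happens below Python, so it blocks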
>
> If the question is "How is MySQL access handled in eventlet?" the answer would be to use the eventlet.db_pool module to allow db access using thread pools.
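>
> Rough usage (check the eventlet docs; the default pool pushes the blocking calls onto real OS threads via tpool):
>
> import MySQLdb
> from eventlet import db_pool
>
> pool = db_pool.ConnectionPool(MySQLdb, host='localhost', user='nova',
>                               passwd='secret', db='nova')
> conn = pool.get()             # borrow a connection
> try:
>     cur = conn.cursor()
>     cur.execute('SELECT 1')
> finally:
>     pool.put(conn)            # always hand it back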
>
> B
>
> -----Original Message-----
> From: "Adam Young"<ayoung@xxxxxxxxxx>
> Sent: Thursday, March 1, 2012 3:27pm
> To: openstack@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Openstack] eventlet weirdness
>
> On 03/01/2012 02:45 PM, Yun Mao wrote:
>> There are plenty eventlet discussion recently but I'll stick my 
>> question to this thread, although it's pretty much a separate 
>> question. :)
>>
>> How is MySQL access handled in eventlet? Presumably it's an external C
>> library so it's not going to be monkey patched. Does that make every 
>> db access call a blocking call? Thanks,
>
> Nope, it goes through a thread pool.
>>
>> Yun
>>
>> On Wed, Feb 29, 2012 at 9:18 PM, Johannes Erdfelt<johannes@xxxxxxxxxxx>   wrote:
>>> On Wed, Feb 29, 2012, Yun Mao<yunmao@xxxxxxxxx>   wrote:
>>>> Thanks for the explanation. Let me see if I understand this.
>>>>
>>>> 1. Eventlet will never have this problem if there is only 1 OS 
>>>> thread
>>>> -- let's call it main thread.
>>> In fact, that's exactly what Python calls it :)
>>>
>>>> 2. In Nova, there is only 1 OS thread unless you use xenapi and/or 
>>>> the virt/firewall driver.
>>>> 3. The python logging module uses locks. Because of the monkey 
>>>> patch, those locks are actually eventlet or "green" locks and may 
>>>> trigger a green thread context switch.
>>>>
>>>> Based on 1-3, does it make sense to say that in the other OS 
>>>> threads (i.e. not main thread), if logging (plus other pure python 
>>>> library code involving locking) is never used, and we do not run an
>>>> eventlet hub at all, we should never see this problem?
>>> That should be correct. I'd have to double check all of the monkey 
>>> patching that eventlet does to make sure there aren't other cases 
>>> where you may inadvertently use eventlet primitives across real threads.
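>>>
>>> A quick way to see what gets swapped (a sketch):
>>>
>>> import eventlet
>>> eventlet.monkey_patch()    # patches thread/threading among others
>>>
>>> import threading
>>> # threading.Lock is now eventlet's green lock - which is why logging's
>>> # internal locks can trigger a greenthread switch.
>>> print(threading.Lock)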
>>>
>>> JE

