openstack team mailing list archive

Re: eventlet weirdness

 

On 03/05/2012 05:08 PM, Yun Mao wrote:
Hi Phil,

My understanding is that (forget Nova for a second) in a perfect
eventlet world, a green thread is either doing CPU-intensive
computing or waiting in system calls that are IO related. In the latter
case, the eventlet scheduler will suspend the green thread and switch
to another green thread that is ready to run.

Back to reality: as you mentioned, this is broken - some IO-bound
activity won't cause an eventlet switch. To me the only plausible
explanation is the same reason those MySQL calls are blocking - we
are using C-based modules that don't respect monkey patching and never
yield. I suspect that all libvirt-based calls also belong to this
category.

Agreed. I expect that to be the case for any native library. Monkey patching only changes the Python side of the call; anything in native code is too far along for it to be redirected.

Now if those blocking calls can finish in a very short time (as we
assume for DB calls), then I think inserting a sleep(0) after every
blocking call should be a quick fix to the problem.
Nope. The blocking call still blocks, then it returns, hits the sleep, and is scheduled. The only option is to wrap it with a thread pool.

From an OS perspective, there are no such things as greenthreads. The same task_struct in the Linux kernel (representing a POSIX thread) that manages the body of the web application is used to process the IO. The Linux thread goes into a sleep state until the IO comes back, and the kernel scheduler will schedule another OS process or task. In order to get both the IO to complete and the greenthread scheduler to process another greenthread, you need to have two POSIX threads.
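A minimal sketch of the point above. It uses the standard library's asyncio as a stand-in for eventlet (an assumption, so the example runs without third-party packages), and `native_call` is a hypothetical placeholder for a C-level call such as a libvirt or MySQL driver call. In eventlet itself the corresponding tool would be `eventlet.tpool.execute`.

```python
import asyncio
import time


def native_call(seconds=0.2):
    """Hypothetical stand-in for a native call that never yields."""
    time.sleep(seconds)


async def blocking_worker():
    native_call()            # the whole scheduler is stuck here
    await asyncio.sleep(0)   # yields only after the blocking is over


async def pooled_worker():
    loop = asyncio.get_running_loop()
    # A second OS thread does the waiting; the scheduler keeps running.
    await loop.run_in_executor(None, native_call)


async def count_ticks(worker):
    """Return how often a background task ran while `worker` executed."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    background = asyncio.create_task(ticker())
    await asyncio.sleep(0)   # let the ticker start
    await worker()
    background.cancel()
    return ticks


if __name__ == "__main__":
    print("sleep(0) after blocking call:", asyncio.run(count_ticks(blocking_worker)))
    print("call moved to a thread pool: ", asyncio.run(count_ticks(pooled_worker)))
```

The background task should barely run in the first case and run many times in the second, which is the difference between yielding after the block and handing the block to another OS thread.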

If the libvirt API (or other native API) has an async mode, what you can do is provide a synchronous, Python-based wrapper that does the following:

register_request_callback()
async_call()
sleep()
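The three steps above can be sketched as follows. This is an illustration only: `native_async_call` is a hypothetical stand-in for a native API with an async mode (simulated here with a timer thread), and asyncio replaces eventlet so that the example is self-contained.

```python
import asyncio
import threading


def native_async_call(request, on_done):
    """Hypothetical native async API: returns at once and later calls
    on_done(result) from another OS thread."""
    timer = threading.Timer(0.05, on_done, args=(f"done: {request}",))
    timer.start()


async def sync_style_call(request):
    """Looks synchronous to the caller, but yields while waiting."""
    loop = asyncio.get_running_loop()
    finished = loop.create_future()

    def on_done(result):
        # Runs on the timer thread, so pass the result to the loop thread.
        loop.call_soon_threadsafe(finished.set_result, result)

    native_async_call(request, on_done)  # register callback + async call
    return await finished                # "sleep" until the callback fires


async def main():
    # Two requests wait concurrently inside a single OS thread.
    return await asyncio.gather(
        sync_style_call("snapshot-a"),
        sync_style_call("snapshot-b"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The caller sees an ordinary call that returns a result, while the scheduler is free to run other work until the callback arrives.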

The only time a sleep() called from Python code is going to help you is if you have a long-running stretch of Python code and you sleep() in the middle of it.




But if it's a long
blocking call like the snapshot case, we are probably screwed anyway
and need OS thread level parallelism or multiprocessing to make it
truly non-blocking. Thanks,

Yep.

Yun

On Mon, Mar 5, 2012 at 10:43 AM, Day, Phil<philip.day@xxxxxx>  wrote:
Hi Yun,

The point of the sleep(0) is to explicitly yield from a long-running eventlet so that other eventlets aren't blocked for a long period. Depending on how you look at it, that either means we're making an explicit judgement on priority, or trying to provide a more equal sharing of run-time across eventlets.

It's not that things are CPU bound as such - more just that eventlets have very few pre-emption points. Even an IO-bound activity like creating a snapshot won't cause an eventlet switch.

So in terms of priority we're trying to get to the state where:
  - Important periodic events (such as service status) run when expected (if these take a long time we're stuffed anyway)
  - User-initiated actions don't get blocked by background system eventlets (such as refreshing power-state)
  - Slow actions from one user don't block actions from other users (the first user will expect their snapshot to take X seconds; the second one won't expect their VM creation to take X + Y seconds).

It almost feels like the right level of concurrency would be to have a task/process running for each VM, so that there is concurrency across unrelated VMs, but serialisation for each VM.
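One way to picture "concurrent across VMs, serialised per VM" is a lock per VM. The sketch below is not how Nova does it; it is a hypothetical illustration using asyncio in place of eventlet, with made-up VM ids and action names, and `asyncio.sleep` standing in for the real work.

```python
import asyncio
from collections import defaultdict


async def run_actions(actions):
    """Run (vm_id, name, seconds) actions: serial per VM, concurrent across VMs."""
    locks = defaultdict(asyncio.Lock)  # one lock per VM, created on first use
    log = []

    async def perform(vm_id, name, seconds):
        async with locks[vm_id]:
            log.append((vm_id, name, "start"))
            await asyncio.sleep(seconds)  # stand-in for the real work
            log.append((vm_id, name, "end"))

    await asyncio.gather(*(perform(*action) for action in actions))
    return log


if __name__ == "__main__":
    demo = [
        ("vm1", "snapshot", 0.05),  # slow action on vm1
        ("vm1", "reboot", 0.01),    # must wait for the snapshot
        ("vm2", "create", 0.01),    # unrelated VM, not held up
    ]
    for entry in asyncio.run(run_actions(demo)):
        print(entry)
```

The reboot on vm1 starts only after the snapshot on vm1 ends, while the action on vm2 proceeds without waiting for either.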

Phil

-----Original Message-----
From: Yun Mao [mailto:yunmao@xxxxxxxxx]
Sent: 02 March 2012 20:32
To: Day, Phil
Cc: Chris Behrens; Joshua Harlow; openstack
Subject: Re: [Openstack] eventlet weirdness

Hi Phil, I'm a little confused. To what extent does sleep(0) help?

It only gives the greenlet scheduler a chance to switch to another green thread. If we are having a CPU-bound issue, sleep(0) won't give us access to any more CPU cores, so the total time to finish should be the same no matter what. It may improve the fairness among different green threads but shouldn't help the throughput. The only apparent gain I can see is in a situation where there is 1 green thread with long CPU time and many other green threads with small CPU time.
The total finish time will be the same with or without sleep(0), but with a sleep in the first thread, the others should be much more responsive.
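The fairness-versus-throughput argument can be checked with a toy round-robin scheduler on a logical clock. This is a simplified model written for illustration, not eventlet's scheduler: each task is a list of work chunks, and the boundary between chunks plays the role of a sleep(0).

```python
from collections import deque


def chunks(units, yield_every=None):
    """Split `units` of CPU work into chunks separated by yield points."""
    step = yield_every or units  # no yield_every: one uninterrupted chunk
    return [min(step, units - start) for start in range(0, units, step)]


def run(tasks):
    """Round-robin over tasks; return {name: finish_time} in work units."""
    clock = 0
    finished = {}
    ready = deque((name, deque(work)) for name, work in tasks.items())
    while ready:
        name, work = ready.popleft()
        clock += work.popleft()         # run until the next yield point
        if work:
            ready.append((name, work))  # yielded: back of the queue
        else:
            finished[name] = clock
    return finished


if __name__ == "__main__":
    no_yield = run({"long": chunks(100), "a": chunks(1), "b": chunks(1)})
    with_yield = run({"long": chunks(100, 10), "a": chunks(1), "b": chunks(1)})
    print(no_yield)    # {'long': 100, 'a': 101, 'b': 102}
    print(with_yield)  # {'a': 11, 'b': 12, 'long': 102}
```

In both runs the last task finishes at 102 units, so throughput is unchanged, but the short tasks finish at 11 and 12 instead of 101 and 102 once the long task yields every 10 units.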

However, it's unclear to me which part of Nova is very CPU intensive.
It seems that most work here is IO bound, including the snapshot. Do we have other blocking calls besides mysql access? I feel like I'm missing something but couldn't figure out what.

Thanks,

Yun


On Fri, Mar 2, 2012 at 2:08 PM, Day, Phil<philip.day@xxxxxx>  wrote:
I didn't say it was pretty. Given the choice I'd much rather have a threading model that really did concurrency and pre-emption in all the right places. It would also be really cool if something managed the threads that were started, so that if a second conflicting request was received it did some proper tidy-up or blocking rather than just leaving the race condition to work itself out (then we wouldn't have to try and control it by checking vm_state).

However ... in the current code base, where we only have user-space eventlets with no pre-emption and some activities that need to be prioritised, forcing pre-emption with a sleep(0) seems a pretty small bit of untidiness. And it works now without a major code refactor.

Always open to other approaches ...

Phil


-----Original Message-----
From: openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx
[mailto:openstack-bounces+philip.day=hp.com@xxxxxxxxxxxxxxxxxxx] On
Behalf Of Chris Behrens
Sent: 02 March 2012 19:00
To: Joshua Harlow
Cc: openstack; Chris Behrens
Subject: Re: [Openstack] eventlet weirdness

It's not just you


On Mar 2, 2012, at 10:35 AM, Joshua Harlow wrote:

Does anyone else feel that the following seems really "dirty", or is it just me?

"adding a few sleep(0) calls in various places in the Nova codebase
(as was recently added in the _sync_power_states() periodic task) is
an easy and simple win with pretty much no ill side-effects. :)"

Dirty in that it feels like there is something wrong from a design point of view.
Sprinkling "sleep(0)" seems like it's a band-aid on a larger problem imho.
But that's just my gut feeling.

:-(

On 3/2/12 8:26 AM, "Armando Migliaccio"<Armando.Migliaccio@xxxxxxxxxxxxx>  wrote:

I knew you'd say that :P

There you go: https://bugs.launchpad.net/nova/+bug/944145

Cheers,
Armando

-----Original Message-----
From: Jay Pipes [mailto:jaypipes@xxxxxxxxx]
Sent: 02 March 2012 16:22
To: Armando Migliaccio
Cc: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] eventlet weirdness

On 03/02/2012 10:52 AM, Armando Migliaccio wrote:
I'd be cautious about saying that no ill side-effects were introduced.
I found a race condition right in the middle of sync_power_states, which I
assume was exposed by "breaking" the task deliberately.

Such a party-pooper! ;)

Got a link to the bug report for me?

Thanks!
-jay
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp



