nova team mailing list archive
Message #00113
Re: Why not python threads?
I pushed a super-simple proof-of-concept, using simple threading. There's
essentially a thread for everything: one for the heartbeat, one that polls
the message queues, and a new thread created to process each message.
http://bazaar.launchpad.net/~justin-fathomdb/nova/kiss-backend/revision/206
Code obviously doesn't run, but it shows the basic approach I'm thinking of
pursuing. Signals are not yet implemented, but I think I can trap the
signals on the main thread, and manually terminate running threads as
appropriate. (What 'appropriate' means for a long-running message handler -
e.g. machine launch - isn't yet entirely clear to me)
Feedback? I don't really understand the point about returning data from a
thread, which is why I thought pushing some code might make this clearer...
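
To make the thread layout described above concrete, here is a minimal sketch.
It is not the code from the linked revision. It assumes Python 3, and every
name in it (heartbeat, poll_queue, handle_message, request_shutdown, run) is
hypothetical. Python has no supported way to kill a thread from outside, so
the sketch replaces "manually terminate" with cooperative shutdown: the signal
handler sets an event on the main thread and the other threads check it.

```python
import queue
import signal
import threading

stop_event = threading.Event()  # set once the process should shut down
handled = []                    # handler results, kept only for illustration


def heartbeat(interval=0.05):
    """Wake up every `interval` seconds until shutdown is requested."""
    while not stop_event.wait(interval):
        pass  # a real service would publish its heartbeat here


def handle_message(message):
    """Process one message in its own thread (hypothetical handler)."""
    handled.append(message.upper())


def poll_queue(messages, timeout=0.05):
    """Poll the queue and start one thread per message."""
    workers = []
    while not stop_event.is_set():
        try:
            message = messages.get(timeout=timeout)
        except queue.Empty:
            continue
        worker = threading.Thread(target=handle_message, args=(message,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()  # let in-flight handlers finish before exiting


def request_shutdown(signum=None, frame=None):
    """Signal handler; CPython always runs it on the main thread."""
    stop_event.set()


def run(messages, run_for=0.3):
    """Start the service threads, wait for a signal or timeout, then stop."""
    # signal.signal() may only be called from the main thread.
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGINT, request_shutdown)
        signal.signal(signal.SIGTERM, request_shutdown)
    threads = [
        threading.Thread(target=heartbeat),
        threading.Thread(target=poll_queue, args=(messages,)),
    ]
    for thread in threads:
        thread.start()
    stop_event.wait(run_for)  # returns early if a signal sets the event
    stop_event.set()
    for thread in threads:
        thread.join()
```

In this sketch a long-running handler (such as a machine launch) delays exit
until it returns, since the poller joins its workers. Whether that is the
right behaviour is exactly the open question about what 'appropriate' means.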
On Wed, Aug 4, 2010 at 2:29 PM, Vishvananda Ishaya <vishvananda@xxxxxxxxx> wrote:
> We were initially using a multiprocessing process pool in the code. The
> ProcessPool in process.py is there because multiprocessing had weird issues
> and needed hackish workarounds to function properly with tornado's ioloop
> and twisted's reactor. We were having a lot of issues with exceptions
> disappearing and processes hanging.
>
> In general threads and processes in python seem a little bit annoying, that
> is why I've been so grumbly about moving away from twisted, since the
> interaction between async and signals is a bit nasty. A lot of effort and
> debugging went into the current implementation of deferring to shell calls
> and having the message pump still work.
>
> I like the idea of moving away from the reactor loop and having a simple
> listener that waits for messages and fires off another thread/process to
> handle each one. The issues arise when this thread needs to return data. Do
> we just return a request id and switch to making clients poll for new data?
> Waiting on the queue also means there is no looping callback to update
> state. We'll need an external process to poll for state updates.
>
> Crazy idea...forget os signals altogether, and just signal using rabbit.
> That would definitely circumvent a lot of python issues :)
>
> Vish
>
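
The "return a request id and make clients poll" option raised above could look
roughly like the following sketch. It is an illustration under stated
assumptions, not Nova code: RequestTracker, submit, poll and wait_for are
hypothetical names, and results live in an in-process dict rather than in a
data store or on a message queue. Exceptions raised in the handler thread are
recorded instead of being lost.

```python
import threading
import time
import uuid


class RequestTracker:
    """Run each request in its own thread and keep its outcome by id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}

    def submit(self, func, *args):
        """Start func(*args) in a new thread and return a request id."""
        request_id = str(uuid.uuid4())
        with self._lock:
            self._results[request_id] = {"state": "pending", "result": None}
        thread = threading.Thread(
            target=self._run, args=(request_id, func, args)
        )
        thread.start()
        return request_id

    def _run(self, request_id, func, args):
        try:
            outcome = {"state": "done", "result": func(*args)}
        except Exception as exc:
            # Record the failure so the exception does not vanish.
            outcome = {"state": "failed", "result": repr(exc)}
        with self._lock:
            self._results[request_id] = outcome

    def poll(self, request_id):
        """Return a copy of the current state for one request."""
        with self._lock:
            return dict(self._results[request_id])


def wait_for(tracker, request_id, timeout=5.0, interval=0.01):
    """Client-side polling loop: check until the request leaves 'pending'."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = tracker.poll(request_id)
        if status["state"] != "pending":
            return status
        time.sleep(interval)
    return tracker.poll(request_id)
```

A caller would keep the id returned by submit() and call poll() (or a loop
like wait_for()) later, so the listener never blocks on a handler's result.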
> On Wed, Aug 4, 2010 at 1:59 PM, Eric Day <eday@xxxxxxxxxxxx> wrote:
>
>> We can also do multi-process for these services as well, and signals
>> become a lot easier to deal with. Twisted signal handling may actually
>> be easier than multi-threaded since it's running in a single thread.
>>
>> If you have the time, try hacking up a multi-threaded or multi-process
>> worker and share to see how difficult it would be. :)
>>
>> -Eric
>>
>> On Wed, Aug 04, 2010 at 12:29:40PM -0700, Justin Santa Barbara wrote:
>> > If this is the primary issue, then I think that dealing with signals is
>> > surely easier than dealing with Twisted or Eventlet.
>> > I would propose that for the back-end services we write them 'simply' in
>> > Python threads. For the front end services (which are effectively proxies
>> > to the data store and the back end services), there may be a performance
>> > case to be made for async code. I don't believe that the back-end
>> > services have the high performance requirements, but they do have the
>> > requirement to be correct even when dealing with messy back-end APIs and
>> > things going badly wrong. That logic will end up twisted enough as it is
>> > :-)
>> > I believe the signal problems must be the same for Twisted as for simple
>> > Python threads (in particular with threads.deferToThread), it's just that
>> > Twisted (hopefully) handles signals. Maybe we can look at how they make
>> > it work.
>> > What are the requirements for 'correct signal handling'? Is it "the
>> > process should exit in a timely way in response to SIGINT and SIGTERM, and
>> > immediately for SIGKILL?"
>> > Justin
>> >
>> > On Wed, Aug 4, 2010 at 3:23 AM, Joshua McKenty <jmckenty@xxxxxxxxx> wrote:
>> >
>> > The biggest issue is the interaction with signals and the python
>> > threading model. Multiprocess certainly works (see the nova use of
>> > process pool), but you're making your code simpler at the cost of more
>> > complex process supervision (which I don't object to in this case).
>> > Signals come up in deployment a lot, how to roll out code changes, etc.
>> > If we fix live migration, this gets much easier.
>> >
>> > Sent from my iPhone
>> > On 2010-08-04, at 5:05 AM, Justin Santa Barbara <justin@xxxxxxxxxxxx>
>> > wrote:
>> >
>> > Forgive a Python noob's question, but what's wrong with just using
>> > Python threads? Why introduce multiple processes?
>> > It seems that Eric's benchmarks indicate that the overhead would be
>> > tolerable, and the code would definitely be much cleaner.
>> > The multiple process idea is another argument in favor of simple
>> > threading... if we figure out sharding, we could run multiple compute
>> > service processes to get around scaling limits that going with simple
>> > threading might introduce (e.g. GIL contention).
>> > Justin
>> >
>> > On Tue, Aug 3, 2010 at 7:56 PM, Vishvananda Ishaya
>> > <vishvananda@xxxxxxxxx> wrote:
>> >
>> > If we want to go with the simplest possible approach, we could make
>> > the compute workers synchronous and just run multiple copies on each
>> > host. We could make one of them 'read only' so it only answers
>> > simple/fast requests, and a few (4?) others for other long/io
>> > intensive tasks. The ultimate would be to have each message
>> > actually have its own worker a la erlang, but that might be a bit
>> > extreme.
>> > I've been doing a lot of the changes later that require switching
>> > everything to async. It is a bit annoying to wrap your head around
>> > it, but it really isn't all that bad. That said, I'm all for making
>> > things as simple as possible.
>> > Vish
>> >
>> > On Tue, Aug 3, 2010 at 6:30 PM, Justin Santa Barbara
>> > <justin@xxxxxxxxxxxx> wrote:
>> >
>> > Without meaning to make the twisted/eventlet flamewar any worse,
>> > can I just ask why we're not just using 'good old threads'? I've
>> > asked Eric Day for his input based on his great benchmarks
>> > (http://oddments.org/?p=494). My background is from the Java
>> > world, where threads work wonderfully - possibly even better than
>> > async:
>> > http://www.thebuzzmedia.com/java-io-faster-than-nio-old-is-new-again
>> > I feel like Nova is greatly complicated by the async code, and I'm
>> > starting to see some of the pain of Twisted: it seems that
>> > _everything_ needs to be async in the long run, because if
>> > something calls a function that is (or could be) async, it must
>> > itself be async. So yields and @defer.inlineCallbacks start
>> > cropping up everywhere.
>> > One of the project goals seems to be simplicity of the code, for
>> > fewer bugs and to reduce barriers to entry, and it seems that if
>> > we could use 'plain old Python' that we would better achieve this
>> > goal than if we have to use an async framework.
>> > I know that Python has its issues here with the GIL, but I'm just
>> > wondering whether, in the case of nova, threads might be good
>> > enough, and produce much easier to understand code? I'm guessing
>> > that maybe the project started with threads - what happened?
>> > Justin
>> >
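
The point above, that anything calling an async function must itself become
async, can be illustrated with a small sketch. It uses Python 3's asyncio
rather than Twisted's @defer.inlineCallbacks (a deliberate substitution so the
example runs with only the standard library), and all function names are
hypothetical. The shape of the problem is the same: the async marker spreads
up the call chain until something at the top runs the event loop.

```python
import asyncio


async def fetch_instance_state(instance_id):
    """Lowest layer: stands in for a non-blocking I/O call."""
    await asyncio.sleep(0)
    return {"id": instance_id, "state": "running"}


async def describe_instance(instance_id):
    """Must be async only because it awaits the function above."""
    state = await fetch_instance_state(instance_id)
    return f"{state['id']} is {state['state']}"


async def describe_all(instance_ids):
    """Must be async in turn, because it awaits describe_instance."""
    return [await describe_instance(i) for i in instance_ids]


def describe_all_blocking(instance_ids):
    """The only synchronous entry point: it has to start the event loop."""
    return asyncio.run(describe_all(instance_ids))
```

With plain threads, all three inner functions could be ordinary blocking
functions, which is the simplification being argued for here.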
>> > _______________________________________________
>> > Mailing list: https://launchpad.net/~nova
>> > Post to : nova@xxxxxxxxxxxxxxxxxxx
>> > Unsubscribe : https://launchpad.net/~nova
>> > More help : https://help.launchpad.net/ListHelp