
nova team mailing list archive

Re: Why not python threads?

 

In regards to carrot, you may be running into the same problem we were with
twisted threads.  It actually isn't specifically a carrot problem, but the
way that python handles socket reads when it gets interrupted by a signal.
 The fact that the whole read wasn't finished isn't properly propagated into
the python layer, so you end up with an interrupted system call and the
socket ends up in an indeterminate state.  I'm not sure if this is the issue
that you are seeing, but the patch for twisted here
http://twistedmatrix.com/trac/changeset/28447 fixed it by using siginterrupt
and some sort of wakeup file descriptor.
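The siginterrupt-plus-wakeup-file-descriptor combination mentioned above boils down to two calls. A minimal Python 3 sketch, under the assumption that this is what the twisted patch wires into its reactor (SIGCHLD is just an example signal, and older code would use fcntl instead of os.set_blocking):

```python
import os
import signal

# Ask the OS to restart system calls (e.g. socket reads) interrupted by
# this signal, instead of failing partway through with EINTR.
signal.siginterrupt(signal.SIGCHLD, False)

# Route signal delivery through a non-blocking pipe; an event loop can
# watch the read end and still wake up promptly when a signal arrives.
read_fd, write_fd = os.pipe()
os.set_blocking(write_fd, False)
signal.set_wakeup_fd(write_fd)
```

With this in place, a blocked read is no longer torn apart by the signal, yet the loop watching `read_fd` still notices the signal immediately.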

In regards to formatting, it is basically just straight PEP 8.  Termie's
suggestion about keeping the same casing and formatting as the twisted code
is probably good.

Vish

On Fri, Aug 6, 2010 at 12:25 PM, Justin Santa Barbara
<justin@xxxxxxxxxxxx> wrote:

> The threading model seems to work great for the compute service; it's still
> a work in progress, but I've got VirtualBox support working, without having
> to mess with propagating deferreds through the call chains:
> https://code.launchpad.net/~justin-fathomdb/nova/kiss-backend
>
> Control-C
> also works (out-of-the-box, because the main thread simply goes into a slow
> sleep loop).  Currently, it'll wait for any non-daemon threads to finish,
> and non-daemon threads are used only for message handlers.  This is probably
> the behaviour we want, but we have full control and can easily
> force-terminate message handlers etc. if we want to.
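The main-thread behaviour described above can be sketched roughly as follows; the handler body and message names are stand-ins, not the actual nova handlers:

```python
import threading
import time

def handle_message(msg):
    # Stand-in for a message handler; the real ones do the actual work.
    time.sleep(0.1)

# One non-daemon thread per message handler, so a clean shutdown waits
# for in-flight handlers to finish.
workers = [threading.Thread(target=handle_message, args=(m,))
           for m in ("run_instance", "terminate_instance")]
for t in workers:
    t.start()

try:
    # The main thread just sleeps slowly; Ctrl-C raises KeyboardInterrupt
    # here, where we have full control over what happens next.
    while any(t.is_alive() for t in workers):
        time.sleep(0.5)
except KeyboardInterrupt:
    pass  # this is where handlers could be force-terminated if desired
for t in workers:
    t.join()
```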
>
> I found that Carrot (the message queue library) is not thread-safe; in
> particular it doesn't seem to like concurrent calls on multiple threads.
>  For now there's just a global lock protecting the carrot operations.  The
> long term answer is to have a thread per message queue connection, and
> probably I'll need to write a nice queue manager abstraction.
>
> How can I start getting this merged? Right now I have a huge backlog of
> branches to merge, all waiting on this:
>
> https://code.launchpad.net/~justin-fathomdb/nova/check-subprocess-exit-code/+merge/30707
>
> I think that's blocked on the code formatting standards we're using...
>
> If we can get past that, then hopefully I can get some of the more interesting
> features merged in:  iSCSI support, raw disk images (no kernel or ramdisk),
> VirtualBox support (for mac development), and eventually the threaded
> compute service and abstract data stores (let people replace Redis with
> SQL)...
>
> Justin
>
>
>
>
>
> On Wed, Aug 4, 2010 at 3:55 PM, Justin Santa Barbara <justin@xxxxxxxxxxxx> wrote:
>
>> I pushed a super-simple proof-of-concept, using simple threading.  There's
>> essentially a thread for everything... one for the heartbeat, one that polls
>> the message queues, and one created for the processing of each message.
>>
>>
>> http://bazaar.launchpad.net/~justin-fathomdb/nova/kiss-backend/revision/206
>>
>> Code obviously doesn't run, but it shows the basic approach I'm thinking
>> of pursuing.  Signals are not yet implemented, but I think I can trap the
>> signals on the main thread, and manually terminate running threads as
>> appropriate.  (What 'appropriate' means for a long-running message handler -
>> e.g. machine launch -  isn't yet entirely clear to me)
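That thread layout - a heartbeat thread, a queue-polling thread, and a worker thread per message - can be sketched as below. The names and the in-process `queue.Queue` are stand-ins for the real AMQP consumer:

```python
import queue
import threading
import time

stop = threading.Event()  # set on shutdown (e.g. from a signal handler)
handled = []

def heartbeat():
    # One thread periodically reports liveness.
    while not stop.is_set():
        time.sleep(0.1)

def handle(msg):
    # One thread per message; dispatch to the service method here.
    handled.append(msg)

def consume(q):
    # One thread polls the queue and spawns a worker per message.
    while not stop.is_set():
        try:
            msg = q.get(timeout=0.1)
        except queue.Empty:
            continue
        threading.Thread(target=handle, args=(msg,)).start()
```

The timeouts keep both loops responsive to the stop event, which is where the "trap signals on the main thread" idea would hook in.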
>>
>> Feedback?  I don't really understand the point about returning data from a
>> thread, which is why I thought pushing some code might make this clearer...
>>
>>
>> On Wed, Aug 4, 2010 at 2:29 PM, Vishvananda Ishaya
>> <vishvananda@xxxxxxxxx> wrote:
>>
>>> We were initially using a multiprocessing process pool in the code.  The
>>> ProcessPool in process.py is there because multiprocessing had weird issues
>>> and needed hackish workarounds to function properly with tornado's ioloop
>>> and twisted's reactor.  We were having a lot of issues with exceptions
>>> disappearing and processes hanging.
>>>
>>> In general, threads and processes in python seem a little bit annoying,
>>> which is why I've been so grumbly about moving away from twisted, since the
>>> interaction between async and signals is a bit nasty.  A lot of effort and
>>> debugging went into the current implementation of deferring to shell calls
>>> and having the message pump still work.
>>>
>>> I like the idea of moving away from the reactor loop and having a simple
>>> listener that waits for messages and fires off another thread/process to
>>> handle it.  The issues arise when this thread needs to return data.  Do we
>>> just return a request id and switch to making clients poll for new data?
>>> Waiting on the queue also means there is no looping callback to update
>>> state.  We'll need an external process to poll for state updates.
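The request-id-and-poll alternative raised here could look roughly like the sketch below; all names are illustrative, not anything that exists in nova:

```python
import threading
import uuid

_results = {}                     # request_id -> result, filled in later
_results_lock = threading.Lock()

def submit(work):
    """Start the work on another thread; return a request id immediately."""
    request_id = str(uuid.uuid4())
    def run():
        value = work()
        with _results_lock:
            _results[request_id] = value
    threading.Thread(target=run).start()
    return request_id

def poll(request_id):
    """Clients poll with the id; None means the work isn't finished yet."""
    with _results_lock:
        return _results.get(request_id)
```

The worker never has to "return" through the message pump at all; the client just asks again later, which is the trade-off being discussed.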
>>>
>>> Crazy idea...forget os signals altogether, and just signal using rabbit.
>>>  That would definitely circumvent a lot of python issues :)
>>>
>>> Vish
>>>
>>> On Wed, Aug 4, 2010 at 1:59 PM, Eric Day <eday@xxxxxxxxxxxx> wrote:
>>>
>>>> We can also do multi-process for these services as well, and signals
>>>> become a lot easier to deal with. Twisted signal handling may actually
>>>> be easier than multi-threaded since it's running in a single thread.
>>>>
>>>> If you have the time, try hacking up a multi-threaded or multi-process
>>>> worker and share to see how difficult it would be. :)
>>>>
>>>> -Eric
>>>>
>>>> On Wed, Aug 04, 2010 at 12:29:40PM -0700, Justin Santa Barbara wrote:
>>>> >    If this is the primary issue, then I think that dealing with
>>>> >    signals is surely easier than dealing with Twisted or Eventlet.
>>>> >    I would propose that for the back-end services we write them
>>>> >    'simply' in Python threads.  For the front end services (which are
>>>> >    effectively proxies to the data store and the back end services),
>>>> >    there may be a performance case to be made for async code.  I don't
>>>> >    believe that the back-end services have the high performance
>>>> >    requirements, but they do have the requirement to be correct even
>>>> >    when dealing with messy back-end APIs and things going badly wrong.
>>>> >    That logic will end up twisted enough as it is :-)
>>>> >    I believe the signal problems must be the same for Twisted as for
>>>> >    simple Python threads (in particular with threads.deferToThread),
>>>> >    it's just that Twisted (hopefully) handles signals.  Maybe we can
>>>> >    look at how they make it work.
>>>> >    What are the requirements for 'correct signal handling'?  Is it
>>>> >    "the process should exit in a timely way in response to SIGINT and
>>>> >    SIGTERM, and immediately for SIGKILL?"
>>>> >    Justin
>>>> >
>>>> >    On Wed, Aug 4, 2010 at 3:23 AM, Joshua McKenty
>>>> >    <jmckenty@xxxxxxxxx> wrote:
>>>> >
>>>> >      The biggest issue is the interaction with signals and the python
>>>> >      threading model. Multiprocess certainly works (see the nova use
>>>> >      of process pool), but you're making your code simpler at the cost
>>>> >      of more complex process supervision (which I don't object to in
>>>> >      this case).  Signals come up in deployment a lot, how to roll out
>>>> >      code changes, etc.  If we fix live migration, this gets much
>>>> >      easier.
>>>> >
>>>> >      Sent from my iPhone
>>>> >      On 2010-08-04, at 5:05 AM, Justin Santa Barbara
>>>> >      <justin@xxxxxxxxxxxx> wrote:
>>>> >
>>>> >        Forgive a Python noob's question, but what's wrong with just
>>>> >        using Python threads?  Why introduce multiple processes?
>>>> >        It seems that Eric's benchmarks indicate that the overhead
>>>> >        would be tolerable, and the code would definitely be much
>>>> >        cleaner.
>>>> >        The multiple process idea is another argument in favor of
>>>> >        simple threading... if we figure out sharding, we could run
>>>> >        multiple compute service processes to get around scaling
>>>> >        limits that going with simple threading might introduce
>>>> >        (e.g. GIL contention).
>>>> >        Justin
>>>> >
>>>> >        On Tue, Aug 3, 2010 at 7:56 PM, Vishvananda Ishaya
>>>> >        <vishvananda@xxxxxxxxx> wrote:
>>>> >
>>>> >          If we want to go with the simplest possible approach, we
>>>> >          could make the compute workers synchronous and just run
>>>> >          multiple copies on each host.  We could make one of them
>>>> >          'read only' so it only answers simple/fast requests, and a
>>>> >          few (4?) others for long/io intensive tasks.  The ultimate
>>>> >          would be to have each message actually have its own worker
>>>> >          a la erlang, but that might be a bit extreme.
>>>> >          I've been doing a lot of the changes lately that require
>>>> >          switching everything to async.  It is a bit annoying to wrap
>>>> >          your head around it, but it really isn't all that bad.  That
>>>> >          said, I'm all for making things as simple as possible.
>>>> >          Vish
>>>> >
>>>> >          On Tue, Aug 3, 2010 at 6:30 PM, Justin Santa Barbara
>>>> >          <justin@xxxxxxxxxxxx> wrote:
>>>> >
>>>> >            Without meaning to make the twisted/eventlet flamewar any
>>>> >            worse, can I just ask why we're not just using 'good old
>>>> >            threads'?  I've asked Eric Day for his input based on his
>>>> >            great benchmarks (http://oddments.org/?p=494).  My
>>>> >            background is from the Java world, where threads work
>>>> >            wonderfully - possibly even better than async:
>>>> >            http://www.thebuzzmedia.com/java-io-faster-than-nio-old-is-new-again
>>>> >            I feel like Nova is greatly complicated by the async code,
>>>> >            and I'm starting to see some of the pain of Twisted: it
>>>> >            seems that _everything_ needs to be async in the long run,
>>>> >            because if something calls a function that is (or could be)
>>>> >            async, it must itself be async.  So yields and
>>>> >            @defer.inlineCallbacks start cropping up everywhere.
>>>> >            One of the project goals seems to be simplicity of the
>>>> >            code, for fewer bugs and to reduce barriers to entry, and
>>>> >            it seems that if we could use 'plain old Python' that we
>>>> >            would better achieve this goal than if we have to use an
>>>> >            async framework.
>>>> >            I know that Python has its issues here with the GIL, but
>>>> >            I'm just wondering whether, in the case of nova, threads
>>>> >            might be good enough, and produce much easier to understand
>>>> >            code?  I'm guessing that maybe the project started with
>>>> >            threads - what happened?
>>>> >            Justin
>>>> >
>>>> >            _______________________________________________
>>>> >            Mailing list: https://launchpad.net/~nova
>>>> >            Post to     : nova@xxxxxxxxxxxxxxxxxxx
>>>> >            Unsubscribe : https://launchpad.net/~nova
>>>> >            More help   : https://help.launchpad.net/ListHelp
>>>> >
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>
