openstack team mailing list archive

Re: [metering] Choice of a messaging queue

To: Nick Barcet <nick.barcet@xxxxxxxxxxxxx>
From: Eric Windisch <eric@xxxxxxxxxxxxxxxx>
Date: Fri, 18 May 2012 08:40:15 -0700
Cc: "openstack@xxxxxxxxxxxxxxxxxxx" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4FB60B7B.7030505@canonical.com>

> 
> 
> a) the queue must guarantee the delivery of messages.
> Contrary to monitoring, loss of events may have important billing
> impacts; it therefore cannot be an option that messages be lost.

Losing messages should always remain an option in extreme cases. If a message stays undeliverable for an excessive amount of time, it should be dropped. Otherwise, you'll need the queuing equivalent of a DBA doing periodic cleanup, which isn't very cloudy (or scalable).

I agree that the failure cases here are different from those we'd normally see with Nova. Timeouts on messages would need to be much higher, and it should potentially be possible to disable them (but I do insist that a timeout, even if high, should be used).
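
To make the timeout idea concrete, here is a rough sketch of a local send queue that retries undelivered messages and drops them once they exceed a configurable age. This is not code from any of the rpc drivers; the class name `ExpiringOutbox`, the `send` callback, and the `ttl` parameter are all made up for illustration, and passing `ttl=None` stands in for "timeout disabled".

```python
import time
from collections import deque


class ExpiringOutbox:
    """Hypothetical local send queue that drops messages older than `ttl` seconds."""

    def __init__(self, send, ttl=3600.0, clock=time.monotonic):
        self._send = send        # callable(msg) -> bool, True once delivery is confirmed
        self._ttl = ttl          # None disables the timeout entirely
        self._clock = clock      # injectable so the behaviour can be tested
        self._pending = deque()  # (enqueue_time, message), oldest first
        self.dropped = []        # messages given up on, kept for inspection/logging

    def put(self, msg):
        self._pending.append((self._clock(), msg))

    def flush(self):
        """Try each pending message once; drop expired ones. Returns how many remain."""
        now = self._clock()
        still_pending = deque()
        while self._pending:
            queued_at, msg = self._pending.popleft()
            if self._ttl is not None and now - queued_at > self._ttl:
                self.dropped.append(msg)          # undeliverable for too long
            elif not self._send(msg):
                still_pending.append((queued_at, msg))  # retry on the next flush
        self._pending = still_pending
        return len(self._pending)
```

For metering, `ttl` would presumably be hours or days rather than the seconds Nova uses, and `dropped` is where you'd hook in logging so that lost billing events are at least visible.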

> 
> b) the client should be able to store and forward.
> As the load of the system or traffic increases, or if the client is
> temporarily disconnected, the client element of the queue should be able to
> hold messages in a local queue to be emitted as soon as conditions permit.

The zeromq driver definitely does this (kind of). It will try to send all messages at once via green threads, which is effectively the same thing. The nice thing is that with 0mq, when a message is sent, delivery to a peer is confirmed.

I think, though I may be wrong, that the rabbit and qpid drivers do essentially the same thing for store and forward, blocking their green threads until they either connect to the queue successfully or hit a timeout. With the amqp drivers, the sender only has confirmation of delivery to the queuing server, not to the destination.
 
One thing the zeromq driver doesn't do is resume sending attempts across a service restart. Messages aren't durable in that fashion. This is largely because the timeout in Nova does not need to be very large, so there would be very little benefit. This goes back to your point in 'a'. Adding this feature would be a relatively minor change; it just wasn't needed in Nova. Actually, this limitation presumably applies to rabbit and qpid as well, in the store and forward case.
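
As a rough illustration of what "durable across a restart" could look like on the client side, here is a minimal sketch that spools outgoing messages to a local SQLite file and deletes each one only after the send is confirmed. None of this exists in the current drivers; `DurableSpool`, the `outbox` table, and the `send` callback are hypothetical names, and SQLite is just a convenient stand-in for whatever local storage would actually be chosen.

```python
import json
import sqlite3


class DurableSpool:
    """Hypothetical on-disk outbox: messages survive a restart until acknowledged."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, msg):
        # Persist before attempting delivery so a crash cannot lose the message.
        self._db.execute("INSERT INTO outbox (body) VALUES (?)", (json.dumps(msg),))
        self._db.commit()

    def flush(self, send):
        """Send in insertion order; delete a row only after send() confirms it."""
        rows = self._db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
        for row_id, body in rows:
            if not send(json.loads(body)):
                break  # peer unreachable: keep this and later messages for next time
            self._db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self._db.commit()
        return self._db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def close(self):
        self._db.close()
```

Note that this gives at-least-once behaviour: if the service dies between a confirmed send and the delete, the message goes out again after restart, so the receiving side would need to tolerate duplicates.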

> c) client must authenticate
> Only clients which hold a shared private key should be able to send
> messages on the queue.
> d) queue may support client signing of individual messages
> Each message should be individually signed by the agent that emits it in
> order to guarantee non-repudiation.  This function can be done by the
> queue client or by the agent prior to enqueuing of messages


There is a Folsom blueprint to add signing and/or encryption to the rpc layer.
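
For a sense of what per-message signing with a shared key might look like, here is a small sketch using HMAC-SHA256 from the Python standard library. This is not the blueprint's design, which I'm not describing here; the `sign_message` and `verify_message` helpers and the envelope layout are assumptions made purely for illustration.

```python
import hashlib
import hmac
import json


def sign_message(payload, key):
    """Wrap `payload` in an envelope carrying an HMAC-SHA256 signature."""
    # Serialize deterministically so sender and receiver hash the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": sig}


def verify_message(envelope, key):
    """Return the payload if the signature matches, else raise ValueError."""
    expected = hmac.new(
        key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(expected, envelope["signature"]):
        raise ValueError("bad signature")
    return json.loads(envelope["body"])
```

One caveat: a shared key covers 'c' (only key holders can produce valid messages) and protects integrity, but it can't give true non-repudiation, since anyone holding the key can forge a signature. That part of 'd' would need per-agent asymmetric keys.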

> d) queue must be highly available
> the queue servers must be able to support multiple instances running in
> parallel in order to support continuation of operations with the loss of
> one server.  This should be achievable without the need to use complex
> fail over systems and shared storage.


> e) queue should be horizontally scalable
> The scalability of queue servers should be achievable by increasing the
> number of servers.

d/e are NOT properties of the rabbit (and qpid?) driver today in Nova, but it could (should) be made to work this way. You get this with the zeromq driver, of course ;)

> 
> Not sure this list is exhaustive or viable, feel free to comment on it,
> but the real question is: which queue should we be using here?

The OpenStack common rpc mechanism, for sure. I'm biased, but I believe that while the zeromq driver is the newest, it is the only driver that meets all of the above requirements, with the exceptions noted above.

Improving the other implementations should be done, but I don't know of anyone committed to that work.

Regards,
Eric Windisch

Follow ups

Re: [metering] Choice of a messaging queue
From: Doug Hellmann, 2012-05-18

References

[metering] Choice of a messaging queue
From: Nick Barcet, 2012-05-18