← Back to team overview

openstack team mailing list archive

Re: Queue Service, next steps


Hi Mark,

On Sat, Feb 19, 2011 at 11:18:26AM -0500, Mark Washenberger wrote:
> It seems like put and post on ../queue[/id] are reversed from the usual sense. I am probably just not familiar with the idioms for previous queue services, but I imagine the following scheme.
>   POST .../queue    # create a message without specifying the id
>   PUT .../queue/id  # create a message with a specific id,
>                     #  or modify existing message's metadata
>   PUT .../queue     # create a queue or modify existing queue's metadata
>   POST .../queue/id # drop this command (?)

I was wondering if someone would point out this discrepancy with
traditional REST semantics. :) The reason for the change is the following:

We need both an insert and update method for the queue and message
level. Some map well, but the "update all messages in a queue" (or the
set matching some criteria) is the oddball. A PUT .../queue doesn't
quite fit because we're not inserting or replacing a queue, we're
modifying all messages in it. I wanted to keep GET/HEAD read-only and
side-effect free, so I didn't want to overload those. Another option
was to use reserved words, possibly those prefixed with an underscore
(such as POST ../queue/_all), but this seemed unnecessary.

Instead, I propose using POST ../queue to mean modify the
queue (and possibly all messages in it). POST implies more of a
modify/append/process operation that PUT, which should strictly be
insert/replace. Using this form meant not using POST ../queue for
the traditional "insert but auto-generate ID" semantics, which is
why I used PUT instead. In some ways this is more consistent too,
since you are still inserting a message like PUT ../queue/message,
just without the id.

If this divergence from traditional REST semantics is not welcome, we
can certainly change it. We do need to find the best way to represent
an atomic modify/get operation at the queue level though.

> Is the FIFO constraint on GET .../queue a strong constraint? or more of an approximation or eventual consistency type of weaker guarantee?

For a single queue server, yes. When using an aggregate of all queue
servers from a proxy or multi-connection worker, you'll need to handle
this in the worker code since messages may be interleaved from multiple
queue servers.

> Looking at the public cloud service deployment diagram and notes, I started to wonder about the performance demands of different use cases. I imagine that a given account or a given queue could vary widely through time in the number of messages produced and messages consumed. So an individual queue might have a very high variance in the distribution of cpu and disk resources consumed. I am worried about sharding hardware resources on the basis of the account id alone. It seems like this would prevent the queue service from scaling out a single queue to cluster level and would create hot-spots. Do we need to worry about this type of use case? Am I missing how this would be handled in your deployment example?

I suppose there is no reason you couldn't perform the first level
hash on account/queue instead of just account. In fact I think the
plugin point for controlling distribution should do just this, pass
in the full URI and server set, and return some set of servers. It
can hash on whatever it likes. A distribution plugin could also use
consistent hashing so we could optionally grow/shrink the queue server
set if we notice hotspots for a particular set.

> Going way back to a previous discussion on this thread--you had rejected IPC because it would limit performance. It seems to me if we used an IPC-based model of coordination across cores, it would be analogous to the model used for scaling across multiple nodes in a zone. This approach would help us scale an individual queue beyond the limitations of a single machine or HA triplet. If I recall, that is sort of like the way erlang scales across multiple machines. Not that I would want to deal with erlang's syntactic salt.

Erlang doesn't use IPC across machines (well, not traditional System V
IPC mechanisms), it's a socket daemon called epmd (Erlang Port Mapper
Daemon) that forwards messages between different nodes. Within a single
machine Erlang uses a multi-threaded event loop that maps all Erlang
processes on top of it. This lets you efficiently use all cores and
provide very lightweight message passing between the Erlang processes
running on those cores (much lighter weight than any IPC mechanism).

Having said all that, I suppose we could run multiple, independent
queue servers per machine (one per core), and have each running on
their own port. Each queue server process would represent a different
node in the distribution set. The arguments against this are that
it limits resource sharing for a machine since the processes would
not know of one another. For example, if we have local persistent
storage, this means multiple processes fighting for I/O rather than
synchronizing this within one process (or possibly using IPC to a
single process that manages all I/O, but this comes with overhead).

This could be ok for deployments using a proxy, but for simpler
deployments that don't use a load balancer/proxy, this means having
to manage more queue servers (one per core) if you want to utilize
all cores in a machine. So if I have two 8-core machines to manage
my queue, this means 16 queue servers for my clients/workers to use,
not just 2.

In any deployment scenario, this will increase the number
of multiplexed connections to the queue servers (will need to
multiplex per process/port, not per machine) and increase the network
chatter. This may not be a limiting factor, but is something to keep
in mind.

> Thanks for reading this far and for the docs. Cheers!

Thanks for the feedback! Looking forward to hearing more. :)