openstack team mailing list archive
Message #13129
Re: RPC Semantics
On Tue, Jun 12, 2012, Eric Windisch <eric@xxxxxxxxxxxxxxxx> wrote:
> We actually do have ACKs in ZeroMQ, as far as I understand how they work
> in AMQP, but they're really simple. The send() method is actually
> synchronous with the message being received on the other end. However,
> we don't wait for this and spawn an eventlet coroutine, because there
> is no benefit of blocking the caller.
ACKing a message is one step, but message persistence and reliability
matter too.
For instance, an instance migration can take a while, since we need to
copy many gigabytes of disk to another host. If we want to do a
software upgrade, we either need to wait a long time for the migration
to finish, or we need to restart the service and then restart the
processing of the message.
If all of the software gets restarted, then persistence is important.
> All calls have a timeout (TTL). The ZeroMQ driver also implements a TTL
> on the casts, and I'm quite sure we should support this in Kombu/Qpid
> as well to avoid a thundering-herd.
What thundering herd problems exist in OpenStack?
I do know there are problems with queuing combined with timeouts. It
makes less sense to process a get_nw_info request if the requestor has
already timed out and will ignore the response. Is that what you're
referring to with TTLs?
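To make the question concrete, here is a minimal sketch of what dropping
timed-out requests on the consumer side could look like. The names
(Request, is_expired, drain) are hypothetical and not taken from any
OpenStack RPC driver, and the sketch assumes sender and receiver clocks
are close enough to compare timestamps.

```python
import time
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    sent_at: float  # sender's timestamp in seconds (assumed comparable clocks)
    ttl: float      # seconds the caller is willing to wait for a reply


def is_expired(req, now=None):
    """Return True if the caller has already given up on this request."""
    now = time.time() if now is None else now
    return now - req.sent_at > req.ttl


def drain(queue, handler, now=None):
    """Process queued requests, skipping those whose caller has timed out."""
    handled, dropped = [], []
    for req in queue:
        if is_expired(req, now):
            # Nobody will read the response, so doing the work is wasted effort.
            dropped.append(req.method)
            continue
        handler(req)
        handled.append(req.method)
    return handled, dropped
```

With something like this, a get_nw_info request that sat in the queue
past its TTL would be discarded instead of processed.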
> While there is no message persistence in the ZeroMQ driver, there is
> some limited benefit of having this on casts. There is limited or no
> benefit for calls, because the return value won't be received -- the
> calling stack is no longer going to do anything with the return value.
> (This would be a good case for a better Actor-driven model, because we
> could actually handle return values across a relaunched caller)
Return values aren't the only reason for persistence.
Idempotent actions also want persistence, so that the action requested
in the message actually completes. For instance, if nova-compute is
stopped in the middle of an instance-create, we want to actually finish
the create after the process is restarted.
There is no process waiting for a return value, but we certainly would
like for the message to be persisted so we can restart it.
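As a rough illustration of the kind of persistence I mean, here is a
sketch of a consumer-side journal that is replayed after a restart. The
Journal class and the consume/recover helpers are hypothetical, not part
of nova or any RPC driver, and the sketch assumes the handler is
idempotent so that running it a second time is safe.

```python
import json
import os


class Journal:
    """On-disk record of messages that were received but not yet finished."""

    def __init__(self, path):
        self.path = path
        self.pending = {}
        if os.path.exists(path):
            with open(path) as f:
                self.pending = json.load(f)

    def _flush(self):
        # Write to a temporary file and rename it so a crash mid-write
        # cannot leave a truncated journal behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.pending, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def record(self, msg_id, msg):
        self.pending[msg_id] = msg
        self._flush()

    def done(self, msg_id):
        self.pending.pop(msg_id, None)
        self._flush()


def consume(journal, msg_id, msg, handler):
    """Persist the message first, then act on it, then mark it finished."""
    journal.record(msg_id, msg)
    handler(msg)
    journal.done(msg_id)


def recover(journal, handler):
    """After a restart, re-run every message that never finished."""
    for msg_id, msg in list(journal.pending.items()):
        handler(msg)  # safe only because the handler is assumed idempotent
        journal.done(msg_id)
```

If the process is stopped while the handler is running, the message
stays in the journal, and calling recover() at startup finishes it.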
> Anyway, in the ZeroMQ driver, we could have a local queue to track
> casts and remove them when the send() coroutine completes. This would
> provide restart protection for casts.
Assuming the requesting process remains running the entire time?
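To show why I ask, here is a sketch of a purely in-memory version of
that local queue. CastTracker is a hypothetical name and this is not the
ZeroMQ driver's code; it only illustrates that the tracking state lives
and dies with the sending process.

```python
class CastTracker:
    """In-memory record of casts whose send() has not yet completed."""

    def __init__(self):
        self._pending = {}
        self._next_id = 0

    def track(self, msg):
        """Remember a cast before handing it to the send() coroutine."""
        self._next_id += 1
        self._pending[self._next_id] = msg
        return self._next_id

    def complete(self, cast_id):
        """Forget a cast once send() has returned for it."""
        self._pending.pop(cast_id, None)

    def unsent(self):
        """Casts that would need to be resent."""
        return list(self._pending.values())
```

A new CastTracker created after the sender restarts starts out empty, so
any casts that were still in flight are lost unless the queue is also
written to disk.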
JE