
openstack team mailing list archive

Re: Push vs Polling (from Versioning Thread)

 

You are describing an all-purpose system, not one that supports the narrow
needs of IaaS state notifications.

There's no reason in this scenario to guarantee message delivery.

Sent from my iPhone

On Oct 28, 2011, at 10:09, Jorge Williams <jorge.williams@xxxxxxxxxxxxx>
wrote:


 On Oct 28, 2011, at 8:11 AM, George Reese wrote:

 Push notifications don't make your core system any more complex. You push
the change to a message queue and rely on another system to do the work.

 The other system is scalable. It has no need to be stateless and can be run
in an on-demand format using agents to handle the growing/shrinking
notification needs.

 Bryan brings up the point that some of these subscription endpoints may go
away. That's a total red herring. You have mechanisms in place to detect
failed deliveries and unsubscribe after a time (among other strategies).



 I think what Bryan is saying is this.  Someone, on "another system", let's
call it a hub, has to do the work of tracking what messages have been
received by a particular client.  The failure scenarios there can cause a
lot of headaches.

 You can try to scale hubs out horizontally, but each hub will be handling
a different set of clients at a particular point in time.  So that data
needs to be tracked.  The best you can do is to have a central data store
tracking when a client has received and acknowledged a particular message.
If there are a lot of clients, that's a lot of data to sort through and
partition.  If you don't have a central store, then a particular hub will be
responsible for a certain set of clients.  And in this case, how many clients
should be tracked by a hub? 100? 1000? 100,000?  The more clients a hub
handles, the more memory it needs to use to track those clients.  If a hub is
at capacity but your monitoring system is starting to detect disk
failures, how do you migrate those clients to another hub?  Do you split the
clients up among existing hubs?  If so, what's the algorithm there?  Or do you
have to stand up a new hub?
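
 To make the migration question concrete, here is a minimal sketch of one
possible answer: move the failing hub's clients onto the least-loaded
surviving hubs until they are full.  The hub names, the capacity limit, and
the migrate() helper are all assumptions made up for illustration; nothing
in this thread prescribes this algorithm.

```python
def migrate(hubs, failing, capacity):
    """Move clients off `failing` onto the least-loaded surviving hubs.

    `hubs` maps a hub name to its list of clients and is modified in place.
    Returns the clients that did not fit anywhere (they need a new hub).
    """
    orphans = hubs.pop(failing)
    unplaced = []
    for client in orphans:
        # Pick the surviving hub that currently tracks the fewest clients.
        target = min(hubs, key=lambda name: len(hubs[name]), default=None)
        if target is None or len(hubs[target]) >= capacity:
            unplaced.append(client)  # every surviving hub is full
        else:
            hubs[target].append(client)
    return unplaced


# Hypothetical cluster: three hubs, each allowed to track three clients.
hubs = {
    "hub-1": ["a", "b", "c"],
    "hub-2": ["d", "e"],
    "hub-3": ["f", "g", "h"],
}
unplaced = migrate(hubs, failing="hub-3", capacity=3)
print(hubs)      # hub-2 absorbed one client
print(unplaced)  # the rest still need a new hub
```

 Even in this toy case, only one of the three clients fits on an existing
hub, so a new hub has to be stood up anyway, and the sketch ignores moving
each client's delivery state along with it.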

 As for the other failure states, the issue isn't just about detecting
failed deliveries, it's about tracking successful deliveries too.  Say
that immediately after sending a message to client A, the hub goes down.
There's no record in the system that the message was sent to client A.
How do we detect that this happened?  If we do detect it, should we resend
the message?  Keep in mind, the client may have received it but may or
may not have acknowledged it.  If we do resend the message, will that mess
up the client?  Does the client even care?
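
 The crash scenario above can be sketched in a few lines.  This is only an
illustration under assumed names: the Hub and Client classes and the
crash_before_record flag are hypothetical, and a real hub would persist
its state somewhere rather than keep it in a dict.

```python
class Client:
    """Hypothetical subscriber that records everything pushed to it."""

    def __init__(self):
        self.received = []

    def deliver(self, message_id, body):
        self.received.append((message_id, body))
        return True  # acknowledgement back to the hub


class Hub:
    """Hypothetical push hub that tracks acknowledged ids per client."""

    def __init__(self, clients):
        self.clients = clients  # client_id -> Client
        # One set per client: this state grows with clients x messages.
        self.acked = {client_id: set() for client_id in clients}

    def push(self, message_id, body, crash_before_record=False):
        for client_id, client in self.clients.items():
            if message_id in self.acked[client_id]:
                continue  # already acknowledged, skip
            ok = client.deliver(message_id, body)
            if crash_before_record:
                # Simulate the hub dying after the send but before the write.
                raise RuntimeError("hub went down before recording the ack")
            if ok:
                self.acked[client_id].add(message_id)


clients = {"A": Client()}
hub = Hub(clients)

try:
    hub.push(1, "server-1 ACTIVE", crash_before_record=True)
except RuntimeError:
    pass  # the delivery happened, but no record of it survived

hub.push(1, "server-1 ACTIVE")  # recovery: the hub resends to be safe
print(clients["A"].received)    # client A now holds the same message twice
```

 Because the hub cannot tell "delivered but not recorded" apart from "never
delivered", the resend produces a duplicate, and it is then up to the client
to cope with it.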

 There are a whole lot of inefficiencies too.  Consider that there are some
cases where the client also needs to track what messages have been received.
Both the client and the hub are tracking the state in this scenario, and
that's pretty inefficient.  I would argue it's far more inefficient than the
polling scenario because it involves memory and potentially storage space.
If the client doesn't really care to track state, we are tracking it at the
hub for no reason.

 Say we have a client that's tracking state, maybe saving it to the
datastore. (We have a lot of customers that do this.)  The client receives a
message, but before it can save it, it goes down.  Upon coming up again, it
has no awareness of the lost message.  Will it be delivered again? How?  How
does the client inform the hub of its state?

 Other questions arise:  How long should you track clients before you
unsubscribe them?  And so on.

 There are just so many similar scenarios that add a lot of complexity and,
I would argue, at cloud scale, far greater inefficiencies to the system.

 With the polling scenario, the work is split between the server and the
client.  The server keeps track of the messages.  The client keeps track of
its own state (what was the last message received? etc).  It's scalable
and, I would argue, more efficient because it allows the client to track
state if it wants to, when it wants to, how it wants to.  On the server end,
statelessness means that each pubsub node is a carbon copy of another -- if
one goes down another can replace it with no problem -- no need to transfer
state.  What's more, the memory usage of the node is constant, no matter how
many clients are hitting it.
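
 As a rough sketch of that split, assuming sequential message ids and an
in-memory log: the server below only stores messages, while each client
remembers the id of the last message it processed.  The Feed and
PollingClient names, the `after` and `limit` parameters, and the sample
messages are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feed:
    """Server side: an append-only message log with no per-client state."""
    messages: list = field(default_factory=list)

    def publish(self, body):
        self.messages.append(body)

    def poll(self, after=0, limit=100):
        """Return (message_id, body) pairs with id > after.  Ids start at 1."""
        batch = self.messages[after:after + limit]
        return [(after + i + 1, body) for i, body in enumerate(batch)]


@dataclass
class PollingClient:
    """Client side: owns its cursor (the last message id it processed)."""
    cursor: int = 0
    seen: list = field(default_factory=list)

    def catch_up(self, feed):
        for message_id, body in feed.poll(after=self.cursor):
            self.seen.append(body)
            self.cursor = message_id  # advance only after processing


feed = Feed()
feed.publish("server-1 ACTIVE")
feed.publish("server-2 BUILD")

client = PollingClient()
client.catch_up(feed)            # picks up the first two messages
feed.publish("server-2 ACTIVE")
client.catch_up(feed)            # picks up only the new one
print(client.cursor, client.seen)
```

 A client that goes down before saving its cursor simply polls again from
the last cursor it did save, and any node holding the same log can answer
that request, since nothing about the client lives on the server.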

 That's not to say that polling is always the right choice.  As Mark said,
there are a lot of factors to consider.  In cases where there are a large
number of messages, latencies may increase dramatically.  It's just that when
we're talking web scale, it is *usually* a good choice.

 -jOrGe W.
