openstack team mailing list archive

Re: Push vs Polling (from Versioning Thread)

 

On Oct 28, 2011, at 8:11 AM, George Reese wrote:

Push notifications don't make your core system any more complex. You push the change to a message queue and rely on another system to do the work.

The other system is scalable. It has no need to be stateless and can be run in an on-demand format using agents to handle the growing/shrinking notification needs.

Bryan brings up the point that some of these subscription endpoints may go away. That's a total red herring. You have mechanisms in place to detect failed deliveries and unsubscribe after a time (among other strategies).


I think what Bryan is saying is this: someone, on "another system" (let's call it a hub), has to do the work of tracking which messages have been received by a particular client. The failure scenarios there can cause a lot of headaches.

You can try to scale hubs out horizontally, but each hub will be handling a different set of clients at any particular point in time, so that data needs to be tracked. The best you can do is to have a central data store tracking when a client has received and acknowledged a particular message. If there are a lot of clients, that's a lot of data to sort through and partition. If you don't have a central store, then a particular hub will be responsible for a certain set of clients. And in this case, how many clients should be tracked by a hub? 100? 1000? 100,000? The more clients a hub handles, the more memory it needs to track those clients. If a hub is at capacity but your monitoring system is starting to detect disk failures, how do you migrate those clients to another hub? Do you split the clients up among existing hubs? If so, what's the algorithm there? Or do you have to stand up a new hub?
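To make the migration question concrete, here is a minimal sketch of one common way to split clients among hubs: rendezvous (highest-random-weight) hashing. Nobody in this thread has proposed it; the hub names, client names, and the `score`/`assign` helpers are all hypothetical. It only answers "which hub owns which client?" and does nothing about moving the per-client delivery state, which is the hard part.

```python
import hashlib


def score(hub, client):
    # Deterministic pseudo-random weight for a (hub, client) pair.
    digest = hashlib.sha256(f"{hub}:{client}".encode()).hexdigest()
    return int(digest, 16)


def assign(client, hubs):
    # Each client goes to the hub with the highest weight for that client.
    return max(hubs, key=lambda hub: score(hub, client))


hubs = ["hub-1", "hub-2", "hub-3"]          # hypothetical hub names
clients = [f"client-{i}" for i in range(1000)]

before = {c: assign(c, hubs) for c in clients}

# hub-2 starts reporting disk failures, so we take it out of rotation.
survivors = [h for h in hubs if h != "hub-2"]
after = {c: assign(c, survivors) for c in clients}

# Only the clients that lived on hub-2 are reassigned.
moved = [c for c in clients if before[c] != after[c]]
```

Even in this best case, every client in `moved` has acknowledgement state sitting on the failing hub that somehow has to reach its new owner.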

As for the other failure states, the issue isn't just about detecting failed deliveries; it's about tracking successful deliveries too. Say that immediately after sending a message to client A, the hub goes down. There's no record in the system that the message was sent to client A. How do we detect that this happened? If we do detect it, should we resend the message? Keep in mind, the client may have received it but may or may not have acknowledged it. If we do resend the message, will that mess up the client? Does the client even care?
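The scenario above can be sketched in a few lines. This is a toy model under stated assumptions, not how any real hub works: `Hub`, `PushClient`, the `crash_before_record` flag, and the message IDs are all made up for illustration. It assumes the replacement hub resends anything it has no record of, and that the client protects itself by remembering the IDs it has already processed.

```python
class PushClient:
    """Client that tolerates resends by remembering processed message IDs."""

    def __init__(self):
        self.processed_ids = set()
        self.applied = []

    def receive(self, msg_id, body):
        if msg_id in self.processed_ids:
            return "duplicate"          # already handled, ignore the resend
        self.applied.append(body)
        self.processed_ids.add(msg_id)
        return "ack"


class Hub:
    """Hub that records which (client, message) pairs were acknowledged."""

    def __init__(self):
        self.acked = set()

    def deliver(self, name, client, msg_id, body, crash_before_record=False):
        result = client.receive(msg_id, body)
        if crash_before_record:
            return None                 # hub died before recording the ack
        self.acked.add((name, msg_id))
        return result


client_a = PushClient()

# The first hub sends the message, then goes down before recording anything.
hub_1 = Hub()
hub_1.deliver("A", client_a, "msg-1", "resize server", crash_before_record=True)

# The replacement hub has no record of msg-1, so it sends it again.
hub_2 = Hub()
outcome = hub_2.deliver("A", client_a, "msg-1", "resize server")
```

The resend is harmless here only because the client keeps its own set of processed IDs, so the hub and the client both end up tracking delivery state for the same message.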

There are a whole lot of inefficiencies too. Consider that there are some cases where the client also needs to track which messages have been received. Both the client and the hub are tracking the state in this scenario, and that's pretty inefficient. I would argue it is far more inefficient than the polling scenario, because it involves memory and potentially storage space. And if the client doesn't really care to track state, we are tracking it at the hub for no reason.

Say we have a client that's tracking state, maybe saving it to the datastore. (We have a lot of customers that do this.) The client receives a message, but before it can save it, it goes down. Upon coming up again, it has no awareness of the lost message. Will it be delivered again? How? How does the client inform the hub of its state?

Other questions arise: How long should you track clients before you unsubscribe them? And so on.

There are just so many similar scenarios that add a lot of complexity and, I would argue, at cloud scale, far greater inefficiencies to the system.

With the polling scenario, the work is split between the server and the client. The server keeps track of the messages. The client keeps track of its own state (what was the last message received? etc.). It's scalable and, I would argue, more efficient, because it allows the client to track state if it wants to, when it wants to, how it wants to. On the server end, statelessness means that each pubsub node is a carbon copy of another: if one goes down, another can replace it with no problem, and there is no need to transfer state. What's more, the memory a node spends on tracking clients is constant, no matter how many clients are hitting it.
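Here is a minimal sketch of that split, assuming the messages live in a shared append-only log and the client's state is just a marker (the number of messages it has already processed). `MessageLog`, `PollNode`, `PollingClient`, and the `limit` parameter are hypothetical names for illustration, not an existing API.

```python
class MessageLog:
    """Shared, append-only message store that every node reads from."""

    def __init__(self):
        self.messages = []

    def append(self, body):
        self.messages.append(body)


class PollNode:
    """Stateless pubsub node: it keeps no per-client data at all."""

    def __init__(self, log):
        self.log = log

    def poll(self, marker, limit=100):
        # Return up to 'limit' messages after 'marker', plus the new marker.
        batch = self.log.messages[marker:marker + limit]
        return batch, marker + len(batch)


class PollingClient:
    """Client that owns its own state: the marker of what it has seen."""

    def __init__(self):
        self.marker = 0
        self.seen = []

    def poll_once(self, node, limit=100):
        batch, next_marker = node.poll(self.marker, limit)
        self.seen.extend(batch)
        # Advance the marker only after the batch has been processed.
        self.marker = next_marker


log = MessageLog()
for i in range(5):
    log.append(f"event-{i}")

node_a, node_b = PollNode(log), PollNode(log)
client = PollingClient()

client.poll_once(node_a, limit=2)   # gets event-0 and event-1 from node A
# Node A goes away; node B picks up with nothing transferred to it.
client.poll_once(node_b)            # gets event-2 through event-4
```

If the client crashes before advancing its marker, it simply asks for the same range again after restarting, and any node can answer.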

That's not to say that polling is always the right choice. As Mark said, there are a lot of factors to consider. In cases where there are a large number of messages, latencies may increase dramatically. It's just that when we're talking web scale, it is *usually* a good choice.

-jOrGe W.




