
openstack team mailing list archive

Re: Push vs Polling (from Versioning Thread)

 

Just to be clear we are talking about APIs fit for customer consumption here, not internal integrations where both ends are under our control.

On 10/27/2011 11:38 AM, George Reese wrote:
I disagree. The web was designed specifically to solve the distributed scaling problem, and it's based on HTTP polling. It scales pretty well. The argument that polling doesn't scale inevitably neglects the proper use of caching.
The web was not designed to deal with a bunch of clients needing to
know about infrastructure changes the instant they happen.
Neither physics nor math were designed for that either. The CAP theorem simply doesn't allow a distributed system with an uptime guarantee to communicate changes "the instant they happen". The sooner you realize that the best your clients can hope for is eventual consistency, the sooner you'll realize that polling is just fine.

BTW, here's Roy Fielding's article on this subject of poll vs push.
http://roy.gbiv.com/untangled/2008/paper-tigers-and-hidden-dragons
And API data should not be cached. The Rackspace API used to do that,
and it created a mess.
I'm not sure what you are referring to, but this is a classic strawman. Somebody implemented a "mess" using caching, so caching is bad!? You didn't say what the mess was, so there's no way to even evaluate your statement.
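
To make "using caching properly" concrete, here is a rough sketch of polling with a conditional GET, simulated in memory. The HTTP pieces it imitates (ETag, If-None-Match, 304 Not Modified) are standard, but the Resource and PollingClient classes and the "state=..." bodies are hypothetical names made up for illustration; this is not how any particular OpenStack or Rackspace API works.

```python
import hashlib


class Resource:
    """Hypothetical server-side resource that supports conditional GET."""

    def __init__(self, body):
        self.body = body
        self.full_responses = 0  # how many times the full body was sent

    def etag(self):
        # The validator is derived from the current representation.
        return hashlib.sha256(self.body.encode()).hexdigest()

    def get(self, if_none_match=None):
        tag = self.etag()
        if if_none_match == tag:
            # Nothing changed: reply 304 with no body.
            return 304, tag, None
        self.full_responses += 1
        return 200, tag, self.body


class PollingClient:
    """Hypothetical client that remembers the last validator it saw."""

    def __init__(self, resource):
        self.resource = resource
        self.etag = None
        self.body = None

    def poll(self):
        status, tag, body = self.resource.get(if_none_match=self.etag)
        if status == 200:
            self.etag, self.body = tag, body
        return status


server = Resource("state=BUILD")
client = PollingClient(server)

statuses = [client.poll(), client.poll()]  # first fetch, then an unchanged poll
server.body = "state=ACTIVE"               # the resource changes on the server
statuses += [client.poll(), client.poll()] # change picked up, then unchanged again
print(statuses)
```

Only two of the four polls carry a body. The server keeps no per-client state, and the same validator lets any intermediary cache answer the unchanged polls.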
Push doesn't scale because it requires the server to know about every client and track conversational state with them.
No, it doesn't. You push changes as they occur to a message queue. A
separate system tracks subscribers and sends them out. There is no
conversational state if done right.

A "separate" system? That's why you think it's simple -- you push the hard part outside of your box and claim victory. It's not a separate system, it's all one big cloud. If there are N interested clients the process you described requires O(N) resources. Moving it to another tier means it's somebody else's O(N) resources. You are illustrating Fielding's point in the article above: "People need to understand that general-purpose PubSub is not a solution to scalability problems — it simply moves the problem somewhere else, and usually to a place that is inversely supported by the economics. "

How exactly does this separate system know where to "send them out" to? Each client has to tell it, and you have to store that and look it up for every outbound message. And keep it accurate. Customers just love keeping you informed of where they want to send their messages. Do you know what happens when they forget to tell you they moved and they don't get the message? They blame you and ask for a credit memo. And do you know what happens when you tell them no? They go to your competitor. If there is no conversational state, then you aren't waiting for an acknowledgement from the other side for each message, and you can't prove that it was delivered or even try again.
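
A toy sketch of the bookkeeping described above, assuming a hypothetical PushBroker that fans each change out to registered endpoints. All names here are invented for illustration, and endpoints are plain callables that return True when they acknowledge a message; a real broker would be making network calls.

```python
class PushBroker:
    """Hypothetical fan-out tier: one delivery attempt per subscriber per message."""

    def __init__(self):
        self.subscribers = {}       # client id -> endpoint (state we must store)
        self.unacked = set()        # (client id, message id) pairs awaiting an ack
        self.delivery_attempts = 0  # total outbound sends so far

    def subscribe(self, client_id, endpoint):
        # The client has to tell us where to send, and keep that current.
        self.subscribers[client_id] = endpoint

    def publish(self, message_id, payload):
        # O(N) work per change, regardless of which tier runs this loop.
        for client_id, endpoint in self.subscribers.items():
            self.delivery_attempts += 1
            self.unacked.add((client_id, message_id))
            if endpoint(message_id, payload):
                self.unacked.discard((client_id, message_id))


def reachable(message_id, payload):
    return True   # endpoint received and acknowledged the message


def moved_away(message_id, payload):
    return False  # stale address: nothing acknowledges


broker = PushBroker()
for i in range(3):
    broker.subscribe(f"client-{i}", reachable)
broker.subscribe("client-stale", moved_away)

broker.publish("m1", "server 42 is ACTIVE")
print(broker.delivery_attempts, sorted(broker.unacked))
```

Dropping the unacked set removes the conversational state, but then the broker has no record that "client-stale" never got "m1" and nothing to retry.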


