openstack team mailing list archive

Re: Canonical AWSOME

 



On Monday, April 23, 2012 at 4:00 PM, Joshua Harlow wrote:

> How are REST endpoints not reliable or scalable ;-)
>  
> I’d like to know, seeing as the web is built on them :-)
The resiliency of the internet is actually built on BGP. REST endpoints fall over constantly. Look no further than Google 500 errors, the fail-whale, etc. Even the EC2 API has been known to fall over. Making HTTP services reliable is not as trivial as it should be, because each endpoint is a single point of failure.

It is possible, by running many services with intelligent load balancing and failover, to make REST reasonably reliable. However, I'd rather not broker my requests through a questionably reliable REST broker. I'd rather send messages directly to their destinations, to RPC consumers which are already running (and required) on those machines. If the destination is offline, it doesn't need my message. If the REST broker is offline, the recipient on the other end of that broker should still be guaranteed delivery…

The problem can be simplified as:
* How many REST endpoints do you need to service 100 compute machines? How many REST endpoints do you need to service 1000000 compute machines? How many points of failure exist?
* How many compute machines do you need to service 100 compute machines? How many compute machines do you need to service 1000000 compute machines?  How many points of failure exist?

It is unclear how many REST endpoints you'll need. The compute machines scale as they scale; they're not dependent on a REST broker. Any compute machine can fail, but that failure is likely trivial (messages to a dead machine are generally in vain). Meanwhile, the REST service has to deal with dead compute machines *and* the death of the REST services supporting the architecture.
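
To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. It assumes failures are independent and that a load balancer can always reach any REST endpoint that is still up. The function names and the availability figures are made up for illustration, not measured from any deployment.

```python
def direct_delivery(p_host_up: float) -> float:
    """Direct RPC: the only thing that can fail is the destination host."""
    return p_host_up


def broker_tier_up(p_endpoint_up: float, n_endpoints: int) -> float:
    """Probability that at least one of n independent REST endpoints is up."""
    return 1.0 - (1.0 - p_endpoint_up) ** n_endpoints


def brokered_delivery(p_host_up: float, p_endpoint_up: float, n_endpoints: int) -> float:
    """Brokered path: the broker tier AND the destination host must both be up."""
    return broker_tier_up(p_endpoint_up, n_endpoints) * p_host_up


if __name__ == "__main__":
    p_host = 0.999      # hypothetical availability of one compute machine
    p_endpoint = 0.99   # hypothetical availability of one REST endpoint
    print("direct:", direct_delivery(p_host))
    for n in (1, 2, 3):
        print(f"brokered, {n} endpoint(s):", brokered_delivery(p_host, p_endpoint, n))
```

Under these assumptions the brokered path can approach the direct path as endpoints are added, but it never beats it, and every message lost to a dead broker tier was addressed to a machine that may well have been alive to receive it.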

--  
Eric Windisch
