openstack team mailing list archive

Re: Some insight into the number of instances Nova needs to spin up...

To: Erik Carlin <erik.carlin@xxxxxxxxxxxxx>
From: Jay Pipes <jaypipes@xxxxxxxxx>
Date: Thu, 30 Dec 2010 11:43:51 -0500
Cc: "openstack@xxxxxxxxxxxxxxxxxxx" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <17155_1293650835_oBTJR9L7023806_C940E4A0.73DC%erik.carlin@rackspace.com>

On Wed, Dec 29, 2010 at 2:27 PM, Erik Carlin <erik.carlin@xxxxxxxxxxxxx> wrote:
> We know Amazon is highly, highly elastic.  While the instances launched
> per day is impressive, we know that many of those instances have a short
> life.

OK, good point.  But this raises the question: what should Nova's
priority be?  Elasticity -- in other words, being able to quickly spin
up and down hundreds of thousands of instances per day?  Or
manageability at large scale -- in other words, a system that is easy
to administer at hundreds of thousands of physical nodes?  Or pure
scalability on the user end -- meaning, given a specific installation
of applications on a given type of instance (say, m1.large), what is
the pattern of throughput for that set of applications as the size of
the grid increases to hundreds of thousands of physical nodes?

Or do we take an ambivalent position on the above and go for some sort
of "general scalability"?

> I see Guy is now teaming up with CloudKick on this report.  The EC2
> instance ID enables precise measurement of instances launched, and
> CloudKick provides some quantitative measure of lifetime of instances.
> Last time I checked, those numbers were something like 3% of EC2
> instances launched via CK were still running (as a point of reference,
> something like 80% of Rackspace cloud servers were still running).

I see this as tangential at best, and mostly a localized issue with
Rackspace Cloud Servers, and not something that is inherently
important to Nova.  Let me explain.  IMHO, there are two big reasons
why the instance "churn rate" is lower on Cloud Servers than on EC2:

1) Different level/type of customers

RS Cloud Servers tends to attract a more "corporate" or "enterprisey"
type of customer.  These customers tend to deploy applications into
the RS Cloud with more permanent patterns.  Applications like
departmental or financial applications don't tend to "disappear" or be
experimental.

2) Application bursting/overflow capacity

Perhaps more important than the type of customer RS Cloud Servers
attracts, I think many people believe that automating capacity
bursting into the RS Cloud is more difficult than on EC2 (possibly due to
a larger feature set in the EC2 API for managing groups of servers/IP
ranges?), and that may contribute to the lower instance churn rate.
Due to EC2's elasticity (and "hackability" as termie would call
it...), companies are better able to programmatically spin up
instances to offload peak traffic from web applications and then spin
those instances down after traffic subsides.  If RS Cloud Servers had
better hackability, I think you'd see the RS Cloud churn rate increase
dramatically.
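
To make the bursting pattern concrete, here is a minimal sketch of the
decision a client script might make before calling a cloud API.  The
function names, the per-instance capacity figure and the load numbers
are all hypothetical; nothing here is part of the EC2 or Cloud Servers
API, and the actual launch/terminate calls are deliberately left out.

```python
import math


def instances_needed(requests_per_sec, capacity_per_instance, min_instances=1):
    """Return how many instances are needed to serve the given load.

    capacity_per_instance is an assumed, measured-elsewhere figure for how
    many requests per second one instance can handle.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Never scale below the configured floor, even with zero traffic.
    return max(min_instances, needed)


def scaling_delta(current, requests_per_sec, capacity_per_instance, min_instances=1):
    """Positive result: instances to spin up.  Negative: instances to spin down."""
    target = instances_needed(requests_per_sec, capacity_per_instance, min_instances)
    return target - current
```

With 2 instances running and an assumed 100 requests/sec per instance,
a peak of 950 requests/sec gives scaling_delta(2, 950, 100) == 8; once
traffic falls back to 120 requests/sec, scaling_delta(10, 120, 100) ==
-8.  Every such cycle adds to the provider's instance churn.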

These are just my thoughts, though.  I'd be interested to hear what
others' opinions on this are.

> To meet the elasticity demands of EC2, nova would need to support a high
> change rate of adds/deletes (not to mention state polling, resizes, etc).
> Is there a nova change rate target as well or just a physical host limit?
> The 1M host limit still seems reasonable to me.  Large scale deployments
> will break into regions where each region is an independent nova
> deployment that each has a 1M host limit.

This change rate is something that should be tracked in the continuous
integration project.
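
As a rough illustration of what tracking a change rate could mean, the
sketch below computes adds and deletes per second from a list of
events.  The event format (timestamp, kind) and the function name are
assumptions made for this example; they do not describe any existing
Nova or CI tooling.

```python
from collections import Counter


def change_rate(events, window_seconds):
    """Return adds/sec and deletes/sec over a measurement window.

    events is an iterable of (timestamp_seconds, kind) tuples, where kind
    is "add" or "delete".  The timestamps are not used here; the caller is
    assumed to have already selected events inside the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = Counter(kind for _, kind in events)
    return {kind: counts[kind] / window_seconds for kind in ("add", "delete")}
```

A CI job could record these figures per run and flag a regression when
the sustained rate drops below whatever target is eventually agreed on.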

Cheers!
-jay

Follow ups

Re: Some insight into the number of instances Nova needs to spin up...
From: Erik Carlin, 2010-12-30

References

Some insight into the number of instances Nova needs to spin up...
From: Jay Pipes, 2010-12-29
Re: Some insight into the number of instances Nova needs to spin up...
From: Erik Carlin, 2010-12-29