
Re: Some insight into the number of instances Nova needs to spin up...

 

Jay -

Few comments below...

Erik

On 12/30/10 10:43 AM, "Jay Pipes" <jaypipes@xxxxxxxxx> wrote:

>On Wed, Dec 29, 2010 at 2:27 PM, Erik Carlin <erik.carlin@xxxxxxxxxxxxx>
>wrote:
>> We know Amazon is highly, highly elastic.  While the number of instances
>> launched per day is impressive, we know that many of those instances have
>> a short life.
>
>OK, good point.  But, this begs the question: what should Nova's
>priority be?  Elasticity -- in other words, being able to quickly spin
>up and down hundreds of thousands of instances per day?  Or
>manageability at large scale -- in other words, a system that is easy
>to administer at hundreds of thousands of physical nodes?  Or pure
>scalability on the user end -- meaning, given a specific installation
>of applications on a given type of instance (say, m1.large), what is
>the pattern of throughput for that set of applications as the size of
>the grid increases to hundreds of thousands of physical nodes?
>
>Or do we take an ambivalent position on the above and go for some sort
>of "general scalability"?

I think we should quantify target absolute limits and rate limits and seek
to meet them.  I proposed a set in the email I just sent that combines what
we know about EC2 and CS today.  I would love to know what others think.

>
>> I see Guy is now teaming up with CloudKick on this report.  The EC2
>> instance ID enables precise measurement of instances launched, and
>> CloudKick provides some quantitative measure of lifetime of instances.
>> Last time I checked, those numbers were something like 3% of EC2
>> instances launched via CK were still running (as a point of reference,
>> something like 80% of Rackspace cloud servers were still running).
>
>I see this as tangential at best, and mostly a localized issue with
>Rackspace Cloud Servers, and not something that is inherently
>important to Nova.  Let me explain.  IMHO, there are two big reasons
>why there is a lower instance "churn rate" on Cloud Servers than on EC2:
>
>1) Different level/type of customers
>
>RS Cloud Servers tends to attract a more "corporate" or "enterprisey"
>type of customer.  These customers tend to deploy applications into
>the RS Cloud with more permanent patterns.  Applications like
>departmental or financial applications don't tend to "disappear" or be
>experimental.
>
>2) Application bursting/overflow capacity
>
>Perhaps more important than the type of customer RS Cloud Servers
>attracts, I think many people believe that automating capacity
>bursting into the RS Cloud is more difficult than EC2 (possibly due to
>a larger feature set in the EC2 API for managing groups of servers/IP
>ranges?), and that may contribute to the lower instance churn rate.
>Due to EC2's elasticity (and "hackability" as termie would call
>it...), companies are better able to programmatically spin up
>instances to offload peak traffic from web applications and then spin
>those instances down after traffic subsides.  If RS Cloud Servers had
>better hackability, I think you'd see the RS Cloud churn rate increase
>dramatically.
>
>These are just my thoughts, though.  I'd be interested to hear what
>others' opinions on this are.

The lifetime estimates are relevant in translating the EC2 rate numbers
(which we can quantifiably measure) into absolute host estimates.  I agree
the CS info is somewhat superfluous - I provided it more as an FYI.  The
primary reason for the instance lifetime difference between EC2 and CS,
IMHO, is the persistent vs. ephemeral nature of the VMs.  Coupled with
spot instances, cluster compute instances, cluster GPU instances, etc.,
EC2 becomes a very nice platform for transitory workloads like map/reduce,
Monte Carlo simulations, video encoding, and protein folding.  I actually
think there is an equal number of "enterprisey" customers on both.  I
agree we can and should increase the "hackability" of the CS/OS API.  IMO,
the goal of nova should be to combine strong elasticity and hackability
properties with persistence, so that you can run either transitory or
persistent workloads well.  If we can achieve that, nova will meet the
needs of a large and diverse set of target clouds.
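
As a rough back-of-the-envelope illustration of that rate-to-host
translation, a sketch along the lines of Little's Law (steady-state
concurrent instances ~= launch rate x average lifetime) is below.  The
numbers are placeholders for illustration, not measured EC2 or CS figures:

    # Rough translation of launch rate + average lifetime into host counts.
    # All inputs below are illustrative placeholders, not measured data.

    def estimate_hosts(launches_per_day, avg_lifetime_days, instances_per_host):
        """Little's Law: steady-state concurrent instances = arrival rate * lifetime."""
        concurrent = launches_per_day * avg_lifetime_days
        hosts = concurrent / float(instances_per_host)
        return concurrent, hosts

    # Hypothetical example: 80,000 launches/day, 2-day average lifetime,
    # 20 instances packed per physical host.
    instances, hosts = estimate_hosts(80000, 2.0, 20)
    print("~%d concurrent instances on ~%d hosts" % (instances, hosts))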
 

>
>> To meet the elasticity demands of EC2, nova would need to support a high
>> change rate of adds/deletes (not to mention state polling, resizes, etc.).
>> Is there a nova change rate target as well, or just a physical host limit?
>> The 1M host limit still seems reasonable to me.  Large-scale deployments
>> will break into regions, where each region is an independent nova
>> deployment with its own 1M host limit.
>
>This change rate is something that should be tracked in the continuous
>integration project.
>
>Cheers!
>-jay
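
For the change-rate tracking mentioned above, a minimal sketch of what a
CI measurement harness might look like is below.  The create_instance /
delete_instance functions are hypothetical stand-ins for the real nova
API calls, and the cycle count is arbitrary:

    import time

    # Hypothetical stand-ins for real nova API client calls.
    def create_instance():
        return "fake-instance-id"

    def delete_instance(instance_id):
        pass

    def measure_change_rate(n_cycles=100):
        """Time n create/delete cycles; return sustained operations per second."""
        start = time.time()
        for _ in range(n_cycles):
            instance_id = create_instance()
            delete_instance(instance_id)
        elapsed = max(time.time() - start, 1e-9)  # guard against zero with stub calls
        return (2 * n_cycles) / elapsed           # adds + deletes per second

    print("sustained change rate: %.1f ops/sec" % measure_change_rate())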


