openstack team mailing list archive
Message #21017
Re: deal with booting lots of instances simultaneously
Increasing the RPC timeout should help. I have seen this problem in
nova-network in the past. Vish's suggestion sounds good.
Recently we accidentally launched 128 VMs in a customer's production
environment, with zero errors. Their nova-network servers have 12 cores,
several GB of RAM and dual 10G links, so hardware matters, of course.
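Why a longer RPC timeout (and faster hardware) helps when nova-network is the bottleneck can be seen with a deliberately simplified model: if one worker answers calls strictly one at a time, the k-th queued call waits k service times, and every call whose wait exceeds the timeout fails. This is a hypothetical illustration, not nova code. The function name `count_timeouts`, the one-second service time and the timeout values are assumptions chosen for round numbers; in nova releases of that period the relevant setting was, as far as I know, the `rpc_response_timeout` option (default 60 seconds).

```python
"""Toy model of RPC calls queuing behind one busy nova-network worker.

Assumptions (hypothetical, not measured in this thread): the worker handles
requests one at a time in arrival order, each takes a fixed service time,
and all requests arrive at t = 0.
"""


def count_timeouts(num_requests: int, service_time_s: float, timeout_s: float) -> int:
    """Return how many callers give up before their reply arrives."""
    timeouts = 0
    for position in range(1, num_requests + 1):
        # Request k finishes after k service times (it waits for the k-1 ahead of it).
        finish_time = position * service_time_s
        if finish_time > timeout_s:
            timeouts += 1
    return timeouts


if __name__ == "__main__":
    # Hypothetical numbers: 128 simultaneous boots, 1 s of worker time per request.
    for timeout in (60, 120, 180):
        failed = count_timeouts(128, 1.0, timeout)
        print(f"timeout={timeout:>3}s -> {failed} of 128 requests time out")
```

With these assumed numbers the script reports 68, 8 and 0 timeouts for 60, 120 and 180 seconds. The model also shows the trade-off: a longer timeout only lets callers wait out the backlog, while faster hardware (a shorter service time) actually shrinks it.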
My two cents,
Diego
--
Diego Parrilla
CEO, StackOps <http://www.stackops.com/>
www.stackops.com | diego.parrilla@xxxxxxxxxxxx | +34 649 94 43 29 | skype:diegoparrilla
On Tue, Feb 19, 2013 at 10:09 AM, gtt116 <gtt116@xxxxxxx> wrote:
> Hi Diego
>
> Thanks for your reply.
> How many hosts do you have? I have 4 hosts. In this bug,
> https://bugs.launchpad.net/nova/+bug/1094226, N is 20; in my
> environment N is about 16.
>
> I found that nova-network is too busy to handle so many RPC requests at
> the same time. RabbitMQ itself is powerful enough for this scenario.
>
> On 2013-02-19 16:54, Diego Parrilla Santamaría wrote:
>
> Hi gtt,
>
> What does 'lots of instances simultaneously' mean for you? 100, 1000,
> 10000, more?
>
> We have launched more than 100 (but fewer than 1000) instances
> simultaneously without any issue, with RabbitMQ running on a multicore
> machine with several GB of RAM and an out-of-the-box configuration.
>
> Cheers
> Diego
>
> On Tue, Feb 19, 2013 at 9:35 AM, gtt116 <gtt116@xxxxxxx> wrote:
>
>> Hi all,
>>
>> When creating lots of instances simultaneously, many instances end up in
>> ERROR state, most of them because of network RPC request timeouts. This
>> is not very graceful.
>>
>> I think it would be better if the scheduler kept a queue of create
>> requests. When it finds that all hosts are busy enough
>> (compute_node.current_workload reaches some value), it should temporarily
>> stop casting requests to hosts until it finds a host that is free enough.
>> This way, booting lots of instances simultaneously results in active
>> instances rather than lots of ERROR instances. There is one small weak
>> point: if the threshold for current_workload is too small, instance
>> creation will be slow.
>>
>> Do you have another quick fix?
>>
>> Thanks,
>>
>> --
>> best regards,
>> gtt
>>
>>
>
>
> --
> best regards,
> gtt
>
>
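The scheduler-side queue gtt proposes in the first message of the thread can be sketched as a FIFO of pending create requests that is drained only while some host's workload is under a threshold. This is a hypothetical model, not nova's scheduler: only `compute_node.current_workload` comes from the thread, while `Host`, `ThrottlingScheduler`, `max_workload`, `submit`, `dispatch` and `finished` are names invented for illustration, and picking the least-loaded host is an assumed placement policy.

```python
"""Hypothetical sketch of the throttled create-request queue from this thread.

Not nova code: each host exposes a current_workload counter (modelled on
compute_node.current_workload), and requests are cast only to a host whose
workload is below max_workload; the rest wait in a FIFO queue.
"""
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    current_workload: int = 0


@dataclass
class ThrottlingScheduler:
    hosts: list
    max_workload: int  # hypothetical threshold ("some value" in the proposal)
    pending: deque = field(default_factory=deque)

    def submit(self, request_id: str) -> list:
        """Queue a create request, then cast whatever current capacity allows."""
        self.pending.append(request_id)
        return self.dispatch()

    def dispatch(self) -> list:
        """Cast queued requests to the least-loaded host while any host has room."""
        cast = []
        while self.pending:
            host = min(self.hosts, key=lambda h: h.current_workload)
            if host.current_workload >= self.max_workload:
                break  # every host is busy: hold the rest of the queue
            request_id = self.pending.popleft()
            host.current_workload += 1
            cast.append((request_id, host.name))
        return cast

    def finished(self, host_name: str) -> list:
        """Record a completed build on host_name, then drain the queue again."""
        for host in self.hosts:
            if host.name == host_name:
                host.current_workload -= 1
                break
        return self.dispatch()


if __name__ == "__main__":
    sched = ThrottlingScheduler(hosts=[Host("node1"), Host("node2")], max_workload=2)
    for i in range(6):
        print(f"vm-{i}:", sched.submit(f"vm-{i}"))
    print("waiting:", list(sched.pending))  # ['vm-4', 'vm-5']
    print("after node1 frees a slot:", sched.finished("node1"))  # [('vm-4', 'node1')]
```

The weak point gtt mentions shows up directly in this model: a lower `max_workload` leaves more requests waiting in `pending`, so total boot time grows. The queue avoids the RPC timeouts by not casting work a busy host cannot absorb; it does not add capacity.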