
openstack team mailing list archive

how to deal with failed compute node

 

Hi guys,

Today, when an instance is terminated, nova-api checks whether the nova-compute service on its host is alive. If nova-compute is dead, nova-api just deletes the instance from the database, but does not release its fixed IP, floating IP, volumes, etc. When the failed nova-compute starts again, it will find the erroneously running instance and clean it up. But until nova-compute is restarted, the resources associated with the dead VM cannot be used; for example, its fixed IP cannot be associated with another VM.

So I found a way to clean up these resources quickly. If nova-api finds that nova-compute is dead, it picks another nova-compute that is alive. Although the live nova-compute is not the real host of the VM, it can clean up the resources in the database, and even the network, by making an RPC call to nova-network. It may raise some exceptions, but it works. What do you think about this?
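
To make the idea concrete, here is a rough sketch in plain Python. It is not actual nova code: ComputeService, Instance, pick_cleanup_host and terminate are hypothetical names I made up for illustration, and the real cleanup would go through the database and an RPC call to nova-network rather than clearing lists in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeService:
    """Hypothetical stand-in for a nova-compute service record."""
    host: str
    alive: bool


@dataclass
class Instance:
    """Hypothetical stand-in for an instance and its resources."""
    uuid: str
    host: str
    fixed_ips: list = field(default_factory=list)
    floating_ips: list = field(default_factory=list)
    volumes: list = field(default_factory=list)


def pick_cleanup_host(instance, services):
    """Return the host that should handle termination of the instance.

    Prefer the instance's own host. If its nova-compute is dead,
    fall back to any other nova-compute that is alive.
    """
    by_host = {s.host: s for s in services}
    owner = by_host.get(instance.host)
    if owner is not None and owner.alive:
        return owner.host
    alive = sorted(s.host for s in services if s.alive)
    return alive[0] if alive else None


def terminate(instance, services):
    """Release the instance's resources via the chosen host.

    Returns (host, released). If no nova-compute is alive at all,
    nothing is released and the resources stay allocated.
    """
    host = pick_cleanup_host(instance, services)
    if host is None:
        return None, []
    released = instance.fixed_ips + instance.floating_ips + instance.volumes
    # In real code this step would update the database and ask
    # nova-network to tear down the addresses (assumption).
    instance.fixed_ips, instance.floating_ips, instance.volumes = [], [], []
    return host, released
```

For example, with node1 dead and node2 alive, terminating an instance that lives on node1 would be handled by node2, and its fixed IP and volume become free again right away.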

Why do we run many nova-compute and nova-network services? I think one reason is that when one node fails, another can do some of its work.

Best regards,
gtt


