openstack team mailing list archive

detecting errors when determining libvirt vm power state

 

Hi,

I'm looking at what first manifested as a bug when launching multiple
lxc containers simultaneously, e.g. 'euca-run-instances -n 4', as
reported at https://bugs.launchpad.net/ubuntu/+source/nova/+bug/842845.

The problem appears to be in nova's call to self.driver.get_info().
Libvirt can raise exceptions on this for several reasons: the vm could be
bad or not exist, or it could be in a transient state, e.g. cgroups are
not set up yet.

What is the right way to handle this?  Should the drivers categorize
their exceptions as either 'broken' or 'transient', so that nova can
detect the former and bail, and retry on the latter?
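
To make the question concrete, here is a minimal sketch of what such a
split could look like.  The exception classes and the retry helper are
hypothetical names invented for illustration; they are not existing nova
or libvirt APIs, and the attempt count and delay are arbitrary.

```python
import time


class TransientDriverError(Exception):
    """Hypothetical: the vm may become queryable soon (e.g. cgroups not set up yet)."""


class BrokenInstanceError(Exception):
    """Hypothetical: the vm is bad or does not exist, so retrying will not help."""


def get_info_with_retry(get_info, instance_name, attempts=5, delay=0.5,
                        sleep=time.sleep):
    """Call get_info(instance_name), retrying only on transient errors.

    get_info stands in for something like self.driver.get_info; it is
    assumed to raise one of the two hypothetical exceptions above.
    """
    for attempt in range(1, attempts + 1):
        try:
            return get_info(instance_name)
        except TransientDriverError:
            if attempt == attempts:
                raise  # still failing on the last attempt: give up
            sleep(delay)
        # BrokenInstanceError, and anything else, propagates immediately
```

With something like this, the caller would only have to decide what
"bail" means for a broken instance; the driver would own the decision
about which libvirt errors count as transient.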

Note that while the bug was raised for lxc, I suspect the same is
possible with kvm guests.  However, the qemu GetInfo method doesn't
get its cpu/mem usage info from cgroups, so it would not happen in
exactly the same way.

-serge

