← Back to team overview

openstack team mailing list archive

Re: nova-volumes problem after host reboot

 

Hello,

I am still working on this issue. I can not apply the disaster recovery as
describe here:
http://docs.openstack.org/trunk/openstack-compute/admin/content/nova-disaster-recovery-process.html

Thanks to livemoon, I can get the instances back running following his
tips. By the way, I have put that in a shell script below, to make it easy
to run the procedure.

To run, you should type:
restore-instance <instance name>

After that, we have the instance running and the database status update.

The volume attachment still is missing. I can get it to work after a reboot
in the host.
As I said in a previous email, the attach was reporting and error that
device /dev/vdc was already in use (what is not the case.. it must be a bug
or something in the code).
I changed the device to /dev/vde and it accepts and submit the command, but
does not attach the device. Logs are below.

Hope someone - including you livemoon :)  - still has something else to say
about this.

Have anyone of you guys tested this before? Does it work for you?

Cheers.

nova volume-attach c5cf37e2-9e96-45a2-a739-638ac9877128
1f590986-995e-4cf6-bbe7-d8ced6672990 /dev/vde

The compute.log:
2012-11-18 20:28:54 AUDIT nova.compute.manager
[req-4d0e4f88-be19-4788-9c21-c280a77173fc cb302c58bb4245cebc61e132c79c1111
768bd68a0ac149eb8e300665eb3d3950] [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128] Attaching volume
1f590986-995e-4cf6-bbe7-d8ced6672990 to /dev/vde
2012-11-18 20:29:43 AUDIT nova.compute.resource_tracker [-] Free ram (MB):
13061
2012-11-18 20:29:43 AUDIT nova.compute.resource_tracker [-] Free disk (GB):
253
2012-11-18 20:29:43 AUDIT nova.compute.resource_tracker [-] Free VCPUS: -21
2012-11-18 20:29:43 INFO nova.compute.resource_tracker [-] Compute_service
record updated for nova-controller
2012-11-18 20:29:54 ERROR nova.openstack.common.rpc.impl_qpid [-] Timed out
waiting for RPC response: None
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid Traceback
(most recent call last):
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 376, in ensure
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid     return
method(*args, **kwargs)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 425, in _consume
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid
nxt_receiver = self.session.next_receiver(timeout=timeout)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid   File
"<string>", line 6, in next_receiver
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid   File
"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 663,
in next_receiver
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid     raise
Empty
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid Empty: None
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.impl_qpid
2012-11-18 20:29:54 ERROR nova.compute.manager
[req-4d0e4f88-be19-4788-9c21-c280a77173fc cb302c58bb4245cebc61e132c79c1111
768bd68a0ac149eb8e300665eb3d3950] [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128] Failed to connect to volume
1f590986-995e-4cf6-bbe7-d8ced6672990 while attaching at /dev/vde
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128] Traceback (most recent call last):
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1956, in
_attach_volume
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     connector)
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/volume/api.py", line 60, in wrapped
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     return func(self, context,
target_obj, *args, **kwargs)
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/volume/api.py", line 378, in
initialize_connection
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     "connector": connector}})
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/__init__.py",
line 102, in call
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     return _get_impl().call(cfg.CONF,
context, topic, msg, timeout)
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 561, in call
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]
rpc_amqp.get_connection_pool(conf, Connection))
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
368, in call
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     rv = list(rv)
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
329, in __iter__
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     self.done()
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     self.gen.next()
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
326, in __iter__
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     self._iterator.next()
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 434, in iterconsume
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     yield
self.ensure(_error_callback, _consume)
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 380, in ensure
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     error_callback(e)
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 419, in _error_callback
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]     raise rpc_common.Timeout()
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128] Timeout: Timeout while waiting on RPC
response.
2012-11-18 20:29:54 TRACE nova.compute.manager [instance:
c5cf37e2-9e96-45a2-a739-638ac9877128]
2012-11-18 20:29:54 ERROR nova.openstack.common.rpc.amqp [-] Exception
during message handling
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp Traceback (most
recent call last):
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
275, in _process_data
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     rval =
self.proxy.dispatch(ctxt, version, method, **args)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py",
line 145, in dispatch
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     return
getattr(proxyobj, method)(ctxt, **kwargs)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/exception.py", line 117, in wrapped
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     temp_level,
payload)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/exception.py", line 92, in wrapped
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     return
f(*args, **kw)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 181, in
decorated_function
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     pass
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 167, in
decorated_function
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     return
function(self, context, *args, **kwargs)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 202, in
decorated_function
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp
kwargs['instance']['uuid'], e, sys.exc_info())
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 196, in
decorated_function
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     return
function(self, context, *args, **kwargs)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1945, in
attach_volume
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     context,
instance.get('uuid'), mountpoint)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1941, in
attach_volume
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     mountpoint,
instance)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1963, in
_attach_volume
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp
self.volume_api.unreserve_volume(context, volume)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1956, in
_attach_volume
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     connector)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/volume/api.py", line 60, in wrapped
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     return
func(self, context, target_obj, *args, **kwargs)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/volume/api.py", line 378, in
initialize_connection
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     "connector":
connector}})
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/__init__.py",
line 102, in call
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     return
_get_impl().call(cfg.CONF, context, topic, msg, timeout)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 561, in call
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp
rpc_amqp.get_connection_pool(conf, Connection))
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
368, in call
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     rv = list(rv)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
329, in __iter__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.done()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     self.gen.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line
326, in __iter__
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp
self._iterator.next()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 434, in iterconsume
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     yield
self.ensure(_error_callback, _consume)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 380, in ensure
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp
error_callback(e)
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp   File
"/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py",
line 419, in _error_callback
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp     raise
rpc_common.Timeout()
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp Timeout: Timeout
while waiting on RPC response.
2012-11-18 20:29:54 TRACE nova.openstack.common.rpc.amqp

-------  xx ------ restore-instance -------- xx ------
#!/bin/sh

NOVAINSTDIR=/var/lib/nova/instances
VIRXML=libvirt.xml

get_instanceid_in_xml() {

 INSTANCEDIR=$1
 if [ -f $INSTANCEDIR/$VIRXML ];then
    UUID=$(grep "<uuid>" $INSTANCEDIR/$VIRXML | /bin/cut -f2 -d">" |
/bin/cut -f1 -d"<" )
    echo $UUID
 fi

}

search_instanceid_in_xml() {
 UUID=$1
 FOUND=false
 set $(ls -d $NOVAINSTDIR/instance*)
 while [ $# -gt 0 -a $FOUND == false ]
 do
   UUIDXML=$(get_instanceid_in_xml $1)
   if [ "$UUID" == "$UUIDXML" ];then
      FOUND=true
   else
      shift
   fi
 done

 if [ $FOUND == true ];then
    echo $1
 else
    echo "1"
 fi

}

get_instanceid_in_db() {
 INSTANCENAME=$1
 UUID=$(/usr/bin/nova list | /bin/grep -w "$INSTANCENAME" | /bin/cut -f2
-d"|")
 echo $UUID
}

reset_instance_status_db() {
 INSTUUID=$1
 echo "MySQL Password:"
 read PW
 mysql -unova -p$PW nova -e "update instances set power_state='1',
vm_state='active', task_state=NULL where uuid='$INSTUUID'"
}


INSTUID=$(get_instanceid_in_db $1)
if [ "$INSTUID" == "" ];then
   echo "Instance not found"
else
   INSTDIR=$(search_instanceid_in_xml $INSTUID)
   if [ "$INSTDIR" == "" ];then
      echo "Instance not found"
   else
      INSTIDNAME=$(echo $INSTDIR | /bin/awk -F "/" '{print $NF}')
      /usr/bin/sudo /usr/bin/virsh undefine $INSTIDNAME --managed-save
      /usr/bin/sudo /usr/bin/virsh define $INSTDIR/$VIRXML
      /usr/bin/sudo /usr/bin/virsh start $INSTIDNAME
      reset_instance_status_db "$INSTUID"
   fi
fi
-------  xx ------ restore-instance -------- xx ------


On 10 November 2012 17:25, Ronivon Costa <ronivon.costa@xxxxxxxxx> wrote:

> Hi,
>
> Had some improvement with this issue.
>
> Could boot the instance using virsh, following livemoon advice with small
> adaptation.
> However the problem still is not fixed.
>
> The volume table was update:
>   mysql -unova -p$PW nova -e "update volumes set mountpoint=NULL,
> attach_status='detached', instance_uuid=0"
>   mysql -unova -p$PW nova -e "update volumes set status='available' where
> status <> 'error_deleting'"
>
>
> Restarted the instance:
> # virsh undefine instance-00000038
> error: Refusing to undefine while domain managed save image exists
>
> # virsh undefine instance-00000038 --managed-save
> Domain instance-00000038 has been undefined
>
> # virsh define libvirt.xml
> Domain instance-00000038 defined from libvirt.xml
>
> # virsh start instance-00000038
> Domain instance-00000038 started
>
> Then I update the database with the new instances status:
> # mysql -unova -p nova -e "update instances set
> power_state='1',vm_state='active',task_state=NULL where uuid =
> '7e732b31-2ff8-4cf2-a7ac-f1562070cfb3'"
>
> I can now connect to the instance. That is a great improvement from my
> original problem.
>
> But there still some serius issues to fix:
>
> The instance can not be rebooted (hard reboot). It will not start, with
> the same errors as before.
> Also, we can not attach the volume back to the instance:
> # nova volume-attach 7e732b31-2ff8-4cf2-a7ac-f1562070cfb3
> 647db677-aa48-4d1e-b875-80be73469cb5 /dev/vdc
> ERROR: The supplied device path (/dev/vdb) is in use.
> ...
> The error is: DevicePathInUse: The supplied device path (/dev/vdb) is in
> use.
>
>
> /dev/vdb is one ephemeral disk. Why nova is trying to use /dev/vdb when I
> specified /dev/vdc ?
>



-- 
--
Ronivon C. Costa
IBM Certified for Tivoli Software
ITIL V3 Certified
Tlm: (+351) 96 676 4458
Skype: ronivon.costa
BLog ((hosted in my own personal cloud infrastructure):
http://cloud0.dyndns-web.com/blog/

Follow ups

References