[Bug 1367964] [NEW] Unable to recover from timeout of detaching cinder volume
Public bug reported:
When cinder-volume is under heavy load, the terminate_connection RPC call for a cinder volume may take longer than the RPC timeout.
When the timeout occurs, nova gives up on detaching the volume and resets the volume state to 'in-use', but it does not reattach the volume.
This leaves the DB in an inconsistent state:
(1) libvirt has already detached the volume from the instance
(2) the cinder volume has been disconnected from the host by the terminate_connection RPC (but nova doesn't know this because of the timeout)
(3) the nova.block_device_mapping entry still remains because of the timeout in (2)
and the volume becomes impossible to re-attach or to detach completely, as sketched below.
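To make the ordering concrete, here is a minimal sketch of the detach path (illustrative only; the real logic lives in nova/compute/manager.py and the names and signatures of driver, volume_api and bdm are simplified stand-ins for the real objects):

    def detach_volume_sketch(context, instance, bdm, connector,
                             driver, volume_api):
        # (1) hypervisor detach succeeds: libvirt no longer sees the disk
        driver.detach_volume(instance, bdm.device_name)

        # (2) RPC call to cinder; under heavy load this can exceed the
        #     RPC timeout and raise, even though cinder eventually
        #     finishes terminating the connection on its side
        volume_api.terminate_connection(context, bdm.volume_id, connector)

        # (3) never reached when (2) times out, so the
        #     block_device_mapping row survives: the volume can no longer
        #     be re-attached (a BDM already exists) or detached (the disk
        #     is gone from libvirt, so DiskNotFound is raised)
        bdm.destroy(context)
        volume_api.detach(context, bdm.volume_id)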
If volume-detach is issued again, it fails with exception.DiskNotFound:
2014-07-17 10:58:17.333 2586 AUDIT nova.compute.manager [req-e251f834-9653-47aa-969c-b9524d4a683d f8c2ac613325450fa6403a89d48ac644 4be531199d5240f79733fb071e090e46] [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] Detach volume f7d90bc8-eb55-4d46-a2c4-294dc9c6a92a from mountpoint /dev/vdb
2014-07-17 10:58:17.337 2586 ERROR nova.compute.manager [req-e251f834-9653-47aa-969c-b9524d4a683d f8c2ac613325450fa6403a89d48ac644 4be531199d5240f79733fb071e090e46] [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] Failed to detach volume f7d90bc8-eb55-4d46-a2c4-294dc9c6a92a from /dev/vdb
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] Traceback (most recent call last):
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 4169, in _detach_volume
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] encryption=encryption)
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1365, in detach_volume
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] raise exception.DiskNotFound(location=disk_dev)
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] DiskNotFound: No disk at vdb
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb]
We should have a way to recover from this situation.
For instance, we need something like "volume-detach --force"
which ignores the DiskNotFound exception and continues on to
delete the nova.block_device_mapping entry.
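A rough sketch of that recovery path (the force flag is hypothetical; exception.DiskNotFound is the real nova exception shown in the traceback above, while the other names are simplified as in the previous sketch):

    from nova import exception

    def detach_volume_force_sketch(context, instance, bdm, connector,
                                   driver, volume_api, force=False):
        try:
            driver.detach_volume(instance, bdm.device_name)
        except exception.DiskNotFound:
            if not force:
                raise
            # With --force, a disk that libvirt has already detached is
            # treated as success so that cleanup can continue.

        # Re-issuing terminate_connection for an already-disconnected
        # volume is assumed to be harmless (idempotent) on the cinder side.
        volume_api.terminate_connection(context, bdm.volume_id, connector)
        volume_api.detach(context, bdm.volume_id)

        # Remove the stale block_device_mapping row so the volume can be
        # attached again.
        bdm.destroy(context)

The key point is that every step after the hypervisor detach must be idempotent, so a second (forced) invocation can finish whatever cleanup the timed-out one left behind.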
** Affects: nova
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/1367964