[Bug 1367964] [NEW] Unable to recover from timeout of detaching cinder volume
Public bug reported:
When cinder-volume is under heavy load, the terminate_connection RPC call for a cinder volume may take longer than the RPC timeout.
When the timeout occurs, nova gives up on detaching the volume and resets the volume state to 'in-use', but it does not reattach the volume.
This leaves the DB in an inconsistent state:
(1) libvirt has already detached the volume from the instance
(2) the cinder volume has been disconnected from the host by the terminate_connection RPC (but nova doesn't know this because of the timeout)
(3) the nova.block_device_mapping entry still remains because of the timeout in (2)
and the volume becomes impossible to re-attach or to detach completely, as sketched below.
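To make the ordering concrete, here is a minimal sketch of the detach path (illustrative only; the real logic lives in nova/compute/manager.py and the names and signatures of driver, volume_api and bdm are simplified stand-ins for the real objects):

    def detach_volume_sketch(context, instance, bdm, connector,
                             driver, volume_api):
        # (1) hypervisor detach succeeds: libvirt no longer sees the disk
        driver.detach_volume(instance, bdm.device_name)

        # (2) RPC call to cinder; under heavy load this can exceed the
        #     RPC timeout and raise, even though cinder eventually
        #     finishes terminating the connection on its side
        volume_api.terminate_connection(context, bdm.volume_id, connector)

        # (3) never reached when (2) times out, so the
        #     block_device_mapping row survives: the volume can no longer
        #     be re-attached (a BDM already exists) or detached (the disk
        #     is gone from libvirt, so DiskNotFound is raised)
        bdm.destroy(context)
        volume_api.detach(context, bdm.volume_id)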
If volume-detach is issued again, it fails with exception.DiskNotFound:
2014-07-17 10:58:17.333 2586 AUDIT nova.compute.manager [req-e251f834-9653-47aa-969c-b9524d4a683d f8c2ac613325450fa6403a89d48ac644 4be531199d5240f79733fb071e090e46] [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] Detach volume f7d90bc8-eb55-4d46-a2c4-294dc9c6a92a from mountpoint /dev/vdb
2014-07-17 10:58:17.337 2586 ERROR nova.compute.manager [req-e251f834-9653-47aa-969c-b9524d4a683d f8c2ac613325450fa6403a89d48ac644 4be531199d5240f79733fb071e090e46] [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] Failed to detach volume f7d90bc8-eb55-4d46-a2c4-294dc9c6a92a from /dev/vdb
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] Traceback (most recent call last):
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 4169, in _detach_volume
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] encryption=encryption)
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1365, in detach_volume
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] raise exception.DiskNotFound(location=disk_dev)
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb] DiskNotFound: No disk at vdb
2014-07-17 10:58:17.337 2586 TRACE nova.compute.manager [instance: 48c19bff-ec39-44c5-a63b-cac01ee813eb]
We should have a way to recover from this situation.
For instance, we need something like "volume-detach --force"
which ignores the DiskNotFound exception and continues on to
delete the nova.block_device_mapping entry.
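A rough sketch of that recovery path (the force flag is hypothetical; exception.DiskNotFound is the real nova exception shown in the traceback above, while the other names are simplified as in the previous sketch):

    from nova import exception

    def detach_volume_force_sketch(context, instance, bdm, connector,
                                   driver, volume_api, force=False):
        try:
            driver.detach_volume(instance, bdm.device_name)
        except exception.DiskNotFound:
            if not force:
                raise
            # With --force, a disk that libvirt has already detached is
            # treated as success so that cleanup can continue.

        # Re-issuing terminate_connection for an already-disconnected
        # volume is assumed to be harmless (idempotent) on the cinder side.
        volume_api.terminate_connection(context, bdm.volume_id, connector)
        volume_api.detach(context, bdm.volume_id)

        # Remove the stale block_device_mapping row so the volume can be
        # attached again.
        bdm.destroy(context)

The key point is that every step after the hypervisor detach must be idempotent, so a second (forced) invocation can finish whatever cleanup the timed-out one left behind.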
** Affects: nova
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/1367964