yahoo-eng-team team mailing list archive

[Bug 1355348] [NEW] Terminating an instance while attaching a volume leads to both actions failing

Public bug reported:

This is happening with the xenapi driver, but it's possible that this
can happen with others.  The sequence of events I'm witnessing is:

An attach_volume request is made and shortly after a terminate_instance
request is made.

By the time the terminate request arrives, the attach_volume request has
updated the block device mapping and connected the volume to the
hypervisor, but has not yet attached it to the instance.  Because the
terminate request begins processing before the attach completes, its
sweep of volumes and their connections misses the latest one that is
still attaching.  This leads to a failure when asking Cinder to clean up
the volume, such as:

2014-08-06 20:30:14.324 30737 TRACE nova.compute.manager [instance:
<uuid>] ClientException: DELETE on
http://127.0.0.1/volumes/<uuid>/export?force=False returned '409' with
'Volume '<uuid>' is currently attached to '127.0.0.1'' (HTTP 409)
(Request-ID: req-)

In turn, when attach_volume tries to attach the volume to the instance,
it finds that the instance no longer exists because of the terminate
request.  This leaves the instance undeletable and the volume stuck.

Having attach_volume share the instance lock with terminate_instance
should resolve this.  Virt drivers may also want to try to cope with
this internally and not rely on a lock.
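The lock-sharing idea could look roughly like the sketch below. The names
(_lock_for, attach_volume, terminate_instance) are hypothetical stand-ins,
not nova's actual code; nova's real implementation would go through its
@utils.synchronized decorator keyed on the instance UUID:

```python
import threading

# Hypothetical per-instance lock registry; nova's real code uses its
# @utils.synchronized decorator keyed on the instance UUID, this is
# only a minimal stand-in to illustrate the serialization.
_locks_guard = threading.Lock()
_instance_locks = {}


def _lock_for(instance_uuid):
    # Guarded setdefault so two racing threads can never end up
    # holding two different locks for the same instance.
    with _locks_guard:
        return _instance_locks.setdefault(instance_uuid, threading.Lock())


events = []  # records the order operations ran in, for the demo below


def attach_volume(instance_uuid, volume_id):
    # While this lock is held, a concurrent terminate_instance for the
    # same instance must wait until the attach has fully completed.
    with _lock_for(instance_uuid):
        events.append('attach_start')
        # ... update the BDM, connect the volume to the hypervisor,
        #     attach it to the instance ...
        events.append('attach_done')


def terminate_instance(instance_uuid):
    with _lock_for(instance_uuid):
        events.append('terminate_start')
        # ... detach every volume and its connection, then destroy
        #     the instance ...
        events.append('terminate_done')


t1 = threading.Thread(target=attach_volume, args=('inst-uuid', 'vol-uuid'))
t2 = threading.Thread(target=terminate_instance, args=('inst-uuid',))
t1.start(); t2.start()
t1.join(); t2.join()
# Whichever operation wins the lock runs to completion before the
# other starts, so the two can no longer interleave.
```

With both operations serialized this way, terminate either sees the
attach fully finished (and detaches that volume like any other) or sees
it not started at all, removing the window described above.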

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1355348


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1355348/+subscriptions

