[Bug 1953538] [NEW] Nova does not delete bogus attachments with the same server during detachment

Public bug reported:

Symptoms
---------
Encountered this several times during server migrations.

RCA
----
A Heat stack containing an instance plus N × (volume + volume attachment to that instance) fails to delete due to a timeout while deleting one of the OS::Cinder::VolumeAttachment resources.

Upon closer inspection I noticed that the volume has erroneous entries in its attachments field, e.g.
$ openstack volume show 718a4ddf-639c-40db-b10a-bd151e2e8732 -f value -c attachments
[{'server_id': 'a8557bfd-e8d8-41b0-b3f6-ffc9651b8b63', 'attachment_id': '6e4b5e3b-57e1-4129-8578-dc84a254b328', 'attached_at': '2021-09-02T08:18:07.000000', 'host_name': 'cmp036', 'volume_id': '718a4ddf-639c-40db-b10a-bd151e2e8732', 'device': '/dev/vdd', 'id': '718a4ddf-639c-40db-b10a-bd151e2e8732'}, {'server_id': 'a8557bfd-e8d8-41b0-b3f6-ffc9651b8b63', 'attachment_id': 'b4e7c11f-9616-458e-a16b-920f2938eaff', 'attached_at': '2021-09-01T22:36:30.000000', 'host_name': 'cmp038', 'volume_id': '718a4ddf-639c-40db-b10a-bd151e2e8732', 'device': '/dev/vdd', 'id': '718a4ddf-639c-40db-b10a-bd151e2e8732'}]

Notice that the same volume appears to be attached twice to the same
instance under the same device path, which is obviously bogus. At the
same time, the instance mentioned in such bogus entries does not show
this volume as attached, or has sometimes already been deleted.
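
A quick way to spot such records is to group a volume's attachment
entries by (server_id, device) and flag anything that occurs more than
once. A minimal sketch with openstacksdk (the volume ID is a
placeholder):

import collections
import openstack

cloud = openstack.connect()
volume = cloud.volume.get_volume("<volume-id>")

# group attachment records by (server, device); more than one record per
# pair is a bogus duplicate
groups = collections.defaultdict(list)
for att in volume.attachments or []:
    groups[(att["server_id"], att["device"])].append(att["attachment_id"])

for (server_id, device), attachment_ids in groups.items():
    if len(attachment_ids) > 1:
        print("duplicate attachments for %s %s: %s" % (server_id, device, attachment_ids))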

When heat deletes the VolumeAttachment resource (== detaches the volume
from the instance), it calls nova to detach the volume and then waits
for both nova and cinder to acknowledge that the volume is no longer
attached to the instance. With such bogus multiple attachment records in
Cinder, nova presumably deletes one of them, but the rest remain in
place, which prevents heat from ever seeing the volume as detached (the
volume also stays in status "in-use"). Since nova already considers the
volume detached, a further detach attempt changes nothing in Cinder
(nova returns 404 "volume is not attached to instance" - if the instance
still exists at all), and the heat stack deletion is effectively stuck.
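
The mismatch can be confirmed by hand by comparing nova's and cinder's
views of the attachment, which is roughly what heat is waiting on. A
rough sketch with openstacksdk (server and volume IDs are placeholders;
this is an approximation of the checks, not heat's actual logic):

import openstack

cloud = openstack.connect()
server_id = "<server-id>"
volume_id = "<volume-id>"

# nova's view: volume attachments known to the compute service
nova_attached = [va.volume_id for va in cloud.compute.volume_attachments(server_id)]

# cinder's view: attachment records and volume status
volume = cloud.volume.get_volume(volume_id)
cinder_servers = [att["server_id"] for att in volume.attachments or []]

print("nova sees the volume attached:", volume_id in nova_attached)
print("cinder volume status:", volume.status)
print("cinder attachment records pointing at the server:", cinder_servers.count(server_id))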

Workaround
-----------
Manually delete the offending attachments from the volume.

# requires a valid clouds.yaml and OS_CLOUD set in the environment
import openstack
cloud = openstack.connect()
cinder = cloud.volume
# os-detach drops the bogus attachment record on the Cinder side
cinder.post("/volumes/<volume-id>/action", json={"os-detach": {"attachment_id": "<volume-attachment-id>"}})
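
The same call can be looped over every stale record once the bogus
attachment IDs have been identified (e.g. with the detection snippet
above). A rough sketch, with the IDs to clean up verified by hand first:

import openstack

cloud = openstack.connect()
volume_id = "<volume-id>"
# attachment_id values of the bogus records, verified manually beforehand
stale_attachment_ids = ["<volume-attachment-id>"]

for attachment_id in stale_attachment_ids:
    # os-detach removes the attachment record on the Cinder side only
    cloud.volume.post(
        "/volumes/%s/action" % volume_id,
        json={"os-detach": {"attachment_id": attachment_id}},
    )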

Possible fixes
---------------
Have nova recognize such bogus entries and delete the duplicate attachment records when detaching the volume.

** Affects: nova
     Importance: Undecided
     Assignee: Mitya Eremeev (mitos)
         Status: In Progress

** Changed in: nova
     Assignee: (unassigned) => Mitya Eremeev (mitos)

** Changed in: nova
       Status: New => In Progress
