[Bug 1953538] [NEW] Nova does not delete bogus attachments with the same server during detachment
Public bug reported:
Symptoms
---------
Encountered this several times during server migrations.
RCA
----
A Heat stack containing an instance plus N × (volume + volume attachment to that instance) fails to delete due to a timeout while deleting one of its OS::Cinder::VolumeAttachment resources.
Upon closer inspection, the volume turns out to have erroneous entries in its attachments field, e.g.
$ openstack volume show 718a4ddf-639c-40db-b10a-bd151e2e8732 -f value -c attachments
[{'server_id': 'a8557bfd-e8d8-41b0-b3f6-ffc9651b8b63', 'attachment_id': '6e4b5e3b-57e1-4129-8578-dc84a254b328', 'attached_at': '2021-09-02T08:18:07.000000', 'host_name': 'cmp036', 'volume_id': '718a4ddf-639c-40db-b10a-bd151e2e8732', 'device': '/dev/vdd', 'id': '718a4ddf-639c-40db-b10a-bd151e2e8732'}, {'server_id': 'a8557bfd-e8d8-41b0-b3f6-ffc9651b8b63', 'attachment_id': 'b4e7c11f-9616-458e-a16b-920f2938eaff', 'attached_at': '2021-09-01T22:36:30.000000', 'host_name': 'cmp038', 'volume_id': '718a4ddf-639c-40db-b10a-bd151e2e8732', 'device': '/dev/vdd', 'id': '718a4ddf-639c-40db-b10a-bd151e2e8732'}]
Notice that the same volume appears to be attached twice to the same
instance under the same device path, which is obviously bogus. At the
same time, the instance mentioned in such bogus entries does not show
this volume as attached, or in some cases has already been deleted.
When Heat deletes the VolumeAttachment resource (== detaches the volume
from the instance), it calls nova to detach the volume from the
instance, and then waits for both nova and cinder to acknowledge that
the volume is no longer attached. With such bogus multiple attachment
records in Cinder, nova presumably deletes one of them, but the rest
remain in place, which prevents Heat from ever seeing the volume as
detached (the volume also stays in status "in-use"). Since nova already
considers the volume detached, a second detach attempt changes nothing
in Cinder (nova returns 404, volume is not attached to instance - if
the instance still exists at all), and the Heat stack deletion is
effectively stuck.
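The bogus state is easy to spot programmatically. Below is a minimal
sketch using openstacksdk (the same library as in the workaround
below); the volume ID is the one from this report, and treating more
than one record per (server_id, device) pair as bogus is an assumption
based on the output above:

import openstack
from collections import Counter

cloud = openstack.connect()
volume = cloud.volume.get_volume("718a4ddf-639c-40db-b10a-bd151e2e8732")

# A healthy volume has at most one attachment record per
# (server_id, device) pair; count duplicates.
pairs = Counter((a["server_id"], a["device"]) for a in volume.attachments)
for (server_id, device), count in pairs.items():
    if count > 1:
        print(f"bogus: {count} attachment records for server "
              f"{server_id} at {device}")
    # The referenced instance may itself be gone already.
    if cloud.compute.find_server(server_id, ignore_missing=True) is None:
        print(f"server {server_id} no longer exists")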
Workaround
-----------
Manually delete the offending attachments from the volume.
# have a proper clouds.yaml and OS_CLOUD set in the shell
import openstack

cloud = openstack.connect()
cinder = cloud.volume  # block storage proxy; also a raw REST adapter
# os-detach with an explicit attachment_id removes a single
# attachment record from the volume.
cinder.post("/volumes/<volume-id>/action",
            json={"os-detach": {"attachment_id": "<volume-attachment-id>"}})
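If several records are stale, the same call can be driven from the
volume's attachments list instead of pasting IDs by hand; a sketch
under the same assumptions (openstacksdk connection, volume ID from
this report):

import openstack

cloud = openstack.connect()
volume = cloud.volume.get_volume("718a4ddf-639c-40db-b10a-bd151e2e8732")

# Issue one os-detach per leftover attachment record.
for att in volume.attachments:
    cloud.volume.post(
        f"/volumes/{volume.id}/action",
        json={"os-detach": {"attachment_id": att["attachment_id"]}},
    )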
Possible fixes
---------------
Have nova recognize such bogus entries and delete the duplicates when detaching the volume; see the sketch below.
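A hedged illustration of that idea: enumerate every attachment record
on the volume that references the instance being detached and delete
each one, instead of only the single attachment ID nova tracks. This is
a sketch against the Cinder attachments API (DELETE /v3/attachments/{id},
available from microversion 3.27), not nova's actual code path;
delete_all_attachments is a hypothetical helper and cinder is the
openstacksdk block storage proxy used above:

def delete_all_attachments(cinder, volume_id, server_id):
    # Hypothetical helper illustrating the proposed fix.
    volume = cinder.get_volume(volume_id)
    for att in volume.attachments:
        if att["server_id"] == server_id:
            # Attachment delete requires microversion >= 3.27.
            cinder.delete(f"/attachments/{att['attachment_id']}",
                          microversion="3.27")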
** Affects: nova
Importance: Undecided
Assignee: Mitya Eremeev (mitos)
Status: In Progress
** Changed in: nova
Assignee: (unassigned) => Mitya Eremeev (mitos)
** Changed in: nova
Status: New => In Progress
https://bugs.launchpad.net/bugs/1953538