Message #82739
[Bug 1856845] Re: Ephemeral storage removal fails with message rbd remove failed
Reviewed: https://review.opendev.org/705764
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6458c3dba53b9a9fb903bdb6e5e08af14ad015d6
Submitter: Zuul
Branch: master
commit 6458c3dba53b9a9fb903bdb6e5e08af14ad015d6
Author: Sasha Andonov <sandonov@xxxxxxxx>
Date: Tue Feb 4 16:59:14 2020 +0100
rbd_utils: increase _destroy_volume timeout
If the RBD backend is used for Nova ephemeral storage, Nova tries to remove
the ephemeral storage volume from Ceph in a retry loop: 10 attempts at
1-second intervals, 10 seconds in total. Due to Ceph's 30-second watcher
timeout, this can result in intermittent volume removal failures on the
Ceph side.

This patch adds the options rbd_destroy_volume_retries, defaulting to 12, and
rbd_destroy_volume_retry_interval, defaulting to 5 seconds, which multiplied
together give Ceph a reasonable amount of time (60 seconds) to complete the
operation successfully.
Closes-Bug: #1856845
Change-Id: Icfd55617f0126f79d9610f8a2fc6b4c817d1a2bd
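For operators hitting intermittent removals, the new knobs can be tuned in
nova.conf on the compute nodes. A minimal sketch, assuming the options live in
the [libvirt] section alongside the other rbd_* settings; the values shown are
the new defaults, and the pool name is taken from the log output quoted below:

    [libvirt]
    images_type = rbd
    images_rbd_pool = rbd_pool
    # retries x interval should comfortably exceed Ceph's 30-second
    # watcher timeout: 12 x 5s = 60s
    rbd_destroy_volume_retries = 12
    rbd_destroy_volume_retry_interval = 5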
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1856845
Title:
Ephemeral storage removal fails with message rbd remove failed
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
After destroying instances, ephemeral storage removal intermittently fails with message:
2019-10-17 11:21:08.122 398018 INFO nova.virt.libvirt.driver [-] [instance: 87096add-348e-4c94-8f31-066346e32eef] Instance destroyed successfully.
2019-10-17 11:21:14.619 398018 WARNING nova.virt.libvirt.storage.rbd_utils [-] rbd remove 87096add-348e-4c94-8f31-066346e32eef_disk in pool rbd_pool failed
Ceph logs report a lossy connection error:
2019-10-17 11:21:06.181233 7fbbdf2f4700 0 -- 10.248.83.92:6808/20526 submit_message osd_op_reply(192922 rbd_data.77c63845d27cdd.0000000000004728 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1273856~262144] v1504399'62984460 uv62984460 ack = 0) v7 remote, 10.248.54.216:0/2391175308, failed lossy con, dropping message 0x56545f021e40
Steps to reproduce
==================
- Deploy Nova with the Ceph RBD image backend for ephemeral storage
- Create an instance
- Destroy the instance (see the example below)
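The create/destroy cycle can be scripted with the openstacksdk Python client;
this is only an illustrative sketch, and the cloud name, image, flavor and
network IDs are placeholders, not values from this report:

    import openstack

    # Connect using a cloud defined in clouds.yaml (name is a placeholder).
    conn = openstack.connect(cloud='mycloud')

    # Boot an instance; with the RBD image backend its disk is created as
    # <instance-uuid>_disk in the configured Ceph pool.
    server = conn.compute.create_server(
        name='rbd-remove-test',
        image_id='<image-uuid>',
        flavor_id='<flavor-id>',
        networks=[{'uuid': '<network-uuid>'}],
    )
    server = conn.compute.wait_for_server(server)

    # Delete it, then check nova-compute logs and the Ceph pool: with the
    # old 10 x 1s retry loop the _disk image occasionally stays behind.
    conn.compute.delete_server(server)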
Expected result
===============
Nova instance is destroyed and the Ceph ephemeral storage volume is always removed from the pool
Actual result
=============
Nova instance is destroyed, but the Ceph ephemeral storage volume sometimes remains in the pool
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1856845/+subscriptions