yahoo-eng-team team mailing list archive

[Bug 1856845] Re: Ephemeral storage removal fails with message rbd remove failed

 

Reviewed:  https://review.opendev.org/705764
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6458c3dba53b9a9fb903bdb6e5e08af14ad015d6
Submitter: Zuul
Branch:    master

commit 6458c3dba53b9a9fb903bdb6e5e08af14ad015d6
Author: Sasha Andonov <sandonov@xxxxxxxx>
Date:   Tue Feb 4 16:59:14 2020 +0100

    rbd_utils: increase _destroy_volume timeout
    
    If the RBD backend is used for Nova ephemeral storage, Nova tries to
    remove the ephemeral storage volume from Ceph in a retry loop: 10
    attempts at 1 second intervals, 10 seconds overall - which, due to the
    thirty-second Ceph watcher timeout, might result in intermittent volume
    removal failures on the Ceph side.
    This patch adds the params rbd_destroy_volume_retries, defaulting to 12,
    and rbd_destroy_volume_retry_interval, defaulting to 5, which, multiplied,
    give Ceph a reasonable amount of time to complete the operation
    successfully.
    
    Closes-Bug: #1856845
    Change-Id: Icfd55617f0126f79d9610f8a2fc6b4c817d1a2bd

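For reference, the fix amounts to making the retry count and interval of the removal loop configurable so that their product comfortably exceeds the ~30 second Ceph watcher timeout. Below is a minimal illustrative sketch of that pattern, not the actual nova rbd_utils code: the rbd calls (rbd.RBD().remove and the ImageBusy/ImageHasSnapshots exceptions) come from the python-rbd bindings, while the function name and defaults simply reuse the option names from the commit message.

    import time

    import rbd  # python-rbd bindings

    def destroy_volume(ioctx, name, retries=12, retry_interval=5):
        """Remove an RBD image, retrying while Ceph still holds it busy.

        retries * retry_interval should exceed the ~30 second Ceph
        watcher timeout, otherwise removal can fail intermittently.
        """
        for attempt in range(retries):
            try:
                rbd.RBD().remove(ioctx, name)
                return
            except (rbd.ImageBusy, rbd.ImageHasSnapshots):
                # A lingering watcher or snapshot keeps the image busy;
                # wait and retry.
                time.sleep(retry_interval)
        raise RuntimeError('rbd remove %s failed after %d attempts'
                           % (name, retries))

With the new defaults (12 x 5 = 60 seconds) the loop outlives the watcher timeout, whereas the old 10 x 1 = 10 seconds often did not.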

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1856845

Title:
  Ephemeral storage removal fails with message rbd remove failed

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  After destroying instances, ephemeral storage removal intermittently fails with the message:

  2019-10-17 11:21:08.122 398018 INFO nova.virt.libvirt.driver [-] [instance: 87096add-348e-4c94-8f31-066346e32eef] Instance destroyed successfully.
  2019-10-17 11:21:14.619 398018 WARNING nova.virt.libvirt.storage.rbd_utils [-] rbd remove 87096add-348e-4c94-8f31-066346e32eef_disk in pool rbd_pool failed

  Ceph logs report a lossy connection error:
  2019-10-17 11:21:06.181233 7fbbdf2f4700  0 -- 10.248.83.92:6808/20526 submit_message osd_op_reply(192922 rbd_data.77c63845d27cdd.0000000000004728 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1273856~262144] v1504399'62984460 uv62984460 ack = 0) v7 remote, 10.248.54.216:0/2391175308, failed lossy con, dropping message 0x56545f021e40

  Steps to reproduce
  ==================
  - Deploy Nova with Ceph RBD as the ephemeral storage backend
  - Create an instance
  - Destroy the instance (a scripted reproduction sketch follows below)
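
  Because the failure is intermittent, reproducing it in practice means cycling instances repeatedly and checking the Ceph pool after each deletion. A rough openstacksdk sketch (the cloud name and image/flavor/network IDs are placeholders for your deployment):

    import openstack

    conn = openstack.connect(cloud='mycloud')  # placeholder cloud name

    for i in range(50):
        server = conn.compute.create_server(
            name='rbd-destroy-test-%d' % i,
            image_id='IMAGE_ID',                 # placeholder
            flavor_id='FLAVOR_ID',               # placeholder
            networks=[{'uuid': 'NETWORK_ID'}])   # placeholder
        conn.compute.wait_for_server(server)
        conn.compute.delete_server(server)
        conn.compute.wait_for_delete(server)
        # After each delete, look for leftover <uuid>_disk images in the
        # ephemeral RBD pool and for "rbd remove ... failed" warnings in
        # the nova-compute log.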

  Expected result
  ===============
  Nova instance is destroyed and the ephemeral storage volume is always removed from the Ceph pool

  Actual result
  =============
  Nova instance is destroyed, but the ephemeral storage volume sometimes remains in the Ceph pool

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1856845/+subscriptions

