yahoo-eng-team team mailing list archive

[Bug 2044703] [NEW] Instance evacuation is stuck and completes after a long time

 

Public bug reported:

Stack:
3-node cluster on CentOS 8 Stream, Libvirt+KVM
OpenStack Xena
Nova package version: 24.1.1-1
SAN-based shared storage connected through iSCSI
MariaDB Galera Cluster behind HAProxy, configured in active-backup mode

We are seeing a problem with instance evacuation when the host running the active database node is powered off.
The evacuation task is triggered, but it takes more than 15 minutes to complete. During that time, placement and other OpenStack services report 504 Gateway Timeout errors when talking to keystone. Most of the time is spent in the nova scheduler, as the following log line shows:

Request filter 'map_az_to_placement_aggregate' took 972.3 seconds
wrapper /usr/lib/python3.6/site-packages/nova/scheduler/request_filter.py:47
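
For anyone trying to reproduce this, a minimal sketch of how slow request filters could be pulled out of a scheduler log. It assumes the timing message has exactly the format quoted above; the function name `slow_request_filters` and the 1-second default threshold are illustrative choices, not part of nova.

```python
import re

# Matches the timing message quoted in this report, e.g.
#   Request filter 'map_az_to_placement_aggregate' took 972.3 seconds
# Assumption: other filters are logged in the same format.
FILTER_TIMING = re.compile(
    r"Request filter '(?P<name>[^']+)' took (?P<secs>\d+(?:\.\d+)?) seconds"
)


def slow_request_filters(lines, threshold=1.0):
    """Return (filter_name, seconds) pairs at or above threshold, in log order."""
    slow = []
    for line in lines:
        match = FILTER_TIMING.search(line)
        if match is None:
            continue  # not a request-filter timing line
        seconds = float(match.group("secs"))
        if seconds >= threshold:
            slow.append((match.group("name"), seconds))
    return slow
```

Passing an open file object for the scheduler log as `lines` works, since the function only iterates line by line. The log path depends on the deployment.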

Sometimes the rebuild task stays stuck indefinitely, until we restart the nova services and force the VM into an error state.
Please let me know if any other configuration details or logs are needed to understand this issue.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2044703

Title:
  Instance evacuation is stuck and completes after a long time

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2044703/+subscriptions