
yahoo-eng-team team mailing list archive

[Bug 1859496] [NEW] Deleting stuck build instance may leak allocations

 

Public bug reported:

Description
===========

After control plane issues during instance creation,
instances may stay stuck in the BUILD state.

Even after they are deleted, their placement allocations may remain,
and the compute host log complains that:
Instance eba20a0f-5856-4600-bcaa-7b758d04b5c5 has allocations against this compute host but is not found in the database.


Steps to reproduce
==================

On a fresh devstack master install:


1) Open a terminal that displays the entries in placement.allocations and nova_cell1.instances every second:
while true ; do  date ; mysql -e "select * from placement.allocations" ; mysql -e "select * from nova_cell1.instances where deleted=0" ;sleep 1 ; done

2) Trigger a spawn of 50 instances and kill RabbitMQ after 5 seconds to simulate a control plane issue:
openstack server create  --flavor m1.tiny --image cirros-0.4.0-x86_64-disk --nic net-id=private alex --min 50 --max 50 & sleep 5 ;  sudo pkill rabbitmq-server

Note: To reach the bug, the goal is to get the instances allocated by
the scheduler without leaving the conductor enough time to create their
entries in nova_cell1.instances.

You should see rows appearing in placement.allocations:
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+------+
| created_at          | updated_at | id   | resource_provider_id | consumer_id                          | resource_class_id | used |
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+------+
| 2020-01-13 11:02:51 | NULL       | 1727 |                    1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 |                 2 |    1 |
| 2020-01-13 11:02:51 | NULL       | 1728 |                    1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 |                 1 |  512 |
| 2020-01-13 11:02:51 | NULL       | 1729 |                    1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 |                 0 |    1 |
| 2020-01-13 11:02:51 | NULL       | 1730 |                    1 | 3cd1b8be-6997-452e-86e0-5013c9ab6bda |                 2 |    1 |
| 2020-01-13 11:02:51 | NULL       | 1731 |                    1 | 3cd1b8be-6997-452e-86e0-5013c9ab6bda |                 1 |  512 |
.....

The instances are all stuck in BUILD at this stage.

3) Delete the instances:
openstack server list | awk '/m1.tiny/ {print $2}' | xargs openstack server delete
4) Restart RabbitMQ:
service rabbitmq-server start
5) List the servers; nothing is displayed:
openstack server list
6) Count the remaining allocations:
mysql -e "select count(*) from placement.allocations"
+----------+
| count(*) |
+----------+
|      150 |
+----------+
The allocations remain: 150 rows, i.e. 3 rows for each of the 50 deleted instances.
7) The nova-compute logs complain that:
Instance eba20a0f-5856-4600-bcaa-7b758d04b5c5 has allocations against this compute host but is not found in the database.
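
To list which consumers are affected rather than only counting rows, the two tables polled in step 1 can be compared directly. The following is a minimal sketch, not part of the original report: the function name leaked_consumers and the file names are hypothetical, and it assumes that every allocation consumer is expected to be an instance (on a deployment with in-progress migrations, some consumers may legitimately be migration records instead).

```shell
#!/bin/sh
# leaked_consumers ALLOCS_FILE INSTANCES_FILE
#   ALLOCS_FILE:    one placement consumer_id per line (duplicates allowed)
#   INSTANCES_FILE: one instance uuid per line
# Prints each consumer_id that has no matching instance uuid.
leaked_consumers() {
    # -F: fixed strings, -x: whole-line match, -v: keep non-matching lines,
    # -f: read the patterns (known instance uuids) from the second file.
    # grep exits non-zero when nothing is leaked; "|| true" hides that so
    # the function is safe to call under "set -e".
    sort -u "$1" | grep -vxFf "$2" || true
}

# Possible usage on the devstack host (file names are assumptions):
#   mysql -N -B -e "select consumer_id from placement.allocations" > allocs.txt
#   mysql -N -B -e "select uuid from nova_cell1.instances where deleted=0" > instances.txt
#   leaked_consumers allocs.txt instances.txt
```

After step 5, nova_cell1.instances holds no undeleted rows, so with the reproduction above every consumer_id still present in placement.allocations would be printed.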

Expected result
===============
The placement allocations of an instance must be cleaned up after it is deleted.

Actual result
=============
The placement allocations of the instance are leaked.


Environment
===========
At least Stein through master appear to be impacted.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1859496

Title:
  Deleting stuck build instance may leak allocations

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1859496/+subscriptions

