yahoo-eng-team team mailing list archive
Message #81291
[Bug 1859496] [NEW] Deleting stuck build instance may leak allocations
Public bug reported:
Description
===========
After control plane issues during instance creation,
instances may stay stuck in the BUILD state.
Even after they are deleted, their placement allocations may remain,
and the compute host log complains that:
Instance eba20a0f-5856-4600-bcaa-7b758d04b5c5 has allocations against this compute host but is not found in the database.
Steps to reproduce
==================
On a fresh devstack master install
1) Open a terminal that displays the entries in placement.allocations and nova_cell1.instances every second:
while true ; do date ; mysql -e "select * from placement.allocations" ; mysql -e "select * from nova_cell1.instances where deleted=0" ;sleep 1 ; done
2) Trigger a spawn of 50 instances and kill RabbitMQ after 5 seconds to simulate an issue on the control plane:
openstack server create --flavor m1.tiny --image cirros-0.4.0-x86_64-disk --nic net-id=private alex --min 50 --max 50 & sleep 5 ; sudo pkill rabbitmq-server
Note: To hit the bug, the goal is to get the instances allocated by
the scheduler without giving the conductor time to create the entries in
nova_cell1.instances.
You should see allocations appearing in placement.allocations:
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+------+
| created_at | updated_at | id | resource_provider_id | consumer_id | resource_class_id | used |
+---------------------+------------+------+----------------------+--------------------------------------+-------------------+------+
| 2020-01-13 11:02:51 | NULL | 1727 | 1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 | 2 | 1 |
| 2020-01-13 11:02:51 | NULL | 1728 | 1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 | 1 | 512 |
| 2020-01-13 11:02:51 | NULL | 1729 | 1 | 8d0a42fe-922b-4c08-afe3-65d65893d355 | 0 | 1 |
| 2020-01-13 11:02:51 | NULL | 1730 | 1 | 3cd1b8be-6997-452e-86e0-5013c9ab6bda | 2 | 1 |
| 2020-01-13 11:02:51 | NULL | 1731 | 1 | 3cd1b8be-6997-452e-86e0-5013c9ab6bda | 1 | 512 |
.....
The instances are all stuck in BUILD at this stage.
3) Delete the instances:
openstack server list | awk '/m1.tiny/ {print $2}' | xargs openstack server delete
4) service rabbitmq-server start
5) openstack server list
<displays nothing>
6) mysql -e "select count(*) from placement.allocations"
+----------+
| count(*) |
+----------+
| 150 |
+----------+
The allocations remain: 150 rows, which is consistent with three allocation rows for each of the 50 instances.
7) The nova-compute logs complain that:
Instance eba20a0f-5856-4600-bcaa-7b758d04b5c5 has allocations against this compute host but is not found in the database.
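To see which consumers are affected after step 6, rather than only the row count, the two tables watched in step 1 can be joined. The following is a minimal sketch, not a fix: it assumes both schemas live on the same MySQL server as in devstack, that nova_cell1.instances identifies instances by a `uuid` column, and that every allocation consumer is an instance (true on a fresh install with no migrations in progress). The helper name `orphan_query` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a query that lists consumers still holding
# allocations although no live (deleted=0) instance row exists for them.
# Assumption: instances are keyed by a `uuid` column matching consumer_id.
orphan_query() {
  cat <<'SQL'
SELECT a.consumer_id, COUNT(*) AS allocation_rows
FROM placement.allocations AS a
LEFT JOIN nova_cell1.instances AS i
  ON i.uuid = a.consumer_id AND i.deleted = 0
WHERE i.uuid IS NULL
GROUP BY a.consumer_id;
SQL
}

# Usage on the devstack host (not executed automatically here):
#   mysql -e "$(orphan_query)"
```

If the leak described here is present, each of the 50 deleted instances would be expected to show up as one consumer_id with its remaining allocation rows.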
Expected result
===============
The placement allocations of an instance should be cleaned up after it is deleted.
Actual result
=============
The placement allocations of the instances are leaked.
Environment
===========
At least Stein through master seem to be impacted.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1859496
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1859496/+subscriptions