Message #80299
[Bug 1833581] Re: instance stuck in BUILD state if nova-compute is restarted
Reviewed: https://review.opendev.org/666857
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a1a735bc6efa40d8277c9fc5339f3b74f968b58e
Submitter: Zuul
Branch: master
commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e
Author: Balazs Gibizer <balazs.gibizer@xxxxxxxx>
Date: Fri Jun 21 16:48:14 2019 +0200
Error out interrupted builds
If the compute service is restarted while build requests are
executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
then those instances will be stuck forever in BUILDING state. If the instance
already finished instance_claim then instance.host is set and when the
compute restarts the instance is put to ERROR state.
This patch changes compute service startup to put instances into
ERROR state if they a) are in the BUILDING state, and b) have
allocations on the compute resource provider, but c) do not have
instance.host set to that compute.
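
The three conditions above can be sketched as a small filter. This is only an illustration under stated assumptions, not the code from the patch: the Instance class, the function names, and the allocated_uuids set (standing in for the consumers that hold allocations on the compute resource provider) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

BUILDING = "building"
ERROR = "error"


@dataclass
class Instance:
    """Hypothetical stand-in for an instance record (not the nova object)."""
    uuid: str
    vm_state: str
    host: Optional[str]


def find_interrupted_builds(instances: Iterable[Instance],
                            allocated_uuids: Set[str],
                            this_host: str) -> List[Instance]:
    """Return instances that match all three conditions of the commit message.

    allocated_uuids is assumed to hold the UUIDs of instances that have
    allocations on this compute's resource provider.
    """
    return [
        inst for inst in instances
        if inst.vm_state == BUILDING            # a) still building
        and inst.uuid in allocated_uuids        # b) has allocations here
        and inst.host != this_host              # c) host not set to this compute
    ]


def error_out_interrupted_builds(instances: Iterable[Instance],
                                 allocated_uuids: Set[str],
                                 this_host: str) -> List[str]:
    """Mark interrupted builds as ERROR and return their UUIDs."""
    interrupted = find_interrupted_builds(instances, allocated_uuids, this_host)
    for inst in interrupted:
        inst.vm_state = ERROR
    return [inst.uuid for inst in interrupted]
```

An instance that is building with instance.host already set to this compute is deliberately left out of the filter, since the commit message says the existing restart handling already puts such instances into ERROR.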
Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
Closes-Bug: #1833581
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1833581
Title:
instance stuck in BUILD state if nova-compute is restarted
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
Confirmed
Status in OpenStack Compute (nova) rocky series:
Confirmed
Status in OpenStack Compute (nova) stein series:
Confirmed
Status in OpenStack Compute (nova) train series:
Confirmed
Bug description:
Description
===========
An instance is stuck in the BUILD state indefinitely if the nova-compute service is restarted in the meantime. Even after instance_build_timeout expires, the instance is not put into the ERROR state.
Steps to reproduce
==================
1) Start 10 VMs in parallel to increase the chance of hitting the bug
$ for NUM in `seq 1 1 10`; do openstack server create --flavor c1 --image cirros-0.4.0-x86_64-disk --availability-zone nova:ubuntu vm$NUM & done
2) When the first instance reaches the BUILD state, restart the nova-compute service
$ sudo systemctl restart devstack@n-cpu.service
3) Observe the instance states after the compute service is up again.
Expected result
===============
All instances are either in ACTIVE or in ERROR state.
Actual result
=============
Some instances are stuck in BUILD state.
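
The expected-versus-actual check can be sketched as a small helper. It is only a sketch: it assumes the server list is available as JSON with "Name" and "Status" keys, the shape one would expect from running openstack server list with the -f json formatter, and the function name stuck_servers and the sample data are hypothetical.

```python
import json
from typing import Dict, List

# States the report expects every instance to end up in.
EXPECTED_STATUSES = {"ACTIVE", "ERROR"}


def stuck_servers(server_list_json: str) -> List[Dict[str, str]]:
    """Return the servers whose status is neither ACTIVE nor ERROR.

    The input is assumed to be a JSON array of objects that carry at
    least "Name" and "Status" keys.
    """
    servers = json.loads(server_list_json)
    return [s for s in servers if s.get("Status") not in EXPECTED_STATUSES]


# Sample input modelled on three rows of the listing in this report.
SAMPLE = json.dumps([
    {"Name": "vm3", "Status": "BUILD"},
    {"Name": "vm9", "Status": "ACTIVE"},
    {"Name": "vm10", "Status": "ERROR"},
])

if __name__ == "__main__":
    for server in stuck_servers(SAMPLE):
        print(server["Name"], server["Status"])
```

An empty result after the compute service has come back up would correspond to the expected result; any remaining entry corresponds to the bug.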
Environment
===========
All-in-one devstack built from recent nova master (commit
61558f274842b149044a14bbe7537b9f278035fd).
Logs & Configs
==============
stack@ubuntu:~$ openstack server list
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
| ID                                   | Name | Status | Networks                           | Image                    | Flavor    |
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
| 9ee76601-4a61-4682-86f1-743dac2b05e6 | vm3  | BUILD  |                                    | cirros-0.4.0-x86_64-disk | cirros256 |
| e459beae-ccb5-4781-b938-2dff68e33bf7 | vm9  | ACTIVE | public=2001:db8::181, 172.24.4.44  | cirros-0.4.0-x86_64-disk | cirros256 |
| 562f44db-cd51-4516-bce9-598bd29c6310 | vm10 | ERROR  | public=2001:db8::3a1, 172.24.4.196 | cirros-0.4.0-x86_64-disk | cirros256 |
| 73f1e2c6-78a1-44c5-b178-7adcf9bf58a0 | vm5  | ERROR  | public=2001:db8::21, 172.24.4.177  | cirros-0.4.0-x86_64-disk | cirros256 |
| 1b01acfc-b798-48f9-b808-6cfd0d5cd3fb | vm6  | ERROR  | public=2001:db8::3e1, 172.24.4.20  | cirros-0.4.0-x86_64-disk | cirros256 |
| c709e3bf-9c71-4f64-bad3-e9e07e911f62 | vm7  | ERROR  | public=2001:db8::231, 172.24.4.46  | cirros-0.4.0-x86_64-disk | cirros256 |
| 538d2534-98f1-4e11-9bbb-b4e74bab8c65 | vm4  | ERROR  | public=2001:db8::3e9, 172.24.4.157 | cirros-0.4.0-x86_64-disk | cirros256 |
| ed74eb32-00fe-4f24-9379-c57c04ce9af1 | vm2  | ERROR  | public=2001:db8::f5, 172.24.4.53   | cirros-0.4.0-x86_64-disk | cirros256 |
| 582b5356-4f3d-42ed-937e-966580303af0 | vm8  | ERROR  | public=2001:db8::92, 172.24.4.16   | cirros-0.4.0-x86_64-disk | cirros256 |
| ae36ffca-e4d6-4353-8e7e-41db500a5e0d | vm1  | ERROR  | public=2001:db8::1cf, 172.24.4.203 | cirros-0.4.0-x86_64-disk | cirros256 |
+--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
stack@ubuntu:~$ openstack server show 9ee76601-4a61-4682-86f1-743dac2b05e6
+-------------------------------------+-----------------------------------------------------------------+
| Field                               | Value                                                           |
+-------------------------------------+-----------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                          |
| OS-EXT-AZ:availability_zone         | nova                                                            |
| OS-EXT-SRV-ATTR:host                | None                                                            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                            |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000004c                                               |
| OS-EXT-STS:power_state              | NOSTATE                                                         |
| OS-EXT-STS:task_state               | None                                                            |
| OS-EXT-STS:vm_state                 | building                                                        |
| OS-SRV-USG:launched_at              | None                                                            |
| OS-SRV-USG:terminated_at            | None                                                            |
| accessIPv4                          |                                                                 |
| accessIPv6                          |                                                                 |
| addresses                           |                                                                 |
| config_drive                        |                                                                 |
| created                             | 2019-06-19T02:30:16Z                                            |
| flavor                              | cirros256 (c1)                                                  |
| hostId                              |                                                                 |
| id                                  | 9ee76601-4a61-4682-86f1-743dac2b05e6                            |
| image                               | cirros-0.4.0-x86_64-disk (8b88f518-ab48-4859-8e8c-6988911ce9bd) |
| key_name                            | None                                                            |
| name                                | vm3                                                             |
| progress                            | 0                                                               |
| project_id                          | 2fc0b14ea1e041998f420ec85a89314d                                |
| properties                          |                                                                 |
| status                              | BUILD                                                           |
| updated                             | 2019-06-19T02:30:18Z                                            |
| user_id                             | 262d29f5f0c3445abbde89723b5f01ee                                |
| volumes_attached                    |                                                                 |
+-------------------------------------+-----------------------------------------------------------------+
stack@ubuntu:~$
mysql> select uuid, host from instances where instances.uuid='9ee76601-4a61-4682-86f1-743dac2b05e6';
+--------------------------------------+------+
| uuid                                 | host |
+--------------------------------------+------+
| 9ee76601-4a61-4682-86f1-743dac2b05e6 | NULL |
+--------------------------------------+------+
1 row in set (0.00 sec)
Logs for 9ee76601-4a61-4682-86f1-743dac2b05e6:
http://paste.openstack.org/show/753228/