yahoo-eng-team team mailing list archive

[Bug 1833581] Re: instance stuck in BUILD state if nova-compute is restarted

 

Reviewed:  https://review.opendev.org/666857
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a1a735bc6efa40d8277c9fc5339f3b74f968b58e
Submitter: Zuul
Branch:    master

commit a1a735bc6efa40d8277c9fc5339f3b74f968b58e
Author: Balazs Gibizer <balazs.gibizer@xxxxxxxx>
Date:   Fri Jun 21 16:48:14 2019 +0200

    Error out interrupted builds
    
    If the compute service is restarted while build requests are
    executing the instance_claim or waiting for the COMPUTE_RESOURCE_SEMAPHORE
    then those instances will be stuck forever in BUILDING state. If the instance
    already finished instance_claim then instance.host is set and when the
    compute restarts the instance is put to ERROR state.
    
    This patch changes compute service startup to put instances into
    ERROR state if they a) are in the BUILDING state, and b) have
    allocations on the compute resource provider, but c) do not have
    instance.host set to that compute.
    
    Change-Id: I856a3032c83fc2f605d8c9b6e5aa3bcfa415f96a
    Closes-Bug: #1833581
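
The startup check described in the commit message can be illustrated with a
minimal sketch. This is not the code from the nova patch: the Instance class,
the function names, and the idea of passing in a set of instance UUIDs that
hold allocations on this compute's resource provider are all hypothetical
stand-ins, used only to show how conditions a), b) and c) combine.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical state constants for this sketch only.
BUILDING = "building"
ERROR = "error"


@dataclass
class Instance:
    """Hypothetical, simplified stand-in for an instance record."""
    uuid: str
    vm_state: str
    host: Optional[str]


def find_interrupted_builds(
    instances: Iterable[Instance],
    allocated_uuids: Set[str],
    this_host: str,
) -> List[Instance]:
    """Return instances whose build was interrupted before the claim finished.

    An instance qualifies when it a) is in the BUILDING state, b) has
    allocations on this compute's resource provider (modelled here as
    membership in allocated_uuids), and c) does not have its host set
    to this compute.
    """
    return [
        inst
        for inst in instances
        if inst.vm_state == BUILDING
        and inst.uuid in allocated_uuids
        and inst.host != this_host
    ]


def error_out_interrupted_builds(
    instances: Iterable[Instance],
    allocated_uuids: Set[str],
    this_host: str,
) -> List[Instance]:
    """Move every interrupted build to ERROR and return the changed instances."""
    interrupted = find_interrupted_builds(instances, allocated_uuids, this_host)
    for inst in interrupted:
        inst.vm_state = ERROR
    return interrupted
```

In this sketch an instance that is BUILDING with host already set to this
compute is deliberately left alone, since the commit message says that case
is already put into ERROR state when the compute restarts.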


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1833581

Title:
  instance stuck in BUILD state if nova-compute is restarted

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) queens series:
  Confirmed
Status in OpenStack Compute (nova) rocky series:
  Confirmed
Status in OpenStack Compute (nova) stein series:
  Confirmed
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Description
  ===========
  An instance gets stuck in BUILD state indefinitely if the nova-compute service is restarted while the instance is being built. Even after instance_build_timeout expires, the instance is not put into ERROR state.

  Steps to reproduce
  ==================

  1) Start 10 VMs in parallel to increase the chance of hitting the bug

  $ for NUM in `seq 1 1 10`; do openstack server create --flavor c1 \
      --image cirros-0.4.0-x86_64-disk --availability-zone nova:ubuntu \
      vm$NUM & done

  2) When the first instance reaches the BUILD state, restart the nova-compute service
  $ sudo systemctl restart devstack@n-cpu.service

  3) Observe the instance states after the compute service is up again.

  Expected result
  ===============

  All instances are in either ACTIVE or ERROR state.

  Actual result
  =============
  Some instances are stuck in BUILD state.

  
  Environment
  ===========

  All-in-one devstack built from recent nova master, commit
  61558f274842b149044a14bbe7537b9f278035fd

  
  Logs & Configs
  ==============

  stack@ubuntu:~$ openstack server list
  +--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
  | ID                                   | Name | Status | Networks                           | Image                    | Flavor    |
  +--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+
  | 9ee76601-4a61-4682-86f1-743dac2b05e6 | vm3  | BUILD  |                                    | cirros-0.4.0-x86_64-disk | cirros256 |
  | e459beae-ccb5-4781-b938-2dff68e33bf7 | vm9  | ACTIVE | public=2001:db8::181, 172.24.4.44  | cirros-0.4.0-x86_64-disk | cirros256 |
  | 562f44db-cd51-4516-bce9-598bd29c6310 | vm10 | ERROR  | public=2001:db8::3a1, 172.24.4.196 | cirros-0.4.0-x86_64-disk | cirros256 |
  | 73f1e2c6-78a1-44c5-b178-7adcf9bf58a0 | vm5  | ERROR  | public=2001:db8::21, 172.24.4.177  | cirros-0.4.0-x86_64-disk | cirros256 |
  | 1b01acfc-b798-48f9-b808-6cfd0d5cd3fb | vm6  | ERROR  | public=2001:db8::3e1, 172.24.4.20  | cirros-0.4.0-x86_64-disk | cirros256 |
  | c709e3bf-9c71-4f64-bad3-e9e07e911f62 | vm7  | ERROR  | public=2001:db8::231, 172.24.4.46  | cirros-0.4.0-x86_64-disk | cirros256 |
  | 538d2534-98f1-4e11-9bbb-b4e74bab8c65 | vm4  | ERROR  | public=2001:db8::3e9, 172.24.4.157 | cirros-0.4.0-x86_64-disk | cirros256 |
  | ed74eb32-00fe-4f24-9379-c57c04ce9af1 | vm2  | ERROR  | public=2001:db8::f5, 172.24.4.53   | cirros-0.4.0-x86_64-disk | cirros256 |
  | 582b5356-4f3d-42ed-937e-966580303af0 | vm8  | ERROR  | public=2001:db8::92, 172.24.4.16   | cirros-0.4.0-x86_64-disk | cirros256 |
  | ae36ffca-e4d6-4353-8e7e-41db500a5e0d | vm1  | ERROR  | public=2001:db8::1cf, 172.24.4.203 | cirros-0.4.0-x86_64-disk | cirros256 |
  +--------------------------------------+------+--------+------------------------------------+--------------------------+-----------+

  
  stack@ubuntu:~$ openstack server show 9ee76601-4a61-4682-86f1-743dac2b05e6
  +-------------------------------------+-----------------------------------------------------------------+
  | Field                               | Value                                                           |
  +-------------------------------------+-----------------------------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                                          |
  | OS-EXT-AZ:availability_zone         | nova                                                            |
  | OS-EXT-SRV-ATTR:host                | None                                                            |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                            |
  | OS-EXT-SRV-ATTR:instance_name       | instance-0000004c                                               |
  | OS-EXT-STS:power_state              | NOSTATE                                                         |
  | OS-EXT-STS:task_state               | None                                                            |
  | OS-EXT-STS:vm_state                 | building                                                        |
  | OS-SRV-USG:launched_at              | None                                                            |
  | OS-SRV-USG:terminated_at            | None                                                            |
  | accessIPv4                          |                                                                 |
  | accessIPv6                          |                                                                 |
  | addresses                           |                                                                 |
  | config_drive                        |                                                                 |
  | created                             | 2019-06-19T02:30:16Z                                            |
  | flavor                              | cirros256 (c1)                                                  |
  | hostId                              |                                                                 |
  | id                                  | 9ee76601-4a61-4682-86f1-743dac2b05e6                            |
  | image                               | cirros-0.4.0-x86_64-disk (8b88f518-ab48-4859-8e8c-6988911ce9bd) |
  | key_name                            | None                                                            |
  | name                                | vm3                                                             |
  | progress                            | 0                                                               |
  | project_id                          | 2fc0b14ea1e041998f420ec85a89314d                                |
  | properties                          |                                                                 |
  | status                              | BUILD                                                           |
  | updated                             | 2019-06-19T02:30:18Z                                            |
  | user_id                             | 262d29f5f0c3445abbde89723b5f01ee                                |
  | volumes_attached                    |                                                                 |
  +-------------------------------------+-----------------------------------------------------------------+
  stack@ubuntu:~$ 

  
  mysql> select uuid, host from instances where instances.uuid='9ee76601-4a61-4682-86f1-743dac2b05e6';
  +--------------------------------------+------+
  | uuid                                 | host |
  +--------------------------------------+------+
  | 9ee76601-4a61-4682-86f1-743dac2b05e6 | NULL |
  +--------------------------------------+------+
  1 row in set (0.00 sec)

  Logs for 9ee76601-4a61-4682-86f1-743dac2b05e6:
  http://paste.openstack.org/show/753228/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1833581/+subscriptions

