yahoo-eng-team team mailing list archive

[Bug 2007922] [NEW] Cleanup pending instances in "building" state

 

Public bug reported:

Following up on the ML thread [1], it was recommended to create a bug report.
After a network issue in a Victoria cluster (3 control nodes in HA mode, 26 compute nodes), some instance builds were interrupted. Some of them could be cleaned up with 'openstack server delete', but two of them cannot: they already have an instance mapping, yet they can neither be removed nor have their state reset ("reset-state") by nova. Both are amphora instances from Octavia:

control01:~ # openstack server list --project service -c ID -c Name -c Status -f value | grep BUILD
0453a7e5-e4f9-419b-ad71-d837a20ef6bb amphora-0ee32901-0c59-4752-8253-35b66da176ea BUILD
dc8cdc3a-f6b2-469b-af6f-ba2aa130ea9b amphora-4990a47b-fe8a-431a-90ec-5ac2368a5251 BUILD

control01:~ # openstack server delete amphora-0ee32901-0c59-4752-8253-35b66da176ea
No server with a name or ID of 'amphora-0ee32901-0c59-4752-8253-35b66da176ea' exists.

control01:~ # openstack server show 0453a7e5-e4f9-419b-ad71-d837a20ef6bb
ERROR (CommandError): No server with a name or ID of '0453a7e5-e4f9-419b-ad71-d837a20ef6bb' exists.
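
Both instances still appear in the server list, so an instance mapping exists, yet neither 'show' nor 'delete' by UUID works. For reference, the cell mapping could be checked with something like the following (only a sketch; I am assuming the nova-manage cell_v2 commands behave the same in Victoria):

control01:~ # nova-manage cell_v2 list_cells
control01:~ # nova-manage cell_v2 verify_instance --uuid 0453a7e5-e4f9-419b-ad71-d837a20ef6bb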

The database tables referring to the UUID
0453a7e5-e4f9-419b-ad71-d837a20ef6bb are these:

nova_cell0/instance_id_mappings.ibd
nova_cell0/instance_info_caches.ibd
nova_cell0/instance_extra.ibd
nova_cell0/instances.ibd
nova_cell0/instance_system_metadata.ibd
octavia/amphora.ibd
nova_api/instance_mappings.ibd
nova_api/request_specs.ibd

I can provide both debug logs and database query output; just let me know
exactly what is required.
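
For example, queries along these lines could be run (only a sketch; the column names are assumed from the standard nova_api / nova_cell0 schema):

control01:~ # mysql nova_api -e "SELECT instance_uuid, cell_id, queued_for_delete FROM instance_mappings WHERE instance_uuid = '0453a7e5-e4f9-419b-ad71-d837a20ef6bb';"
control01:~ # mysql nova_cell0 -e "SELECT uuid, vm_state, task_state, deleted, host FROM instances WHERE uuid = '0453a7e5-e4f9-419b-ad71-d837a20ef6bb';"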

The storage back end is Ceph (Pacific), networking is Neutron with Open vSwitch,
and the exact nova versions are:

control01:~ # rpm -qa | grep nova
openstack-nova-conductor-22.2.2~dev15-lp152.1.25.noarch
openstack-nova-api-22.2.2~dev15-lp152.1.25.noarch
openstack-nova-novncproxy-22.2.2~dev15-lp152.1.25.noarch
python3-novaclient-17.2.0-lp152.3.2.noarch
openstack-nova-scheduler-22.2.2~dev15-lp152.1.25.noarch
openstack-nova-22.2.2~dev15-lp152.1.25.noarch
python3-nova-22.2.2~dev15-lp152.1.25.noarch

[1] https://lists.openstack.org/pipermail/openstack-discuss/2023-February/032308.html

** Affects: nova
     Importance: Undecided
         Status: New


-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2007922

Title:
  Cleanup pending instances in "building" state

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2007922/+subscriptions


