yahoo-eng-team team mailing list archive

[Bug 1874860] [NEW] After removing a compute (scalein) , openstack compute service list shows compute enabled / on , after user powers on removed compute

 

Public bug reported:

Description
===========

After a compute node is removed, the user powers the removed node back on
via ipmitool or the console.

Running 'openstack compute service list' on the undercloud with
overcloudrc sourced then shows the removed compute as enabled / up, even
though it was removed via the scaling procedure.

Steps to reproduce
==================
Steps in chronological order:

* Deploy a containerized Rocky overcloud with 3 controllers and 3
ovs-compute nodes.

* Scale in (remove) ovs-compute-2, following
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/9/html/director_installation_and_usage/sect-scaling_the_overcloud

* Observe the output of 'openstack compute service list --service nova-compute': the node has been removed.
[stack@undercloud (overcloudrc) ~]$ openstack compute service list --service nova-compute

+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
|  ID | Binary       | Host                               | Zone  | Status  | State | Updated At                 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
| 113 | nova-compute | overcloud-ovscompute-0.localdomain | zone1 | enabled | up    | 2020-04-24T16:57:44.000000 |
| 116 | nova-compute | overcloud-ovscompute-1.localdomain | zone1 | enabled | up    | 2020-04-24T16:57:37.000000 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+

* Power on the removed compute via ipmitool or the console.
* Observe the output of 'openstack compute service list --service nova-compute' again: the removed node is back in the list with State up.

[stack@undercloud (overcloudrc) ~]$ openstack compute service list --service nova-compute
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
|  ID | Binary       | Host                               | Zone  | Status  | State | Updated At                 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
| 113 | nova-compute | overcloud-ovscompute-0.localdomain | zone1 | enabled | up    | 2020-04-24T17:07:34.000000 |
| 116 | nova-compute | overcloud-ovscompute-1.localdomain | zone1 | enabled | up    | 2020-04-24T17:07:27.000000 |
| 143 | nova-compute | overcloud-ovscompute-2.localdomain | nova  | enabled | up    | 2020-04-24T17:07:28.000000 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
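
The check in the last step can be automated. The following is only a minimal sketch, assuming the CLI is asked for JSON with '-f json' and that the JSON keys match the column headers shown above ("ID", "Host", "Zone", "State"). The function names are hypothetical, and the sample rows are copied from the listing above.

```python
import json
import subprocess


def list_compute_services():
    """Return nova-compute service rows from the CLI (needs overcloudrc sourced).

    Assumption: '-f json' yields a list of dicts keyed by the column headers.
    """
    out = subprocess.run(
        ["openstack", "compute", "service", "list",
         "--service", "nova-compute", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def reappeared(services, removed_hosts):
    """Return rows whose host was scaled in but is reporting State 'up' again."""
    removed = set(removed_hosts)
    return [s for s in services if s["Host"] in removed and s["State"] == "up"]


# Sample rows taken from the listing after power-on (timestamps omitted).
AFTER_POWER_ON = [
    {"ID": 113, "Host": "overcloud-ovscompute-0.localdomain",
     "Zone": "zone1", "State": "up"},
    {"ID": 116, "Host": "overcloud-ovscompute-1.localdomain",
     "Zone": "zone1", "State": "up"},
    {"ID": 143, "Host": "overcloud-ovscompute-2.localdomain",
     "Zone": "nova", "State": "up"},
]

hits = reappeared(AFTER_POWER_ON, ["overcloud-ovscompute-2.localdomain"])
for s in hits:
    print(f"removed host is back: id={s['ID']} host={s['Host']} zone={s['Zone']}")
```

Against a live deployment, reappeared(list_compute_services(), [...]) would perform the same check. Note that in the listing the returning host carries a new ID (143) and the zone "nova" rather than "zone1", which suggests a freshly registered service record rather than the old one.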

Other observations:

Watching nova-compute.log on the removed node during power-up, we see
that it communicates with the nova hypervisor and reports itself as
available. This occurs as soon as the nova_compute container becomes
healthy after power-on. Please see the attached screenshot. If a step is
missing from the procedure, such as disabling the nova_compute container
on the removed node, it should be documented. I would expect that
deleting the nova service via "nova service-delete [service-id]", as the
procedure describes, would be enough, but it appears it is not.
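
One possible reading of the "missing step" is sketched below. This is not a confirmed fix and is not taken from the referenced procedure. It assumes the containers on a Rocky compute node are managed by docker, that the node is still reachable over SSH before the scale-in, and that the SSH user is heat-admin. The helper name and the service ID placeholder are hypothetical. The function only builds the command strings and runs nothing.

```python
def cleanup_commands(host, service_id, ssh_user="heat-admin"):
    """Build (not run) commands to keep a scaled-in node from re-registering.

    Assumptions: docker runtime, a container named nova_compute, and SSH
    access as `ssh_user`. Review every command before executing it.
    """
    remote = f"ssh {ssh_user}@{host}"
    return [
        # Stop docker from restarting the container on the next boot.
        f"{remote} sudo docker update --restart=no nova_compute",
        # Stop the running nova-compute container.
        f"{remote} sudo docker stop nova_compute",
        # Disable the service so the scheduler no longer picks the host.
        f"openstack compute service set --disable {host} nova-compute",
        # Remove the service record (equivalent of 'nova service-delete').
        f"openstack compute service delete {service_id}",
    ]


for cmd in cleanup_commands("overcloud-ovscompute-2.localdomain", "SERVICE_ID"):
    print(cmd)
```

The ordering is a design choice: the container is stopped first so that nova-compute cannot report in again between the service deletion and the node being powered off.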

Expected result
===============

There should be no trace of the removed compute.


We expect to see the following output:
[stack@undercloud (overcloudrc) ~]$ openstack compute service list --service nova-compute
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
|  ID | Binary       | Host                               | Zone  | Status  | State | Updated At                 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
| 113 | nova-compute | overcloud-ovscompute-0.localdomain | zone1 | enabled | up    | 2020-04-24T17:07:34.000000 |
| 116 | nova-compute | overcloud-ovscompute-1.localdomain | zone1 | enabled | up    | 2020-04-24T17:07:27.000000 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+

Actual result
=============

Instead, after the user powers on the scaled-in node, the output lists the removed compute again:
[stack@undercloud (overcloudrc) ~]$ openstack compute service list --service nova-compute
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
|  ID | Binary       | Host                               | Zone  | Status  | State | Updated At                 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+
| 113 | nova-compute | overcloud-ovscompute-0.localdomain | zone1 | enabled | up    | 2020-04-24T17:07:34.000000 |
| 116 | nova-compute | overcloud-ovscompute-1.localdomain | zone1 | enabled | up    | 2020-04-24T17:07:27.000000 |
| 143 | nova-compute | overcloud-ovscompute-2.localdomain | nova  | enabled | up    | 2020-04-24T17:07:28.000000 |
+-----+--------------+------------------------------------+-------+---------+-------+----------------------------+

Environment
===========
OpenStack Rocky
OS                CentOS Linux 7.7.1908

kernel                           4.18.0
qemu-kvm-ev                      2.12.0
libvirt                           4.5.0
openvswitch                      2.11.0
dpdk                            18.11.2

OpenStack Services
aodh                              7.0.1
barbican                          7.0.1
ceilometer                       11.0.2
cinder                           13.0.6
glance                           17.0.1
gnocchi                           4.3.3
heatclient                       1.16.2
heat-engine                      11.0.3
horizon                          14.0.3
keystone                         14.1.1
manila                            7.3.1
neutron                          13.0.4
nova                             18.2.1
panko                             5.0.1

pacemaker                        1.1.20
corosync                          2.4.3
haproxy                          1.5.18
mariadb                         10.1.20
rabbitmq-server                  3.6.16

Ceph Storage

ceph-base                       12.2.11
ceph-ansible                      3.1.6

Monitoring
elasticsearch                     6.2.1
kibana                            6.2.1
logstash                          6.2.4

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: compute nova rocky scale-in

** Attachment added: "Screen Shot 2020-04-24 at 10.39.37 AM.png"
   https://bugs.launchpad.net/bugs/1874860/+attachment/5359654/+files/Screen%20Shot%202020-04-24%20at%2010.39.37%20AM.png

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1874860

Title:
  After removing a compute (scalein) , openstack compute service list
  shows compute enabled / on , after user powers on removed compute

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1874860/+subscriptions

