
yahoo-eng-team team mailing list archive

[Bug 1825876] [NEW] Ironic hypervisor disappears once hashring got rebuilt

 

Public bug reported:

Steps to reproduce
==================
Precondition: a fresh OpenStack deployment. The database tables nova.compute_nodes and nova_api.host_mappings must be empty; in other words, no baremetal nodes have been added to the Ironic database yet.
This must be an HA deployment, with at least two ironic-conductors running on different servers.

Steps:
1. Create a baremetal node: "openstack baremetal node create ..."
2. Change the node's state to manageable.
3. After some time, "nova hypervisor-list" should list a hypervisor with the same UUID as the baremetal node.
3.1 The database should look like the following:
MariaDB [(none)]> select uuid, host, mapped from nova.compute_nodes;
+--------------------------------------+-------------+--------+
| uuid                                 | host        | mapped |
+--------------------------------------+-------------+--------+
| d394aa91-3544-417c-acab-916a22e5a5b5 | ironic.aio1 |      1 |
+--------------------------------------+-------------+--------+
MariaDB [(none)]> select * from nova_api.host_mappings;
+---------------------+------------+----+---------+-------------+
| created_at          | updated_at | id | cell_id | host        |
+---------------------+------------+----+---------+-------------+
| 2019-04-22 09:14:23 | NULL       | 22 |       7 | ironic.aio1 |
+---------------------+------------+----+---------+-------------+

4. Call "nova hypervisor-show <hypervisor UUID>" to find out which server the owning ironic-conductor is running on. Log into that server and stop ironic-conductor; this forces the hash ring to rebuild its state. Wait for about five minutes.
5. Check the output of "nova hypervisor-list". The hypervisor is absent.
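For background, Ironic distributes baremetal nodes across conductors with a consistent hash ring, which is why stopping the owning conductor re-homes the node. The mechanism can be sketched with a toy hash ring in Python (the class and helper names below are illustrative, not Ironic's actual implementation):

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping node UUIDs to conductor hosts."""

    def __init__(self, hosts, replicas=32):
        # Each host gets several markers on the ring to smooth the distribution.
        self._ring = []  # sorted list of (hash, host)
        for host in hosts:
            for i in range(replicas):
                self._ring.append((self._hash('%s-%d' % (host, i)), host))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_host(self, node_uuid):
        # Walk clockwise from the node's hash to the next host marker.
        idx = bisect.bisect(self._ring, (self._hash(node_uuid),))
        return self._ring[idx % len(self._ring)][1]


hosts = ['ironic.aio1', 'ironic.aio2', 'ironic.aio3']
node = 'd394aa91-3544-417c-acab-916a22e5a5b5'

before = HashRing(hosts)
owner = before.get_host(node)
# Stopping the owning conductor rebuilds the ring without it...
after = HashRing([h for h in hosts if h != owner])
# ...and the node deterministically lands on a different conductor.
print(owner, '->', after.get_host(node))
```

Removing a member only re-homes the keys that hashed to that member, which is why only the stopped conductor's nodes move to another conductor.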

Result
==================
Look inside the database (see below). ironic.aio3 took over the baremetal node, so nova changed the 'host' field of the compute node (d394aa91-3544-417c-acab-916a22e5a5b5) to 'ironic.aio3'.
Because mapped = 1, 'nova-manage cell_v2 discover_hosts' (run periodically, see https://bugs.launchpad.net/nova/+bug/1715646) does not try to create a host mapping for 'ironic.aio3'.

MariaDB [(none)]> select uuid, host, mapped from nova.compute_nodes;
+--------------------------------------+-------------+--------+
| uuid                                 | host        | mapped |
+--------------------------------------+-------------+--------+
| d394aa91-3544-417c-acab-916a22e5a5b5 | ironic.aio3 |      1 |
+--------------------------------------+-------------+--------+
MariaDB [(none)]> select * from nova_api.host_mappings;
+---------------------+------------+----+---------+-------------+
| created_at          | updated_at | id | cell_id | host        |
+---------------------+------------+----+---------+-------------+
| 2019-04-22 09:14:23 | NULL       | 22 |       7 | ironic.aio1 |
+---------------------+------------+----+---------+-------------+
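The stale state shown above can be condensed into a few lines of Python (toy data structures, not nova's actual code): discover_hosts only considers unmapped compute nodes, so the record that was re-homed to ironic.aio3 while keeping mapped = 1 is never revisited.

```python
# Toy rows mirroring the database state shown above.
compute_nodes = [
    {'uuid': 'd394aa91-3544-417c-acab-916a22e5a5b5',
     'host': 'ironic.aio3',   # rewritten by the resource tracker
     'mapped': 1},            # still flagged as mapped
]
host_mappings = {'ironic.aio1'}  # stale mapping from before the rebuild


def discover_hosts(compute_nodes, host_mappings):
    """Sketch of 'nova-manage cell_v2 discover_hosts': it only looks at
    unmapped compute nodes, so a mapped=1 record is skipped even when
    its current 'host' has no host mapping."""
    created = []
    for cn in compute_nodes:
        if cn['mapped']:
            continue  # <-- the ironic.aio3 record is skipped here
        if cn['host'] not in host_mappings:
            host_mappings.add(cn['host'])
            created.append(cn['host'])
            cn['mapped'] = 1
    return created


# No mapping is ever created for ironic.aio3:
print(discover_hosts(compute_nodes, host_mappings))  # -> []
```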

2019-04-22 19:54:00.813 8 WARNING nova.compute.resource_tracker [req-1ded2c35-d0e4-4719-a15d-3a83594bab1c - - - - -] No compute node record for ironic.aio3:5f9c2619-30bb-40d2-8b62-8923f04d90f2: ComputeHostNotFound_Remote: Compute host ironic.aio3 could not be found.
2019-04-22 19:54:00.831 8 INFO nova.compute.resource_tracker [req-1ded2c35-d0e4-4719-a15d-3a83594bab1c - - - - -] ComputeNode 5f9c2619-30bb-40d2-8b62-8923f04d90f2 moving from ironic.aio1 to ironic.aio3
2019-04-22 19:54:00.891 8 DEBUG nova.virt.ironic.driver [req-1ded2c35-d0e4-4719-a15d-3a83594bab1c - - - - -] Using cache for node 5f9c2619-30bb-40d2-8b62-8923f04d90f2, age: 0.0979330539703 _node_from_cache /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:860

The missing record in the host_mappings table causes nova to log the "Unable to find service" DEBUG message quoted below, and the compute node becomes 'invisible'.
See the source code in nova/api/openstack/compute/hypervisors.py:HypervisorsController._get_hypervisors:

108     def _get_hypervisors(self, req, detail=False, limit=None, marker=None,
109                          links=False):
110         """Get hypervisors for the given request.
111 
112         :param req: nova.api.openstack.wsgi.Request for the GET request
...
161         hypervisors_list = []
162         for hyp in compute_nodes:
163             try:
164                 instances = None
165                 if with_servers:
166                     instances = self.host_api.instance_get_all_by_host(
167                         context, hyp.host)
168                 service = self.host_api.service_get_by_compute_host(
169                     context, hyp.host)
170                 hypervisors_list.append(
171                     self._view_hypervisor(
172                         hyp, service, detail, req, servers=instances))
173             except (exception.ComputeHostNotFound,
174                     exception.HostMappingNotFound):
175                 # The compute service could be deleted which doesn't delete
176                 # the compute node record, that has to be manually removed
177                 # from the database so we just ignore it when listing nodes.
178                 LOG.debug('Unable to find service for compute node %s. The '
179                           'service may be deleted and compute nodes need to '
180                           'be manually cleaned up.', hyp.host)
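The net effect of that except clause can be sketched as follows (toy stand-ins for nova's objects; service_get_by_compute_host here only models the host-mapping lookup, not the real API):

```python
class HostMappingNotFound(Exception):
    pass


# Contents of nova_api.host_mappings: only the stale ironic.aio1 row exists.
host_mappings = {'ironic.aio1': 'cell7'}


def service_get_by_compute_host(host):
    # The API-level lookup goes through the host mapping first;
    # a missing row surfaces as HostMappingNotFound.
    if host not in host_mappings:
        raise HostMappingNotFound(host)
    return {'host': host, 'binary': 'nova-compute'}


# The compute node record now points at the new conductor host.
compute_nodes = [{'uuid': 'd394aa91-3544-417c-acab-916a22e5a5b5',
                  'host': 'ironic.aio3'}]


def list_hypervisors(compute_nodes):
    hypervisors = []
    for hyp in compute_nodes:
        try:
            service_get_by_compute_host(hyp['host'])
            hypervisors.append(hyp)
        except HostMappingNotFound:
            # Same silent skip as in _get_hypervisors: the node is
            # simply left out of the listing.
            pass
    return hypervisors


print(list_hypervisors(compute_nodes))  # -> []  (the hypervisor "disappears")
```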

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825876

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1825876/+subscriptions