yahoo-eng-team team mailing list archive
[Bug 1623473] [NEW] Overwrite node field by wrong value after ironic instance rebuild

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Tomasz Czekajło <coldgunpl@xxxxxxxxx>
Date: Wed, 14 Sep 2016 12:15:50 -0000
Reply-to: Bug 1623473 <1623473@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Public bug reported:

Hi,

When I rebuild an ironic instance via nova, the instance's node field is
overwritten with a wrong value after the first rebuild, so the next
rebuild is not possible.

Steps to reproduce
==================
1. Spawn a new ironic instance.
2. Rebuild the instance.
   After this step, hypervisor_hostname for the instance is completely different from before (I use the "nova show uuid" command to display it). If you look up the instance in ironic (ironic node-show --instance uuid), the UUID of the node differs from the node recorded in nova.

3. Rebuild the instance a second time; the error below appears.

http://paste.openstack.org/show/irCzuu5qucX6kF44X6oe/

Environment
===========
Mitaka release and Ubuntu 16

My workaround
=============
After debugging, I think I've found where the bug is.

https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/manager.py#L2795

2795:                compute_node = self._get_compute_info(context, self.host)
2796:                scheduled_node = compute_node.hypervisor_hostname

[...]

5118:    def _get_compute_info(self, context, host):
5119:        return objects.ComputeNode.get_first_node_by_host_for_old_compat(
5120:            context, host)

OK, let's dig deeper:

https://github.com/openstack/nova/blob/stable/mitaka/nova/objects/compute_node.py#L274

274:    def get_first_node_by_host_for_old_compat(cls, context, host,
275:                                              use_slave=False):
276:        computes = ComputeNodeList.get_all_by_host(context, host, use_slave)
277:        # FIXME(sbauza): Some hypervisors (VMware, Ironic) can return multiple
278:        # nodes per host, we should return all the nodes and modify the callers
279:        # instead.
280:        # Arbitrarily returning the first node.
281:        return computes[0]

It looks like the method returns the first node for the given host. With
the ironic hypervisor there are multiple nodes per host, so the first
node returned is effectively arbitrary.
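
A minimal sketch of the effect, not nova code: the ComputeNode tuple, the
host name "ironic-compute" and the node names "node-aaa"/"node-bbb" are
all hypothetical stand-ins. It only illustrates that taking the first node
registered for a host can name a different node than the one the instance
is actually on.

```python
from collections import namedtuple

# Hypothetical stand-in for a compute node record; only the two fields
# discussed in this report are modelled.
ComputeNode = namedtuple("ComputeNode", ["host", "hypervisor_hostname"])


def get_first_node_by_host(nodes, host):
    """Mimic the 'return computes[0]' behaviour quoted above."""
    computes = [n for n in nodes if n.host == host]
    return computes[0]


# One compute host that manages two ironic nodes (made-up names).
nodes = [
    ComputeNode("ironic-compute", "node-aaa"),
    ComputeNode("ironic-compute", "node-bbb"),
]

# Node the instance was originally deployed on.
instance_node = "node-bbb"

picked = get_first_node_by_host(nodes, "ironic-compute")

# True when the lookup names a node other than the instance's own node,
# which is the situation described in this report.
overwritten = picked.hypervisor_hostname != instance_node
```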

My workaround is nothing sophisticated, but it works for me:

--- manager.py_org	2016-09-14 13:50:37.807379651 +0200
+++ manager.py	2016-09-14 13:51:40.275126034 +0200
@@ -2793,7 +2793,11 @@
         if not scheduled_node:
             try:
                 compute_node = self._get_compute_info(context, self.host)
-                scheduled_node = compute_node.hypervisor_hostname
+                #workaround for ironic
+                if compute_node.hypervisor_type == 'ironic':
+                    scheduled_node = instance.node
+                else:
+                    scheduled_node = compute_node.hypervisor_hostname
             except exception.ComputeHostNotFound:
                 LOG.exception(_LE('Failed to get compute_info for %s'),
                                 self.host)
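
The decision made by the patch above can be sketched in isolation as
follows. This is an illustration under assumptions, not the nova
implementation: pick_scheduled_node is a hypothetical helper, and the
SimpleNamespace objects with their made-up names stand in for the real
compute node object.

```python
from types import SimpleNamespace


def pick_scheduled_node(compute_node, instance_node):
    """Hypothetical helper mirroring the branch added by the workaround."""
    if compute_node.hypervisor_type == "ironic":
        # Keep the node the instance is already on.
        return instance_node
    # Single-node hypervisors: the host's only node is the right answer.
    return compute_node.hypervisor_hostname


# Made-up compute node records for illustration only.
ironic_cn = SimpleNamespace(hypervisor_type="ironic",
                            hypervisor_hostname="node-aaa")
other_cn = SimpleNamespace(hypervisor_type="QEMU",
                           hypervisor_hostname="compute-1")

ironic_choice = pick_scheduled_node(ironic_cn, "node-bbb")
other_choice = pick_scheduled_node(other_cn, "compute-1")
```

With the ironic record the instance's own node is kept even though the
lookup returned a different one; for any other hypervisor type the
behaviour is unchanged.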

I've tested this on the Mitaka release, but the code seems to be the
same in the master branch.

That's all.
Regards

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: ironic rebuild

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1623473

Title:
  Overwrite node field by wrong value after ironic instance rebuild

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1623473/+subscriptions