yahoo-eng-team team mailing list archive

[Bug 1477261] Re: Juno Compute node unable to register hypervisor with Kilo Controller

 

*** This bug is a duplicate of bug 1431201 ***
    https://bugs.launchpad.net/bugs/1431201

** This bug has been marked a duplicate of bug 1431201
   kilo controller can't conduct juno compute nodes

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1477261

Title:
  Juno Compute node unable to register hypervisor with Kilo Controller

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Controller Node:
  OS: openSUSE13.2
  python-nova-2015.1.1.dev62-1.1.noarch
  openstack-nova-conductor-2015.1.1.dev62-1.1.noarch
  openstack-nova-scheduler-2015.1.1.dev62-1.1.noarch
  openstack-nova-cert-2015.1.1.dev62-1.1.noarch
  python-novaclient-2.23.0-2.4.noarch
  openstack-nova-novncproxy-2015.1.1.dev62-1.1.noarch
  openstack-nova-api-2015.1.1.dev62-1.1.noarch
  openstack-nova-consoleauth-2015.1.1.dev62-1.1.noarch
  openstack-nova-2015.1.1.dev62-1.1.noarch

  Compute Node:
  OS: openSUSE13.1
  openstack-nova-compute-2014.2.4.dev56-1.1.noarch
  python-novaclient-2.20.0-2.3.noarch
  python-nova-2014.2.4.dev56-1.1.noarch
  openstack-nova-2014.2.4.dev56-1.1.noarch

  During the installation of OpenStack using a Kilo Controller node,
  Kilo Network node and a Juno compute node, I found that the compute
  node was not registering the hypervisor with the controller. The
  hypervisor-list output was empty but the service-list output showed
  the compute node.  After tracking through the code I found the root of
  the issue:

  During nova-compute startup, the compute node checks whether it has
  already registered with the controller by querying both the services
  and compute_nodes tables. I noticed that the _get_service call was
  raising an exception.

  Call flow on the compute node I was looking at:
  /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py
      update_available_resource
          _update_available_resource
              _init_compute_node
                  _get_service  <----------------  NotFound exception caught here
                      self.conductor_api.service_get_by_compute_host(context, self.host)
                          conductor/api.py:service_get_all_by
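
  The compute-side behaviour in the call flow above can be modelled with
  a minimal, self-contained sketch. This is not the nova source: the
  NotFound class, the two conductor stubs and ResourceTrackerSketch are
  hypothetical stand-ins, and the early return in _init_compute_node is
  an assumption about why the hypervisor is never registered.

```python
class NotFound(Exception):
    """Hypothetical local stand-in for the NotFound exception in the report."""


class FailingConductor:
    """Hypothetical stub: the conductor call fails, as described above."""

    def service_get_by_compute_host(self, context, host):
        raise NotFound("no result for host %s" % host)


class WorkingConductor:
    """Hypothetical stub: the conductor call returns a service record."""

    def service_get_by_compute_host(self, context, host):
        return {"id": 1, "host": host, "compute_node": None}


class ResourceTrackerSketch:
    """Simplified model of the startup check, not the real resource tracker."""

    def __init__(self, host, conductor_api):
        self.host = host
        self.conductor_api = conductor_api
        self.compute_node = None  # stays None until registration succeeds

    def _get_service(self, context):
        try:
            return self.conductor_api.service_get_by_compute_host(
                context, self.host)
        except NotFound:
            # The exception is swallowed here, so the caller only sees None.
            return None

    def _init_compute_node(self, context):
        service = self._get_service(context)
        if service is None:
            # Assumption: without a service record, registration is skipped.
            return
        self.compute_node = {"service_id": service["id"], "host": self.host}


tracker = ResourceTrackerSketch("compute1", FailingConductor())
tracker._init_compute_node(context=None)
print(tracker.compute_node)
```

  With FailingConductor the tracker ends up with no compute node record,
  which would match an empty hypervisor-list while service-list still
  shows the host.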

  
  Looking on the controller for the source of the exception, I found
  where the request is handled:

  /usr/lib/python2.7/site-packages/nova/conductor/manager.py -> service_get_all_by()

  In this function the incoming topic is 'compute', so the request is
  assumed to come from a Juno compute node. The query against the
  services table succeeds, but Juno compute nodes apparently also expect
  a compute_node field in the response, which I presume is not present
  in Kilo. The function therefore adds the field by querying the
  compute_nodes table for the host. This is fine if the host is present
  in that table, but if it is not, an exception is raised that is not
  handled, so service_get_all_by does not return a result. The failure
  propagates all the way back to the compute node, and the hypervisor is
  never registered with the controller.

  I was able to resolve this by catching the exception in
  service_get_all_by, creating the expected field, and defaulting it to
  None.

      if topic == 'compute':
          result = self.db.service_get_by_compute_host(context, host)
          # NOTE(sbauza): Only Juno computes are still calling this
          # conductor method for getting service_get_by_compute_node,
          # but expect a compute_node field so we can safely add it.
          try:
              result['compute_node'] = (
                  objects.ComputeNodeList.get_all_by_host(
                      context, result['host']))
          except Exception:
              # Lookup failed (e.g. host not yet in compute_nodes):
              # default the field to None instead of propagating.
              result['compute_node'] = None
          # FIXME(comstud) Potentially remove this on bump to v3.0
          result = [result]

  Not sure if this is the correct fix or not but this unblocked me.
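
  The controller-side failure and the workaround can also be reproduced
  in isolation. The sketch below is a toy model under stated
  assumptions, not the conductor code: the SERVICES and COMPUTE_NODES
  dictionaries stand in for the two tables, and
  ComputeNodeLookupError, get_compute_nodes_by_host and both
  service_get_all_by_* functions are hypothetical names.

```python
class ComputeNodeLookupError(Exception):
    """Hypothetical stand-in for the unhandled exception in the report."""


# Toy stand-ins for the services and compute_nodes tables.
SERVICES = {"compute1": {"id": 1, "host": "compute1", "topic": "compute"}}
COMPUTE_NODES = {}  # empty: the host has not registered a hypervisor yet


def get_compute_nodes_by_host(host):
    """Return the compute node rows for a host, or raise if there are none."""
    try:
        return [COMPUTE_NODES[host]]
    except KeyError:
        raise ComputeNodeLookupError(host)


def service_get_all_by_unpatched(host):
    """Model of the reported behaviour: the lookup error escapes."""
    result = dict(SERVICES[host])
    result["compute_node"] = get_compute_nodes_by_host(result["host"])
    return [result]


def service_get_all_by_patched(host):
    """Model of the workaround: default compute_node to None on failure."""
    result = dict(SERVICES[host])
    try:
        result["compute_node"] = get_compute_nodes_by_host(result["host"])
    except ComputeNodeLookupError:
        result["compute_node"] = None
    return [result]


try:
    service_get_all_by_unpatched("compute1")
except ComputeNodeLookupError as exc:
    print("unpatched raised for host:", exc)

print(service_get_all_by_patched("compute1"))
```

  Unlike the workaround above, which catches Exception, this sketch
  catches only the lookup error, so unrelated failures would still
  surface. Whether that narrower handling suits the real conductor is an
  open question.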

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1477261/+subscriptions

