yahoo-eng-team team mailing list archive

[Bug 1477261] [NEW] Juno Compute node unable to register hypervisor with Kilo Controller

 

Public bug reported:

Controller Node:
OS: openSUSE 13.2
python-nova-2015.1.1.dev62-1.1.noarch
openstack-nova-conductor-2015.1.1.dev62-1.1.noarch
openstack-nova-scheduler-2015.1.1.dev62-1.1.noarch
openstack-nova-cert-2015.1.1.dev62-1.1.noarch
python-novaclient-2.23.0-2.4.noarch
openstack-nova-novncproxy-2015.1.1.dev62-1.1.noarch
openstack-nova-api-2015.1.1.dev62-1.1.noarch
openstack-nova-consoleauth-2015.1.1.dev62-1.1.noarch
openstack-nova-2015.1.1.dev62-1.1.noarch

Compute Node:
OS: openSUSE 13.1
openstack-nova-compute-2014.2.4.dev56-1.1.noarch
python-novaclient-2.20.0-2.3.noarch
python-nova-2014.2.4.dev56-1.1.noarch
openstack-nova-2014.2.4.dev56-1.1.noarch

While installing OpenStack with a Kilo controller node, a Kilo network
node, and a Juno compute node, I found that the compute node was not
registering its hypervisor with the controller. The hypervisor-list
output was empty, but the service-list output showed the compute node.
After tracing through the code, I found the root of the issue:

During nova-compute startup, the compute node checks whether it has
already registered with the controller by querying both the services
and compute_nodes tables. I noticed that the _get_service call was
failing with an exception.

Call flow on the compute node I was looking at:
/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py
    update_available_resource
        _update_available_resource
            _init_compute_node
                _get_service  <----------------  NotFound exception caught here
                    self.conductor_api.service_get_by_compute_host(context,self.host) 
                        conductor/api.py:service_get_all_by
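
The flow above can be modelled roughly as follows. This is a simplified
sketch, not nova's code: TrackerSketch, FakeConductorAPI and the local
NotFound class are hypothetical stand-ins. It assumes that when
_get_service catches NotFound it yields no service, and that
_init_compute_node then skips creating the compute node record, which
would match the empty hypervisor-list output.

```python
class NotFound(Exception):
    """Hypothetical stand-in for nova's NotFound exception."""


class FakeConductorAPI:
    """Hypothetical conductor stub whose lookup always fails."""

    def service_get_by_compute_host(self, context, host):
        raise NotFound("lookup failed for host %s" % host)


class TrackerSketch:
    """Simplified model of the resource tracker startup path."""

    def __init__(self, host, conductor_api):
        self.host = host
        self.conductor_api = conductor_api
        self.compute_node = None  # stays None if registration is skipped

    def _get_service(self, context):
        try:
            return self.conductor_api.service_get_by_compute_host(
                context, self.host)
        except NotFound:
            # The error is swallowed here, so the caller only sees None.
            return None

    def _init_compute_node(self, context):
        service = self._get_service(context)
        if service is None:
            return  # assumed behaviour: no service, so nothing is registered
        self.compute_node = {"service_id": service["id"], "host": self.host}


tracker = TrackerSketch("compute-1", FakeConductorAPI())
tracker._init_compute_node(context=None)
print(tracker.compute_node)
```

Because the exception is caught inside _get_service, nothing fails
loudly on the compute node in this model; the only symptom is the
missing hypervisor record.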


Looking on the controller for the source of the exception, I found
where the request is handled:

/usr/lib/python2.7/site-packages/nova/conductor/manager.py -> service_get_all_by()

In this function the incoming topic is 'compute', so the request is
assumed to come from a Juno compute node. The query against the
services table succeeds, but Juno compute nodes apparently also expect
a compute_node field in the response, which I presume is not present
in Kilo. The function therefore adds the field by querying the
compute_nodes table for the host.

This is fine if the host is present in that table. If it is not, an
exception is thrown that is not handled, so service_get_all_by does not
return a result. The failure propagates all the way back to the compute
node, and the hypervisor is never registered with the controller.

I was able to work around this by catching the exception in
service_get_all_by, creating the expected field, and defaulting it to
None.

    if topic == 'compute':
        result = self.db.service_get_by_compute_host(context, host)
        # NOTE(sbauza): Only Juno computes are still calling this
        # conductor method for getting service_get_by_compute_node,
        # but expect a compute_node field so we can safely add it.
        try:
            result['compute_node'] = (
                objects.ComputeNodeList.get_all_by_host(
                    context, result['host']))
        except Exception:
            # Workaround: the lookup failed, so default the field to None.
            result['compute_node'] = None
        # FIXME(comstud) Potentially remove this on bump to v3.0
        result = [result]

I am not sure whether this is the correct fix, but it unblocked me.
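
One possible refinement, sketched below with in-memory stand-ins rather
than nova's database layer, is to catch only the "host not found" case
instead of every Exception, so that unrelated errors still surface.
The names ComputeHostNotFound, SERVICES, COMPUTE_NODES,
get_compute_nodes_by_host and service_get_all_by_sketch are all
hypothetical here; which exception nova actually raises in this path is
an assumption that would need checking against the code.

```python
class ComputeHostNotFound(Exception):
    """Hypothetical stand-in for the 'no compute_nodes row' error."""


# In-memory stand-ins for the services and compute_nodes tables.
SERVICES = {"compute-1": {"id": 7, "host": "compute-1", "topic": "compute"}}
COMPUTE_NODES = {}  # empty: the host has not registered a hypervisor yet


def get_compute_nodes_by_host(host):
    """Return the compute node rows for a host, or raise if there are none."""
    try:
        return COMPUTE_NODES[host]
    except KeyError:
        raise ComputeHostNotFound(host)


def service_get_all_by_sketch(host):
    """Model of the 'compute' topic branch with a narrow exception handler."""
    result = dict(SERVICES[host])
    try:
        result["compute_node"] = get_compute_nodes_by_host(result["host"])
    except ComputeHostNotFound:
        # First startup: no row yet, so hand back the field as None.
        result["compute_node"] = None
    return [result]


first_reply = service_get_all_by_sketch("compute-1")
print(first_reply[0]["compute_node"])
```

With the narrow handler, a first-time host still gets a reply whose
compute_node field is None, while any other failure in the lookup is
not silently hidden.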

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: nova-compute nova-controller

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1477261

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1477261/+subscriptions