[Bug 1477261] [NEW] Juno Compute node unable to register hypervisor with Kilo Controller
Public bug reported:
Controller Node:
OS: openSUSE13.2
python-nova-2015.1.1.dev62-1.1.noarch
openstack-nova-conductor-2015.1.1.dev62-1.1.noarch
openstack-nova-scheduler-2015.1.1.dev62-1.1.noarch
openstack-nova-cert-2015.1.1.dev62-1.1.noarch
python-novaclient-2.23.0-2.4.noarch
openstack-nova-novncproxy-2015.1.1.dev62-1.1.noarch
openstack-nova-api-2015.1.1.dev62-1.1.noarch
openstack-nova-consoleauth-2015.1.1.dev62-1.1.noarch
openstack-nova-2015.1.1.dev62-1.1.noarch
Compute Node:
OS: openSUSE13.1
openstack-nova-compute-2014.2.4.dev56-1.1.noarch
python-novaclient-2.20.0-2.3.noarch
python-nova-2014.2.4.dev56-1.1.noarch
openstack-nova-2014.2.4.dev56-1.1.noarch
During the installation of OpenStack with a Kilo controller node, a Kilo network node and a Juno compute node, I found that the compute node was not registering its hypervisor with the controller. The nova hypervisor-list output was empty, but nova service-list showed the compute node. After tracking through the code I found the root of the issue:
During nova-compute startup, I determined that the compute node checks whether it has already registered with the controller by querying both the services and compute_nodes tables. I noticed that the _get_service call was hitting a NotFound exception.
Call flow on the compute node I was looking at:
/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py
  update_available_resource
    _update_available_resource
      _init_compute_node
        _get_service  <---------------- NotFound exception caught here
          self.conductor_api.service_get_by_compute_host(context, self.host)
            conductor/api.py:service_get_all_by
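On the Juno compute side that NotFound is simply swallowed: _get_service warns and returns None, so the resource tracker never gets far enough to create the compute_nodes (hypervisor) record. A paraphrased sketch of that method, from memory of the Juno resource tracker rather than a verbatim copy (LOG and exception are the usual module-level names in resource_tracker.py):

def _get_service(self, context):
    try:
        return self.conductor_api.service_get_by_compute_host(
            context, self.host)
    except exception.NotFound:
        # The conductor call failed, so warn and implicitly return None;
        # _init_compute_node then bails out and the hypervisor is never
        # registered with the controller.
        LOG.warn("No service record for host %s", self.host)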
Looking on the controller to determine the source of the exception, I found where the request is handled:
/usr/lib/python2.7/site-packages/nova/conductor/manager.py -> service_get_all_by()
In this function the incoming topic is 'compute', so the request is assumed to come from a Juno compute node. The services table query succeeds, but apparently Juno compute nodes also expect a compute_node field in the response, which I presume is not present in Kilo by default. The conductor therefore adds that field by querying the compute_nodes table for the host. This is fine if the host is already present in that table, but if it is not, the lookup raises an exception that is never handled, so service_get_all_by returns no result. The failure propagates all the way back to the compute node, and the hypervisor is never registered with the controller.
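For reference, the unpatched 'compute' branch of service_get_all_by is presumably the patched version below minus the try/except, roughly like this (reconstructed from my patched copy, not checked against the pristine Kilo source):

if topic == 'compute':
    result = self.db.service_get_by_compute_host(context, host)
    # Juno computes expect a compute_node field, so Kilo adds it here.
    # This lookup is what raises when the host has no compute_nodes row.
    result['compute_node'] = objects.ComputeNodeList.get_all_by_host(
        context, result['host'])
    result = [result]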
I was able to resolve this by catching the exception in service_get_all_by, creating the expected compute_node field and defaulting it to None:
if topic == 'compute':
    result = self.db.service_get_by_compute_host(context, host)
    # NOTE(sbauza): Only Juno computes are still calling this
    # conductor method for getting service_get_by_compute_node,
    # but expect a compute_node field so we can safely add it.
    try:
        result['compute_node'] = objects.ComputeNodeList.get_all_by_host(
            context, result['host'])
        # FIXME(comstud) Potentially remove this on bump to v3.0
        result = [result]
    except Exception:
        # Workaround: the host has no compute_nodes row yet, so default
        # the field instead of letting the exception escape.
        result['compute_node'] = None
        result = [result]
I am not sure whether this is the correct fix, but it unblocked me.
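If the failing lookup specifically raises exception.ComputeHostNotFound when the host has no compute_nodes row (an assumption on my part; I have not confirmed the exact exception type against the Kilo source), the bare except could be narrowed. A sketch, assuming nova.exception is already imported as "exception" at the top of conductor/manager.py:

try:
    result['compute_node'] = objects.ComputeNodeList.get_all_by_host(
        context, result['host'])
except exception.ComputeHostNotFound:
    # Assumed exception type: the host has no compute_nodes row yet
    # (e.g. first start of nova-compute), so default the field.
    result['compute_node'] = None
# FIXME(comstud) Potentially remove this on bump to v3.0
result = [result]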
** Affects: nova
Importance: Undecided
Status: New
** Tags: nova-compute nova-controller
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1477261
Title:
Juno Compute node unable to register hypervisor with Kilo Controller
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1477261/+subscriptions