Message #41387
[Bug 1517770] [NEW] NULL free_disk_gb causes scheduler failure
Public bug reported:
It appears a race exists between nova-scheduler and the compute manager
when a ComputeNode entry is created for the first time.
The following log messages were noticed after multiple transient
failures to create a VM on a newly deployed single-node system.
2015-11-03 18:41:27.886 13735 WARNING nova.scheduler.host_manager [req-dd2b0758-78a4-4a67-90c8-9586d4d55489 db30a70a389548ed916f52d2f5c25544 617c3194750f44cfa1e9a747b2ac36f5 - - -] Host zs-zhost1 has more disk space than database expected (13119gb > Nonegb)
2015-11-03 18:41:27.904 13783 WARNING nova.scheduler.utils [req-dd2b0758-78a4-4a67-90c8-9586d4d55489 db30a70a389548ed916f52d2f5c25544 617c3194750f44cfa1e9a747b2ac36f5 - - -] Failed to compute_task_build_instances: unsupported operand type(s) for *: 'NoneType' and 'int'
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    executor_callback))
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
    executor_callback)
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 86, in select_destinations
    filter_properties)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 67, in select_destinations
    filter_properties)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 131, in _schedule
    hosts = self._get_all_host_states(elevated)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 176, in _get_all_host_states
    return self.host_manager.get_all_host_states(context)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py", line 552, in get_all_host_states
    host_state = self.host_state_cls(host, node, compute=compute)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py", line 309, in host_state_cls
    return HostState(host, node, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py", line 157, in __init__
    self.update_from_compute_node(compute)
  File "/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py", line 202, in update_from_compute_node
    free_disk_mb = free_gb * 1024
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
2015-11-03 18:41:27.907 13783 WARNING nova.scheduler.utils [req-dd2b0758-78a4-4a67-90c8-9586d4d55489 db30a70a389548ed916f52d2f5c25544 617c3194750f44cfa1e9a747b2ac36f5 - - -] [instance: bd6bb6a7-e917-4ce7-b207-817144ac7853] Setting instance to ERROR state.
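The traceback ends at a multiplication where free_gb is None. As an illustration only, here is a minimal sketch of a defensive conversion that treats a NULL value as zero free space. The function name free_disk_mb and the parameter hypervisor_free_gb are hypothetical and are not taken from the nova source; this is not the actual fix, just one way such a guard could look.

```python
def free_disk_mb(free_disk_gb, hypervisor_free_gb=None):
    """Convert free disk from GB to MB, tolerating a NULL database value.

    free_disk_gb: value read from the ComputeNode row (may be None).
    hypervisor_free_gb: hypothetical second reading from the hypervisor
        (may be None); the smaller of the two known values is used.
    """
    # Assumption: a row that has not been populated yet offers no usable disk.
    free_gb = 0 if free_disk_gb is None else free_disk_gb
    if hypervisor_free_gb is not None:
        free_gb = min(free_gb, hypervisor_free_gb)
    return free_gb * 1024


# The case from the log: database value is NULL, hypervisor reports 13119 GB.
mb_during_race = free_disk_mb(None, 13119)
mb_normal = free_disk_mb(10, 5)
```

With a guard like this the scheduler would see the half-initialised node as having no free disk and skip it for that request, rather than raising TypeError.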
I believe that during the execution of
resource_tracker._update_available_resource() for a new node, the period
between the initial insert of the ComputeNode entry in
_init_compute_node() and the call to _update() leaves a ComputeNode with
a NULL free_disk_gb for a small window of time.
Commit 6aa36ab seems likely to have exposed this more widely.
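To make the suspected window concrete, the sequence can be modelled with a plain dictionary standing in for the compute_nodes table. Everything in this sketch is a simplified assumption: the function names only echo the ones mentioned above, the dictionary is not how nova stores data, and the value 13119 is an illustrative figure borrowed from the log.

```python
def init_compute_node(db, host):
    # Models the initial insert: the row exists but has no resource values yet.
    db[host] = {"free_disk_gb": None}


def update(db, host, free_disk_gb):
    # Models the later update that fills in the real value.
    db[host]["free_disk_gb"] = free_disk_gb


def schedulable_hosts(db):
    # Hypothetical scheduler-side filter: ignore rows that are not populated.
    return [host for host, row in db.items() if row["free_disk_gb"] is not None]


db = {}
init_compute_node(db, "zs-zhost1")
in_window = schedulable_hosts(db)   # scheduler reads between insert and update
update(db, "zs-zhost1", 13119)
after_update = schedulable_hosts(db)
```

A scheduler request that lands between the two calls sees free_disk_gb as None. Filtering such rows out, or writing the row only once the values are known, would each close the window in this simplified model.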
Versions:
ii nova-common 1:2015.1.1-0ubuntu1~cloud2 all OpenStack Compute - common files
ii nova-compute 1:2015.1.1-0ubuntu1~cloud2 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2015.1.1-0ubuntu1~cloud2 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 1:2015.1.1-0ubuntu1~cloud2 all OpenStack Compute - compute node libvirt support
ii python-nova 1:2015.1.1-0ubuntu1~cloud2 all OpenStack Compute Python libraries
ii python-novaclient 1:2.22.0-0ubuntu1~cloud0 all client library for OpenStack Compute API
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1517770
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1517770/+subscriptions