yahoo-eng-team team mailing list archive
Message #64513
[Bug 1680661] Re: HostMappingNotFound during update_instance_info on n-cpu startup
Reviewed: https://review.openstack.org/454426
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa943499128ba77b062dff75ec9b48e54f7d5021
Submitter: Jenkins
Branch: master
commit aa943499128ba77b062dff75ec9b48e54f7d5021
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu Apr 6 22:08:52 2017 -0400
Handle new hosts for updating instance info in scheduler
As of change 791cf0643401f72ce834e580938057e325945169 we have
to go through host and cell mappings to get to a cell database
for host (compute node) and instance information.
When a new compute service starts up it casts to update_instance_info
and sends an empty list, which triggers the scheduler to try to
look up all instances on that host. If the new compute host is not
yet mapped in a cell, we'll get a HostMappingNotFound error.
We can handle this case in both the HostManager and ComputeManager
by not casting from the compute if there is no information to send,
and we can also handle it in the HostManager (for older computes)
by just dealing with the HostMappingNotFound gracefully.
This information eventually self-heals via normal operation.
Change-Id: I7cec2eff35c0615534fcb4d5148f75721824172e
Closes-Bug: #1680661
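The two-sided handling described in the commit message can be sketched roughly as follows. This is a toy model, not the nova code from the change: ToyHostManager, send_instance_info, and the dict-based host mapping are hypothetical stand-ins; only the HostMappingNotFound name and the "empty list means look everything up" behaviour come from the message above.

```python
# Toy stand-ins only; none of this is nova's real implementation.
class HostMappingNotFound(Exception):
    """Raised when a host is not yet mapped to any cell."""


class ToyHostManager:
    """Scheduler side: tracks instances per host (hypothetical model)."""

    def __init__(self, host_mappings):
        # host name -> instance UUIDs stored in that host's cell (assumed shape)
        self.host_mappings = host_mappings
        self.instance_info = {}

    def _get_instances_by_host(self, host_name):
        if host_name not in self.host_mappings:
            raise HostMappingNotFound(host_name)
        return list(self.host_mappings[host_name])

    def update_instance_info(self, host_name, instances):
        if instances:
            self.instance_info.setdefault(host_name, []).extend(instances)
            return
        # An empty list asks the scheduler to rebuild the host's info itself.
        try:
            self.instance_info[host_name] = self._get_instances_by_host(host_name)
        except HostMappingNotFound:
            # Older computes may still send an empty list before the host is
            # mapped; ignore it and rely on later updates to fill the gap.
            pass


def send_instance_info(scheduler, host_name, instances):
    """Compute side: skip the call when there is nothing to report."""
    if not instances:
        return False
    scheduler.update_instance_info(host_name, instances)
    return True
```

With this shape, an unmapped host sending an empty list no longer raises, and a new compute with no instances never contacts the scheduler in the first place.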
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1680661
Title:
HostMappingNotFound during update_instance_info on n-cpu startup
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Noticed this in the nova-scheduler logs during a failed test run
today:
http://logs.openstack.org/16/453916/4/check/gate-tempest-dsvm-py35-ubuntu-xenial/09850e6/logs/screen-n-sch.txt.gz?level=TRACE#_2017-04-06_23_11_45_834
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server [req-36e4db2a-38e1-4810-9309-8893598e195a - -] Exception during message handling
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/server.py", line 157, in _process_incoming
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/manager.py", line 125, in update_instance_info
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server instance_info)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 774, in update_instance_info
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server self._recreate_instance_info(context, host_name)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 745, in _recreate_instance_info
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server inst_dict = self._get_instances_by_host(context, host_name)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 717, in _get_instances_by_host
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server hm = objects.HostMapping.get_by_host(context, host_name)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/host_mapping.py", line 100, in get_by_host
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server db_mapping = cls._get_by_host_from_db(context, host)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 963, in wrapper
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server return fn(*args, **kwargs)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server File "/opt/stack/new/nova/nova/objects/host_mapping.py", line 95, in _get_by_host_from_db
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server raise exception.HostMappingNotFound(name=host)
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server nova.exception.HostMappingNotFound: Host 'ubuntu-xenial-internap-mtl01-8316357' is not mapped to any cell
2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server
This is most likely a side effect of this change:
https://review.openstack.org/#/c/439891/
This could be a race on startup where the compute node is sending its
instance information to the scheduler before the compute host is
mapped to a cell, in which case it wouldn't have any instances to send
so the scheduler will try to look them up from the database and blow
up because the host isn't mapped yet.
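The suspected ordering problem can be illustrated with a small, self-contained model. It is only a sketch under assumptions: the mappings dict, get_host_mapping, and the string return values are hypothetical and do not mirror nova's objects or RPC layer; they just show the same call failing before the host is mapped and succeeding afterwards.

```python
class HostMappingNotFound(Exception):
    """Stand-in for the exception seen in the traceback."""


def get_host_mapping(mappings, host):
    # mappings: host name -> cell name (hypothetical in-memory substitute
    # for the host mapping lookup)
    try:
        return mappings[host]
    except KeyError:
        raise HostMappingNotFound(
            "Host '%s' is not mapped to any cell" % host) from None


def update_instance_info(mappings, host, instances):
    if not instances:
        # Nothing was sent, so the scheduler must find the host's cell first.
        cell = get_host_mapping(mappings, host)
        return "looked up instances for %s in %s" % (host, cell)
    return "applied %d instance update(s)" % len(instances)


def demo():
    mappings = {}
    try:
        # Compute starts and reports before the host has been mapped.
        update_instance_info(mappings, "new-compute", [])
    except HostMappingNotFound as exc:
        first = str(exc)
    # Once the host is mapped, the identical call succeeds.
    mappings["new-compute"] = "cell1"
    second = update_instance_info(mappings, "new-compute", [])
    return first, second
```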
I see from the log:
Total number of compute nodes: 0 _async_init_instance_info /opt/stack/new/nova/nova/scheduler/host_manager.py:439
Adding 0 instances for hosts 10-20 _async_init_instance_info /opt/stack/new/nova/nova/scheduler/host_manager.py:459
It's definitely happening on startup.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1680661/+subscriptions