
yahoo-eng-team team mailing list archive

[Bug 1680661] Re: HostMappingNotFound during update_instance_info on n-cpu startup

 

Reviewed:  https://review.openstack.org/454426
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=aa943499128ba77b062dff75ec9b48e54f7d5021
Submitter: Jenkins
Branch:    master

commit aa943499128ba77b062dff75ec9b48e54f7d5021
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Thu Apr 6 22:08:52 2017 -0400

    Handle new hosts for updating instance info in scheduler
    
    As of change 791cf0643401f72ce834e580938057e325945169 we have
    to go through host and cell mappings to get to a cell database
    for host (compute node) and instance information.
    
    When a new compute service starts up it casts to update_instance_info
    and sends an empty list, which triggers the scheduler to try to look up
    all instances on that host. If the new compute host is not yet mapped
    to a cell, we'll get a HostMappingNotFound error.
    
    We can handle this case in the ComputeManager by not casting from the
    compute if there is no information to send, and also in the HostManager
    (for older computes) by dealing with the HostMappingNotFound gracefully.
    
    This information eventually self-heals via normal operation.
    
    Change-Id: I7cec2eff35c0615534fcb4d5148f75721824172e
    Closes-Bug: #1680661
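
A minimal sketch of the HostManager half of this fix, using the
objects.HostMapping.get_by_host and exception.HostMappingNotFound names
visible in the traceback below; the cell-database lookup itself is elided,
so this is illustrative rather than the merged diff:

    import logging

    from nova import exception
    from nova import objects

    LOG = logging.getLogger(__name__)

    def _get_instances_by_host(self, context, host_name):
        try:
            hm = objects.HostMapping.get_by_host(context, host_name)
        except exception.HostMappingNotFound:
            # A freshly started compute service may not have a host
            # mapping yet; the scheduler's view self-heals via normal
            # operation, so just report no instances for now.
            LOG.warning('Host %s is not mapped to any cell; skipping '
                        'instance info update.', host_name)
            return {}
        # Host is mapped: fall through to the pre-existing lookup of
        # instances in the cell database referenced by hm.cell_mapping
        # (elided in this sketch).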


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1680661

Title:
  HostMappingNotFound during update_instance_info on n-cpu startup

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Noticed this in the nova-scheduler logs during a failed test run
  today:

  http://logs.openstack.org/16/453916/4/check/gate-tempest-dsvm-py35-ubuntu-xenial/09850e6/logs/screen-n-sch.txt.gz?level=TRACE#_2017-04-06_23_11_45_834

  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server [req-36e4db2a-38e1-4810-9309-8893598e195a - -] Exception during message handling
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/server.py", line 157, in _process_incoming
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.5/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/scheduler/manager.py", line 125, in update_instance_info
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     instance_info)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.5/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 774, in update_instance_info
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     self._recreate_instance_info(context, host_name)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 745, in _recreate_instance_info
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     inst_dict = self._get_instances_by_host(context, host_name)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 717, in _get_instances_by_host
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     hm = objects.HostMapping.get_by_host(context, host_name)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.5/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     result = fn(cls, context, *args, **kwargs)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/objects/host_mapping.py", line 100, in get_by_host
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     db_mapping = cls._get_by_host_from_db(context, host)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.5/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 963, in wrapper
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     return fn(*args, **kwargs)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server   File "/opt/stack/new/nova/nova/objects/host_mapping.py", line 95, in _get_by_host_from_db
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server     raise exception.HostMappingNotFound(name=host)
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server nova.exception.HostMappingNotFound: Host 'ubuntu-xenial-internap-mtl01-8316357' is not mapped to any cell
  2017-04-06 23:11:45.834 24111 ERROR oslo_messaging.rpc.server 

  This is most likely a side effect of this change:
  https://review.openstack.org/#/c/439891/

  This could be a race on startup: the compute node sends its instance
  information to the scheduler before the compute host is mapped to a cell.
  In that case it has no instances to send, so the scheduler tries to look
  them up from the database and blows up because the host isn't mapped yet.

  I see from the log:

  Total number of compute nodes: 0 _async_init_instance_info /opt/stack/new/nova/nova/scheduler/host_manager.py:439

  Adding 0 instances for hosts 10-20 _async_init_instance_info /opt/stack/new/nova/nova/scheduler/host_manager.py:459

  It's definitely happening on startup.
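
  The eventual fix also guards the compute side by skipping the cast when
  there is nothing to send. A minimal sketch, where the method and
  attribute names (_update_scheduler_instance_info, scheduler_client) are
  assumptions about nova's ComputeManager rather than quotes from the
  change:

      def _update_scheduler_instance_info(self, context, instance_refs):
          if not instance_refs:
              # A brand-new compute host has nothing to report; casting
              # an empty update would make the scheduler rebuild its view
              # with a cell lookup that can raise HostMappingNotFound
              # before the host mapping exists.
              return
          self.scheduler_client.update_instance_info(
              context, self.host, instance_refs)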

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1680661/+subscriptions

