yahoo-eng-team team mailing list archive

[Bug 1857139] Re: TypeError: object of type 'object' has no len() from resources_from_request_spec when cells are down

 

Reviewed:  https://review.opendev.org/700186
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0d9622f581e830e7b7bc9763aaa09ba02e99b8bb
Submitter: Zuul
Branch:    master

commit 0d9622f581e830e7b7bc9763aaa09ba02e99b8bb
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Fri Dec 20 10:03:23 2019 -0500

    Handle cell failures in get_compute_nodes_by_host_or_node
    
    get_compute_nodes_by_host_or_node uses the scatter_gather_cells
    function but was not handling the case that a failure result
    was returned, which could be the called function raising some
    exception or the cell timing out. This causes issues when the
    caller of get_compute_nodes_by_host_or_node expects to get a
    ComputeNodeList back and can do something like len(nodes) on it
    which fails when the result is not iterable.
    
    To be clear, if a cell is down there are going to be problems
    which likely result in a NoValidHost error during scheduling, but
    this avoids an ugly TypeError traceback in the scheduler logs.
    
    Change-Id: Ia54b5adf0a125ae1f9b86887a07dd1d79821dd54
    Closes-Bug: #1857139


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1857139

Title:
  TypeError: object of type 'object' has no len() from
  resources_from_request_spec when cells are down

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) train series:
  Confirmed

Bug description:
  Seen here:

  https://zuul.opendev.org/t/openstack/build/c187e207bc1c48a0a7fa49ef9798b696/log/logs/screen-n-sch.txt.gz#2529

  cell1 is down, so the call to scatter_gather_cells in
  get_compute_nodes_by_host_or_node yields a result that is not a
  ComputeNodeList but the did_not_respond_sentinel object:

  https://github.com/openstack/nova/blob/02019d2660dfce3facdd64ecdb2bd60ba4a91c6d/nova/scheduler/host_manager.py#L705

  https://github.com/openstack/nova/blob/02019d2660dfce3facdd64ecdb2bd60ba4a91c6d/nova/context.py#L454

  which results in an error here:

  https://github.com/openstack/nova/blob/02019d2660dfce3facdd64ecdb2bd60ba4a91c6d/nova/scheduler/utils.py#L612

  The HostManager.get_compute_nodes_by_host_or_node method should filter
  out failure and timeout results from the scatter_gather_cells results.
  We'll get a NoValidHost either way, but this is better than the
  traceback with the TypeError in it.
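
The filtering described above can be sketched in isolation. This is a minimal illustration and not the code from the actual commit: the sentinel below is a local stand-in for the did_not_respond_sentinel object named in the report, the helper names is_cell_failure and filter_cell_results are hypothetical, and plain lists of strings stand in for ComputeNodeList objects. It assumes a failed cell is reported either as that sentinel or as an exception instance.

```python
# Local stand-in for the sentinel named in the bug report (assumption:
# a bare object() used as a "cell did not respond" marker).
did_not_respond_sentinel = object()


def is_cell_failure(result):
    """Return True if a per-cell result is a timeout marker or an exception.

    Hypothetical helper for this sketch, not nova's API.
    """
    return result is did_not_respond_sentinel or isinstance(result, Exception)


def filter_cell_results(results):
    """Flatten per-cell node lists into one list, skipping failed cells.

    `results` maps a cell identifier to either a list of nodes or a
    failure marker, mimicking the shape of a scatter/gather result.
    """
    nodes = []
    for cell_id, result in results.items():
        if is_cell_failure(result):
            # A down or failing cell contributes no nodes; the caller
            # ends up with fewer (possibly zero) candidates instead of
            # an object it cannot call len() on.
            continue
        nodes.extend(result)
    return nodes


# Illustrative data: cell1 timed out, cell3 raised an error.
results = {
    "cell1": did_not_respond_sentinel,
    "cell2": ["node-a", "node-b"],
    "cell3": RuntimeError("database unreachable"),
}
nodes = filter_cell_results(results)
```

With this filtering, len(nodes) is always valid. If every cell fails, the list is simply empty, which matches the expectation that scheduling still ends in NoValidHost but without the TypeError traceback.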

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1857139/+subscriptions

