
yahoo-eng-team team mailing list archive

[Bug 1866380] Re: Ironic driver hash ring treats hostnames differing only by case as different hostnames

 

Reviewed:  https://review.opendev.org/711680
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7145100ee4e732caa532d614e2149ef2a545287a
Submitter: Zuul
Branch:    master

commit 7145100ee4e732caa532d614e2149ef2a545287a
Author: melanie witt <melwittt@xxxxxxxxx>
Date:   Fri Mar 6 17:05:28 2020 +0000

    Lowercase ironic driver hash ring and ignore case in cache
    
    Recently we had a customer case where attempts to add new ironic nodes
    to an existing undercloud resulted in half of the nodes failing to be
    detected and added to nova. Ironic API returned all of the newly added
    nodes when called by the driver, but half of the nodes were not
    returned to the compute manager by the driver.
    
    There was only one nova-compute service managing all of the ironic
    nodes in this typical all-in-one undercloud deployment.
    
    After days of investigation and examination of a database dump from the
    customer, we noticed that at some point the customer had changed the
    hostname of the machine from something containing uppercase letters to
    the same name but all lowercase. The nova-compute service record had
    the mixed case name and the CONF.host (socket.gethostname()) had the
    lowercase name.
    
    The hash ring logic adds all of the nova-compute service hostnames plus
    CONF.host to the hash ring. The ironic driver then reports only the
    nodes it owns, which it determines by retrieving a service hostname
    from the ring based on a hash of each ironic node UUID.
    
    Because of the machine hostname change, the hash ring contained, for
    example: {'MachineHostName', 'machinehostname'} when it should have
    contained only one hostname. And because the hash ring contained two
    hostnames, the driver was able to retrieve only half of the nodes as
    nodes that it owned. So half of the new nodes were excluded and not
    added as new compute nodes.
    
    This adds lowercasing of hosts that are added to the hash ring and
    ignores case when comparing the CONF.host to the hash ring members
    to avoid unnecessary pain and confusion for users that make hostname
    changes that are otherwise functionally harmless.
    
    This also adds logging of the set of hash ring members at level DEBUG
    to make hash ring related problems easier to debug.
    
    Closes-Bug: #1866380
    
    Change-Id: I617fd59de327de05a198f12b75a381f21945afb0
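
As a rough illustration of the approach the commit message describes
(lowercase every host added to the ring, and ignore case when comparing
CONF.host to ring members), here is a minimal sketch. The function names
ring_members and is_mine are hypothetical and are not taken from the nova
source; the actual change may be structured differently.

```python
def ring_members(service_hosts, conf_host):
    """Build the set of hash ring members (hypothetical helper)."""
    # Lowercase every name so that case variants of the same hostname
    # collapse into a single ring member.
    members = {host.lower() for host in service_hosts}
    members.add(conf_host.lower())
    return members


def is_mine(ring_host, conf_host):
    """Case-insensitive ownership check (hypothetical helper)."""
    # Compare without regard to case, so a name stored earlier in mixed
    # case still matches the current lowercase hostname.
    return ring_host.lower() == conf_host.lower()


members = ring_members(["MachineHostName"], "machinehostname")
```

With the hostnames from the example above, members ends up containing a
single entry, so one nova-compute service owns every node again.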


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1866380

Title:
  Ironic driver hash ring treats hostnames differing only by case as
  different hostnames

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New
Status in OpenStack Compute (nova) rocky series:
  New
Status in OpenStack Compute (nova) stein series:
  New
Status in OpenStack Compute (nova) train series:
  In Progress

Bug description:
  Recently we had a customer case where attempts to add new ironic nodes
  to an existing undercloud resulted in half of the nodes failing to be
  detected and added to nova. Ironic API returned all of the newly added
  nodes when called by the driver, but half of the nodes were not
  returned to the compute manager by the driver.

  There was only one nova-compute service managing all of the ironic
  nodes in this typical all-in-one undercloud deployment.

  After days of investigation and examination of a database dump from
  the customer, we noticed that at some point the customer had changed
  the hostname of the machine from something containing uppercase
  letters to the same name but all lowercase. The nova-compute service
  record had the mixed case name and the CONF.host
  (socket.gethostname()) had the lowercase name.

  The hash ring logic adds all of the nova-compute service hostnames
  plus CONF.host to the hash ring. The ironic driver then reports only
  the nodes it owns, which it determines by retrieving a service
  hostname from the ring based on a hash of each ironic node UUID.

  Because of the machine hostname change, the hash ring contained, for
  example: {'MachineHostName', 'machinehostname'} when it should have
  contained only one hostname. And because the hash ring contained two
  hostnames, the driver was able to retrieve only half of the nodes as
  nodes that it owned. So half of the new nodes were excluded and not
  added as new compute nodes.
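
  The effect described above can be reproduced with a toy consistent
  hash ring. This is a simplified sketch, not nova's hash ring code: the
  ToyHashRing class, the owned_nodes helper, the use of SHA-256, and the
  replica count are all assumptions made for illustration.

```python
import bisect
import hashlib
import uuid


def _hash(key):
    # Stable integer hash of a string (SHA-256 chosen only for illustration).
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    """Illustrative consistent hash ring; not nova's implementation."""

    def __init__(self, hosts, replicas=64):
        # Place several points per host on the ring to even out the split.
        self._points = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in set(hosts)
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def get_host(self, key):
        # Pick the first ring point at or after the key's hash, wrapping
        # around to the start of the ring when the hash is past the end.
        index = bisect.bisect_left(self._keys, _hash(key)) % len(self._keys)
        return self._points[index][1]


def owned_nodes(ring, my_host, node_uuids):
    # A node is reported only if the ring maps it to this exact string.
    return [n for n in node_uuids if ring.get_host(n) == my_host]


# Deterministic stand-ins for ironic node UUIDs.
node_uuids = [str(uuid.uuid5(uuid.NAMESPACE_DNS, "node-%d" % i))
              for i in range(200)]

# Ring polluted with two spellings of the same machine's hostname.
bad_ring = ToyHashRing({"MachineHostName", "machinehostname"})
owned_bad = owned_nodes(bad_ring, "machinehostname", node_uuids)

# Ring with the single hostname it should have contained.
good_ring = ToyHashRing({"machinehostname"})
owned_good = owned_nodes(good_ring, "machinehostname", node_uuids)
```

  In this sketch the single-member ring assigns all 200 nodes to the
  host, while the two-member ring assigns only part of them to the
  lowercase name. The rest map to 'MachineHostName', which no running
  service matches exactly.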

  I propose adding some logging to the driver related to the hash ring
  to help with debugging in the future.
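
  A minimal sketch of what such logging might look like, using only the
  Python standard library logging module. The log_ring_members function
  and the message text are hypothetical; nova's actual log line may
  differ.

```python
import logging

LOG = logging.getLogger("hash_ring_sketch")


def log_ring_members(members):
    """Log the current hash ring members at DEBUG level (hypothetical)."""
    # Sort for stable output; the list repr keeps case differences
    # between otherwise identical hostnames visible in the log.
    ordered = sorted(members)
    LOG.debug("Hash ring members are %s", ordered)
    return ordered
```

  Calling log_ring_members({'MachineHostName', 'machinehostname'}) with
  DEBUG logging enabled would show both spellings side by side, which is
  the clue that took days to find in this case.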

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1866380/+subscriptions

