yahoo-eng-team team mailing list archive
Message #81856
[Bug 1866380] [NEW] Difficult to debug unexpected ironic driver behavior related to available nodes
Public bug reported:
Recently we had a customer case where attempts to add new ironic nodes
to an existing undercloud resulted in half of the nodes failing to be
detected and added to nova. The Ironic API returned all of the newly
added nodes when called by the driver, but the driver passed only half
of them on to the compute manager.
There was only one nova-compute service managing all of the ironic
nodes, as is typical for an all-in-one undercloud deployment.
After days of investigation and examination of a database dump from the
customer, we noticed that at some point the customer had changed the
hostname of the machine from something containing uppercase letters to
the same name but all lowercase. The nova-compute service record had the
mixed case name and the CONF.host (socket.gethostname()) had the
lowercase name.
The hash ring logic adds all of the nova-compute service hostnames plus
CONF.host to the hash ring. The ironic driver then reports only the
nodes it owns: for each ironic node it hashes the node UUID, retrieves
the service hostname at that position on the ring, and keeps the node
only if that hostname is its own.
Because of the machine hostname change, the hash ring contained, for
example, {'MachineHostName', 'machinehostname'} when it should have
contained only one hostname. With two hostnames on the ring, about half
of the node UUIDs mapped to the stale mixed-case name, so the driver
treated only the other half as nodes it owned. As a result, half of the
new nodes were excluded and never added as compute nodes.
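
The effect can be reproduced with a toy hash ring. The sketch below is
only an illustration under stated assumptions: SimpleHashRing,
owned_nodes, the replica count and the use of SHA-256 are all
hypothetical stand-ins, not the hash ring implementation nova actually
uses. It shows that a single-member ring assigns every node to
CONF.host, while a ring that also holds the mixed-case name hands part
of the nodes to a host that no longer exists.

```python
import bisect
import hashlib
import uuid


def _hash(key):
    # Map a string to an integer position on the ring.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class SimpleHashRing:
    """Toy consistent hash ring, for illustration only."""

    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring at several positions.
        self._points = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def get_host(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]


def owned_nodes(ring, my_host, node_uuids):
    # Keep only the nodes whose ring lookup returns this service's host.
    return [n for n in node_uuids if ring.get_host(n) == my_host]


node_uuids = [str(uuid.uuid4()) for _ in range(1000)]
conf_host = "machinehostname"

# Expected case: the ring holds only the current hostname.
healthy = SimpleHashRing({conf_host})
# Case from this bug: the stale mixed-case service record is also a member.
stale = SimpleHashRing({"MachineHostName", conf_host})

print(len(owned_nodes(healthy, conf_host, node_uuids)))  # every node is owned
print(len(owned_nodes(stale, conf_host, node_uuids)))    # only part of the nodes
```

The nodes assigned to 'MachineHostName' are never reported by any
running service, which matches the symptom of new nodes silently going
missing.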
I propose adding some logging to the driver related to the hash ring to
help with debugging in the future.
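
As a rough idea of what such logging could look like, here is a minimal
sketch. It is not the proposed patch: find_case_collisions and
log_ring_members are hypothetical helper names, and the warning about
members that differ only by case is an extra assumption motivated by
this particular customer case.

```python
import logging
from collections import defaultdict

LOG = logging.getLogger(__name__)


def find_case_collisions(hostnames):
    """Group hostnames that are equal when compared case-insensitively."""
    groups = defaultdict(set)
    for name in hostnames:
        groups[name.lower()].add(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def log_ring_members(service_hosts, conf_host):
    """Log the hash ring membership and flag suspicious near-duplicates."""
    members = set(service_hosts) | {conf_host}
    LOG.debug("Hash ring members: %s", sorted(members))
    collisions = find_case_collisions(members)
    for names in collisions:
        LOG.warning("Hash ring members differ only by case: %s", names)
    return collisions
```

With the membership visible in the debug log, an unexpected second
member such as 'MachineHostName' would have been apparent without
needing a database dump.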
** Affects: nova
Importance: Low
Assignee: melanie witt (melwitt)
Status: In Progress
** Tags: ironic
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1866380
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1866380/+subscriptions