yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #69812
[Bug 1714248] Re: Compute node HA for ironic doesn't work due to the name duplication of Resource Provider
This started breaking ironic multinode CI, see investigation in
https://bugs.launchpad.net/ironic/+bug/1737395
** Also affects: ironic
Importance: Undecided
Status: New
** Changed in: ironic
Status: New => Triaged
** Changed in: ironic
Importance: Undecided => Critical
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1714248
Title:
Compute node HA for ironic doesn't work due to the name duplication of
Resource Provider
Status in Ironic:
Triaged
Status in OpenStack Compute (nova):
In Progress
Bug description:
Description
===========
In an environment where there are multiple compute nodes with ironic driver,
when a compute node goes down, another compute node cannot take over ironic nodes.
Steps to reproduce
==================
1. Start multiple compute nodes with ironic driver.
2. Register one node to ironic.
2. Stop a compute node which manages the ironic node.
3. Create an instance.
Expected result
===============
The instance is created.
Actual result
=============
The instance creation is failed.
Environment
===========
1. Exact version of OpenStack you are running.
openstack-nova-scheduler-15.0.6-2.el7.noarch
openstack-nova-console-15.0.6-2.el7.noarch
python2-novaclient-7.1.0-1.el7.noarch
openstack-nova-common-15.0.6-2.el7.noarch
openstack-nova-serialproxy-15.0.6-2.el7.noarch
openstack-nova-placement-api-15.0.6-2.el7.noarch
python-nova-15.0.6-2.el7.noarch
openstack-nova-novncproxy-15.0.6-2.el7.noarch
openstack-nova-api-15.0.6-2.el7.noarch
openstack-nova-conductor-15.0.6-2.el7.noarch
2. Which hypervisor did you use?
ironic
Details
=======
When a nova-compute goes down, another nova-compute will take over ironic nodes managed by the failed nova-compute by re-balancing a hash-ring. Then the active nova-compute tries creating a
new resource provider with a new ComputeNode object UUID and the hypervisor name (ironic node UUID)[1][2][3]. This creation fails with a conflict(409) since there is a resource provider with the same name created by the failed nova-compute.
When a new instance is requested, the scheduler gets only an old
resource provider for the ironic node[4]. Then, the ironic node is not
selected:
WARNING nova.scheduler.filters.compute_filter [req-
a37d68b5-7ab1-4254-8698-502304607a90 7b55e61a07304f9cab1544260dcd3e41
e21242f450d948d7af2650ac9365ee36 - - -] (compute02, 8904aeeb-a35b-4ba3
-848a-73269fdde4d3) ram: 4096MB disk: 849920MB io_ops: 0 instances: 0
has not been heard from in a while
[1] https://github.com/openstack/nova/blob/stable/ocata/nova/compute/resource_tracker.py#L464
[2] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L630
[3] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/client/report.py#L410
[4] https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filter_scheduler.py#L183
To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/1714248/+subscriptions
References