Message #79565
[Bug 1839560] [NEW] ironic: moving node to maintenance makes it unusable afterwards
Public bug reported:
If you use the Ironic API to put a node into maintenance (for whatever
reason), the node is no longer included in the list of available nodes
reported to Nova.
When Nova periodically refreshes its resources, it finds that the node is
no longer in the list of available nodes and deletes its ComputeNode
record from the database.
Once you take the node out of maintenance and Nova attempts to create the
ComputeNode again, the create fails with a duplicate UUID error, because
the old record is only soft-deleted and still holds the same UUID.
ref:
https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65
- this made computenode.uuid match the baremetal uuid
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316
- this soft-deletes the computenode record when it doesn't see it in the list of active nodes
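The sequence described above can be reproduced in miniature. This is only a sketch using an in-memory SQLite table, not Nova's actual schema or code: the table layout, the helper names create_compute_node and soft_delete_orphans, and the "deleted = id" soft-delete convention are assumptions made for illustration. It also assumes, as the error in the traces suggests, that compute_nodes_uuid_idx is unique on uuid alone.

```python
import sqlite3

# Hypothetical, simplified stand-in for the compute_nodes table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compute_nodes ("
    " id INTEGER PRIMARY KEY,"
    " uuid TEXT NOT NULL,"
    " hypervisor_hostname TEXT,"
    " deleted INTEGER NOT NULL DEFAULT 0)"
)
# Assumption: the unique index covers uuid only, so a soft-deleted
# row keeps occupying its UUID.
conn.execute(
    "CREATE UNIQUE INDEX compute_nodes_uuid_idx ON compute_nodes (uuid)"
)

NODE_UUID = "77788ad5-f1a4-46ac-8132-2d88dbd4e594"


def create_compute_node(uuid):
    """Insert a live record; for baremetal the record UUID is the node UUID."""
    conn.execute(
        "INSERT INTO compute_nodes (uuid, hypervisor_hostname) VALUES (?, ?)",
        (uuid, uuid),
    )


def soft_delete_orphans(active_uuids):
    """Soft-delete live records whose node is missing from the active list."""
    rows = conn.execute(
        "SELECT id, uuid FROM compute_nodes WHERE deleted = 0"
    ).fetchall()
    for row_id, uuid in rows:
        if uuid not in active_uuids:
            # The row stays in the table; it is only marked as deleted.
            conn.execute(
                "UPDATE compute_nodes SET deleted = ? WHERE id = ?",
                (row_id, row_id),
            )


create_compute_node(NODE_UUID)           # node is first discovered
soft_delete_orphans(active_uuids=set())  # node in maintenance: not listed

# Node leaves maintenance and the record is created again.
recreate_error = None
try:
    create_compute_node(NODE_UUID)
except sqlite3.IntegrityError as exc:
    recreate_error = exc

print("re-create failed:", recreate_error)
```

In this sketch the second insert is rejected by the unique index even though no live record exists for the node, which mirrors the DBDuplicateEntry in the traces.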
traces:
2019-08-08 17:20:13.921 6379 INFO nova.compute.manager [req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', u'a634fab2-ecea-4cfa-be09-032dce6eaf51', u'2dee290d-ef73-46bc-8fc2-af248841ca12'])
...
2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker [req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for ctl1-xxxx:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: Compute host ctl1-xxxx could not be found.
....
Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'")
....
** Affects: nova
Importance: High
Status: Triaged
** Tags: compute ironic
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839560
Title:
ironic: moving node to maintenance makes it unusable afterwards
Status in OpenStack Compute (nova):
Triaged
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions