
yahoo-eng-team team mailing list archive

[Bug 1839560] Related fix merged to nova (master)

 

Reviewed:  https://review.opendev.org/675705
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=89dd74ac7f1028daadf86cb18948e27fe9d1d411
Submitter: Zuul
Branch:    master

commit 89dd74ac7f1028daadf86cb18948e27fe9d1d411
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Fri Aug 9 17:24:07 2019 -0400

    Add functional regression recreate test for bug 1839560
    
    This adds a functional test which recreates bug 1839560
    where the driver reports a node, then no longer reports
    it, so the compute manager deletes it, and then the driver
    reports it again later (this can be common with ironic
    nodes as they undergo maintenance). The issue is that since
    Ia69fabce8e7fd7de101e291fe133c6f5f5f7056a in Rocky, the
    ironic node uuid is re-used for the compute node uuid, but
    there is a unique constraint on the compute node uuid, so
    when trying to create the compute node once the ironic node
    is available again, the compute node create fails with a
    duplicate entry error due to the duplicate uuid. To recreate
    this in the functional test, a new fake virt driver is added
    which provides a predictable uuid per node like the ironic
    driver. The test also shows that archiving the database is
    a way to work around the bug until it is properly fixed.
    
    Change-Id: If822509e906d5094f13a8700b2b9ed3c40580431
    Related-Bug: #1839560
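The failure mode the commit message describes can be reproduced outside Nova with a small sketch. It assumes SQLite in place of MySQL, a stripped-down hypothetical compute_nodes table with a unique index on uuid (named after the compute_nodes_uuid_idx key in the trace below), and a hypothetical predictable_uuid() helper that derives a stable UUID from the node name in the spirit of the new fake virt driver. It is not Nova's actual schema, driver, or archive code. Soft-deleting sets deleted to the row id (mirroring the oslo.db soft-delete convention), and archive() moves soft-deleted rows into a shadow table, roughly what archiving the database does.

```python
import sqlite3
import uuid

# Hypothetical, stripped-down schema shared by the live and shadow tables.
COLUMNS = ("id INTEGER PRIMARY KEY, uuid TEXT NOT NULL, "
           "hypervisor_hostname TEXT, deleted INTEGER NOT NULL DEFAULT 0")


def predictable_uuid(nodename):
    # Hypothetical helper: the same node name always yields the same UUID,
    # like an ironic node whose UUID is reused for the compute node.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, nodename))


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE compute_nodes ({COLUMNS})")
    # The unique index ignores the deleted column, so soft-deleted rows still count.
    conn.execute("CREATE UNIQUE INDEX compute_nodes_uuid_idx ON compute_nodes (uuid)")
    conn.execute(f"CREATE TABLE shadow_compute_nodes ({COLUMNS})")
    return conn


def create_node(conn, nodename):
    conn.execute(
        "INSERT INTO compute_nodes (uuid, hypervisor_hostname) VALUES (?, ?)",
        (predictable_uuid(nodename), nodename),
    )


def soft_delete(conn, nodename):
    # Soft delete: keep the row but mark it deleted (deleted = id).
    conn.execute(
        "UPDATE compute_nodes SET deleted = id "
        "WHERE hypervisor_hostname = ? AND deleted = 0",
        (nodename,),
    )


def archive(conn):
    # Move soft-deleted rows into the shadow table, freeing their UUIDs.
    conn.execute("INSERT INTO shadow_compute_nodes "
                 "SELECT * FROM compute_nodes WHERE deleted != 0")
    conn.execute("DELETE FROM compute_nodes WHERE deleted != 0")


def recreate_bug(nodename):
    """Return (re-create worked before archiving, live rows after archive + re-create)."""
    conn = make_db()
    create_node(conn, nodename)      # driver reports the node
    soft_delete(conn, nodename)      # driver stops reporting it: orphan deleted
    try:
        create_node(conn, nodename)  # driver reports it again
        before_archive = True
    except sqlite3.IntegrityError:   # duplicate uuid, as in the bug
        before_archive = False
    archive(conn)                    # the workaround the test demonstrates
    create_node(conn, nodename)
    live = conn.execute(
        "SELECT COUNT(*) FROM compute_nodes WHERE deleted = 0").fetchone()[0]
    return before_archive, live
```

Calling recreate_bug("node-1") returns (False, 1): the re-create fails on the unique index until the soft-deleted row is archived, after which it succeeds.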


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839560

Title:
  ironic: moving node to maintenance makes it unusable afterwards

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) rocky series:
  In Progress
Status in OpenStack Compute (nova) stein series:
  In Progress

Bug description:
  If you use the Ironic API to put a node into maintenance mode (for
  whatever reason), it is no longer included in the list of nodes
  available to Nova.

  When Nova refreshes its resources periodically, it finds that the node
  is no longer in the list of available nodes and deletes its compute
  node record from the database.

  Once the node is taken out of maintenance and Nova attempts to create
  the ComputeNode record again, the create fails with a duplicate UUID
  error, because the old record was only soft-deleted and still has the
  same UUID.
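The three steps above can be modelled as repeated periodic passes over an in-memory table. FakeComputeNodeTable, periodic_pass() and DuplicateEntry are hypothetical stand-ins (Nova's real resource tracker and compute manager are far more involved, and DuplicateEntry only plays the role of oslo.db's DBDuplicateEntry). The point the sketch makes is that the orphan check looks only at live records, while the uniqueness check also sees soft-deleted ones.

```python
class DuplicateEntry(Exception):
    """Hypothetical stand-in for the DBDuplicateEntry error in the trace."""


class FakeComputeNodeTable:
    """Hypothetical table keyed by node uuid; rows are soft-deleted, never removed."""

    def __init__(self):
        self.rows = {}  # uuid -> {"deleted": bool}

    def create(self, node_uuid):
        # Uniqueness is checked on uuid alone, so soft-deleted rows still collide.
        if node_uuid in self.rows:
            raise DuplicateEntry(node_uuid)
        self.rows[node_uuid] = {"deleted": False}

    def soft_delete(self, node_uuid):
        self.rows[node_uuid]["deleted"] = True

    def live(self):
        return {u for u, row in self.rows.items() if not row["deleted"]}


def periodic_pass(table, driver_nodes):
    """Create records for newly reported nodes, then soft-delete orphans.

    Returns the sorted list of orphaned node uuids that were soft-deleted.
    """
    reported = set(driver_nodes)
    # A node with no *live* record looks new, even if a deleted row exists.
    for node_uuid in sorted(reported - table.live()):
        table.create(node_uuid)
    orphans = sorted(table.live() - reported)
    for node_uuid in orphans:
        table.soft_delete(node_uuid)
    return orphans


def scenario():
    table = FakeComputeNodeTable()
    periodic_pass(table, ["node-a", "node-b"])   # both nodes available
    deleted = periodic_pass(table, ["node-a"])   # node-b put into maintenance
    try:
        periodic_pass(table, ["node-a", "node-b"])  # node-b back again
        recreated = True
    except DuplicateEntry:
        recreated = False
    return deleted, recreated
```

Here scenario() returns (["node-b"], False): node-b is soft-deleted as an orphan while in maintenance, and re-creating its record afterwards collides with the soft-deleted row.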

  ref:
  https://github.com/openstack/nova/commit/9f28727eb75e05e07bad51b6eecce667d09dfb65
  - this made computenode.uuid match the baremetal uuid

  https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L8304-L8316
  - this soft-deletes the computenode record when it doesn't see it in the list of active nodes

  
  traces:
  2019-08-08 17:20:13.921 6379 INFO nova.compute.manager [req-c71e5c81-eb34-4f72-a260-6aa7e802f490 - - - - -] Deleting orphan compute node 31 hypervisor host is 77788ad5-f1a4-46ac-8132-2d88dbd4e594, nodes are set([u'6d556617-2bdc-42b3-a3fe-b9218a1ebf0e', u'a634fab2-ecea-4cfa-be09-032dce6eaf51', u'2dee290d-ef73-46bc-8fc2-af248841ca12'])
  ...
  2019-08-08 22:21:25.284 82770 WARNING nova.compute.resource_tracker [req-a58eb5e2-9be0-4503-bf68-dff32ff87a3a - - - - -] No compute node record for ctl1-xxxx:77788ad5-f1a4-46ac-8132-2d88dbd4e594: ComputeHostNotFound_Remote: Compute host ctl1-xxxx could not be found.
  ....
  Remote error: DBDuplicateEntry (pymysql.err.IntegrityError) (1062, u"Duplicate entry '77788ad5-f1a4-46ac-8132-2d88dbd4e594' for key 'compute_nodes_uuid_idx'")
  ....

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839560/+subscriptions

