yahoo-eng-team team mailing list archive

[Bug 2127831] [NEW] Nova service deletion succeeds despite existing instances, causing "Duplicate entry" error on redeployment

 

Public bug reported:

When removing and then redeploying a compute node, the process fails due
to a pymysql.err.IntegrityError: (1062, "Duplicate entry ...") in the
Nova database.

This issue occurs because the nova-compute service deletion succeeds
even when there are instances still present on the host. The service
deletion API call removes the host mapping, the service record, and the
resource provider records in Placement, but it does not delete the
compute_nodes record from the database. That record is left orphaned.

When the compute node is subsequently reprovisioned, Nova attempts to
create a new compute_nodes record, which conflicts with the orphaned
record and violates the database's unique constraint, leading to the
"Duplicate entry" error.
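
The conflict described above can be reproduced in miniature. The sketch
below is not Nova code: it uses an in-memory SQLite table as a stand-in
for compute_nodes, and the column names, the columns covered by the
unique constraint, and the register_compute_node helper are all
assumptions made for illustration. It only shows how a leftover row
blocks a second insert for the same host.

```python
import sqlite3

# Simplified stand-in for the compute_nodes table. The real Nova schema has
# many more columns; the constrained columns used here are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compute_nodes ("
    " id INTEGER PRIMARY KEY,"
    " host TEXT NOT NULL,"
    " hypervisor_hostname TEXT NOT NULL,"
    " UNIQUE (host, hypervisor_hostname))"
)


def register_compute_node(host, hypervisor_hostname):
    """Insert a row as a freshly started compute service might (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO compute_nodes (host, hypervisor_hostname)"
                " VALUES (?, ?)",
                (host, hypervisor_hostname),
            )
        return "created"
    except sqlite3.IntegrityError as exc:
        return f"failed: {exc}"


# Initial deployment creates the record.
first = register_compute_node("compute-1", "compute-1.example.org")

# The service is deleted here, but in this scenario the row is left behind,
# so nothing is removed from the table.

# Redeployment tries to create the same record again and hits the constraint.
second = register_compute_node("compute-1", "compute-1.example.org")

print(first)
print(second)
```

With MySQL and pymysql the same situation surfaces as error 1062
("Duplicate entry") rather than SQLite's own constraint message.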

Steps to Reproduce:

* Deploy a compute node and launch an instance on it.
* Disable and delete the nova-compute service for that node.
* Observe that the service deletion succeeds, despite the presence of an instance.
* Attempt to redeploy the same compute node.

The nova-compute service will fail to start, and the logs will show a
"Duplicate entry" error.

Expected Behavior:

The service deletion should fail if there are instances on the host, as per the check in nova/api/openstack/compute/services.py:
https://github.com/openstack/nova/blob/8b81b5f91ffe1f9c38a483d151b82316d443dbf6/nova/api/openstack/compute/services.py#L268-L274
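
As an illustration of the expected behavior only, the following sketch
shows a guard that refuses to delete a service while instances remain on
the host. It is not the code behind the link above: the
HostHasInstances exception, the delete_compute_service function, and the
in-memory data structures are hypothetical names chosen for this example.

```python
class HostHasInstances(Exception):
    """Raised when a service delete is refused (hypothetical exception)."""


def delete_compute_service(host, instances_by_host, services):
    """Delete the service for `host` only if no instances remain on it."""
    remaining = instances_by_host.get(host, [])
    if remaining:
        # Refuse the delete so no partially cleaned-up state is left behind.
        raise HostHasInstances(
            f"{host} still hosts {len(remaining)} instance(s); "
            "migrate or delete them first"
        )
    services.discard(host)


services = {"compute-1", "compute-2"}
instances_by_host = {"compute-1": ["vm-a"]}  # compute-2 has no instances

try:
    delete_compute_service("compute-1", instances_by_host, services)
    refused = False
except HostHasInstances:
    refused = True

# A host without instances can be deleted normally.
delete_compute_service("compute-2", instances_by_host, services)
```

In this sketch the delete for compute-1 is rejected and its service
entry stays in place, while the delete for the empty host compute-2 goes
through.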

Actual Behavior:
The service deletion succeeds, leaving an orphaned compute_nodes record in the database and causing redeployment to fail.

Workaround:
A possible workaround, which I have not tested yet, is to manually delete the orphaned compute_nodes record from the database using nova-manage cell_v2 delete_host before attempting to redeploy the node.

Conclusion:
This is a bug in the service deletion logic. The check for existing instances does not prevent the deletion, which leaves the Nova database in an inconsistent state and prevents the compute node from being redeployed. This is a significant operational issue for anyone who needs to perform maintenance or hardware replacement on compute nodes.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2127831

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2127831/+subscriptions