yahoo-eng-team team mailing list archive

[Bug 1849701] [NEW] Resource_provider entry related to a deleted compute node, unable to migrate vms to the node

 

Public bug reported:

Description
===========
Migrating a VM to a node failed with the following error:

"There was a conflict when trying to complete your request.\n\n
Conflicting resource provider name: mymachine.maas already exists."

https://paste.ubuntu.com/p/4dxS6d8X8p/

Steps to reproduce
==================

We found that the compute node had been added multiple times. The valid
record is the one with created_at 2019-08-22 18:47:31:

mysql> select created_at, deleted_at from compute_nodes where host="mymachine";
+---------------------+---------------------+
| created_at          | deleted_at          |
+---------------------+---------------------+
| 2019-08-22 18:47:31 | NULL                |
| 2019-08-21 11:50:26 | 2019-08-22 11:04:27 |
| 2019-08-22 16:25:52 | 2019-08-22 16:58:42 |
| 2019-08-22 18:42:39 | 2019-08-22 18:45:36 |
+---------------------+---------------------+
4 rows in set (0.00 sec)


The resource provider entry was created one second after a compute node record that has since been deleted (created_at 2019-08-22 18:42:39), so it belongs to that deleted record:

mysql> select created_at from resource_providers where name="mymachine.maas";
+---------------------+
| created_at          |
+---------------------+
| 2019-08-22 18:42:40 |
+---------------------+
1 row in set (0.00 sec)
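
The timestamp comparison above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not something nova does: the rows are copied from the two queries above, and matching by nearest created_at is an assumed heuristic, used here because the report shows only timestamps. The helper name closest_compute_node is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (created_at, deleted_at) rows copied from the compute_nodes query above.
compute_nodes = [
    ("2019-08-22 18:47:31", None),
    ("2019-08-21 11:50:26", "2019-08-22 11:04:27"),
    ("2019-08-22 16:25:52", "2019-08-22 16:58:42"),
    ("2019-08-22 18:42:39", "2019-08-22 18:45:36"),
]
# created_at of the resource provider row for mymachine.maas.
provider_created_at = "2019-08-22 18:42:40"


def closest_compute_node(rows, provider_ts):
    """Return the row whose created_at is nearest to the provider's (heuristic)."""
    target = datetime.strptime(provider_ts, FMT)
    return min(
        rows,
        key=lambda row: abs((datetime.strptime(row[0], FMT) - target).total_seconds()),
    )


match = closest_compute_node(compute_nodes, provider_created_at)
# A non-NULL deleted_at means the matching compute node record is soft-deleted.
is_stale = match[1] is not None
print(match, "stale" if is_stale else "live")
```

With the data from this report, the nearest record is the one created at 18:42:39, which has a deleted_at value, so the provider is flagged as stale.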


We tried to delete it: 

mysql>  delete from resource_providers where name="mymachine.maas";
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`nova_api`.`resource_providers`, CONSTRAINT `resource_providers_ibfk_1` FOREIGN KEY (`root_provider_id`) REFERENCES `resource_providers` (`id`))

Oddly, root_provider_id in every row references the id of that same row,
which seems to make deleting any row of this table impossible:

mysql> select id,root_provider_id from resource_providers;
+----+------------------+
| id | root_provider_id |
+----+------------------+
|  1 |                1 |
|  4 |                4 |
|  7 |                7 |
| 10 |               10 |
| 13 |               13 |
| 16 |               16 |
| 19 |               19 |
| 22 |               22 |
| 28 |               28 |
| 31 |               31 |
| 34 |               34 |
| 37 |               37 |
| 40 |               40 |
| 43 |               43 |
| 45 |               45 |
| 52 |               52 |
| 55 |               55 |
| 58 |               58 |
| 61 |               61 |
| 64 |               64 |
| 67 |               67 |
| 70 |               70 |
| 73 |               73 |
| 76 |               76 |
| 79 |               79 |
| 82 |               82 |
| 91 |               91 |
+----+------------------+

Expected result
===============
The resource provider entry should be deleted when its compute node is deleted, so that VMs can be migrated to the node once it is added again.

Workaround
===============
We renamed the stale resource provider entry to "invalid":

mysql> update resource_providers set name="invalid" where name="mymachine.maas";
Query OK, 1 row affected (0.01 sec)


We then restarted nova-compute on the node:

systemctl restart nova-compute

The resource provider entry was recreated:

mysql> select * from resource_providers where name="mymachine.maas";
+---------------------+---------------------+-----+--------------------------------------+------------------+------------+----------+------------------+--------------------+
| created_at          | updated_at          | id  | uuid                                 | name             | generation | can_host | root_provider_id | parent_provider_id |
+---------------------+---------------------+-----+--------------------------------------+------------------+------------+----------+------------------+--------------------+
| 2019-10-24 15:16:51 | 2019-10-24 15:18:12 | 384 | e6dabd5d-d1ed-4fd5-a1e0-0be3b360fb28 | mymachine.maas   |          2 |     NULL |              384 |               NULL |
+---------------------+---------------------+-----+--------------------------------------+------------------+------------+----------+------------------+--------------------+


After that, migration worked.
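
Why the rename helps can be shown with a small, self-contained model. The sketch below is an assumption-laden simplification: it uses an in-memory SQLite table with only id, uuid and name columns rather than the real nova_api schema, it assumes the conflict comes from a uniqueness constraint on name (which the error message suggests), and the uuid values and the register helper are placeholders invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:
    # Simplified stand-in for resource_providers; name is assumed to be unique.
    conn.execute(
        "CREATE TABLE resource_providers ("
        " id INTEGER PRIMARY KEY, uuid TEXT UNIQUE, name TEXT UNIQUE)"
    )
    # Stale row left behind by the deleted compute node (placeholder uuid).
    conn.execute(
        "INSERT INTO resource_providers (uuid, name) VALUES (?, ?)",
        ("stale-uuid", "mymachine.maas"),
    )


def register(conn, uuid, name):
    """Try to create a provider row; return False on a uniqueness conflict."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO resource_providers (uuid, name) VALUES (?, ?)",
                (uuid, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The re-added node tries to register under the same name and is rejected.
before = register(conn, "new-uuid", "mymachine.maas")

# Workaround from this report: rename the stale row out of the way.
with conn:
    conn.execute(
        "UPDATE resource_providers SET name = ? WHERE name = ?",
        ("invalid", "mymachine.maas"),
    )

# The same registration now succeeds, as it did after restarting nova-compute.
after = register(conn, "new-uuid", "mymachine.maas")
names = [row[0] for row in conn.execute(
    "SELECT name FROM resource_providers ORDER BY id")]
print(before, after, names)
```

Note that the workaround leaves the stale row in the table under the name "invalid"; it only frees the name, it does not clean up the orphaned entry.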


Environment
===============
xenial-queens cloud 


Nova compute node:

dpkg -l | grep nova
ii  nova-api-metadata                     2:17.0.10-0ubuntu2.1~cloud0                   all          OpenStack Compute - metadata API frontend
ii  nova-common                           2:17.0.10-0ubuntu2.1~cloud0                   all          OpenStack Compute - common files
ii  nova-compute                          2:17.0.10-0ubuntu2.1~cloud0                   all          OpenStack Compute - compute node base
ii  nova-compute-kvm                      2:17.0.10-0ubuntu2.1~cloud0                   all          OpenStack Compute - compute node (KVM)
ii  nova-compute-libvirt                  2:17.0.10-0ubuntu2.1~cloud0                   all          OpenStack Compute - compute node libvirt support
ii  python-nova                           2:17.0.10-0ubuntu2.1~cloud0                   all          OpenStack Compute Python libraries
ii  python-novaclient                     2:9.1.1-0ubuntu1~cloud0                       all          client library for OpenStack Compute API - Python 2.7


Nova Cloud Controller

dpkg -l | grep nova
ii  nova-api-os-compute              2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - OpenStack Compute API frontend
ii  nova-common                      2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - common files
ii  nova-conductor                   2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - conductor service
ii  nova-consoleauth                 2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - Console Authenticator
ii  nova-novncproxy                  2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - NoVNC proxy
ii  nova-placement-api               2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - placement API frontend
ii  nova-scheduler                   2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - virtual machine scheduler
ii  nova-spiceproxy                  2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute - spice html5 proxy
ii  python-nova                      2:17.0.9-0ubuntu1~cloud0                      all          OpenStack Compute Python libraries
ii  python-novaclient                2:9.1.1-0ubuntu1~cloud0                       all          client library for OpenStack Compute API - Python 2.7

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1849701

Title:
  Resource_provider entry related to a deleted compute node, unable to
  migrate vms to the node

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1849701/+subscriptions

