
yahoo-eng-team team mailing list archive

[Bug 1832814] [NEW] Placement API appears to have issues when compute host replaced


Public bug reported:

We have been upgrading our sites from RDO to OSA. This process involved
live migrating all VMs from a compute host before reinstalling it with
OSA playbooks.

Note: the compute host is not "removed" from OpenStack in any way; the
new OSA node is the *same* hardware, same hostname etc. - just
reinstalled as OSA.

This appears to have consequences for the way the placement API works.
We have noticed that when live migrating, the scheduler will often choose
a highly loaded node even though an empty node exists. For example, in
the output below from my live migration script, the VM is being migrated
from cc-compute04-kna1; the scheduler has chosen cc-compute01-kna1 as the
target despite the load it currently has, even though compute09, 15
and 18 are all empty.

Migration Destination: cc-compute01-kna1
Migration ID: 12993
| Host              | Project                          | CPU | Memory MB | Disk GB |
| cc-compute04-kna1 | (used_now)                       | 124 |    254976 |    2790 |
| cc-compute01-kna1 | (used_now)                       | 230 |    466432 |    8210 |
| cc-compute03-kna1 | (used_now)                       | 174 |    327680 |    4740 |
| cc-compute05-kna1 | (used_now)                       | 198 |    457728 |    4430 |
| cc-compute06-kna1 | (used_now)                       | 163 |    366592 |    4650 |
| cc-compute07-kna1 | (used_now)                       | 170 |    415744 |    4460 |
| cc-compute08-kna1 | (used_now)                       | 178 |    382464 |    4750 |
| cc-compute09-kna1 | (used_now)                       |   0 |      2048 |       0 |
| cc-compute11-kna1 | (used_now)                       | 131 |    313856 |    3100 |
| cc-compute12-kna1 | (used_now)                       | 176 |    392704 |    4800 |
| cc-compute13-kna1 | (used_now)                       | 173 |    390656 |    5470 |
| cc-compute14-kna1 | (used_now)                       |   2 |      4096 |      50 |
| cc-compute15-kna1 | (used_now)                       |   0 |      2048 |       0 |
| cc-compute16-kna1 | (used_now)                       | 170 |    355840 |    5410 |
| cc-compute17-kna1 | (used_now)                       | 281 |    646656 |    5370 |
| cc-compute18-kna1 | (used_now)                       |   0 |      2048 |       0 |
| cc-compute19-kna1 | (used_now)                       | 207 |    517120 |    4860 |
| cc-compute20-kna1 | (used_now)                       | 223 |    560640 |    5150 |
| cc-compute23-kna1 | (used_now)                       | 184 |    406528 |    6350 |
| cc-compute24-kna1 | (used_now)                       | 190 |    585216 |    4820 |
| cc-compute25-kna1 | (used_now)                       | 235 |    491520 |    5500 |
| cc-compute26-kna1 | (used_now)                       | 283 |    610304 |    9390 |
| cc-compute27-kna1 | (used_now)                       | 200 |    573440 |    6730 |
| cc-compute28-kna1 | (used_now)                       | 269 |    587264 |    6600 |
| cc-compute29-kna1 | (used_now)                       | 245 |    494080 |    8480 |

this is not an isolated case, and is something we have seen frequently
to the point where we override the scheduler and use targeted migrations
to achieve better load balancing.
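
As an illustration of how a target for such a targeted migration could be picked from the usage table above, here is a minimal Python sketch. It is not the migration script referred to in this report; the function names and the "lowest CPU, then memory, then disk" ordering are assumptions made for the example.

```python
def parse_usage_table(text):
    """Parse rows shaped like '| host | (used_now) | cpu | memory_mb | disk_gb |'."""
    hosts = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip anything that is not a five-column data row (e.g. the header).
        if len(cells) != 5 or cells[0] == "Host":
            continue
        host, _project, cpu, memory_mb, disk_gb = cells
        hosts[host] = {
            "cpu": int(cpu),
            "memory_mb": int(memory_mb),
            "disk_gb": int(disk_gb),
        }
    return hosts


def least_loaded(hosts, exclude=()):
    """Return the host with the lowest (cpu, memory, disk) usage.

    The hostname is the final tie-breaker so the result is deterministic.
    """
    candidates = [h for h in hosts if h not in exclude]
    return min(
        candidates,
        key=lambda h: (hosts[h]["cpu"], hosts[h]["memory_mb"], hosts[h]["disk_gb"], h),
    )
```

With the table above and the source host cc-compute04-kna1 excluded, this would select one of the empty nodes rather than cc-compute01-kna1.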

Interrogating the Placement API for a compute (09) prior to
reinstallation I can find the UUID

        {
            "generation": 480003,
            "links": [
                {
                    "href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5",
                    "rel": "self"
                },
                {
                    "href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5/inventories",
                    "rel": "inventories"
                },
                {
                    "href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5/usages",
                    "rel": "usages"
                },
                {
                    "href": "/resource_providers/d6aeeeb0-0cab-4e3f-a070-9808801b94a5/aggregates",
                    "rel": "aggregates"
                }
            ],
            "name": "cc-compute09-kna1",
            "uuid": "d6aeeeb0-0cab-4e3f-a070-9808801b94a5"
        }

after the node is reinstalled it has a new UUID

        {
            "generation": 71,
            "links": [
                {
                    "href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4",
                    "rel": "self"
                },
                {
                    "href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/inventories",
                    "rel": "inventories"
                },
                {
                    "href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/usages",
                    "rel": "usages"
                },
                {
                    "href": "/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/aggregates",
                    "rel": "aggregates"
                }
            ],
            "name": "compute09.openstack.local",
            "uuid": "d7f483ff-3b91-4d13-9900-0ec24c3a06a4"
        }

this new resource provider shows 0 consumed resources

curl -g  -X GET http://********:8780/resource_providers/d7f483ff-3b91-4d13-9900-0ec24c3a06a4/usages -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.2" -H "User-Agent: openstacksdk/0.31.0 keystoneauth1/3.14.0 python-requests/2.22.0 CPython/2.7.12" -H "X-Auth-Token:************" | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    89  100    89    0     0   2870      0 --:--:-- --:--:-- --:--:--  2870
{
    "resource_provider_generation": 72,
    "usages": {
        "DISK_GB": 0,
        "MEMORY_MB": 0,
        "VCPU": 0
    }
}
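
For anyone wanting to repeat this check from Python rather than curl, a rough sketch follows. It only builds the request, using the same path and headers as the curl command above; the endpoint host is masked in this report, so the value passed in is a placeholder, and the function names are made up for the example.

```python
from urllib.request import Request


def usages_request(endpoint, provider_uuid, token, microversion="1.2"):
    """Build (but do not send) the GET request for a provider's usages."""
    url = f"{endpoint.rstrip('/')}/resource_providers/{provider_uuid}/usages"
    headers = {
        "Accept": "application/json",
        "OpenStack-API-Version": f"placement {microversion}",
        "X-Auth-Token": token,
    }
    return Request(url, headers=headers, method="GET")


def is_unused(payload):
    """True if every resource class in a usages payload reports zero."""
    return all(amount == 0 for amount in payload["usages"].values())
```

The request could then be sent with urllib.request.urlopen() and the body decoded with json.load(); is_unused() applied to the payload above would report that the new provider has nothing allocated against it.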

investigating the resource_providers table shows potential duplicate
entries -

MariaDB [nova_api]> select * from resource_providers;
| created_at          | updated_at          | id  | uuid                                 | name                         | generation | can_host | root_provider_id | parent_provider_id |
| 2018-04-25 21:25:32 | 2019-04-17 07:08:24 |   1 | cbb2c235-ed5f-4f63-9015-1edfe91d63c8 | cc-compute02-kna1            |     195067 |        0 |                1 |               NULL |
| 2018-04-25 21:44:17 | 2019-05-02 13:23:34 |   2 | 6125fdeb-370f-4139-9d1c-369e9eb4e620 | cc-compute-lsd01-kna1        |         41 |        0 |                2 |               NULL |
| 2018-04-25 22:13:01 | 2019-05-20 13:11:55 |   3 | 452b7f99-a178-4dc7-9fea-e9d9ab6a3e99 | cc-compute05-kna1            |     450192 |        0 |                3 |               NULL |
| 2018-04-25 22:13:08 | 2019-06-10 12:28:41 |   4 | 03b420df-79fb-4f0a-aede-bdbd62ce9ce3 | cc-compute03-kna1            |     424867 |        0 |                4 |               NULL |
| 2018-04-25 22:13:08 | 2019-06-14 06:29:47 |   5 | 9386d418-339c-4010-baa5-18e2aa601a3c | cc-compute04-kna1            |     479160 |        0 |                5 |               NULL |
| 2018-04-25 22:46:46 | 2019-05-20 13:39:00 |   6 | 7b0580e3-7592-4c3a-a0e9-a8d23f3550d7 | cc-compute07-kna1            |     441489 |        0 |                6 |               NULL |
| 2018-04-25 22:46:47 | 2019-04-19 18:53:45 |   7 | 98e1b299-239f-488c-a7a0-3f78e76c8f6b | cc-compute06-kna1            |     396721 |        0 |                7 |               NULL |
| 2018-04-25 22:46:50 | 2019-05-24 07:28:59 |   8 | 64c2b0fb-4d7e-4d5f-92bc-69e00a3cb85e | cc-compute08-kna1            |     449994 |        0 |                8 |               NULL |
| 2018-04-26 00:47:56 | 2019-06-11 20:43:47 |  11 | 61708a8f-77fd-47dc-9140-6ea613509506 | cc-compute14-kna1            |     474210 |        0 |               11 |               NULL |
| 2018-04-26 00:48:01 | 2019-05-09 12:20:15 |  12 | 9e082274-568d-49a2-9801-05b2390f7dfa | cc-compute16-kna1            |     432294 |        0 |               12 |               NULL |
| 2018-04-26 00:48:04 | 2019-06-11 20:11:28 |  14 | 396bb173-2e46-4d35-963e-9b49acf0add8 | cc-compute22-kna1            |     448545 |        0 |               14 |               NULL |
| 2018-04-26 00:48:06 | 2019-05-21 13:07:23 |  15 | 80e5f3a7-e4a3-43d1-a7a8-4c118fba7792 | cc-compute12-kna1            |     450359 |        0 |               15 |               NULL |
| 2018-04-26 00:48:20 | 2019-05-16 14:32:54 |  18 | b86db974-5787-4012-a7df-26aeb8e73574 | cc-compute20-kna1            |     425960 |        0 |               18 |               NULL |
| 2018-04-26 00:48:20 | 2019-06-12 12:24:24 |  19 | dfb35aab-2af9-4d86-bccb-76959c7f68ed | cc-compute18-kna1            |     435686 |        0 |               19 |               NULL |
| 2018-04-26 00:48:22 | 2019-05-07 10:55:46 |  20 | 4decfcd0-cca2-4ba5-9f83-a86b8f2a8e4d | cc-compute17-kna1            |     418818 |        0 |               20 |               NULL |
| 2018-10-31 12:04:48 | 2019-04-24 08:32:36 |  28 | 266e5266-f811-4b24-949f-3ed9e841c479 | cc-compute10-kna1            |     166818 |     NULL |               28 |               NULL |
| 2018-11-01 18:59:56 | 2019-06-14 06:29:52 |  34 | 5180de9c-c964-4661-bfbd-893cdfc19f32 | compute25.openstack.local    |     271667 |     NULL |               34 |               NULL |
| 2018-11-01 18:59:56 | 2019-06-14 06:29:47 |  37 | 3a456de2-68ea-4472-95dd-2db1c7b29661 | compute24.openstack.local    |     283689 |     NULL |               37 |               NULL |
| 2019-02-06 19:45:50 | 2019-06-14 06:29:39 |  43 | 0e5e6b94-2992-4075-a922-320bbe8b1bbb | compute26.openstack.local    |     165203 |     NULL |               43 |               NULL |
| 2019-02-06 19:45:50 | 2019-06-14 06:27:26 |  46 | 008c7549-b638-4130-8e79-858556a787c2 | compute27.openstack.local    |     166810 |     NULL |               46 |               NULL |
| 2019-02-10 17:45:03 | 2019-06-14 06:29:16 |  52 | 1fe21d2b-e6f1-4820-b341-a490cf9704d8 | compute29.openstack.local    |     161380 |     NULL |               52 |               NULL |
| 2019-02-10 17:45:03 | 2019-06-14 06:29:08 |  55 | e636f01c-b5da-4886-8a60-1baa5371bcc5 | compute28.openstack.local    |     159388 |     NULL |               55 |               NULL |
| 2019-04-30 09:53:45 | 2019-06-14 06:29:36 |  76 | 34381a1c-1b4e-4716-b7ba-ea72956b92f7 | compute19.openstack.local    |      56127 |     NULL |               76 |               NULL |
| 2019-04-30 13:20:12 | 2019-06-14 06:29:37 |  79 | 946fa4f1-5f1d-47be-b65c-038a7e20c42b | compute06.openstack.local    |      56068 |     NULL |               79 |               NULL |
| 2019-05-08 08:26:45 | 2019-06-14 06:30:01 |  84 | 30a5e17b-96d3-4806-849f-2d814085b130 | compute01.openstack.local    |      46162 |     NULL |               84 |               NULL |
| 2019-05-08 08:27:01 | 2019-06-14 06:29:45 |  87 | 62f85460-4244-429e-9831-357032a8f5e7 | compute17.openstack.local    |      46258 |     NULL |               87 |               NULL |
| 2019-05-13 11:37:50 | 2019-06-14 06:29:36 |  93 | 4e39206e-b00a-41d9-a2d1-a18085a576a7 | compute23.openstack.local    |      31555 |     NULL |               93 |               NULL |
| 2019-05-13 11:37:51 | 2019-06-14 06:29:46 |  96 | 6db0004d-7bcb-4758-accd-52ef580d967b | compute16.openstack.local    |      40197 |     NULL |               96 |               NULL |
| 2019-05-17 11:50:50 | 2019-06-14 06:29:38 | 102 | 18a0a9f5-c9e7-49a2-8e50-d221aec0a9f0 | compute20.openstack.local    |      31563 |     NULL |              102 |               NULL |
| 2019-05-17 11:50:50 | 2019-06-14 06:29:16 | 105 | 97a16a89-055a-4533-86e5-1285ff1911ff | compute07.openstack.local    |      31495 |     NULL |              105 |               NULL |
| 2019-05-29 11:20:15 | 2019-06-14 06:29:05 | 117 | e088c323-c8cb-4dc6-bb11-675a40cd1fcf | compute12.openstack.local    |      19449 |     NULL |              117 |               NULL |
| 2019-05-29 11:20:16 | 2019-06-14 06:29:27 | 120 | 58f85279-1103-42b6-b01d-e1c8de83b8d2 | compute08.openstack.local    |      19407 |     NULL |              120 |               NULL |
| 2019-05-29 11:20:32 | 2019-06-14 06:29:52 | 123 | 58ac9048-eca2-4f51-8d12-b6165f686cf7 | compute05.openstack.local    |      19392 |     NULL |              123 |               NULL |
| 2019-06-11 09:15:59 | 2019-06-14 06:29:29 | 126 | 882f5ad3-f20f-489f-9a20-e2654fcfa925 | compute13.openstack.local    |       3873 |     NULL |              126 |               NULL |
| 2019-06-11 09:16:23 | 2019-06-14 06:29:23 | 129 | 80e266f2-13f2-439c-b04e-736754fd27cd | compute03.openstack.local    |       3823 |     NULL |              129 |               NULL |
| 2019-06-11 09:16:24 | 2019-06-14 06:29:25 | 132 | 09ef46fa-b9e7-429b-8d5b-f4f46ead3c85 | compute11.openstack.local    |       3844 |     NULL |              132 |               NULL |
| 2019-06-12 12:31:49 | 2019-06-14 06:29:08 | 138 | ebc9a09f-08bb-4839-ab56-c4d06bcc6ed4 | vrtx01-lsd01.openstack.local |        362 |     NULL |              138 |               NULL |
| 2019-06-12 12:32:32 | 2019-06-14 06:29:53 | 141 | d982e5bb-a7d9-40af-b667-43c2f8f2001c | vrtx01-lsd02.openstack.local |        355 |     NULL |              141 |               NULL |
| 2019-06-13 19:42:01 | 2019-06-14 06:30:00 | 147 | ba89a743-b86f-4bb8-8cfa-3f08fc016c6a | compute15.openstack.local    |        612 |     NULL |              147 |               NULL |
| 2019-06-13 19:42:24 | 2019-06-14 06:29:44 | 150 | 68f6b408-ab9f-4fe7-be9c-7e690086f631 | compute18.openstack.local    |        611 |     NULL |              150 |               NULL |
| 2019-06-13 19:42:24 | 2019-06-14 06:29:21 | 153 | f981737a-d8f8-4b0e-8631-eedb95c85907 | compute22.openstack.local    |        592 |     NULL |              153 |               NULL |
| 2019-06-13 19:42:25 | 2019-06-14 06:29:17 | 156 | d7f483ff-3b91-4d13-9900-0ec24c3a06a4 | compute09.openstack.local    |        604 |     NULL |              156 |               NULL |
| 2019-06-13 19:42:26 | 2019-06-14 06:29:09 | 159 | bc05c643-a2db-442d-b721-39db8665f923 | compute14.openstack.local    |        598 |     NULL |              159 |               NULL |
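
To make the suspected duplicates in the table above easier to spot, the rows can be grouped by physical node. The sketch below assumes, based only on the names seen in this table, that the pre-reinstall providers are named cc-<node>-kna1 and the post-reinstall providers <node>.openstack.local; the helper names are hypothetical.

```python
from collections import defaultdict

OLD_PREFIX, OLD_SUFFIX = "cc-", "-kna1"
NEW_SUFFIX = ".openstack.local"


def node_key(name):
    """Reduce a provider name to the node part shared by old and new names."""
    if name.endswith(NEW_SUFFIX):
        return name[: -len(NEW_SUFFIX)]
    if name.startswith(OLD_PREFIX) and name.endswith(OLD_SUFFIX):
        return name[len(OLD_PREFIX): -len(OLD_SUFFIX)]
    return name


def find_duplicates(providers):
    """Group (uuid, name) pairs by node and keep nodes with more than one provider."""
    by_node = defaultdict(list)
    for uuid, name in providers:
        by_node[node_key(name)].append((uuid, name))
    return {node: entries for node, entries in by_node.items() if len(entries) > 1}
```

Fed the uuid and name columns from the query above, this would list every node that has both an old and a new resource provider record, compute18 among them.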

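placement returns data for both UUIDs. For example, for compute18 the
old provider (cc-compute18-kna1) is queried first, then the new one
(compute18.openstack.local)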

curl -g  -X GET http://*****:8780/resource_providers/dfb35aab-2af9-4d86-bccb-76959c7f68ed/usages -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.2" -H "User-Agent: openstacksdk/0.31.0 keystoneauth1/3.14.0 python-requests/2.22.0 CPython/2.7.12" -H "X-Auth-Token:******" | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    98  100    98    0     0   2648      0 --:--:-- --:--:-- --:--:--  2648
{
    "resource_provider_generation": 435686,
    "usages": {
        "DISK_GB": 150,
        "MEMORY_MB": 9728,
        "VCPU": 7
    }
}

curl -g  -X GET http://*****:8780/resource_providers/68f6b408-ab9f-4fe7-be9c-7e690086f631/usages -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.2" -H "User-Agent: openstacksdk/0.31.0 keystoneauth1/3.14.0 python-requests/2.22.0 CPython/2.7.12" -H "X-Auth-Token:*****" | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    97  100    97    0     0    740      0 --:--:-- --:--:-- --:--:--   740
{
    "resource_provider_generation": 664,
    "usages": {
        "DISK_GB": 680,
        "MEMORY_MB": 59392,
        "VCPU": 32
    }
}

I am speculating heavily on the cause of the issue; however, other symptoms we have seen include:
- live migration fails as no suitable host found (despite near empty nodes)
- new VMs fail to spawn as no suitable host found (despite near empty nodes)

these issues lead us to have to continually live migrate VMs to get some
load balancing

other potentially useful input (or separate bugs)

nova-compute.log often has

2019-02-07 13:37:59.362 2632 INFO nova.compute.resource_tracker [req-e0f53ec7-7668-4a64-8ba6-ead35f168e82 - - - - -] Instance 4fba72d0-2e95-4b92-b0f6-a7853dc3e8bd has allocations against this compute host but is not found in the database.

we find this in normal running, but also have found it in relation to
live migrations which have failed and have not been rolled back (for
example as a result of the port_binding error)
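
To get a list of the affected instances out of nova-compute.log, something along these lines could be used. It is a sketch that matches only the message text quoted above and assumes each log record sits on a single line; the function name is made up for the example.

```python
import re

# Matches the resource tracker message quoted above and captures the instance UUID.
ORPHAN_RE = re.compile(
    r"Instance (?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) "
    r"has allocations against this compute host but is not found in the database"
)


def orphaned_instances(lines):
    """Return instance UUIDs reported as orphaned, in first-seen order, without repeats."""
    seen = []
    for line in lines:
        match = ORPHAN_RE.search(line)
        if match and match.group("uuid") not in seen:
            seen.append(match.group("uuid"))
    return seen
```

It can be fed an open file object directly, e.g. orphaned_instances(open(path)) with path pointing at the compute log.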

it is also possible to get multiple entries in the services table, though I don't believe this is related, and I will report it in a separate bug

MariaDB [nova]> select host, services.binary, version from services where host="cc-compute01-kna1"
    -> ;
| host              | binary       | version |
| cc-compute01-kna1 | nova-compute |      35 |
| cc-compute01-kna1 | nova-compute |       0 |
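
A quick way to check every host for this condition, rather than one host at a time, is to count (host, binary) pairs over the rows returned by a query like the one above. This is only a sketch operating on already-fetched rows; the function name is hypothetical.

```python
from collections import Counter


def duplicate_services(rows):
    """Return the (host, binary) pairs that appear more than once.

    Each row is a (host, binary, version) tuple, as selected above.
    """
    counts = Counter((host, binary) for host, binary, _version in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)
```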

** Affects: nova
     Importance: Undecided
         Status: New

