yahoo-eng-team team mailing list archive

[Bug 1921150] [NEW] Repeated ERROR log: Unable to save resource provider ... because: re-parenting a provider is not currently allowed

 

Public bug reported:

Description
===========
If neutron is configured with QoS guaranteed minimum bandwidth and the deployment is upgraded from Stein 14.0.4 or older, or Train 15.0.1 or older, to any newer OpenStack version, the following stack trace appears repeatedly in the neutron-server log:

Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin Traceback (most recent call last):
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 53, in wrapper
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     return f(self, *a, **k)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 232, in update_resource_provider
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     return self._put(url, update_body).json()
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 188, in _put
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     endpoint_filter=self._ks_filter, **kwargs)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/usr/local/lib/python3.6/dist-packages/keystoneauth1/session.py", line 1114, in put
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     return self.request(url, 'PUT', **kwargs)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/usr/local/lib/python3.6/dist-packages/keystoneauth1/session.py", line 943, in request
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     raise exceptions.from_response(resp, method, url)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin keystoneauth1.exceptions.http.BadRequest: Bad Request (HTTP 400) (Request-ID: req-31ef5696-dc60-4478-939b-a12d3d3bdf65)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin 
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin During handling of the above exception, another exception occurred:
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin 
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin Traceback (most recent call last):
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron/neutron/services/placement_report/plugin.py", line 163, in batch
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     deferred.execute()
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron/neutron/agent/common/placement_report.py", line 43, in execute
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     return self.func(*self.args, **self.kwargs)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 53, in wrapper
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     return f(self, *a, **k)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 254, in ensure_resource_provider
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     resource_provider=resource_provider)
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin   File "/opt/stack/neutron-lib/neutron_lib/placement/client.py", line 62, in wrapper
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin     msg=exc.response.text.replace('\n', ' '))
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin neutron_lib.exceptions.placement.PlacementClientError: Placement Client Error (4xx): {"errors": [{"status": 400, "title": "Bad Request", "detail": "The server could not comply with the request since it is either malformed or otherwise incorrect.\n\n Unable to save resource provider af0bc0aa-525e-563f-bb4d-2f26f70371d6: Object action update failed because: re-parenting a provider is not currently allowed.  ", "request_id": "req-31ef5696-dc60-4478-939b-a12d3d3bdf65"}]}
Mar 24 12:12:36 ubuntu neutron-server[4499]: ERROR neutron.services.placement_report.plugin 
Mar 24 12:12:36 ubuntu neutron-server[4499]: WARNING neutron.services.placement_report.plugin [-] Synchronization of resources of agent type Open vSwitch agent at host ubuntu to placement failed.


Steps to reproduce
==================
1) Deploy neutron Stein 14.0.4 or older, or Train 15.0.1 or older
2) Configure minimum guaranteed bandwidth according to [1], e.g. define the bandwidth inventory in the ovs or sriov agent config with the [ovs] or [sriov_nic]/resource_provider_bandwidths config option
3) Observe that the agent RP and the device RPs are created
  
    $ openstack --os-placement-api-version 1.14 resource provider tree list 

and that the parent of the device RP is the agent RP.

4) Upgrade to a newer OpenStack version
5) Observe that the above periodic error appears in the neutron-server log.

Expected result
===============

No error logs

Actual result
=============

Repeated error logs appear

Triage
======

The problem is caused by the bugfix [2] merged in Ussuri and backported
to stable/train and stable/stein.

Before patch [2] the RP tree in placement was created in the following
structure:

  computeRP
    \- agent_1_RP
    |   \- device_1_RP
    |   \- device_2_RP
    |
    \- agent_2_RP

That is, the parent of the deviceRP is the agentRP that has the given
device configured.

However, after patch [2] neutron tries to create a tree where the
parent of the deviceRP is the computeRP:

  computeRP
    \- agent_1_RP
    \- agent_2_RP
    \- device_1_RP
    \- device_2_RP

If the deviceRP already exists under the agentRP before the upgrade,
then after the upgrade neutron tries to update the parent of the
deviceRP to point to the computeRP. However, the placement API does not
allow such re-parenting of an RP, hence the periodic ERROR message in
the neutron-server log.

If a new device is added to the ovs or sriov agent config after the
upgrade, then the neutron-server successfully creates the deviceRP under
the computeRP and therefore no repeated ERROR log appears.
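
The two cases above can be illustrated with a small in-memory model. This
is only a sketch: ToyPlacement, ReparentingNotAllowed and sync are
hypothetical names, not neutron or Placement code, and the model keeps
going after a failure, which the real sync may not do. It only mimics the
rule that an existing provider cannot be given a different parent.

```python
class ReparentingNotAllowed(Exception):
    """Stand-in for the HTTP 400 that Placement returns."""


class ToyPlacement:
    """Simplified in-memory stand-in for Placement (not the real service)."""

    def __init__(self):
        self.parents = {}  # provider name -> parent name (None for a root)

    def ensure(self, name, parent):
        """Create the provider if missing, otherwise check its parent."""
        if name not in self.parents:
            self.parents[name] = parent
            return "created"
        if self.parents[name] != parent:
            raise ReparentingNotAllowed(
                f"{name}: re-parenting a provider is not currently allowed")
        return "unchanged"


def sync(placement, desired):
    """Apply a {provider: parent} mapping; return the providers that failed."""
    failed = []
    for name, parent in desired.items():
        try:
            placement.ensure(name, parent)
        except ReparentingNotAllowed:
            failed.append(name)
    return failed


# Layout written before patch [2]: the device sits under its agent.
before = {"computeRP": None, "agent_1_RP": "computeRP",
          "device_1_RP": "agent_1_RP"}
# Layout requested after patch [2]: devices sit directly under the compute RP.
after = {"computeRP": None, "agent_1_RP": "computeRP",
         "device_1_RP": "computeRP", "device_2_RP": "computeRP"}

placement = ToyPlacement()
assert sync(placement, before) == []
first_try = sync(placement, after)   # device_1_RP already has another parent
second_try = sync(placement, after)  # the next heartbeat fails the same way
```

In this model device_1_RP fails on every sync attempt because it was
created before the upgrade, while device_2_RP, added after the upgrade, is
created under the computeRP without any error.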

Changing the structure of the RP tree was a mistake in [2]. The correct
and intended structure is where the deviceRP is under the agentRP.

Fortunately the direct effect of this mistake is limited to:
* a repeated ERROR log visible in the neutron-server log
* neutron retrying the placement sync at every agent heartbeat, causing unnecessary load on Placement.

The QoS guaranteed minimum bandwidth feature works properly with both
tree structures. Neutron can create new device RPs and can update the
resource inventory on existing device RPs. Nova and Placement can use
both types of tree to schedule VMs with ports having QoS policies, and
the resource accounting will be correct in Placement.

Proposed solution
=================

The fix is twofold. First we need to restore the proper tree creation
logic in neutron. Then we need to provide a way to fix the parent of the
deviceRPs created after [2] was applied.

Restore the proper tree creation logic
--------------------------------------

It will be a simple fix that makes sure that neutron tries to create
deviceRPs under the agentRP. This will make the repeated ERROR log
disappear in deployments that were upgraded from before [2]. However,
it will make the same ERROR log appear in deployments that configured
new devices after [2] was applied. The fix will enhance the log message
to explain the problem.
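
As an illustration only, an enhanced message could be built along these
lines. The function name explain_sync_failure and the added wording are
assumptions and not the actual fix; PlacementClientError here is a local
stand-in for the exception seen in the traceback above.

```python
class PlacementClientError(Exception):
    """Local stand-in for the client error seen in the traceback."""


REPARENT_HINT = "re-parenting a provider is not currently allowed"


def explain_sync_failure(exc, agent_type, host):
    """Build a log message that tells the operator what went wrong."""
    msg = (f"Synchronization of resources of agent type {agent_type} "
           f"at host {host} to placement failed.")
    if REPARENT_HINT in str(exc):
        # Hypothetical extra explanation for the re-parenting case.
        msg += (" A device resource provider already exists under a "
                "different parent than the one neutron expects; it has "
                "to be re-parented before the sync can succeed.")
    return msg
```

Other placement errors keep the original short WARNING text, so only the
re-parenting case gets the longer explanation.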

This fix will be backported to all the affected branches, back to
stable/stein.

Fix the wrongly parented RPs
----------------------------

To re-parent the wrongly parented RPs we need to change Placement to
allow re-parenting via the PUT /resource_providers/{provider_uuid} API.

Alternatively, we need to provide a script for cloud admins that can
fix the Placement DB via SQL commands.
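
If the API route is taken, the set of requests needed could be planned
roughly as follows. This is a sketch under assumptions: plan_reparenting
and the intended_parent mapping are hypothetical, the UUIDs are made up,
and the body fields (name, parent_provider_uuid) are assumed to follow the
existing PUT /resource_providers/{provider_uuid} request format. No request
is sent; Placement would still have to accept the parent change.

```python
def plan_reparenting(providers, intended_parent):
    """Return (method, path, body) tuples for providers under the wrong parent.

    providers: list of dicts with "uuid", "name" and "parent_provider_uuid".
    intended_parent: hypothetical {device RP uuid: agent RP uuid} mapping
    that the operator (or neutron) would have to supply.
    """
    requests = []
    for rp in providers:
        wanted = intended_parent.get(rp["uuid"])
        if wanted is None or rp["parent_provider_uuid"] == wanted:
            continue  # not a known device RP, or already parented correctly
        body = {"name": rp["name"], "parent_provider_uuid": wanted}
        requests.append(("PUT", f"/resource_providers/{rp['uuid']}", body))
    return requests


# Made-up example: device_1_RP predates [2], device_2_RP was created after it.
providers = [
    {"uuid": "compute-uuid", "name": "computeRP",
     "parent_provider_uuid": None},
    {"uuid": "agent-1-uuid", "name": "agent_1_RP",
     "parent_provider_uuid": "compute-uuid"},
    {"uuid": "device-1-uuid", "name": "device_1_RP",
     "parent_provider_uuid": "agent-1-uuid"},
    {"uuid": "device-2-uuid", "name": "device_2_RP",
     "parent_provider_uuid": "compute-uuid"},
]
intended_parent = {"device-1-uuid": "agent-1-uuid",
                   "device-2-uuid": "agent-1-uuid"}

plan = plan_reparenting(providers, intended_parent)
```

Only device_2_RP ends up in the plan, since device_1_RP already sits under
its agentRP.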


[1] https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html
[2] https://review.opendev.org/q/I9b08a3a9c20b702b745b41d4885fb5120fd665ce

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: qos

** Tags added: qos

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1921150

Title:
  Repeated ERROR log: Unable to save resource provider ... because: re-
  parenting a provider is not currently allowed

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1921150/+subscriptions

