yahoo-eng-team team mailing list archive
Message #81068
[Bug 1853840] Re: Neutron fails to create bandwidth providers if CONF.host is set
Reviewed: https://review.opendev.org/696600
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=258eebea71b1cac37badf429a90d5cf57e4c455c
Submitter: Zuul
Branch: master
commit 258eebea71b1cac37badf429a90d5cf57e4c455c
Author: Bence Romsics <bence.romsics@xxxxxxxxx>
Date: Wed Nov 27 17:59:15 2019 +0100
Locate RP-tree parent by hypervisor name
Previously we assumed that we can look up the resource provider (created
by nova) to be used as the parent of the agent and physical NIC resource
provider tree by the name set in the config option DEFAULT.host. This
assumption was wrong.
While nova-compute's DEFAULT.host and neutron-agent's DEFAULT.host
must match for port binding to work, the root resource provider created
by nova does not belong to the compute host (where nova-compute runs)
but it belongs to the compute nodes (i.e. hypervisors). Actually there
may be multiple compute nodes managed by a single nova-compute (think
of ironic). Plus the value of DEFAULT.host and the compute node's ID
may be different even when nova-compute manages a hypervisor on the
same host because of various deployment considerations. For example
when tripleo does not manage the undercloud (so a libvirt hypervisor
returns the plain hostname), but the same tripleo enforces its host
naming conventions in nova's and neutron's DEFAULT.host settings.
This change enables neutron to use the hypervisor name to locate the
root of the resource provider tree.
We introduce a new configuration option for
(1) ovs-agent: resource_provider_hypervisors, for example:
    [ovs]
    bridge_mappings = physnet0:br-physnet0,...
    resource_provider_bandwidths = br-physnet0:10000000:10000000,...
    resource_provider_hypervisors = br-physnet0:hypervisor0,...
(2) sriov-agent: resource_provider_hypervisors, for example:
    [sriov_nic]
    physical_device_mappings = physnet1:ens5,...
    resource_provider_bandwidths = ens5:10000000:10000000,...
    resource_provider_hypervisors = ens5:hypervisor1,...
For both agents 'resource_provider_hypervisors' values default to
socket.gethostname() for each key in resource_provider_bandwidths.
We try not to block later developments in which one neutron
agent may manage devices on multiple hosts. That's why we allow
each physdev to be associated with a different hypervisor.
But here we do not try to solve the problem that the natural physdev
identifiers may not be unique across multiple hosts. We leave solving
this problem to whoever wants to implement an agent handling devices of
multiple hosts.
(3) We extend the report_state message's configurations field like this:
    {
        'bridge_mappings': {'physnet0': 'br-physnet0'},
        'resource_provider_bandwidths': {
            'br-physnet0': {'egress': 10000000, 'ingress': 10000000}},
        'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'},
        ...
    }
(4) In neutron-server we use
report_state.configurations.resource_provider_hypervisors.PHYSDEV
when selecting parent resource provider for agent and physdev
RP-tree. When not available in the message we fall back to using
report_state.host as before.
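
The defaulting rule and the server-side fallback described above can be sketched as follows. This is a minimal illustration, not neutron's actual code: the function names parse_hypervisors and parent_rp_name and the default_hypervisor parameter are hypothetical, and the option is assumed to be a comma-separated list of physdev:hypervisor pairs as in the examples above.

```python
import socket


def parse_hypervisors(raw, bandwidths, default_hypervisor=None):
    """Build a physdev -> hypervisor mapping (hypothetical helper).

    raw is the option value, e.g. 'br-physnet0:hypervisor0,...'.
    bandwidths is the parsed resource_provider_bandwidths dict.
    """
    if default_hypervisor is None:
        default_hypervisor = socket.gethostname()
    # Every physdev that has a bandwidth entry starts with the default.
    hypervisors = {dev: default_hypervisor for dev in bandwidths}
    # Explicitly configured pairs override the default.
    for item in raw.split(','):
        item = item.strip()
        if not item:
            continue
        dev, _, hypervisor = item.partition(':')
        hypervisors[dev.strip()] = hypervisor.strip()
    return hypervisors


def parent_rp_name(report_state, physdev):
    """Pick the name used to look up the parent resource provider."""
    configurations = report_state.get('configurations', {})
    hypervisors = configurations.get('resource_provider_hypervisors', {})
    # Fall back to report_state.host when the agent sent no mapping.
    return hypervisors.get(physdev, report_state['host'])
```

With a report_state from an agent that predates this change (no resource_provider_hypervisors key), parent_rp_name returns report_state['host'], which matches the old behaviour.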
Since we only changed the free-format configurations field of the
report_state message, the rpc version is not bumped and we expect this
change to be backported to stein and train.
Change-Id: I9b08a3a9c20b702b745b41d4885fb5120fd665ce
Closes-Bug: #1853840
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1853840
Title:
Neutron fails to create bandwidth providers if CONF.host is set
Status in neutron:
Fix Released
Bug description:
If neutron is configured to support qos minimum bandwidth policy rules
and the [DEFAULT]/host config option is set for both nova-compute and
neutron (sriov / ovs) agents on a given compute host then neutron
fails to find the compute host root resource provider and therefore
fails to create the agent providers and the device providers.
Reproduction:
* deploy an all-in-one devstack with the minimum bandwidth configuration [1] and set [DEFAULT]/host for nova-compute and neutron agent to something other than the hostname of the compute host.
* start up the nova and neutron services
* check what resource providers are created during the startup
Expected:
stack@aio:~/devstack$ openstack resource provider list
+--------------------------------------+--------------------------------+------------+
| uuid                                 | name                           | generation |
+--------------------------------------+--------------------------------+------------+
| 737d9a03-3f8d-4740-9b3b-933fac0dded9 | aio                            |          2 |
| 31b21568-8d05-5d9c-a045-6956ac62790a | aio:Open vSwitch agent         |          0 |
| 1110cf59-cabf-526c-bacc-08baabbac692 | aio:Open vSwitch agent:br-test |          2 |
| 9734f92c-16da-585b-a19c-e3d4f30302fe | aio:NIC Switch agent           |          0 |
+--------------------------------------+--------------------------------+------------+
Actual:
stack@aio:~/devstack$ openstack resource provider list
+--------------------------------------+--------------------------------+------------+
| uuid                                 | name                           | generation |
+--------------------------------------+--------------------------------+------------+
| 737d9a03-3f8d-4740-9b3b-933fac0dded9 | aio                            |          2 |
+--------------------------------------+--------------------------------+------------+
The following log is visible in neutron-server:
Nov 22 11:39:34 aio neutron-server[14589]: DEBUG neutron.services.placement_report.plugin [None req-59a8b1b9-771b-4a38-9270-ea9fabccebb4 None None] placement: syncing state for agent type Open vSwitch agent on host not-the-compute-hostname {{(pid=14612) handle_placement_config /opt/stack/neutron/neutron/services/placement_report/plugin.py:197}}
Nov 22 11:39:34 aio neutron-server[14589]: WARNING neutron.services.placement_report.plugin [None req-59a8b1b9-771b-4a38-9270-ea9fabccebb4 None None] Synchronization of resources of agent type Open vSwitch agent at host not-the-compute-hostname to placement failed.: IndexError: list index out of range
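
The IndexError in the warning is consistent with code that takes the first element of an empty lookup result. The sketch below only illustrates that failure mode; it assumes (the log does not confirm this) that the plugin lists resource providers by name and indexes the first match. Both function names are hypothetical.

```python
def first_provider_uuid(providers):
    """Unchecked lookup: raises IndexError when providers is empty."""
    return providers[0]['uuid']


def first_provider_uuid_checked(providers, name):
    """Same lookup, but with an explicit error for the empty case."""
    if not providers:
        raise LookupError('no resource provider named %r' % name)
    return providers[0]['uuid']
```

When no resource provider carries the name set in [DEFAULT]/host, the list of matches is empty and the unchecked variant fails with "list index out of range".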
Perceived severity:
* Medium, workaround exists: do not use the qos configuration when [DEFAULT]/host needs to be set to something other than the hostname of the compute host.
Version: neutron from master 418be00155a9fa93c8f63bd1d847d2fb3410228b
ML post about the problem and discussion about possible solution:
http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html
[1] https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html