yahoo-eng-team team mailing list archive

[Bug 1853840] Re: Neutron fails to create bandwidth providers if CONF.host is set

 

Reviewed:  https://review.opendev.org/696600
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=258eebea71b1cac37badf429a90d5cf57e4c455c
Submitter: Zuul
Branch:    master

commit 258eebea71b1cac37badf429a90d5cf57e4c455c
Author: Bence Romsics <bence.romsics@xxxxxxxxx>
Date:   Wed Nov 27 17:59:15 2019 +0100

    Locate RP-tree parent by hypervisor name
    
    Previously we assumed that we can look up the resource provider (created
    by nova) to be used as the parent of the agent and physical NIC resource
    provider tree by the name set in the config option DEFAULT.host. This
    assumption was wrong.
    
    While nova-compute's DEFAULT.host and neutron-agent's DEFAULT.host
    must match for port binding to work, the root resource provider created
    by nova does not belong to the compute host (where nova-compute runs)
    but it belongs to the compute nodes (i.e. hypervisors). Actually there
    may be multiple compute nodes managed by a single nova-compute (think
    of ironic). Plus the value of DEFAULT.host and the compute node's ID
    may be different even when nova-compute manages a hypervisor on the
    same host because of various deployment considerations. For example
    when tripleo does not manage the undercloud (so a libvirt hypervisor
    returns the plain hostname), but the same tripleo enforces its host
    naming conventions in nova's and neutron's DEFAULT.host settings.
    
    This change enables neutron to use the hypervisor name to locate the
    root of the resource provider tree.
    
    We introduce a new configuration option for
    
    (1) ovs-agent: resource_provider_hypervisors, for example:
    
    [ovs]
    bridge_mappings = physnet0:br-physnet0,...
    resource_provider_bandwidths = br-physnet0:10000000:10000000,...
    resource_provider_hypervisors = br-physnet0:hypervisor0,...
    
    (2) sriov-agent: resource_provider_hypervisors, for example:
    
    [sriov_nic]
    physical_device_mappings = physnet1:ens5,...
    resource_provider_bandwidths = ens5:10000000:10000000,...
    resource_provider_hypervisors = ens5:hypervisor1,...
    
    For both agents 'resource_provider_hypervisors' values default to
    socket.gethostname() for each key in resource_provider_bandwidths.
    
    We try not to block later developments in which one neutron
    agent may manage devices on multiple hosts. That's why we allow
    each physdev to be associated with a different hypervisor.
    
    But here we do not try to solve the problem that the natural physdev
    identifiers may not be unique across multiple hosts. We leave solving
    this problem to whoever wants to implement an agent handling devices of
    multiple hosts.
    
    (3) We extend the report_state message's configurations field like this:
    
    {
    'bridge_mappings': {'physnet0': 'br-physnet0'},
    'resource_provider_bandwidths': {
        'br-physnet0': {'egress': 10000000, 'ingress': 10000000}},
    'resource_provider_hypervisors': {'br-physnet0': 'hypervisor0'},
    ...
    }
    
    (4) In neutron-server we use
    report_state.configurations.resource_provider_hypervisors.PHYSDEV
    when selecting the parent resource provider for the agent and physdev
    RP-tree. When it is not available in the message we fall back to using
    report_state.host as before.
    
    Since we only changed the free-format configurations field of the
    report_state message, the rpc version is not bumped and we expect this
    change to be backported to stein and train.
    
    Change-Id: I9b08a3a9c20b702b745b41d4885fb5120fd665ce
    Closes-Bug: #1853840
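
The defaulting rule from points (1) and (2) and the server-side fallback from point (4) of the commit message can be sketched roughly as follows. This is a minimal illustration and not neutron's actual code: the function names default_hypervisors and select_parent_name, the plain-dict shape of the report_state message, and the per-device fallback when only some devices are configured are all assumptions made for the example.

```python
import socket


def default_hypervisors(bandwidths, hypervisors=None, hostname=None):
    """Return a physdev -> hypervisor name mapping (hypothetical helper).

    Every key of resource_provider_bandwidths gets an entry. Devices not
    listed in resource_provider_hypervisors default to the local hostname.
    """
    hostname = hostname or socket.gethostname()
    hypervisors = hypervisors or {}
    return {dev: hypervisors.get(dev, hostname) for dev in bandwidths}


def select_parent_name(report_state, physdev):
    """Pick the name used to look up the parent resource provider.

    Prefer configurations.resource_provider_hypervisors[physdev]. Fall back
    to report_state['host'] when the agent did not send that information
    (assumed here to cover both a missing field and a missing device key).
    """
    configurations = report_state.get('configurations', {})
    hypervisors = configurations.get('resource_provider_hypervisors', {})
    return hypervisors.get(physdev, report_state['host'])
```

With an agent that reports resource_provider_hypervisors, select_parent_name returns the hypervisor name for the device. With an older agent that does not report it, the function returns report_state['host'], which matches the previous behaviour.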


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1853840

Title:
  Neutron fails to create bandwidth providers if CONF.host is set

Status in neutron:
  Fix Released

Bug description:
  If neutron is configured to support qos minimum bandwidth policy rules
  and the [DEFAULT]/host config option is set for both nova-compute and
  neutron (sriov / ovs) agents on a given compute host then neutron
  fails to find the compute host root resource provider and therefore
  fails to create the agent providers and the device providers.

  Reproduction:
  * deploy an all-in-one devstack with the minimum bandwidth configuration [1] and set [DEFAULT]/host for nova-compute and the neutron agent to something other than the hostname of the compute host.
  * start up the nova and neutron services
  * check what resource providers are created during the startup

  Expected:

  stack@aio:~/devstack$ openstack resource provider list
  +--------------------------------------+--------------------------------+------------+
  | uuid                                 | name                           | generation |
  +--------------------------------------+--------------------------------+------------+
  | 737d9a03-3f8d-4740-9b3b-933fac0dded9 | aio                            |          2 |
  | 31b21568-8d05-5d9c-a045-6956ac62790a | aio:Open vSwitch agent         |          0 |
  | 1110cf59-cabf-526c-bacc-08baabbac692 | aio:Open vSwitch agent:br-test |          2 |
  | 9734f92c-16da-585b-a19c-e3d4f30302fe | aio:NIC Switch agent           |          0 |
  +--------------------------------------+--------------------------------+------------+

  Actual:
  stack@aio:~/devstack$ openstack resource provider list
  +--------------------------------------+--------------------------------+------------+
  | uuid                                 | name                           | generation |
  +--------------------------------------+--------------------------------+------------+
  | 737d9a03-3f8d-4740-9b3b-933fac0dded9 | aio                            |          2 |
  +--------------------------------------+--------------------------------+------------+

  The following log is visible in neutron-server:

  Nov 22 11:39:34 aio neutron-server[14589]: DEBUG neutron.services.placement_report.plugin [None req-59a8b1b9-771b-4a38-9270-ea9fabccebb4 None None] placement: syncing state for agent type Open vSwitch agent on host not-the-compute-hostname {{(pid=14612) handle_placement_config /opt/stack/neutron/neutron/services/placement_report/plugin.py:197}}
  Nov 22 11:39:34 aio neutron-server[14589]: WARNING neutron.services.placement_report.plugin [None req-59a8b1b9-771b-4a38-9270-ea9fabccebb4 None None] Synchronization of resources of agent type Open vSwitch agent at host not-the-compute-hostname to placement failed.: IndexError: list index out of range
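
One plausible reading of the IndexError in the log above is that the server filters resource providers by name and takes the first match, so an empty result cannot be indexed. The sketch below only illustrates that failure mode. The function find_root_provider_uuid and the in-memory provider list are hypothetical and are built from the "Actual" listing, not taken from neutron's code.

```python
def find_root_provider_uuid(providers, name):
    """Naive lookup (hypothetical): filter by name, take the first match."""
    matches = [rp for rp in providers if rp['name'] == name]
    return matches[0]['uuid']  # raises IndexError when nothing matches


# The only provider nova created is named after the hypervisor, 'aio'.
providers = [{'uuid': '737d9a03-3f8d-4740-9b3b-933fac0dded9', 'name': 'aio'}]

# Looking up by the hypervisor name succeeds.
root_uuid = find_root_provider_uuid(providers, 'aio')

# Looking up by the overridden [DEFAULT]/host value finds nothing.
try:
    find_root_provider_uuid(providers, 'not-the-compute-hostname')
    lookup_error = None
except IndexError as exc:
    lookup_error = str(exc)
```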

  Perceived severity:
  * Medium, workaround exists: do not use the qos configuration when [DEFAULT]/host needs to be set to something other than the hostname of the compute host.

  Version: neutron from master 418be00155a9fa93c8f63bd1d847d2fb3410228b

  ML post about the problem and discussion about possible solution:
  http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011044.html

  [1] https://docs.openstack.org/neutron/latest/admin/config-qos-min-bw.html

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1853840/+subscriptions

