← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1936839] [NEW] Ingress bw-limit with DPDK does not work

 

Public bug reported:

A colleague of mine working downstream found the following bug (his
report follows with minor redactions of company-internal details). I'm
going to push his proposed fix in a minute too.

In short, the inbound bandwidth limitation on vHost user ports doesn't
seem to work. The value set with OpenStack QoS commands on the port
isn't configured properly. The problem exists in OVS backend.

Creating a 1 Mbit/s limit rule:

openstack network qos rule create max_1_Mbps --type bandwidth-limit
--max-kbps 1000 --max-burst-kbits 1000 --ingress

After applying to the port, you can query it on the compute:

compute-0-5:/home/ceeinfra # ovs-vsctl list qos
_uuid               : c326ed8b-24ef-4f1f-a5b0-b20f3ca3297d
external_ids        : {id=vhu84edf6c2-f0}
other_config        : {cbs="125000.0", cir="125000.0"}
queues              : {}
type                : egress-policer

Note: the traffic is ingress from the VM point of view, and egress from
OVS.

The values are not integers, they have a .0 at the end. In
/var/log/openvswitch/ovs-vswitchd.log you can see that it is not
accepted:

2021-07-15T12:36:23.121Z|00208|netdev_dpdk|ERR|Could not create rte meter for egress policer
2021-07-15T12:36:23.121Z|00209|netdev_dpdk|ERR|Failed to set QoS type egress-policer on port vhu84edf6c2-f0: Invalid argument
2021-07-15T12:36:23.126Z|00210|netdev_dpdk|ERR|Could not create rte meter for egress policer
2021-07-15T12:36:23.126Z|00211|netdev_dpdk|ERR|Failed to set QoS type egress-policer on port vhu84edf6c2-f0: Invalid argument

If you create a traffic between two VMs, the downloading one having the
limitation applied on its port reports this:

root@bwtest2:~# nc 192.168.1.201 5555 | dd of=/dev/null status=progress
816316928 bytes (816 MB, 779 MiB) copied, 5 s, 163 MB/s^C
1863705+71 records in
1863738+0 records out
954233856 bytes (954 MB, 910 MiB) copied, 8.23046 s, 116 MB/s

The bandwidth is higher than the set 1 Mb/s.

It is possible to modify the OVS agent so it applies the bandwidth limit
correctly. You have to find out where the Python scripts of the
neutron_openvswitch_agent container are stored on the compute host. In
our environment the file to modify is:

/var/lib/docker/overlay2/68653008fca0a6434adb3985b021b2329680b71b49859c3a028f951deed59df3/merged/usr/lib/python3.6/site-
packages/neutron/agent/common/ovs_lib.py

In the _update_ingress_bw_limit_for_dpdk_port function, the original
code is:

        # cir and cbs should be set in bytes instead of bits
        qos_other_config = {
            'cir': str(max_bw_in_bits / 8),
            'cbs': str(max_burst_in_bits / 8)
        }

If you modify the code to this:

        # cir and cbs should be set in bytes instead of bits
        qos_other_config = {
            'cir': str(int(max_bw_in_bits / 8)),
            'cbs': str(int(max_burst_in_bits / 8))
        }

the values passed to OVS will be integers. You can see the difference
querying the new values after applying the limit on the ports again:

compute-0-5:/home/ceeinfra # ovs-vsctl list qos
_uuid               : b93b1165-e839-4378-a6b7-b75c13ad0d41
external_ids        : {id=vhu84edf6c2-f0}
other_config        : {cbs="125000", cir="125000"}
queues              : {}
type                : egress-policer

They don't have the .0 at the and anymore, and OVS doesn't complain in
the logs about invalid arguments. The bandwidth limitation between the
computes now works:

root@bwtest2:~# nc 192.168.1.201 5555 | dd of=/dev/null status=progress
4095488 bytes (4.1 MB, 3.9 MiB) copied, 33 s, 123 kB/s^C
7274+1382 records in
8051+0 records out
4122112 bytes (4.1 MB, 3.9 MiB) copied, 33.4033 s, 123 kB/s

125 kB/s translates to 1 Mb/s that we have applied with the rule, so it
works now.

My guess is that this problem comes from the different behaviour between
the division in Python2 and Python3:

user@debian:~$ python2
Python 2.7.16 (default, Oct 10 2019, 22:02:15)
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 4/3
1
>>> 
user@debian:~$ python3
Python 3.7.3 (default, Jan 22 2021, 20:04:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 4/3
1.3333333333333333
>>> 

Python3 doesn't round to integers, and OVS doesn't seem to accept
floating point numbers.

I have seen this in multiple versions.

** Affects: neutron
     Importance: Undecided
     Assignee: Bence Romsics (bence-romsics)
         Status: In Progress


** Tags: ovs qos

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1936839

Title:
  Ingress bw-limit with DPDK does not work

Status in neutron:
  In Progress

Bug description:
  A colleague of mine working downstream found the following bug (his
  report follows with minor redactions of company-internal details). I'm
  going to push his proposed fix in a minute too.

  In short, the inbound bandwidth limitation on vHost user ports doesn't
  seem to work. The value set with OpenStack QoS commands on the port
  isn't configured properly. The problem exists in OVS backend.

  Creating a 1 Mbit/s limit rule:

  openstack network qos rule create max_1_Mbps --type bandwidth-limit
  --max-kbps 1000 --max-burst-kbits 1000 --ingress

  After applying to the port, you can query it on the compute:

  compute-0-5:/home/ceeinfra # ovs-vsctl list qos
  _uuid               : c326ed8b-24ef-4f1f-a5b0-b20f3ca3297d
  external_ids        : {id=vhu84edf6c2-f0}
  other_config        : {cbs="125000.0", cir="125000.0"}
  queues              : {}
  type                : egress-policer

  Note: the traffic is ingress from the VM point of view, and egress
  from OVS.

  The values are not integers, they have a .0 at the end. In
  /var/log/openvswitch/ovs-vswitchd.log you can see that it is not
  accepted:

  2021-07-15T12:36:23.121Z|00208|netdev_dpdk|ERR|Could not create rte meter for egress policer
  2021-07-15T12:36:23.121Z|00209|netdev_dpdk|ERR|Failed to set QoS type egress-policer on port vhu84edf6c2-f0: Invalid argument
  2021-07-15T12:36:23.126Z|00210|netdev_dpdk|ERR|Could not create rte meter for egress policer
  2021-07-15T12:36:23.126Z|00211|netdev_dpdk|ERR|Failed to set QoS type egress-policer on port vhu84edf6c2-f0: Invalid argument

  If you create a traffic between two VMs, the downloading one having
  the limitation applied on its port reports this:

  root@bwtest2:~# nc 192.168.1.201 5555 | dd of=/dev/null status=progress
  816316928 bytes (816 MB, 779 MiB) copied, 5 s, 163 MB/s^C
  1863705+71 records in
  1863738+0 records out
  954233856 bytes (954 MB, 910 MiB) copied, 8.23046 s, 116 MB/s

  The bandwidth is higher than the set 1 Mb/s.

  It is possible to modify the OVS agent so it applies the bandwidth
  limit correctly. You have to find out where the Python scripts of the
  neutron_openvswitch_agent container are stored on the compute host. In
  our environment the file to modify is:

  /var/lib/docker/overlay2/68653008fca0a6434adb3985b021b2329680b71b49859c3a028f951deed59df3/merged/usr/lib/python3.6/site-
  packages/neutron/agent/common/ovs_lib.py

  In the _update_ingress_bw_limit_for_dpdk_port function, the original
  code is:

          # cir and cbs should be set in bytes instead of bits
          qos_other_config = {
              'cir': str(max_bw_in_bits / 8),
              'cbs': str(max_burst_in_bits / 8)
          }

  If you modify the code to this:

          # cir and cbs should be set in bytes instead of bits
          qos_other_config = {
              'cir': str(int(max_bw_in_bits / 8)),
              'cbs': str(int(max_burst_in_bits / 8))
          }

  the values passed to OVS will be integers. You can see the difference
  querying the new values after applying the limit on the ports again:

  compute-0-5:/home/ceeinfra # ovs-vsctl list qos
  _uuid               : b93b1165-e839-4378-a6b7-b75c13ad0d41
  external_ids        : {id=vhu84edf6c2-f0}
  other_config        : {cbs="125000", cir="125000"}
  queues              : {}
  type                : egress-policer

  They don't have the .0 at the and anymore, and OVS doesn't complain in
  the logs about invalid arguments. The bandwidth limitation between the
  computes now works:

  root@bwtest2:~# nc 192.168.1.201 5555 | dd of=/dev/null status=progress
  4095488 bytes (4.1 MB, 3.9 MiB) copied, 33 s, 123 kB/s^C
  7274+1382 records in
  8051+0 records out
  4122112 bytes (4.1 MB, 3.9 MiB) copied, 33.4033 s, 123 kB/s

  125 kB/s translates to 1 Mb/s that we have applied with the rule, so
  it works now.

  My guess is that this problem comes from the different behaviour
  between the division in Python2 and Python3:

  user@debian:~$ python2
  Python 2.7.16 (default, Oct 10 2019, 22:02:15)
  [GCC 8.3.0] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> 4/3
  1
  >>> 
  user@debian:~$ python3
  Python 3.7.3 (default, Jan 22 2021, 20:04:44)
  [GCC 8.3.0] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> 4/3
  1.3333333333333333
  >>> 

  Python3 doesn't round to integers, and OVS doesn't seem to accept
  floating point numbers.

  I have seen this in multiple versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1936839/+subscriptions



Follow ups