
yahoo-eng-team team mailing list archive

[Bug 1653810] Re: [sriov] Modifying or removing pci_passthrough_whitelist may result in inconsistent VF availability

 

** Changed in: nova
       Status: In Progress => Triaged

** Changed in: nova
       Status: Triaged => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1653810

Title:
  [sriov] Modifying or removing pci_passthrough_whitelist may result in
  inconsistent VF availability

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  OpenStack Version: v14 (Newton)
  NIC: Mellanox ConnectX-3 Pro

  While testing an SR-IOV implementation, we found that
  pci_passthrough_whitelist in nova.conf determines how the pci_devices
  table in the Nova DB is populated. Changing the device/interface in
  the whitelist, or commenting out the line altogether, and then
  restarting nova-compute can result in the existing entries being
  marked as 'deleted' in the database. Restoring the
  pci_passthrough_whitelist option with the same device/interface
  results in new entries being created and marked as 'available',
  including for a VF that a running instance is still using. This can
  cause PCI device claim issues when another instance is then booted
  using a 'direct' port.

  The following table shows the original entries, including one
  allocated VF. During testing, we commented out the
  pci_passthrough_whitelist line in nova.conf and restarted
  nova-compute. The entries were marked as 'deleted', though the
  running instance was not deleted and continued to function. The
  pci_passthrough_whitelist config was then restored and nova-compute
  restarted. New entries were created and marked as 'available':

  MariaDB [nova]> select * from pci_devices;
  +---------------------+---------------------+---------------------+---------+-----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-------------+------------+--------------------------------------+--------------------------------------+-----------+--------------+
  | created_at          | updated_at          | deleted_at          | deleted | id  | compute_node_id | address      | product_id | vendor_id | dev_type | dev_id           | label           | status      | extra_info | instance_uuid                        | request_id                           | numa_node | parent_addr  |
  +---------------------+---------------------+---------------------+---------+-----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-------------+------------+--------------------------------------+--------------------------------------+-----------+--------------+
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 |      72 |  72 |               6 | 0000:07:00.0 | 1007       | 15b3      | type-PF  | pci_0000_07_00_0 | label_15b3_1007 | unavailable | {}         | NULL                                 | NULL                                 |         0 | NULL         |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:43:23 |      75 |  75 |               6 | 0000:07:00.1 | 1004       | 15b3      | type-VF  | pci_0000_07_00_1 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 |      78 |  78 |               6 | 0000:07:00.2 | 1004       | 15b3      | type-VF  | pci_0000_07_00_2 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:44:25 |      81 |  81 |               6 | 0000:07:00.3 | 1004       | 15b3      | type-VF  | pci_0000_07_00_3 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 |      84 |  84 |               6 | 0000:07:00.4 | 1004       | 15b3      | type-VF  | pci_0000_07_00_4 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:43:23 |      87 |  87 |               6 | 0000:07:00.5 | 1004       | 15b3      | type-VF  | pci_0000_07_00_5 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:42:26 |      90 |  90 |               6 | 0000:07:00.6 | 1004       | 15b3      | type-VF  | pci_0000_07_00_6 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 20:40:34 | 2016-12-29 20:44:51 |      93 |  93 |               6 | 0000:07:00.7 | 1004       | 15b3      | type-VF  | pci_0000_07_00_7 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2016-12-29 15:23:36 | 2016-12-29 17:40:25 | 2016-12-29 20:42:26 |      96 |  96 |               6 | 0000:07:01.0 | 1004       | 15b3      | type-VF  | pci_0000_07_01_0 | label_15b3_1004 | allocated   | {}         | 178c733b-fb6a-4c97-b1e5-cdc14aae2e0d | b8d79a88-5918-4a38-b2fb-de97a263c70e |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 231 |               6 | 0000:07:00.0 | 1007       | 15b3      | type-PF  | pci_0000_07_00_0 | label_15b3_1007 | available   | {}         | NULL                                 | NULL                                 |         0 | NULL         |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 234 |               6 | 0000:07:00.1 | 1004       | 15b3      | type-VF  | pci_0000_07_00_1 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 237 |               6 | 0000:07:00.2 | 1004       | 15b3      | type-VF  | pci_0000_07_00_2 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 240 |               6 | 0000:07:00.3 | 1004       | 15b3      | type-VF  | pci_0000_07_00_3 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 243 |               6 | 0000:07:00.4 | 1004       | 15b3      | type-VF  | pci_0000_07_00_4 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 246 |               6 | 0000:07:00.5 | 1004       | 15b3      | type-VF  | pci_0000_07_00_5 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:37 | NULL                | NULL                |       0 | 249 |               6 | 0000:07:00.6 | 1004       | 15b3      | type-VF  | pci_0000_07_00_6 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  | 2017-01-03 22:23:38 | NULL                | NULL                |       0 | 252 |               6 | 0000:07:01.0 | 1004       | 15b3      | type-VF  | pci_0000_07_01_0 | label_15b3_1004 | available   | {}         | NULL                                 | NULL                                 |         0 | 0000:07:00.0 |
  +---------------------+---------------------+---------------------+---------+-----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-------------+------------+--------------------------------------+--------------------------------------+-----------+--------------+
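
  The conflict in the table above can be found mechanically: look for a
  PCI address that has a soft-deleted row still recorded as 'allocated'
  to an instance, alongside a live row (deleted = 0) marked
  'available'. The following is a minimal sketch of that check over
  rows already fetched from pci_devices. The function name
  find_stale_allocations and the SAMPLE_ROWS list are hypothetical, are
  not part of Nova, and only the columns relevant to the check are
  included.

```python
from collections import defaultdict


def find_stale_allocations(rows):
    """Return conflicts between soft-deleted and live pci_devices rows.

    A conflict is an address on one compute node where a soft-deleted row
    is still 'allocated' to an instance while a live row is 'available'.
    This is an illustrative helper, not Nova code.
    """
    by_address = defaultdict(list)
    for row in rows:
        by_address[(row["compute_node_id"], row["address"])].append(row)

    conflicts = []
    for (node_id, address), group in sorted(by_address.items()):
        # Live rows have deleted == 0; soft-deleted rows carry a non-zero value.
        live = [r for r in group if r["deleted"] == 0]
        ghosts = [
            r for r in group
            if r["deleted"] != 0
            and r["status"] == "allocated"
            and r["instance_uuid"] is not None
        ]
        for ghost in ghosts:
            for row in live:
                if row["status"] == "available":
                    conflicts.append({
                        "compute_node_id": node_id,
                        "address": address,
                        "instance_uuid": ghost["instance_uuid"],
                        "deleted_id": ghost["id"],
                        "live_id": row["id"],
                    })
    return conflicts


# Four rows copied from the table above (ids 96, 252, 75 and 234).
SAMPLE_ROWS = [
    {"id": 96, "deleted": 96, "compute_node_id": 6,
     "address": "0000:07:01.0", "status": "allocated",
     "instance_uuid": "178c733b-fb6a-4c97-b1e5-cdc14aae2e0d"},
    {"id": 252, "deleted": 0, "compute_node_id": 6,
     "address": "0000:07:01.0", "status": "available",
     "instance_uuid": None},
    {"id": 75, "deleted": 75, "compute_node_id": 6,
     "address": "0000:07:00.1", "status": "available",
     "instance_uuid": None},
    {"id": 234, "deleted": 0, "compute_node_id": 6,
     "address": "0000:07:00.1", "status": "available",
     "instance_uuid": None},
]

if __name__ == "__main__":
    for conflict in find_stale_allocations(SAMPLE_ROWS):
        print(conflict)
```

  On the sample rows, the only address reported is 0000:07:01.0, the VF
  held by instance 178c733b-fb6a-4c97-b1e5-cdc14aae2e0d.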

  A new instance was then booted using a new 'direct' port. The instance
  went into the ERROR state with the following error:

  2017-01-03 16:10:10.513 12103 ERROR nova.compute.manager [instance:
  ad961a72-198f-4e3d-8ce0-c157668a44d6] libvirtError: Requested
  operation is not valid: PCI device 0000:07:01.0 is in use by driver
  QEMU, domain instance-0000007e

  The libvirt domain instance-0000007e corresponds to instance UUID
  178c733b-fb6a-4c97-b1e5-cdc14aae2e0d, the one recorded against the
  'deleted' 0000:07:01.0 row in the DB. The VF state on the host
  interface can be seen here:

  root@compute01:# ip link show ens1d1
  22: ens1d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-vlan portid e41d2d03005b6213 state UP mode DEFAULT group default qlen 1000
      link/ether e4:1d:2d:5b:62:13 brd ff:ff:ff:ff:ff:ff
      vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 1 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 2 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 3 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 4 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 5 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
      vf 7 MAC fa:16:3e:27:bd:90, vlan 50, spoof checking on, link-state enable

  No attempt was made to provision a different VF, or to re-populate the
  entries in pci_devices based on the existing VF allocation on the
  host. I'm not sure what the expected action was meant to be in this
  circumstance, if any.
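
  For comparison with the database, the host's own view of VF usage can
  be read from the 'ip link show' output above. The sketch below parses
  that output and lists VFs whose MAC is not all zeros. Treating a
  non-zero MAC as "in use" is an assumption made for illustration, and
  the names VF_LINE and vfs_in_use are hypothetical and not part of
  Nova or iproute2.

```python
import re

# Matches lines such as:
#   vf 7 MAC fa:16:3e:27:bd:90, vlan 50, spoof checking on, link-state enable
VF_LINE = re.compile(
    r"^\s*vf\s+(?P<index>\d+)\s+MAC\s+(?P<mac>[0-9a-fA-F:]{17}),"
    r"\s*vlan\s+(?P<vlan>\d+)"
)

UNASSIGNED_MAC = "00:00:00:00:00:00"


def vfs_in_use(ip_link_output):
    """Return (vf_index, mac, vlan) for each VF with a non-zero MAC.

    Assumption: an all-zero MAC means the VF has not been handed to a guest.
    """
    in_use = []
    for line in ip_link_output.splitlines():
        match = VF_LINE.match(line)
        if match and match.group("mac") != UNASSIGNED_MAC:
            in_use.append((
                int(match.group("index")),
                match.group("mac").lower(),
                int(match.group("vlan")),
            ))
    return in_use


# Abbreviated copy of the output shown in this report (vf 1 to vf 5 omitted).
SAMPLE_OUTPUT = """\
22: ens1d1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-vlan state UP
    link/ether e4:1d:2d:5b:62:13 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 6 MAC 00:00:00:00:00:00, vlan 4095, spoof checking off, link-state auto
    vf 7 MAC fa:16:3e:27:bd:90, vlan 50, spoof checking on, link-state enable
"""

if __name__ == "__main__":
    for index, mac, vlan in vfs_in_use(SAMPLE_OUTPUT):
        print(f"vf {index}: MAC {mac}, vlan {vlan}")
```

  On the sample output only vf 7 is reported, which matches the single
  allocated VF in the original pci_devices entries.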

  A similar bug was reported at:
  https://bugs.launchpad.net/nova/+bug/1633120

  Please let me know if you need any additional info.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1653810/+subscriptions

