yahoo-eng-team team mailing list archive
Message #95614
[Bug 2104255] [NEW] nova-compute restart stripping VF capabilities on VF unbind
Public bug reported:
Description
===========
We started to experience instance build failures caused by port
binding failures on our 2024.1 system (with VF-LAG), with the following error:
```
Refusing to bind due to unsupported vnic_type: direct with no switchdev capability bind_port
```
The switchdev capability it refers to, along with the rest of the network capability list, was missing from the VF's entry in nova's pci_devices table:
```
+---------------------+---------------------+------------+---------+-------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------+-----------+--------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr | uuid |
+---------------------+---------------------+------------+---------+-------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------+-----------+--------------+--------------------------------------+
| 2024-08-08 13:24:38 | 2025-03-24 10:05:24 | NULL | 0 | 19153 | 2782 | 0000:a1:01.5 | 101a | 15b3 | type-VF | pci_0000_a1_01_5 | label_15b3_101a | available | {"parent_ifname": "ens2f0_14", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"3e:0b:a6:3d:08:51\", \"vf_num\": 11}, \"vpd\": {\"card_serial_number\": \"IL09FTMY74031167007R\"}}"}
```
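The capabilities are stored double-encoded: extra_info is a JSON object whose capabilities field is itself a JSON string. The minimal Python sketch below decodes the row above and applies a simplified version of the check named in the error message. Both helpers are hypothetical illustrations, not nova or Neutron code, and the assumption that Neutron only sees switchdev when nova copies it from this field into the port's binding profile is not verified here.

```python
import json


def network_capabilities(extra_info: str) -> list:
    """Return the 'network' capability list from a pci_devices extra_info value.

    extra_info is JSON, and its 'capabilities' field is a JSON-encoded
    string, so it is decoded twice.
    """
    outer = json.loads(extra_info)
    caps = json.loads(outer.get("capabilities", "{}"))
    return caps.get("network", [])


def would_bind_direct(network_caps: list) -> bool:
    """Hypothetical simplification of the condition in the error message:
    a 'direct' port is refused unless 'switchdev' is advertised."""
    return "switchdev" in network_caps


# extra_info of VF 0000:a1:01.5, copied from the row above.
STRIPPED = r'{"parent_ifname": "ens2f0_14", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"3e:0b:a6:3d:08:51\", \"vf_num\": 11}, \"vpd\": {\"card_serial_number\": \"IL09FTMY74031167007R\"}}"}'

caps = network_capabilities(STRIPPED)
print(caps)                     # []
print(would_bind_direct(caps))  # False
```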
Following this patch, https://review.opendev.org/c/openstack/nova/+/884439, nova should record VF capabilities correctly, and in our case the DB entry for a VF should look like this:
```
+---------------------+---------------------+------------+---------+-------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------+-----------+--------------+--------------------------------------+
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr | uuid |
+---------------------+---------------------+------------+---------+-------+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------+-----------+--------------+--------------------------------------+
| 2024-08-08 13:25:19 | 2025-03-26 10:05:31 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
```
Steps to Reproduce
==================
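* create an instance with a VF-LAG SR-IOV 'direct' NIC
* restart nova-compute on that hypervisor
* delete the instance
The VF's entry in the pci_devices table is then left with incomplete
capabilities: the network list, including switchdev, is gone and only the sriov and vpd entries remain.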
Expected Result
===============
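The VF entry in pci_devices should keep its full set of capabilities (network, sriov and vpd) after the instance is deleted, whether or not nova-compute was restarted in the meantime.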
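To find VFs that are already affected, each VF row can be checked for the three top-level capability keys present in the healthy rows of this report. A minimal sketch, assuming the rows have already been fetched from pci_devices as dicts by whatever DB client is at hand; find_incomplete_vfs, missing_capability_keys and EXPECTED_KEYS are hypothetical names, and the expected key set is specific to this deployment.

```python
import json

# Top-level capability keys carried by healthy VF rows in this report
# (deployment-specific assumption).
EXPECTED_KEYS = {"network", "sriov", "vpd"}


def missing_capability_keys(extra_info: str) -> set:
    """Return the expected capability keys absent from a VF's extra_info."""
    outer = json.loads(extra_info)
    caps = json.loads(outer.get("capabilities", "{}"))  # double-encoded JSON
    return EXPECTED_KEYS - set(caps)


def find_incomplete_vfs(rows):
    """Yield (address, missing_keys) for VF rows lacking expected keys.

    rows: iterable of dicts with 'address', 'dev_type' and 'extra_info'.
    """
    for row in rows:
        if row["dev_type"] != "type-VF":
            continue
        missing = missing_capability_keys(row["extra_info"])
        if missing:
            yield row["address"], missing


# The two VF rows shown in the Description section.
STRIPPED = r'{"parent_ifname": "ens2f0_14", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"3e:0b:a6:3d:08:51\", \"vf_num\": 11}, \"vpd\": {\"card_serial_number\": \"IL09FTMY74031167007R\"}}"}'
FULL = r'{"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"}'

rows = [
    {"address": "0000:a1:01.5", "dev_type": "type-VF", "extra_info": STRIPPED},
    {"address": "0000:a1:04.0", "dev_type": "type-VF", "extra_info": FULL},
]
for address, missing in find_incomplete_vfs(rows):
    print(address, sorted(missing))  # 0000:a1:01.5 ['network']
```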
DB output
=========
Without a nova-compute restart, this is the expected content of the VF's row before attach, while allocated, and after teardown:
```
Before
| 2024-08-08 13:25:19 | 2025-03-26 10:37:10 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
allocated:
| 2024-08-08 13:25:19 | 2025-03-26 10:40:52 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | allocated | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | 0af885b2-921a-4af8-9cec-f227c82e4b86 | 12f68837-b9f9-4993-a242-74c901483440 | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
instance torn down:
| 2024-08-08 13:25:19 | 2025-03-26 10:49:06 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
```
However, if nova-compute is restarted during the lifetime of the instance, the network capabilities are dropped when the instance is torn down and only come back after another nova-compute restart:
```
Before:
| 2024-08-08 13:25:19 | 2025-03-26 10:05:31 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
Allocated
| 2024-08-08 13:25:19 | 2025-03-26 10:13:17 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | allocated | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | f05aa8e2-269d-4c45-ad4a-2a711b71fbed | e8c2a25b-9637-4935-ad09-cfca34f7e919 | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
Nova compute restarted
| 2024-08-08 13:25:19 | 2025-03-26 10:13:17 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | allocated | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | f05aa8e2-269d-4c45-ad4a-2a711b71fbed | e8c2a25b-9637-4935-ad09-cfca34f7e919 | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
instance torn down:
| 2024-08-08 13:25:19 | 2025-03-26 10:35:27 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
nova compute restarted again:
| 2024-08-08 13:25:19 | 2025-03-26 10:37:10 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
```
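Diffing consecutive snapshots makes the loss explicit. A minimal sketch, with lost_network_caps as a hypothetical helper, comparing the network capabilities of the "Nova compute restarted" row against the "instance torn down" row above:

```python
import json


def network_caps(extra_info: str) -> set:
    """Set of 'network' capabilities in a double-encoded extra_info value."""
    outer = json.loads(extra_info)
    caps = json.loads(outer.get("capabilities", "{}"))
    return set(caps.get("network", []))


def lost_network_caps(before: str, after: str) -> set:
    """Network capabilities present in 'before' but missing from 'after'."""
    return network_caps(before) - network_caps(after)


# extra_info values copied from the restart sequence above.
ALLOCATED_AFTER_RESTART = r'{"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"}'
TORN_DOWN = r'{"parent_ifname": "ens2f0_18", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"}'

lost = lost_network_caps(ALLOCATED_AFTER_RESTART, TORN_DOWN)
print(sorted(lost))
# ['gro', 'gso', 'rdma', 'rx', 'rxhash', 'rxvlan', 'sg', 'switchdev', 'tso', 'tx', 'txvlan']
```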
Environment
===========
OpenStack 2024.1
Kolla-Ansible
Rocky 9 + KVM
Neutron OVS with Mellanox VF-LAG on ConnectX-5
** Affects: nova
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/2104255