
yahoo-eng-team team mailing list archive

[Bug 1680918] [NEW] Nova upgrade fails if PCI devices of type-PF or type-PCI are present in the database

 

Public bug reported:

Description
===========
If a Nova database is upgraded (migrated) while it contains PCI devices with dev_type 'type-PF' or 'type-PCI',
a validation error similar to the following is raised:

"ValidationError: There are still 2 unmigrated records in the
pci_devices table. Migration cannot continue until all records have been
migrated."

The error is generated by the 330_enforce_mitaka_online_migrations.py
upgrade script.

The PCI device migration validation fails if it finds any PCI device
entry without a populated parent_addr.  However, parent_addr only
applies to PCI device entries of 'type-VF' (i.e. SR-IOV virtual
functions), so entries of the other device types are expected to leave
it NULL.

This is an example of what the pci_devices table looks like with
SR-IOV-enabled PCI devices when the appropriate entries are whitelisted
in nova.conf:

MariaDB [nova]> select * from pci_devices;
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
| created_at          | updated_at          | deleted_at | deleted | id | compute_node_id | address      | product_id | vendor_id | dev_type | dev_id           | label           | status    | extra_info | instance_uuid | request_id | numa_node | parent_addr  |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  1 |               1 | 0000:05:10.1 | 10ed       | 8086      | type-VF  | pci_0000_05_10_1 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  2 |               1 | 0000:05:10.3 | 10ed       | 8086      | type-VF  | pci_0000_05_10_3 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  3 |               1 | 0000:05:10.5 | 10ed       | 8086      | type-VF  | pci_0000_05_10_5 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
| 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  4 |               1 | 0000:05:10.7 | 10ed       | 8086      | type-VF  | pci_0000_05_10_7 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
| 2017-04-06 21:53:13 | NULL                | NULL       |       0 |  5 |               1 | 0000:05:00.0 | 10fb       | 8086      | type-PF  | pci_0000_05_00_0 | label_8086_10fb | available | {}         | NULL          | NULL       |         0 | NULL         |
| 2017-04-06 21:53:13 | NULL                | NULL       |       0 |  6 |               1 | 0000:05:00.1 | 10fb       | 8086      | type-PF  | pci_0000_05_00_1 | label_8086_10fb | available | {}         | NULL          | NULL       |         0 | NULL         |
+---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
6 rows in set (0.00 sec)


I think the upgrade script should only validate parent_addr for PciDevice entries whose dev_type field is 'type-VF'.
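
The difference between the two checks can be illustrated with a minimal, self-contained sketch. This is not the code of 330_enforce_mitaka_online_migrations.py; the function names, the vf_only flag, and the reduced table schema are hypothetical, and an in-memory SQLite database stands in for the real Nova database. The rows mirror the six entries in the table above.

```python
import sqlite3


def count_unmigrated_pci_devices(conn, vf_only=True):
    """Count non-deleted pci_devices rows that have no parent_addr.

    With vf_only=False every row with a NULL parent_addr is counted,
    which is the behaviour described in this report. With vf_only=True
    only 'type-VF' rows are counted (the proposed check).
    """
    query = (
        "SELECT COUNT(*) FROM pci_devices "
        "WHERE deleted = 0 AND parent_addr IS NULL"
    )
    params = ()
    if vf_only:
        query += " AND dev_type = ?"
        params = ("type-VF",)
    return conn.execute(query, params).fetchone()[0]


def make_example_db():
    """Build a reduced pci_devices table (hypothetical subset of columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pci_devices ("
        "id INTEGER PRIMARY KEY, deleted INTEGER, address TEXT, "
        "dev_type TEXT, parent_addr TEXT)"
    )
    rows = [
        (1, 0, "0000:05:10.1", "type-VF", "0000:05:00.1"),
        (2, 0, "0000:05:10.3", "type-VF", "0000:05:00.1"),
        (3, 0, "0000:05:10.5", "type-VF", "0000:05:00.1"),
        (4, 0, "0000:05:10.7", "type-VF", "0000:05:00.1"),
        (5, 0, "0000:05:00.0", "type-PF", None),
        (6, 0, "0000:05:00.1", "type-PF", None),
    ]
    conn.executemany("INSERT INTO pci_devices VALUES (?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = make_example_db()
    print("all device types:", count_unmigrated_pci_devices(conn, vf_only=False))
    print("type-VF only:", count_unmigrated_pci_devices(conn, vf_only=True))
```

With the example rows, the unrestricted check counts the two type-PF entries as unmigrated, while the check restricted to type-VF counts none.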


Steps to reproduce
==================

1. Install a Mitaka control node and edit the nova.conf file to include
one or more PCI devices in the pci_passthrough_whitelist, e.g.:

pci_passthrough_whitelist = {"vendor_id": "8086", "product_id":"10fb"}

2. Install a second control node running Newton or newer and edit its
nova.conf to point to the SQL database of the Mitaka node, e.g.:

[database]
connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova?charset=utf8
 
[api_database]
connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova_api?charset=utf8

3. From the new control node, issue the following command:

nova-manage db sync

Expected result
===============
The database migration/upgrade should succeed.

Actual result
=============
A ValidationError, similar to:

"ValidationError: There are still <N> unmigrated records in the
pci_devices table. Migration cannot continue until all records have been
migrated."


Environment
===========
1. Exact version of OpenStack you are running.
   Mitaka (old node); Newton, Ocata (new node)

2. Which hypervisor did you use?
   libvirt + kvm

3. Which storage type did you use?
   lvm

4. Which networking type did you use?
   Neutron, OVS

Logs & Configs
==============
See attached

** Affects: nova
     Importance: Undecided
     Assignee: Steven Webster (swebster-wr)
         Status: In Progress


** Tags: pci

** Attachment added: "Console output after nova-manage db sync"
   https://bugs.launchpad.net/bugs/1680918/+attachment/4857541/+files/nova_pci_upgrade_issue.txt

** Changed in: nova
     Assignee: (unassigned) => Steven Webster (swebster-wr)

** Changed in: nova
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1680918

Title:
  Nova upgrade fails if PCI devices of type-PF or type-PCI are present
  in the database

Status in OpenStack Compute (nova):
  In Progress


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1680918/+subscriptions

