
yahoo-eng-team team mailing list archive

[Bug 1680918] Re: Nova upgrade fails if PCI devices of type-PF or type-PCI are present in the database

 

Reviewed:  https://review.openstack.org/456397
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Submitter: Jenkins
Branch:    master

commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Author: Steven Webster <steven.webster@xxxxxxxxxxxxx>
Date:   Wed Apr 5 09:05:07 2017 -0400

    Fix mitaka online migration for PCI devices
    
    Currently, a validation error is thrown on a nova upgrade if any
    PCI device records are found whose parent_addr column has not been
    populated. However, a parent_addr only makes sense for PCI records
    with a device type of 'type-VF' (i.e. an SR-IOV virtual function).
    PCI records with a device type of 'type-PF' or 'type-PCI' will not
    have a parent_addr, so if any such records are present on upgrade,
    the validation fails.
    
    This change checks that the device type of the PCI record is
    'type-VF' before requiring that its parent_addr be populated.
    
    Closes-Bug: #1680918
    Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11
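The corrected rule can be sketched in Python, the language nova is written in. This is a minimal, hypothetical illustration and not the code from the commit. The `ValidationError` class is a local stand-in for nova's exception. Rows are plain dicts carrying only `deleted`, `dev_type` and `parent_addr`. Only checking live rows (`deleted == 0`) is an assumption. Passing `vf_only=False` mimics the pre-fix behaviour.

```python
class ValidationError(Exception):
    """Local stand-in for nova's ValidationError (hypothetical)."""


# Message text follows the error quoted in the bug report.
MSG = ("There are still %(count)d unmigrated records in the %(table)s "
       "table. Migration cannot continue until all records have been "
       "migrated.")


def count_unmigrated(rows, vf_only=True):
    """Count live pci_devices rows that still lack a parent_addr.

    With vf_only=True (fixed behaviour) only 'type-VF' rows are counted,
    because only SR-IOV virtual functions have a parent function.
    With vf_only=False (old behaviour) 'type-PF' and 'type-PCI' rows
    with an empty parent_addr are counted as well.
    """
    return sum(
        1
        for row in rows
        if row["deleted"] == 0
        and row["parent_addr"] is None
        and (not vf_only or row["dev_type"] == "type-VF")
    )


def validate_pci_devices(rows, vf_only=True):
    """Raise ValidationError if any row still needs migrating."""
    count = count_unmigrated(rows, vf_only)
    if count > 0:
        raise ValidationError(MSG % {"count": count, "table": "pci_devices"})
```

Consider a database holding one VF with a parent and one PF without a parent. `validate_pci_devices(rows)` passes. `validate_pci_devices(rows, vf_only=False)` raises with a count of 1, which is the failure described in this bug.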


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1680918

Title:
  Nova upgrade fails if PCI devices of type-PF or type-PCI are present
  in the database

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  New
Status in OpenStack Compute (nova) ocata series:
  New

Bug description:
  Description
  ===========
  If a Nova DB that contains PCI devices with device type 'type-PF' or 'type-PCI' is upgraded (migrated),
  a validation error similar to the following is thrown:

  "ValidationError: There are still 2 unmigrated records in the
  pci_devices table. Migration cannot continue until all records have
  been migrated."

  The error is generated by the 330_enforce_mitaka_online_migrations.py
  upgrade script.

  The PCI device migration validation fails if any PCI device entries
  without a populated parent_addr are found. However, a parent_addr
  only applies to PCI device entries of 'type-VF' (i.e. SR-IOV virtual
  functions).

  This is an example of the pci_devices table with SR-IOV-enabled PCI
  devices when the appropriate entries are whitelisted in nova.conf:

  MariaDB [nova]> select * from pci_devices;
  +---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
  | created_at          | updated_at          | deleted_at | deleted | id | compute_node_id | address      | product_id | vendor_id | dev_type | dev_id           | label           | status    | extra_info | instance_uuid | request_id | numa_node | parent_addr  |
  +---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  1 |               1 | 0000:05:10.1 | 10ed       | 8086      | type-VF  | pci_0000_05_10_1 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  2 |               1 | 0000:05:10.3 | 10ed       | 8086      | type-VF  | pci_0000_05_10_3 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  3 |               1 | 0000:05:10.5 | 10ed       | 8086      | type-VF  | pci_0000_05_10_5 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  4 |               1 | 0000:05:10.7 | 10ed       | 8086      | type-VF  | pci_0000_05_10_7 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | 0000:05:00.1 |
  | 2017-04-06 21:53:13 | NULL                | NULL       |       0 |  5 |               1 | 0000:05:00.0 | 10fb       | 8086      | type-PF  | pci_0000_05_00_0 | label_8086_10fb | available | {}         | NULL          | NULL       |         0 | NULL         |
  | 2017-04-06 21:53:13 | NULL                | NULL       |       0 |  6 |               1 | 0000:05:00.1 | 10fb       | 8086      | type-PF  | pci_0000_05_00_1 | label_8086_10fb | available | {}         | NULL          | NULL       |         0 | NULL         |
  +---------------------+---------------------+------------+---------+----+-----------------+--------------+------------+-----------+----------+------------------+-----------------+-----------+------------+---------------+------------+-----------+--------------+
  6 rows in set (0.00 sec)

  
  I think the upgrade script should be checking the PciDevice dev_type field for 'type-VF' when validating the parent_addr.
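  The effect of this suggestion on the example table can be checked with Python's built-in sqlite3 module. This is a hedged approximation: the real migration runs against MySQL/MariaDB through SQLAlchemy. The schema keeps only the columns that matter, and both SQL queries are illustrative rather than copied from nova. On the six example rows, the unfiltered query counts the two 'type-PF' rows, which matches the count in the error quoted above. The 'type-VF' filter counts none.

```python
import sqlite3

# (id, dev_type, parent_addr) for the six example rows; all have deleted = 0.
ROWS = [
    (1, "type-VF", "0000:05:00.1"),
    (2, "type-VF", "0000:05:00.1"),
    (3, "type-VF", "0000:05:00.1"),
    (4, "type-VF", "0000:05:00.1"),
    (5, "type-PF", None),
    (6, "type-PF", None),
]

# Pre-fix style check: any live row with an empty parent_addr counts.
OLD_QUERY = ("SELECT count(*) FROM pci_devices "
             "WHERE deleted = 0 AND parent_addr IS NULL")
# Suggested check: only virtual functions are expected to have a parent.
NEW_QUERY = OLD_QUERY + " AND dev_type = 'type-VF'"


def run_check(query, rows=ROWS):
    """Load rows into an in-memory pci_devices table and run the count query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE pci_devices ("
            " id INTEGER PRIMARY KEY,"
            " deleted INTEGER NOT NULL DEFAULT 0,"
            " dev_type TEXT NOT NULL,"
            " parent_addr TEXT)"
        )
        # Python None is stored as SQL NULL.
        conn.executemany(
            "INSERT INTO pci_devices (id, dev_type, parent_addr) "
            "VALUES (?, ?, ?)",
            rows,
        )
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("old check:", run_check(OLD_QUERY))  # old check: 2
    print("new check:", run_check(NEW_QUERY))  # new check: 0
```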

  
  Steps to reproduce
  ==================

  1. Install a Mitaka control node and edit the nova.conf file to
  include one or more PCI devices in the pci_passthrough_whitelist, e.g.:

  pci_passthrough_whitelist = {"vendor_id": "8086", "product_id":"10fb"}
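  To see which rows of the example table this whitelist entry selects, the JSON value can be parsed and compared field by field. This is a deliberately simplified, hypothetical matcher. nova's real whitelist handling supports more keys and matching rules than the exact `vendor_id`/`product_id` comparison shown here, so treat it only as a reading aid for this one entry.

```python
import json

# Whitelist value from step 1 (pci_passthrough_whitelist in nova.conf).
WHITELIST = '{"vendor_id": "8086", "product_id":"10fb"}'

# (address, vendor_id, product_id, dev_type) for the rows in the example table.
DEVICES = [
    ("0000:05:10.1", "8086", "10ed", "type-VF"),
    ("0000:05:10.3", "8086", "10ed", "type-VF"),
    ("0000:05:10.5", "8086", "10ed", "type-VF"),
    ("0000:05:10.7", "8086", "10ed", "type-VF"),
    ("0000:05:00.0", "8086", "10fb", "type-PF"),
    ("0000:05:00.1", "8086", "10fb", "type-PF"),
]


def matches(spec, vendor_id, product_id):
    """Hypothetical, simplified rule: every key in spec must match exactly."""
    device = {"vendor_id": vendor_id, "product_id": product_id}
    return all(device.get(key) == value for key, value in spec.items())


def selected(whitelist_json, devices=DEVICES):
    """Return (address, dev_type) for each device the entry selects."""
    spec = json.loads(whitelist_json)
    return [(addr, dev_type) for addr, vendor, product, dev_type in devices
            if matches(spec, vendor, product)]


print(selected(WHITELIST))
# [('0000:05:00.0', 'type-PF'), ('0000:05:00.1', 'type-PF')]
```

  Under this simplified rule, the example entry picks out the two 'type-PF' devices. These are exactly the rows with an empty parent_addr that trip the migration check.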

  2. Install a second control node running Newton or newer and edit
  its nova.conf to point to the SQL database of the Mitaka node, e.g.:

  [database]
  connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova?charset=utf8
   
  [api_database]
  connection = mysql+pymysql://root:supersecret@<MITAKA CONTROLLER NODE>/nova_api?charset=utf8
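  Before running the sync, it can help to confirm that both URLs point at the Mitaka node. The sketch below uses Python's standard `urllib.parse` to split each SQLAlchemy-style URL into driver, host, database name and charset. `mitaka-controller` is a hypothetical hostname standing in for the `<MITAKA CONTROLLER NODE>` placeholder. The check is only an illustration and is not part of nova.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical hostname in place of <MITAKA CONTROLLER NODE>.
DATABASE = "mysql+pymysql://root:supersecret@mitaka-controller/nova?charset=utf8"
API_DATABASE = ("mysql+pymysql://root:supersecret@mitaka-controller/"
                "nova_api?charset=utf8")


def describe(url):
    """Return (driver, host, database name, charset) for a DB URL."""
    parts = urlsplit(url)
    charset = parse_qs(parts.query).get("charset", [None])[0]
    return parts.scheme, parts.hostname, parts.path.lstrip("/"), charset


db = describe(DATABASE)
api = describe(API_DATABASE)
# Both sections should target the same (old) server but different schemas.
if db[1] != api[1]:
    raise ValueError("[database] and [api_database] point at different hosts")
print(db)   # ('mysql+pymysql', 'mitaka-controller', 'nova', 'utf8')
print(api)  # ('mysql+pymysql', 'mitaka-controller', 'nova_api', 'utf8')
```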

  3. From the new control node, issue the following command:

  nova-manage db sync

  Expected result
  ===============
  The database migration/upgrade should succeed.

  Actual result
  =============
  A ValidationError, similar to:

  "ValidationError: There are still <N> unmigrated records in the
  pci_devices table. Migration cannot continue until all records have
  been migrated."

  
  Environment
  ===========
  1. Exact version of OpenStack you are running:
     Mitaka (old node); Newton, Ocata (new node)

  2. Which hypervisor did you use?
     libvirt + kvm

  3. Which storage type did you use?
     lvm

  4. Which networking type did you use?
     Neutron, OVS

  Logs & Configs
  ==============
  See attached

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1680918/+subscriptions

