yahoo-eng-team team mailing list archive

[Bug 1821016] [NEW] Race on reboot, fault on spawning sriov instance after reboot

 

Public bug reported:

There appears to be a race between nova-compute and
neutron-sriov-agent on reboot. Spawning an sriov-enabled instance
intermittently (but often) fails after a fresh reboot. Restarting
neutron-sriov-agent and nova-compute seems to fix this, i.e. instances
with sriov ports can be spawned again.

* Pre-conditions:

- Sriov interfaces configured and functional, i.e. can spawn
  functional sriov enabled instances

* Step-by-step reproduction steps:

- Verify sriov enabled instance can be spawned as the admin user

- Reboot compute

- Attempt to spawn an sriov enabled instance A, wait for fault

- Restart: $ sudo service neutron-sriov-agent restart ; sleep 1 ; sudo service nova-compute restart

- Attempt to spawn an sriov enabled instance B
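
For anyone who wants to script the restart step, here is a minimal Python sketch of the same sequence. The service names, their order and the 1 second pause come from the command above; the helper names (restart_commands, restart_services) are hypothetical, and the run/sleep parameters exist only so the ordering can be checked without touching real services.

```python
import subprocess
import time

# Order taken from the report: agent first, then nova-compute.
RESTART_ORDER = ["neutron-sriov-agent", "nova-compute"]


def restart_commands(services=RESTART_ORDER):
    """Build one `sudo service <name> restart` command per service, in order."""
    return [["sudo", "service", name, "restart"] for name in services]


def restart_services(pause_seconds=1.0, run=subprocess.check_call, sleep=time.sleep):
    """Run each restart command, pausing between consecutive services.

    `run` and `sleep` are injectable (hypothetical design choice) so the
    sequence can be dry-run; by default the commands are really executed.
    """
    commands = restart_commands()
    for index, command in enumerate(commands):
        run(command)
        # Pause only between services, mirroring `; sleep 1 ;` in the report.
        if index < len(commands) - 1:
            sleep(pause_seconds)
```

Calling restart_services() with no arguments runs the real commands and needs sudo rights on the compute node. This only automates the workaround; it does not address the underlying race.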


* Expected output: 

2x ACTIVE instances A and B

* Actual output:

ERRORed instance A

nova-compute.log contains:

2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [req-eddf8d43-ca07-491f-931c-96b20cce7ef7 a0f1548a1b3f45379155ca1fb21c1599 7881e5796b2e4f80a9e5a7e089029bc3 - - -] [instance: 48f54a52-8cb7-4963-93fa-c412954a2086] Failed to allocate network(s)
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086] Traceback (most recent call last):
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1939, in _build_and_run_instance
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]     block_device_info=block_device_info)
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2798, in spawn
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]     destroy_disks_on_failure=True)
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5321, in _create_domain_and_network
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]     raise exception.VirtualInterfaceCreateException()
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086] VirtualInterfaceCreateException: Virtual Interface creation failed
2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086] 

ACTIVE instance B
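
When checking several compute nodes for the same fault, it can help to pull the top-level error messages per instance out of nova-compute.log. The following is a rough Python sketch, assuming log lines shaped like the ones quoted above; the regular expression and the function name error_summaries are my own and not part of nova.

```python
import re

# Assumed line shape, based only on the excerpt in this report:
# <date> <time> <pid> <LEVEL> <logger> [req-... context] [instance: <uuid>] <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"(?:\[req-[^\]]*\] )?"
    r"\[instance: (?P<instance>[0-9a-f-]{36})\] ?(?P<message>.*)$"
)


def error_summaries(lines):
    """Return (instance_uuid, message) pairs for top-level ERROR messages.

    Indented traceback frames, the 'Traceback' header and empty messages
    are skipped, so only the summary lines remain.
    """
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "ERROR":
            continue
        message = match.group("message")
        if not message or message.startswith((" ", "Traceback")):
            continue
        results.append((match.group("instance"), message))
    return results


# Three lines copied from the excerpt above.
SAMPLE = [
    "2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [req-eddf8d43-ca07-491f-931c-96b20cce7ef7 a0f1548a1b3f45379155ca1fb21c1599 7881e5796b2e4f80a9e5a7e089029bc3 - - -] [instance: 48f54a52-8cb7-4963-93fa-c412954a2086] Failed to allocate network(s)",
    "2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086]     raise exception.VirtualInterfaceCreateException()",
    "2019-03-20 13:59:36.636 22577 ERROR nova.compute.manager [instance: 48f54a52-8cb7-4963-93fa-c412954a2086] VirtualInterfaceCreateException: Virtual Interface creation failed",
]

for instance, message in error_summaries(SAMPLE):
    print(instance, message)
```

On the sample this prints the "Failed to allocate network(s)" line and the VirtualInterfaceCreateException line for instance 48f54a52-8cb7-4963-93fa-c412954a2086, and drops the traceback frame.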

* Version:

Ocata running on Ubuntu xenial
neutron 10.0.7-0ubuntu1~cloud1
nova 15.1.5-0ubuntu1~cloud1

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: canonical-bootstack

** Also affects: nova
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1821016

Title:
  Race on reboot, fault on spawning sriov instance after reboot

Status in neutron:
  New
Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1821016/+subscriptions

