← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1551154] [NEW] Snapshot of VM with SRIOV port fails if host PF is down

 

Public bug reported:

Scenario:
----------------
1. Create a VM with SRIOV port --> the vNIC state should be UP
2. On the VM host, shutdown the PF (i.e. ifconfig ens1f1 down) --> The vNIC state should be DOWN
3. Create a snapshot of this suboptimal VM

Expected behavior:
--------------------------------
Snapshot creation process begins, after a while the snapshot succeeds, VM should be up without the vNIC (if the PF is still down)

Actual behavior:
---------------------------
Snapshot creation process begins, after a while the snapshot fails and is deleted. The VM is up without the vNIC (since the PF is still down)

Analysis:
---------------
>From nova-compute.log

2016-02-29 07:21:51.578 34455 ERROR nova.compute.manager [req-00a47655-f6c8-4e53-913f-bef0c61100d0 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] Instance failed to spawn
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] Traceback (most recent call last):
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2480, in _build_resource
s
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     yield resources
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2352, in _build_and_run_
instance
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     block_device_info=block_device_info)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2463, in spawn
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     block_device_info=block_device_info)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4588, in _create_domain_and_network
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     power_on=power_on)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4519, in _create_domain
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     LOG.error(err)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     six.reraise(self.type_, self.value, self.tb)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4509, in _create_domain
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     domain.createWithFlags(launch_flags)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     rv = execute(f, *args, **kwargs)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     six.reraise(c, e, tb)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     rv = meth(*args, **kwargs)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1059, in createWithFlags
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] libvirtError: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.


After a short conversation in #openstack-nova the initial conclusion was that probably the snapshot process shouldn't fail.

Further analysis (trying to fix the issue locally):
------------------------------------------------------------------------
Looking at libvirt's code (virnetdev.c) it seems that there's no way to workaround this problem (attaching to an offline PF, though configuration of VLAN and MAC is possible):

static int
virNetDevReplaceVfConfig(const char *pflinkdev, int vf,
                         const virMacAddr *macaddress,
                         int vlanid,
                         const char *stateDir)
{
...
    /* Assure that PF is online prior to twiddling with the VF.  It
     * *should* be, but if the PF isn't online the changes made to the
     * VF via the PF won't take effect, yet there will be no error
     * reported. In the case that it isn't online, fail and report the
     * error, since setting an unconfigured interface online
     * automatically turns on IPv6 autoconfig, which may not be what
     * the admin expects, so we want them to explicitly enable the PF
     * in the host system network config.
     */
    if (virNetDevGetOnline(pflinkdev, &pfIsOnline) < 0)
       goto cleanup;
    if (!pfIsOnline) {
        virReportError(VIR_ERR_INTERNAL_ERROR,
                       _("Unable to configure VF %d of PF '%s' "
                         "because the PF is not online. Please "
                         "change host network config to put the "
                         "PF online."),
                       vf, pflinkdev);
        goto cleanup;
    }
...

(from
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virnetdev.c)

Introducing the following change to libvirt.py (the "Manually written part of python bindings for libvirt"):
--- /usr/lib64/python2.7/site-packages/libvirt.py	2016-02-29 09:20:34.734195406 +0000
+++ /usr/lib64/python2.7/site-packages/libvirt.py.patched	2016-02-29 09:20:26.068155372 +0000
@@ -527,7 +527,7 @@
         into S4 state (also known as hibernation) unless you also modify the
         persistent domain definition. """
         ret = libvirtmod.virDomainAttachDevice(self._o, xml)
-        if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
+        #if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
         return ret

     def attachDeviceFlags(self, xml, flags=0):
@@ -1056,7 +1056,7 @@
         If the VIR_DOMAIN_START_FORCE_BOOT flag is set, then any managed save
         file for this domain is discarded, and the domain boots from scratch. """
         ret = libvirtmod.virDomainCreateWithFlags(self._o, flags)
-        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
+        #if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
         return ret


Solved the issue and the behavior returned to be as expected.

The fix should be in the snapshot failure code (and not in the
libvirt.py of course...)

** Affects: nova
     Importance: Undecided
     Assignee: David Edery (david-edery)
         Status: New

** Changed in: nova
     Assignee: (unassigned) => David Edery (david-edery)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1551154

Title:
  Snapshot of VM with SRIOV port fails if host PF is down

Status in OpenStack Compute (nova):
  New

Bug description:
  Scenario:
  ----------------
  1. Create a VM with SRIOV port --> the vNIC state should be UP
  2. On the VM host, shutdown the PF (i.e. ifconfig ens1f1 down) --> The vNIC state should be DOWN
  3. Create a snapshot of this suboptimal VM

  Expected behavior:
  --------------------------------
  Snapshot creation process begins, after a while the snapshot succeeds, VM should be up without the vNIC (if the PF is still down)

  Actual behavior:
  ---------------------------
  Snapshot creation process begins, after a while the snapshot fails and is deleted. The VM is up without the vNIC (since the PF is still down)

  Analysis:
  ---------------
  From nova-compute.log

  2016-02-29 07:21:51.578 34455 ERROR nova.compute.manager [req-00a47655-f6c8-4e53-913f-bef0c61100d0 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] Instance failed to spawn
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] Traceback (most recent call last):
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2480, in _build_resource
  s
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     yield resources
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2352, in _build_and_run_
  instance
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     block_device_info=block_device_info)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2463, in spawn
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     block_device_info=block_device_info)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4588, in _create_domain_and_network
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     power_on=power_on)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4519, in _create_domain
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     LOG.error(err)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     six.reraise(self.type_, self.value, self.tb)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4509, in _create_domain
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     domain.createWithFlags(launch_flags)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     rv = execute(f, *args, **kwargs)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     six.reraise(c, e, tb)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     rv = meth(*args, **kwargs)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1059, in createWithFlags
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] libvirtError: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.

  
  After a short conversation in #openstack-nova the initial conclusion was that probably the snapshot process shouldn't fail.

  Further analysis (trying to fix the issue locally):
  ------------------------------------------------------------------------
  Looking at libvirt's code (virnetdev.c) it seems that there's no way to workaround this problem (attaching to an offline PF, though configuration of VLAN and MAC is possible):

  static int
  virNetDevReplaceVfConfig(const char *pflinkdev, int vf,
                           const virMacAddr *macaddress,
                           int vlanid,
                           const char *stateDir)
  {
  ...
      /* Assure that PF is online prior to twiddling with the VF.  It
       * *should* be, but if the PF isn't online the changes made to the
       * VF via the PF won't take effect, yet there will be no error
       * reported. In the case that it isn't online, fail and report the
       * error, since setting an unconfigured interface online
       * automatically turns on IPv6 autoconfig, which may not be what
       * the admin expects, so we want them to explicitly enable the PF
       * in the host system network config.
       */
      if (virNetDevGetOnline(pflinkdev, &pfIsOnline) < 0)
         goto cleanup;
      if (!pfIsOnline) {
          virReportError(VIR_ERR_INTERNAL_ERROR,
                         _("Unable to configure VF %d of PF '%s' "
                           "because the PF is not online. Please "
                           "change host network config to put the "
                           "PF online."),
                         vf, pflinkdev);
          goto cleanup;
      }
  ...

  (from
  http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virnetdev.c)

  Introducing the following change to libvirt.py (the "Manually written part of python bindings for libvirt"):
  --- /usr/lib64/python2.7/site-packages/libvirt.py	2016-02-29 09:20:34.734195406 +0000
  +++ /usr/lib64/python2.7/site-packages/libvirt.py.patched	2016-02-29 09:20:26.068155372 +0000
  @@ -527,7 +527,7 @@
           into S4 state (also known as hibernation) unless you also modify the
           persistent domain definition. """
           ret = libvirtmod.virDomainAttachDevice(self._o, xml)
  -        if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
  +        #if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
           return ret

       def attachDeviceFlags(self, xml, flags=0):
  @@ -1056,7 +1056,7 @@
           If the VIR_DOMAIN_START_FORCE_BOOT flag is set, then any managed save
           file for this domain is discarded, and the domain boots from scratch. """
           ret = libvirtmod.virDomainCreateWithFlags(self._o, flags)
  -        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  +        #if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
           return ret

  
  Solved the issue and the behavior returned to be as expected.

  The fix should be in the snapshot failure code (and not in the
  libvirt.py of course...)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1551154/+subscriptions


Follow ups