yahoo-eng-team team mailing list archive

[Bug 1893263] Re: [SRU] Cannot create instance with multiqueue image and vif_type=tap (calico)

 

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Also affects: nova/train
   Importance: Undecided
       Status: New

** Also affects: nova/ussuri
   Importance: Undecided
       Status: New

** Changed in: nova/stein
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1893263

Title:
  [SRU] Cannot create instance with multiqueue image and vif_type=tap
  (calico)

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive stein series:
  Fix Committed
Status in Ubuntu Cloud Archive train series:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) stein series:
  Fix Released
Status in OpenStack Compute (nova) train series:
  New
Status in OpenStack Compute (nova) ussuri series:
  New
Status in nova package in Ubuntu:
  New
Status in nova source package in Focal:
  Fix Committed

Bug description:
  When using calico, the vif_type is tap, therefore when the instance is
  being created, the method plug_tap() is invoked, which creates the tap
  device prior to launching the instance.

  That tap device is currently always created without multiqueue as per
  [1]. When libvirt creates the instance, the XML definition
  "queues=<x>" clashes with the fact that the pre-existing tap interface
  doesn't have multiqueue enabled, and therefore errors out with the
  exception below. The code at [2] already handles multiqueue, but it is
  never invoked with multiqueue=True.
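
  To illustrate the mismatch, here is a minimal sketch of how the tap
  creation command differs with and without multiqueue. The helper name
  build_tap_add_cmd is hypothetical and is not Nova's code; it only
  builds the iproute2 argument list and assumes the multi_queue option
  of "ip tuntap add" is what enables multiple queues on the device.

```python
from typing import List


def build_tap_add_cmd(dev: str, multiqueue: bool = False) -> List[str]:
    """Return the iproute2 argv that would create a tap device.

    Hypothetical helper for illustration only: it builds the command
    but does not execute it.
    """
    cmd = ["ip", "tuntap", "add", "dev", dev, "mode", "tap"]
    if multiqueue:
        # Without this option the device is created single-queue, which
        # is the state the bug report describes for pre-created taps.
        cmd.append("multi_queue")
    return cmd


if __name__ == "__main__":
    # Device name taken from the error message in this report.
    print(" ".join(build_tap_add_cmd("tapb6021dd0-fd", multiqueue=True)))
```

  A device created without the extra option is the case that, according
  to this report, libvirt then fails to reuse when the guest XML asks
  for queues=<x>.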

  As a current workaround, shutting the instance down through virsh or
  rebooting it through nova removes the tap device, and libvirt then
  recreates it itself. If the instance XML has been manually edited,
  libvirt sets the new tap device up with multiqueue. This raises the
  question of why plug_tap() needs to pre-create the interface at all,
  given that libvirt creates it on reboot regardless of plug_tap().

  Steps to reproduce:

  1) Ubuntu bionic + devstack master + follow instructions at [3]
  2) wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img
  3) openstack image create bionic-mq --file bionic-server-cloudimg-amd64.img --property hw_vif_multiqueue_enabled=True
  4) openstack image create bionic --file bionic-server-cloudimg-amd64.img
  5) ssh-keygen
  6) openstack keypair create key1 --public-key ~/.ssh/id_rsa.pub
  7) openstack flavor create --vcpu 2 --ram 1024 --disk 10 --public --id 10 test_flavor
  8) openstack server create --network calico --flavor test_flavor --image bionic --key-name key1 no-mq

  instance is created successfully

  9) ip a

  6: tapcc353751-13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000

  10) sudo virsh edit 1

  add "<driver name='vhost' queues='2'/>" to the interface section

  11) openstack server reboot no-mq

  wait a few secs

  12) ip a

  7: tapcc353751-13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN group default qlen 1000

  13) ssh to the instance and run "sudo ethtool -l <interface>"

  Combined:       2

  14) openstack server delete no-mq

  15) openstack server create --network calico --flavor test_flavor --image bionic-mq --key-name key1 mq

  instance creation fails; the log shows the stack trace below.

  [1] https://github.com/openstack/nova/blob/f521f4dbace0e35bedd089369da6f6969da5ca32/nova/virt/libvirt/vif.py#L701
  [2] https://github.com/openstack/nova/blob/f521f4dbace0e35bedd089369da6f6969da5ca32/nova/privsep/linux_net.py#L109
  [3] https://docs.projectcalico.org/getting-started/openstack/installation/devstack

  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [None req-71d40776-0fa7-466e-9060-11472b5bce42 admin admin] [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4] Instance failed to spawn: libvirt.libvirtError: Unable to create tap device tapb6021dd0-fd: Invalid argument
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4] Traceback (most recent call last):
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/compute/manager.py", line 2628, in _build_resources
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     yield resources
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/compute/manager.py", line 2401, in _build_and_run_instance
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     accel_info=accel_info)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3701, in spawn
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     cleanup_instance_disks=created_disks)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6700, in _create_guest_with_network
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     cleanup_instance_disks=cleanup_instance_disks)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     self.force_reraise()
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     six.reraise(self.type_, self.value, self.tb)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     raise value
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6669, in _create_guest_with_network
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     post_xml_callback=post_xml_callback)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6599, in _create_guest
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     guest.launch(pause=pause)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 160, in launch
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     self._encoded_xml, errors='ignore')
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     self.force_reraise()
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     six.reraise(self.type_, self.value, self.tb)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     raise value
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 155, in launch
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     return self._domain.createWithFlags(flags)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     rv = execute(f, *args, **kwargs)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     six.reraise(c, e, tb)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     raise value
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 83, in tworker
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     rv = meth(*args, **kwargs)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]   File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 1098, in createWithFlags
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4] libvirt.libvirtError: Unable to create tap device tapb6021dd0-fd: Invalid argument
  Aug 27 18:58:38 devstack nova-compute[7968]: ERROR nova.compute.manager [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4]
  Aug 27 18:58:38 devstack nova-compute[7968]: INFO nova.compute.manager [None req-71d40776-0fa7-466e-9060-11472b5bce42 admin admin] [instance: 69a0a527-9c33-432f-8889-c421ae8aebb4] Terminating instance

  
  =======================================================================

  [Impact]

  Users of the calico plugin cannot use multiqueue in Nova: the VM fails
  to boot. The workaround is to edit the XML manually and reboot the VM
  through nova, so that libvirt recreates the tap interface while Nova
  does not re-run the vif.plug() method, allowing libvirt to set up
  multiqueue properly. This workaround does not scale well.

  [Test case]

  1. Setting up env
  1a. Deploy environment
  1b. Install calico plugin as per [0]
  1c. Setup SSH

  ssh-keygen

  1d. Create keypair for testing

  openstack keypair create key1 --public-key ~/.ssh/id_rsa.pub

  1e. Create test flavor

  openstack flavor create --vcpu 2 --ram 1024 --disk 10 --public --id 10 test_flavor

  1f. Download an example image

  wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img

  1g. Create image in glance with multiqueue metadata

  openstack image create bionic-mq --file bionic-server-cloudimg-amd64.img --property hw_vif_multiqueue_enabled=True

  1h. Create same image in glance without multiqueue metadata

  openstack image create bionic --file bionic-server-cloudimg-amd64.img

  1i. Create instance without multiqueue. Make sure instance creation
  and connectivity succeed.

  openstack server create --network calico --flavor test_flavor --image bionic --key-name key1 no-mq

  2. Reproducing the bug

  2a. Create instance with multiqueue

  openstack server create --network calico --flavor test_flavor --image bionic-mq --key-name key1 mq

  Instance creation will fail

  2b. Check logs for error

  egrep "libvirt.libvirtError: Unable to create tap device .*: Invalid argument" /var/log/nova/nova-compute.log

  3. Cleanup

  3a. Delete instances "mq" and "no-mq"

  4. Install package that contains the fixed code

  5. Repeat step 2a. Instance creation should now succeed.
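
  The log check in step 2b can also be scripted. The following is a
  minimal sketch, assuming the log lines have the same format as the
  trace quoted in this report; the function name failed_tap_devices is
  hypothetical.

```python
import re
from typing import Iterable, List

# Same pattern as the egrep in step 2b, with the device name captured.
TAP_ERROR = re.compile(
    r"libvirt\.libvirtError: Unable to create tap device (\S+): Invalid argument"
)


def failed_tap_devices(lines: Iterable[str]) -> List[str]:
    """Return the tap device names found in matching error lines.

    Names are returned once each, in the order they first appear.
    """
    seen: List[str] = []
    for line in lines:
        match = TAP_ERROR.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Example usage (path taken from step 2b):
#   with open("/var/log/nova/nova-compute.log") as log:
#       print(failed_tap_devices(log))
```

  An empty result after step 5 would be consistent with the fix being in
  place, provided the log was not rotated between runs.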

  [Regression Potential]

  The new code path is only triggered when the image's multiqueue
  metadata is set. For all other use cases, the previous behavior is
  maintained.
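
  The gating described above can be pictured with a short sketch. This
  is not the code from the fix: multiqueue_requested is a hypothetical
  helper, and the set of accepted string values is an assumption made
  for illustration. Only the property name hw_vif_multiqueue_enabled
  comes from this report.

```python
from typing import Mapping

# Assumed set of string spellings treated as "enabled" (illustrative).
TRUE_STRINGS = {"1", "true", "yes", "on"}


def multiqueue_requested(image_properties: Mapping[str, object]) -> bool:
    """Return True only if the image explicitly asks for multiqueue.

    Images without the property fall through to False, so they would
    keep the previous single-queue behavior.
    """
    value = image_properties.get("hw_vif_multiqueue_enabled", False)
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in TRUE_STRINGS


if __name__ == "__main__":
    print(multiqueue_requested({"hw_vif_multiqueue_enabled": "True"}))
    print(multiqueue_requested({}))
```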

  
  [Other Info]

  None

  
  [0] https://docs.projectcalico.org/getting-started/openstack/installation/

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1893263/+subscriptions

