← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1817548] [NEW] macvtap: possible race of interfaces scan/deletion at migration

 

Public bug reported:

A few days ago I stumbled upon a failed migration of VM with macvtap
interface. Nova "joyed" me with error "Unsupported VIF type
binding_failed". A short investigation lead me to the real problem:
macvtap agent crushed

2019-02-21 18:40:38.636 54364 ERROR neutron.agent.linux.utils [req-
362a8541-643e-4355-b3f7-a02fb9358b66 - - - - -] Exit code: 1; Stdin: ;
Stdout: ; Stderr: Device "macvtap1" does not exist.

2019-02-21 18:40:38.637 54364 ERROR oslo_service.service [req-362a8541-643e-4355-b3f7-a02fb9358b66 - - - - -] Error starting thread.: ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "macvtap1" does not exist.
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service Traceback (most recent call last):
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 729, in run_service
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     service.start()
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 111, in start
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     self.daemon_loop()
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 447, in daemon_loop
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     device_info = self.scan_devices(previous=device_info, sync=sync)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     result = f(*args, **kwargs)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 377, in scan_devices
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     current_devices = self.mgr.get_all_devices()
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/macvtap/agent/macvtap_neutron_agent.py", line 138, in get_all_devices
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     mac = ip_lib.get_device_mac(device_name)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 950, in get_device_mac
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return IPDevice(device_name, namespace=namespace).link.address
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 541, in address
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self.attributes.get('link/ether')
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 565, in attributes
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self._parse_line(self._run(['o'], ('show', self.name)))
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 360, in _run
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self._parent._run(options, self.COMMAND, args)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 88, in _run
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self._execute(options, command, args)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 102, in _execute
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     log_fail_as_error=self.log_fail_as_error)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 151, in execute
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     raise ProcessExecutionError(msg, returncode=returncode)
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "macvtap1" does not exist.
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service
2019-02-21 18:40:38.637 54364 ERROR oslo_service.service
2019-02-21 18:40:38.640 54364 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Macvtap agent agent.

I have digg a bit and it looks like there's a race between interface
deletion and a periodic scan in daemon_loop. For now get_all_devices
(neutron/plugins/ml2/drivers/macvtap/agent/macvtap_neutron_agent.py
+138) does'n check interface existence before get_device_mac:

    def get_all_devices(self):
...............
        for device_name in all_device_names:
            if device_name.startswith(constants.MACVTAP_DEVICE_PREFIX):
                if ip_lib.device_exists(device_name):
                    mac = ip_lib.get_device_mac(device_name)
                    self.mac_device_name_mappings[mac] = device_name
                    devices.add(mac)

It's quite possible that device will be deleted while we're looping over (in my case with ~10% probability).
An obvious 'solution' is to check device on every loop but it doesn't really solve an issue though reduce race window so that I able to migrate VM for about 100 times before crush.
I'll attache this patch anyway.

** Affects: neutron
     Importance: Undecided
         Status: New

** Attachment added: "0004-MacvtapManager-get_all_devices-reducing-race-window.patch"
   https://bugs.launchpad.net/bugs/1817548/+attachment/5241546/+files/0004-MacvtapManager-get_all_devices-reducing-race-window.patch

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1817548

Title:
  macvtap: possible race of interfaces scan/deletion at migration

Status in neutron:
  New

Bug description:
  A few days ago I stumbled upon a failed migration of VM with macvtap
  interface. Nova "joyed" me with error "Unsupported VIF type
  binding_failed". A short investigation lead me to the real problem:
  macvtap agent crushed

  2019-02-21 18:40:38.636 54364 ERROR neutron.agent.linux.utils [req-
  362a8541-643e-4355-b3f7-a02fb9358b66 - - - - -] Exit code: 1; Stdin: ;
  Stdout: ; Stderr: Device "macvtap1" does not exist.

  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service [req-362a8541-643e-4355-b3f7-a02fb9358b66 - - - - -] Error starting thread.: ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "macvtap1" does not exist.
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service Traceback (most recent call last):
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/oslo_service/service.py", line 729, in run_service
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     service.start()
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     result = f(*args, **kwargs)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 111, in start
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     self.daemon_loop()
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     result = f(*args, **kwargs)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 447, in daemon_loop
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     device_info = self.scan_devices(previous=device_info, sync=sync)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/osprofiler/profiler.py", line 158, in wrapper
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     result = f(*args, **kwargs)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 377, in scan_devices
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     current_devices = self.mgr.get_all_devices()
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/macvtap/agent/macvtap_neutron_agent.py", line 138, in get_all_devices
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     mac = ip_lib.get_device_mac(device_name)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 950, in get_device_mac
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return IPDevice(device_name, namespace=namespace).link.address
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 541, in address
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self.attributes.get('link/ether')
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 565, in attributes
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self._parse_line(self._run(['o'], ('show', self.name)))
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 360, in _run
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self._parent._run(options, self.COMMAND, args)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 88, in _run
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     return self._execute(options, command, args)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 102, in _execute
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     log_fail_as_error=self.log_fail_as_error)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service   File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 151, in execute
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service     raise ProcessExecutionError(msg, returncode=returncode)
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service ProcessExecutionError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Device "macvtap1" does not exist.
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service
  2019-02-21 18:40:38.637 54364 ERROR oslo_service.service
  2019-02-21 18:40:38.640 54364 INFO neutron.plugins.ml2.drivers.agent._common_agent [-] Stopping Macvtap agent agent.

  I have digg a bit and it looks like there's a race between interface
  deletion and a periodic scan in daemon_loop. For now get_all_devices
  (neutron/plugins/ml2/drivers/macvtap/agent/macvtap_neutron_agent.py
  +138) does'n check interface existence before get_device_mac:

      def get_all_devices(self):
  ...............
          for device_name in all_device_names:
              if device_name.startswith(constants.MACVTAP_DEVICE_PREFIX):
                  if ip_lib.device_exists(device_name):
                      mac = ip_lib.get_device_mac(device_name)
                      self.mac_device_name_mappings[mac] = device_name
                      devices.add(mac)

  It's quite possible that device will be deleted while we're looping over (in my case with ~10% probability).
  An obvious 'solution' is to check device on every loop but it doesn't really solve an issue though reduce race window so that I able to migrate VM for about 100 times before crush.
  I'll attache this patch anyway.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1817548/+subscriptions


Follow ups