yahoo-eng-team team mailing list archive

[Bug 1617299] [NEW] Share based Nova Live Migration erratically fails

 

Public bug reported:

Hello,

In our production OpenStack environment we have noticed over the last few weeks that Nova VM live migrations occasionally fail.
So far this is only visible in our automated test environment: an automated test is started every 15 minutes, and it fails 3-4 times a day.

We have mounted a central NetApp NFS share on the Nova instance path to
support true live migrations between different hypervisors.

When we analysed the issue, we found the following error message and trace:
BadRequest: <Compute-Node> is not on shared storage: Live migration can not be used without shared storage except a booted from volume VM which does not have a local disk. (HTTP 400) (Request-ID: req-8e709fd1-9d72-453b-b4b1-1f26112ea3d3)
 
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/rally/task/runner.py", line 66, in _run_scenario_once
    getattr(scenario_inst, method_name)(**scenario_kwargs)
  File "/usr/lib/python2.7/site-packages/rally/plugins/openstack/scenarios/nova/servers.py", line 640, in boot_and_live_migrate_server
    block_migration, disk_over_commit)
  File "/usr/lib/python2.7/site-packages/rally/task/atomic.py", line 84, in func_atomic_actions
    f = func(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/rally/plugins/openstack/scenarios/nova/utils.py", line 721, in _live_migrate
    disk_over_commit=disk_over_commit)
  File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 433, in live_migrate
    disk_over_commit)
  File "/usr/lib/python2.7/site-packages/novaclient/api_versions.py", line 370, in substitution
    return methods[-1].func(obj, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1524, in live_migrate
    'disk_over_commit': disk_over_commit})
  File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1691, in _action
    info=info, **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1702, in _action_return_resp_and_body
    return self.api.client.post(url, body=body)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 461, in post
    return self._cs_request(url, 'POST', **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 436, in _cs_request
    resp, body = self._time_request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 409, in _time_request
    resp, body = self.request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 403, in request
    raise exceptions.from_response(resp, body, url, method)
BadRequest: <Compute-Node> is not on shared storage: Live migration can not be used without shared storage except a booted from volume VM which does not have a local disk. (HTTP 400) (Request-ID: req-8e709fd1-9d72-453b-b4b1-1f26112ea3d3)
 
We checked the affected hypervisors for problems with the NFS share/mount, but found none. The message log file also shows no issues during the test timeframe.
 
The next step was to examine the Nova code for a hint as to why Nova raises this error.
There we found the procedure Nova uses to check whether the source and destination hypervisors share a filesystem.
 
The relevant code is in "nova/nova/virt/libvirt/driver.py".
 
In the function "check_can_live_migrate_destination", a temporary file is created on the destination hypervisor:
 
# Create file on storage, to be checked on source host
filename = self._create_shared_storage_test_file()
 
After that, in the function "check_can_live_migrate_source" of the same class, Nova checks whether the temporary file exists:
dest_check_data.is_shared_instance_path = (
    self._check_shared_storage_test_file(
        dest_check_data.filename))
 
This check sometimes fails because the file is not yet visible on the source hypervisor, and the migration then returns with this error message:
 
elif not (dest_check_data.is_shared_block_storage or
          dest_check_data.is_shared_instance_path or
          (booted_from_volume and not has_local_disk)):
    reason = _("Live migration can not be used "
               "without shared storage except "
               "a booted from volume VM which "
               "does not have a local disk.")
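
The check described above can be sketched in isolation as follows. This is a minimal illustration and not Nova's code: the function names create_test_file and check_test_file and the timeout/interval parameters are hypothetical. It assumes that the marker file can become visible on the source with some delay (for example due to NFS client-side caching, which this report has not confirmed), and shows how a bounded re-poll would tolerate such a delay where a single existence check would not. It uses Python 3 (time.monotonic).

```python
import os
import tempfile
import time


def create_test_file(instances_path):
    """Destination side: create an empty marker file and return its base name."""
    fd, path = tempfile.mkstemp(dir=instances_path)
    os.close(fd)
    return os.path.basename(path)


def check_test_file(instances_path, filename, timeout=0.0, interval=0.1):
    """Source side: report whether the marker file is visible.

    With timeout=0 this is a single existence check. A positive timeout
    (hypothetical parameter) re-polls until the file appears or the
    deadline passes.
    """
    path = os.path.join(instances_path, filename)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Both "hosts" share one local directory here, so the file is
    # visible immediately; on a real NFS mount it might not be.
    shared_dir = tempfile.mkdtemp()
    name = create_test_file(shared_dir)
    print(check_test_file(shared_dir, name, timeout=2.0))
```

Whether a re-poll of this kind is appropriate inside Nova is a separate question; the sketch only shows why a one-shot check is sensitive to visibility delays on the share.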

** Affects: nova
     Importance: Undecided
     Assignee: Tom Patzig (tom-patzig)
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1617299

Title:
  Share based Nova Live Migration erratically fails

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1617299/+subscriptions

