
yahoo-eng-team team mailing list archive

[Bug 1678577] [NEW] nova live migration failed in some case

 

Public bug reported:

env: nova 15.0.2 + libvirt + kvm + centos

In some situations, the nova request spec becomes:

{"nova_object.version": "1.8", "nova_object.changes": ["instance_uuid",
"requested_destination", "retry", "num_instances", "pci_requests",
"limits", "availability_zone", "force_nodes", "image", "instance_group",
"force_hosts", "numa_topology", "flavor", "project_id",
"scheduler_hints", "ignore_hosts"], "nova_object.name": "RequestSpec",
"nova_object.data": {"requested_destination": null, "instance_uuid":
"ca01b22b-d2d4-4291-96bd-ff6111f1f88b", "retry": {"nova_object.version":
"1.1", "nova_object.changes": ["num_attempts", "hosts"],
"nova_object.name": "SchedulerRetries", "nova_object.data":
{"num_attempts": 1, "hosts": {"nova_object.version": "1.16",
"nova_object.changes": ["objects"], "nova_object.name":
"ComputeNodeList", "nova_object.data": {"objects":
[{"nova_object.version": "1.16", "nova_object.changes": ["host",
"hypervisor_hostname"], "nova_object.name": "ComputeNode",
"nova_object.data": {"host": "control01", "hypervisor_hostname":
"control01"}, "nova_object.namespace": "nova"}]},
"nova_object.namespace": "nova"}}, "nova_object.namespace": "nova"},
"num_instances": 1, "pci_requests": {"nova_object.version": "1.1",
"nova_object.name": "InstancePCIRequests", "nova_object.data":
{"instance_uuid": "ca01b22b-d2d4-4291-96bd-ff6111f1f88b", "requests":
[]}, "nova_object.namespace": "nova"}, "limits": {"nova_object.version":
"1.0", "nova_object.changes": ["memory_mb", "vcpu", "disk_gb",
"numa_topology"], "nova_object.name": "SchedulerLimits",
"nova_object.data": {"vcpu": null, "memory_mb": 245427, "disk_gb": 8371,
"numa_topology": null}, "nova_object.namespace": "nova"},
"availability_zone": null, "force_nodes": null, "image":
{"nova_object.version": "1.8", "nova_object.changes": ["min_disk",
"container_format", "min_ram", "disk_format", "properties"],
"nova_object.name": "ImageMeta", "nova_object.data": {"min_disk": 1,
"container_format": "bare", "min_ram": 0, "disk_format": "raw",
"properties": {"nova_object.version": "1.16", "nova_object.name":
"ImageMetaProps", "nova_object.data": {}, "nova_object.namespace":
"nova"}}, "nova_object.namespace": "nova"}, "instance_group": null,
"force_hosts": null, "numa_topology": null, "ignore_hosts": null,
"flavor": {"nova_object.version": "1.1", "nova_object.name": "Flavor",
"nova_object.data": {"disabled": false, "root_gb": 1, "name": "m1.tiny",
"flavorid": "a70249ef-5ea9-49cb-b35f-ab4732064981", "deleted": false,
"created_at": "2017-03-22T08:13:48Z", "ephemeral_gb": 0, "updated_at":
null, "memory_mb": 256, "vcpus": 1, "extra_specs": {}, "swap": 0,
"rxtx_factor": 1.0, "is_public": true, "deleted_at": null,
"vcpu_weight": 0, "id": 119}, "nova_object.namespace": "nova"},
"project_id": "f3c6d500b267432c858c588800b49653", "scheduler_hints":
{}}, "nova_object.namespace": "nova"}


Note the retry part:

"retry": {"nova_object.version": "1.1", "nova_object.changes":
["num_attempts", "hosts"], "nova_object.name": "SchedulerRetries",
"nova_object.data": {"num_attempts": 1, "hosts": {"nova_object.version":
"1.16", "nova_object.changes": ["objects"], "nova_object.name":
"ComputeNodeList", "nova_object.data": {"objects":
[{"nova_object.version": "1.16", "nova_object.changes": ["host",
"hypervisor_hostname"], "nova_object.name": "ComputeNode",
"nova_object.data": {"host": "control01", "hypervisor_hostname":
"control01"}, "nova_object.namespace": "nova"}]}

It lists control01 as an already-tried host even though the instance is on control02.
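To check the same thing on other instances, the retry hosts can be pulled out of the serialized spec. This is only a rough sketch: `tried_hosts` is a hypothetical helper, not part of nova, and it assumes the JSON has the same nesting as the dump above. The sample spec is trimmed down to the retry field.

```python
import json


def tried_hosts(spec_json):
    """Return (host, hypervisor_hostname) pairs stored in the retry field.

    Hypothetical helper; assumes the nova_object nesting shown in this report.
    """
    data = json.loads(spec_json)["nova_object.data"]
    retry = data.get("retry")
    if not retry:
        # No retry info recorded in the spec.
        return []
    node_list = retry["nova_object.data"]["hosts"]
    nodes = node_list["nova_object.data"]["objects"]
    return [
        (n["nova_object.data"]["host"],
         n["nova_object.data"]["hypervisor_hostname"])
        for n in nodes
    ]


# Trimmed-down spec with the same shape as the one in this report.
spec = json.dumps({
    "nova_object.name": "RequestSpec",
    "nova_object.data": {
        "instance_uuid": "ca01b22b-d2d4-4291-96bd-ff6111f1f88b",
        "retry": {
            "nova_object.name": "SchedulerRetries",
            "nova_object.data": {
                "num_attempts": 1,
                "hosts": {
                    "nova_object.name": "ComputeNodeList",
                    "nova_object.data": {
                        "objects": [{
                            "nova_object.name": "ComputeNode",
                            "nova_object.data": {
                                "host": "control01",
                                "hypervisor_hostname": "control01",
                            },
                        }],
                    },
                },
            },
        },
    },
})

print(tried_hosts(spec))  # [('control01', 'control01')]
```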

When live migrating this VM from control02 to control01, "migration-list"
shows an error. The nova-scheduler logs show:


2017-04-02 14:01:47.010 6 DEBUG nova.filters [req-191c8f6e-010b-42f6-acc6-c84c689f649c 2442cfcb9d5c4daf8d90af8bcfe30df7 8eb03bbcdfd84f68b88a7fbaa74e2327 - - -] Starting with 1 host(s) get_filtered_objects /var/lib/kolla/venv/lib/python2.7/site-packages/nova/filters.py:70
2017-04-02 14:01:47.010 6 INFO nova.scheduler.filters.retry_filter [req-191c8f6e-010b-42f6-acc6-c84c689f649c 2442cfcb9d5c4daf8d90af8bcfe30df7 8eb03bbcdfd84f68b88a7fbaa74e2327 - - -] Host [u'control01', u'control01'] fails.  Previously tried hosts: [[u'control01', u'control01']]
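The second log line suggests the retry filter rejects any host whose [host, node] pair is already in the list of previously tried hosts. A minimal sketch of that check follows. It is a simplified model for illustration, not nova's actual code, and `host_passes` is a hypothetical standalone function.

```python
def host_passes(candidate, previously_tried):
    """Return True if the [host, node] pair has not been tried before.

    Simplified model of the behaviour seen in the log, not nova's code.
    """
    tried = [list(pair) for pair in previously_tried]
    return list(candidate) not in tried


# Values taken from the scheduler log in this report.
tried = [["control01", "control01"]]

print(host_passes(["control01", "control01"], tried))  # False
print(host_passes(["control02", "control02"], tried))  # True
```

Under this model, with control01 the only candidate and already in the retry list, no host is left after filtering, which would match the failure seen here.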


I think the root cause is the retry part, but I still do not know how it happens.

** Affects: nova
     Importance: Undecided
         Status: New

** Attachment added: "nova-scheduler.log"
   https://bugs.launchpad.net/bugs/1678577/+attachment/4852537/+files/nova-scheduler.log

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1678577

Title:
  nova live migration failed in some case

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1678577/+subscriptions

