
yahoo-eng-team team mailing list archive

[Bug 1753676] [NEW] Live migration not working as Expected when Restarting nova-compute service while migration

 

Public bug reported:

Description
===========

Environment: Ubuntu 16.04
Openstack Version: Pike

I am trying to live-migrate a VM (block migration) from one compute node
to another. Everything looks good unless I restart the nova-compute
service: the live migration keeps running underneath via libvirt, but
once the VM reaches the destination, the database is not updated
properly.


Steps to reproduce:
===================

nova.conf ( libvirt setting on both compute nodes )

[libvirt]
live_migration_bandwidth=1200
live_migration_downtime=100
live_migration_downtime_steps =3
live_migration_downtime_delay=10
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
virt_type = kvm
inject_password = False
disk_cachemodes = network=writeback
live_migration_uri = "qemu+tcp://nova@%s/system"
live_migration_tunnelled = False
block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC


( default OpenStack live migration configuration: pre-copy with no tunnelling )
Source VM root disk: boot from volume, with one ephemeral disk (160 GB)
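The three downtime settings above drive a stepped downtime ramp during the migration. As an illustrative sketch only (a simplified linear ramp, not nova's actual algorithm), the interaction of `live_migration_downtime`, `live_migration_downtime_steps` and `live_migration_downtime_delay` can be pictured like this:

```python
def downtime_steps(max_downtime_ms, steps, delay_s):
    """Yield (time_offset_seconds, allowed_downtime_ms) pairs.

    Starts from a small initial downtime and ramps linearly up to
    max_downtime_ms over the given number of steps, waiting delay_s
    seconds between steps. Simplified sketch, not nova's exact code.
    """
    base = max_downtime_ms // (steps + 1)        # small starting downtime
    increment = (max_downtime_ms - base) // steps
    yield (0, base)
    for i in range(1, steps + 1):
        yield (i * delay_s, base + i * increment)

# With the values from the config above (100 ms, 3 steps, 10 s delay):
schedule = list(downtime_steps(100, 3, 10))
```

The effect is that libvirt is first given a tight downtime budget and only gradually allowed the full 100 ms, which is why a migration of a busy guest can run for a while before converging.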


Trying to migrate the VM from compute1 to compute2; below is my source VM.

| OS-EXT-SRV-ATTR:host                 | compute1                                                                |
| OS-EXT-SRV-ATTR:hostname             | testcase1-all-ephemernal-boot-from-vol                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute1                                                |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000153

1) nova live-migration --block-migrate <vm-id> compute2


[req-48a3df61-3974-46ac-8019-c4c4a0f8a8c8 4a8150eb246a4450829331e993f8c3fd f11a5d3631f14c4f879a2e7dddb96c06 - default default] pre_live_migration data is LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=6900736,disk_over_commit=<?>,filename='tmpW5ApOS',graphics_listen_addr_spice=x.x.x.x,graphics_listen_addr_vnc=127.0.0.1,image_type='default',instance_relative_path='504028fc-1381-42ca-ad7c-def7f749a722',is_shared_block_storage=False,is_shared_instance_path=False,is_volume_backed=True,migration=<?>,serial_listen_addr=None,serial_listen_ports=<?>,supported_perf_events=<?>,target_connect_addr=<?>) pre_live_migration /openstack/venvs/nova-16.0.6/lib/python2.7/site-packages/nova/compute/manager.py:5437
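When eyeballing log lines like the one above, it can help to pull the `LibvirtLiveMigrateData(...)` repr apart into individual fields. A small hypothetical helper (not part of nova; note it returns field values as strings, exactly as they appear in the log):

```python
import re

def parse_migrate_data(log_line):
    """Extract key=value fields from a LibvirtLiveMigrateData(...) repr
    embedded in a nova-compute log line. Hypothetical helper for log
    inspection; not part of nova itself."""
    match = re.search(r'LibvirtLiveMigrateData\((.*)\)', log_line)
    if not match:
        return {}
    fields = {}
    # The repr is a flat comma-separated list of key=value pairs.
    for pair in match.group(1).split(','):
        key, _, value = pair.partition('=')
        fields[key.strip()] = value.strip()
    return fields

line = ("pre_live_migration data is LibvirtLiveMigrateData("
        "block_migration=True,is_volume_backed=True,"
        "is_shared_instance_path=False)")
data = parse_migrate_data(line)
```

For the log line above, this makes it easy to confirm at a glance that the request really was a block migration of a volume-backed instance with no shared instance path.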


Migration started; data and memory transfer visible between the compute
nodes (using iftop):

<=  4.94Gb  4.99Gb  5.01Gb

Restarted the nova-compute service on the source compute node ( where
the VM is migrating from )

The live migration keeps going; once it completes, below is the total
data transfer ( using iftop ):

TX:     cum:  17.3MB   peak:  2.50Mb   rates:  11.1Kb  7.11Kb   463Kb
RX:           97.7GB          4.97Gb            3.82Kb  1.93Kb  1.87Gb
TOTAL:        97.7GB          4.97Gb

Once the migration completes, the virsh domain can be seen running on
the destination compute node:

root@compute2:~# virsh list --all
 Id    Name                           State
----------------------------------------------------
 3     instance-00000153              running

From the nova-compute.log:

Instance <id> has been moved to another host compute1(compute1). There
are allocations remaining against the source host that might need to be
removed: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 23808, u'DISK_GB':
180}}. _remove_deleted_instances_allocations
/openstack/venvs/nova-16.0.6/lib/python2.7/site-
packages/nova/compute/resource_tracker.py:123

The nova-compute resource tracker still shows 0 allocated vCPUs ( but an 8-vCPU VM was there ):

Total usable vcpus: 56, total allocated vcpus: 0
_report_final_resource_view /openstack/venvs/nova-16.0.6/lib/python2.7
/site-packages/nova/compute/resource_tracker.py:792
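A simplified sketch (mirroring the behaviour described above, not nova's actual resource-tracker code) of why the allocated count can drop to 0: the tracker on each node counts only instances whose *database* host field matches that node. Since the DB still says the instance is on compute1 while the domain actually runs on compute2, the destination tracker never claims its 8 vCPUs, and the source tracker treats it as moved away:

```python
def allocated_vcpus(local_host, instances):
    """Sum vCPUs of instances the DB assigns to local_host.
    Simplified stand-in for the resource tracker's accounting."""
    return sum(inst['vcpus'] for inst in instances
               if inst['host'] == local_host)

# DB state after the broken migration: host was never updated to compute2.
instances = [{'uuid': '18d63c06-...', 'host': 'compute1', 'vcpus': 8}]

# compute2's tracker finds nothing assigned to it, hence "allocated vcpus: 0":
on_compute2 = allocated_vcpus('compute2', instances)
```

The field names (`host`, `vcpus`) are illustrative assumptions; the point is only that a stale host column makes the running domain invisible to the destination node's accounting.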

nova show <vm-id> ( the nova DB still shows the source hostname; the DB
is not updated with the new compute node ):

| OS-EXT-SRV-ATTR:host                 | compute1                                                                 |
| OS-EXT-SRV-ATTR:hostname             | testcase1-all-ephemernal-boot-from-vol                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute1                                                |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000153


Entire vm data is still present on both compute nodes.

After restarting the nova-compute service on the destination machine,
got the below warning from nova-compute:

2018-03-05 11:19:05.942 5791 WARNING nova.compute.manager [-] [instance:
18d63c06-b124-4ec4-9e36-afcadccaf23e] Instance is unexpectedly not
found. Ignore.: InstanceNotFound: Instance
18d63c06-b124-4ec4-9e36-afcadccaf23e could not be found.


Expected result
===============
The database should be updated accordingly, or the migration should be aborted.

Actual result
=============

nova show <vm-id> ( the nova DB still shows the source hostname; the DB
is not updated with the new compute node ):

| OS-EXT-SRV-ATTR:host                 | compute1                                                                 |
| OS-EXT-SRV-ATTR:hostname             | testcase1-all-ephemernal-boot-from-vol                                           |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute1                                                |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000153

virsh list on the destination compute node shows the below output:

root@compute2:~# virsh list --all
 Id    Name                           State
----------------------------------------------------
 3     instance-00000153              running


Entire vm data is still present on both compute nodes.

ls /var/lib/nova/instances/18d63c06-b124-4ec4-9e36-afcadccaf23e


After restarting the nova-compute service on the destination machine, got the below warning from nova-compute:

2018-03-05 11:19:05.942 5791 WARNING nova.compute.manager [-] [instance:
18d63c06-b124-4ec4-9e36-afcadccaf23e] Instance is unexpectedly not
found. Ignore.: InstanceNotFound: Instance
18d63c06-b124-4ec4-9e36-afcadccaf23e could not be found.

** Affects: nova
     Importance: Undecided
         Status: New

** Summary changed:

- Live migration not working as Expected when Restarting nova-compute service
+ Live migration not working as Expected when Restarting nova-compute service while migration

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1753676

Title:
  Live migration not working as Expected when Restarting nova-compute
  service while migration

Status in OpenStack Compute (nova):
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1753676/+subscriptions

