[Bug 1753676] Re: Live migration not working as expected when restarting nova-compute service during migration from the source node


Reviewed:  https://review.opendev.org/678016
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ebcf6e4ce576285949c5a202f2d7d21dc03156ef
Submitter: Zuul
Branch:    master

commit ebcf6e4ce576285949c5a202f2d7d21dc03156ef
Author: Alexandre Arents <alexandre.arents@xxxxxxxxxxxx>
Date:   Tue Aug 20 13:37:33 2019 +0000

    Abort live-migration during instance_init
    
    When the compute service restarts during a live-migration,
    the live-migration monitoring thread is lost. In that case it
    is better to abort the live-migration job early, before
    resetting the instance's state; this prevents the API from
    accepting further actions while the unmanaged migration
    process is still running in the background. It also avoids
    the unexpected/dangerous behavior described in the related bug.
    
    Change-Id: Idec2d31cbba497dc4b20912f3388ad2341951d23
    Closes-Bug: #1753676
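
For context, the gist of the change: at compute-service startup, while each
instance is re-initialized, an orphaned live-migration is aborted before the
instance's state is reset. Below is a minimal sketch of that shape, not the
committed diff; the helper name and exact checks are assumptions
(driver.live_migration_abort() is the existing virt-driver abort hook).

    # Minimal sketch of the fix's shape (assumed names, not the committed
    # code): on nova-compute startup, abort a live-migration whose
    # monitoring thread died with the old process, before the usual
    # task_state reset lets the API accept new actions on the instance.
    def abort_orphaned_live_migration(driver, instance, migration):
        if instance.task_state != 'migrating' or migration is None:
            return
        try:
            # Ask the hypervisor (libvirt) to cancel the in-flight job.
            driver.live_migration_abort(instance)
        except NotImplementedError:
            # The driver cannot abort; resetting state is all we can do.
            pass
        migration.status = 'error'
        migration.save()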


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1753676

Title:
  Live migration not working as expected when restarting nova-compute
  service during migration from the source node

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  Environment: Ubuntu 16.04
  OpenStack version: Pike

  I am trying to live-migrate a VM (block migration) from one compute
  node to another. Everything looks good until I restart the
  nova-compute service: the live migration keeps running underneath
  with the help of libvirt, but once the VM reaches the destination,
  the database is not updated properly.

  Steps to reproduce:
  ===================

  nova.conf ([libvirt] settings on both compute nodes):

  [libvirt]
  live_migration_bandwidth=1200
  live_migration_downtime=100
  live_migration_downtime_steps =3
  live_migration_downtime_delay=10
  live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
  virt_type = kvm
  inject_password = False
  disk_cachemodes = network=writeback
  live_migration_uri = "qemu+tcp://nova@%s/system"
  live_migration_tunnelled = False
  block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC

  (This is the default OpenStack live migration configuration: pre-copy
  with no tunneling. The source VM boots from a volume and has one
  ephemeral disk of 160 GB.)
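
  As an aside, the comma-separated flag names above map onto libvirt's
  VIR_MIGRATE_* bitmask. Here is a minimal sketch, using the
  libvirt-python binding, of what a block migration with these flags
  looks like at the libvirt level (this is not Nova's actual code path;
  the host and instance names are the ones from this report):

    import libvirt

    # block_migration_flag above corresponds, roughly, to this bitmask:
    flags = (libvirt.VIR_MIGRATE_UNDEFINE_SOURCE
             | libvirt.VIR_MIGRATE_PEER2PEER
             | libvirt.VIR_MIGRATE_NON_SHARED_INC)  # incremental block copy

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000153')
    # Peer-to-peer: the source libvirtd drives the transfer itself, which
    # is why the job keeps running even after nova-compute restarts.
    dom.migrateToURI('qemu+tcp://nova@compute2/system', flags, None, 0)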

  Trying to migrate the VM from compute1 to compute2; below is my source VM:

  | OS-EXT-SRV-ATTR:host                | compute1                               |
  | OS-EXT-SRV-ATTR:hostname            | testcase1-all-ephemernal-boot-from-vol |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1                               |
  | OS-EXT-SRV-ATTR:instance_name       | instance-00000153                      |

  1) nova live-migration --block-migrate <vm-id> compute2

  [req-48a3df61-3974-46ac-8019-c4c4a0f8a8c8 4a8150eb246a4450829331e993f8c3fd
  f11a5d3631f14c4f879a2e7dddb96c06 - default default] pre_live_migration data is
  LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=6900736,
  disk_over_commit=<?>,filename='tmpW5ApOS',graphics_listen_addr_spice=x.x.x.x,
  graphics_listen_addr_vnc=127.0.0.1,image_type='default',
  instance_relative_path='504028fc-1381-42ca-ad7c-def7f749a722',
  is_shared_block_storage=False,is_shared_instance_path=False,
  is_volume_backed=True,migration=<?>,serial_listen_addr=None,
  serial_listen_ports=<?>,supported_perf_events=<?>,target_connect_addr=<?>)
  pre_live_migration
  /openstack/venvs/nova-16.0.6/lib/python2.7/site-packages/nova/compute/manager.py:5437
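
  Incidentally, for anyone scripting step 1 instead of using the CLI,
  the python-novaclient equivalent is roughly the following (a sketch:
  the auth URL and credentials are placeholders, and compute API
  version 2.1 is an assumption):

    from keystoneauth1 import loading, session
    from novaclient import client

    # Placeholder credentials; fill in for your cloud.
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_name='Default',
                                    project_domain_name='Default')
    nova = client.Client('2.1', session=session.Session(auth=auth))

    server = nova.servers.get('<vm-id>')
    # Same as: nova live-migration --block-migrate <vm-id> compute2
    nova.servers.live_migrate(server, host='compute2',
                              block_migration=True, disk_over_commit=False)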

  The migration started, and I was able to see the data and memory
  transfer (using iftop).

  Data transfer between compute nodes using iftop
     <=    4.94Gb  4.99Gb  5.01Gb

  Restarted the nova-compute service on the source compute node (the
  node the VM is migrating from).

  The live migration kept going; once it completed, below is the total
  data transfer (from iftop):

  TX:      cum:  17.3MB   peak:  2.50Mb   rates:  11.1Kb  7.11Kb   463Kb
  RX:            97.7GB          4.97Gb            3.82Kb  1.93Kb  1.87Gb
  TOTAL:         97.7GB          4.97Gb
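
  At this point nova-compute has lost its monitoring thread, but the job
  is still observable directly against libvirt on the source node, for
  example (a sketch with the libvirt-python binding; the instance name
  is the one from this report):

    import libvirt

    # Check the orphaned migration job directly against libvirtd on the
    # source node, since nova-compute no longer monitors it.
    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000153')
    job = dom.jobInfo()
    # job[0] is the job type; migrations report VIR_DOMAIN_JOB_UNBOUNDED
    # while in flight, even though Nova has lost track of them.
    if job[0] == libvirt.VIR_DOMAIN_JOB_UNBOUNDED:
        print('unmanaged live-migration job still running')

  The CLI equivalent is "virsh domjobinfo instance-00000153" on the
  source node.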

  Once the migration completes, on the destination compute node we can
  see the virsh domain running:

  root@compute2:~# virsh list --all
   Id    Name                           State
  ----------------------------------------------------
   3     instance-00000153              running

  From the nova-compute.log

  Instance <id> has been moved to another host compute1(compute1). There
  are allocations remaining against the source host that might need to
  be removed: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 23808,
  u'DISK_GB': 180}}. _remove_deleted_instances_allocations
  /openstack/venvs/nova-16.0.6/lib/python2.7/site-packages/nova/compute/resource_tracker.py:123
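
  The leftover allocations named in that log line can be inspected
  against the Placement API by consumer (instance) UUID, for example (a
  sketch; credentials are placeholders, as in the novaclient sketch
  above):

    from keystoneauth1 import loading, session

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_name='Default',
                                    project_domain_name='Default')
    sess = session.Session(auth=auth)

    # GET /allocations/{consumer_uuid} lists which resource providers
    # still hold VCPU/MEMORY_MB/DISK_GB for this instance.
    resp = sess.get('/allocations/18d63c06-b124-4ec4-9e36-afcadccaf23e',
                    endpoint_filter={'service_type': 'placement'})
    print(resp.json())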

  nova-compute still shows 0 allocated vCPUs (but the 8-core VM was there):

  Total usable vcpus: 56, total allocated vcpus: 0
  _report_final_resource_view
  /openstack/venvs/nova-16.0.6/lib/python2.7/site-packages/nova/compute/resource_tracker.py:792

  nova show <vm-id> (the Nova DB still shows the source hostname; it has
  not been updated with the new compute node):

  | OS-EXT-SRV-ATTR:host                | compute1                               |
  | OS-EXT-SRV-ATTR:hostname            | testcase1-all-ephemernal-boot-from-vol |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1                               |
  | OS-EXT-SRV-ATTR:instance_name       | instance-00000153                      |

  Expected result
  ===============
  The DB should be updated accordingly, or the migration should be aborted.

  Actual result
  =============

  nova show <vm-id> (the Nova DB still shows the source hostname; it has
  not been updated with the new compute node):

  | OS-EXT-SRV-ATTR:host                | compute1                               |
  | OS-EXT-SRV-ATTR:hostname            | testcase1-all-ephemernal-boot-from-vol |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | compute1                               |
  | OS-EXT-SRV-ATTR:instance_name       | instance-00000153                      |

  virsh list on the destination compute node shows the following output:

  root@compute2:~# virsh list --all
   Id    Name                           State
  ----------------------------------------------------
   3     instance-00000153              running

  The entire VM data is still present on both compute nodes:

  ls /var/lib/nova/instances/18d63c06-b124-4ec4-9e36-afcadccaf23e

  After restarting the nova-compute service on the destination machine,
  I got the warning below from nova-compute:

  2018-03-05 11:19:05.942 5791 WARNING nova.compute.manager [-]
  [instance: 18d63c06-b124-4ec4-9e36-afcadccaf23e] Instance is
  unexpectedly not found. Ignore.: InstanceNotFound: Instance
  18d63c06-b124-4ec4-9e36-afcadccaf23e could not be found.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1753676/+subscriptions

