yahoo-eng-team team mailing list archive
Message #80783
[Bug 1753676] Re: Live migration not working as Expected when Restarting nova-compute service while migration from source node
Reviewed: https://review.opendev.org/678016
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ebcf6e4ce576285949c5a202f2d7d21dc03156ef
Submitter: Zuul
Branch: master
commit ebcf6e4ce576285949c5a202f2d7d21dc03156ef
Author: Alexandre Arents <alexandre.arents@xxxxxxxxxxxx>
Date: Tue Aug 20 13:37:33 2019 +0000
Abort live-migration during instance_init
When the compute service restarts during a live migration,
the live-migration monitoring thread is lost. In that case
it is better to abort the live-migration job early, before resetting
the state of the instance; this prevents the API from accepting
further actions while the unmanaged migration process is still
running in the background. It also avoids the unexpected/dangerous
behavior described in the related bug.
Change-Id: Idec2d31cbba497dc4b20912f3388ad2341951d23
Closes-Bug: #1753676
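The approach the commit describes can be sketched as follows. This is a simplified illustration, not nova's actual code: `FakeDriver`, `init_instance`, and `live_migration_abort` are hypothetical stand-ins for the hypervisor driver and the instance-recovery path that nova runs in the compute manager's `_init_instance()` on service startup.

```python
# Simplified sketch of the fix: when nova-compute restarts and finds an
# instance mid-live-migration, abort the unmanaged hypervisor job *before*
# resetting the instance's task state, so the API cannot accept new actions
# while the migration still runs in the background.
# All names here are illustrative, not nova's real API.

class MigrationAbortError(Exception):
    pass

class FakeDriver:
    """Stands in for the hypervisor driver (e.g. libvirt)."""
    def __init__(self):
        self.aborted = []

    def live_migration_abort(self, instance):
        self.aborted.append(instance["uuid"])

def init_instance(driver, instance):
    """Mimic the startup recovery path: abort a live migration whose
    monitoring thread was lost across a service restart."""
    if instance["task_state"] == "migrating":
        try:
            driver.live_migration_abort(instance)
        except MigrationAbortError:
            pass  # the job may already be gone; proceed to reset state
        # Only after the abort is it safe to reset the task state.
        instance["task_state"] = None
    return instance

driver = FakeDriver()
vm = {"uuid": "18d63c06", "task_state": "migrating"}
init_instance(driver, vm)
```

The key ordering, per the commit message, is abort first, reset state second: resetting first would let the API accept new actions against an instance whose disk and memory are still being copied by libvirt.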
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1753676
Title:
Live migration not working as Expected when Restarting nova-compute
service while migration from source node
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
Environment: Ubuntu 16.04
OpenStack Version: Pike
I am trying to live-migrate a VM (block migration) from one compute
node to another. Everything looks good until I restart the
nova-compute service: the live migration keeps running underneath
with the help of libvirt, but once the VM reaches the destination,
the database is not updated properly.
Steps to reproduce:
===================
nova.conf (libvirt settings on both compute nodes):
[libvirt]
live_migration_bandwidth=1200
live_migration_downtime=100
live_migration_downtime_steps =3
live_migration_downtime_delay=10
live_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE
virt_type = kvm
inject_password = False
disk_cachemodes = network=writeback
live_migration_uri = "qemu+tcp://nova@%s/system"
live_migration_tunnelled = False
block_migration_flag = VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC
(default OpenStack live-migration configuration: pre-copy with no tunnelling)
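The downtime options above drive a step schedule: the libvirt driver raises the allowed migration downtime in `live_migration_downtime_steps` increments, waiting a delay (scaled by guest data size, via `live_migration_downtime_delay`) between steps. The sketch below is a simplified reimplementation of that scheduling shape, not nova's exact source:

```python
# Simplified sketch of how the [libvirt] downtime options combine into a
# schedule of (wait_seconds, allowed_downtime_ms) steps. Mirrors the general
# shape of nova's logic; not the exact implementation.

def downtime_steps(data_gb, downtime=100, steps=3, delay=10):
    """Yield (delay_seconds, downtime_ms) pairs of increasing downtime."""
    total_delay = delay * max(data_gb, 1)  # wait time scales with data size
    base = downtime / steps                # first allowed downtime
    offset = (downtime - base) / steps     # increment added at each step
    for i in range(steps + 1):
        yield int(total_delay * i), int(base + offset * i)

# With the values from the config above and 1 GB of guest data:
schedule = list(downtime_steps(data_gb=1))
```

With `downtime=100`, `steps=3`, `delay=10` this produces four steps, starting at roughly a third of the configured maximum and ramping up toward it, ten seconds apart.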
Source VM root disk (boot from volume, with one ephemeral disk (160 GB)).
Trying to migrate the VM from compute1 to compute2; below is my source VM:
| OS-EXT-SRV-ATTR:host | compute1 |
| OS-EXT-SRV-ATTR:hostname | testcase1-all-ephemernal-boot-from-vol |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000153
1) nova live-migration --block-migrate <vm-id> compute2
[req-48a3df61-3974-46ac-8019-c4c4a0f8a8c8
4a8150eb246a4450829331e993f8c3fd f11a5d3631f14c4f879a2e7dddb96c06 -
default default] pre_live_migration data is
LibvirtLiveMigrateData(bdms=<?>,block_migration=True,disk_available_mb=6900736,disk_over_commit=<?>,filename='tmpW5ApOS',graphics_listen_addr_spice=x.x.x.x,graphics_listen_addr_vnc=127.0.0.1,image_type='default',instance_relative_path='504028fc-1381
-42ca-ad7c-
def7f749a722',is_shared_block_storage=False,is_shared_instance_path=False,is_volume_backed=True,migration=<?>,serial_listen_addr=None,serial_listen_ports=<?>,supported_perf_events=<?>,target_connect_addr=<?>)
pre_live_migration /openstack/venvs/nova-16.0.6/lib/python2.7/site-
packages/nova/compute/manager.py:5437
Migration started; the data and memory transfer is visible (using
iftop).
Data transfer between compute nodes using iftop
<= 4.94Gb 4.99Gb 5.01Gb
Restarted the nova-compute service on the source compute node (the
node the VM is migrating from).
The live migration keeps going; once it completes, below is the
total data transfer (using iftop):
TX: cum: 17.3MB peak: 2.50Mb rates: 11.1Kb 7.11Kb 463Kb
RX: 97.7GB 4.97Gb 3.82Kb 1.93Kb 1.87Gb
TOTAL: 97.7GB 4.97Gb
Once the migration completes, the virsh domain can be seen running
on the destination compute node:
root@compute2:~# virsh list --all
Id Name State
----------------------------------------------------
3 instance-00000153 running
From nova-compute.log:
Instance <id> has been moved to another host compute1(compute1). There
are allocations remaining against the source host that might need to
be removed: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 23808,
u'DISK_GB': 180}}. _remove_deleted_instances_allocations
/openstack/venvs/nova-16.0.6/lib/python2.7/site-
packages/nova/compute/resource_tracker.py:123
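The warning above comes from the resource tracker noticing a mismatch: the instance record in the DB points at another host, yet allocations remain locally. A toy version of that check, with illustrative names and data shapes (this is not nova's `_remove_deleted_instances_allocations` implementation):

```python
# Toy version of the resource-tracker check behind the warning above:
# flag allocations held on this host by instances whose DB record says
# they live elsewhere. Names and data shapes are illustrative only.

def stale_allocations(instances, allocations, this_host):
    stale = {}
    for uuid, alloc in allocations.items():
        inst = instances.get(uuid)
        if inst is not None and inst["host"] != this_host:
            stale[uuid] = alloc  # candidate for cleanup
    return stale

# Example mirroring the log: the DB still says compute1, but the
# allocations below are held on compute2, where the domain actually runs.
instances = {"18d63c06": {"host": "compute1"}}
allocations = {"18d63c06": {"VCPU": 8, "MEMORY_MB": 23808, "DISK_GB": 180}}
stale = stale_allocations(instances, allocations, this_host="compute2")
```

In the bug, the check fires in the "wrong" direction because the DB was never updated: the allocations on the destination look stale even though the VM really runs there.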
nova-compute still shows 0 allocated vCPUs (even though the 8-core VM is there):
Total usable vcpus: 56, total allocated vcpus: 0
_report_final_resource_view /openstack/venvs/nova-16.0.6/lib/python2.7
/site-packages/nova/compute/resource_tracker.py:792
nova show <vm-id> (the nova DB still shows the source hostname; it is
not updated with the new compute node):
OS-EXT-SRV-ATTR:host | compute1 |
| OS-EXT-SRV-ATTR:hostname | testcase1-all-ephemernal-boot-from-vol |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000153
Expected result
===============
The DB should be updated accordingly, or the migration should be aborted.
Actual result
=============
nova show <vm-id> (the nova DB still shows the source hostname; it is
not updated with the new compute node):
OS-EXT-SRV-ATTR:host | compute1 |
| OS-EXT-SRV-ATTR:hostname | testcase1-all-ephemernal-boot-from-vol |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |
| OS-EXT-SRV-ATTR:instance_name | instance-00000153
virsh list on the destination compute node shows the following output:
root@compute2:~# virsh list --all
Id Name State
----------------------------------------------------
3 instance-00000153 running
The entire VM data is still present on both compute nodes:
ls /var/lib/nova/instances/18d63c06-b124-4ec4-9e36-afcadccaf23e
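One way to spot instances left in this inconsistent state is to compare the host recorded in the nova DB with where libvirt actually runs the domain. The sketch below uses fabricated example data mirroring the report; in practice `api_hosts` and `running_domains` would be populated from `nova show` output and `virsh list` on each hypervisor:

```python
# Hedged sketch: detect instances whose nova DB host disagrees with the
# host where libvirt reports the domain running. Inputs are illustrative;
# a real tool would gather them from the nova API and virsh.

def find_mismatches(api_hosts, running_domains):
    """api_hosts: {instance_name: host per the nova DB}.
    running_domains: {host: set of domain names virsh reports running}.
    Return (instance, db_host, actual_host) triples that disagree."""
    mismatched = []
    for name, db_host in api_hosts.items():
        actual = [h for h, doms in running_domains.items() if name in doms]
        if actual and db_host not in actual:
            mismatched.append((name, db_host, actual[0]))
    return mismatched

# Example data matching the bug: DB says compute1, domain runs on compute2.
api_hosts = {"instance-00000153": "compute1"}
running_domains = {"compute1": set(), "compute2": {"instance-00000153"}}
result = find_mismatches(api_hosts, running_domains)
```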
After restarting the nova-compute service on the destination machine,
the warning below is logged by nova-compute:
2018-03-05 11:19:05.942 5791 WARNING nova.compute.manager [-]
[instance: 18d63c06-b124-4ec4-9e36-afcadccaf23e] Instance is
unexpectedly not found. Ignore.: InstanceNotFound: Instance
18d63c06-b124-4ec4-9e36-afcadccaf23e could not be found.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1753676/+subscriptions