
yahoo-eng-team team mailing list archive

[Bug 1950894] [NEW] live_migration_permit_post_copy mode does not work

 

Public bug reported:

Description
===========
Some customers have reported that certain VMs never complete a
live migration. The remaining memory to copy keeps oscillating
around 1-10% and the migration never finishes. After setting
live_migration_permit_post_copy = True, we expected the migration
to converge and complete, as the description of this feature says
it should.

Workaround 1: It's possible to complete the process if you log into the source
host and run the QMP command[1]:

virsh qemu-monitor-command instance-00000026 '{"execute":"migrate-start-postcopy"}'
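
Before forcing the switch, it can help to inspect the migration state on the source host. The following is a minimal sketch, not part of the original report: the helper names qmp_payload and qmp_run are hypothetical, and it assumes virsh is available and that the QEMU build exposes the standard QMP commands query-migrate and query-migrate-capabilities.

```shell
#!/bin/sh
# Build the JSON body for a QMP command that takes no arguments.
# (qmp_payload is a hypothetical helper name used only in this sketch.)
qmp_payload() {
    printf '{"execute":"%s"}' "$1"
}

# Send an argument-less QMP command to a libvirt domain on the local host.
# Requires virsh and access to the local libvirt daemon.
qmp_run() {
    domain=$1
    command=$2
    virsh qemu-monitor-command "$domain" --pretty "$(qmp_payload "$command")"
}

# Example calls (not executed here; instance-00000026 is the domain
# name used in this report):
#   qmp_run instance-00000026 query-migrate-capabilities  # is postcopy-ram enabled?
#   qmp_run instance-00000026 query-migrate               # current migration status
#   qmp_run instance-00000026 migrate-start-postcopy      # workaround 1
```

If postcopy-ram shows as disabled in the capabilities output, that would suggest the post-copy flag never reached QEMU for this migration.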


Workaround 2: The migration finishes if you run 'nova live-migration-force-complete'


I believe this could also be a libvirt bug: I don't see any "migrate-start-postcopy"
in the nova/libvirt logs[4] until after I manually triggered it with the command
above, at 2021-11-12 19:14:08.053+0000[4].


Steps to reproduce
==================

* Set up an OpenStack deployment with live_migration_permit_post_copy=False
* Create a large VM (8+ CPUs) and install stress-ng
* Run stress-ng:
  nohup stress-ng --vm 4 --vm-bytes 10% --vm-method write64 --vm-addr-method pwr2 -t 1h &
* Migrate the VM and check the source host logs for messages like:
  'Migration running for \d+ secs, memory \d+% remaining'
  The remaining memory should oscillate as described above, and the migration should not complete
* Complete or cancel the above migration, set live_migration_permit_post_copy=True,
  restart the nova services on the compute hosts, and repeat the operation
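
The oscillation check in the steps above can be sketched as a small log filter. This is only an illustration: the function names migration_progress and is_oscillating are hypothetical, and the log path in the usage comment is an assumption about a typical Ubuntu install, not something taken from this report.

```shell
#!/bin/sh
# Print "<elapsed seconds> <memory % remaining>" for every progress
# message found on stdin; all other lines are dropped.
migration_progress() {
    sed -n -E 's/.*Migration running for ([0-9]+) secs, memory ([0-9]+)% remaining.*/\1 \2/p'
}

# Succeed (exit 0) if the remaining-memory percentage ever increases
# between two consecutive samples, which is the non-converging pattern
# described in this report. Expects migration_progress output on stdin.
is_oscillating() {
    awk 'NR > 1 && $2 > prev { up = 1 } { prev = $2 } END { exit up ? 0 : 1 }'
}

# Example (not executed here; the log path is an assumption):
#   migration_progress < /var/log/nova/nova-compute.log | is_oscillating \
#       && echo "memory copy is not converging"
```

A single increase is a coarse signal; on a busy guest a longer window of samples gives a more reliable picture.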


Expected result
===============
The migration should complete 100% of the time.

Actual result
=============
The migration does not complete and the VM's memory copy never finishes.

Environment
===========
1. Exact version of OpenStack you are running[8]

21.2.1-0ubuntu1


2. Which hypervisor did you use[8]?

qemu-kvm: 4.2-3ubuntu6.18
libvirt-daemon: 6.0.0-0ubuntu8.14


3. Which storage type did you use?

Shared Ceph


4. Which networking type did you use?

OpenvSwitch L3HA

Logs & Configs
==============


[1] QMP Commands: https://gist.github.com/sombrafam/5e8e991058001c2b3843c0d08b4cd7d1
[2] Migration (completed manually with workaround 1) logs: https://gist.github.com/sombrafam/b74497150ae4ae32494ac5735189e149
[3] nova-compute.log src: https://gist.github.com/sombrafam/b74497150ae4ae32494ac5735189e149
[4] libvirt.log src: https://gist.github.com/sombrafam/69f05404d7097265140e1578ea50c00c
[5] Migration list: https://gist.github.com/sombrafam/39b72e242e27b6a3123603db1faa7b19
[6] Nova.conf dst host: https://gist.github.com/sombrafam/ad43b268e7f4b69e7da513a0f7a0095f
[7] Nova.conf src host: https://gist.github.com/sombrafam/ab27b40e577fbe56d741f01e811f3a18
[8] Package versions: https://gist.github.com/sombrafam/0622792d82750b2141b45580b625b69f
[9] VM info: https://gist.github.com/sombrafam/57eaa4c4ba4b141dec9659ee01f25b6d

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1950894

Title:
  live_migration_permit_post_copy mode does not work

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1950894/+subscriptions


