yahoo-eng-team team mailing list archive

[Bug 1950894] Re: live_migration_permit_post_copy mode does not work

 

** Project changed: nova => charm-nova-compute

** Summary changed:

- live_migration_permit_post_copy mode does not work
+ live-migration-permit-post-copy mode does not work

** Description changed:

  Description
  ===========
  Some customers have noted that some VMs never complete a
  live migration. The VM's memory copy keeps oscillating
- around 1-10% but never completes. After changing 
- live_migration_permit_post_copy = True, we expected this to
+ around 1-10% but never completes. After changing
+ live-migration-permit-post-copy = True, we expected this to
  converge and migrate successfully as this feature describes it
  should.
  

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1950894

Title:
  live-migration-permit-post-copy mode does not work

Status in OpenStack Nova Compute Charm:
  New

Bug description:
  Description
  ===========
  Some customers have noticed that certain VMs never complete a
  live migration. The share of VM memory still to be copied keeps
  oscillating between roughly 1% and 10%, and the migration never
  finishes. After setting live-migration-permit-post-copy = True, we
  expected the migration to converge and complete successfully, as
  post-copy is designed to do.
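  The charm option only takes effect once it has been rendered into nova.conf on each compute host. It is therefore worth confirming that the setting actually reached the [libvirt] section before re-testing. The sketch below is minimal and makes two assumptions: nova.conf lives at /etc/nova/nova.conf, and the option appears there under nova's own name, live_migration_permit_post_copy, in [libvirt]. check_post_copy is a hypothetical helper and is not part of nova or the charm:

```shell
#!/bin/sh
# check_post_copy FILE (hypothetical helper): prints "enabled" when the
# [libvirt] section of FILE sets live_migration_permit_post_copy to a true
# value (true, 1, yes or on, case-insensitive, as oslo.config accepts),
# otherwise prints "disabled".
check_post_copy() {
  awk '
    # Section headers: remember whether we are inside [libvirt].
    /^[ \t]*\[/ {
      section = $0
      gsub(/[ \t]/, "", section)
      in_libvirt = (section == "[libvirt]")
      next
    }
    in_libvirt && /^[ \t]*live_migration_permit_post_copy[ \t]*=/ {
      value = $0
      sub(/^[^=]*=/, "", value)      # keep only the right-hand side
      gsub(/[ \t]/, "", value)
      value = tolower(value)
      enabled = (value == "true" || value == "1" || value == "yes" || value == "on")
    }
    END { print (enabled ? "enabled" : "disabled") }
  ' "$1"
}
```

  On a compute host, `check_post_copy /etc/nova/nova.conf` should print "enabled" after the charm change has been applied and nova-compute restarted. The check only looks at the [libvirt] section, because the same key placed in another section would be ignored by nova.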

  Workaround 1: It's possible to complete the process if you log into the source
  host and run the QMP command[1]:

    virsh qemu-monitor-command instance-00000026 '{"execute":"migrate-start-postcopy"}'
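  The same switch can usually be requested through libvirt instead of the raw QMP monitor. Doing it this way keeps libvirt's view of the migration job consistent, whereas qemu-monitor-command bypasses libvirt and marks the domain as tainted. virsh migrate-postcopy is only accepted for a migration that was started with post-copy allowed. As far as I know, nova requests that only when live_migration_permit_post_copy is enabled. The sketch below is minimal; the helper name and the DRY_RUN switch are hypothetical:

```shell
#!/bin/sh
# start_postcopy DOMAIN (hypothetical helper): ask libvirt to switch the
# outgoing migration of DOMAIN from pre-copy to post-copy, then print the
# job statistics. Set DRY_RUN=1 to print the commands instead of running them.
start_postcopy() {
  domain="$1"
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; else run=""; fi
  # Same intent as the raw QMP "migrate-start-postcopy" call, but issued
  # through libvirt, which only accepts it for migrations started with
  # post-copy allowed.
  $run virsh migrate-postcopy "$domain"
  # Show the migration job (data processed and remaining) for the domain.
  $run virsh domjobinfo "$domain"
}
```

  For the instance in this report, that would be `start_postcopy instance-00000026` on the source host.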

  Workaround 2: The migration finishes if you run 'nova live-migration-force-complete'.
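  Workaround 2 goes through the Compute API rather than the hypervisor. As I understand the libvirt driver, a force-complete request switches the migration to post-copy when post-copy is permitted; otherwise it pauses the instance so that pre-copy can finish. That would explain why both workarounds succeed. The sketch below is minimal and assumes the nova CLI with credentials already loaded. The helper name and DRY_RUN switch are hypothetical, and the migration ID must be read from `nova server-migration-list`:

```shell
#!/bin/sh
# force_complete SERVER MIGRATION_ID (hypothetical helper): list the server's
# migrations, then ask nova to force-complete the given in-progress one.
# Set DRY_RUN=1 to print the commands instead of running them.
force_complete() {
  server="$1"
  migration_id="$2"
  if [ -z "$server" ] || [ -z "$migration_id" ]; then
    echo "usage: force_complete SERVER MIGRATION_ID" >&2
    return 2
  fi
  if [ -n "${DRY_RUN:-}" ]; then run="echo"; else run=""; fi
  # Show the migration records so the ID and status can be double-checked.
  $run nova server-migration-list "$server"
  $run nova live-migration-force-complete "$server" "$migration_id"
}
```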

  This may also be a libvirt bug: I see no "migrate-start-postcopy" command
  from nova or libvirt in the logs[4] until I triggered it manually with the
  command above, at 2021-11-12 19:14:08.053+0000[4].
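  One way to check whether nova or libvirt ever requested the switch is to search the source host's libvirt log for the QMP command name and print the timestamp of each match. The sketch below is minimal and makes two assumptions. First, libvirtd logs monitor traffic at debug level. Second, each log line starts with a timestamp in the same form as the one quoted above (for example `2021-11-12 19:14:08.053+0000: ...`). postcopy_requests is a hypothetical helper:

```shell
#!/bin/sh
# postcopy_requests LOGFILE (hypothetical helper): print the timestamp of
# every log line that mentions the QMP "migrate-start-postcopy" command.
# Assumes each line starts with "YYYY-MM-DD HH:MM:SS.mmm+ZZZZ: ".
postcopy_requests() {
  # Fixed-string match, keep the first two space-separated fields (date and
  # time), then drop the colon that follows the timestamp.
  grep -F 'migrate-start-postcopy' "$1" | cut -d' ' -f1,2 | sed 's/:$//'
}
```

  An empty result for the period before the manual trigger would match what this report describes.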

  Steps to reproduce
  ==================

  * Set up an OpenStack deployment with live_migration_permit_post_copy=False
  * Create a large VM (8+ CPUs) and install stress-ng
  * Run stress-ng:
    nohup stress-ng --vm 4 --vm-bytes 10% --vm-method write64 --vm-addr-method pwr2 -t 1h &
  * Live-migrate the VM and check the source host logs for messages like:
    'Migration running for \d+ secs, memory \d+% remaining'
    The remaining percentage should oscillate as described above, and the
    migration should not complete.
  * Complete or cancel the above migration, set live_migration_permit_post_copy=True,
    restart the nova services on the compute hosts, and repeat the migration.
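  The oscillation described in the steps above can be checked mechanically by extracting the remaining-memory percentage from each progress message. The sketch below is minimal and assumes the nova-compute log contains lines matching the quoted 'Migration running for ... secs, memory ...% remaining' pattern. migration_progress is a hypothetical helper, and comparing the latest value with the lowest one is only a crude heuristic:

```shell
#!/bin/sh
# migration_progress LOGFILE (hypothetical helper): print "SECS PERCENT" for
# every "Migration running for N secs, memory M% remaining" message, then a
# one-line verdict comparing the latest percentage with the lowest one seen.
migration_progress() {
  sed -n 's/.*Migration running for \([0-9][0-9]*\) secs, memory \([0-9][0-9]*\)% remaining.*/\1 \2/p' "$1" |
    awk '
      {
        print                                  # one "SECS PERCENT" sample
        pct = $2 + 0
        if (NR == 1 || pct < min) min = pct    # lowest remaining so far
        last = pct                             # most recent sample
      }
      END {
        if (NR == 0) exit
        # A latest value above the lowest one means the remaining share
        # went back up, which is the oscillation described in this report.
        if (last > min) print "not converging: lowest " min "%, latest " last "%"
        else print "converging: latest " last "%"
      }
    '
}
```

  Running it against the source host's nova-compute.log during a stuck migration should show the percentage repeatedly rising and falling instead of reaching zero.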

  Expected result
  ===============
  The migration should complete every time.

  Actual result
  =============
  The migration never completes, because the VM's memory is never fully copied.

  Environment
  ===========
  1. Exact version of OpenStack you are running[8]

  21.2.1-0ubuntu1

  2. Which hypervisor did you use[8]?

  qemu-kvm: 4.2-3ubuntu6.18
  libvirt-daemon: 6.0.0-0ubuntu8.14

  3. Which storage type did you use?

  Shared Ceph

  4. Which networking type did you use?

  Open vSwitch, L3 HA

  Logs & Configs
  ==============

  [1] QMP commands: https://gist.github.com/sombrafam/5e8e991058001c2b3843c0d08b4cd7d1
  [2] Migration logs (completed manually with Workaround 1): https://gist.github.com/sombrafam/b74497150ae4ae32494ac5735189e149
  [3] nova-compute.log (source host): https://gist.github.com/sombrafam/b74497150ae4ae32494ac5735189e149
  [4] libvirt.log (source host): https://gist.github.com/sombrafam/69f05404d7097265140e1578ea50c00c
  [5] Migration list: https://gist.github.com/sombrafam/39b72e242e27b6a3123603db1faa7b19
  [6] nova.conf (destination host): https://gist.github.com/sombrafam/ad43b268e7f4b69e7da513a0f7a0095f
  [7] nova.conf (source host): https://gist.github.com/sombrafam/ab27b40e577fbe56d741f01e811f3a18
  [8] Package versions: https://gist.github.com/sombrafam/0622792d82750b2141b45580b625b69f
  [9] VM info: https://gist.github.com/sombrafam/57eaa4c4ba4b141dec9659ee01f25b6d

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-nova-compute/+bug/1950894/+subscriptions


