yahoo-eng-team team mailing list archive

[Bug 1955411] Re: Ping loss when live migration

 

Hi Yusuf,

Yes, this is expected. The exact amount of ping loss depends on the
network backend (OVS in your case), how busy/loaded the VM is, the
network bandwidth available for libvirt to copy the VM memory, and
whether auto-converge and/or post-copy is in use.

The following is an oversimplification, but it explains the general
idea.

When the VM is paused on the source host, libvirt needs to finish
copying the remaining uncopied memory to the destination, and the
network backend needs to switch its flow rules from the source to the
destination. How fast these two things happen depends on the factors
listed in the first paragraph.
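
As a rough illustration of how such an interruption turns into a count
of lost pings, here is a minimal Python sketch. It assumes ping's
default 1-second send interval, treats a ping as lost when its send
time falls inside the interruption, and ignores round-trip time. The
function names are hypothetical and not part of nova, libvirt or OVS.

```python
import math
from typing import Tuple


def expected_lost_pings(blackout_s: float, interval_s: float = 1.0) -> Tuple[int, int]:
    """Return the (min, max) number of pings sent during a blackout.

    Assumption: a ping is lost if and only if it is sent while the VM is
    unreachable. Where the blackout starts relative to the ping schedule
    decides whether the lower or the upper count is observed.
    """
    if blackout_s < 0 or interval_s <= 0:
        raise ValueError("blackout_s must be >= 0 and interval_s must be > 0")
    ratio = blackout_s / interval_s
    return math.floor(ratio), math.ceil(ratio)


def blackout_bounds(lost: int, interval_s: float = 1.0) -> Tuple[float, float]:
    """Return rough (lower, upper) bounds in seconds for a blackout that
    caused `lost` consecutive missing pings."""
    if lost < 0 or interval_s <= 0:
        raise ValueError("lost must be >= 0 and interval_s must be > 0")
    return max(lost - 1, 0) * interval_s, (lost + 1) * interval_s


if __name__ == "__main__":
    # A 4.5 s interruption loses 4 or 5 pings at a 1 s interval.
    print(expected_lost_pings(4.5))
    # Five consecutive lost pings suggest roughly a 4 to 6 s interruption.
    print(blackout_bounds(5))
```

Under these assumptions, a run of five consecutive missing sequence
numbers corresponds to an interruption of roughly 4 to 6 seconds.
Pinging at a shorter interval narrows that estimate.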

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1955411

Title:
  Ping loss when live migration

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Description
  ===========

  Hi, we are seeing 4 to 12 lost ping packets on our Victoria cluster
  when live migrating instances.

  Is this behaviour normal? We ran several tests with different flavors
  and different CPU/memory loads on the instance but still lose the same
  number of pings. (CPU/memory load does not affect the result.)

  Steps to reproduce
  ==================

  Live migrate an instance from host A to host B.

  Ping loss from instance to outside:

  root@test-migration-small-03:/home/myuser# ping 8.8.8.8
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=112 time=28.8 ms
  64 bytes from 8.8.8.8: icmp_seq=2 ttl=112 time=28.2 ms
  64 bytes from 8.8.8.8: icmp_seq=3 ttl=112 time=28.4 ms
  64 bytes from 8.8.8.8: icmp_seq=4 ttl=112 time=28.2 ms
  64 bytes from 8.8.8.8: icmp_seq=5 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=6 ttl=112 time=28.4 ms
  64 bytes from 8.8.8.8: icmp_seq=7 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=8 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=9 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=10 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=11 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=12 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=13 ttl=112 time=28.4 ms
  64 bytes from 8.8.8.8: icmp_seq=14 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=15 ttl=112 time=28.4 ms
  64 bytes from 8.8.8.8: icmp_seq=16 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=17 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=18 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=19 ttl=112 time=28.2 ms
  64 bytes from 8.8.8.8: icmp_seq=20 ttl=112 time=28.5 ms
  64 bytes from 8.8.8.8: icmp_seq=21 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=22 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=23 ttl=112 time=28.2 ms
  64 bytes from 8.8.8.8: icmp_seq=24 ttl=112 time=28.2 ms
  64 bytes from 8.8.8.8: icmp_seq=25 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=26 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=27 ttl=112 time=28.7 ms
  64 bytes from 8.8.8.8: icmp_seq=33 ttl=112 time=31.5 ms
  64 bytes from 8.8.8.8: icmp_seq=34 ttl=112 time=28.7 ms
  64 bytes from 8.8.8.8: icmp_seq=35 ttl=112 time=28.7 ms
  64 bytes from 8.8.8.8: icmp_seq=36 ttl=112 time=28.5 ms
  64 bytes from 8.8.8.8: icmp_seq=37 ttl=112 time=28.5 ms
  64 bytes from 8.8.8.8: icmp_seq=38 ttl=112 time=28.5 ms
  64 bytes from 8.8.8.8: icmp_seq=39 ttl=112 time=28.4 ms
  64 bytes from 8.8.8.8: icmp_seq=40 ttl=112 time=28.3 ms
  64 bytes from 8.8.8.8: icmp_seq=41 ttl=112 time=28.4 ms
  ^C
  --- 8.8.8.8 ping statistics ---
  41 packets transmitted, 36 received, 12.1951% packet loss, time 40198ms
  rtt min/avg/max/mdev = 28.186/28.462/31.511/0.534 ms
  root@test-migration-small-03:/home/myuser#

  Ping loss from outside to instance:

  mypc:~ mypc$ ping 10.216.12.220
  PING 10.216.12.220 (10.216.12.220): 56 data bytes
  64 bytes from 10.216.12.220: icmp_seq=0 ttl=59 time=20.188 ms
  64 bytes from 10.216.12.220: icmp_seq=1 ttl=59 time=35.334 ms
  64 bytes from 10.216.12.220: icmp_seq=2 ttl=59 time=33.305 ms
  64 bytes from 10.216.12.220: icmp_seq=3 ttl=59 time=28.945 ms
  64 bytes from 10.216.12.220: icmp_seq=4 ttl=59 time=25.146 ms
  64 bytes from 10.216.12.220: icmp_seq=5 ttl=59 time=21.234 ms
  64 bytes from 10.216.12.220: icmp_seq=6 ttl=59 time=19.734 ms
  64 bytes from 10.216.12.220: icmp_seq=7 ttl=59 time=18.885 ms
  64 bytes from 10.216.12.220: icmp_seq=8 ttl=59 time=18.350 ms
  64 bytes from 10.216.12.220: icmp_seq=9 ttl=59 time=32.273 ms
  64 bytes from 10.216.12.220: icmp_seq=10 ttl=59 time=28.046 ms
  64 bytes from 10.216.12.220: icmp_seq=11 ttl=59 time=24.079 ms
  64 bytes from 10.216.12.220: icmp_seq=12 ttl=59 time=22.562 ms
  64 bytes from 10.216.12.220: icmp_seq=13 ttl=59 time=35.110 ms
  64 bytes from 10.216.12.220: icmp_seq=14 ttl=59 time=30.782 ms
  64 bytes from 10.216.12.220: icmp_seq=15 ttl=59 time=29.286 ms
  64 bytes from 10.216.12.220: icmp_seq=16 ttl=59 time=21.181 ms
  64 bytes from 10.216.12.220: icmp_seq=17 ttl=59 time=23.114 ms
  64 bytes from 10.216.12.220: icmp_seq=18 ttl=59 time=19.452 ms
  64 bytes from 10.216.12.220: icmp_seq=19 ttl=59 time=20.370 ms
  64 bytes from 10.216.12.220: icmp_seq=20 ttl=59 time=147.181 ms
  64 bytes from 10.216.12.220: icmp_seq=21 ttl=59 time=30.509 ms
  Request timeout for icmp_seq 22
  64 bytes from 10.216.12.220: icmp_seq=23 ttl=59 time=29.559 ms
  64 bytes from 10.216.12.220: icmp_seq=24 ttl=59 time=23.758 ms
  64 bytes from 10.216.12.220: icmp_seq=25 ttl=59 time=21.762 ms
  64 bytes from 10.216.12.220: icmp_seq=26 ttl=59 time=33.365 ms
  64 bytes from 10.216.12.220: icmp_seq=27 ttl=59 time=32.682 ms
  64 bytes from 10.216.12.220: icmp_seq=28 ttl=59 time=29.312 ms
  Request timeout for icmp_seq 29
  Request timeout for icmp_seq 30
  Request timeout for icmp_seq 31
  Request timeout for icmp_seq 32
  Request timeout for icmp_seq 33
  64 bytes from 10.216.12.220: icmp_seq=34 ttl=59 time=35.776 ms
  64 bytes from 10.216.12.220: icmp_seq=35 ttl=59 time=30.732 ms
  64 bytes from 10.216.12.220: icmp_seq=36 ttl=59 time=27.614 ms
  64 bytes from 10.216.12.220: icmp_seq=37 ttl=59 time=23.491 ms
  64 bytes from 10.216.12.220: icmp_seq=38 ttl=59 time=22.426 ms
  64 bytes from 10.216.12.220: icmp_seq=39 ttl=59 time=37.707 ms
  64 bytes from 10.216.12.220: icmp_seq=40 ttl=59 time=31.962 ms
  64 bytes from 10.216.12.220: icmp_seq=41 ttl=59 time=30.519 ms
  ^C
  --- 10.216.12.220 ping statistics ---
  42 packets transmitted, 36 packets received, 14.3% packet loss
  round-trip min/avg/max/stddev = 18.350/30.437/147.181/20.488 ms
  mypc:~ mypc$

  We tested some config parameters such as:
   live_migration_downtime = 500
   live_migration_downtime_steps = 30
   live_migration_downtime_delay = 50
   live_migration_wait_for_vif_plug = true
   vif_plugging_timeout = 10
   vif_plugging_is_fatal = true
   live_migration_permit_post_copy = true
   live_migration_permit_auto_converge = true
   live_migration_bandwidth = <various bandwidths>

     from
  https://docs.openstack.org/nova/victoria/configuration/config.html

  Expected result
  ===============
  No ping loss should occur. (Or maybe 1 or 2 pings?)

  Actual result
  =============
  Lost too many pings.

  Migrate API request logs attached.

  Environment
  ===========
   OpenStack Victoria cluster installed via kolla-ansible on Ubuntu 20.04.2 LTS hosts. (Kernel: 5.4.0-90-generic)
   Hypervisor: Libvirt + KVM
   Storage: Ceph Cluster (version 15.2.10 octopus (stable))
   There are 5 controller+network nodes.
   nova-compute --version : 22.2.3
   libvirtd --version : libvirtd (libvirt) 6.0.0
   Networking type: Neutron with Open vSwitch | "neutron-openvswitch-agent", "neutron-l3-agent" and "neutron-server" version is "17.2.2.dev46" | Open vSwitch used in DVR mode with router HA configured. (l3_ha = true) | We are using a single centralized neutron router for connecting all tenant networks to the provider network. | Firewall driver: native Open vSwitch firewall driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1955411/+subscriptions


