yahoo-eng-team team mailing list archive
Message #88819
[Bug 1955411] Re: Ping loss when live migration
Hi Yusuf,
Yes, this is expected. The exact quantity of ping loss will depend on
the network backend (OVS in your case), how busy/loaded the VM is, the
available network bandwidth for libvirt to copy the VM memory, as well
as whether autoconverge and/or post-copy is in use.
The following is an oversimplification, but it explains the general
idea.
When the VM is paused on the source host, libvirt needs to finish
copying the remaining yet-uncopied memory to the destination, and the
network backend needs to switch its flow rules from the source to the
destination. How fast these two things happen depends on the factors
listed in the first paragraph.
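To put a number on the outage window, one option is to look for gaps in
the icmp_seq values of the ping output. The following is a minimal
sketch, not part of nova: the function name missing_icmp_seqs is
hypothetical, and it assumes replies are printed as "icmp_seq=N" (as in
the Linux and macOS captures in this bug), so timed-out or silently
dropped probes simply show up as missing numbers.

```python
import re


def missing_icmp_seqs(ping_output: str) -> list[int]:
    """Return the icmp_seq numbers missing between the first and last reply.

    Only reply lines ("... icmp_seq=N ...") are matched. Timeout lines such
    as "Request timeout for icmp_seq 22" contain no "=" and are therefore
    counted as missing, which is the intended behaviour.
    """
    replied = {int(n) for n in re.findall(r"icmp_seq=(\d+)", ping_output)}
    if not replied:
        return []
    first, last = min(replied), max(replied)
    return [n for n in range(first, last + 1) if n not in replied]


if __name__ == "__main__":
    # Hypothetical excerpt shaped like the captures in the bug description.
    sample = """\
64 bytes from 8.8.8.8: icmp_seq=26 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=27 ttl=112 time=28.7 ms
64 bytes from 8.8.8.8: icmp_seq=33 ttl=112 time=31.5 ms
64 bytes from 8.8.8.8: icmp_seq=34 ttl=112 time=28.7 ms
"""
    lost = missing_icmp_seqs(sample)
    print(f"lost {len(lost)} replies: {lost}")
```

For the instance-to-outside capture in the bug description, the replies
jump from icmp_seq=27 to icmp_seq=33, so this reports sequence numbers
28 through 32 as lost. At ping's default one-second interval that
corresponds to an interruption of roughly five seconds.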
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1955411
Title:
Ping loss when live migration
Status in OpenStack Compute (nova):
Invalid
Bug description:
Description
===========
Hi, we are seeing 4 to 12 lost ping packets on our Victoria cluster
when live migrating instances.
Is this behaviour normal? We ran several tests with different flavors
and different CPU/memory loads on the instance but still lose the same
number of pings. (Memory/CPU load does not affect the result.)
Steps to reproduce
==================
Live migrate an instance from host A to host B.
Ping loss from instance to outside:
root@test-migration-small-03:/home/myuser# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=112 time=28.8 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=112 time=28.2 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=112 time=28.4 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=112 time=28.2 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=112 time=28.4 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=9 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=10 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=11 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=12 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=13 ttl=112 time=28.4 ms
64 bytes from 8.8.8.8: icmp_seq=14 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=15 ttl=112 time=28.4 ms
64 bytes from 8.8.8.8: icmp_seq=16 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=18 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=19 ttl=112 time=28.2 ms
64 bytes from 8.8.8.8: icmp_seq=20 ttl=112 time=28.5 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=112 time=28.2 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=112 time=28.2 ms
64 bytes from 8.8.8.8: icmp_seq=25 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=26 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=27 ttl=112 time=28.7 ms
64 bytes from 8.8.8.8: icmp_seq=33 ttl=112 time=31.5 ms
64 bytes from 8.8.8.8: icmp_seq=34 ttl=112 time=28.7 ms
64 bytes from 8.8.8.8: icmp_seq=35 ttl=112 time=28.7 ms
64 bytes from 8.8.8.8: icmp_seq=36 ttl=112 time=28.5 ms
64 bytes from 8.8.8.8: icmp_seq=37 ttl=112 time=28.5 ms
64 bytes from 8.8.8.8: icmp_seq=38 ttl=112 time=28.5 ms
64 bytes from 8.8.8.8: icmp_seq=39 ttl=112 time=28.4 ms
64 bytes from 8.8.8.8: icmp_seq=40 ttl=112 time=28.3 ms
64 bytes from 8.8.8.8: icmp_seq=41 ttl=112 time=28.4 ms
^C
--- 8.8.8.8 ping statistics ---
41 packets transmitted, 36 received, 12.1951% packet loss, time 40198ms
rtt min/avg/max/mdev = 28.186/28.462/31.511/0.534 ms
root@test-migration-small-03:/home/myuser#
Ping loss from outside to instance:
mypc:~ mypc$ ping 10.216.12.220
PING 10.216.12.220 (10.216.12.220): 56 data bytes
64 bytes from 10.216.12.220: icmp_seq=0 ttl=59 time=20.188 ms
64 bytes from 10.216.12.220: icmp_seq=1 ttl=59 time=35.334 ms
64 bytes from 10.216.12.220: icmp_seq=2 ttl=59 time=33.305 ms
64 bytes from 10.216.12.220: icmp_seq=3 ttl=59 time=28.945 ms
64 bytes from 10.216.12.220: icmp_seq=4 ttl=59 time=25.146 ms
64 bytes from 10.216.12.220: icmp_seq=5 ttl=59 time=21.234 ms
64 bytes from 10.216.12.220: icmp_seq=6 ttl=59 time=19.734 ms
64 bytes from 10.216.12.220: icmp_seq=7 ttl=59 time=18.885 ms
64 bytes from 10.216.12.220: icmp_seq=8 ttl=59 time=18.350 ms
64 bytes from 10.216.12.220: icmp_seq=9 ttl=59 time=32.273 ms
64 bytes from 10.216.12.220: icmp_seq=10 ttl=59 time=28.046 ms
64 bytes from 10.216.12.220: icmp_seq=11 ttl=59 time=24.079 ms
64 bytes from 10.216.12.220: icmp_seq=12 ttl=59 time=22.562 ms
64 bytes from 10.216.12.220: icmp_seq=13 ttl=59 time=35.110 ms
64 bytes from 10.216.12.220: icmp_seq=14 ttl=59 time=30.782 ms
64 bytes from 10.216.12.220: icmp_seq=15 ttl=59 time=29.286 ms
64 bytes from 10.216.12.220: icmp_seq=16 ttl=59 time=21.181 ms
64 bytes from 10.216.12.220: icmp_seq=17 ttl=59 time=23.114 ms
64 bytes from 10.216.12.220: icmp_seq=18 ttl=59 time=19.452 ms
64 bytes from 10.216.12.220: icmp_seq=19 ttl=59 time=20.370 ms
64 bytes from 10.216.12.220: icmp_seq=20 ttl=59 time=147.181 ms
64 bytes from 10.216.12.220: icmp_seq=21 ttl=59 time=30.509 ms
Request timeout for icmp_seq 22
64 bytes from 10.216.12.220: icmp_seq=23 ttl=59 time=29.559 ms
64 bytes from 10.216.12.220: icmp_seq=24 ttl=59 time=23.758 ms
64 bytes from 10.216.12.220: icmp_seq=25 ttl=59 time=21.762 ms
64 bytes from 10.216.12.220: icmp_seq=26 ttl=59 time=33.365 ms
64 bytes from 10.216.12.220: icmp_seq=27 ttl=59 time=32.682 ms
64 bytes from 10.216.12.220: icmp_seq=28 ttl=59 time=29.312 ms
Request timeout for icmp_seq 29
Request timeout for icmp_seq 30
Request timeout for icmp_seq 31
Request timeout for icmp_seq 32
Request timeout for icmp_seq 33
64 bytes from 10.216.12.220: icmp_seq=34 ttl=59 time=35.776 ms
64 bytes from 10.216.12.220: icmp_seq=35 ttl=59 time=30.732 ms
64 bytes from 10.216.12.220: icmp_seq=36 ttl=59 time=27.614 ms
64 bytes from 10.216.12.220: icmp_seq=37 ttl=59 time=23.491 ms
64 bytes from 10.216.12.220: icmp_seq=38 ttl=59 time=22.426 ms
64 bytes from 10.216.12.220: icmp_seq=39 ttl=59 time=37.707 ms
64 bytes from 10.216.12.220: icmp_seq=40 ttl=59 time=31.962 ms
64 bytes from 10.216.12.220: icmp_seq=41 ttl=59 time=30.519 ms
^C
--- 10.216.12.220 ping statistics ---
42 packets transmitted, 36 packets received, 14.3% packet loss
round-trip min/avg/max/stddev = 18.350/30.437/147.181/20.488 ms
mypc:~ mypc$
We tested some config parameters, such as:
live_migration_downtime = 500
live_migration_downtime_steps = 30
live_migration_downtime_delay = 50
live_migration_wait_for_vif_plug = true
vif_plugging_timeout = 10
vif_plugging_is_fatal = true
live_migration_permit_post_copy = true
live_migration_permit_auto_converge = true
live_migration_bandwidth = <various bandwidths>
from
https://docs.openstack.org/nova/victoria/configuration/config.html
Expected result
===============
No ping loss should occur (or maybe 1 or 2 pings at most).
Actual result
=============
Lost too many pings.
Migrate API request logs attached.
Environment
===========
OpenStack Victoria Cluster installed via kolla-ansible to Ubuntu 20.04.2 LTS Hosts. (Kernel:5.4.0-90-generic)
HyperVisor: Libvirt + KVM
Storage: Ceph Cluster (version 15.2.10 octopus (stable))
There are 5 controller+network nodes.
nova-compute --version : 22.2.3
libvirtd --version : libvirtd (libvirt) 6.0.0
Networking Type: Neutron with Open vSwitch
"neutron-openvswitch-agent", "neutron-l3-agent" and "neutron-server" versions are "17.2.2.dev46"
Open vSwitch is used in DVR mode with router HA configured (l3_ha = true)
We are using a single centralized neutron router to connect all tenant networks to the provider network.
Firewall driver: native Open vSwitch firewall driver
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1955411/+subscriptions