[Bug 1798690] Re: Live migrate of iscsi-backed VM loses internal network connectivity
I have not looked into this closely, but my guess is that it could be related to the ARP suppression rules used in the DVR case not being updated correctly. That said, it is just a guess, so there may be something more going on here.
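One way to check that guess (assuming the l2population ARP responder is in use and its flows live in table 21 of br-tun, which can vary by deployment and network type) is to dump those flows on the destination compute host after the migration and look for an entry matching the peer VM's fixed IP, for example:
ovs-ofctl dump-flows br-tun table=21 | grep <peer-vm-fixed-ip>
If no matching ARP responder entry shows up on the destination host, that would point in this direction.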
Eric, can you try doing a hard reboot of the migrated instance and see if
that corrects the connectivity to the internal network IPs?
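For example, using the instance UUID from the report:
openstack server reboot --hard d3d45afb-e913-4cb7-89df-a1c1d51d6339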
It would also be helpful to know whether you are using the iptables firewall or the openvswitch firewall,
and whether, following the migration, the port status and port admin status are ACTIVE/UP.
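Something along these lines should show the port state after the migration (the port ID is whatever the instance actually has):
openstack port list --server d3d45afb-e913-4cb7-89df-a1c1d51d6339
openstack port show <port-id> -c status -c admin_state_up
The firewall driver is set in the openvswitch agent config, typically /etc/neutron/plugins/ml2/openvswitch_agent.ini on a plain install (under Kolla the file lives in the agent's container config directory):
grep firewall_driver /etc/neutron/plugins/ml2/openvswitch_agent.ini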
I may not have time to help further, but I'll try to check in on this bug again in a few days.
From a Nova perspective I don't think this is a Nova bug, or an os-vif one for that matter, but I have been investigating live-migration-related issues this cycle and this is yet another edge case that appears
to need fixing.
** Changed in: nova
Status: New => Opinion
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1798690
Title:
Live migrate of iscsi-backed VM loses internal network connectivity
Status in neutron:
New
Status in OpenStack Compute (nova):
Opinion
Bug description:
Description
===========
Note that this may be a Neutron issue, but since it is happening
during live migration, I wanted to point it out to the Nova group
first, and let them decide whether to include the Neutron group on
this ticket.
Also note that this may not be related to iSCSI at all - I just don't
have access to Ceph-backed VMs at the moment to test.
Live migration of a VM that uses an iSCSI-backed volume-based boot
disk (no other disks attached) will migrate correctly, including the
volume, and DVR router functionality with floating IPs, but internal
network connectivity won't work (pings between VMs on the same Neutron
network fail).
After live migrating the "bad" VM back to the original host, internal
networking works again!
NOTE - this seems to be only reproducible if you deploy the VMs, do
"not" ping between the VMs, migrate one of the VMs, and "then" ping
between the VMs. The ping fails in this case. In the case where
pings are performed "prior" to migration, the pings succeed!
So, it appears that something in Neutron isn't being migrated.
I had tested this configuration back in the Liberty days and ran into
the same issue, and thought it was possibly a bug that was fixed by
now, but it looks like the problem still exists.
Note that I'm still looking at logs to determine whether there is good
evidence for why/when this happens, but wanted to get a bug report
placed in case it was a known issue.
Steps to reproduce
==================
Deploy 2 VMs with an internal network, each with floating IPs, with
security groups that are not very restrictive (allow everything
including pings between VMs and the Internet).
In our case, the two VMs were deployed on separate physical hosts.
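As a rough sketch, the deployment step looks something like the following (image, flavor and network names are placeholders for this environment):
openstack server create --image <image> --flavor <flavor> --network <internal-net> vm1
openstack server create --image <image> --flavor <flavor> --network <internal-net> vm2
openstack floating ip create <external-net>
openstack server add floating ip vm1 <floating-ip-1>
openstack server add floating ip vm2 <floating-ip-2>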
If VM #2 resides on physical host compute002 after deployment, live migrate this VM to physical host compute003 with:
openstack server migrate --live compute003 d3d45afb-e913-4cb7-89df-a1c1d51d6339
From VM #2, ping VM #1. There is no ping response.
If you perform all of the above, but ping between the VMs "prior" to
migration, pings work fine after migrations (hiding the issue).
Expected result
===============
Network should function correctly after a migration - pings should
work, for example, between VMs.
Actual result
=============
Testing with VM to VM pings: pings are lost and connectivity "never"
resumes. I deployed the 2 VMs, migrated one of them, and started a
ping from one VM to the other, waited 16+ minutes, and pings are still
failing.
Perform a live migrate of VM #2 back to the original host using:
openstack server migrate --live compute002 d3d45afb-e913-4cb7-89df-a1c1d51d6339
and pings start to work again.
Perform a live migrate of VM #2 to the same host as VM #1 and pings
between VMs "also" work!
Environment
===========
stable/rocky deployment with Kolla-Ansible 7.0.0.0rc3devXX (the latest
as of October 15th, 2018) and Kolla 7.0.0.0rc3devXX
CentOS 7.5 with latest updates as of October 15, 2018.
Kernel: Linux 4.18.14-1.el7.elrepo.x86_64
Hypervisor: KVM
Storage: Blockbridge (unsupported, but functions the same as other
iSCSI based backends)
Networking: DVR with OpenVSwitch
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1798690/+subscriptions