yahoo-eng-team team mailing list archive
Message #75315
[Bug 1798690] [NEW] Live migrate of iscsi-backed VM loses internal network connectivity
Public bug reported:
Description
===========
Note that this may be a Neutron issue, but since it is happening during
live migration, I wanted to point it out to the Nova group first, and
let them decide whether to include the Neutron group on this ticket.
Also note that this may not be related to iSCSI at all - I just don't
have access to Ceph-backed VMs at the moment to test.
Live migration of a VM that uses an iSCSI-backed volume-based boot disk
(no other disks attached) will migrate correctly, including the volume,
and DVR router functionality with floating IPs, but internal network
connectivity won't work (pings between VMs on the same Neutron network
fail).
After live migrating the "bad" VM back to the original host, internal
networking works again!
NOTE - this seems to be reproducible only if you deploy the VMs, do
"not" ping between the VMs, migrate one of the VMs, and "then" ping
between the VMs. The ping fails in this case. In the case where pings
are performed "prior" to migration, the pings succeed!
So, it appears that something in Neutron isn't being migrated.
I had tested this configuration back in the Liberty days and ran into
the same issue, and thought it was possibly a bug that was fixed by now,
but it looks like the problem still exists.
Note that I'm still looking at logs to determine whether there is good
evidence for why/when this happens, but wanted to get a bug report
placed in case it was a known issue.
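One check that might help narrow this down is whether the Neutron port
binding followed the VM to the new host. The following is only a sketch
under assumptions, not output or tooling from this deployment: it assumes
admin credentials are loaded, and that the openstack client exposes the
binding host as the binding_host_id column. The function name and the
DRY_RUN switch are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the Neutron ports of the migrated VM and the host each is bound to.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them.
set -euo pipefail

SERVER_ID="${SERVER_ID:-d3d45afb-e913-4cb7-89df-a1c1d51d6339}"  # VM #2 from this report
DRY_RUN="${DRY_RUN:-1}"                                         # hypothetical switch

show_port_bindings() {
    if [ "$DRY_RUN" != "0" ]; then
        # Dry run: show what would be executed, PORT_ID being a placeholder.
        printf '+ openstack port list --server %s -f value -c ID\n' "$SERVER_ID"
        printf '+ openstack port show PORT_ID -f value -c binding_host_id\n'
        return 0
    fi
    local port_id
    for port_id in $(openstack port list --server "$SERVER_ID" -f value -c ID); do
        # Print "<port id> <host the port is bound to>" on one line.
        printf '%s ' "$port_id"
        openstack port show "$port_id" -f value -c binding_host_id
    done
}

show_port_bindings
```

If the binding host still shows the source host after the migration has
finished, that would point toward the port update rather than the volume.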
Steps to reproduce
==================
Deploy 2 VMs on an internal network, each with a floating IP, with
security groups that are not very restrictive (allow everything,
including pings between VMs and the Internet).
In our case, the two VMs were deployed on separate physical hosts.
If VM #2 resides on physical host compute002 after deployment, live migrate this VM to physical host compute003 with:
openstack server migrate --live compute003 d3d45afb-e913-4cb7-89df-a1c1d51d6339
From VM #2, ping VM #1. There is no ping response.
If you perform all of the above, but ping between the VMs "prior" to
migration, pings work fine after migration (hiding the issue).
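The steps above can be sketched as a small script. This is a rough
sketch, not the exact procedure that was run: the server ID and
compute003 come from this report, while the floating IP, the fixed IP,
the SSH user, the DRY_RUN switch and the function names are hypothetical
placeholders. It also assumes admin credentials, and that the host
reported by "openstack server show" matches the name given to --live.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction sequence.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to run them.
set -euo pipefail

SERVER_ID="${SERVER_ID:-d3d45afb-e913-4cb7-89df-a1c1d51d6339}"  # VM #2 (from the report)
TARGET_HOST="${TARGET_HOST:-compute003}"                        # destination (from the report)
VM2_FLOATING_IP="${VM2_FLOATING_IP:-203.0.113.10}"              # hypothetical
VM1_FIXED_IP="${VM1_FIXED_IP:-10.0.0.11}"                       # hypothetical
SSH_USER="${SSH_USER:-centos}"                                  # hypothetical
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

current_host() {
    openstack server show "$SERVER_ID" -f value -c OS-EXT-SRV-ATTR:host
}

wait_for_host() {
    # Poll until the server reports the target host; skipped in dry-run mode.
    [ "$DRY_RUN" = "0" ] || return 0
    local tries=0
    until [ "$(current_host)" = "$TARGET_HOST" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "timed out waiting for the migration to finish" >&2
            return 1
        fi
        sleep 5
    done
}

reproduce() {
    # Do NOT ping between the VMs before this point; that hides the issue.
    run openstack server migrate --live "$TARGET_HOST" "$SERVER_ID"
    wait_for_host
    # From VM #2, ping VM #1 on the internal network.
    # A non-zero exit here means the ping failed, i.e. the issue reproduced.
    run ssh "$SSH_USER@$VM2_FLOATING_IP" ping -c 5 "$VM1_FIXED_IP"
}

reproduce
```

Running it with the defaults only prints the two commands, which makes it
easy to review before pointing it at a real deployment with DRY_RUN=0.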
Expected result
===============
Network should function correctly after a migration - pings should work,
for example, between VMs.
Actual result
=============
Testing with VM to VM pings: pings are lost and connectivity "never"
resumes. I deployed the 2 VMs, migrated one of them, and started a ping
from one VM to the other, waited 16+ minutes, and pings are still
failing.
Perform a live migrate of VM #2 back to the original host using:
openstack server migrate --live compute002 d3d45afb-e913-4cb7-89df-a1c1d51d6339
and pings start to work again.
Perform a live migrate of VM #2 to the same host as VM #1 and pings
between VMs "also" work!
Environment
===========
stable/rocky deployment with Kolla-Ansible 7.0.0.0rc3devXX (the latest
as of October 15th, 2018) and Kolla 7.0.0.0rc3devXX
CentOS 7.5 with latest updates as of October 15, 2018.
Kernel: Linux 4.18.14-1.el7.elrepo.x86_64
Hypervisor: KVM
Storage: Blockbridge (unsupported, but functions the same as other
iSCSI-based backends)
Networking: DVR with Open vSwitch
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1798690
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1798690/+subscriptions