[Bug 1498196] [NEW] Live migration's assigned ports conflicts

 

Public bug reported:

It looks like the port generated for a live migration is not checked for already
being in use, and/or there is no retry to pick a new port if the chosen one gets
occupied by someone else in the meantime.

Here is an example of this behavior from the nova-compute log on the source
compute node:

2015-09-20T06:25:21.701157+00:00 info:  2015-09-20 06:25:21.700 17037 INFO nova.virt.libvirt.driver [-] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Instance spawned successfully.
2015-09-20T06:25:21.828941+00:00 info:  2015-09-20 06:25:21.828 17037 INFO nova.compute.manager [req-8fdf447a-48c4-4b41-8276-9459ae9e5a65 - - - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] VM 
Resumed (Lifecycle Event)
2015-09-20T06:25:37.349069+00:00 err:  2015-09-20 06:25:37.348 17037 ERROR nova.virt.libvirt.driver [req-8150b87f-f87b-4bec-8bab-561dd37605d5 820904596e1d422e9460f472b7b9672f 04ce0fe8f21a4a6b8535c5cefd9f8594 - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Live Migration failure: internal error: early end of file from monitor: possible problem:
2015-09-20T06:25:37.116947Z qemu-system-x86_64: -incoming tcp:[::]:49152: Failed to bind socket: Address already in use
2015-09-20T06:25:37.354837+00:00 info:  2015-09-20 06:25:37.354 17037 INFO nova.virt.libvirt.driver [req-8150b87f-f87b-4bec-8bab-561dd37605d5 820904596e1d422e9460f472b7b9672f 04ce0fe8f21a4a6b8535c5cefd9f8594 - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Migration running for 0 secs, memory 0% remaining; (bytes processed=0, remaining=0, total=0)
2015-09-20T06:25:37.856147+00:00 err:  2015-09-20 06:25:37.855 17037 ERROR nova.virt.libvirt.driver [req-8150b87f-f87b-4bec-8bab-561dd37605d5 820904596e1d422e9460f472b7b9672f 04ce0fe8f21a4a6b8535c5cefd9f8594 - - -] [instance: 60493be7-4b00-4f4f-a785-9d4aa6e74f58] Migration operation has aborted
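
The "Failed to bind socket: Address already in use" line is the incoming (destination-side) QEMU process failing to listen on the port it was given for the migration. For illustration only, here is a minimal, standalone sketch (not actual nova or libvirt code; the 49152-49215 range is just an assumption based on the 49152 seen above) of the probe-and-retry port selection that appears to be missing:

# Illustrative sketch only -- not actual nova/libvirt code.
# Probes ports in an assumed migration range with a test bind, mimicking
# the wildcard IPv6 bind QEMU attempts for "-incoming tcp:[::]:PORT".
import socket

def pick_free_migration_port(start=49152, end=49215):
    """Return the first port in [start, end] that can currently be bound."""
    for port in range(start, end + 1):
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        try:
            sock.bind(("::", port))
        except OSError:
            continue  # port already taken, try the next one
        finally:
            sock.close()
        # The probe socket is closed before QEMU binds, so another process
        # can still grab the port in that window -- which is why retrying
        # on "Address already in use" would be needed as well as probing.
        return port
    raise RuntimeError("no free port in the assumed migration range")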

Environment description:

root@node-169:~# nova-compute --version
2015.1.1

root@node-169:~# dpkg -l |grep 'nova-compute '|awk '{print $3}'
1:2015.1.1-1~u14.04+mos19662

Steps to reproduce:

This happens during Rally testing of a fairly large environment (~200 nodes), roughly once per 200 iterations, so the chances of hitting it at scale are significant. It should therefore be easy to reproduce under the following circumstances:
1. Very high rate of migrations.
2. A lot of running VMs and other services holding a large number of TCP ports.

Both of these factors increase the chance of a collision in the QEMU
migration port allocation procedure.
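
As a rough, back-of-the-envelope illustration (all numbers here are assumptions, not measurements), treat the migration port range as 64 ports wide (e.g. 49152-49215, consistent with the 49152 in the log) and suppose some of those ports are already busy while several migrations pick ports independently at random:

# Toy model with assumed numbers; real allocators scan the range
# sequentially, so this only shows the trend, not an exact probability.
def collision_probability(range_size=64, busy_ports=10, migrations=5):
    p_one_migration_ok = (range_size - busy_ports) / range_size
    return 1 - p_one_migration_ok ** migrations

print(round(collision_probability(), 2))  # ~0.57 with the assumed numbers

The point is only that both more occupied ports and more concurrent migrations push this probability up.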

** Affects: mos
     Importance: Undecided
         Status: New

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: scale

** Project changed: nova => mos

** Also affects: nova
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1498196

Title:
  Live migration's assigned ports conflicts

Status in Mirantis OpenStack:
  New
Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/mos/+bug/1498196/+subscriptions

