
yahoo-eng-team team mailing list archive

[Bug 1849802] Re: Live-migration fails without a logged root cause

 

I used my lab to reproduce the original report and understand what is
actually going on. In short, Nova is unable to live-migrate an instance
because, when Nova Conductor tries to create a port binding on the
destination host, Neutron refuses to create it (a binding for that host
already exists).
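A stale binding like this can be spotted before retrying a migration. The
sketch below is hypothetical: it assumes the operator has already fetched
the JSON body of `GET /v2.0/ports/{port_id}/bindings` (the port bindings
API that the failing `POST` in [1] belongs to) and that the body has the
shape shown in the comment. The helper names are invented for illustration.

```python
import json

# Hypothetical helpers, not part of Neutron or Nova.
# Assumed response shape:
#   {"bindings": [{"host": "...", "status": "ACTIVE" | "INACTIVE", ...}, ...]}


def binding_for_host(bindings_body: str, host: str):
    """Return the binding dict for `host`, or None if there is none."""
    data = json.loads(bindings_body)
    for binding in data.get("bindings", []):
        if binding.get("host") == host:
            return binding
    return None


def would_conflict(bindings_body: str, dest_host: str) -> bool:
    """True if creating a binding for dest_host would likely hit the
    'binding already exists' conflict (HTTP 409) described above."""
    return binding_for_host(bindings_body, dest_host) is not None
```

For example, with an illustrative body listing an INACTIVE binding for
`compute-0.redhat.local`, `would_conflict(body, "compute-0.redhat.local")`
returns `True`, so the leftover binding would need attention before the
migration is retried.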

There could be multiple causes of such behavior, and this bug is not
about fixing any particular root cause. Instead, the problem is that it
is hard to isolate the issue using Neutron Server logs.

When debug logging is enabled, Neutron generates the logs shown in [1]
for such requests. For some reason, the following exception raised by
Neutron does not appear to be logged properly on the Neutron side: we
can see it in Nova logs, but not in Neutron Server logs:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L2463-L2466

As a result, it looks like this problem should be solved by improving
logging, either for Neutron extensions in general or for this particular
function.
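One way to read this suggestion is to log the conflict where it is raised,
before it is turned into an HTTP 409 for the client. The sketch below
illustrates the idea in plain Python and is not Neutron's code:
`PortBindingAlreadyExists`, `log_conflicts` and the toy
`create_port_binding` are hypothetical stand-ins, and the logger name is
an assumption.

```python
import functools
import logging

# Logger name assumed for illustration.
LOG = logging.getLogger("neutron.plugins.ml2.plugin")


class PortBindingAlreadyExists(Exception):
    """Hypothetical stand-in for the conflict raised when a binding for
    (port_id, host) already exists."""

    def __init__(self, port_id, host):
        super().__init__(
            f"Binding for port {port_id} on host {host} already exists.")
        self.port_id = port_id
        self.host = host


def log_conflicts(func):
    """Log a binding conflict on the server side, then re-raise it so the
    API layer can still answer the caller with HTTP 409."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except PortBindingAlreadyExists as e:
            LOG.warning("%s failed: %s", func.__name__, e)
            raise
    return wrapper


@log_conflicts
def create_port_binding(existing, port_id, host):
    """Toy create function: `existing` is a set of (port_id, host) pairs."""
    if (port_id, host) in existing:
        raise PortBindingAlreadyExists(port_id, host)
    existing.add((port_id, host))
    return {"host": host, "status": "INACTIVE"}
```

The design choice is that the exception still propagates unchanged, so the
client-visible behaviour stays the same while the specific reason for the
409 lands in the Neutron Server log.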

With that being said, I am changing affected product to Neutron and
unassigning this bug.

[1]
  2021-10-18 14:15:58.081 18 DEBUG neutron.api.v2.base [req-c237e714-0085-47c6-abab-e7e15cec7ea1 d79d0393fbb04564bc8fd3c62d290087 b8715d57125f4787a6701319d38f61e3 - default default] Request body: {'binding': {'host': 'compute-0.redhat.local', 'vnic_type': 'normal', 'profile': {}}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:719
  2021-10-18 14:15:58.081 18 DEBUG neutron.api.v2.base [req-c237e714-0085-47c6-abab-e7e15cec7ea1 d79d0393fbb04564bc8fd3c62d290087 b8715d57125f4787a6701319d38f61e3 - default default] Unknown quota resources ['binding']. _create /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:490
  2021-10-18 14:15:58.135 18 INFO neutron.api.v2.resource [req-c237e714-0085-47c6-abab-e7e15cec7ea1 d79d0393fbb04564bc8fd3c62d290087 b8715d57125f4787a6701319d38f61e3 - default default] create failed (client error): There was a conflict when trying to complete your request.
  2021-10-18 14:15:58.137 18 INFO neutron.wsgi [req-c237e714-0085-47c6-abab-e7e15cec7ea1 d79d0393fbb04564bc8fd3c62d290087 b8715d57125f4787a6701319d38f61e3 - default default] 172.17.1.17 "POST /v2.0/ports/7542a977-0586-423a-ae35-86e3ff791060/bindings HTTP/1.1" status: 409  len: 364 time: 0.3800023
  2021-10-18 14:15:58.316 21 DEBUG neutron_lib.callbacks.manager [req-334ebddd-4e81-4c12-829c-64f3b0a278ff - - - - -] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8793714910767', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-16758855'] for agent, after_update _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
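When isolating such failures, the request ID printed by `neutron.wsgi`
can be used to pull every Neutron Server line belonging to the failed
`POST .../bindings` call. A minimal sketch, assuming the log line format
shown in [1]; the regular expression and function names are illustrative
and not part of any Neutron tooling.

```python
import re

# Matches neutron.wsgi access lines such as the 409 line in [1].
WSGI_RE = re.compile(
    r'neutron\.wsgi \[(?P<req_id>req-[0-9a-f-]+)[^\]]*\] '
    r'(?P<client>\S+) "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'status: (?P<status>\d{3})'
)


def binding_conflicts(lines):
    """Return (request_id, path) for every port-binding POST answered with 409."""
    hits = []
    for line in lines:
        m = WSGI_RE.search(line)
        if (m and m["method"] == "POST" and m["status"] == "409"
                and m["path"].endswith("/bindings")):
            hits.append((m["req_id"], m["path"]))
    return hits


def lines_for_request(lines, req_id):
    """All log lines tagged with the given request ID, in original order."""
    return [line for line in lines if req_id in line]
```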

** Changed in: nova
     Assignee: Alexey Stupnikov (astupnikov) => (unassigned)

** Project changed: nova => neutron

** Changed in: neutron
       Status: Triaged => New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1849802

Title:
  Live-migration fails without a logged root cause

Status in neutron:
  New

Bug description:
  Operating system distribution and version: Ubuntu 18.04

  Neutron package version: 2:14.0.2-0ubuntu1~cloud0
  Nova package version: 2:19.0.1-0ubuntu2.1~cloud0

  The cloud was deployed (using Juju charms) as Rocky, then upgraded to
  Stein.

  There are a number of instances that I need to migrate from one
  compute node to another, but:

  $ openstack server migrate --block-migration 8703d9db-81b0-4e86-a2ef-c4ba5250556c --live shinx --disk-overcommit
  Migration pre-check error: Binding failed for port 5a3c5d23-8727-47d2-af72-a53b495358d2, please check neutron logs for more information. (HTTP 400) (Request-ID: req-7c41ae70-6f5b-48a8-9d09-add2bbbe2b7e)
  $ 

  However, even with debug logging enabled, all that shows up in the
  neutron-api logs is:

  2019-10-25 09:34:12.147 1569534 INFO neutron.wsgi [req-ac358ed5-cfec-4618-b765-f2defd5a3aac 92e98c5c687a46d29ec28aca3025f3da 7555fff7e7eb4a0eb28149905b266a2b - 207337407e3647798c0f68a0a28a0f8b 207337407e3647798c0f68a0a28a0f8b] 10.x.y.z,127.0.0.1 "POST /v2.0/ports/5a3c5d23-8727-47d2-af72-a53b495358d2/bindings HTTP/1.1" status: 409  len: 371 time: 0.1632745

  This suggests that, for some reason, the API call to create a port
  binding is failing, but there is no further information in the logs
  for me to debug exactly why.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1849802/+subscriptions


