← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1892870] Re: concurrent interface detach/attach may accidently unbind port

 

Reviewed:  https://review.opendev.org/747957
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=39831c55999abe97e3bd26a1c0db2e4ceb467041
Submitter: Zuul
Branch:    master

commit 39831c55999abe97e3bd26a1c0db2e4ceb467041
Author: Alexandre Arents <alexandre.arents@xxxxxxxxxxxx>
Date:   Mon Aug 24 14:05:27 2020 +0000

    Add a lock to prevent race during detach/attach of interface
    
    Only allow one detach/attach at a time with the same pattern instance-port_id
    in order to avoid race condition when multiple detach/attach are run
    concurrently.
    
    When multiple detach run concurrently on a specific instance-port_id,
    manager consider many of them as valid because info_cache still contains
    the port and info_cache is refreshed only once the first request complete.
    
    So during this gap of time, while the first request accomplishes the task,
    all subsequent requests are destined to fail and log a warning [1] in
    different location of code, depending on the outcome of the first request.
    The Issue is that all those caught requests finally run a
    deallocate_port_for_instance which will unbind the port.
    
    This may cause a race condition, because a successful attach can pass between
    those unbind, and be silently unbound, resulting in an infrastructure/DB
    inconsistency.
    
    [1] 'Detaching interface %(mac)s failed because the device is no longer found
         on the guest.'
    
    Closes-Bug: #1892870
    Change-Id: Iea5969d0bd16dc9a6f1ba950224b0115e466ce66


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1892870

Title:
  concurrent  interface detach/attach may accidently unbind port

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  
  Description
  ===========

  When we run multiple concurrent detach interface on the same instance uuid/port-id,
  a lot of those requests are accepted (HTTP 202) an processed because info_cache is updated
  only when first request finish, so all request after the first one will fail BUT will send
  an unbind that may unbind a port that have been just legitly attached to another instance.

  Ex of use case: a kube operator that move all interface from one
  instance to another(rollout).

  Steps to reproduce
  ==================

  On devstack master:

  INST=$(openstack server create  --flavor m1.small --image cirros-0.4.0-x86_64-disk   --nic net-id=private inst1 | awk '/ id/{print $4}')
  HOST_IP=$(hostname -I | awk '{print $1}')
  PORT=$(openstack port create --network private myport | awk '/ id/{print $4}')
  openstack server add port $INST $PORT
  TOKEN=$(openstack token issue|awk '/ id/{print $4}')

  # create detach script with this content:
  echo 'curl -g -i -X DELETE http://${HOST_IP}/compute/v2.1/servers/${INST}/os-interface/${PORT} -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: ${TOKEN}" -H "X-OpenStack-Nova-API-Version: 2.1"' > detach

  # open in another terminal neutron logs filtering on PUT:
  sudo journalctl -u devstack@q-svc.service  -f | grep -E PUT

  
  # Now send plenty of detach:
  for i in $(seq 1 20) ; do . detach ; sleep 0.2 ; done
  HTTP/1.1 202 Accepted
  ...
  HTTP/1.1 202 Accepted
  ...

  
  In neutron logs you will see plenty of unbinding request:
   sudo journalctl -u devstack@q-svc.service  -f | grep -E PUT
  Aug 25 12:15:01 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-9b454f98-a12c-49fb-8c39-e70094912029 req-914eaa3d-c139-416d-85b2-6d20e3e3c496 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1098 time: 1.1390624

                    ^^^ After the first PUT above, port is free to be
  attach.. if the app attach here the port risk to be accidently
  unbinded by other unbind PUT below:

  Aug 25 12:15:01 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-a0f2c9ab-419c-46e4-9488-684c7e173241 req-b4d96937-27ee-4c3f-b114-f02e86979605 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1098 time: 1.2870913
  Aug 25 12:15:02 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-91a3b2e6-bf4f-47b5-b05b-707ac48cb1e4 req-2a4c305e-ece6-4c4a-a7af-4c3371e12ea7 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1098 time: 1.7216117
  Aug 25 12:15:02 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-e298313f-6b30-4815-ae52-4716e0df3e0f req-aed7f7f4-9dca-41a7-84b5-1e59ef6c50e9 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1098 time: 1.4968698
  Aug 25 12:15:03 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-a7bc3121-13b4-484c-9f2f-c9b3fd4b225b req-424edb50-c998-4fa3-b26f-8d0d01ea474e service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 2.5676210
  Aug 25 12:15:04 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-62ed888a-6f35-425e-a872-e4a5758b8a92 req-7ce9ce3e-503d-4845-8663-c1b6d02b89ca service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 3.3949170
  Aug 25 12:15:05 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-4a10d91b-7431-4255-b27d-a72343d2482e req-5614949b-b1e7-4862-ad95-f06c4ba7936e service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 2.9760244
  Aug 25 12:15:05 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-682d7521-af6b-4c9a-b90e-9a8ea6ed45bc req-76e2d9ec-f8a8-4882-b4d6-8ea3f1c7f818 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 4.5097113
  Aug 25 12:15:06 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-ca379b27-7754-4125-8318-4e2ff7e97227 req-0e9ba555-6572-4388-a081-3b69cc92c2ec service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 5.1839015
  Aug 25 12:15:06 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-f6881052-baf4-4fb7-9767-739563331058 req-7a3194ff-0fad-4917-8d34-851a850cbae2 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 4.4715307
  Aug 25 12:15:07 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-1e7106c5-6dfa-4d36-a889-35198479add8 req-6c513399-c0f5-4d0b-9315-e2e30f1e6370 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 5.4051881
  Aug 25 12:15:10 alex-devstack neutron-server[1051]: INFO neutron.wsgi [req-909107a5-f03a-4ac4-8312-dcef83210908 req-38a6ef2b-b9f1-44cf-a823-ca67159115a4 service neutron] 54.39.183.136 "PUT /v2.0/ports/0466658a-02ef-46da-a4ac-9a335c6d3352 HTTP/1.1" status: 200  len: 1099 time: 8.5914369

  if an attach pass, and is accidently unbind as previously explain,
  apps can detect port is free and try to reattach an interface that is already
  attached in libvirt and you GOT HTTP 500 with those error message on compute node:
  oslo_db.exception.DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'fa:16:3e:d5:26:e1/0466658a-02ef-46da-a4ac-9a335c6d3352-0' for key 'uniq_virtual_interfaces0address
  0deleted'").

  
  Expected result
  ===============
  Only one unbind should be sent.

  
  Actual result
  =============
  Multiple unbind are sent and one of them can break a port just attached

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1892870/+subscriptions


References