yahoo-eng-team team mailing list archive

[Bug 1719011] Re: Neutron port can get stuck in BUILD state

 

Reviewed:  https://review.openstack.org/506770
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a789d23b0242ae3b4c13856e96a72a3acc5a3697
Submitter: Zuul
Branch:    master

commit a789d23b0242ae3b4c13856e96a72a3acc5a3697
Author: Brian Haley <bhaley@xxxxxxxxxx>
Date:   Fri Sep 22 16:23:11 2017 -0400

    Change OVS agent to update skipped port status to DOWN
    
    When the OVS agent skips processing a port because it was
    not found on the integration bridge, it doesn't send back
    any status to the server to notify it.  This can cause the
    port to get stuck in the BUILD state indefinitely, since
    that is the default state it gets before the server tells
    the agent to update it.
    
    The OVS agent will now notify the server that any skipped
    device should be considered DOWN if it did not exist.
    
    Change-Id: I15dc55951cdb75c6d87d7c645f8e2cbf82b2f3e4
    Closes-bug: #1719011
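The behavior described in the commit message can be sketched as a minimal, self-contained model (class and method names here are illustrative stand-ins, not neutron's actual code): devices skipped because they are missing from the integration bridge are collected and reported back as DOWN instead of being silently dropped.

```python
# Sketch of the fix: skipped devices are remembered and reported DOWN
# so the server can move them out of BUILD. All names are illustrative.

class FakePluginRpc:
    """Stand-in for the agent's RPC proxy back to the neutron server."""
    def __init__(self):
        self.reported_down = []

    def update_device_list(self, devices_up, devices_down):
        # Record what the agent told the server to mark DOWN.
        self.reported_down.extend(devices_down)


class OvsAgentSketch:
    def __init__(self, plugin_rpc, bridge_ports):
        self.plugin_rpc = plugin_rpc
        self.bridge_ports = bridge_ports  # ports present on br-int

    def treat_devices(self, devices):
        skipped = []
        for device in devices:
            if device not in self.bridge_ports:
                # Before the fix the agent only logged and moved on;
                # now it remembers the device so it can be marked DOWN.
                skipped.append(device)
                continue
            # ... normal flow programming would happen here ...
        if skipped:
            self.plugin_rpc.update_device_list(devices_up=[],
                                               devices_down=skipped)
        return skipped
```

With this model, an update for a port that is no longer on the bridge ends up in the server's DOWN list rather than leaving the port in BUILD.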


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1719011

Title:
  Neutron port can get stuck in BUILD state

Status in neutron:
  Fix Released

Bug description:
  Sometimes a neutron port can get stuck in BUILD state, most likely due
  to a timing issue.

  I have a test environment where I can run something like the following
  in a loop:

  while true; do
      for i in {1..6}; do
          openstack server start test${i}
      done

      sleep 10

      for i in {1..6}; do
          openstack server stop test${i}
      done

      for i in {1..6}; do
          nova interface-list test${i} | grep BUILD && exit
      done
  done

  This assumes I have already created instances named test1, test2 ...
  test6.

  `nova interface-list <instance name>` sometimes returns BUILD state
  even though the instance is SHUTOFF.

  I've double-checked the neutron port and it does show the port in
  BUILD state, so it's not the nova nw_cache causing this.

  
  Debugging further, I saw this:

  Looking at instance 'test1', which is ID
  182954ba-d343-4ed5-94d6-fa4852e52af0, port 4e8b8aff-974f-44be-bfdb-
  191e2644e84c, I can see a notification from the neutron-server to nova
  when the vif is unplugged:

  2017-09-22 04:16:50.104 31706 INFO neutron.notifiers.nova [-] Nova
  event response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-
  44be-bfdb-191e2644e84c', u'name': u'network-vif-unplugged',
  u'server_uuid': u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}

  And in nova-api.log:

  2017-09-22 04:16:50.090 26493 INFO
  nova.api.openstack.compute.server_external_events [req-c9ddf3a0-6773
  -4d1e-8d80-d00470158c6d 4666a714be2140fb9327da6e03feeb6b
  1870339bb62d479fb53324280f46e54f - default default] Creating event
  network-vif-unplugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for
  instance 182954ba-d343-4ed5-94d6-fa4852e52af0

  But then a few seconds later I see another notification for the vif
  being plugged:

  2017-09-22 04:16:56.624 31710 INFO neutron.notifiers.nova [-] Nova
  event response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-
  44be-bfdb-191e2644e84c', u'name': u'network-vif-plugged',
  u'server_uuid': u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}

  And in nova-api.log:

  2017-09-22 04:16:56.610 26494 INFO
  nova.api.openstack.compute.server_external_events [req-cff9e976-522c-
  444e-a286-0023857f59a4 4666a714be2140fb9327da6e03feeb6b
  1870339bb62d479fb53324280f46e54f - default default] Creating event
  network-vif-plugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance
  182954ba-d343-4ed5-94d6-fa4852e52af0

  
  Digging further into the openvswitch-agent.log I see it was sent a message to bring the port up:

  2017-09-22 04:16:54.158 71348 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
  5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-
  44be-bfdb-191e2644e84c updated. Details: {u'profile': {},
  u'network_qos_policy_id': None, u'qos_policy_id': None,
  u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id':
  u'836d30a1-9dbf-48f2-9d2c-86bfb68bf73d', u'segmentation_id': 100,
  u'device_owner': u'compute:None', u'physical_network': None,
  u'mac_address': u'fa:16:3e:cd:9c:e1', u'device': u'4e8b8aff-974f-44be-
  bfdb-191e2644e84c', u'port_security_enabled': True, u'port_id': u
  '4e8b8aff-974f-44be-bfdb-191e2644e84c', u'fixed_ips': [{u'subnet_id':
  u'35c9353e-ffb6-4aa4-94ee-4c22d142d7f5', u'ip_address':
  u'172.24.4.229'}], u'network_type': u'vxlan'}

  Since the port isn't there (libvirt has already removed it), the agent
  just skips processing it:

  2017-09-22 04:16:58.323 71348 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
  5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-
  44be-bfdb-191e2644e84c was not found on the integration bridge and
  will therefore not be processed

  The problem is that this seems to leave the port in BUILD status,
  which is the default state the server assigns before sending the
  message.  Since the agent never responds for a skipped port, it stays
  in that state indefinitely.
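  The server-side behavior described above can be modeled as a toy state
  machine (illustrative only, not neutron's actual implementation): the
  port is reset to BUILD when the update is sent to the agent, and only
  an explicit agent report moves it to ACTIVE or DOWN.

```python
# Toy model of the port status lifecycle: the server resets a port to
# BUILD before notifying the agent, and only an agent report moves it
# out. If the agent skips the port, it is stuck in BUILD forever.

BUILD, ACTIVE, DOWN = "BUILD", "ACTIVE", "DOWN"

class PortStatusModel:
    def __init__(self):
        self.status = {}

    def notify_agent_of_update(self, port_id):
        # Default state on the server side before the agent answers.
        self.status[port_id] = BUILD

    def agent_report(self, port_id, up):
        self.status[port_id] = ACTIVE if up else DOWN
```

  In the pre-fix behavior, `notify_agent_of_update()` with no matching
  `agent_report()` leaves the port in BUILD, which is exactly the stuck
  state observed here.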

  I think that when the OVS agent skips processing a port, it should
  notify the server to update the port to the DOWN state, since the port
  is not currently available.  I have a patch I've been testing that
  I'll send out for comments, as perhaps there is a better option.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1719011/+subscriptions
