
yahoo-eng-team team mailing list archive

[Bug 1719011] [NEW] Neutron port can get stuck in BUILD state

 

Public bug reported:

Sometimes a neutron port can get stuck in BUILD state, most likely due
to a timing issue.

I have a test environment where I can run something like the following
in a loop:

while true; do
    # Start all six test instances
    for i in {1..6}; do
        openstack server start test${i}
    done

    sleep 10

    # Stop them again
    for i in {1..6}; do
        openstack server stop test${i}
    done

    # Bail out as soon as any port is reported in BUILD state
    for i in {1..6}; do
        nova interface-list test${i} | grep BUILD && exit
    done
done

This assumes I have already created instances named test1, test2,
... test6.
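
For completeness, the instances can be created with something like the
loop below (the flavor, image and network names are placeholders for
whatever exists in the environment):

for i in {1..6}; do
    openstack server create --flavor m1.tiny --image cirros \
        --network private test${i}
done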

`nova interface-list <instance name>` sometimes returns BUILD state even
though the instance is SHUTOFF.

I've double-checked the neutron port and it does show the port in BUILD
state, so it's not Nova's network info cache (nw_cache) causing this.
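
As a sanity check, the port state can be confirmed straight from
neutron with something like the commands below (the port UUID is the
one identified further down in this report):

# List the ports attached to the instance
openstack port list --server test1

# Show the status of the port itself; once the instance is SHUTOFF it
# should be DOWN, not BUILD
openstack port show 4e8b8aff-974f-44be-bfdb-191e2644e84c -c status -f value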


Debugging further, I saw the following:

Looking at instance 'test1' (ID 182954ba-d343-4ed5-94d6-fa4852e52af0)
and its port 4e8b8aff-974f-44be-bfdb-191e2644e84c, I can see a
notification from neutron-server to nova when the vif is unplugged:

2017-09-22 04:16:50.104 31706 INFO neutron.notifiers.nova [-] Nova event
response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-44be-bfdb-
191e2644e84c', u'name': u'network-vif-unplugged', u'server_uuid':
u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}

And in nova-api.log:

2017-09-22 04:16:50.090 26493 INFO
nova.api.openstack.compute.server_external_events [req-c9ddf3a0-6773
-4d1e-8d80-d00470158c6d 4666a714be2140fb9327da6e03feeb6b
1870339bb62d479fb53324280f46e54f - default default] Creating event
network-vif-unplugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance
182954ba-d343-4ed5-94d6-fa4852e52af0

But then a few seconds later I see another notification for the vif
being plugged:

2017-09-22 04:16:56.624 31710 INFO neutron.notifiers.nova [-] Nova event
response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-44be-bfdb-
191e2644e84c', u'name': u'network-vif-plugged', u'server_uuid':
u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}

And in nova-api.log:

2017-09-22 04:16:56.610 26494 INFO
nova.api.openstack.compute.server_external_events [req-cff9e976-522c-
444e-a286-0023857f59a4 4666a714be2140fb9327da6e03feeb6b
1870339bb62d479fb53324280f46e54f - default default] Creating event
network-vif-plugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance
182954ba-d343-4ed5-94d6-fa4852e52af0
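
These notifications are neutron calling nova's os-server-external-events
API. Just for illustration, the equivalent request sent by hand would
look roughly like this (the endpoint URL and token are placeholders):

# Send a network-vif-plugged event for this port/instance by hand
curl -s -X POST "$NOVA_ENDPOINT/os-server-external-events" \
    -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
    -d '{"events": [{"name": "network-vif-plugged",
                     "tag": "4e8b8aff-974f-44be-bfdb-191e2644e84c",
                     "server_uuid": "182954ba-d343-4ed5-94d6-fa4852e52af0"}]}'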


Digging further into the openvswitch-agent.log, I can see the agent was sent a message to bring the port up:

2017-09-22 04:16:54.158 71348 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-44be-
bfdb-191e2644e84c updated. Details: {u'profile': {},
u'network_qos_policy_id': None, u'qos_policy_id': None,
u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id':
u'836d30a1-9dbf-48f2-9d2c-86bfb68bf73d', u'segmentation_id': 100,
u'device_owner': u'compute:None', u'physical_network': None,
u'mac_address': u'fa:16:3e:cd:9c:e1', u'device': u'4e8b8aff-974f-44be-
bfdb-191e2644e84c', u'port_security_enabled': True, u'port_id': u
'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'fixed_ips': [{u'subnet_id': u
'35c9353e-ffb6-4aa4-94ee-4c22d142d7f5', u'ip_address':
u'172.24.4.229'}], u'network_type': u'vxlan'}

Since the port's device is no longer on the integration bridge (libvirt
has already removed it), the agent just skips processing it:

2017-09-22 04:16:58.323 71348 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-
5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-44be-
bfdb-191e2644e84c was not found on the integration bridge and will
therefore not be processed
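
The agent matches neutron ports to OVS interfaces by the iface-id
stored in ovsdb, so the missing device can be double-checked with
something like this (no output means the device really is gone from
br-int):

sudo ovs-vsctl --columns=name find Interface \
    external_ids:iface-id=4e8b8aff-974f-44be-bfdb-191e2644e84c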

The problem is that this leaves the port in BUILD status: the server
puts the port into BUILD by default before sending the update message
to the agent, and since the agent never reports the device as up or
down afterwards, the port stays in that state indefinitely.
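
The stuck state is easy to watch for while the reproduction loop runs,
for example by polling the port status (a simple sketch using the port
from this report):

# When the race is hit, the value stays at BUILD even though the
# instance is SHUTOFF
watch -n1 "openstack port show 4e8b8aff-974f-44be-bfdb-191e2644e84c -c status -f value"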

I think that when the OVS agent skips processing a port like this, it
should notify the server to mark the port DOWN, since the device is not
currently present.  I have a patch I've been testing that I'll send out
for comments, as perhaps there is a better option.
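
For anyone who wants to look at the code path involved, the "not found
on the integration bridge" message above comes from the OVS agent
module named in the log, so it can be located in a neutron source
checkout with:

# Find where the agent decides to skip ports missing from br-int
grep -rn "not found on the integration bridge" \
    neutron/plugins/ml2/drivers/openvswitch/agent/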

** Affects: neutron
     Importance: Medium
     Assignee: Brian Haley (brian-haley)
         Status: In Progress


** Tags: ovs

https://bugs.launchpad.net/bugs/1719011
