yahoo-eng-team team mailing list archive
Message #68869
[Bug 1719011] Re: Neutron port can get stuck in BUILD state
Reviewed: https://review.openstack.org/506770
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a789d23b0242ae3b4c13856e96a72a3acc5a3697
Submitter: Zuul
Branch: master
commit a789d23b0242ae3b4c13856e96a72a3acc5a3697
Author: Brian Haley <bhaley@xxxxxxxxxx>
Date: Fri Sep 22 16:23:11 2017 -0400
Change OVS agent to update skipped port status to DOWN
When the OVS agent skips processing a port because it was
not found on the integration bridge, it doesn't send back
any status to the server to notify it. This can cause the
port to get stuck in the BUILD state indefinitely, since
that is the default state it gets before the server tells
the agent to update it.
The OVS agent will now notify the server that any skipped
device should be considered DOWN if it did not exist.
Change-Id: I15dc55951cdb75c6d87d7c645f8e2cbf82b2f3e4
Closes-bug: #1719011
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1719011
Title:
Neutron port can get stuck in BUILD state
Status in neutron:
Fix Released
Bug description:
Sometimes a neutron port can get stuck in BUILD state, most likely due
to a timing issue.
I have a test environment where I can run something like the following
in a loop:
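  while true; do
      # Start all six test instances, then give them time to boot.
      for i in {1..6}; do
          openstack server start "test${i}"
      done
      sleep 10
      # Stop them again.
      for i in {1..6}; do
          openstack server stop "test${i}"
      done
      # Exit the loop as soon as any interface reports BUILD.
      for i in {1..6}; do
          nova interface-list "test${i}" | grep BUILD && exit
      done
  done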
This assumes I have already created instances named
test1, test2 ... test6.
`nova interface-list <instance name>` sometimes returns BUILD state
even though the instance is SHUTOFF.
I've double-checked the neutron port and it does show the port in
BUILD state, so it's not the nova nw_cache causing this.
Debugging further I saw this:
Looking at instance 'test1', which is ID
182954ba-d343-4ed5-94d6-fa4852e52af0, port
4e8b8aff-974f-44be-bfdb-191e2644e84c, I can see a notification from
the neutron-server to nova when the vif is unplugged:
2017-09-22 04:16:50.104 31706 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'name': u'network-vif-unplugged', u'server_uuid': u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}
And in nova-api.log:
2017-09-22 04:16:50.090 26493 INFO nova.api.openstack.compute.server_external_events [req-c9ddf3a0-6773-4d1e-8d80-d00470158c6d 4666a714be2140fb9327da6e03feeb6b 1870339bb62d479fb53324280f46e54f - default default] Creating event network-vif-unplugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance 182954ba-d343-4ed5-94d6-fa4852e52af0
But then a few seconds later I see another notification for the vif
being plugged:
2017-09-22 04:16:56.624 31710 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'name': u'network-vif-plugged', u'server_uuid': u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}
And in nova-api.log:
2017-09-22 04:16:56.610 26494 INFO nova.api.openstack.compute.server_external_events [req-cff9e976-522c-444e-a286-0023857f59a4 4666a714be2140fb9327da6e03feeb6b 1870339bb62d479fb53324280f46e54f - default default] Creating event network-vif-plugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance 182954ba-d343-4ed5-94d6-fa4852e52af0
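To line up the unplugged and plugged events for one port, the neutron.notifiers.nova lines can be parsed mechanically. The following is only a sketch: the helper name parse_nova_event is made up for illustration, and it assumes each log entry has been joined back onto a single line in the format shown above. The payload is a Python 2 dict repr, which ast.literal_eval still accepts under Python 3.

```python
import ast
import re

# Matches the dict payload that neutron.notifiers.nova logs after this prefix.
_EVENT_RE = re.compile(r"Nova event response: (\{.*\})")


def parse_nova_event(line):
    """Return (timestamp, event name, port id) for one log line, or None.

    Hypothetical helper for reading these logs; not part of Neutron or Nova.
    """
    match = _EVENT_RE.search(line)
    if match is None:
        return None
    # literal_eval handles the u'' string prefixes in the logged dict.
    payload = ast.literal_eval(match.group(1))
    # The first two whitespace-separated fields are the date and the time.
    timestamp = " ".join(line.split()[:2])
    return timestamp, payload["name"], payload["tag"]


LOG_LINES = [
    "2017-09-22 04:16:50.104 31706 INFO neutron.notifiers.nova [-] Nova event "
    "response: {u'status': u'completed', u'tag': "
    "u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'name': "
    "u'network-vif-unplugged', u'server_uuid': "
    "u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}",
    "2017-09-22 04:16:56.624 31710 INFO neutron.notifiers.nova [-] Nova event "
    "response: {u'status': u'completed', u'tag': "
    "u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'name': "
    "u'network-vif-plugged', u'server_uuid': "
    "u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}",
]

events = [parse_nova_event(line) for line in LOG_LINES]
for timestamp, name, port_id in events:
    print(timestamp, name, port_id)
```

Sorting the resulting tuples by timestamp makes the order obvious: the unplugged event at 04:16:50 is followed by a plugged event for the same port at 04:16:56.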
Digging further into the openvswitch-agent.log I see it was sent a message to bring the port up:
2017-09-22 04:16:54.158 71348 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-44be-bfdb-191e2644e84c updated. Details: {u'profile': {}, u'network_qos_policy_id': None, u'qos_policy_id': None, u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id': u'836d30a1-9dbf-48f2-9d2c-86bfb68bf73d', u'segmentation_id': 100, u'device_owner': u'compute:None', u'physical_network': None, u'mac_address': u'fa:16:3e:cd:9c:e1', u'device': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'port_security_enabled': True, u'port_id': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'fixed_ips': [{u'subnet_id': u'35c9353e-ffb6-4aa4-94ee-4c22d142d7f5', u'ip_address': u'172.24.4.229'}], u'network_type': u'vxlan'}
Since the port isn't there (libvirt has removed it), the agent just
skips processing it:
2017-09-22 04:16:58.323 71348 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-44be-bfdb-191e2644e84c was not found on the integration bridge and will therefore not be processed
The problem is that this seems to cause the port to change to BUILD
status, which is the default state on the server side before sending
the message. Since the agent never responds, the port stays in that
state indefinitely.
I think when the OVS agent skips processing a port, it should notify
the server that the port should be updated to the DOWN state, since
the port is not currently available. I have a patch I've been testing
that I'll send out for comments, as perhaps there is a better option.
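As a rough illustration of the idea, here is a toy model of the exchange. It is not the patch and not Neutron code: ToyServer, ToyAgent, report_device_status and the report_skipped flag are all hypothetical names invented for this sketch. The only behavior taken from the report is that the server puts the port in BUILD when it sends the update, and that the agent skips ports it cannot find on the integration bridge.

```python
BUILD, ACTIVE, DOWN = "BUILD", "ACTIVE", "DOWN"


class ToyServer:
    """Hypothetical stand-in for the server's port status tracking."""

    def __init__(self):
        self.port_status = {}

    def send_port_update(self, port_id):
        # The port sits in BUILD until the agent reports a status back.
        self.port_status[port_id] = BUILD

    def report_device_status(self, devices_up, devices_down):
        # Hypothetical callback the agent uses to report port status.
        for port_id in devices_up:
            self.port_status[port_id] = ACTIVE
        for port_id in devices_down:
            self.port_status[port_id] = DOWN


class ToyAgent:
    """Hypothetical stand-in for the OVS agent's port processing."""

    def __init__(self, server, bridge_ports, report_skipped):
        self.server = server
        self.bridge_ports = set(bridge_ports)
        self.report_skipped = report_skipped

    def process_updated_ports(self, port_ids):
        present = [p for p in port_ids if p in self.bridge_ports]
        # Ports missing from the integration bridge are skipped.
        skipped = [p for p in port_ids if p not in self.bridge_ports]
        if present:
            self.server.report_device_status(present, [])
        if skipped and self.report_skipped:
            # Proposed behavior: tell the server the skipped ports are DOWN.
            self.server.report_device_status([], skipped)
        return skipped


def run(report_skipped):
    port_id = "4e8b8aff-974f-44be-bfdb-191e2644e84c"
    server = ToyServer()
    server.send_port_update(port_id)
    # libvirt has already removed the port, so the bridge is empty.
    agent = ToyAgent(server, bridge_ports=[], report_skipped=report_skipped)
    agent.process_updated_ports([port_id])
    return server.port_status[port_id]


print(run(report_skipped=False))  # BUILD: nothing was reported, port is stuck
print(run(report_skipped=True))   # DOWN: the skipped port was reported
```

With report_skipped=False the model reproduces the stuck port; with report_skipped=True the server ends up with the port in DOWN, which is the outcome proposed above.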