yahoo-eng-team team mailing list archive
Message #67883
[Bug 1719011] [NEW] Neutron port can get stuck in BUILD state
Public bug reported:
Sometimes a neutron port can get stuck in BUILD state, most likely due
to a timing issue.
I have a test environment where I can run something like the following
in a loop:
    while true;
    do
        for i in {1..6}
        do
            openstack server start test${i}
        done
        sleep 10
        for i in {1..6}
        do
            openstack server stop test${i}
        done
        for i in {1..6}
        do
            nova interface-list test${i} | grep BUILD && exit
        done
    done
This assumes I have already created instances named test1, test2, ...
test6.
`nova interface-list <instance name>` sometimes returns BUILD state even
though the instance is SHUTOFF.
I've double-checked the neutron port and it does show the port in BUILD
state, so it's not the nova nw_cache causing this.
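The inconsistent combination described above (instance SHUTOFF, port still BUILD) can be written as a small check. This is only a sketch: the record layout (dicts with `port_id`, `server_status` and `port_status` keys) and the name `find_stuck_ports` are assumptions made for illustration, not an output format of the nova or neutron clients.

```python
def find_stuck_ports(records):
    """Return the IDs of ports that are BUILD while their server is SHUTOFF."""
    return [
        r["port_id"]
        for r in records
        if r["server_status"] == "SHUTOFF" and r["port_status"] == "BUILD"
    ]


# The first record mirrors the state seen for test1 in this report;
# the second is a made-up healthy port for contrast.
records = [
    {
        "port_id": "4e8b8aff-974f-44be-bfdb-191e2644e84c",
        "server_status": "SHUTOFF",
        "port_status": "BUILD",
    },
    {
        "port_id": "hypothetical-port-2",
        "server_status": "SHUTOFF",
        "port_status": "DOWN",
    },
]

print(find_stuck_ports(records))
```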
Debugging further I saw this:
Looking at instance 'test1', which has ID
182954ba-d343-4ed5-94d6-fa4852e52af0 and port
4e8b8aff-974f-44be-bfdb-191e2644e84c, I can see a notification from the
neutron-server to nova when the vif is unplugged:
2017-09-22 04:16:50.104 31706 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'name': u'network-vif-unplugged', u'server_uuid': u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}
And in nova-api.log:
2017-09-22 04:16:50.090 26493 INFO nova.api.openstack.compute.server_external_events [req-c9ddf3a0-6773-4d1e-8d80-d00470158c6d 4666a714be2140fb9327da6e03feeb6b 1870339bb62d479fb53324280f46e54f - default default] Creating event network-vif-unplugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance 182954ba-d343-4ed5-94d6-fa4852e52af0
But then a few seconds later I see another notification for the vif
being plugged:
2017-09-22 04:16:56.624 31710 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'name': u'network-vif-plugged', u'server_uuid': u'182954ba-d343-4ed5-94d6-fa4852e52af0', u'code': 200}
And in nova-api.log:
2017-09-22 04:16:56.610 26494 INFO nova.api.openstack.compute.server_external_events [req-cff9e976-522c-444e-a286-0023857f59a4 4666a714be2140fb9327da6e03feeb6b 1870339bb62d479fb53324280f46e54f - default default] Creating event network-vif-plugged:4e8b8aff-974f-44be-bfdb-191e2644e84c for instance 182954ba-d343-4ed5-94d6-fa4852e52af0
Digging further into openvswitch-agent.log, I see the agent was sent a
message to bring the port up:
2017-09-22 04:16:54.158 71348 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-44be-bfdb-191e2644e84c updated. Details: {u'profile': {}, u'network_qos_policy_id': None, u'qos_policy_id': None, u'allowed_address_pairs': [], u'admin_state_up': True, u'network_id': u'836d30a1-9dbf-48f2-9d2c-86bfb68bf73d', u'segmentation_id': 100, u'device_owner': u'compute:None', u'physical_network': None, u'mac_address': u'fa:16:3e:cd:9c:e1', u'device': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'port_security_enabled': True, u'port_id': u'4e8b8aff-974f-44be-bfdb-191e2644e84c', u'fixed_ips': [{u'subnet_id': u'35c9353e-ffb6-4aa4-94ee-4c22d142d7f5', u'ip_address': u'172.24.4.229'}], u'network_type': u'vxlan'}
Since the port is no longer there (libvirt has removed it), the agent
just skips processing it:
2017-09-22 04:16:58.323 71348 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5b698e60-25a9-4c4a-ad93-9398402b1731 - - - - -] Port 4e8b8aff-974f-44be-bfdb-191e2644e84c was not found on the integration bridge and will therefore not be processed
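Because the excerpts come from three different logs, it can help to merge them into one timeline. The sketch below does that with the timestamps quoted in this report; the short event descriptions are my own paraphrases of the log lines, and the `timeline` helper is a name invented for this example.

```python
from datetime import datetime

# (timestamp, source, paraphrased event) taken from the excerpts in this report.
EVENTS = [
    ("2017-09-22 04:16:50.104", "neutron.notifiers.nova", "nova response for network-vif-unplugged"),
    ("2017-09-22 04:16:50.090", "nova-api.log", "creating event network-vif-unplugged"),
    ("2017-09-22 04:16:56.624", "neutron.notifiers.nova", "nova response for network-vif-plugged"),
    ("2017-09-22 04:16:56.610", "nova-api.log", "creating event network-vif-plugged"),
    ("2017-09-22 04:16:54.158", "openvswitch-agent.log", "port updated"),
    ("2017-09-22 04:16:58.323", "openvswitch-agent.log", "port not found on integration bridge, skipped"),
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def timeline(events):
    """Return the events sorted by their parsed timestamp."""
    return sorted(events, key=lambda e: datetime.strptime(e[0], TIMESTAMP_FORMAT))


for ts, source, what in timeline(EVENTS):
    print(ts, source, what)
```

Sorted this way, the agent's "port updated" line (04:16:54) falls between the unplugged and plugged notifications, and the "not found on the integration bridge" line (04:16:58) comes last.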
The problem is that this seems to cause the port to change to BUILD
status, which is the default state on the server side before sending the
message. Since the agent didn't respond, the port stays in that state
indefinitely.
I think when the OVS agent skips processing a port, it should notify the
server that the port should be updated to the DOWN state, since the port
is not currently available. I have a patch I've been testing that I'll
send out for comments, as perhaps there is a better option.
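To make the proposal concrete, here is a toy model of the interaction, not Neutron code and not the patch itself. `ToyServer`, `agent_process` and the `notify_on_skip` flag are all hypothetical names; the only behaviour taken from this report is that the server-side status is BUILD before the message is sent and that the agent currently stays silent when it skips a port.

```python
class ToyServer:
    """Stand-in for the server-side port status table (not Neutron code)."""

    def __init__(self):
        self.status = {}

    def send_port_update(self, port_id):
        # Per the report: the port is BUILD on the server side before
        # the message goes out to the agent.
        self.status[port_id] = "BUILD"
        return port_id

    def report(self, port_id, status):
        # The agent's reply is what moves the port out of BUILD.
        self.status[port_id] = status


def agent_process(server, port_id, ports_on_bridge, notify_on_skip):
    """Toy agent: handle one port update message."""
    if port_id in ports_on_bridge:
        server.report(port_id, "ACTIVE")
    elif notify_on_skip:
        # Proposed behaviour: report the skipped port as DOWN.
        server.report(port_id, "DOWN")
    # Otherwise the agent says nothing and the server keeps BUILD.


def run(notify_on_skip):
    server = ToyServer()
    port = server.send_port_update("4e8b8aff-974f-44be-bfdb-191e2644e84c")
    # libvirt has already removed the port, so the bridge is empty.
    agent_process(server, port, ports_on_bridge=set(), notify_on_skip=notify_on_skip)
    return server.status[port]


print(run(notify_on_skip=False))  # behaviour described in this report
print(run(notify_on_skip=True))   # behaviour with the proposed notification
```

In this model the port stays in BUILD when the agent skips silently and ends up DOWN when the skip is reported, which is the outcome the proposal is aiming for.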
** Affects: neutron
Importance: Medium
Assignee: Brian Haley (brian-haley)
Status: In Progress
** Tags: ovs
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1719011
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1719011/+subscriptions