yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #25021
[Bug 1398446] [NEW] Novacompute failed to detel VM port with DVR
Public bug reported:
This defect is hard to reproduce, only happens when I have more than 3
compute node with DVR enabled.
With the following script, run several times, I can see one VM in ERROR
state.
neutron net-create demo-net
netdemoid=$(neutron net-list | awk '{if($4=="'demo-net'"){print $2;}}')
neutron subnet-create demo-net 10.100.100.0/24 --name demo-subnet
subnetdemoid=$(neutron subnet-list | awk '{if($4=="'demo-subnet'"){print $2;}}')
neutron router-create demo-router
routerdemoid=$(neutron router-list | awk '{if($4=="'demo-router'"){print $2;}}')
exnetid=$(neutron net-list | awk '{if($4=="'ext-net'"){print $2;}}')
for i in `seq 1 10`; do
#boot vm, and create floating ip
nova boot --image cirros --flavor m1.tiny --nic net-id=$netdemoid cirrosdemo${i}
cirrosdemoid[i]=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $2;}}')
output=$(neutron floatingip-create $exnetid)
echo $output
floatipid[i]=$(echo "$output" | awk '{if($2=="id"){print $4;}}')
floatip[i]=$(echo "$output" | awk '{if($2=="floating_ip_address"){print $4;}}')a
done
# Setup router
neutron router-gateway-set $routerdemoid $exnetid
neutron router-interface-add demo-router $subnetdemoid
#wait for VM to be running
sleep 30
for i in `seq 1 10`; do
cirrosfix=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $12;}}')
cirrosfixip=${cirrosfix#*=}
output=$(neutron port-list | grep ${cirrosfixip})
echo $output
portid=$(echo "$output" | awk '{print $2;}')
neutron floatingip-associate --fixed-ip-address $cirrosfixip ${floatipid[i]} $portid
neutron floatingip-delete ${floatipid[i]}
nova delete ${cirrosdemoid[i]}
done
neutron router-interface-delete demo-router $subnetdemoid
neutron router-gateway-clear demo-router $netdemoid
neutron router-delete demo-router
neutron subnet-delete $subnetdemoid
neutron net-delete $netdemoid
Looking at log file:
2014-11-20 17:25:56.258 31042 DEBUG neutron.openstack.common.lockutils [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] Got semaphore "db-access" lock /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/openstack/common/lockutils.py:168
2014-11-20 17:25:56.424 31042 ERROR neutron.api.v2.resource [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] delete failed
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Traceback (most recent call last):
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource result = method(request=request, **args)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/base.py", line 476, in delete
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource obj_deleter(request.context, id, **kwargs)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1036, in delete_port
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource router_info = l3plugin.dvr_deletens_if_no_vm(context, id)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/l3_dvrscheduler_db.py", line 195, in dvr_deletens_if_no_vm
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource port_host)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Agent.host == host).one()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2369, in one
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource ret = list(self)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2411, in _iter_
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.session._autoflush()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1198, in _autoflush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.flush()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self._flush(objects)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource transaction.rollback(_capture_exception=True)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in _exit_
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource compat.reraise(exc_type, exc_value, exc_tb)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource flush_context.execute()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource rec.execute(self)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource uow
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 60, in save_obj
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource mapper, table, update)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 536, in _emit_update_statements
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource (table.description, len(update), rows))
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource StaleDataError: UPDATE statement on table 'ml2_dvr_port_bindings' expected to update 1 row(s); 0 were matched.
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource
2014-11-20 17:25:56.427 31042 INFO neutron.wsgi [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] 10.23.32.135 - - [20/Nov/2014 17:25:56] "DELETE /v2.0/ports/83f2d987-0ab7-4479-8c97-7ab7b758a2dc.json HTTP/1.1" 500 296 0.235463
The problem could be some race condition. The code l3_devscheduler_db.py
>From line 158, got_dvr_routers_by_portid to line 193 get_agent_by_type_and_host, there are several calls.
If the port was there when code on 158. Another thread comes in the middle, delete the port. Line 193 will fail because the entry was not there anymore.
Both "neutron floatingip-delete" and "nova delete" can delete the port.
I think if we move l3_devscheduler_db.py, line 193:
agent = self._get_agent_by_type_and_host(context,
q_const.AGENT_TYPE_L3,
port_host)
to line 164
It should resolve the problem.
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1398446
Title:
Novacompute failed to detel VM port with DVR
Status in OpenStack Neutron (virtual network service):
New
Bug description:
This defect is hard to reproduce, only happens when I have more than 3
compute node with DVR enabled.
With the following script, run several times, I can see one VM in
ERROR state.
neutron net-create demo-net
netdemoid=$(neutron net-list | awk '{if($4=="'demo-net'"){print $2;}}')
neutron subnet-create demo-net 10.100.100.0/24 --name demo-subnet
subnetdemoid=$(neutron subnet-list | awk '{if($4=="'demo-subnet'"){print $2;}}')
neutron router-create demo-router
routerdemoid=$(neutron router-list | awk '{if($4=="'demo-router'"){print $2;}}')
exnetid=$(neutron net-list | awk '{if($4=="'ext-net'"){print $2;}}')
for i in `seq 1 10`; do
#boot vm, and create floating ip
nova boot --image cirros --flavor m1.tiny --nic net-id=$netdemoid cirrosdemo${i}
cirrosdemoid[i]=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $2;}}')
output=$(neutron floatingip-create $exnetid)
echo $output
floatipid[i]=$(echo "$output" | awk '{if($2=="id"){print $4;}}')
floatip[i]=$(echo "$output" | awk '{if($2=="floating_ip_address"){print $4;}}')a
done
# Setup router
neutron router-gateway-set $routerdemoid $exnetid
neutron router-interface-add demo-router $subnetdemoid
#wait for VM to be running
sleep 30
for i in `seq 1 10`; do
cirrosfix=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $12;}}')
cirrosfixip=${cirrosfix#*=}
output=$(neutron port-list | grep ${cirrosfixip})
echo $output
portid=$(echo "$output" | awk '{print $2;}')
neutron floatingip-associate --fixed-ip-address $cirrosfixip ${floatipid[i]} $portid
neutron floatingip-delete ${floatipid[i]}
nova delete ${cirrosdemoid[i]}
done
neutron router-interface-delete demo-router $subnetdemoid
neutron router-gateway-clear demo-router $netdemoid
neutron router-delete demo-router
neutron subnet-delete $subnetdemoid
neutron net-delete $netdemoid
Looking at log file:
2014-11-20 17:25:56.258 31042 DEBUG neutron.openstack.common.lockutils [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] Got semaphore "db-access" lock /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/openstack/common/lockutils.py:168
2014-11-20 17:25:56.424 31042 ERROR neutron.api.v2.resource [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] delete failed
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Traceback (most recent call last):
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource result = method(request=request, **args)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/base.py", line 476, in delete
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource obj_deleter(request.context, id, **kwargs)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1036, in delete_port
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource router_info = l3plugin.dvr_deletens_if_no_vm(context, id)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/l3_dvrscheduler_db.py", line 195, in dvr_deletens_if_no_vm
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource port_host)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Agent.host == host).one()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2369, in one
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource ret = list(self)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2411, in _iter_
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.session._autoflush()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1198, in _autoflush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.flush()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self._flush(objects)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource transaction.rollback(_capture_exception=True)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in _exit_
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource compat.reraise(exc_type, exc_value, exc_tb)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource flush_context.execute()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource rec.execute(self)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource uow
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 60, in save_obj
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource mapper, table, update)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 536, in _emit_update_statements
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource (table.description, len(update), rows))
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource StaleDataError: UPDATE statement on table 'ml2_dvr_port_bindings' expected to update 1 row(s); 0 were matched.
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource
2014-11-20 17:25:56.427 31042 INFO neutron.wsgi [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] 10.23.32.135 - - [20/Nov/2014 17:25:56] "DELETE /v2.0/ports/83f2d987-0ab7-4479-8c97-7ab7b758a2dc.json HTTP/1.1" 500 296 0.235463
The problem could be some race condition. The code l3_devscheduler_db.py
From line 158, got_dvr_routers_by_portid to line 193 get_agent_by_type_and_host, there are several calls.
If the port was there when code on 158. Another thread comes in the middle, delete the port. Line 193 will fail because the entry was not there anymore.
Both "neutron floatingip-delete" and "nova delete" can delete the port.
I think if we move l3_devscheduler_db.py, line 193:
agent = self._get_agent_by_type_and_host(context,
q_const.AGENT_TYPE_L3,
port_host)
to line 164
It should resolve the problem.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1398446/+subscriptions
Follow ups
References