← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1398446] [NEW] Novacompute failed to detel VM port with DVR

 

Public bug reported:

This defect is hard to reproduce, only happens when I have more than 3
compute node with DVR enabled.

With the following script, run several times, I can see one VM in ERROR
state.


neutron net-create demo-net
netdemoid=$(neutron net-list | awk '{if($4=="'demo-net'"){print $2;}}')
neutron subnet-create demo-net 10.100.100.0/24 --name demo-subnet
subnetdemoid=$(neutron subnet-list | awk '{if($4=="'demo-subnet'"){print $2;}}')
neutron router-create demo-router
routerdemoid=$(neutron router-list | awk '{if($4=="'demo-router'"){print $2;}}')

exnetid=$(neutron net-list | awk '{if($4=="'ext-net'"){print $2;}}')
for i in `seq 1 10`; do
    #boot vm, and create floating ip
    nova boot --image cirros --flavor m1.tiny --nic net-id=$netdemoid cirrosdemo${i}
    cirrosdemoid[i]=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $2;}}')
    output=$(neutron floatingip-create $exnetid)
    echo $output
    floatipid[i]=$(echo "$output" | awk '{if($2=="id"){print $4;}}')
    floatip[i]=$(echo "$output" | awk '{if($2=="floating_ip_address"){print $4;}}')a
done

# Setup router
neutron router-gateway-set $routerdemoid $exnetid
neutron router-interface-add demo-router $subnetdemoid
#wait for VM to be running
sleep 30

for i in `seq 1 10`; do
    cirrosfix=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $12;}}')
    cirrosfixip=${cirrosfix#*=}
    output=$(neutron port-list | grep ${cirrosfixip})
    echo $output
    portid=$(echo "$output" | awk '{print $2;}')
    neutron floatingip-associate --fixed-ip-address $cirrosfixip ${floatipid[i]} $portid
    neutron floatingip-delete ${floatipid[i]}
    nova delete ${cirrosdemoid[i]}
done


neutron router-interface-delete demo-router $subnetdemoid
neutron router-gateway-clear demo-router $netdemoid
neutron router-delete demo-router
neutron subnet-delete $subnetdemoid
neutron net-delete $netdemoid

Looking at log file:
2014-11-20 17:25:56.258 31042 DEBUG neutron.openstack.common.lockutils [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] Got semaphore "db-access" lock /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/openstack/common/lockutils.py:168
2014-11-20 17:25:56.424 31042 ERROR neutron.api.v2.resource [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] delete failed
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Traceback (most recent call last):
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource result = method(request=request, **args)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/base.py", line 476, in delete
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource obj_deleter(request.context, id, **kwargs)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1036, in delete_port
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource router_info = l3plugin.dvr_deletens_if_no_vm(context, id)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/l3_dvrscheduler_db.py", line 195, in dvr_deletens_if_no_vm
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource port_host)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Agent.host == host).one()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2369, in one
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource ret = list(self)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2411, in _iter_
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.session._autoflush()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1198, in _autoflush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.flush()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self._flush(objects)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource transaction.rollback(_capture_exception=True)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in _exit_
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource compat.reraise(exc_type, exc_value, exc_tb)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource flush_context.execute()
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource rec.execute(self)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource uow
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 60, in save_obj
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource mapper, table, update)
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 536, in _emit_update_statements
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource (table.description, len(update), rows))
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource StaleDataError: UPDATE statement on table 'ml2_dvr_port_bindings' expected to update 1 row(s); 0 were matched.
2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource
2014-11-20 17:25:56.427 31042 INFO neutron.wsgi [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] 10.23.32.135 - - [20/Nov/2014 17:25:56] "DELETE /v2.0/ports/83f2d987-0ab7-4479-8c97-7ab7b758a2dc.json HTTP/1.1" 500 296 0.235463


The problem could be some race condition. The code l3_devscheduler_db.py
>From line 158, got_dvr_routers_by_portid to line 193 get_agent_by_type_and_host, there are several calls.
If the port was there when code on 158. Another thread comes in the middle, delete the port. Line 193 will fail because the entry was not there anymore.
Both "neutron floatingip-delete" and "nova delete" can delete the port.

I think if we move l3_devscheduler_db.py, line 193:
            agent = self._get_agent_by_type_and_host(context,
                                                     q_const.AGENT_TYPE_L3,
                                                     port_host)
to line 164

It should resolve the problem.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1398446

Title:
  Novacompute failed to detel VM port with DVR

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  This defect is hard to reproduce, only happens when I have more than 3
  compute node with DVR enabled.

  With the following script, run several times, I can see one VM in
  ERROR state.

  
  neutron net-create demo-net
  netdemoid=$(neutron net-list | awk '{if($4=="'demo-net'"){print $2;}}')
  neutron subnet-create demo-net 10.100.100.0/24 --name demo-subnet
  subnetdemoid=$(neutron subnet-list | awk '{if($4=="'demo-subnet'"){print $2;}}')
  neutron router-create demo-router
  routerdemoid=$(neutron router-list | awk '{if($4=="'demo-router'"){print $2;}}')

  exnetid=$(neutron net-list | awk '{if($4=="'ext-net'"){print $2;}}')
  for i in `seq 1 10`; do
      #boot vm, and create floating ip
      nova boot --image cirros --flavor m1.tiny --nic net-id=$netdemoid cirrosdemo${i}
      cirrosdemoid[i]=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $2;}}')
      output=$(neutron floatingip-create $exnetid)
      echo $output
      floatipid[i]=$(echo "$output" | awk '{if($2=="id"){print $4;}}')
      floatip[i]=$(echo "$output" | awk '{if($2=="floating_ip_address"){print $4;}}')a
  done

  # Setup router
  neutron router-gateway-set $routerdemoid $exnetid
  neutron router-interface-add demo-router $subnetdemoid
  #wait for VM to be running
  sleep 30

  for i in `seq 1 10`; do
      cirrosfix=$(nova list | awk '{if($4=="'cirrosdemo${i}'"){print $12;}}')
      cirrosfixip=${cirrosfix#*=}
      output=$(neutron port-list | grep ${cirrosfixip})
      echo $output
      portid=$(echo "$output" | awk '{print $2;}')
      neutron floatingip-associate --fixed-ip-address $cirrosfixip ${floatipid[i]} $portid
      neutron floatingip-delete ${floatipid[i]}
      nova delete ${cirrosdemoid[i]}
  done

  
  neutron router-interface-delete demo-router $subnetdemoid
  neutron router-gateway-clear demo-router $netdemoid
  neutron router-delete demo-router
  neutron subnet-delete $subnetdemoid
  neutron net-delete $netdemoid

  Looking at log file:
  2014-11-20 17:25:56.258 31042 DEBUG neutron.openstack.common.lockutils [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] Got semaphore "db-access" lock /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/openstack/common/lockutils.py:168
  2014-11-20 17:25:56.424 31042 ERROR neutron.api.v2.resource [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] delete failed
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Traceback (most recent call last):
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource result = method(request=request, **args)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/api/v2/base.py", line 476, in delete
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource obj_deleter(request.context, id, **kwargs)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1036, in delete_port
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource router_info = l3plugin.dvr_deletens_if_no_vm(context, id)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/l3_dvrscheduler_db.py", line 195, in dvr_deletens_if_no_vm
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource port_host)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource Agent.host == host).one()
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2369, in one
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource ret = list(self)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2411, in _iter_
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.session._autoflush()
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1198, in _autoflush
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self.flush()
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1919, in flush
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource self._flush(objects)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2037, in _flush
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource transaction.rollback(_capture_exception=True)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in _exit_
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource compat.reraise(exc_type, exc_value, exc_tb)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2001, in _flush
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource flush_context.execute()
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 372, in execute
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource rec.execute(self)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 526, in execute
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource uow
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 60, in save_obj
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource mapper, table, update)
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource File "/opt/stack/venvs/openstack/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 536, in _emit_update_statements
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource (table.description, len(update), rows))
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource StaleDataError: UPDATE statement on table 'ml2_dvr_port_bindings' expected to update 1 row(s); 0 were matched.
  2014-11-20 17:25:56.424 31042 TRACE neutron.api.v2.resource
  2014-11-20 17:25:56.427 31042 INFO neutron.wsgi [req-6eabf07e-2fe4-4960-89ca-f0ac3f04f7a5 None] 10.23.32.135 - - [20/Nov/2014 17:25:56] "DELETE /v2.0/ports/83f2d987-0ab7-4479-8c97-7ab7b758a2dc.json HTTP/1.1" 500 296 0.235463

  
  The problem could be some race condition. The code l3_devscheduler_db.py
  From line 158, got_dvr_routers_by_portid to line 193 get_agent_by_type_and_host, there are several calls.
  If the port was there when code on 158. Another thread comes in the middle, delete the port. Line 193 will fail because the entry was not there anymore.
  Both "neutron floatingip-delete" and "nova delete" can delete the port.

  I think if we move l3_devscheduler_db.py, line 193:
              agent = self._get_agent_by_type_and_host(context,
                                                       q_const.AGENT_TYPE_L3,
                                                       port_host)
  to line 164

  It should resolve the problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1398446/+subscriptions


Follow ups

References