yahoo-eng-team team mailing list archive

[Bug 1687913] [NEW] db retry not triggered when fail happened in after_create notify

 

Public bug reported:

Note:
- The specific use case can no longer happen on master (due to a couple of commits), so the description below applies to releases before Ocata.
- The bug was seen on a Newton setup.

During high-concurrency testing (with router:external networks) the following deadlock may occur:
http://paste.openstack.org/show/608690/

Deadlocks are normally 'okay', because the db retry mechanism will retry
the request. But in this specific case it did not.

The issue happens here:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L769

- It's inside of a transaction.
- The external_net_db code does a notify with AFTER_CREATE.
- The deadlock happens during the AFTER_CREATE event processing.

The problem is that exceptions raised while processing an AFTER_CREATE event are not re-raised; they are only logged.
But the processing IS inside of a transaction, and the failure did make the session invalid.

So the code continues and tries to commit the invalid session. The
resulting exception is:

sqlalchemy.exc.InvalidRequestError - This Session's transaction has
been rolled back due to a previous exception during flush. To begin a
new transaction with this Session, first issue Session.rollback().
Original exception was: ...

Since this exception type is not part of the db_retry exceptions, no
retry happens and the request fails.
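
The sequence above can be sketched with a small toy model. This is not
neutron, oslo.db, or SQLAlchemy code: ToySession, notify_after_create,
retry_on, and the two exception classes are hypothetical stand-ins, used
only to show why a retry wrapper that looks for deadlocks never sees one
when the deadlock is logged inside the notify and the commit fails later
with a different exception type.

```python
class DBDeadlock(Exception):
    """Hypothetical stand-in for a retriable database error."""


class InvalidRequestError(Exception):
    """Hypothetical stand-in for the error raised on an invalid session."""


class ToySession:
    """Toy session that becomes unusable after a failed flush."""

    def __init__(self):
        self.invalid = False

    def flush(self, fail=False):
        if fail:
            self.invalid = True  # the failed flush invalidates the session
            raise DBDeadlock("deadlock during flush")

    def commit(self):
        if self.invalid:
            raise InvalidRequestError("transaction has been rolled back")


def notify_after_create(callbacks, session):
    """Run callbacks; failures are only logged, never re-raised."""
    for callback in callbacks:
        try:
            callback(session)
        except Exception as exc:
            print("callback failed (logged only): %r" % exc)


def retry_on(retriable, max_retries=3):
    """Retry the wrapped function only for the listed exception types."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retriable:
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator


attempts = []  # one entry per call of create_network


def deadlocking_callback(session):
    session.flush(fail=True)


@retry_on((DBDeadlock,))
def create_network():
    attempts.append(1)
    session = ToySession()
    notify_after_create([deadlocking_callback], session)  # swallows deadlock
    session.commit()  # fails with the non-retriable error


def run_demo():
    try:
        create_network()
    except InvalidRequestError as exc:
        return "failed without retry: %s" % exc
    return "succeeded"
```

In this model create_network runs exactly once: the DBDeadlock never
leaves notify_after_create, and the InvalidRequestError raised by the
commit is not in the tuple passed to retry_on.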


While this use case is a very specific one, some action may be needed to prevent the same thing from happening in other places: any database error that occurs inside an event notify other than BEFORE_x or PRECOMMIT will have this behaviour. The session object is corrupted, nothing is raised, and the error that follows is not retriable.


(To easily reproduce on a test setup, add

    # Assumes events, models_v2 and db_exc are importable in this module.
    if event == events.AFTER_CREATE:
        try:
            # Force a database error inside the AFTER_CREATE handling.
            context.session.add(models_v2.Network(name=256*'g'))
            context.session.flush()  # this makes the session invalid
        except Exception:
            # Surface it as a deadlock, which db retry normally handles.
            raise db_exc.DBDeadlock()

to _ensure_external_network_default_value_callback in neutron.services.auto_allocate.db.py
and create a router:external network.

At first sight this should trigger the retry mechanism, but it won't.)

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1687913
