yahoo-eng-team team mailing list archive

[Bug 1687913] [NEW] db retry not triggered when fail happened in after_create notify

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Wim De Clercq <wim.de_clercq@xxxxxxxxxxxxxxxxx>
Date: Wed, 03 May 2017 11:43:52 -0000
Reply-to: Bug 1687913 <1687913@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

Note:
- The specific use case can no longer happen on master (due to a couple of commits), so the description below applies to releases before Ocata.
- The bug was seen on a Newton setup.

During high concurrency testing (with router:external networks) the following deadlock may occur:
http://paste.openstack.org/show/608690/

Deadlocks are normally 'okay', because the db retry mechanism will retry
the request. But in this specific case it did not.

The issue happens here:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/plugin.py#L769

- The code runs inside a transaction.
- The external_net_db code does a notify with AFTER_CREATE.
- During the AFTER_CREATE event processing, the deadlock happens.

The problem is that exceptions raised while processing an AFTER_CREATE event are not propagated; they are only logged.
But the processing IS inside a transaction, and the failure did make the session invalid.

So the code continues and tries to commit the invalid session. The
resulting exception is:

sqlalchemy.exc.InvalidRequestError - This Session's transaction has
been rolled back due to a previous exception during flush. To begin a
new transaction with this Session, first issue Session.rollback().
Original exception was: ...

Since this exception type is not part of the db_retry exceptions, no
retry happens and the request fails.
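
The sequence above can be sketched with plain-Python stand-ins. Everything here (FakeSession, notify_after_create, retry_on_deadlock and the two exception classes) is hypothetical and only mimics the behaviour described in this report; it is not the neutron, oslo.db or SQLAlchemy code.

```python
import functools


class DBDeadlock(Exception):
    """Stand-in for a retriable DB error (assumption, not oslo.db's class)."""


class InvalidRequestError(Exception):
    """Stand-in for the non-retriable error raised on commit."""


class FakeSession:
    """Hypothetical session: a failed flush invalidates it."""

    def __init__(self):
        self.valid = True

    def flush(self, fail=False):
        if fail:
            self.valid = False
            raise DBDeadlock("deadlock during flush")

    def commit(self):
        if not self.valid:
            raise InvalidRequestError("transaction has been rolled back")


def notify_after_create(callback, session):
    """Mimics the described AFTER_CREATE handling: log, do not raise."""
    try:
        callback(session)
    except Exception as exc:
        print("callback failed (logged only): %r" % exc)


def retry_on_deadlock(func, max_retries=3):
    """Retry only on DBDeadlock; any other exception propagates at once."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except DBDeadlock:
                if attempt == max_retries:
                    raise
    return wrapper


attempts = []


@retry_on_deadlock
def create_network():
    attempts.append(1)
    session = FakeSession()
    # The deadlock happens inside the callback and is swallowed there.
    notify_after_create(lambda s: s.flush(fail=True), session)
    # The session is already invalid, so commit raises a different error.
    session.commit()


try:
    create_network()
except InvalidRequestError as exc:
    print("failed after %d attempt(s): %s" % (len(attempts), exc))
```

In this sketch the retry wrapper never sees the DBDeadlock, so the request fails on the first attempt with the non-retriable error.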

This use case is a very specific one, but some action may be needed to avoid the same thing happening in other places. Any database error that occurs inside an event notification other than BEFORE_x or PRECOMMIT will behave this way: it corrupts the session object, nothing is raised, and the error that follows is not retriable.
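
One possible direction, shown purely as a sketch and not as a proposed neutron patch, is to check the session after the non-raising callbacks have run and fail with a retriable error if it is no longer usable. All names below (RetriableDBError, FakeSession, notify_and_check) are hypothetical stand-ins, and the is_active flag here is only an assumption modelled on the idea of a session that knows it has been invalidated.

```python
class RetriableDBError(Exception):
    """Hypothetical stand-in for an error the retry mechanism would handle."""


class FakeSession:
    """Hypothetical session; is_active turns False after a failed flush."""

    def __init__(self):
        self.is_active = True

    def fail_flush(self):
        self.is_active = False
        raise RuntimeError("deadlock during flush")


def notify_and_check(callbacks, session):
    """Run non-raising callbacks, then surface a broken session as retriable."""
    for callback in callbacks:
        try:
            callback(session)
        except Exception as exc:
            print("callback failed (logged only): %r" % exc)
    if not session.is_active:
        # The caller can no longer commit; fail with something retriable
        # instead of letting commit() raise a non-retriable error later.
        raise RetriableDBError("session invalidated inside a callback")


session = FakeSession()
try:
    notify_and_check([lambda s: s.fail_flush()], session)
except RetriableDBError as exc:
    print("retriable:", exc)
```

Whether such a check belongs in the notifier, in the caller, or in the retry decorator's exception list is a design question this sketch does not settle.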

(To easily reproduce on a test setup, add

    if event == events.AFTER_CREATE:
        try:
            context.session.add(models_v2.Network(name=256*'g'))
            context.session.flush()  # this makes the session invalid
        except Exception:
            raise db_exc.DBDeadlock()

to _ensure_external_network_default_value_callback in neutron/services/auto_allocate/db.py
and create a router:external network.

At first sight this should trigger the retry mechanism, but it won't.)

** Affects: neutron
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1687913
