fuel-dev team mailing list archive

[Nailgun] Deadlocks in tests

 

Hello colleagues,

There are some open bugs dealing with timeouts and deadlocks in Nailgun
unit tests, such as https://bugs.launchpad.net/fuel/+bug/1314523 and
https://bugs.launchpad.net/fuel/+bug/1319067

I did some research on this issue, and found that it is mostly caused by
two things:

1) our transactions work with the same objects in different orders, which
leads to deadlocks
2) SQLAlchemy doesn't issue SELECT ... FOR UPDATE implicitly; row locks have
to be requested explicitly
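
To illustrate the second point, here is a minimal sketch that compiles two
statements to PostgreSQL SQL text without connecting to a database. The
"nodes" table and the render() helper are hypothetical and only serve the
illustration; they are not the real Nailgun schema or code.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql

metadata = MetaData()

# Hypothetical table for illustration only, not the real Nailgun schema.
nodes = Table(
    "nodes",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("status", String(16)),
)


def render(stmt):
    """Compile a statement to PostgreSQL SQL text; no connection is made."""
    return str(stmt.compile(dialect=postgresql.dialect()))


# A plain SELECT takes no row locks.
plain_sql = render(nodes.select())

# The lock clause appears only when it is requested explicitly.
locked_sql = render(nodes.select().with_for_update())

print(plain_sql)
print(locked_sql)
```

Only the second statement carries a FOR UPDATE clause, so any code path that
reads a row and later writes it back has to ask for the lock itself.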

So, we discussed possible solutions with Alexander Kislitsky, such as:

1) a strict locking order for all objects (almost all of our HTTP and RPC
handlers can then behave as nearly "atomic")
2) proper usage of .with_lockmode('update') (or .with_for_update() in
SQLAlchemy 0.9 and later) in some places, which will completely remove the
tracebacks about unavailable objects and similar errors from unit test logs
(resolving conflicts with fake task threads)

This issue can't be fixed easily, but it can be done step by step. As a
first step, we need to rewrite the RPC consumer to use transactions and
make its handlers "kinda atomic".

-- 
Best regards,
Nick Markov