← Back to team overview

c2c-oerpscenario team mailing list archive

Re: [Bug 746620] Re: Implementation of faster gap-tolerant sequences

 

On 06/01/2011 07:53 PM, Raphaël Valyi - http://www.akretion.com wrote:
> I just cannot understand why there is the "NOWAIT" instruction. On Twitter
> Tryton lead told me that NOWAIT doesn't make any difference when using the
> "serialized isolation level", the one we used. Is that true? Why? 

If you read the description of Postgres's serializable transaction
level[1], the section about application-level consistency[2], and the
description of snapshot isolation[3], you will see that it is critical
to take the /write skew/ issue into consideration.
We can however distinguish 2 cases:
1. Transactions that read and update the same row (e.g ir.sequence
generation) -> Postgres can detect this in serializable isolation
2. Transactions that would read the same rows but update different rows
(e.g. product/stock reservation) -> Postgres cannot detect this.

For case 1, different strategies have different properties, as
illustrated in the comparison I am attaching to this bug. Using SELECT
FOR UPDATE with the NOWAIT flag is basically a pessimistic locking
strategy with immediate conflict detection. This looks to me like the
best strategy when you know all transactions are going to update the row
in the same way, which is the case for ir.sequence. With this strategy
you prevent transactions to even read the about-to-change value (making
sure it cannot be used erroneously to produce a report, an email, etc.
before detecting the conflict), and you also detect the conflict
immediately, instead of waiting a bit to fail later anyway.

For case 2, there is no way to avoid explicit locking, because Postgres
cannot detect the concurrent update. This is subject to the typical
/write skew/ anomaly described in [3], has already been discussed with
the community last year and in bug 507389. In this case again, using a
correct SELECT FOR UPDATE NOWAIT query provides the appropriate safety
without forcing a full table lock. For example, our current
implementation allows concurrent transactions to reserve stock moves at
the same time if the moves connect different locations, or if the
products are different.
The reason for using NOWAIT is a little bit different in this case,
because there is a chance that transactions could behave correctly
without it. The problem is that the infinite wait for the lock is a
possible source of deadlocks. There are some deadlocks that Postgres
cannot detect (e.g when they are indirectly caused by Python level
locks), and due to the extensibility of the framework and the automatic
implicit row-level locks Postgres does, we cannot guarantee the same
locking order everywhere. There is also currently no way to specify a
timeout for the lock acquisition in Postgres: the last patch attempt[4]
was rejected, so timeouts can only be set at statement level[5], which
is not appropriate at all.

Wrapping up: SELECT FOR UPDATE NOWAIT provides us with an appropriate
locking granularity for preventing /write skew/ anomalies when needed,
with no risk of deadlock. It also allows us to immediately detect future
conflicts in case of same-row concurrent updates.


> In the strange case where that would be true, can we no fake a few
> retry at least in the sequence.py code to avoid systematically
> falling into that "cannot lock ir_sequence table" bug ?

Sure, we could loop a few times and retry, but that doesn't remove the
need for proper error handling in case all retries have failed. And for
same-row locking (ir.sequence), I guess we all agree that it is best to
have a lightweight gap-tolerant sequence without the locking hassle. At
least for all the cases where speed matters and the absence of gaps is
not a requirement (that is, for most sequences!)


I hope this clarifies the current design decisions and can serve as a
basis for discussing the future implementation.


[1]
http://www.postgresql.org/docs/8.2/static/transaction-iso.html#XACT-SERIALIZABLE
[2] http://www.postgresql.org/docs/8.2/static/applevel-consistency.html
[3] http://en.wikipedia.org/wiki/Snapshot_isolation
[4] http://archives.postgresql.org/pgsql-hackers/2010-08/msg00048.php
[5]
http://www.postgresql.org/docs/8.3/static/runtime-config-client.html#GUC-STATEMENT-TIMEOUT


** Attachment added: "concurrent-updates-guarding-strategies.txt"
   https://bugs.launchpad.net/bugs/746620/+attachment/2152168/+files/concurrent-updates-guarding-strategies.txt

-- 
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/746620

Title:
  Implementation of faster gap-tolerant sequences

Status in OpenERP Server:
  Confirmed

Bug description:
  OpenERP is classified as an Enterprise class software package. Meaning
  that more than one person, at least 2 should be able to use the system
  simultaneously. Like packers shipping products, taking orders,
  reserving products. I find that in fact, it is not possible for
  creation of stock moves simultaneously.

   Say I am importing orders from a shop. It is creating stock moves  because the order is paid. 
  At the same time I am shipping products which is normal for an ERP system.
  I might also import orders from a separate shop say a POS system.
  It is not possible!
  [2011-03-31 13:10:47,657][midwestsupplies] WARNING:stock.location:Failed attempt to reserve 1.0 x product 1669, li
  kely due to another transaction already in progress. Next attempt is likely to work. Detailed error available at D
  EBUG level.  
  OperationalError: could not obtain lock on row in relation "ir_sequence"
  Two different errors.
  I  can only perform one action at a time!
  What happens is that any time the program calls for a stock_move it will lock the stock_move table so no other process can access it which means that you cant do hardly anything unless its done individually. Now say with a MS system, or any Enterprise system would be able to handle many simultaneous actions but this program has a serious architecture flaw to not be able to support this.


Follow ups

References