
c2c-oerpscenario team mailing list archive

Re: [Bug 746620] Re: Implementation of faster gap-tolerant sequences

 

On 06/02/2011 07:00 PM, Raphaël Valyi - http://www.akretion.com wrote:
> Olivier, sorry, I still don't get it:
> if the sequence is locked by some batch (like the MRP), what is the benefit
> of "not waiting even a second before failing".
> Can you cite a use case where there is a benefit in not waiting even for a
> second?

Have a look at the example transactions in the text file I attached;
they show why waiting is useless. It's simple: concurrent updates on
the same row(s) by interleaved transactions are simply not allowed under
snapshot isolation, regardless of how long you are willing to wait.

If two transactions start in parallel to update the same row(s), only
one will be able to do it, and the second one will fail 100% of the
time. This is because even after waiting for the first transaction to
finish, the second one is still only allowed to see the "snapshot" of
the database as it was when the transaction started. And because that
snapshot is now stale, updating rows in it is not allowed and will fail
with "ERROR:  could not serialize access due to concurrent update".
To convince yourself, start 2 psql clients in parallel, begin a
serializable transaction in each, and try to update the same row.
The second one will fail as soon as the first commits, e.g.:

test=# BEGIN ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=# -- dummy SELECT to really start the transaction:
test=# SELECT count(id) FROM res_company;
(...)
test=# UPDATE res_users SET write_date = now() WHERE id = 1;
ERROR:  could not serialize access due to concurrent update
test=#

Please, don't reply again to this bug until you've actually played with
that and tried the different variations ;-) (and remember that a
transaction only starts when you access the database at least once after
the BEGIN)
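The mechanism above can be sketched without a database at all. The following toy model (my own illustration, not PostgreSQL internals) keeps a single row version per "transaction" snapshot: whichever transaction commits second finds a newer version than its snapshot and must fail, no matter how long it waited in between.

```python
# Toy model of snapshot isolation: a transaction records the row version it
# saw at BEGIN time, and a later update commit that finds a newer version
# fails, exactly like "could not serialize access due to concurrent update".
row_version = 0

class ConcurrentUpdate(Exception):
    pass

class Txn:
    def __init__(self):
        # version seen at BEGIN + first query (the snapshot)
        self.snapshot = row_version

    def commit_update(self):
        global row_version
        if row_version != self.snapshot:
            raise ConcurrentUpdate(
                "could not serialize access due to concurrent update")
        row_version += 1

t1, t2 = Txn(), Txn()   # interleaved: both transactions see version 0
t1.commit_update()      # t1 wins; the row version moves on
try:
    t2.commit_update()  # t2 fails regardless of any waiting in between
except ConcurrentUpdate as e:
    print(e)
```

Waiting only changes *when* t2 discovers the failure, never *whether* it fails.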


> You say: " instead of waiting a bit to fail later anyway."

Yes, exactly. Without SELECT ... FOR UPDATE NOWAIT, you will still fail
100% of the time; it will just take longer to notice. No amount of
waiting *during the transaction* will help. This is a consequence of
snapshot isolation, our closest match for serializable isolation.
You need to start a new transaction and try again to have a chance of
overcoming that, but doing so automatically in the code of stock or
sequence means breaking the atomicity of the enclosing transaction, so
that's not possible.
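Code that *does* own its transaction boundary (batch drivers, not the core of stock or sequence) can apply the retry at the right level. A minimal sketch, assuming illustrative names (`SerializationError` stands in for psycopg2's `TransactionRollbackError`; `retry_transaction` is not an OpenERP API):

```python
# Sketch: the retry must wrap the WHOLE transaction, because once BEGIN has
# taken its snapshot, nothing inside that transaction can succeed anymore.
class SerializationError(Exception):
    """Stand-in for psycopg2's TransactionRollbackError."""

def retry_transaction(txn_body, max_attempts=3):
    """Run txn_body (one full BEGIN ... COMMIT); on a serialization
    failure, discard everything and start a brand-new transaction."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationError:
            if attempt == max_attempts:
                raise
            # a NEW transaction gets a NEW snapshot, so retrying can work

# Simulated item whose first run collides with a concurrent update:
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) == 1:
        raise SerializationError(
            "could not serialize access due to concurrent update")
    return "committed"

assert retry_transaction(flaky_txn) == "committed"
assert len(attempts) == 2
```

This is exactly what the core code of stock or sequence cannot do for you, since it would have to commit or roll back a transaction it does not own.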


> If we take the MRP batch case or other batches like we have with
> base_sale_multichannels, there is often a commit after a record is
> processed, and I believe this is a good practice as it would prevent
> a full roll back just because of some specific record issue.

Sure, during batch processing, it makes sense to consider each item on
its own and process it in its own transaction. But the core code of
stock or sequence is much lower-level, so it cannot do rogue commits or
escape its enclosing transaction.
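The per-item pattern described above can be sketched as follows (illustrative names only; in OpenERP terms, `process_one` would end with a `cr.commit()` for that item):

```python
# Sketch: each batch item runs in its own transaction, so one item's
# serialization failure does not roll back the rest of the batch.
class SerializationError(Exception):
    pass

def process_batch(items, process_one):
    done, failed = [], []
    for item in items:
        try:
            process_one(item)    # BEGIN ... COMMIT for this item only
            done.append(item)
        except SerializationError:
            failed.append(item)  # requeue or report; others are unaffected
    return done, failed

def process_one(item):
    if item == "conflicting":
        raise SerializationError()

done, failed = process_batch(["a", "conflicting", "b"], process_one)
assert done == ["a", "b"]
assert failed == ["conflicting"]
```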


> Since there are commit and since the batch spend time doing other things*
> before locking the ir_sequence table, unless I miss something it means that
> there are small time periods while the ir_sequence is not locked. Am I
> wrong?
> Then, unless I miss something, if we wait say up to 1 sec when requesting a
> sequence number, wouldn't we have a chance of getting our sequence
> number during those small periods when it's not locked?

That won't work, for the same reason: the transactions would be
interleaved, so any transaction that starts while another one increments
the sequence is going to fail no matter what. Looping a few times would
not work either: once the transaction has begun, it's too late. You
would need to retry the whole transaction completely (batch processing
can/should do that).


> And overall, what is the benefit of waiting 0 milliseconds before blowing
> up with a locked sequence table exception? Any use case where this really
> brings value?

Benefits: cut the useless wait (you're gonna fail *anyway*, just later),
and remove a possible source of deadlocks, because any transaction that
is waiting is such a source. Good enough?


> * I am even very seriously considering introducing some sleep(a few millisecs) at
> least in base_sale_multi_channels import batches if that would help making
> other process more fluid (and so may be we could optionally do it in the MRP
> batch too).

If you put a sleep() right after the cr.commit() you're indeed
decreasing the chance of conflicts with manual transactions. Conflicts
can still occur, just less often.


Wrapping up again, our suggested solution to improve the situation is to
have 2 different behaviors: general sequences would be implemented with
PostgreSQL sequences, with no locking but no guarantee about gaps, and
good performance. Only some accounting sequences would be implemented
with the gapless, locking sequences, because that is still the only
reliable way to guarantee the absence of gaps.

This means user actions and batch processes should still be prepared to
face a temporary error in *rare* cases. We could add a temporary-error
exception to the exception hierarchy for that, easy to distinguish from
permanent errors and to handle appropriately in batch processing.
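A hypothetical sketch of that exception split (these classes do not exist in OpenERP; the names are mine, chosen only to illustrate the idea):

```python
# Hypothetical exception hierarchy: batch code can requeue items that hit a
# TemporaryError and give up immediately on everything else.
class TemporaryError(Exception):
    """Transient failure: retrying in a fresh transaction may succeed."""

class SequenceLockedError(TemporaryError):
    """e.g. the ir_sequence row is held by a concurrent gapless-sequence user."""

def should_retry(exc):
    # Batch processing requeues temporary failures and reports the rest.
    return isinstance(exc, TemporaryError)

assert should_retry(SequenceLockedError()) is True
assert should_retry(ValueError("bad data")) is False
```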

-- 
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/746620

Title:
  Implementation of faster gap-tolerant sequences

Status in OpenERP Server:
  Confirmed

Bug description:
  OpenERP is classified as an Enterprise-class software package, meaning
  that more than one person (at least 2) should be able to use the system
  simultaneously, like packers shipping products, taking orders, and
  reserving products. I find that, in fact, simultaneous creation of
  stock moves is not possible.

  Say I am importing orders from a shop. It is creating stock moves because the order is paid.
  At the same time I am shipping products, which is normal for an ERP system.
  I might also import orders from a separate shop, say a POS system.
  It is not possible!
  [2011-03-31 13:10:47,657][midwestsupplies] WARNING:stock.location:Failed attempt to reserve 1.0 x product 1669, likely due to another transaction already in progress. Next attempt is likely to work. Detailed error available at DEBUG level.
  OperationalError: could not obtain lock on row in relation "ir_sequence"
  Two different errors.
  I can only perform one action at a time!
  What happens is that any time the program calls for a stock_move, it locks the stock_move table so no other process can access it, which means you can hardly do anything unless it is done individually. Any MS system, or any Enterprise system, would be able to handle many simultaneous actions, but this program has a serious architecture flaw in not being able to support this.

