
c2c-oerpscenario team mailing list archive

Re: [Bug 746620] Re: Implementation of faster gap-tolerant sequences

 

Ok, got it about the snapshot thing; I was unaware of that, and it was exactly
the information I was asking for.

Some additional remarks: for me the gap-enabled solution may have benefits in
some situations, but it is still not a silver bullet. Take Kyle's case: we are
importing orders in batches from both Magento and OpenbravoPOS, and there are
several hundred such orders a day. We certainly don't want holes in the
invoice and accounting sequences (POS and credit-card orders are automatically
reconciled by base_sale_multichannels), and the probability of concurrent
access is still huge.

In this situation I would instead do the following: at each order iteration,
strictly catch this error (I believe that wouldn't be too clean with today's
code), and if the order processing attempt failed because of it, wait some
amount of time and retry, over and over. Indeed, we already do a cr.commit()
at the end of each iteration to avoid losing the whole batch because of one
peculiar error, so I see no trouble in doing some extra commits each time we
fail for this reason before retrying.
The only thing I don't like about that is that some resources are wasted: when
we start an iteration over, we do quite a lot of work (import the order data
via web services or SQL, map fields...) before trying to get a number from
ir_sequence, and all of that work is thrown away when we retry the whole
iteration cycle.
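To make the retry idea concrete, here is a minimal sketch (all names are
illustrative; today's server code does not expose a dedicated exception class
like this, and the delay value is arbitrary):

```python
import time

class SequenceConflictError(Exception):
    """Hypothetical wrapper for 'could not serialize access' /
    'could not obtain lock on row in relation "ir_sequence"' failures."""

def retry_iteration(process_one_order, max_retries=5, delay=0.5):
    """Re-run a whole order iteration when it fails on a sequence conflict.

    Waiting inside the failed transaction cannot help under snapshot
    isolation; the only option is to roll back and start a fresh
    transaction, i.e. re-run the whole iteration from scratch.
    """
    for attempt in range(max_retries):
        try:
            return process_one_order()
        except SequenceConflictError:
            # Everything done so far in this iteration (import, field
            # mapping...) is wasted; sleep, then redo the whole thing.
            time.sleep(delay)
    raise SequenceConflictError("gave up after %d attempts" % max_retries)
```

The important point is that the retry unit is the *whole* iteration (its own
transaction), not the single sequence request inside it.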

So I tend to think base_sale_multichannels, and possibly other batch jobs like
MRP, could use some kind of mutex (possibly a lock on ir_sequence itself): we
would wait until we can acquire the mutex, fire a commit, run one iteration
cycle, commit again, and then release the mutex.
What do you think about this approach?
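One possible shape for such a mutex, sketched here with PostgreSQL advisory
locks (the helper name and lock key are invented, and cr is assumed to behave
like OpenERP's cursor wrapper, i.e. it exposes execute() and commit()):

```python
def with_sequence_mutex(cr, lock_key, run_iteration):
    """Run one batch iteration under a PostgreSQL advisory lock.

    pg_advisory_lock() blocks until the lock is free, so two batch jobs
    (e.g. base_sale_multichannels and MRP) never try to increment the
    gapless sequence concurrently. We commit before releasing so the
    next lock holder sees our changes; advisory locks are session-scoped,
    so the commit itself does not release the lock.
    """
    cr.execute("SELECT pg_advisory_lock(%s)", (lock_key,))
    try:
        result = run_iteration()
        cr.commit()
        return result
    finally:
        cr.execute("SELECT pg_advisory_unlock(%s)", (lock_key,))
```

This only serializes the cooperating batch jobs; manual users going through
the web client would still need the retry path above.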



On Mon, Jun 6, 2011 at 9:41 AM, Olivier Dony (OpenERP) <
746620@xxxxxxxxxxxxxxxxxx> wrote:

> On 06/02/2011 07:00 PM, Raphaël Valyi - http://www.akretion.com wrote:
> > Olivier, sorry, I still don't get it:
> > if the sequence is locked by some batch (like the MRP), what is the
> > benefit of "not waiting even a second before failing"?
> > Can you cite a use case where there is a benefit in not waiting even
> > for a second?
>
> Have a look at the example transactions in the text file I attached,
> this shows why waiting is useless. It's simple: concurrent updates on
> the same row(s) by interleaved transactions are simply not allowed by
> snapshot isolation, regardless of the amount of waiting you're willing
> to do.
>
> If two transactions start in parallel to update the same row(s), only
> one will be able to do it, and the second one will fail 100% of the
> time. This is because even after waiting for the first transaction to
> finish, the second one is still only allowed to see the "snapshot" of
> the database as it was when the transaction started. And because that
> snapshot is now stale, updating rows in it is not allowed and will fail
> with "ERROR:  could not serialize access due to concurrent update".
> To convince yourself, start 2 psql clients in parallel, begin a
> serializable transaction in each, and try to update the same row.
> The second one will fail when the first commits, e.g:
>
> test=# BEGIN ISOLATION LEVEL SERIALIZABLE;
> BEGIN
> test=# -- dummy SELECT to really start the transaction:
> test=# SELECT count(id) FROM res_company;
> (...)
> test=# UPDATE res_users SET write_date = now() WHERE id = 1;
> ERROR:  could not serialize access due to concurrent update
> test=#
>
> Please, don't reply again to this bug until you've actually played with
> that and tried the different variations ;-) (and remember that a
> transaction only starts when you access the database at least once after
> the BEGIN)
>
>
> > You say: " instead of waiting a bit to fail later anyway."
>
> Yes, exactly. Without SELECT FOR UPDATE NOWAIT you will still fail 100%
> of the time; it will just take longer to notice. No amount of waiting
> *during the transaction* will help. This is a consequence of snapshot
> isolation, our closest match for serializable isolation.
> You need to start a new transaction and try again to have a chance of
> overcoming that, but doing so automatically in the code of stock or
> sequence means breaking the atomicity of the enclosing transaction, so
> that's not possible.
>
>
> > If we take the MRP batch case, or other batches like we have with
> > base_sale_multichannels, there is often a commit after each record is
> > processed, and I believe this is good practice as it prevents a full
> > rollback just because of one specific record's issue.
>
> Sure, during batch processing, it makes sense to consider each item on
> its own and process it in its own transaction. But the core code of
> stock or sequence is much lower-level, so it cannot do rogue commits nor
> escape its enclosing transaction.
>
>
> > Since there are commits, and since the batch spends time doing other
> > things* before locking the ir_sequence table, unless I miss something
> > it means there are small time periods while ir_sequence is not locked.
> > Am I wrong?
> > Then, if we wait say up to 1 second when requesting a sequence number,
> > wouldn't we get a chance to obtain our sequence number during those
> > small periods when it's not locked?
>
> That won't work, for the same reason: the transactions would be
> interleaved, so any transaction that starts while another one increments
> the sequence is going to fail no matter what. Looping a few times would
> not work either, once the transaction has begun, it's too late. You
> would need to retry the whole transaction completely (batch processing
> can/should do that).
>
>
> > And overall, what is the benefit of waiting 0 milliseconds before
> > blowing up with a locked-sequence-table exception? Any use case where
> > this really brings value?
>
> Benefits: cut the useless wait (you're gonna fail *anyway*, just later),
> and remove a possible source of deadlocks, because any transaction that
> is waiting is such a source. Good enough?
>
>
> > * I am even very seriously considering introducing a sleep(a few
> > milliseconds) at least in the base_sale_multichannels import batches,
> > if that would help make other processes more fluid (and maybe we could
> > optionally do it in the MRP batch too).
>
> If you put a sleep() right after the cr.commit() you're indeed
> decreasing the chance of conflicts with manual transactions. Conflicts
> can still occur, just less often.
>
>
> Wrapping up again, our suggested solution to improve the situation is to
> have 2 different behaviors: general sequences would be implemented with
> PostgreSQL sequences, with no locking but no guarantee about gaps, and
> good performance. Only some accounting sequences would be implemented
> with the gapless, locking sequences, because that is still the only
> reliable way we can guarantee the absence of gaps.
>
> This means user actions and batch processes should still be prepared to
> face a temporary error in *rare* cases. We could have a temporary
> exception in the exception hierarchy for that, easy to distinguish from
> permanent exceptions, and to handle appropriately in batch processing.
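[Editor's sketch of the temporary/permanent split suggested above; the class
names are purely illustrative, nothing like this exists in the server yet:]

```python
class TemporaryError(Exception):
    """Hypothetical transient failure: retrying the whole transaction
    later has a chance to succeed (e.g. gapless sequence row locked)."""

class PermanentError(Exception):
    """Hypothetical permanent failure: retrying cannot help
    (e.g. invalid data in the imported record)."""

def handle_batch_item(process_item):
    """Decide what a batch loop should do with one failed item."""
    try:
        process_item()
        return "ok"
    except TemporaryError:
        return "requeue"   # try this item again in a later pass
    except PermanentError:
        return "skip"      # log it and move on; retrying cannot succeed
```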
>
> --
> You received this bug notification because you are a member of OpenERP
> Drivers, which is subscribed to OpenERP Server.
> https://bugs.launchpad.net/bugs/746620
>
> Title:
>  Implementation of faster gap-tolerant sequences
>
> Status in OpenERP Server:
>  Confirmed
>
> Bug description:
>  OpenERP is classified as an Enterprise class software package. Meaning
>  that more than one person, at least 2 should be able to use the system
>  simultaneously. Like packers shipping products, taking orders,
>  reserving products. I find that in fact, it is not possible for
>  creation of stock moves simultaneously.
>
>   Say I am importing orders from a shop. It is creating stock moves
>  because the order is paid.
>  At the same time I am shipping products which is normal for an ERP system.
>  I might also import orders from a separate shop say a POS system.
>  It is not possible!
>  [2011-03-31 13:10:47,657][midwestsupplies] WARNING:stock.location:Failed
>  attempt to reserve 1.0 x product 1669, likely due to another transaction
>  already in progress. Next attempt is likely to work. Detailed error
>  available at DEBUG level.
>  OperationalError: could not obtain lock on row in relation "ir_sequence"
>  Two different errors.
>  I  can only perform one action at a time!
>  What happens is that any time the program calls for a stock_move, it
>  locks the stock_move table so no other process can access it, which means
>  you can hardly do anything unless it's done individually. An MS system,
>  or any enterprise system, would be able to handle many simultaneous
>  actions, but this program has a serious architecture flaw in not being
>  able to support this.
>

-- 
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/746620

