Re: [Bug 746620] Re: Implementation of faster gap-tolerant sequences

On 06/07/2011 12:11 AM, Raphaël Valyi - http://www.akretion.com wrote:
> Some additional remarks: for me the gap-enabled solution might have some
> benefits in some situations, but it is still not a silver bullet. Take Kyle's
> case: we are importing orders from both Magento and OpenbravoPOS in batches,
> and there are several hundred such orders a day: we certainly don't want
> holes in the invoice and accounting sequences (POS or credit card orders are
> automatically reconciled by base_sale_multichannels), and the probability of
> concurrent access is still huge.

Yes, you'd apparently be in a situation where you need to use gapless
sequences in a high contention workload.
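
For readers following along, the practical difference boils down to
something like the following minimal psycopg2 sketch; the table and
column names are illustrative, not the real ir_sequence schema.

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()

    # Gap-tolerant: a native PostgreSQL sequence. nextval() never blocks
    # and is not rolled back on abort, so concurrent transactions don't
    # contend, but aborted transactions leave holes in the numbering.
    cur.execute("SELECT nextval('my_seq')")

    # Gapless: keep the counter in a row and hold a row lock until
    # commit. With NOWAIT, a concurrent access fails immediately with
    # "could not obtain lock on row" instead of blocking.
    cur.execute("SELECT number_next FROM my_gapless_seq"
                " WHERE id = %s FOR UPDATE NOWAIT", (1,))
    number = cur.fetchone()[0]
    cur.execute("UPDATE my_gapless_seq SET number_next = number_next + 1"
                " WHERE id = %s", (1,))
    conn.commit()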

True serializability would be required to address this with no risk of
locking issues, but PostgreSQL does not support it, for performance
reasons. Even the new Serializable Snapshot Isolation (SSI) mode
introduced in PostgreSQL 9.1 won't help here, as it will only detect
more serialization anomalies and trigger a similar error.
Apparently, performance under truly serializable isolation would be so
terrible [1] that SSI should perform better in all cases, even with some
transactions having to be replayed.

[1]
http://wiki.postgresql.org/wiki/Serializable#Serializable_Isolation_Implementation_Strategies
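
To make that concrete, here is roughly what relying on SERIALIZABLE
isolation (and SSI replay) looks like from Python with psycopg2; this
is a sketch rather than OpenERP code, and the table name is made up.
A detected anomaly surfaces as SQLSTATE 40001, which psycopg2 maps to
TransactionRollbackError, and the application must replay the whole
transaction:

    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=test")
    conn.set_isolation_level(
        psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
    cur = conn.cursor()
    try:
        # any conflicting read/write work; the table is hypothetical
        cur.execute("UPDATE my_gapless_seq SET number_next = number_next + 1"
                    " WHERE id = %s", (1,))
        conn.commit()
    except psycopg2.extensions.TransactionRollbackError:
        # SQLSTATE 40001 (serialization failure) or 40P01 (deadlock):
        # the transaction must be rolled back and replayed in full
        conn.rollback()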

> In this situation I would instead go for: at each order iteration, being
> able to strictly catch this error (I believe that wouldn't be too clean with
> today's code), and if the order process attempt failed because of that,
> then we wait some amount of time and retry, over and over.

That's the approach we are suggesting, and with a dedicated exception
type to detect, it should be very easy.

In fact, we could implement some sort of auto-retry at the RPC level
when such a temporary lock exception is raised. This may not work for
batch processing when not done via RPC, but batch jobs could implement
the same logic. And this would at least help regular users, without too
much risk (with a limit on the number of retries and a semi-random delay
between them to avoid obvious issues).
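
A possible shape for that auto-retry logic, as a plain Python sketch
(the exception class and the parameters are hypothetical, not an
existing OpenERP API):

    import random
    import time

    class TemporaryLockError(Exception):
        """Hypothetical dedicated exception for transient lock failures."""

    def call_with_retry(func, max_retries=5, base_delay=0.5):
        """Call func(), retrying a bounded number of times on lock failures."""
        for attempt in range(max_retries + 1):
            try:
                return func()
            except TemporaryLockError:
                if attempt == max_retries:
                    raise  # retry budget exhausted, surface the error
                # semi-random delay so concurrent clients don't retry in sync
                time.sleep(base_delay + random.uniform(0, base_delay))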


> The only thing that I don't like with that is that some resources are
> wasted: when we start an iteration over, we are going to do quite a lot of
> things (import order data via WS or SQL, map fields...) before trying to
> get some ir_sequence, and this will all be wasted when we retry the whole
> iteration cycle.

Yes, this should be minimized as much as possible, but it cannot be 100%
avoided except with full serializability, which has a big performance
impact. One obvious thing to do is to schedule the batch imports at
different times of the day (e.g. OBPOS vs Magento), and if possible
outside of the peak activity hours of regular users.


> Then I kind of think base_sale_multichannels, and possibly other batches
> like MRP, could have some kind of mutex (could be some request on
> ir_sequence itself): we would wait until we can acquire the mutex, then
> fire a commit, run an iteration cycle, then commit and release the mutex.
> What do you think about this approach?

This would be an ad-hoc implementation of serializability, and I think
it has some of the same pitfalls. To work easily in a multi-instance
system, the mutex would need to be taken at the database level, e.g.
with a LOCK TABLE statement. Such an exclusive lock waits until all
readers and writers are done, and blocks new ones. As a consequence, it
has a very high chance of causing deadlocks, because it must wait for
every single reader to be gone: two transactions doing SELECT then LOCK
TABLE on the same table would immediately deadlock. To avoid this issue,
a dedicated mutex table could be used (sketched below), with the
consequence that non-batch transactions wouldn't use the mutex (the LOCK
TABLE needs to be taken immediately at transaction start, otherwise it
doesn't work, so this is difficult outside of controlled jobs).
That last option might indeed be sufficient to drastically reduce the
chance of locking failures caused by 2 batch imports being executed in
parallel, because they would be serialized, with only conflicts with
manual transactions remaining.
But due to what was discussed above, this could well be outperformed by
simply retrying failed transactions in regular snapshot isolation,
though this probably depends a lot on the usage patterns and how the
jobs are scheduled.
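
For completeness, here is a rough sketch of that dedicated-mutex-table
variant, assuming a throwaway table created once with
CREATE TABLE batch_mutex (id int); this is an illustration, not an
existing OpenERP mechanism:

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()
    # Must be the first statement of the transaction. ACCESS EXCLUSIVE
    # waits until every other holder is gone, and is released
    # automatically at commit/rollback, so two batch jobs taking it
    # run serialized with respect to each other.
    cur.execute("LOCK TABLE batch_mutex IN ACCESS EXCLUSIVE MODE")
    try:
        # ... run one batch iteration here ...
        conn.commit()    # releases the mutex
    except Exception:
        conn.rollback()  # also releases the mutex
        raise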

My gut feeling is: KISS as much as possible, so perhaps this is sufficient:
- most sequences would be lightweight, gap-tolerant, with no locking
- accounting sequences and stock reservations would still do what is
right wrt locking, with a possible temporary exception raised
- the server would do some level of auto-retry at the RPC level for
temporary exceptions: transparent to both users and business code
- batch jobs that use accounting sequences or stock reservations must
take care of retries, and should be scheduled in their own time-frames
as much as possible, so that lock failures are very rare, minimizing waste

-- 
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/746620

Title:
  Implementation of faster gap-tolerant sequences

Status in OpenERP Server:
  Confirmed

Bug description:
  OpenERP is classified as an enterprise-class software package, meaning
  that more than one person (at least 2) should be able to use the
  system simultaneously: packers shipping products, taking orders,
  reserving products. I find that, in fact, it is not possible to create
  stock moves simultaneously.

  Say I am importing orders from a shop. It is creating stock moves
  because the order is paid.
  At the same time I am shipping products, which is normal for an ERP system.
  I might also import orders from a separate shop, say a POS system.
  It is not possible!
  [2011-03-31 13:10:47,657][midwestsupplies] WARNING:stock.location:Failed attempt to reserve 1.0 x product 1669, likely due to another transaction already in progress. Next attempt is likely to work. Detailed error available at DEBUG level.
  OperationalError: could not obtain lock on row in relation "ir_sequence"
  Two different errors.
  I can only perform one action at a time!
  What happens is that any time the program calls for a stock_move, it
  locks the stock_move table so that no other process can access it,
  which means you can hardly do anything unless it's done individually.
  An MS system, or any enterprise system, would be able to handle many
  simultaneous actions, but this program has a serious architectural
  flaw in not supporting this.

