← Back to team overview

openerp-expert-framework team mailing list archive

Re: Exception TransactionRollbackError not correctly handled ?

 

Re-execution of RPC calls seems to be the most sensible solution.
RPC execution is just a on line call, that in case of this type of
exception, could be called a couple of times before reporting the exception
to caller.
There is no side effect to be expected, as long as all involved operations
where covered by the transaction rollback.

The cases where you could expect difficult situations (for example stock
reservation) are those where the limitations of the framework were
avoided by direct SQL calls, outside the scope of the transaction. Those
cases should be prepared to handle the cases where due to an exception on
the rest of the transaction, it could be retried (like the current
situation where the retry operation will be generated by the user). If not,
is a problem in a few hot spots, very localized, easy to locate (At a first
look, I could only find the example on stock reservation).

On any other transaction where all operations are performed within the
provided database cursor, I see no reason to expect any side effect. Again,
only operations outside the transaction could be a problem. Only case I can
foresee some consequence is the interaction with objects outside the
OpenERP server, for example fiscal printers, where the output operation
could be duplicated. But as it is already described in the previous
paragraph, this situation should be foresee in the module, because there
are other reasons to abort the transaction before it is completed and thus
most probably it will be repeated (like for example an unplanned server
shutdown or a power outage). In any case, I do not see any example of this
in the standard modules, and for private modules it is a situation already
to be handled by the current modules

On the other hand, the parallelization of web actions brings a meaningful
improvement to the operational speed, so it seems reasonable to invest some
effort trying to find a solution not giving up the optimization in place

Retrying a couple of times, optionally with a randomized limited wait in
between, may solve most of the situations without asking for user retry (or
at least, stark limiting the frequency of such situations).

Best regards


2012/5/3 Olivier Dony <odo@xxxxxxxxxxx>

> On 05/01/2012 01:09 PM, Valentin LAB wrote:
> > While using openerp, Psycopg2 raises TransactionRollbackError quite
> > often even on small database. This does not seem to be easily
> > reproduceable as it seems to be a conflict between two thread accessing
> > the same table.
>
> Yes, it is a normal and safe behavior of Postgres when running serializable
> transactions, whenever it detects that concurrent threads have performed
> operations that may not give the same result if they had been executed
> sequentially. For example 2 concurrent updates on the same row after
> reading
> its value will likely suffer from a "nonrepeatable read" issue [1], but
> there
> are much many more indirect cases.
>
>
> > This occurs mostly at module installation. And can completely mess up
> > the module installation by giving empty wizard windows of instance.
> >
> > So it's not just annoying, this can break badly a database. I guess it
> > could also occurs in other situations (in multi-user context), where the
> > bug would be quite difficult to reproduce and with unforeseeable
> > consequences ;)
>
> It can indeed happen in other circumstances, and the normal consequence is
> to
> rollback completely one of the operations. For GUI-based operation it is
> quite
> safe and the user can simply retry the operation again (provided they
> understand the error - the error reporting in that case needs improvement).
>
> More annoying cases would be the single-user issues during installation
> where
> the concurrent requests performed by the 6.1 web client may inadvertently
> cause
> this error: this is what you are seeing. Unattended requests driven by
> automated RPC calls may also get this, and without proper error management
> on
> the caller side this could be disastrous. This has been discussed in the
> past,
> e.g. on bug 746620 [2].
>
>
> > In my case (single user), it seem to hit more often on fast computers.
> > To make a probable better guess, it seems to hurt more often whenever
> > using a local connection between the browser and the server. It could be
> > about the web module trying to update the res_users session info and may
> > collide with normal operation.
>
> Yup, this is most likely caused by the new web client executing most RPC
> calls
> in parallel (quite necessary to speed up page rendering).
>
>
> > To give more info on non-base_crypt errors, this is a SQL query that is
> > often triggering the TransactionRollbackError::
> >
> >    UPDATE res_users
> >      SET
> >          date=now() AT TIME ZONE 'UTC'
> >      WHERE
> >          id=1
> >        AND password=E'$1$ctHfTPNo$0BbeJr1qjUDCINSQh8zci.'
> >        AND active
> >      RETURNING id
>
> This is the small transaction that updates the last login date whenever a
> user
> logs into the system. It is likely that the web client is doing one extra
> login() call at the same time as getting the result of the module install,
> causing Postgres to detect a possible serialization problem.
> Most transactions inserting or updating a row will indeed lock the
> res_user row
> of the creator of this row (due to the create_uid foreign key), and thus
> any
> transaction that updates a user row is much more likely to cause a
> serialization problem.
> Fortunately there are very few transactions that actually modify the user
> record: login() is one of them and should only be performed when a user
> connects to the system.
>
>
>
> > And here is a bug report which I found while doing mine:
> > https://bugs.launchpad.net/openobject-server/+bug/956715
> >
> >
> > After looking quickly on the net, I found that it's perfectly "normal"
> > to receive this exception from postgres, and that the openerp engine
> > should simply retry a little bit later.
> >
> > Is this info correct ?
>
> Yup, it is a normal error, and the transaction should simply be retried
> later,
> either manually (as it is now) or automatically.
>
>
>
> > If yes, I provided a quick patch that works well
> > in my case. But it should be reviewed carefully as I'm not particularly
> > aware of database world.
> >
> > merge proposal:
> >
> https://code.launchpad.net/~vaab/openobject-server/fix-transactionrollbackerror-by-retrying/+merge/104233
> >
> > bug report:
> > https://bugs.launchpad.net/openobject-server/+bug/992525
>
> The idea having OpenERP retry transactions that failed because of a
> serialization problem has been discussed in the past, and it sounds fine
> to me.
> However your implementation is too low-level: you are replaying only the
> last
> cr.execute() call, while it is the whole transaction that was rolled back.
> It
> will work for a few trivial cases (like the one-query transaction that
> updates
> the last login date), but a real transaction could consist in hundreds of
> queries, and discarding the previous queries will cause unpredictable
> madness.
>
> If you could move the retry logic higher in the stack and make it retry the
> whole RPC call, it should become a workable patch, but probably less
> simple.
>
> Other remarks on the patch:
> - Different reasons may cause TransactionRollbackErrors, and I'm not 100%
> sure
> we want to retry all of them. The implementation discussion[3] indicates
> that
> it corresponds to Class 40 Postgres errors[4] which are apparently
> intended to
> be retried, so perhaps we don't need to be more restrictive.
> Any opinions there? How is error 40002 different from the normal integrity
> constraint violations of class 23, for example?
> - The serialization problem will often come from a single competing
> transaction, so retrying 1-2 times should usually be sufficient, shouldn't
> it?
>
> Thanks for working on this, these issues certainly deserve some love!
>
>
> [1] http://www.postgresql.org/docs/9.0/static/transaction-iso.html
> [2] See the _long_ discussion on
> https://bugs.launchpad.net/openobject-server/+bug/746620
> [3] http://thread.gmane.org/gmane.comp.python.db.psycopg.devel/4384
> [4] http://www.postgresql.org/docs/current/static/errcodes-appendix.html
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openerp-expert-framework
> Post to     : openerp-expert-framework@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openerp-expert-framework
> More help   : https://help.launchpad.net/ListHelp
>



-- 

Gustavo Adrian Marino

Mobile: +54 911 5498 2515

Email: gamarino@xxxxxxxxx

Skype: gustavo.adrian.marino

References