c2c-oerpscenario team mailing list archive
Message #17603
[Bug 724961] Re: [6.0.1] possible WS inconsistency under high load.
I took a quick look at server/bin/osv.py:184, execute(self, db, uid,...), which is where it all happens.
This function is executed in a separate thread per HTTP /connection/[2] that serves XML-RPC.
Early in this function a new cursor is opened. This cursor isolates the request, at the SQL level, from any other RPC requests that may be running in parallel. In effect, the results of SQL calls will be as recent as the first access to that table/row.
However, at the end of that function, the "return res" is clearly done after
the cursor has been committed and closed. This means that within a single HTTP
connection there can NOT[1] be a call that returns before its changes have been written to the db.
The same applies to RPC calls made over separate
connections/threads that are really serialized on the client side (e.g.
we first receive the 200 OK from the server and only then send the next
request).
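A minimal sketch of the commit-then-return ordering described above, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The names execute_rpc, set_state, get_state and the picking table are hypothetical; this is not the actual osv.py code, only the same pattern: one cursor per call, commit and close, and only then return.

```python
import os
import sqlite3
import tempfile


def execute_rpc(db_path, method, *args):
    """Hypothetical per-request execute(): new cursor, commit, close, then return."""
    conn = sqlite3.connect(db_path)
    cr = conn.cursor()
    try:
        res = method(cr, *args)
        conn.commit()  # the change is committed before any result leaves this function
    except Exception:
        conn.rollback()
        raise
    finally:
        cr.close()
        conn.close()
    return res


def create_table(cr):
    cr.execute("CREATE TABLE picking (id INTEGER PRIMARY KEY, state TEXT)")
    cr.execute("INSERT INTO picking (id, state) VALUES (1, 'confirmed')")
    return True


def set_state(cr, record_id, state):
    cr.execute("UPDATE picking SET state = ? WHERE id = ?", (state, record_id))
    return cr.rowcount == 1


def get_state(cr, record_id):
    cr.execute("SELECT state FROM picking WHERE id = ?", (record_id,))
    return cr.fetchone()[0]


def demo():
    # File-backed database: each sqlite3 ":memory:" connection would be private.
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        execute_rpc(path, create_table)
        # Serialized client: the read starts only after the write has returned.
        ok = execute_rpc(path, set_state, 1, "draft")
        return ok, execute_rpc(path, get_state, 1)
    finally:
        os.remove(path)
```

Because demo() issues its calls strictly one after another, the read can only see the committed 'draft' state, which is the serialized-client case.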
Are you sure OOOR didn't try to send (or start sending) those XML-RPC
requests in any kind of parallel threads?
[1] unless we have really messed with Murphy's law, here, too.
[2] I explicitly wrote that code to behave like that. Every connection is its own thread, but requests within a single persistent HTTP connection are served synchronously (a limitation of the HTTP protocol itself, in fact).
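The threading model in [2] can be pictured with a small stdlib sketch: one thread per connection, with each connection's requests handled strictly in order. Everything here is hypothetical and simplified: a "connection" is just a list of request callables, a dict stands in for the database, and a global lock stands in for the isolation a real server gets from per-request transactions (the lock is stricter than that).

```python
import threading


def serve_connection(requests, db, db_lock, results):
    """Serve one persistent connection: its requests run strictly one after another."""
    for request in requests:
        with db_lock:  # simplified stand-in for per-request transaction isolation
            results.append(request(db))


def serve(connections):
    """Start one thread per connection, wait for all, return (db, per-connection results)."""
    db = {}
    db_lock = threading.Lock()
    results = [[] for _ in connections]
    threads = [
        threading.Thread(target=serve_connection, args=(reqs, db, db_lock, out))
        for reqs, out in zip(connections, results)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db, results


def write(key, value):
    def request(db):
        db[key] = value
        return True
    return request


def read(key):
    return lambda db: db.get(key)
```

For example, serve([[write("a", 1), read("a")], [write("b", 2), read("b")]]) runs two connections in parallel, but within each one the read always follows its own write.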
--
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/724961
Title:
[6.0.1] possible WS inconsistency under high load.
Status in OpenERP Server:
Incomplete
Bug description:
Hello,
I'll mark this bug as incomplete because, for now, I only have suspicions and
no solid proof. But as early sharing is better than nothing, here is
the test scenario that produced scary results:
On a v6.0.1 production server, I had to drop 250 orders from a Magento import of year 2010 (they were not actually paid; the import was only meant for later business intelligence analysis).
As it's impossible to delete confirmed pickings, I wrote a 5-line OOOR script (since thrown away) that did the following through the XML/RPC API:
for each order name to drop:
  - search the order by name -> OK
  - find its pickings (by origin name) -> OK
  - write state = 'draft' on those pickings -> KO sometimes
  - unlink the pickings -> OK if the previous step was OK
  - write state = 'draft' on the order -> KO sometimes
  - unlink the order -> OK if the previous step was OK
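For anyone who wants to replay the scenario without OOOR, the steps above might look like this over plain XML-RPC from Python. This is a sketch, not the original (discarded) script. It assumes the OpenERP 6.0 /xmlrpc/object endpoint and its execute(db, uid, password, model, method, *args) call, a uid obtained beforehand (e.g. from login on /xmlrpc/common), and that pickings are matched by their origin field; make_execute and drop_orders are hypothetical names, and the URL, database and credentials are placeholders.

```python
import xmlrpc.client


def make_execute(url, db, uid, password):
    """Return execute(model, method, *args) bound to an OpenERP object endpoint."""
    # Assumed endpoint layout: <url>/xmlrpc/object exposing execute(db, uid, pwd, ...).
    proxy = xmlrpc.client.ServerProxy(url + "/xmlrpc/object", allow_none=True)

    def execute(model, method, *args):
        return proxy.execute(db, uid, password, model, method, *args)

    return execute


def drop_orders(execute, order_names):
    """Reset to draft and delete each order and its pickings, strictly in sequence."""
    dropped = []
    for name in order_names:
        order_ids = execute("sale.order", "search", [("name", "=", name)])
        if not order_ids:
            continue
        picking_ids = execute("stock.picking", "search", [("origin", "=", name)])
        if picking_ids:
            execute("stock.picking", "write", picking_ids, {"state": "draft"})
            execute("stock.picking", "unlink", picking_ids)
        execute("sale.order", "write", order_ids, {"state": "draft"})
        execute("sale.order", "unlink", order_ids)
        dropped.append(name)
    return dropped
```

Usage, with placeholder values: drop_orders(make_execute("http://localhost:8069", "mydb", 1, "admin"), ["SO001"]). Since every call blocks until the server answers, the requests are serialized on the client side.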
The really strange thing is that I was doing something very simple, equivalent to:
self.pool.get('stock.picking').write(cr, uid, [picking_id], {'state': 'draft'})
self.pool.get('sale.order').write(cr, uid, [order_id], {'state': 'draft'})
but through OOOR, i.e. with XML/RPC calls under the hood.
I checked that the XML/RPC call really succeeded.
Yet sometimes the write {'state': 'draft'} was not actually performed in the database!
Reading the records back, either with OOOR or directly in the ERP via the GTK client (after a refresh, of course), showed that the pickings or orders were still confirmed.
This was totally inconsistent: running the exact same code again
would get a little further and process some more orders.
Another sign that my OOOR code was right: just by introducing a
sleep of 0.5 seconds before the write call, I suddenly
had no errors at all!
Also, by running the server with log-level=debug_rpc, I could check that the server really received the 'write' call properly.
It even answered True to the call, yet did not write the value to the database as expected.
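The check described above (write, then read the record back) can be automated. The sketch below is hypothetical: write_verified takes any execute(model, method, *args) callable, such as one wrapping the OpenERP XML-RPC object endpoint, and its optional delay reproduces the sleep(0.5) workaround so both variants can be compared. It assumes read returns one dict per record with an "id" key, and it returns the ids whose stored values differ from what was written.

```python
import time


def write_verified(execute, model, ids, vals, delay=0.0):
    """Write vals to ids, read them back, and return the ids that do not match."""
    if delay:
        time.sleep(delay)  # mimics the sleep(0.5) that made the errors disappear
    if not execute(model, "write", ids, vals):
        raise RuntimeError("write returned a false value")
    rows = execute(model, "read", ids, list(vals))
    return [
        row["id"]
        for row in rows
        if any(row.get(field) != value for field, value in vals.items())
    ]
```

Running it once with delay=0.0 and once with delay=0.5 on the same kind of records would show whether the delay alone changes the outcome.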
This may never have shown up in production before because manual
operations in the GTK client or web client naturally leave some delay
before the next write call is sent.
Also, the OOOR trunk RSpec test suite fails against v6 in ways that I found
inconsistent (redoing an operation that had failed would work
later). In contrast, OOOR 1.4.2 never had any failure against OpenERP
v5. Unfortunately I had no time to investigate further.
Yesterday I wrote a loop in Python that calls write on the partner name field over XML/RPC against localhost, with a different value each time, and then reads the record back to check that the value is the one just written. It always worked, both from Python and from OOOR. So far, then, I haven't been able to reproduce the problem in a simple setup.
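The reproduction loop described above might look like this in Python. The function name stress_partner_name, the generated name values and the iteration count are illustrative assumptions; execute is any callable of the shape execute(model, method, *args), for example one wrapping xmlrpc.client.ServerProxy(...).execute(db, uid, password, ...) on the object endpoint.

```python
def stress_partner_name(execute, partner_id, iterations=1000):
    """Write a new name each time, read it straight back, return the mismatches."""
    mismatches = []
    for i in range(iterations):
        expected = "stress-test %d" % i
        execute("res.partner", "write", [partner_id], {"name": expected})
        [row] = execute("res.partner", "read", [partner_id], ["name"])
        if row["name"] != expected:
            # Record which iteration lost its write and what was read instead.
            mismatches.append((i, expected, row["name"]))
    return mismatches
```

An empty result means every write was visible to the read that immediately followed it; pointing the same loop at a remote host or at other objects/fields would test the hypotheses in the next paragraph.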
Maybe it's linked to the object/field I was writing, or to the fact that the host is remote. Or maybe it's even an OOOR trunk bug, though I strongly doubt that, since log-level=debug_rpc showed me that the server received the proper calls.
This is definitely worth a double check.
Can anyone confirm this issue?