
c2c-oerpscenario team mailing list archive

Re: [Bug 724961] Re: [6.0.1] possible WS inconsistency under high load.

 

Hello xrg,

Sorry, I forgot to mention that: no, I was 100% sure that my code was single
threaded and executed in sequence (this was from MRI 1.9.2, not JRuby, BTW).
I'll try to reproduce it and give details, but sorry, I have tons of very
urgent things to do right now...
But I didn't dream it: we spent a good hour hacking on that, I even asked
Renato to take a look, and both of us were perplexed by what we observed.


On Fri, Feb 25, 2011 at 11:18 AM, xrg <xrg@xxxxxxxxx> wrote:

> I took a quick look at server/bin/osv.py:184 execute(self, db, uid,...)
> where it all happens.
> This function is executed in a separate thread per http /connection/[2]
> that serves XML-RPC in fact.
> Early in this fn() a new cursor is created. This cursor provides SQL
> isolation from any other RPC requests that may be running in parallel. In
> fact, the results of SQL calls will be as recent as the first row access to
> that table/row.
>
> But, at the end of that function, the "return res" is clearly done after
> the cursor is committed and closed. Which means that in a single HTTP
> connection there can NOT[1] be calls that haven't written to the db.
> Same applies for RPC calls that are done in separate
> connections/threads, but are really serialized on the client side. (eg.
> we first receive the 200 OK from the server and then send the next
> request).
>
> Are you sure OOOR didn't try to send (or start sending) those XML-RPC
> requests in any kind of parallel threads?
>
>
> [1] unless we have really messed with Murphy's law, here, too.
> [2] I explicitly wrote that code to behave like that. Every connection is
> its own thread, but requests within a single persistent HTTP connection are
> served synchronously (limited by the nature of http protocol, in fact)
>
> --
> You received this bug notification because you are a direct subscriber
> of the bug.
> https://bugs.launchpad.net/bugs/724961
>
> Title:
>  [6.0.1] possible WS inconsistency under high load.
>
> Status in OpenERP Server:
>  Incomplete
>
> Bug description:
>  Hello,
>
>  I'll mark this bug as incomplete because, for now, I have only a suspicion
>  and no solid proof. But as early sharing is better than nothing, here is
>  the test scenario that produced scary results:
>
>  On a v6.0.1 production server, I had to drop 250 orders from a Magento
> import of year 2010 (because they were not actually paid; just for later
> business intelligence analysis).
>  As it's impossible to drop confirmed pickings, I used a 5-line OOOR script
> (which I have since thrown away) to do the following through the XML/RPC API:
>
>  for each order name to drop:
>  search the order by the name. -> OK
>  find out their picking (by origin name) -> OK
>  write state = 'draft' in those pickings -> KO sometimes
>  unlink the picking -> OK if previous OK
>  write state = 'draft' in those orders -> KO sometimes
>  unlink the order -> OK if previous OK
>
>  The really strange thing is that I was doing something very simple,
> equivalent to:
>  self.pool.get('stock.picking').write(cr, uid, [picking_id], {'state':
> 'draft'})
>  self.pool.get('sale.order').write(cr, uid, [order_id], {'state': 'draft'})
>  but using OOOR, so XML/RPC calls under the hood.
>  I checked, the XML/RPC call was really OK
>
>  What happened is that sometimes, the write {'state': 'draft'} wasn't
> actually performed in the database!!
>  I could then check by reading the record using OOOR or directly in the ERP
> via GTK (+refresh of course): pickings or orders were still confirmed.
>
>  This was totally inconsistent. Running the exact same code again,
>  would go a little further and process some more orders.
>
>  Something else that proved my OOOR code was right: just by introducing
>  a sleep of 0.5 seconds before the write call, suddenly I had no more
>  errors at all!
>
>  Also, by running the server in log-level=debug_rpc, I could check that the
> server really received the 'write' call properly.
>  It was even answering True to the call, but not writing it in the database
> as expected.
>
>  This might never have shown up in production before, because manual
>  manipulations in the GTK or web clients leave some time between
>  write calls.
>
>  Also, the OOOR trunk RSpec test suite fails against v6 in ways that I
>  found inconsistent (redoing the same operation that failed worked
>  later). By contrast, OOOR 1.4.2 never had any failures with OpenERP
>  V5. Unfortunately I had no time to investigate it further.
>
>
>  Yesterday I tried to make a loop that calls write on the partner name field
> using XML/RPC from Python against localhost, writing a different value each
> time. Then I did a read to check that the value was the one I had written. It
> always worked, both from Python and from OOOR. So far I haven't been able to
> reproduce the problem simply.
>  Maybe it's linked to the object/field I was writing; maybe it's linked
> to the fact that the host is remote. Or maybe it's even an OOOR trunk bug,
> though I strongly doubt that, as log-level=debug_rpc showed me that the
> proper calls were received by the server.
>
>
>  This is definitely worth a double check.
>  Can anyone confirm this issue?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/openobject-server/+bug/724961/+subscribe
>
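The write-then-read check described in the bug report can be sketched as a small helper. This is only an illustration: `find_lost_writes` and `FlakyServer` are hypothetical names, and `FlakyServer` is an in-memory stand-in that deliberately answers True while dropping some writes, so that the check has something to catch.

```python
def find_lost_writes(execute, model, ids, field, values):
    """Write each value in turn, read it back, and collect any mismatches."""
    lost = []
    for value in values:
        ok = execute(model, "write", ids, {field: value})
        rows = execute(model, "read", ids, [field])
        for row in rows:
            if row[field] != value:
                # (record id, value written, value read back, server answer)
                lost.append((row["id"], value, row[field], ok))
    return lost


class FlakyServer:
    """Hypothetical stand-in: answers True but silently drops every Nth write."""

    def __init__(self, drop_every):
        self.records = {}
        self.drop_every = drop_every
        self.writes = 0

    def execute(self, model, method, ids, arg):
        if method == "write":
            self.writes += 1
            if self.writes % self.drop_every != 0:
                for rid in ids:
                    self.records.setdefault(rid, {"id": rid}).update(arg)
            return True  # always claims success, like the behaviour reported
        if method == "read":
            return [
                {"id": rid, **{f: self.records.get(rid, {}).get(f) for f in arg}}
                for rid in ids
            ]
        raise ValueError(method)


server = FlakyServer(drop_every=3)
lost = find_lost_writes(
    server.execute, "res.partner", [1], "name", ["a", "b", "c", "d"]
)
```

Against a real server, `execute` would presumably wrap the XML-RPC object endpoint (for example a small function that forwards to `xmlrpc.client.ServerProxy(...).execute` with the database name, uid and password filled in); the URL and credentials are assumptions not given in this thread. An empty `lost` list on localhost would match the failed reproduction attempt described above.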

-- 
You received this bug notification because you are a member of C2C
OERPScenario, which is subscribed to the OpenERP Project Group.
https://bugs.launchpad.net/bugs/724961



