openerp-expert-accounting team mailing list archive

Re: Terminatooor

To: Christophe Terrier <ct@xxxxxxxx>
From: Raphael Valyi <rvalyi@xxxxxxxxx>
Date: Wed, 27 Apr 2011 10:57:02 -0300
Cc: openerp-expert-accounting@xxxxxxxxxxxxxxxxxxx, all@xxxxxxxxxxxx
In-reply-to: <6B4AC9FB4B38B24E97AF462BC09E24A82BDA94@svr-exchange.agrinova.local>
Hello Christophe, a few remarks:


   - 15 minutes for 500 lines doesn't shock me, given that we do everything
   through web services (WS); it's not far from what we get.
   - TerminatOOOR is extremely sensitive to network latency. Was everything
   running locally? We use the kettle_connector module to run it locally on
   remote servers.
   - There are several small optimizations you can make to speed it up: load
   things in batches, not one by one. Use OpenERP batch reads as much as you
   can, and avoid reloading objects after update/save if you don't need to
   (look at the OOOR API, there are flags for that).
   - C2C recently proposed a "find_by_batch" method (check OOOR on GitHub)
   that uses OpenERP batch reads but automatically splits them into small
   chunks (e.g. 50 records at a time).
   - The latest version of TerminatOOOR, which uses Redbridge
   (https://github.com/type-exit/Ruby-Scripting-for-Kettle) and JRuby 1.6 or
   later, is about 20% faster than the previous version when speed is not
   bound by network latency.
   - What does "top" show on the server where both OpenERP and TerminatOOOR
   run? (If everything is not local, there is a lot to gain.) Once things are
   optimized, we typically see something like Java 50% and Python + Postgres
   50%, which means there is still some performance to gain.
   - One way we think we can gain Java performance is to let the JVM run for
   long periods so it "warms up" and performs all its dynamic bytecode
   optimizations. Currently kettle_connector restarts the server every time,
   killing all dynamic optimizations. A better approach would be to fire the
   transformation through Kettle's "Carte" server. I'm working on that as a
   background task. When everything is local, we can gain some 30% in
   performance by letting the dynamic optimizations take place this way. We
   also typically spend some 30 seconds per run just launching the whole
   Kettle stack.
   - There are things we want OpenERP SA to do/accept to speed it up, such as:
      - search_read (search and read with a single server round trip and a
      single HTTP layer traversal)
      - ideally, OpenERP would take a look at great APIs such as Rails and add
      things like an "include" keyword to perform joins via WS in a single
      request; this would also speed up OpenERP in general.
   - In general, TerminatOOOR is better suited to cases where ETL flexibility
   matters more than pure speed. We are of course limited by the WS part. And
   I'm not always sure that OpenERP SA takes WS latency very seriously (2 days
   ago, one of the top OpenERP SA managers told me they don't care about
   search_read; sighhh!). I'm not sure they have even considered the "include"
   thing. I was thinking of making a 'to_ar' (to_active_record) method on OOOR
   objects that would turn them into real Active Record objects, with a real
   Postgres connection and all the AR goodness, if the Postgres credentials
   were given to OOOR. Of course, you would then bypass all the OpenERP logic
   and hit the DB directly. That is a lot riskier, but also a lot faster when
   speed is what you need (currently you have no choice anyway but to hit the
   DB directly if TerminatOOOR is not fast enough for you).
   - As you can see, TerminatOOOR 1 was based on JSR 223, and hence the first
   versions were Jython compatible (I showed this to Pentaho, but it looks
   like they didn't care; that's why I specialized it toward JRuby, and
   type-exit.org went even further by porting it to Redbridge). Using Jython
   and the OpenERP 6.1 cleanup, it would in theory be possible to import
   OpenERP classes right into a Kettle ETL, meaning speed as if you were coding
   inside OpenERP, with no WS latency/overhead. Still, you would be back to the
   ugly OpenERP API and lose the OOOR API goodness. Ultimately, you would have
   speed as if you were coding an OpenERP module, but it would be just as ugly
   and as hard to master (you know, the mysterious write (6,0) crap). The only
   thing you would gain would be the Kettle connectivity. I'm not sure we at
   Akretion will push in that Jython direction, as we think one of the main
   benefits of TerminatOOOR is how easy it is for an end user to load and
   extract data to/from OpenERP.
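To make the batching advice concrete, here is a minimal sketch in Python (not the OOOR Ruby API, and not C2C's actual find_by_batch code) against OpenERP's raw XML-RPC "execute" call, which takes (db, uid, password, model, method, *args). It assumes the record ids are already known, e.g. from a prior "search" call; the helper names chunked and read_in_batches are hypothetical. Reading 500 records 50 at a time costs 10 round trips instead of 500, which is where most of the latency goes.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield successive slices of at most `size` ids (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def read_in_batches(proxy, db: str, uid: int, password: str, model: str,
                    ids: Sequence[int], fields: List[str],
                    batch_size: int = 50) -> List[dict]:
    """Read records with one XML-RPC round trip per batch, not per record.

    `proxy` is assumed to be an XML-RPC proxy on OpenERP's object service,
    exposing execute(db, uid, password, model, method, *args).
    """
    records: List[dict] = []
    for batch in chunked(ids, batch_size):
        # A single 'read' call fetches every record in this batch at once.
        records.extend(proxy.execute(db, uid, password, model, "read",
                                     batch, fields))
    return records
```

With a real server, proxy would be something like xmlrpc.client.ServerProxy("http://localhost:8069/xmlrpc/object"); the host and port here are assumptions. The batch size trades round trips against the size of each request, which is why splitting into moderate chunks rather than one huge read is the usual compromise.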


Hope this helps.


-- 
Raphaël Valyi
Founder and consultant
+55 21 3010 9965
www.akretion.com


On Wed, Apr 27, 2011 at 5:17 AM, Christophe Terrier <ct@xxxxxxxx> wrote:

> Hello,
>
> I am writing to report a problem with the TerminatOOOR plugin
> for Pentaho Data Integration (Kettle).
>
> I have run into a slowness problem when retrieving data from OpenERP via
> TerminatOOOR.
> For example, a data retrieval script (with few joins) returning approximately
> 500 rows takes a quarter of an hour to run.
>
> Is there a way to improve TerminatOOOR's processing time?
>
> In its present state, it is not usable on a database of any significant
> size.
>
> Best regards,
>
> *Christophe TERRIER* | ENOVA
>
> Tel: 03 28 55 12 80
> Mob: 06 09 61 33 99
> E-mail: c.terrier@xxxxxxxx
> ________________________________________
> IT peace-of-mind expert: www.enova.fr
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openerp-expert-accounting
> Post to     : openerp-expert-accounting@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openerp-expert-accounting
> More help   : https://help.launchpad.net/ListHelp
>
>
References

Terminatooor
From: Christophe Terrier, 2011-04-27