openerp-expert-accounting team mailing list archive
Message #01119
Re: Terminatooor
Hello Christophe, a few remarks:
- 15 minutes for 500 lines doesn't shock me, given that we do everything
through web services (WS); it's not too far from what we get.
- TerminatOOOR is extremely sensitive to network latency. Was everything
local? We use the kettle_connector module to run it locally on remote
servers.
- There are several small optimizations you can make to speed it up: load
things in batches rather than one by one. Use OpenERP batch reads as much as
you can, and avoid reloading objects after update/save if you don't need to
(look at the OOOR API; there are flags for that).
- C2C recently proposed a "find_by_batch" method (check OOOR on GitHub) that
automatically uses OpenERP batches but splits them into small chunks
(e.g. 50 records at a time).
- The latest version of TerminatOOOR, using Redbridge
(https://github.com/type-exit/Ruby-Scripting-for-Kettle) and JRuby 1.6 or
later, is about 20% faster than the previous version when speed is not
bound by network latency.
- What does "top" show on the server where both OpenERP and TerminatOOOR
run? (If everything is not local, there is a lot to be gained.) Once things
are optimized, we typically see something like Java 50% and Python +
Postgres 50%, which means there is still some performance we can gain.
- One way we think we can improve Java performance is to let it run for
long periods so the JVM "warms up" and performs all its dynamic bytecode
optimizations. Currently kettle_connector always restarts the server,
killing all the dynamic optimizations. A better approach would be to fire
the transformation on Kettle through the "Carte" server. I'm working on that
as a background task. When everything is local, we can gain some 30%
performance by letting the dynamic optimizations take place this way. We
also typically lose some 30 seconds per run, the time it takes to launch
the whole Kettle stack.
- There are things we want OpenERP SA to do/accept to speed it up, such as:
  - search_read (search and read with a single server round trip and a
    single HTTP layer traversal)
  - ideally, OpenERP would look at great APIs such as Rails and add things
    like an "include" keyword to perform joins via WS in a single request;
    this would also speed up OpenERP in general.
- In general, TerminatOOOR is more for when ETL flexibility is required
rather than pure speed. We are of course limited by the WS part. And I'm not
always sure that OpenERP SA takes WS latency very seriously (2 days ago, one
of the top OpenERP SA managers told me they don't care about search_read;
sighhh!). I'm not sure they even understand the "include" idea. I was
thinking of making a 'to_ar' (to_active_record) method on OOOR objects that
would turn them into real Active Record objects with a real Postgres
connection and all the AR goodness, if the Postgres credentials were given
to OOOR. Of course you would then bypass all the OpenERP logic and hit the
DB directly. Much more risky, but also much faster when speed is what you
need (currently, anyway, you have no choice but to hit the DB directly if
TerminatOOOR is not fast enough for you).
- As you can see, TerminatOOOR 1 was based on JSR 223, and hence the first
versions were Jython compatible (I showed this to Pentaho, but it looks like
they didn't care; that's why I specialized it toward JRuby, and
type-exit.org went even further by porting it to Redbridge). Using Jython
and the OpenERP 6.1 cleanup, it would in theory be possible to import
OpenERP classes right into a Kettle ETL, meaning speed as if you were coding
inside OpenERP, with no WS latency/overhead. Still, you would be back on the
ugly OpenERP API and lose the OOOR API goodness. Ultimately, you would have
speed as if you were coding an OpenERP module, but it would be just as ugly
and as hard to master (you know, the mysterious write (6,0) crap). The only
thing you would gain would be the Kettle connectivity. I'm not sure we at
Akretion will insist on that Jython direction, as we think one of the main
benefits of TerminatOOOR is the ease with which an end user can load and
extract data to/from OpenERP.
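To make the batching advice concrete, here is a minimal Python sketch that only counts round trips. It assumes a hypothetical in-memory `FakeOpenERP` class standing in for the server; against a real OpenERP 6.x server the calls would roughly go through `xmlrpc.client.ServerProxy(url + "/xmlrpc/object").execute(db, uid, password, model, method, *args)`. The helpers `read_one_by_one`, `chunks` and `read_in_batches` are illustrative names, not the OOOR API; `read_in_batches` only mimics the idea behind OOOR's find_by_batch (fixed-size slices of 50 ids).

```python
from typing import Any, Dict, Iterator, List, Sequence

# Hypothetical in-memory stand-in for an OpenERP object endpoint.
# Every execute() call counts as one web-service round trip.
class FakeOpenERP:
    def __init__(self, tables: Dict[str, Dict[int, Dict[str, Any]]]):
        self.tables = tables
        self.round_trips = 0

    def execute(self, model: str, method: str, *args: Any) -> Any:
        self.round_trips += 1
        table = self.tables[model]
        if method == "search":
            return sorted(table)  # the domain argument is ignored here
        if method == "read":
            ids, fields = args
            return [{"id": i, **{f: table[i][f] for f in fields}} for i in ids]
        raise ValueError(f"unsupported method: {method}")


def read_one_by_one(rpc: FakeOpenERP, model: str, ids: Sequence[int],
                    fields: List[str]) -> List[Dict[str, Any]]:
    # One round trip per record: total latency grows with len(ids).
    return [rpc.execute(model, "read", [i], fields)[0] for i in ids]


def chunks(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    # Split an id list into consecutive slices of at most `size` ids.
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def read_in_batches(rpc: FakeOpenERP, model: str, ids: Sequence[int],
                    fields: List[str], batch_size: int = 50) -> List[Dict[str, Any]]:
    # One round trip per batch, in the spirit of OOOR's find_by_batch.
    records: List[Dict[str, Any]] = []
    for batch in chunks(ids, batch_size):
        records.extend(rpc.execute(model, "read", batch, fields))
    return records


partners = {i: {"name": f"Partner {i}"} for i in range(1, 501)}
rpc = FakeOpenERP({"res.partner": partners})
ids = rpc.execute("res.partner", "search", [])

rpc.round_trips = 0
one_by_one = read_one_by_one(rpc, "res.partner", ids, ["name"])
print("one by one:", rpc.round_trips)  # one by one: 500

rpc.round_trips = 0
batched = read_in_batches(rpc, "res.partner", ids, ["name"])
print("batched:", rpc.round_trips)  # batched: 10
```

Both paths return the same 500 records; only the number of WS round trips changes, which is what dominates when the client and the server are not on the same machine.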
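The search_read and "include" wishes are both about avoiding extra round trips when following relations. The sketch below contrasts the "N+1" pattern (one extra read per order to fetch its partner) with a client-side prefetch that reads all related partners in one call. Everything here is hypothetical: the in-memory `FakeOpenERP` class is illustrative, its `search_read` method only models the single-round-trip call requested above, and the fake stores `partner_id` as a plain id, whereas a real OpenERP `read` returns a many2one field as an `[id, name]` pair.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for an OpenERP WS endpoint.
# Each execute() call counts as one round trip.
class FakeOpenERP:
    def __init__(self, tables: Dict[str, Dict[int, Dict[str, Any]]]):
        self.tables = tables
        self.round_trips = 0

    def execute(self, model: str, method: str, *args: Any) -> Any:
        self.round_trips += 1
        table = self.tables[model]
        if method == "search":
            return sorted(table)  # domain ignored in this sketch
        if method == "read":
            ids, fields = args
            return [self._row(table, i, fields) for i in ids]
        if method == "search_read":  # hypothetical: search + read in one trip
            _domain, fields = args
            return [self._row(table, i, fields) for i in sorted(table)]
        raise ValueError(f"unsupported method: {method}")

    @staticmethod
    def _row(table: Dict[int, Dict[str, Any]], i: int,
             fields: List[str]) -> Dict[str, Any]:
        return {"id": i, **{f: table[i][f] for f in fields}}


def orders_n_plus_1(rpc: FakeOpenERP) -> List[Dict[str, Any]]:
    ids = rpc.execute("sale.order", "search", [])
    orders = rpc.execute("sale.order", "read", ids, ["name", "partner_id"])
    for order in orders:
        # One extra round trip per order: the "N+1" pattern.
        order["partner"] = rpc.execute(
            "res.partner", "read", [order["partner_id"]], ["name"])[0]
    return orders


def orders_with_include(rpc: FakeOpenERP) -> List[Dict[str, Any]]:
    orders = rpc.execute("sale.order", "search_read", [], ["name", "partner_id"])
    # Collect the distinct related ids and fetch them all in one read.
    partner_ids = sorted({o["partner_id"] for o in orders})
    partners = {p["id"]: p for p in
                rpc.execute("res.partner", "read", partner_ids, ["name"])}
    for order in orders:
        order["partner"] = partners[order["partner_id"]]
    return orders


tables = {
    "res.partner": {p: {"name": f"Partner {p}"} for p in range(1, 11)},
    "sale.order": {o: {"name": f"SO{o:03d}", "partner_id": o % 10 + 1}
                   for o in range(1, 101)},
}
slow = FakeOpenERP(tables)
a = orders_n_plus_1(slow)
fast = FakeOpenERP(tables)
b = orders_with_include(fast)
print(slow.round_trips, fast.round_trips)  # 102 2
```

For 100 orders the naive version needs 102 round trips (search, read, then one read per order), while the prefetching version needs 2 regardless of the number of orders; a server-side "include" would move that join work to the server entirely.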
Hope this helps.
--
Raphaël Valyi
Founder and consultant
+55 21 3010 9965
www.akretion.com
On Wed, Apr 27, 2011 at 5:17 AM, Christophe Terrier <ct@xxxxxxxx> wrote:
> Hello,
>
> I am writing to tell you about a problem with the TerminatOOOR plugin
> for Pentaho Data Integration (Kettle).
>
> I have found that retrieving data from OpenERP via TerminatOOOR is slow.
> For example, a data-retrieval script (with a few joins) returning
> approximately 500 rows takes a quarter of an hour to run.
>
> Is there a way to improve TerminatOOOR's processing times?
> In its present state, it is not usable on a large database.
>
> Best regards,
>
> *Christophe TERRIER* | ENOVA
>
> Tel: 03 28 55 12 80
> Mob: 06 09 61 33 99
> E-mail: c.terrier@xxxxxxxx
> ________________________________________
> Expert in IT peace of mind: www.enova.fr
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openerp-expert-accounting
> Post to : openerp-expert-accounting@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openerp-expert-accounting
> More help : https://help.launchpad.net/ListHelp