← Back to team overview

openerp-india team mailing list archive

[Bug 1029326] Re: report.account.general.ledger takes hours to print

 

Hello Fabrice,

Here are some details on the OpenERP report generation process, to help
you understand how it works and take appropriate measures for your
customers. Sorry for the lengthy answer, but this topic is complicated
and I think it is worth a proper explanation.

Report Generation Process
=========================
The time taken to produce a PDF report almost exclusively depends on the actual size of the PDF to produce, so the rule of thumb is: a report with only a few pages will be very quick to generate even with a very large database, while a large report will take a lot of time, even on a small database. For example the Partner Balance report will be produced in a few seconds even with 10k partners and 100k journal items because it will only be 10 pages long for example.

Diving further into the process, the time to produce most OpenERP reports is roughly divided up into 2 parts:
1. Rendering of the RML output based on the report RML template. This step is managed at 50% by OpenERP code, and at 50% by the reportlab.platypus library. The OpenERP part includes all the queries to the database for retrieving the actual data. For large documents the database queries represent a negligible fraction of this step (e.g. less than 1 minute for a 300 pages report).
2. Rendering the RML from step 1 into an actual PDF file. This final step is 100% managed by the reportlab.platypus library, and is fully CPU-bound - it does not use the database at all.

The actual time taken to perform these 2 steps is unfortunately not directly proportional to the size of the report. Step 1 will usually be directly proportional to the number of full pages of output, but Step 2 is quadratic, and will become much slower for bigger reports.
Here is a spreadsheet estimating the time it takes to print a General Ledger (landscape) report on an Intel Xeon 2.8 GHz server with 16Gb RAM  on a recent OpenERP 6.0 (server r.3632 - addons  r.5302) with reportlab 2.5 (older versions were slower):
https://docs.google.com/spreadsheet/ccc?key=0AuuV1YRJafDgdEpmdjVYa3JtR1pLMFhKSnBrZ1J3c0E

As wee see the process indeed becomes much slower when the number of
output pages increases to a few hundreds, and an obvious workaround for
very large reports is to split them into smaller parts, e.g. by printing
only one period at a time to stay below a few hundred pages of output.
The time it takes to print 12 per-period ledgers will be a *lot less*
than printing the whole fiscal year at once, due to the quadratic time
of the printing process. Also watch out for the RAM usage of very large
reports - again this is avoided if you print smaller parts one by one.

Of course we would like to speed up the production of large reports, but
we have very limited options in practice. The second step and a part of
the first step are completely managed by the reportlab.platypus library,
on which we have very little control. We have already performed a lot of
benchmarking and we optimized the RML output produced by OpenERP to
minimize the computations required during Step2 (e.g. by forcing the
height of all table rows in our reports we make reportlab skip a
prohibitive height computation operation during the PDF layout). There
does not seem to be more headway to make in this area, considering that
optimizing further the SQL queries and OpenERP part will have negligible
impact on the complete report generation time.

One course of action for speeding up large reports would be to switch to
a different technology than RML, such as the webkit reporting engine
contributed by Camptocamp and built-in as of OpenERP 6.1, which might
not have the quadratic complexity issue for large outputs. A change of
reporting engine is therefore considered for future versions, but it is
out of question for past stable releases, and of course requires
rewriting all reports. Camptocamp is working on rewriting the main
financial reports using the webkit engine, so there is a possible
solution there.

We understand that having very long generation times for big reports can be frustrating, but this annoyance is greatly mitigated by the fact that reports with thousands of pages do not need to be printed very often - usually only for legal purposes, on a quarterly/yearly basis, and can usually be split into smaller parts, like one period at a time. If the periods are then closed, you may even not need to print them more than once per year.
Every day use and normal business operations are always best conducted on the live database using the dynamic reporting and analysis views of OpenERP - using a thousands-page report would be quite impractical anyway.

We are also investigating the performance problem upstream in the
reportlab project, but there has not been any positive outcome so far.


Impact on other usage
=====================
Your bug report also indicates that you are concerned about the impact on the overall system performance during the report generation. Based on the description above you can see that there is very little database load to produce the report, most of the work is CPU-bound. This means a few things:
- a faster CPU will definitely print the report faster, but the generation itself is not multi-threaded, so multiple cores will not help to speed up the report generation
- for OpenERP servers that are not fully multi-process, there will be a big impact on overall responsiveness during the generation operation (due to CPU load). As of OpenERP 6.1 you can easily switch to a multi-process deployment mode[1] that will greatly reduce the impact on other users during report generation operations (1 core will be used by the report with the other cores fully available for serving other requests)
- in single-process deployments, setting up a dedicated OpenERP server on a different machine for printing large reports will be a very effective technique to reduce the overall impact (there is little database load caused by the report)
- printing those large reports outside business hours is a common solution for many OpenERP users,  e.g. you start the operation when you leave in the evening, and get the result in the morning

Timeouts
========
One other thing to watch for is the timeout setting of your OpenERP client. For the GTK client and the pre-6.1 web clients there is a timeout setting in the client configuration file that needs to be increased in order to successfully retrieve reports that take a few hours to generate.
For the GTK client you can change the timeout in the [client] section of the config file, located in $HOME/.openerprc. For the 6.0 web client, there is an openerp.server.timeout parameter in the configuration file (doc/openerp-web.cfg)

Conclusions
===========
Based on all of the above, we'll have to close this bug report as Won't Fix, because:
- it is not really blocking for normal OpenERP usage
- it only impacts a few reports that do not have to be printed very often
- the total printing time can be dramatically reduced by printing one period at a time
- even if printing the full report, the impact on other users can be greatly reduced using recommended deployment techniques
- there is not much more we can do to improve the situation without switching to a different reporting engine - something that cannot be done in stable version (and might be worth a separate bug report)

Rest assured that we have done everything possible to find a definitive solution to this speed issue, and will continue to pursue this goal in future OpenERP versions.
I hope you now understand better the situation and our position.

[1] http://www.slideshare.net/openobject/openerp-61-framework-changes

** Changed in: openobject-addons
       Status: New => Won't Fix

-- 
You received this bug notification because you are a member of OpenERP
Indian Team, which is subscribed to OpenERP Addons.
https://bugs.launchpad.net/bugs/1029326

Title:
  report.account.general.ledger takes hours to print

Status in OpenERP Addons (modules):
  Won't Fix

Bug description:
  Hi,

  We encounter a problem when trying to print out the
  report.account.general.ledger (or
  report.account.general.ledger_lanscape) report.

  Here are the server details:

  Processor : Intel(R) Xeon(R) CPU E31270 @ 3.40GHz
  Memory : 16Gb
  Swap : 16Gb

  OpenERP instance :

  account.move : over 13.000 and daily increase
  account.move.line : overs 68.000 and daily increase

  After more than one hour, the report.account.general.ledger (or report.account.general.ledger_landscape) does not print out. The OpenERP process takes 115% of CPU.
  And more important : the overall performances of OpenERP are impaired : the other users are freezed in their work!

To manage notifications about this bug go to:
https://bugs.launchpad.net/openobject-addons/+bug/1029326/+subscriptions


References