
dhis2-devs team mailing list archive

Creation of CategoryOptionCombinations

 

Dear devs,

I am experiencing problems when handling category combinations. Our prototype
with 5 dimensions went through the process of generating
categoryOptionCombinations (~20,000 records) quite well. With 7 dimensions
(~400,000) it worked as well, although it took a very long time.
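As a rough illustration of why the counts explode, assume one option combination is generated per element of the Cartesian product of the categories' options. The total is then the product of the option counts, so each added category multiplies the total instead of adding to it. The option counts below are hypothetical, not the ones in the model described here.

```python
from math import prod


def coc_count(option_counts):
    """Number of category option combinations, assuming one combination
    per element of the Cartesian product of the categories' option sets."""
    return prod(option_counts)


# Hypothetical option counts per category (not taken from the real model).
five_dims = [6, 5, 8, 10, 12]
seven_dims = five_dims + [4, 5]
ten_dims = seven_dims + [3, 2, 2]

for label, counts in [("5", five_dims), ("7", seven_dims), ("10", ten_dims)]:
    print(f"{label} categories -> {coc_count(counts):,} combinations")
```

Adding three small categories (3, 2 and 2 options) to the 7-category case multiplies the total by 12.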

Now we have defined the next data model with 10 dimensions (expecting ~5 million
categoryOptionCombinations), and the process dies without further notice. The
last words in catalina.out:
* INFO  2016-06-07 13:29:33,783 Building object-bridge maps (preheatCache: true,
3 classes). (DefaultObjectBridge.java [http-bio-8180-exec-15])
* INFO  2016-06-07 13:29:36,779 Building object-bridge maps took 2.99 seconds.
(DefaultObjectBridge.java [http-bio-8180-exec-15])
* INFO  2016-06-07 13:29:36,896 'admin' update
org.hisp.dhis.dataelement.DataElementCategoryCombo, name: Membership, uid:
SCgLXYHqVzz (AuditLogUtil.java [http-bio-8180-exec-15])

Ten dimensions with moderately sized option sets are not unusual; for
multidimensional data models in data warehouses this is actually rather slim, so
I'd expect DHIS2 to handle it easily.

It could of course be a memory problem (I tried up to 14g for Tomcat on a 4-core
Ubuntu 14.04 server, DHIS 2.23). Before I start experimenting with other
parameters, I am hoping to get some hints from you on known limitations or
workarounds (not allowed: reducing the number of options or categories, SQL hacks
:-) ). Is there any information on whether optimizations of this process are
planned in the kernel?

Some observations on the process:

* during generation (either when saving the categoryCombination or in the data
maintenance menu):
- long names - cOCs are generated with names that get extremely long, as they
are mere concatenations of the involved categoryOptions. Could there be an option
to use just the codes as the basis, or to omit the names completely? This could
be one reason for the memory and performance issues.
- long log entries - every single entry is logged in catalina.out with several
lines of text, causing catalina.out to become extremely big.
- during execution a lot of Java memory is used and no DB memory, which looks to
me as if all the logic is happening in the JVM. It might be more useful to move
more of the logic into SQL in the DB (e.g. use DB cross joins for combining
options), as the DB will be more efficient at this.
- because of the log entries I assume that every single combination is persisted
into the DB with a separate SQL statement, causing millions of single SQL
requests. Batch SQL would be preferable to single-record processing.
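A minimal sketch of the cross-join idea from the list above, using Python's built-in sqlite3 module rather than DHIS2's actual database. The schema (category_option, and option_combo with one column per category) is hypothetical and far simpler than the real one. The database builds all combinations in a single INSERT ... SELECT, and the names are built from short codes instead of full option names.

```python
import sqlite3

# Hypothetical, simplified schema: NOT the actual DHIS2 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_option (
    category TEXT NOT NULL,
    code     TEXT NOT NULL
);
CREATE TABLE option_combo (
    sex_code    TEXT NOT NULL,
    age_code    TEXT NOT NULL,
    region_code TEXT NOT NULL,
    name        TEXT NOT NULL  -- built from short codes, not full names
);
""")

# Hypothetical categories and option codes.
options = [
    ("sex", "M"), ("sex", "F"),
    ("age", "A0_14"), ("age", "A15_49"), ("age", "A50P"),
    ("region", "N"), ("region", "S"),
]
# One batched call instead of one INSERT statement per row.
conn.executemany("INSERT INTO category_option VALUES (?, ?)", options)

# Let the database build the Cartesian product in one set-based statement.
conn.execute("""
INSERT INTO option_combo (sex_code, age_code, region_code, name)
SELECT s.code, a.code, r.code, s.code || '_' || a.code || '_' || r.code
FROM category_option AS s
CROSS JOIN category_option AS a
CROSS JOIN category_option AS r
WHERE s.category = 'sex' AND a.category = 'age' AND r.category = 'region'
""")
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM option_combo").fetchone()[0]
print(count)  # 2 * 3 * 2 = 12
```

A generic version would store the combination-to-option links in a separate link table rather than one column per category; the fixed columns only keep the sketch short.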

* during import/export of categoryOptionCombinations:
- batch SQL would be preferable to single-record processing
- huge log entries in catalina.out, due to several lines of text per combination
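For the import/export case, a sketch of batched inserts, again using sqlite3 with a hypothetical option_combo(uid, name) table. Rows are sent in chunks with executemany, one transaction per chunk, instead of one statement and one commit per combination. The batch size of 10,000 is an arbitrary assumption to tune.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched needs Python 3.12+)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def import_combos(conn, rows, batch_size=10_000):
    """Insert (uid, name) rows in batches, one transaction per batch.
    The table and column names are hypothetical."""
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO option_combo (uid, name) VALUES (?, ?)", chunk)
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE option_combo (uid TEXT PRIMARY KEY, name TEXT NOT NULL)")
rows = ((f"coc{i:07d}", f"combo {i}") for i in range(25_000))
inserted = import_combos(conn, rows, batch_size=10_000)
print(inserted)
```

Because rows is a generator, only one chunk needs to be held in memory at a time.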

I would appreciate any comments.

Thanks in advance,

Uwe

