← Back to team overview

dhis2-devs team mailing list archive

Re: Reporting based on tracker data

 

Hi Tran,

Is there anything you could do on this?

Briefly looking at DefaultCaseAggregationConditionService, for example you
have nested loops and inside these loops you are calling periodService to
fetch period from db. I don't see the call to periodService depending on
data generated in the loops - is it not possible to prefetch the periods
and use them inside the loops?

---
Thank you,
Abyot.

On Thu, Jan 29, 2015 at 2:58 PM, Olav Poppe <olav.poppe@xxxxxx> wrote:

> Hi,
> I am really hoping to avoid custom scripts for this. What we’re trying to
> do is to set up a reusable meta data «packages» (input and output) for
> recording this data, and having to make «hacks» outside of DHIS (that then
> needs to be maintained) to make it work does not look great.. Especially
> because all the functionality is *almost* there: 1) the aggregation
> queries works, but isn’t optimized, whilst 2) ER and EV still miss a few
> things for aggregated analysis (which is available in PT and DV).
>
> I’ve been experimenting a bit more with the aggregation queries, including
> deleting 90% of the cases (but keeping all the ≈ 5000 queries). That didn’t
> really speed up things, but looking at the CPU use, the java process was
> now using 100+% CPU whilst postgresql stayed around 40-50% - so most of the
> time seems to be spent preparing the queries rather than executing them.
> Also did a few stack traces (jstack), and about half of the time, it was
> doing something in HibernatePeriodStore.java:
> at org.hibernate.internal.CriteriaImpl.uniqueResult(CriteriaImpl.java:396)
> at
> org.hisp.dhis.period.hibernate.HibernatePeriodStore.getPeriod(HibernatePeriodStore.java:73)
> at
> org.hisp.dhis.period.hibernate.HibernatePeriodStore.reloadPeriod(HibernatePeriodStore.java:163)
> at
> org.hisp.dhis.period.hibernate.HibernatePeriodStore.reloadForceAddPeriod(HibernatePeriodStore.java:171)
> at
> org.hisp.dhis.period.DefaultPeriodService.reloadPeriod(DefaultPeriodService.java:278)
>
> So perhaps (I’m only guessing here of course) some small optimizations is
> all it takes to make the aggregation feasible.
>
> Olav
>
>
>
> 29. jan. 2015 kl. 13.16 skrev Jason Pickering <jason.p.pickering@xxxxxxxxx
> >:
>
> Hi Olav,
> I came across a similar case,but even worse with many more disaggregates
> and and data elements . Short story is that's it was simply not feasible
> with the aggregation query builder,but rather trivial with a script to pull
> required data out of the database,aggregate it and then reimport it through
> the API. Not surprisingly, I chose to use R,but am sure it could be done
> with other languages. A map between the event data elements and aggregate
> data elements was used to transform and aggregate from one domain to the
> other. Robin brought up a similar request a while ago about this ,so looks
> like something we should really have in the core as opposed to relying on
> scripts.
>
> Regards,
> Jason
> On Jan 29, 2015 1:36 PM, "Olav Poppe" <olav.poppe@xxxxxx> wrote:
>
>> Hi devs,
>> I’m having some problems with making reports/dashboards/output based on
>> tracker data.
>>
>> We have made a «Cause of death» program, quite similar to the inpatient
>> program on the demo instance and what is currently being used several
>> places. Each case will have as a minimum age, sex and a cause of death
>> (based on an ICD-10 based short list of about 120 diagnosis).
>>
>> Based on this data, we want to make reports that include as a minimum:
>> - number of cases by gender => can be done in ER/EV
>> - number of cases by diagnosis => can be done in ER/EV
>> - number of cases by age *group*  => can NOT be done in ER/EV (unless
>> I’ve missed something)
>> - number of cases by diagnosis *group*  => can NOT be done in ER/EV
>> (unless I’ve missed something)
>> - combinations of the above, e.g. case by diagnosis and age group/sex =>
>> can NOT be done in ER/EV (unless I’ve missed something)
>>
>> Since this output can quite easily be created based on aggregate data (it
>> is similar to the OPD datasets), I decided to set up aggregation queries to
>> make the reports we need. I therefore made a categorycombo with gender and
>> age group, and one data element for each diagnosis. I then grouped the
>> diagnosis into data element groups to be able make reports based on disease
>> group. I tested this with a couple of queries, and it worked fine.
>>
>> The problem with this approach is that 118 diagnosis * 2 sexes * 22 age
>> group = 4 956 aggregation queries, and this causes the aggregation to take
>> a very long time. When I try to run manual aggregation for just one week
>> with 4-500 cases, it takes about 10 minutes (4/8 core CPU, SSD, 3/6 GB
>> memory for tomcat/postgres). Running the aggregation for the last 6 months
>> would take a several hours. This *might* work for a cause of death
>> program where the number of cases is relatively low, but to do similar
>> analysis on an admission program (where I know some countries have 19000+
>> cases some weeks) is just not possible.
>>
>> So I guess my question is whether there are any ideas on how this can be
>> made to work. One obvious thing I see is that the aggregation only uses one
>> CPU (as opposed to the analytics), could that be made more efficient? Or
>> what would be even better was if ER/EV could support this type of analysis
>> - the general requirement would be some sort of grouping functionality?
>> That is something I think would be useful in many/most programs (e.g.
>> grouping mothers by age group in ANC program)
>>
>> Olav
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help   : https://help.launchpad.net/ListHelp
>
>

Follow ups

References