dhis2-devs team mailing list archive

Thread
Date

Re: Reporting based on tracker data

To: Olav Poppe <olav.poppe@xxxxxx>
From: Jason Pickering <jason.p.pickering@xxxxxxxxx>
Date: Thu, 29 Jan 2015 13:16:47 +0100
Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <BD6AE2AB-5ED5-4990-AD6C-DD3BC605BC47@me.com>

Hi Olav,
I came across a similar case,but even worse with many more disaggregates
and and data elements . Short story is that's it was simply not feasible
with the aggregation query builder,but rather trivial with a script to pull
required data out of the database,aggregate it and then reimport it through
the API. Not surprisingly, I chose to use R,but am sure it could be done
with other languages. A map between the event data elements and aggregate
data elements was used to transform and aggregate from one domain to the
other. Robin brought up a similar request a while ago about this ,so looks
like something we should really have in the core as opposed to relying on
scripts.

Regards,
Jason
On Jan 29, 2015 1:36 PM, "Olav Poppe" <olav.poppe@xxxxxx> wrote:

> Hi devs,
> I’m having some problems with making reports/dashboards/output based on
> tracker data.
>
> We have made a «Cause of death» program, quite similar to the inpatient
> program on the demo instance and what is currently being used several
> places. Each case will have as a minimum age, sex and a cause of death
> (based on an ICD-10 based short list of about 120 diagnosis).
>
> Based on this data, we want to make reports that include as a minimum:
> - number of cases by gender => can be done in ER/EV
> - number of cases by diagnosis => can be done in ER/EV
> - number of cases by age *group*  => can NOT be done in ER/EV (unless
> I’ve missed something)
> - number of cases by diagnosis *group*  => can NOT be done in ER/EV
> (unless I’ve missed something)
> - combinations of the above, e.g. case by diagnosis and age group/sex =>
> can NOT be done in ER/EV (unless I’ve missed something)
>
> Since this output can quite easily be created based on aggregate data (it
> is similar to the OPD datasets), I decided to set up aggregation queries to
> make the reports we need. I therefore made a categorycombo with gender and
> age group, and one data element for each diagnosis. I then grouped the
> diagnosis into data element groups to be able make reports based on disease
> group. I tested this with a couple of queries, and it worked fine.
>
> The problem with this approach is that 118 diagnosis * 2 sexes * 22 age
> group = 4 956 aggregation queries, and this causes the aggregation to take
> a very long time. When I try to run manual aggregation for just one week
> with 4-500 cases, it takes about 10 minutes (4/8 core CPU, SSD, 3/6 GB
> memory for tomcat/postgres). Running the aggregation for the last 6 months
> would take a several hours. This *might* work for a cause of death
> program where the number of cases is relatively low, but to do similar
> analysis on an admission program (where I know some countries have 19000+
> cases some weeks) is just not possible.
>
> So I guess my question is whether there are any ideas on how this can be
> made to work. One obvious thing I see is that the aggregation only uses one
> CPU (as opposed to the analytics), could that be made more efficient? Or
> what would be even better was if ER/EV could support this type of analysis
> - the general requirement would be some sort of grouping functionality?
> That is something I think would be useful in many/most programs (e.g.
> grouping mothers by age group in ANC program)
>
> Olav
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help   : https://help.launchpad.net/ListHelp
>
>

Follow ups

Re: Reporting based on tracker data
From: Olav Poppe, 2015-01-29

References

Reporting based on tracker data
From: Olav Poppe, 2015-01-29