dhis2-devs team mailing list archive
-
dhis2-devs team
-
Mailing list archive
-
Message #35416
Re: Reporting based on tracker data
Hi Olav,
I came across a similar case,but even worse with many more disaggregates
and and data elements . Short story is that's it was simply not feasible
with the aggregation query builder,but rather trivial with a script to pull
required data out of the database,aggregate it and then reimport it through
the API. Not surprisingly, I chose to use R,but am sure it could be done
with other languages. A map between the event data elements and aggregate
data elements was used to transform and aggregate from one domain to the
other. Robin brought up a similar request a while ago about this ,so looks
like something we should really have in the core as opposed to relying on
scripts.
Regards,
Jason
On Jan 29, 2015 1:36 PM, "Olav Poppe" <olav.poppe@xxxxxx> wrote:
> Hi devs,
> I’m having some problems with making reports/dashboards/output based on
> tracker data.
>
> We have made a «Cause of death» program, quite similar to the inpatient
> program on the demo instance and what is currently being used several
> places. Each case will have as a minimum age, sex and a cause of death
> (based on an ICD-10 based short list of about 120 diagnosis).
>
> Based on this data, we want to make reports that include as a minimum:
> - number of cases by gender => can be done in ER/EV
> - number of cases by diagnosis => can be done in ER/EV
> - number of cases by age *group* => can NOT be done in ER/EV (unless
> I’ve missed something)
> - number of cases by diagnosis *group* => can NOT be done in ER/EV
> (unless I’ve missed something)
> - combinations of the above, e.g. case by diagnosis and age group/sex =>
> can NOT be done in ER/EV (unless I’ve missed something)
>
> Since this output can quite easily be created based on aggregate data (it
> is similar to the OPD datasets), I decided to set up aggregation queries to
> make the reports we need. I therefore made a categorycombo with gender and
> age group, and one data element for each diagnosis. I then grouped the
> diagnosis into data element groups to be able make reports based on disease
> group. I tested this with a couple of queries, and it worked fine.
>
> The problem with this approach is that 118 diagnosis * 2 sexes * 22 age
> group = 4 956 aggregation queries, and this causes the aggregation to take
> a very long time. When I try to run manual aggregation for just one week
> with 4-500 cases, it takes about 10 minutes (4/8 core CPU, SSD, 3/6 GB
> memory for tomcat/postgres). Running the aggregation for the last 6 months
> would take a several hours. This *might* work for a cause of death
> program where the number of cases is relatively low, but to do similar
> analysis on an admission program (where I know some countries have 19000+
> cases some weeks) is just not possible.
>
> So I guess my question is whether there are any ideas on how this can be
> made to work. One obvious thing I see is that the aggregation only uses one
> CPU (as opposed to the analytics), could that be made more efficient? Or
> what would be even better was if ER/EV could support this type of analysis
> - the general requirement would be some sort of grouping functionality?
> That is something I think would be useful in many/most programs (e.g.
> grouping mothers by age group in ANC program)
>
> Olav
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>
>
Follow ups
References