
dhis2-devs team mailing list archive

Re: DHIS2 - Indicator calculation over dimensions

 

Hi Jason,

Thanks for taking the time to read through my email.

I'll have a look at the different possibilities you proposed, and we look forward to any future improvement of the calculation method, whether soon or later. I guess some sectors simply need more complex indicators than others (our project is in forest management).

Have a nice day,

Robin

From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
Sent: 11 September 2014 19:00
To: Robin Martens
Cc: Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
Subject: Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions

Hi Robin,

Your mail is dense and will need some digestion. :)

You give a very good level of detail about your problem in this mail, however, and it will be very useful when someone attempts to implement this type of functionality.

To respond immediately to how you might be able to solve the issue: you could consider using the WebAPI to extract your data, process it as you need, and then inject it back into DHIS2. The WebAPI is described in detail here <https://www.dhis2.org/doc/snapshot/en/user/html/ch32.html>. I have also written a chapter on using the R programming language with DHIS2, which is particularly well suited to the type of custom calculations you are describing. It is available here <https://www.dhis2.org/doc/snapshot/en/user/html/apc.html>. Of course, other languages or methods, such as Python, may also suit your situation. You can also have a look at the DHIS2 Ad-hoc tool <http://bazaar.launchpad.net/~dhis2-devs-core/dhis2/trunk/files/head:/tools/dhis-adhoc/>, which allows interaction with the service layer of DHIS2. Another approach could be SQL, which interacts directly with the database. I am sure there are other means as well. So the short answer is that, as far as I know, there is currently no built-in way to achieve what you need, and it will take some coding on your side.
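As a rough illustration of the extract, process, inject route, the Python sketch below covers only the "process" step, offline. It assumes the extracted records have the shape of the dataValues entries exchanged through the Web API's dataValueSets resource (dataElement, period, orgUnit, value); the identifiers POP, CPP, CONSO_TOTAL, DIST1 and DIST2 are hypothetical placeholders, and the HTTP calls for fetching and posting are deliberately left out.

```python
from collections import defaultdict


def district_products(values, left_de, right_de):
    """Multiply two data elements per (orgUnit, period) before aggregating."""
    rows = defaultdict(dict)
    for v in values:
        rows[(v["orgUnit"], v["period"])][v["dataElement"]] = float(v["value"])
    products = {}
    for key, row in rows.items():
        if left_de in row and right_de in row:  # skip incomplete pairs
            products[key] = row[left_de] * row[right_de]
    return products


def to_payload(products, target_de):
    """Build a dataValueSets-style body (assumed shape) for re-import."""
    return {
        "dataValues": [
            {"dataElement": target_de, "orgUnit": ou, "period": pe,
             "value": format(val, "g")}
            for (ou, pe), val in sorted(products.items())
        ]
    }


# Hypothetical extracted records (placeholder UIDs, illustrative values).
extracted = [
    {"dataElement": "POP", "orgUnit": "DIST1", "period": "2014", "value": "10"},
    {"dataElement": "CPP", "orgUnit": "DIST1", "period": "2014", "value": "2"},
    {"dataElement": "POP", "orgUnit": "DIST2", "period": "2014", "value": "5"},
    {"dataElement": "CPP", "orgUnit": "DIST2", "period": "2014", "value": "3"},
]
payload = to_payload(district_products(extracted, "POP", "CPP"), "CONSO_TOTAL")
print(payload["dataValues"])
```

The resulting dictionary would then be posted back to the server; the exact import endpoint and options should be checked against the Web API chapter for the DHIS2 version in use.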

We have run into similar issues in the water and sanitation sector, where we need to work with the "latest reported data", which DHIS2 does not really handle. We pull out the data via the WebAPI, do the aggregation externally, and then inject everything back into the system to get the figures we need. It would be nice if the system did this automatically, but given the nature of the project, there are many feature requests and limited resources. Contributions are, of course, welcome.
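A minimal sketch of the "latest reported data" idea, assuming the values have already been pulled out via the Web API and that periods are DHIS2-style monthly identifiers such as "201409", which sort chronologically as plain strings. The water point codes and values are made up for illustration.

```python
def latest_values(values):
    """Return {orgUnit: (period, value)}, keeping the latest period per orgUnit."""
    latest = {}
    for v in values:
        ou, pe = v["orgUnit"], v["period"]
        # "YYYYMM" strings compare in chronological order.
        if ou not in latest or pe > latest[ou][0]:
            latest[ou] = (pe, float(v["value"]))
    return latest


# Hypothetical reports: 1 = water point functional, 0 = not functional.
reports = [
    {"orgUnit": "WP1", "period": "201401", "value": "1"},
    {"orgUnit": "WP1", "period": "201406", "value": "0"},
    {"orgUnit": "WP2", "period": "201403", "value": "1"},
]
latest = latest_values(reports)
functional_now = sum(value for _, value in latest.values())
print(latest, functional_now)
```

The aggregate is taken over each unit's most recent report rather than over a fixed period, which is the part the built-in aggregation does not provide.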

The current aggregation engine handles the "easy" cases of sums and averages pretty well, but for more complex stuff, external routes may be the only solution for now.

We should certainly try and distill some of your ideas into a concrete blueprint.

Best regards,
Jason


On Thu, Sep 11, 2014 at 6:15 PM, Robin Martens <martens@xxxxxxx> wrote:
Hi Jason,

I appreciate your help as this is very important for our project, thanks.

Some of our indicators are indeed quite complex and might need some custom coding, if that is not too complicated. Could you give some basic steps on how to achieve this (and on how hard it is in terms of programming, as we're not experts)?

---

The rest of this mail is about the specific issue I'm having; it basically comes down to three things:


1.       The absence of "cross-product" calculations in DHIS2 (I think it's what you call compulsory pairs of data).

2.       The fact that when no data exists at a disaggregated level, the value is taken to be zero instead of the aggregated value (for custom dimensions only, I think).

3.       The average function only exists over the time dimension (as discussed by Lars previously this week).

A simple example:



                  Population  Conso pp  Total
    District 1            10         2     20
    District 2             5         3     15
    Total                 15         5     35


When calculating the total national consumption, DHIS2 multiplies the aggregated population (15) by the aggregated consumption per person (5), which gives 75. This is wrong, and there are two mistakes:


1.       The calculation should happen at district level before aggregating to the national value (20 for District 1 plus 15 for District 2 makes 35, which is the correct answer). -> Cross product

2.       DHIS2 always sums over orgunits (according to Lars this will be corrected soon, so I won't go into further detail here).
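The gap between the two orders of operation can be checked with the figures from the table, using plain Python. This mimics the arithmetic only, not DHIS2's code.

```python
# Figures from the table: (population, consumption per person).
districts = {"District 1": (10, 2), "District 2": (5, 3)}

# Aggregate each operand first, then multiply (gives the wrong total).
total_pop = sum(pop for pop, _ in districts.values())   # 15
total_cpp = sum(cpp for _, cpp in districts.values())   # 5
aggregate_then_multiply = total_pop * total_cpp          # 75

# Cross product: multiply per district, then aggregate (correct total).
multiply_then_aggregate = sum(pop * cpp for pop, cpp in districts.values())  # 35

print(aggregate_then_multiply, multiply_then_aggregate)
```

Only the second form gives the correct national total; it is the per-district "cross product" described in point 1.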

The cross-product issue can actually be "solved" by a workaround: forcing the user to explicitly show the disaggregation level (i.e. the level at which the cross product happens) in the report tables. Interestingly, when calculating the total in a report without showing districts, DHIS2 returns 75, while with the districts shown it returns 35.

Now imagine that the consumption is split into three products, A, B and C (a custom category). The table would look like this:



                  Population  Conso pp A  Conso pp B  Conso pp C  Total A  Total B  Total C  Total
    District 1            10           2           1           1       20       10       10     40
    District 2             5           3           1           0       15        5        0     20
    Total                 15           5           2           1       35       15       10     60


The same principle, aggregated over both the Product category and the orgunit dimension, gives the correct result of 60. This is how DHIS2 would calculate it:


1.       When not showing the Product category in the table: total population (15) x total aggregated consumption (=5+2+1=8) is 120.

2.       When showing the Product category in the table: total population (0, because it does not find a value and returns zero) x consumption is 0!

Indeed, the workaround does work for orgunits, but not for custom dimensions when not all data (in this case the population) have the same custom dimensions.
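The three outcomes (120, 0 and the expected 60) can be reproduced with plain Python. This is only an illustration of the arithmetic described above; it imitates the behaviour rather than using DHIS2 code, and the dictionary layout is an assumption.

```python
# Figures from the second table: population per district (no product
# category) and consumption per person per product.
population = {"District 1": 10, "District 2": 5}
consumption = {
    "District 1": {"A": 2, "B": 1, "C": 1},
    "District 2": {"A": 3, "B": 1, "C": 0},
}

# 1. Product category hidden: aggregate both operands, then multiply.
total_pop = sum(population.values())                              # 15
total_cons = sum(sum(p.values()) for p in consumption.values())   # 8
hidden = total_pop * total_cons                                   # 120

# 2. Product category shown: population has no value per product, so the
#    lookup finds nothing and zero is used instead.
population_by_product = {}  # population is not disaggregated by product
shown_a = sum(
    population_by_product.get((d, "A"), 0) * consumption[d]["A"]
    for d in population
)                                                                 # 0

# Proposed behaviour: reuse the population value that does exist and
# multiply per district and product before summing.
correct = sum(
    population[d] * cpp for d in population for cpp in consumption[d].values()
)                                                                 # 60

print(hidden, shown_a, correct)
```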

I guess these are things that won't be solved quickly so I might need to do some coding myself. As a conclusion, to increase calculation power in DHIS2 I'd say:


1.       Use the aggregated value when no disaggregated value exists (such as for population in the previous example).

2.       Aggregation operators (sum, average,...) should be defined per custom category and per data element. In other words, when creating a data element and adding categories, you have to add the operator for each category.

3.       Indicators should be available for reuse in other indicators. This enables you to build complex indicators piece by piece and gives more flexibility for intermediate calculations (at the disaggregated level).
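Proposal 2 could look something like the following sketch, in which each data element carries its own aggregation operator per dimension. The operator table, dimension names and values are hypothetical and do not correspond to any existing DHIS2 setting.

```python
from statistics import mean

# Hypothetical operator registry; "last" assumes chronologically ordered input.
OPERATORS = {"sum": sum, "average": mean, "last": lambda xs: xs[-1]}

# Proposal 2: each data element declares an operator per dimension.
AGGREGATION = {
    "Population": {"orgunit": "sum", "period": "last"},
    "Conso pp": {"orgunit": "average", "period": "sum"},
}


def aggregate(data_element, dimension, values):
    """Aggregate values of one data element along one dimension."""
    op_name = AGGREGATION[data_element][dimension]
    return OPERATORS[op_name](values)


print(aggregate("Population", "orgunit", [10, 5]))   # 15
print(aggregate("Population", "period", [14, 15]))   # 15
print(aggregate("Conso pp", "orgunit", [2, 3]))      # 2.5
```

The point of the design is that the operator is looked up from the data element and the dimension together, instead of one operator per data element for every dimension.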

I hope this is somewhat clearer.

Kind regards,

Robin

From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
Sent: 11 September 2014 16:30
To: Robin Martens
Cc: Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
Subject: Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions

Hi Robin,
You lost me. Could you maybe give a somewhat simpler example of what you mean by an "intermediary calculation"?

I am not sure exactly what you are trying to achieve, but what I can say is that in certain cases I have had to write my own calculation methods for indicators which are basically impossible to calculate with the current implementation in DHIS2. It works fine for simple sums, averages, and other simple statistics (standard deviation, etc.), but if, for instance, you want to calculate other statistical properties (skewness, kurtosis) of a given set of values, there is no way to do it directly in DHIS2. Also, certain indicators depend on component parts and cannot be calculated the way DHIS2 does it, by first summing up the numerator and the denominator and then dividing; they instead require a non-weighted average of compulsory pairs of data. What I am getting at is that you may have to write your own calculation methods, depending on how complex they are.
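The difference between a ratio of sums (sum the numerators, sum the denominators, then divide) and a non-weighted average of compulsory pairs can be shown on a small data set. In this sketch the facility figures are hypothetical, and None marks a value that was not reported.

```python
from statistics import mean

# Hypothetical (numerator, denominator) pairs per facility; None = not reported.
pairs = [(8, 10), (1, 4), (None, 5), (30, 40)]

# Ratio of sums: add up every reported numerator and denominator, then divide.
num = sum(n for n, _ in pairs if n is not None)   # 39
den = sum(d for _, d in pairs if d is not None)   # 59
ratio_of_sums = num / den

# Non-weighted average of compulsory pairs: keep only complete pairs,
# compute each ratio, then average the ratios.
complete = [(n, d) for n, d in pairs if n is not None and d is not None]
average_of_ratios = mean(n / d for n, d in complete)

print(round(ratio_of_sums, 3), round(average_of_ratios, 3))
```

The two results differ (about 0.661 versus 0.6), partly because the ratio of sums gives more weight to large facilities and partly because it still counts the denominator of the incomplete pair.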

Regards,
Jason


On Thu, Sep 11, 2014 at 4:20 PM, Robin Martens <martens@xxxxxxx> wrote:
Hi Jason,

To pick up the point again, there's an additional question I've been looking at.

Even if disaggregated indicator reporting is burdensome (as you explain below), correct aggregated indicator calculations (the most obvious case being weighted averages) sometimes require "intermediary calculations" along dimensions within the indicator calculation, which can then be aggregated over the whole table to obtain the total aggregated indicator value. Even in these intermediary calculations, however, the data is not available, and the result is zero.
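As an illustration of such an intermediary calculation, here is a population-weighted average consumption per person computed from the district figures of the consumption example in this thread (populations 10 and 5, consumption per person 2 and 3). The variable names are illustrative only.

```python
# District figures: (population, consumption per person).
districts = {"District 1": (10, 2), "District 2": (5, 3)}

# Intermediary calculation per district: total consumption = pop * cpp.
intermediate = {name: pop * cpp for name, (pop, cpp) in districts.items()}

# Aggregate the intermediate values and the weights, then divide:
# a population-weighted average consumption per person (35 / 15).
weighted = sum(intermediate.values()) / sum(pop for pop, _ in districts.values())

# A plain, unweighted average of the per-person values, for comparison.
unweighted = sum(cpp for _, cpp in districts.values()) / len(districts)

print(round(weighted, 3), unweighted)
```

The weighted figure (about 2.333) can only be obtained if the per-district products exist before aggregation; averaging the per-person values directly gives 2.5.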

The conclusion is that the current way of calculating indicators not only complicates (if not makes impossible in many cases) the calculation of indicators per custom dimension, but also makes it impossible to calculate indicators correctly over the period and orgunit dimensions whenever an intermediary calculation over custom dimensions is necessary.

Can you confirm this?

If so, is it hard to modify the calculation method to simply pick the value of a data element one level higher whenever no disaggregated value exists? By "exists" I don't mean NULL or zero, but rather not defined (the dimension does not exist).
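The proposed fallback could be sketched as a lookup that goes one level up only when the data element has no such dimension at all, while a stored zero or a missing value for an existing dimension is returned as-is. The data element names, category name and values are hypothetical.

```python
# Which categories each (hypothetical) data element is disaggregated by.
CATEGORIES = {"population": set(), "vaccination_level": {"vaccination type"}}

# Stored values keyed by (data element, category option or None for total).
VALUES = {
    ("population", None): 1000,
    ("vaccination_level", "A"): 80,
    ("vaccination_level", "B"): 0,  # a real zero, must not be replaced
}


def resolve(data_element, category, option):
    """Look up a value, falling back one level up only for undefined dimensions."""
    if category not in CATEGORIES[data_element]:
        # Dimension does not exist for this element: use the aggregate value.
        return VALUES.get((data_element, None))
    # Dimension exists: return the disaggregated value as-is (may be 0 or None).
    return VALUES.get((data_element, option))


print(resolve("population", "vaccination type", "A"))         # 1000
print(resolve("vaccination_level", "vaccination type", "B"))  # 0
print(resolve("vaccination_level", "vaccination type", "C"))  # None
```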

Robin

From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
Sent: 10 September 2014 17:55
To: Robin Martens
Cc: Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
Subject: Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions

Hi Robin,
It has been discussed, and it is certainly not a bug. See this related thread (https://lists.launchpad.net/dhis2-devs/msg27571.html) for a similar discussion on validation rules; the situation is essentially the same for indicators. What you will have to do is create a separate indicator for each and every combination you need. It can be painful, but it is the only way I know of at the moment.
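Creating one indicator per combination can at least be scripted. The sketch below only builds the expressions; it assumes DHIS2's #{dataElementUid.categoryOptionComboUid} operand syntax, uses hypothetical placeholder UIDs, and its dictionary keys are illustrative rather than the exact metadata import format.

```python
# Hypothetical placeholder UIDs, not real identifiers.
POPULATION_UID = "popUID00001"
VACC_LEVEL_UID = "vacUID00001"
OPTION_COMBOS = {"A": "cocUID0000A", "B": "cocUID0000B", "C": "cocUID0000C"}


def indicator_definitions():
    """Build one (illustrative) indicator definition per vaccination type."""
    defs = []
    for option, coc_uid in sorted(OPTION_COMBOS.items()):
        defs.append({
            "name": f"Vaccinated population ({option})",
            # Population (no category) times the level for one option combo.
            "numerator": f"#{{{POPULATION_UID}}} * #{{{VACC_LEVEL_UID}.{coc_uid}}}",
            "denominator": "100",  # vaccination level is stored in %
        })
    return defs


for d in indicator_definitions():
    print(d["name"], "=", d["numerator"], "/", d["denominator"])
```

The generated definitions would still need to be created in DHIS2, for example by hand in the maintenance app or through a metadata import.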

Feel free to file a blueprint here: https://blueprints.launchpad.net/dhis2

Regards,
Jason


On Wed, Sep 10, 2014 at 5:37 PM, Robin Martens <martens@xxxxxxx> wrote:
Dear all,

I've been testing the indicator calculation algorithm and noticed something peculiar, and I'm not sure whether it's a bug or a deliberate development choice.

Unlike data elements, indicators are not explicitly defined per category, but the reporting tools allow a disaggregated indicator calculation, which is definitely very useful. In a specific example, I want to know how many people were vaccinated this year, and I have three kinds of vaccination: A, B and C. I have two data elements: the total population and the national vaccination level (in %), with a custom category "vaccination type" which can be A, B or C.

My indicator would be "total population" x "national vaccination level (total)". That works fine when put in a pivot table.

However, when I try to disaggregate the indicator calculation by adding my custom category to the pivot table, I don't get any values anymore. The reason seems to be that the "total population" data element does not have the "vaccination type" category (which is logical) and therefore isn't found by the calculation algorithm. As a result, my table is empty. It would be useful if the algorithm took the available aggregated value (for population) in such cases.

Another example concerns the period dimension: my population is a yearly value, so when calculating an indicator on a monthly basis, the algorithm takes zero instead of the available yearly value.
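The period fallback could be sketched like this, assuming DHIS2-style period identifiers ("201409" for a month, "2014" for a year). The function name and the data are hypothetical.

```python
population = {("District 1", "2014"): 1000}  # yearly data only (made up)


def value_for_period(store, org_unit, period):
    """Return the value for a period, falling back from a month to its year."""
    if (org_unit, period) in store:
        return store[(org_unit, period)]
    if len(period) == 6 and period.isdigit():  # monthly, e.g. "201409"
        return store.get((org_unit, period[:4]))  # containing year
    return None


print(value_for_period(population, "District 1", "201409"))  # 1000
print(value_for_period(population, "District 2", "201409"))  # None
```

Returning None rather than zero when nothing is found keeps "no data" distinguishable from a reported zero.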

So my question: is this a deliberate choice in the development, a bug, or an idea for a future system improvement?

Kind regards,

Robin



_______________________________________________
Mailing list: https://launchpad.net/~dhis2-devs
Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx<mailto:dhis2-devs@xxxxxxxxxxxxxxxxxxx>
Unsubscribe : https://launchpad.net/~dhis2-devs
More help   : https://help.launchpad.net/ListHelp



--
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049


