dhis2-devs team mailing list archive

Re: DHIS2 - Indicator calculation over dimensions

 

Hi Robin,

Your mail is dense and will need some digestion. :)

That said, your mail describes the problem in very good detail, which will
be very useful when this type of functionality comes to be implemented.

To respond immediately on how you might solve the issue: you could
consider using the WebAPI to extract your data, process it as you need,
and then inject it back into DHIS2. The WebAPI is described in detail here
<https://www.dhis2.org/doc/snapshot/en/user/html/ch32.html>. I have also
written a chapter on the use of the R programming language with DHIS2,
which is particularly well suited to the type of custom calculations you
are describing. It is available here
<https://www.dhis2.org/doc/snapshot/en/user/html/apc.html>. Of course,
other languages or methods, such as Python, may be better suited to your
situation. Lastly, you can have a look at the DHIS2 Ad-hoc tool
<http://bazaar.launchpad.net/~dhis2-devs-core/dhis2/trunk/files/head:/tools/dhis-adhoc/>,
which would allow interaction with the service layer of DHIS2. Another
approach could be SQL, which interacts directly with the database. I am
sure there are many other means as well. So the short answer is that, right
now, I think there is no built-in way to achieve what you need, and it will
take some coding on your side.
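
As a rough illustration of the "process it as you need" step, here is a
minimal Python sketch using the figures from your simple example below. It
assumes the values have already been extracted; the extraction and
re-import via the WebAPI are not shown, and the variable and function names
are made up for illustration.

```python
# Hypothetical per-district values, taken from the simple example in this thread.
# In practice these would come from a WebAPI export (not shown here).
districts = {
    "District 1": {"population": 10, "conso_pp": 2},
    "District 2": {"population": 5, "conso_pp": 3},
}


def total_by_district(rows):
    """Multiply within each district first, then sum the products."""
    return sum(r["population"] * r["conso_pp"] for r in rows.values())


def total_from_aggregates(rows):
    """Sum each data element first, then multiply the two totals."""
    population = sum(r["population"] for r in rows.values())
    conso_pp = sum(r["conso_pp"] for r in rows.values())
    return population * conso_pp


print(total_by_district(districts))      # 35
print(total_from_aggregates(districts))  # 75
```

The first function gives the figure you expect (35); the second reproduces
the 75 you get when the multiplication happens after aggregation.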

We have run into similar issues in the water and sanitation sector, where
we need to work with the "latest reported data", which DHIS2 does not
really handle. We pull out the data via the WebAPI, do the aggregation
externally, and then inject everything back into the system to get the
figures we need. It would be nice if the system did it automatically, but
given the nature of the project, there are many feature requests and
limited resources. Contributions of course are welcome.
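
To give an idea of what the external aggregation step can look like, here
is a minimal Python sketch of a "latest reported value" aggregation. The
records, org unit names and the "YYYYMM" period format are invented for
illustration, and the WebAPI pull and re-import are again left out.

```python
# Hypothetical extracted records: (org unit, period as "YYYYMM", value).
records = [
    ("Village A", "201401", 12),
    ("Village A", "201406", 15),
    ("Village B", "201312", 7),
    ("Village B", "201403", 9),
    ("Village C", "201402", 4),
]


def latest_per_orgunit(rows):
    """Keep, for each org unit, the value of its most recent period."""
    latest = {}
    for orgunit, period, value in rows:
        # "YYYYMM" strings compare chronologically, so plain string comparison works.
        if orgunit not in latest or period > latest[orgunit][0]:
            latest[orgunit] = (period, value)
    return {orgunit: value for orgunit, (period, value) in latest.items()}


latest = latest_per_orgunit(records)
print(latest)                # {'Village A': 15, 'Village B': 9, 'Village C': 4}
print(sum(latest.values()))  # 28
```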

The current aggregation engine handles the "easy" cases of sums and
averages pretty well, but for more complex stuff, external routes may be
the only solution for now.
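
For the more complex case in your second example, where the population has
no Product disaggregation, an external calculation can simply reuse the
undisaggregated value for every product. A minimal Python sketch, using the
numbers from your table and made-up names (this is not how DHIS2 itself
works, just one way to do it outside the system):

```python
# Hypothetical extracted data, using the figures from the three-product example.
# Population is stored without the Product category; consumption is stored per product.
population = {"District 1": 10, "District 2": 5}
conso_pp = {
    ("District 1", "A"): 2, ("District 1", "B"): 1, ("District 1", "C"): 1,
    ("District 2", "A"): 3, ("District 2", "B"): 1, ("District 2", "C"): 0,
}


def total_consumption(population, conso_pp):
    """Sum population x consumption over every (district, product) pair.

    Population has no product dimension, so the district-level value is
    reused for each product instead of being treated as zero.
    """
    return sum(population[district] * value
               for (district, product), value in conso_pp.items())


def total_per_product(population, conso_pp):
    """Same calculation, but keeping one total per product."""
    totals = {}
    for (district, product), value in conso_pp.items():
        totals[product] = totals.get(product, 0) + population[district] * value
    return totals


print(total_per_product(population, conso_pp))  # {'A': 35, 'B': 15, 'C': 10}
print(total_consumption(population, conso_pp))  # 60
```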

We should certainly try and distill some of your ideas into a concrete
blueprint.

Best regards,
Jason


On Thu, Sep 11, 2014 at 6:15 PM, Robin Martens <martens@xxxxxxx> wrote:

>  Hi Jason,
>
>
>
> I appreciate your help as this is very important for our project, thanks.
>
>
>
> Some of our indicators are indeed quite complex and might need some custom
> coding if not too complicated. However, can you give some basic steps on
> how to achieve this (and on how hard this is in terms of programming as
> we're not experts here)?
>
>
>
> ---
>
>
>
> The rest of this mail is about the specific issue I'm having here. It is
> basically related to three things:
>
>
>
> 1.       The absence of "cross-product" calculations in DHIS2 (I think
> it's what you call compulsory pairs of data).
>
> 2.       The fact that when no data exists on a disaggregated level, the
> value is taken to be zero instead of the aggregated (for custom dimensions
> only I think).
>
> 3.       The average function only exists over the time dimension (as
> discussed by Lars previously this week).
>
>
>
> A simple example:
>
>              Population   Conso pp   Total
> District 1           10          2      20
> District 2            5          3      15
> Total                15          5      35
>
> When calculating the total national consumption, DHIS2 will do: aggregated
> population (=15) times aggregated consumption per person (=5) makes 75,
> which is wrong. In reality, the two mistakes are:
>
>
>
> 1.       The calculation should happen on district level before
> aggregating to the national value (20 for district1 plus 15 for district2
> makes 35, which is the correct answer). -> Cross product
>
> 2.       DHIS2 always sums over orgunits (to be corrected soon according
> to Lars so I won't go further in detail here)
>
>
>
> The cross-product issue can actually be "solved" by a workaround: obliging
> the user to explicitly show the disaggregation level (i.e. the level at
> which the cross product happens) in the report tables. Interestingly
> enough, when calculating the total in a report without showing districts,
> DHIS2 will return 75, while when showing the districts 35.
>
>
>
> Imagine now that the consumption covers three products (a custom category):
> A, B and C. The table would look like this:
>
>              Population  Conso pp A  Conso pp B  Conso pp C  Total A  Total B  Total C  *Total*
> District 1           10           2           1           1       20       10       10     *40*
> District 2            5           3           1           0       15        5        0     *20*
> Total                15           5           2           1       35       15       10       60
>
> The same principle, but aggregated over the Product category and orgunit
> dimension gives the correct result of 60. This is how DHIS2 would calculate:
>
>
>
> 1.       When not showing the Product category in the table: total
> population (15) x total aggregated consumption (=5+2+1=8) is 120.
>
> 2.       When showing the Product category in the table: total population
> (0, it will not find a value and return zero) x consumption is 0 !!!
>
>
>
> Indeed, the workaround does work for orgunits but not for custom
> dimensions when not all data (in this case the population) has the same
> custom dimensions.
>
>
>
> I guess these are things that won't be solved quickly so I might need to
> do some coding myself. As a conclusion, to increase calculation power in
> DHIS2 I'd say:
>
>
>
> 1.       Use aggregated value when no disaggregated value exists (such as
> for population in the previous example).
>
> 2.       Aggregation operators (sum, average,...) should be defined per
> custom category and per data element. In other words, when creating a data
> element and adding categories, you have to add the operator for each
> category.
>
> 3.       Indicators should be available for re-use in other indicators.
> This enables you to build complex indicators piece by piece and gives more
> flexibility on intermediate calculations (at the disaggregated level).
>
>
>
> I hope this is somewhat more clear.
>
>
>
> Kind regards,
>
>
>
> Robin
>
>
>
> *From:* Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> *Sent:* 11 September 2014 16:30
>
> *To:* Robin Martens
> *Cc:* Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
> *Subject:* Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions
>
>
>
> Hi Robin,
>
> You lost me. Could you maybe give a somewhat simpler example of what you
> mean by an "intermediary calculation"?
>
>
>
> I am not sure exactly what you are trying to achieve, but what I can say
> is that in certain cases, I have had to write my own calculation methods
> for certain indicators which are basically impossible to calculate with the
> current implementation in DHIS2. It works fine for simple sums, averages,
> and other types of statistical things (standard deviation, etc), but for
> instance, if you want to calculate other statistical properties (skewness,
> kurtosis) of a given set of values, there is not a way to do it directly
> with DHIS2. Also, certain indicators depend on component parts, and cannot
> be calculated the way DHIS2 does it, by first summing up the numerator and
> denominator and then dividing it, as opposed to calculating a non-weighted
> average of compulsory pairs of data. What I am getting at, is that you may
> have to write your own calculation methods, depending on how complex they
> are.
>
>
>
> Regards,
>
> Jason
>
>
>
>
>
> On Thu, Sep 11, 2014 at 4:20 PM, Robin Martens <martens@xxxxxxx> wrote:
>
> Hi Jason,
>
>
>
> To pick up the point again, there's an additional question I've been
> looking at.
>
>
>
> Even if disaggregated indicator reporting is burdensome (as you explain
> below), it is sometimes necessary for correct aggregated indicator
> calculations (the most obvious one being the use of weighted averages) to have
> "intermediary calculations" according to dimensions in the indicator
> calculation, which can then be aggregated over the whole table to obtain
> the total aggregated indicator value. Even in these intermediary
> calculations, however, the data is not available for calculation, returning
> zero as a result.
>
>
>
> The conclusion is that the current way of calculating indicators not only
> complicates (if not makes impossible in many cases) the calculation of
> indicators per custom dimension, but also makes impossible the correct
> calculation of indicators over the period and orgunit dimensions whenever
> an intermediary calculation over custom dimensions is necessary.
>
>
>
> Can you confirm this?
>
>
>
> If true, is it hard to modify the calculation method to simply pick the
> one-level-higher value of a data element whenever no disaggregated value
> exists? By "no value exists" I don't mean NULL or zero, but rather that it
> is not defined (the dimension does not exist).
>
>
>
> Robin
>
>
>
> *From:* Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> *Sent:* 10 September 2014 17:55
> *To:* Robin Martens
> *Cc:* Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
> *Subject:* Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions
>
>
>
> Hi Robin,
>
> It has been discussed, and it is certainly not a bug. See a related thread
> here (https://lists.launchpad.net/dhis2-devs/msg27571.html) for a similar
> discussion on validation rules. It is essentially the same as indicators.
> What you will have to do is to create a separate indicator for each and every
> combination which you need. It can be painful, but it is really the only way
> I know of at the moment.
>
>
>
> Feel free to file a blueprint here: https://blueprints.launchpad.net/dhis2
>
>
>
> Regards,
>
> Jason
>
>
>
>
>
> On Wed, Sep 10, 2014 at 5:37 PM, Robin Martens <martens@xxxxxxx> wrote:
>
> Dear all,
>
>
>
> I've been testing the indicator calculation algorithm and noticed
> something particular of which I'm not sure if it's a bug or a deliberate
> development choice.
>
>
>
> Indicators are not explicitly defined per category the way data elements
> are, but the reporting tools allow a disaggregated indicator calculation, which
> is definitely very useful. In a specific example, I want to know how many
> people were vaccinated this year and I have 3 kinds of vaccinations: A, B,
> and C. I have two data elements: the total population and the national
> vaccination levels (in %), with a custom category "vaccination type" which
> can be A, B, or C.
>
>
>
> My indicator would be "total population" x "national vaccination level
> (total)". That works fine when put in a pivot table.
>
>
>
> However, when trying to disaggregate the indicator calculation by adding
> my custom category to the pivot table, I don't have any values anymore. It
> seems the reason is that the "total population" data element does not have
> the "vaccination type" category (which seems logical) and therefore isn't
> found by the calculation algorithm. As a result, my table is empty. It
> seems useful that the algorithm would take the aggregated value (for
> population) available in such cases.
>
>
>
> Another example is over the period dimension: my population is a yearly
> value, so when calculating an indicator on a monthly basis, instead of
> taking the available yearly value, it takes zero.
>
>
>
> So my question: is this a deliberate choice in the development, a bug, or
> an idea for a future system improvement?
>
>
>
> Kind regards,
>
>
>
> Robin
>
>
>
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help   : https://help.launchpad.net/ListHelp
>
>
>
>
>
> --
>
> Jason P. Pickering
> email: jason.p.pickering@xxxxxxxxx
> tel:+46764147049 <+46764147049>



-- 
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049
