dhis2-users team mailing list archive
Message #05341
Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions
Hi Robin,
I think that is the real issue, namely that you are applying DHIS2 in a
domain which is slightly different from its typical domain, namely health.
I have been involved in some other projects with DHIS2 on the fringes of
what it can do out of the box in food security, water and sanitation, even
using it for recording golf handicap scores. What I have seen in each of
these domains is that there are some challenges with the way that the data
is aggregated. Lots of things work out of the box, like data collection,
user management and security, etc. But sometimes, the analysis needs to be
done externally through other means. Of course, it would be great if DHIS2
could do all of this for all domains, but since its primary focus is on
collection and management of health data, that is where things work most
often (although there are some challenges there as well, particularly on data
which needs to be averaged or handled differently in time or across orgunits,
such as ART current count). Contributions from the community are of course
welcome! :)
Regards,
Jason
On Fri, Sep 12, 2014 at 8:55 AM, Robin Martens <martens@xxxxxxx> wrote:
> Hi Jason,
>
>
>
> Thanks for taking the time to read through my email.
>
>
>
> I'll have a look at the different possibilities you proposed, and we'll be
> looking forward to any future upgrade of the calculation method (for now or
> later). I guess it's just that some sectors need more complex indicators
> than others (our project is in forest management).
>
>
>
> Have a nice day,
>
>
>
> Robin
>
>
>
> *From:* Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> *Sent:* 11 September 2014 19:00
>
> *To:* Robin Martens
> *Cc:* Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
> *Subject:* Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions
>
>
>
> Hi Robin,
>
>
>
> Your mail is dense and will need some digestion. :)
>
>
>
> However, you give a very good level of detail about your problem in this
> mail, which will be very useful when this type of functionality is
> implemented.
>
>
>
> To respond immediately to how you might be able to solve the issue, you
> should possibly consider using the WebAPI to extract your data, process it
> as you need, and then inject it back into DHIS2. The WebAPI is described in
> detail here <https://www.dhis2.org/doc/snapshot/en/user/html/ch32.html>.
> I have also written a chapter on the use of the R programming language with
> DHIS2, which is particularly well suited to do the type of custom
> calculations you are describing here. It is available here
> <https://www.dhis2.org/doc/snapshot/en/user/html/apc.html>. Of course,
> other languages or methods, such as Python, may be better suited to your
> situation. Lastly, you can have a look at the DHIS2 Ad-hoc tool
> <http://bazaar.launchpad.net/~dhis2-devs-core/dhis2/trunk/files/head:/tools/dhis-adhoc/> which
> would allow interaction with the service layer of DHIS2. Another approach
> could be SQL which interacts directly with the database. I am sure there
> are many other means as well. So the short answer is that, as far as I
> know, there is currently no built-in way to achieve what you need, and it
> will take some coding on your side.
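
A minimal Python sketch of the "inject it back" step described above. It only builds the payload and does no network I/O. The data element identifier and orgunit identifiers are hypothetical placeholders, and the {"dataValues": [...]} layout is an assumption about the WebAPI data value set format that should be checked against the WebAPI documentation for your DHIS2 version.

```python
import json


def build_data_value_set(data_element, period, values_by_orgunit):
    """Build a JSON-serialisable payload of externally computed values.

    The key names (dataElement, period, orgUnit, value) are assumed from
    the WebAPI data value set format; verify them for your DHIS2 version.
    """
    return {
        "dataValues": [
            {
                "dataElement": data_element,
                "period": period,
                "orgUnit": org_unit,
                "value": str(value),
            }
            for org_unit, value in sorted(values_by_orgunit.items())
        ]
    }


# Values computed outside DHIS2 (hypothetical identifiers, illustration only).
computed = {"district1": 20, "district2": 15}

payload = build_data_value_set("TOTAL_CONSUMPTION", "2014", computed)
print(json.dumps(payload, indent=2))
```

The printed JSON would then be sent to the server with whatever HTTP client you prefer; that part is left out here on purpose.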
>
>
>
> We have run into similar issues in the water and sanitation sector, where
> we need to work with the "latest reported data", which DHIS2 does not
> really handle. We pull out the data via the WebAPI, do the aggregation
> externally, and then inject everything back into the system to get the
> figures we need. It would be nice if the system did it automatically, but
> given the nature of the project, there are many feature requests and
> limited resources. Contributions of course are welcome.
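
As a rough illustration of the external "latest reported data" aggregation, here is a small Python sketch. The record layout (orgunit, period, value), the orgunit names and the figures are all made up for the example; periods are assumed to be sortable strings such as yyyyMM.

```python
def latest_reported(records):
    """Return the most recently reported value for each orgunit.

    records: iterable of (org_unit, period, value) tuples, where period
    is a string that sorts chronologically (for example "201408").
    """
    latest = {}
    for org_unit, period, value in records:
        # Keep the record only if it is newer than what we have so far.
        if org_unit not in latest or period > latest[org_unit][0]:
            latest[org_unit] = (period, value)
    return {org_unit: value for org_unit, (_, value) in latest.items()}


# Hypothetical reports: waterpoint_a reported twice, waterpoint_b once.
records = [
    ("waterpoint_a", "201405", 120),
    ("waterpoint_a", "201408", 95),
    ("waterpoint_b", "201403", 40),
]

current = latest_reported(records)
total = sum(current.values())
```

Summing the latest values (rather than all reported values) is the part a plain sum over periods cannot express.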
>
>
>
> The current aggregation engine handles the "easy" cases of sums and
> averages pretty well, but for more complex stuff, external routes may be
> the only solution for now.
>
>
>
> We should certainly try and distill some of your ideas into a concrete
> blueprint.
>
>
>
> Best regards,
>
> Jason
>
>
>
>
>
> On Thu, Sep 11, 2014 at 6:15 PM, Robin Martens <martens@xxxxxxx> wrote:
>
> Hi Jason,
>
>
>
> I appreciate your help as this is very important for our project, thanks.
>
>
>
> Some of our indicators are indeed quite complex and might need some custom
> coding if not too complicated. However, can you give some basic steps on
> how to achieve this (and on how hard this is in terms of programming as
> we're not experts here)?
>
>
>
> ---
>
>
>
> The rest of this mail is about the specific issue I'm having here. It's
> basically related to three things:
>
>
>
> 1. The absence of "cross-product" calculations in DHIS2 (I think
> it's what you call compulsory pairs of data).
>
> 2. The fact that when no data exists on a disaggregated level, the
> value is taken to be zero instead of the aggregated value (for custom
> dimensions only, I think).
>
> 3. The average function only exists over the time dimension (as
> discussed by Lars previously this week).
>
>
>
> A simple example:
>
>
>
>
>
>               Population   Conso pp   Total
> District 1            10          2      20
> District 2             5          3      15
> Total                 15          5      35
>
>
>
> When calculating the total national consumption, DHIS2 will do: aggregated
> population (=15) times aggregated consumption per person (=5) makes 75,
> which is wrong. In reality, the two mistakes are:
>
>
>
> 1. The calculation should happen on district level before
> aggregating to the national value (20 for district1 plus 15 for district2
> makes 35, which is the correct answer). -> Cross product
>
> 2. DHIS2 always sums over orgunits (to be corrected soon according
> to Lars so I won't go further in detail here)
>
>
>
> The cross-product issue can actually be "solved" by a workaround: obliging
> the user to explicitly show the disaggregation level (i.e. the level at
> which the cross product happens) in the report tables. Interestingly
> enough, when calculating the total in a report without showing districts,
> DHIS2 will return 75, while when showing the districts 35.
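
The difference between the two orders of calculation can be reproduced with the figures from the simple example. This is only a sketch of the arithmetic in Python, not of how DHIS2 computes it internally; the variable names are illustrative.

```python
# Figures from the simple two-district example.
population = {"District 1": 10, "District 2": 5}
conso_pp = {"District 1": 2, "District 2": 3}

# Aggregate each data element first, then multiply: 15 * 5.
aggregate_then_multiply = sum(population.values()) * sum(conso_pp.values())

# Multiply per district first, then aggregate: 10 * 2 + 5 * 3.
multiply_then_aggregate = sum(
    population[district] * conso_pp[district] for district in population
)

print(aggregate_then_multiply)   # 75, the wrong national total
print(multiply_then_aggregate)   # 35, the correct national total
```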
>
>
>
> Imagine now that the consumption has three products (a custom category),
> ABC. The table would look like this:
>
>
>
>
>
>              Population  Conso pp A  Conso pp B  Conso pp C  Total A  Total B  Total C  *Total*
> District 1           10           2           1           1       20       10       10     *40*
> District 2            5           3           1           0       15        5        0     *20*
> Total                15           5           2           1       35       15       10       60
>
>
>
> The same principle, but aggregated over the Product category and orgunit
> dimension gives the correct result of 60. This is how DHIS2 would calculate:
>
>
>
> 1. When not showing the Product category in the table: total
> population (15) x total aggregated consumption (=5+2+1=8) is 120.
>
> 2. When showing the Product category in the table: total population
> (0, it will not find a value and return zero) x consumption is 0 !!!
>
>
>
> Indeed, the workaround does work for orgunits but not for custom
> dimensions when not all data (in this case the population) has the same
> custom dimensions.
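
The three outcomes for the product example (60, 120 and 0) can be reproduced in a few lines of Python. This is a sketch of the arithmetic only; the dictionary layout and names are assumptions made for the illustration.

```python
# Population has no Product category; consumption per person does.
population = {"District 1": 10, "District 2": 5}
conso_pp = {
    ("District 1", "A"): 2, ("District 1", "B"): 1, ("District 1", "C"): 1,
    ("District 2", "A"): 3, ("District 2", "B"): 1, ("District 2", "C"): 0,
}

# Desired: multiply at (district, product) level, re-using the district
# population for every product, then aggregate.
desired = sum(
    population[district] * value
    for (district, _product), value in conso_pp.items()
)

# Product category hidden: both elements are aggregated before multiplying.
aggregated_first = sum(population.values()) * sum(conso_pp.values())

# Product category shown: population is looked up per (district, product),
# is not found, and is treated as zero.
strict_lookup = sum(
    population.get((district, product), 0) * value
    for (district, product), value in conso_pp.items()
)

print(desired, aggregated_first, strict_lookup)  # 60 120 0
```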
>
>
>
> I guess these are things that won't be solved quickly so I might need to
> do some coding myself. As a conclusion, to increase calculation power in
> DHIS2 I'd say:
>
>
>
> 1. Use aggregated value when no disaggregated value exists (such as
> for population in the previous example).
>
> 2. Aggregation operators (sum, average,...) should be defined per
> custom category and per data element. In other words, when creating a data
> element and adding categories, you have to add the operator for each
> category.
>
> 3. Indicators should be available for re-use in other indicators.
> It enables you building complex indicators piece by piece and gives more
> flexibility on intermediate calculation (on disaggregated level).
>
>
>
> I hope this is somewhat more clear.
>
>
>
> Kind regards,
>
>
>
> Robin
>
>
>
> *From:* Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> *Sent:* 11 September 2014 16:30
>
>
> *To:* Robin Martens
> *Cc:* Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
> *Subject:* Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions
>
>
>
> Hi Robin,
>
> You lost me. Could you maybe give a somewhat simpler example of what you
> mean by an "intermediary calculation"?
>
>
>
> I am not sure exactly what you are trying to achieve, but what I can say
> is that in certain cases, I have had to write my own calculation methods
> for certain indicators which are basically impossible to calculate with the
> current implementation in DHIS2. It works fine for simple sums, averages,
> and other types of statistical things (standard deviation, etc), but for
> instance, if you want to calculate other statistical properties (skewness,
> kurtosis) of a given set of values, there is not a way to do it directly
> with DHIS2. Also, certain indicators depend on component parts, and cannot
> be calculated the way DHIS2 does it, by first summing up the numerator and
> denominator and then dividing it, as opposed to calculating a non-weighted
> average of compulsory pairs of data. What I am getting at, is that you may
> have to write your own calculation methods, depending on how complex they
> are.
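
To make the distinction above concrete, here is a short Python sketch comparing "sum the numerators and denominators, then divide" with a non-weighted average of the paired ratios. The two (numerator, denominator) pairs are invented purely for illustration.

```python
# Hypothetical (numerator, denominator) pairs, one per orgunit.
pairs = [(50, 100), (9, 10)]

# Sum numerators and denominators first, then divide: 59 / 110.
ratio_of_sums = sum(n for n, _ in pairs) / sum(d for _, d in pairs)

# Divide each pair first, then take the plain average: (0.5 + 0.9) / 2.
mean_of_ratios = sum(n / d for n, d in pairs) / len(pairs)

print(round(ratio_of_sums, 3), round(mean_of_ratios, 3))
```

The two results differ whenever the denominators differ, which is why the second kind of indicator has to be computed from the individual pairs.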
>
>
>
> Regards,
>
> Jason
>
>
>
>
>
> On Thu, Sep 11, 2014 at 4:20 PM, Robin Martens <martens@xxxxxxx> wrote:
>
> Hi Jason,
>
>
>
> To pick up the point again, there's an additional question I've been
> looking at.
>
>
>
> Even if disaggregated indicator reporting is burdensome (as you explain
> below), it is sometimes necessary for correct aggregated indicator
> calculations (the most obvious one the use of weighted averages) to have
> "intermediary calculations" according to dimensions in the indicator
> calculation, which can then be aggregated over the whole table to obtain
> the total aggregated indicator value. Even in these intermediary
> calculations, however, the data is not available for calculation, returning
> zero as a result.
>
>
>
> The conclusion is that the current way of indicator calculation not only
> complicates (if not makes impossible in many cases) the calculation of
> indicators per custom dimension, but also makes impossible the correct
> calculation of indicators over the period and orgunit dimensions when any
> intermediary calculation over custom dimensions is necessary.
>
>
>
> Can you confirm this?
>
>
>
> If true, is it hard to modify the calculation method to simply pick the
> one-level-higher value of a data element whenever no disaggregated value
> exists? By "not existing" I don't mean NULL or zero, but rather not defined
> (the dimension does not exist).
>
>
>
> Robin
>
>
>
> *From:* Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> *Sent:* 10 September 2014 17:55
> *To:* Robin Martens
> *Cc:* Lars Helge Øverland; dhis2-users@xxxxxxxxxxxxxxxxxxx; dhis2-devs
> *Subject:* Re: [Dhis2-devs] DHIS2 - Indicator calculation over dimensions
>
>
>
> Hi Robin,
>
> It has been discussed, and it is certainly not a bug. See a related thread
> here (https://lists.launchpad.net/dhis2-devs/msg27571.html) for a similar
> discussion on validation rules. It is essentially the same as indicators.
> What you will have to do is create a separate indicator for each and every
> combination which you need. It can be painful, but it is really the only
> way I know of at the moment.
>
>
>
> Feel free to file a blueprint here. https://blueprints.launchpad.net/dhis2
>
>
>
> Regards,
>
> Jason
>
>
>
>
>
> On Wed, Sep 10, 2014 at 5:37 PM, Robin Martens <martens@xxxxxxx> wrote:
>
> Dear all,
>
>
>
> I've been testing the indicator calculation algorithm and noticed
> something particular of which I'm not sure if it's a bug or a deliberate
> development choice.
>
>
>
> Indicators are not explicitly defined per category the way data elements
> are, but the reporting tools allow a disaggregated indicator calculation, which
> is definitely very useful. In a specific example, I want to know how many
> people were vaccinated this year and I have 3 kinds of vaccinations: A, B,
> and C. I have two data elements: the total population and the national
> vaccination levels (in %), with a custom category "vaccination type" which
> can be A, B, or C.
>
>
>
> My indicator would be "total population" x "national vaccination level
> (total)". That works fine when put in a pivot table.
>
>
>
> However, when trying to disaggregate the indicator calculation by adding
> my custom category to the pivot table, I don't have any values anymore. It
> seems the reason is that the "total population" data element does not have
> the "vaccination type" category (which seems logical) and therefore isn't
> found by the calculation algorithm. As a result, my table is empty. It
> seems useful that the algorithm would take the aggregated value (for
> population) available in such cases.
>
>
>
> Another example is over the period dimension: my population is a yearly
> value, so when calculating an indicator on a monthly basis, instead of
> taking the available yearly value, it takes zero.
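
The requested behaviour for the period dimension could look roughly like the following Python sketch. It assumes monthly periods written as yyyyMM and yearly periods as yyyy; the function name and the population figure are hypothetical, and this is not how DHIS2 currently behaves.

```python
def value_for_period(values, period):
    """Return the value for a monthly period (yyyyMM), falling back to
    the yearly value (yyyy) when no monthly value has been recorded.

    Returns None when neither a monthly nor a yearly value exists.
    """
    if period in values:
        return values[period]
    return values.get(period[:4])


# Population is only recorded once per year (illustrative figure).
population = {"2014": 15}

with_fallback = value_for_period(population, "201409")
without_fallback = population.get("201409", 0)

print(with_fallback, without_fallback)  # 15 0
```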
>
>
>
> So my question: is this a deliberate choice in the development, a bug, or
> an idea for a future system improvement?
>
>
>
> Kind regards,
>
>
>
> Robin
>
>
>
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>
>
>
>
>
> --
>
> Jason P. Pickering
> email: jason.p.pickering@xxxxxxxxx
> tel:+46764147049 <+46764147049>
--
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049