dhis2-users team mailing list archive
-
dhis2-users team
-
Mailing list archive
-
Message #01154
Re: Missing values in calculating indicators
Hi Jason and Lars,
Thank you for the thoughtful responses. I agree with you this is similar to the NULL issue, but I think it's helpful to separate them into sub-scenarios as you've done.
Does this summarize?
NULL issues
1. NAs when summing variables (indicators or variable dimensions to make the variable(total))
* Treat NAs as 0s: 1+2+3+NA=6
* Treat NAs as NA: 1+2+3+NA=NA
* Treat NAs as 0s unless all are NA: 0+NA+NA+NA=0 and NA+NA+NA+NA=NA
* Treat NAs as 0s sometimes and NAs sometimes: IFNULL(var1,0)+IFNULL(var2,NA)
2. Treatment of NAs in evaluating rules
* NAs are considered 0s in rules
* NAs always make rules fail
* NAs are ignored in rules
3. Treatment of NAs when calculating numerator/denominator indicators (also depends on settings in 1.)
* Drop numerator and denominator for an orgunit-timeperiod when calculating if either are NA
* Drop numerator and denominator for an orgunit-timeperiod when calculating only if denominator is NA
* Drop numerator and denominator for an orgunit-timeperiod when calculating only if both are NA
I don't pretend to understand all the implications of this on the backend, but it certainly makes sense to me as Jason said that wrong indicators are as useful as no indicators (and even detrimental if misleading).
I think the current DHIS behavior (1a, 2b, 3c) is justifiable (except for 3c which seems the wrong option of the three). I suspect that fairly granular control would be required not only to accommodate different users of DHIS, but also to accommodate different needs within one DHIS instance, but perhaps default behaviour is as far as you want to take it in the system.
I was speaking with Knut the other day about the possibilities of incorporating R into DHIS. I don't know if this would this be a way to take this heavy lifting off the database, but it would be great (for me at least!) to have that kind of control that R provides. Or maybe the best scenario is simply a way for R to put output back into DHIS and then any tailored aggregating can be done outside of the system.
For me, at least for the time being, I will just make sure to not calculate indicators unless rules are in place to clean the data first before calculating.
Thanks again,
Tom
________________________________
From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
Sent: 01 June 2012 14:31
To: Hiatt, Mr Tom (WPRO)
Cc: dhis2-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Dhis2-users] Missing values in calculating indicators
Hi Lars and Tom,
As Lars mentions, this is a very common problem with aggregation in DHIS2. In the case you highlight here, you want to exlcude the value which has NA as a denominator. We started having big problems with this with PMTCT indicators, when we saw huge values. This was because of systematic under-reporting of the denominator value, leading to many numerators with no corresponding denominator. (This is further complicated by the fact that the validation rules do not respond as you might expect and makes this problem hard to identify.
Lets say we were going to have a validation rule like
Numerator <= Denominator
This is fairly typical for many indicators in DHIS2
Now, in the case of the validations rules
50 <= NA evaluates to FALSE/NULL
In this case, the NA is not treated as a zero, although in the case of aggregation which you highlight, the NA is treated effectively as a zero.
The case which you and Lars both describe is quite simple. It becomes complicated with composite denominators, which may consist of multiple operands.
So, if you have an indicator like
DE1 + DE2 / DE3 + DE4
Now, lets say the values are as follows
10+20 / 30 + NA
What does that NA really mean? Is it something which has not been reported? Is it a zero which was entered as a zero but never recorded in the database? There are not really any clear cut answers, and you are going to have to try and solve this problem through a number of approaches, such as use of compulsory data elements, storing zeros, validation rules, and likely, external aggregation in tools such as R which will allow you a lot more control over how the aggregation takes place. This is the only solution I have come up with, but it is certainly far from ideal.
I think this comes down, once again, to the problem with handling NULLs. In certain cases we need to be able to exlcude NULLS entirely from the calculation. But with indicators it is tricky, as we take the sum of the numerators and then the sum of the denominators and divide them. With what you are proposing Tom, we would need to exclude certain denominator values (or a set of operands) entirely based on whether or not a numerator (or member of set of operands which compose a numerator value) is NULL. Definitely doable, and probably a good idea, but as Lars points out, it is going to most certainly affect performance. However, there is little point in calculating values quickly, if they are not correct. I certainly have come up against these same problems before Tom, and I do think we need to do some careful thinking around this issue.
Best regards,
Jason
On Thu, May 31, 2012 at 8:56 AM, <hiattt@xxxxxxxxxxxx<mailto:hiattt@xxxxxxxxxxxx>> wrote:
Hi Jason,
Sorry I'm in the dark on some of these debates. Please let me know if I'm getting annoying on any of my questions!
My problem is slightly different. It's more like this example for a specific period:
Org unit Numerator Denominator
O1 50 100
O2 40 50
O3 80 NA
For indicator X = (numerator / denominator) * 100
What I would like to see is (50 + 40) / (100 + 50) = 60%
What I'm seeing is (50 + 40 + 80) / (100 + 50) = 113%
Does that make sense? Again I could see a debate on whether to include rows with missing numerator data, but I can't think of a scenario where you would want rows with missing denominator data.
Thanks again, and please do let me know if I'm abusing the mailing list. You all have been so responsive and I appreciate it, but I don't want to wear out my welcome!
Tom
________________________________
From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx<mailto:jason.p.pickering@xxxxxxxxx>]
Sent: 31 May 2012 14:39
To: Hiatt, Mr Tom (WPRO)
Cc: dhis2-users@xxxxxxxxxxxxxxxxxxx<mailto:dhis2-users@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [Dhis2-users] Missing values in calculating indicators
Hi Tom,
You can consult the DHIS2 mailing list archives for long and torturous email threads on this issue and it has come up many times.
What would the desired behavior be in your case? Do you mean, that something like
NA + 1 + 2 +3 = NA
This is not the default behavior at all in DHIS2, and most of the time NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is
NA + 1 + 2+ 3 = 6
In cases when the NAs are important, we have always had to resort to SQL/R/Stata to perform the aggregation. But maybe there is a better solution from someone else?
Best regards,
Jason
On Thu, May 31, 2012 at 8:13 AM, <hiattt@xxxxxxxxxxxx<mailto:hiattt@xxxxxxxxxxxx>> wrote:
I see that when indicators are calculated, records where either the numerator or denominator is missing are not excluded. For me this would be ideal default behavior so that the indicator isn't thrown wildly off by assuming no reporting is the same as reporting 0. I could imagine some times when one would want an unreported numerator to be treated as 0 (if it's likely that it isn't reported, because it doesn't exist), so maybe an option to override default in the indicator definition screen would be good. I can't think of a scenario where one would want missing denominators to be treated as 0.
I don't know if others are running into this, or if it's only me. If there is a way to set this already let me know.
Thanks,
Tom
________________________________
This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.
Disclaimer:
This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.
If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.
_______________________________________________
Mailing list: https://launchpad.net/~dhis2-users
Post to : dhis2-users@xxxxxxxxxxxxxxxxxxx<mailto:dhis2-users@xxxxxxxxxxxxxxxxxxx>
Unsubscribe : https://launchpad.net/~dhis2-users
More help : https://help.launchpad.net/ListHelp
________________________________
This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.
Disclaimer:
This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.
If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.
====================================================
This message was scanned for viruses with Trend Micro ScanMail, GFI MailSecurity and GFI MailEssentials by the World Health Organization Regional Office for the Western Pacific. However, the recipient is advised to scan this e-mail and any attached files for viruses.
Disclaimer:
This e-mail, together with any attachments, is intended for the named recipients only and is confidential. It may also be privileged or otherwise protected by law.
If you have received it in error, please notify the sender immediately by reply e-mail and delete it and any attachments from your system. You may not copy or disclose its contents to anyone.
Follow ups
References