← Back to team overview

dhis2-users team mailing list archive

Re: Missing values in calculating indicators

 

Hi Lars and Tom,

 As Lars mentions, this is a very common problem with aggregation in DHIS2.
In the case you highlight here, you want to exlcude the value which has NA
as a denominator. We started having big problems with this with PMTCT
indicators, when we saw huge values. This was because of systematic
under-reporting of the denominator value, leading to many numerators with
no corresponding denominator.  (This is further complicated by the fact
that the validation rules do not respond as you might expect and makes this
problem hard to identify.

Lets say we were going to have a validation rule like

Numerator <= Denominator

This is fairly typical for many indicators in DHIS2

 Now, in the case of the validations rules

50 <= NA evaluates to FALSE/NULL

In this case, the NA is not treated as a zero, although in the case of
aggregation which you highlight, the NA is treated effectively as a zero.

The case which you and Lars both describe is quite simple.  It becomes
complicated with composite denominators, which may consist of multiple
operands.

So, if you have an indicator like

DE1 + DE2 / DE3 + DE4

Now, lets say the values are as follows

10+20 / 30 + NA

What does that NA really mean? Is it something which has not been reported?
Is it a zero which was entered as a zero but never recorded in the
database? There are not really any clear cut answers, and you are going to
have to try and solve this problem through a number of approaches, such as
use of compulsory data elements, storing zeros, validation rules, and
likely, external aggregation in tools such as R which will allow you a lot
more control over how the aggregation takes place. This is the only
solution I have come up with, but it is certainly far from ideal.

I think this comes down, once again, to the problem with handling NULLs. In
certain cases we need to be able to exlcude NULLS entirely from the
calculation.  But with indicators it is tricky, as we take the sum of the
numerators and then the sum of the denominators and divide them. With what
you are proposing Tom, we would need to exclude certain denominator values
(or a set of operands) entirely based on whether or not a numerator (or
member of set of operands which compose a numerator value) is
NULL. Definitely doable, and probably a good idea, but as Lars points out,
it is going to most certainly affect performance. However, there is little
point in calculating values quickly, if they are not correct. I certainly
have come up against these same problems before Tom, and I do think we need
to do some careful thinking around this issue.

Best regards,
Jason





On Thu, May 31, 2012 at 8:56 AM, <hiattt@xxxxxxxxxxxx> wrote:

> **
>
> Hi Jason,****
>
> ** **
>
> Sorry I’m in the dark on some of these debates. Please let me know if I’m
> getting annoying on any of my questions!****
>
> ** **
>
> My problem is slightly different. It’s more like this example for a
> specific period:****
>
> ** **
>
> Org unit             Numerator         Denominator****
>
> O1                    50                     100****
>
> O2                    40                     50****
>
> O3                    80                     NA****
>
> ** **
>
> For indicator X = (numerator / denominator) * 100****
>
> ** **
>
> What I would like to see is (50 + 40) / (100 + 50) = 60%****
>
> ** **
>
> What I’m seeing is (50 + 40 + 80) / (100 + 50) = 113%****
>
> ** **
>
> Does that make sense? Again I could see a debate on whether to include
> rows with missing numerator data, but I can’t think of a scenario where you
> would want rows with missing denominator data.****
>
> ** **
>
> Thanks again, and please do let me know if I’m abusing the mailing list.
> You all have been so responsive and I appreciate it, but I don’t want to
> wear out my welcome!****
>
> ** **
>
> Tom****
>
> ** **
>
> ** **
>  ------------------------------
>
> *From:* Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> *Sent:* 31 May 2012 14:39
> *To:* **Hiatt, Mr Tom** (WPRO)
> *Cc:* **dhis2-users@xxxxxxxxxxxxxxxxxxx**
> *Subject:* Re: [Dhis2-users] Missing values in calculating indicators****
>
> ** **
>
> Hi Tom,****
>
> ** **
>
> You can consult the DHIS2 mailing list archives for long and torturous
> email threads on this issue and it has come up many times. ****
>
> ** **
>
> What would the desired behavior be in your case? Do you mean, that
> something like ****
>
> ** **
>
> NA + 1 + 2 +3 = NA****
>
> ** **
>
> This is not the default behavior at all in DHIS2, and most of the time
> NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is***
> *
>
> ** **
>
> NA + 1 + 2+ 3  = 6****
>
> ** **
>
> In cases when the NAs are important, we have always had to resort to
> SQL/R/Stata to perform the aggregation. But maybe there is a better
> solution from someone else?****
>
> ** **
>
> Best regards,****
>
> Jason****
>
> ** **
>
> ** **
>
> ** **
>
> ** **
>
> On Thu, May 31, 2012 at 8:13 AM, <hiattt@xxxxxxxxxxxx> wrote:****
>
> I see that when indicators are calculated, records where either the
> numerator or denominator is missing are not excluded. For me this would be
> ideal default behavior so that the indicator isn’t thrown wildly off by
> assuming no reporting is the same as reporting 0. I could imagine some
> times when one would want an unreported numerator to be treated as 0 (if
> it’s likely that it isn’t reported, because it doesn’t exist), so maybe an
> option to override default in the indicator definition screen would be
> good. I can’t think of a scenario where one would want missing denominators
> to be treated as 0.****
>
>  ****
>
> I don’t know if others are running into this, or if it’s only me. If there
> is a way to set this already let me know.****
>
>  ****
>
> Thanks,****
>
> Tom****
>
>  ****
>
>  ****
>
>  ****
>  ------------------------------
>
> This message was scanned for viruses with Trend Micro ScanMail, GFI
> MailSecurity and GFI MailEssentials by the World Health Organization
> Regional Office for the Western Pacific. However, the recipient is advised
> to scan this e-mail and any attached files for viruses. ****
>
> Disclaimer:****
>
> This e-mail, together with any attachments, is intended for the named
> recipients only and is confidential. It may also be privileged or otherwise
> protected by law.****
>
> If you have received it in error, please notify the sender immediately by
> reply e-mail and delete it and any attachments from your system. You may
> not copy or disclose its contents to anyone.****
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp****
>
> ** **
>
>
>
>
>
>  ------------------------------
>
> This message was scanned for viruses with Trend Micro ScanMail, GFI
> MailSecurity and GFI MailEssentials by the World Health Organization
> Regional Office for the Western Pacific. However, the recipient is advised
> to scan this e-mail and any attached files for viruses.
>
> Disclaimer:
>
> This e-mail, together with any attachments, is intended for the named
> recipients only and is confidential. It may also be privileged or otherwise
> protected by law.
>
> If you have received it in error, please notify the sender immediately by
> reply e-mail and delete it and any attachments from your system. You may
> not copy or disclose its contents to anyone.
>

Follow ups

References