← Back to team overview

dhis2-users team mailing list archive

Re: Missing values in calculating indicators

 

Hi Tom,
I think you have identified many of the scenarios, and it is good to see
them mapped out here. Would be to capture this as a blueprint on launchpad.

Just another comment about R. In fact, we often have to pull out certain
indicators, and aggregate them using a combination of SQL and R. What we
have not tried to do yet (although we will shortly), is put them back in
the database, However, the last two sections of the appendix on R
integration with DHIS sets the stage for this. Using PL/R, more complex
aggregation operators could be programmed in R, and applied to result sets
using custom aggregation operators. This might be useful for building
custom SQL queries for report tables. The last section on the use of the
WebAPI with R, shows how to retrieve data  and metadata into R. This
section<http://dhis2.org/doc/snapshot/en/user/html/ch22s04.html>
outlines
how to use the WebAPI to inject data back into DHIS2 through the WebAPI.
So, the basic building blocks are there. We just need to do a lot more work
to create usable methods in R or DHIS2 to make this more easily doable. I
think custom analyses should not be too difficult, but the offloading the
entire datamart process to R would obviously incur a big overhead and
require a lot of work.

Best regards,
Jason



On Fri, Jun 1, 2012 at 10:36 AM,  <hiattt@xxxxxxxxxxxx> wrote:
> Hi Jason and Lars,
>
>
>
> Thank you for the thoughtful responses. I agree with you this is similar
to
> the NULL issue, but I think it’s helpful to separate them into
sub-scenarios
> as you’ve done.
>
>
>
> Does this summarize?
>
>
>
> NULL issues
>
>
>
> NAs when summing variables (indicators or variable dimensions to make the
> variable(total))
>
> Treat NAs as 0s: 1+2+3+NA=6
> Treat NAs as NA: 1+2+3+NA=NA
> Treat NAs as 0s unless all are NA: 0+NA+NA+NA=0 and NA+NA+NA+NA=NA
> Treat NAs as 0s sometimes and NAs sometimes:
IFNULL(var1,0)+IFNULL(var2,NA)
>
> Treatment of NAs in evaluating rules
>
> NAs are considered 0s in rules
> NAs always make rules fail
> NAs are ignored in rules
>
> Treatment of NAs when calculating numerator/denominator indicators (also
> depends on settings in 1.)
>
> Drop numerator and denominator for an orgunit-timeperiod when calculating
if
> either are NA
> Drop numerator and denominator for an orgunit-timeperiod when calculating
> only if denominator is NA
> Drop numerator and denominator for an orgunit-timeperiod when calculating
> only if both are NA
>
>
>
> I don’t pretend to understand all the implications of this on the backend,
> but it certainly makes sense to me as Jason said that wrong indicators are
> as useful as no indicators (and even detrimental if misleading).
>
>
>
> I think the current DHIS behavior (1a, 2b, 3c) is justifiable (except for
3c
> which seems the wrong option of the three). I suspect that fairly granular
> control would be required not only to accommodate different users of DHIS,
> but also to accommodate different needs within one DHIS instance, but
> perhaps default behaviour is as far as you want to take it in the system.
>
>
>
> I was speaking with Knut the other day about the possibilities of
> incorporating R into DHIS. I don’t know if this would this be a way to
take
> this heavy lifting off the database, but it would be great (for me at
> least!) to have that kind of control that R provides. Or maybe the best
> scenario is simply a way for R to put output back into DHIS and then any
> tailored aggregating can be done outside of the system.
>
>
>
> For me, at least for the time being, I will just make sure to not
calculate
> indicators unless rules are in place to clean the data first before
> calculating.
>
>
>
> Thanks again,
>
> Tom
>
>
>
>
>
> ________________________________
>
> From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> Sent: 01 June 2012 14:31
>
>
> To: Hiatt, Mr Tom (WPRO)
> Cc: dhis2-users@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Dhis2-users] Missing values in calculating indicators
>
>
>
> Hi Lars and Tom,
>
>
>
>  As Lars mentions, this is a very common problem with aggregation in
DHIS2.
> In the case you highlight here, you want to exlcude the value which has NA
> as a denominator. We started having big problems with this with PMTCT
> indicators, when we saw huge values. This was because of systematic
> under-reporting of the denominator value, leading to many numerators with
no
> corresponding denominator.  (This is further complicated by the fact that
> the validation rules do not respond as you might expect and makes this
> problem hard to identify.
>
>
>
> Lets say we were going to have a validation rule like
>
>
>
> Numerator <= Denominator
>
>
>
> This is fairly typical for many indicators in DHIS2
>
>
>
> Now, in the case of the validations rules
>
>
>
> 50 <= NA evaluates to FALSE/NULL
>
>
>
> In this case, the NA is not treated as a zero, although in the case of
> aggregation which you highlight, the NA is treated effectively as a zero.
>
>
>
> The case which you and Lars both describe is quite simple.  It becomes
> complicated with composite denominators, which may consist of multiple
> operands.
>
>
>
> So, if you have an indicator like
>
>
>
> DE1 + DE2 / DE3 + DE4
>
>
>
> Now, lets say the values are as follows
>
>
>
> 10+20 / 30 + NA
>
>
>
> What does that NA really mean? Is it something which has not been
reported?
> Is it a zero which was entered as a zero but never recorded in the
database?
> There are not really any clear cut answers, and you are going to have to
try
> and solve this problem through a number of approaches, such as use of
> compulsory data elements, storing zeros, validation rules, and likely,
> external aggregation in tools such as R which will allow you a lot more
> control over how the aggregation takes place. This is the only solution I
> have come up with, but it is certainly far from ideal.
>
>
>
> I think this comes down, once again, to the problem with handling NULLs.
In
> certain cases we need to be able to exlcude NULLS entirely from the
> calculation.  But with indicators it is tricky, as we take the sum of the
> numerators and then the sum of the denominators and divide them. With what
> you are proposing Tom, we would need to exclude certain denominator values
> (or a set of operands) entirely based on whether or not a numerator (or
> member of set of operands which compose a numerator value) is
> NULL. Definitely doable, and probably a good idea, but as Lars points out,
> it is going to most certainly affect performance. However, there is little
> point in calculating values quickly, if they are not correct. I certainly
> have come up against these same problems before Tom, and I do think we
need
> to do some careful thinking around this issue.
>
>
>
> Best regards,
>
> Jason
>
>
>
>
>
>
>
>
>
>
>
> On Thu, May 31, 2012 at 8:56 AM, <hiattt@xxxxxxxxxxxx> wrote:
>
> Hi Jason,
>
>
>
> Sorry I’m in the dark on some of these debates. Please let me know if I’m
> getting annoying on any of my questions!
>
>
>
> My problem is slightly different. It’s more like this example for a
specific
> period:
>
>
>
> Org unit             Numerator         Denominator
>
> O1                    50                     100
>
> O2                    40                     50
>
> O3                    80                     NA
>
>
>
> For indicator X = (numerator / denominator) * 100
>
>
>
> What I would like to see is (50 + 40) / (100 + 50) = 60%
>
>
>
> What I’m seeing is (50 + 40 + 80) / (100 + 50) = 113%
>
>
>
> Does that make sense? Again I could see a debate on whether to include
rows
> with missing numerator data, but I can’t think of a scenario where you
would
> want rows with missing denominator data.
>
>
>
> Thanks again, and please do let me know if I’m abusing the mailing list.
You
> all have been so responsive and I appreciate it, but I don’t want to wear
> out my welcome!
>
>
>
> Tom
>
>
>
>
>
> ________________________________
>
> From: Jason Pickering [mailto:jason.p.pickering@xxxxxxxxx]
> Sent: 31 May 2012 14:39
> To: Hiatt, Mr Tom (WPRO)
> Cc: dhis2-users@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Dhis2-users] Missing values in calculating indicators
>
>
>
> Hi Tom,
>
>
>
> You can consult the DHIS2 mailing list archives for long and torturous
email
> threads on this issue and it has come up many times.
>
>
>
> What would the desired behavior be in your case? Do you mean, that
something
> like
>
>
>
> NA + 1 + 2 +3 = NA
>
>
>
> This is not the default behavior at all in DHIS2, and most of the time
> NAs/NULLS will be coerced to zero. So, the default behavior in DHIS2 is
>
>
>
> NA + 1 + 2+ 3  = 6
>
>
>
> In cases when the NAs are important, we have always had to resort to
> SQL/R/Stata to perform the aggregation. But maybe there is a better
solution
> from someone else?
>
>
>
> Best regards,
>
> Jason
>
>
>
>
>
>
>
>
>
> On Thu, May 31, 2012 at 8:13 AM, <hiattt@xxxxxxxxxxxx> wrote:
>
> I see that when indicators are calculated, records where either the
> numerator or denominator is missing are not excluded. For me this would be
> ideal default behavior so that the indicator isn’t thrown wildly off by
> assuming no reporting is the same as reporting 0. I could imagine some
times
> when one would want an unreported numerator to be treated as 0 (if it’s
> likely that it isn’t reported, because it doesn’t exist), so maybe an
option
> to override default in the indicator definition screen would be good. I
> can’t think of a scenario where one would want missing denominators to be
> treated as 0.
>
>
>
> I don’t know if others are running into this, or if it’s only me. If there
> is a way to set this already let me know.
>
>
>
> Thanks,
>
> Tom
>
>
>
>
>
>
>
> ________________________________
>
> This message was scanned for viruses with Trend Micro ScanMail, GFI
> MailSecurity and GFI MailEssentials by the World Health Organization
> Regional Office for the Western Pacific. However, the recipient is advised
> to scan this e-mail and any attached files for viruses.
>
> Disclaimer:
>
> This e-mail, together with any attachments, is intended for the named
> recipients only and is confidential. It may also be privileged or
otherwise
> protected by law.
>
> If you have received it in error, please notify the sender immediately by
> reply e-mail and delete it and any attachments from your system. You may
not
> copy or disclose its contents to anyone.
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-users
> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-users
> More help   : https://help.launchpad.net/ListHelp
>
>
>
>
>
>
>
> ________________________________
>
> This message was scanned for viruses with Trend Micro ScanMail, GFI
> MailSecurity and GFI MailEssentials by the World Health Organization
> Regional Office for the Western Pacific. However, the recipient is advised
> to scan this e-mail and any attached files for viruses.
>
> Disclaimer:
>
> This e-mail, together with any attachments, is intended for the named
> recipients only and is confidential. It may also be privileged or
otherwise
> protected by law.
>
> If you have received it in error, please notify the sender immediately by
> reply e-mail and delete it and any attachments from your system. You may
not
> copy or disclose its contents to anyone.
>
>
>
>
>
>
>
> ________________________________
>
> This message was scanned for viruses with Trend Micro ScanMail, GFI
> MailSecurity and GFI MailEssentials by the World Health Organization
> Regional Office for the Western Pacific. However, the recipient is advised
> to scan this e-mail and any attached files for viruses.
>
> Disclaimer:
>
> This e-mail, together with any attachments, is intended for the named
> recipients only and is confidential. It may also be privileged or
otherwise
> protected by law.
>
> If you have received it in error, please notify the sender immediately by
> reply e-mail and delete it and any attachments from your system. You may
not
> copy or disclose its contents to anyone.

References