
dhis2-users team mailing list archive

Re: R and the web API

Hi Eric,

Indicators in DHIS2 are defined by metadata, so there is no single standard
way they are constructed. If you are going to aggregate these yourself, then
yes, you would need to pull out all of the component data elements,
reconstruct the indicator in R, and then perform the aggregation. You can see
an example of an indicator here:

https://play.dhis2.org/demo/api/indicators/ReUHfIn0pTQ

The numerator and denominator are described by the following snippet of
metadata:

<denominator>#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}</denominator>
<numerator>#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}-#{Jtf34kNZhzP.pq2XI5kz2BY}-#{Jtf34kNZhzP.PT59n8BQbqM}</numerator>

The first UID corresponds to the data element, and the second to the UID of
the particular disaggregation (a category option combination, in DHIS2-ese).
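
For what it is worth, here is a minimal sketch of pulling those two
expressions into R (this assumes the httr and jsonlite packages, the demo
login admin/district, and appending .json to the URL to request JSON rather
than XML):

library(httr)
library(jsonlite)

url  <- paste0("https://play.dhis2.org/demo/api/indicators/ReUHfIn0pTQ.json",
               "?fields=id,name,numerator,denominator")
resp <- GET(url, authenticate("admin", "district"))
ind  <- fromJSON(content(resp, "text", encoding = "UTF-8"))

ind$numerator
ind$denominator

# Pull out the #{dataElement.categoryOptionCombo} operands from an expression
regmatches(ind$numerator,
           gregexpr("#\\{[A-Za-z0-9]+\\.[A-Za-z0-9]+\\}", ind$numerator))[[1]]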

There are other metadata components which are used to calculate the
indicator, such as the annualization factor, etc. Reconstructing the
aggregation engine of DHIS2 would probably not be trivial, but I describe
some approaches here <https://rpubs.com/jason_p_pickering/124722> which could
probably also be applied to indicators. In that example, I show how you can
take the metadata of DHIS2 and use it to perform validation rule evaluation
outside of the system in R. Since the expression syntax of indicators and
validation rules is the same, it would seem feasible (if non-trivial) to do
the same with indicators.
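
To make that a bit more concrete, here is a very rough sketch (this is not
the DHIS2 aggregation engine, and the numbers below are made up) of
evaluating an indicator expression in R once you have aggregated each
operand yourself:

# Hypothetical aggregated values, keyed by "dataElementUID.categoryOptionComboUID"
values <- c("fbfJHSPpUQD.pq2XI5kz2BY" = 120,
            "fbfJHSPpUQD.PT59n8BQbqM" = 80,
            "Jtf34kNZhzP.pq2XI5kz2BY" = 10,
            "Jtf34kNZhzP.PT59n8BQbqM" = 5)

evaluate_expression <- function(expr, values) {
  # Substitute each #{de.coc} operand with its aggregated value, then evaluate
  for (k in names(values)) {
    expr <- gsub(paste0("#{", k, "}"), values[k], expr, fixed = TRUE)
  }
  eval(parse(text = expr))
}

num_expr <- paste0("#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}",
                   "-#{Jtf34kNZhzP.pq2XI5kz2BY}-#{Jtf34kNZhzP.PT59n8BQbqM}")
den_expr <- "#{fbfJHSPpUQD.pq2XI5kz2BY}+#{fbfJHSPpUQD.PT59n8BQbqM}"

num <- evaluate_expression(num_expr, values)
den <- evaluate_expression(den_expr, values)
100 * num / den  # times the indicator type factor (assumed to be 100 here)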

In terms of weighting, the important thing to keep in mind with DHIS2
indicators is that the numerators and denominators are aggregated themselves
and then divided, so you effectively end up with a weighted average. The
other approach would be to calculate each numerator/denominator ratio
separately and then take the (unweighted) mean.
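
A toy example of the difference, with made-up numerator/denominator pairs
for three orgunits:

num <- c(10, 50, 5)
den <- c(20, 100, 50)

# DHIS2-style: aggregate numerator and denominator first, then divide
sum(num) / sum(den)  # 0.38 -- a weighted average

# The other approach: compute each ratio, then take the unweighted mean
mean(num / den)      # 0.37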

In terms of the comment on line 60: there is no guarantee that "indID <-
indicators$id[indicators$name==ind[i]]" will return anything, the way you
have the code at the moment; if there is no match, you could end up with an
empty result or an NA. And yes, more generally, NAs/NULLs are possible
depending on your API call. The analytics resource should not return any
NULLs/NAs, but it could return blank values. Best to check, just to be sure.
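
For example, a more defensive version of that line (using the same object
names as in your gist) might be:

indID <- indicators$id[indicators$name == ind[i]]
indID <- indID[!is.na(indID)]  # drop NAs that arise if any names are missing
if (length(indID) == 0) {
  stop("No indicator found with name: ", ind[i])
}
if (length(indID) > 1) {
  warning("Duplicate indicator names for '", ind[i], "'; using the first match.")
  indID <- indID[1]
}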

Best regards,
Jason




On Tue, Nov 24, 2015 at 3:44 AM, Eric Green <epgreen@xxxxxxxxx> wrote:

> Hi Alex and Jason,
>
> Thanks for sharing these ideas. I was able to get the reference table I
> wanted. Much appreciated.
>
> Jason, your points about server stress are good. In my use case queries
> will be small in scope and infrequent, but it’s a good point to remember.
>
> I was not aware of the weighting issue (new to dhis2 and APIs!), but it
> makes sense. I would need to switch to data elements, right? Could anyone
> point me to good resources for finding out how specific indicators are
> constructed (and weighted)? Is there a standard reference?
>
> Jason, in your revised code (thanks!), could you clarify what you mean by
> "#Needs to be checked against NAs and duplicates” in line 60? This step is
> just creating the segment of the url that specifies the indicator,
> e.g., "dimension=dx:ReUHfIn0pTQ”. Are you saying more generally that
> resulting datasets for indicators need to be checked for NAs and
> duplicates? I think I’m missing something here.
>
> Thanks again.
>
> Eric
>
> On November 23, 2015 at 10:36:26 AM, Jason Pickering (
> jason.p.pickering@xxxxxxxxx) wrote:
>
> Hi Eric,
>
> Nice to see someone else looking to use R and DHIS2. :)
>
> Another way of getting the orgunit hierarchy is with something like this:
>
> https://play.dhis2.org/demo/api/organisationUnits?fields=uid,parent[id],name,level,path
>
> Once you have the parent ID, you can then generate the entire tree
> structure. The "path" field also gives the full position of a given orgunit
> within the hierarchy. Once you have either of these, it would be possible
> to generate the hierarchical structure pretty easily in R, I think
> (although I have not written the code to do it!).
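>
> Something along these lines might work, though (an untested sketch,
> assuming the httr and jsonlite packages and the demo login admin/district):
>
> library(httr)
> library(jsonlite)
>
> url  <- paste0("https://play.dhis2.org/demo/api/organisationUnits.json",
>                "?paging=false&fields=id,name,level,path")
> resp <- GET(url, authenticate("admin", "district"))
> ous  <- fromJSON(content(resp, "text", encoding = "UTF-8"))$organisationUnits
>
> # path is "/<rootUID>/.../<thisOrgUnitUID>", so the parent UID is the
> # second-to-last element
> parts <- strsplit(sub("^/", "", ous$path), "/")
> ous$parent <- sapply(parts, function(x)
>   if (length(x) > 1) x[length(x) - 1] else NA_character_)
> head(ous[, c("id", "name", "level", "parent")])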
>
> I think your approach will work, but in general the API can aggregate the
> data for you (depending on how you would like to aggregate it). Otherwise,
> if you make a lot of requests in a loop, you could end up pulling a lot of
> data and could potentially put the server under stress (depending on the
> level of usage, of course). In general, I think it makes sense to ask only
> for what you need, if that is possible and supported by the API; this will
> run a lot quicker, both on the server and in R. It all depends, of course,
> on the scale of what you are asking for and whether you need to perform
> some type of filtering (outliers, bad data, etc.) prior to aggregation,
> which the server may not do for you.
>
>
> Also, be aware that when getting indicators from DHIS2, you do not get the
> data values which compose the indicators. Thus, any aggregation you perform
> yourself would likely differ significantly from DHIS2's, because when DHIS2
> aggregates the data it does so with a weighted average, as opposed to an
> unweighted average (which would be the only possibility here, since you are
> getting the percentages rather than both the numerator and denominator).
>
> I hacked your example a bit to make it quicker. You can test the
> output on RFiddle here <http://www.r-fiddle.org/#/fiddle?id=QLAOB1hp>.
>
> Hope this helps to get you started.
>
> Regards,
> Jason
>
>
>
>
>
> On Mon, Nov 23, 2015 at 3:46 PM, Alex Tumwesigye <atumwesigye@xxxxxxxxx>
> wrote:
>
>> Dear Eric,
>>
>> Something like this should help generate the metadata:
>>
>> http://YOUR_URL/api/organisationUnits.json?paging=false&fields=id,name,parent[id,name,parent[id,name,parent[id,name]]]&filter=level:EQ:5
>>
>> The above will generate the orgunit hierarchy at level 5 (the lowest
>> level) up to level 2. Note how I use the nested parent[id,name] fields.
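>>
>> As a rough, untested sketch of turning a response like that into a lookup
>> table in R (here against the demo instance, which has four levels, using
>> the httr and jsonlite packages):
>>
>> library(httr)
>> library(jsonlite)
>>
>> url  <- paste0("https://play.dhis2.org/demo/api/organisationUnits.json",
>>                "?paging=false&filter=level:eq:4",
>>                "&fields=id,name,parent[id,name,parent[id,name,parent[id,name]]]")
>> resp <- GET(url, authenticate("admin", "district"))
>> ous  <- fromJSON(content(resp, "text", encoding = "UTF-8"),
>>                  flatten = TRUE)$organisationUnits
>>
>> # flatten = TRUE spreads the nested parents into columns such as
>> # parent.name, parent.parent.name and parent.parent.parent.name
>> head(ous)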
>>
>> Alex
>>
>> On Mon, Nov 23, 2015 at 5:35 PM, Eric Green <epgreen@xxxxxxxxx> wrote:
>>
>>> I had a side conversation with Jason Pickering about using R to access
>>> the web API, and I’m moving the conversation to the mailing list to
>>> document it for others.
>>>
>>> I asked Jason for guidance on modifying the API url to import data into
>>> R. Prior to contacting Jason, I reviewed this documentation
>>> <https://www.dhis2.org/doc/snapshot/en/developer/html/apas07.html> and his
>>> presentation <https://github.com/jason-p-pickering/dhis2RIntegration> on
>>> R/DHIS2 integration (great stuff!). Jason was nice enough to create this
>>> example <http://www.r-fiddle.org/#/fiddle?id=wHglXleC&version=1> that
>>> showed me how to use the pivot table app, copy the API url using
>>> Firefox/Chrome developer tools, and use the pre-filled URL in R as a
>>> template.
>>>
>>> I wanted to do more with organization units, so I modified Jason’s
>>> example here: https://gist.github.com/ericpgreen/bb7fcb55efd8c93d3451.
>>>
>>> I might not be approaching the problem the right way, but my general
>>> approach is to define a set of periods (monthly) and organizational units
>>> and then loop over a set of indicators to create a data frame for each
>>> indicator that has values by unit (row) and period (column). Then in R (not
>>> shown), I will transform each data frame from wide to long and then combine
>>> the data frames for each indicator into a larger data frame for analysis.
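>>>
>>> Roughly, the reshape step I have in mind looks something like this
>>> (made-up data frame; reshape2 assumed):
>>>
>>> library(reshape2)
>>>
>>> # wide: one row per orgunit, one column per period, for one indicator
>>> wide <- data.frame(orgunit = c("Facility A", "Facility B"),
>>>                    "201501" = c(10, 20), "201502" = c(12, 25),
>>>                    check.names = FALSE)
>>>
>>> long <- melt(wide, id.vars = "orgunit",
>>>              variable.name = "period", value.name = "value")
>>> long$indicator <- "indicator 1"  # tag before combining
>>>
>>> # ...then rbind() the long data frames for all indicators together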
>>>
>>> I would like to have the data at the lowest level possible so I can
>>> later aggregate at higher organization unit levels (e.g., counties) and
>>> periods (e.g., years) as needed. I know I could just request these
>>> aggregations via the API, but I am accustomed to working with datasets at
>>> the lowest level and doing manipulations in my code so I can follow the
>>> process more closely (I’m new to APIs).
>>>
>>> *My current question is how to obtain the metadata that indicates the
>>> organizational hierarchy of units.* When I define urlD in my code, I’d
>>> like to automatically grab all facility OU’s where county==2, for instance.
>>> I know I could do this if I had something like the following table. Right
>>> now I specify each OU manually. Having this table would allow me to build
>>> the API url programmatically.
>>>
>>> Also, in the data frame that is created, I only know that an observation
>>> is linked to facility 5, for instance, but I don’t have the metadata to
>>> show that facility 5 is in sub county 3 which is in county 2 of country 1.
>>> So having this table would let me aggregate on my end later.
>>>
>>> Of course suggestions on improving my general approach are also welcome!!
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>> --
>> Alex Tumwesigye
>>
>> Technical Advisor - DHIS2 (Consultant),
>> Ministry of Health/AFENET
>> Kampala
>> Uganda
>>
>> IT Consultant - BarefootPower Uganda Ltd, SmartSolar, Kenya
>>
>> IT Specialist (Servers, Networks and Security, Health Information Systems
>> - DHIS2 ) & Solar Consultant
>>
>> +256 774149 775, + 256 759 800161
>>
>> "I don't want to be anything other than what I have been - one tree hill "
>>
>>
>
>
> --
> Jason P. Pickering
> email: jason.p.pickering@xxxxxxxxx
> tel:+46764147049
>
>


-- 
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+46764147049


