← Back to team overview

dhis2-users team mailing list archive

Re: Cumulative numbers in DHIS2

 

Hi Jim

I will outline my approach here.

Create some dummy data elements which will be used for the cumulative data.
Data will be drawn from the "datavalue" table.  There are two ways which we
have used to aggregate the data 1) In Postgresql using a window-aggregate
function 2) in R with the "cumsum" function. I am sure you could use
other languages as well for this (Java, Python, etc) There are advantages
and disadvantages to both approaches, but generally I prefer to work with R
as opposed to SQL as it provides a bit more flexibility, but you will maybe
pay a performance penalty, so it really depends on how much data you are
dealing with and what language you are most comfortable using. Either way,
you must order the data correctly for each
dataelement/categorycomboid/sourceid combination, so that the running sum
is generated correctly (i.e. ascending date order)

Once you get the data, you can aggregate it to generate the running sum for
each sourceid/dataelementid/categorycomboid combination, with the periods
ordered correctly. Data in the "aggregateddatavaluetable" should be
truncated for the data elements you are working with, and replaced with the
data which you generate through your external aggregation function with
appropriate INSERT statements. The other option is to insert the data back
into the "datavalue" table, and allow the datamart to aggregate upwards
through the organisation unit hierarchy, which will work for the same
frequency as your data is collected. But the datamart will by default
aggregate the monthly cumulative values to quarterly cumulative values, but
of course, this will not work, so you must be very careful analyzing the
data should you choose to go this route. The easy way to deal with this is
to only aggregate the data at the same frequency and orgunit level as it is
collected, and do any subsequent analysis with Pivots or ad-hoc SQL
queries.

I am not sure however how long this approach is going to work, because I
know that a lot of stuff which depends on the datamart is being moved over
to the "analytics". I suppose the approach would be the same, but I have
not looked at it yet.

Hope this helps to guide your approach.

Best regards,
Jason




On Thu, May 16, 2013 at 3:39 PM, Jim Grace <jimgrace@xxxxxxxxx> wrote:

> Hi Jason,
>
> Thanks, that sounds like the best option for us. I had almost gotten there
> myself. I've just been describing to our M&E folks the various suggestions
> I've received (thanks by the way Knut for the HTML report idea!) They are
> having a hard time understanding why something so easy in Excel isn't
> supported more simply through DHIS. I'm speculating that it may be because
> HIV/AIDS treatment and care is a relatively new service, so people may be
> more interested in cumulative numbers in this area; nobody is asking how
> many deliveries have been done in a hospital since it was founded 100 years
> ago!
>
> I was thinking as a last resort we could define an extra data element but
> don't put it in one of the entry datasets. Then through the Web API I could
> periodically compute it for each site/month and upload it. But your
> approach sounds more efficient. Short of plunging in and reading the source
> code (which maybe I should do anyway), how do I understand what to do? Do I
> create a dummy indicator? How is the aggregateddatavalue table used? (Is it
> still used now that we have dynamic aggregation?) If you give me some
> initial help I may be able to find my way from there.
>
> Cheers,
> Jim
>
> On May 16, 2013, at 4:07 PM, Jason Pickering <jason.p.pickering@xxxxxxxxx>
> wrote:
>
> Hi Jim,
>
> Personally, I think the easiest way would be an external script which
> would process the data and then inject it back into the aggregateddatavalue
> table. Some might call this a  "hack", but we have used this approach for
> calculation of these types of cumulative numbers and it seems to work fine.
> In our case, we use a R script to make the calculation, and then inject the
> data back into the aggregateddatavalue table. The script is executed on a
> nightly basis.
>
>  The advantage with this approach is that the cumulative numbers are then
> available through the normal tools to the user, such as the charts, and of
> course, could be pulled out quite easily with the WebAPI into this Access
> tool.
>
> I am sure there are other approaches, but this is one of them. :)
>
> Regards,
> Jason
>
>
>
>
>
> On Thu, May 16, 2013 at 2:47 PM, Jim Grace <jimgrace@xxxxxxxxx> wrote:
>
>> Hi Jason,
>>
>> Thanks. I agree it's best not to collect a cumulative amount as a data
>> element when it could be derived. You've given me some ideas how to pull it
>> out -- for instance I see I can write a Jasper report based on a query.
>> Although otherwise I was thinking we could pull all the data we need
>> through Report Tables -- which would be easier for FACES to maintain after
>> I leave. As a possible future DHIS enhancement, I think some way of getting
>> this functionality in a Report Table would be preferable, whether as a
>> report table feature, or as a way of defining a cumulative indicator (even
>> better in my opinion, because of the many ways that indicators can be
>> used.) I realize it would be performance expensive to compute such an
>> indicator. But if it's what you need, you have to pay this price one way or
>> another.
>>
>> At the moment I'm writing some code to use the Web API to pull out the
>> FACES data and put it into the MS Access reporting tool. Kenya is set to
>> convert from this tool to DHIS, but meanwhile we have to report through it.
>> So it sounds like for each site/month I will have to pull this data element
>> for all previous months and sum it in my tool. Or could I somehow get this
>> through a SQL view? I don't see any way of using parameters in a SQL view.
>> (Am I missing something?) So would I have to write a SQL view to generate
>> the cumulatives for all possible reporting months for all FACES sites? That
>> sounds awkward. Maybe pulling all prior months through the API is the
>> lesser evil, even though the Internet connection is somewhat slow here in
>> Kenya.
>>
>> Cheers,
>> Jim
>>
>> On May 16, 2013, at 2:26 PM, Jason Pickering <jason.p.pickering@xxxxxxxxx>
>> wrote:
>>
>> Hi Jim,
>> I have seen this exact same data element being collected USAID/PEPFAR
>> supported organisations in Nigeria, Different organisations there, follow
>> different approaches, either of collecting it in the way which you mention,
>> or recording the cumulative figure each month. In the context of DHIS2, the
>> recording of cumulative totals is not really a great idea, because there is
>> not a "LATEST" aggregation operator, whereby the system simply would take
>> the latest available cumulative figure as the current one.
>>
>> With that in mind, it is number better to simply record the number of new
>> entrants each month. Once you have this, you can easily create a custom
>> report to accumulate the data from inception, use an SQL query to aggregate
>> it directly, or pull it out into other analytical tools such as R/Stata. I
>> think if you need something for end-users, you would need to develop a
>> custom report to achieve this, whereby data would be aggregated from
>> inception of reporting, up until the "End date" chosen by the user.
>>
>> Best regards,
>> Jason
>>
>>
>>
>>
>> On Thu, May 16, 2013 at 10:58 AM, Jim Grace <jimgrace@xxxxxxxxx> wrote:
>>
>>> Hi All,
>>>
>>> My PEPFAR partner organization FACES in Kenya is configuring our own
>>> DHIS2 instance to collect data among the clinics we support. One of the
>>> standard government variables we collect and report each month is "ever
>>> enrolled in [HIV/AIDS] care". This means the cumulative number enrolled
>>> since the start of the service at each facility, which is often several
>>> years ago. It equals the sum of all the "enrolled this month" numbers going
>>> back through time. For example if "ever enrolled" is 4000 in January 2013,
>>> and "enrolled this month" is 100 in February 2013, then "ever enrolled"
>>> must be 4100 in February.
>>>
>>> I am not (yet) seeing a good way to compute this in DHIS, so I'm asking
>>> advice. When we first started collecting this data from clinics in our
>>> current system of spreadsheets, we asked each clinic to compute the
>>> cumulative total and enter it for each month. We had a lot of errors this
>>> way, and we realized that it would be better calculating this in our tool
>>> instead of asking the data clerks to do this. So in our spreadsheets we
>>> just compute it by taking the previous month's "ever enrolled" and adding
>>> this month's "enrolled this month". Our records don't always go back to the
>>> start of care at each clinic, so we usually have an initial, hand-entered
>>> "ever enrolled" to get things started on the month before we enter real
>>> data for that clinic.
>>>
>>> Now we are trying to convert our system of spreadsheets to DHIS, and I
>>> don't see a good way to do this. I've tried creating a Report Table with
>>> "Include cumulative" checked, but it only gives me cumulative numbers
>>> within the range of report months. What we really need is to get the
>>> numbers for one or more sites, for one or more months, and to have "ever
>>> enrolled" as one of the numbers that is reported per site, per month. The
>>> best options I can think of are:
>>>
>>> 1. Ask each site to compute this number each month and enter it as a
>>> data element.
>>>
>>> 2. Export the data from DHIS into our own software and have our software
>>> compute this number.
>>>
>>> Are there any better options?
>>>
>>> Cheers,
>>> Jim
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-users
>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>>
>
>

Follow ups

References