← Back to team overview

dhis2-users team mailing list archive

Re: Cumulative numbers in DHIS2

 

Hi Jim,
Yes, this sounds like a better approach in fact. It should be totally
possible to extract the data through the API and inject it back this way.

I have outlined an approach
here<http://dhis2.net/dhis2RIntegration/index.html#21>,
using again R, to get data through the WebAPI, and then a few slides later,
manipulate it, and import it back into the application.

Yes, I fully agree that at some point we should likely support these
cumulative types of data elements, but for  types of statistical
manipulations, there may be better solutions, and not sure that DHIS2 will
always be able to cater to all of them. But agree for PEPFAR related
programmes, this functionality is important.

Best regards,
Jason



On Sat, May 18, 2013 at 6:49 AM, Jim Grace <jimgrace@xxxxxxxxx> wrote:

> Hi Jason,
>
> Thank you, this is immensely helpful. I'm new to Postgresql and wasn't
> aware of the window-aggregate functions. They look very slick. And thanks
> for all the advice of how your aggregations interact with DHIS.
>
> FACES has their DHIS implementation hosted for them, and while I have
> access to the back-end system, it would be best if this mechanism could be
> maintained in the future by those who do not. So my current thinking is
> I'll write a DHIS SQL View that pulls the data to be summed and also pulls
> the dummy data element values that should contain the cumulative amounts.
> This view is fetched through the Web API by a remote Java program and the
> values compared. (The SQL view can order the values by date for the
> convenience of the Java program to keep running totals.) The Java program
> can use the API to write back any cumulative values that need updating
> where the source values may have been recently entered or changed. Then
> we'll just rely on DHIS to process these data elements and we won't have to
> worry about how the mechanism might change in the future. We will only need
> the monthly values for reporting; we won't look at aggregations of the
> cumulative numbers through time since these will be incorrect.
>
> FACES is very happy that they'll be able to pull these values through
> report tables and pivot tables. Thanks everyone for the advice. But
> eventually I hope DHIS will support cumulative indicators. The performance
> might be a concern, but as I said before if you need this data you'll have
> to pay the performance one way or another. DHIS could also report these
> indicators correctly over longer time periods. :)
>
> Cheers,
> Jim
>
> On May 17, 2013, at 8:55 AM, Jason Pickering <jason.p.pickering@xxxxxxxxx>
> wrote:
>
> Hi Jim
>
> I will outline my approach here.
>
> Create some dummy data elements which will be used for the cumulative
> data. Data will be drawn from the "datavalue" table.  There are two ways
> which we have used to aggregate the data 1) In Postgresql using a
> window-aggregate function 2) in R with the "cumsum" function. I am sure you
> could use other languages as well for this (Java, Python, etc) There are
> advantages and disadvantages to both approaches, but generally I prefer to
> work with R as opposed to SQL as it provides a bit more flexibility, but
> you will maybe pay a performance penalty, so it really depends on how much
> data you are dealing with and what language you are most comfortable using.
> Either way, you must order the data correctly for each
> dataelement/categorycomboid/sourceid combination, so that the running sum
> is generated correctly (i.e. ascending date order)
>
> Once you get the data, you can aggregate it to generate the running sum
> for each sourceid/dataelementid/categorycomboid combination, with the
> periods ordered correctly. Data in the "aggregateddatavaluetable" should be
> truncated for the data elements you are working with, and replaced with the
> data which you generate through your external aggregation function with
> appropriate INSERT statements. The other option is to insert the data back
> into the "datavalue" table, and allow the datamart to aggregate upwards
> through the organisation unit hierarchy, which will work for the same
> frequency as your data is collected. But the datamart will by default
> aggregate the monthly cumulative values to quarterly cumulative values, but
> of course, this will not work, so you must be very careful analyzing the
> data should you choose to go this route. The easy way to deal with this is
> to only aggregate the data at the same frequency and orgunit level as it is
> collected, and do any subsequent analysis with Pivots or ad-hoc SQL
> queries.
>
> I am not sure however how long this approach is going to work, because I
> know that a lot of stuff which depends on the datamart is being moved over
> to the "analytics". I suppose the approach would be the same, but I have
> not looked at it yet.
>
> Hope this helps to guide your approach.
>
> Best regards,
> Jason
>
>
>
>
> On Thu, May 16, 2013 at 3:39 PM, Jim Grace <jimgrace@xxxxxxxxx> wrote:
>
>> Hi Jason,
>>
>> Thanks, that sounds like the best option for us. I had almost gotten
>> there myself. I've just been describing to our M&E folks the various
>> suggestions I've received (thanks by the way Knut for the HTML report
>> idea!) They are having a hard time understanding why something so easy in
>> Excel isn't supported more simply through DHIS. I'm speculating that it may
>> be because HIV/AIDS treatment and care is a relatively new service, so
>> people may be more interested in cumulative numbers in this area; nobody is
>> asking how many deliveries have been done in a hospital since it was
>> founded 100 years ago!
>>
>> I was thinking as a last resort we could define an extra data element but
>> don't put it in one of the entry datasets. Then through the Web API I could
>> periodically compute it for each site/month and upload it. But your
>> approach sounds more efficient. Short of plunging in and reading the source
>> code (which maybe I should do anyway), how do I understand what to do? Do I
>> create a dummy indicator? How is the aggregateddatavalue table used? (Is it
>> still used now that we have dynamic aggregation?) If you give me some
>> initial help I may be able to find my way from there.
>>
>> Cheers,
>> Jim
>>
>> On May 16, 2013, at 4:07 PM, Jason Pickering <jason.p.pickering@xxxxxxxxx>
>> wrote:
>>
>> Hi Jim,
>>
>> Personally, I think the easiest way would be an external script which
>> would process the data and then inject it back into the aggregateddatavalue
>> table. Some might call this a  "hack", but we have used this approach for
>> calculation of these types of cumulative numbers and it seems to work fine.
>> In our case, we use a R script to make the calculation, and then inject the
>> data back into the aggregateddatavalue table. The script is executed on a
>> nightly basis.
>>
>>  The advantage with this approach is that the cumulative numbers are then
>> available through the normal tools to the user, such as the charts, and of
>> course, could be pulled out quite easily with the WebAPI into this Access
>> tool.
>>
>> I am sure there are other approaches, but this is one of them. :)
>>
>> Regards,
>> Jason
>>
>>
>>
>>
>>
>> On Thu, May 16, 2013 at 2:47 PM, Jim Grace <jimgrace@xxxxxxxxx> wrote:
>>
>>> Hi Jason,
>>>
>>> Thanks. I agree it's best not to collect a cumulative amount as a data
>>> element when it could be derived. You've given me some ideas how to pull it
>>> out -- for instance I see I can write a Jasper report based on a query.
>>> Although otherwise I was thinking we could pull all the data we need
>>> through Report Tables -- which would be easier for FACES to maintain after
>>> I leave. As a possible future DHIS enhancement, I think some way of getting
>>> this functionality in a Report Table would be preferable, whether as a
>>> report table feature, or as a way of defining a cumulative indicator (even
>>> better in my opinion, because of the many ways that indicators can be
>>> used.) I realize it would be performance expensive to compute such an
>>> indicator. But if it's what you need, you have to pay this price one way or
>>> another.
>>>
>>> At the moment I'm writing some code to use the Web API to pull out the
>>> FACES data and put it into the MS Access reporting tool. Kenya is set to
>>> convert from this tool to DHIS, but meanwhile we have to report through it.
>>> So it sounds like for each site/month I will have to pull this data element
>>> for all previous months and sum it in my tool. Or could I somehow get this
>>> through a SQL view? I don't see any way of using parameters in a SQL view.
>>> (Am I missing something?) So would I have to write a SQL view to generate
>>> the cumulatives for all possible reporting months for all FACES sites? That
>>> sounds awkward. Maybe pulling all prior months through the API is the
>>> lesser evil, even though the Internet connection is somewhat slow here in
>>> Kenya.
>>>
>>> Cheers,
>>> Jim
>>>
>>> On May 16, 2013, at 2:26 PM, Jason Pickering <
>>> jason.p.pickering@xxxxxxxxx> wrote:
>>>
>>> Hi Jim,
>>> I have seen this exact same data element being collected USAID/PEPFAR
>>> supported organisations in Nigeria, Different organisations there, follow
>>> different approaches, either of collecting it in the way which you mention,
>>> or recording the cumulative figure each month. In the context of DHIS2, the
>>> recording of cumulative totals is not really a great idea, because there is
>>> not a "LATEST" aggregation operator, whereby the system simply would take
>>> the latest available cumulative figure as the current one.
>>>
>>> With that in mind, it is number better to simply record the number of
>>> new entrants each month. Once you have this, you can easily create a custom
>>> report to accumulate the data from inception, use an SQL query to aggregate
>>> it directly, or pull it out into other analytical tools such as R/Stata. I
>>> think if you need something for end-users, you would need to develop a
>>> custom report to achieve this, whereby data would be aggregated from
>>> inception of reporting, up until the "End date" chosen by the user.
>>>
>>> Best regards,
>>> Jason
>>>
>>>
>>>
>>>
>>> On Thu, May 16, 2013 at 10:58 AM, Jim Grace <jimgrace@xxxxxxxxx> wrote:
>>>
>>>> Hi All,
>>>>
>>>> My PEPFAR partner organization FACES in Kenya is configuring our own
>>>> DHIS2 instance to collect data among the clinics we support. One of the
>>>> standard government variables we collect and report each month is "ever
>>>> enrolled in [HIV/AIDS] care". This means the cumulative number enrolled
>>>> since the start of the service at each facility, which is often several
>>>> years ago. It equals the sum of all the "enrolled this month" numbers going
>>>> back through time. For example if "ever enrolled" is 4000 in January 2013,
>>>> and "enrolled this month" is 100 in February 2013, then "ever enrolled"
>>>> must be 4100 in February.
>>>>
>>>> I am not (yet) seeing a good way to compute this in DHIS, so I'm asking
>>>> advice. When we first started collecting this data from clinics in our
>>>> current system of spreadsheets, we asked each clinic to compute the
>>>> cumulative total and enter it for each month. We had a lot of errors this
>>>> way, and we realized that it would be better calculating this in our tool
>>>> instead of asking the data clerks to do this. So in our spreadsheets we
>>>> just compute it by taking the previous month's "ever enrolled" and adding
>>>> this month's "enrolled this month". Our records don't always go back to the
>>>> start of care at each clinic, so we usually have an initial, hand-entered
>>>> "ever enrolled" to get things started on the month before we enter real
>>>> data for that clinic.
>>>>
>>>> Now we are trying to convert our system of spreadsheets to DHIS, and I
>>>> don't see a good way to do this. I've tried creating a Report Table with
>>>> "Include cumulative" checked, but it only gives me cumulative numbers
>>>> within the range of report months. What we really need is to get the
>>>> numbers for one or more sites, for one or more months, and to have "ever
>>>> enrolled" as one of the numbers that is reported per site, per month. The
>>>> best options I can think of are:
>>>>
>>>> 1. Ask each site to compute this number each month and enter it as a
>>>> data element.
>>>>
>>>> 2. Export the data from DHIS into our own software and have our
>>>> software compute this number.
>>>>
>>>> Are there any better options?
>>>>
>>>> Cheers,
>>>> Jim
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-users
>>>> Post to     : dhis2-users@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-users
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>
>>>
>>
>>
>
>

References