dhis2-devs team mailing list archive

Re: Managing metadata in a distributed setting

 

Thanks Greg for the heads up from SA.  Elmarie also just let me know of
your plans.  Which, as you know, I have been dimly aware of for some time
:-)  I was kind of hoping you guys would respond ...

From your (and Gartner's) classification, the datawarehouse would be a
consolidator of datavalues.

From what you describe, your DD collects metadata from edge systems and
populates the national DW with what it finds.  So getting to specifics, what do
you do about categoryoptions (I think DEs and orgUnits are both
conceptually simpler cases)?  Taking my example as a starting point.  Two
edge systems have both defined "Male" with particular different UIDs.  When
they report datavalues they will make use of categoryoptioncombos which
envelop these different encodings for "Male".  What do you put in your DD
and what do you tell the DW about "Male"?
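To make the clash concrete, here is a minimal sketch (not any existing DHIS2 tool) of detecting categoryOptions that share a name but carry different UIDs across two instances. The lists would come from something like GET /api/categoryOptions.json?fields=id,name on each system; the UIDs below are made up.

```python
# Hedged sketch: find categoryOption names defined on both instances
# under different UIDs -- the candidates for aliasing/harmonization.

def find_duplicate_names(options_a, options_b):
    """Return {name: (uid_a, uid_b)} for names present in both lists
    but registered with different UIDs."""
    by_name_a = {o["name"]: o["id"] for o in options_a}
    clashes = {}
    for o in options_b:
        uid_a = by_name_a.get(o["name"])
        if uid_a is not None and uid_a != o["id"]:
            clashes[o["name"]] = (uid_a, o["id"])
    return clashes

# Illustrative data only -- UIDs are invented.
hmis = [{"name": "Male", "id": "abc111111aa"},
        {"name": "Female", "id": "abc222222aa"}]
pbf  = [{"name": "Male", "id": "xyz111111xx"}]

print(find_duplicate_names(hmis, pbf))
# {'Male': ('abc111111aa', 'xyz111111xx')}
```

Whatever you then do with the clash list (aliasing, or pushing one canonical UID down to the edge systems) is the policy question above.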

Do you create aliases, which would be one approach?

Or do you try and push standardized terminology downwards towards the edge
system?  Your case might be politically more complex because of the need
to respect autonomy in the provinces.  The Rwanda case is slightly
different and potentially simpler in that all these systems are central,
national systems.

Bob

PS.  My process of constructing filtered metadata is a very simple xslt
pipeline.  Starting with all the structural metadata related to
dataelements (like you would get from a GET
to /api/metaData.xml?assumeTrue=false&dataElements=true&dataElementGroups=true&categories=true&categoryCombos=true&categoryOptions=true&categoryOptionCombos=true
) I pass it through a very simple 3 stage pipeline using xslt.  First, isolate
the dataelements belonging to the group.  Second, filter out only the
categorycombos and categoryOptionCombos referred to by those
dataelements.  Finally, filter the categoryoptions and categories
referenced by those categorycombos.  Just a few lines of code in all.
Maybe this would work for you too.
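For readers who don't want to write XSLT, the same three-stage filter can be sketched in Python. This is not the author's stylesheet; the dict shapes below are illustrative and simplified, not the exact metaData.xml schema.

```python
# Rough Python equivalent of the 3-stage filter (the original is XSLT).
# Field names ("groups", "categoryCombo", etc.) are assumed for illustration.

def filter_metadata(meta, group_id):
    # Stage 1: isolate the dataelements belonging to the group.
    des = [de for de in meta["dataElements"] if group_id in de["groups"]]
    # Stage 2: keep only the categoryCombos and categoryOptionCombos
    # referred to by those dataelements.
    cc_ids = {de["categoryCombo"] for de in des}
    ccs = [cc for cc in meta["categoryCombos"] if cc["id"] in cc_ids]
    cocs = [coc for coc in meta["categoryOptionCombos"]
            if coc["categoryCombo"] in cc_ids]
    # Stage 3: keep the categories and categoryOptions referenced
    # by those categorycombos.
    cat_ids = {c for cc in ccs for c in cc["categories"]}
    cats = [c for c in meta["categories"] if c["id"] in cat_ids]
    co_ids = {co for c in cats for co in c["categoryOptions"]}
    cos = [co for co in meta["categoryOptions"] if co["id"] in co_ids]
    return {"dataElements": des, "categoryCombos": ccs,
            "categoryOptionCombos": cocs, "categories": cats,
            "categoryOptions": cos}
```

The result is the self-consistent metadata subset for one dataelement group, ready to export to the warehouse instance.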

On 10 December 2014 at 21:50, Greg Rowles <greg.rowles@xxxxxxxxx> wrote:

> Hi Bob
>
> This is an interesting challenge. I can think of different methods to
> address the current alignment problem but you obviously have a long term
> (master/meta) data management strategy to resolve. Is your current data
> warehouse/dashboard instance purely a consolidator of edge-system data
> (i.e. it is an accumulator of datavalues and meta data)?
>
> It sounds as if the warehouse exists as a consolidator but how often is
> meta/master data cleaned/reviewed for integrity (is that taking place on a
> regular basis)? If so - what actions are usually taken?
>
> The intention of our national data dictionary (DD) in SA is to ensure
> integrity of master/meta data across all DHIS2 systems within our
> province-wide architecture (both for organisation units and for dx
> resources). Our DD does not exist to ensure integrity of our national data
> warehouse (NHIRD). For the most part we try to identify new master/meta
> data within edge-databases and load these records into the warehouse before
> importing data. That way we are prepared before the problem occurs (at
> least that is our best intention).
>
> We are in the process of developing a synchronization tool for this data
> dictionary with major coding support coming from the HISP India developers.
> Maybe they have scripts that can assist?
>
> Best,
> Greg
>
> p.s. Gartner research group sometimes makes sense (see attachment)
>
>
> On Wed, Dec 10, 2014 at 7:41 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
>
>> Hi
>>
>> I have been helping out in Rwanda with their dhis2 setup(s) and I want to
>> share a couple of issues to see whether some of these have been experienced
>> by others and whether we can put our collective heads together on the
>> trickier ones.
>>
>> The setting is relatively complex in that they have a number of dhis2
>> instances.  The busiest one is the HMIS which is essentially the national
>> routine system which is collecting data from each facility (2466
>> dataelements in total).  There are a few other systems which collect
>> special data - such as the PBF system - also using dhis2.
>>
>> Then they have a central datawarehouse/dashboard DHIS2 instance which
>> accumulates some data from each of these others.  In general it is a
>> restricted subset.  So for example there is a dataelement group in HMIS
>> which defines 268 dataelements to be exported to datawarehouse each month.
>> In addition it collects data from some external (non-dhis) systems.
>>
>> So there are a number of challenges, some which are solved and some which
>> remain unsolved.  The first is to synchronize orgunits between them all
>> which we have had largely working for some time (if a bit clumsily).
>>
>> The bigger one is keeping the dataelement+categoryXXX structures
>> compatible between them all.  I have written some simple scripts to
>> extract/filter all the relevant structural metadata for a dataelement group
>> (its dataelements and related categoryXXXX's).  So this set can be exported
>> from one system (eg HMIS) and imported into another (eg Datawarehouse).
>> This solves the problem of routinely exporting datavalues from a
>> dataelementgroup in HMIS and reliably pushing to the datawarehouse without
>> weird conflicts.  But ....
>>
>> When you have another system (eg PBF) doing the same thing, it gets
>> hairier.  One problem we see is that the categoryoption lists on different
>> systems can have common elements (eg "Male") but they are independently
>> defined on both HMIS and PBF.  So if I export PBF metadata to datawarehouse
>> I will end up with two "Male" categoryOptions with different uids.  It
>> seems clear that within this small universe of dhis2 systems they will all
>> have to harmonize these things.  So it seems there are two options ..
>> (i) to abandon the project of distributed systems and to bring everything
>> together on one uber-instance with associated usergroups and access
>> controls, or
>> (ii) to setup a metadata repository instance from which all systems
>> derive their metadata and disable editing on the client systems.
>>
>> (i) is tempting but I think it escapes the problem rather than solves
>> it.  There are valid reasons to maintain these systems separately and there
>> will always be similar cases.  So we will probably do (ii).  If dhis2
>> cannot interoperate with itself it augurs badly for its interoperability in
>> general.
>>
>> Has anybody done something similar?  Large scale rationalization of
>> metadata will involve also the massaging of the datavalue tables to keep
>> everything true as the catoptcombos get reorganized.  Given that some of
>> the systems have been in full operation for over 3 years now there are the
>> usual messes which have accumulated over time amongst the categoryXXX
>> structures which will need to be addressed anyway to arrive at a pristine
>> state.  Maybe there are some nice scripts for this?
>>
>> Things like sharing settings on categoryoptions provide a bit of a
>> puzzle, but also possibly something useful.  Presumably we can set them to
>> be completely readonly on the metadata client systems.
>>
>> Bob
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
>
> Health Information Systems Program
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Mobile  :    073 246 2992
> Landline:   021 554 3130
> Fax:          086 733 8432
> Skype:      gregory_rowles
>
