dhis2-devs team mailing list archive

Re: Fwd: On categories and dimensions and zooks

To: Bob Jolliffe <bobjolliffe@xxxxxxxxx>
From: Knut Staring <knutst@xxxxxxxxx>
Date: Mon, 12 Oct 2009 15:44:09 +0200
Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <a1820cc70910120318k5991a20fv9a6b7f2b253b00b9@mail.gmail.com>

Attached is the format that the IMR webservice currently provides (not yet
SDMX) for MDG indicators.

2009/10/12 Bob Jolliffe <bobjolliffe@xxxxxxxxx>

> Hi Lars
>
> I think your suggestion might adequately cover the analysis use case, but
> there remains a missing piece to the puzzle re SDMX export.  I am
> particularly thinking of the challenge Ola and Knut are shortly facing of
> presenting DHIS as a consumer of WHO MDG Indicator metadata and producer of
> SDMX MDG reports.  Comments inline below:
>
> 2009/10/10 Lars Helge Øverland <larshelge@xxxxxxxxx>
>
>
>> Here comes my shot at this issue. I'm gonna use Ola's example as a basis.
>>
>> <!-- start -->
>>
>> The flat data element names:
>> "Malaria death <5 year"
>> "Malaria death >5 year"
>> "Malaria in OPD 1st attendance <5 year"
>> "Malaria in OPD 1st attendance >5 year"
>> "Malaria IP discharge <5 year"
>> "Malaria IP discharge >5 year"
>> "Typhoid death <5 year"
>> "Typhoid death >5 year"
>> etc.
>> (OPD is outpatient, meaning patients treated at the clinic; IP is inpatient,
>> meaning patients who were admitted to a hospital).
>>
>> There are three dimensions in the data elements above, so I define three
>> data element group sets:
>> Disease, Patient Status, and Age.
>> I also define 7 new data element groups (Malaria, Typhoid, <5, >5, Death,
>> OPD, IP) and assign these groups to the group set they belong to:
>> Disease (Malaria, Typhoid)
>> Patient Status (Death, OPD, IP)
>> Age (<5, >5)
>>
>> I then assign the data element groups to the data elements:
>> "Malaria death <5 year" is assigned to "Malaria", "Death", and "<5",
>> etc.
>>
>> All these groupings can exist completely independent of data entry and be
>> changed at any time.
>> From this I can generate a new resource table for my data analysis
>> (similar to the one we already have for orgunit group sets) that provides:
>> Data Element Group Set, Data Element Group, Data Element
>> "Disease", "Malaria", "Malaria death <5 year"
>> "Disease", "Typhoid", "Typhoid death <5 year"
>> "Patient Status", "Death", "Malaria death <5 year"
>> etc.
>>
>> When joining the above table with an aggregated data value table you can
>> define a pivot table with your three data element group sets as columns
>> (pivot fields) and analyse the data across these three dimensions. The data
>> element name dimension can then be completely hidden in the analysis.
>>
>> <!-- end -->
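
To make the resource table in the example above concrete, here is a minimal sketch that flattens group set, group and data element assignments into rows. The names `ResourceTableSketch`, `buildRows` and `sampleRows` are invented for illustration and are not DHIS 2 API; the sample data is a subset of the malaria/typhoid example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the DHIS 2 code base.
class ResourceTableSketch
{
    // groupSets:    group set name -> names of its groups
    // groupMembers: group name -> names of the data elements assigned to it
    // Returns one row { group set, group, data element } per assignment.
    static List<String[]> buildRows( Map<String, List<String>> groupSets,
        Map<String, List<String>> groupMembers )
    {
        List<String[]> rows = new ArrayList<>();
        for ( Map.Entry<String, List<String>> groupSet : groupSets.entrySet() )
        {
            for ( String group : groupSet.getValue() )
            {
                List<String> members = groupMembers.get( group );
                if ( members == null )
                {
                    continue; // group has no data elements assigned yet
                }
                for ( String dataElement : members )
                {
                    rows.add( new String[] { groupSet.getKey(), group, dataElement } );
                }
            }
        }
        return rows;
    }

    // Two data elements from the example, grouped along three group sets.
    static List<String[]> sampleRows()
    {
        Map<String, List<String>> groupSets = new LinkedHashMap<>();
        groupSets.put( "Disease", Arrays.asList( "Malaria", "Typhoid" ) );
        groupSets.put( "Patient Status", Arrays.asList( "Death", "OPD", "IP" ) );
        groupSets.put( "Age", Arrays.asList( "<5", ">5" ) );

        String malaria = "Malaria death <5 year";
        String typhoid = "Typhoid death <5 year";
        Map<String, List<String>> groupMembers = new LinkedHashMap<>();
        groupMembers.put( "Malaria", Arrays.asList( malaria ) );
        groupMembers.put( "Typhoid", Arrays.asList( typhoid ) );
        groupMembers.put( "Death", Arrays.asList( malaria, typhoid ) );
        groupMembers.put( "<5", Arrays.asList( malaria, typhoid ) );

        return buildRows( groupSets, groupMembers );
    }
}
```

Each data element appears once per group set it is classified under, which is what lets a pivot table treat the group sets as independent fields after a join on the data element.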
>>
>>
>> Some observations:
>>
>>
>> a) From this we can derive that a GroupSet corresponds to a Dimension and
>> that a Group corresponds to a DimensionOption.
>>
>> Dimension = GroupSet
>> DimensionOption = Group
>>
>>
>> b) The current Category model and the suggested simplified version both
>> generate CategoryOptionCombos/DimensionElementCombinations which are linked
>> to DataValue and constitute all possible combinations of their associated
>> CategoryOptions/DimensionOptions. This means that once those
>> CategoryOptionCombos/ DimensionElementCombinations are generated and
>> DataValues are registered for them, they cannot change. Also, once a data
>> entry grid is defined, the underlying model cannot change. According to Ola
>> and Jason we must be able to assign "any dimension to a DataElement" at any
>> time.
>
>
> I think here is the snag.  In the proposed scheme you are not really
> assigning dimensions to a dataelement at all.  In fact you do the reverse -
> you assign dataelements to a dimension.  I still need to end up with a
> resulting indicator/dataelement which has a name and which has these
> dimensions.  I'll try a snippet of Patrick's sample sdmx inline here to
> illustrate the point (Best viewed by making your font size very small).
>
> Here is an example of some indicators:
> <CodeLists>
>         <structure:CodeList id="CL_INDICATOR" agencyID="SDMX-HD"
> version="1.0" isFinal="false"
> urn="urn:sdmx:org.sdmx.infomodel.codelist=SDMX-HD:CL_INDICATOR" >
>             <structure:Name xml:lang="en">Indicator</structure:Name>
>             <structure:Code value="0"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].0">
>                 <structure:Description xml:lang="en">Neonatal mortality
> rate (per 1000 live births)</structure:Description>
>             </structure:Code>
>             <structure:Code value="1"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].1">
>                 <structure:Description xml:lang="en">Number of deaths
> during first 28 completed days of life</structure:Description>
>             </structure:Code>
>             <structure:Code value="2"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].2">
>                 <structure:Description xml:lang="en">1000 live births in a
> given year</structure:Description>
>             </structure:Code>
>             <structure:Code value="3"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].3">
>                 <structure:Description xml:lang="en">Life expectancy at
> birth</structure:Description>
>             </structure:Code>
>             <structure:Code value="4"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:INDICATOR[1.0].4">
>                 <structure:Description xml:lang="en">Adults aged ≥ 15 years
> who are obese</structure:Description>
>             </structure:Code>
>         </structure:CodeList>
>   </CodeLists>
>
> (The last one strikes me as a bit odd.  I would have thought the indicator
> would be "Number of people who are obese" and the age stuff would be in a
> dimension.  But anyway ... best not to get obsessed with dimensions)
>
> Here is an example of a dimension:
>   <CodeLists>
>     <structure:CodeList id="CL_GENDER" agencyID="SDMX-HD" version="1.0"
> isFinal="true" urn="urn:sdmx:org.sdmx.infomodel.codelist=SDMX-HD:CL_GENDER">
>       <structure:Name xml:lang="en">Gender</structure:Name>
>       <structure:Description xml:lang="en">Gender.</structure:Description>
>       <structure:Code value="1"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0].1">
>         <structure:Description xml:lang="en">Male</structure:Description>
>       </structure:Code>
>       <structure:Code value="2"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0].2">
>         <structure:Description xml:lang="en">Female</structure:Description>
>       </structure:Code>
>       <structure:Code value="3"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0].3">
>         <structure:Description
> xml:lang="en">Transgender</structure:Description>
>       </structure:Code>
>       <structure:Code value="_NA"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0]._NA">
>         <structure:Description xml:lang="en">Not
> Applicable</structure:Description>
>       </structure:Code>
>       <structure:Code value="_ALL"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0]._ALL">
>         <structure:Description xml:lang="en">All</structure:Description>
>       </structure:Code>
>       <structure:Code value="_UNK"
> urn="urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX-HD:CL_GENDER[1.0]._UNK">
>         <structure:Description
> xml:lang="en">Unknown</structure:Description>
>       </structure:Code>
>     </structure:CodeList>
>   </CodeLists>
>
> Note that both the indicator and the dimension are represented by a common
> element (structure:CodeList).  This is not purely coincidental.  In terms of
> the DataValue the indicator and the dimension are treated the same way - as
> an attribute.  So in this sense the Indicator (like the period and orgunit)
> is like a compulsory dimension.
>
>         <ns:Series DISEASE="1" PROG="0" GEOGRAPHIC_PLACE_NAME="CH-GE"
> ORGANIZATION="1" INDICATOR="4" VALUE_TYPE="1" GENDER="_ALL" AGROUP="5"
> GLOCATION="3" PERIODICITY="4" UNIT="_NA" REPEATS="0"  >
>             <ns:Obs OBS_VALUE="400" TIME_PERIOD="2008"
> DATE_COLLECT="2009-03-20" />
>         </ns:Series>
>
> (Series is just used to group datavalues in a time series.  DISEASE might
> be for example Malaria)
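
A minimal sketch of the point that the indicator and the other dimensions all arrive as plain attributes of the Series. It uses only the JDK's DOM parser. The class name `SeriesKeySketch` is invented, and the sample is a shortened copy of the Series above with the `ns:` prefix dropped so that it parses without a namespace declaration; real SDMX-HD messages are namespaced.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;

// Hypothetical sketch, not an SDMX library.
class SeriesKeySketch
{
    // Shortened, prefix-free copy of the Series element (an assumption made for brevity).
    static final String SAMPLE = "<Series DISEASE=\"1\" INDICATOR=\"4\" GENDER=\"_ALL\" AGROUP=\"5\">"
        + "<Obs OBS_VALUE=\"400\" TIME_PERIOD=\"2008\"/></Series>";

    // Reads every attribute of the root element into a sorted name -> code map.
    static Map<String, String> readKey( String xml )
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse( new ByteArrayInputStream( xml.getBytes( StandardCharsets.UTF_8 ) ) );
            NamedNodeMap attributes = doc.getDocumentElement().getAttributes();
            Map<String, String> key = new TreeMap<>();
            for ( int i = 0; i < attributes.getLength(); i++ )
            {
                key.put( attributes.item( i ).getNodeName(), attributes.item( i ).getNodeValue() );
            }
            return key;
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "Could not parse series", e );
        }
    }
}
```

Nothing in the resulting map distinguishes INDICATOR from GENDER or AGROUP: each is just a coded attribute of the series.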
>
> What would (or what could) the Indicator be in our sample scenario?  This
> is where it would be really useful to get hold of the actual MDG indicator
> definitions that we apparently won't see till the 20th.  Having said that we
> can get a pretty good idea of what they will look like from here:
> http://mdgs.un.org/unsd/mdg/Host.aspx?Content=Indicators/OfficialList.htm.
>
> Anyway, I hope you see my point.  Whereas we do need to be able to group
> indicators/dataelements into dimensions, those dimensions still have to be a
> dimension of something.  Is it a dimension of the Indicator?  Well, almost,
> but not quite.  It's interesting that the indicator list above makes no
> mention of dimensions.  I think (and I don't want to confuse things further
> by bringing in more terminology) it is actually a dimension of the
> "measure".  In some recent discussions several of us, myself included,
> thought that dataelement might be equivalent to what some people call
> measure.  This is not the case, as Jørn quickly and vigorously pointed out.
> The "measure" is the type of data value (or series of datavalues), which
> might be something like "percentage of population" or "proportion of
> population per 1000".
>
> And the measure would have dimensions, including compulsory ones like
> Indicator, Period, OrganisationUnit as well as optional ones like Disease,
> Gender, Age etc.
>
> But in practice, because the Indicator is a compulsory dimension,  a
> particular instance of a measure (an OBS_VALUE in SDMX) would be associated
> with a particular Indicator + its other dimensions.  So I think, besides the
> Indicators which make up the dimensions as per the groupset idea, we must
> also have an Indicator which *has* these dimensions.  A recursion I know.
>
> So, in addition to Lars' model, I would propose an Indicator (and
> DataElement) interface as follows:
>
> import java.util.List;
>
> interface MultiDimensionalElement
> {
>    // The dimensions of this element, in a defined order
>    List<Dimension> getDimensions();
>    void setDimensions(List<Dimension> dimensions);
>    void addDimension(Dimension dimension);
>    // etc.
> }
>
> and Indicator implements MultiDimensionalElement; and DataElement
> implements MultiDimensionalElement.
>
> And of course getDimensions() can (and in many or most cases will) return
> null.
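
A minimal sketch of how an Indicator might implement the proposed interface. Only the interface name and its three methods come from the mail; `Dimension` is reduced to a name here, and the `Indicator` class is a stand-in, not the DHIS 2 class. One deliberate deviation: instead of returning null when there are no dimensions, this sketch returns an empty list, which spares callers a null check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in: a dimension reduced to its name.
class Dimension
{
    final String name;

    Dimension( String name )
    {
        this.name = name;
    }
}

interface MultiDimensionalElement
{
    List<Dimension> getDimensions();

    void setDimensions( List<Dimension> dimensions );

    void addDimension( Dimension dimension );
}

// Hypothetical stand-in for an indicator that carries its own dimensions.
class Indicator implements MultiDimensionalElement
{
    private final String name;

    private List<Dimension> dimensions = new ArrayList<>();

    Indicator( String name )
    {
        this.name = name;
    }

    String getName()
    {
        return name;
    }

    // Returns a read-only view; empty (never null) when no dimension is assigned.
    public List<Dimension> getDimensions()
    {
        return Collections.unmodifiableList( dimensions );
    }

    // Copies the given list so later changes by the caller do not leak in.
    public void setDimensions( List<Dimension> dimensions )
    {
        this.dimensions = new ArrayList<>( dimensions );
    }

    // Appends, so insertion order defines the dimension order.
    public void addDimension( Dimension dimension )
    {
        dimensions.add( dimension );
    }
}
```

A DataElement could implement the same interface in the same way.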
>
> Remaining thoughts:
> (i)  an Indicator, even a multidimensional one, still needs a value.  I
> suspect in most cases this will be the aggregation of its dimension values.
> For example, taking MDG indicator number 4.1 (Under-five mortality rate),
> this will probably have a Gender dimension which we will implement using
> groups and groupsets, but it will also have an aggregate value.
>
> (ii)  medium term.  I don't think it makes any sense to continue to support
> two methods of implementing multidimensionality.  The revised model of Lars
> (with additions) should eventually also be able to be used to implement the
> grid data entry requirement.  But we can suspend that discussion for now.
>
> Sorry for the long mail.  Lars do you think it makes sense to extend your
> model this way?  I know we need to come up with a solution pretty quickly on
> this.
>
> Regards
> Bob
>
>
>
>> To me this rules out re-using the same dimensional attributes for data
>> entry and analysis - we must in any case have one set of dimensions for data
>> entry and one set of dimensions for analysis.
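
Point b) rests on the combos being the full cross product of the options in each category. A minimal sketch of that generation step, with invented names (`OptionComboSketch`, `allCombos`) and plain strings standing in for CategoryOptions:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of combo generation, not the DHIS 2 implementation.
class OptionComboSketch
{
    // Returns every combination that picks exactly one option from each
    // category, keeping the order of categories and options.
    static List<List<String>> allCombos( List<List<String>> categories )
    {
        List<List<String>> combos = new ArrayList<>();
        combos.add( new ArrayList<String>() ); // start from the single empty combination
        for ( List<String> options : categories )
        {
            List<List<String>> extended = new ArrayList<>();
            for ( List<String> combo : combos )
            {
                for ( String option : options )
                {
                    List<String> next = new ArrayList<>( combo );
                    next.add( option );
                    extended.add( next );
                }
            }
            combos = extended;
        }
        return combos;
    }
}
```

With Patient Status (Death, OPD, IP) and Age (<5, >5) this yields six combos. Adding or removing an option changes the whole set, which is why the combos are hard to alter once DataValues refer to them.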
>>
>>
>> c) Ola's suggested solution supports this. It is powerful in the ability
>> to assign "raw" DataElements to Dimensions/GroupSets through
>> DimensionOptions/Groups, completely independent of which Categories the
>> DataElement was assigned to for data entry. The weakness is that it is based
>> on flat data elements, not Categorized data elements, which we must include
>> if we are to justify the Categorized data entry.
>>
>>
>> d) The Category model is pretty good at what it currently does -
>> facilitating grid-based dataentry and cutting down on the number of data
>> elements (as well as making the data element naming more elegant).
>>
>>
>> Based on this I suggest we do the following:
>>
>> 1) We continue to use the Category model as it is, not for analysis - but
>> for data entry.
>>
>> 2) Taken from Bob's suggestion - we phase out the existing Group and
>> replace it with a new DimensionOption object. We introduce a new Dimension
>> object which will work similarly to a GroupSet. We use this model for
>> analysis.
>>
>> 3) We go for Ola's mentioned suggestion for analysis, with one exception:
>> Rather than assigning DataElements to a Group/DimensionOption, we assign a
>> combination of DataElement and CategoryOptionCombo (We create a new object
>> for this for every assignment - and remove it for every de-assignment). If
>> we want to see the total, we can assign a DataElement with the "default"
>> CategoryOptionCombo, or create a DimensionOption where the elements make a
>> total when summarized.
>>
>> 4) We use the same thing for Indicators.
>>
>>
>> The resource table Ola mentions will then look like this:
>>
>> Group Set - Group - Data Element - CategoryOptionCombo
>>
>> "Disease" - "Malaria" - "Malaria" - "(death, <5 year)"
>> "Disease" - "Typhoid" - "Typhoid" - "(death, >5 year)"
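
A minimal sketch of suggestion 3), where each assignment pairs a DataElement with a CategoryOptionCombo, is created on assignment and removed on de-assignment. All names (`DimensionAssignmentSketch`, `Assignment`, `assign`, `unassign`, `rows`) are invented for illustration, and plain strings stand in for the real objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not DHIS 2 API.
class DimensionAssignmentSketch
{
    // One row of the proposed resource table.
    static class Assignment
    {
        final String dimension;
        final String option;
        final String dataElement;
        final String optionCombo;

        Assignment( String dimension, String option, String dataElement, String optionCombo )
        {
            this.dimension = dimension;
            this.option = option;
            this.dataElement = dataElement;
            this.optionCombo = optionCombo;
        }

        boolean matches( String dimension, String option, String dataElement, String optionCombo )
        {
            return this.dimension.equals( dimension ) && this.option.equals( option )
                && this.dataElement.equals( dataElement ) && this.optionCombo.equals( optionCombo );
        }
    }

    private final List<Assignment> assignments = new ArrayList<>();

    // Creates a new assignment object for every assignment.
    void assign( String dimension, String option, String dataElement, String optionCombo )
    {
        assignments.add( new Assignment( dimension, option, dataElement, optionCombo ) );
    }

    // Removes the matching assignment; returns false if there was none.
    boolean unassign( String dimension, String option, String dataElement, String optionCombo )
    {
        return assignments.removeIf( a -> a.matches( dimension, option, dataElement, optionCombo ) );
    }

    // Renders the rows in the "Group Set - Group - Data Element - CategoryOptionCombo" layout.
    List<String> rows()
    {
        List<String> rows = new ArrayList<>();
        for ( Assignment a : assignments )
        {
            rows.add( a.dimension + " - " + a.option + " - " + a.dataElement + " - " + a.optionCombo );
        }
        return rows;
    }
}
```

Because the assignments live apart from the data values, they can be added or dropped without touching the captured data.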
>>
>>
>> This way we can assign dimensions as we like without losing the fine
>> granularity of the captured categorized data. We can improve the report
>> table functionality in order to utilize this. This will be feasible with the
>> time and resource constraints we are operating with. It also alleviates the
>> challenge regarding Indicators and SDMX.
>>
>>
>> Additionally, one could expand the equations from a) to:
>>
>> Dimension = GroupSet = Category
>> DimensionOption = Group = CategoryOption
>>
>> which means there is potential in merging those objects/making them
>> implement a common interface. But I don't see the value if b) is valid.
>>
>>
>> Waiting for your replies/slaughter.
>>
>>
>> Lars
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
>


-- 
Cheers,
Knut Staring

<?xml version="1.0" encoding="utf-8"?>
<Indicators>
  <Indicator xmlns="">
    <IndicatorId system="5">5</IndicatorId>
    <Name name="Name">Contraceptive prevalence rate</Name>
    <ShortName name="Short name">Contraceptive prevalence</ShortName>
    <DataType name="Data type">Percent</DataType>
    <Topic name="Topic">Health service coverage</Topic>
    <Rationale name="Rationale">Contraceptive prevalence rate is an indicator of health, population, development and women's empowerment. It also serves as a proxy measure of access to reproductive health services that are essential for meeting many of the Millennium Development Goals, especially those related to child mortality, maternal health, HIV/AIDS, and gender equality.</Rationale>
    <Definition name="Definition">The percentage of women aged 15-49 years, married or in-union, who are currently using, or whose sexual partner is using, at least one method of contraception, regardless of the method used.</Definition>
    <AssociatedTerms name="Associated terms">Contraceptive methods : Include clinic and supply (modern) methods and non-supply (traditional) methods of contraception. Clinic and supply methods include female and male sterilization, intrauterine devices (IUDs), hormonal methods (oral pills, injectables, and hormone-releasing implants, skin patches and vaginal rings), condoms and vaginal barrier methods (diaphragm, cervical cap and spermicidal foams, jellies, creams and sponges). Traditional methods include rhythm, withdrawal, abstinence and lactational amenorrhoea.</AssociatedTerms>
    <PreferredDataSources name="Preferred data sources">Household surveys</PreferredDataSources>
    <OtherPossibleDataSources name="Other possible data sources">
    </OtherPossibleDataSources>
    <MeasurementMethod name="Measurement method">Contraceptive prevalence = (Women of reproductive age (15-49) who are married or in union and who are currently using any method of contraception / Total number of women of reproductive age (15-49) who are married or in union) x 100

Household surveys that can generate this indicator include Demographic and Health Surveys (DHS), Multiple Indicator Cluster Surveys (MICS), Fertility and Family Surveys (FFS), Reproductive Health Surveys (RHS) and other surveys based on similar methodologies.</MeasurementMethod>
    <MethodOfEstimation name="Method of estimation">The United Nations Population Division compiles data from nationally representative surveys including the Demographic and Health Surveys (DHS), the Fertility and Family Surveys (FFS), the CDC-assisted Reproductive Health Surveys (RHS), the Multiple Indicator Cluster Surveys (MICS) and national family planning, or health, or household, or socio-economic surveys. The results are published regularly in the World Contraceptive Use report.
 
Predominant type of statistics: adjusted</MethodOfEstimation>
    <MethodOfEstimationOfRegionalAndGlobalEstimates name="Method of estimation of regional and global estimates">Regional and global estimates are based on weighted averages, using the total number of women of reproductive age (15-49) who are married or in union. These estimates are presented only if available data cover at least 50% of total number of women of reproductive age (15-49) who are married or in union in the regional or global groupings.</MethodOfEstimationOfRegionalAndGlobalEstimates>
    <Disaggregation name="Disaggregation">Age</Disaggregation>
    <Disaggregation name="Disaggregation">Location (urban/rural)</Disaggregation>
    <Disaggregation name="Disaggregation">By major region</Disaggregation>
    <Disaggregation name="Disaggregation">By province or similar level</Disaggregation>
    <Disaggregation name="Disaggregation">Education level</Disaggregation>
    <Disaggregation name="Disaggregation">By wealth quintile</Disaggregation>
    <Disaggregation name="Disaggregation">Marital status</Disaggregation>
    <Disaggregation name="Disaggregation">By method of contraception</Disaggregation>
    <UnitOfMeasure name="Unit of Measure">Percentage</UnitOfMeasure>
    <UnitMultiplier name="Unit Multiplier">
    </UnitMultiplier>
    <ExpectedFrequencyOfDataDissemination name="Expected frequency of data dissemination">Biennial (Two years)</ExpectedFrequencyOfDataDissemination>
    <ExpectedFrequencyOfDataCollection name="Expected frequency of data collection">
    </ExpectedFrequencyOfDataCollection>
    <CommentsAndLimitations name="Comments and limitations">The indicator “unmet need for family planning” provides complementary information to contraceptive prevalence.</CommentsAndLimitations>
    <Links name="Links" url="http://www.measuredhs.com/">Demographic and Health Surveys (DHS)</Links>
    <Links name="Links" url="http://www.un.org/esa/population/publications/contraceptive2007/contraceptive2007.htm">World Contraceptive Use 2007 (United Nations, 2008)</Links>
    <ContactPerson name="Contact person">
    </ContactPerson>
  </Indicator>
</Indicators>
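
A minimal sketch of reading this format with the JDK's DOM parser, pulling out the indicator name and its Disaggregation entries, which are the natural candidates for dimensions. The class and method names (`ImrIndicatorSketch`, `textOf`) are invented, and the embedded sample is a heavily shortened copy of the attachment.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not part of the IMR webservice or DHIS 2.
class ImrIndicatorSketch
{
    // Shortened copy of the attached document (an assumption made for brevity).
    static final String SAMPLE = "<Indicators><Indicator>"
        + "<Name name=\"Name\">Contraceptive prevalence rate</Name>"
        + "<Disaggregation name=\"Disaggregation\">Age</Disaggregation>"
        + "<Disaggregation name=\"Disaggregation\">Marital status</Disaggregation>"
        + "</Indicator></Indicators>";

    // Returns the trimmed text of every element with the given tag name, in document order.
    static List<String> textOf( String xml, String tagName )
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse( new ByteArrayInputStream( xml.getBytes( StandardCharsets.UTF_8 ) ) );
            NodeList nodes = doc.getElementsByTagName( tagName );
            List<String> values = new ArrayList<>();
            for ( int i = 0; i < nodes.getLength(); i++ )
            {
                values.add( nodes.item( i ).getTextContent().trim() );
            }
            return values;
        }
        catch ( Exception e )
        {
            throw new IllegalStateException( "Could not parse indicator metadata", e );
        }
    }
}
```

Note that the format lists disaggregations only as free-text labels; the options within each (for example the age groups) would still have to come from somewhere else.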
