dhis2-devs team mailing list archive

Re: [Dhis-dev] DataElement -> PeriodType association

On Sat, May 22, 2010 at 8:51 PM, Ola Hodne Titlestad <olatitle@xxxxxxxxx> wrote:
> On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>
>> On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>> > 2010/5/20 Ola Hodne Titlestad <olatitle@xxxxxxxxx>:
>> >>
>> >> 2010/5/20 Lars Helge Øverland <larshelge@xxxxxxxxx>
>> >>>
>> >>> Data elements derive their period type from the data sets they are
>> >>> members
>> >>> of.
>> >
>> > Restated (what I just sent Lars only by mistake):  a datavalue derives
>> > its period type from the data set of
>> > which its data element is a member  :-)
>> >
>> >>
>> >> And when they are members of two datasets with different period types
>> >> they
>> >> have multiple period types right?
>> >
>> > It's important to remain aware that it is values ultimately which have
>> > periods (and hence period types).
>> >
>> > And when you look at a value you can derive its period type in one of
>> > two ways - via dataset or via period.  Potentially these could
>> > disagree.  The one which derives from its period should be considered
>> > authoritative, i.e. if the period is 2009-Jan then regardless of what
>> > the dataset might say this really must be monthly.  Of course we hope
>> > these always agree.  Incidentally the lookup from
>> > dataelement-to-dataset-to-period looks like a greater complexity than
>> > the lookup from period->periodType.
>> >
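
The two lookups described above can be sketched as follows. This is only an illustration, assuming period identifiers shaped like the "2009-Jan" example; the patterns and the names period_type_of and resolve_period_type are hypothetical and not taken from the DHIS code base.

```python
import re

# Hypothetical period identifiers modelled on the "2009-Jan" example;
# the real DHIS period model is not shown in this thread.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
PERIOD_PATTERNS = [
    (re.compile(rf"^\d{{4}}-({MONTHS})$"), "Monthly"),
    (re.compile(r"^\d{4}-Q[1-4]$"), "Quarterly"),
    (re.compile(r"^\d{4}$"), "Yearly"),
]


def period_type_of(period):
    """Derive the period type from the period identifier alone."""
    for pattern, period_type in PERIOD_PATTERNS:
        if pattern.match(period):
            return period_type
    raise ValueError(f"unrecognised period: {period!r}")


def resolve_period_type(period, dataset_period_type):
    """Return (authoritative type, whether the dataset agrees with it)."""
    derived = period_type_of(period)
    return derived, derived == dataset_period_type
```

Here the period always wins, and the second element of the result only flags a disagreement with the dataset so that it can be reported.
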
>> >>
>> >> The key thing to look out for in data entry and data import is to avoid
>> >> overlaps in data values that will cause duplication when aggregating
>> >> data
>> >> periods.
>> >> E.g. if the SAME ORGUNIT registers values for the same data element for
>> >> two
>> >> different period types that have overlapping periods, e.g. Jan-10 and
>> >> Q1-10.
>> >> Then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all
>> >> show
>> >> an incorrect value since the value for Jan-10 is counted twice.
>> >
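
A small sketch of the double counting described above, assuming periods are encoded as (first_month, last_month) pairs within one year and that aggregation simply sums every stored value falling inside the target period. Both the encoding and the function names are assumptions made for illustration.

```python
from itertools import combinations

# Periods are (first_month, last_month) within one year; an assumed encoding.
JAN_10 = (1, 1)
Q1_10 = (1, 3)


def overlaps(a, b):
    """True when two periods share at least one month."""
    return a[0] <= b[1] and b[0] <= a[1]


def naive_aggregate(values, target):
    """Sum every stored value whose period lies inside the target period."""
    return sum(v for (start, end), v in values
               if target[0] <= start and end <= target[1])


def overlapping_pairs(values):
    """List pairs of stored periods that cover some of the same months."""
    return [(p, q) for (p, _), (q, _) in combinations(values, 2)
            if overlaps(p, q)]


# One orgunit, one data element: 10 reported for January, 30 for the quarter.
values = [(JAN_10, 10), (Q1_10, 30)]
```

With these values naive_aggregate(values, Q1_10) gives 40 rather than 30, because the January figure is already contained in the quarterly one; overlapping_pairs(values) is one way such a clash could be detected.
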
>> > OK.  That's a good concrete constraint to have.
>> >
>> >>
>> >> One way to enforce this constraint is to monitor which datasets an
>> >> orgunit
>> >> is assigned to, and not allow orgunits to be assigned to two datasets
>> >> that
>> >> have the same data element AND different period types.
>> >
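
The assignment check proposed above could look roughly like this. It is a sketch under stated assumptions: the DataSet class, its fields, and conflicting_elements are hypothetical, and the data element names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    period_type: str
    data_elements: frozenset


def conflicting_elements(assigned, candidate):
    """Data elements the candidate shares with an already assigned
    dataset that has a different period type."""
    conflicts = set()
    for existing in assigned:
        if existing.period_type != candidate.period_type:
            conflicts |= existing.data_elements & candidate.data_elements
    return conflicts


# Hypothetical datasets already assigned to, or proposed for, one orgunit.
monthly = DataSet("EPI monthly", "Monthly", frozenset({"BCG", "OPV"}))
quarterly = DataSet("EPI quarterly", "Quarterly", frozenset({"BCG"}))
```

An empty result means the orgunit can take the new dataset; a non-empty one names the data elements that would be collected at two frequencies. The check is per orgunit, so different provinces can still use different period types for the same data elements.
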
>> > Agreed.  Though this constraint should probably be imposed on forms
>> > rather than datasets.
>> >
>> >> As far as I am aware,
>> >> we are not checking for this today. During data import it could be
>> >> checked
>> >> on data element level by looking up the period type the way Bob has
>> >> shown,
>> >> but that sounds like a lot of look ups and time consuming validation,
>> >> or?
>> >
>> > On data import we don't really validate at all, beyond whatever
>> > constraints the db imposes. For efficiency we simply pop the values in
>> > with multiple insert statements.  So this validation would have to
>> > happen as a stage before the actual import or would have to be
>> > constrained within the db.  In fact it can't be validated easily
>> > before the import as it is dependent on existing values within the db.
>> >
>> >>
>> >> A relatively normal use case that we probably have to find a way to
>> >> support,
>> >> and I think they are struggling with in Vietnam, is that different
>> >> provinces
>> >> can use different period types for the same data elements (even for
>> >> complete
>> >> data sets). E.g. if the national data flow policy says to report on
>> >> immunisation data every quarter, so that becomes the minimum
>> >> requirement for
>> >> all provinces. Then some of the provinces decide that all their
>> >> facilities
>> >> have to collect this data monthly anyway, and then at the province
>> >> level
>> >> they simply send the quarterly aggregates to national level (in the
>> >> paper-based or Excel world). At the same time other provinces just
>> >> collect
>> >> quarterly data at the facility level as in the minimum national
>> >> requirement.
>> >> At the national level there is a need to consolidate all this data,
>> >> even
>> >> data by the facility level, so ideally a national DHIS database should
>> >> be
>> >> able to store both monthly and quarterly raw data values for the same
>> >> data
>> >> elements, but for different orgunits. The national information users
>> >> can
>> >> then easily generate quarterly reports on immunisation for all
>> >> provinces,
>> >> while in some provinces they can do monthly data analysis if they want
>> >> to
>> >> collect data using that frequency.
>> >>
>> >> We support the above scenario by allowing the same data elements to be
>> >> assigned to different data sets with different period types, but we
>> >> don't
>> >> control for misuse of this flexibility which can lead to duplication
>> >> and
>> >> inconsistent aggregated data values as pointed out above.
>> >
>> > Thinking further ... I really think the problem arises because we
>> > have a dataset concept which represents a form and is also used to
>> > constrain periodtypes on dataelements.  Thinking of the use case you
>> > have just described, it should be the case that one can have a paper
>> > form which national level expect to collect quarterly, and the same
>> > form be used at a lower level to collect data monthly.  If we wanted
>> > to mirror that use case electronically we would have to divorce the
>> > form from the periodtype - i.e. a form would collect datavalues of a
>> > certain period, but the same form could be used in different orgunits
>> > for collecting data at a different frequency.
>> >
>> > So (leaving dataset aside for the moment) if we can't assign a
>> > periodtype to a form and we can't assign it to a dataelement and it's too
>> > inefficient to validate on a one by one datavalue basis what is a girl
>> > to do?
>> >
>> > I suspect the correct answer is to refactor datavalue and create a
>> > datavalueset type - note: a set of datavalues rather than a set of
>> > dataelements.  Designing out loud, a datavalueset would have the
>> > following fields/attributes:
>> >
>> > 1.  a formid - the collection instrument used - roughly corresponds to
>> > current dataset
>> > 2.  an orgunitid - where the datavalues come from
>> > 3.  a periodid - the period of all the datavalues
>> > couple of other useful attributes I can think of
>> >
>> > Datavalue now becomes slightly simpler (which is always a good thing).
>> >  It only has:
>> > value, dataelementid, categorycombooption, datasetid
>>
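
The proposed refactoring can be written down as two record types. This is a sketch only: the class and field names are hypothetical, the "couple of other useful attributes" are left out, and the last datavalue field (written "datasetid" in the message) is read here as a reference to the owning datavalueset, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataValueSet:
    id: int
    form_id: int      # collection instrument, roughly the current dataset
    orgunit_id: int   # where the datavalues come from
    period_id: str    # the period of all the datavalues in the set


@dataclass(frozen=True)
class DataValue:
    value: str
    data_element_id: int
    category_option_combo_id: int
    data_value_set_id: int  # assumed to point at the owning DataValueSet


def period_of(value, sets_by_id):
    """A datavalue no longer stores a period; it gets it from its set."""
    return sets_by_id[value.data_value_set_id].period_id
```

Because period and orgunit live on the set, a check such as the overlap constraint would run once per datavalueset instead of once per datavalue.
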
>> Afterthought:
>> At the risk of adding complexity to what is otherwise a
>> simplification, my life could become even simpler if datavalueset also
>> had a categorycombo attribute, which would imply that a dataset was
>> linked to a formsectionid rather than a formid.
>>
>> So a form has sections.  sections have dataelements.  And sections
>> have a datavalueset as a model - which implies a uniform categorycombo
>> within the section.
>>
>> There isn't really a need for dataelements to have a categorycombo.
>> And in lots of ways it's good that they don't. Then I am reducing
>> complexity rather than adding to it :-)
>>
>> Consider one orgunit has collected malaria deaths disaggregated by
>> age.  Another has collected values for the same dataelement, but
>> not disaggregated by age.  The datavalues will come from a
>> datavalueset so will have a categorycombo.  It is possible to
>> aggregate or compare these datavalues, from different datavaluesets,
>> but using the lowest common denominator of categorycombo, i.e. in both
>> cases you have access to malaria deaths - in the one case you have to
>> "roll-up" the categorycombo which does of course assume that the sum
>> of category options make a sensible whole, but Ola has mentioned this
>> one many times.
>>
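
The "roll-up" to a lowest common denominator can be illustrated as below. The sketch assumes each orgunit's report is a mapping from category option to value and that the options sum to a sensible whole, as the message itself notes; the names and figures are invented.

```python
def roll_up(values_by_option):
    """Collapse a disaggregated report to one total.

    Only meaningful when the category options partition the whole.
    """
    return sum(values_by_option.values())


def comparable_totals(reports):
    """Reduce every orgunit's report to the shared, undisaggregated level."""
    return {orgunit: roll_up(values) for orgunit, values in reports.items()}


# Malaria deaths: orgunit A reports by age group, orgunit B reports a total.
reports = {
    "A": {"under 5": 3, "5 and over": 4},
    "B": {"total": 9},
}
```

After the roll-up both orgunits expose a single malaria deaths figure, so they can be compared or summed even though only one of them collected the age breakdown.
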
>
> Some really interesting ideas you are bringing up here Bob. I like the kind
> of flexibility and yet structure this would bring to the data model.

Agree that this is really interesting and important - and I don't want
to complicate things further, but from the perspective of my
department, there is also a need (mostly pronounced at higher levels
like national, but not necessarily) to accommodate estimates and
adjustments in values and indicators. This is linked to completeness -
when you know data is missing, you still want to have a reasonable
figure for reports. As an example: DHIS may not be used in hospitals,
where all cesarean deliveries are performed. Thus, a province or
ministry relying only on data from DHIS will report 0 for this
particular dataelement, which is obviously wrong. I guess adjusted
figures are technically a bit like targets, in terms of how they
relate to dataelements and datavalues?

Or does this topic rather belong in its own thread/blueprint?

Knut


