dhis2-devs team mailing list archive

Re: [Dhis-dev] DataElement -> PeriodType association

 

On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> 2010/5/20 Ola Hodne Titlestad <olatitle@xxxxxxxxx>:
>>
>> 2010/5/20 Lars Helge Øverland <larshelge@xxxxxxxxx>
>>>
>>> Data elements derive their period type from the data sets they are members
>>> of.
>
> Restated (what I just sent Lars only by mistake):  a datavalue derives
> its period type from the data set of
> which its data element is a member  :-)
>
>>
>> And when they are members of two datasets with different period types they
>> have multiple period types right?
>
> It's important to remain aware that it is values ultimately which have
> periods (and hence period types).
>
> And when you look at a value you can derive its period type in one of
> two ways - via dataset or via period.  Potentially these could
> disagree.  The one which derives from its period should be considered
> authoritative, i.e. if the period is 2009-Jan then regardless of what
> the dataset might say this really must be monthly.  Of course we hope
> these always agree.  Incidentally the lookup from
> dataelement-to-dataset-to-period looks like a greater complexity than
> the lookup from period->periodType.
>
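
A minimal sketch of the two lookup routes described above, in Java. The Period, DataSet and PeriodType types are simplified stand-ins invented for illustration, not the DHIS 2 classes. The only point shown is that the period's own type wins when the two routes disagree.

```java
final class PeriodTypeLookup {
    enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

    // Hypothetical stand-ins: a period knows its own type, a data set declares one.
    record Period(String name, PeriodType type) {}
    record DataSet(String name, PeriodType type) {}

    /** True when the data set route and the period route give the same answer. */
    static boolean agree(Period period, DataSet dataSet) {
        return period.type() == dataSet.type();
    }

    /** The period is treated as authoritative; the data set is only a cross-check. */
    static PeriodType resolve(Period period, DataSet dataSet) {
        if (!agree(period, dataSet)) {
            System.err.println("Warning: " + dataSet.name() + " is " + dataSet.type()
                    + " but " + period.name() + " is " + period.type());
        }
        return period.type();
    }
}
```

With a period 2009-Jan typed MONTHLY and a data set declared QUARTERLY, resolve returns MONTHLY and writes a warning.
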
>>
>> The key thing to look out for in data entry and data import is to avoid
>> overlaps in data values that will cause duplication when aggregating data
>> periods.
>> E.g. if the SAME ORGUNIT registers values for the same data element for two
>> different period types that have overlapping periods, e.g. Jan-10 and Q1-10.
>> Then the aggregate values for Q1-10, Jan-June 2010, and 2010 will all show
>> an incorrect value since the value for Jan-10 is counted twice.
>
> OK.  That's a good concrete constraint to have.
>
>>
>> One way to enforce this constraint is to monitor which datasets an orgunit
>> is assigned to, and not allow orgunits to be assigned to two datasets that
>> have the same data element AND different period types.
>
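
The assignment rule Ola proposes could be checked roughly as follows. This is a sketch under stated assumptions: DataSet here is a hypothetical record that identifies data elements by name, and canAssign is an invented helper, not an existing DHIS 2 method.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class AssignmentCheck {
    enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

    // Hypothetical stand-in for a data set; data elements are plain names.
    record DataSet(String name, PeriodType periodType, Set<String> dataElements) {}

    /** Two data sets conflict if they share a data element AND differ in period type. */
    static boolean conflicts(DataSet a, DataSet b) {
        return a.periodType() != b.periodType()
                && !Collections.disjoint(a.dataElements(), b.dataElements());
    }

    /** An orgunit may take the candidate only if it conflicts with nothing already assigned. */
    static boolean canAssign(List<DataSet> alreadyAssigned, DataSet candidate) {
        return alreadyAssigned.stream().noneMatch(existing -> conflicts(existing, candidate));
    }
}
```

The check runs once per assignment rather than once per data value, so it avoids the per-value lookups that the import discussion worries about.
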
> Agreed.  Though this constraint should probably be imposed on forms
> rather than datasets.
>
>>As far as I am aware,
>> we are not checking for this today. During data import it could be checked
>> on data element level by looking up the period type the way Bob has shown,
>> but that sounds like a lot of lookups and time-consuming validation, or?
>
> On data import we don't really validate at all, beyond whatever
> constraints the db imposes. For efficiency we simply pop the values in
> with multiple insert statements.  So this validation would have to
> happen as a stage before the actual import or would have to be
> constrained within the db.  In fact it can't be validated easily
> before the import as it is dependent on existing values within the db.
>
>>
>> A relatively normal use case that we probably have to find a way to support,
>> and I think they are struggling with in Vietnam, is that different provinces
>> can use different period types for the same data elements (even for complete
>> data sets). E.g. if the national data flow policy says to report on
>> immunisation data every quarter, so that becomes the minimum requirement for
>> all provinces. Then some of the provinces decide that all their facilities
>> have to collect this data monthly anyway, and then at the province level
>> they simply send the quarterly aggregates to national level (in the
>> paper-based or Excel world). At the same time other provinces just collect
>> quarterly data at the facility level as in the minimum national requirement.
>> At the national level there is a need to consolidate all this data, even
>> data by the facility level, so ideally a national DHIS database should be
>> able to store both monthly and quarterly raw data values for the same data
>> elements, but for different orgunits. The national information users can
>> then easily generate quarterly reports on immunisation for all provinces,
>> while in some provinces they can do monthly data analysis if they want to
>> collect data using that frequency.
>>
>> We support the above scenario by allowing the same data elements to be
>> assigned to different data sets with different period types, but we don't
>> control for misuse of this flexibility which can lead to duplication and
>> inconsistent aggregated data values as pointed out above.
>
> Thinking further ... I really think the problem arises because we
> have a dataset concept which represents a form and is also used to
> constrain periodtypes on dataelements.  Thinking of the use case you
> have just described, it should be the case that one can have a paper
> form which national level expect to collect quarterly, and the same
> form be used at a lower level to collect data monthly.  If we wanted
> to mirror that use case electronically we would have to divorce the
> form from the periodtype - ie a form would collect datavalues of a
> certain period, but the same form could be used in different orgunits
> for collecting data at a different frequency.
>
> So (leaving dataset aside for the moment) if we can't assign a
> periodtype to a form and we can't assign one to a dataelement and it's too
> inefficient to validate on a one by one datavalue basis what is a girl
> to do?
>
> I suspect the correct answer is to refactor datavalue and create a
> datavalueset type - note: a set of datavalues rather than a set of
> dataelements.  Designing out loud, a datavalueset would have the
> following fields/attributes:
>
> 1.  a formid - the collection instrument used - roughly corresponds to
> current dataset
> 2.  an orgunitid - where the datavalues come from
> 3.  a periodid - the period of all the datavalues
> couple of other useful attributes I can think of
>
> Datavalue now becomes slightly simpler (which is always a good thing).
>  It only has:
> value, dataelementid, categorycombooption, datavaluesetid
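
The proposed split might look like this in Java. It is only a sketch: the record names, field names and integer ids are assumptions made for illustration, and in this object form the values are simply held by their set instead of carrying a back-reference id.

```java
import java.util.List;

final class DataValueSetSketch {
    /** A single value; it no longer needs its own period or orgunit. */
    record DataValue(int dataElementId, int categoryOptionComboId, String value) {}

    /** Form, orgunit and period are stated once for every value in the set. */
    record DataValueSet(int formId, int orgUnitId, int periodId, List<DataValue> values) {}
}
```

Because the period is stored once on the set, every value in it has exactly one period, and hence one period type.
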

Afterthought:
At the risk of adding complexity to what is otherwise a
simplification, my life could become even simpler if datavalueset also
had a categorycombo attribute, which would imply that a datavalueset was
linked to a formsectionid rather than a formid.

So a form has sections.  sections have dataelements.  And sections
have a datavalueset as a model - which implies a uniform categorycombo
within the section.

There isn't really a need for dataelements to have a categorycombo.
And in lots of ways it's good that they don't. Then I am reducing
complexity rather than adding to it :-)

Consider one orgunit has collected malaria deaths disaggregated by
age.  Another has collected values for the same dataelement, but
not disaggregated by age.  The datavalues will come from a
datavalueset so will have a categorycombo.  It is possible to
aggregate or compare these datavalues from different datavaluesets,
but using the lowest common denominator of categorycombo, i.e. in both
cases you have access to malaria deaths - in the one case you have to
"roll-up" the categorycombo which does of course assume that the sum
of category options makes a sensible whole, but Ola has mentioned this
one many times.
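
A small sketch of that roll-up in Java. The map keyed by category option name is an assumed representation, and the option names and counts in the note below are invented.

```java
import java.util.Map;

final class CategoryRollUp {
    /** Collapses a disaggregated value into one total by summing its category options. */
    static int rollUp(Map<String, Integer> valueByCategoryOption) {
        return valueByCategoryOption.values().stream()
                .mapToInt(Integer::intValue)
                .sum();
    }
}
```

If one orgunit reports 3 deaths under 5 and 4 deaths aged 5 and over, rollUp gives 7, which can then be compared or added to another orgunit's single undisaggregated figure. This only makes sense when the category options are exhaustive and do not overlap.
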

Regards
Bob

>
> We can relatively efficiently validate that a datavalueset object is not
> persisted which has the same formid, orgunitid and an overlapping
> period.
>
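
The persistence check described above could be sketched like this. The records and the canPersist helper are hypothetical, and periods are modelled as closed date ranges purely for the example. A real implementation would more likely express this as a database query or constraint.

```java
import java.time.LocalDate;
import java.util.List;

final class OverlapCheck {
    /** A period as a closed date range [start, end]. */
    record Period(LocalDate start, LocalDate end) {
        boolean overlaps(Period other) {
            return !start.isAfter(other.end()) && !other.start().isAfter(end);
        }
    }

    // Hypothetical stand-in carrying only the three fields the check needs.
    record DataValueSet(int formId, int orgUnitId, Period period) {}

    /** Rejects a set that shares form and orgunit with an existing set whose period overlaps. */
    static boolean canPersist(List<DataValueSet> existing, DataValueSet candidate) {
        return existing.stream().noneMatch(s ->
                s.formId() == candidate.formId()
                        && s.orgUnitId() == candidate.orgUnitId()
                        && s.period().overlaps(candidate.period()));
    }
}
```

With a January 2010 set already stored for a form and orgunit, a first-quarter 2010 set for the same pair is rejected, while the same quarter for a different orgunit is accepted.
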
> There is no longer any ambiguity about periodtype of a datavalue.
>
> stored_by, timestamp, comment might go either way.  Probably they need
> to stay on datavalue.  I notice comment is rarely used but it's really
> useful to have a comment on datavalueset for import purposes.
>
> 'nuff designing out loud. Got to go.
>
> Regards
> Bob
>
>>
>>
>> Ola
>> ---------
>>
>>>
>>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad <olatitle@xxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> After Kim Anh's email about the use of the same data elements with
>>>> different period types I dug up this old discussion from March 2009.
>>>>
>>>> What is the status on this work, or did we not conclude this?
>>>>
>>>> Ola
>>>> ----------
>>>>
>>>> 2009/3/20 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>>>>
>>>>> 2009/3/20 Lars Helge Øverland <larshelge@xxxxxxxxx>:
>>>>> >
>>>>> >>
>>>>> >> Yes this is true.  But what do you think of the idea to enforce
>>>>> >> DataSet membership having a default DataSet for all the delinquents?
>>>>> >> I'm not sure if it can be enforced by the schema, but at least by the
>>>>> >> application.
>>>>> >
>>>>> > OK but what does this give us in terms of PeriodType-determining if
>>>>> > this
>>>>> > default DataSet has a null PeriodType?
>>>>>
>>>>> Nothing really.  The only effect would be you have an index on the
>>>>> unassigned DataElements for what it's worth.  Mainly it would be useful
>>>>> for determining easily the available DataElements which can be added
>>>>> to a DataSet.  Maybe it's a nonsense idea - I was just trying to think
>>>>> of ways to make editing DataSets reasonably straightforward.
>>>>>
>>>>> >
>>>>> >>
>>>>> >> I don't know if it's about right or wrong.  There are pros and cons of
>>>>> >> both approaches.  What you gain on the swings you lose on the
>>>>> >> roundabouts :-)
>>>>> >>
>>>>> >> In the explicit case the application will have to enforce that
>>>>> >> DataSet
>>>>> >> members all have the same periodType.
>>>>> >>
>>>>> >> In the implicit case the application will have to enforce that
>>>>> >> DataElements can only be members of multiple groups if these share
>>>>> >> the
>>>>> >> same PeriodType.
>>>>> >>
>>>>> >> The net result as far as the Data API is concerned can and must be
>>>>> >> the
>>>>> >> same.  Perhaps we should define exactly what extra methods we want in
>>>>> >> the API first.  We have already identified a few.  Then decide
>>>>> >> whether
>>>>> >> a database change is necessitated by these.
>>>>> >
>>>>> > Yes. We need at least this service method:
>>>>> >
>>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>>>>> >
>>>>> > and this getter on the DataElement object:
>>>>> >
>>>>> > PeriodType getPeriodType()
>>>>> >
>>>>> >
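
Under the implicit approach, the two methods named above might be derived from data set membership along these lines. This is a sketch only: the class is invented, data elements are identified by name rather than by a DataElement object, and the getter returns an Optional to cover elements that belong to no data set.

```java
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class ImplicitPeriodTypes {
    enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

    // Hypothetical stand-in; data elements are identified by name only.
    record DataSet(String name, PeriodType periodType, Set<String> dataElements) {}

    private final List<DataSet> dataSets;

    ImplicitPeriodTypes(List<DataSet> dataSets) {
        this.dataSets = dataSets;
    }

    /** Period type implied by data set membership; empty for unassigned elements. */
    Optional<PeriodType> getPeriodType(String dataElement) {
        Set<PeriodType> types = dataSets.stream()
                .filter(ds -> ds.dataElements().contains(dataElement))
                .map(DataSet::periodType)
                .collect(Collectors.toSet());
        if (types.size() > 1) {
            throw new IllegalStateException(dataElement + " has conflicting period types " + types);
        }
        return types.stream().findFirst();
    }

    /** All data elements that belong to at least one data set of the given type. */
    Collection<String> getDataElementsByPeriodType(PeriodType type) {
        return dataSets.stream()
                .filter(ds -> ds.periodType() == type)
                .flatMap(ds -> ds.dataElements().stream())
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

The exception in getPeriodType marks the spot where the application-level rule (multiple memberships must share a period type) would have been violated.
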
>>>>> > I guess we could make a branch, start coding and see how it works out.
>>>>>
>>>>> Sure.  So long as we are adding methods we won't be breaking anything
>>>>> in terms of backward compatibility.  Just enforcing application level
>>>>> constraints.  Then we can really encourage (enforce?) upper layers to
>>>>> strictly interact with the data via the API.  Even if this might
>>>>> occasionally mean making some lightweight API methods which bypass the
>>>>> ORM.
>>>>>
>>>>> >
>>>>> > Another issue would arise in the (exotic) situation where someone
>>>>> > assigns a
>>>>> > DataElement to a DataSet, enters data for it, then removes it from the
>>>>> > DataElement. The data is there, but how do we deal with it in regard
>>>>> > to the
>>>>> > mentioned required functionality (trend analysis, datamart) ?
>>>>> >
>>>>>
>>>>> Yes this gets a bit weird (I presume you mean removes it from the
>>>>> DataSet).  I'm guessing you haven't lost the data because the
>>>>> dataValues each have a PeriodID which in turn is linked to a
>>>>> PeriodType.  I suppose that (in such an exotic headspace) DataElements
>>>>> can in fact change their PeriodTypes over time, though I imagine it's
>>>>> not a great idea.
>>>>>
>>>>> The effect would be the same in the explicit relationship case, if
>>>>> someone assigns a DataElement to a DataSet, enters data for it, then
>>>>> changes the PeriodType of the DataElement ...
>>>>>
>>>>> Cheers
>>>>> Bob
>>>>>
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>
>>
>>
>


