dhis2-devs team mailing list archive

Re: [Dhis-dev] DataElement -> PeriodType association

 

On 22 May 2010 19:51, Ola Hodne Titlestad <olatitle@xxxxxxxxx> wrote:
> On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>
>> On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>> > 2010/5/20 Ola Hodne Titlestad <olatitle@xxxxxxxxx>:
>> >>
>> >> 2010/5/20 Lars Helge Øverland <larshelge@xxxxxxxxx>
>> >>>
>> >>> Data elements derive their period type from the data sets they are
>> >>> members of.
>> >
>> > Restated (what I just sent Lars only by mistake):  a datavalue derives
>> > its period type from the data set of
>> > which its data element is a member  :-)
>> >
>> >>
>> >> And when they are members of two datasets with different period types
>> >> they have multiple period types, right?
>> >
>> > It's important to remain aware that it is values ultimately which have
>> > periods (and hence period types).
>> >
>> > And when you look at a value you can derive its period type in one of
>> > two ways - via dataset or via period.  Potentially these could
>> > disagree.  The one which derives from its period should be considered
>> > authoritative, i.e. if the period is 2009-Jan then regardless of what
>> > the dataset might say this really must be monthly.  Of course we hope
>> > these always agree.  Incidentally the lookup from
>> > dataelement-to-dataset-to-period looks more complex than
>> > the lookup from period->periodType.
>> >
>> >>
>> >> The key thing to look out for in data entry and data import is to avoid
>> >> overlaps in data values that will cause duplication when aggregating
>> >> data periods.
>> >> E.g. if the SAME ORGUNIT registers values for the same data element for
>> >> two different period types that have overlapping periods, e.g. Jan-10
>> >> and Q1-10, then the aggregate values for Q1-10, Jan-June 2010, and 2010
>> >> will all show an incorrect value since the value for Jan-10 is counted
>> >> twice.
>> >
>> > OK.  That's a good concrete constraint to have.
>> >
>> >>
>> >> One way to enforce this constraint is to monitor which datasets an
>> >> orgunit is assigned to, and not allow orgunits to be assigned to two
>> >> datasets that have the same data element AND different period types.
>> >
>> > Agreed.  Though this constraint should probably be imposed on forms
>> > rather than datasets.
>> >
>> >> As far as I am aware, we are not checking for this today. During data
>> >> import it could be checked at the data element level by looking up the
>> >> period type the way Bob has shown, but that sounds like a lot of
>> >> lookups and time-consuming validation, or?
>> >
>> > On data import we don't really validate at all, beyond whatever
>> > constraints the db imposes. For efficiency we simply pop the values in
>> > with multiple insert statements.  So this validation would have to
>> > happen as a stage before the actual import or would have to be
>> > constrained within the db.  In fact it can't be validated easily
>> > before the import as it is dependent on existing values within the db.
>> >
>> >>
>> >> A relatively normal use case that we probably have to find a way to
>> >> support, and one I think they are struggling with in Vietnam, is that
>> >> different provinces can use different period types for the same data
>> >> elements (even for complete data sets). E.g. the national data flow
>> >> policy says to report on immunisation data every quarter, so that
>> >> becomes the minimum requirement for all provinces. Then some of the
>> >> provinces decide that all their facilities have to collect this data
>> >> monthly anyway, and then at the province level they simply send the
>> >> quarterly aggregates to national level (in the paper-based or Excel
>> >> world). At the same time other provinces just collect quarterly data
>> >> at the facility level as in the minimum national requirement.
>> >> At the national level there is a need to consolidate all this data,
>> >> even data at the facility level, so ideally a national DHIS database
>> >> should be able to store both monthly and quarterly raw data values for
>> >> the same data elements, but for different orgunits. The national
>> >> information users can then easily generate quarterly reports on
>> >> immunisation for all provinces, while in some provinces they can do
>> >> monthly data analysis if they want to collect data using that
>> >> frequency.
>> >>
>> >> We support the above scenario by allowing the same data elements to be
>> >> assigned to different data sets with different period types, but we
>> >> don't control for misuse of this flexibility, which can lead to
>> >> duplication and inconsistent aggregated data values as pointed out
>> >> above.
>> >
>> > Thinking further ... I really think the problem arises because we
>> > have a dataset concept which represents a form and is also used to
>> > constrain periodtypes on dataelements.  Thinking of the use case you
>> > have just described, it should be the case that one can have a paper
>> > form which the national level expects to collect quarterly, and the
>> > same form be used at a lower level to collect data monthly.  If we
>> > wanted to mirror that use case electronically we would have to divorce
>> > the form from the periodtype - i.e. a form would collect datavalues of
>> > a certain period, but the same form could be used in different orgunits
>> > for collecting data at a different frequency.
>> >
>> > So (leaving dataset aside for the moment) if we can't assign a
>> > periodtype to a form and we can't assign one to a dataelement and it's
>> > too inefficient to validate on a one-by-one datavalue basis, what is a
>> > girl to do?
>> >
>> > I suspect the correct answer is to refactor datavalue and create a
>> > datavalueset type - note: a set of datavalues rather than a set of
>> > dataelements.  Designing out loud, a datavalueset would have the
>> > following fields/attributes:
>> >
>> > 1.  a formid - the collection instrument used - roughly corresponds to
>> > current dataset
>> > 2.  an orgunitid - where the datavalues come from
>> > 3.  a periodid - the period of all the datavalues
>> > couple of other useful attributes I can think of
>> >
>> > Datavalue now becomes slightly simpler (which is always a good thing).
>> >  It only has:
>> > value, dataelementid, categorycombooption, datavaluesetid
>>
>> Afterthought:
>> At the risk of adding complexity to what is otherwise a
>> simplification, my life could become even simpler if datavalueset also
>> had a categorycombo attribute, which would imply that a datavalueset was
>> linked to a formsectionid rather than a formid.
>>
>> So a form has sections.  sections have dataelements.  And sections
>> have a datavalueset as a model - which implies a uniform categorycombo
>> within the section.
>>
>> There isn't really a need for dataelements to have a categorycombo.
>> And in lots of ways it's good that they don't. Then I am reducing
>> complexity rather than adding to it :-)
>>
>> Consider one orgunit has collected malaria deaths disaggregated by
>> age.  Another has collected values for the same dataelement, but
>> not disaggregated by age.  The datavalues will come from a
>> datavalueset so will have a categorycombo.  It is possible to
>> aggregate or compare these datavalues, from different datavaluesets,
>> but using the lowest common denominator of categorycombo, i.e. in both
>> cases you have access to malaria deaths - in the one case you have to
>> "roll-up" the categorycombo, which does of course assume that the sum
>> of category options makes a sensible whole, but Ola has mentioned this
>> one many times.
>>
>
> Some really interesting ideas you are bringing up here Bob. I like the kind
> of flexibility and yet structure this would bring to the data model.
>
> One quick question though:
> How would this fit with the use of data elements and categorycombooptions in
> metadata expressions like indicators and validation rules that are (and
> should be) completely independent from data collection structures? E.g.
> which categories and options should be available for a given data element
> when setting up an indicator formula? All?

I think it's a question of the "lowest common denominator" of the
datavalues that you have.  Indicators are calculated from datavalues
even though we express the calculation in terms of dataelements.

Ivalue = f(de1, de2, de3, ...) / g(de4, de5, ...)

Looking just at the numerator - if the set of datavalues you have
corresponding to de1, de2 and de3 share the same categorycombo (and
note that datavalues do have a categorycombo from which their
categoryoptioncombo is derived), then you can also produce a
similarly disaggregated indicator value.

If they use different categorycombos (some have age+sex, some have
hiv_age+sex, and some have just sex), but each of these has at least
the sex category, then you could produce an indicator value
disaggregated by sex.

If the categorycombos are a jumble of apples and pears then you can
produce just the rolled up calculation.
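
A minimal sketch of that "lowest common denominator", assuming a
categorycombo can be modelled as a plain set of category names.  The
class and method names here are hypothetical and not part of the DHIS
API:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only: a categorycombo is
// modelled simply as the set of category names it contains.
class CommonCategories {

    // Returns the categories shared by every combo, i.e. the finest
    // disaggregation that all of the datavalues can support.
    // An empty result means only the rolled up total is available.
    static Set<String> common(List<Set<String>> combos) {
        if (combos.isEmpty()) {
            return new LinkedHashSet<>();
        }
        Set<String> result = new LinkedHashSet<>(combos.get(0));
        for (Set<String> combo : combos) {
            result.retainAll(combo);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> combos = List.of(
            Set.of("age", "sex"),
            Set.of("hiv_age", "sex"),
            Set.of("sex"));
        System.out.println(common(combos)); // prints [sex]
    }
}
```

With a jumble of apples and pears (say one combo with only age and
another with only sex) the result is empty, which corresponds to the
rolled up calculation.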

What is the implication?  At design time, when you are coding the
expression, you probably should not include the categoryoptioncombo at
all.  The indicator is just expressed in terms of dataelements (I
guess traditional DHIS14 style).  But when you are generating, for
example, the reporttable, the first pass analyzes the data you have
selected and suggests: would you like the indicator data
disaggregated by sex?  Or age+sex?  Or no disaggregation?  So what you
can report on is determined by the data you've got.  I think that's a
sound principle.
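
One way such a first pass could look, as a rough sketch only: take the
categories common to all the selected datavalues and offer every subset
of them, with the empty subset standing for "no disaggregation".  The
names are made up for illustration and categorycombos are again
modelled as sets of category names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not existing DHIS code.
class DisaggregationOptions {

    // Lists the disaggregations the selected data can support.
    // Each option is a list of category names; [] means no disaggregation.
    static List<List<String>> suggest(List<Set<String>> combos) {
        // Categories present in every combo, sorted for a stable order.
        TreeSet<String> common = new TreeSet<>();
        if (!combos.isEmpty()) {
            common.addAll(combos.get(0));
            for (Set<String> combo : combos) {
                common.retainAll(combo);
            }
        }
        List<String> categories = new ArrayList<>(common);

        // Enumerate every subset of the common categories with a bitmask.
        List<List<String>> options = new ArrayList<>();
        for (int mask = 0; mask < (1 << categories.size()); mask++) {
            List<String> option = new ArrayList<>();
            for (int i = 0; i < categories.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    option.add(categories.get(i));
                }
            }
            options.add(option);
        }
        return options;
    }

    public static void main(String[] args) {
        List<Set<String>> combos = List.of(
            Set.of("age", "sex"),
            Set.of("age", "sex"));
        // prints [[], [age], [sex], [age, sex]]
        System.out.println(suggest(combos));
    }
}
```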

And I think all of this is completely independent of data collection structures.

Of course in practice you will have designed and deployed your
collection instruments such that all your datavalues for a given
dataelement will have the same categorycombo.  But if you want to
compare data over the past five years, and the ministry decided only
in year two that they wanted to disaggregate by sex and in year four
decided to introduce a third sex category, then you could still
calculate an indicator from all of those datavalues - but only by
rolling up the sex category.
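
To make the rolling up concrete, here is a small sketch under the
assumption that each datavalue carries its category options as a map
from category name to option.  The types and the numbers are invented
for the example and do not exist in DHIS:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical types for illustration only.
class RollUp {

    // A datavalue with its category options keyed by category name,
    // e.g. {sex=female}.  An empty map means "not disaggregated".
    static final class Value {
        final int year;
        final Map<String, String> options;
        final double value;

        Value(int year, Map<String, String> options, double value) {
            this.year = year;
            this.options = options;
            this.value = value;
        }
    }

    // Sums values per year, keeping only the categories named in keep.
    // With an empty keep set, all values of a year roll up into one total.
    static Map<String, Double> rollUp(List<Value> values, Set<String> keep) {
        Map<String, Double> totals = new TreeMap<>();
        for (Value v : values) {
            StringBuilder key = new StringBuilder(String.valueOf(v.year));
            // Sort the options so the key does not depend on map order.
            for (Map.Entry<String, String> e : new TreeMap<>(v.options).entrySet()) {
                if (keep.contains(e.getKey())) {
                    key.append('/').append(e.getKey()).append('=').append(e.getValue());
                }
            }
            totals.merge(key.toString(), v.value, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Value> values = List.of(
            new Value(1, Map.of(), 40.0),                // year one: no disaggregation
            new Value(2, Map.of("sex", "female"), 25.0), // year two: by sex
            new Value(2, Map.of("sex", "male"), 20.0));
        System.out.println(rollUp(values, Set.of())); // prints {1=40.0, 2=45.0}
    }
}
```

Year one has no sex category at all, so the only comparison that spans
both years is the one with sex rolled up.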

I think what we do currently - specifying the categorycombo in the
indicator expression - is more rigid and more fragile.

In summary, what we have with categorycombos etc is really quite
brilliant.  We don't have ragged data.  Our datavalues are stored
compactly and uniformly.  All this is great.  I think a mistake we may
have made is attaching categorycombo to the dataelement.  The
relationship between a categorycombo and a dataelement can and should
be a transient thing.  I believe the categorycombo should be a
characteristic of the way we collect the particular datavalues ie. a
characteristic of a particular form.  There was a long earlier
conversation where it emerged that part of the original design rationale
of the categorycombo was indeed related to form layout.  At the time this
upset me a bit, because I too had bought into the rigid edifice we had
created.  But in retrospect I think this thinking was absolutely on
the right track.  Using the categorycombo to specify the
disaggregation layout of a particular form's elements makes very good
sense.  What was also inspired was having the categorycombo as a named
persisted object in its own right which could be used across different
dataelements.

Cheers
Bob

>
> Ola
> --------
>
>
>
>
>>
>> Regards
>> Bob
>>
>> >
>> > We can relatively efficiently validate that a datavalueset object is
>> > not persisted which has the same formid, orgunitid and an overlapping
>> > period.
>> >
>> > There is no longer any ambiguity about periodtype of a datavalue.
>> >
>> > stored_by, timestamp, comment might go either way.  Probably they need
>> > to stay on datavalue.  I notice comment is rarely used but it's really
>> > useful to have a comment on datavalueset for import purposes.
>> >
>> > 'nuff designing out loud. Got to go.
>> >
>> > Regards
>> > Bob
>> >
>> >>
>> >>
>> >> Ola
>> >> ---------
>> >>
>> >>>
>> >>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad
>> >>> <olatitle@xxxxxxxxx>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> After Kim Anh's email about the use of the same data elements with
>> >>>> different period types I dug up this old discussion from March 2009.
>> >>>>
>> >>>> What is the status on this work, or did we not conclude this?
>> >>>>
>> >>>> Ola
>> >>>> ----------
>> >>>>
>> >>>> 2009/3/20 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>> >>>>>
>> >>>>> 2009/3/20 Lars Helge Øverland <larshelge@xxxxxxxxx>:
>> >>>>> >
>> >>>>> >>
>> >>>>> >> Yes this is true.  But what do you think of the idea to enforce
>> >>>>> >> DataSet membership having a default DataSet for all the
>> >>>>> >> delinquents?  I'm not sure if it can be enforced by the schema,
>> >>>>> >> but at least by the application.
>> >>>>> >
>> >>>>> > OK but what does this give us in terms of PeriodType-determining
>> >>>>> > if this default DataSet has a null PeriodType?
>> >>>>>
>> >>>>> Nothing really.  The only effect would be you have an index on the
>> >>>>> unassigned DataElements for what it's worth.  Mainly it would be
>> >>>>> useful for determining easily the available DataElements which can
>> >>>>> be added to a DataSet.  Maybe it's a nonsense idea - I was just
>> >>>>> trying to think of ways to make editing DataSets reasonably
>> >>>>> straightforward.
>> >>>>>
>> >>>>> >
>> >>>>> >>
>> >>>>> >> I don't know if it's about right or wrong.  There are pros and
>> >>>>> >> cons of both approaches.  What you gain on the swings you lose
>> >>>>> >> on the roundabouts :-)
>> >>>>> >>
>> >>>>> >> In the explicit case the application will have to enforce that
>> >>>>> >> DataSet members all have the same PeriodType.
>> >>>>> >>
>> >>>>> >> In the implicit case the application will have to enforce that
>> >>>>> >> DataElements can only be members of multiple groups if these
>> >>>>> >> share the same PeriodType.
>> >>>>> >>
>> >>>>> >> The net result as far as the Data API is concerned can and must
>> >>>>> >> be the same.  Perhaps we should define exactly what extra
>> >>>>> >> methods we want in the API first.  We have already identified a
>> >>>>> >> few.  Then decide whether a database change is necessitated by
>> >>>>> >> these.
>> >>>>> >
>> >>>>> > Yes. We need at least a service method:
>> >>>>> >
>> >>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>> >>>>> >
>> >>>>> > and a getter on the DataElement object:
>> >>>>> >
>> >>>>> > PeriodType getPeriodType()
>> >>>>> >
>> >>>>> >
>> >>>>> > I guess we could make a branch, start coding and see how it works
>> >>>>> > out.
>> >>>>>
>> >>>>> Sure.  So long as we are adding methods we won't be breaking
>> >>>>> anything in terms of backward compatibility.  Just enforcing
>> >>>>> application level constraints.  Then we can really encourage
>> >>>>> (enforce?) upper layers to strictly interact with the data via the
>> >>>>> API.  Even if this might occasionally mean making some lightweight
>> >>>>> API methods which bypass the ORM.
>> >>>>>
>> >>>>> >
>> >>>>> > Another issue would arise in the (exotic) situation where someone
>> >>>>> > assigns a DataElement to a DataSet, enters data for it, then
>> >>>>> > removes it from the DataElement. The data is there, but how do we
>> >>>>> > deal with it in regard to the mentioned required functionality
>> >>>>> > (trend analysis, datamart)?
>> >>>>> >
>> >>>>>
>> >>>>> Yes this gets a bit weird (I presume you mean removes it from the
>> >>>>> DataSet).  I'm guessing you haven't lost the data because the
>> >>>>> dataValues each have a PeriodID which in turn is linked to a
>> >>>>> PeriodType.  I suppose that (in such an exotic headspace)
>> >>>>> DataElements can in fact change their PeriodTypes over time, though
>> >>>>> I imagine it's not a great idea.
>> >>>>>
>> >>>>> The effect would be the same in the explicit relationship case, if
>> >>>>> someone assigns a DataElement to a DataSet, enters data for it, then
>> >>>>> changes the PeriodType of the DataElement ...
>> >>>>>
>> >>>>> Cheers
>> >>>>> Bob
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Mailing list: https://launchpad.net/~dhis2-devs
>> >>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> >>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> >>>>> More help   : https://help.launchpad.net/ListHelp
>> >>>>
>> >>>
>> >>
>> >>
>> >
>
>


