
dhis2-devs team mailing list archive

Re: [Dhis-dev] DataElement -> PeriodType association

 

On 23 May 2010 08:36, Ola Hodne Titlestad <olatitle@xxxxxxxxx> wrote:
> On 23 May 2010 03:32, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>
>> On 22 May 2010 19:51, Ola Hodne Titlestad <olatitle@xxxxxxxxx> wrote:
>> > On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>> >>
>> >> On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>> >> > 2010/5/20 Ola Hodne Titlestad <olatitle@xxxxxxxxx>:
>> >> >>
>> >> >> 2010/5/20 Lars Helge Øverland <larshelge@xxxxxxxxx>
>> >> >>>
>> >> >>> Data elements derive their period type from the data sets they are
>> >> >>> members of.
>> >> >
>> >> > Restated (what I just sent Lars only by mistake): a datavalue derives
>> >> > its period type from the data set of which its data element is a
>> >> > member  :-)
>> >> >
>> >> >>
>> >> >> And when they are members of two datasets with different period
>> >> >> types, they have multiple period types, right?
>> >> >
>> >> > It's important to remain aware that it is ultimately values which
>> >> > have periods (and hence period types).
>> >> >
>> >> > And when you look at a value you can derive its period type in one of
>> >> > two ways: via dataset or via period. Potentially these could
>> >> > disagree. The one which derives from its period should be considered
>> >> > authoritative, i.e. if the period is 2009-Jan then, regardless of what
>> >> > the dataset might say, this really must be monthly. Of course we hope
>> >> > these always agree. Incidentally, the lookup from
>> >> > dataelement->dataset->periodType looks more complex than the lookup
>> >> > from period->periodType.
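To make the two lookup routes concrete, here is a minimal Java sketch. The class `PeriodTypeLookup` and its `Period`, `DataSet` and `DataValue` records are hypothetical simplifications, not the DHIS2 domain model. It assumes a period carries its period type directly, so that route is a single hop, while the data-set route has to scan every data set containing the data element and can return more than one answer.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class PeriodTypeLookup {
    enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

    // Hypothetical, simplified stand-ins for the DHIS2 domain objects.
    record Period(String isoName, PeriodType periodType) {}
    record DataSet(String name, PeriodType periodType, Set<String> dataElements) {}
    record DataValue(String dataElement, String orgUnit, Period period, double value) {}

    // Authoritative route: one hop from the value's period to its period type.
    static PeriodType viaPeriod(DataValue dv) {
        return dv.period().periodType();
    }

    // Indirect route: data element -> every data set containing it -> their
    // period types. Can yield zero, one or several answers.
    static Set<PeriodType> viaDataSets(DataValue dv, List<DataSet> dataSets) {
        return dataSets.stream()
                .filter(ds -> ds.dataElements().contains(dv.dataElement()))
                .map(DataSet::periodType)
                .collect(Collectors.toSet());
    }

    // True only when the data-set route gives exactly one answer and it matches
    // the period's own type.
    static boolean consistent(DataValue dv, List<DataSet> dataSets) {
        Set<PeriodType> fromSets = viaDataSets(dv, dataSets);
        return fromSets.size() == 1 && fromSets.contains(viaPeriod(dv));
    }
}
```

With a January 2010 value whose data element sits in both a monthly and a quarterly data set, `viaPeriod` still answers MONTHLY while `viaDataSets` returns two types, which is the disagreement described above.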
>> >> >
>> >> >>
>> >> >> The key thing to look out for in data entry and data import is to
>> >> >> avoid overlaps in data values that will cause duplication when
>> >> >> aggregating data periods.
>> >> >> E.g. if the SAME ORGUNIT registers values for the same data element
>> >> >> for two different period types that have overlapping periods, e.g.
>> >> >> Jan-10 and Q1-10, then the aggregate values for Q1-10, Jan-June 2010,
>> >> >> and 2010 will all show an incorrect value, since the value for Jan-10
>> >> >> is counted twice.
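A small sketch of the double counting, assuming (hypothetically) that one org unit stored 10 for Jan-10 and 30 for Q1-10, where the quarterly figure already includes January. The names `OverlapAggregation` and `RawValue` are illustrative only, and the naive summation stands in for aggregation in general; it is not the DHIS2 datamart algorithm.

```java
import java.time.LocalDate;
import java.util.List;

final class OverlapAggregation {
    // A raw value for one org unit and data element, stored against a date range.
    record RawValue(LocalDate start, LocalDate end, double value) {
        boolean within(LocalDate from, LocalDate to) {
            return !start.isBefore(from) && !end.isAfter(to);
        }
    }

    // Two closed date ranges overlap when each starts on or before the other ends.
    static boolean overlaps(RawValue a, RawValue b) {
        return !a.start().isAfter(b.end()) && !b.start().isAfter(a.end());
    }

    // Naive aggregation: sum every raw value lying entirely inside the target period.
    static double aggregate(List<RawValue> values, LocalDate from, LocalDate to) {
        return values.stream()
                .filter(v -> v.within(from, to))
                .mapToDouble(RawValue::value)
                .sum();
    }
}
```

Aggregating either Q1-10 or the whole of 2010 from these two rows yields 40 rather than 30, because January is counted once on its own and once inside the quarter.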
>> >> >
>> >> > OK. That's a good concrete constraint to have.
>> >> >
>> >> >>
>> >> >> One way to enforce this constraint is to monitor which datasets an
>> >> >> orgunit is assigned to, and not allow orgunits to be assigned to two
>> >> >> datasets that have the same data element AND different period types.
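The assignment-time check proposed here could look roughly like this. It is a hypothetical sketch: `AssignmentCheck`, its `DataSet` record and the two-value `PeriodType` enum are assumptions made for illustration. Before assigning a data set to an org unit, it compares the candidate with the data sets the org unit already has and reports every data element that would end up being collected at two different frequencies. The same idea could be applied to forms instead of data sets by swapping the record.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class AssignmentCheck {
    enum PeriodType { MONTHLY, QUARTERLY }

    // Hypothetical, simplified data set: name, period type and member data elements.
    record DataSet(String name, PeriodType periodType, Set<String> dataElements) {}

    // Data elements that the org unit would collect at two different frequencies
    // if `candidate` were assigned alongside the data sets in `assigned`.
    static Set<String> conflicts(Collection<DataSet> assigned, DataSet candidate) {
        Set<String> clashes = new HashSet<>();
        for (DataSet existing : assigned) {
            if (existing.periodType() == candidate.periodType()) {
                continue; // same period type, so the periods cannot partially overlap
            }
            for (String de : candidate.dataElements()) {
                if (existing.dataElements().contains(de)) {
                    clashes.add(de);
                }
            }
        }
        return clashes;
    }

    static boolean mayAssign(Collection<DataSet> assigned, DataSet candidate) {
        return conflicts(assigned, candidate).isEmpty();
    }
}
```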
>> >> >
>> >> > Agreed, though this constraint should probably be imposed on forms
>> >> > rather than datasets.
>> >> >
>> >> >> As far as I am aware, we are not checking for this today. During
>> >> >> data import it could be checked at data element level by looking up
>> >> >> the period type the way Bob has shown, but that sounds like a lot of
>> >> >> lookups and time-consuming validation, or?
>> >> >
>> >> > On data import we don't really validate at all, beyond whatever
>> >> > constraints the db imposes. For efficiency we simply pop the values
>> >> > in with multiple insert statements. So this validation would have to
>> >> > happen as a stage before the actual import or would have to be
>> >> > enforced within the db. In fact it can't easily be validated before
>> >> > the import, as it depends on existing values within the db.
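One way to picture the pre-import stage is a batch check that first loads the periods already stored for each org unit and data element, then walks the incoming values. This is a hypothetical sketch (`ImportPrecheck`, `Key`, `Span` and `Incoming` are invented names); the `existing` map stands in for whatever would have to be read from the database, which is why the check cannot be done on the import file alone. In this sketch an identical period is treated as an update rather than a clash.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ImportPrecheck {
    record Key(String orgUnit, String dataElement) {}

    record Span(LocalDate start, LocalDate end) {
        // Closed date ranges overlap when each starts on or before the other ends.
        boolean overlaps(Span o) {
            return !start.isAfter(o.end()) && !o.start().isAfter(end);
        }
    }

    record Incoming(Key key, Span period, double value) {}

    // Returns the incoming values whose period overlaps a *different* period
    // already stored (`existing`) or accepted earlier in the same batch.
    // Identical periods are let through, on the assumption they update a value.
    static List<Incoming> rejected(Map<Key, List<Span>> existing, List<Incoming> batch) {
        Map<Key, List<Span>> seen = new HashMap<>();
        existing.forEach((k, spans) -> seen.put(k, new ArrayList<>(spans)));
        List<Incoming> out = new ArrayList<>();
        for (Incoming value : batch) {
            List<Span> spans = seen.computeIfAbsent(value.key(), k -> new ArrayList<>());
            boolean clash = spans.stream()
                    .anyMatch(s -> !s.equals(value.period()) && s.overlaps(value.period()));
            if (clash) {
                out.add(value);
            } else {
                spans.add(value.period());
            }
        }
        return out;
    }
}
```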
>> >> >
>> >> >>
>> >> >> A relatively normal use case that we probably have to find a way to
>> >> >> support, and which I think they are struggling with in Vietnam, is
>> >> >> that different provinces can use different period types for the same
>> >> >> data elements (even for complete data sets). E.g. the national data
>> >> >> flow policy may say to report on immunisation data every quarter, so
>> >> >> that becomes the minimum requirement for all provinces. Then some of
>> >> >> the provinces decide that all their facilities have to collect this
>> >> >> data monthly anyway, and at the province level they simply send the
>> >> >> quarterly aggregates to national level (in the paper-based or Excel
>> >> >> world). At the same time other provinces just collect quarterly data
>> >> >> at the facility level, as in the minimum national requirement.
>> >> >> At the national level there is a need to consolidate all this data,
>> >> >> even data at the facility level, so ideally a national DHIS database
>> >> >> should be able to store both monthly and quarterly raw data values
>> >> >> for the same data elements, but for different orgunits. The national
>> >> >> information users can then easily generate quarterly reports on
>> >> >> immunisation for all provinces, while some provinces can do monthly
>> >> >> data analysis if they choose to collect data at that frequency.
>> >> >>
>> >> >> We support the above scenario by allowing the same data elements to
>> >> >> be assigned to different data sets with different period types, but
>> >> >> we don't control for misuse of this flexibility, which can lead to
>> >> >> duplicated and inconsistent aggregated data values, as pointed out
>> >> >> above.
>> >> >
>> >> > Thinking further ... I really think the problem arises because we
>> >> > have a dataset concept which represents a form and is also used to
>> >> > constrain periodtypes on dataelements. Thinking of the use case you
>> >> > have just described, it should be possible to have a paper form which
>> >> > national level expects to be collected quarterly, and the same form
>> >> > used at a lower level to collect data monthly. If we wanted to mirror
>> >> > that use case electronically we would have to divorce the form from
>> >> > the periodtype, i.e. a form would collect datavalues of a certain
>> >> > period, but the same form could be used in different orgunits for
>> >> > collecting data at a different frequency.
>> >> >
>> >> > So (leaving dataset aside for the moment) if we can't assign a
>> >> > periodtype to a form, and we can't assign it to a dataelement, and
>> >> > it's too inefficient to validate on a datavalue-by-datavalue basis,
>> >> > what is a girl to do?
>> >> >
>> >> > I suspect the correct answer is to refactor datavalue and create a
>> >> > datavalueset type (note: a set of datavalues rather than a set of
>> >> > dataelements). Designing out loud, a datavalueset would have the
>> >> > following fields/attributes:
>> >> >
>> >> > 1.  a formid - the collection instrument used - roughly corresponds
>> >> >     to the current dataset
>> >> > 2.  an orgunitid - where the datavalues come from
>> >> > 3.  a periodid - the period of all the datavalues
>> >> > plus a couple of other useful attributes I can think of
>> >> >
>> >> > Datavalue now becomes slightly simpler (which is always a good
>> >> > thing). It only has:
>> >> > value, dataelementid, categorycombooption, datasetid
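The proposed datavalueset can be sketched as a pair of Java records. Only `formId`, `orgUnitId` and `periodId` come from the list above; the record names, the nesting of values inside the set (instead of a parent id on each value) and the `flatten` helper are assumptions for illustration, not an agreed design.

```java
import java.util.List;

final class DataValueSetModel {
    // The three fields from the proposal; nesting the values inside the set is
    // an illustrative assumption rather than a settled design.
    record DataValueSet(String formId, String orgUnitId, String periodId,
                        List<DataValue> values) {
        DataValueSet {
            values = List.copyOf(values); // defensive, immutable copy
        }
    }

    // The slimmer data value: no period or org unit of its own.
    record DataValue(String dataElementId, String categoryOptionComboId, String value) {}

    // The familiar one-row-per-value shape, rebuilt by joining each value with
    // its set's org unit and period.
    record FlatRow(String dataElementId, String orgUnitId, String periodId,
                   String categoryOptionComboId, String value) {}

    static List<FlatRow> flatten(DataValueSet set) {
        return set.values().stream()
                .map(v -> new FlatRow(v.dataElementId(), set.orgUnitId(), set.periodId(),
                        v.categoryOptionComboId(), v.value()))
                .toList();
    }
}
```

Period and org unit are stored once per set, so the values in a set cannot disagree about their period type, while `flatten` shows the row-per-value view is still recoverable.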
>> >>
>> >> Afterthought:
>> >> At the risk of adding complexity to what is otherwise a
>> >> simplification, my life could become even simpler if datavalueset also
>> >> had a categorycombo attribute, which would imply that a dataset was
>> >> linked to a formsectionid rather than a formid.
>> >>
>> >> So a form has sections.  sections have dataelements.  And sections
>> >> have a datavalueset as a model - which implies a uniform categorycombo
>> >> within the section.
>> >>
>> >> There isn't really a need for dataelements to have a categorycombo.
>> >> And in lots of ways it's good that they don't. Then I am reducing
>> >> complexity rather than adding to it :-)
>> >>
>> >> Consider one orgunit that has collected malaria deaths disaggregated by
>> >> age. Another has collected values for the same dataelement, but not
>> >> disaggregated by age. The datavalues will come from a datavalueset, so
>> >> will have a categorycombo. It is possible to aggregate or compare these
>> >> datavalues from different datavaluesets, but using the lowest common
>> >> denominator of categorycombo, i.e. in both cases you have access to
>> >> malaria deaths. In the one case you have to "roll up" the
>> >> categorycombo, which does of course assume that the sum of the category
>> >> options makes a sensible whole, but Ola has mentioned this one many
>> >> times.
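The roll-up in the malaria example can be sketched as follows. It assumes, hypothetically, that each value records the option combo it was captured under (with "default" for undisaggregated values) and, as noted above, that the options of a category sum to a sensible whole. `RollUp` and `Value` are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RollUp {
    // Hypothetical value: org unit, data element, the option combo it was
    // captured under (e.g. "<5y", "5y+", or "default"), and the number.
    record Value(String orgUnit, String dataElement, String optionCombo, double value) {}

    // Roll every value for one data element up to a single total per org unit,
    // summing across whatever option combos each org unit happened to use.
    static Map<String, Double> totalsByOrgUnit(List<Value> values, String dataElement) {
        return values.stream()
                .filter(v -> v.dataElement().equals(dataElement))
                .collect(Collectors.groupingBy(Value::orgUnit,
                        Collectors.summingDouble(Value::value)));
    }
}
```

Org unit A's two age groups (3 and 4) and org unit B's single undisaggregated value (5) become comparable totals of 7 and 5.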
>> >>
>> >
>> > Some really interesting ideas you are bringing up here Bob. I like the
>> > kind of flexibility, and yet structure, this would bring to the data
>> > model.
>> >
>> > One quick question though:
>> > How would this fit with the use of data elements and
>> > categorycombooptions in metadata expressions like indicators and
>> > validation rules that are (and should be) completely independent from
>> > data collection structures? E.g. which categories and options should be
>> > available for a given data element when setting up an indicator
>> > formula? All?
>>
>> I think it's a question of the "lowest common denominator" of the
>> datavalues that you have. Indicators are calculated from datavalues
>> even though we express the calculation in terms of dataelements.
>>
>> Ivalue = f(de1,de2,de3...)/g(de4, de5 ..)
>>
>> Looking just at the numerator: if the set of datavalues you have
>> corresponding to de1, de2 and de3 share the same categorycombo (and
>> note that datavalues do have a categorycombo from which their
>> categoryoptioncombo is derived), then you can also produce a
>> similarly disaggregated indicator value.
>>
>> If they use different categorycombos (some have age+sex, some have
>> hiv_age+sex, and some have just sex), but each of these have at least
>> the sex category, then you could produce an indicator value
>> disaggregated by sex.
>>
>> If the categorycombos are a jumble of apples and pears then you can
>> produce just the rolled up calculation.
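The "lowest common denominator" rule amounts to intersecting the categories of the category combos involved. A minimal sketch, assuming a category combo can be represented by the set of category names it combines (`CommonDisaggregation` is a hypothetical name): a non-empty result lists the categories an indicator could still be disaggregated by, while an empty result means only the rolled-up value can be produced.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class CommonDisaggregation {
    // Each category combo is modelled, hypothetically, as the set of category
    // names it combines. The result is the categories shared by all of them.
    static Set<String> commonCategories(Collection<Set<String>> categoryCombos) {
        Set<String> common = null;
        for (Set<String> combo : categoryCombos) {
            if (common == null) {
                common = new HashSet<>(combo); // start from the first combo
            } else {
                common.retainAll(combo); // keep only categories present in every combo
            }
        }
        return common == null ? Set.of() : common;
    }
}
```

For the combos age+sex, hiv_age+sex and sex the result is {sex}; for age alone and sex alone it is empty, so only the rolled-up calculation remains.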
>>
> I like this idea.
>
>
>>
>> What is the implication? At design time, when you are coding the
>> expression, you probably should not include the categoryoptioncombo at
>> all. The indicator is just expressed in terms of dataelements (I
>> guess traditional DHIS 1.4 style). But when you are generating, for
>> example, the reporttable, the first pass analyzes the data you have
>> selected and suggests: would you like the indicator data
>> disaggregated by sex? Or age+sex? Or no disaggregation? So what you
>> can report on is determined by the data you've got. I think that's a
>> sound principle.
>>
> I can see a few challenges with this principle. In typical implementations
> of DHIS you would design forms and canned/fixed reports at the same time,
> before rolling out the installations. If it is impossible to design reports
> before you have any data values I can see a problem with this approach. But
> I guess you would know the potential datavaluesets from the forms
> information, and could therefore allow some disaggregated reports to be
> prepared even before you have any data values?

Yes, I hesitated a bit before I suggested that. You are of course
right. You might generally want to express an indicator to be
reported in terms of a particular dataelement and categoryoptioncombo.
And this would only produce results if you collect data using that
dataelement and that categorycombo, which would be the case if you
have forms which do that.

>
> Another issue I would like to bring up is performance. In the past we have
> struggled with and spent a lot of time on improving the performance of the
> datamart, the aggregation of data values. To me it sounds more complicated
> to have a floating set of disaggregations that needs to be looked up in a
> potentially huge storage of datavalues compared to working with a fixed set.
> Any thoughts on data mart service performance with this proposed design
> compared to the existing one?

I am not really sure. I would have to think more and look closer at the
datamart. I suspect it wouldn't actually make much difference.
Either way you are going to be aggregating datavalues with a
particular dataelement and a particular categoryoptioncombo. I
can't see how the fact that a particular categorycombo is hard-linked
to a dataelement at a particular moment in time should make
any difference in the calculation. Of course I could be wrong. What
could make a difference is that, instead of selecting datavalues from
the gazillions, there might be some performance benefit in being able
to first easily select the datavaluesets to be aggregated, assuming
that datavalueset contains the important time and space dimensions of
period and orgunit. As I say, I'm not really sure. At the end of
the day you are still going to end up with the same bundle of
datavalues to crunch. Having these datavalues grouped by period and
orgunit by virtue of being members of datavaluesets may or may not
help, but as I mentioned above, I suspect it won't actually make a
difference.
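The two-step selection speculated about here might look like this: index data value sets by org unit and period, pick the sets matching the request, then crunch only their values. This is a hypothetical in-memory sketch (`TwoStepAggregation` and its records are invented names); it illustrates the access pattern only and says nothing about whether it would actually be faster in the datamart.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TwoStepAggregation {
    record DataValue(String dataElementId, String optionComboId, double value) {}
    record DataValueSet(String orgUnitId, String periodId, List<DataValue> values) {}
    record SetKey(String orgUnitId, String periodId) {}

    // Index the sets by the two dimensions every set carries.
    static Map<SetKey, List<DataValueSet>> index(Collection<DataValueSet> sets) {
        Map<SetKey, List<DataValueSet>> idx = new HashMap<>();
        for (DataValueSet s : sets) {
            idx.computeIfAbsent(new SetKey(s.orgUnitId(), s.periodId()), k -> new ArrayList<>())
               .add(s);
        }
        return idx;
    }

    // Step 1: select the sets for the requested org units and periods.
    // Step 2: sum the matching data element's values inside those sets only.
    static double sum(Map<SetKey, List<DataValueSet>> idx, Set<String> orgUnits,
                      Set<String> periods, String dataElementId) {
        double total = 0;
        for (String ou : orgUnits) {
            for (String pe : periods) {
                for (DataValueSet s : idx.getOrDefault(new SetKey(ou, pe), List.of())) {
                    for (DataValue v : s.values()) {
                        if (v.dataElementId().equals(dataElementId)) {
                            total += v.value();
                        }
                    }
                }
            }
        }
        return total;
    }
}
```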

Lars has pointed out to me at least once that the categorycombo is
actually not hard-linked to the dataelement anyway. You can change the
categorycombo on a dataelement at any time in the maintenance module.
And what this would mean is "from now on we're going to collect
datavalues for that dataelement using the new categorycombo". This
only strengthens my suspicion that the categorycombo isn't
really an innate characteristic of the dataelement. It's just a way of
saying how we are *currently* collecting datavalues for it. A form
object (or maybe form section) strikes me as a better place to
maintain that information. For one thing, when you create a new form
you would still have your old ones in the system. Having knowledge of
the forms in the system can give your report designer clues as to what
sort of indicators it can produce, including across 10 years of data
where the categorycombos have gone through some changes.

I'm sorry I haven't really thought this through, and I do really have
other things to think about, so I'm not likely to in the immediate
future... The train of thought really started when I started thinking
about having a form or data collection object, like we would have if
we implemented XForms or something similar. Having a form and a
datavalueset (instead of a DHIS dataset) creates new possibilities of
where to attach things like the categorycombo and periodtype. But
that's not the primary motivation for having such a beast.

Regards
Bob

>
>> And I think all of this is completely independent of data collection
>> structures.
>>
>> Of course in practice you will have designed and deployed your
>> collection instruments such that all your datavalues for a given
>> dataelement will have the same categorycombo.  But if you want to
>> compare data over the past five years, and the ministry decided only
>> in year two that they wanted to disaggregate by sex and in year 4
>> decided to introduce a third sex category, then you could still
>> calculate an indicator from all of those datavalues - but by rolling
>> up sex category.
>>
>> I think what we do currently - specifying the categorycombo in the
>> indicator expression - is more rigid and more fragile.
>>
> Agree, and I think most indicator analysis will be on the data element
> level anyway (without any disaggregations), so the current design is too
> complicated and cumbersome to work with.
>
> Ola
> ----------
>
>>
>> In summary, what we have with categorycombos etc. is really quite
>> brilliant. We don't have ragged data. Our datavalues are stored
>> compactly and uniformly. All this is great. I think a mistake we may
>> have made is attaching the categorycombo to the dataelement. The
>> relationship between a categorycombo and a dataelement can and should
>> be a transient thing. I believe the categorycombo should be a
>> characteristic of the way we collect the particular datavalues, i.e. a
>> characteristic of a particular form. There was a long conversation
>> earlier where it emerged that part of the original design rationale of
>> the categorycombo was indeed related to form layout. At the time this
>> upset me a bit, because I too had bought into the rigid edifice we had
>> created. But in retrospect I think this thinking was absolutely on
>> the right track. Using the categorycombo to specify the
>> disaggregation layout of particular form elements makes very good
>> sense. What was also inspired was having the categorycombo as a named
>> persisted object in its own right, which could be used across different
>> dataelements.
>>
>> Cheers
>> Bob
>>
>> >
>> > Ola
>> > --------
>> >
>> >
>> >
>> >
>> >>
>> >> Regards
>> >> Bob
>> >>
>> >> >
>> >> > We can relatively efficiently validate that a dataset object is not
>> >> > persisted which has the same formid, orgunitid and an overlapping
>> >> > period.
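The set-level validation described here could be sketched as below. It assumes, for illustration, that a stored data value set header carries a form id, an org unit id and its period as a closed date range; `DataValueSetGuard` and `SetHeader` are hypothetical names. In this sketch an identical period also counts as overlapping, so a resubmission would have to update the existing set rather than add a second one.

```java
import java.time.LocalDate;
import java.util.Collection;

final class DataValueSetGuard {
    // Hypothetical persisted header of a data value set; the period is a closed date range.
    record SetHeader(String formId, String orgUnitId, LocalDate start, LocalDate end) {
        // Same form, same org unit, and date ranges that share at least one day.
        boolean clashesWith(SetHeader o) {
            return formId.equals(o.formId) && orgUnitId.equals(o.orgUnitId)
                    && !start.isAfter(o.end) && !o.start.isAfter(end);
        }
    }

    // One comparison per stored set header, rather than one lookup per data value.
    static boolean canPersist(Collection<SetHeader> stored, SetHeader candidate) {
        return stored.stream().noneMatch(candidate::clashesWith);
    }
}
```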
>> >> >
>> >> > There is no longer any ambiguity about periodtype of a datavalue.
>> >> >
>> >> > stored_by, timestamp, comment might go either way. Probably they
>> >> > need to stay on datavalue. I notice comment is rarely used, but it's
>> >> > really useful to have a comment on datavalueset for import purposes.
>> >> >
>> >> > 'nuff designing out loud. Got to go.
>> >> >
>> >> > Regards
>> >> > Bob
>> >> >
>> >> >>
>> >> >>
>> >> >> Ola
>> >> >> ---------
>> >> >>
>> >> >>>
>> >> >>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad
>> >> >>> <olatitle@xxxxxxxxx>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Hi,
>> >> >>>>
>> >> >>>> After Kim Anh's email about the use of the same data elements with
>> >> >>>> different period types I dug up this old discussion from March
>> >> >>>> 2009.
>> >> >>>>
>> >> >>>> What is the status on this work, or did we not conclude this?
>> >> >>>>
>> >> >>>> Ola
>> >> >>>> ----------
>> >> >>>>
>> >> >>>> 2009/3/20 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>> >> >>>>>
>> >> >>>>> 2009/3/20 Lars Helge Øverland <larshelge@xxxxxxxxx>:
>> >> >>>>> >
>> >> >>>>> >>
>> >> >>>>> >> Yes this is true. But what do you think of the idea of
>> >> >>>>> >> enforcing DataSet membership by having a default DataSet for
>> >> >>>>> >> all the delinquents? I'm not sure if it can be enforced by the
>> >> >>>>> >> schema, but at least by the application.
>> >> >>>>> >
>> >> >>>>> > OK, but what does this give us in terms of PeriodType-determining
>> >> >>>>> > if this default DataSet has a null PeriodType?
>> >> >>>>>
>> >> >>>>> Nothing really. The only effect would be that you have an index
>> >> >>>>> on the unassigned DataElements, for what it's worth. Mainly it
>> >> >>>>> would be useful for easily determining the available DataElements
>> >> >>>>> which can be added to a DataSet. Maybe it's a nonsense idea - I
>> >> >>>>> was just trying to think of ways to make editing DataSets
>> >> >>>>> reasonably straightforward.
>> >> >>>>>
>> >> >>>>> >
>> >> >>>>> >>
>> >> >>>>> >> I don't know if it's about right or wrong. There are pros and
>> >> >>>>> >> cons of both approaches. What you gain on the swings you lose
>> >> >>>>> >> on the roundabouts :-)
>> >> >>>>> >>
>> >> >>>>> >> In the explicit case the application will have to enforce that
>> >> >>>>> >> DataSet members all have the same periodType.
>> >> >>>>> >>
>> >> >>>>> >> In the implicit case the application will have to enforce that
>> >> >>>>> >> DataElements can only be members of multiple groups if these
>> >> >>>>> >> share the same PeriodType.
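The two application-level constraints can be written side by side. This is a hypothetical sketch (`MembershipRules` and its map-based inputs are assumptions): in the explicit case each data element has its own period type, held here in a map, and every member must match its data set; in the implicit case a data element's period type comes from its data sets, so all data sets containing it must agree.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;

final class MembershipRules {
    enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

    // Explicit case (hypothetical): each data element carries its own period type,
    // and every member of a data set must match the data set's period type.
    static boolean explicitOk(PeriodType dataSetType, Collection<String> members,
                              Map<String, PeriodType> dataElementTypes) {
        return members.stream().allMatch(de -> dataElementTypes.get(de) == dataSetType);
    }

    // Implicit case: a data element's period type is whatever its data sets say,
    // so all data sets containing it must agree on a single period type.
    static boolean implicitOk(String dataElement, Map<String, PeriodType> dataSetTypes,
                              Map<String, Set<String>> dataSetMembers) {
        return dataSetMembers.entrySet().stream()
                .filter(e -> e.getValue().contains(dataElement))
                .map(e -> dataSetTypes.get(e.getKey()))
                .distinct()
                .count() <= 1;
    }
}
```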
>> >> >>>>> >>
>> >> >>>>> >> The net result as far as the Data API is concerned can and
>> >> >>>>> >> must be the same. Perhaps we should define exactly what extra
>> >> >>>>> >> methods we want in the API first. We have already identified a
>> >> >>>>> >> few. Then decide whether a database change is necessitated by
>> >> >>>>> >> these.
>> >> >>>>> >
>> >> >>>>> > Yes. We need at least the service method:
>> >> >>>>> >
>> >> >>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
>> >> >>>>> >
>> >> >>>>> > and a getter on the DataElement object:
>> >> >>>>> >
>> >> >>>>> > PeriodType getPeriodType()
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>> > I guess we could make a branch, start coding and see how it
>> >> >>>>> > works out.
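A minimal, hypothetical sketch of the two API methods, assuming the implicit relationship in which a data element's period type is derived from its data sets. The class layout and the choice to return `null` when the data sets disagree (or there are none) are assumptions, and unlike the signature quoted above, this `getDataElementsByPeriodType` takes the candidate data elements as a parameter instead of reading them from a store.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PeriodTypeApiSketch {
    enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

    static final class DataSet {
        final String name;
        final PeriodType periodType;
        final Set<DataElement> members = new HashSet<>();

        DataSet(String name, PeriodType periodType) {
            this.name = name;
            this.periodType = periodType;
        }

        // Keeps both sides of the many-to-many membership in step.
        void add(DataElement de) {
            members.add(de);
            de.dataSets.add(this);
        }
    }

    static final class DataElement {
        final String name;
        final Set<DataSet> dataSets = new HashSet<>();

        DataElement(String name) {
            this.name = name;
        }

        // Derived, not stored: null if the element is in no data set or if its
        // data sets disagree on the period type.
        PeriodType getPeriodType() {
            Set<PeriodType> types = new HashSet<>();
            for (DataSet ds : dataSets) {
                types.add(ds.periodType);
            }
            return types.size() == 1 ? types.iterator().next() : null;
        }
    }

    static Collection<DataElement> getDataElementsByPeriodType(
            Collection<DataElement> all, PeriodType periodType) {
        List<DataElement> result = new ArrayList<>();
        for (DataElement de : all) {
            if (de.getPeriodType() == periodType) {
                result.add(de);
            }
        }
        return result;
    }
}
```

Adding the same data element to a quarterly data set as well makes `getPeriodType()` return `null`, which is the ambiguity the implicit approach has to guard against.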
>> >> >>>>>
>> >> >>>>> Sure. So long as we are adding methods we won't be breaking
>> >> >>>>> anything in terms of backward compatibility, just enforcing
>> >> >>>>> application-level constraints. Then we can really encourage
>> >> >>>>> (enforce?) upper layers to strictly interact with the data via
>> >> >>>>> the API, even if this might occasionally mean making some
>> >> >>>>> lightweight API methods which bypass the ORM.
>> >> >>>>>
>> >> >>>>> >
>> >> >>>>> > Another issue would arise in the (exotic) situation where
>> >> >>>>> > someone assigns a DataElement to a DataSet, enters data for it,
>> >> >>>>> > then removes it from the DataElement. The data is there, but how
>> >> >>>>> > do we deal with it in regard to the mentioned required
>> >> >>>>> > functionality (trend analysis, datamart)?
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>> Yes this gets a bit weird (I presume you mean removes it from the
>> >> >>>>> DataSet). I'm guessing you haven't lost the data, because the
>> >> >>>>> dataValues each have a PeriodID which in turn is linked to a
>> >> >>>>> PeriodType. I suppose that (in such an exotic headspace)
>> >> >>>>> DataElements can in fact change their PeriodTypes over time,
>> >> >>>>> though I imagine it's not a great idea.
>> >> >>>>>
>> >> >>>>> The effect would be the same in the explicit relationship case,
>> >> >>>>> if someone assigns a DataElement to a DataSet, enters data for
>> >> >>>>> it, then changes the PeriodType of the DataElement ...
>> >> >>>>>
>> >> >>>>> Cheers
>> >> >>>>> Bob
>> >> >>>>>
>> >> >>>>> _______________________________________________
>> >> >>>>> Mailing list: https://launchpad.net/~dhis2-devs
>> >> >>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> >> >>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> >> >>>>> More help   : https://help.launchpad.net/ListHelp
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>


