dhis2-devs team mailing list archive

Re: [Dhis-dev] DataElement -> PeriodType association

 

On 23 May 2010 03:32, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:

> On 22 May 2010 19:51, Ola Hodne Titlestad <olatitle@xxxxxxxxx> wrote:
> > On 20 May 2010 18:39, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> >>
> >> On 20 May 2010 15:56, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> >> > 2010/5/20 Ola Hodne Titlestad <olatitle@xxxxxxxxx>:
> >> >>
> >> >> 2010/5/20 Lars Helge Øverland <larshelge@xxxxxxxxx>
> >> >>>
> >> >>> Data elements derive their period type from the data sets they are
> >> >>> members of.
> >> >
> >> > Restated (what I just sent Lars only by mistake):  a datavalue derives
> >> > its period type from the data set of
> >> > which its data element is a member  :-)
> >> >
> >> >>
> >> >> And when they are members of two datasets with different period types
> >> >> they have multiple period types, right?
> >> >
> >> > It's important to remain aware that it is values ultimately which have
> >> > periods (and hence period types).
> >> >
> >> > And when you look at a value you can derive its period type in one of
> >> > two ways - via dataset or via period.  Potentially these could
> >> > disagree.  The one which derives from its period should be considered
> >> > authoritative, i.e. if the period is 2009-Jan then regardless of what
> >> > the dataset might say this really must be monthly.  Of course we hope
> >> > these always agree.  Incidentally the lookup from
> >> > dataelement-to-dataset-to-period looks like a greater complexity than
> >> > the lookup from period->periodType.
> >> >
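The two lookup paths described above can be contrasted in a minimal Java sketch. All names here (`SketchPeriodType`, `SketchPeriod`, `SketchDataSet`, `PeriodTypeLookup`) are illustrative assumptions and not the DHIS 2 API; the point is only that the period route is a single hop, while the dataset route is a scan that may return more than one answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; none of these types are real DHIS 2 classes.
enum SketchPeriodType { MONTHLY, QUARTERLY, YEARLY }

final class SketchPeriod {
    final String name;
    final SketchPeriodType type;

    SketchPeriod(String name, SketchPeriodType type) {
        this.name = name;
        this.type = type;
    }
}

final class SketchDataSet {
    final SketchPeriodType periodType;
    final Set<String> dataElements;

    SketchDataSet(SketchPeriodType periodType, String... dataElements) {
        this.periodType = periodType;
        this.dataElements = new HashSet<>(Arrays.asList(dataElements));
    }
}

final class PeriodTypeLookup {
    // Authoritative path: one hop from the value's period to its type.
    static SketchPeriodType viaPeriod(SketchPeriod period) {
        return period.type;
    }

    // Indirect path: scan every data set that contains the data element.
    // Returns several types when the element sits in data sets that disagree.
    static Set<SketchPeriodType> viaDataSets(String dataElement, List<SketchDataSet> dataSets) {
        Set<SketchPeriodType> types = new HashSet<>();
        for (SketchDataSet ds : dataSets) {
            if (ds.dataElements.contains(dataElement)) {
                types.add(ds.periodType);
            }
        }
        return types;
    }
}
```

A value stored against 2009-Jan resolves to MONTHLY through `viaPeriod` whatever the data sets say, which is why that path is treated as authoritative in the message above.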
> >> >>
> >> >> The key thing to look out for in data entry and data import is to avoid
> >> >> overlaps in data values that will cause duplication when aggregating
> >> >> data periods.
> >> >> E.g. if the SAME ORGUNIT registers values for the same data element for
> >> >> two different period types that have overlapping periods, e.g. Jan-10
> >> >> and Q1-10, then the aggregate values for Q1-10, Jan-June 2010, and 2010
> >> >> will all show an incorrect value since the value for Jan-10 is counted
> >> >> twice.
> >> >
> >> > OK.  That's a good concrete constraint to have.
> >> >
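The double counting described above can be reproduced with a few lines of Java. This is a sketch under stated assumptions: the values 10 and 30 are made up, the quarterly figure is assumed to already include January, and `RawValue` and `DoubleCountDemo` are hypothetical names rather than DHIS 2 classes.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical raw value for one orgunit and one data element.
final class RawValue {
    final LocalDate start;  // first day of the period
    final LocalDate end;    // last day of the period (inclusive)
    final int value;

    RawValue(LocalDate start, LocalDate end, int value) {
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

final class DoubleCountDemo {
    // Naive aggregation: sum every value whose period lies inside [from, to].
    static int naiveSum(List<RawValue> values, LocalDate from, LocalDate to) {
        int sum = 0;
        for (RawValue v : values) {
            if (!v.start.isBefore(from) && !v.end.isAfter(to)) {
                sum += v.value;
            }
        }
        return sum;
    }

    // Two periods overlap when each one starts no later than the other ends.
    static boolean overlaps(RawValue a, RawValue b) {
        return !a.start.isAfter(b.end) && !b.start.isAfter(a.end);
    }

    public static void main(String[] args) {
        // Made-up figures: 10 for Jan-10, and 30 for Q1-10 which already covers January.
        RawValue jan = new RawValue(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 1, 31), 10);
        RawValue q1 = new RawValue(LocalDate.of(2010, 1, 1), LocalDate.of(2010, 3, 31), 30);
        List<RawValue> values = Arrays.asList(jan, q1);

        // Prints 40 rather than 30: January has been counted twice.
        System.out.println(naiveSum(values, LocalDate.of(2010, 1, 1), LocalDate.of(2010, 3, 31)));
        // Prints true: this overlap is what a constraint would need to reject.
        System.out.println(overlaps(jan, q1));
    }
}
```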
> >> >>
> >> >> One way to enforce this constraint is to monitor which datasets an
> >> >> orgunit is assigned to, and not allow orgunits to be assigned to two
> >> >> datasets that have the same data element AND different period types.
> >> >
> >> > Agreed, though this constraint should probably be imposed on forms
> >> > rather than datasets.
> >> >
> >> >> As far as I am aware, we are not checking for this today. During data
> >> >> import it could be checked on data element level by looking up the
> >> >> period type the way Bob has shown, but that sounds like a lot of
> >> >> lookups and time-consuming validation, or?
> >> >
> >> > On data import we don't really validate at all, beyond whatever
> >> > constraints the db imposes. For efficiency we simply pop the values in
> >> > with multiple insert statements.  So this validation would have to
> >> > happen as a stage before the actual import or would have to be
> >> > constrained within the db.  In fact it can't be validated easily
> >> > before the import as it is dependent on existing values within the db.
> >> >
> >> >>
> >> >> A relatively normal use case that we probably have to find a way to
> >> >> support, and I think they are struggling with in Vietnam, is that
> >> >> different provinces can use different period types for the same data
> >> >> elements (even for complete data sets). E.g. the national data flow
> >> >> policy says to report on immunisation data every quarter, so that
> >> >> becomes the minimum requirement for all provinces. Then some of the
> >> >> provinces decide that all their facilities have to collect this data
> >> >> monthly anyway, and then at the province level they simply send the
> >> >> quarterly aggregates to national level (in the paper-based or Excel
> >> >> world). At the same time other provinces just collect quarterly data
> >> >> at the facility level as in the minimum national requirement.
> >> >> At the national level there is a need to consolidate all this data,
> >> >> even data at the facility level, so ideally a national DHIS database
> >> >> should be able to store both monthly and quarterly raw data values for
> >> >> the same data elements, but for different orgunits. The national
> >> >> information users can then easily generate quarterly reports on
> >> >> immunisation for all provinces, while in some provinces they can do
> >> >> monthly data analysis if they want to collect data using that
> >> >> frequency.
> >> >>
> >> >> We support the above scenario by allowing the same data elements to be
> >> >> assigned to different data sets with different period types, but we
> >> >> don't control for misuse of this flexibility, which can lead to
> >> >> duplication and inconsistent aggregated data values as pointed out
> >> >> above.
> >> >
> >> > Thinking further ... I really think the problem arises because we
> >> > have a dataset concept which represents a form and is also used to
> >> > constrain periodtypes on dataelements.  Thinking of the use case you
> >> > have just described, it should be the case that one can have a paper
> >> > form which national level expect to collect quarterly, and the same
> >> > form be used at a lower level to collect data monthly.  If we wanted
> >> > to mirror that use case electronically we would have to divorce the
> >> > form from the periodtype - ie a form would collect datavalues of a
> >> > certain period, but the same form could be used in different orgunits
> >> > for collecting data at a different frequency.
> >> >
> >> > So (leaving dataset aside for the moment) if we can't assign a
> >> > periodtype to a form and we can't assign to a dataelement and it's too
> >> > inefficient to validate on a one by one datavalue basis what is a girl
> >> > to do?
> >> >
> >> > I suspect the correct answer is to refactor datavalue and create a
> >> > datavalueset type - note: a set of datavalues rather than a set of
> >> > dataelements.  Designing out loud, a datavalueset would have the
> >> > following fields/attributes:
> >> >
> >> > 1.  a formid - the collection instrument used - roughly corresponds to
> >> > current dataset
> >> > 2.  an orgunitid - where the datavalues come from
> >> > 3.  a periodid - the period of all the datavalues
> >> > couple of other useful attributes I can think of
> >> >
> >> > Datavalue now becomes slightly simpler (which is always a good thing).
> >> >  It only has:
> >> > value, dataelementid, categorycombooption, datavaluesetid
> >>
> >> Afterthought:
> >> At the risk of adding complexity to what is otherwise a
> >> simplification, my life could become even simpler if datavalueset also
> >> had a categorycombo attribute, which would imply that a datavalueset was
> >> linked to a formsectionid rather than a formid.
> >>
> >> So a form has sections.  sections have dataelements.  And sections
> >> have a datavalueset as a model - which implies a uniform categorycombo
> >> within the section.
> >>
> >> There isn't really a need for dataelements to have a categorycombo.
> >> And in lots of ways it's good that they don't. Then I am reducing
> >> complexity rather than adding to it :-)
> >>
> >> Consider one orgunit has collected malaria deaths disaggregated by
> >> age.  Another has collected values for the same dataelement, but
> >> not disaggregated by age.  The datavalues will come from a
> >> datavalueset so will have a categorycombo.  It is possible to
> >> aggregate or compare these datavalues, from different datavaluesets,
> >> but using the lowest common denominator of categorycombo ie. in both
> >> cases you have access to malaria deaths - in the one case you have to
> >> "roll-up" the categorycombo which does of course assume that the sum
> >> of category options make a sensible whole, but Ola has mentioned this
> >> one many times.
> >>
> >
> > Some really interesting ideas you are bringing up here Bob. I like the kind
> > of flexibility and yet structure this would bring to the data model.
> >
> > One quick question though:
> > How would this fit with the use of data elements and categorycombooptions in
> > metadata expressions like indicators and validation rules that are (and
> > should be) completely independent from data collection structures? E.g.
> > which categories and options should be available for a given data element
> > when setting up an indicator formula? All?
>
> I think it's a question of the "lowest common denominator" of the
> datavalues that you have.  Indicators are calculated from datavalues
> even though we express the calculation in terms of dataelements.
>
> Ivalue = f(de1,de2,de3...)/g(de4, de5 ..)
>
> Looking just at the numerator - if the set of datavalues you have
> corresponding to de1, de2 and de3 share the same categorycombo (and
> note that datavalues do have a categorycombo from which their
> categoryoptioncombo is derived) , then you can also produce a
> similarly disaggregated indicator value.
>
> If they use different categorycombos (some have age+sex, some have
> hiv_age+sex, and some have just sex), but each of these have at least
> the sex category, then you could produce an indicator value
> disaggregated by sex.
>
> If the categorycombos are a jumble of apples and pears then you can
> produce just the rolled up calculation.
>
> I like this idea.
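The "lowest common denominator" rule in the quoted message amounts to a set intersection over the categories of each categorycombo. A minimal Java sketch, assuming a categorycombo can be modelled as a plain set of category names (`CommonCategories` is a hypothetical helper, not part of DHIS 2):

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CommonCategories {
    // Intersect the category sets of all categorycombos found in the selected data.
    // An empty result means only the fully rolled-up value can be reported.
    static Set<String> of(List<Set<String>> combos) {
        if (combos.isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> common = new LinkedHashSet<>(combos.get(0));
        for (Set<String> combo : combos.subList(1, combos.size())) {
            common.retainAll(combo);
        }
        return common;
    }
}
```

With the combos from the message (age+sex, hiv_age+sex and sex alone) the intersection is just sex, so the indicator could be offered disaggregated by sex. With a "jumble of apples and pears" the intersection is empty and only the rolled-up calculation remains.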



> What is  the implication?  At design time, when you are coding the
> expression, you probably should not include the categoryoptioncombo at
> all.  The indicator is just expressed in terms of dataelements (I
> guess traditional DHIS14 style).  But when you are generating for
> example, the reporttable, the first pass analyzes the data you have
> selected and suggests - would you like the indicator data
> disaggregated by sex? Or age+sex?  Or no disaggregation.  So what you
> can report on is determined by the data you've got.  I think that's a
> sound principle.
>

I can see a few challenges with this principle. In typical implementations
of DHIS you would design forms and canned/fixed reports at the same time
before rolling out the installations. If it is impossible to design reports
before you have any data values I can see a problem with this approach. But
I guess you would know, from the forms information the potential
datavaluesets and therefore could allow some disaggregated reports to be
prepared even before you have any data values?

Another issue I would like to bring up is performance. In the past we have
struggled with and spent a lot of time on improving the performance of the
datamart, the aggregation of data values. To me it sounds more complicated
to have a floating set of disaggregations that needs to be looked up in a
potentially huge storage of datavalues compared to working with a fixed set.
Any thoughts on data mart service performance with this proposed design
compared to the existing one?

> And I think all of this is completely independent of data collection
> structures.
>
> Of course in practice you will have designed and deployed your
> collection instruments such that all your datavalues for a given
> dataelement will have the same categorycombo.  But if you want to
> compare data over the past five years, and the ministry decided only
> in year two that they wanted to disaggregate by sex and in year 4
> decided to introduce a third sex category, then you could still
> calculate an indicator from all of those datavalues - but by rolling
> up sex category.
>
> I think what we do currently - specifying the categorycombo in the
> indicator expression - is more rigid and more fragile.
>

Agree, and I think most indicator analysis will be on the data element
level anyway (without any disaggregations), so the current design is too
complicated and cumbersome to work with.

Ola
----------


> In summary, what we have with categorycombos etc is really quite
> brilliant.  We don't have ragged data.  Our datavalues are stored
> compactly and uniformly.  All this is great.  I think a mistake we may
> have made is attaching categorycombo to the dataelement.  The
> relationship between a categorycombo and a dataelement can and should
> be a transient thing.  I believe the categorycombo should be a
> characteristic of the way we collect the particular datavalues ie. a
> characteristic of a particular form.  There is a long conversation
> before where it emerged that part of the original design rationale of
> the categorycombo was indeed related to form layout.  At the time this
> upset me a bit, because I too had bought into the rigid edifice we had
> created.  But in retrospect I think this thinking was absolutely on
> the right track.  Using the categorycombo to specify the
> disaggregation layout of a particular form's elements makes very good
> sense.  What was also inspired was having the categorycombo as a named
> persisted object in its own right which could be used across different
> dataelements.
>
> Cheers
> Bob
>
> >
> > Ola
> > --------
> >
> >
> >
> >
> >>
> >> Regards
> >> Bob
> >>
> >> >
> >> > We can relatively efficiently validate that a datavalueset object is not
> >> > persisted which has the same formid, orgunitid and an overlapping
> >> > period.
> >> >
> >> > There is no longer any ambiguity about periodtype of a datavalue.
> >> >
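The proposed datavalueset and its overlap check might look roughly like the Java sketch below. It is only an illustration of the idea in the message: the class names are hypothetical, the period is simplified to a start and end date instead of a periodid, and the linear scan stands in for whatever index or database constraint a real implementation would use.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed datavalueset; not an existing DHIS 2 class.
final class DataValueSetSketch {
    final int formId;            // the collection instrument used
    final int orgUnitId;         // where the datavalues come from
    final LocalDate periodStart; // period of all the datavalues in the set
    final LocalDate periodEnd;   // inclusive

    DataValueSetSketch(int formId, int orgUnitId, LocalDate periodStart, LocalDate periodEnd) {
        this.formId = formId;
        this.orgUnitId = orgUnitId;
        this.periodStart = periodStart;
        this.periodEnd = periodEnd;
    }

    // Same form, same orgunit and overlapping periods: the combination to reject.
    boolean conflictsWith(DataValueSetSketch other) {
        return formId == other.formId
                && orgUnitId == other.orgUnitId
                && !periodStart.isAfter(other.periodEnd)
                && !other.periodStart.isAfter(periodEnd);
    }
}

final class DataValueSetStore {
    private final List<DataValueSetSketch> saved = new ArrayList<>();

    // Returns false and stores nothing when the candidate conflicts with a saved set.
    boolean save(DataValueSetSketch candidate) {
        for (DataValueSetSketch existing : saved) {
            if (existing.conflictsWith(candidate)) {
                return false;
            }
        }
        saved.add(candidate);
        return true;
    }
}
```

The check runs once per datavalueset rather than once per datavalue, which is where the efficiency claimed in the message would come from.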
> >> > stored_by, timestamp, comment might go either way.  Probably they need
> >> > to stay on datavalue.  I notice comment is rarely used but it's really
> >> > useful to have a comment on datavalueset for import purposes.
> >> >
> >> > 'nuff designing out loud. Got to go.
> >> >
> >> > Regards
> >> > Bob
> >> >
> >> >>
> >> >>
> >> >> Ola
> >> >> ---------
> >> >>
> >> >>>
> >> >>> On Thu, May 20, 2010 at 11:44 AM, Ola Hodne Titlestad
> >> >>> <olatitle@xxxxxxxxx>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> After Kim Anh's email about the use of the same data elements with
> >> >>>> different period types I dug up this old discussion from March 2009.
> >> >>>>
> >> >>>> What is the status on this work, or did we not conclude this?
> >> >>>>
> >> >>>> Ola
> >> >>>> ----------
> >> >>>>
> >> >>>> 2009/3/20 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> >> >>>>>
> >> >>>>> 2009/3/20 Lars Helge Øverland <larshelge@xxxxxxxxx>:
> >> >>>>> >
> >> >>>>> >>
> >> >>>>> >> Yes this is true.  But what do you think of the idea to enforce
> >> >>>>> >> DataSet membership having a default DataSet for all the
> >> >>>>> >> delinquents?
> >> >>>>> >> I'm not sure if it can be enforced by the schema, but at least by
> >> >>>>> >> the application.
> >> >>>>> >
> >> >>>>> > OK but what does this give us in terms of PeriodType-determining
> >> >>>>> > if this default DataSet has a null PeriodType?
> >> >>>>>
> >> >>>>> Nothing really.  The only effect would be you have an index on the
> >> >>>>> unassigned DataElements for what it's worth.  Mainly it would be
> >> >>>>> useful for determining easily the available DataElements which can
> >> >>>>> be added to a DataSet.  Maybe it's a nonsense idea - I was just
> >> >>>>> trying to think of ways to make editing DataSets reasonably
> >> >>>>> straightforward.
> >> >>>>>
> >> >>>>> >
> >> >>>>> >>
> >> >>>>> >> I don't know if it's about right or wrong.  There are pros and
> >> >>>>> >> cons of both approaches.  What you gain on the swings you lose on
> >> >>>>> >> the roundabouts :-)
> >> >>>>> >>
> >> >>>>> >> In the explicit case the application will have to enforce that
> >> >>>>> >> DataSet members all have the same periodType.
> >> >>>>> >>
> >> >>>>> >> In the implicit case the application will have to enforce that
> >> >>>>> >> DataElements can only be members of multiple groups if these
> >> >>>>> >> share the same PeriodType.
> >> >>>>> >>
> >> >>>>> >> The net result as far as the Data API is concerned can and must
> >> >>>>> >> be the same.  Perhaps we should define exactly what extra methods
> >> >>>>> >> we want in the API first.  We have already identified a few.  Then
> >> >>>>> >> decide whether a database change is necessitated by these.
> >> >>>>> >
> >> >>>>> > Yes. We need at least a service method:
> >> >>>>> >
> >> >>>>> > Collection<DataElement> getDataElementsByPeriodType( PeriodType )
> >> >>>>> >
> >> >>>>> > and a getter on the DataElement object:
> >> >>>>> >
> >> >>>>> > PeriodType getPeriodType()
> >> >>>>> >
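For the implicit case, where the period type is derived from data set membership, the service method named above could be sketched in Java as follows. The domain classes are reduced to bare stand-ins, and `InMemoryDataElementService` and `getPeriodTypes` are assumptions added for illustration; only the `getDataElementsByPeriodType` signature comes from the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for the domain classes named in the thread.
enum PeriodType { MONTHLY, QUARTERLY, YEARLY }

final class DataElement {
    final String name;

    DataElement(String name) {
        this.name = name;
    }
}

final class DataSet {
    final PeriodType periodType;
    final List<DataElement> members = new ArrayList<>();

    DataSet(PeriodType periodType) {
        this.periodType = periodType;
    }
}

// Hypothetical in-memory service; a real one would query the database.
final class InMemoryDataElementService {
    private final List<DataSet> dataSets = new ArrayList<>();

    void addDataSet(DataSet dataSet) {
        dataSets.add(dataSet);
    }

    // Implicit approach: collect the members of every data set with this period type.
    Collection<DataElement> getDataElementsByPeriodType(PeriodType periodType) {
        Set<DataElement> result = new LinkedHashSet<>();
        for (DataSet ds : dataSets) {
            if (ds.periodType == periodType) {
                result.addAll(ds.members);
            }
        }
        return result;
    }

    // Every period type an element inherits through its data sets.
    // More than one entry means a single getPeriodType() getter would be ambiguous.
    Set<PeriodType> getPeriodTypes(DataElement element) {
        Set<PeriodType> types = new LinkedHashSet<>();
        for (DataSet ds : dataSets) {
            if (ds.members.contains(element)) {
                types.add(ds.periodType);
            }
        }
        return types;
    }
}
```

The sketch returns a set from `getPeriodTypes` because nothing here stops a data element from belonging to data sets with different period types; a single-valued `getPeriodType()` only makes sense once the application enforces that constraint.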
> >> >>>>> >
> >> >>>>> > I guess we could make a branch, start coding and see how it works
> >> >>>>> > out.
> >> >>>>>
> >> >>>>> Sure.  So long as we are adding methods we won't be breaking
> >> >>>>> anything in terms of backward compatibility.  Just enforcing
> >> >>>>> application level constraints.  Then we can really encourage
> >> >>>>> (enforce?) upper layers to strictly interact with the data via the
> >> >>>>> API.  Even if this might occasionally mean making some lightweight
> >> >>>>> API methods which bypass the ORM.
> >> >>>>>
> >> >>>>> >
> >> >>>>> > Another issue would arise in the (exotic) situation where someone
> >> >>>>> > assigns a DataElement to a DataSet, enters data for it, then
> >> >>>>> > removes it from the DataElement. The data is there, but how do we
> >> >>>>> > deal with it in regard to the mentioned required functionality
> >> >>>>> > (trend analysis, datamart)?
> >> >>>>> >
> >> >>>>>
> >> >>>>> Yes this gets a bit weird (I presume you mean removes it from the
> >> >>>>> DataSet).  I'm guessing you haven't lost the data because the
> >> >>>>> dataValues each have a PeriodID which in turn is linked to a
> >> >>>>> PeriodType.  I suppose that (in such an exotic headspace)
> >> >>>>> DataElements can in fact change their PeriodTypes over time, though
> >> >>>>> I imagine it's not a great idea.
> >> >>>>>
> >> >>>>> The effect would be the same in the explicit relationship case, if
> >> >>>>> someone assigns a DataElement to a DataSet, enters data for it, then
> >> >>>>> changes the PeriodType of the DataElement ...
> >> >>>>>
> >> >>>>> Cheers
> >> >>>>> Bob
> >> >>>>>
> >> >>>>> _______________________________________________
> >> >>>>> Mailing list: https://launchpad.net/~dhis2-devs
> >> >>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> >> >>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
> >> >>>>> More help   : https://help.launchpad.net/ListHelp
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >> >
> >
> >
>
