dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

Hi

2009/9/29 Abyot Gizaw <abyota@xxxxxxxxx>

> Yes your suggestion is doable and less is better .... but I think the
> requirement from the field is more complex.
>
> If, for a moment, we stop talking about datavalues and talk about
> dataelements - why are we talking about dimension combinations?
>
> Because you are assuming a dataelement to have only one dimension. Am I
> correct? If that is the case, I see a little bit of inconsistency here.
> DataElement talks about one dimension, but its corresponding value talks
> about combination of dimensions.
>

No you are misreading me - or I have made a mistake.  DataElement can have
many dimensions.  If it were just one there would just be an n-1 relation
between Dimension and DataElement.  Because DataElement can have more than
one dimension, I have the DataElementDimension table in between.  I actually
meant to call it DataElementDimensions but table names should generally be
singular.  So the contents of this table might look like:

dimensionID, dataElementID
1, 45
1, 46
2, 45
3, 45
4, 6

So dataelement 45 would have 3 dimensions (1, 2 and 3), dataelement 46
would have one, and so on.
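
A minimal sketch of that link table, assuming SQLite through Python's
sqlite3 module.  Only the table and column names come from the example
above; the helper names dimensions_of and data_elements_sharing are made up
for illustration and are not part of any existing API.

```python
import sqlite3

# In-memory database; table and column names mirror the example rows above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataElementDimension ("
    " dimensionID INTEGER NOT NULL,"
    " dataElementID INTEGER NOT NULL,"
    " PRIMARY KEY (dimensionID, dataElementID))"
)
rows = [(1, 45), (1, 46), (2, 45), (3, 45), (4, 6)]
conn.executemany("INSERT INTO DataElementDimension VALUES (?, ?)", rows)


def dimensions_of(data_element_id):
    """Return the sorted dimension IDs linked to one data element (hypothetical helper)."""
    cur = conn.execute(
        "SELECT dimensionID FROM DataElementDimension"
        " WHERE dataElementID = ? ORDER BY dimensionID",
        (data_element_id,),
    )
    return [r[0] for r in cur.fetchall()]


def data_elements_sharing(dimension_id):
    """Return the sorted data element IDs that use one dimension (hypothetical helper)."""
    cur = conn.execute(
        "SELECT dataElementID FROM DataElementDimension"
        " WHERE dimensionID = ? ORDER BY dataElementID",
        (dimension_id,),
    )
    return [r[0] for r in cur.fetchall()]
```

With the rows above, dimensions_of(45) gives [1, 2, 3], and
data_elements_sharing(1) gives [45, 46], which is the m-to-n relation read
in both directions.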

>
> Yes from the datavalue I can have dimensionelementcombinations, pick
> dimensionelements regroup and put them in their corresponding dimensions -- in
> the end telling me which dimension they came from. But from this point
> onwards I am no more talking about a value of a single dataelement but a
> value for combination of dataelements (because I have to pull different
> dataelements which can give me the identified dimensions) .... but is this
> what we want?
>
> The other point I would like to raise is - will there not be any
> limitation on the flexibility of the system when putting the restriction "A
> Dimension has many DimensionElements.  But a DimensionElement is a member of
> only one Dimension" ? Not only system flexibility problem, I see a logical
> problem as well. Because if we think for example beyond the obvious
> SEX(male,female,unknown) - I see a strong need for letting dimensionelements
> to be member of multiple dimensions: For example take the other obvious
> dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its
> dimensionelements. Maybe such scaling of the AGE dimension is appropriate
> for Malaria case, but for TB case people might be interested to break the
> AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to
> handle cases like this? Are we going to define a number of <5yrs or are we
> going to use the same <5yr dimensionelement ?
>

I think in this case we would have to define a number of "<5"
dimensionelements.  I agree that the way it is now there is maximum
flexibility, but it comes at quite a cost.  I haven't seen much to suggest
that this would be a real limitation.  Anyway, the way it stands "<5" is
just a label without any intrinsic meaning.  So we can just as easily
combine it with apples or oranges.  By binding a set of dimensionelements to
a dimension we at least give them some meaning as an aggregation group.
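
To make the 1-n rule concrete, here is a small sketch in Python.  The
dimension codes AGE_MALARIA and AGE_TB, the element IDs and the helper names
are all hypothetical; the labels are taken from the age example in the
quoted question, with the malaria list cut down to its two distinct labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionElement:
    element_id: int  # surrogate key (assumed), unique across the database
    label: str       # display label only; not unique, no intrinsic meaning
    dimension: str   # the single dimension this element belongs to


# Two separate age scalings, each defining its own "<5yrs" element.
ELEMENTS = [
    DimensionElement(1, "<5yrs", "AGE_MALARIA"),
    DimensionElement(2, "5-10yrs", "AGE_MALARIA"),
    DimensionElement(3, "<5yrs", "AGE_TB"),
    DimensionElement(4, "5-10yrs", "AGE_TB"),
    DimensionElement(5, "10-15yrs", "AGE_TB"),
    DimensionElement(6, ">15yrs", "AGE_TB"),
]


def elements_of(dimension):
    """Labels of one dimension in definition order, i.e. its aggregation group."""
    return [e.label for e in ELEMENTS if e.dimension == dimension]


def dimension_of(element_id):
    """With a 1-n relation an element ID identifies exactly one dimension."""
    matches = [e.dimension for e in ELEMENTS if e.element_id == element_id]
    if len(matches) != 1:
        raise KeyError(element_id)
    return matches[0]
```

The label "<5yrs" appears twice, but elements 1 and 3 are different rows:
dimension_of(1) is "AGE_MALARIA" and dimension_of(3) is "AGE_TB".  The cost
is the duplicated label; the gain is that every element has one unambiguous
parent to aggregate under.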

Thanks for your input.  I will look again at the first issue and see whether
I have made a mistake.

Regards
Bob


>
>
> Thank you
> Abyot.
>
>
>
>
> On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>
>> OK.  Here's my first attempt to rationalize things.  Please excuse the
>> attachments.  I try not to send attachments to mailing lists but these are
>> at least fairly small.  (And Lars I will write it up in docbook after
>> fishing for feedback).
>>
>> My primary aim has been to disturb the existing model as little as
>> possible whilst trying to simplify wherever possible.
>>
>> Attached oldmodel.png shows the participants in the existing model.  As
>> you can see there are 11 tables in all.  I haven't shown the relations as
>> it becomes a bit of a web.
>>
>> Also attached is a proposed amended database model which bears sufficient
>> similarity to the old that migration between the two should be feasible.
>> But it is down to 6 tables.  And I have named the tables according to the
>> terms we have been discussing.  Of course this is just the database model.
>> I've also put together an XML view of what some sample dataset might look
>> like.  There is also a UML model required which would be richer than the
>> underlying datamodel, but one step at a time ....
>>
>> Walking through:
>>
>> 1.  DataElements can have Dimensions.  And different dataElements can (and
>> hopefully will) share some of the same Dimensions.  So there is a m-to-n
>> relationship between the two necessitating an extra table
>> (DataElementDimensions).  An example of a Dimension is SEX.  Nothing new
>> here.
>>
>> 2.  Dimensions have DimensionElements.  So SEX for example might have
>> DimensionElements "Male", "Female", "Unknown".  A big difference from the
>> old model is that there is 1-n relationship between DimensionElements and
>> Dimensions.  A Dimension has many DimensionElements.  But a DimensionElement
>> is a member of only one Dimension.
>>
>> 3.  DataValues represent the values at intersection of these Dimensions.
>> Keeping with the spirit of the old model this intersection is represented by
>> a single key, DimensionElementCombination.  The DimensionElementCombinations
>> would be populated when a new Dimension is added to a DataElement.  Like the
>> original model there is some fragility here.  Changing dimensions on
>> dataelements could create a situation where datavalues become orphaned or
>> misdirected.  The API must have robust methods for defending this integrity
>> particularly when updating the structural metadata.  But this is perhaps
>> doable.  Either way it's no worse than what we have.
>>
>> I haven't given a name to DimensionElementCombinations.  From the examples
>> I have seen from SL this seems to be unnecessary.  The names I have seen
>> being used are generally simply contrived from the dimensions or (worse
>> still) from the categoryoptions.  What is important is that dataelements can
>> have sets of dimensions.
>>
>> And then much of what is different is just a renaming of the original
>> entities.    From the attached XML file I think you can see some of the
>> issues faced re names and identifiers.  I find myself following a sort of
>> convention of CODE, Name, Description and UUID.  CODE's must be unique
>> within the scope of the database.  I suppose this is close to what we
>> currently call ShortName.  I would like to place constraints on CODES in
>> terms of length and also the disallowing of spaces and other funny
>> characters.  The reason being that we may well have to use these codes in
>> making up URIs.  So CODES must be unique.  For the moment we could keep
>> name unique but should migrate from it.  It's a matter of rewriting all our
>> comparators I guess.  UUIDs I am told are unique through some sort of
>> divinity so we apparently do not need to worry about them :-)
>>
>> I've also tried to reduce the number of knees on the donkey - from 11
>> tables to 6.  I believe this can be done whilst preserving the existing
>> functionality.  This arrangement would make it much more sensible to produce
>> the XML I need to produce.  I'm hoping that it would also be more friendly
>> to those who would be trying to pivot the data across dimensions.
>>
>> Jason do you think this works for you?  I might have missed out something
>> really fundamental.  Abyot, you've been through this process before - am I
>> missing something?  From the DataValue you can see DimensionElements.  And
>> once you know a DimensionElement you also know the Dimension to which it
>> belongs.  I think that's queryable.  Will have to hydrate with some data and
>> see.
>>
>> Shaking the multidimensional model up like this would obviously have
>> implications.  But I suspect most of it is taking stuff away rather than
>> adding new so it might just be doable.  Less is more.
>>
>> Not spending time with docbook yet, till I get some feedback.
>>
>> Cheers
>> Bob
>>
>> 2009/9/29 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>
>>  Hi
>>>
>>> On the back of Jason's and others' comments, I've reached the conclusion
>>> that we cannot really live with the MD model the way it is.  Whereas I think
>>> it is (just about) workable there are some serious optimizations we can and
>>> should do.  I am going to put my other work back a day or two and propose
>>> some changes in a branch.
>>>
>>> I think central to the inefficiency is the many-many relation between
>>> categories and categoryoptions.  This strikes me as illogical as well as
>>> being cumbersome in the UI.  Do we really want to be able to make categories
>>> with options like {'0<5','6-10','Male','Out of stock','35-40'}?  Reducing
>>> the relation between categories and category options to 1-n cuts two tables,
>>> should make sql queries more efficient and grokkable and also matches other
>>> models such as sdmx better.
>>>
>>> The other possible inefficiency is the dimensionset.  It can be useful
>>> in some contexts but I'm guessing that when querying the data (which we want
>>> to be fast) it is not relevant.  A dataelement can have dimensions.  The
>>> fact that some dataelements have the same combinations of dimensions is very
>>> useful to know for some purposes, but it should be possible to get from the
>>> dataelement to the dimension directly.
>>>
>>> On the other side of the road is the hierarchical dimensionality idea I
>>> see Ola and Jason have been discussing, where dimensions are composed
>>> (perhaps post-facto) of uni-dimensional dataelements rather than decomposed
>>> into pre-structured dimensional elements.  I suspect that:
>>> 1.  we need both; and
>>> 2.  from the API, user and reporting perspective they should look the
>>> same (ie a dataelement can have dimensions - how they come about should not
>>> be a concern at the end point).
>>>
>>> I'll try out some of these ideas and point you to the branch.
>>>
>>> Regards
>>> Bob
>>>
>>> 2009/9/29 Lars Helge Øverland <larshelge@xxxxxxxxx>
>>>
>>>>
>>>>
>>>>> Thanks for the explanations Jason. The multidimensional model is quite
>>>>> complicated, is poorly documented, and as you say is DHIS-centric in the way
>>>>> that it is built around the DHIS notion of a Data Element.
>>>>>
>>>>>
>>>> Could we assemble and put some of the text being written on the list to
>>>> docbook?
>>>>
>>>>
>>>>> That said, and I think Jason already has made a strong case for this,
>>>>> also in a 100% DHIS2 scenario you will need more flexibility in defining
>>>>> dimensions to your data than what categories can provide. Being able to
>>>>> define data dimensions independent of data collection is powerful and should
>>>>> be supported in a better way than what data element groups provide today.
>>>>> Given that we already have the orgunit group set code in place I would
>>>>> assume that adding group sets to data elements could be a relatively
>>>>> straight forward thing to do (but then again, I am not the programmer...).
>>>>>
>>>>
>>>> I don't see any implications in adding this to the system, it won't
>>>> require changes to the existing model as the association goes from the
>>>> groupset to the groups. We can prioritize this for the 2.0.3 release.
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>> More help   : https://help.launchpad.net/ListHelp
>>>>
>>>>
>>>
>>
>>
>>
>
