dhis2-devs team mailing list archive
Message #02394
Re: On categories and dimensions and zooks
Hi
2009/9/29 Abyot Gizaw <abyota@xxxxxxxxx>
> Yes, your suggestion is doable, and less is better, but I think the
> requirements from the field are more complex.
>
> If, for a moment, we stop talking about datavalues and talk about
> dataelements - why are we talking about dimension combinations?
>
> I ask because you seem to be assuming that a dataelement has only one
> dimension. Am I correct? If that is the case, I see a little inconsistency
> here. The DataElement talks about one dimension, but its corresponding value
> talks about a combination of dimensions.
>
No, you are misreading me, or I have made a mistake. A DataElement can have
many dimensions. If it could have just one, there would simply be an n-1
relation from DataElement to Dimension. Because a DataElement can have more
than one dimension, I have the DataElementDimension table in between. I
actually meant to call it DataElementDimensions, but table names should
generally be singular. So the contents of this table might look like:
dimensionID, dataElementID
1, 45
1, 46
2, 45
3, 45
4, 6
So dataelement 45 would have 3 dimensions (1, 2 and 3), and so on.
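The m-to-n join table above can be read in both directions. The following minimal Python sketch uses only the sample rows shown; the helper names are hypothetical illustrations, not part of the DHIS2 code base.

```python
from collections import defaultdict

# Sample rows of the DataElementDimension join table: (dimensionID, dataElementID).
DATA_ELEMENT_DIMENSION = [
    (1, 45),
    (1, 46),
    (2, 45),
    (3, 45),
    (4, 6),
]


def dimensions_by_data_element(rows):
    """Group join rows into dataElementID -> sorted list of dimensionIDs."""
    index = defaultdict(set)
    for dimension_id, data_element_id in rows:
        index[data_element_id].add(dimension_id)
    return {de: sorted(dims) for de, dims in index.items()}


def data_elements_by_dimension(rows):
    """The reverse direction: dimensionID -> sorted list of dataElementIDs sharing it."""
    index = defaultdict(set)
    for dimension_id, data_element_id in rows:
        index[dimension_id].add(data_element_id)
    return {dim: sorted(des) for dim, des in index.items()}


by_de = dimensions_by_data_element(DATA_ELEMENT_DIMENSION)
print(by_de[45])  # [1, 2, 3]
print(data_elements_by_dimension(DATA_ELEMENT_DIMENSION)[1])  # [45, 46]
```

Keeping the relation in its own table is what lets dimension 1 be shared by dataelements 45 and 46 while 45 also carries dimensions 2 and 3.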
>
> Yes, from the datavalue I can get the dimensionelementcombination, pick out
> the dimensionelements, regroup them and put them in their corresponding
> dimensions, which in the end tells me which dimension each one came from. But
> from this point onwards I am no longer talking about the value of a single
> dataelement but about a value for a combination of dataelements (because I
> have to pull in different dataelements which can give me the identified
> dimensions). Is this what we want?
>
> The other point I would like to raise is: won't the restriction "A
> Dimension has many DimensionElements. But a DimensionElement is a member of
> only one Dimension" limit the flexibility of the system? Beyond flexibility, I
> see a logical problem as well. If we think beyond the obvious
> SEX (male, female, unknown), I see a strong need to let dimensionelements be
> members of multiple dimensions. Take the other obvious dimension, AGE, and
> assume <5 yrs, 5-10 yrs and >10 yrs as its dimensionelements. Maybe such a
> breakdown of the AGE dimension is appropriate for malaria cases, but for TB
> cases people might be interested in breaking the AGE dimension into <5yrs,
> 5-10yrs, 10-15yrs, >15yrs. How are we going to handle cases like this? Are we
> going to define several <5yrs dimensionelements, or are we going to use the
> same <5yrs dimensionelement?
>
I think in this case we would have to define several "<5"
dimensionelements. I agree that the current model gives maximum
flexibility, but it comes at quite a cost, and I haven't seen much to suggest
that the restriction would be a real limitation. Anyway, as things stand, "<5" is
just a label without any intrinsic meaning, so we can just as easily
combine it with apples or oranges. By binding a set of dimensionelements to
a dimension we at least give them some meaning as an aggregation group.
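A minimal Python sketch of the 1-n rule, under stated assumptions: the dimension codes AGE_MALARIA and AGE_TB, the numeric IDs and the helper regroup are all hypothetical. Because each DimensionElement points at exactly one Dimension, the two "<5yrs" labels become two distinct elements, and regrouping the elements of a combination by dimension (the step Abyot describes) is a direct lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    id: int
    code: str


@dataclass(frozen=True)
class DimensionElement:
    id: int
    name: str
    dimension: Dimension  # 1-n: every element belongs to exactly one Dimension


# Hypothetical sample metadata.
SEX = Dimension(1, "SEX")
AGE_MALARIA = Dimension(2, "AGE_MALARIA")
AGE_TB = Dimension(3, "AGE_TB")

ELEMENTS = [
    DimensionElement(10, "Male", SEX),
    DimensionElement(11, "Female", SEX),
    DimensionElement(20, "<5yrs", AGE_MALARIA),
    DimensionElement(21, "5-10yrs", AGE_MALARIA),
    DimensionElement(30, "<5yrs", AGE_TB),  # same label, separate element
    DimensionElement(31, "5-10yrs", AGE_TB),
]
BY_ID = {e.id: e for e in ELEMENTS}


def regroup(combination):
    """Map each dimension code to the element name a combination contributes.

    Raises ValueError if two elements come from the same dimension, since a
    combination should pick at most one element per dimension.
    """
    grouped = {}
    for element_id in combination:
        element = BY_ID[element_id]
        code = element.dimension.code
        if code in grouped:
            raise ValueError(f"two elements from dimension {code}")
        grouped[code] = element.name
    return grouped


print(regroup([11, 30]))       # {'SEX': 'Female', 'AGE_TB': '<5yrs'}
print(BY_ID[20] == BY_ID[30])  # False: different ids and dimensions
```

The label alone does not identify an element here; only the element together with its single owning dimension does, which is what gives the set its meaning as an aggregation group.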
Thanks for your input. I will look again at the first issue and see whether
I have made a mistake.
Regards
Bob
>
>
> Thank you
> Abyot.
>
> On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>
>> OK. Here's my first attempt to rationalize things. Please excuse the
>> attachments. I try not to send attachments to mailing lists, but these are
>> at least fairly small. (And Lars, I will write it up in docbook after
>> fishing for feedback.)
>>
>> My primary aim has been to disturb the existing model as little as
>> possible whilst trying to simplify wherever possible.
>>
>> The attached oldmodel.png shows the participants in the existing model. As
>> you can see, there are 11 tables in all. I haven't shown the relations, as
>> it becomes a bit of a web.
>>
>> Also attached is a proposed amended database model which bears enough
>> similarity to the old one that migration between the two should be feasible,
>> but it is down to 6 tables, and I have named the tables according to the
>> terms we have been discussing. Of course this is just the database model.
>> I've also put together an XML view of what a sample dataset might look
>> like. A UML model, richer than the underlying datamodel, is also required,
>> but one step at a time ....
>>
>> Walking through:
>>
>> 1. DataElements can have Dimensions, and different dataElements can (and
>> hopefully will) share some of the same Dimensions. So there is an m-to-n
>> relationship between the two, necessitating an extra table
>> (DataElementDimensions). An example of a Dimension is SEX. Nothing new
>> here.
>>
>> 2. Dimensions have DimensionElements. So SEX, for example, might have
>> DimensionElements "Male", "Female", "Unknown". A big difference from the
>> old model is that there is a 1-n relationship between Dimensions and
>> DimensionElements. A Dimension has many DimensionElements, but a
>> DimensionElement is a member of only one Dimension.
>>
>> 3. DataValues represent the values at the intersection of these Dimensions.
>> Keeping with the spirit of the old model, this intersection is represented by
>> a single key, DimensionElementCombination. The DimensionElementCombinations
>> would be populated when a new Dimension is added to a DataElement. As in the
>> original model, there is some fragility here. Changing dimensions on
>> dataelements could create a situation where datavalues become orphaned or
>> misdirected. The API must have robust methods for protecting this integrity,
>> particularly when updating the structural metadata. But this is perhaps
>> doable. Either way, it's no worse than what we have.
>>
>> I haven't given a name to DimensionElementCombinations. From the examples
>> I have seen from SL this seems to be unnecessary. The names I have seen
>> being used are generally simply contrived from the dimensions or (worse
>> still) from the categoryoptions. What is important is that dataelements can
>> have sets of dimensions.
>>
>> And then much of what is different is just a renaming of the original
>> entities. From the attached XML file I think you can see some of the
>> issues faced re names and identifiers. I find myself following a sort of
>> convention of CODE, Name, Description and UUID. CODEs must be unique
>> within the scope of the database. I suppose this is close to what we
>> currently call ShortName. I would like to place constraints on CODEs in
>> terms of length and also disallow spaces and other funny
>> characters, the reason being that we may well have to use these codes in
>> making up URIs. So CODEs must be unique. For the moment we could keep
>> names unique, but we should migrate away from that. It's a matter of
>> rewriting all our comparators, I guess. UUIDs, I am told, are unique
>> through some sort of divinity, so we apparently do not need to worry about
>> them :-)
>>
>> I've also tried to reduce the number of knees on the donkey, from 11
>> tables to 6. I believe this can be done whilst preserving the existing
>> functionality. This arrangement would make it much more sensible to produce
>> the XML I need to produce. I'm hoping that it would also be more friendly
>> to those who would be trying to pivot the data across dimensions.
>>
>> Jason, do you think this works for you? I might have missed out something
>> really fundamental. Abyot, you've been through this process before: am I
>> missing something? From the DataValue you can see DimensionElements, and
>> once you know a DimensionElement you also know the Dimension to which it
>> belongs. I think that's queryable. I will have to hydrate it with some data
>> and see.
>>
>> Shaking the multidimensional model up like this would obviously have
>> implications. But I suspect most of it is taking stuff away rather than
>> adding new so it might just be doable. Less is more.
>>
>> Not spending time with docbook yet, till I get some feedback.
>>
>> Cheers
>> Bob
>>
>> 2009/9/29 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>
>>> Hi
>>>
>>> On the back of Jason's and others' comments, I've reached the conclusion
>>> that we cannot really live with the MD model the way it is. Although I think
>>> it is (just about) workable, there are some serious optimizations we can and
>>> should do. I am going to put my other work back a day or two and propose
>>> some changes in a branch.
>>>
>>> I think central to the inefficiency is the many-many relation between
>>> categories and categoryoptions. This strikes me as illogical as well as
>>> being cumbersome in the UI. Do we really want to be able to make categories
>>> with options like {'0<5','6-10','Male','Out of stock','35-40'}? Reducing
>>> the relation between categories and category options to 1-n cuts two tables,
>>> should make SQL queries more efficient and grokkable, and also matches other
>>> models such as SDMX better.
>>>
>>> The other possible inefficiency is the dimensionset. It can be useful
>>> in some contexts, but I'm guessing that when querying the data (which we want
>>> to be fast) it is not relevant. A dataelement can have dimensions. The
>>> fact that some dataelements have the same combinations of dimensions is very
>>> useful to know for some purposes, but it should be possible to get from the
>>> dataelement to the dimension directly.
>>>
>>> On the other side of the road is the hierarchical dimensionality idea I
>>> see Ola and Jason have been discussing, where dimensions are composed
>>> (perhaps post-facto) of uni-dimensional dataelements rather than decomposed
>>> into pre-structured dimensional elements. I suspect that:
>>> 1. we need both; and
>>> 2. from the API, user and reporting perspective they should look the
>>> same (i.e. a dataelement can have dimensions; how they come about should
>>> not be a concern at the end point).
>>>
>>> I'll try out some of these ideas and point you to the branch.
>>>
>>> Regards
>>> Bob
>>>
>>> 2009/9/29 Lars Helge Øverland <larshelge@xxxxxxxxx>
>>>
>>>>
>>>>
>>>>> Thanks for the explanations, Jason. The multidimensional model is quite
>>>>> complicated, is poorly documented, and as you say is DHIS-centric in the way
>>>>> that it is built around the DHIS notion of a Data Element.
>>>>>
>>>>>
>>>> Could we assemble some of the text being written on the list and put it
>>>> into docbook?
>>>>
>>>>
>>>>> That said, and I think Jason already has made a strong case for this,
>>>>> also in a 100% DHIS2 scenario you will need more flexibility in defining
>>>>> dimensions to your data than what categories can provide. Being able to
>>>>> define data dimensions independent of data collection is powerful and should
>>>>> be supported in a better way than what data element groups provide today.
>>>>> Given that we already have the orgunit group set code in place I would
>>>>> assume that adding group sets to data elements could be a relatively
>>>>> straightforward thing to do (but then again, I am not the programmer...).
>>>>>
>>>>
>>>> I don't see any implications in adding this to the system, it won't
>>>> require changes to the existing model as the association goes from the
>>>> groupset to the groups. We can prioritize this for the 2.0.3 release.
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>> More help : https://help.launchpad.net/ListHelp
References
- On categories and dimensions and zooks (Jason Pickering, 2009-09-16)
- Re: On categories and dimensions and zooks (Abyot Gizaw, 2009-09-25)
- Re: On categories and dimensions and zooks (Knut Staring, 2009-09-25)
- Re: On categories and dimensions and zooks (Jason Pickering, 2009-09-28)
- Re: On categories and dimensions and zooks (Jason Pickering, 2009-09-29)
- Re: On categories and dimensions and zooks (Ola Hodne Titlestad, 2009-09-29)
- Re: On categories and dimensions and zooks (Lars Helge Øverland, 2009-09-29)
- Re: On categories and dimensions and zooks (Bob Jolliffe, 2009-09-29)
- Re: On categories and dimensions and zooks (Bob Jolliffe, 2009-09-29)
- Re: On categories and dimensions and zooks (Abyot Gizaw, 2009-09-29)