dhis2-devs team mailing list archive
Message #02392
Re: On categories and dimensions and zooks
I think Abyot raises some good points, especially his last one about
differences in what the age dimension really is.
I think the biggest challenge is going to be how to unite the concepts
of a multidimensional data element (as it is currently implemented
with categories) and a data element that has no multidimensionality,
at least in the sense of it not being assigned any categories.
What about the following scenario? Could the category/category combos
be transformed somehow into a sort of data element generator? Users
could define a dimensionality set, assign a master data element, and
DHIS would create all of the necessary data elements. So a category
combination of Patient Status (OPD, IPD, Deaths) and Age (Under 1,
Under 5 and Over 5) and a template data element (Clinical malaria)
would produce:
OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
OPD Over 5 Clinical Malaria ...
OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
...
..
..
IP Clinical Malaria Total {IP, All ages, Clinical Malaria}
...
...
...
Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
Clinical Malaria Total {All patient status, All ages, Clinical malaria}
Each one of those data elements would then be assigned a set of
dimensions, and a set of dimensional elements.
The categories functionality would simply be an artifact to produce
multiple data elements without having to enter them separately, which,
if I understood Ola yesterday, was one of its intended purposes.
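
To make the generator idea concrete, here is a minimal Python sketch,
assuming each category carries its options plus a label for its total.
The function name generate_data_elements, the tuple layout, and the
naming rule (specific options, then the template name, then "Total"
when any category is rolled up) are assumptions for illustration only,
not anything DHIS does today.

```python
from itertools import product


def generate_data_elements(template, categories):
    """Expand a template data element over every category option combination.

    categories maps a category name to (options, total_label). The total
    label is a hypothetical marker for "all options of this category".
    Returns a list of (name, dimension_elements) tuples.
    """
    # Each axis holds (label, is_total) pairs: the real options plus one total.
    axes = [
        [(option, False) for option in options] + [(total_label, True)]
        for options, total_label in categories.values()
    ]
    generated = []
    for combo in product(*axes):
        specific = [label for label, is_total in combo if not is_total]
        has_total = any(is_total for _, is_total in combo)
        # Totals are left out of the name and flagged with a "Total" suffix.
        name = " ".join(specific + [template] + (["Total"] if has_total else []))
        dimension_elements = tuple(label for label, _ in combo) + (template,)
        generated.append((name, dimension_elements))
    return generated


categories = {
    "Patient Status": (["OPD", "IPD", "Deaths"], "All patient status"),
    "Age": (["Under 1", "Under 5", "Over 5"], "All ages"),
}
elements = generate_data_elements("Clinical Malaria", categories)
for name, dimension_elements in elements:
    print(name, dimension_elements)
```

With three options plus one total on each axis this yields 16 data
elements. Note that the sketch also produces totals across patient
status for each age group (for example "Under 1 Clinical Malaria
Total"), which the list above does not spell out.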
Now, for those of us, such as myself, who have already created
dozens of data elements with different dimensions in their names (but
nowhere in a relational table), we could assign the dimensionality in
a separate step (post-facto, as Bob mentioned earlier). I might want to
assign an "uber" dimension of "Communicable" and "Non-communicable" to
a disease type that might not have anything to do with the definition
of the data element itself, but would be simply for analysis purposes
later. Again, I may be rehashing my previous emails here, but from a
pure SQL standpoint, the approach I suggest here makes sense to me, in
terms of queries for how to pull this into a crosstab as well as how to
generate a fact table that something like an OLAP server could deal
with.
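
As a rough illustration of the SQL side, the sketch below uses Python's
sqlite3 module with an in-memory database. All table and column names
(dataelementdimensionelement, the fact view, and so on) and all data
values are made up for the example and are not the DHIS schema. It
assigns dimension elements to already existing flat data elements in a
separate link table, then derives a fact view and a crosstab from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataelement (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dimension (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dimensionelement (
    id INTEGER PRIMARY KEY,
    dimensionid INTEGER NOT NULL REFERENCES dimension(id),
    name TEXT NOT NULL
);
-- Post-facto assignment of dimension elements to flat data elements.
CREATE TABLE dataelementdimensionelement (
    dataelementid INTEGER NOT NULL REFERENCES dataelement(id),
    dimensionelementid INTEGER NOT NULL REFERENCES dimensionelement(id),
    PRIMARY KEY (dataelementid, dimensionelementid)
);
CREATE TABLE datavalue (
    dataelementid INTEGER NOT NULL REFERENCES dataelement(id),
    value INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO dataelement VALUES (?, ?)", [
    (1, "OPD Under 5 Clinical Malaria"), (2, "OPD Over 5 Clinical Malaria"),
    (3, "IPD Under 5 Clinical Malaria"), (4, "IPD Over 5 Clinical Malaria"),
])
conn.executemany("INSERT INTO dimension VALUES (?, ?)",
                 [(1, "Patient Status"), (2, "Age")])
conn.executemany("INSERT INTO dimensionelement VALUES (?, ?, ?)", [
    (1, 1, "OPD"), (2, 1, "IPD"), (3, 2, "Under 5"), (4, 2, "Over 5"),
])
conn.executemany("INSERT INTO dataelementdimensionelement VALUES (?, ?)", [
    (1, 1), (1, 3), (2, 1), (2, 4), (3, 2), (3, 3), (4, 2), (4, 4),
])
# Invented sample values, one per data element.
conn.executemany("INSERT INTO datavalue VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 3), (4, 4)])

# Fact view: one row per data value, one column per dimension.
conn.execute("""
CREATE VIEW fact AS
SELECT status.name AS patient_status, age.name AS age_group, dv.value AS value
FROM datavalue dv
JOIN dataelementdimensionelement sl ON sl.dataelementid = dv.dataelementid
JOIN dimensionelement status ON status.id = sl.dimensionelementid
                            AND status.dimensionid = 1  -- Patient Status
JOIN dataelementdimensionelement al ON al.dataelementid = dv.dataelementid
JOIN dimensionelement age ON age.id = al.dimensionelementid
                         AND age.dimensionid = 2        -- Age
""")

# Crosstab: patient status down the side, age groups across the top.
crosstab = conn.execute("""
SELECT patient_status,
       SUM(CASE WHEN age_group = 'Under 5' THEN value ELSE 0 END) AS under_5,
       SUM(CASE WHEN age_group = 'Over 5' THEN value ELSE 0 END) AS over_5
FROM fact
GROUP BY patient_status
ORDER BY patient_status
""").fetchall()
print(crosstab)
```

An "uber" dimension such as Communicable/Non-communicable would just be
one more row in dimension, with its elements linked to the relevant data
elements in the same link table.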
This approach might seem to resolve the issue of how to deal with
these two different beasts, by unfolding the multidimensional data
element into simpler components. Meaning that the
category/combos/options would be used as a templating mechanism, but
that dimensionality could be assigned through a separate set of
relations. Perhaps this is what is represented in the diagram, but I
will need to study it tomorrow after some sleep.
I do think that dimensional elements should not be able to be
shared by dimensions, and that dimensions and dimensional elements
should not be able to be deleted without lots of bells and whistles
going off once they have been assigned to data elements.
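
One possible way to get those bells and whistles at the database level
is plain foreign keys. The sketch below, again using Python's sqlite3
and hypothetical table names, gives each dimension element exactly one
dimensionid (so it cannot be shared between dimensions) and refuses
deletes of anything that is still referenced. This is only an
assumption about how it could be enforced; the API would still need its
own checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked for.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dimension (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE dimensionelement (
    id INTEGER PRIMARY KEY,
    -- A single NOT NULL column: each element belongs to exactly one dimension.
    dimensionid INTEGER NOT NULL REFERENCES dimension(id) ON DELETE RESTRICT,
    name TEXT NOT NULL,
    UNIQUE (dimensionid, name)
);
CREATE TABLE dataelement (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE dataelementdimensionelement (
    dataelementid INTEGER NOT NULL REFERENCES dataelement(id),
    dimensionelementid INTEGER NOT NULL
        REFERENCES dimensionelement(id) ON DELETE RESTRICT,
    PRIMARY KEY (dataelementid, dimensionelementid)
);
""")
conn.execute("INSERT INTO dimension VALUES (1, 'Age')")
conn.execute("INSERT INTO dimensionelement VALUES (1, 1, 'Under 5')")
conn.execute("INSERT INTO dimensionelement VALUES (2, 1, 'Over 5')")
conn.execute("INSERT INTO dataelement VALUES (1, 'OPD Under 5 Clinical Malaria')")
# Only 'Under 5' is assigned to a data element.
conn.execute("INSERT INTO dataelementdimensionelement VALUES (1, 1)")


def try_delete(sql):
    """Run a DELETE and report whether the database allowed it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


deleted_assigned = try_delete("DELETE FROM dimensionelement WHERE id = 1")
deleted_unassigned = try_delete("DELETE FROM dimensionelement WHERE id = 2")
deleted_dimension = try_delete("DELETE FROM dimension WHERE id = 1")
print(deleted_assigned, deleted_unassigned, deleted_dimension)
```

The assigned element and its dimension are protected, while the
unassigned element can still be removed.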
I guess the key question is whether data elements should be able to
have multiple DimensionElementCombinations, which I think is the
current implementation. I am just not sure this will work with a
combination of DHIS2-type-multidimensional elements, and DHIS1.4-type
data elements.
Enough for today.
Thanks for this Bob. It is a good start. Can't you make this diagram
in DocBook so I can edit it? :D
Regards,
Jason
On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw <abyodia@xxxxxxxxx> wrote:
> Yes your suggestion is doable and less is better .... but I think the
> requirement from the field is more complex.
>
> If, for a moment, we stop talking about datavalues and talk about
> dataelements - why are we talking about dimension combinations?
>
> Because you are assuming a dataelement to have only one dimension. Am I
> correct? If that is the case, I see a little bit of inconsistency here.
> DataElement talks about one dimension, but its corresponding value talks
> about combination of dimensions.
>
> Yes from the datavalue I can have dimensionelementcombinations, pick
> dimensionelements regroup and put them in their corresponding dimensions -- in
> the end telling me from which dimension they came from. But from this point
> onwards I am no more talking about a value of a single dataelement but a
> value for combination of dataelements (because I have to pull different
> dataelements which can give me the identified dimensions) .... but is this
> what we want?
>
> The other point I would like to raise is - will there not be any limitation
> on the flexibility of the system when putting the restriction "A Dimension
> has many DimensionElements. But a DimensionElement is a member of only one
> Dimension" ? Not only system flexibility problem, I see a logical problem as
> well. Because if we think for example beyond the obvious
> SEX(male,female,unknown) - I see a strong need for letting dimensionelements
> to be member of multiple dimensions: For example take the other obvious
> dimension - AGE. And assume <5 yrs, 5-10 yrs, and <5 yrs as its
> dimensionelements. Maybe such scaling of the AGE dimension is appropriate
> for Malaria case, but for TB case people might be interested to break the
> AGE dimension into <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to
> handle cases like this? Are we going to define a number of <5yrs or are we
> going to use the same <5yr dimensionelement ?
>
>
> Thank you
> Abyot.
>
>
>
> On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
>>
>> OK. Here's my first attempt to rationalize things. Please excuse the
>> attachments. I try not to send attachments to mailing lists but these are
>> at least fairly small. (And Lars I will write it up in docbook after
>> fishing for feedback).
>>
>> My primary aim has been to disturb the existing model as little as
>> possible whilst trying to simplify wherever possible.
>>
>> Attached oldmodel.png shows the participants in the existing model. As
>> you can see there are 11 tables in all. I haven't shown the relations as
>> it becomes a bit of a web.
>>
>> Also attached is a proposed amended database model which bears sufficient
>> similarity to the old that migration between the two should be feasible.
>> But it is down to 6 tables. And I have named the tables according to the
>> terms we have been discussing. Of course this is just the database model.
>> I've also put together an XML view of what some sample dataset might look
>> like. There is also a UML model required which would be richer than the
>> underlying datamodel, but one step at a time ....
>>
>> Walking through:
>>
>> 1. DataElements can have Dimensions. And different dataElements can (and
>> hopefully will) share some of the same Dimensions. So there is a m-to-n
>> relationship between the two necessitating an extra table
>> (DataElementDimensions). An example of a Dimension is SEX. Nothing new
>> here.
>>
>> 2. Dimensions have DimensionElements. So SEX for example might have
>> DimensionElements "Male", "Female", "Unknown". A big difference from the
>> old model is that there is 1-n relationship between DimensionElements and
>> Dimensions. A Dimension has many DimensionElements. But a DimensionElement
>> is a member of only one Dimension.
>>
>> 3. DataValues represent the values at intersection of these Dimensions.
>> Keeping with the spirit of the old model this intersection is represented by
>> a single key, DimensionElementCombination. The DimensionElementCombinations
>> would be populated when a new Dimension is added to a DataElement. Like the
>> original model there is some fragility here. Changing dimensions on
>> dataelements could create a situation where datavalues become orphaned or
>> misdirected. The API must have robust methods for defending this integrity
>> particularly when updating the structural metadata. But this is perhaps
>> doable. Either way it's not worse than we have.
>>
>> I haven't given a name to DimensionElementCombinations. From the examples
>> I have seen from SL this seems to be unnecessary. The names I have seen
>> being used are generally simply contrived from the dimensions or (worse
>> still) from the categoryoptions. What is important is that dataelements can
>> have sets of dimensions.
>>
>> And then much of what is different is just a renaming of the original
>> entities. From the attached XML file I think you can see some of the
>> issues faced re names and identifiers. I find myself following a sort of
>> convention of CODE, Name, Description and UUID. CODEs must be unique
>> within the scope of the database. I suppose this is close to what we
>> currently call ShortName. I would like to place constraints on CODES in
>> terms of length and also the disallowing of spaces and other funny
>> characters. The reason being that we may well have to use these codes in
>> making up URIs. So CODES must be unique. For the moment we could keep
>> name unique but should migrate from it. It's a matter of rewriting all our
>> comparators I guess. UUIDs I am told are unique through some sort of
>> divinity so we apparently do not need to worry about them :-)
>>
>> I've also tried to reduce the number of knees on the donkey - from 11
>> tables to 6. I believe this can be done whilst preserving the existing
>> functionality. This arrangement would make it much more sensible to produce
>> the XML I need to produce. I'm hoping that it would also be more friendly
>> to those who would be trying to pivot the data across dimensions.
>>
>> Jason do you think this works for you? I might have missed out something
>> really fundamental. Abyot, you've been through this process before - am I
>> missing something? From the DataValue you can see DimensionElements. And
>> once you know a DimensionElement you also know the Dimension to which it
>> belongs. I think that's queryable. Will have to hydrate with some data and
>> see.
>>
>> Shaking the multidimensional model up like this would obviously have
>> implications. But I suspect most of it is taking stuff away rather than
>> adding new so it might just be doable. Less is more.
>>
>> Not spending time with docbook yet, till I get some feedback.
>>
>> Cheers
>> Bob
>>
>> 2009/9/29 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>>
>>> Hi
>>>
>>> On the back of Jason and others comments, I've reached the conclusion
>>> that we cannot really live with the MD model the way it is. Whereas I think
>>> it is (just about) workable there are some serious optimizations we can and
>>> should do. I am going to put my other work back a day or two and propose
>>> some changes in a branch.
>>>
>>> I think central to the inefficiency is the many-many relation between
>>> categories and categoryoptions. This strikes me as illogical as well as
>>> being cumbersome in the UI. Do we really want to be able to make categories
>>> with options like {'0<5','6-10','Male','Out of stock','35-40'}? Reducing
>>> the relation between categories and category options to 1-n cuts two tables,
>>> should make sql queries more efficient and grokkable and also matches other
>>> models such as sdmx better.
>>>
>>> The other possible inefficiency is the dimensionset. It can be useful
>>> in some contexts but I'm guessing that when querying the data (which we want
>>> to be fast) it is not relevant. A dataelement can have dimensions. The
>>> fact that some dataelements have the same combinations of dimensions is very
>>> useful to know for some purposes, but it should be possible to get from the
>>> dataelement to the dimension directly.
>>>
>>> On the other side of the road is the hierarchical dimensionality idea I
>>> see Ola and Jason have been discussing, where dimensions are composed
>>> (perhaps post-facto) of uni-dimensional dataelements rather than decomposed
>>> into pre-structured dimensional elements. I suspect that:
>>> 1. we need both; and
>>> 2. from the API, user and reporting perspective they should look the
>>> same (ie a dataelement can have dimensions - how they come about should not
>>> be a concern at the end point).
>>>
>>> I'll try out some of these ideas and point you to the branch.
>>>
>>> Regards
>>> Bob
>>>
>>> 2009/9/29 Lars Helge Øverland <larshelge@xxxxxxxxx>
>>>>
>>>>>
>>>>> Thanks for the explanations Jason. The multidimensional model is quite
>>>>> complicated, is poorly documented, and as you say is DHIS-centric in the way
>>>>> that it is built around the DHIS notion of a Data Element.
>>>>>
>>>>
>>>> Could we assemble and put some of the text being written on the list to
>>>> docbook?
>>>>
>>>>>
>>>>> That said, and I think Jason already has made a strong case for this,
>>>>> also in a 100% DHIS2 scenario you will need more flexibility in defining
>>>>> dimensions to your data than what categories can provide. Being able to
>>>>> define data dimensions independent of data collection is powerful and should
>>>>> be supported in a better way than what data element groups provide today.
>>>>> Given that we already have the orgunit group set code in place I would
>>>>> assume that adding group sets to data elements could be a relatively
>>>>> straight forward thing to do (but then again, I am not the programmer...).
>>>>
>>>> I don't see any implications in adding this to the system, it won't
>>>> require changes to the existing model as the association goes from the
>>>> groupset to the groups. We can prioritize this for the 2.0.3 release.
>>>>
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~dhis2-devs
>>>> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>>> More help : https://help.launchpad.net/ListHelp
>>>>
>>>
>>
>>
>>
>
>
>
>