dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

2009/10/1 Abyot Gizaw <abyota@xxxxxxxxx>

>
>
> On Thu, Oct 1, 2009 at 9:28 AM, Jason Pickering <
> jason.p.pickering@xxxxxxxxx> wrote:
>
>> >> It is still not clear to me how the
>> >> multidimensional data elements are used to calculate indicators in the
>> >> same way as a PODE (plain ol' data element).  I guess this is handled
>> >> somehow by the API?
>>
>> >> I have not played around with this, but I suppose it is possible
>> >> somehow from within the indicator definition panels.
>>
>> Well, I have now played around with it and see more or less how it
>> works. I have promised Lars I would try and put together some more
>> documentation on the multidimensional data elements functionality,
>> which I will try and distill together out of all these mails. But
>> first a few more questions.
>>
>> Once I define a data element (Clinical Malaria) with categories (Age
>> (3 category options) / Gender (2 category options)) I get six data
>> elements in my entry screen corresponding to the combination of all of
>> these. When I go to define an indicator (say Malaria incidence for
>> under 1), I can select these "sub-elements" . So for the numerator I
>> get something like
>>
>> Clinical Malaria (Under1, Female,) + Clinical Malaria (Under1, Male,)
>>
>> and for the denominator, I would choose a semi-permanent data element
>> like population..
>>
>> Age Under 1
>>
>> That is pretty sweet and I can calculate a Clinical Malaria under 1
>> incidence rate, so defining indicators with multidimensional data
>> elements seems to work fine (have not tried to calculate anything but
>> I guess this works as well).
>>
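
As a rough illustration of the setup described above, here is a minimal Python sketch. It is not DHIS code: the dictionary layout, the variable names, and all numbers are made up for the example. It shows how two categories expand into six entry cells and how the under-1 numerator is the sum of two of them.

```python
from itertools import product

# Hypothetical category definitions, using the options named in this thread.
categories = {
    "Age": ["Under1", "1-5", "Over5"],
    "Gender": ["Female", "Male"],
}

# One option from each category gives one combination (one entry cell).
combos = list(product(*categories.values()))

# Made-up data values for "Clinical Malaria", keyed by (age, gender).
values = {combo: 10 for combo in combos}
values[("Under1", "Female")] = 12
values[("Under1", "Male")] = 8

# Made-up value for the semi-permanent denominator "Age Under 1".
population_under_1 = 400

# Numerator: Clinical Malaria (Under1, Female) + Clinical Malaria (Under1, Male)
numerator = values[("Under1", "Female")] + values[("Under1", "Male")]
incidence_under_1 = numerator / population_under_1
```

With 3 age options and 2 gender options, `combos` holds 3 x 2 = 6 entries, matching the six cells in the entry screen.
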
>> Anyway, my first question is about that last little comma. It would
>> seem somehow (I have not looked at the code) that there are
>> three dimensions that are sort of hard-wired. I have only defined two.
>> Is that last comma significant, or just a bit of screen lint?
>>
>
> Yes, that is just a bug! The for-loop adds a comma after each
> "dimensionelement", assuming there will be another one coming :-) We will tell
> the loop not to add a comma if the "dimensionelement" is the last one (or
> simply truncate at the end).
>
>
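
The two fixes mentioned above can be sketched as follows. This is a hypothetical Python illustration, not the actual DHIS 2 code; the function names and the label format are assumptions made for the example.

```python
def label_with_loop(name, options):
    # Option 1: the loop skips the comma after the last option.
    text = name + " ("
    for i, option in enumerate(options):
        text += option
        if i < len(options) - 1:
            text += ", "
    return text + ")"


def label_with_join(name, options):
    # Option 2: let a join place separators only between options.
    return f"{name} ({', '.join(options)})"


print(label_with_loop("Clinical Malaria", ["Under1", "Female"]))
print(label_with_join("Clinical Malaria", ["Under1", "Female"]))
```

Both variants give the same label with no trailing comma, whatever the number of category options.
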
>>
>> Now, my next question which is a bit more erudite. Supposing I would
>> go down the route of defining all of my indicators in a
>> multidimensional fashion, is there any limit to the level of
>> dimensionality that I can assign them and where should I start? Let's
>> think about the malaria data element.
>>
>> For malaria I might decide to get very complicated and choose many
>> categories and options (categories are after the number, possible
>> category options are in parentheses).
>>
>> 1) Type (Disease, Service delivery, equipment)
>> 2) Disease type (Communicable, Non-communicable)
>> 3) Transmission method (Vector Borne, Water borne, Air borne, Sexually
>> transmitted)
>> 4) Disease (Malaria, Leprosy, Leishmaniasis, etc)
>> 5) Diagnosis status (Clinical, confirmed)
>> 5) Patient status (OPD, IP, Deaths)
>> 6) Age (Under 1, 1-5, Over 5)
>> 7) Gender (Male, Female)
>>
>> This list is not complete, and would need some more category elements
>> to be totally complete, but this is enough to get started. So, I can see
>> that if I define my categories and options like this, I will get a
>> data element for "OPD Clinical Cases of Malaria Under 1" at some
>> point.
>>
>>
Thanks for coming up with this list, Jason. It helps the discussion to have
some more real examples.




>> So, I guess my question is, where do I start to define my data element
>> dimensionality? With Disease? There are dimensions "above" the disease
>> however like I indicate here, like the transmission method. What if I
>> want to be able to know the total number of cases of all vector borne
>> diseases? Not a totally unusual request. Would I need to start the
>> definition of my data elements from there? Would this need to be a
>> "Dataelement group" instead? What about if I need to know the total
>> number of cases of communicable diseases? Would this not imply I would
>> need to add this data element to two separate data element groups,
>> which at least with DHIS 1.4 is a no-no as it results in duplicates in
>> the PivotTables?
>>
>>
I think we can only go as far as providing guidelines or examples of best
practice here; it will eventually be up to the user to define these. I have
seen many different approaches and I am not sure there is ONE correct one.

To your question of what should be the data element: I would start by
thinking about what is the most important piece of information you need. The
data element will be the total across all dimensions, or at least across all
the category options you define, so it will always be easier to get that total
than any slice of it. E.g. that total could well be "Malaria cases in OPD".
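
To make the "total versus slice" point concrete, here is a minimal Python sketch. The data layout, the names `CATEGORY_ORDER`, `total` and `slice_by`, and all the numbers are assumptions for illustration only, not how DHIS stores or aggregates values.

```python
from collections import defaultdict

# Order of the categories inside each key of `values` below.
CATEGORY_ORDER = ("Diagnosis status", "Age", "Gender")

# Made-up values for one data element, "Malaria cases in OPD".
values = {
    ("Clinical", "Under1", "Female"): 12,
    ("Clinical", "Under1", "Male"): 8,
    ("Confirmed", "Under1", "Female"): 3,
    ("Confirmed", "Over5", "Male"): 5,
}


def total(values):
    # The data element itself: the sum over every category option combination.
    return sum(values.values())


def slice_by(values, category):
    # One slice: subtotals per option of a single category.
    index = CATEGORY_ORDER.index(category)
    subtotals = defaultdict(int)
    for combo, value in values.items():
        subtotals[combo[index]] += value
    return dict(subtotals)


print(total(values))
print(slice_by(values, "Age"))
```

The total needs no knowledge of the categories at all, while every slice needs to know which position in the combination to group on.
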

To get back to my previous statement of separating input and output, there
are some of these dimensions that I would use data element groups for as
they would mean e.g. grouping together a large number of diseases or types
of data (equipment).

Using data element groups to add dimensionality assumes that we will have a
data element group set feature in place soon.
I would definitely use groups for all of the following, as they are too broad
to capture in one data element:
1) Type (Disease, Service delivery, equipment)
2) Disease type (Communicable, Non-communicable)
3) Transmission method (Vector Borne, Water borne, Air borne, Sexually
transmitted)

5) Diagnosis status, 6) Age, and 7) Gender I would define as categories.
They do not have a large number of options (like all possible diseases) and
more importantly they would all be present in the same data entry form, or
captured by the same facility.

Then we are only left with Disease and Patient status. Patient status is
tricky because its CategoryOptions (OPD, IP, and Deaths) span multiple
data entry forms and therefore different orgunits would use them, e.g. OPD
would only apply to orgunits with outpatient clinics and IP (inpatient) only
to orgunits with beds. Rarely would one orgunit do both, and therefore they
would possibly not be on the same form at all. For this reason I would not
create a category Patient Status. Disease would be my obvious choice for
data element, because it is almost always at the center of data analysis. It
is the dimension you most often look at, so a total for each disease makes
more sense than any other total. You can also easily group diseases by 1)
Type (Disease, Service delivery, equipment), 2) Disease type (Communicable,
Non-communicable), 3) Transmission method (Vector Borne, Water borne, Air
borne, Sexually transmitted), so a data element <Disease Name> and data
element group sets like the above would make sense.

Since I would not use Patient status as a category, I could use a data
element group set to define this dimension. The problem with that is that
there is no way I can find out which of the data values for the data element
"Malaria" actually belong to OPD and which ones come from IP or Death. You
cannot use a data element group to break up a data element into smaller
pieces. Since I can use neither groups nor categories, I would simply
include patient status in the data element name itself, ending up with data
elements like "Malaria case in OPD", "Malaria case in IP", "Malaria Death".
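
A small Python sketch of that arrangement follows. It assumes a hypothetical group set structure in which each data element belongs to exactly one group per set; the group set feature is not in place yet, so the names and the layout here are illustrative only, and the numbers are made up.

```python
from collections import defaultdict

# Made-up totals for data elements that carry patient status in their name.
element_totals = {
    "Malaria case in OPD": 120,
    "Malaria case in IP": 30,
    "Malaria Death": 2,
    "Leprosy case in OPD": 4,
}

# Hypothetical group sets: within one set, each element maps to one group only.
group_sets = {
    "Patient status": {
        "Malaria case in OPD": "OPD",
        "Malaria case in IP": "IP",
        "Malaria Death": "Deaths",
        "Leprosy case in OPD": "OPD",
    },
    "Disease": {
        "Malaria case in OPD": "Malaria",
        "Malaria case in IP": "Malaria",
        "Malaria Death": "Malaria",
        "Leprosy case in OPD": "Leprosy",
    },
}


def totals_by_group(element_totals, membership):
    # A group can only add up whole data elements; it cannot split one.
    totals = defaultdict(int)
    for element, value in element_totals.items():
        totals[membership[element]] += value
    return dict(totals)


for name, membership in group_sets.items():
    print(name, totals_by_group(element_totals, membership))
```

Because every element appears once per group set, each value is counted once per dimension. A single "Malaria" element could not be divided between OPD and IP this way, which is why the patient status ends up in the element name.
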


>> It seems like we have stumbled on a particle accelerator. The deeper
>> you dig, the more dimensions there are.
>>
>
> Emm... I don't know. But I think there is a sort of bias here, like
> starting from flat DHIS 1.4 dataelements and trying to generate DHIS 2
> dataelements by breaking them into pieces. If I am not mistaken, "OPD Clinical
> Cases of Malaria Under 1" is a common dataelement in 1.4, so you can start
> to break this into pieces and get
>
> "Under 1"
> "Malaria"
> "Clinical Cases"
> "OPD"
> ....
> ...
>
> but in the end you get confused about which one is the dataelement and which
> one is the dimension. Well, the MD model can handle such a breakup, I guess,
> but that is not the point.
>
> The point is that users should, I guess, first define what they
> need from that functionality: what kind of data are they going to collect?
> What does their dataentry screen look like?
>
> The multidimensionality model came into existence because of tabular
> dataentry screens. As Ola suggested last time, there might be a limitation with
> this (multidimensionality - input screen), specifically when trying to do
> some kind of analysis (like the pivoting thing mentioned). But how different
> is the analysis going to be from our input formats? The way I see it, if
> there is a need for further breakup during analysis then we have made a
> mistake in defining our pieces during data collection. In most cases our
> analysis is going to be a combination and rearrangement of different pieces
> collected by using our input screens.
>
>
I don't agree with this, and I think the example I just made above
strengthens that. There are dimensions that are needed in data entry to be
able to break up a data element (age, gender, etc.), and there are other
dimensions that are broader groupings of data like type of diseases that you
do not need to know about in order to register data about diseases.

In general I think we should keep the design of the data element as the
atomic unit in DHIS, and datasets, groups and indicators as compositions of
that unit. That has always been one of the key success factors of DHIS
because it provides flexibility to change the compositions over time.

While the category model allows for further breakups of a data element, we
should still think of the data element as a small atomic unit and not use
this model to create giant data elements like e.g. "Cases in OPD",
"Communicable diseases", "Equipment". These are too broad to be data
elements and should be data element groups instead.

Ola
----------




> Anyways, for me a dimension is just an attribute of a dataelement. So before
> talking about a dimension first we need to have a dataelement and
> (logically) we can't mix the two!
>
> Thank you
> Abyot.
>
>
>>
>> Any practical suggestions? I know this is yet another erudite
>> example, but it highlights that if we are going to have
>> multidimensional data elements, we need to be able to provide guidance
>> on how they should be setup.
>>
>> Best regards,
>> Jason
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
