← Back to team overview

dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

Hi there.

My basic issue with the category/category combo is that it appears to be a
one-to-one relationship with data elements. If I look at the data model,
there is a one-to-one relationship between dataelement and categorycomboid.
For a given category combo, you can have multiple options. So, you can
establish a relationship for a given data element and a group of category
options.

Let me try and describe the issue. We have a set of data elements related to
malaria for this example. We would like to be able to pivot the data on
other dimensions dimensions (Data element, age, disease, patient status).
Obviously there are other dimensions that are pivotable (orgunit, period,
dataset)

The data elements look like this. I have put the dimensions in square
brackets, and the dimensional elements into curly brackets.

[Data element, Age, Disease, Patient status]
Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
Deaths}
Deaths Confirmed Malaria 1 to Under 5 Years  {1-5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria Cases,
IP}
IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
Deaths}
OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria
Cases, OPD}
OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases,
OPD}
OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
OPD}
OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
OPD}

OK,  I hope this is pretty clear. Obviously, there are more data elements
(Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I
have had for all diseases, or how many OPD cases I have had for each
disease. How can I do this with the existing data model? It is not obvious
to me because there is no relationship between dimensional elements
(categoryoptions) to each other. Category options can be related through a
cateogry combination, but since data elements can only be assigned a single
category option, the dimensionality is broken once it gets time to pull the
data into a pivot table.

 In the incomplete example that I gave yesterday, I established a
one-to-many relationship between a data element and a dimension. If I
understand the current data model, I would have to create a separate
categorycombo  for each of these data elements, and assign this
categorycombo to the data element. Now, I might be able to unfold the
dimensions using the categories and categorycombos. I it is not apparent how
the dimensional elements correspond themselves to a particular dimension, as
there is no relation for this in the database as I can see it.

 As (Johan pointed out a few mails ago, if I understand him correctly) is
different categorycombo's can be created for individual data elements, and
assigned to these elements. However, this seems to be 1) incredibly
inefficient and 2) does not establish any relationship between dimensional
elements and dimensions. Perhaps it is there, and maybe it has been done in
SL, but the SQL is not apparent to me at all.

 It would appear to me, looking from an SQL perspective, that a one-to-many
relationship between a data element, a dimension (category) and dimensional
element (category combo) would be much more effieicnet, and highly usable
from an SQL perspective. As I mentioned in my mail, I am not sure how easy
this would be to implement in a procedural language like Java, but I assume
it should be possible to either do it this way, or rewrite my Postgres
proprietary query in standard SQL (which there are ways to do with ANSI
SQL).  This would require modification to the data model (similar to the
table I provided yesterday) and modification to the UI to allow users to 1)
select a dimension (category) 2) Select a dimensional element for the given
dimension.  This would populate the table with a dataelementid, a
dimensionid (categoryid) and a dimensional element (cateogryoptionid).

My gut feeling this is exactly the same functionality as has currently been
implemented for organizational units. Users can define a hierarchy for
organizational units, and then assign them to
categories/dimension/organizational group sets, decide whether the groups
are compulsory and exlusive, and then assign a particular organizational
unit to a particular group (which is analogous to a dimensional element).
Organizational group sets define the dimension, and one-to-one assignment of
an organizational unit to a particular organizational group defines which
dimensional element the organizational unit is a member of. These dimensions
can then be used in PivotTable analyses, where the orgunitgroupsets become
dimensions, and orgunitgroups become dimensional elements..

I beleive that data elements are no different than organizational units.
They should be able to be grouped into some sort of hierarchy and pivoted on
any dimension.  Data elements groups establish a one-to-many relationship
between data elements and a data element group, but there is no concept of
how data element groups relate to each other.
I think this is perhaps the same concept you mention, ReportSet.

I suspect we would need to potentially rethink the entire concept of
multidimensionality if we really wanted to get it right. It would see to me
that the DHIS datamodel and associated aggregation methods have been
hardwired into aggregation across time (period) and geography (orgunit).
What we can do with PivotTables and (and OLAP) is to aggregate across any
possible dimension, slicing as you mention ,on any dimension . I am not sure
this will be so simple to implement but I think there is a way to do it,
without major modifications.

I am not sure it solves the SDMX issue. There are potential issues related
to "ragged" dimensions and how these get handled. Some data elements might
have three dimensions, while others may have more. I have not thought about
this in detail, but know it is an issue with cross-tab queries in SQL. You
normally have to know how many dimensions you are working with in order to
perform a cross-tab, but there are dynamic solutions. Perhaps this could be
dealt with somehow in SDMX.

Anyway ,I am rambling. Hope this mail helps though to push my point further.
Once I get the SQL from SL, I will see if perhaps it has been done already,
and that I am just writing long emails for nothing. :)

Regards,
Jason




On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
wrote:
> Hi Jason and Johan
>
> I'm really pleased to see you having this discussion as I have been
> grappling with a similar issue which involves unravelling categories,
> category options and combos into something more familiar.  I have reached
> similar conclusions regarding nomenclature:
>
> category = dimension
> categorycombo - I have been calling a dimension set (it bears a strong,
and
> useful, resemblance to xslt:attribute-set)
> category option - I like your suggestion of DimensionalElement.  I am
going
> to start calling it that too.
>
> In my case I need to export (and import data) into a standard format
called
> sdmx.  So whereas in the DHIS2 native DXF we export datavalues with
> effectively three dimensions (source, period, categorycombooption) the
last
> dimension is a sort of uber-dimension.  Like a peppercorn or a cardamon
> seed, when you break it open it explodes its rich complexity of
dimensions.
>
> In sdmx we need the dimensions exploded.  So data values look like:
>
> <dataset>
>   <datavalue name="TB test given" uid="44344 ...44" gender="Male"
age="0-5"
> value="32" />
>    <datavalue name="TB test given" uid="44344 ...44" gender="Female"
> age="0-5" value="38" />
> ..
> </dataset>
>
> My approach to unpicking the dimensions from the dxf file is to transform
it
> with an xslt transformation which is still incomplete but seems to work
> well.
>
> One other nomenclatures issue which has surfaced as a result is what we
call
> a "dataset".  In DHIS2, if I understand correctly, a dataset corresponds
> roughly to all the dataelements which might occur on a datacollection
form.
> If we view all dataelements as having just the three "dimensions" then all
> is well, but if we explode the actual dimensions then we have an issue.
In
> the sdmx model a dataset consists only of dataelements with the same
> dimensionset.  After discussing this with Ola we have reached the
conclusion
> that we need another level of grouping, primarily for the UI - eg FormSet
or
> ReportSet which allows us to group related datasets.  But that is an aside
> from what you are talking about.
>
> I know that you guys can do magic with sql, but it seems that we should
try
> to capture some of this and place it down in the datamodel API.  It occurs
> to me that for a multidimensional dataelement we might benefit from some
> utility methods to retrieve slices and dices which might assist in
> constructing the pivot tables around dimensions.  Does this sound like the
> right thing to do.
>
> Regards
> Bob
>
> 2009/9/24 <johansa@xxxxxxxxxx>
>>
>> Jason,
>> I will leave to others to comment the code, but I have a few comments...
>>
>> > I have done a bit more thinking on this, and would like to offer some
>> > more examples up for discussion.
>> >
>> > Basically, we have a lot of data elements that are somehow related to
>> > each other, similar to my kooky example in my original mail. I assume
>> > this is fairly common throughout other HMIS systems. Here,  malaria
>> > attendance is broken down into various dimensions/category by patient
>> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
>> > over 5). But say you want to be able to pivot to look at outpatient,
>> > inpatient and deaths totals (i.e. summed up by age). Well, you could
>> > create a separate data element for this, but it sure would be nice to
>> > be able to Pivot the data somehow.
>>
>> In the Sierra Leone db, Edem and Romain set up views that pulled the
>> categories through into a "Category" pivot field, which you can then use
>> to get what you want. Simply tick the categories (see below) you want to
>> see, and group them together in excel. Maybe Edem and Romain can help
>> further here.
>>
>>
>> > Dimension ? Category
>> > Dimensional element ? Category option ? Category combo ( I think)
>>
>> The right symbol disappeared from my reply-mail here, but some
>> clarification:
>>
>> Crosstab Dimension (age AND gender) = Category combo
>> Dimension (age, gender) = Category
>> Dimensional element (inpatient, outpatient, death, under1, 1-5, and over
>> 5) = Category option
>>
>> So by assigning a DE the category combo of "gender_age", you get 9
>> dimensional elements, 3 category options (in category age) by 3 category
>> options (in category gender)
>>
>> Johan
>>
>>
>>
>>
>> > Anyway, here is the helper table I created.
>> >
>> > CREATE TABLE test_dataelementcategorycombo
>> > (
>> >   test_dataelementid integer NOT NULL,
>> >   test_dataelementcategoryid integer NOT NULL,
>> >   test_dataelementcategorycomboid integer NOT NULL,
>> >   CONSTRAINT pk_testdataelementcategory PRIMARY KEY
>> > (test_dataelementid, test_dataelementcategoryid,
>> > test_dataelementcategorycomboid)
>> > )
>> > WITH (OIDS=FALSE);
>> >
>> > So this is  a real simple table which references a data element, a
>> > data element category, and a data element combo. The reference to a
>> > data element category may be redundant, but anyway, lets leave it in
>> > for now.
>> >
>> > I populated the table with some data, which will be used to assign
>> > dimensions to data elements. It looks like this in my DB, which looks
>> > like this.
>> >
>> > 309;25250;25251
>> > 309;25257;25255
>> > 348;25250;25252
>> > 348;25257;25255
>> > 455;25250;25253
>> > 455;25257;25255
>> >
>> > but of course this is meaningless to you. What do these values
>> > correspond
>> > to?
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under
>> > 1"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Age";"Age 1-5"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over
>> > 5"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
>> > status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> > Years";"Patient status";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
>> > status";"OPD"
>> >
>> > which can be produced by the following view.
>> >
>> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
>> >  SELECT dataelement.name, dataelementcategory.name AS dimension,
>> > dataelementcategoryoption.name AS dimension_element
>> >    FROM dataelement
>> >    JOIN test_dataelementcategorycombo ON
>> > test_dataelementcategorycombo.test_dataelementid =
>> > dataelement.dataelementid
>> >    JOIN dataelementcategory ON dataelementcategory.categoryid =
>> > test_dataelementcategorycombo.test_dataelementcategoryid
>> >    JOIN dataelementcategoryoption ON
>> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
>> > dataelementcategoryoption.categoryoptionid;
>> >
>> > So, that view just provides a human readable view of those integers
>> > that I populated in the the test_dataelementcategorycombo table I
>> > created above.  This table just assigns particular data elements to
>> > different category options (dimensional elements).
>> >
>> > OK, so far so good, but the problem now is, how to use this with the
>> > aggregatedatavalue table? If we try and join this table directly, we
>> > will have issues with duplicates in the pivot table, so we need to
>> > transform the data slightly.
>> >
>> > This should do the trick.
>> >
>> > SELECT * FROM crosstab
>> >       (
>> > 'SELECT name, dimension, dimension_element FROM
>> > vw_dataelements_dimensions ORDER BY 1,2,3',
>> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
>> > ASC'
>> >       )
>> > as
>> > (
>> > name character varying(230),
>> > age character varying(160),
>> > status character varying(160)
>> > );
>> >
>> >
>> > which returns this record set
>> >
>> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
>> > 1-5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
>> > 5";"OPD"
>> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
>> > 1";"OPD"
>> >
>> >
>> > OK, admittedly, I cheated a bit and used the crosstab function of
>> > Postgresql, but I assume that this query could be rewritten with a few
>> > more lines of code in standard SQL or some procedural language like
>> > Java. Now, this record set looks like something that I can almost use
>> > with the aggregateddatavalue table simply by joining up the table on
>> > the appropriate dataelementid and pulling everything into a pivot
>> > table.  I would not have any duplicated values and would have columns
>> > like data element name, period, orgunit, age, patient status and of
>> > course the value of the data element.  I hope that part is pretty
>> > clear. Just join up that table to the aggregateddata table, and you
>> > have pretty much what is needed to pull the data directly into a
>> > PivotTable for further analysis.
>> >
>> > This is not a complete example, but it is very close to what I need
>> > here ,and I think this type of functionality would be much more useful
>> > than the current data element categories functionality. Basically, all
>> > that would be required, at least initially, would be another user
>> > interface screen to allow the definition of which category(ies) and
>> > category options a data element is a member of. The rest could ,in the
>> > first instance be executed with custom SQL (obviously, I am partial to
>> > this language and hobbled by the fact that I do not know Java), but
>> > eventually this would need to be implemented somehow in Java.
>> >
>> >  I am not sure if this really solves all of the issues surrounding
>> > multidimensional analysis of data elements, but it seems to solve the
>> > issues that I am having by trying to assign some sort of dimensional
>> > hierarchy to data elements (similar to the exclusive/compulsory
>> > functionality of orgunits). Any thoughts on this?
>> >
>> > Best regards,
>> > Jason
>> >
>> >
>> >
>> >
>> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
>> > <jason.p.pickering@xxxxxxxxx> wrote:
>> >>
>> >>
>> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
>> >>>
>> >>> >> However, there does seem to be the ability to assign dimensions,
>> >>> there
>> >>> >> does
>> >>> >> not seem to be the ability to assign particular elements within
>> >>> those
>> >>> >> dimensions to a particular DHIS data element.
>> >>>
>> >>>
>> >>> Just some more clarification here: you can make category combos which
>> >>> you
>> >>> assign to data elements. However, it is not possible to assign just
>> >>> specific parts of a category combo (only some of the category
options)
>> >>> to
>> >>> a data element.
>> >>
>> >> Yes, this was exactly what I wanted. Assigning different categories
>> >> would
>> >> seem to break the dimensionality.
>> >>
>> >>>
>> >>> Then you must make a specific category (as the only one in
>> >>> or part of a new category combo) with just those options. It can be
>> >>> hell;
>> >>> in Tajikistan there were way over 20 categories I think, at least 10
>> >>> just
>> >>> on various age groups.
>> >>>
>> >>> Johan
>> >>>
>> >>
>> >> This was my fear.
>> >>
>> >> I will need to do some testing and see. I still fear it is not exactly
>> >> the
>> >> intended functionality.
>> >>
>> >> Basically, I think I need something akin to the exclusive/compulsory
>> >> groups
>> >> that are in place for organizational units, but instead, for arbitrary
>> >> dimensions. I will give a try and see what happens.
>> >>
>> >> Thanks,
>> >> Jason
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>
>

Follow ups

References