← Back to team overview

dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

I have done a bit more thinking on this, and would like to offer some
more examples up for discussion.

Basically, we have a lot of data elements that are somehow related to
each other, similar to my kooky example in my original mail. I assume
this is fairly common throughout other HMIS systems. Here,  malaria
attendance is broken down into various dimensions/category by patient
type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
over 5). But say you want to be able to pivot to look at outpatient,
inpatient and deaths totals (i.e. summed up by age). Well, you could
create a separate data element for this, but it sure would be nice to
be able to Pivot the data somehow.

So, I thought about the following. I have created a separate table in
my DHIS database.

I used the DHIS 2 interface to create two categories (Age  and Patient
Status) and populated these with various category options.

Now, I find the terminology very confusing. So, if you see me
interchanging terms, it is because I find the DHIS2 way of referring
to things a bit confusing. :)

Dimension ≡ Category
Dimensional element ≡ Category option ≡ Category combo ( I think)

Anyway, here is the helper table I created.

CREATE TABLE test_dataelementcategorycombo
(
  test_dataelementid integer NOT NULL,
  test_dataelementcategoryid integer NOT NULL,
  test_dataelementcategorycomboid integer NOT NULL,
  CONSTRAINT pk_testdataelementcategory PRIMARY KEY
(test_dataelementid, test_dataelementcategoryid,
test_dataelementcategorycomboid)
)
WITH (OIDS=FALSE);

So this is  a real simple table which references a data element, a
data element category, and a data element combo. The reference to a
data element category may be redundant, but anyway, lets leave it in
for now.

I populated the table with some data, which will be used to assign
dimensions to data elements. It looks like this in my DB, which looks
like this.

309;25250;25251
309;25257;25255
348;25250;25252
348;25257;25255
455;25250;25253
455;25257;25255

but of course this is meaningless to you. What do these values correspond to?

"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under 1"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age";"Age 1-5"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
Years";"Patient status";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
status";"OPD"

which can be produced by the following view.

CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
 SELECT dataelement.name, dataelementcategory.name AS dimension,
dataelementcategoryoption.name AS dimension_element
   FROM dataelement
   JOIN test_dataelementcategorycombo ON
test_dataelementcategorycombo.test_dataelementid =
dataelement.dataelementid
   JOIN dataelementcategory ON dataelementcategory.categoryid =
test_dataelementcategorycombo.test_dataelementcategoryid
   JOIN dataelementcategoryoption ON
test_dataelementcategorycombo.test_dataelementcategorycomboid =
dataelementcategoryoption.categoryoptionid;

So, that view just provides a human readable view of those integers
that I populated in the the test_dataelementcategorycombo table I
created above.  This table just assigns particular data elements to
different category options (dimensional elements).

OK, so far so good, but the problem now is, how to use this with the
aggregatedatavalue table? If we try and join this table directly, we
will have issues with duplicates in the pivot table, so we need to
transform the data slightly.

This should do the trick.

SELECT * FROM crosstab
	(
'SELECT name, dimension, dimension_element FROM
vw_dataelements_dimensions ORDER BY 1,2,3',
'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1 ASC'
	)
as
(
name character varying(230),
age character varying(160),
status character varying(160)
);


which returns this record set

"OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age 1-5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
"OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under 1";"OPD"


OK, admittedly, I cheated a bit and used the crosstab function of
Postgresql, but I assume that this query could be rewritten with a few
more lines of code in standard SQL or some procedural language like
Java. Now, this record set looks like something that I can almost use
with the aggregateddatavalue table simply by joining up the table on
the appropriate dataelementid and pulling everything into a pivot
table.  I would not have any duplicated values and would have columns
like data element name, period, orgunit, age, patient status and of
course the value of the data element.  I hope that part is pretty
clear. Just join up that table to the aggregateddata table, and you
have pretty much what is needed to pull the data directly into a
PivotTable for further analysis.

This is not a complete example, but it is very close to what I need
here ,and I think this type of functionality would be much more useful
than the current data element categories functionality. Basically, all
that would be required, at least initially, would be another user
interface screen to allow the definition of which category(ies) and
category options a data element is a member of. The rest could ,in the
first instance be executed with custom SQL (obviously, I am partial to
this language and hobbled by the fact that I do not know Java), but
eventually this would need to be implemented somehow in Java.

 I am not sure if this really solves all of the issues surrounding
multidimensional analysis of data elements, but it seems to solve the
issues that I am having by trying to assign some sort of dimensional
hierarchy to data elements (similar to the exclusive/compulsory
functionality of orgunits). Any thoughts on this?

Best regards,
Jason




On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
<jason.p.pickering@xxxxxxxxx> wrote:
>
>
> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
>>
>> >> However, there does seem to be the ability to assign dimensions, there
>> >> does
>> >> not seem to be the ability to assign particular elements within those
>> >> dimensions to a particular DHIS data element.
>>
>>
>> Just some more clarification here: you can make category combos which you
>> assign to data elements. However, it is not possible to assign just
>> specific parts of a category combo (only some of the category options) to
>> a data element.
>
> Yes, this was exactly what I wanted. Assigning different categories would
> seem to break the dimensionality.
>
>>
>> Then you must make a specific category (as the only one in
>> or part of a new category combo) with just those options. It can be hell;
>> in Tajikistan there were way over 20 categories I think, at least 10 just
>> on various age groups.
>>
>> Johan
>>
>
> This was my fear.
>
> I will need to do some testing and see. I still fear it is not exactly the
> intended functionality.
>
> Basically, I think I need something akin to the exclusive/compulsory groups
> that are in place for organizational units, but instead, for arbitrary
> dimensions. I will give a try and see what happens.
>
> Thanks,
> Jason
>
>
>
>
>



Follow ups

References