← Back to team overview

dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

Hi Jason and Johan

I'm really pleased to see you having this discussion as I have been
grappling with a similar issue which involves unravelling categories,
category options and combos into something more familiar.  I have reached
similar conclusions regarding nomenclature:

category = dimension
categorycombo - I have been calling a dimension set (it bears a strong, and
useful, resemblance to xslt:attribute-set)
category option - I like your suggestion of DimensionalElement.  I am going
to start calling it that too.

In my case I need to export (and import data) into a standard format called
sdmx.  So whereas in the DHIS2 native DXF we export datavalues with
effectively three dimensions (source, period, categorycombooption) the last
dimension is a sort of uber-dimension.  Like a peppercorn or a cardamon
seed, when you break it open it explodes its rich complexity of dimensions.

In sdmx we need the dimensions exploded.  So data values look like:

<dataset>
  <datavalue name="TB test given" uid="44344 ...44" gender="Male" age="0-5"
value="32" />
   <datavalue name="TB test given" uid="44344 ...44" gender="Female"
age="0-5" value="38" />
..
</dataset>

My approach to unpicking the dimensions from the dxf file is to transform it
with an xslt transformation which is still incomplete but seems to work
well.

One other nomenclatures issue which has surfaced as a result is what we call
a "dataset".  In DHIS2, if I understand correctly, a dataset corresponds
roughly to all the dataelements which might occur on a datacollection form.
If we view all dataelements as having just the three "dimensions" then all
is well, but if we explode the actual dimensions then we have an issue.  In
the sdmx model a dataset consists only of dataelements with the same
dimensionset.  After discussing this with Ola we have reached the conclusion
that we need another level of grouping, primarily for the UI - eg FormSet or
ReportSet which allows us to group related datasets.  But that is an aside
from what you are talking about.

I know that you guys can do magic with sql, but it seems that we should try
to capture some of this and place it down in the datamodel API.  It occurs
to me that for a multidimensional dataelement we might benefit from some
utility methods to retrieve slices and dices which might assist in
constructing the pivot tables around dimensions.  Does this sound like the
right thing to do.

Regards
Bob

2009/9/24 <johansa@xxxxxxxxxx>

> Jason,
> I will leave to others to comment the code, but I have a few comments...
>
> > I have done a bit more thinking on this, and would like to offer some
> > more examples up for discussion.
> >
> > Basically, we have a lot of data elements that are somehow related to
> > each other, similar to my kooky example in my original mail. I assume
> > this is fairly common throughout other HMIS systems. Here,  malaria
> > attendance is broken down into various dimensions/category by patient
> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
> > over 5). But say you want to be able to pivot to look at outpatient,
> > inpatient and deaths totals (i.e. summed up by age). Well, you could
> > create a separate data element for this, but it sure would be nice to
> > be able to Pivot the data somehow.
>
> In the Sierra Leone db, Edem and Romain set up views that pulled the
> categories through into a "Category" pivot field, which you can then use
> to get what you want. Simply tick the categories (see below) you want to
> see, and group them together in excel. Maybe Edem and Romain can help
> further here.
>
>
> > Dimension ? Category
> > Dimensional element ? Category option ? Category combo ( I think)
>
> The right symbol disappeared from my reply-mail here, but some
> clarification:
>
> Crosstab Dimension (age AND gender) = Category combo
> Dimension (age, gender) = Category
> Dimensional element (inpatient, outpatient, death, under1, 1-5, and over
> 5) = Category option
>
> So by assigning a DE the category combo of "gender_age", you get 9
> dimensional elements, 3 category options (in category age) by 3 category
> options (in category gender)
>
> Johan
>
>
>
>
> > Anyway, here is the helper table I created.
> >
> > CREATE TABLE test_dataelementcategorycombo
> > (
> >   test_dataelementid integer NOT NULL,
> >   test_dataelementcategoryid integer NOT NULL,
> >   test_dataelementcategorycomboid integer NOT NULL,
> >   CONSTRAINT pk_testdataelementcategory PRIMARY KEY
> > (test_dataelementid, test_dataelementcategoryid,
> > test_dataelementcategorycomboid)
> > )
> > WITH (OIDS=FALSE);
> >
> > So this is  a real simple table which references a data element, a
> > data element category, and a data element combo. The reference to a
> > data element category may be redundant, but anyway, lets leave it in
> > for now.
> >
> > I populated the table with some data, which will be used to assign
> > dimensions to data elements. It looks like this in my DB, which looks
> > like this.
> >
> > 309;25250;25251
> > 309;25257;25255
> > 348;25250;25252
> > 348;25257;25255
> > 455;25250;25253
> > 455;25257;25255
> >
> > but of course this is meaningless to you. What do these values correspond
> > to?
> >
> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Age";"Under
> 1"
> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
> > Years";"Age";"Age 1-5"
> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over 5"
> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
> > status";"OPD"
> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
> > Years";"Patient status";"OPD"
> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
> > status";"OPD"
> >
> > which can be produced by the following view.
> >
> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
> >  SELECT dataelement.name, dataelementcategory.name AS dimension,
> > dataelementcategoryoption.name AS dimension_element
> >    FROM dataelement
> >    JOIN test_dataelementcategorycombo ON
> > test_dataelementcategorycombo.test_dataelementid =
> > dataelement.dataelementid
> >    JOIN dataelementcategory ON dataelementcategory.categoryid =
> > test_dataelementcategorycombo.test_dataelementcategoryid
> >    JOIN dataelementcategoryoption ON
> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
> > dataelementcategoryoption.categoryoptionid;
> >
> > So, that view just provides a human readable view of those integers
> > that I populated in the the test_dataelementcategorycombo table I
> > created above.  This table just assigns particular data elements to
> > different category options (dimensional elements).
> >
> > OK, so far so good, but the problem now is, how to use this with the
> > aggregatedatavalue table? If we try and join this table directly, we
> > will have issues with duplicates in the pivot table, so we need to
> > transform the data slightly.
> >
> > This should do the trick.
> >
> > SELECT * FROM crosstab
> >       (
> > 'SELECT name, dimension, dimension_element FROM
> > vw_dataelements_dimensions ORDER BY 1,2,3',
> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
> ASC'
> >       )
> > as
> > (
> > name character varying(230),
> > age character varying(160),
> > status character varying(160)
> > );
> >
> >
> > which returns this record set
> >
> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
> > 1-5";"OPD"
> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over 5";"OPD"
> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
> 1";"OPD"
> >
> >
> > OK, admittedly, I cheated a bit and used the crosstab function of
> > Postgresql, but I assume that this query could be rewritten with a few
> > more lines of code in standard SQL or some procedural language like
> > Java. Now, this record set looks like something that I can almost use
> > with the aggregateddatavalue table simply by joining up the table on
> > the appropriate dataelementid and pulling everything into a pivot
> > table.  I would not have any duplicated values and would have columns
> > like data element name, period, orgunit, age, patient status and of
> > course the value of the data element.  I hope that part is pretty
> > clear. Just join up that table to the aggregateddata table, and you
> > have pretty much what is needed to pull the data directly into a
> > PivotTable for further analysis.
> >
> > This is not a complete example, but it is very close to what I need
> > here ,and I think this type of functionality would be much more useful
> > than the current data element categories functionality. Basically, all
> > that would be required, at least initially, would be another user
> > interface screen to allow the definition of which category(ies) and
> > category options a data element is a member of. The rest could ,in the
> > first instance be executed with custom SQL (obviously, I am partial to
> > this language and hobbled by the fact that I do not know Java), but
> > eventually this would need to be implemented somehow in Java.
> >
> >  I am not sure if this really solves all of the issues surrounding
> > multidimensional analysis of data elements, but it seems to solve the
> > issues that I am having by trying to assign some sort of dimensional
> > hierarchy to data elements (similar to the exclusive/compulsory
> > functionality of orgunits). Any thoughts on this?
> >
> > Best regards,
> > Jason
> >
> >
> >
> >
> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
> > <jason.p.pickering@xxxxxxxxx> wrote:
> >>
> >>
> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
> >>>
> >>> >> However, there does seem to be the ability to assign dimensions,
> >>> there
> >>> >> does
> >>> >> not seem to be the ability to assign particular elements within
> >>> those
> >>> >> dimensions to a particular DHIS data element.
> >>>
> >>>
> >>> Just some more clarification here: you can make category combos which
> >>> you
> >>> assign to data elements. However, it is not possible to assign just
> >>> specific parts of a category combo (only some of the category options)
> >>> to
> >>> a data element.
> >>
> >> Yes, this was exactly what I wanted. Assigning different categories
> >> would
> >> seem to break the dimensionality.
> >>
> >>>
> >>> Then you must make a specific category (as the only one in
> >>> or part of a new category combo) with just those options. It can be
> >>> hell;
> >>> in Tajikistan there were way over 20 categories I think, at least 10
> >>> just
> >>> on various age groups.
> >>>
> >>> Johan
> >>>
> >>
> >> This was my fear.
> >>
> >> I will need to do some testing and see. I still fear it is not exactly
> >> the
> >> intended functionality.
> >>
> >> Basically, I think I need something akin to the exclusive/compulsory
> >> groups
> >> that are in place for organizational units, but instead, for arbitrary
> >> dimensions. I will give a try and see what happens.
> >>
> >> Thanks,
> >> Jason
> >>
> >>
> >>
> >>
> >>
> >
>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> More help   : https://help.launchpad.net/ListHelp
>

Follow ups

References