← Back to team overview

dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

2009/9/25 Jason Pickering <jason.p.pickering@xxxxxxxxx>

> Hi there.
>
> My basic issue with the category/category combo is that it appears to be a
> one-to-one relationship with data elements. If I look at the data model,
> there is a one-to-one relationship between dataelement and categorycomboid.
> For a given category combo, you can have multiple options. So, you can
> establish a relationship for a given data element and a group of category
> options.
>
> Let me try and describe the issue. We have a set of data elements related
> to malaria for this example. We would like to be able to pivot the data on
> other dimensions dimensions (Data element, age, disease, patient status).
> Obviously there are other dimensions that are pivotable (orgunit, period,
> dataset)
>

Aren't your diseases already in the data element name? While data element is
one of many dimensions e.g. in a pivot table it is more than just a
dimension in the DHIS model I guess, so it is important to be consistent
breaking up "name of what you capture (e.g. 1.4 data element names) into
data element + categories. Diseases are normally what is repeated across
many datasets so I would use that as the Data element. But that is a design
consideration I guess.



>
> The data elements look like this. I have put the dimensions in square
> brackets, and the dimensional elements into curly brackets.
>
> [Data element, Age, Disease, Patient status]
> Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
> Deaths}
> Deaths Confirmed Malaria 1 to Under 5 Years  {1-5, Malaria Cases, Deaths}
> Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
> Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
> IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria
> Cases, IP}
> IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
> IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
> IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
> Deaths}
> OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria
> Cases, OPD}
> OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria
> Cases, OPD}
> OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
> OPD}
> OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
> OPD}
>

OK,  I hope this is pretty clear. Obviously, there are more data elements
> (Typhoid, Yellow fever, etc).

I might want to know how many Under 1 deaths I have had for all diseases, or
> how many OPD cases I have had for each disease. How can I do this with the
> existing data model? It is not obvious to me because there is no
> relationship between dimensional elements (categoryoptions) to each other.
> Category options can be related through a cateogry combination, but since
> data elements can only be assigned a single category option, the
> dimensionality is broken once it gets time to pull the data into a pivot
> table.
>
>
Hi Jason,

Thanks for all these examples and suggestions. I have been waiting for users
to request dimensional data analysis for a long time, as it is something
that has been neglected in DHIS so far. There is functionality to capture
data linked to categoryoptioncombos (Male, <5),
(Female,<5),(Male,>5),(Female,>5), but there is no functionality yet
(although blueprints exist) to use these dimensions in report tables, or in
data mart in general. Currently you can get the total for the data element
or the flattened out value (data element + categoryoptioncombo, as above.
After discussing with Bob, and I know he has investigated the model in
detail I understand that it is possible to bring categories and category
options into e.g. a pivot table or report table and analyse data elements
across these, e.g. data element + male, data element + <5, etc. So there
must (I hope) be something in the database that you're not seeing that can
help you pull this into a pivot table. You SHOULD be able to bring in
categories as your  pivot fields (dimensions), and for each of them list
their category options (dimensional elements). In my example there should be
at least three dimensions (with options): data element name (Malaria,
Typhoid, Tuberculosis, gender (male, female, age (<5,>5).

At least according to Bob this is possible from the API, I hope also from
the database directly. We want to make this available in data mart tables or
report tables as soon as possible I think.

There are some minor changes to the model that are needed in order to get
here. Bob mentioned datasets. To be able to use the multidimensional model
for anything else than data entry in a custom form we need to make it a
constraint that one data set can only have 1 categorycombo. I guess this
will lead to more datasets than we have today in most places at least, and
it breaks the 1-1 relationship between data set and data entry form that we
have today. We need to look at forms as a separate object that can hold 1 or
more data sets. This can e.g. allow for a data entry form with multiple
autogenerated tables based on the datasets' categorycombo (like we have
today for datasets that actually have only 1 categoprycombo.

Another important constraint I see is to limit the use of dimensional data
elements to use cases where there is a hierarchy of data values that
actually match up to a total. E.g. the categorycombo gender+age with
datavalues for (Male, <5), (Female,<5),(Male,>5),(Female,>5) match up to a
total of all cases for that data element. I know that the model has been
used as a tool to shorten data element names without any logical hierarchy
where values add up. E.g. BCG + "in stock", BCG "doses wasted", BCG + "Doses
received". Although I can see the point of reducing data element names I
this exampel is more of a presentation or view problem and this break up can
be sorted out in the data entry form design, and should not be in the data
element model as it makes no sense for aggregation and data analysis to have
those breakups.

I also support a name change of the multidimensional model. I prefer
dimensions, dimensional elements(or options), and dimension  sets.



>  In the incomplete example that I gave yesterday, I established a
> one-to-many relationship between a data element and a dimension. If I
> understand the current data model, I would have to create a separate
> categorycombo  for each of these data elements, and assign this
> categorycombo to the data element. Now, I might be able to unfold the
> dimensions using the categories and categorycombos. I it is not apparent how
> the dimensional elements correspond themselves to a particular dimension, as
> there is no relation for this in the database as I can see it.
>
>  As (Johan pointed out a few mails ago, if I understand him correctly) is
> different categorycombo's can be created for individual data elements, and
> assigned to these elements. However, this seems to be 1) incredibly
> inefficient and 2) does not establish any relationship between dimensional
> elements and dimensions. Perhaps it is there, and maybe it has been done in
> SL, but the SQL is not apparent to me at all.
>
>
Categorycombos will be assigned to a whole datasets of data elements sharing
the dimension set. We should look into some kind of bulk update of data
elements in the UI where you can apply one categorycombo to a list of ticked
data elements or something.



>  It would appear to me, looking from an SQL perspective, that a one-to-many
> relationship between a data element, a dimension (category) and dimensional
> element (category combo) would be much more effieicnet, and highly usable
> from an SQL perspective. As I mentioned in my mail, I am not sure how easy
> this would be to implement in a procedural language like Java, but I assume
> it should be possible to either do it this way, or rewrite my Postgres
> proprietary query in standard SQL (which there are ways to do with ANSI
> SQL).  This would require modification to the data model (similar to the
> table I provided yesterday) and modification to the UI to allow users to 1)
> select a dimension (category) 2) Select a dimensional element for the given
> dimension.  This would populate the table with a dataelementid, a
> dimensionid (categoryid) and a dimensional element (cateogryoptionid).
>
> My gut feeling this is exactly the same functionality as has currently been
> implemented for organizational units. Users can define a hierarchy for
> organizational units, and then assign them to
> categories/dimension/organizational group sets, decide whether the groups
> are compulsory and exlusive, and then assign a particular organizational
> unit to a particular group (which is analogous to a dimensional element).
> Organizational group sets define the dimension, and one-to-one assignment of
> an organizational unit to a particular organizational group defines which
> dimensional element the organizational unit is a member of. These dimensions
> can then be used in PivotTable analyses, where the orgunitgroupsets become
> dimensions, and orgunitgroups become dimensional elements..
>
> I beleive that data elements are no different than organizational units.
> They should be able to be grouped into some sort of hierarchy and pivoted on
> any dimension.  Data elements groups establish a one-to-many relationship
> between data elements and a data element group, but there is no concept of
> how data element groups relate to each other.
> I think this is perhaps the same concept you mention, ReportSet.
>
>
I think data element groups should play a role of more broader dimensions,
not linked to how data values are captured, but reflect themes or
collections of data that makes sense to look at together. This means
groupings across datasets (or across paper forms as we often like to see
it). This implies that we probably would not like to use the data element
categories and groups together i.e a pivot table, but in stead look at data
elements totals (sum of all categories) and groups only since the data
elements would not necessarily share categories. This makes sense e.g. when
doing correlation beetween antenatal care, deliveries and immunisation,
following the life cycle of clients. These data elements do not share age or
gender break downs, but still are very useful to analyse together and a
group would help to quickly show these data elements together in a table or
report.

So for this reason I am a bit reluctant to merge data element
categories/dimensions and groups. I know that a lot of data analysis does
not require all the break downs of age and gender etc. and its always a
battle whether you put them in or not. The way we can easily hide all that
detail by just looking at the total of each data element without any
categories and still add other useful groupings provides flexibility I
think.

Totally agree with Knut that we need to document much better what we mean by
data set, data element group, data element category, and what the intended
use is. To an outsider they might all mean the same thing, but inside DHIS
they serve specific needs.


> I suspect we would need to potentially rethink the entire concept of
> multidimensionality if we really wanted to get it right. It would see to me
> that the DHIS datamodel and associated aggregation methods have been
> hardwired into aggregation across time (period) and geography (orgunit).
> What we can do with PivotTables and (and OLAP) is to aggregate across any
> possible dimension, slicing as you mention ,on any dimension . I am not sure
> this will be so simple to implement but I think there is a way to do it,
> without major modifications.
>
> I am not sure it solves the SDMX issue. There are potential issues related
> to "ragged" dimensions and how these get handled. Some data elements might
> have three dimensions, while others may have more. I have not thought about
> this in detail, but know it is an issue with cross-tab queries in SQL. You
> normally have to know how many dimensions you are working with in order to
> perform a cross-tab, but there are dynamic solutions. Perhaps this could be
> dealt with somehow in SDMX.
>
> From Bob I understand that this will not be a problem as every dataset
would define its own "structure" that will be used for exchanging data on
that specific set. Bob can fill in..


> Anyway ,I am rambling. Hope this mail helps though to push my point
> further. Once I get the SQL from SL, I will see if perhaps it has been done
> already, and that I am just writing long emails for nothing. :)
>
> Regards,
> Jason
>
>
>
While writing this very long email I noticed that Bob has replied
already....

Ola
---------

>
>
>
> On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
> > Hi Jason and Johan
> >
> > I'm really pleased to see you having this discussion as I have been
> > grappling with a similar issue which involves unravelling categories,
> > category options and combos into something more familiar.  I have reached
> > similar conclusions regarding nomenclature:
> >
> > category = dimension
> > categorycombo - I have been calling a dimension set (it bears a strong,
> and
> > useful, resemblance to xslt:attribute-set)
> > category option - I like your suggestion of DimensionalElement.  I am
> going
> > to start calling it that too.
> >
> > In my case I need to export (and import data) into a standard format
> called
> > sdmx.  So whereas in the DHIS2 native DXF we export datavalues with
> > effectively three dimensions (source, period, categorycombooption) the
> last
> > dimension is a sort of uber-dimension.  Like a peppercorn or a cardamon
> > seed, when you break it open it explodes its rich complexity of
> dimensions.
> >
> > In sdmx we need the dimensions exploded.  So data values look like:
> >
> > <dataset>
> >   <datavalue name="TB test given" uid="44344 ...44" gender="Male"
> age="0-5"
> > value="32" />
> >    <datavalue name="TB test given" uid="44344 ...44" gender="Female"
> > age="0-5" value="38" />
> > ..
> > </dataset>
> >
> > My approach to unpicking the dimensions from the dxf file is to transform
> it
> > with an xslt transformation which is still incomplete but seems to work
> > well.
> >
> > One other nomenclatures issue which has surfaced as a result is what we
> call
> > a "dataset".  In DHIS2, if I understand correctly, a dataset corresponds
> > roughly to all the dataelements which might occur on a datacollection
> form.
> > If we view all dataelements as having just the three "dimensions" then
> all
> > is well, but if we explode the actual dimensions then we have an issue.
> In
> > the sdmx model a dataset consists only of dataelements with the same
> > dimensionset.  After discussing this with Ola we have reached the
> conclusion
> > that we need another level of grouping, primarily for the UI - eg FormSet
> or
> > ReportSet which allows us to group related datasets.  But that is an
> aside
> > from what you are talking about.
> >
> > I know that you guys can do magic with sql, but it seems that we should
> try
> > to capture some of this and place it down in the datamodel API.  It
> occurs
> > to me that for a multidimensional dataelement we might benefit from some
> > utility methods to retrieve slices and dices which might assist in
> > constructing the pivot tables around dimensions.  Does this sound like
> the
> > right thing to do.
> >
> > Regards
> > Bob
> >
> > 2009/9/24 <johansa@xxxxxxxxxx>
> >>
> >> Jason,
> >> I will leave to others to comment the code, but I have a few comments...
> >>
> >> > I have done a bit more thinking on this, and would like to offer some
> >> > more examples up for discussion.
> >> >
> >> > Basically, we have a lot of data elements that are somehow related to
> >> > each other, similar to my kooky example in my original mail. I assume
> >> > this is fairly common throughout other HMIS systems. Here,  malaria
> >> > attendance is broken down into various dimensions/category by patient
> >> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
> >> > over 5). But say you want to be able to pivot to look at outpatient,
> >> > inpatient and deaths totals (i.e. summed up by age). Well, you could
> >> > create a separate data element for this, but it sure would be nice to
> >> > be able to Pivot the data somehow.
> >>
> >> In the Sierra Leone db, Edem and Romain set up views that pulled the
> >> categories through into a "Category" pivot field, which you can then use
> >> to get what you want. Simply tick the categories (see below) you want to
> >> see, and group them together in excel. Maybe Edem and Romain can help
> >> further here.
> >>
> >>
> >> > Dimension ? Category
> >> > Dimensional element ? Category option ? Category combo ( I think)
> >>
> >> The right symbol disappeared from my reply-mail here, but some
> >> clarification:
> >>
> >> Crosstab Dimension (age AND gender) = Category combo
> >> Dimension (age, gender) = Category
> >> Dimensional element (inpatient, outpatient, death, under1, 1-5, and over
> >> 5) = Category option
> >>
> >> So by assigning a DE the category combo of "gender_age", you get 9
> >> dimensional elements, 3 category options (in category age) by 3 category
> >> options (in category gender)
> >>
> >> Johan
> >>
> >>
> >>
> >>
> >> > Anyway, here is the helper table I created.
> >> >
> >> > CREATE TABLE test_dataelementcategorycombo
> >> > (
> >> >   test_dataelementid integer NOT NULL,
> >> >   test_dataelementcategoryid integer NOT NULL,
> >> >   test_dataelementcategorycomboid integer NOT NULL,
> >> >   CONSTRAINT pk_testdataelementcategory PRIMARY KEY
> >> > (test_dataelementid, test_dataelementcategoryid,
> >> > test_dataelementcategorycomboid)
> >> > )
> >> > WITH (OIDS=FALSE);
> >> >
> >> > So this is  a real simple table which references a data element, a
> >> > data element category, and a data element combo. The reference to a
> >> > data element category may be redundant, but anyway, lets leave it in
> >> > for now.
> >> >
> >> > I populated the table with some data, which will be used to assign
> >> > dimensions to data elements. It looks like this in my DB, which looks
> >> > like this.
> >> >
> >> > 309;25250;25251
> >> > 309;25257;25255
> >> > 348;25250;25252
> >> > 348;25257;25255
> >> > 455;25250;25253
> >> > 455;25257;25255
> >> >
> >> > but of course this is meaningless to you. What do these values
> >> > correspond
> >> > to?
> >> >
> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1
> Year";"Age";"Under
> >> > 1"
> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
> >> > Years";"Age";"Age 1-5"
> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over
> >> > 5"
> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
> >> > status";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
> >> > Years";"Patient status";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
> >> > status";"OPD"
> >> >
> >> > which can be produced by the following view.
> >> >
> >> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
> >> >  SELECT dataelement.name, dataelementcategory.name AS dimension,
> >> > dataelementcategoryoption.name AS dimension_element
> >> >    FROM dataelement
> >> >    JOIN test_dataelementcategorycombo ON
> >> > test_dataelementcategorycombo.test_dataelementid =
> >> > dataelement.dataelementid
> >> >    JOIN dataelementcategory ON dataelementcategory.categoryid =
> >> > test_dataelementcategorycombo.test_dataelementcategoryid
> >> >    JOIN dataelementcategoryoption ON
> >> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
> >> > dataelementcategoryoption.categoryoptionid;
> >> >
> >> > So, that view just provides a human readable view of those integers
> >> > that I populated in the the test_dataelementcategorycombo table I
> >> > created above.  This table just assigns particular data elements to
> >> > different category options (dimensional elements).
> >> >
> >> > OK, so far so good, but the problem now is, how to use this with the
> >> > aggregatedatavalue table? If we try and join this table directly, we
> >> > will have issues with duplicates in the pivot table, so we need to
> >> > transform the data slightly.
> >> >
> >> > This should do the trick.
> >> >
> >> > SELECT * FROM crosstab
> >> >       (
> >> > 'SELECT name, dimension, dimension_element FROM
> >> > vw_dataelements_dimensions ORDER BY 1,2,3',
> >> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
> >> > ASC'
> >> >       )
> >> > as
> >> > (
> >> > name character varying(230),
> >> > age character varying(160),
> >> > status character varying(160)
> >> > );
> >> >
> >> >
> >> > which returns this record set
> >> >
> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
> >> > 1-5";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
> >> > 5";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
> >> > 1";"OPD"
> >> >
> >> >
> >> > OK, admittedly, I cheated a bit and used the crosstab function of
> >> > Postgresql, but I assume that this query could be rewritten with a few
> >> > more lines of code in standard SQL or some procedural language like
> >> > Java. Now, this record set looks like something that I can almost use
> >> > with the aggregateddatavalue table simply by joining up the table on
> >> > the appropriate dataelementid and pulling everything into a pivot
> >> > table.  I would not have any duplicated values and would have columns
> >> > like data element name, period, orgunit, age, patient status and of
> >> > course the value of the data element.  I hope that part is pretty
> >> > clear. Just join up that table to the aggregateddata table, and you
> >> > have pretty much what is needed to pull the data directly into a
> >> > PivotTable for further analysis.
> >> >
> >> > This is not a complete example, but it is very close to what I need
> >> > here ,and I think this type of functionality would be much more useful
> >> > than the current data element categories functionality. Basically, all
> >> > that would be required, at least initially, would be another user
> >> > interface screen to allow the definition of which category(ies) and
> >> > category options a data element is a member of. The rest could ,in the
> >> > first instance be executed with custom SQL (obviously, I am partial to
> >> > this language and hobbled by the fact that I do not know Java), but
> >> > eventually this would need to be implemented somehow in Java.
> >> >
> >> >  I am not sure if this really solves all of the issues surrounding
> >> > multidimensional analysis of data elements, but it seems to solve the
> >> > issues that I am having by trying to assign some sort of dimensional
> >> > hierarchy to data elements (similar to the exclusive/compulsory
> >> > functionality of orgunits). Any thoughts on this?
> >> >
> >> > Best regards,
> >> > Jason
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
> >> > <jason.p.pickering@xxxxxxxxx> wrote:
> >> >>
> >> >>
> >> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
> >> >>>
> >> >>> >> However, there does seem to be the ability to assign dimensions,
> >> >>> there
> >> >>> >> does
> >> >>> >> not seem to be the ability to assign particular elements within
> >> >>> those
> >> >>> >> dimensions to a particular DHIS data element.
> >> >>>
> >> >>>
> >> >>> Just some more clarification here: you can make category combos
> which
> >> >>> you
> >> >>> assign to data elements. However, it is not possible to assign just
> >> >>> specific parts of a category combo (only some of the category
> options)
> >> >>> to
> >> >>> a data element.
> >> >>
> >> >> Yes, this was exactly what I wanted. Assigning different categories
> >> >> would
> >> >> seem to break the dimensionality.
> >> >>
> >> >>>
> >> >>> Then you must make a specific category (as the only one in
> >> >>> or part of a new category combo) with just those options. It can be
> >> >>> hell;
> >> >>> in Tajikistan there were way over 20 categories I think, at least 10
> >> >>> just
> >> >>> on various age groups.
> >> >>>
> >> >>> Johan
> >> >>>
> >> >>
> >> >> This was my fear.
> >> >>
> >> >> I will need to do some testing and see. I still fear it is not
> exactly
> >> >> the
> >> >> intended functionality.
> >> >>
> >> >> Basically, I think I need something akin to the exclusive/compulsory
> >> >> groups
> >> >> that are in place for organizational units, but instead, for
> arbitrary
> >> >> dimensions. I will give a try and see what happens.
> >> >>
> >> >> Thanks,
> >> >> Jason
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> _______________________________________________
> >> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> >> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> >> More help   : https://help.launchpad.net/ListHelp
> >
> >
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> More help   : https://help.launchpad.net/ListHelp
>
>

References