← Back to team overview

dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

The one-to-one relationship mentioned between dataelement and categorycombo
is not correct !

The realtionship is one-to-many. A categorycombo can be assigned for many
dataelements. But a dataelement can have only one categorycombo.

Thank you
Abyot.

On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering <
jason.p.pickering@xxxxxxxxx> wrote:

> Hi there.
>
> My basic issue with the category/category combo is that it appears to be a
> one-to-one relationship with data elements. If I look at the data model,
> there is a one-to-one relationship between dataelement and categorycomboid.
> For a given category combo, you can have multiple options. So, you can
> establish a relationship for a given data element and a group of category
> options.
>
> Let me try and describe the issue. We have a set of data elements related
> to malaria for this example. We would like to be able to pivot the data on
> other dimensions dimensions (Data element, age, disease, patient status).
> Obviously there are other dimensions that are pivotable (orgunit, period,
> dataset)
>
> The data elements look like this. I have put the dimensions in square
> brackets, and the dimensional elements into curly brackets.
>
> [Data element, Age, Disease, Patient status]
> Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
> Deaths}
> Deaths Confirmed Malaria 1 to Under 5 Years  {1-5, Malaria Cases, Deaths}
> Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
> Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
> IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria
> Cases, IP}
> IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
> IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
> IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
> Deaths}
> OPD 1st Attendance Confirmed Malaria total (composed of) {All ages, Malaria
> Cases, OPD}
> OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria
> Cases, OPD}
> OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
> OPD}
> OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
> OPD}
>
> OK,  I hope this is pretty clear. Obviously, there are more data elements
> (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I
> have had for all diseases, or how many OPD cases I have had for each
> disease. How can I do this with the existing data model? It is not obvious
> to me because there is no relationship between dimensional elements
> (categoryoptions) to each other. Category options can be related through a
> cateogry combination, but since data elements can only be assigned a single
> category option, the dimensionality is broken once it gets time to pull the
> data into a pivot table.
>
>  In the incomplete example that I gave yesterday, I established a
> one-to-many relationship between a data element and a dimension. If I
> understand the current data model, I would have to create a separate
> categorycombo  for each of these data elements, and assign this
> categorycombo to the data element. Now, I might be able to unfold the
> dimensions using the categories and categorycombos. I it is not apparent how
> the dimensional elements correspond themselves to a particular dimension, as
> there is no relation for this in the database as I can see it.
>
>  As (Johan pointed out a few mails ago, if I understand him correctly) is
> different categorycombo's can be created for individual data elements, and
> assigned to these elements. However, this seems to be 1) incredibly
> inefficient and 2) does not establish any relationship between dimensional
> elements and dimensions. Perhaps it is there, and maybe it has been done in
> SL, but the SQL is not apparent to me at all.
>
>  It would appear to me, looking from an SQL perspective, that a one-to-many
> relationship between a data element, a dimension (category) and dimensional
> element (category combo) would be much more effieicnet, and highly usable
> from an SQL perspective. As I mentioned in my mail, I am not sure how easy
> this would be to implement in a procedural language like Java, but I assume
> it should be possible to either do it this way, or rewrite my Postgres
> proprietary query in standard SQL (which there are ways to do with ANSI
> SQL).  This would require modification to the data model (similar to the
> table I provided yesterday) and modification to the UI to allow users to 1)
> select a dimension (category) 2) Select a dimensional element for the given
> dimension.  This would populate the table with a dataelementid, a
> dimensionid (categoryid) and a dimensional element (cateogryoptionid).
>
> My gut feeling this is exactly the same functionality as has currently been
> implemented for organizational units. Users can define a hierarchy for
> organizational units, and then assign them to
> categories/dimension/organizational group sets, decide whether the groups
> are compulsory and exlusive, and then assign a particular organizational
> unit to a particular group (which is analogous to a dimensional element).
> Organizational group sets define the dimension, and one-to-one assignment of
> an organizational unit to a particular organizational group defines which
> dimensional element the organizational unit is a member of. These dimensions
> can then be used in PivotTable analyses, where the orgunitgroupsets become
> dimensions, and orgunitgroups become dimensional elements..
>
> I beleive that data elements are no different than organizational units.
> They should be able to be grouped into some sort of hierarchy and pivoted on
> any dimension.  Data elements groups establish a one-to-many relationship
> between data elements and a data element group, but there is no concept of
> how data element groups relate to each other.
> I think this is perhaps the same concept you mention, ReportSet.
>
> I suspect we would need to potentially rethink the entire concept of
> multidimensionality if we really wanted to get it right. It would see to me
> that the DHIS datamodel and associated aggregation methods have been
> hardwired into aggregation across time (period) and geography (orgunit).
> What we can do with PivotTables and (and OLAP) is to aggregate across any
> possible dimension, slicing as you mention ,on any dimension . I am not sure
> this will be so simple to implement but I think there is a way to do it,
> without major modifications.
>
> I am not sure it solves the SDMX issue. There are potential issues related
> to "ragged" dimensions and how these get handled. Some data elements might
> have three dimensions, while others may have more. I have not thought about
> this in detail, but know it is an issue with cross-tab queries in SQL. You
> normally have to know how many dimensions you are working with in order to
> perform a cross-tab, but there are dynamic solutions. Perhaps this could be
> dealt with somehow in SDMX.
>
> Anyway ,I am rambling. Hope this mail helps though to push my point
> further. Once I get the SQL from SL, I will see if perhaps it has been done
> already, and that I am just writing long emails for nothing. :)
>
> Regards,
> Jason
>
>
>
>
>
> On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
> > Hi Jason and Johan
> >
> > I'm really pleased to see you having this discussion as I have been
> > grappling with a similar issue which involves unravelling categories,
> > category options and combos into something more familiar.  I have reached
> > similar conclusions regarding nomenclature:
> >
> > category = dimension
> > categorycombo - I have been calling a dimension set (it bears a strong,
> and
> > useful, resemblance to xslt:attribute-set)
> > category option - I like your suggestion of DimensionalElement.  I am
> going
> > to start calling it that too.
> >
> > In my case I need to export (and import data) into a standard format
> called
> > sdmx.  So whereas in the DHIS2 native DXF we export datavalues with
> > effectively three dimensions (source, period, categorycombooption) the
> last
> > dimension is a sort of uber-dimension.  Like a peppercorn or a cardamon
> > seed, when you break it open it explodes its rich complexity of
> dimensions.
> >
> > In sdmx we need the dimensions exploded.  So data values look like:
> >
> > <dataset>
> >   <datavalue name="TB test given" uid="44344 ...44" gender="Male"
> age="0-5"
> > value="32" />
> >    <datavalue name="TB test given" uid="44344 ...44" gender="Female"
> > age="0-5" value="38" />
> > ..
> > </dataset>
> >
> > My approach to unpicking the dimensions from the dxf file is to transform
> it
> > with an xslt transformation which is still incomplete but seems to work
> > well.
> >
> > One other nomenclatures issue which has surfaced as a result is what we
> call
> > a "dataset".  In DHIS2, if I understand correctly, a dataset corresponds
> > roughly to all the dataelements which might occur on a datacollection
> form.
> > If we view all dataelements as having just the three "dimensions" then
> all
> > is well, but if we explode the actual dimensions then we have an issue.
> In
> > the sdmx model a dataset consists only of dataelements with the same
> > dimensionset.  After discussing this with Ola we have reached the
> conclusion
> > that we need another level of grouping, primarily for the UI - eg FormSet
> or
> > ReportSet which allows us to group related datasets.  But that is an
> aside
> > from what you are talking about.
> >
> > I know that you guys can do magic with sql, but it seems that we should
> try
> > to capture some of this and place it down in the datamodel API.  It
> occurs
> > to me that for a multidimensional dataelement we might benefit from some
> > utility methods to retrieve slices and dices which might assist in
> > constructing the pivot tables around dimensions.  Does this sound like
> the
> > right thing to do.
> >
> > Regards
> > Bob
> >
> > 2009/9/24 <johansa@xxxxxxxxxx>
> >>
> >> Jason,
> >> I will leave to others to comment the code, but I have a few comments...
> >>
> >> > I have done a bit more thinking on this, and would like to offer some
> >> > more examples up for discussion.
> >> >
> >> > Basically, we have a lot of data elements that are somehow related to
> >> > each other, similar to my kooky example in my original mail. I assume
> >> > this is fairly common throughout other HMIS systems. Here,  malaria
> >> > attendance is broken down into various dimensions/category by patient
> >> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
> >> > over 5). But say you want to be able to pivot to look at outpatient,
> >> > inpatient and deaths totals (i.e. summed up by age). Well, you could
> >> > create a separate data element for this, but it sure would be nice to
> >> > be able to Pivot the data somehow.
> >>
> >> In the Sierra Leone db, Edem and Romain set up views that pulled the
> >> categories through into a "Category" pivot field, which you can then use
> >> to get what you want. Simply tick the categories (see below) you want to
> >> see, and group them together in excel. Maybe Edem and Romain can help
> >> further here.
> >>
> >>
> >> > Dimension ? Category
> >> > Dimensional element ? Category option ? Category combo ( I think)
> >>
> >> The right symbol disappeared from my reply-mail here, but some
> >> clarification:
> >>
> >> Crosstab Dimension (age AND gender) = Category combo
> >> Dimension (age, gender) = Category
> >> Dimensional element (inpatient, outpatient, death, under1, 1-5, and over
> >> 5) = Category option
> >>
> >> So by assigning a DE the category combo of "gender_age", you get 9
> >> dimensional elements, 3 category options (in category age) by 3 category
> >> options (in category gender)
> >>
> >> Johan
> >>
> >>
> >>
> >>
> >> > Anyway, here is the helper table I created.
> >> >
> >> > CREATE TABLE test_dataelementcategorycombo
> >> > (
> >> >   test_dataelementid integer NOT NULL,
> >> >   test_dataelementcategoryid integer NOT NULL,
> >> >   test_dataelementcategorycomboid integer NOT NULL,
> >> >   CONSTRAINT pk_testdataelementcategory PRIMARY KEY
> >> > (test_dataelementid, test_dataelementcategoryid,
> >> > test_dataelementcategorycomboid)
> >> > )
> >> > WITH (OIDS=FALSE);
> >> >
> >> > So this is  a real simple table which references a data element, a
> >> > data element category, and a data element combo. The reference to a
> >> > data element category may be redundant, but anyway, lets leave it in
> >> > for now.
> >> >
> >> > I populated the table with some data, which will be used to assign
> >> > dimensions to data elements. It looks like this in my DB, which looks
> >> > like this.
> >> >
> >> > 309;25250;25251
> >> > 309;25257;25255
> >> > 348;25250;25252
> >> > 348;25257;25255
> >> > 455;25250;25253
> >> > 455;25257;25255
> >> >
> >> > but of course this is meaningless to you. What do these values
> >> > correspond
> >> > to?
> >> >
> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1
> Year";"Age";"Under
> >> > 1"
> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
> >> > Years";"Age";"Age 1-5"
> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Age";"Over
> >> > 5"
> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
> >> > status";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
> >> > Years";"Patient status";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
> >> > status";"OPD"
> >> >
> >> > which can be produced by the following view.
> >> >
> >> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
> >> >  SELECT dataelement.name, dataelementcategory.name AS dimension,
> >> > dataelementcategoryoption.name AS dimension_element
> >> >    FROM dataelement
> >> >    JOIN test_dataelementcategorycombo ON
> >> > test_dataelementcategorycombo.test_dataelementid =
> >> > dataelement.dataelementid
> >> >    JOIN dataelementcategory ON dataelementcategory.categoryid =
> >> > test_dataelementcategorycombo.test_dataelementcategoryid
> >> >    JOIN dataelementcategoryoption ON
> >> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
> >> > dataelementcategoryoption.categoryoptionid;
> >> >
> >> > So, that view just provides a human readable view of those integers
> >> > that I populated in the the test_dataelementcategorycombo table I
> >> > created above.  This table just assigns particular data elements to
> >> > different category options (dimensional elements).
> >> >
> >> > OK, so far so good, but the problem now is, how to use this with the
> >> > aggregatedatavalue table? If we try and join this table directly, we
> >> > will have issues with duplicates in the pivot table, so we need to
> >> > transform the data slightly.
> >> >
> >> > This should do the trick.
> >> >
> >> > SELECT * FROM crosstab
> >> >       (
> >> > 'SELECT name, dimension, dimension_element FROM
> >> > vw_dataelements_dimensions ORDER BY 1,2,3',
> >> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
> >> > ASC'
> >> >       )
> >> > as
> >> > (
> >> > name character varying(230),
> >> > age character varying(160),
> >> > status character varying(160)
> >> > );
> >> >
> >> >
> >> > which returns this record set
> >> >
> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
> >> > 1-5";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
> >> > 5";"OPD"
> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
> >> > 1";"OPD"
> >> >
> >> >
> >> > OK, admittedly, I cheated a bit and used the crosstab function of
> >> > Postgresql, but I assume that this query could be rewritten with a few
> >> > more lines of code in standard SQL or some procedural language like
> >> > Java. Now, this record set looks like something that I can almost use
> >> > with the aggregateddatavalue table simply by joining up the table on
> >> > the appropriate dataelementid and pulling everything into a pivot
> >> > table.  I would not have any duplicated values and would have columns
> >> > like data element name, period, orgunit, age, patient status and of
> >> > course the value of the data element.  I hope that part is pretty
> >> > clear. Just join up that table to the aggregateddata table, and you
> >> > have pretty much what is needed to pull the data directly into a
> >> > PivotTable for further analysis.
> >> >
> >> > This is not a complete example, but it is very close to what I need
> >> > here ,and I think this type of functionality would be much more useful
> >> > than the current data element categories functionality. Basically, all
> >> > that would be required, at least initially, would be another user
> >> > interface screen to allow the definition of which category(ies) and
> >> > category options a data element is a member of. The rest could ,in the
> >> > first instance be executed with custom SQL (obviously, I am partial to
> >> > this language and hobbled by the fact that I do not know Java), but
> >> > eventually this would need to be implemented somehow in Java.
> >> >
> >> >  I am not sure if this really solves all of the issues surrounding
> >> > multidimensional analysis of data elements, but it seems to solve the
> >> > issues that I am having by trying to assign some sort of dimensional
> >> > hierarchy to data elements (similar to the exclusive/compulsory
> >> > functionality of orgunits). Any thoughts on this?
> >> >
> >> > Best regards,
> >> > Jason
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
> >> > <jason.p.pickering@xxxxxxxxx> wrote:
> >> >>
> >> >>
> >> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
> >> >>>
> >> >>> >> However, there does seem to be the ability to assign dimensions,
> >> >>> there
> >> >>> >> does
> >> >>> >> not seem to be the ability to assign particular elements within
> >> >>> those
> >> >>> >> dimensions to a particular DHIS data element.
> >> >>>
> >> >>>
> >> >>> Just some more clarification here: you can make category combos
> which
> >> >>> you
> >> >>> assign to data elements. However, it is not possible to assign just
> >> >>> specific parts of a category combo (only some of the category
> options)
> >> >>> to
> >> >>> a data element.
> >> >>
> >> >> Yes, this was exactly what I wanted. Assigning different categories
> >> >> would
> >> >> seem to break the dimensionality.
> >> >>
> >> >>>
> >> >>> Then you must make a specific category (as the only one in
> >> >>> or part of a new category combo) with just those options. It can be
> >> >>> hell;
> >> >>> in Tajikistan there were way over 20 categories I think, at least 10
> >> >>> just
> >> >>> on various age groups.
> >> >>>
> >> >>> Johan
> >> >>>
> >> >>
> >> >> This was my fear.
> >> >>
> >> >> I will need to do some testing and see. I still fear it is not
> exactly
> >> >> the
> >> >> intended functionality.
> >> >>
> >> >> Basically, I think I need something akin to the exclusive/compulsory
> >> >> groups
> >> >> that are in place for organizational units, but instead, for
> arbitrary
> >> >> dimensions. I will give a try and see what happens.
> >> >>
> >> >> Thanks,
> >> >> Jason
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> _______________________________________________
> >> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> >> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> >> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> >> More help   : https://help.launchpad.net/ListHelp
> >
> >
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
> More help   : https://help.launchpad.net/ListHelp
>
>

Follow ups

References