← Back to team overview

dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

I am in favor of making the terminology adhere more to the general terms in
use in the Data warehouse/OLAP/SDMX fields (though it must also be
understandable to the users, of course - and documented!)
Knut

On Fri, Sep 25, 2009 at 12:07 PM, Knut Staring <knutst@xxxxxxxxx> wrote:

>
> http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model
>
>
> On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw <abyota@xxxxxxxxx> wrote:
>
>> The one-to-one relationship mentioned between dataelement and
>> categorycombo is not correct !
>>
>> The realtionship is one-to-many. A categorycombo can be assigned for many
>> dataelements. But a dataelement can have only one categorycombo.
>>
>> Thank you
>> Abyot.
>>
>>
>> On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering <
>> jason.p.pickering@xxxxxxxxx> wrote:
>>
>>> Hi there.
>>>
>>> My basic issue with the category/category combo is that it appears to be
>>> a one-to-one relationship with data elements. If I look at the data model,
>>> there is a one-to-one relationship between dataelement and categorycomboid.
>>> For a given category combo, you can have multiple options. So, you can
>>> establish a relationship for a given data element and a group of category
>>> options.
>>>
>>> Let me try and describe the issue. We have a set of data elements related
>>> to malaria for this example. We would like to be able to pivot the data on
>>> other dimensions dimensions (Data element, age, disease, patient status).
>>> Obviously there are other dimensions that are pivotable (orgunit, period,
>>> dataset)
>>>
>>> The data elements look like this. I have put the dimensions in square
>>> brackets, and the dimensional elements into curly brackets.
>>>
>>> [Data element, Age, Disease, Patient status]
>>> Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
>>> Deaths}
>>> Deaths Confirmed Malaria 1 to Under 5 Years  {1-5, Malaria Cases, Deaths}
>>>
>>> Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
>>> Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
>>> IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria
>>> Cases, IP}
>>> IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases,
>>> IP}
>>> IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
>>> Deaths}
>>> IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
>>> Deaths}
>>> OPD 1st Attendance Confirmed Malaria total (composed of) {All ages,
>>> Malaria Cases, OPD}
>>> OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria
>>> Cases, OPD}
>>> OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
>>> OPD}
>>> OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria
>>> Cases, OPD}
>>>
>>> OK,  I hope this is pretty clear. Obviously, there are more data elements
>>> (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I
>>> have had for all diseases, or how many OPD cases I have had for each
>>> disease. How can I do this with the existing data model? It is not obvious
>>> to me because there is no relationship between dimensional elements
>>> (categoryoptions) to each other. Category options can be related through a
>>> cateogry combination, but since data elements can only be assigned a single
>>> category option, the dimensionality is broken once it gets time to pull the
>>> data into a pivot table.
>>>
>>>  In the incomplete example that I gave yesterday, I established a
>>> one-to-many relationship between a data element and a dimension. If I
>>> understand the current data model, I would have to create a separate
>>> categorycombo  for each of these data elements, and assign this
>>> categorycombo to the data element. Now, I might be able to unfold the
>>> dimensions using the categories and categorycombos. I it is not apparent how
>>> the dimensional elements correspond themselves to a particular dimension, as
>>> there is no relation for this in the database as I can see it.
>>>
>>>  As (Johan pointed out a few mails ago, if I understand him correctly)
>>> is  different categorycombo's can be created for individual data elements,
>>> and assigned to these elements. However, this seems to be 1) incredibly
>>> inefficient and 2) does not establish any relationship between dimensional
>>> elements and dimensions. Perhaps it is there, and maybe it has been done in
>>> SL, but the SQL is not apparent to me at all.
>>>
>>>  It would appear to me, looking from an SQL perspective, that a
>>> one-to-many relationship between a data element, a dimension (category) and
>>> dimensional element (category combo) would be much more effieicnet, and
>>> highly usable from an SQL perspective. As I mentioned in my mail, I am not
>>> sure how easy this would be to implement in a procedural language like Java,
>>> but I assume it should be possible to either do it this way, or rewrite my
>>> Postgres proprietary query in standard SQL (which there are ways to do with
>>> ANSI SQL).  This would require modification to the data model (similar to
>>> the table I provided yesterday) and modification to the UI to allow users to
>>> 1) select a dimension (category) 2) Select a dimensional element for the
>>> given dimension.  This would populate the table with a dataelementid, a
>>> dimensionid (categoryid) and a dimensional element (cateogryoptionid).
>>>
>>> My gut feeling this is exactly the same functionality as has currently
>>> been implemented for organizational units. Users can define a hierarchy for
>>> organizational units, and then assign them to
>>> categories/dimension/organizational group sets, decide whether the groups
>>> are compulsory and exlusive, and then assign a particular organizational
>>> unit to a particular group (which is analogous to a dimensional element).
>>> Organizational group sets define the dimension, and one-to-one assignment of
>>> an organizational unit to a particular organizational group defines which
>>> dimensional element the organizational unit is a member of. These dimensions
>>> can then be used in PivotTable analyses, where the orgunitgroupsets become
>>> dimensions, and orgunitgroups become dimensional elements..
>>>
>>> I beleive that data elements are no different than organizational units.
>>> They should be able to be grouped into some sort of hierarchy and pivoted on
>>> any dimension.  Data elements groups establish a one-to-many relationship
>>> between data elements and a data element group, but there is no concept of
>>> how data element groups relate to each other.
>>> I think this is perhaps the same concept you mention, ReportSet.
>>>
>>> I suspect we would need to potentially rethink the entire concept of
>>> multidimensionality if we really wanted to get it right. It would see to me
>>> that the DHIS datamodel and associated aggregation methods have been
>>> hardwired into aggregation across time (period) and geography (orgunit).
>>> What we can do with PivotTables and (and OLAP) is to aggregate across any
>>> possible dimension, slicing as you mention ,on any dimension . I am not sure
>>> this will be so simple to implement but I think there is a way to do it,
>>> without major modifications.
>>>
>>> I am not sure it solves the SDMX issue. There are potential issues
>>> related to "ragged" dimensions and how these get handled. Some data elements
>>> might have three dimensions, while others may have more. I have not thought
>>> about this in detail, but know it is an issue with cross-tab queries in SQL.
>>> You normally have to know how many dimensions you are working with in order
>>> to perform a cross-tab, but there are dynamic solutions. Perhaps this could
>>> be dealt with somehow in SDMX.
>>>
>>> Anyway ,I am rambling. Hope this mail helps though to push my point
>>> further. Once I get the SQL from SL, I will see if perhaps it has been done
>>> already, and that I am just writing long emails for nothing. :)
>>>
>>> Regards,
>>>  Jason
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>>> wrote:
>>> > Hi Jason and Johan
>>> >
>>> > I'm really pleased to see you having this discussion as I have been
>>> > grappling with a similar issue which involves unravelling categories,
>>> > category options and combos into something more familiar.  I have
>>> reached
>>> > similar conclusions regarding nomenclature:
>>> >
>>> > category = dimension
>>> > categorycombo - I have been calling a dimension set (it bears a strong,
>>> and
>>> > useful, resemblance to xslt:attribute-set)
>>> > category option - I like your suggestion of DimensionalElement.  I am
>>> going
>>> > to start calling it that too.
>>> >
>>> > In my case I need to export (and import data) into a standard format
>>> called
>>> > sdmx.  So whereas in the DHIS2 native DXF we export datavalues with
>>> > effectively three dimensions (source, period, categorycombooption) the
>>> last
>>> > dimension is a sort of uber-dimension.  Like a peppercorn or a cardamon
>>> > seed, when you break it open it explodes its rich complexity of
>>> dimensions.
>>> >
>>> > In sdmx we need the dimensions exploded.  So data values look like:
>>> >
>>> > <dataset>
>>> >   <datavalue name="TB test given" uid="44344 ...44" gender="Male"
>>> age="0-5"
>>> > value="32" />
>>> >    <datavalue name="TB test given" uid="44344 ...44" gender="Female"
>>> > age="0-5" value="38" />
>>> > ..
>>> > </dataset>
>>> >
>>> > My approach to unpicking the dimensions from the dxf file is to
>>> transform it
>>> > with an xslt transformation which is still incomplete but seems to work
>>> > well.
>>> >
>>> > One other nomenclatures issue which has surfaced as a result is what we
>>> call
>>> > a "dataset".  In DHIS2, if I understand correctly, a dataset
>>> corresponds
>>> > roughly to all the dataelements which might occur on a datacollection
>>> form.
>>> > If we view all dataelements as having just the three "dimensions" then
>>> all
>>> > is well, but if we explode the actual dimensions then we have an
>>> issue.  In
>>> > the sdmx model a dataset consists only of dataelements with the same
>>> > dimensionset.  After discussing this with Ola we have reached the
>>> conclusion
>>> > that we need another level of grouping, primarily for the UI - eg
>>> FormSet or
>>> > ReportSet which allows us to group related datasets.  But that is an
>>> aside
>>> > from what you are talking about.
>>> >
>>> > I know that you guys can do magic with sql, but it seems that we should
>>> try
>>> > to capture some of this and place it down in the datamodel API.  It
>>> occurs
>>> > to me that for a multidimensional dataelement we might benefit from
>>> some
>>> > utility methods to retrieve slices and dices which might assist in
>>> > constructing the pivot tables around dimensions.  Does this sound like
>>> the
>>> > right thing to do.
>>> >
>>> > Regards
>>> > Bob
>>> >
>>> > 2009/9/24 <johansa@xxxxxxxxxx>
>>> >>
>>> >> Jason,
>>> >> I will leave to others to comment the code, but I have a few
>>> comments...
>>> >>
>>> >> > I have done a bit more thinking on this, and would like to offer
>>> some
>>> >> > more examples up for discussion.
>>> >> >
>>> >> > Basically, we have a lot of data elements that are somehow related
>>> to
>>> >> > each other, similar to my kooky example in my original mail. I
>>> assume
>>> >> > this is fairly common throughout other HMIS systems. Here,  malaria
>>> >> > attendance is broken down into various dimensions/category by
>>> patient
>>> >> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5
>>> and
>>> >> > over 5). But say you want to be able to pivot to look at outpatient,
>>> >> > inpatient and deaths totals (i.e. summed up by age). Well, you could
>>> >> > create a separate data element for this, but it sure would be nice
>>> to
>>> >> > be able to Pivot the data somehow.
>>> >>
>>> >> In the Sierra Leone db, Edem and Romain set up views that pulled the
>>> >> categories through into a "Category" pivot field, which you can then
>>> use
>>> >> to get what you want. Simply tick the categories (see below) you want
>>> to
>>> >> see, and group them together in excel. Maybe Edem and Romain can help
>>> >> further here.
>>> >>
>>> >>
>>> >> > Dimension ? Category
>>> >> > Dimensional element ? Category option ? Category combo ( I think)
>>> >>
>>> >> The right symbol disappeared from my reply-mail here, but some
>>> >> clarification:
>>> >>
>>> >> Crosstab Dimension (age AND gender) = Category combo
>>> >> Dimension (age, gender) = Category
>>> >> Dimensional element (inpatient, outpatient, death, under1, 1-5, and
>>> over
>>> >> 5) = Category option
>>> >>
>>> >> So by assigning a DE the category combo of "gender_age", you get 9
>>> >> dimensional elements, 3 category options (in category age) by 3
>>> category
>>> >> options (in category gender)
>>> >>
>>> >> Johan
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> > Anyway, here is the helper table I created.
>>> >> >
>>> >> > CREATE TABLE test_dataelementcategorycombo
>>> >> > (
>>> >> >   test_dataelementid integer NOT NULL,
>>> >> >   test_dataelementcategoryid integer NOT NULL,
>>> >> >   test_dataelementcategorycomboid integer NOT NULL,
>>> >> >   CONSTRAINT pk_testdataelementcategory PRIMARY KEY
>>> >> > (test_dataelementid, test_dataelementcategoryid,
>>> >> > test_dataelementcategorycomboid)
>>> >> > )
>>> >> > WITH (OIDS=FALSE);
>>> >> >
>>> >> > So this is  a real simple table which references a data element, a
>>> >> > data element category, and a data element combo. The reference to a
>>> >> > data element category may be redundant, but anyway, lets leave it in
>>> >> > for now.
>>> >> >
>>> >> > I populated the table with some data, which will be used to assign
>>> >> > dimensions to data elements. It looks like this in my DB, which
>>> looks
>>> >> > like this.
>>> >> >
>>> >> > 309;25250;25251
>>> >> > 309;25257;25255
>>> >> > 348;25250;25252
>>> >> > 348;25257;25255
>>> >> > 455;25250;25253
>>> >> > 455;25257;25255
>>> >> >
>>> >> > but of course this is meaningless to you. What do these values
>>> >> > correspond
>>> >> > to?
>>> >> >
>>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1
>>> Year";"Age";"Under
>>> >> > 1"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>>> >> > Years";"Age";"Age 1-5"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5
>>> Years";"Age";"Over
>>> >> > 5"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
>>> >> > status";"OPD"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>>> >> > Years";"Patient status";"OPD"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
>>> >> > status";"OPD"
>>> >> >
>>> >> > which can be produced by the following view.
>>> >> >
>>> >> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
>>> >> >  SELECT dataelement.name, dataelementcategory.name AS dimension,
>>> >> > dataelementcategoryoption.name AS dimension_element
>>> >> >    FROM dataelement
>>> >> >    JOIN test_dataelementcategorycombo ON
>>> >> > test_dataelementcategorycombo.test_dataelementid =
>>> >> > dataelement.dataelementid
>>> >> >    JOIN dataelementcategory ON dataelementcategory.categoryid =
>>> >> > test_dataelementcategorycombo.test_dataelementcategoryid
>>> >> >    JOIN dataelementcategoryoption ON
>>> >> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
>>> >> > dataelementcategoryoption.categoryoptionid;
>>> >> >
>>> >> > So, that view just provides a human readable view of those integers
>>> >> > that I populated in the the test_dataelementcategorycombo table I
>>> >> > created above.  This table just assigns particular data elements to
>>> >> > different category options (dimensional elements).
>>> >> >
>>> >> > OK, so far so good, but the problem now is, how to use this with the
>>> >> > aggregatedatavalue table? If we try and join this table directly, we
>>> >> > will have issues with duplicates in the pivot table, so we need to
>>> >> > transform the data slightly.
>>> >> >
>>> >> > This should do the trick.
>>> >> >
>>> >> > SELECT * FROM crosstab
>>> >> >       (
>>> >> > 'SELECT name, dimension, dimension_element FROM
>>> >> > vw_dataelements_dimensions ORDER BY 1,2,3',
>>> >> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY
>>> 1
>>> >> > ASC'
>>> >> >       )
>>> >> > as
>>> >> > (
>>> >> > name character varying(230),
>>> >> > age character varying(160),
>>> >> > status character varying(160)
>>> >> > );
>>> >> >
>>> >> >
>>> >> > which returns this record set
>>> >> >
>>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>>> Years";"Age
>>> >> > 1-5";"OPD"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
>>> >> > 5";"OPD"
>>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
>>> >> > 1";"OPD"
>>> >> >
>>> >> >
>>> >> > OK, admittedly, I cheated a bit and used the crosstab function of
>>> >> > Postgresql, but I assume that this query could be rewritten with a
>>> few
>>> >> > more lines of code in standard SQL or some procedural language like
>>> >> > Java. Now, this record set looks like something that I can almost
>>> use
>>> >> > with the aggregateddatavalue table simply by joining up the table on
>>> >> > the appropriate dataelementid and pulling everything into a pivot
>>> >> > table.  I would not have any duplicated values and would have
>>> columns
>>> >> > like data element name, period, orgunit, age, patient status and of
>>> >> > course the value of the data element.  I hope that part is pretty
>>> >> > clear. Just join up that table to the aggregateddata table, and you
>>> >> > have pretty much what is needed to pull the data directly into a
>>> >> > PivotTable for further analysis.
>>> >> >
>>> >> > This is not a complete example, but it is very close to what I need
>>> >> > here ,and I think this type of functionality would be much more
>>> useful
>>> >> > than the current data element categories functionality. Basically,
>>> all
>>> >> > that would be required, at least initially, would be another user
>>> >> > interface screen to allow the definition of which category(ies) and
>>> >> > category options a data element is a member of. The rest could ,in
>>> the
>>> >> > first instance be executed with custom SQL (obviously, I am partial
>>> to
>>> >> > this language and hobbled by the fact that I do not know Java), but
>>> >> > eventually this would need to be implemented somehow in Java.
>>> >> >
>>> >> >  I am not sure if this really solves all of the issues surrounding
>>> >> > multidimensional analysis of data elements, but it seems to solve
>>> the
>>> >> > issues that I am having by trying to assign some sort of dimensional
>>> >> > hierarchy to data elements (similar to the exclusive/compulsory
>>> >> > functionality of orgunits). Any thoughts on this?
>>> >> >
>>> >> > Best regards,
>>> >> > Jason
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
>>> >> > <jason.p.pickering@xxxxxxxxx> wrote:
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
>>> >> >>>
>>> >> >>> >> However, there does seem to be the ability to assign
>>> dimensions,
>>> >> >>> there
>>> >> >>> >> does
>>> >> >>> >> not seem to be the ability to assign particular elements within
>>> >> >>> those
>>> >> >>> >> dimensions to a particular DHIS data element.
>>> >> >>>
>>> >> >>>
>>> >> >>> Just some more clarification here: you can make category combos
>>> which
>>> >> >>> you
>>> >> >>> assign to data elements. However, it is not possible to assign
>>> just
>>> >> >>> specific parts of a category combo (only some of the category
>>> options)
>>> >> >>> to
>>> >> >>> a data element.
>>> >> >>
>>> >> >> Yes, this was exactly what I wanted. Assigning different categories
>>> >> >> would
>>> >> >> seem to break the dimensionality.
>>> >> >>
>>> >> >>>
>>> >> >>> Then you must make a specific category (as the only one in
>>> >> >>> or part of a new category combo) with just those options. It can
>>> be
>>> >> >>> hell;
>>> >> >>> in Tajikistan there were way over 20 categories I think, at least
>>> 10
>>> >> >>> just
>>> >> >>> on various age groups.
>>> >> >>>
>>> >> >>> Johan
>>> >> >>>
>>> >> >>
>>> >> >> This was my fear.
>>> >> >>
>>> >> >> I will need to do some testing and see. I still fear it is not
>>> exactly
>>> >> >> the
>>> >> >> intended functionality.
>>> >> >>
>>> >> >> Basically, I think I need something akin to the
>>> exclusive/compulsory
>>> >> >> groups
>>> >> >> that are in place for organizational units, but instead, for
>>> arbitrary
>>> >> >> dimensions. I will give a try and see what happens.
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Jason
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>>> >> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>> >> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>>> >> More help   : https://help.launchpad.net/ListHelp
>>> >
>>> >
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>
>
> --
> Cheers,
> Knut Staring
>



-- 
Cheers,
Knut Staring

References