dhis2-devs team mailing list archive
-
dhis2-devs team
-
Mailing list archive
-
Message #02271
Re: On categories and dimensions and zooks
http://208.76.222.114/confluence/display/RandD/General+multi-dimensional+model
On Fri, Sep 25, 2009 at 11:55 AM, Abyot Gizaw <abyota@xxxxxxxxx> wrote:
> The one-to-one relationship mentioned between dataelement and categorycombo
> is not correct !
>
> The realtionship is one-to-many. A categorycombo can be assigned for many
> dataelements. But a dataelement can have only one categorycombo.
>
> Thank you
> Abyot.
>
>
> On Fri, Sep 25, 2009 at 11:44 AM, Jason Pickering <
> jason.p.pickering@xxxxxxxxx> wrote:
>
>> Hi there.
>>
>> My basic issue with the category/category combo is that it appears to be a
>> one-to-one relationship with data elements. If I look at the data model,
>> there is a one-to-one relationship between dataelement and categorycomboid.
>> For a given category combo, you can have multiple options. So, you can
>> establish a relationship for a given data element and a group of category
>> options.
>>
>> Let me try and describe the issue. We have a set of data elements related
>> to malaria for this example. We would like to be able to pivot the data on
>> other dimensions dimensions (Data element, age, disease, patient status).
>> Obviously there are other dimensions that are pivotable (orgunit, period,
>> dataset)
>>
>> The data elements look like this. I have put the dimensions in square
>> brackets, and the dimensional elements into curly brackets.
>>
>> [Data element, Age, Disease, Patient status]
>> Deaths Confirmed Malaria total (composed of) {All ages, Malaria Cases,
>> Deaths}
>> Deaths Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, Deaths}
>> Deaths Confirmed Malaria Over 5 Years {Over 5, Malaria Cases, Deaths}
>> Deaths Confirmed Malaria Under 1 Year {Under 1, Malaria Cases, Deaths}
>> IP Discharge Confirmed Malaria total (composed of) {All ages, Malaria
>> Cases, IP}
>> IP Discharge Confirmed Malaria 1 to Under 5 Years {1-5, Malaria Cases, IP}
>> IP Discharge Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
>> Deaths}
>> IP Discharge Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
>> Deaths}
>> OPD 1st Attendance Confirmed Malaria total (composed of) {All ages,
>> Malaria Cases, OPD}
>> OPD 1st Attendance Confirmed Malaria 1 to Under 5 Years {1-5, Malaria
>> Cases, OPD}
>> OPD 1st Attendance Confirmed Malaria Over 5 Years {Over 5, Malaria Cases,
>> OPD}
>> OPD 1st Attendance Confirmed Malaria Under 1 Year {Under 1, Malaria Cases,
>> OPD}
>>
>> OK, I hope this is pretty clear. Obviously, there are more data elements
>> (Typhoid, Yellow fever, etc). I might want to know how many Under 1 deaths I
>> have had for all diseases, or how many OPD cases I have had for each
>> disease. How can I do this with the existing data model? It is not obvious
>> to me because there is no relationship between dimensional elements
>> (categoryoptions) to each other. Category options can be related through a
>> cateogry combination, but since data elements can only be assigned a single
>> category option, the dimensionality is broken once it gets time to pull the
>> data into a pivot table.
>>
>> In the incomplete example that I gave yesterday, I established a
>> one-to-many relationship between a data element and a dimension. If I
>> understand the current data model, I would have to create a separate
>> categorycombo for each of these data elements, and assign this
>> categorycombo to the data element. Now, I might be able to unfold the
>> dimensions using the categories and categorycombos. I it is not apparent how
>> the dimensional elements correspond themselves to a particular dimension, as
>> there is no relation for this in the database as I can see it.
>>
>> As (Johan pointed out a few mails ago, if I understand him correctly) is
>> different categorycombo's can be created for individual data elements, and
>> assigned to these elements. However, this seems to be 1) incredibly
>> inefficient and 2) does not establish any relationship between dimensional
>> elements and dimensions. Perhaps it is there, and maybe it has been done in
>> SL, but the SQL is not apparent to me at all.
>>
>> It would appear to me, looking from an SQL perspective, that a
>> one-to-many relationship between a data element, a dimension (category) and
>> dimensional element (category combo) would be much more effieicnet, and
>> highly usable from an SQL perspective. As I mentioned in my mail, I am not
>> sure how easy this would be to implement in a procedural language like Java,
>> but I assume it should be possible to either do it this way, or rewrite my
>> Postgres proprietary query in standard SQL (which there are ways to do with
>> ANSI SQL). This would require modification to the data model (similar to
>> the table I provided yesterday) and modification to the UI to allow users to
>> 1) select a dimension (category) 2) Select a dimensional element for the
>> given dimension. This would populate the table with a dataelementid, a
>> dimensionid (categoryid) and a dimensional element (cateogryoptionid).
>>
>> My gut feeling this is exactly the same functionality as has currently
>> been implemented for organizational units. Users can define a hierarchy for
>> organizational units, and then assign them to
>> categories/dimension/organizational group sets, decide whether the groups
>> are compulsory and exlusive, and then assign a particular organizational
>> unit to a particular group (which is analogous to a dimensional element).
>> Organizational group sets define the dimension, and one-to-one assignment of
>> an organizational unit to a particular organizational group defines which
>> dimensional element the organizational unit is a member of. These dimensions
>> can then be used in PivotTable analyses, where the orgunitgroupsets become
>> dimensions, and orgunitgroups become dimensional elements..
>>
>> I beleive that data elements are no different than organizational units.
>> They should be able to be grouped into some sort of hierarchy and pivoted on
>> any dimension. Data elements groups establish a one-to-many relationship
>> between data elements and a data element group, but there is no concept of
>> how data element groups relate to each other.
>> I think this is perhaps the same concept you mention, ReportSet.
>>
>> I suspect we would need to potentially rethink the entire concept of
>> multidimensionality if we really wanted to get it right. It would see to me
>> that the DHIS datamodel and associated aggregation methods have been
>> hardwired into aggregation across time (period) and geography (orgunit).
>> What we can do with PivotTables and (and OLAP) is to aggregate across any
>> possible dimension, slicing as you mention ,on any dimension . I am not sure
>> this will be so simple to implement but I think there is a way to do it,
>> without major modifications.
>>
>> I am not sure it solves the SDMX issue. There are potential issues related
>> to "ragged" dimensions and how these get handled. Some data elements might
>> have three dimensions, while others may have more. I have not thought about
>> this in detail, but know it is an issue with cross-tab queries in SQL. You
>> normally have to know how many dimensions you are working with in order to
>> perform a cross-tab, but there are dynamic solutions. Perhaps this could be
>> dealt with somehow in SDMX.
>>
>> Anyway ,I am rambling. Hope this mail helps though to push my point
>> further. Once I get the SQL from SL, I will see if perhaps it has been done
>> already, and that I am just writing long emails for nothing. :)
>>
>> Regards,
>> Jason
>>
>>
>>
>>
>>
>> On Fri, Sep 25, 2009 at 10:44 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
>> wrote:
>> > Hi Jason and Johan
>> >
>> > I'm really pleased to see you having this discussion as I have been
>> > grappling with a similar issue which involves unravelling categories,
>> > category options and combos into something more familiar. I have
>> reached
>> > similar conclusions regarding nomenclature:
>> >
>> > category = dimension
>> > categorycombo - I have been calling a dimension set (it bears a strong,
>> and
>> > useful, resemblance to xslt:attribute-set)
>> > category option - I like your suggestion of DimensionalElement. I am
>> going
>> > to start calling it that too.
>> >
>> > In my case I need to export (and import data) into a standard format
>> called
>> > sdmx. So whereas in the DHIS2 native DXF we export datavalues with
>> > effectively three dimensions (source, period, categorycombooption) the
>> last
>> > dimension is a sort of uber-dimension. Like a peppercorn or a cardamon
>> > seed, when you break it open it explodes its rich complexity of
>> dimensions.
>> >
>> > In sdmx we need the dimensions exploded. So data values look like:
>> >
>> > <dataset>
>> > <datavalue name="TB test given" uid="44344 ...44" gender="Male"
>> age="0-5"
>> > value="32" />
>> > <datavalue name="TB test given" uid="44344 ...44" gender="Female"
>> > age="0-5" value="38" />
>> > ..
>> > </dataset>
>> >
>> > My approach to unpicking the dimensions from the dxf file is to
>> transform it
>> > with an xslt transformation which is still incomplete but seems to work
>> > well.
>> >
>> > One other nomenclatures issue which has surfaced as a result is what we
>> call
>> > a "dataset". In DHIS2, if I understand correctly, a dataset corresponds
>> > roughly to all the dataelements which might occur on a datacollection
>> form.
>> > If we view all dataelements as having just the three "dimensions" then
>> all
>> > is well, but if we explode the actual dimensions then we have an issue.
>> In
>> > the sdmx model a dataset consists only of dataelements with the same
>> > dimensionset. After discussing this with Ola we have reached the
>> conclusion
>> > that we need another level of grouping, primarily for the UI - eg
>> FormSet or
>> > ReportSet which allows us to group related datasets. But that is an
>> aside
>> > from what you are talking about.
>> >
>> > I know that you guys can do magic with sql, but it seems that we should
>> try
>> > to capture some of this and place it down in the datamodel API. It
>> occurs
>> > to me that for a multidimensional dataelement we might benefit from some
>> > utility methods to retrieve slices and dices which might assist in
>> > constructing the pivot tables around dimensions. Does this sound like
>> the
>> > right thing to do.
>> >
>> > Regards
>> > Bob
>> >
>> > 2009/9/24 <johansa@xxxxxxxxxx>
>> >>
>> >> Jason,
>> >> I will leave to others to comment the code, but I have a few
>> comments...
>> >>
>> >> > I have done a bit more thinking on this, and would like to offer some
>> >> > more examples up for discussion.
>> >> >
>> >> > Basically, we have a lot of data elements that are somehow related to
>> >> > each other, similar to my kooky example in my original mail. I assume
>> >> > this is fairly common throughout other HMIS systems. Here, malaria
>> >> > attendance is broken down into various dimensions/category by patient
>> >> > type (outpatient, inpatient, and deaths) and by age (under 1 ,1-5 and
>> >> > over 5). But say you want to be able to pivot to look at outpatient,
>> >> > inpatient and deaths totals (i.e. summed up by age). Well, you could
>> >> > create a separate data element for this, but it sure would be nice to
>> >> > be able to Pivot the data somehow.
>> >>
>> >> In the Sierra Leone db, Edem and Romain set up views that pulled the
>> >> categories through into a "Category" pivot field, which you can then
>> use
>> >> to get what you want. Simply tick the categories (see below) you want
>> to
>> >> see, and group them together in excel. Maybe Edem and Romain can help
>> >> further here.
>> >>
>> >>
>> >> > Dimension ? Category
>> >> > Dimensional element ? Category option ? Category combo ( I think)
>> >>
>> >> The right symbol disappeared from my reply-mail here, but some
>> >> clarification:
>> >>
>> >> Crosstab Dimension (age AND gender) = Category combo
>> >> Dimension (age, gender) = Category
>> >> Dimensional element (inpatient, outpatient, death, under1, 1-5, and
>> over
>> >> 5) = Category option
>> >>
>> >> So by assigning a DE the category combo of "gender_age", you get 9
>> >> dimensional elements, 3 category options (in category age) by 3
>> category
>> >> options (in category gender)
>> >>
>> >> Johan
>> >>
>> >>
>> >>
>> >>
>> >> > Anyway, here is the helper table I created.
>> >> >
>> >> > CREATE TABLE test_dataelementcategorycombo
>> >> > (
>> >> > test_dataelementid integer NOT NULL,
>> >> > test_dataelementcategoryid integer NOT NULL,
>> >> > test_dataelementcategorycomboid integer NOT NULL,
>> >> > CONSTRAINT pk_testdataelementcategory PRIMARY KEY
>> >> > (test_dataelementid, test_dataelementcategoryid,
>> >> > test_dataelementcategorycomboid)
>> >> > )
>> >> > WITH (OIDS=FALSE);
>> >> >
>> >> > So this is a real simple table which references a data element, a
>> >> > data element category, and a data element combo. The reference to a
>> >> > data element category may be redundant, but anyway, lets leave it in
>> >> > for now.
>> >> >
>> >> > I populated the table with some data, which will be used to assign
>> >> > dimensions to data elements. It looks like this in my DB, which looks
>> >> > like this.
>> >> >
>> >> > 309;25250;25251
>> >> > 309;25257;25255
>> >> > 348;25250;25252
>> >> > 348;25257;25255
>> >> > 455;25250;25253
>> >> > 455;25257;25255
>> >> >
>> >> > but of course this is meaningless to you. What do these values
>> >> > correspond
>> >> > to?
>> >> >
>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1
>> Year";"Age";"Under
>> >> > 1"
>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> >> > Years";"Age";"Age 1-5"
>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5
>> Years";"Age";"Over
>> >> > 5"
>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Patient
>> >> > status";"OPD"
>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5
>> >> > Years";"Patient status";"OPD"
>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Patient
>> >> > status";"OPD"
>> >> >
>> >> > which can be produced by the following view.
>> >> >
>> >> > CREATE OR REPLACE VIEW vw_dataelements_dimensions AS
>> >> > SELECT dataelement.name, dataelementcategory.name AS dimension,
>> >> > dataelementcategoryoption.name AS dimension_element
>> >> > FROM dataelement
>> >> > JOIN test_dataelementcategorycombo ON
>> >> > test_dataelementcategorycombo.test_dataelementid =
>> >> > dataelement.dataelementid
>> >> > JOIN dataelementcategory ON dataelementcategory.categoryid =
>> >> > test_dataelementcategorycombo.test_dataelementcategoryid
>> >> > JOIN dataelementcategoryoption ON
>> >> > test_dataelementcategorycombo.test_dataelementcategorycomboid =
>> >> > dataelementcategoryoption.categoryoptionid;
>> >> >
>> >> > So, that view just provides a human readable view of those integers
>> >> > that I populated in the the test_dataelementcategorycombo table I
>> >> > created above. This table just assigns particular data elements to
>> >> > different category options (dimensional elements).
>> >> >
>> >> > OK, so far so good, but the problem now is, how to use this with the
>> >> > aggregatedatavalue table? If we try and join this table directly, we
>> >> > will have issues with duplicates in the pivot table, so we need to
>> >> > transform the data slightly.
>> >> >
>> >> > This should do the trick.
>> >> >
>> >> > SELECT * FROM crosstab
>> >> > (
>> >> > 'SELECT name, dimension, dimension_element FROM
>> >> > vw_dataelements_dimensions ORDER BY 1,2,3',
>> >> > 'SELECT DISTINCT dimension from vw_dataelements_dimensions ORDER BY 1
>> >> > ASC'
>> >> > )
>> >> > as
>> >> > (
>> >> > name character varying(230),
>> >> > age character varying(160),
>> >> > status character varying(160)
>> >> > );
>> >> >
>> >> >
>> >> > which returns this record set
>> >> >
>> >> > "OPD 1st Attendance Clinical Case of Malaria 1 to Under 5 Years";"Age
>> >> > 1-5";"OPD"
>> >> > "OPD 1st Attendance Clinical Case of Malaria Over 5 Years";"Over
>> >> > 5";"OPD"
>> >> > "OPD 1st Attendance Clinical Case of Malaria Under 1 Year";"Under
>> >> > 1";"OPD"
>> >> >
>> >> >
>> >> > OK, admittedly, I cheated a bit and used the crosstab function of
>> >> > Postgresql, but I assume that this query could be rewritten with a
>> few
>> >> > more lines of code in standard SQL or some procedural language like
>> >> > Java. Now, this record set looks like something that I can almost use
>> >> > with the aggregateddatavalue table simply by joining up the table on
>> >> > the appropriate dataelementid and pulling everything into a pivot
>> >> > table. I would not have any duplicated values and would have columns
>> >> > like data element name, period, orgunit, age, patient status and of
>> >> > course the value of the data element. I hope that part is pretty
>> >> > clear. Just join up that table to the aggregateddata table, and you
>> >> > have pretty much what is needed to pull the data directly into a
>> >> > PivotTable for further analysis.
>> >> >
>> >> > This is not a complete example, but it is very close to what I need
>> >> > here ,and I think this type of functionality would be much more
>> useful
>> >> > than the current data element categories functionality. Basically,
>> all
>> >> > that would be required, at least initially, would be another user
>> >> > interface screen to allow the definition of which category(ies) and
>> >> > category options a data element is a member of. The rest could ,in
>> the
>> >> > first instance be executed with custom SQL (obviously, I am partial
>> to
>> >> > this language and hobbled by the fact that I do not know Java), but
>> >> > eventually this would need to be implemented somehow in Java.
>> >> >
>> >> > I am not sure if this really solves all of the issues surrounding
>> >> > multidimensional analysis of data elements, but it seems to solve the
>> >> > issues that I am having by trying to assign some sort of dimensional
>> >> > hierarchy to data elements (similar to the exclusive/compulsory
>> >> > functionality of orgunits). Any thoughts on this?
>> >> >
>> >> > Best regards,
>> >> > Jason
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Sep 16, 2009 at 10:28 PM, Jason Pickering
>> >> > <jason.p.pickering@xxxxxxxxx> wrote:
>> >> >>
>> >> >>
>> >> >> On Wed, Sep 16, 2009 at 10:13 PM, <johansa@xxxxxxxxxx> wrote:
>> >> >>>
>> >> >>> >> However, there does seem to be the ability to assign dimensions,
>> >> >>> there
>> >> >>> >> does
>> >> >>> >> not seem to be the ability to assign particular elements within
>> >> >>> those
>> >> >>> >> dimensions to a particular DHIS data element.
>> >> >>>
>> >> >>>
>> >> >>> Just some more clarification here: you can make category combos
>> which
>> >> >>> you
>> >> >>> assign to data elements. However, it is not possible to assign just
>> >> >>> specific parts of a category combo (only some of the category
>> options)
>> >> >>> to
>> >> >>> a data element.
>> >> >>
>> >> >> Yes, this was exactly what I wanted. Assigning different categories
>> >> >> would
>> >> >> seem to break the dimensionality.
>> >> >>
>> >> >>>
>> >> >>> Then you must make a specific category (as the only one in
>> >> >>> or part of a new category combo) with just those options. It can be
>> >> >>> hell;
>> >> >>> in Tajikistan there were way over 20 categories I think, at least
>> 10
>> >> >>> just
>> >> >>> on various age groups.
>> >> >>>
>> >> >>> Johan
>> >> >>>
>> >> >>
>> >> >> This was my fear.
>> >> >>
>> >> >> I will need to do some testing and see. I still fear it is not
>> exactly
>> >> >> the
>> >> >> intended functionality.
>> >> >>
>> >> >> Basically, I think I need something akin to the exclusive/compulsory
>> >> >> groups
>> >> >> that are in place for organizational units, but instead, for
>> arbitrary
>> >> >> dimensions. I will give a try and see what happens.
>> >> >>
>> >> >> Thanks,
>> >> >> Jason
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>> >> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> >> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>> >> More help : https://help.launchpad.net/ListHelp
>> >
>> >
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs<https://launchpad.net/%7Edhis2-devs>
>> More help : https://help.launchpad.net/ListHelp
>>
>>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~dhis2-devs
> Post to : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~dhis2-devs
> More help : https://help.launchpad.net/ListHelp
>
>
--
Cheers,
Knut Staring
Follow ups
References