
dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

OK.  Here's my first attempt to rationalize things.  Please excuse the
attachments.  I try not to send attachments to mailing lists, but these are
at least fairly small.  (And Lars, I will write it up in docbook after
fishing for feedback.)

My primary aim has been to disturb the existing model as little as possible
whilst trying to simplify wherever possible.

Attached oldmodel.png shows the participants in the existing model.  As you
can see, there are 11 tables in all.  I haven't shown the relations as it
becomes a bit of a web.

Also attached is a proposed amended database model (newmodel.png), which
bears enough similarity to the old one that migration between the two should
be feasible.  But it is down to 6 tables, and I have named the tables
according to the terms we have been discussing.  Of course this is just the
database model.  I've also put together an XML view of what a sample dataset
might look like.  A UML model is also required, which would be richer than
the underlying data model, but one step at a time ....

Walking through:

1.  DataElements can have Dimensions.  And different dataElements can (and
hopefully will) share some of the same Dimensions.  So there is an m-to-n
relationship between the two necessitating an extra table
(DataElementDimensions).  An example of a Dimension is SEX.  Nothing new
here.

2.  Dimensions have DimensionElements.  So SEX, for example, might have
DimensionElements "Male", "Female", "Unknown".  A big difference from the
old model is that there is a 1-n relationship between Dimensions and
DimensionElements: a Dimension has many DimensionElements, but a
DimensionElement is a member of only one Dimension.

3.  DataValues represent the values at the intersection of these Dimensions.
Keeping with the spirit of the old model, this intersection is represented by
a single key, DimensionElementCombination.  The DimensionElementCombinations
would be populated when a new Dimension is added to a DataElement.  As in the
original model, there is some fragility here: changing the dimensions on
dataelements could leave datavalues orphaned or misdirected.  The API must
have robust methods for defending this integrity, particularly when updating
the structural metadata.  But this is perhaps doable.  Either way it's no
worse than what we have.

I haven't given a name to DimensionElementCombinations.  From the examples I
have seen from SL this seems to be unnecessary.  The names I have seen used
are generally just contrived from the dimensions or (worse still) from the
categoryoptions.  What is important is that dataelements can have sets of
dimensions.
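The three numbered points can be made concrete with a small in-memory sketch. It is hypothetical Python, not DHIS2 code: the names `Dimension`, `DataElement`, `build_combinations` and `orphaned_values` are illustrative. Enumerating combinations as a Cartesian product (`itertools.product`) is an assumption about how the DimensionElementCombinations would be populated when a Dimension is added to a DataElement.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Dimension:
    code: str
    # 1-n: this list owns its element codes; an element belongs to one Dimension only
    elements: list[str] = field(default_factory=list)


@dataclass
class DataElement:
    code: str
    # m-to-n: the same Dimension object can be shared by many DataElements
    dimensions: list[Dimension] = field(default_factory=list)


def build_combinations(de: DataElement) -> dict[int, tuple[tuple[str, str], ...]]:
    """Enumerate every DimensionElementCombination of a data element.

    Ids start at 1 and follow itertools.product order, so the first dimension
    varies slowest. A data element with no dimensions gets a single empty
    combination.
    """
    codes = [d.code for d in de.dimensions]
    cells = product(*(d.elements for d in de.dimensions))
    return {i: tuple(zip(codes, cell)) for i, cell in enumerate(cells, start=1)}


def orphaned_values(values: dict[int, float], combos: dict[int, tuple]) -> set[int]:
    """Combination ids that still carry data but no longer exist after a change."""
    return set(values) - set(combos)
```

With AGE_1 and SEX defined as in the attached XML, the ids come out in the same order as the `<combinations>` block (1 is AGE_1=0/SEX=0, 2 is AGE_1=0/SEX=1, and so on). Dropping SEX afterwards leaves ids 4 to 9 orphaned. Ids 1 to 3 survive, but they now denote age bands only. Surrogate ids cannot reveal that kind of misdirection on their own, so the API would have to guard against it.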

And then much of what is different is just a renaming of the original
entities.  From the attached XML file I think you can see some of the
issues we face regarding names and identifiers.  I find myself following a
sort of convention of CODE, Name, Description and UUID.  CODEs must be unique
within the scope of the database; I suppose this is close to what we
currently call ShortName.  I would like to place constraints on CODEs in
terms of length, and also disallow spaces and other funny characters,
because we may well have to use these codes in making up URIs.  So CODEs
must be unique.  For the moment we could keep Name unique, but we should
migrate away from that.  It's a matter of rewriting all our comparators, I
guess.  UUIDs, I am told, are unique through some sort of divinity, so we
apparently do not need to worry about them :-)
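Here is a minimal sketch of the CODE rules. It assumes a 50-character limit and an alphabet of ASCII letters, digits and underscore. The thread asks for a length limit and for no spaces or funny characters, but it fixes neither, so both values are assumptions. The function names are hypothetical. Uniqueness is checked within whatever scope the caller passes in. The attached XML reuses the element codes "0", "1" and "2" in both AGE_1 and SEX, so database-wide uniqueness presumably applies to dimensions and dataelements rather than to dimension elements.

```python
import re
from urllib.parse import quote

# Hypothetical constraints: the thread asks for a length limit and no spaces or
# "funny characters" but does not fix either, so both values are assumptions.
MAX_CODE_LENGTH = 50
CODE_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def code_problems(code: str) -> list[str]:
    """Return the reasons a CODE is unacceptable (an empty list means it is fine)."""
    if not code:
        return ["empty"]
    problems = []
    if len(code) > MAX_CODE_LENGTH:
        problems.append(f"longer than {MAX_CODE_LENGTH} characters")
    if not CODE_PATTERN.fullmatch(code):
        problems.append("contains spaces or characters outside A-Z, a-z, 0-9 and _")
    return problems


def uri_safe(code: str) -> bool:
    """True if the code can be dropped into a URI path segment without escaping."""
    return quote(code, safe="") == code


def duplicate_codes(codes: list[str]) -> set[str]:
    """Codes that occur more than once within one uniqueness scope."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for code in codes:
        if code in seen:
            duplicates.add(code)
        seen.add(code)
    return duplicates
```

`uri_safe` relies on `urllib.parse.quote` leaving letters, digits and `_` untouched. Any code that `code_problems` accepts can therefore be used in a URI path as-is.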

I've also tried to reduce the number of knees on the donkey, from 11 tables
to 6.  I believe this can be done whilst preserving the existing
functionality.  This arrangement would make it much more sensible to produce
the XML I need to produce.  I'm hoping that it would also be friendlier to
those trying to pivot the data across dimensions.
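Pivoting depends on one property: each combination resolves to exactly one element per dimension. Totals along one dimension then take a single group-by pass. This is a hypothetical Python sketch. `COMBINATIONS` is copied from the `<combinations>` block of the attached XML, and `total_by` and `pivot` are illustrative names, not an existing API.

```python
from collections import defaultdict

# Copied from the <combinations> block of the sample XML:
# combination id -> {dimension code: dimension element code}
COMBINATIONS = {
    1: {"AGE_1": "0", "SEX": "0"}, 2: {"AGE_1": "0", "SEX": "1"}, 3: {"AGE_1": "0", "SEX": "2"},
    4: {"AGE_1": "1", "SEX": "0"}, 5: {"AGE_1": "1", "SEX": "1"}, 6: {"AGE_1": "1", "SEX": "2"},
    7: {"AGE_1": "2", "SEX": "0"}, 8: {"AGE_1": "2", "SEX": "1"}, 9: {"AGE_1": "2", "SEX": "2"},
}


def total_by(values: dict[int, float], dimension: str) -> dict[str, float]:
    """Sum values over every other dimension, keeping one dimension's elements."""
    totals: dict[str, float] = defaultdict(float)
    for combo_id, value in values.items():
        totals[COMBINATIONS[combo_id][dimension]] += value
    return dict(totals)


def pivot(values: dict[int, float], rows: str, cols: str) -> dict[str, dict[str, float]]:
    """Lay values out as a rows x cols table keyed by element codes.

    Assumes rows and cols are the data element's only two dimensions, so each
    cell receives exactly one value.
    """
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for combo_id, value in values.items():
        cell = COMBINATIONS[combo_id]
        table[cell[rows]][cell[cols]] = value
    return dict(table)
```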

Jason, do you think this works for you?  I might have missed out something
really fundamental.  Abyot, you've been through this process before: am I
missing something?  From the DataValue you can see DimensionElements, and
once you know a DimensionElement you also know the Dimension to which it
belongs.  I think that's queryable.  I will have to hydrate it with some data
and see.
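The lookup from DataValue to DimensionElement to Dimension can be tried out on an in-memory SQLite database using Python's standard `sqlite3` module. The schema is an assumption. The table names follow the email, but the columns are guesses, not the layout in newmodel.png. So is the choice to store a combination as one row per member element, which would keep the count at six tables. DataElement and DataElementDimensions are left out because the query does not touch them.

```python
import sqlite3

# Hypothetical schema: table names follow the email, but columns and the
# one-row-per-(combination, element) layout are guesses, not the newmodel.png design.
SCHEMA = """
CREATE TABLE dimension (dimensionid INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
CREATE TABLE dimensionelement (           -- 1-n: one owning dimension per element
    dimensionelementid INTEGER PRIMARY KEY,
    dimensionid INTEGER NOT NULL REFERENCES dimension,
    code TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (dimensionid, code));
CREATE TABLE dimensionelementcombination (
    combinationid INTEGER NOT NULL,
    dimensionelementid INTEGER NOT NULL REFERENCES dimensionelement,
    PRIMARY KEY (combinationid, dimensionelementid));
CREATE TABLE datavalue (
    dataelement TEXT NOT NULL, source TEXT NOT NULL, period TEXT NOT NULL,
    combinationid INTEGER,                -- NULL for data elements without dimensions
    value TEXT);
"""

# DataValue -> combination rows -> DimensionElement -> owning Dimension
QUERY = """
SELECT d.code, de.name
FROM datavalue dv
JOIN dimensionelementcombination c ON c.combinationid = dv.combinationid
JOIN dimensionelement de ON de.dimensionelementid = c.dimensionelementid
JOIN dimension d ON d.dimensionid = de.dimensionid
WHERE dv.dataelement = ? AND dv.source = ? AND dv.period = ?
ORDER BY d.code
"""


def build_sample() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dimension VALUES (?, ?)", [(1, "AGE_1"), (2, "SEX")])
    conn.executemany(
        "INSERT INTO dimensionelement VALUES (?, ?, ?, ?)",
        [(1, 1, "0", "0-5"), (2, 1, "1", "6-20"), (3, 1, "2", "over 20"),
         (4, 2, "0", "Male"), (5, 2, "1", "Female"), (6, 2, "2", "Unknown")],
    )
    # Combination 2 of the sample XML: AGE_1="0" (0-5) and SEX="1" (Female)
    conn.executemany("INSERT INTO dimensionelementcombination VALUES (?, ?)", [(2, 1), (2, 5)])
    conn.executemany(
        "INSERT INTO datavalue VALUES (?, ?, ?, ?, ?)",
        [("NEW_STI_TREATMENT", "Mogale Clinic", "2008-10", 2, "23"),
         ("BEDS_OCCUPIED", "Mogale Clinic", "2008-10", None, "23")],
    )
    return conn


def describe(conn: sqlite3.Connection, dataelement: str, source: str, period: str):
    """(dimension code, element name) pairs behind one data value."""
    return conn.execute(QUERY, (dataelement, source, period)).fetchall()
```

The BEDS_OCCUPIED row has a NULL combinationid, so the inner joins return no dimensions for it. That matches the dimensionless case in the sample.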

Shaking the multidimensional model up like this would obviously have
implications.  But I suspect most of it is taking stuff away rather than
adding new so it might just be doable.  Less is more.

Not spending time with docbook yet, till I get some feedback.

Cheers
Bob

2009/9/29 Bob Jolliffe <bobjolliffe@xxxxxxxxx>

> Hi
>
> On the back of Jason and others comments, I've reached the conclusion that
> we cannot really live with the MD model the way it is.  Although I think it
> is (just about) workable, there are some serious optimizations we can and
> should do.  I am going to put my other work back a day or two and propose
> some changes in a branch.
>
> I think central to the inefficiency is the many-many relation between
> categories and categoryoptions.  This strikes me as illogical as well as
> being cumbersome in the UI.  Do we really want to be able to make categories
> with options like {'0<5','6-10','Male','Out of stock','35-40'}?  Reducing
> the relation between categories and category options to 1-n cuts two tables,
> should make sql queries more efficient and grokkable and also matches other
> models such as sdmx better.
>
> The other possible inefficiency is the dimensionset.  It can be useful in
> some contexts but I'm guessing that when querying the data (which we want to
> be fast) it is not relevant.  A dataelement can have dimensions.  The fact
> that some dataelements have the same combinations of dimensions is very
> useful to know for some purposes, but it should be possible to get from the
> dataelement to the dimension directly.
>
> On the other side of the road is the hierarchical dimensionality idea I see
> Ola and Jason have been discussing, where dimensions are composed (perhaps
> post-facto) of uni-dimensional dataelements rather than decomposed into
> pre-structured dimensional elements.  I suspect that:
> 1.  we need both; and
> 2.  from the API, user and reporting perspective they should look the same
> (ie a dataelement can have dimensions - how they come about should not be a
> concern at the end point).
>
> I'll try out some of these ideas and point you to the branch.
>
> Regards
> Bob
>
> 2009/9/29 Lars Helge Øverland <larshelge@xxxxxxxxx>
>
>>
>>
>>> Thanks for the explanations Jason. The multidimensional model is quite
>>> complicated, is poorly documented, and as you say is DHIS-centric in the way
>>> that it is built around the DHIS notion of a Data Element.
>>>
>>>
>> Could we assemble some of the text being written on the list and put it
>> into docbook?
>>
>>
>>> That said, and I think Jason already has made a strong case for this,
>>> also in a 100% DHIS2 scenario you will need more flexibility in defining
>>> dimensions to your data than what categories can provide. Being able to
>>> define data dimensions independent of data collection is powerful and should
>>> be supported in a better way than what data element groups provide today.
>>> Given that we already have the orgunit group set code in place I would
>>> assume that adding group sets to data elements could be a relatively
>>> straightforward thing to do (but then again, I am not the programmer...).
>>>
>>
>> I don't see any implications in adding this to the system; it won't
>> require changes to the existing model as the association goes from the
>> groupset to the groups. We can prioritize this for the 2.0.3 release.
>>
>>
>

Attachment: newmodel.png
Description: PNG image

Attachment: oldmodel.png
Description: PNG image

<?xml version="1.0" encoding="UTF-8"?>

<dxf xmlns="http://dhis2.org/dxf/version2.0">
	<metadata>
		<dimensions>
			<dimension code="AGE_1" name="Age" uuid="543534646456">
				<dimensionelement code="0">0-5</dimensionelement>
				<dimensionelement code="1">6-20</dimensionelement>
				<dimensionelement code="2">over 20</dimensionelement>
			</dimension>

			<dimension code="SEX" name="Sex" uuid="543534646445">
				<dimensionelement code="0">Male</dimensionelement>
				<dimensionelement code="1">Female</dimensionelement>
				<dimensionelement code="2">Unknown</dimensionelement>
			</dimension>
		
			<combinations>
				<combination id="1" AGE_1="0" SEX="0" />
				<combination id="2" AGE_1="0" SEX="1" />
				<combination id="3" AGE_1="0" SEX="2" />
				<combination id="4" AGE_1="1" SEX="0" />
				<combination id="5" AGE_1="1" SEX="1" />
				<combination id="6" AGE_1="1" SEX="2" />
				<combination id="7" AGE_1="2" SEX="0" />
				<combination id="8" AGE_1="2" SEX="1" />
				<combination id="9" AGE_1="2" SEX="2" />
			</combinations>
			
		</dimensions>
		
		<dataElements>
<!--			A multi dimensional dataelement-->
			<dataElement code="NEW_STI_TREATMENT"  uuid="543534646423">
				<name xml:lang="en">New STI Treatment</name> 
				<description xml:lang="en">New patients receiving STI Treatment at facility</description>
				<dimension>AGE_1</dimension> 
				<dimension>SEX</dimension> 
			</dataElement>
<!--			A single dimensional dataelement-->
			<dataElement code="BEDS_OCCUPIED"  uuid="543534646425">
				<name xml:lang="en">Beds Occupied</name> 
				<description xml:lang="en">The number of hospital beds occupied at facility</description>
			</dataElement>
		</dataElements>
	</metadata>

	<data>
<!--	Note there are a number of different ways one could group the following.  Or not group at all.-->
		<dataelement code="NEW_STI_TREATMENT">
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="1" Value="23"/>
<!--			<datavalue source="Mogale Clinic" period="2008-10" AGE_1="0-5" SEX="Male" Value="23"/>-->
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="2" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="3" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="4" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="5" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="6" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="7" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="8" Value="23"/>
			<datavalue source="Mogale Clinic" period="2008-10" CombinationID="9" Value="23"/>
		</dataelement>
<!--		The CombinationID attribute is optional.  "NULL" values allowed in database.-->
		<dataelement code="BEDS_OCCUPIED">
			<datavalue source="Mogale Clinic" period="2008-10"  Value="23"/>
		</dataelement>
	</data>
</dxf>
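A consumer of this draft format has to resolve each `CombinationID` through the `<combinations>` block to get back to element names. The sketch below does that with Python's standard `xml.etree.ElementTree`. `read_values` is a hypothetical name, and `SAMPLE` is a trimmed copy of the dataset above. Treat it as an illustration of the draft layout rather than a DXF 2.0 reader.

```python
import xml.etree.ElementTree as ET

NS = {"dxf": "http://dhis2.org/dxf/version2.0"}

# Trimmed copy of the sample dataset (two elements per dimension, two combinations)
SAMPLE = """<dxf xmlns="http://dhis2.org/dxf/version2.0">
  <metadata>
    <dimensions>
      <dimension code="AGE_1" name="Age" uuid="543534646456">
        <dimensionelement code="0">0-5</dimensionelement>
        <dimensionelement code="1">6-20</dimensionelement>
      </dimension>
      <dimension code="SEX" name="Sex" uuid="543534646445">
        <dimensionelement code="0">Male</dimensionelement>
        <dimensionelement code="1">Female</dimensionelement>
      </dimension>
      <combinations>
        <combination id="1" AGE_1="0" SEX="0" />
        <combination id="2" AGE_1="0" SEX="1" />
      </combinations>
    </dimensions>
  </metadata>
  <data>
    <dataelement code="NEW_STI_TREATMENT">
      <datavalue source="Mogale Clinic" period="2008-10" CombinationID="2" Value="23"/>
    </dataelement>
    <dataelement code="BEDS_OCCUPIED">
      <datavalue source="Mogale Clinic" period="2008-10" Value="23"/>
    </dataelement>
  </data>
</dxf>"""


def read_values(xml_text: str) -> list[tuple]:
    root = ET.fromstring(xml_text)
    dims = "dxf:metadata/dxf:dimensions"
    # dimension code -> {element code: element name}
    names = {
        dim.get("code"): {el.get("code"): el.text for el in dim.findall("dxf:dimensionelement", NS)}
        for dim in root.findall(f"{dims}/dxf:dimension", NS)
    }
    # combination id -> {dimension code: element name}; every attribute other than
    # "id" is named after a dimension code, so in this layout a code must also be
    # a valid XML attribute name
    combos = {
        combo.get("id"): {dim: names[dim][code] for dim, code in combo.attrib.items() if dim != "id"}
        for combo in root.findall(f"{dims}/dxf:combinations/dxf:combination", NS)
    }
    rows = []
    for de in root.findall("dxf:data/dxf:dataelement", NS):
        for dv in de.findall("dxf:datavalue", NS):
            combo_id = dv.get("CombinationID")  # optional: absent for dimensionless elements
            cell = combos[combo_id] if combo_id is not None else {}
            rows.append((de.get("code"), dv.get("source"), dv.get("period"), cell, float(dv.get("Value"))))
    return rows
```

An unknown `CombinationID` raises `KeyError` rather than being silently dropped. That is one way to surface the orphaned or misdirected values discussed above.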
