dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

To: Jason Pickering <jason.p.pickering@xxxxxxxxx>
From: Bob Jolliffe <bobjolliffe@xxxxxxxxx>
Date: Wed, 30 Sep 2009 15:01:14 +0100
Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <244b6a540909300631i23510afbl7e47d09f765996f7@mail.gmail.com>
2009/9/30 Jason Pickering <jason.p.pickering@xxxxxxxxx>

> Indicators are most certainly multi-dimensional, but without a formal
> way of extending the multidimensional concept to indicators, I cannot
> see how it can work.


While there is still some disquiet about the current multidimensional
concept I would be reluctant to rush in ...


> It is still not clear to me how the
> multidimensional data elements are used to calculate indicators in the
> same way as a PODE (plain old data element).  I guess this is handled
> somehow by the API?


Jason, don't overestimate the API.  I suppose the point is that this
functionality should be there as part of the MD implementation.  But it
seems work stopped after the input side was done.  I can probably understand
why, but it's a pity, because it is only when you start looking at this that
you realize that the queries involved will be considerably more complex than
the existing ones.  Currently the API for DataValueService provides only flat
queries off the datavalues table.  (Even I can do these.)  The new ones
are a completely different kettle of fish.  That's why you haven't seen any
of the goodness floating up to the reportTable design, for instance.
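
To make the contrast concrete, here is a minimal sketch in Python of a flat
query next to a slice query built on the two-stage idea from earlier in this
thread.  Everything in it is an assumption made for illustration: the
in-memory tables, the ids, the numbers and the function names (flat_query,
combos_matching, slice_query) are hypothetical and are not the
DataValueService API.

```python
from collections import namedtuple

# Hypothetical stand-ins for the categoryoptioncombo and datavalue tables.
DataValue = namedtuple("DataValue", "dataelement combo_id value")

# combo id -> the set of category options that make up the combination
COMBOS = {
    1: frozenset({"OPD", "Under 1"}),
    2: frozenset({"OPD", "Under 5"}),
    3: frozenset({"IPD", "Under 1"}),
    4: frozenset({"IPD", "Under 5"}),
}

# Invented values, one row per (data element, combo).
DATAVALUES = [
    DataValue("Malaria cases", 1, 10),
    DataValue("Malaria cases", 2, 20),
    DataValue("Malaria cases", 3, 3),
    DataValue("Malaria cases", 4, 4),
]


def flat_query(dataelement, combo_id):
    """Flat lookup: one data element plus one fixed combo id."""
    return sum(
        dv.value
        for dv in DATAVALUES
        if dv.dataelement == dataelement and dv.combo_id == combo_id
    )


def combos_matching(options):
    """Stage 1: ids of every combo that contains all requested options."""
    wanted = frozenset(options)
    return {cid for cid, opts in COMBOS.items() if wanted <= opts}


def slice_query(dataelement, options):
    """Stage 2: sum the flat values over the combos found in stage 1."""
    return sum(flat_query(dataelement, cid) for cid in combos_matching(options))
```

With this data, slice_query("Malaria cases", {"Under 1"}) folds the patient
status dimension and adds up combos 1 and 3.  The point of the sketch is only
that the slice is an aggregation layered on top of the flat lookup; a real
implementation would presumably push the work into SQL.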


> For instance, if I define my indicator (Malaria
> cases) with category combos (Under 1, Under 5, Over 5) and Patient
> status (OPD, IPD, Deaths), how do I calculate the Under 1 malaria
> incidence rate, which would be a slice of Under 1 malaria cases (with
> the patient status dimension folded) divided by a multi-dimensional
> population figure. Does this imply that the population figures and the
> incidence/coverages that result from combinations of indicators must
> share the exact same dimensionality so that DHIS can divine the
> correct dimensional intersections?
>
> I have not played around with this, but I suppose it is possible
> somehow from within the indicator definition panels.
>
> I can see why indicators should be multi-dimensional, both in terms of
> definition and in terms of analysis, but it feels like it would
> require a major rework.
>
> In terms of keeping it simple, again, all I require is the ability to
> assign dimensions and dimensional elements to data elements. :)
>

Which of course you can do.  It's just that it seems to be tricky to get
them back out again.
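
As a worked illustration of what "getting them back out" would mean for the
Under 1 incidence example above, here is a small Python sketch.  It assumes
that only the shared dimension (age) has to line up between numerator and
denominator, with patient status folded on the case side.  The dictionaries,
the function names and all the figures are invented for the example.

```python
# Illustrative numbers only; none of these come from a real database.
# Cases are keyed by (age option, patient status option).
CASES = {
    ("Under 1", "OPD"): 40, ("Under 1", "IPD"): 8, ("Under 1", "Deaths"): 2,
    ("Under 5", "OPD"): 90, ("Under 5", "IPD"): 15, ("Under 5", "Deaths"): 5,
    ("Over 5", "OPD"): 200, ("Over 5", "IPD"): 30, ("Over 5", "Deaths"): 10,
}

# Population carries the age dimension only.
POPULATION = {"Under 1": 1000, "Under 5": 4000, "Over 5": 20000}


def fold_status(cases, age):
    """Sum cases for one age option across every patient status."""
    return sum(v for (a, _status), v in cases.items() if a == age)


def incidence_per_1000(age):
    """Folded case count divided by the matching population slice."""
    return 1000 * fold_status(CASES, age) / POPULATION[age]
```

Here incidence_per_1000("Under 1") divides the 50 folded cases by a
population of 1000.  Whether DHIS could work out such intersections by itself
is exactly the open question; the sketch only shows the arithmetic once the
matching slices are known.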

Regards
Bob


>
>
> On Wed, Sep 30, 2009 at 3:14 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
> > 2009/9/30 Ola Hodne Titlestad <olatitle@xxxxxxxxx>
> >>
> >> 2009/9/30 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> >>>
> >>> OK.  I've reached the conclusion that the model can and probably should
> >>> be simplified, but it is really far too much work for what I have time
> >>> for now.  The categoryoptioncombo is already deeply ingrained in many
> >>> parts of the system.  So don't hold your breath.
> >>>
> >>> I'm going back to focus on my much simpler problem of exploding
> >>> categorycombooptions into dimensions and vice versa.
> >>>
> >>> For querying, I can see the API needs methods added to return
> >>> datavalues by arbitrary collections of category rather than just fixed
> >>> categoryoptioncombos.  These only exist for the purpose of data
> >>> collection.  I suspect that this is what Ola needs to create more
> >>> flexible reporttables.  Then when configuring the reporttable you would
> >>> freely select the dimensions you were interested in.  This is of course
> >>> do-able - I can see it - but my little brain is struggling with the
> >>> complexity.
> >>>
> >>> Looking at a two stage process it is a matter of getting the collection
> >>> of categorycombooptionids which intersect with the given set of
> >>> categories and then passing that collection to the existing API method
> >>> which returns collections of datavalues which match particular
> >>> categorycombooptionids.
> >>>
> >>> In principle if we can expose the required methods in the API then it
> >>> might be possible at some time in the future to revamp the underlying
> >>> table structure without disturbing the API.
> >>>
> >>> Two final thoughts:
> >>> 1.  if we are bound to the model whereby categoryoptions are free
> >>> standing entities (ie many to many relation with categories) then, for
> >>> the purpose of import/export we are obliged to uniquely identify these
> >>> as well.  So I will have to reluctantly also put uuids on
> >>> categoryoptions.  After discussing with Abyot last night, I can see that
> >>> there is some value in having them the way they are, but we will have to
> >>> live with the complexity.  What you gain on the swings you lose on the
> >>> roundabouts.
> >>>
> >>
> >> OK. I still don't get why we need this flexibility though. When using
> >> the data values you would only query for data element +
> >> categories/dimensions anyway right, and <5 means <5 whether it is part
> >> of AGE1, AGE2 or AGE 3. Or?
> >
> > I guess the problem is that "<5" is just a label.  Using OpenMRS-speak
> > you could say there is no "concept" attached to the label.  So in another
> > category there could be a label "lessThan5".  By allowing options to be
> > shared between categories, Abyot is hoping you will just use "<5" in all
> > cases.  Of course there is nothing forcing you to do this.  Just as there
> > is nothing stopping you having a category of "<5", "Oranges" and
> > "Apples".  So combined with the flexibility to do something possibly
> > useful you also have the flexibility to do quite silly things.
> >
> > There is a strong sense in which Age is quite a special and common case
> > (like Period).  For example, if you had one category with {<5, 5-10, >10}
> > and another with {0-10, >10} then you should really be able to aggregate
> > all the 0-10's.
> >
> > Perhaps the category Age (or any categories implementing the Age concept)
> > requires some special status where there are formal requirements on the
> > naming of categoryoptions within it.  Don't know - you are more familiar
> > with the use cases.
> >
> >>
> >>
> >>>
> >>> 2.  Indicators are not multidimensional.  Why is this?  Was it a
> >>> conscious decision resulting from earlier discussion or is it just that
> >>> we haven't got there yet?
> >>
> >> Data analysis could benefit from having multidimensional indicators, but
> >> then since this is strictly for output and never input I would suggest
> >> using the post-method of assigning indicator group sets and groups (or
> >> whatever you end up calling it in the UI). What makes indicators
> >> interesting and complex in this context is that the
> >> numerator/denominator formulas should be able to contain slices of the
> >> multidimensional data element, e.g. "Malaria" + "all ages", "male", and
> >> not only the flat data element (data element + 1 categoryoptioncombo,
> >> "Malaria" + "<5", "male") like it is today.
> >
> > This distinction between input and output is strange.  Having input-only
> > dimensions is like a sort of statistical masturbation :-)  Lot of effort
> > with no end result.
> >
> > Yet looking at SDMX, it is clear that the protocol is much more suited
> > for indicators than it is for dataelements.  In fact using it to shunt
> > dataelements around between systems is a bit of a perversion.  But my
> > sense is that WHO would like DHIS in national offices to produce SDMX
> > formatted indicator reports for them.  Is that your sense too?  And
> > should we care?  If so there is some expectation that indicators should
> > have dimensions.  And including the slices you refer to above.  In fact
> > if we were ever to try and import the metadata from the famous WHO
> > indicator repository that is exactly what we will see.  Not sure how we
> > might handle it without a md model.  I suppose we will create flat
> > indicators with the dimensions encoded in the name and then set about
> > grouping the buggers :-(.
> >
> > I haven't really looked much at the indicator end of the beast.  Been
> > focussed more on getting datavalues from openmrs.
> >
> > Regards
> > Bob
> >
> >>
> >>>
> >>> Regards
> >>> Bob
> >>>
> >>> 2009/9/29 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> >>>>
> >>>> 2009/9/29 Abyot Gizaw <abyota@xxxxxxxxx>
> >>>>>
> >>>>>
> >>>>> On Tue, Sep 29, 2009 at 9:16 PM, Jason Pickering
> >>>>> <jason.p.pickering@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> I think Abyot raises some good points, especially his last one about
> >>>>>> differences of what the age dimension really is.
> >>>>>>
> >>>>>> I think the biggest challenge is going to be how to unite the
> >>>>>> concepts of a multidimensional data element (as it is currently
> >>>>>> implemented with categories) and a data element that has no
> >>>>>> multidimensionality, at least in the sense of it not being assigned
> >>>>>> any categories.
> >>>>>
> >>>>> Isn't this what we have in the current system? If you are not
> >>>>> assigning any combination of categories for a dataelement (well of
> >>>>> course for the sake of consistency - from programming logic point of
> >>>>> view - implicitly a default category combination with one default
> >>>>> category having one default option is assigned - it is like putting
> >>>>> your value at zero on the dimensions axis) then the dataelement has
> >>>>> no dimensionality.
> >>>>
> >>>> I don't really like the default category idea.  The way I have
> >>>> currently proposed there is no default category.  By default a
> >>>> dataelement has no dimensions.  It doesn't need a default dimension.
> >>>> And also by default the dimensionelementcombination in datavalue is
> >>>> NULL.
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> What about the following scenario. Could the category/category
> >>>>>> combos be transformed somehow into a sort of data element generator?
> >>>>>> Users could define a dimensionality set, assign a master data
> >>>>>> element, and DHIS would create all of the necessary data elements.
> >>>>>> So a category combination of Patient Status (OPD, IPD, Deaths) and
> >>>>>> Age (Under 1, Under 5 and Over 5) and template data element
> >>>>>> (Clinical malaria) would produce:
> >>>>>>
> >>>>>> OPD Under 1 Clinical Malaria {OPD, Under 1, Clinical Malaria}
> >>>>>> OPD Under 5 Clinical Malaria {OPD, 1-5, Clinical Malaria}
> >>>>>> OPD Over 5 Clinical Malaria ...
> >>>>>> OPD Clinical Malaria Total {OPD, All ages, Clinical Malaria}
> >>>>>> ...
> >>>>>> ..
> >>>>>> ..
> >>>>>> IP Clinical Malaria Total {IP, All ages, Clinical Malaria}
> >>>>>> ...
> >>>>>> ...
> >>>>>> ...
> >>>>>> Deaths Clinical Malaria Total {Deaths, All ages, Clinical malaria}
> >>>>>> Clinical Malaria Total {All patient status, All ages, Clinical
> >>>>>> malaria}
> >>>>>>
> >>>>>> Each one of those data elements would then be assigned a set of
> >>>>>> dimensions, and a set of dimensional elements.
> >>>>>> The categories functionality would simply be an artifact to produce
> >>>>>> multiple data elements, without having to enter them separately,
> >>>>>> which if I understood Ola yesterday, was one of its intended
> >>>>>> purposes.
> >>>>>>
> >>>>>> Now, for those of us such as myself, that have already created
> >>>>>> dozens of data elements with different dimensions in their names
> >>>>>> (but nowhere in a relational table) we could assign the
> >>>>>> dimensionality in a separate step (post-facto as Bob mentioned
> >>>>>> earlier). I might want to assign an "uber" dimension of
> >>>>>> "Communicable" and "Non-communicable" to a disease type that might
> >>>>>> not have anything to do with the definition of the data element
> >>>>>> itself, but would be simply for analysis purposes later.  Again, I
> >>>>>> may be rehashing my previous emails here, but from a pure SQL
> >>>>>> standpoint, the approach I suggest here makes sense to me, in terms
> >>>>>> of queries of how to pull this into a crosstab as well as how to
> >>>>>> generate a fact table that something like an OLAP server could deal
> >>>>>> with.
> >>>>>>
> >>>>>> This approach might seem to resolve the issue of how to deal with
> >>>>>> these two different beasts, by unfolding the multidimensional data
> >>>>>> element into simpler components. Meaning that the
> >>>>>> category/combos/options would be used as a templating mechanism, but
> >>>>>> that dimensionality could be assigned through a separate set of
> >>>>>> relations.  Perhaps this is what is represented in the diagram, but
> >>>>>> I will need to study it tomorrow after some sleep.
> >>>>>>
> >>>>>> I do think that dimensional elements should not be able to be
> >>>>>> shared by dimensions, and that dimensions and dimensional elements
> >>>>>> should not be able to be deleted without lots of bells and whistles
> >>>>>> going off once they have been assigned to data elements.
> >>>>>
> >>>>> What is wrong with that as long as values are not associated with
> >>>>> them?  I think we will be falling back to the current implementation
> >>>>> instead - like dimensional elements should not be deleted once values
> >>>>> are assigned to their combinations.
> >>>>
> >>>> I agree.  I think we all will agree on this much.
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> I guess the key question is whether data elements should be able to
> >>>>>> have multiple DimensionElementCombinations, which I think is the
> >>>>>> current implementation. I am just not sure this will work with a
> >>>>>> combination of DHIS2-type-multidimensional elements, and
> >>>>>> DHIS1.4-type data elements.
> >>>>>
> >>>>> Can anyone explain to me how the DHIS2 multidimensional dataelement
> >>>>> concept fails to handle the DHIS 1.4 dataelements - sorry, maybe I
> >>>>> missed this from your earlier discussion? I think the way I see it -
> >>>>> if the objective is on OLAP, pivoting/querying, then what we need is
> >>>>> not to change the model - instead to develop more APIs which can pull
> >>>>> data along a dimension, varying degree of overlappings across
> >>>>> dimensions - or more generally aggregation of values over a flexible
> >>>>> set of dimensionelementcombinations!
> >>>>
> >>>> Again I am with you mostly on this.  In fact that has been my
> >>>> suggestion all along - to push the functionality into the API.  But
> >>>> having said that I think the current model is too double-jointed and
> >>>> complex.  I have seen by trying to unpick the dimensions using xslt I
> >>>> need too many hash tables which is inefficient.  This no doubt would
> >>>> also translate into too many SQL clauses.  By trimming the requirement
> >>>> that dimensionelements are freely assignable the model becomes a good
> >>>> bit simpler.  Beyond that it is mostly changing names.
> >>>>
> >>>>>
> >>>>> Using the example above -  {OPD, IPD}, {Male, Female},{Under 1, 1-5,
> >>>>> Above 5} and malaria as base dataelement
> >>>>>
> >>>>> What we have currently is an API to provide values for
> >>>>>
> >>>>> Malaria(OPD,Male,Under 1)
> >>>>> Malaria(OPD,Male,1-5)
> >>>>> Malaria(OPD,Male,Above 5)
> >>>>> Malaria(OPD,Female,Under 1)
> >>>>> Malaria(OPD,Female,1-5)
> >>>>> Malaria(OPD,Female,Above 5)
> >>>>> ....
> >>>>> ...
> >>>>>
> >>>>> And if I understood correctly .. what is required is to have
> >>>>> registered cases of
> >>>>>
> >>>>> Malaria in the OPD,
> >>>>> Malaria in the IPD
> >>>>> Malaria for Males
> >>>>> Malaria for Females
> >>>>> ....
> >>>>> ..
> >>>>>
> >>>>> Malaria In the OPD but only those Female
> >>>>> Malaria In the IPD but for male
> >>>>> ..
> >>>>> ..
> >>>>> ..
> >>>>> we can list different combinations....
> >>>>>
> >>>>> or finally ask ...... for the Malaria
> >>>>>
> >>>>> Isn't this a simple question of Aggregation? Does the
> >>>>> multidimensional datamodel have a limitation to handle the above
> >>>>> requirements - or am I talking a different stuff here?
> >>>>
> >>>> No I believe it can probably be done - but yet it doesn't seem to have
> >>>> been done.  When I started looking at how I might do it I realized
> >>>> that it could also be simplified.
> >>>>
> >>>> Regards
> >>>> Bob
> >>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Enough for today.
> >>>>>>
> >>>>>> Thanks for this Bob. It is a good start.  Can't you make this
> >>>>>> diagram in DocBook so I can edit it? :D
> >>>>>>
> >>>>>> Regards,
> >>>>>> Jason
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Sep 29, 2009 at 8:01 PM, Abyot Gizaw <abyodia@xxxxxxxxx>
> >>>>>> wrote:
> >>>>>> > Yes your suggestion is doable and less is better .... but I think
> >>>>>> > the requirement from the field is more complex.
> >>>>>> >
> >>>>>> > If, for a moment, we stop talking about datavalues and talk about
> >>>>>> > dataelements - why are we talking about dimension combinations?
> >>>>>> >
> >>>>>> > Because you are assuming a dataelement to have only one dimension.
> >>>>>> > Am I correct? If that is the case, I see a little bit of
> >>>>>> > inconsistency here.  DataElement talks about one dimension, but
> >>>>>> > its corresponding value talks about combination of dimensions.
> >>>>>> >
> >>>>>> > Yes from the datavalue I can have dimensionelementcombinations,
> >>>>>> > pick dimensionelements, regroup and put them in their
> >>>>>> > corresponding dimensions -- in the end telling me which dimension
> >>>>>> > they came from. But from this point onwards I am no more talking
> >>>>>> > about a value of a single dataelement but a value for combination
> >>>>>> > of dataelements (because I have to pull different dataelements
> >>>>>> > which can give me the identified dimensions) .... but is this
> >>>>>> > what we want?
> >>>>>> >
> >>>>>> > The other point I would like to raise is - will there not be any
> >>>>>> > limitation on the flexibility of the system when putting the
> >>>>>> > restriction "A Dimension has many DimensionElements.  But a
> >>>>>> > DimensionElement is a member of only one Dimension"? Not only a
> >>>>>> > system flexibility problem, I see a logical problem as well.
> >>>>>> > Because if we think for example beyond the obvious
> >>>>>> > SEX(male,female,unknown) - I see a strong need for letting
> >>>>>> > dimensionelements be members of multiple dimensions: For example
> >>>>>> > take the other obvious dimension - AGE. And assume <5 yrs, 5-10
> >>>>>> > yrs, and >10 yrs as its dimensionelements. Maybe such scaling of
> >>>>>> > the AGE dimension is appropriate for the Malaria case, but for the
> >>>>>> > TB case people might be interested to break the AGE dimension into
> >>>>>> > <5yrs, 5-10yrs, 10-15yrs, >15yrs - so how are we going to handle
> >>>>>> > cases like this? Are we going to define a number of <5yrs or are
> >>>>>> > we going to use the same <5yr dimensionelement?
> >>>>>> >
> >>>>>> >
> >>>>>> > Thank you
> >>>>>> > Abyot.
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Tue, Sep 29, 2009 at 4:45 PM, Bob Jolliffe
> >>>>>> > <bobjolliffe@xxxxxxxxx> wrote:
> >>>>>> >>
> >>>>>> >> OK.  Here's my first attempt to rationalize things.  Please
> >>>>>> >> excuse the attachments.  I try not to send attachments to mailing
> >>>>>> >> lists but these are at least fairly small.  (And Lars I will
> >>>>>> >> write it up in docbook after fishing for feedback).
> >>>>>> >>
> >>>>>> >> My primary aim has been to disturb the existing model as little
> >>>>>> >> as possible whilst trying to simplify wherever possible.
> >>>>>> >>
> >>>>>> >> Attached oldmodel.png shows the participants in the existing
> >>>>>> >> model.  As you can see there are 11 tables in all.  I haven't
> >>>>>> >> shown the relations as it becomes a bit of a web.
> >>>>>> >>
> >>>>>> >> Also attached is a proposed amended database model which bears
> >>>>>> >> sufficient similarity to the old that migration between the two
> >>>>>> >> should be feasible.  But it is down to 6 tables.  And I have
> >>>>>> >> named the tables according to the terms we have been discussing.
> >>>>>> >> Of course this is just the database model.  I've also put
> >>>>>> >> together an XML view of what some sample dataset might look
> >>>>>> >> like.  There is also a UML model required which would be richer
> >>>>>> >> than the underlying datamodel, but one step at a time ....
> >>>>>> >>
> >>>>>> >> Walking through:
> >>>>>> >>
> >>>>>> >> 1.  DataElements can have Dimensions.  And different dataElements
> >>>>>> >> can (and hopefully will) share some of the same Dimensions.  So
> >>>>>> >> there is a m-to-n relationship between the two necessitating an
> >>>>>> >> extra table (DataElementDimensions).  An example of a Dimension
> >>>>>> >> is SEX.  Nothing new here.
> >>>>>> >>
> >>>>>> >> 2.  Dimensions have DimensionElements.  So SEX for example might
> >>>>>> >> have DimensionElements "Male", "Female", "Unknown".  A big
> >>>>>> >> difference from the old model is that there is a 1-n relationship
> >>>>>> >> between DimensionElements and Dimensions.  A Dimension has many
> >>>>>> >> DimensionElements.  But a DimensionElement is a member of only
> >>>>>> >> one Dimension.
> >>>>>> >>
> >>>>>> >> 3.  DataValues represent the values at intersection of these
> >>>>>> >> Dimensions.  Keeping with the spirit of the old model this
> >>>>>> >> intersection is represented by a single key,
> >>>>>> >> DimensionElementCombination.  The DimensionElementCombinations
> >>>>>> >> would be populated when a new Dimension is added to a
> >>>>>> >> DataElement.  Like the original model there is some fragility
> >>>>>> >> here.  Changing dimensions on dataelements could create a
> >>>>>> >> situation where datavalues become orphaned or misdirected.  The
> >>>>>> >> API must have robust methods for defending this integrity
> >>>>>> >> particularly when updating the structural metadata.  But this is
> >>>>>> >> perhaps doable.  Either way it's not worse than we have.
> >>>>>> >>
> >>>>>> >> I haven't given a name to DimensionElementCombinations.  From the
> >>>>>> >> examples I have seen from SL this seems to be unnecessary.  The
> >>>>>> >> names I have seen being used are generally simply contrived from
> >>>>>> >> the dimensions or (worse still) from the categoryoptions.  What
> >>>>>> >> is important is that dataelements can have sets of dimensions.
> >>>>>> >>
> >>>>>> >> And then much of what is different is just a renaming of the
> >>>>>> >> original entities.  From the attached XML file I think you can
> >>>>>> >> see some of the issues faced re names and identifiers.  I find
> >>>>>> >> myself following a sort of convention of CODE, Name, Description
> >>>>>> >> and UUID.  CODEs must be unique within the scope of the
> >>>>>> >> database.  I suppose this is close to what we currently call
> >>>>>> >> ShortName.  I would like to place constraints on CODEs in terms
> >>>>>> >> of length and also the disallowing of spaces and other funny
> >>>>>> >> characters.  The reason being that we may well have to use these
> >>>>>> >> codes in making up URIs.  So CODEs must be unique.  For the
> >>>>>> >> moment we could keep name unique but should migrate from it.
> >>>>>> >> It's a matter of rewriting all our comparators I guess.  UUIDs I
> >>>>>> >> am told are unique through some sort of divinity so we apparently
> >>>>>> >> do not need to worry about them :-)
> >>>>>> >>
> >>>>>> >> I've also tried to reduce the number of knees on the donkey -
> >>>>>> >> from 11 tables to 6.  I believe this can be done whilst
> >>>>>> >> preserving the existing functionality.  This arrangement would
> >>>>>> >> make it much more sensible to produce the XML I need to produce.
> >>>>>> >> I'm hoping that it would also be more friendly to those who would
> >>>>>> >> be trying to pivot the data across dimensions.
> >>>>>> >>
> >>>>>> >> Jason, do you think this works for you?  I might have missed out
> >>>>>> >> something really fundamental.  Abyot, you've been through this
> >>>>>> >> process before - am I missing something?  From the DataValue you
> >>>>>> >> can see DimensionElements.  And once you know a DimensionElement
> >>>>>> >> you also know the Dimension to which it belongs.  I think that's
> >>>>>> >> queryable.  Will have to hydrate with some data and see.
> >>>>>> >>
> >>>>>> >> Shaking the multidimensional model up like this would obviously
> >>>>>> >> have implications.  But I suspect most of it is taking stuff away
> >>>>>> >> rather than adding new so it might just be doable.  Less is more.
> >>>>>> >>
> >>>>>> >> Not spending time with docbook yet, till I get some feedback.
> >>>>>> >>
> >>>>>> >> Cheers
> >>>>>> >> Bob
> >>>>>> >>
> >>>>>> >> 2009/9/29 Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> >>>>>> >>>
> >>>>>> >>> Hi
> >>>>>> >>>
> >>>>>> >>> On the back of Jason's and others' comments, I've reached the
> >>>>>> >>> conclusion that we cannot really live with the MD model the way
> >>>>>> >>> it is.  Whereas I think it is (just about) workable there are
> >>>>>> >>> some serious optimizations we can and should do.  I am going to
> >>>>>> >>> put my other work back a day or two and propose some changes in
> >>>>>> >>> a branch.
> >>>>>> >>>
> >>>>>> >>> I think central to the inefficiency is the many-many relation
> >>>>>> >>> between categories and categoryoptions.  This strikes me as
> >>>>>> >>> illogical as well as being cumbersome in the UI.  Do we really
> >>>>>> >>> want to be able to make categories with options like
> >>>>>> >>> {'0<5','6-10','Male','Out of stock','35-40'}?  Reducing the
> >>>>>> >>> relation between categories and category options to 1-n cuts
> >>>>>> >>> two tables, should make sql queries more efficient and grokkable
> >>>>>> >>> and also matches other models such as sdmx better.
> >>>>>> >>>
> >>>>>> >>> The other possible inefficiency is the dimensionset.  It can be
> >>>>>> >>> useful in some contexts but I'm guessing that when querying the
> >>>>>> >>> data (which we want to be fast) it is not relevant.  A
> >>>>>> >>> dataelement can have dimensions.  The fact that some
> >>>>>> >>> dataelements have the same combinations of dimensions is very
> >>>>>> >>> useful to know for some purposes, but it should be possible to
> >>>>>> >>> get from the dataelement to the dimension directly.
> >>>>>> >>>
> >>>>>> >>> On the other side of the road is the hierarchical dimensionality
> >>>>>> >>> idea I see Ola and Jason have been discussing, where dimensions
> >>>>>> >>> are composed (perhaps post-facto) of uni-dimensional
> >>>>>> >>> dataelements rather than decomposed into pre-structured
> >>>>>> >>> dimensional elements.  I suspect that:
> >>>>>> >>> 1.  we need both; and
> >>>>>> >>> 2.  from the API, user and reporting perspective they should
> >>>>>> >>> look the same (ie a dataelement can have dimensions - how they
> >>>>>> >>> come about should not be a concern at the end point).
> >>>>>> >>>
> >>>>>> >>> I'll try out some of these ideas and point you to the branch.
> >>>>>> >>>
> >>>>>> >>> Regards
> >>>>>> >>> Bob
> >>>>>> >>>
> >>>>>> >>> 2009/9/29 Lars Helge Øverland <larshelge@xxxxxxxxx>
> >>>>>> >>>>
> >>>>>> >>>>>
> >>>>>> >>>>> Thanks for the explanations Jason. The multidimensional model
> >>>>>> >>>>> is quite complicated, is poorly documented, and as you say is
> >>>>>> >>>>> DHIS-centric in the way that it is built around the DHIS
> >>>>>> >>>>> notion of a Data Element.
> >>>>>> >>>>>
> >>>>>> >>>>
> >>>>>> >>>> Could we assemble and put some of the text being written on the
> >>>>>> >>>> list to docbook?
> >>>>>> >>>>
> >>>>>> >>>>>
> >>>>>> >>>>> That said, and I think Jason already has made a strong case
> >>>>>> >>>>> for this, also in a 100% DHIS2 scenario you will need more
> >>>>>> >>>>> flexibility in defining dimensions to your data than what
> >>>>>> >>>>> categories can provide.  Being able to define data dimensions
> >>>>>> >>>>> independent of data collection is powerful and should be
> >>>>>> >>>>> supported in a better way than what data element groups
> >>>>>> >>>>> provide today.  Given that we already have the orgunit group
> >>>>>> >>>>> set code in place I would assume that adding group sets to
> >>>>>> >>>>> data elements could be a relatively straight forward thing to
> >>>>>> >>>>> do (but then again, I am not the programmer...).
> >>>>>> >>>>
> >>>>>> >>>> I don't see any implications in adding this to the system, it
> >>>>>> >>>> won't require changes to the existing model as the association
> >>>>>> >>>> goes from the groupset to the groups. We can prioritize this
> >>>>>> >>>> for the 2.0.3 release.
> >>>>>> >>>>
> >>>>>> >>>>
> >>>>>> >>>> _______________________________________________
> >>>>>> >>>> Mailing list: https://launchpad.net/~dhis2-devs
> >>>>>> >>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> >>>>>> >>>> Unsubscribe : https://launchpad.net/~dhis2-devs
> >>>>>> >>>> More help   : https://help.launchpad.net/ListHelp
Follow ups

Re: On categories and dimensions and zooks
From: Jason Pickering, 2009-10-01
References

On categories and dimensions and zooks
From: Jason Pickering, 2009-09-16
Re: On categories and dimensions and zooks
From: Bob Jolliffe, 2009-09-29
Re: On categories and dimensions and zooks
From: Abyot Gizaw, 2009-09-29
Re: On categories and dimensions and zooks
From: Jason Pickering, 2009-09-29
Re: On categories and dimensions and zooks
From: Abyot Gizaw, 2009-09-29
Re: On categories and dimensions and zooks
From: Bob Jolliffe, 2009-09-29
Re: On categories and dimensions and zooks
From: Bob Jolliffe, 2009-09-30
Re: On categories and dimensions and zooks
From: Ola Hodne Titlestad, 2009-09-30
Re: On categories and dimensions and zooks
From: Bob Jolliffe, 2009-09-30
Re: On categories and dimensions and zooks
From: Jason Pickering, 2009-09-30