dhis2-devs team mailing list archive

Re: On categories and dimensions and zooks

 

Hi Bob,
I know essentially nothing about Java, so this may be a rather fluffy,
philosophical email. However, I will offer some comments from a
relational standpoint, which I think overlap with yours but are
clearly less technical.

There is significant overlap in what the OrgUnitGroupSets,
DataElementGroupSets and, if they were implemented, PeriodGroupSets
are supposed to accomplish. They establish some sort of hierarchy and
grouping. A set of districts belongs to a province. Days belong to
weeks. Certain data values were recorded for children with malaria
under 5.

It would seem there are two separate, but not competing, requirements
for the data element group sets: one for data entry, and the other for
analysis. As I indicated in my earlier email, my gut feeling is that
there is no conceptual difference between a "category" and a "data
element set". There may be differences in the implementation of the
classes, but conceptually, both seem to be only a way of lumping data
elements together into some type of hierarchical relation. Whether
these are separate rows or a single row in a database is of little
concern to the end users.

Those of you who have followed the OpenHealth functional prototype
have seen that it attempted to create a union between
multidimensional data entry and multidimensional analysis. It was not
entirely successful, but the point was clear. Sometimes you need to be
able to enter data for multiple organizational units for a single data
element (think of population indicators entered at the national level
and then distributed to districts, which is a requirement here). In
other cases (the ones DHIS has catered to), you need to enter multiple
data elements for a single organizational unit for a single time
period. The same could be said about entering a set of data over
multiple time periods for multiple organizational units for a single
data element.

I realize this may be asking too much, but is there a way that this
Dimension class could somehow be used to implement common methods
across Periods, OrgUnits and Data elements? These three concepts are
distinct and central to DHIS and to data in general (when, where,
what). The rules that govern how these dimensions work internally are
also distinct. For instance, there are seven days in a week, three
months in a quarter, etc. for Periods. I suppose this is, or can be,
implemented in code to tell the aggregation engine what to do. The
similarities are nonetheless clear to me, with the concepts of
hierarchy and exclusivity being the two that come to mind.
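
To make the idea concrete, here is a minimal Java sketch of a shared
abstraction carrying just those two common concepts, hierarchy and
exclusivity. Only the names Dimension and DimensionOption come from
this thread. Every method name, the SimpleDimension class and the
sumUnder helper are assumptions made for illustration and are not part
of the DHIS 2 API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DimensionSketch {
    /** Sums every value whose option rolls up to 'ancestor' (hypothetical helper). */
    static int sumUnder(Map<DimensionOption, Integer> values, DimensionOption ancestor) {
        int total = 0;
        for (Map.Entry<DimensionOption, Integer> e : values.entrySet()) {
            if (e.getKey().rollsUpTo(ancestor)) total += e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // "When": months roll up to a quarter.
        SimpleDimension period = new SimpleDimension("Period", true);
        DimensionOption q1 = period.add("2009Q1", null);
        DimensionOption jan = period.add("2009-01", q1);
        DimensionOption feb = period.add("2009-02", q1);

        // "Where": districts roll up to a province, using the same code.
        SimpleDimension orgUnit = new SimpleDimension("OrgUnit", true);
        DimensionOption province = orgUnit.add("Province", null);
        DimensionOption districtA = orgUnit.add("District A", province);

        Map<DimensionOption, Integer> values = new HashMap<>();
        values.put(jan, 10);
        values.put(feb, 15);
        values.put(districtA, 8);

        System.out.println(q1.getName() + ": " + sumUnder(values, q1));
        System.out.println(province.getName() + ": " + sumUnder(values, province));
    }
}

/** One member of a dimension: a district, a month, an age band, and so on. */
final class DimensionOption {
    private final String name;
    private final DimensionOption parent; // null for a top-level option

    DimensionOption(String name, DimensionOption parent) {
        this.name = name;
        this.parent = parent;
    }

    String getName() { return name; }

    /** True if 'ancestor' is this option or sits above it in the hierarchy. */
    boolean rollsUpTo(DimensionOption ancestor) {
        for (DimensionOption o = this; o != null; o = o.parent) {
            if (o == ancestor) return true;
        }
        return false;
    }
}

/** Hypothetical shared contract for "when", "where" and "what". */
interface Dimension {
    String getName();
    List<DimensionOption> getOptions();
    boolean isExclusive(); // may a value belong to only one option?
}

/** Simplest possible implementation, for illustration only. */
final class SimpleDimension implements Dimension {
    private final String name;
    private final boolean exclusive;
    private final List<DimensionOption> options = new ArrayList<>();

    SimpleDimension(String name, boolean exclusive) {
        this.name = name;
        this.exclusive = exclusive;
    }

    /** Creates an option under 'parent' (or at the top level if null). */
    DimensionOption add(String optionName, DimensionOption parent) {
        DimensionOption option = new DimensionOption(optionName, parent);
        options.add(option);
        return option;
    }

    public String getName() { return name; }
    public List<DimensionOption> getOptions() { return options; }
    public boolean isExclusive() { return exclusive; }
}
```

The dimension-specific rules (seven days in a week, and so on) would
live in whatever builds the options, while a roll-up such as sumUnder
would not need to know which dimension it is walking.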

I am not sure if it can be done, but it would help if there were any
way that the existing categories class(es) could be used for two
purposes:
1) Creation of multidimensional data elements for the purpose of data
entry, etc.
2) Grouping of non-multidimensional elements into a multidimensional
data element after the fact.

Perhaps it is not possible to do this easily, but as Bob highlights,
implementers will be left with a choice, and it is not clear to me
which one is preferable. Prima facie, I would say use
non-multidimensional data elements. What happens when the
dimensionality of a multidimensional data element changes? Is it
possible to change this after it has been created and data entered for
it? I am not sure; perhaps it is. However, if I were given a choice
that was flexible and allowed me to arbitrarily assign independent
data elements to group sets, either for the purpose of data entry or
analysis, this might be the route that I would choose.
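
As a rough illustration of the "after the fact" route, the Java sketch
below assigns plain data elements to exclusive group sets and then
cross-tabulates their values by group. The GroupSet class, its methods,
the crossTab helper and the data element names are all hypothetical,
chosen only to show the shape of the idea. They are not the DHIS 2
classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupSetSketch {
    /** Sums raw values into one cell per combination of groups. */
    static Map<List<String>, Integer> crossTab(Map<String, Integer> values,
                                               List<GroupSet> groupSets) {
        Map<List<String>, Integer> cells = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : values.entrySet()) {
            List<String> key = new ArrayList<>();
            for (GroupSet gs : groupSets) key.add(gs.groupOf(e.getKey()));
            cells.merge(key, e.getValue(), Integer::sum);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Independent data elements, entered without any dimensions.
        Map<String, Integer> values = new LinkedHashMap<>();
        values.put("Malaria <5 male", 4);
        values.put("Malaria <5 female", 6);
        values.put("Malaria 5+ male", 3);

        // Dimensions added later by grouping the elements.
        GroupSet age = new GroupSet("Age");
        age.assign("Malaria <5 male", "<5");
        age.assign("Malaria <5 female", "<5");
        age.assign("Malaria 5+ male", "5+");

        GroupSet sex = new GroupSet("Sex");
        sex.assign("Malaria <5 male", "Male");
        sex.assign("Malaria <5 female", "Female");
        sex.assign("Malaria 5+ male", "Male");

        System.out.println(crossTab(values, Arrays.asList(age)));
        System.out.println(crossTab(values, Arrays.asList(age, sex)));
    }
}

/** Exclusive group set: each data element sits in at most one group. */
final class GroupSet {
    private final String name;
    private final Map<String, String> groupByElement = new HashMap<>();

    GroupSet(String name) { this.name = name; }

    String getName() { return name; }

    /** Assigns an element to a group, rejecting a second, different group. */
    void assign(String element, String group) {
        String previous = groupByElement.putIfAbsent(element, group);
        if (previous != null && !previous.equals(group)) {
            throw new IllegalStateException(
                element + " is already in group " + previous + " of " + name);
        }
    }

    /** Group of the element, or a placeholder when it was never assigned. */
    String groupOf(String element) {
        return groupByElement.getOrDefault(element, "(unassigned)");
    }
}
```

Because the grouping lives outside the data values, changing the
dimensionality later means reassigning elements to groups, not touching
the data that has already been entered.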

I had better stop here, before I continue down my sophomoric pathway.

Regards,
Jason





2009/10/5 Bob Jolliffe <bobjolliffe@xxxxxxxxx>:
> Hi,
>
> 2009/10/4 Lars Helge Øverland <larshelge@xxxxxxxxx>
>>
>> Big thanks to all for illuminating the pros and cons of the current
>> multidimensional model. It was designed in 2006 basically to support the ICD
>> based dataentry, and we must admit that Bob is at least partially right when
>> saying that output could have been given better thought. Anyway it is not
>> working out too bad either it seems.
>>
>> I like Bob's suggestion for simplifying the model and it would apparently
>> make querying easier and improve the user interface. I have a few concerns:
>>
>> - Feasibility. The Category-related model is integrated into 9 out of 11
>> service projects in DHIS 2. Re-factoring and testing all this would take
>> months.
>>
>> - Backwards compatibility. Lots of databases and data-entry forms exist in
>> the field. Conversion must be managed.
>
> I reached the same conclusion :-(.   I think there is still some small
> rationalisation that can be done, but the model is already deeply coupled with
> many parts of the system.   Having said that I have a suggestion related to
> the refactoring of dimensions and dataelementgroups below.
>
>>
>> - Suitability for the data-entry module. It seems likely that the
>> CategoryCombo class can be "emulated" through the API.
>
> Not sure exactly what you mean by this .. but I guess probably.  I
> suspect the work that most needs to be done on the CategoryCombo class in
> the API is to provide "unpicking" methods to be able to conveniently access
> the underlying categories (dimensions).
>
>>
>> - Does it cut tables to change from m-n to 1-n? Using join tables to
>> represent 1-n associations is preferred by many as it keeps the domain model
>> cleaner.
>
> My proposal improved the situation by making a 1-n relation of category to
> categoryOptions.  This would certainly be more efficient but doesn't meet
> the use case where a categoryOption might participate in different
> categories.
>
>>
>> If people say we can live with the current model I'd say we do just that.
>> Anyway Bob's suggestion should be documented and looked at again later. I
>> think the point about "input without output is statistical m..." is valid.
>> At least we will need to focus more on how to make "the goodness float up".
>
> I think we can only know whether we can live with the current model once the
> api methods which seem theoretically possible are implemented.  My concern
> is that if we provide an alternative to MD analysis through extending the
> groupset idea then we have no justification in recommending that
> implementors implement MD dataelements.  Convenience of UI is not enough if
> in the process we enter data which we can't unpack.  What will happen is
> that implementors with an eye on analysis will ignore the MD notion entirely
> because it creates difficulties for them and they have a ready analysis
> solution with groups and groupsets.
>>
>> Re the data element / indicator group set I think this is something we can
>> do without risk. It won't change the existing model and won't break anything
>> and wouldn't take too long to implement. Will start on it on Wednesday. A
>> minor comment here is that I believe we should keep the exclusiveness and
>> compulsory-ness of the group set optional (..eh) like we have it for
>> organisation unit group sets today.
>
> Lars I think this is the correct response to what is clearly a very real
> need.  But I want to suggest that we approach it as follows:
>
> - We create two new abstract classes, Dimension and DimensionOption.
> - DataElement should be extended with methods to retrieve Dimensions -
> fold/unfold whatever the gathered requirements are.  These are the methods
> which would be used in reportable design.
> - Both Category and Group should in some way implement Dimension.  In both
> cases I think the underlying structures, however imperfect, allow for this
> symmetry.  If this is difficult for Categories initially we can throw
> unImplemented() for now but we will have provided the structural guidance
> towards harmonising the two.
> - We might need a DimensionSet class or perhaps just a Set<Dimension>
> getDimensions() member function of DataElement.
>
> The point here is that if we have dimensions to a dataelement then from the
> reporting/analysis perspective it can be made invisible how those dimensions
> are implemented.  Instinctively I feel it should simply be possible to
> retrieve datavalues from a dimension or crosstabs of dimensions.
>
> One missing piece of the puzzle (or required symmetry) is that I don't think
> currently we name a dataelement which has *beneath* it a dataElementGroup or
> set of groups.  But I suspect this could be implemented relatively easily.
>
> Whereas the above might look like it is complicating the picture I think in
> fact it can considerably simplify it in the long run.  The correct starting
> point is to gather the requirements of what methods a Dimension should
> have.  If there were to be a Dimension class and we knew nothing of
> implementation details, what would Jason and Ola and others really require
> of that class.  Then we do the dirty work in the concrete implementations.
> Otherwise known as the sweep-it-under-the-carpet pattern :-)  Or what others
> might call encapsulation.
>
> Regards
> Bob
>
>>
>>
>> Finally I hope people who are troubled about the lack of documentation
>> would use Jason's instructions and convert some of this newly discovered
>> wisdom into... documentation.
>>
>>
>> cheers
>>
>> Lars
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~dhis2-devs
>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>> Unsubscribe : https://launchpad.net/~dhis2-devs
>> More help   : https://help.launchpad.net/ListHelp
>>


