
dhis2-devs team mailing list archive

Reconstructing a categoryoptioncombo (long story)

 

Hi

Here's a problem.  Apologies, it's a long mail, but it's serious business
and needs to be untangled.

Two or more systems have matching dataelements, categorycombos, categories
and categoryoptions.  They could be matched on uid, name, code or
whatever.  Assuming they also have matching orgunit identifiers, those two
systems should be able to exchange data.  There is really no need for
either of them to know anything about the other's categoryoptioncombos.
Which is a good thing on a number of fronts.  Not least being that if
either one of the two is not dhis2 then it won't have the faintest notion
of a categoryoptioncombo anyway.  And even if they were both dhis2, we all
know that keeping these catoptcombos in sync is notoriously difficult.

So I've been over some of this ground before, but now that I'm thinking about
implementation, there are some missing pieces in our model (and some
shortcomings of the Java language) which make this a bit trickier than it
should be.  Picture this datavalue being imported (using codes for
legibility):

<datavalue dataElement='MalariaCases' sex='M' age='under5' ..... />

1.  Once we know the dataelement we can immediately retrieve the
categorycombo, which tells us to expect two more attributes: sex and age in
this case.

2.  We could go to the database at this point and query the
categoryoptioncombos_categoryoptions table, having first retrieved the
primary ids for the categoryoptions.  This would certainly work, but the
table might be quite big and the query would be required many times for a
large datavalueset.  Given that we know the categorycombo from 1 above, we
should only need to query a much smaller set of data held in
an in-memory data structure.

3.  But what would such a data structure look like?  Essentially what is
required is a multidimensional associative array which is keyed along each
of its dimensions using the categoryoptions of a category.  For most of our
categorycombos this would be a 1 or 2 dimensional array, but with some
rarer cases of 3 or 4 categories.  That would allow lookups of the sort
getCatOptCombo(sex='M', age='u5', ...)

Such a dynamic associative array is a natural paradigm in languages like
perl, tcl, php, javascript, and probably R, but java leaves us a bit short.
The structure is not easily expressed, at least not efficiently.

4.  One alternative is to model it as a tree structure.  This has a minor
drawback that a tree has to put the categories (the layers of the tree) in
some order which is not implicit in our model, but that's not a very big
problem.  If you know the order they were put in, you can use the same
order to search them out.  A bit of xml below shows more or less what the
structure of that tree would be like for a typical age-sex combo:

<categoryCombo name="bhj" id="hjhkjkj" code="kmjkl">
    <category name="sex" >
        <categoryOption name="Male" >
            <category name="Age" >
                <categoryOption name="under5" >
                    <catoptcombo name="(Male/under5)" id="767866"/>
                </categoryOption>
            </category>
            <category name="Age" >
                <categoryOption name="over5" >
                    <catoptcombo name="(Male/over5)" id="ghuy8y"/>
                </categoryOption>
            </category>
        </categoryOption>
        <categoryOption name="Female" >
            <category name="Age" >
                <categoryOption name="under5" >
                    <catoptcombo name="(Female/under5)" id="767876"/>
                </categoryOption>
            </category>
            <category name="Age" >
                <categoryOption name="over5" >
                    <catoptcombo name="(Female/over5)" id="ghuy9y"/>
                </categoryOption>
            </category>
        </categoryOption>
    </category>
</categoryCombo>

Note the xml is incidental.  The point is the tree structure.  Mind you,
Java doesn't have a built-in tree type but it does have a DOM model which
could be used very adequately for this kind of structure (a tip from
Stack Overflow).  Assuming we had created such a DOM model for a particular
categorycombo (with a code attribute on each categoryOption, which the
sketch above leaves out), we can answer our question of what the catoptcombo
is for age='u5' and sex='M' with a relatively simple XPath query like:

//categoryOption[@code='M']/category/categoryOption[@code='u5']/catoptcombo/@id

Given these categorycombo trees will each individually be relatively small,
these will in fact be quite efficient lookups.
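
To make the DOM idea concrete, here is a minimal sketch using only the JDK's
built-in DOM and XPath classes.  The class name, the cut-down tree and the
code attributes ('M', 'F', 'u5', 'o5') are assumptions made for illustration;
nothing here is existing dhis2 code.  It also assumes the codes never contain
a quote character, since they are pasted straight into the XPath expression.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CatOptComboXPathLookup {

    // Cut-down copy of the age-sex tree. The code attributes are hypothetical.
    static final String TREE_XML =
          "<categoryCombo name='bhj'><category name='sex'>"
        + "<categoryOption name='Male' code='M'><category name='Age'>"
        + "<categoryOption name='under5' code='u5'><catoptcombo id='767866'/></categoryOption>"
        + "<categoryOption name='over5' code='o5'><catoptcombo id='ghuy8y'/></categoryOption>"
        + "</category></categoryOption>"
        + "<categoryOption name='Female' code='F'><category name='Age'>"
        + "<categoryOption name='under5' code='u5'><catoptcombo id='767876'/></categoryOption>"
        + "<categoryOption name='over5' code='o5'><catoptcombo id='ghuy9y'/></categoryOption>"
        + "</category></categoryOption>"
        + "</category></categoryCombo>";

    // Parse the XML text into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse categorycombo tree", e);
        }
    }

    // Returns the catoptcombo id, or an empty string when nothing matches.
    static String lookup(Document tree, String sexCode, String ageCode) {
        String expr = "//categoryOption[@code='" + sexCode + "']"
                + "/category/categoryOption[@code='" + ageCode + "']"
                + "/catoptcombo/@id";
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, tree);
        } catch (Exception e) {
            throw new IllegalStateException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document tree = parse(TREE_XML);
        System.out.println(lookup(tree, "M", "u5"));
    }
}
```

In a real import the XPath object would probably be created once and reused
rather than rebuilt on every lookup.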

So I'm left with a few questions.

1.  Does it make sense to use a DOM tree for this rather than invent our
own custom tree structure?  I'm inclined to do the DOM first because it's
easy and will definitely work, though it won't be the fastest.  We could
optimize a custom tree later.  Mind you, after considering 3 below, the DOM
tree might make more sense than appears at first pass.

2.  Am I missing something?  Is a tree the right way to do this?  Is there
something about Java's apparent lack of multi-associative-arrays which I
am just not getting?

3.  Looking at the XML above (which was meant to be incidental), I now
think it makes a lot more sense than what we actually output currently from
our resources api.  For example, cutting a few bits from
https://apps.dhis2.org/demo/api/categoryCombos/dzjKKQq0cSO:

<categoryCombo id="dzjKKQq0cSO" name="Location and age group" >
<categoryOptionCombos>
  <categoryOptionCombo id="V6L425pT3A0" name="(&lt;1y, Outreach)" code="COC_290" />
  <categoryOptionCombo id="hEFKSsPV5et" name="(&gt;1y, Outreach)" code="COC_289" />
  <categoryOptionCombo id="psbwp3CQEhs" name="(Fixed, &gt;1y)" code="COC_291" />
  <categoryOptionCombo id="Prlt0C1RF0s" name="(&lt;1y, Fixed)" code="COC_292" />
</categoryOptionCombos>
<categories>
  <category id="fMZEcRHuamy" name="Location Fixed/Outreach" />
  <category id="YNZyaJHiHYq" name="EPI/nutrition age" />
</categories>
</categoryCombo>

This representation in itself is actually not very useful and raises more
questions than it answers.  In particular, and this is important, how is
that list of categoryoptioncombos related to that list of categories, and
even more particularly to their categoryoptions?  A not inconsiderable number
of additional api requests would have to be made before reaching sensible
answers to this pretty fundamental question.  In fact the arbitrary
presentation of these two elements (optioncombos and categories) is even a
bit odd in the sense that each of these lists is entirely derivable from
the other.  The more verbose but explicit tree model above is IMHO a much
more useful representation of the resource.
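
Coming back to question 2 for a moment: one way to approximate a
multidimensional associative array in plain Java, without a tree, might be a
HashMap keyed by the set of categoryoption codes.  The sketch below is only
an illustration; the class and method names are made up, and it assumes that
an option code is never shared between two categories of the same
categorycombo (otherwise two different combos could collapse onto one key).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: one instance per categorycombo.
public class CatOptComboIndex {

    // Key: the set of categoryoption codes; value: the catoptcombo id.
    private final Map<Set<String>, String> byOptions = new HashMap<>();

    public void put(String catOptComboId, String... optionCodes) {
        byOptions.put(new HashSet<>(Arrays.asList(optionCodes)), catOptComboId);
    }

    // Returns null when no catoptcombo matches the given options.
    public String get(String... optionCodes) {
        return byOptions.get(new HashSet<>(Arrays.asList(optionCodes)));
    }

    public static void main(String[] args) {
        CatOptComboIndex ageSex = new CatOptComboIndex();
        ageSex.put("767866", "M", "u5");
        ageSex.put("ghuy8y", "M", "o5");
        // The order of the options does not matter for the lookup.
        System.out.println(ageSex.get("u5", "M"));
    }
}
```

Because a Set has no order, this sidesteps the question of which category
comes first, at the cost of building a small HashSet on every lookup.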

So that's my thinking on the problem at present.  We need to make a new
object (e.g. CategoryComboTree) to allow for simple lookups of the local
catoptcombos.  As we read in a datavalueset, we check each dataelement for
its categorycombo.  If we haven't yet instantiated a tree for that combo,
we instantiate one.  (Typical datavaluesets might only have a small handful
of these).  Using our tree, we look up the catoptcombo for each set of
category attributes in the datavalue.
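
The instantiate-on-first-use part of that could be sketched roughly as
follows.  Everything here is hypothetical: the class name, the loader
function (which stands in for whatever builds the lookup structure for one
categorycombo from the database), and the choice of a map keyed by sets of
option codes where a tree would do equally well.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical resolver used while reading one datavalueset.
public class CatOptComboResolver {

    // categorycombo id -> (set of option codes -> catoptcombo id)
    private final Map<String, Map<Set<String>, String>> cache = new HashMap<>();

    // Builds the lookup structure for one categorycombo; supplied by the caller.
    private final Function<String, Map<Set<String>, String>> loader;

    public CatOptComboResolver(Function<String, Map<Set<String>, String>> loader) {
        this.loader = loader;
    }

    // Returns the catoptcombo id, or null if the combo or the options are unknown.
    public String resolve(String categoryComboId, String... optionCodes) {
        // The loader only runs the first time a given categorycombo is seen.
        Map<Set<String>, String> lookup = cache.computeIfAbsent(categoryComboId, loader);
        if (lookup == null) {
            return null;
        }
        return lookup.get(new HashSet<>(Arrays.asList(optionCodes)));
    }
}
```

With a small handful of categorycombos per datavalueset, the loader would be
called only a few times however many datavalues are imported.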

Any better ideas?

Bob
