dhis2-devs team mailing list archive

Re: uuids

 

Thinking some more about unique identifiers and names (and how they are not
the same thing), and still in the realm of multi dimensions, I recall having
this discussion at the Delhi workshop.

Currently, for dataElements, Categories and a whole host of other things, we
have a uniqueness constraint on the name.  So we are effectively using the
name as an identifier.  We also use the name for display purposes, which is
quite a different use case.  The important characteristics in the first case
are uniqueness and preferably succinctness.  In the latter case the name has
to look meaningful and presentable in a form display or report.  One of the
downsides of using names as identifiers as we do is that we can't have two
categories with the same name.  I have seen recently that there are sometimes
some really compelling reasons why you might want to do that.

The classic case which seems to come up is the question of "Age".   Whereas
you might want to have an "Age" category with options {"0-5","5-15","over
15"} you might equally want to have an "Age" category with options
{"under18", "18 and over"}.  What then seems to happen is that implementors
tie themselves up in knots trying to find new and imaginative ways of naming
the "Age" categories so that they are distinct from one another, when there
is really no compelling reason to do this.  With a structure like:

<category id="23" name="Age" uid="4454545656456477756" />

where id is just the internal key and uid is a unique identifier (which
could be a uuid but not necessarily) then there is really no reason not to
have any number of categories with the name "Age" where each one has
different sets of category options.
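
As a rough illustration of the structure above, here is a minimal Java
sketch.  The Category and CategoryRegistry classes are hypothetical
stand-ins invented for this example, not the actual DHIS2 classes, and the
uid values are placeholders.  The only assumption is the one described
above: uniqueness is enforced on uid, never on name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a category; not the actual DHIS2 class.
final class Category {
    private final int id;        // internal database key
    private final String uid;    // unique identifier (a uuid or any other unique string)
    private final String name;   // display name, allowed to repeat
    private final List<String> options;

    Category(int id, String uid, String name, List<String> options) {
        this.id = id;
        this.uid = uid;
        this.name = name;
        this.options = new ArrayList<>(options);
    }

    int getId() { return id; }
    String getUid() { return uid; }
    String getName() { return name; }
    List<String> getOptions() { return new ArrayList<>(options); }
}

// Hypothetical in-memory registry: uniqueness is enforced on uid only.
final class CategoryRegistry {
    private final Map<String, Category> byUid = new LinkedHashMap<>();

    void add(Category category) {
        if (byUid.containsKey(category.getUid())) {
            throw new IllegalArgumentException("Duplicate uid: " + category.getUid());
        }
        byUid.put(category.getUid(), category);
    }

    // Returns the category with this uid, or null if there is none.
    Category getByUid(String uid) {
        return byUid.get(uid);
    }

    // A name may match any number of categories, so this returns a list.
    List<Category> findByName(String name) {
        List<Category> matches = new ArrayList<>();
        for (Category c : byUid.values()) {
            if (c.getName().equals(name)) {
                matches.add(c);
            }
        }
        return matches;
    }
}
```

With this shape, adding two categories named "Age" with different uids
succeeds and findByName("Age") returns both, while adding a second category
with an existing uid is rejected regardless of its name.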

There are also cases where you might want to have a different name for the
same uniquely identified element.  This is frequently the case with
internationalization.

I don't like uuids because they are too long and carry no meaning.  In my
ideal scenario the unique identifier would be something like a code such as
AGE_STI or AGE_ART_COHORT (or even AGE_1, AGE_2, AGE_3 ...) rather than a
uuid.  To enforce global uniqueness, codes would have to be demarcated in a
namespace such as the URI scheme I have mentioned below.  Of course the
downside here is that some effort now needs to be put into defining unique
identifiers, but one could create simple guidelines.
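
To make the namespacing idea a bit more concrete, here is a small Java
sketch that composes a URI of the same shape as the example in my earlier
mail below (http://dhis2.org/names/TZ/dataElementCategory/Sex), but with a
code in the last position.  The IdentifierUris class, the forCode() method
and the rule that codes are upper case letters, digits and underscores are
all assumptions made up for this illustration; none of this exists in
DHIS2.

```java
import java.net.URI;

// Hypothetical helper; the base URI follows the example in the quoted mail.
final class IdentifierUris {
    private static final String BASE = "http://dhis2.org/names";

    private IdentifierUris() { }

    // Builds BASE/<namespace>/<entityType>/<code>, e.g. for namespace "TZ".
    static URI forCode(String namespace, String entityType, String code) {
        // Assumed guideline: namespace and entity type are plain alphanumerics.
        if (!namespace.matches("[A-Za-z0-9]+") || !entityType.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("Invalid namespace or entity type");
        }
        // Assumed guideline: codes look like AGE_STI or AGE_1.
        if (!code.matches("[A-Z][A-Z0-9_]*")) {
            throw new IllegalArgumentException("Invalid code: " + code);
        }
        return URI.create(BASE + "/" + namespace + "/" + entityType + "/" + code);
    }
}
```

For example, forCode("TZ", "dataElementCategory", "AGE_ART_COHORT") gives
http://dhis2.org/names/TZ/dataElementCategory/AGE_ART_COHORT, whereas a
display name such as "Age group" is rejected because it does not follow the
assumed code convention.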

Either way, for the moment I'm sticking with uuids because we already have
them elsewhere and because other systems we might want to talk to (OpenMRS)
also use them.  But I'd like folk to give it some thought, in particular to
relaxing the uniqueness requirement on names where we also have a unique
identifier.  Where we have such an identifier we should use that to compare
and disambiguate between entities.  That in most cases the names will also
be identical should be treated as incidental.

Regards
Bob

2009/9/28 Bob Jolliffe <bobjolliffe@xxxxxxxxx>

> Hi Lars
>
> Much as I hate uuids I am now attaching them to DataElementCategories.
> This is the process I have followed:
>
> - I have modified DataElementCategory.java in API to provide for the string
> member and its getter and setter.
> - Modified addDataElementCategory() of
> defaultDataElementCategoryService.java to generate uuid as per the
> equivalent method in defaultDataElementService.
> - Added property in DataElementCategory.hbm.xml
>
> So far so good.  I see that the DataElementCategory uses generic store so
> nothing more to do there.  Built and fired up and everything works fine.
> New field is created on DataElementCategory table (nice!).  New categories
> are now created with uuids.
>
> Obviously what I now need to do is to have an upgrade script to attach
> uuids to existing categories. Where is the best place to do this?
> Presumably in one of the (12) startup routines.  I don't want to add more
> fat to the start up but I guess this is unavoidable.  Please suggest the
> best place for this and I'll add it.  Meanwhile I'll commit the above.
>
> I do need to add a child element to the dxf representation of the category.
>
> And then there is all the stuff to do with comparisons (which is I suppose
> the point of having the uuids).  Probably should modify the isEqual() method
> to take uuid into account.  Looking at the GenericNameStore I think we
> should either create a GenericUUIDStore or add an extra method to the former
> to retrieve by UUID.  Currently, we enforce a requirement of uniqueness on
> the name which should actually make UUIDs redundant.  We don't need two
> unique fields to identify.  If the name is unique anyway we could compose a
> URI string like for example
> http://dhis2.org/names/TZ/dataElementCategory/Sex .
>
> Much though I'd like to do that I think we would have to enforce better
> naming conventions to make it work well.  Currently we have some quite
> unwieldy category names which don't make very nice URIs.  Perhaps enforcing
> a camelcase or underscore convention through the user interface might work.
> Anyway, for the moment UUIDs it is.
>
> Cheers
> Bob
>
>
>
>
