dhis2-devs team mailing list archive

Re: dhis2 dxf data import

To: Jo Størset <storset@xxxxxxxxx>
From: Bob Jolliffe <bobjolliffe@xxxxxxxxx>
Date: Thu, 1 Sep 2011 16:04:33 +0100
Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <FA388E30-CE15-48A9-B030-C37F3D6F423E@gmail.com>
On 1 September 2011 15:02, Jo Størset <storset@xxxxxxxxx> wrote:
> Great that you're looking at this. Some immediate feedback (pardon the lack of structure:)

Thanks for feedback ...

>
> On 1 Sep 2011, at 13:55, Bob Jolliffe wrote:
>
>> As a first step I am interested in reusing the DataValueSet stuff from
>> the model rather than the representation of metadata, which I think
>> needs to be done more completely and not until changes to the
>> dimensional model are realized in the not too distant future.
>
> The only metadata stuff I've done was basically just to serve up some basic html, that should not be used, and certainly not reused :)
>
>> 1.  We should shift storedBy up to the dataValueSet level.  I'm
>> assuming all datavalues in a datavalueset will be stored by the same
>> user.  I'd put back an optional Comment attribute here as well.
>> Currently it's only useful for rolling back imports.  Not the most
>> efficient way to implement it but still useful.
>
> I agree it would be nice if we could move it up. I am a bit unsure of the semantics of our data model and the use cases for this. If this were to be used to communicate between dhis instances, I guess it won't be an unthinkable situation that I have edited/added a value in a set that you originally stored, and that granularity would be lost. Whether that is something we should rethink in our data model rather than carry over into the xml structure, I don't know.

Me neither so it was a bit of a tentative suggestion :-)  I think the
semantics have their origin in an earlier era of standalone and
isolated dhis.  My thinking would be that what is relevant is who has
stored this value in *this* database.  Usernames from strange
databases wouldn't make much sense anyway.  And if one wanted to audit
its absolute origin, one would have to follow the trail back to the
producer of the datavalueset - which might or might not be a dhis
instance.  Of course persisting the datavalueset would be immensely
helpful for this, but as Lars has pointed out, no requirement for this
has emerged yet so we hold off on that for now.

It's not critical either way at the moment - just looks a bit untidy.
Thus far, unless I hear a compelling argument to the contrary, it seems
better to move it up.  Will wait and listen.

>
>> 2.  I don't think categoryOptionCombo should *necessarily* be exposed
>> to the external world.  It's very much an internal arrangement of DHIS.
>> It's useful enough in cases where HISP folk are involved on both
>> producer and consumer side of the equation, but for other 3rd parties
>> in the world it is best to hide this internal arrangement.  I suggest
>> that dataElement and value are *required* attributes,
>> categoryOptionCombo is optional and in addition we have an
>> <xs:anyAttribute> extension point which allows for additional
>> attributes.  The implication would be that the above dataset will
>> remain valid (so existing stuff is still working),
>
> I think I agree that we need another model to better "externalize" dimensions. But it would become a bit more complex to implement if dataElement+optionCombo is not a "simple" identifier to the datavalue any more. It would be good to hear a little more about how you plan to implement it in the short run and if you think it should be combined with changes inside dhis..

I think that the simple {dataElement,optionCombo} tuple will remain
the internal identifier to the datavalue for the foreseeable future.
There's a lot of stuff built on top of it, it has some merits and it
can be coerced to behave reasonably well with some tightening of
constraints at the level of our java model.

>
> - Are you thinking of modelling this anyattribute extension point on the sdmx model in some way?

Well, similar enough I guess.

> - If there is a more explicit way to describe this in the schema than just anyattribute, I think it could help?

Schema languages are better at some things than others.  The problem
here is that we would be required to constrain the attributes on the
basis of a dynamic list which would vary from the concept list of one
application to another.  This would not be friendly to annotating
bindings for use on any system.  This is also the sdmx-hd problem.  As
it is, the xmlanyattribute annotation would bind to a map like:

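    // Collects any attributes that are not declared explicitly in the schema.
    private Map<QName, Object> any;

    @XmlAnyAttribute
    public Map<QName, Object> getAny() {
        if (any == null) {
            any = new HashMap<QName, Object>();
        }
        return any;
    }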

The datavalue service can determine whether attributes are invalid or
not (in much the same way it determines whether orgunits, dataelements
really exist, etc.).  It could do this fairly painlessly by looking at
the categorycombo of the dataelement - which I think we need to do now
anyway to determine if the optioncombo is valid.
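
A minimal sketch of that check, assuming the concept names belonging to a data element's category combo are available as a plain set of strings. AttributeValidator, invalidAttributes and the sample names are hypothetical and not part of the DHIS 2 API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.namespace.QName;

// Hypothetical helper: compares wildcard attributes with the concept names
// that a data element's category combo is assumed to allow.
final class AttributeValidator {

    // Returns the local names of all attributes that are not allowed concepts.
    static Set<String> invalidAttributes(Map<QName, Object> any, Set<String> allowedConcepts) {
        Set<String> invalid = new TreeSet<String>();
        for (QName name : any.keySet()) {
            if (!allowedConcepts.contains(name.getLocalPart())) {
                invalid.add(name.getLocalPart());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        Map<QName, Object> any = new LinkedHashMap<QName, Object>();
        any.put(new QName("sex"), "F");
        any.put(new QName("colour"), "red");
        // Assumed concept names for the data element's category combo.
        Set<String> allowed = new HashSet<String>(Arrays.asList("sex", "age"));
        System.out.println(invalidAttributes(any, allowed));
    }
}
```

An empty result would mean every extra attribute maps onto a known concept; anything else could be reported back to the producer in the same way as an unknown orgunit.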

Of course it would be fairly trivial for a running instance of dhis to
generate a *strict* schema with the anyattributes replaced by fixed
attributes, which might be of value to producers.  But the internal
parser would have to be a bit agnostic.

> - And I think it would be advantageous if we could rework the internal data model to better fit this more general "schema" at the same time, or at least know a little bit more about how the internal changes would look.

Internally I want to change very little.  The most fundamental change
being to implement the category-concept-categoryoption binding in the
model and put strict constraints on concept names so that they are
obliged to conform to the intersection of requirements for sql column
names and xml attribute names.

Breaking mcdonalds and replacing with a star or snowflake type schema
is not really a sensible option at this juncture.

> - We need to stay backwards compatible with existing meta models, are we sure that the rules for names of dimensions (Sex, Age) are compatible with xml attribute names?

That we must impose through inspired regex on concept names which
should be relatively easy.  Category names can remain what they like.
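
A rough sketch of such a regex check. The pattern is a deliberately conservative assumption (an ASCII letter followed by ASCII letters, digits or underscores), not a rule agreed in this thread, and the ConceptNames class is hypothetical. It does not screen out SQL reserved words.

```java
import java.util.regex.Pattern;

// Hypothetical helper: accepts only names that should be usable both as an
// unquoted SQL column name and as an XML attribute name.
final class ConceptNames {

    // Assumed rule: starts with an ASCII letter, then letters, digits or '_'.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    static boolean isValid(String name) {
        // matches() requires the whole string to fit the pattern.
        return name != null && SAFE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Sex", "age_group", "Age group", "5to14"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```

Category names would not pass through this check at all, since only concept names end up as column and attribute names.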

> - We might need to think through how these dimensions would look in the metamodel xml, and how the link between this anyAttribute space and that model would be?

Will get to metamodel.xml next.  But the link between anyattribute
space and the model is essentially a fairly trivial one through
categorycombo and conceptname.

>
> Overall I guess allowing the two identifier schemes to coexist for a while, seems like a good idea. Though we should probably look to get rid of optionComboId asap, then.

Don't know.  Could well be that optionComboid has long legs.  It has
its uses between dhis systems which both understand the notion.

>
>> 3.  On the question of identifiers ....
>>
>> So I am going to suggest two additional attribute, probably at the
>> dataValueSets level which indicates the id system to use.  Currently I
>> can think of internal, code, uuid and map as possible candidates for
>> these attribute values.  Where map would imply that ids need to be
>> resolved using an aliases table keyed by a naming context, possibly
>> using some of Lars' objectmapper or perhaps simpler.  To maintain
>> compatibility with existing web service api this attribute can be
>> optional and default to uuid.
>
> Yep. I'm not sure what should be the default, though. Maybe just the internal id? For simple cases that looks easier than uuids (at least if we are thinking about the metamodel and how to communicate these id's *to* other systems?). Since we would maybe want to reuse this id model for the meta model as well, you think it would fit there?

I agree that uuid is not the most gentle default.  I just suggested it
because you were already using it.
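
For illustration, the optional id-scheme attribute from point 3 could be resolved roughly as below. The IdScheme enum and fromAttribute method are hypothetical names; the fallback is passed in by the caller because the default is still undecided here.

```java
import java.util.Locale;

// Hypothetical enum for the candidate id systems: internal, code, uuid, map.
enum IdScheme {
    INTERNAL, CODE, UUID, MAP;

    // Resolves the optional attribute value; a missing or blank value yields
    // the caller-supplied fallback, so existing clients need not send it.
    static IdScheme fromAttribute(String value, IdScheme fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        // valueOf throws IllegalArgumentException for an unknown scheme name.
        return IdScheme.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

final class IdSchemeDemo {
    public static void main(String[] args) {
        System.out.println(IdScheme.fromAttribute(null, IdScheme.UUID));
        System.out.println(IdScheme.fromAttribute("code", IdScheme.UUID));
    }
}
```

Switching the default later would then only mean changing the fallback argument at the call site.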

>
>> I am pretty sure I can implement the above without breaking what is
>> currently there.  One possible but minor breaking change I would
>> suggest to improve parsing of very large datasets might be to
>> abbreviate some well known element names to dv, de and v for
>> compactness.
>
> I am not sure if these element names would really be that well known and obvious for the target people having to work with the schema.
> - Is there any alias mechanism for xml easily used with jaxb?

Not really.  There is a standard called DSRL which is designed to
alias/transform element names but not really applicable here.  It's not
that important.  I can live with long names or short.

> - Wouldn't we want explicit streaming/"batch" handling for use cases where sizes grew to this size, anyway?

I think for really large cases, database dumps and other tools are
maybe more appropriate anyway.  Of course one problem is that you
don't know the size of the stream when you start consuming it from the
head ...  I am sure some snakes have this problem :-)

Bob

>
> Overall, though, if you think abbreviated names are better, I'm all for it.
>
> Jo