dhis2-devs team mailing list archive

Re: dhis2 dxf data import

 

Great that you're looking at this. Some immediate feedback (pardon the lack of structure:)

On 1 Sep 2011, at 13:55, Bob Jolliffe wrote:

> As a first step I am interested in reusing the DataValueSet stuff from
> the model rather than the representation of metadata, which I think
> needs to be done more completely and not until changes to the
> dimensional model are realized in the not too distant future.

The only metadata work I've done was to serve up some basic HTML, which should not be used, and certainly not reused :)

> 1.  We should shift storedBy up to the dataValueSet level.  I'm
> assuming all datavalues in a datavalueset will be stored by the same
> user.  I'd put back an optional Comment attribute here as well.
> Currently it's only useful for rolling back imports.  Not the most
> efficient way to implement it but still useful.

I agree it would be nice if we could move it up, though I am a bit unsure of the semantics of our data model and the use cases for this. If this were used to communicate between DHIS instances, it is not unthinkable that I have edited or added a value in a set that you originally stored, and that granularity would be lost. Whether that is something we should rethink in our data model rather than carry over into the XML structure, I don't know.
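To make the granularity concern concrete, here is a minimal sketch of one possible compromise: storedBy on the set acts as a default, and an individual data value may still override it. Nobody in this thread has proposed this; the element and attribute layout (`dataValueSet`, `dataValue`, the sample user names) is assumed for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: storedBy sits on the set, and one value overrides it.
XML = """
<dataValueSet storedBy="bob" comment="import 2011-09-01">
  <dataValue dataElement="de1" value="12"/>
  <dataValue dataElement="de2" value="7" storedBy="jo"/>
</dataValueSet>
"""

def read_stored_by(xml_text):
    """Return (dataElement, storedBy) pairs, falling back to the set-level user."""
    root = ET.fromstring(xml_text)
    default_user = root.get("storedBy")
    return [
        (dv.get("dataElement"), dv.get("storedBy", default_user))
        for dv in root.iter("dataValue")
    ]

pairs = read_stored_by(XML)
```

With a fallback like this, a set stored entirely by one user stays compact, while a value edited by someone else keeps its own storedBy.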

> 2.  I don't think categoryOptionCombo should *necessarily* be exposed
> to the external world.  It's very much an internal arrangement of DHIS.
> Its useful enough in cases where HISP folk are involved on both
> producer and consumer side of the equation, but for other 3rd parties
> in the world it is best to hide this internal arrangement.  I suggest
> that dataElement and value are *required* attributes,
> categoryOptionCombo is optional and in addition we have an
> <xs:anyAttribute> extension point which allows for additional
> attributes.  The implication would be that the above dataset will
> remain valid (so existing stuff is still working),

I think I agree that we need another model to better "externalize" dimensions. But it becomes a bit more complex to implement if dataElement+optionCombo is no longer a "simple" identifier for the datavalue. It would be good to hear a little more about how you plan to implement it in the short run, and whether you think it should be combined with changes inside DHIS.

- Are you thinking of modelling this anyAttribute extension point on the SDMX model in some way?
- If there is a more explicit way to describe this in the schema than just anyAttribute, I think it could help.
- I think it would be advantageous if we could rework the internal data model to better fit this more general "schema" at the same time, or at least know a little more about how the internal changes would look.
- We need to stay backwards compatible with existing meta models. Are we sure that the rules for names of dimensions (Sex, Age) are compatible with XML attribute names?
- We might need to think through how these dimensions would look in the metamodel XML, and what the link between this anyAttribute space and that model would be.

Overall, I guess allowing the two identifier schemes to coexist for a while seems like a good idea, though we should then probably look to get rid of optionComboId asap.
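As a rough sketch of how the two schemes might coexist on the reading side: dataElement and value are required, categoryOptionCombo is optional, and any other attribute is treated as a dimension. The `dataValue` element name, the sample values and the rule that the two styles may not be mixed on one value are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("dataElement", "value")
# Attributes with a fixed meaning; anything else is treated as a dimension.
KNOWN = set(REQUIRED) | {"categoryOptionCombo", "storedBy", "comment"}

def read_data_value(elem):
    """Split one data value into its fixed fields and its open-ended dimensions."""
    attrs = dict(elem.attrib)
    missing = [a for a in REQUIRED if a not in attrs]
    if missing:
        raise ValueError("missing required attribute(s): " + ", ".join(missing))
    dimensions = {k: v for k, v in attrs.items() if k not in KNOWN}
    combo = attrs.get("categoryOptionCombo")
    # Assumed rule: a value uses one identification style, never both.
    if combo is not None and dimensions:
        raise ValueError("use either categoryOptionCombo or dimension attributes")
    return {
        "dataElement": attrs["dataElement"],
        "value": attrs["value"],
        "categoryOptionCombo": combo,
        "dimensions": dimensions,
    }

old_style = read_data_value(ET.fromstring(
    '<dataValue dataElement="de1" categoryOptionCombo="4" value="12"/>'))
new_style = read_data_value(ET.fromstring(
    '<dataValue dataElement="de1" Sex="Female" Age="Under5" value="12"/>'))
```

Existing documents keep working unchanged, and the dimension dictionary is the point where a lookup to an internal option combo would have to happen.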

> 3.  On the question of identifiers ....
> 
> So I am going to suggest two additional attribute, probably at the
> dataValueSets level which indicates the id system to use.  Currently I
> can think of internal, code, uuid and map as possible candidates for
> these attribute values.  Where map would imply that ids need to be
> resolved using an aliases table keyed by a naming context, possibly
> using some of Lars' objectmapper or perhaps simpler.  To maintain
> compatibility with existing web service api this attribute can be
> optional and default to uuid.

Yep. I'm not sure what the default should be, though. Maybe just the internal id? For simple cases that looks easier than uuids (at least if we are thinking about the metamodel and how to communicate these ids *to* other systems). Since we might want to reuse this id model for the meta model as well, do you think it would fit there?
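A minimal sketch of what resolving identifiers under the four suggested schemes could look like. The metadata records, the alias table and the `resolve` function are all hypothetical. The default is set to uuid as in Bob's proposal, and since it is only a parameter, switching it to internal is a one-word change.

```python
# Toy metadata store; all ids, codes and uuids here are made up for illustration.
DATA_ELEMENTS = [
    {"id": 101, "code": "ANC1", "uuid": "a1b2"},
    {"id": 102, "code": "ANC4", "uuid": "c3d4"},
]

# Hypothetical alias table keyed by (naming context, external id) -> internal id.
ALIASES = {("clinic-system", "first-visit"): 101}

def resolve(identifier, scheme="uuid", context=None):
    """Return the internal id for an identifier expressed in the given scheme."""
    if scheme == "internal":
        wanted = int(identifier)
        matches = [e for e in DATA_ELEMENTS if e["id"] == wanted]
    elif scheme in ("code", "uuid"):
        matches = [e for e in DATA_ELEMENTS if e[scheme] == identifier]
    elif scheme == "map":
        internal = ALIASES.get((context, identifier))
        matches = [e for e in DATA_ELEMENTS if e["id"] == internal]
    else:
        raise ValueError("unknown id scheme: " + scheme)
    if len(matches) != 1:
        raise LookupError("cannot resolve %r using scheme %r" % (identifier, scheme))
    return matches[0]["id"]
```

For example, `resolve("ANC1", "code")` and `resolve("first-visit", "map", "clinic-system")` both lead to the same internal id, while an unknown identifier raises `LookupError` instead of silently creating or skipping a value.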

> I am pretty sure I can implement the above without breaking what is
> currently there.  One possible but minor breaking change I would
> suggest to improve parsing of very large datasets might be to
> abbreviate some well known element names to dv, de and v for
> compactness.

I am not sure these element names would really be that well known and obvious to the people who have to work with the schema.
- Is there any alias mechanism for XML that is easily used with JAXB?
- Wouldn't we want explicit streaming/"batch" handling for use cases where data sets grow this large, anyway?
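To illustrate both questions in one place, here is a small sketch (in Python rather than JAXB, so it says nothing about what JAXB itself offers) of a streaming reader that accepts the long names and the proposed short ones through a hypothetical alias table. The document layout is assumed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical alias table: short names accepted alongside the long ones.
ALIASES = {"dv": "dataValue", "de": "dataElement", "v": "value"}

def stream_data_values(source):
    """Yield one dict per data value without waiting for the whole document."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if ALIASES.get(elem.tag, elem.tag) == "dataValue":
            # Copy the attributes under their long names before clearing.
            yield {ALIASES.get(k, k): v for k, v in elem.attrib.items()}
            # Release the element's contents; the emptied element itself
            # stays attached to the root until parsing finishes.
            elem.clear()

long_form = b'<dataValueSet><dataValue dataElement="de1" value="1"/></dataValueSet>'
short_form = b'<dataValueSet><dv de="de1" v="1"/></dataValueSet>'

from_long = list(stream_data_values(io.BytesIO(long_form)))
from_short = list(stream_data_values(io.BytesIO(short_form)))
```

Both inputs yield the same records. With a reader like this, the short names mainly save bytes on the wire, since each value is handled and released as it arrives.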

Overall, though, if you think abbreviated names are better, I'm all for it.

Jo
