dhis2-devs team mailing list archive
Message #13757
dhis2 dxf data import
I've been re-looking at dxf stuff in line with this blueprint:
https://blueprints.launchpad.net/dhis2/+spec/separation-of-meta-data-and-data-values.
A driving use case here is the import of HR data from iHRIS in Kenya.
With over 8000 orgunits, the current scheme of importing metadata then
mapping will scale badly and introduce fragility here.
So I started to review this thread,
https://blueprints.launchpad.net/dhis2/+spec/separation-of-meta-data-and-data-values,
where Jo introduced the concept of datavalueset and persisting
datavalues individually rather than via multiple inserts. That's a
really useful construct which I want to reuse. (BTW one of the reasons
the datavalueset is useful is where you want to check credentials for
importing. Currently it's an admin-only task. Checking for each
individual datavalue would be unworkable. Checking for permission to
import a datavalueset for an orgunit will scale better.)
I've attached a schema which is extracted from Jo's existing annotated
code (schemagen src/main/java/org/hisp/dhis/importexport/dxf2/model/).
XML Schema language is ugly compared to RELAX NG, but never mind
that for now :-)
Much of what Jo has done here is geared towards solving his particular
requirements re the mobile interface, and much of it could be reused.
As a first step I am interested in reusing the DataValueSet stuff from
the model rather than the representation of metadata, which I think
needs to be done more completely and not until changes to the
dimensional model are realized in the not too distant future.
So before airing a few thoughts about the DataValueSet, let's look at
an example conforming to the existing schema for those of you less
comfortable with reading schema:
<dataValueSet orgUnit="550e8400-e29b-41d4-a716-446655440000" period="201004">
  <dataValue dataElement="550e8400-e29b-41d4-a716-446655440000"
    value="4" categoryOptionCombo="550e8400-e29b-41d4-a716-446655440000"
    storedBy="Bob"/>
  <dataValue dataElement="550e8400-e29b-41d4-a716-446655440000"
    value="54" categoryOptionCombo="3" storedBy="Bob"/>
  <dataValue dataElement="550e8400-e29b-41d4-a716-446655440000"
    value="43" categoryOptionCombo="550e8400-e29b-41d4-a716-446655440000"
    storedBy="Bob"/>
  <dataValue dataElement="550e8400-e29b-41d4-a716-446655440000"
    value="44" categoryOptionCombo="550e8400-e29b-41d4-a716-446655440000"
    storedBy="Bob"/>
  <dataValue dataElement="550e8400-e29b-41d4-a716-446655440000"
    value="67" categoryOptionCombo="550e8400-e29b-41d4-a716-446655440000"
    storedBy="Bob"/>
  <dataValue dataElement="550e8400-e29b-41d4-a716-446655440000"
    value="100" categoryOptionCombo="550e8400-e29b-41d4-a716-446655440000"
    storedBy="Bob"/>
</dataValueSet>
1. We should shift storedBy up to the dataValueSet level. I'm
assuming all datavalues in a datavalueset will be stored by the same
user. I'd put back an optional Comment attribute here as well.
Currently it's only useful for rolling back imports. Not the most
efficient way to implement it but still useful.
2. I don't think categoryOptionCombo should *necessarily* be exposed
to the external world. It's very much an internal arrangement of DHIS.
It's useful enough in cases where HISP folk are involved on both the
producer and consumer side of the equation, but for other 3rd parties
in the world it is best to hide this internal arrangement. I suggest
that dataElement and value are *required* attributes,
categoryOptionCombo is optional, and in addition we have an
<xs:anyAttribute> extension point which allows for additional
attributes. The implication would be that the above datavalueset will
remain valid (so existing stuff keeps working).
3. On the question of identifiers: the schema as it stands
accepts any string identifier. The current model implementation makes
use of uuids for this. As we have all come to understand, the outside
world is more complex and there are many possible ways that different
systems will identify things. This can be via uuid, urn or some
mutually exchanged codelists of integer or other identifiers or even
identification by name. Try as we might to coerce the world into
using our one true identifier, all of the above might/will crop up
from time to time. For example we have a case in Kenya, where there
is a nationally agreed upon set of facility codes, which will be used
in data exchange.
So I am going to suggest two additional attributes, probably at the
dataValueSets level, which indicate the id system to use. Currently I
can think of internal, code, uuid and map as possible candidates for
these attribute values, where map would imply that ids need to be
resolved using an aliases table keyed by a naming context, possibly
using some of Lars' objectmapper or perhaps something simpler. To
maintain compatibility with the existing web service api these
attributes can be optional and default to uuid.
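To make the four candidate id systems concrete, here is a minimal Python sketch of how resolution might work. It is only an illustration, not the DHIS2 implementation: the resolve function, the in-memory ORG_UNITS list, the ALIASES table, the "FAC-001" code and the "iHRIS" naming context are all made up for the example.

```python
# Hypothetical in-memory stand-ins for what would really be database lookups.
ORG_UNITS = [
    {"id": 23, "uuid": "550e8400-e29b-41d4-a716-446655440000", "code": "FAC-001"},
]

# Hypothetical alias table for the "map" scheme,
# keyed by (naming context, external identifier).
ALIASES = {("iHRIS", "facility|17"): 23}


def resolve(identifier, scheme="uuid", context=None,
            objects=ORG_UNITS, aliases=ALIASES):
    """Return the internal id for an identifier under the given id scheme."""
    if scheme == "internal":
        wanted = int(identifier)
        matches = [o["id"] for o in objects if o["id"] == wanted]
    elif scheme in ("uuid", "code"):
        matches = [o["id"] for o in objects if o[scheme] == identifier]
    elif scheme == "map":
        hit = aliases.get((context, identifier))
        matches = [] if hit is None else [hit]
    else:
        raise ValueError(f"unknown id scheme: {scheme!r}")
    if len(matches) != 1:
        raise LookupError(
            f"{identifier!r} did not resolve to exactly one object "
            f"under scheme {scheme!r}"
        )
    return matches[0]
```

Leaving scheme at its default keeps the uuid behaviour, so a caller that never mentions an id system is unaffected.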
The implication of adding all the above will be that whereas the
datavalueset above will remain valid (except perhaps for shifting
storedBy up to the dataValueSet), the following would also be valid:
<dataValueSets orgUnitId="code" dataElementId="internal">
  <dataValueSet orgUnit="23" period="201004" storedBy="Bob">
    <dataValue dataElement="2" value="4" Sex="1"/>
    <dataValue dataElement="2" value="5" Sex="2"/>
    <dataValue dataElement="4" value="43" Sex="1" Age="3"/>
    <dataValue dataElement="5" value="44" Sex="1" Age="3"/>
  </dataValueSet>
</dataValueSets>
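For anyone who wants to see how a consumer might read such a document, here is a rough Python sketch using only the standard library. It assumes the attribute names proposed in this mail (orgUnitId, dataElementId, storedBy on the set); the function name read_data_value_sets and the "extra" key are invented for the example, and nothing here is existing DHIS2 code.

```python
import xml.etree.ElementTree as ET

DOC = """\
<dataValueSets orgUnitId="code" dataElementId="internal">
  <dataValueSet orgUnit="23" period="201004" storedBy="Bob">
    <dataValue dataElement="2" value="4" Sex="1"/>
    <dataValue dataElement="2" value="5" Sex="2"/>
    <dataValue dataElement="4" value="43" Sex="1" Age="3"/>
    <dataValue dataElement="5" value="44" Sex="1" Age="3"/>
  </dataValueSet>
</dataValueSets>
"""

# Attributes the schema names explicitly; anything else is an extension.
KNOWN = {"dataElement", "value", "categoryOptionCombo", "storedBy"}


def read_data_value_sets(xml_text):
    """Return (id schemes, flat list of data value rows)."""
    root = ET.fromstring(xml_text)
    # Both id scheme attributes are optional and default to uuid.
    schemes = {
        "orgUnit": root.get("orgUnitId", "uuid"),
        "dataElement": root.get("dataElementId", "uuid"),
    }
    rows = []
    # iter() also matches the root itself, so a bare <dataValueSet> works too.
    for dvs in root.iter("dataValueSet"):
        for dv in dvs.iter("dataValue"):
            for required in ("dataElement", "value"):
                if required not in dv.attrib:
                    raise ValueError(f"dataValue is missing {required}")
            rows.append({
                "orgUnit": dvs.get("orgUnit"),
                "period": dvs.get("period"),
                # Accept storedBy on the set, or on the value as before.
                "storedBy": dv.get("storedBy", dvs.get("storedBy")),
                "dataElement": dv.get("dataElement"),
                "value": dv.get("value"),
                "categoryOptionCombo": dv.get("categoryOptionCombo"),
                # Whatever came in through the anyAttribute extension point.
                "extra": {k: v for k, v in dv.attrib.items() if k not in KNOWN},
            })
    return schemes, rows


schemes, rows = read_data_value_sets(DOC)
```

The extension attributes (Sex, Age) end up in "extra", where a later step could map them onto category options, while an old-style document with categoryOptionCombo on each value passes through the same function unchanged.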
I am pretty sure I can implement the above without breaking what is
currently there. One possible but minor breaking change I would
suggest, to improve parsing of very large datasets, might be to
abbreviate some well known element and attribute names to dv, de and
v for compactness.
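As a rough idea of what reading a very large file could look like, here is a small Python sketch that streams data values one at a time and accepts either the long or the abbreviated names. The mapping of dv, de and v to dataValue, dataElement and value is my assumption of what the abbreviations would stand for, and the alias tables and function name are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Assumed abbreviations: short name -> long name.
ELEMENT_ALIASES = {"dv": "dataValue"}
ATTRIBUTE_ALIASES = {"de": "dataElement", "v": "value"}


def iter_data_values(source):
    """Yield one attribute dict per data value as the parser reaches it."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = ELEMENT_ALIASES.get(elem.tag, elem.tag)
        if name == "dataValue":
            # Copy the attributes under their long names before clearing.
            yield {ATTRIBUTE_ALIASES.get(k, k): v for k, v in elem.attrib.items()}
            elem.clear()  # discard this element's attributes and children


sample = io.BytesIO(
    b'<dataValueSet orgUnit="23" period="201004">'
    b'<dv de="2" v="4"/>'
    b'<dataValue dataElement="4" value="43"/>'
    b'</dataValueSet>'
)
values = list(iter_data_values(sample))
```

Accepting both spellings on the reading side would soften the breaking change, since existing producers could keep sending the long names.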
Please give me some feedback. I'll do up a temp branch with model
changes shortly.
Cheers
Bob
<?xml version="1.0" encoding="UTF-8"?>
<dataValueSet xmlns="http://dhis2.org/schema/dxf/2.0-SNAPSHOT"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://dhis2.org/schema/dxf/2.0-SNAPSHOT file:/home/bobj/space/schema1.xsd" orgUnit="orgUnit1" period="period1">
</dataValueSet>