dhis2-devs team mailing list archive
Message #10460
Re: External api for posting data values
Keeping this short. I am on a quota of a few email lines per day ...
On 17 February 2011 12:34, Jo Størset <storset@xxxxxxxxx> wrote:
> tl;dr.. I should have cleaned this up, but have to get it sent before other work takes precedence..
>
> So, the intention of this prototyping was to get some kind of general discussion going on what I see as challenges with the current system (in addition to hopefully ending up with a working api of some sort). And I wanted to base it on concrete examples, so that it would hopefully not be as abstract. I have committed a rework of the api in terms of how I understand a "common dxf format" version could look. The xml would basically look like this:
>
> <dxf xmlns="http://dhis2.org/schema/dxf/x.x">
> <dataValues>
> <dataValue
> dataSet="uuid - only required when there is an actual ambiguity in the system"
> period="period"
> orgUnit="uuid"
> storedBy="string"
> dataElement="uuid"
> value="value" />
> </dataValues>
> <dataValueSets>
> <dataValueSet
> dataSet="uuid"
> orgUnit="uuid"
> period="period in iso format"
> complete="date (yyyymmdd)"/>
> </dataValueSets>
> </dxf>
>
This is clearly ugly. I don't like it. You don't like it. I doubt
anyone would like it.
> dataValueSets here is more or less just a renamed completeDataSetRegistrations (not necessary if we don't want to accommodate locks, they seem missing?), so basically this should almost exactly mirror current dxf (except uuids instead of ids). This is quite verbose, and it doesn't really mirror what the api clients we currently know of want to do, but that is not a fundamental problem (making usage and implementation a bit more difficult and messy, but not overly so).
>
> In my view dxf's primary objective is as a complete serialization of the domain model as it is implemented. Its basic mode is that it should represent the system state exactly, so that you can take the serialized format and put up more or less a clone of the exported system. An api, on the other hand, is a message oriented protocol for changing state on the system. So, there will be competing interests here.
There you are wrong. You keep making this point to the extent that I
think you now thoroughly believe it :-) Complete cloning of systems
by serializing to dxf was (a long time ago) what I thought was the
primary objective, and perhaps it even was once, but now it definitely
is not. For one thing, practice seems to show that people tend to
share their databases via postgres/mysql dumps in the real world.
Where complete serialization is important is in the migrating of data
from say postgres to h2 or mysql to postgres or what have you. There
it would be really good if dxf had a more complete representation but
it always lags a bit and probably always will. I don't want to
downplay the importance too much but my recent experience has been
that judicious use of sed might be a more reasonable and efficient way
of managing these sql dialect conversions anyway. Or a more inward
looking xml serialization like xstream or something along the lines of
Murod's recent ideas.
The real principal use of dxf, as it has emerged, is as an
interoperable medium of exchange between systems - which I think
includes your use case. But you should appreciate it also includes
others such as iHRIS or OpenMRS data and so the format needs to be
complete enough to be able to map effectively against things like sdmx
and maybe also the likes of the google xml format Knut recently drew
our attention to, Dataset Publishing Language (DSPL).
Now you shouldn't be surprised to notice that structurally all of
these have a tendency to look like what you have below labelled [1]
with some extra richness here and there. The exception is dataset,
which I think is a bit of a peculiarity.
[The dataset (and the categorycombo for that matter) has only a very
weak relationship with the structure of data. In fact, from an MVC
perspective, both of these constructs have more to do with the view
layer. They determine what appears on forms. If they were renamed
Form and FormDimensions or FormSectionDimensions we might be a lot
clearer. But we live with names.
So far as datavalues are concerned, when we capture a bundle from the
wild, what is important is the dataelement and the categoryoptioncombo
(and period, orgunit etc). It's really not that important what the
categorycombo was on the form, nor (as Ola and Abyot have pointed out),
it seems, even necessarily which dataset it was part of when
it was collected. At least not from an analysis perspective.]
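
To make the bracketed point concrete, here is a minimal sketch in Python. It assumes that a data value is identified by dataelement, categoryoptioncombo, period and orgunit only. The class and function names (DataValueKey, key_of) and all the sample ids are hypothetical and are not part of dxf or the DHIS2 codebase.

```python
from typing import NamedTuple, Optional


class DataValueKey(NamedTuple):
    """Hypothetical identity of a data value; dataSet is deliberately absent."""
    data_element: str
    category_option_combo: Optional[str]
    period: str
    org_unit: str


def key_of(record: dict) -> DataValueKey:
    # Ignore any "dataSet" entry: under this assumption it is form metadata,
    # not part of what identifies the value.
    return DataValueKey(
        record["dataElement"],
        record.get("categoryOptionCombo"),
        record["period"],
        record["orgUnit"],
    )


# The same value captured through two different forms (made-up ids).
from_form_a = {"dataSet": "form-A", "dataElement": "de-1",
               "period": "201102", "orgUnit": "ou-1", "value": "12"}
from_form_b = dict(from_form_a, dataSet="form-B")

same_value = key_of(from_form_a) == key_of(from_form_b)
```

Under that assumption the two records refer to the same data value regardless of which form they arrived on.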
>
> The significant thing to notice, is that I have ended up adding dataSet as an attribute on dataValue, even if it doesn't belong there for regular dxf use. This is difficult to avoid in a clean way: A client sending a message to the system to request a state change needs to be able to tell the system things that do not necessarily belong in the stable state description of the system. And if we want to avoid the possibility for ambiguity in certain cases, it is necessary to have dataSet as an attribute on datavalues, even though it does not belong there in dxf.
>
> The best I have been able to do in my spike is make it optional (only required when the system actually has an ambiguity). This is of course not the only possible solution to this problem; I can at least think of the following approaches to try to handle this:
>
> 1) Do like above and have the rule about not using the dataSet attribute for regular uses of dxf specified somewhere else than in the format.
> 2) Not insist on one xml format for "incompatible" use cases.
> 3) Not allow functionality that is not representable in "canonical" dxf.
> 4) disallow uses that are ambiguous (i.e. you can't post values if it is ambiguous how it should be treated).
> 5) "Approximate" functionality - allow all uses and make a "qualified" guess as to how it should be resolved.
> 6) rework the domain model to accommodate the revealed inconsistencies.
> 7) rework the api so that knowledge not in the dxf model is represented in other ways (as a contrived example, have a http header with the dataset uuid instead of in the dxf).
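
As an illustration of the "optional dataSet" rule combined with approach 4, the following Python sketch rejects a post only when the data set cannot be determined. It is a sketch under stated assumptions, not how the spike is implemented: the membership map, the function name resolve_data_set and the AmbiguousDataSet exception are all hypothetical.

```python
from typing import Dict, Optional, Set


class AmbiguousDataSet(ValueError):
    """Raised when a data element is in several data sets and none was named."""


def resolve_data_set(data_element: str,
                     explicit: Optional[str],
                     membership: Dict[str, Set[str]]) -> str:
    # membership maps a data element id to the data sets that contain it
    # (an assumed lookup structure, for illustration only).
    candidates = membership.get(data_element, set())
    if not candidates:
        raise ValueError(f"{data_element} is not in any data set")
    if explicit is not None:
        if explicit not in candidates:
            raise ValueError(f"{data_element} is not in data set {explicit}")
        return explicit
    if len(candidates) == 1:
        # No ambiguity, so the dataSet attribute can be omitted.
        return next(iter(candidates))
    raise AmbiguousDataSet(f"{data_element} is in {sorted(candidates)}")


membership = {"de-1": {"ds-A"}, "de-2": {"ds-A", "ds-B"}}
unambiguous = resolve_data_set("de-1", None, membership)
disambiguated = resolve_data_set("de-2", "ds-B", membership)
```

Approach 5 would replace the final raise with a guess among the candidates.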
>
> Still, my current preference ends up being for 2 in the short term and more 6 in the longer term (I'm not ruling out the other approaches, just emphasizing what I think is the best bet as of now).
>
> Of course 2 has the obvious drawback that more formats means more maintenance and for 6 it is always the case that it is much easier to see the problems with what you have rather than what you think you want. And there are of course a whole lot of other concerns and implications whatever way we want to go. But overall I think something like my initial proposal [1] makes more sense than what I have above for clients needing to send (at least in my experience) dataValueSets.
>
> I also think it could be a good idea to just admit that dataValueSets exists and see if we can introduce the concept to the domain model in an unobtrusive way (to group completeness and locking?), not really changing much, just stating more clearly a concept that is implicitly there. But I'm guessing this might have a hard time justifying priority over other things, and it is not essential (but would be a more iterative approach to domain model changes than having to do a big bang redesign when it gets critical, which in the long run could have significant advantages in itself).
>
> In the longer term I think that it is pretty clear that as dhis2 is moving to be more and more of a "datawarehouse" rather than a self-contained system, we have to find a way to keep reevaluating the solution and the needs we have to support (like data values in multiple dataValueSets). I'm not saying that we don't need to keep supporting existing needs, and I'm not saying we should lightly part with what we have. But we must strive to have a core domain model that is reasonably simple, consistent and answers new needs and developments as they come along. And the growing integration requirements I think justify changes to the core model (even if it might be a while, and it makes supporting some older requirements a bit harder).
>
> Btw, if we someday ended up with a model where datavalues belong to a datavalueset, I think my initial xml [1] would in effect be the new dxf..
>
> Jo
>
> [1] For reference, my initial proposal was something like this:
>
> <dataValueSet xmlns="http://dhis2.org/schema/dataValueSet/0.1">
> dataSet="dataSet UUID"
> period="periodInIsoFormat"
> orgUnit="unit UUID">
> <dataValue dataElement="data element UUID" categoryOptionCombo="UUID, only specify if used" storedBy="string" value="value"/>
> </dataValueSet>
So this really looks fine enough. Except you have an extra '>' in
line 1 and are maybe missing a few optional attributes. And there is
still some discomfort around the dataSet attribute. If you made that
optional, would you not find yourself with the same benefit as above,
i.e. looking substantially like legacy dxf, while still meeting your
original requirement simply and elegantly? And in the short term we
don't persist the datavalueset but just use it as a convenience for
datavalues to inherit repeated attributes from.
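
A rough sketch of that inheritance in Python, using only the standard library: the dataValueSet is never stored, it only supplies the repeated attributes to each dataValue. The sample uses the namespace from [1] without the stray '>', treats dataSet as optional, and uses made-up ids and a hypothetical function name (flatten); none of this is taken from the actual spike.

```python
import xml.etree.ElementTree as ET

NS = "http://dhis2.org/schema/dataValueSet/0.1"
# Attributes a dataValue inherits from its enclosing dataValueSet.
INHERITED = ("dataSet", "period", "orgUnit")


def flatten(xml_text: str) -> list:
    """Expand a dataValueSet into flat dataValue records (plain dicts)."""
    root = ET.fromstring(xml_text)
    shared = {name: root.attrib[name] for name in INHERITED if name in root.attrib}
    rows = []
    for data_value in root.findall(f"{{{NS}}}dataValue"):
        row = dict(shared)              # start from the inherited attributes
        row.update(data_value.attrib)   # the dataValue's own attributes win
        rows.append(row)
    return rows


# dataSet is left out here to show the optional case.
SAMPLE = """<dataValueSet xmlns="http://dhis2.org/schema/dataValueSet/0.1"
    period="201102" orgUnit="ou-1">
  <dataValue dataElement="de-1" storedBy="bob" value="12"/>
  <dataValue dataElement="de-2" storedBy="bob" value="7"/>
</dataValueSet>"""

rows = flatten(SAMPLE)
```

Each resulting record carries period and orgUnit alongside its own attributes, which is substantially the shape of a legacy dxf dataValue.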
Anyway .. that's my quota up.
Regards
Bob