
dhis2-devs team mailing list archive

Re: External api for posting data values

 

tl;dr.. I should have cleaned this up, but have to get it sent before other work takes precedence..

So, the intention of this prototyping was to get some kind of general discussion going on what I see as challenges with the current system (in addition to hopefully ending up with a working api of some sort). I wanted to base it on concrete examples, so that it would hopefully be less abstract. I have committed a rework of the api that reflects how I understand a "common dxf format" version could look. The xml would basically look like this:

<dxf xmlns="http://dhis2.org/schema/dxf/x.x">
  <dataValues>
    <dataValue 
      dataSet="uuid - only required when there is an actual ambiguity in the system"
      period="period"
      orgUnit="uuid"
      storedBy="string"
      dataElement="uuid"
      value="value" />
  </dataValues>
  <dataValueSets>
    <dataValueSet
      dataSet="uuid"
      orgUnit="uuid"
      period="period in iso format"
      complete="date (yyyymmdd)"/>
  </dataValueSets>
</dxf>

dataValueSets here is more or less just a renamed completeDataSetRegistrations (not necessary if we don't want to accommodate locks, they seem to be missing?), so basically this should almost exactly mirror current dxf (except with uuids instead of ids). This is quite verbose, and it doesn't really mirror what the api clients we currently know of want to do, but that is not a fundamental problem (it makes usage and implementation a bit more difficult and messy, but not overly so).
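
To make the shape of the format above concrete, here is a minimal Python sketch that reads such a message with the standard library. The namespace is copied from the example (including its placeholder "x.x" version); the sample ids, period and dates are made up for illustration, and parse_dxf is a hypothetical helper, not part of dhis2.

```python
import xml.etree.ElementTree as ET

# Namespace as written in the example; "x.x" is the placeholder version used there.
DXF_NS = "http://dhis2.org/schema/dxf/x.x"

# Made-up sample values; real messages would carry uuids.
DXF_SAMPLE = """\
<dxf xmlns="http://dhis2.org/schema/dxf/x.x">
  <dataValues>
    <dataValue period="201101" orgUnit="ou-1" storedBy="jo"
               dataElement="de-1" value="12" />
  </dataValues>
  <dataValueSets>
    <dataValueSet dataSet="ds-1" orgUnit="ou-1" period="201101"
                  complete="20110205" />
  </dataValueSets>
</dxf>
"""


def parse_dxf(text):
    """Return (data_values, data_value_sets) as lists of attribute dicts."""
    root = ET.fromstring(text)
    ns = {"d": DXF_NS}
    # Elements live in the dxf namespace; attributes are unqualified.
    values = [dict(e.attrib) for e in root.findall("d:dataValues/d:dataValue", ns)]
    sets = [dict(e.attrib) for e in root.findall("d:dataValueSets/d:dataValueSet", ns)]
    return values, sets


values, sets = parse_dxf(DXF_SAMPLE)
```

Note that the dataValue in the sample leaves out dataSet, which is the common case where no ambiguity exists.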

In my view dxf's primary objective is to be a complete serialization of the domain model as it is implemented. Its basic mode is that it should represent the system state exactly, so that you can take the serialized format and put up more or less a clone of the exported system. An api, on the other hand, is a message-oriented protocol for changing state on the system. So there will be competing interests here.

The significant thing to notice is that I have ended up adding dataSet as an attribute on dataValue, even though it doesn't belong there for regular dxf use. This is difficult to avoid in a clean way: a client sending a message to the system to request a state change needs to be able to tell the system things that do not necessarily belong in the stable state description of the system. And if we want to avoid the possibility of ambiguity in certain cases, it is necessary to have dataSet as an attribute on dataValues.

The best I have been able to do in my spike is to make it optional (only required when the system actually has an ambiguity). This is of course not the only possible solution to this problem. I can at least think of the following approaches to try to handle this:

1) Do as above, and have the rule about not using the dataSet attribute for regular uses of dxf specified somewhere other than in the format.
2) Not insist on one xml format for "incompatible" use cases.
3) Not allow functionality that is not representable in "canonical" dxf.
4) Disallow uses that are ambiguous (i.e. you can't post values if it is ambiguous how they should be treated).
5) "Approximate" functionality: allow all uses and make a "qualified" guess as to how they should be resolved.
6) Rework the domain model to accommodate the revealed inconsistencies.
7) Rework the api so that knowledge not in the dxf model is represented in other ways (as a contrived example, have an http header with the dataset uuid instead of putting it in the dxf).
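
As a rough illustration of the "optional, only required on ambiguity" rule combined with approach 4, the check could look something like the Python sketch below. Everything named here is hypothetical: resolve_data_set, AmbiguousDataSet and the membership mapping (data element id to the data sets it belongs to) are assumptions made for the example and are not taken from the dhis2 code. How to treat an explicit dataSet that does not contain the element, or an element in no data set at all, is also an assumption.

```python
class AmbiguousDataSet(Exception):
    """Raised when a posted value could belong to more than one data set."""


def resolve_data_set(data_element, explicit_data_set, membership):
    """Pick the data set a posted dataValue should be attributed to.

    membership maps a data element id to the set of data set ids that
    contain it (a hypothetical stand-in for a lookup in the system).
    """
    candidates = membership.get(data_element, set())
    if not candidates:
        # Assumption: a value for an element in no data set is rejected.
        raise ValueError("data element %s is not in any data set" % data_element)
    if explicit_data_set is not None:
        # The client supplied the attribute, so only validate it.
        if explicit_data_set not in candidates:
            raise ValueError(
                "data set %s does not contain %s" % (explicit_data_set, data_element)
            )
        return explicit_data_set
    if len(candidates) == 1:
        # No ambiguity, so the attribute can be left out.
        return next(iter(candidates))
    # More than one candidate and nothing to choose by: refuse the post.
    raise AmbiguousDataSet(
        "%s is in several data sets, dataSet must be given" % data_element
    )


# Made-up ids for illustration.
membership = {"de-1": {"ds-a"}, "de-2": {"ds-a", "ds-b"}}
unambiguous = resolve_data_set("de-1", None, membership)
explicit = resolve_data_set("de-2", "ds-b", membership)
```

Approach 5 would replace the final raise with some heuristic choice among the candidates, which is exactly where the "qualified guess" would live.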

Still, my current preference ends up being 2 in the short term and more towards 6 in the longer term (I'm not ruling out the other approaches, just emphasizing what I think is the best bet as of now).

Of course 2 has the obvious drawback that more formats mean more maintenance, and for 6 it is always the case that it is much easier to see the problems with what you have than with what you think you want. And there are of course a whole lot of other concerns and implications whichever way we want to go. But overall I think something like my initial proposal [1] makes more sense than what I have above for clients needing to send (at least in my experience) dataValueSets.

I also think it could be a good idea to just admit that dataValueSets exist and see if we can introduce the concept to the domain model in an unobtrusive way (to group completeness and locking?), not really changing much, just stating more clearly a concept that is implicitly there. But I'm guessing this might have a hard time justifying priority over other things, and it is not essential (though it would be a more iterative approach to domain model changes than having to do a big bang redesign when it gets critical, which in the long run could have significant advantages in itself).

In the longer term I think it is pretty clear that, as dhis2 is moving to be more and more of a "datawarehouse" rather than a self-contained system, we have to find a way to keep reevaluating the solution and the needs we have to support (like data values in multiple dataValueSets). I'm not saying that we don't need to keep supporting existing needs, and I'm not saying we should lightly part with what we have. But we must strive to have a core domain model that is reasonably simple, consistent and answers new needs and developments as they come along. And I think the growing integration requirements justify changes to the core model (even if it might be a while, and it makes supporting some older requirements a bit harder).

Btw, if we someday ended up with a model where datavalues belong to a datavalueset, I think my initial xml [1] would in effect be the new dxf..
 
Jo   

[1] For reference, my initial proposal was something like this:

<dataValueSet xmlns="http://dhis2.org/schema/dataValueSet/0.1"
    dataSet="dataSet UUID" 
    period="periodInIsoFormat"
    orgUnit="unit UUID">
  <dataValue dataElement="data element UUID" categoryOptionCombo="UUID, only specify if used" storedBy="string" value="value"/>
</dataValueSet>
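
For comparison with the flat dxf form, here is a small Python sketch of how a receiver might expand a dataValueSet message of this kind into one record per value, copying the shared dataSet, period and orgUnit onto each. The sample ids and values are made up, and flatten_data_value_set is a hypothetical helper written for this example only.

```python
import xml.etree.ElementTree as ET

# Namespace as given in the proposal.
DVS_NS = "http://dhis2.org/schema/dataValueSet/0.1"

# Made-up sample; real messages would carry uuids.
DVS_SAMPLE = """\
<dataValueSet xmlns="http://dhis2.org/schema/dataValueSet/0.1"
    dataSet="ds-1" period="201101" orgUnit="ou-1">
  <dataValue dataElement="de-1" storedBy="jo" value="12"/>
  <dataValue dataElement="de-2" categoryOptionCombo="coc-1" storedBy="jo" value="3"/>
</dataValueSet>
"""


def flatten_data_value_set(text):
    """Expand one dataValueSet into a list of per-value dicts."""
    root = ET.fromstring(text)
    # Context stated once on the set and shared by every value in it.
    shared = {key: root.attrib[key] for key in ("dataSet", "period", "orgUnit")}
    rows = []
    for element in root.findall("{%s}dataValue" % DVS_NS):
        row = dict(shared)
        row.update(element.attrib)  # per-value attributes, optional ones included
        rows.append(row)
    return rows


rows = flatten_data_value_set(DVS_SAMPLE)
```

Because the data set is stated once on the enclosing element, each resulting record carries it without the per-value dataSet attribute that the flat format needed.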




