
dhis2-devs team mailing list archive

Re: External api for posting data values

 

On 17 Feb 2011, at 19:18, Bob Jolliffe wrote:

> There you are wrong.  You keep making this point to the extent that I
> think you now thoroughly believe it :-)  

Let me try to rephrase (without wanting to continue the semantics debate in any way :) 

I think I view the uses you describe as api uses. They are external systems, and I'm not at all surprised that they tend to group values in sets :)

The point I was trying to make is that as long as
- we have values belonging to multiple sets in dhis, and
- *one* of the uses of dxf is to be able to serialize/deserialize the dhis structure,
it is difficult to group values in datavaluesets in dxf.

I haven't thought it through, but I guess it would be possible to just duplicate the value across datavaluesets and let the metadata guide us to the fact that this is a duplicated value on deserialization. But that sounds a bit complicated to me.
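
To make that concrete, something like this could work on deserialization. It is only a rough sketch with made-up class names (Value, dedupe), not anything that exists in dhis today:

// Sketch only; Value and dedupe are invented names.
// The idea: key each deserialized value on the dimensions that identify it in
// dhis, so a value serialized under several datavaluesets is only stored once.
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateValueSketch
{
    static class Value
    {
        String dataElement, categoryOptionCombo, period, orgUnit, value;
    }

    static Collection<Value> dedupe( List<List<Value>> valueSets )
    {
        Map<String, Value> unique = new HashMap<String, Value>();

        for ( List<Value> set : valueSets )
        {
            for ( Value v : set )
            {
                // A later duplicate simply overwrites the earlier copy; the key is
                // what the metadata says makes this "the same" value.
                unique.put( v.dataElement + "|" + v.categoryOptionCombo + "|"
                    + v.period + "|" + v.orgUnit, v );
            }
        }

        return unique.values();
    }
}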

Secondly, I think it might be a matter of performance and scaling. The jaxb parser approach is not really tailored for large imports (at least not my vanilla use of it). I guess there is quite a bit of room for improvement, but I am not sure that it will be an equal replacement for the batch handler stuff (which seems to have performance as its primary concern).

If we were to replace the dxf parser with a new one based on something like what I have made here, that is at least a concern we should explicitly think through. While it is easy to write a "validating parser" for small stuff (like I have done here), I'm not sure how well it would scale. I think there is a possibility that we would end up having two parsers with different models of "content validation"? I think we should be able to avoid this, but I can't say for sure. I don't have much experience with this kind of jaxb use, and I'm not sure my approach to "manual" content validation is the right way to make that happen.
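
Just to be concrete about the scaling worry: the direction I could imagine for larger imports is to stream over the document with StAX and let jaxb bind one small element at a time, handing each value to something batch-handler-like instead of unmarshalling the whole document into memory. A very rough sketch, where DataValue and BatchWriter are invented stand-ins:

// Rough sketch; DataValue and BatchWriter are stand-ins, not existing classes.
// Stream over the document with StAX and let jaxb bind one small element at a
// time, handing each value straight to a batch-style writer.
import java.io.InputStream;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StreamingImportSketch
{
    @XmlAccessorType( XmlAccessType.FIELD )
    static class DataValue
    {
        @XmlAttribute String dataElement;
        @XmlAttribute String categoryOptionCombo;
        @XmlAttribute String period;
        @XmlAttribute String orgUnit;
        @XmlAttribute String value;
    }

    interface BatchWriter
    {
        void add( DataValue value );
        void flush();
    }

    void importValues( InputStream in, BatchWriter batch )
        throws Exception
    {
        Unmarshaller unmarshaller = JAXBContext.newInstance( DataValue.class ).createUnmarshaller();

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader( in );

        while ( reader.hasNext() )
        {
            if ( reader.next() == XMLStreamConstants.START_ELEMENT
                && "dataValue".equals( reader.getLocalName() ) )
            {
                // Binds just this element; the reader moves past it when done.
                DataValue value = unmarshaller.unmarshal( reader, DataValue.class ).getValue();

                batch.add( value );
            }
        }

        batch.flush();
    }
}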

I guess another thing is the metadata and data mangling. That confuses me quite a bit, and I guess I might "incorrectly" label api uses as cases where those are "separate" concerns. And before this turns into an argument, yes, I agree that a split would probably be advantageous for all use cases (and stable ids are the first step towards that). So thinking of that as "api uses" is not really useful, but as a first step they sort of are...

In the end I think it is reasonable to start down this road with a small (in the worst case totally reversible) implementation now, and then we can hopefully look at how best to redesign the internals of this implementation when we have more time later this year. But I felt I needed some kind of general acceptance before doing this, and it is good to try to be explicit about the risks before making such a choice.

> So far as datavalues are concerned, when we capture a bundle from the
> wild, what is important is the dataelement and the categoryoptioncombo
> (and period, orgunit etc). It's really not that important what the
> categorycombo was on the form, nor (as Ola and Abyot have pointed out)
> it seems even necessarily which dataset it was part of when
> it was collected.  At least not from an analysis perspective.

I would of course agree, were it not for the fact that we *do* have the concepts of locking and completeness (and things like the required dataElements in the community module, which I think people want in general dhis as well). Yes, it is a "data input" concern, but that *is* the concern when external systems send data...
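
Those input-time checks are roughly of this shape (the service interfaces here are invented, just to show why "which data set is this?" has to be answered when values come in):

// Sketch only; the service interfaces are invented, not the real dhis services.
// Both checks need to know which data set the incoming values belong to, which
// is exactly why the data set matters on input even if analysis doesn't care.
class InputCheckSketch
{
    interface LockService
    {
        boolean isLocked( String dataSet, String period, String orgUnit );
    }

    interface CompletenessService
    {
        boolean isComplete( String dataSet, String period, String orgUnit );
    }

    private final LockService locks;
    private final CompletenessService completeness;

    InputCheckSketch( LockService locks, CompletenessService completeness )
    {
        this.locks = locks;
        this.completeness = completeness;
    }

    boolean mayStoreValues( String dataSet, String period, String orgUnit )
    {
        // Reject values for a locked or already-completed data set / period / unit.
        return !locks.isLocked( dataSet, period, orgUnit )
            && !completeness.isComplete( dataSet, period, orgUnit );
    }
}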

> And there is still some discomfort around the dataSet attribute.  If you made that
> optional, would you not find yourself with the same benefit as above,
> ie looking substantially like legacy dxf, and still meet your original
> requirement simply and elegantly?

Yes, it would be possible to make it optional. It wouldn't be very elegant to get "right", though.

I would have to iterate through the elements to find one with only one dataset attached, and then validate/operate on that. If all elements are in more than one data set (like with mobile-specific data sets, where all elements could easily be in another "web" set as well), I would have to find the potential data sets that cover all data elements, and in effect validate against all of them (locking, possibly other things like required elements) before storing the values. And if things like complete (or other value set level properties) were set, I would probably have to deny it even if all values would be ok to save.
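
As a sketch of that "optional dataSet" path (again with made-up names), the first step would be to find every data set that covers all the posted data elements; each candidate would then still have to pass the locking/completeness/required-element checks before anything is stored:

// Sketch with invented names; this is only the "which sets could this be?" step.
// Each candidate would then still need to pass the input checks above before
// any value is stored.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CoveringSetSketch
{
    static class DataSet
    {
        String name;
        Set<String> dataElements = new HashSet<String>();
    }

    // All data sets whose element list contains every posted data element.
    static List<DataSet> coveringSets( Set<String> postedDataElements, List<DataSet> allSets )
    {
        List<DataSet> candidates = new ArrayList<DataSet>();

        for ( DataSet set : allSets )
        {
            if ( set.dataElements.containsAll( postedDataElements ) )
            {
                candidates.add( set );
            }
        }

        // Empty means the posted values don't fit any single data set at all.
        return candidates;
    }
}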

So yes, it would probably make sense to make it optional if this grouping were to be used generally, but I'm just not sure of the value of promoting that for the current api uses? It should be easy enough to relax that constraint later?

Jo



