
dhis2-devs team mailing list archive

Re: [Branch ~dhis2-devs-core/dhis2/trunk] Rev 4611: added proper validation for real/natural numbers (including support for e-notation)

 

Perhaps more important than being able to support a count of the
number of stars in the galaxy is how we treat these numbers once we
have captured them, at whatever precision that might be.

Currently I think our notion of precision is a bit weak.  In the
datamart service for indicators, for example, we seem to have a fixed
notion of precision which is based on decimal places - from my reading
of the code, it seems we store values accurate to 1 decimal place.

What we probably should be doing is maintaining some confidence level
of significant figures.  This becomes quite obvious if we start
inputting and storing values in scientific notation - those of us old
enough to have used a slide rule will be familiar with this :-).

So if I have a numerator (e.g. malaria cases) of 5436 and a denominator
(e.g. population) of 155000, then what can I say about the indicator
value?  Well, if I work it out on my calculator I get:
0.035070968

but obviously I am not confident in all those digits.  But if my
numerator is accurate to 4 significant figures and my denominator is
accurate to 3, then I can be confident to 2 significant figures in
my result; i.e. I can report the value as:
0.035
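
A minimal sketch of the calculation above in Java, assuming the values
are held as BigDecimal rather than double.  The class and method names
(SigFigs, divideToSigFigs) are hypothetical and not taken from the DHIS
code.  Note that the usual rule of thumb for division keeps as many
significant figures as the least precise input (3 here), so the sketch
prints both that and the more conservative 2 figures used above.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class SigFigs {

    // Hypothetical helper: divide and keep only `sigFigs` significant figures.
    // MathContext limits the total number of significant digits, not the
    // number of decimal places.
    static BigDecimal divideToSigFigs(BigDecimal numerator,
                                      BigDecimal denominator,
                                      int sigFigs) {
        return numerator.divide(denominator,
                new MathContext(sigFigs, RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        BigDecimal cases = new BigDecimal("5436");        // numerator
        BigDecimal population = new BigDecimal("155000"); // denominator

        // 2 significant figures, as in the example above
        System.out.println(divideToSigFigs(cases, population, 2).toPlainString()); // 0.035
        // 3 significant figures, matching the less precise input
        System.out.println(divideToSigFigs(cases, population, 3).toPlainString()); // 0.0351
    }
}
```

The number of decimal places (3 or 4 here) falls out of the chosen
number of significant figures; it is not specified directly.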

I am not sure what the best strategy for managing precision in dhis
should be, but it does strike me that, for a system concerned with
aggregation, we should attempt to attack it a bit more rigorously than
we do.  What this probably requires, at the point of capture, is to
capture the precision of the number, particularly where we know we are
capturing an estimate, e.g. as a result of rounding.  This is done
implicitly when using scientific notation.  The problem is more
visible when we capture a string like "155000".  How precise is that?
Well, we don't actually know.  Intuitively we suspect it's not accurate
to 6 significant figures, and that it's accurate to at least 3.  But it
could be 4 (e.g. 1.550E5).
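
To illustrate the difference, here is a small sketch using
java.math.BigDecimal, which records the digits exactly as written.  The
class and helper names (CapturedPrecision, digitsAsWritten) are
hypothetical.  It only shows what each string claims, not what the data
capturer actually knew.

```java
import java.math.BigDecimal;

public class CapturedPrecision {

    // Hypothetical helper: number of digits the string carries as written.
    static int digitsAsWritten(String value) {
        return new BigDecimal(value).precision();
    }

    public static void main(String[] args) {
        BigDecimal plain = new BigDecimal("155000");
        BigDecimal scientific = new BigDecimal("1.550E5");

        System.out.println(digitsAsWritten("155000"));   // 6
        System.out.println(digitsAsWritten("1.550E5"));  // 4

        // Numerically the same value...
        System.out.println(plain.compareTo(scientific) == 0); // true
        // ...but not the same representation (different scale).
        System.out.println(plain.equals(scientific));          // false
    }
}
```

The plain string is treated as 6 digits whether or not the trailing
zeros are meaningful, while the e-notation form keeps the stated 4.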

Maybe it's just me that worries a bit about these things.  Does anyone
else have a sense that it is important to be able to indicate the
precision of calculated indicator values?

Bob

PS. Storing natural number 'counts' as a floating point number
introduces some untidiness here, but one that can be dealt with as we
"know" the numberType of the data element value.

PPS.  This is a very similar issue to an earlier discussion re
rounding of coordinates during GML import.  The number of decimal
places should always be an outcome rather than a target of specifying
precision.


On 19 September 2011 09:16, Morten Olav Hansen <mortenoh@xxxxxxxxx> wrote:
>> Yes, this is my point. I am sure (without knowing the details) that
>> there are restrictions on what would be a valid exponent and fraction
>> for a decimal representation of a real number.  If a number with 255
>> digits is stored as text, and whether the values are handled as a
>> double (I think that all values are treated as doubles regardless of
>> whether they are integers or not), this places different restrictions
>> on the number length which we should allow. So, if someone types in an
>> exponent with 200 numbers and 55 decimal points (which we could store
>> as text), would that be a valid double value?
>
> The range of double should be -1.79769313486231570E+308 to
> 1.79769313486231570E+308 (if using 64 bit java I assume..).
>
> There is also BigInteger / BigDecimal that could be used, which support
> even bigger numbers.
>
> That said, this is just what Java has to offer, what DHIS2 supports I
> do not know.
>
> --
> Morten
>

