
dhis2-devs team mailing list archive

Re: [Branch ~dhis2-devs-core/dhis2/trunk] Rev 4611: added proper validation for real/natural numbers (including support for e-notation)

 

We have sort of wandered away from the original discussion, but of
course, you raise a good point, Bob. We sometimes get somewhat
unexpected results with particularly low values, e.g. 0.04, which
DHIS treats the same as 0.0. So, we maybe can use a "factor" to present
the number as 0.04*1000, but it seems the epi folks have particular
ways of representing certain indicators, e.g. as a straight rate or
per 100,000, and the problem is that we do not always know, prima
facie, which one applies, so we cannot always set the indicator to
have the correct "factor". So, we have to double-check what the data
mart gives us to be sure that the number really is a zero. The
difference between 0.0 and 0.04 seems to be significant to folks who
are looking closely.
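
To illustrate the effect described above, here is a minimal Java sketch.
It assumes one-decimal, half-up rounding at display time; the class and
the display() helper are hypothetical and are not taken from the DHIS
code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class IndicatorFactorSketch {

    // Hypothetical helper: scale a raw ratio by an indicator factor,
    // then round to one decimal place (assumed display precision).
    static BigDecimal display(String rawValue, int factor) {
        return new BigDecimal(rawValue)
            .multiply(BigDecimal.valueOf(factor))
            .setScale(1, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // With a factor of 1 the small value collapses to zero.
        System.out.println(display("0.04", 1));
        // With a factor of 1000 the same value stays visible.
        System.out.println(display("0.04", 1000));
    }
}
```

The first call prints 0.0 and the second prints 40.0, which is why the
choice of factor matters for low rates.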

Anyway, my original point here was that I think having numbers stored
as text is fine, just as long as we are very robust with how we store
them. Not that I think anyone is going to input a number greater than
1e308 (we could not store it anyway since we only allow 255
characters) but it is better to check and be sure that it does not
happen.
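
A minimal sketch of such a check in Java, assuming values arrive as text
and are later handled as doubles. The class name, the method name and
the MAX_LENGTH constant are hypothetical; only the 255-character limit
comes from the discussion above. Note that Double.parseDouble is more
permissive than a strict validator (it also accepts hexadecimal forms
and a trailing "d" or "f"), so this only covers the range check.

```java
public class StoredNumberCheck {

    // Storage limit mentioned in this thread.
    static final int MAX_LENGTH = 255;

    // Returns true if the text fits the column and parses to a finite double.
    static boolean isStorableDouble(String text) {
        if (text == null || text.isEmpty() || text.length() > MAX_LENGTH) {
            return false;
        }
        try {
            double value = Double.parseDouble(text);
            // Out-of-range input such as "1e309" parses to Infinity
            // instead of throwing, so it has to be rejected explicitly.
            return !Double.isNaN(value) && !Double.isInfinite(value);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isStorableDouble("1e308"));
        System.out.println(isStorableDouble("1e309"));
        System.out.println(isStorableDouble("abc"));
    }
}
```

Very small magnitudes (for example "1e-400") underflow to 0.0 and pass
this check, so a stricter policy may want to reject those as well.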

Regards,
Jason


On Mon, Sep 19, 2011 at 11:31 AM, Bob Jolliffe <bobjolliffe@xxxxxxxxx> wrote:
> Perhaps what is more important than being able to support a count of
> the number of stars in the galaxy is how we treat these numbers once
> we have them captured, at whatever precision that might be.
>
> Currently I think our notion of precision is a bit weak.  In the
> datamart service for indicators, for example, we seem to have a fixed
> notion of precision which is based on decimal places - from my reading
> of the code, it seems we store accurate to 1 decimal place.
>
> What we probably should be doing is maintaining some confidence level
> of significant figures.  This becomes quite obvious if we start
> inputting and storing values in scientific notation - those of us old
> enough to have used a slide rule will be familiar with this :-).
>
> So if I have a numerator (eg malaria cases) of 5436 and a denominator
> (eg population) of 155000, then what can I say about the indicator
> value?  Well if I calculate on my calculator I get:
> 0.035070968
>
> but obviously I am not confident in all those digits.  But if my
> numerator is accurate to 4 significant figures and my denominator is
> accurate to 3, then I can be confident to 2 significant figures in
> my result; i.e. I can report the value as:
> 0.035
>
> I am not sure what the best strategy of managing precision in dhis
> should be, but it does strike me, for a system concerned with
> aggregation, we should attempt to attack it a bit more rigorously than
> we do.  What this probably requires, at the point of capture, is to
> capture the precision of the number, particularly where we know we are
> capturing an estimate eg. as a result of rounding.  This is done
> implicitly when using scientific notation.  The problem is more
> visible when we capture a string like "155000".  How precise is that?
> Well, we don't actually know.  Intuitively we suspect it's not accurate
> to 6 significant figures, and that it's accurate to at least 3.  But it
> could be 4 (e.g. 1.550E5).
>
> Maybe it's just me that worries a bit about these things.  Does anyone
> else have a sense that it is important to be able to indicate the
> precision of calculated indicator values?
>
> Bob
>
> PS. Storing natural number 'counts' as a floating point number
> introduces some untidiness here, but one that can be dealt with as we
> "know" the numberType of the data element value.
>
> PPS.  This is a very similar issue to an earlier discussion re
> rounding of coordinates during GML import.  The number of decimal
> places should always be an outcome rather than a target of specifying
> precision.
>
>
> On 19 September 2011 09:16, Morten Olav Hansen <mortenoh@xxxxxxxxx> wrote:
>>> Yes, this is my point. I am sure (without knowing the details) that
>>> there are restrictions on what would be a valid exponent and fraction
>>> for a decimal representation of a real number.  Whether a number with
>>> 255 digits is stored as text or handled as a double (I think that
>>> all values are treated as doubles regardless of whether they are
>>> integers or not) places different restrictions on the number length
>>> which we should allow. So, if someone types in a number with 200
>>> digits and 55 decimal places (which we could store as text), would
>>> that be a valid double value?
>>
>> The range of double should be -1.79769313486231570E+308 to
>> 1.79769313486231570E+308 (if using 64 bit java I assume..).
>>
>> There is also BigInteger / BigDecimal that could be used, which
>> support even bigger numbers.
>>
>> That said, this is just what Java has to offer, what DHIS2 supports I
>> do not know.
>>
>> --
>> Morten
>>
>
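
Bob's worked example in the quoted message can be sketched in Java with
BigDecimal, which keeps the digits exactly as they were written. This is
only an illustration: the class and both helper methods are
hypothetical, and the number of significant figures to keep is passed in
as a parameter because the thread does not settle on a rule. Bob reports
two figures; the common rule of thumb of keeping as many figures as the
least precise operand would give three.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class SignificantFiguresSketch {

    // Number of digits carried by the value as it was typed.
    static int writtenPrecision(String text) {
        return new BigDecimal(text).precision();
    }

    // Divide and round the result to the requested significant figures.
    static BigDecimal ratio(String numerator, String denominator, int sigFigs) {
        MathContext mc = new MathContext(sigFigs, RoundingMode.HALF_UP);
        return new BigDecimal(numerator).divide(new BigDecimal(denominator), mc);
    }

    public static void main(String[] args) {
        // Plain digits claim 6 figures; e-notation states the precision.
        System.out.println(writtenPrecision("155000"));
        System.out.println(writtenPrecision("1.55E5"));
        System.out.println(writtenPrecision("1.550E5"));

        // Malaria cases over population from the example above.
        System.out.println(ratio("5436", "1.55E5", 2));
        System.out.println(ratio("5436", "1.55E5", 3));
    }
}
```

The precisions print as 6, 3 and 4, and the two ratios print as 0.035
and 0.0351. The number of decimal places falls out of the chosen
significant figures rather than being fixed in advance.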

