← Back to team overview

dhis2-devs team mailing list archive

validation of data values and data element value types

 

Hi there,

just wanted to share a tip. We often get inquiries about captured data not
showing up in pivots/analytics. A typical reason for this situation is data
values not being compatible with the value of type of their data elements.

This typically happens because data has been injected directly in the
database through SQL, or stems from the old days where DHIS 2 did not
strictly validate the data type during import (it does now).


As an example: A data element is set to value type "Number". Then textual
data values (e.g. "High", "Partial", "N/A") are imported directly into the
database.

Currently the analytics engine will filter out these values. As a result,
they will not be part of pivots and charts. It is a good idea to clean this
up, either by converting the data values, changing the data element value
type or simply deleting the data values.

This SQL integrity check will count the number of non-numeric data values
which are linked to data elements with numerical value types:


select count(*)
from datavalue dv
where value !~* '^(-?[0-9]+)(\.[0-9]+)?$'
and value is not null
and dv.dataelementid in (
  select dataelementid
  from dataelement de
  where de.valuetype in ('INTEGER', 'INTEGER_POSITIVE', 'INTEGER_NEGATIVE',
'INTEGER_ZERO_OR_POSITIVE', 'NUMBER', 'UNIT_INTERVAL', 'PERCENTAGE') );



This SQL integrity check will reveal the actual invalid values (limited to
1000 might have to be adjusted/run several times):

select de.name as de_name, de.uid as de_uid, de.dataelementid as de_id,
pe.startdate, pe.enddate, pe.periodid as pe_id,
ou.name as ou_name, ou.uid as ou_uid, ou.organisationunitid as ou_id,
dv.categoryoptioncomboid, dv.attributeoptioncomboid, dv.lastupdated,
dv.created, de.valuetype as de_valuetype, dv.value
from datavalue dv
inner join dataelement de on dv.dataelementid=de.dataelementid
inner join period pe on dv.periodid=pe.periodid
inner join organisationunit ou on dv.sourceid=ou.organisationunitid
where value !~* '^(-?[0-9]+)(\.[0-9]+)?$'
and value is not null
and dv.dataelementid in (
  select dataelementid
  from dataelement de
  where de.valuetype in ('INTEGER', 'INTEGER_POSITIVE', 'INTEGER_NEGATIVE',
'INTEGER_ZERO_OR_POSITIVE', 'NUMBER', 'UNIT_INTERVAL', 'PERCENTAGE') )
limit 1000;


The DHIS 2 team is working on performance improvements which require that
this is in order, so it's a good idea to check this for your systems and
potentially clean it up.


best regards,

Lars

-- 
Lars Helge Øverland
Lead developer, DHIS 2
University of Oslo
Skype: larshelgeoverland
lars@xxxxxxxxx
http://www.dhis2.org <https://www.dhis2.org/>