dhis2-devs team mailing list archive

Re: Regular expressions in data validation rules

 

Hi Jason,

OK, I understand your point. If you are talking about import validation, then it is a good idea, as there is no proper XML validator for DHIS data exchange; I mean that the content is not validated against any dictionary or repository when the source is something other than DHIS itself. Kettle is one ETL tool with built-in regexp support. You can use it for one-off transformation, validation and similar tasks: http://kettle.pentaho.org/. It supports many different input and output formats.

Regards,
murod



----- Original Message ----
From: Jason Pickering <jason.p.pickering@xxxxxxxxx>
To: Murodullo Latifov <murodlatifov@xxxxxxxxx>
Cc: Hieu Dang Duy <hieu.hispvietnam@xxxxxxxxx>; dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
Sent: Mon, February 8, 2010 2:27:13 PM
Subject: Re: [Dhis2-devs] Regular expressions in data validation rules

Hi Murod,

This, of course, is one particular trivial example and was provided
to illustrate a point.

I totally agree, this particular example could be solved through
JavaScript validation on the client, and it may already be there in
2.0. I have found this particular example by importing data from 1.4,
where organization units are allowed to have trailing spaces. I think
this is not really a one-off issue, as many people may need to import
data from external systems, which may or may not have this particular
validation enforced.

What I am trying to get at is that regular expressions could be used
to expand the scope of the current data integrity checks, by enforcing
certain patterns on the data (which in some cases could also be
enforced through JavaScript in the UI). Of course, if we can
do it at the UI level, great, but it may not work in all cases,
especially when receiving data from external systems. This is where I
think the data integrity checks come into play. For instance, as I
mentioned in the specs, I need to find all organizational units that
do not correspond to the naming conventions here in Zambia. I can do
this with this...

SELECT name FROM organisationunit WHERE name !~ '^(ce|co|ea|ls|lu|no|nw|so|we) ';

Well, I found 47 that do not correspond to the naming convention. I
have made my dislike of the supposed best-practice naming conventions
known in earlier threads, but with the implementation of regex checks
for these conventions, at least we could enforce them, even if it is ex
post facto.
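
For illustration, the same check as the query above could be sketched in
Java with java.util.regex. This is only a rough sketch under assumptions:
the class and method names are hypothetical and not part of DHIS2, and the
names are passed in as a plain list rather than read from the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing DHIS2 class.
final class NamingConventionCheck {
    // Same pattern as the SQL query: one of the two-letter prefixes, then a space.
    // PostgreSQL's !~ is case-sensitive, so no CASE_INSENSITIVE flag is set here.
    static final Pattern PREFIX = Pattern.compile("^(ce|co|ea|ls|lu|no|nw|so|we) ");

    // Returns the names that do NOT start with an allowed prefix, in input order.
    static List<String> findViolations(List<String> names) {
        List<String> violations = new ArrayList<>();
        for (String name : names) {
            // find() searches anywhere in the input; the ^ anchor pins it to the start.
            if (!PREFIX.matcher(name).find()) {
                violations.add(name);
            }
        }
        return violations;
    }
}
```

Calling NamingConventionCheck.findViolations(...) on a list of organisation
unit names would return only the names lacking one of the prefixes, which is
what the !~ operator does in the query.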

Again, these are all examples, and it is really impossible to
predict what such rules may be; thus the need for flexible rules, built by
administrators/users, and then applied during data integrity checks
(and/or during data entry).
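
As a rough sketch of what such an administrator-defined rule might look
like in Java, assuming java.util.regex is used: the RegexRule class below is
hypothetical and not an existing DHIS2 type. It carries the name,
expression, description and resolution text proposed in the earlier message
quoted below, and it assumes the expression describes the unwanted form of
a value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class, not an existing DHIS2 type: one user-defined rule.
final class RegexRule {
    final String name;
    final String description;
    final String resolution;  // tells the user how to fix a violation
    final Pattern pattern;    // assumption: describes the *unwanted* form of a value

    RegexRule(String name, String expression, String description, String resolution) {
        this.name = name;
        this.description = description;
        this.resolution = resolution;
        this.pattern = Pattern.compile(expression);
    }

    // A value violates the rule when the pattern is found anywhere in it.
    boolean isViolatedBy(String value) {
        return value != null && pattern.matcher(value).find();
    }

    // Returns the values that violate this rule, in input order.
    List<String> check(List<String> values) {
        List<String> violations = new ArrayList<>();
        for (String value : values) {
            if (isViolatedBy(value)) {
                violations.add(value);
            }
        }
        return violations;
    }
}
```

A rule such as new RegexRule("Trailing spaces are not allowed", "\\s+$", ...)
would then flag any value ending in whitespace. A "must match" convention
can be written in the same violation-oriented form with a negative
lookahead, for example "^(?!(ce|co|ea|ls|lu|no|nw|so|we) )".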



Regards,
Jason




On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov
<murodlatifov@xxxxxxxxx> wrote:
> Hi Jason,
>
> It looks like a one-time task, if I understood you correctly: you want to clean data already in the database, like data integrity checking. Why not make the data clean at the very beginning, when a particular record is being captured? For this one could use regexp in JavaScript on the client side too. As for leading and trailing spaces, trimming the string, e.g. " string    ".trim(), should do before passing it to the database.
>
> regards,
> murod
>
>
>
> ----- Original Message ----
> From: Jason Pickering <jason.p.pickering@xxxxxxxxx>
> To: Hieu Dang Duy <hieu.hispvietnam@xxxxxxxxx>
> Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
> Sent: Mon, February 8, 2010 1:05:27 PM
> Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
>
> Hi Hieu,
> Yes, I am actively fishing for a developer to implement this, as it
> will really save me a huge amount of work in trying to clean up data.
>
> I have no real idea how it would be implemented, other than that
> java.util.regex should be usable, but let me try to give
> a better specification. I do not think it should be so difficult
> either.
>
> I am thinking of something like this....
>
> The user would create a regular expression for later assignment to a
> database object. The user would select a database table (object) and
> field for validation. For instance, let's say we want to validate that
> there are no trailing spaces in an organization name.
>
> So, we would create a rule called "Trailing spaces are not allowed"
>
> We would create this rule, and assign a description and a regular
> expression to it.
>
> In this case, it would probably be something really simple like '\s+$'
>
> Now, I have no idea how to do this in Java, but I assume this would be
> really simple, something like this query in PostgreSQL.
>
> SELECT name FROM organisationunit WHERE name ~* '\s+$';
>
> Wow, I found 571 orgunits in my organisationunit table with trailing
> spaces. Cool.
>
> So, I think we need two objects.
>
> 1) A persistence object that stores the following fields for the
> RegexExpression
>
> a) regexid
> b) name
> c) expression
> d) description
> e) resolution description (telling the user how to solve this problem)
>
> 2) A table to assign regular expressions to database objects.
>
> a) regexid
> b) table
> c) field
>
> We could maybe reuse this rule on the datavalue table, to determine if
> any values have been stored with trailing spaces.
>
> Yeah, it's very easy I think. I would do it myself if I knew a lick of Java. :)
>
> Best regards,
> Jason
>
>
> On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy
> <hieu.hispvietnam@xxxxxxxxx> wrote:
>> Hi all,
>>
>> I have no idea about using RegEx for validating data in DHIS2. Just a small
>> comment: I have used RegEx many times, and my feeling is that applying it in
>> your code (i.e., JavaScript and Java) is not easy, but not too difficult
>> either.
>> With RegEx, we can easily control whatever we want to force the user to
>> follow when entering data (text, numbers) or something else (a file name,
>> for example).
>> Let's try!
>>
>> Thanks!
>>
>> On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering
>> <jason.p.pickering@xxxxxxxxx> wrote:
>>>
>>> https://blueprints.launchpad.net/dhis2/+spec/regex-validation
>>>
>>> I have updated the blueprint on regular expression use in data
>>> validation rules. This would really make my life (and, I suspect,
>>> others' lives) a lot easier. As long as we are using naming
>>> conventions, let's at least enforce them somehow.
>>>
>>> For discussion.
>>>
>>> Jason
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~dhis2-devs
>>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
>>> Unsubscribe : https://launchpad.net/~dhis2-devs
>>> More help   : https://help.launchpad.net/ListHelp
>>
>>
>>
>> --
>> Hieu.HISPVietnam
>> Good Health !
>>
>