
dhis2-devs team mailing list archive

Re: Regular expressions in data validation rules

 

Hi

There are four places one could use these regexes:
1. in the browser - client-side validation
2. in the framework actions/interceptors (http://struts.apache.org/2.1.8.1/docs/validation.html)
3. in the object persist methods
4. in after-the-fact validation checks.

There are lots of examples of regex validation using JavaScript, so there is
not much to say about 1.

Regarding 2, it is a natural way to proceed, but it won't affect import,
which doesn't use the web interface.

Regarding 3, we do need to be aware of those places where we bypass the
object model.  But where the object model is being used, it is not difficult
to validate with a regex on save.  Of course, we first have to find the
corresponding regex, and that is really the first problem to solve: where
the regex lives within the model.
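
To make option 3 concrete, here is a minimal sketch of validating a name
against a regex before saving. Everything in it is an assumption for
illustration: the class, the method names, and the idea of looking rules up
by a plain type name are invented here and are not taken from the DHIS2
code base. It only shows the general shape using java.util.regex.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these class or method names exist in DHIS2.
final class SaveValidationSketch {

    // Assumed lookup: one name rule per object type, keyed by a plain type name.
    private static final Map<String, Pattern> RULES = new HashMap<>();

    static void registerRule(String typeName, String regex) {
        RULES.put(typeName, Pattern.compile(regex));
    }

    // True when no rule is registered for the type, or the whole name matches it.
    static boolean isValid(String typeName, String name) {
        Pattern rule = RULES.get(typeName);
        return rule == null || rule.matcher(name).matches();
    }

    // Stand-in for a persist method: validate first, then hand off to storage.
    static void save(String typeName, String name) {
        if (!isValid(typeName, name)) {
            throw new IllegalArgumentException(
                "Name '" + name + "' violates the rule for " + typeName);
        }
        // The real persistence call would go here.
    }

    public static void main(String[] args) {
        // Example rule: no leading or trailing whitespace.
        registerRule("organisationunit", "\\S(.*\\S)?");
        save("organisationunit", "lu Lusaka");
        System.out.println(isValid("organisationunit", "lu Lusaka "));
    }
}
```

Note that matches() tests the whole string, so the rule describes what a
valid name looks like rather than what an invalid one contains. Any code
path that bypasses save() would of course bypass the check as well.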

Leaving values out of the picture for a while, it might make sense to start
with names.  We have many named objects, and the way we name them is
frequently very important, as the names also act as primary identifiers.  We
need somehow to add a class-wide string regex field for descendants of
NamedObjects (you might want two - one for name and one for shortName, but
maybe start with name).  This way the regex would be available to clients of
orgunit, dataelement, category, etc.
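
One way the class-wide regex could look is sketched below. The class names
NamedObjectSketch and OrgUnitSketch, the nameRule accessor, and the setter
are all hypothetical stand-ins, not existing DHIS2 classes. The example rule
reuses the prefix list from the Zambia query quoted further down in this
thread and additionally rejects trailing whitespace.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the shared base class of named objects.
abstract class NamedObjectSketch {
    private final String name;

    NamedObjectSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Each subclass exposes the single rule shared by all of its instances.
    abstract Pattern nameRule();

    // The whole name must match the class-wide rule.
    boolean hasValidName() {
        return nameRule().matcher(name).matches();
    }
}

// Hypothetical organisation unit with one rule for the whole class.
final class OrgUnitSketch extends NamedObjectSketch {
    // Default rule accepts any non-empty name; a local setup would replace it.
    private static Pattern classNameRule = Pattern.compile(".+");

    static void setNameRule(String regex) {
        classNameRule = Pattern.compile(regex);
    }

    OrgUnitSketch(String name) {
        super(name);
    }

    @Override
    Pattern nameRule() {
        return classNameRule;
    }

    public static void main(String[] args) {
        setNameRule("(ce|co|ea|ls|lu|no|nw|so|we) \\S(.*\\S)?");
        System.out.println(new OrgUnitSketch("lu Lusaka").hasValidName());
        System.out.println(new OrgUnitSketch("Lusaka").hasValidName());
    }
}
```

Keeping the rule in a static field makes it visible to any client of the
class, which is the point made above. Where the rule would be loaded from
(database, configuration) is left open here.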

On importing from XML, it is very natural and easy to do regular expression
based validation using something like Schematron, which can validate against
any XPath expression - but regex is only available in XPath 2.0, which means
using Saxon, and there are some concerns about introducing a Saxon
dependency.  (We might re-look at that.)  There is also another reason to
perhaps not use regex validation on dataValues: it will slow things down
enormously for large imports.

It is also possible to do regular expression matching at the schema level
(using either RelaxNG or XSD) and validate via the schema.  This might be the
most viable way to go, though it would imply that the Zambia dxf schema would
have slightly different constraints from, say, the Tajik one.  These schema
variations would have to be auto-generated somehow based on the local
database.
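
As a rough illustration of the schema-level route, the sketch below
generates a tiny XSD with an xs:pattern facet from a locally supplied regex
and validates a document against it using the JDK's javax.xml.validation
API. The single <name> element and the generated schema are invented for
the example and are not the real dxf schema. The regex is pasted into the
schema text unescaped, so this assumes it contains no quotes, '<' or '&'.
XSD patterns are implicitly anchored to the whole value, so no ^ or $ is
needed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: the schema shape is made up, not the dxf schema.
final class SchemaPatternSketch {

    // Generates and compiles a schema whose only constraint is a pattern on <name>.
    static Schema compile(String nameRegex) {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='name'>"
            + "<xs:simpleType><xs:restriction base='xs:string'>"
            + "<xs:pattern value='" + nameRegex + "'/>"
            + "</xs:restriction></xs:simpleType>"
            + "</xs:element></xs:schema>";
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Could not build schema", e);
        }
    }

    // Returns false when the document violates the schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The regex would come from the local database in a real setup.
        Schema zambia = compile("(ce|co|ea|ls|lu|no|nw|so|we) \\S(.*\\S)?");
        System.out.println(isValid(zambia, "<name>lu Lusaka</name>"));
        System.out.println(isValid(zambia, "<name>Lusaka</name>"));
    }
}
```

A Zambian and a Tajik installation would call compile() with different
regexes, which is the per-database schema variation described above.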

Regards
Bob

On 8 February 2010 08:57, Jason Pickering <jason.p.pickering@xxxxxxxxx> wrote:

> Hi Murod,
>
> This, of course, is one particular trivial example and was provided
> to illustrate a point.
>
> I totally agree, this particular example could be solved through
> JavaScript validation on the client, and it may already be there in
> 2.0. I have found this particular example by importing data from 1.4,
> where organization units are allowed to have trailing spaces. I think
> this is not really a one-off issue, as many people may need to import
> data from external systems, which may or may not have this particular
> validation enforced.
>
> What I am trying to get at is that regular expressions could be used
> to expand the scope of the current data integrity checks, by enforcing
> certain patterns on the data (which in some cases could also be
> enforced through JavaScript in the UI). Of course, if we can
> do it at the UI level, great, but it may not work in all cases,
> especially when receiving data from external systems. This is where I
> think the data integrity checks come in. For instance, as I
> mentioned in the specs, I need to find all organizational units that
> do not correspond to the naming conventions here in Zambia. I can do
> this with this...
>
> SELECT name from organisationunit where name !~
> '^(ce|co|ea|ls|lu|no|nw|so|we) '
>
> Well, I found 47 that do not correspond to the naming convention. I
> have made my dislike of the supposed best practice naming conventions
> clear in earlier threads, but with the implementation of regex checking
> of these conventions, at least we could enforce them, even if it is ex
> post facto.
>
> Again, these are all examples, and it is really impossible to
> predict what the rules may be, thus the need for flexible rules, built by
> administrators/users, and then applied during data integrity checks
> (and/or during data entry).
>
>
>
> Regards,
> Jason
>
>
>
>
> On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov
> <murodlatifov@xxxxxxxxx> wrote:
> > Hi Jason,
> >
> > Looks like a one-time task, if I understood you correctly? You want to
> > clean data already in the database, like data integrity checking. Why not
> > make it clean at the very beginning, when a particular record is being
> > captured? For this one could use a regexp in JavaScript on the client side
> > too. As for leading and trailing spaces, " string    ".trim() should do
> > before passing to the database.
> >
> > regards,
> > murod
> >
> >
> >
> > ----- Original Message ----
> > From: Jason Pickering <jason.p.pickering@xxxxxxxxx>
> > To: Hieu Dang Duy <hieu.hispvietnam@xxxxxxxxx>
> > Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
> > Sent: Mon, February 8, 2010 1:05:27 PM
> > Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
> >
> > Hi Hieu,
> > Yes, I am actively fishing for a developer to implement this, as it
> > will really save me a huge amount of work in trying to clean up data.
> >
> > I have no idea really how it would be implemented, other than that
> > java.util.regex should be able to be used, but let me give it a try at
> > a better specification. I do not think it should be so difficult
> > either.
> >
> > I am thinking of something like this....
> >
> > The user would create a regular expression for later assignment to a
> > database object. The user would select a database table (object) and
> > field for validation. For instance, let's say we want to validate that
> > there are no trailing spaces in an organization name.
> >
> > So, we would create a rule called "Trailing spaces are not allowed"
> >
> > We would create this rule, and assign a description and a regular
> > expression to it.
> >
> > in this case, it would probably be something really simple like '\s+$'
> >
> > Now, I have no idea how to do this in Java, but I assume this would be
> > really simple, something like this query in PostgreSQL.
> >
> > SELECT name from organisationunit where name ~*('\s+$')
> >
> > Wow, I found 571 orgunits in my organisationunit table with trailing
> > spaces. Cool.
> >
> > So, I think we need two objects.
> >
> > 1) A persistence object that stores the following fields for the
> > RegexExpression
> >
> > a) regexid
> > b) name
> > c) expression
> > d) description
> > e) resolution description (telling the user how to solve this problem)
> >
> > 2) A table to assign regular expressions to database objects.
> >
> > a) regexid
> > b) table
> > c) field
> >
> > We could maybe reuse this rule on the datavalue table, to determine if
> > any values have been stored with trailing spaces.
> >
> > Yeah, it's very easy I think. I would do it myself if I knew a lick of
> > Java. :)
> >
> > Best regards,
> > Jason
> >
> >
> > On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy
> > <hieu.hispvietnam@xxxxxxxxx> wrote:
> >> Hi all,
> >>
> >> I have no concrete idea about using regex for validating data in DHIS2.
> >> Just a small comment: I use regex a lot, and my feeling is that applying
> >> it in code, i.e. in JavaScript and Java, is not easy but not too
> >> difficult either.
> >> With regex, we can easily control anything we want to force the user to
> >> enter, whether data (text, numbers) or something else (a file name, for
> >> example).
> >> Let's try!
> >>
> >> Thanks !
> >>
> >> On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering
> >> <jason.p.pickering@xxxxxxxxx> wrote:
> >>>
> >>> https://blueprints.launchpad.net/dhis2/+spec/regex-validation
> >>>
> >>> I have updated the blueprint on regular expression use in data
> >>> validation rules. This would really make my life (and, I suspect,
> >>> others' lives) a lot easier. As long as we are using naming
> >>> conventions, let's at least enforce them somehow.
> >>>
> >>> For discussion.
> >>>
> >>> Jason
> >>>
> >>> _______________________________________________
> >>> Mailing list: https://launchpad.net/~dhis2-devs
> >>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> >>> Unsubscribe : https://launchpad.net/~dhis2-devs
> >>> More help   : https://help.launchpad.net/ListHelp
> >>
> >>
> >>
> >> --
> >> Hieu.HISPVietnam
> >> Good Health !
> >>
> >
>
