dhis2-devs team mailing list archive

Re: Regular expressions in data validation rules

 

I very much agree with both Jason and Bob that it is useful to be able to
put checks (or not) at various levels, depending on use cases (and users).
Ideally, a common set of rules could be applied, as Jason says.

Which rules to switch on at what levels should be up to an administrator, or
in some cases even the end user. There is no one size fits all here, but
hopefully we can describe scenarios and best practices in the documentation
and training material, without tying everyone into the same mould.

Knut

On Mon, Feb 8, 2010 at 12:43 PM, Jason Pickering <
jason.p.pickering@xxxxxxxxx> wrote:

> Very good points. I was thinking, initially at least, to start with 4
> and 1, in that order.
>
> There are already many checks in place in the UI, but somehow
> it feels like it should be possible to extend them and make them more
> generic, to suit a particular implementation's needs. Could the rules
> defined in the data integrity checks be reused at the UI level (and
> at other levels)? It feels like it is possible, although there may be
> complications due to different regex flavors. The fourth alternative
> seems like a quick win.
>
> Data integrity checks serve a useful purpose by allowing people to
> enter some data, even if it may not be 100% correct. This is a
> property of HMIS systems that I think we all face, namely that some
> information is better than no information at all. For instance, you
> can enter values beyond the min/max values, but there are checks there
> to warn you. The same could be said of the functionality of regular
> expressions in the data validation process: allow people to enter
> data, even though it may not be entirely correct (e.g. it does not
> follow the country's naming conventions, or includes decimal places
> where there should not be any). Each of these rules is often highly
> specific to an implementation. Placing regular expressions in the data
> integrity checks as a start would seem fairly simple to implement, and
> would offer some quick wins for better data quality.
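To make the "warn, don't block" idea concrete, here is a minimal Java sketch of a post-hoc integrity check for one of the examples above: decimal places in values that should be whole numbers. The class name and the `-?\d+` rule are assumptions for illustration, not existing DHIS2 code. Stored values are only reported for human review, never rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical post-hoc check: the values are already in the database, we only report them.
final class IntegerValueCheck {
    // Assumed rule for integer-only data elements: optional minus sign, then digits only.
    private static final Pattern WHOLE_NUMBER = Pattern.compile("-?\\d+");

    /** Returns the stored values that do not look like whole numbers (e.g. "12.5"). */
    static List<String> findNonIntegerValues(List<String> storedValues) {
        List<String> violations = new ArrayList<>();
        for (String value : storedValues) {
            // matches() requires the entire string to match, so "7.0" is flagged.
            if (!WHOLE_NUMBER.matcher(value).matches()) {
                violations.add(value);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String> values = List.of("12", "7.0", "-3", "4.25");
        System.out.println("Flagged for review: " + findNonIntegerValues(values)); // Flagged for review: [7.0, 4.25]
    }
}
```

The check returns a list instead of throwing, which keeps data entry permissive while still surfacing the values a person should look at.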
>
> I agree that intercepting problems at the import level is important,
> but as Bob highlights, it is costly in terms of processing. On a
> personal level, I tend to want to get the data into the DB first, and
> then try to clean it up, rather than trying to analyze all the
> possible problems prior to a data import. I think there are good
> arguments both ways, but in many cases, unless we do the import
> ourselves, we have no control over whether imported data has been
> properly imported or not. 90% of the time here in Zambia, imported data
> is pretty good, but the remaining 10% can often be resolved most
> efficiently only by a human, at least when one thinks about the code
> required to try to correct every single issue that may arise from a
> particular naming convention, and whether someone follows it or not.
>
> Regards,
> Jason
>
>
> On Mon, Feb 8, 2010 at 12:34 PM, Bob Jolliffe <bobjolliffe@xxxxxxxxx>
> wrote:
> > Hi
> >
> > There are 4 places where one could use these regexes:
> > 1. in the browser - client side validation
> > 2. in the framework action/interceptors
> > (http://struts.apache.org/2.1.8.1/docs/validation.html)
> > 3. in the object persist methods
> > 4. post fact validation checks.
> >
> > There are lots of examples of validation with regex using JavaScript.
> > Not much to say.
> >
> > Regarding 2, it is a natural way to proceed, but it won't affect import,
> > which doesn't use the web interface.
> >
> > Regarding 3, we do need to be aware of those places where we bypass the
> > object model. But where the object model is being used, it is not
> > difficult to validate with a regex on save. Of course, we have to find
> > the corresponding regex. That is really the first problem to solve:
> > where to find the regex within the model.
> >
> > Leaving values out of the picture for a while, it might make sense to
> > start with names. We have many named objects, and the way we name them
> > is frequently very important, as the names also act as primary
> > identifiers. We somehow need to add a class-wide string regex field for
> > descendants of NamedObjects (you might want two, one for name and one
> > for shortName, but maybe start with name). This way the regex should be
> > available to clients of orgunit, dataelement, category, etc.
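A minimal sketch of the class-wide name regex described above, checked on save. `NamedObject`, `OrgUnit`, `DataElement`, `setNamePattern` and `validateName` are hypothetical stand-ins, not the actual DHIS2 classes. The per-class pattern is kept in a registry keyed by class so that it could, by assumption, be loaded from the local database at runtime; the example pattern reuses the Zambian province prefixes that appear in the naming-convention query in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Demo class first, so the file runs directly with the single-file source launcher.
final class NamedObjectDemo {
    public static void main(String[] args) {
        NamedObject.setNamePattern(OrgUnit.class, "(ce|co|ea|ls|lu|no|nw|so|we) \\S.*");
        new OrgUnit("lu Lusaka District").validateName(); // passes
        try {
            new OrgUnit("Lusaka District").validateName(); // missing province prefix
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        new DataElement("anything goes").validateName(); // no pattern registered for DataElement
    }
}

// Hypothetical base class standing in for the real named-object hierarchy.
abstract class NamedObject {
    // One name pattern per concrete class.
    private static final Map<Class<? extends NamedObject>, Pattern> NAME_PATTERNS = new ConcurrentHashMap<>();

    private final String name;

    protected NamedObject(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    static void setNamePattern(Class<? extends NamedObject> type, String regex) {
        NAME_PATTERNS.put(type, Pattern.compile(regex));
    }

    /** Would be called from a (hypothetical) persist method before saving. */
    void validateName() {
        Pattern pattern = NAME_PATTERNS.get(getClass());
        // No registered pattern means no constraint for this class.
        if (pattern != null && !pattern.matcher(name).matches()) {
            throw new IllegalArgumentException(getClass().getSimpleName() + " name '" + name
                    + "' does not match " + pattern.pattern());
        }
    }
}

final class OrgUnit extends NamedObject {
    OrgUnit(String name) {
        super(name);
    }
}

final class DataElement extends NamedObject {
    DataElement(String name) {
        super(name);
    }
}
```

A second map for `shortName` could be added the same way if that turns out to be needed.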
> >
> > On importing from XML, it is very natural and easy to do regular
> > expression based validation using something like Schematron, which can
> > validate against any XPath expression. But regex is only available in
> > XPath 2, which means using Saxon, and there are some concerns about
> > introducing a Saxon dependency. (We might re-look at that.) There is
> > also another reason to perhaps not use regex validation on dataValues:
> > it will slow things down enormously for large imports.
> >
> > It is also possible to do regular expression matching at the schema
> > level (using either RelaxNG or XSD) and validate via schema. This might
> > be the most viable way to go, though it would imply that the Zambia dxf
> > schema would have slightly different constraints to, say, the Tajik one.
> > And these schema variations would have to be auto-generated somehow
> > based on the local database.
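Schema-level matching can be sketched with the JDK's `javax.xml.validation` API: the XSD below is generated from a pattern string (standing in, by assumption, for one read from the local database) and uses an `xs:pattern` facet. The `orgUnit` element, the `name` attribute and the class name are illustrative assumptions, not the real DXF schema. Note that XSD patterns are implicitly anchored to the whole value, unlike PostgreSQL's `~`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class SchemaPatternDemo {
    /** Builds a tiny XSD whose orgUnit/@name must match the given XSD regex. */
    static Schema schemaFor(String nameRegex) {
        // Escape characters that would break the XML attribute holding the pattern.
        String escaped = nameRegex.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"orgUnit\"><xs:complexType>"
                + "<xs:attribute name=\"name\" use=\"required\"><xs:simpleType>"
                + "<xs:restriction base=\"xs:string\"><xs:pattern value=\"" + escaped + "\"/></xs:restriction>"
                + "</xs:simpleType></xs:attribute>"
                + "</xs:complexType></xs:element>"
                + "</xs:schema>";
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Could not build schema for pattern: " + nameRegex, e);
        }
    }

    /** Returns true if the XML document validates against the schema. */
    static boolean isValid(Schema schema, String xml) {
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema zambia = schemaFor("(ce|co|ea|ls|lu|no|nw|so|we) .*");
        System.out.println(isValid(zambia, "<orgUnit name=\"lu Lusaka District\"/>")); // true
        System.out.println(isValid(zambia, "<orgUnit name=\"Lusaka District\"/>"));    // false
    }
}
```

Generating the schema per deployment is what would let the Zambian and Tajik variants carry different constraints without code changes.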
> >
> > Regards
> > Bob
> >
> > On 8 February 2010 08:57, Jason Pickering <jason.p.pickering@xxxxxxxxx>
> > wrote:
> >>
> >> Hi Murod,
> >>
> >> This, of course, is one particular trivial example and was provided
> >> to illustrate a point.
> >>
> >> I totally agree; this particular example could be solved through
> >> JavaScript validation on the client, and it may already be there in
> >> 2.0. I found this particular example when importing data from 1.4,
> >> where organisation units are allowed to have trailing spaces. I think
> >> this is not really a one-off issue, as many people may need to import
> >> data from external systems, which may or may not have this particular
> >> validation enforced.
> >>
> >> What I am trying to get at is that regular expressions could be used
> >> to expand the scope of the current data integrity checks by enforcing
> >> certain patterns on the data (which in some cases could also be
> >> enforced through JavaScript in the UI). Of course, if we can do it
> >> at the UI level, great, but it may not work in all cases,
> >> especially when receiving data from external systems. This is where I
> >> think the data integrity checks come into play. For instance, as I
> >> mentioned in the specs, I need to find all organisation units that
> >> do not correspond to the naming conventions here in Zambia. I can do
> >> this with the following query:
> >>
> >> SELECT name from organisationunit where name !~
> >> '^(ce|co|ea|ls|lu|no|nw|so|we) '
> >>
> >> Well, I found 47 which do not correspond to the naming convention. I
> >> have made my dislike of the supposed best-practice naming conventions
> >> clear in earlier threads, but with the implementation of regex checks
> >> for these conventions, we could at least enforce them, even if it is
> >> ex post facto.
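The same naming-convention check expressed with `java.util.regex`, as a hedged sketch of what a data integrity check might run instead of raw SQL. The class and method names are hypothetical. `find()` with a leading `^` mirrors the anchored `~` match in the query, and the match is case-sensitive, just as `!~` is.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NamingConventionCheck {
    // Same pattern as the PostgreSQL query: a two-letter province prefix followed by a space.
    private static final Pattern PROVINCE_PREFIX = Pattern.compile("^(ce|co|ea|ls|lu|no|nw|so|we) ");

    /** Rough equivalent of: SELECT name FROM organisationunit WHERE name !~ '^(ce|...|we) ' */
    static List<String> violations(List<String> names) {
        return names.stream()
                .filter(name -> !PROVINCE_PREFIX.matcher(name).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("lu Lusaka District", "Lusaka District", "CE Kabwe", "nw Solwezi");
        System.out.println(violations(names)); // [Lusaka District, CE Kabwe]
    }
}
```

Keeping the expression as data (here a single constant) is what would allow an administrator-defined rule to replace it later.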
> >>
> >> Again, these are all examples, and it is really impossible to
> >> predict what they may be, hence the need for flexible rules, built by
> >> administrators/users, and then applied during data integrity checks
> >> (and/or during data entry).
> >>
> >>
> >>
> >> Regards,
> >> Jason
> >>
> >>
> >>
> >>
> >> On Mon, Feb 8, 2010 at 9:55 AM, Murodullo Latifov
> >> <murodlatifov@xxxxxxxxx> wrote:
> >> > Hi Jason,
> >> >
> >> > Looks like a one-time task, if I understood you correctly? You want
> >> > to clean data already in the database, like data integrity checking.
> >> > Why not make it clean at the very beginning, when a particular record
> >> > is being captured? For this, one could also use regexps in JavaScript
> >> > on the client side. As for leading and trailing spaces,
> >> > " string    ".trim() should do before passing the value to the database.
> >> >
> >> > regards,
> >> > murod
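The suggestion of trimming before saving, sketched in Java. `NameNormalizer` is a hypothetical helper. Collapsing internal runs of whitespace goes one step beyond the original suggestion and is an assumption about what a naming convention would want.

```java
// Hypothetical helper applied to a captured name before it is passed to the database.
final class NameNormalizer {
    /** Removes leading/trailing whitespace and collapses internal whitespace runs to one space. */
    static String normalize(String raw) {
        // strip() (Java 11+) removes Unicode whitespace at both ends; trim() only removes chars <= U+0020.
        return raw.strip().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  lu   Lusaka District    ") + "]"); // [lu Lusaka District]
    }
}
```

Normalizing at capture time prevents new bad names, but it does nothing for data that arrives through import, which is why the thread also considers after-the-fact checks.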
> >> >
> >> >
> >> >
> >> > ----- Original Message ----
> >> > From: Jason Pickering <jason.p.pickering@xxxxxxxxx>
> >> > To: Hieu Dang Duy <hieu.hispvietnam@xxxxxxxxx>
> >> > Cc: dhis2-devs <dhis2-devs@xxxxxxxxxxxxxxxxxxx>
> >> > Sent: Mon, February 8, 2010 1:05:27 PM
> >> > Subject: Re: [Dhis2-devs] Regular expressions in data validation rules
> >> >
> >> > Hi Hieu,
> >> > Yes, I am actively fishing for a developer to implement this, as it
> >> > will really save me a huge amount of work in trying to clean up data.
> >> >
> >> > I have no idea really how it would be implemented, other than that
> >> > java.util.regex should be able to be used, but let me give it a try at
> >> > a better specification. I do not think it should be so difficult
> >> > either.
> >> >
> >> > I am thinking of something like this....
> >> >
> >> > The user would create a regular expression for later assignment to a
> >> > database object. The user would select a database table (object) and
> >> > field for validation. For instance, let's say we want to validate that
> >> > there are no trailing spaces in an organization name.
> >> >
> >> > So, we would create a rule called "Trailing spaces are not allowed"
> >> >
> >> > We would create this rule, and assign a description and a regular
> >> > expression to it.
> >> >
> >> > In this case, it would probably be something really simple like '\s+$'
> >> >
> >> > Now, I have no idea how to do this in Java, but I assume this would be
> >> > really simple, something like this query in PostgreSQL:
> >> >
> >> > SELECT name from organisationunit where name ~*('\s+$')
> >> >
> >> > Wow, I found 571 orgunits in my organisationunit table with trailing
> >> > spaces. Cool.
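A rough `java.util.regex` equivalent of the `~* '\s+$'` query, illustrating the regex-flavor differences raised in this thread: the backslash has to be doubled in a Java string literal, and `find()` (search anywhere) rather than `matches()` (whole string) corresponds to PostgreSQL's `~`. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TrailingSpaceCheck {
    // PostgreSQL '\s+$' becomes "\\s+$" in a Java string literal.
    // Case-insensitivity (~*) is irrelevant for a whitespace-only pattern, so no flag is set.
    private static final Pattern TRAILING_WHITESPACE = Pattern.compile("\\s+$");

    static boolean hasTrailingWhitespace(String value) {
        // find() searches within the string, like ~; matches() would require the whole value to be whitespace.
        return TRAILING_WHITESPACE.matcher(value).find();
    }

    /** Rough equivalent of: SELECT name FROM organisationunit WHERE name ~* '\s+$' */
    static List<String> offenders(List<String> names) {
        return names.stream()
                .filter(TrailingSpaceCheck::hasTrailingWhitespace)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Kabwe ", "Ndola", "Kitwe\t");
        offenders(names).forEach(name -> System.out.println("'" + name + "'"));
    }
}
```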
> >> >
> >> > So, I think we need two objects.
> >> >
> >> > 1) A persistence object that stores the following fields for the
> >> > RegexExpression
> >> >
> >> > a) regexid
> >> > b) name
> >> > c) expression
> >> > d) description
> >> > e) resolution description (telling the user how to solve this problem)
> >> >
> >> > 2) A table to assign regular expressions to database objects.
> >> >
> >> > a) regexid
> >> > b) table
> >> > c) field
> >> >
> >> > We could maybe reuse this rule on the datavalue table, to determine if
> >> > any values have been stored with trailing spaces.
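A minimal in-memory sketch of the two proposed tables and of a check that runs each assigned rule. `RegexRule`, `RegexAssignment` and `RegexIntegrityCheck` are hypothetical names; rows are plain maps standing in for database tables; and, following the trailing-space example, a regex match is assumed to mean a violation. A real implementation would also need rules where a non-match is the violation, as in the naming-convention query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class RegexIntegrityCheck {
    /** Runs every assigned rule; rowsByTable maps a table name to its rows (field name -> value). */
    static List<String> run(Map<Integer, RegexRule> rules, List<RegexAssignment> assignments,
                            Map<String, List<Map<String, String>>> rowsByTable) {
        List<String> report = new ArrayList<>();
        for (RegexAssignment a : assignments) {
            RegexRule rule = rules.get(a.regexId);
            if (rule == null) {
                continue; // assignment points at a rule that no longer exists
            }
            for (Map<String, String> row : rowsByTable.getOrDefault(a.table, List.of())) {
                String value = row.get(a.field);
                // Assumption: a match marks a violation, as with '\s+$'.
                if (value != null && rule.expression.matcher(value).find()) {
                    report.add(rule.name + ": " + a.table + "." + a.field + " = '" + value
                            + "' (" + rule.resolution + ")");
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        RegexRule trailing = new RegexRule(1, "Trailing spaces are not allowed", "\\s+$",
                "Value ends with whitespace", "Remove the trailing spaces");
        List<RegexAssignment> assignments = List.of(
                new RegexAssignment(1, "organisationunit", "name"),
                new RegexAssignment(1, "datavalue", "value"));
        Map<String, List<Map<String, String>>> rows = Map.of(
                "organisationunit", List.of(Map.of("name", "Kabwe "), Map.of("name", "Ndola")),
                "datavalue", List.of(Map.of("value", "42 ")));
        run(Map.of(1, trailing), assignments, rows).forEach(System.out::println);
    }
}

// Object 1: the rule itself (regexid, name, expression, description, resolution description).
final class RegexRule {
    final int regexId;
    final String name;
    final Pattern expression;
    final String description;
    final String resolution; // tells the user how to solve the problem

    RegexRule(int regexId, String name, String expression, String description, String resolution) {
        this.regexId = regexId;
        this.name = name;
        this.expression = Pattern.compile(expression);
        this.description = description;
        this.resolution = resolution;
    }
}

// Object 2: assignment of a rule to a table and field.
final class RegexAssignment {
    final int regexId;
    final String table;
    final String field;

    RegexAssignment(int regexId, String table, String field) {
        this.regexId = regexId;
        this.table = table;
        this.field = field;
    }
}
```

Because the assignment is separate from the rule, the same trailing-space rule is reused for both `organisationunit.name` and `datavalue.value` without duplication.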
> >> >
> >> > Yeah, it's very easy, I think. I would do it myself if I knew a lick of
> >> > Java. :)
> >> >
> >> > Best regards,
> >> > Jason
> >> >
> >> >
> >> > On Sun, Feb 7, 2010 at 7:36 PM, Hieu Dang Duy
> >> > <hieu.hispvietnam@xxxxxxxxx> wrote:
> >> >> Hi all,
> >> >>
> >> >> I've no idea about using RegEx for validating data in DHIS2. Just a
> >> >> small comment: I have used it many times, and my feeling is that
> >> >> applying RegEx in your code, i.e. in JavaScript and also in Java, is
> >> >> not easy, but not too difficult either.
> >> >> With RegEx, we can easily control anything that we want to force the
> >> >> user to follow when entering data (text, numbers) or something else
> >> >> (a file name, for example).
> >> >> Let's try!
> >> >>
> >> >> Thanks!
> >> >>
> >> >> On Sun, Feb 7, 2010 at 10:24 PM, Jason Pickering
> >> >> <jason.p.pickering@xxxxxxxxx> wrote:
> >> >>>
> >> >>> https://blueprints.launchpad.net/dhis2/+spec/regex-validation
> >> >>>
> >> >>> I have updated the blueprint on regular expression use in data
> >> >>> validation rules. This would really make my life (and, I suspect,
> >> >>> others' lives) a lot easier. As long as we are using naming
> >> >>> conventions, let's at least enforce them somehow.
> >> >>>
> >> >>> For discussion.
> >> >>>
> >> >>> Jason
> >> >>>
> >> >>> _______________________________________________
> >> >>> Mailing list: https://launchpad.net/~dhis2-devs
> >> >>> Post to     : dhis2-devs@xxxxxxxxxxxxxxxxxxx
> >> >>> Unsubscribe : https://launchpad.net/~dhis2-devs
> >> >>> More help   : https://help.launchpad.net/ListHelp
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Hieu.HISPVietnam
> >> >> Good Health !
> >> >>
> >> >
> >>
> >
> >
>



-- 
Cheers,
Knut Staring
