← Back to team overview

dhis2-devs team mailing list archive

Re: About Regex-Validation blueprint

 

I have been pondering over this over the weekend, and have a few more
thoughts here.

The more I think about it, the more I think this is similar to something I
have been thinking about for some time, which is the ability to execute
passthrough SQL queries on the DHIS2 database. I know that H2, Postgres and
MySQL all have support for regular expressions, with slight variations. So,
what if we broke this blueprint up into several pieces.

1) A GUI element to allow users to enter field-level validation with regular
expressions, as I described in my previous mail. These regexes could
conceivably be reused during data import from external systems (SDMX-HD).
Seems pretty simple.

2) A query executor, that for right now, would execute a select query based
on  the stuff stored in the regex table, and execute a query against the
database. As an example, these two objects would be stored in the database


1) A regular expression
a) 1 (or something from the hibernate_sequence) (regexid)
b) No trailing spaces (name)
c) \s+$ (regexexpression)
d) This field is not allowed to have trailing spaces (violation_message)
e) Please remove all trailing spaces from this field. This means any spaces
after the last character. (violation_resolution)

2) What the regular expression should operate on
a) 1 (foreign key reference to the regex_objects table /object)
b) organisationunit (object/table)
c) name (property)
d)  FALSE (negate)
e)  TRUE (case_insensitive)
f)   NULL locale

Java can work some magic here, and convert this into  an SQL query for the
particular DB system we are using...

SELECT * FROM organisationunit where name ~('\\s+$') for postgres

In MySQL this would become..

SELECT * FROM `organisationunit` WHERE `name` REGEXP '\\s$'

I think it is possible in H2, but no idea what the grammar is.


Some sort of list (name, shortname, some other fields) would be returned to
the user along with instructions (1.e) about how to resolve it.

There could be many more uses for a passthrough SQL query executor, thus it
might be useful for constructing it in a way to make in generic to execute
arbitrary SELECT statements. I will not address this in this thread, but
perhaps a separate blueprint.

Thoughts?

Regards,
Jason




On Fri, Mar 26, 2010 at 3:24 PM, Jason Pickering <
jason.p.pickering@xxxxxxxxx> wrote:

> I thought I should forward this conversation Hieu and I have been having
> for the benefit of the group. Perhaps there are others that have
> suggestions/comments/rants about how we should go about field level
> validation with regular expressions?
>
> Regards,
> Jason
>
>
>
> ---------- Forwarded message ----------
> From: Jason Pickering <jason.p.pickering@xxxxxxxxx>
> Date: Fri, Mar 26, 2010 at 10:31 AM
> Subject: Re: About Regex-Validation blueprint
> To: Hieu Dang Duy <hieu.hispvietnam@xxxxxxxxx>
>
>
> Hi Hieu,
> Well, we can always start simple, and go from there, but it is important
> that we architect everything properly.
>
> So, I suggest that for everything out of the translation table, we use
> something like the data validation procedures. A user enter a rules, and
> then runs a check. Eventually, this could be extended to either the UI
> validation. But, my thinking was similar to that of DHIS 1.4, which allows
> data validation rules to be defined in terms of SQL, and then run by the
> user during a validation check.
>
> So, lets try a a full example. I have a situation now, where I have
> organisationunit short names, with trailing spaces.
>
> These there would need to be a UI screen to define the rule itself.
> a) 1 (or something from the hibernate_sequence) (regexid)
> b) No trailing spaces (name)
> c) \s+$ (regexexpression)
> d) This field is not allowed to have trailing spaces (violation_message)
> e) Please remove all trailing spaces from this field. This means any spaces
> after the last character. (violation_resolution)
>
> The UI, would contain a list of objects. The user would select the object
> (organisationunit) and then the property (name) and the rule (foreign key
> reference to the first table).
>
>
> a) 1 (foreign key reference to the regex_objects table /object)
> b) organisationunit (object/table)
> c) name (property)
> d)  FALSE (negate)
> e)  TRUE (case_insensitive)
> f)   NULL locale
>
> I would guess for the translation table, the method would need to be
> extended somehow, to deal with the locale, which is not present in other
> objects/tables.
>
> Now, a third UI screen would allow the user to choose the rule(s) they wish
> to run, similar to the data validation screen. A list of objects that match
> the rules that the user selects would then be returned to the user. I guess
> having the ability to print/save this list would be useful. Perhaps it
> should be brought up as a popup/seperate window to allow the user to resolve
> each of the violations in turn.
>
>
> Personally, I think we should start with everything other than the
> translations table, but I do not actually see much difference. Basically, if
> the user were to select the translation table, then the locale_code would
> need to be set in the regex definition.
>
> Making any more sense now? If I could code Java, I would help out, but I am
> like an old dog, too old to learn new tricks. :)
>
> Best regards,
> jason
>
>
>
>
>
> On Fri, Mar 26, 2010 at 10:16 AM, Hieu Dang Duy <
> hieu.hispvietnam@xxxxxxxxx> wrote:
>
>> Dear Jason,
>>
>> Actually, your password example it is an interesting point. So, I've
>> thought quite simple but maybe not in fact. Especially with the locale
>> field. So, I understand now that why you called this field is the extra
>> field.
>> The functionality of regex validation which is larger and scale than I
>> thought.
>>
>> I am confusing now. Would u like to give me any suggestion about what
>> should I have set up which key is it ?
>>
>> Thank you !
>>
>>
>> On Fri, Mar 26, 2010 at 4:15 PM, Jason Pickering <
>> jason.p.pickering@xxxxxxxxx> wrote:
>>
>>>
>>>
>>>
>>> And the term* f )* ... How I can forget this one. I am really careless
>>>> person. DHIS2 is a multilingual software. Thank you so much for reminding me
>>>> on this issue.
>>>>
>>>>
>>>>> Yes, I did not consider the translation table. I suppose we would need
>>>>> to specify an additional field for the locale in't  this table. For
>>>>> instance, certain regex may apply for English that do not apply for
>>>>> Vietnamese. A good example would be that of the data format field. Different
>>>>> countries have different ways of specifying the dates, so I guess the second
>>>>> table would
>>>>>
>>>>>
>>>> Yes, of course or absolutely. But may I ask you a sensitive question?
>>>> Why don't you consider to the translation table. As forgot or you really
>>>> don't want to?
>>>>
>>>
>>> OK, here is the issue with the translation table as I see it. Again, this
>>> may not be the same way as with Java. When I look at for instance certain
>>> fields, lets say Organisationunit.shortname, there is no corresponding
>>> translation in the translation table. As far as i can tell, these properties
>>> are not translatable, yet. Additionally, we might want to use regex
>>> validation on other properties that are not translatable, for instance the
>>> user password, for example ^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ to
>>> give a password that must be at least 4 characters, no more than 8
>>> characters, and must include at least one upper case letter, one lower case
>>> letter, and one numeric digit. Now, different implementers might have
>>> different restrictions on the type of the type of password they would
>>> accept. Another example might be the use of regular expressions to validate
>>> identifiers. For instance ^\d{3}-\d{2}-\d{4}$ can be used to validate a
>>> Social Security number used in the US. Other countries have different
>>> identifiers, so there is a need to be able to validate different identifier
>>> types depending on the country.
>>>
>>> So, as for the translation table, I would consider this as a separate
>>> object really to apply a regex to, but I would not consider it to be the
>>> ONLY table that regex validations should apply to.
>>>
>>> So, for the second table, I would see it being persisted like this..
>>>
>>> a) regexid- 1 (foreign key reference to the regex_objects) table
>>> b) object/table-translation
>>> c) property/field-value
>>>
>>> d)  FALSE (In this case, we will not negate the expression as we want all
>>> fields that DO have trailing spaces.
>>>  e)  TRUE (With this regex, it does not matter, but we will default to
>>> true anyway)
>>> f) locale (en_GB) I think this may be necessary to deal with your next
>>> point about translations
>>>
>>> To validate a password, the object would look something like this..
>>>
>>> a) regexid- 1 (foreign key reference to the regex_objects) table
>>> b) object/table-users
>>> c) property/field-password
>>>
>>> d)  FALSE (In this case, we will not negate the expression as we want all
>>> fields that DO have trailing spaces.
>>>  e) FALSE (With this regex, it does, but we will default to true anyway)
>>> f) locale NULL
>>>
>>> Now, the case of the password brings up an interesting point, as it is
>>> persisted in the DB as a hash. Not really sure how to deal with this one....
>>>
>>> Anyway, maybe this is clearer?
>>>
>>> Regards,
>>> Jason
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>>>  I did not really consider the translations so much. I guess there is a
>>>>> need for an additional field. By default, it would be the default locale.
>>>>> The user would then need to construct different regex expressions for
>>>>> different locales, or apply the same regex to several locales.  Maybe this
>>>>> extra property would be enough for this? If it is NULL, then the regex would
>>>>> use the base object (organisationunit.name), but if there is a locale
>>>>> stored in the field, then it would use the
>>>>> translation.organisationunit.name property for that particular locale.
>>>>> Does this make sense?
>>>>>
>>>>>
>>>> After your suggestion on adding new field is locale. So, I thought that
>>>> the translation table's model data which are very very useful for the regex
>>>> table's case. Beside, the locale filed I would like to suggest another field
>>>> is that object's i likes the translation table.
>>>>
>>>> If so, we will have a composite key as same as the key one in
>>>> Translation table.
>>>> Please have a look at the example is below:
>>>>
>>>> *** Example ***
>>>>
>>>> With *regexId *: *01*
>>>>
>>>>
>>>> 01- (001 - DataElement - name - en_GB) - false - true*    ===> SAFED*
>>>>
>>>> 01- (002 - DataElement - shortname - en_GB) - F - T*     ===> SAFED*
>>>>
>>>> 01- (003 - DataElement - name - vi_VN) - F - T*              ===> SAFED
>>>> *
>>>>
>>>> 02 - (001 -DataElement - name - en_GB) - false - true*     ===>
>>>> UN-SAFED ===> UN-INSERTED*
>>>>
>>>> In this example, with the composite key. We do not allow the action of
>>>> inserting/updating an other regex for the same object.
>>>>
>>>>
>>>> How do you think dear Jason?
>>>>
>>>>
>>>>
>>>>> Looking forward to working on this with you.
>>>>>
>>>>
>>>> Thanks again !
>>>>
>>>>
>>>>>
>>>>> Best regards,
>>>>> Jason
>>>>>
>>>>> --
>>>>> --
>>>>> Jason P. Pickering
>>>>> email: jason.p.pickering@xxxxxxxxx
>>>>> tel:+260968395190
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hieu.HISPVietnam
>>>> Good Health !
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Jason P. Pickering
>>> email: jason.p.pickering@xxxxxxxxx
>>> tel:+260968395190
>>>
>>
>>
>>
>> --
>> Hieu.HISPVietnam
>> Good Health !
>>
>
>
>
> --
> --
> Jason P. Pickering
> email: jason.p.pickering@xxxxxxxxx
> tel:+260968395190
>
>
>
> --
> --
> Jason P. Pickering
> email: jason.p.pickering@xxxxxxxxx
> tel:+260968395190
>



-- 
--
Jason P. Pickering
email: jason.p.pickering@xxxxxxxxx
tel:+260968395190

Follow ups

References