dhis2-devs team mailing list archive
-
dhis2-devs team
-
Mailing list archive
-
Message #04288
Re: Can we apply regex string into hypernate mapping ?
>
> I wanna ask to you all like the subject. At the present, I really don't know
> how to do this. But in my opinion, can we use regex if using to combine with
> the FILTER in hibernate mapping?
OK, here is the thing. I have no clue how hibernate works, but if
FILTER is equivalent to a WHERE in SQL, then it may or may not work. I
would assume that it would not work, as regular expressions are not
necessarily standard SQL. They are implemented slightly differently in
each DB system. Now, but the query
SELECT * FROM organisationunit where name = '\s+$'
is not the same as
SELECT * FROM organisationunit where name ~ '\s+$'
In Postgres, the ~ symbol means to evaluate the expression as a
regular expression, and not a a literal value.
>
> If can, I would like to suggest that we should not use all regexes for each
> persistent object (DE, Orgunit, Cate, ect...). Because if you apply all
> regexes into hibernate mapping file you will accidentally make our system to
> become rigid or inflexible (If you want to modify the regex which you want
> so you MUST go to the hibernate file to re-fix and re-build your code) .
> Although, this solution will help the system is faster in processing.
> Btw, if using this way I don't know but I guess that your idea about using
> the regex rule on each persistent object which is not available anymore.
> Thus, I think we should combine them together.
Well, this was the whole reason why I suggested, if possible, to
persist the expressions as objects, and then use a regular expression
evaluator (either in the DB or with java.util.regex) to determine what
the matches are. As you may have seen from Bobs mail, it should be
possible to reuse regular expressions during data import, to ensure
that data matches regular expression restrictions for a given database
field. It is really just a mechanism to attach more data dynamically
to the schema, rather than hard-coding it.
>
> We separate the regex rule to two kind of regex. One is regex default and
> another one is customization (using by the user). Why called the second one
> is customization? Because user can use or not the regex as they want
> anytime.
>
>
This is why I suggested that for the time being, we put it in the data
integrity checks. Administrators could then run the field-level
validation checks on a regular basis and make corrections from there.
>
> Yeas, sure ! we have to show up what happening and which mistake to the user
> when they are importing data into DHIS2.
>
Agreed, but I think this is almost a second step. Ideally, we would
want to intercept "bad" data from entering the system in the first
place, but who know what people will do. Some people (not calling any
names) have inserted several hundreds of thousands of rows of data
with ETL processes. Now, ideally as Murod pointed out, the ETL job
would have regular expression rules built into it. But where would
these rules come from? Well, if they were in a table, it would be
pretty easy to dynamically create a set of rules, for any import
process be it from an external ETL package, or through DXF import.
Again, I think the key question, and this is only something the devs
can answer is
1) Should a recordset be returned to java.util.regex for processing
(if it is even feasible) or
2) Should native DB queries be used? If we are supporting at the
moment MySQL, Postgresql and H2, then implementing regular expression
queries natively should be a piece of cake. However, this restricts
the DBs that this particular function would be able to be applied to.
I think as well to highlight Bobs point in a previous mail, that if
the regexs are persisted, then they could be reused at any point (UI
level validation, import validation, data integrity checks, or even
with external ETL tools). Currently, the data validation rules are
hard-coded (unlike 1.4 where they consist of SQL queries). Ideally, we
should make this dynamic too, to allow admins to apply their own data
integrity checks, but this may deserve a separate thread.
Regards,
Jason
References