dhis2-devs team mailing list archive

Need a group for UNIQUE attributes for deduplication

 

With Tracker, there is a high probability of getting duplicates (these
could be exact duplicates or, for example, misspellings of a name).

To deal with this, it would be good to be able to designate SOME of the
attributes of each person (or rather trackedentityinstance) as the ones
that really identify a person or thing, e.g. Firstname, Lastname, Age,
Address. So we need a way to designate a subset of all the attributes as
input for a deduplication process, which could start by just finding exact
matches and subsequently be refined by introducing different kinds of
fuzzy logic etc.
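
A minimal sketch of what such a process might look like, in Python. This
is only an illustration of the idea, not DHIS2 code: the attribute names,
the dict-based records and the function names are all assumptions, and the
fuzzy step uses the standard library's difflib as a stand-in for whatever
matching logic would actually be chosen.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical attribute group designated as identifying (not a DHIS2 API).
DEDUP_ATTRIBUTES = ("firstname", "lastname", "age", "address")


def dedup_key(instance, attributes=DEDUP_ATTRIBUTES):
    """Build a compound key from the designated attributes only.

    Values are lowercased and stripped; this light normalization is an
    assumption of the sketch.
    """
    return tuple(str(instance.get(a, "")).strip().lower() for a in attributes)


def find_exact_duplicates(instances, attributes=DEDUP_ATTRIBUTES):
    """Return lists of ids whose compound keys are identical."""
    groups = defaultdict(list)
    for inst in instances:
        groups[dedup_key(inst, attributes)].append(inst["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def similarity(a, b, attributes=DEDUP_ATTRIBUTES):
    """Average per-attribute string similarity in [0, 1] (fuzzy step)."""
    pairs = zip(dedup_key(a, attributes), dedup_key(b, attributes))
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in pairs]
    return sum(scores) / len(scores)


people = [
    {"id": "a", "firstname": "John", "lastname": "Doe", "age": 30, "address": "Oslo"},
    {"id": "b", "firstname": "john", "lastname": "Doe ", "age": 30, "address": "Oslo"},
    {"id": "c", "firstname": "Jon", "lastname": "Doe", "age": 30, "address": "Oslo"},
]

exact = find_exact_duplicates(people)
fuzzy_score = similarity(people[0], people[2])
```

Here "a" and "b" fall into the same exact-match group, while "c" (a
misspelled first name) is only caught by the similarity score, which
could be compared against a threshold to flag candidates for review.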

And then later, we could build a GUI for human review and merging of clear
duplicates (which can also be defined). But I suppose we initially need an
addition to the model. So this is like the UNIQUENESS property, not for
just ONE attribute, but rather for a group/collection of attributes.

So, it will be similar to a compound key in SQL:
http://en.wikipedia.org/wiki/Compound_key
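
To make the SQL analogy concrete, here is a small sketch using Python's
built-in sqlite3 module. The table and column names are hypothetical and
do not reflect the DHIS2 schema; the point is only that the UNIQUE
constraint spans a group of columns rather than a single one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: uniqueness is declared over four columns together.
conn.execute(
    """
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        firstname TEXT    NOT NULL,
        lastname  TEXT    NOT NULL,
        age       INTEGER NOT NULL,
        address   TEXT    NOT NULL,
        UNIQUE (firstname, lastname, age, address)
    )
    """
)


def try_insert(conn, row):
    """Insert a row; return False if it violates the compound uniqueness."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person (firstname, lastname, age, address) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(conn, ("Ada", "Lovelace", 36, "London"))
repeat = try_insert(conn, ("Ada", "Lovelace", 36, "London"))
other = try_insert(conn, ("Ada", "Lovelace", 37, "London"))
```

Note that a database constraint rejects the second row outright, whereas
for deduplication we would more likely want to accept the record and flag
it as a possible duplicate.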

Knut
-- 
Knut Staring
Dept. of Informatics, University of Oslo
Norway: +4791880522
Skype: knutstar
http://dhis2.org