← Back to team overview

maria-developers team mailing list archive

Help with MDEV-4419, ENUM optimization


Hi guys, i want to optimize fields with ENUM() in item comparation
the point is:

SELECT COUNT(*) FROM table WHERE enum_field!='x'
column 'enum_field' is ENUM('x,'y','z') NOT NULL DEFAULT 'x'

i want to rewrite the query in a way that we can execute it with index:
SELECT COUNT(*)FROM table WHERE enum_field IN ('y','z','')
the "" is not allowed values ('a' for example)

the feature is 'easy' to understand but i don't know where to implement
it will read possible values of enum fields, and rewrite
"!=","<","<=",">=",">" operators to IN operator with fields that match
the condition of the primary operator
some think like:

column   "operator"    const value
if(operator in ('!=','<','<=','>=','>' AND column is ENUM){
     get all possible values from column
     copy all values to a return variable
     execute the "operator" in all values with const values
     if operator return false remove the value from the return variable
     change the "operator" and "const value" to :
         "IN" and return variable comma separated
    no optimization leave operation as it is

column = enum('x','y','z'),
operator = ">"
const value = "3"

if(column is ENUM and operator = ">"){
    possible values => "x","y","z",""   (check that "" is used when
insert into, value = "a" for example)
    return variable = "x","y","z", ""
    for each value....
       "x" > "3" ?  yes, ASCII 120 > ASCII 51 -> do nothing
       "y" > "3" ?  yes, ASCII 121 > ASCII 51 -> do nothing
       "z" > "3" ?  yes, ASCII 122 > ASCII 51 -> do nothing
       "" > "3" ?  no, "" < ASCII 51 -> remove the value "" from return variable
    end for
    now the return variable is : "x","y","z"    the "" was removed
    rewrite the operator to:
    "column IN ('x','y','z')"

this will optimize a lot:
using != in a big table (446577 rows)
result = Using where, 446577 rows, type= ALL

using IN() in a big table:
result = Using where, 28 rows, key = index, type = range

in other words, a lot of optimization with ENUM fields, since they are
a pseudo "index" of what we can found in the column =)

thank guys any  help is wellcome to make a patch
Roberto Spadim