← Back to team overview

maria-developers team mailing list archive

Updated (by Guest): Add a mysqlbinlog option to filter updates to certain tables (40)

 

-----------------------------------------------------------------------
                              WORKLOG TASK
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
TASK...........: Add a mysqlbinlog option to filter updates to certain tables
CREATION DATE..: Mon, 10 Aug 2009, 13:25
SUPERVISOR.....: Monty
IMPLEMENTOR....: 
COPIES TO......: Psergey
CATEGORY.......: Server-RawIdeaBin
TASK ID........: 40 (http://askmonty.org/worklog/?tid=40)
VERSION........: Server-9.x
STATUS.........: Un-Assigned
PRIORITY.......: 60
WORKED HOURS...: 0
ESTIMATE.......: 0 (hours remain)
ORIG. ESTIMATE.: 0

PROGRESS NOTES:

-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High Level Description modified.
--- /tmp/wklog.40.old.16985     2009-08-10 14:51:59.000000000 +0300
+++ /tmp/wklog.40.new.16985     2009-08-10 14:51:59.000000000 +0300
@@ -1,3 +1,4 @@
 Replication slave can be set to filter updates to certain tables with
---replicate-[wild-]{do,ignore}-table options. This task is about adding similar
-functionality to mysqlbinlog.
+--replicate-[wild-]{do,ignore}-table options.
+
+This task is about adding similar functionality to mysqlbinlog.

-=-=(Guest - Mon, 10 Aug 2009, 14:51)=-=-
High-Level Specification modified.
--- /tmp/wklog.40.old.16949     2009-08-10 14:51:33.000000000 +0300
+++ /tmp/wklog.40.new.16949     2009-08-10 14:51:33.000000000 +0300
@@ -1 +1,73 @@
 
+1. Context
+----------
+At the moment, the server has these replication slave options:
+
+  --replicate-do-table=db.tbl
+  --replicate-ignore-table=db.tbl
+  --replicate-wild-do-table=pattern.pattern
+  --replicate-wild-ignore-table=pattern.pattern
+
+They affect both RBR and SBR events. SBR events are checked after the
+statement has been parsed, the server iterates over list of used tables and
+checks them againist --replicate instructions.
+
+What is interesting is that this scheme still allows to update the ignored
+table through a VIEW.
+
+2. Table filtering in mysqlbinlog
+---------------------------------
+
+Per-table filtering of RBR events is easy (as it is relatively easy to extract
+the name of the table that the event applies to).
+
+Per-table filtering of SBR events is hard, as generally it is not apparent
+which tables the statement refers to.
+
+This opens possible options:
+
+2.1 Put the parser into mysqlbinlog
+-----------------------------------
+Once we have a full parser in mysqlbinlog, we'll be able to check which tables
+are used by a statement, and will allow to show behaviour identical to those
+that one obtains when using  --replicate-* slave options.
+
+(It is not clear how much effort is needed to put the parser into mysqlbinlog. 
+Any guesses?)
+
+
+2.2 Use dumb regexp match
+-------------------------
+Use a really dumb approach. A query is considered to be modifying table X if
+it matches an expression
+
+CREATE TABLE $tablename
+DROP $tablename
+UPDATE ...$tablename ... SET        // here '...' can't contain the word 'SET'
+DELETE ...$tablename ... WHERE      // same as above
+ALTER TABLE $tablename
+.. etc (go get from the grammar) ..
+
+The advantage over doing the same in awk is that mysqlbinlog will also process
+RBR statements, and together with that will provide a working solution for
+those who are careful with their table names not mixing with string constants
+and such.
+
+(TODO: string constants are of particular concern as they come from
+[potentially hostile] users, unlike e.g. table aliases which come from 
+[not hostile] developers. Remove also all string constants before attempting
+to do match?)
+
+2.3 Have the master put annotations
+-----------------------------------
+We could add a master option so that it injects into query a mark that tells
+which tables the query will affect, e.g. for the query
+
+  UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...
+
+
+the binlog will have
+
+  /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... 
+
+and further processing in mysqlbinlog will be trivial.



DESCRIPTION:

Replication slave can be set to filter updates to certain tables with
--replicate-[wild-]{do,ignore}-table options.

This task is about adding similar functionality to mysqlbinlog.


HIGH-LEVEL SPECIFICATION:



1. Context
----------
At the moment, the server has these replication slave options:

  --replicate-do-table=db.tbl
  --replicate-ignore-table=db.tbl
  --replicate-wild-do-table=pattern.pattern
  --replicate-wild-ignore-table=pattern.pattern

They affect both RBR and SBR events. SBR events are checked after the
statement has been parsed, the server iterates over list of used tables and
checks them againist --replicate instructions.

What is interesting is that this scheme still allows to update the ignored
table through a VIEW.

2. Table filtering in mysqlbinlog
---------------------------------

Per-table filtering of RBR events is easy (as it is relatively easy to extract
the name of the table that the event applies to).

Per-table filtering of SBR events is hard, as generally it is not apparent
which tables the statement refers to.

This opens possible options:

2.1 Put the parser into mysqlbinlog
-----------------------------------
Once we have a full parser in mysqlbinlog, we'll be able to check which tables
are used by a statement, and will allow to show behaviour identical to those
that one obtains when using  --replicate-* slave options.

(It is not clear how much effort is needed to put the parser into mysqlbinlog. 
Any guesses?)


2.2 Use dumb regexp match
-------------------------
Use a really dumb approach. A query is considered to be modifying table X if
it matches an expression

CREATE TABLE $tablename
DROP $tablename
UPDATE ...$tablename ... SET        // here '...' can't contain the word 'SET'
DELETE ...$tablename ... WHERE      // same as above
ALTER TABLE $tablename
.. etc (go get from the grammar) ..

The advantage over doing the same in awk is that mysqlbinlog will also process
RBR statements, and together with that will provide a working solution for
those who are careful with their table names not mixing with string constants
and such.

(TODO: string constants are of particular concern as they come from
[potentially hostile] users, unlike e.g. table aliases which come from 
[not hostile] developers. Remove also all string constants before attempting
to do match?)

2.3 Have the master put annotations
-----------------------------------
We could add a master option so that it injects into query a mark that tells
which tables the query will affect, e.g. for the query

  UPDATE t1 LEFT JOIN db3.t2 ON ... WHERE ...


the binlog will have

  /* !mysqlbinlog: updates t1,db3.t2 */ UPDATE t1 LEFT JOIN ... 

and further processing in mysqlbinlog will be trivial.


ESTIMATED WORK TIME

ESTIMATED COMPLETION DATE
-----------------------------------------------------------------------
WorkLog (v3.5.9)