maria-developers team mailing list archive

Thread
Date

Re: Review of MDEV-7112 Split HA_CREATE_INFO

To: Sergei Golubchik <serg@xxxxxxxxxxx>
From: Alexander Barkov <bar@xxxxxxxxxxx>
Date: Wed, 03 Dec 2014 17:31:45 +0400
Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
In-reply-to: <20141203120847.GA25167@meddwl.fritz.box>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

Hi Sergei,


On 12/03/2014 04:08 PM, Sergei Golubchik wrote:

Hi, Alexander!

On Dec 03, Alexander Barkov wrote:

On 12/03/2014 01:13 AM, Sergei Golubchik wrote:

Hi.

It think the split is good, I liked it.

I didn't like that you've tried to put the refactoring of sql_db.cc and
sql_lang.h in the same commit, please move it to a separate patch.
MDEV-5359 should not depend on that.


When I had to change the interface for mysql_create_db(),
mysql_rm_db(), mysql_alter_db(), I thought why not to do it in the
right way at once... And it was very easy to do in case of the db
routines.

Anyway, I agree to make a separate task for this,
It's easier to review this way :)


Thanks!

Anyway, see all my comments below:

diff --git a/sql/structs.h b/sql/structs.h
index 99561c5..b38685c 100644
--- a/sql/structs.h
+++ b/sql/structs.h

<skip>

+  DDL_options_st operator|=(DDL_options_st::Options other)
+  {
+    add(other);
+    return *this;
+  }
+};


I don't particularly like this style of C++ing...
It's way too verbose.
but ok


This is to make the code on sql_yacc.yy shorter:

    lex->set_command_with_check(SQLCOM_CREATE_TABLE, $2, $1 | $4)

I find "$1 | $4" very clearly readable, and short.


Yes, I didn't mean operator| in particular, but the whole class with
accessors for every single bit, etc. C++ is very verbose language and
the more code you write, more code someone will have to read. So I
generally like it when refactoring makes the codebase smaller, not
larger.


Let me try to advertise a different approach :)
I like more encapsulation. I'd encapsulate every single member
in every single class, really.


- It allows stricter control on what the callers can do

- It makes the code shorter for the callers

- It's easier to modify the code in the future:

  you can move the members across structs/classes, change inheritance,
  make methods virtual, move methods between private, protected and
  public sections.

  The caller side code still stays exactly the same!
  In case of direct member access, any of the above operation
  will most likely need changes on the caller side.
  In many cases, sooner or later you have to wrap members into
  methods anyway.

- It's easier to set breakpoints having methods.

- For exotic cases, polymorphic classes can share code using templates.
  For example, Field and Item have some duplicate code.
  Had they a set of similar properties wrapped into methods, they could
  share some code using templates.
  But they are different: Field->decimals() vs Item->dec, etc.


The price is a few lines of code in the struct/class declaration,
which make the code grow a little bit in the beginning.
But in the long terms this can even reduce the amount of code.

diff --git a/sql/sql_show.cc b/sql/sql_show.cc
index 0541cbc..2536255 100644
--- a/sql/sql_show.cc
+++ b/sql/sql_show.cc
@@ -1696,14 +1695,14 @@ int show_create_table(THD *thd, TABLE_LIST *table_list, String *packet,

     packet->append(STRING_WITH_LEN("CREATE "));
     if (create_info_arg &&
-      (create_info_arg->org_options & HA_LEX_CREATE_REPLACE ||
+      ((create_info_arg->or_replace() &&
+        !create_info_arg->or_replace_slave_generated()) ||


that's a bit strange. Could you explain this?
why org_options & HA_LEX_CREATE_REPLACE is translated to
or_replace() && !or_replace_slave_generated()


Notice, I removed "org_options".

Now everything is passed through a single set of flags:
the DDL_options_st part of *create_info_arg.

And the related flag is set in sql_parse.cc, here:

    create_info.add(DDL_options_st::OPT_OR_REPLACE_SLAVE_GENERATED);

I find this logic to be much easier to understand.


Yes, of course, I've noticed that.
Still, old condition was

    org_options & HA_LEX_CREATE_REPLACE

and not

   org_options & HA_LEX_CREATE_REPLACE && !(options & HA_LEX_CREATE_REPLACE)

So, why your new condition is

   or_replace() && !or_replace_slave_generated()


On slave "CREATE TABLE" is translated to "CREATE TABLE IF NOT EXISTS".
But the slave itself can do binary logging again.
It writes "IF NOT EXISTS" only if it was in the original query.


mysql_create_like_table() calls show_create_table() for binlogging
purposes.

mysql_create_like_table() needs or_replace() not to fail on error.
show_create_table() needs to write "OR REPLACE" into the binlog
only if "OR REPLACE" was in the original query.


In the old code the caller made sure to set create_info.org_options
and create_info.options properly, to make this logic work.

Now there is no org_options any more, so everything is done in
the single set of options (the DDL_options_st part of create_info).
is_replace() makes mysql_create_line_table() not to fail.

is_replace_slave_generated() prevents adding of "IF NOT EXISTS"
in show_create_table() in the cases when is_replace() was set on the
slave side, not in the original query.

diff --git a/sql/sql_table.cc b/sql/sql_table.cc
index b991215..7bb293b 100644
--- a/sql/sql_table.cc
+++ b/sql/sql_table.cc
@@ -4628,7 +4628,8 @@ int create_table_impl(THD *thd,
                          const char *orig_db, const char *orig_table_name,
                          const char *db, const char *table_name,
                          const char *path,
-                       HA_CREATE_INFO *create_info,
+                       const DDL_options_st options,
+                       HA_CREATE_INFO &create_info,


why? I generally prefer pointers over C++ references -
one can immediately see what variables can be modified by the function.
is there a compelling reason to use references here?


Well, I had a hope I'd manage to do it a const:

    const HA_CREATE_INFO &create_info

And in case of "const", I really prefer "const XXX &name",
instead of "const XXX *name". It gives a lot of advantages.


Okay, with 'const' I could agree, right.

diff --git a/storage/xtradb/handler/ha_innodb.cc b/storage/xtradb/handler/ha_innodb.cc
index f3b1167..6b1942e 100644
--- a/storage/xtradb/handler/ha_innodb.cc
+++ b/storage/xtradb/handler/ha_innodb.cc
@@ -11349,7 +11349,7 @@ static __attribute__((nonnull, warn_unused_result))

   	/* Do not use DATA DIRECTORY with TEMPORARY TABLE. */
   	if (create_info->data_file_name
-	    && create_info->options & HA_LEX_CREATE_TMP_TABLE) {
+	    && create_info->tmp_table()) {


nope, you should not change storage engine API here.
tmp_table() was a convenience wrapper, old way should still work


I haven't changed the API at this point, so the old way still works.
But I don't want any newly created engines to copy the old way.


Then only change engines that we develop (MyISAM, Aria, etc) and engines
where MariaDB repositories are kind of the authoritative source of the
engine source code (Connect, OQGraph). But don't change engines that are
developed elsewhere and then merged into MariaDB - those engines we
should try to keep as close to the upstream as possible and only change
when absolutely necessary.

There is a chance that we'll move the table scope part into a separate
structure, and Table_scope_and_contents_source_st will then inherit it.
Now table scope is only responsible for TEMPORARY or not TEMPORARY,
but in the SQL standard it's something more complex:

So if we ever decide to support more table scope options,
the HA_LEX_CREATE_TMP_TABLE flag most likely won't be stored in
create_info->options any more.


Yes, and then it will be absolutely necessary to change all engines,
because they'll stop working otherwise.


I see, this is for easier merge purposes.
Let's keep InnoDB/XTraDB use direct member access.

diff --git a/sql/handler.h b/sql/handler.h
index 0044556..002a22e 100644
--- a/sql/handler.h
+++ b/sql/handler.h
@@ -1571,9 +1569,41 @@ enum enum_stats_auto_recalc { HA_STATS_AUTO_RECALC_DEFAULT= 0,
                                HA_STATS_AUTO_RECALC_ON,
                                HA_STATS_AUTO_RECALC_OFF };

-struct HA_CREATE_INFO
+/**
+  A helper struct for schema DDL statements:
+    CREATE SCHEMA [IF NOT EXISTS] name [ schema_specification... ]
+    ALTER SCHEMA name [ schema_specification... ]
+
+  It stores the "schema_specification" part of the CREATE/ALTER statements and
+  is passed to mysql_create_db() and  mysql_alter_db().
+  Currently consists only of the schema default character set and collation.


by the way, have you seen
http://www.tocker.ca/2014/12/02/proposal-to-deprecate-collation_database-and-character_set_database-settings.html
?


Yes, Sanja sent me this link yesterday.

I think this is going to be a good change.

From what I remember, changing character_set_database manually
was originally (in 4.1?) the recommended way to do LOAD DATA INFILE and
SELECT  INTO OUTFILE using an alternative character set.


Later, these statements were extended to have their own
CHARACTER SET clauses for this purposes.
So it should be fine to make character_set_database read-only.


Regards,
Sergei

Follow ups

Re: Review of MDEV-7112 Split HA_CREATE_INFO
From: Sergei Golubchik, 2014-12-04

References

Review of MDEV-7112 Split HA_CREATE_INFO
From: Sergei Golubchik, 2014-12-02
Re: Review of MDEV-7112 Split HA_CREATE_INFO
From: Alexander Barkov, 2014-12-03
Re: Review of MDEV-7112 Split HA_CREATE_INFO
From: Sergei Golubchik, 2014-12-03