← Back to team overview

maria-developers team mailing list archive

Re: 9058071: MDEV-9362: InnoDB tables using DATA_DIRECTORY created using MySQL 5.6 do not work with MariaDB 10.1

 

Hi Sergei,

Firstly, thank you for your review. I agree that the proposed patch is huge
and intrusive. Let my try to
reason why we should apply it to 10.1:

Problem: Oracle MySQL 5.6 database is not binary compatible to MariaDB 10.1
IF AND ONLY IF DATA_DIRECTORY is used or UNIV_PAGE_SIZE != 16KB

Why:

(1) Table flags (InnoDB dictionary SYS_TABLES.TYPE) (include/dict0mem.h)

There is function to verify table flags on dictionary against tablespace
flags, see below.

(2) Tablespace flags (page 0) (include/fsp0fsp.h)

Oracle MySQL 5.6
#define FSP_FLAGS_POS_PAGE_SSIZE (FSP_FLAGS_POS_ATOMIC_BLOBS \
+ FSP_FLAGS_WIDTH_ATOMIC_BLOBS)
#define FSP_FLAGS_POS_DATA_DIR (FSP_FLAGS_POS_PAGE_SSIZE \
+ FSP_FLAGS_WIDTH_PAGE_SSIZE)
+
MariaDB 10.1:
#define FSP_FLAGS_POS_PAGE_SSIZE (FSP_FLAGS_POS_ATOMIC_WRITES \
+ FSP_FLAGS_WIDTH_ATOMIC_WRITES)
#define FSP_FLAGS_POS_DATA_DIR (FSP_FLAGS_POS_PAGE_SSIZE \
+ FSP_FLAGS_WIDTH_PAGE_SSIZE)

Here FSP_FLAGS_POS_ATOMIC_WRITES > FSP_FLAGS_POS_ATOMIC_BLOBS

==> UNIV_PAGE_SIZE != 16K used to create tablespace is on different
position on bitmap
==> DATA_DIR flag on different position on bitmap

Less intrusive fix is to detect if tablespace flags are MariaDB flags and
fix them to be same
as Oracle flags by moving MariaDB extensions to end of bitmap.  This is
just delaying the
actual fix.

Actual bug on MDEV-9362 can be fixed by not touching these bitmaps at all.
Note that bug
on using UNIV_PAGE_SIZE!= 16KB requires touching these bitmaps.

So alternatives are:
(1) Proposed intrusive fix (with some modifications based on review)
(2) Move MariaDB flags to end of bitmap
(3) Just fix reported problem on MDEV-9362

Opinions?
R: Jan

On Thu, Apr 28, 2016 at 11:46 PM, Sergei Golubchik <serg@xxxxxxxxxxx> wrote:

> Hi, Jan!
>
> I have various comments, see below. Nothing big, nothing that's really
> the approach you've taken. In fact, I like where it's going.
>
> But this is a huge and intrusive patch. I wonder, should we rather do it
> in 10.2? See below, you yourself write:
>
> On Feb 25, Jan Lindström wrote:
> >
> > Caution: Please take backups of your database before migrating 10.1.12.
>
> This is not what users normally do when upgrading from a GA version to
> the next minor GA version.
>
> Anyway, see comments about the patch below:
>
> > diff --git a/mysql-test/suite/encryption/r/innodb-bad-key-change.result
> b/mysql-test/suite/encryption/r/innodb-bad-key-change.result
> > index cf97918..39e08f4 100644
> > --- a/mysql-test/suite/encryption/r/innodb-bad-key-change.result
> > +++ b/mysql-test/suite/encryption/r/innodb-bad-key-change.result
> > @@ -36,8 +36,7 @@ SELECT * FROM t1;
> >  ERROR HY000: Got error 192 'Table encrypted but decryption failed. This
> could be because correct encryption management plugin is not loaded, used
> encryption key is not available or encryption method does not match.' from
> InnoDB
> >  SHOW WARNINGS;
> >  Level        Code    Message
> > -Warning      1812    Tablespace is missing for table 'test/t1'
> > -Warning      192     Table test/t1 is encrypted but encryption service
> or used key_id 2 is not available.  Can't continue reading table.
> > +Warning      192     Table test/t1 is encrypted but encryption service
> or used key_id is not available.  Can't continue reading table.
>
> why did that happen?
>
> >  Error        1296    Got error 192 'Table encrypted but decryption
> failed. This could be because correct encryption management plugin is not
> loaded, used encryption key is not available or encryption method does not
> match.' from InnoDB
> >  DROP TABLE t1;
> >  # Start server with keys.txt
> > diff --git a/storage/innobase/include/dict0tableoptions.h
> b/storage/innobase/include/dict0tableoptions.h
> > new file mode 100644
> > index 0000000..def0e39
> > --- /dev/null
> > +++ b/storage/innobase/include/dict0tableoptions.h
> > @@ -0,0 +1,127 @@
> >
> +/*****************************************************************************
> > +
> > +Copyright (c) 2016, MariaDB Corporation.
> > +
> > +This program is free software; you can redistribute it and/or modify it
> under
> > +the terms of the GNU General Public License as published by the Free
> Software
> > +Foundation; version 2 of the License.
> > +
> > +This program is distributed in the hope that it will be useful, but
> WITHOUT
> > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> FITNESS
> > +FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
> details.
> > +
> > +You should have received a copy of the GNU General Public License along
> with
> > +this program; if not, write to the Free Software Foundation, Inc.,
> > +51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
> > +
> >
> +*****************************************************************************/
> > +
> > +/**************************************************//**
> > +@file include/dict0tableoptions.h
> > +Definitions for the system table SYS_TABLE_OPTIONS
> > +
> > +Created 22/01/2006 Jan Lindström
> > +*******************************************************/
> > +
> > +#ifndef dict0tableoptions_h
> > +#define dict0tableoptions_h
> > +
> > +#include "univ.i"
> > +
> > +/** Data structure to hold contents of SYS_TABLE_OPTIONS */
>
> This is a move in the interesting direction. One of the ideas I was toying
> with
> for a few years is to remove .frm files for InnoDB tables.
>
> In MariaDB a storage engine can live without frm files if the engine
> supports
> table discovery. This is rather easy to implement, but InnoDB does not
> store
> the complete table definition, so the discovery will be lossy.
>
> Now with your SYS_TABLE_OPTIONS table we're getting closer to having the
> complete
> table definition saved inside InnoDB. And one step closer to frm-less
> InnoDB tables.
>
> > +struct dict_tableoptions_t{
> > +     table_id_t              table_id;
>
> why do you need table_id in the option structure?
>
> > +     /* true if table page compressed */
> > +     bool                    page_compressed;
> > +     /* Used compression level if set */
> > +     ulonglong               page_compression_level;
> > +     /* dict0types.h: ATOMIC_WRITES_DEFAULT, _ON, or _OFF */
> > +     atomic_writes_t         atomic_writes;
> > +     /* fil0crypt.h: FIL_SPACE_ENCRYPTION_DEFAULT, _ON, or _OFF */
> > +     fil_encryption_t        encryption;
> > +     /* Used encryption key identifier if set */
> > +     ulonglong               encryption_key_id;
> > +     /* true if table options need to be stored */
>
> why wouldn't they need to be?
>
> > +     bool                    need_stored;
> > +};
> > +
> >
> +/********************************************************************//**
> > +This function parses a SYS_TABLE_OPTIONS record, extracts necessary
> > +information from the record and returns it to the caller.
> > +@return error message or NULL if successfull */
> > +UNIV_INTERN
> > +const char*
> > +dict_process_sys_tableoptions(
> > +/*==========================*/
> > +     mem_heap_t*     heap,           /*!< in/out: heap memory */
> > +     const rec_t*    rec,            /*!< in: current SYS_TABLE_OPTIONS
> rec */
> > +     dict_tableoptions_t* table_options); /*!< out: table options */
> > +
> >
> +/********************************************************************//**
> > +Gets the table options from SYS_TABLE_OPTIONS based on table_id
> > +@return true if found, false if not found */
> > +UNIV_INTERN
> > +bool
> > +dict_get_table_options(
> > +/*===================*/
> > +     table_id_t              table_id,       /*!< in: table id */
> > +     dict_tableoptions_t*    options,        /*!< out:table options */
> > +     bool                    fixed);         /*!< in: can we fix the
> > +                                             dictionary ? */
> > +
> >
> +/********************************************************************//**
> > +Gets the table options from SYS_TABLE_OPTIONS
> > +@return true if found, false if not found */
> > +UNIV_INTERN
> > +bool
> > +dict_get_table_options(
> > +/*===================*/
> > +     dict_table_t*   table,  /*!< in/out: table */
> > +     bool            fixed); /*!< in: can we fix the
> > +                             dictionary ? */
> > +
> >
> +/********************************************************************//**
> > +Update the record in SYS_TABLE_OPTIONS.
> > +@return      DB_SUCCESS if OK, dberr_t if the update failed */
> > +UNIV_INTERN
> > +dberr_t
> > +dict_update_tableoptions(
> > +/*=====================*/
> > +     const dict_table_t* table);     /*!< in: table object */
> > +
> >
> +/********************************************************************//**
> > +Insert record into SYS_TABLE_OPTIONS
> > +@return      DB_SUCCESS if OK, dberr_t if the insert failed */
> > +UNIV_INTERN
> > +dberr_t
> > +dict_insert_tableoptions(
> > +/*=====================*/
> > +     const dict_table_t*     table,  /*!< in: table object */
> > +     bool                    fixed); /*!< in: can we fix the
> > +                                     dictionary ? */
> > +
> >
> +/********************************************************************//**
> > +Update the table flags in SYS_TABLES.
> > +@return      DB_SUCCESS if OK, dberr_t if the update failed */
> > +UNIV_INTERN
> > +dberr_t
> > +dict_update_table_flags(
> > +/*=====================*/
> > +     const dict_table_t*     table,  /*!< in: table object */
> > +     bool                    fixed); /*!< in: can we fix the
> > +                                     dictionary ? */
> > +
> >
> +/********************************************************************//**
> > +Delete the record in SYS_TABLE_OPTIONS.
> > +@return      DB_SUCCESS if OK, dberr_t if the update failed */
> > +UNIV_INTERN
> > +dberr_t
> > +dict_delete_tableoptions(
> > +/*=====================*/
> > +     const dict_table_t*     table,  /*!< in: table object */
> > +     trx_t*                  trx,    /*!< in: trx */
> > +     bool                    fixed); /*!< in: can we fix the
> > +                                     dictionary ? */
> > +
> > +#endif /* dict0tableoptions_h */
> > +
> > diff --git a/storage/innobase/dict/dict0tableoptions.cc
> b/storage/innobase/dict/dict0tableoptions.cc
> > new file mode 100644
> > index 0000000..527c5e3
> > --- /dev/null
> > +++ b/storage/innobase/dict/dict0tableoptions.cc
> > @@ -0,0 +1,482 @@
> >
> +/*****************************************************************************
> > +
> > +Copyright (c) 2016, MariaDB Corporation.
> > +
> > +This program is free software; you can redistribute it and/or modify it
> under
> > +the terms of the GNU General Public License as published by the Free
> Software
> > +Foundation; version 2 of the License.
> > +
> > +This program is distributed in the hope that it will be useful, but
> WITHOUT
> > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> FITNESS
> > +FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
> details.
> > +
> > +You should have received a copy of the GNU General Public License along
> with
> > +this program; if not, write to the Free Software Foundation, Inc.,
> > +51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
> > +
> >
> +*****************************************************************************/
> > +
> > +/**************************************************//**
> > +@file dict/dict0tableoptions.cc
> > +Function implementations for the system table SYS_TABLE_OPTIONS
> > +
> > +Created 22/01/2006 Jan Lindström
> > +*******************************************************/
> > +
> > +#include "mysql_version.h"
> > +#include "btr0pcur.h"
> > +#include "btr0btr.h"
> > +#include "page0page.h"
> > +#include "mach0data.h"
> > +#include "dict0dict.h"
> > +#include "dict0boot.h"
> > +#include "dict0stats.h"
> > +#include "dict0mem.h"
> > +#include "rem0cmp.h"
> > +#include "srv0start.h"
> > +#include "srv0srv.h"
> > +#include "dict0crea.h"
> > +#include "dict0priv.h"
> > +#include "ha_prototypes.h" /* innobase_casedn_str() */
> > +#include "fts0priv.h"
> > +#include "dict0tableoptions.h"
> > +#include "dict0load.h"
> > +#include "row0mysql.h"
> > +
> >
> +/********************************************************************//**
> > +This function parses a SYS_TABLE_OPTIONS record, extracts necessary
> > +information from the record and returns it to the caller.
> > +@return error message or NULL if successfull */
> > +UNIV_INTERN
> > +const char*
> > +dict_process_sys_tableoptions(
> > +/*==========================*/
> > +     mem_heap_t*     heap,           /*!< in/out: heap memory */
> > +     const rec_t*    rec,            /*!< in: current SYS_TABLE_OPTIONS
> rec */
> > +     dict_tableoptions_t* table_options) /*!< out: table options */
> > +{
> > +     const byte*     field;
> > +     ulint           len=0;
> > +
> > +     if (rec_get_deleted_flag(rec, 0)) {
> > +             return("delete-marked record in SYS_TABLE_OPTIONS");
> > +     }
> > +
> > +     if (rec_get_n_fields_old(rec) !=
> DICT_NUM_FIELDS__SYS_TABLEOPTIONS) {
> > +             return("wrong number of columns in SYS_TABLE_OPTIONS
> record");
> > +     }
>
> How does it work future-wise? How do you add new options to the table,
> say, in 10.2?
>
> > diff --git a/storage/innobase/api/api0api.cc
> b/storage/innobase/api/api0api.cc
> > index 739ea9f..4af32a5 100644
> > --- a/storage/innobase/api/api0api.cc
> > +++ b/storage/innobase/api/api0api.cc
> > @@ -270,7 +271,7 @@ ib_open_table_by_name(
> >       dict_table_t*   table;
> >
> >       table = dict_table_open_on_name(name, FALSE, FALSE,
> > -                                     DICT_ERR_IGNORE_NONE);
> > +                                     DICT_ERR_IGNORE_NONE, NULL);
>
> What does that mean that you pass NULL as the last argument?
> You won't know whether the table is encrypted or compressed - so you,
> basically won't be able to read the table?
>
> >
> >       if (table != NULL && table->ibd_file_missing) {
> >               table = NULL;
> > diff --git a/storage/innobase/buf/buf0buf.cc
> b/storage/innobase/buf/buf0buf.cc
> > index f4e7c0d..f1953af 100644
> > --- a/storage/innobase/buf/buf0buf.cc
> > +++ b/storage/innobase/buf/buf0buf.cc
> > @@ -4677,7 +4678,7 @@ buf_page_io_complete(
> >                                               "However key management
> plugin or used key_id %lu is not found or"
> >                                               " used encryption
> algorithm or method does not match."
> >                                               " Can't continue opening
> the table.",
> > -                                             bpage->key_version);
> > +                                             bpage->space,
> bpage->key_version);
>
> This seems to be an unrelated bugfix. And it seems to be already fixed
> in the latest 10.1
>
> >
> >                                       if (bpage->space > TRX_SYS_SPACE) {
> >                                               if (corrupted) {
> > diff --git a/storage/innobase/dict/dict0dict.cc
> b/storage/innobase/dict/dict0dict.cc
> > index 2b72835..13f1ee6 100644
> > --- a/storage/innobase/dict/dict0dict.cc
> > +++ b/storage/innobase/dict/dict0dict.cc
> > @@ -1187,6 +1189,16 @@ dict_table_open_on_name(
> >
> >               /* If table is encrypted return table */
> >               if (ignore_err == DICT_ERR_IGNORE_NONE
> > +                     && !table->is_encrypted && options) {
>
> hmm, where else table->is_encrypted can be set?
>
> > +                     if ((options->encryption ==
> FIL_SPACE_ENCRYPTION_ON ||
> > +                         (options->encryption ==
> FIL_SPACE_ENCRYPTION_DEFAULT && srv_encrypt_tables))
> > +                             && !encryption_key_id_exists((unsigned
> int)options->encryption_key_id)) {
> > +                             table->is_encrypted = true;
> > +                     }
> > +             }
> > +
> > +             /* If table is encrypted return table */
> > +             if (ignore_err == DICT_ERR_IGNORE_NONE
> >                       && table->is_encrypted) {
> >                       /* Make life easy for drop table. */
> >                       if (table->can_be_evicted) {
> > diff --git a/storage/innobase/dict/dict0load.cc
> b/storage/innobase/dict/dict0load.cc
> > index d8bd0a6..2b732f1 100644
> > --- a/storage/innobase/dict/dict0load.cc
> > +++ b/storage/innobase/dict/dict0load.cc
> > @@ -56,7 +64,8 @@ static const char* SYSTEM_TABLE_NAME[] = {
> >       "SYS_FOREIGN",
> >       "SYS_FOREIGN_COLS",
> >       "SYS_TABLESPACES",
> > -     "SYS_DATAFILES"
> > +     "SYS_DATAFILES",
> > +     "SYS_TABLE_OPTIONS"
>
> May be "SYS_MARIADB_OPTIONS" ? Not really InnoDB sys table style,
> I agree, but guarantees no conflicts in the future.
>
> >  };
> >
> >  /* If this flag is TRUE, then we will load the cluster index's (and
> tables')
> > @@ -2365,6 +2386,11 @@ dict_load_table(
> >       btr_pcur_close(&pcur);
> >       mtr_commit(&mtr);
> >
> > +     if (table && options) {
>
> no need to do if (table), because table is guaranteed to be not NULL here
>
> > +             ut_ad(table->table_options);
> > +             memcpy(table->table_options, options,
> sizeof(dict_tableoptions_t));
> > +     }
> > +
> >       if (table->space == 0) {
> >               /* The system tablespace is always available. */
> >       } else if (table->flags2 & DICT_TF2_DISCARDED) {
> > diff --git a/storage/innobase/dict/dict0mem.cc
> b/storage/innobase/dict/dict0mem.cc
> > index 1724ac0..39d0d55 100644
> > --- a/storage/innobase/dict/dict0mem.cc
> > +++ b/storage/innobase/dict/dict0mem.cc
> > @@ -85,7 +86,6 @@ dict_mem_table_create(
> >       mem_heap_t*     heap;
> >
> >       ut_ad(name);
> > -     ut_a(dict_tf_is_valid(flags));
>
> why?
>
> >       ut_a(!(flags2 & ~DICT_TF2_BIT_MASK));
> >
> >       heap = mem_heap_create(DICT_HEAP_SIZE);
> > diff --git a/storage/innobase/fil/fil0fil.cc
> b/storage/innobase/fil/fil0fil.cc
> > index 928c72c..4f75465 100644
> > --- a/storage/innobase/fil/fil0fil.cc
> > +++ b/storage/innobase/fil/fil0fil.cc
> > @@ -716,27 +717,6 @@ fil_node_open_file(
> >                          ut_error;
> >                  }
> >
> > -                if (UNIV_UNLIKELY(space->flags != flags)) {
> > -                        fprintf(stderr,
> > -                                "InnoDB: Error: table flags are 0x%lx"
> > -                                " in the data dictionary\n"
> > -                                "InnoDB: but the flags in file %s are
> 0x%lx!\n",
> > -                                space->flags, node->name, flags);
>
> do you still the check somewhere
> where SYS_TABLE_OPTIONS is compared with frm?
>
> > -
> > -                        ut_error;
> > -                }
> > -
> > -                if (UNIV_UNLIKELY(space->flags != flags)) {
> > -                        if (!dict_tf_verify_flags(space->flags, flags))
> {
> > -                                fprintf(stderr,
> > -                                        "InnoDB: Error: table flags are
> 0x%lx"
> > -                                        " in the data dictionary\n"
> > -                                        "InnoDB: but the flags in file
> %s are 0x%lx!\n",
> > -                                        space->flags, node->name,
> flags);
> > -                                ut_error;
> > -                        }
> > -                }
> > -
> >                  if (size_bytes >= FSP_EXTENT_SIZE * UNIV_PAGE_SIZE) {
> >                          /* Truncate the size to whole extent size. */
> >                          size_bytes = ut_2pow_round(size_bytes,
> > @@ -1917,11 +1909,6 @@ fil_check_first_page(
> >          flags = mach_read_from_4(FSP_HEADER_OFFSET + FSP_SPACE_FLAGS +
> page);
> >
> >          if (UNIV_PAGE_SIZE != fsp_flags_get_page_size(flags)) {
> > -                fprintf(stderr,
> > -                        "InnoDB: Error: Current page size %lu != "
> > -                        " page size on page %lu\n",
> > -                        UNIV_PAGE_SIZE, fsp_flags_get_page_size(flags));
> > -
>
> why?
>
> >                  return("innodb-page-size mismatch");
> >          }
> >
> > @@ -3379,8 +3362,7 @@ fil_create_new_single_table_tablespace(
> >          ulint           size,           /*!< in: the initial size of the
> >                                          tablespace file in pages,
> >                                          must be >=
> FIL_IBD_FILE_INITIAL_SIZE */
> > -        fil_encryption_t mode,  /*!< in: encryption mode */
> > -        ulint           key_id) /*!< in: encryption key_id */
> > +        const dict_table_t* table)      /*!< in: table or NULL */
>
> what does NULL mean here?
>
> >  {
> >          os_file_t       file;
> >          ibool           ret;
> > @@ -3640,6 +3627,140 @@ fil_report_bad_tablespace(
> >                  (ulong) expected_id, (ulong) expected_flags);
> >  }
> >
> > +/******************************************************************
> > +Set flags for a tablespace */
> > +static
> > +void
> > +fil_space_set_fsp_flags(
> > +/*=====================*/
> > +        ulint   id,     /*!< in: space id */
> > +        uint    flags)  /*!< in: fsp flags */
> > +{
> > +        fil_space_t*    space;
> > +
> > +        ut_ad(fil_system);
> > +
> > +        mutex_enter(&fil_system->mutex);
> > +
> > +        space = fil_space_get_by_id(id);
> > +
> > +        if (space != NULL) {
> > +                space->flags = flags;
> > +        }
> > +
> > +        mutex_exit(&fil_system->mutex);
> > +}
> > +
> > +/******************************************************************
> > +Update tablespace (fsp) flags on page 0
>
> a bit more verbose comment would've been be nice.
> like, say that it's used to migrate from mariadb-flags-in-page-0
> to sys_table_options approach.
>
> > +@return true if successfull, false if not */
> > +static
> > +void
> > +fil_update_page0(
> > +/*=============*/
> > +        byte*                   page0,  /*!< in: page 0 or NULL */
>
> why do you need this page0 argument if it is *always* NULL?
>
> > +        os_file_t               data_file, /*!< in: data file */
> > +        ulint                   space,  /*!< in: space id */
> > +        ulint                   flags,  /*!< in: old fsp flags */
> > +        ulint                   fsp_flags)/*!< in: new fsp flags */
> > +{
> > +        ulint psize = FSP_FLAGS_GET_PAGE_SSIZE_MARIADB(flags);
> > +        ulint zip_size = FSP_FLAGS_GET_ZIP_SSIZE(flags);
> > +
> > +        if (!psize) {
> > +                psize = UNIV_PAGE_SIZE_ORIG;
> > +        }
> > +
> > +        fsp_flags = fsp_flags_set_page_size(fsp_flags, psize);
> > +
> > +        if (flags != fsp_flags) {
> > +
> > +                ib_logf(IB_LOG_LEVEL_INFO,
> > +                        "InnoDB: Adjusted space_id %lu tablespace flags
> from %lu to %lu.\n",
> > +                        space, flags, fsp_flags);
> > +
> > +                mtr_t mtr;
> > +                mtr_start(&mtr);
> > +
> > +                if (!page0) {
> > +                        buf_block_t* block = buf_page_get_gen(space,
> > +                                        zip_size, 0,
> > +                                        RW_X_LATCH,
> > +                                        NULL,
> > +                                        BUF_GET,
> > +                                        __FILE__, __LINE__,
> > +                                        &mtr);
> > +                        page0 = buf_block_get_frame(block);
> > +                }
> > +
> > +                mlog_write_ulint(page0 + FSP_HEADER_OFFSET +
> FSP_SPACE_FLAGS,
> > +                        fsp_flags, MLOG_4BYTES, &mtr);
> > +                /* Redo log this as bytewise update to page 0
> > +                followed by an MLOG_FILE_WRITE_FSP_FLAGS */
> > +                byte* log_ptr = mlog_open(&mtr, 11 + 8);
> > +                if (log_ptr != NULL) {
> > +                        log_ptr = mlog_write_initial_log_record_fast(
> > +                                        page0,
> > +                                        MLOG_FILE_WRITE_FSP_FLAGS,
> > +                                        log_ptr, &mtr);
> > +                        mach_write_to_4(log_ptr, fsp_flags);
> > +                        log_ptr += 4;
> > +                        mach_write_to_4(log_ptr, space);
> > +                        log_ptr += 4;
> > +                        mlog_close(&mtr, log_ptr);
> > +                }
> > +
> > +                mtr_commit(&mtr);
> > +                lsn_t end_lsn = mtr.end_lsn;
> > +                buf_flush_init_for_writing(page0, NULL, end_lsn);
> > +                flags = mach_read_from_4(FSP_HEADER_OFFSET +
> FSP_SPACE_FLAGS + page0);
> > +                ut_ad(flags == fsp_flags);
> > +
> > +                /* Flush dirty page to the storage */
> > +                ulint n_pages = 0;
> > +                ulint sum_pages = 0;
> > +                bool success = false;
> > +                do {
> > +                        success = buf_flush_list(ULINT_MAX, end_lsn,
> &n_pages);
> > +                        buf_flush_wait_batch_end(NULL, BUF_FLUSH_LIST);
> > +                        sum_pages += n_pages;
> > +                } while (!success);
> > +
> > +                fil_space_set_fsp_flags(space, fsp_flags);
> > +        }
> > +}
> > +
> > +/******************************************************************
> > +Parse a MLOG_FILE_WRITE_FSP_FLAGS log entry
> > +@return position on log buffer */
> > +UNIV_INTERN
> > +byte*
> > +fil_parse_write_fsp_flags(
> > +/*======================*/
> > +        byte*           ptr,    /*!< in: Log entry start */
> > +        byte*           end_ptr,/*!< in: Log entry end */
> > +        buf_block_t*    block)  /*!< in: buffer block */
> > +{
> > +        /* check that redo log entry is complete */
> > +        uint entry_size = 4 + 4; // size of flags + space_id
> > +
> > +        if (end_ptr - ptr < entry_size){
> > +                return NULL;
> > +        }
> > +
> > +        ulint flags = mach_read_from_4(ptr);
> > +        ptr += 4;
> > +        ulint space_id = mach_read_from_4(ptr);
> > +        ptr += 4;
> > +
> > +        ut_a(fsp_flags_is_valid(flags));
> > +
> > +        /* update fil_space memory cache with flags */
> > +        fil_space_set_fsp_flags(space_id, flags);
> > +
> > +        return ptr;
> > +}
> > +
> >
> /********************************************************************//**
> >  Tries to open a single-table tablespace and optionally checks that the
> >  space id in it is correct. If this does not succeed, print an error
> message
> > diff --git a/storage/innobase/fts/fts0fts.cc
> b/storage/innobase/fts/fts0fts.cc
> > index cc4a4f9..f81fd31 100644
> > --- a/storage/innobase/fts/fts0fts.cc
> > +++ b/storage/innobase/fts/fts0fts.cc
> > @@ -1981,7 +1982,13 @@ fts_create_one_index_table(
> >       dict_mem_table_add_col(new_table, heap, "ilist", DATA_BLOB,
> >                              4130048, 0);
> >
> > -     error = row_create_table_for_mysql(new_table, trx, false,
> FIL_SPACE_ENCRYPTION_DEFAULT, FIL_DEFAULT_ENCRYPTION_KEY);
> > +     /* Get default encryption key id if set */
> > +     if (new_table && new_table->table_options &&
>
> new_table cannot be NULL here
>
> > +             new_table->table_options->encryption_key_id == 0) {
> > +             new_table->table_options->encryption_key_id =
> innobase_get_default_encryption_key_id(trx);
>
> do you want to do it here? may be better do it in
> row_create_table_for_mysql?
>
> > +     }
> > +
> > +     error = row_create_table_for_mysql(new_table, trx, false);
> >
> >       if (error != DB_SUCCESS) {
> >               trx->error_state = error;
> > diff --git a/storage/innobase/handler/ha_innodb.cc
> b/storage/innobase/handler/ha_innodb.cc
> > index 527c5be..3d5aaee 100644
> > --- a/storage/innobase/handler/ha_innodb.cc
> > +++ b/storage/innobase/handler/ha_innodb.cc
> > @@ -5459,6 +5460,43 @@ ha_innobase::innobase_initialize_autoinc()
> >  }
> >
> >  /*****************************************************************//**
> > +Set up the table options structure based on frm. */
> > +UNIV_INTERN
> > +void
> > +ha_innobase::set_table_options(
> > +/*===========================*/
> > +     THD*            thd,            /*!< in: thd */
> > +     TABLE*          table,          /*!< in: table */
> > +     dict_tableoptions_t* options)   /*!< in: table options */
> > +{
> > +     ha_table_option_struct* moptions = table->s->option_struct;
> > +
> > +     options->page_compressed = moptions->page_compressed;
> > +     options->page_compression_level = moptions->page_compression_level;
> > +     options->atomic_writes = (atomic_writes_t)moptions->atomic_writes;
> > +     options->encryption = (fil_encryption_t)moptions->encryption;
> > +     options->encryption_key_id = moptions->encryption_key_id;
> > +
> > +     if (options->encryption_key_id == 0) {
> > +             options->encryption_key_id = THDVAR(thd,
> default_encryption_key_id);
> > +     }
>
> No, this is not how it works. The *server* sets the encryption_key_id
> based on the default_encryption_key_id. The engine does not
> need to bother.
>
> > +
> > +     if (options->page_compression_level == 0) {
> > +             options->page_compression_level = page_zip_level;
> > +     }
>
> same here
>
> > +
> > +     /* Table options should be stored if they are not same as
> > +     defaults */
> > +     if (moptions->page_compressed ||
> > +         (options->atomic_writes == ATOMIC_WRITES_ON ||
> > +          options->atomic_writes == ATOMIC_WRITES_OFF) ||
> > +         (options->encryption == FIL_SPACE_ENCRYPTION_OFF ||
> > +          options->encryption == FIL_SPACE_ENCRYPTION_ON)) {
> > +             options->need_stored = true;
> > +     }
> > +}
> > +
> > +/*****************************************************************//**
> >  Creates and opens a handle to a table which already exists in an InnoDB
> >  database.
> >  @return      1 if error, 0 if success */
> > @@ -11603,6 +11647,10 @@ ha_innobase::check_table_options(
> >                       return "ENCRYPTION_KEY_ID";
> >
> >               }
> > +
> > +             table_options->need_stored = true;
> > +
> > +             table_options->need_stored = true;
>
> typo
>
> >       }
> >
> >       /* Check atomic writes requirements */
> > @@ -20422,3 +20483,25 @@ ib_push_warning(
> >       my_free(buf);
> >       va_end(args);
> >  }
> > +
> >
> +/********************************************************************//**
> > +Helper function to get default_encryption_key_id from THD
> > +(trx->mysql_thd).
> > +@return default_encryption_key_id from THD or
> > +FIL_DEFAULT_ENCRYPTION_KEY */
> > +UNIV_INTERN
> > +ulint
> > +innobase_get_default_encryption_key_id(
>
> See comment above. You _probably_ don't need this method
>
> > +/*===================================*/
> > +     trx_t*          trx)    /*! in: trx */
> > +{
> > +     ulint key_id = FIL_DEFAULT_ENCRYPTION_KEY;
> > +
> > +     if (trx && trx->mysql_thd) {
> > +             THD *thd = (THD *)trx->mysql_thd;
> > +
> > +             key_id = THDVAR(thd, default_encryption_key_id);
> > +     }
> > +
> > +     return (key_id);
> > +}
> > diff --git a/storage/innobase/handler/i_s.cc
> b/storage/innobase/handler/i_s.cc
> > index ef69e7d..75baa9d 100644
> > --- a/storage/innobase/handler/i_s.cc
> > +++ b/storage/innobase/handler/i_s.cc
> > @@ -9130,3 +9129,230 @@ UNIV_INTERN struct st_maria_plugin
> i_s_innodb_sys_semaphore_waits =
> >       STRUCT_FLD(version_info, INNODB_VERSION_STR),
> >          STRUCT_FLD(maturity, MariaDB_PLUGIN_MATURITY_BETA),
> >  };
> > +
> > +/**  SYS_TABLE_OPTIONS
> ************************************************/
> > +/* Fields of the dynamic table
> INFORMATION_SCHEMA.INNODB_SYS_TABLE_OPTIONS */
> > +static ST_FIELD_INFO innodb_sys_tableoptions_fields_info[] =
>
> Good idea!
>
> > +{
> > +     // SYS_TABLE_OPTIONS_TABLE_ID   0
> > +     {STRUCT_FLD(field_name,         "TABLE_ID"),
> > +      STRUCT_FLD(field_length,       MY_INT64_NUM_DECIMAL_DIGITS),
> > +      STRUCT_FLD(field_type,         MYSQL_TYPE_LONGLONG),
> > +      STRUCT_FLD(value,              0),
> > +      STRUCT_FLD(field_flags,        MY_I_S_UNSIGNED),
> > +      STRUCT_FLD(old_name,           ""),
> > +      STRUCT_FLD(open_method,        SKIP_OPEN_TABLE)},
> > +
> > +     // SYS_TABLE_OPTIONS_PAGE_COMPRESSED 1
> > +     {STRUCT_FLD(field_name,         "PAGE_COMPRESSED"),
> > +      STRUCT_FLD(field_length,       7),
> > +      STRUCT_FLD(field_type,         MYSQL_TYPE_STRING),
> > +      STRUCT_FLD(value,              0),
> > +      STRUCT_FLD(field_flags,        0),
> > +      STRUCT_FLD(old_name,           ""),
> > +      STRUCT_FLD(open_method,        SKIP_OPEN_TABLE)},
> > +
> > +     // SYS_TABLE_OPTIONS_PAGE_COMPRESSION_LEVEL 2
> > +     {STRUCT_FLD(field_name,         "PAGE_COMPRESSION_LEVEL"),
> > +      STRUCT_FLD(field_length,       MY_INT32_NUM_DECIMAL_DIGITS),
> > +      STRUCT_FLD(field_type,         MYSQL_TYPE_LONG),
> > +      STRUCT_FLD(value,              0),
> > +      STRUCT_FLD(field_flags,        MY_I_S_UNSIGNED),
> > +      STRUCT_FLD(old_name,           ""),
> > +      STRUCT_FLD(open_method,        SKIP_OPEN_TABLE)},
> > +
> > +     // SYS_TABLE_OPTIONS_ATOMIC_WRITES 3
> > +     {STRUCT_FLD(field_name,         "ATOMIC_WRITES"),
> > +      STRUCT_FLD(field_length,       9),
> > +      STRUCT_FLD(field_type,         MYSQL_TYPE_STRING),
> > +      STRUCT_FLD(value,              0),
> > +      STRUCT_FLD(field_flags,        0),
> > +      STRUCT_FLD(old_name,           ""),
> > +      STRUCT_FLD(open_method,        SKIP_OPEN_TABLE)},
> > +
> > +     // SYS_TABLE_OPTIONS_ENCRYPTED 4
> > +     {STRUCT_FLD(field_name,         "ENCRYPTED"),
> > +      STRUCT_FLD(field_length,       9),
> > +      STRUCT_FLD(field_type,         MYSQL_TYPE_STRING),
> > +      STRUCT_FLD(value,              0),
> > +      STRUCT_FLD(field_flags,        0),
> > +      STRUCT_FLD(old_name,           ""),
> > +      STRUCT_FLD(open_method,        SKIP_OPEN_TABLE)},
> > +
> > +     // SYS_TABLE_OPTIONS_ENCRYPTION_KEY_ID 5
> > +     {STRUCT_FLD(field_name,         "ENCRYPTION_KEY_ID"),
> > +      STRUCT_FLD(field_length,       MY_INT32_NUM_DECIMAL_DIGITS),
> > +      STRUCT_FLD(field_type,         MYSQL_TYPE_LONG),
> > +      STRUCT_FLD(value,              0),
> > +      STRUCT_FLD(field_flags,        MY_I_S_UNSIGNED),
> > +      STRUCT_FLD(old_name,           ""),
> > +      STRUCT_FLD(open_method,        SKIP_OPEN_TABLE)},
> > +
> > +     END_OF_ST_FIELD_INFO
> > +};
> > +
> > +/*******************************************************************//**
> > +Function to go through each record in SYS_TABLE_OPTIONS table, and fill
> the
> > +information_schema.innodb_sys_table_options table with related table
> information
> > +@return 0 on success */
> > +static
> > +int
> > +i_s_dict_fill_sys_table_options(
> > +/*============================*/
> > +     THD*            thd,    /*!< in: thread */
> > +     TABLE_LIST*     tables, /*!< in/out: tables to fill */
> > +     Item*           )       /*!< in: condition (not used) */
> > +{
> > +     btr_pcur_t      pcur;
> > +     const rec_t*    rec;
> > +     mem_heap_t*     heap;
> > +     mtr_t           mtr;
> > +
> > +     DBUG_ENTER("i_s_sys_table_options_fill_table");
> > +     RETURN_IF_INNODB_NOT_STARTED(tables->schema_table_name);
> > +
> > +     /* deny access to user without PROCESS_ACL privilege */
> > +     if (check_global_access(thd, PROCESS_ACL, true)) {
> > +             DBUG_RETURN(0);
> > +     }
> > +
> > +     heap = mem_heap_create(1000);
> > +     mutex_enter(&(dict_sys->mutex));
> > +     mtr_start(&mtr);
> > +
> > +     rec = dict_startscan_system(&pcur, &mtr, SYS_TABLE_OPTIONS);
> > +
> > +     while (rec) {
> > +             const char*     err_msg;
> > +             dict_tableoptions_t options;
> > +
> > +             /* Create and populate a dict_tableoptions_t structure with
> > +             information from SYS_TABLE_OPTIONS row */
> > +             memset(&options, 0, sizeof(dict_tableoptions_t));
> > +
> > +             err_msg = dict_process_sys_tableoptions(
> > +                     heap, rec, &options);
> > +
> > +             mtr_commit(&mtr);
> > +             mutex_exit(&dict_sys->mutex);
> > +
> > +             if (!err_msg) {
> > +                     Field**         fields = tables->table->field;
> > +
> > +
>  OK(fields[SYS_TABLE_OPTIONS_TABLE_ID]->store((longlong) options.table_id));
> > +
>  OK(fields[SYS_TABLE_OPTIONS_PAGE_COMPRESSION_LEVEL]->store((ulint)options.page_compression_level));
> > +
>  OK(fields[SYS_TABLE_OPTIONS_ENCRYPTION_KEY_ID]->store((ulint)options.encryption_key_id));
> > +
> > +                     if (options.page_compressed) {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_PAGE_COMPRESSED], "YES"));
> > +                     } else {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_PAGE_COMPRESSED], "NO"));
> > +                     }
> > +
> > +                     if (options.encryption ==
> FIL_SPACE_ENCRYPTION_DEFAULT) {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_ENCRYPTED], "DEFAULT"));
> > +                     } else if (options.encryption ==
> FIL_SPACE_ENCRYPTION_ON) {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_ENCRYPTED], "ON"));
> > +                     } else {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_ENCRYPTED], "OFF"));
> > +                     }
> > +
> > +                     if (options.atomic_writes ==
> ATOMIC_WRITES_DEFAULT) {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_ATOMIC_WRITES], "DEFAULT"));
> > +                     } else if (options.atomic_writes ==
> ATOMIC_WRITES_ON) {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_ATOMIC_WRITES], "ON"));
> > +                     } else {
> > +
>  OK(field_store_string(fields[SYS_TABLE_OPTIONS_ATOMIC_WRITES], "OFF"));
> > +                     }
> > +
> > +                     OK(schema_table_store_record(thd, tables->table));
> > +
> > +             } else {
> > +                     push_warning_printf(thd,
> Sql_condition::WARN_LEVEL_WARN,
> > +                                         ER_CANT_FIND_SYSTEM_REC, "%s",
> > +                                         err_msg);
> > +             }
> > +
> > +             mem_heap_empty(heap);
> > +
> > +             /* Get the next record */
> > +             mutex_enter(&dict_sys->mutex);
> > +             mtr_start(&mtr);
> > +             rec = dict_getnext_system(&pcur, &mtr);
> > +     }
> > +
> > +     mtr_commit(&mtr);
> > +     mutex_exit(&dict_sys->mutex);
> > +     mem_heap_free(heap);
> > +
> > +     DBUG_RETURN(0);
> > +}
> > +/*******************************************************************//**
> > +Bind the dynamic table INFORMATION_SCHEMA.INNODB_SYS_TABLE_OPTIONS
> > +@return 0 on success */
> > +static
> > +int
> > +innodb_sys_table_options_init(
> > +/*============================*/
> > +     void*   p)      /*!< in/out: table schema object */
> > +{
> > +     ST_SCHEMA_TABLE*        schema;
> > +
> > +     DBUG_ENTER("innodb_sys_table_options_init");
> > +
> > +     schema = (ST_SCHEMA_TABLE*) p;
> > +
> > +     schema->fields_info = innodb_sys_tableoptions_fields_info;
> > +     schema->fill_table = i_s_dict_fill_sys_table_options;
> > +
> > +     DBUG_RETURN(0);
> > +}
> > +
> > +UNIV_INTERN struct st_maria_plugin   i_s_innodb_sys_table_options =
> > +{
> > +     /* the plugin type (a MYSQL_XXX_PLUGIN value) */
> > +     /* int */
> > +     STRUCT_FLD(type, MYSQL_INFORMATION_SCHEMA_PLUGIN),
> > +
> > +     /* pointer to type-specific plugin descriptor */
> > +     /* void* */
> > +     STRUCT_FLD(info, &i_s_info),
> > +
> > +     /* plugin name */
> > +     /* const char* */
> > +     STRUCT_FLD(name, "INNODB_SYS_TABLE_OPTIONS"),
> > +
> > +     /* plugin author (for SHOW PLUGINS) */
> > +     /* const char* */
> > +     STRUCT_FLD(author, maria_plugin_author),
> > +
> > +     /* general descriptive text (for SHOW PLUGINS) */
> > +     /* const char* */
> > +     STRUCT_FLD(descr, "InnoDB SYS_TABLE_OPTIONS"),
> > +
> > +     /* the plugin license (PLUGIN_LICENSE_XXX) */
> > +     /* int */
> > +     STRUCT_FLD(license, PLUGIN_LICENSE_GPL),
> > +
> > +     /* the function to invoke when plugin is loaded */
> > +     /* int (*)(void*); */
> > +     STRUCT_FLD(init, innodb_sys_table_options_init),
> > +
> > +     /* the function to invoke when plugin is unloaded */
> > +     /* int (*)(void*); */
> > +     STRUCT_FLD(deinit, i_s_common_deinit),
> > +
> > +     /* plugin version (for SHOW PLUGINS) */
> > +     /* unsigned int */
> > +     STRUCT_FLD(version, INNODB_VERSION_SHORT),
> > +
> > +     /* struct st_mysql_show_var* */
> > +     STRUCT_FLD(status_vars, NULL),
> > +
> > +     /* struct st_mysql_sys_var** */
> > +     STRUCT_FLD(system_vars, NULL),
> > +
> > +        /* Maria extension */
> > +     STRUCT_FLD(version_info, INNODB_VERSION_STR),
> > +        STRUCT_FLD(maturity, MariaDB_PLUGIN_MATURITY_BETA),
> > +};
> > diff --git a/storage/innobase/include/dict0dict.h
> b/storage/innobase/include/dict0dict.h
> > index b15d364..fa65884 100644
> > --- a/storage/innobase/include/dict0dict.h
> > +++ b/storage/innobase/include/dict0dict.h
> > @@ -944,15 +948,8 @@ dict_tf_set(
> >       ulint*          flags,          /*!< in/out: table */
> >       rec_format_t    format,         /*!< in: file format */
> >       ulint           zip_ssize,      /*!< in: zip shift size */
> > -     bool            remote_path,    /*!< in: table uses DATA DIRECTORY
> > +     bool            remote_path);   /*!< in: table uses DATA DIRECTORY
> >                                       */
> > -        bool         page_compressed,/*!< in: table uses page compressed
> > -                                     pages */
> > -     ulint           page_compression_level, /*!< in: table page
> compression
> > -                                              level */
> > -     ulint           atomic_writes)  /*!< in: table atomic
> > -                                     writes option value*/
> > -     __attribute__((nonnull));
>
> Why do you remove nonnull attributes? They help gcc to optimize the code
>
> >
> /********************************************************************//**
> >  Convert a 32 bit integer table flags to the 32 bit integer that is
> >  written into the tablespace header at the offset FSP_SPACE_FLAGS and is
> > diff --git a/storage/innobase/include/dict0dict.ic
> b/storage/innobase/include/dict0dict.ic
> > index a3a3446..3580081 100644
> > --- a/storage/innobase/include/dict0dict.ic
> > +++ b/storage/innobase/include/dict0dict.ic
> > @@ -685,11 +629,7 @@ dict_sys_tables_type_validate(
> >       ulint   zip_ssize = DICT_TF_GET_ZIP_SSIZE(type);
> >       ulint   atomic_blobs = DICT_TF_HAS_ATOMIC_BLOBS(type);
> >       ulint   unused = DICT_TF_GET_UNUSED(type);
> > -     ulint   page_compression = DICT_TF_GET_PAGE_COMPRESSION(type);
> > -     ulint   page_compression_level =
> DICT_TF_GET_PAGE_COMPRESSION_LEVEL(type);
>
> By the way, what about page compression in MySQL InnoDB?
> where is that stored? will you support those tables?
>
> > -     ulint   atomic_writes = DICT_TF_GET_ATOMIC_WRITES(type);
> > -
> > -     ut_a(atomic_writes <= ATOMIC_WRITES_OFF);
> > +     ulint   unused_mariadb = DICT_TF_GET_UNUSED_MARIADB(type);
> >
> >       /* The low order bit of SYS_TABLES.TYPE is always set to 1.
> >       If the format is UNIV_FORMAT_B or higher, this field is the same
> > diff --git a/storage/innobase/include/fil0fil.h
> b/storage/innobase/include/fil0fil.h
> > index 7fe0cc8..bc7f317 100644
> > --- a/storage/innobase/include/fil0fil.h
> > +++ b/storage/innobase/include/fil0fil.h
> > @@ -39,7 +39,7 @@ Created 10/25/1995 Heikki Tuuri
> >  #include "ibuf0types.h"
> >  #include "log0log.h"
> >  #endif /* !UNIV_HOTBACKUP */
> > -
> > +#include "my_crypt.h"
>
> why?
>
> >  #include <list>
> >
> >  // Forward declaration
> > diff --git a/storage/innobase/srv/srv0start.cc
> b/storage/innobase/srv/srv0start.cc
> > index d7b37b5..29ce93d 100644
> > --- a/storage/innobase/srv/srv0start.cc
> > +++ b/storage/innobase/srv/srv0start.cc
> > @@ -2537,7 +2543,16 @@ innobase_start_or_create_for_mysql(void)
> >
> >                       sys_datafiles_created = true;
> >
> > -                     /* This function assumes that SYS_DATAFILES exists
> */
> > +                     err = dict_create_or_check_sys_table_options();
> > +
> > +                     if (err != DB_SUCCESS) {
> > +                             return(err);
> > +                     }
> > +
> > +                     sys_table_options_created = true;
> > +
> > +                     /* This function assumes that SYS_DATAFILES
> > +                     and SYS_TABLE_OPTIONS exists */
>
> "exist"
>
> >
>  dict_check_tablespaces_and_store_max_id(dict_check);
> >               }
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and security@xxxxxxxxxxx
>

Follow ups

References