← Back to team overview

maria-developers team mailing list archive

Re: [Commits] 8140454d2b7: MDEV-11254: innodb-use-trim has no effect in 10.2

 

This is somewhat better, but I think we should still clean up the code
a little more.

On Mon, Jan 23, 2017 at 1:32 PM, jan <jan.lindstrom@xxxxxxxxxxx> wrote:
> fil0fil.cc: Remove unneeded status call and add test is
> spare files and punch hole supported by file system where
> tablespace is created. Add call to get file system
> block size. Used file node is added to IORequest. Added
> functions to check is punch hole supported and setting
> punch hole.

s/is spare files/if sparse files/
s/where/when/

> dberr.h: Add error code if punch hole operation fails.

I think that it would be better to explicitly mention the error code
here. The name of the code matters more than the file name where it is
declared.

> os0file.h: Remove unneeded m_block size. Add m_bpage
> to know actual size of the block and m_fil_node
> to know tablespace file system block size and does it
> support punch hole.

Mention the class or struct name too. There could be multiple classes
declared in the file os0file.h.

> os0file.cc: Add function to do punch_hole operation,
> get the file system block size and determine
> does file system support sparse files (for punch hole).

What is the name of the added function?

> +--disable_warnings
> +let $innodb_compression_algorithm_orig=`SELECT @@innodb_compression_algorithm`;
> +let $innodb_file_format_orig = `SELECT @@innodb_file_format`;
> +let $innodb_file_per_table_orig = `SELECT @@innodb_file_per_table`;
> +
> +SET GLOBAL innodb_file_format = `Barracuda`;
> +SET GLOBAL innodb_file_per_table = ON;
> +--enable_warnings
> +--enable_query_log

In 10.2, do not add unnecessary references to the deprecated variable
innodb_file_format, and do not bother to mess with
innodb_file_per_table. The defaults are fine as is.

> +let $wait_condition= SELECT variable_value > 0 FROM information_schema.global_status WHERE variable_name = 'innodb_num_pages_page_compressed';
> +--source include/wait_condition.inc
> +
> +let $wait_condition= SELECT variable_value > 0 FROM information_schema.global_status WHERE variable_name = 'innodb_num_page_compressed_trim_op';
> +--source include/wait_condition.inc
> +
> +SELECT VARIABLE_VALUE > 0 FROM information_schema.global_status where variable_name = 'INNODB_NUM_PAGES_PAGE_COMPRESSED';
> +SELECT VARIABLE_VALUE > 0 FROM information_schema.global_status where variable_name = 'INNODB_NUM_PAGE_COMPRESSED_TRIM_OP';

Can we wait for a single condition here, using AND in the WHERE
condition? I do not think that we need the subsequent SELECT
statements.

> +/**
> +Should we punch hole to deallocate unused portion of the page.
> +@param[in]     bpage           Page control block
> +@return true if punch hole should be used, false if not */
> +bool
> +buf_page_should_punch_hole(
> +       const buf_page_t* bpage)
> +{
> +       return (bpage && bpage->real_size != bpage->size.physical());
> +}
> +
> +/**
> +Calculate the length of trim (punch_hole) operation.
> +@param[in]     bpage           Page control block
> +@param[in]     write_length    Write length
> +@return length of the trim or zero. */
> +ulint
> +buf_page_get_trim_length(
> +       const buf_page_t*       bpage,
> +       ulint                   write_length)
> +{
> +       return (bpage ? bpage->size.physical() - write_length : 0);
> +}

Please declare these as methods of buf_page_t, and move the NULL check
to the caller.

> @@ -912,7 +912,7 @@ buf_dblwr_write_block_to_datafile(
>                 type |= IORequest::DO_NOT_WAKE;
>         }
>
> -       IORequest       request(type);
> +       IORequest       request(type, bpage);
>
>         /* We request frame here to get correct buffer in case of
>         encryption and/or page compression */
> @@ -924,7 +924,7 @@ buf_dblwr_write_block_to_datafile(
>                 fil_io(request, sync, bpage->id, bpage->size, 0,
>                        bpage->size.physical(),
>                        (void*) frame,
> -                      (void*) bpage, NULL);
> +                      (void*) bpage);
>         } else {
>                 ut_ad(!bpage->size.is_compressed());
>

It feels redundant that we are passing bpage both in the IORequest
request and directly to the fil_io() call. One parameter should be
enough. And there should be no need for the parameters that are
derived directly from bpage.

Can we have a separate fil_io() call for the redo log, and a separate
one for buffer pool blocks?

> +/**
> +Get should we punch hole to tablespace.
> +@param[in]     node            File node
> +@return true, if punch hole should be tried, false if not. */
> +bool
> +fil_node_should_punch_hole(
> +       const fil_node_t*       node)
> +{
> +       return (node && node->space && node->space->punch_hole);
> +}

Who would call this with node=NULL? And when would node->space ever be
NULL for a valid fil_node_t?

> +/**
> +Set punch hole to tablespace to given value.
> +@param[in]     node            File node
> +@param[in]     val             value to be set. */
> +void
> +fil_space_set_punch_hole(
> +       fil_node_t*             node,
> +       bool                    val)
> +{
> +       if (node && node->space) {
> +               node->space->punch_hole = val;
> +       }
> +}

Also here, I would remove the unnecessary if condition.

> diff --git a/storage/innobase/include/os0api.h b/storage/innobase/include/os0api.h
> new file mode 100644
> index 00000000000..ea2a113bdec
> --- /dev/null
> +++ b/storage/innobase/include/os0api.h

Instead of creating a new file, can we please move these declarations
into an existing file, such as fsp0types.h? The file name prefix is
misleading, because there is nothing operating system dependent about
the contents.

> diff --git a/storage/innobase/include/os0file.h b/storage/innobase/include/os0file.h
> index 57ee015dfdd..cd016a46972 100644
> --- a/storage/innobase/include/os0file.h
> +++ b/storage/innobase/include/os0file.h
> @@ -36,7 +36,8 @@ Created 10/21/1995 Heikki Tuuri
>  #ifndef os0file_h
>  #define os0file_h
>
> -#include "univ.i"
> +#include "page0size.h"
> +#include "os0api.h"
>
>  #ifndef _WIN32
>  #include <dirent.h>
> @@ -46,8 +47,10 @@ Created 10/21/1995 Heikki Tuuri
>
>  /** File node of a tablespace or the log data space */
>  struct fil_node_t;
> +struct fil_space_t;
>
>  extern bool    os_has_said_disk_full;
> +extern my_bool srv_use_trim;
>
>  /** Number of pending read operations */
>  extern ulint   os_n_pending_reads;

Instead of duplicating declarations, please add #include directives.

> @@ -177,6 +180,8 @@ static const ulint OS_FILE_ERROR_MAX = 200;
>  #define IORequestLogRead       IORequest(IORequest::LOG | IORequest::READ)
>  #define IORequestLogWrite      IORequest(IORequest::LOG | IORequest::WRITE)
>
> +
> +
>  /**
>  The IO Context that is passed down to the low level IO code */
>  class IORequest {

Do not add empty lines.

> +       /** @return true if punch hole should be used */
> +       bool punch_hole() const
> +               MY_ATTRIBUTE((warn_unused_result))
> +       {
> +               return((m_type & PUNCH_HOLE) == PUNCH_HOLE);
> +       }
> +

Comparing against 0 should be more efficient.

> -       /** @return the block size to use for IO */
> -       ulint block_size() const
> -               MY_ATTRIBUTE((warn_unused_result))
> +       /** Set the pointer to file node for IO
> +       @param[in] node                 File node */
> +       void set_fil_node(fil_node_t* node)

Consider removing this method, and setting this in the constructor.
Why fil_node_t and not fil_space_t?

> +       /** @return true if punch hole is supported */
> +       static bool is_punch_hole_supported()
> +       {
> +
> +               /* In this debugging mode, we act as if punch hole is supported,
> +               and then skip any calls to actually punch a hole here.
> +               In this way, Transparent Page Compression is still being tested. */
> +               DBUG_EXECUTE_IF("ignore_punch_hole",
> +                       return(true);
> +               );

Please add a test that uses SET DEBUG_DBUG='+d,ignore_punch_hole'.

> +#if defined(HAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE) || defined(_WIN32)
> +               return(true);
> +#else
> +               return(false);
> +#endif /* HAVE_FALLOC_PUNCH_HOLE_AND_KEEP_SIZE || _WIN32 */
> +       }

This looks wrong design to me. IORequest::is_punch_hole_supported()
should depend on the fil_node_t and the actual file system
capabilities.

> +       void space_no_punch_hole(void) {
> +               fil_space_set_punch_hole(m_fil_node, false);
> +       }

I would prefer more straight code, such as m_fil_node->punch_hole = false;
or m_fil_node->clear_punch_hole();

> +/** Check if the file system supports sparse files.
> +
> +Warning: On POSIX systems we try and punch a hole from offset 0 to
> +the system configured page size. This should only be called on an empty
> +file.
> +
> +Note: On Windows we use the name and on Unices we use the file handle.
> +
> +@param[in]     name            File name
> +@param[in]     fh              File handle for the file - if opened
> +@return true if the file system supports sparse files */
> +bool
> +os_is_sparse_file_supported(
> +       const char*     path,
> +       os_file_t       fh)
> +       MY_ATTRIBUTE((warn_unused_result));

Is it possible to use the file handle on Windows? Or can we define two
different functions, one for Windows and another for POSIX?

> +/** Free storage space associated with a section of the file.
> +@param[in]     fh              Open file handle
> +@param[in]     off             Starting offset (SEEK_SET)
> +@param[in]     len             Size of the hole
> +@return DB_SUCCESS or error code */
> +dberr_t
> +os_file_punch_hole(
> +       IORequest&      type,
> +       os_file_t       fh,
> +       os_offset_t     off,
> +       os_offset_t     len)
> +       MY_ATTRIBUTE((warn_unused_result));
> +
> +/** Free storage space associated with a section of the file.
> +@param[in]     fh              Open file handle
> +@param[in]     off             Starting offset (SEEK_SET)
> +@param[in]     len             Size of the hole
> +@return DB_SUCCESS or error code */
> +dberr_t
> +os_file_punch_hole(
> +       os_file_t       fh,
> +       os_offset_t     off,
> +       os_offset_t     len)
> +       MY_ATTRIBUTE((warn_unused_result));

The parameter 'type' of the first function is not defined. Why do we
need two very similar-looking functions? Could one of them be defined
inline as a wrapper of the other?

> @@ -156,9 +157,6 @@ class page_size_t {
>
>  private:
>
> -       /* Disable implicit copying. */
> -       void operator=(const page_size_t&);
> -
>         /* For non compressed tablespaces, physical page size is equal to
>         the logical page size and the data is stored in buf_page_t::frame
>         (and is also always equal to univ_page_size (--innodb-page-size=)).

Also reimplement page_size_t::copy_from() as { *this = src; }

> @@ -737,6 +740,19 @@ os_file_handle_error_no_exit(
>         const char*     operation,
>         bool            silent);
>
> +/** Punch a hole in the file if it was a write
> +@param[in]     type            IO context
> +@param[in]     fh              Open file handle
> +@param[in]     len             Compressed buffer length for write
> +@return DB_SUCCESS or error code */
> +static
> +dberr_t
> +os_file_io_complete(
> +       IORequest&      type,
> +       os_file_t       fh,
> +       ulint           offset,
> +       ulint           len);

Can this be made a method of IORequest?

> +       if (err != 0) {
> +               ib::warn() << "fstat() failed on file " << name
> +                          << " error "<< errno << " : " << strerror(errno);
> +               os_file_handle_error_no_exit(name, "fstat()", FALSE);

Remove the ib::warn(). One error report should be enough, just like we
would do it on Windows.

> +       ib::warn()
> +               << "fallocate(" << fh
> +               <<", FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, "
> +               << off << ", " << len << ") returned errno: "
> +               <<  errno;

Do we need this detailed reporting? What use is the numeric value of
the file handle to the user? A file name would be much more useful.
And ideally, this error should be pushed to the client.
> +                       /* Deallocate unused blocks from file system.
> +                       This is newer done to page 0 or to log files.*/

s/newer/never/. However, during .ibd file creation, we do punch a hole
to page 0.

> +                       if (offset > 0
> +                           && !type.is_log()
> +                           && type.is_write()
> +                           && type.punch_hole()) {
> +                               *err = os_file_io_complete(
> +                                       const_cast<IORequest&>(type),
> +                                       file,
> +                                       static_cast<ulint>(offset),
> +                                       n);

Remove the const_cast, and properly change all IORequest parameters to
non-const.

> @@ -4680,7 +4913,7 @@ os_file_pwrite(
>         (void) my_atomic_addlint(&os_n_pending_writes, 1);
>         MONITOR_ATOMIC_INC(MONITOR_OS_PENDING_WRITES);
>
> -       ssize_t n_bytes = os_file_io(type, file, const_cast<void*>(buf),
> +       ssize_t n_bytes = os_file_io(type, file, (void*)buf,
>                                      n, offset, err);

Avoid C-style casts at any cost. Can we split the os_file_io() into
os_file_read() and os_file_write() somehow? How much common code is
there actually between the two? Also, could that function be made a
method of IORequest?

> @@ -5195,6 +5429,31 @@ os_file_read_no_error_handling_func(
> +/** NOTE! Use the corresponding macro os_file_write(), not directly
> +Requests a synchronous write operation.
> +@param[in]     type            IO flags
> +@param[in]     file            handle to an open file
> +@param[out]    buf             buffer from which to write
> +@param[in]     offset          file offset from the start where to read
> +@param[in]     n               number of bytes to read, starting from offset
> +@return DB_SUCCESS if request was successful, false if fail */
> +dberr_t
> +os_file_write_func(
> +       IORequest&      type,
> +       const char*     name,
> +       os_file_t       file,
> +       const void*     buf,
> +       os_offset_t     offset,
> +       ulint           n)

Can this be made a method of IORequest?

> diff --git a/storage/innobase/srv/srv0srv.cc b/storage/innobase/srv/srv0srv.cc
> index 5d478e4529f..4b862cc6267 100644
> --- a/storage/innobase/srv/srv0srv.cc
> +++ b/storage/innobase/srv/srv0srv.cc
> @@ -184,7 +184,7 @@ my_bool     srv_use_native_aio = TRUE;
>  my_bool        srv_numa_interleave = FALSE;
>  /* If this flag is TRUE, then we will use fallocate(PUCH_HOLE)
>  to the pages */
> -UNIV_INTERN my_bool    srv_use_trim = FALSE;
> +UNIV_INTERN my_bool    srv_use_trim = TRUE;
>  /* If this flag is TRUE, then we disable doublewrite buffer */
>  UNIV_INTERN my_bool    srv_use_atomic_writes = FALSE;
>  /* If this flag IS TRUE, then we use this algorithm for page compressing the pages */

Omit the redundant initialization and let the variable live in the BSS
segment. Global parameters will be initialized via the table in
ha_innodb.cc.
-- 
DON’T MISS
M|17
April 11 - 12, 2017
The Conrad Hotel
New York City
https://m17.mariadb.com/

Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation