← Back to team overview

maria-developers team mailing list archive

Re: [Commits] c32bc1afe82: MDEV-11254: innodb-use-trim has no effect in 10.2

 

On Tue, Jan 10, 2017 at 11:49 AM, jan <jan.lindstrom@xxxxxxxxxxx> wrote:
> revision-id: c32bc1afe820aa90c2e2666a77ce2bb43019df3d (mariadb-10.2.3-43-gc32bc1afe82)
> parent(s): 3d46768da2a784ddc9c341d1fb03468525bd38f1
> author: Jan Lindström
> committer: Jan Lindström
> timestamp: 2017-01-10 11:36:42 +0200
> message:
>
> MDEV-11254: innodb-use-trim has no effect in 10.2
>
> In MySQL 5.7 InnoDB os0file.[h|cc] was refactored making
> trim or punch hole operation not work with MariaDB.
>
> buf0buf.cc: buf_page_is_corrupted() remember that compressed
> pages do not contain checksum and FIL_TAIL header.

What is a FIL_TAIL header? I suppose it is referring to the 8 bytes
FIL_PAGE_DATA_END at the end of the page frame?

And which compressed pages? ROW_FORMAT=COMPRESSED pages do contain a
page footer.

> buf0dblwr.cc: Set up the page size and write size to IORequest
> and set up punch_hole() if actual write is not physical size.

Actual write to where? Apparently (looking at the code), it is to the
user data file, not to the doublewrite buffer. The commit message
should mention this important detail.
(I think we should not punch holes to the doublewrite buffer area
(pages 64..191 in the system tablespace).)

> buf0rea.cc, fil0fil.cc, fil0fil.h: Remove unnecessary parameter
> write_size from os_file_write, fil_io, os_aio functions.

I see that this parameter does not exist in MySQL 5.7. OK to remove.

> os0file.h, os0file.cc: Implement new version of os_file_io_complete
> to do punch_hole operation and punch hole operation should
> be based on file system block size determined using fstat or
> DeviceIOControl.

Please also mention the InnoDB wrapper function
os_file_get_block_size(), which was added.
But, where is that function being called? Did you forget to adjust
fil_node_open_file()?

Please submit a revised patch for review.
Some detailed comments follow.

> +set autocommit=0;
> +call innodb_insert_proc(16000);
> +commit;
> +set autocommit=1;

Do we really have to insert this many records? What would be the
minimum test case? Did you test with innodb_page_size=4k or 8k?

Consider removing the two SET statements and adding a BEGIN before the
CALL instead. That would be more standard SQL.

> +--disable_query_log
> +--disable_warnings
> +EVAL SET GLOBAL innodb_compression_algorithm = $innodb_compression_algorithm_orig;
> +EVAL SET GLOBAL innodb_file_per_table = $innodb_file_per_table_orig;
> +EVAL SET GLOBAL innodb_file_format = $innodb_file_format_orig;
> +--enable_warnings
> +--enable_query_log

There is no need to assign innodb_file_per_table or innodb_file_format
in 10.2. The defaults are just fine.
If you keep adding these statements in 10.2, merging to 10.3 will
require extra work after 10.3 removes these deprecated parameters
(copying WL#7704 from MySQL 8.0.0).

> +       bool            page_encrypted = false;
> +       bool            page_compressed = false;
> +       ulint           page_type = mach_read_from_2(read_buf+FIL_PAGE_TYPE);
> +
> +       page_compressed = (page_type == FIL_PAGE_PAGE_COMPRESSED_ENCRYPTED
> +                          || page_type == FIL_PAGE_PAGE_COMPRESSED);

Please remove the initialization of page_compressed to false. Could we
fuse page_compressed into page_encrypted to simplify those conditions
that currently check for page_compressed&&page_encrypted?

> @@ -978,9 +978,13 @@ buf_dblwr_write_block_to_datafile(
>                 ut_a(buf_block_get_state(block) == BUF_BLOCK_FILE_PAGE);
>                 buf_dblwr_check_page_lsn(block->frame);
>
> +               if (bpage->real_size != bpage->size.physical()) {
> +                       request.set_punch_hole();
> +               }
> +
>                 fil_io(request,
> -                      sync, bpage->id, bpage->size, 0, bpage->size.physical(),
> -                      frame, block, (ulint *)&bpage->write_size);
> +                       sync, bpage->id, bpage->size, 0, bpage->real_size,
> +                       frame, block);
>         }
>  }

I understand that this is the write to the final location. Sure, we
should punch the hole there.

> diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
> index b82c4db18ad..de534cc6e89 100644
> --- a/storage/innobase/buf/buf0flu.cc
> +++ b/storage/innobase/buf/buf0flu.cc
> @@ -1093,11 +1093,15 @@ buf_flush_write_block_low(
>
>                 ulint   type = IORequest::WRITE | IORequest::DO_NOT_WAKE;
>
> -               IORequest       request(type);
> +               IORequest       request(type, bpage->size, (ulint *)&bpage->write_size);
> +
> +               if (bpage->real_size != bpage->size.physical()) {
> +                       request.set_punch_hole();
> +               }

Could we pass bpage as a parameter to the IORequest::IORequest(), and
also include the set_punch_hole() call there?

I think that buf_page_t is already too big, and we should not bloat it
with a 64-bit field for write_size. The reason that I separated
buf_page_t from buf_block_t was that for ROW_FORMAT=COMPRESSED, the
block descriptors would use less space.

Do we even need write_size in buf_page_t? I think it would better
belong to buf_block_t. That is because ROW_FORMAT=COMPRESSED is never
used together with hole-punching.

> @@ -37,6 +37,7 @@ Created 10/21/1995 Heikki Tuuri
>  #define os0file_h
>
>  #include "univ.i"
> +#include "page0size.h"

#include "page0size.h" also does #include "univ.i"

>
>  #ifndef _WIN32
>  #include <dirent.h>
> @@ -554,6 +555,8 @@ class IORequest {
>         /** Default constructor */
>         IORequest()
>                 :
> +               m_page_size(univ_page_size),
> +               m_write_size(NULL),
>                 m_block_size(UNIV_SECTOR_SIZE),
>                 m_type(READ),
>                 m_compression()

We only have to add a single buf_page_t* const parameter. Maybe we
could also remove m_block_size?

> +       /**
> +       @param[in]      type            Request type, can be a value that is
> +                                       ORed from the above enum
> +       @param[in]      page_size       Page size
> +       @param[in]      write_size      Actual write size */
> +       IORequest(ulint type,
> +               const page_size_t& page_size,
> +               ulint*          write_size)
> +               :
> +               m_page_size(page_size),
> +               m_write_size(write_size),
> +               m_block_size(UNIV_SECTOR_SIZE),
> +               m_type(static_cast<uint16_t>(type)),
> +               m_compression()
> +       {
> +               if (is_log()) {
> +                       disable_compression();
> +               }
> +
> +               if (!is_punch_hole_supported()) {
> +                       clear_punch_hole();
> +               }
> +       }

I do not think we need this constructor.

> +       /** operator =
> +       @param[in]      from            Source to copy */
> +       IORequest& operator=(const IORequest& from)
> +       {
> +               m_page_size.copy_from(from.m_page_size);
> +               m_write_size = from.m_write_size;
> +               m_block_size = from.m_block_size;
> +               m_type = from.m_type;
> +               m_compression = from.m_compression;
> +               return (*this);
> +       }

What is wrong with the compiler-generated assignment operator?
If it complains that there is no page_size_t::operator=(), we can
easily let it be generated by the compiler.
What is this assignment needed for?

> +       /** Copy constructor.
> +       @param[in]      from            source. */
> +       IORequest(const IORequest& from)
> +               : m_page_size(from.m_page_size.physical(),
> +                       from.m_page_size.logical(),
> +                       from.m_page_size.is_compressed()),
> +               m_write_size(from.m_write_size),
> +               m_block_size(from.m_block_size),
> +               m_type(from.m_type),
> +               m_compression(from.m_compression)
> +       {
> +       }
> +

What is this needed for? Why cannot the compiler generate this? As far
as I can tell, this is just a bitwise copy.

> @@ -688,6 +742,22 @@ class IORequest {
>                 m_block_size = static_cast<uint32_t>(block_size);
>         }
>
> +       /** Set the write size variable for later use.
> +       @param[in] write_size           Write size variable */
> +       void write_size(ulint* write_size)
> +       {
> +               m_write_size = write_size;
> +       }
> +
> +       /** Set the actual write size done in IO
> +       @param[in] write_size           Write size to set */
> +       void write_size(ulint write_size)
> +       {
> +               if (m_write_size) {
> +                       *m_write_size = write_size;
> +               }
> +       }
> +

The two methods do two completely different things, and at the very
least they should have distinctive names. Do we need the first method
at all?

> +/***********************************************************************//**
> +Try to get number of bytes per sector from file system.
> +@return        file block size */
> +UNIV_INTERN
> +ulint
> +os_file_get_block_size(
> +/*===================*/
> +       os_file_t       file,   /*!< in: handle to a file */
> +       const char*     name);  /*!< in: file name */

Do not use /*****/ or /*====*/ or /*!< comments in new code.

This function appears to be unused.

> -/** Decompress after a read and punch a hole in the file if it was a write
> +/** Punch a hole in the file if it was a write
>  @param[in]     type            IO context
>  @param[in]     fh              Open file handle
> -@param[in,out] buf             Buffer to transform
> -@param[in,out] scratch         Scratch area for read decompression
> -@param[in]     src_len         Length of the buffer before compression
> -@param[in]     len             Compressed buffer length for write and size
> -                               of buf len for read
> +@param[in]     len             Compressed buffer length for write
>  @return DB_SUCCESS or error code */
>  static
>  dberr_t
>  os_file_io_complete(
> -       const IORequest&type,
> +       IORequest&      type,
>         os_file_t       fh,
> -       byte*           buf,
> -       byte*           scratch,
> -       ulint           src_len,
>         ulint           offset,
>         ulint           len);

Why is the const qualifier removed?

> @@ -832,6 +825,69 @@ os_aio_simulated_handler(
>         void**          m2,
>         IORequest*      type);
>
> +/***********************************************************************//**
> +Try to get number of bytes per sector from file system.
> +@return        file block size */
> +UNIV_INTERN
> +ulint
> +os_file_get_block_size(
> +/*===================*/
> +       os_file_t       file,   /*!< in: handle to a file */
> +       const char*     name)   /*!< in: file name */
> +{
> +       ulint           fblock_size = 512;
> +
> +#if defined(UNIV_LINUX)
> +       struct stat local_stat;
> +       int             err;
> +
> +       err = fstat((int)file, &local_stat);
> +
> +       if (err != 0) {
> +               ib::warn() << "fstat() failed on file " << name
> +                          << " error "<< errno << " : " << strerror(errno);
> +               os_file_handle_error_no_exit(name, "fstat()", FALSE);
> +       } else {
> +               fblock_size = local_stat.st_blksize;
> +       }
> +#endif /* UNIV_LINUX */

Why only Linux? fstat() is already in POSIX.1-2001.

> @@ -953,8 +1009,8 @@ class AIOHandler {
>                 ut_a(slot->type.is_read() || !slot->skip_punch_hole);
>
>                 return(os_file_io_complete(
> -                               slot->type, slot->file, slot->buf,
> -                               slot->compressed_page, slot->original_len,
> +                               const_cast<IORequest&>(slot->type),
> +                               slot->file,
>                                 static_cast<ulint>(slot->offset),
>                                 slot->len));
>         }

Please never use const_cast in a context where the object can be modified.
It is unclear to me why we would need to lose the constness of IORequest here.
(The members of IORequest do not need changes. Writes through pointers
in IORequest are allowed even if IORequest itself is const.)

> @@ -1659,85 +1715,78 @@ os_file_read_string(
> +               ulint amount = trim_len / type.block_size();
> +               switch(type.block_size()) {
> +               case 512:
> +                       srv_stats.page_compression_trim_sect512.add(amount);
> +                       break;

Can we store the log2 of the block_size somewhere and use an array of
compression attributes, to avoid so much repeated code?

> @@ -4252,6 +4300,7 @@ os_file_punch_hole_win32(
>                 fh, FSCTL_SET_ZERO_DATA, &punch, sizeof(punch),
>                 NULL, 0, &temp, NULL);
>
> +
>         return(!result ? DB_IO_NO_PUNCH_HOLE : DB_SUCCESS);
>  }
>

One empty line is enough.

 Marko
-- 
DON’T MISS
M|17
April 11 - 12, 2017
The Conrad Hotel
New York City
https://m17.mariadb.com/

Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation