maria-discuss team mailing list archive

Re: fsync necessary for synchronous page flush?

 

I came across some slides by the Percona CEO.
https://www.percona.com/live/mysql-conference-2015/sites/default/files/slides/PLMCE2015-SSD-For-MySQL.pdf
On page 45, it says "Flash can avoid this with little cost due to internal
design". Does this mean we can safely disable the doublewrite buffer? Thanks.

Xiaofei

On Sat, May 9, 2015 at 4:57 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx> wrote:

> Laurynas,
>
> We cannot recover from a torn page using only the redo log. But wouldn't the
> undo log record enough information for recovery in the case of a torn page?
> The undo log should have the old values of the affected rows. So shouldn't
> it be enough to recover a torn page using information from the undo log?
>
> Xiaofei
>
> On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis <
> laurynas.biveinis@xxxxxxxxx> wrote:
>
>> Xiaofei -
>>
>> We can indeed detect the torn page write without the doublewrite
>> buffer (and WebScaleSQL has a patch utilising this observation). But
>> we need not only to detect, but to recover the page as well. And
>> without the doublewrite, if we discard the page, we have nothing: a
>> half-old half-new page on the disk and the redo log records for that
>> page are not enough to recover it.
>>
>> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>> > Justin,
>> >
>> > I think the fsync I was concerned about and the torn page problem are two
>> > different things. But now I have a question about the doublewrite buffer.
>> > If we can detect a torn page by checking the top and bottom of a page, why
>> > would we still need the doublewrite buffer? If the page is consistent, we
>> > use it; otherwise, we just discard it. Maybe this is a naive question, but
>> > please let me know. Thanks.
>> >
>> > Xiaofei
>> >
>> > On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> The log does not have whole pages.  Pages must not be torn for the
>> >> recovery process to work.  An fsync is required when a page is written to
>> >> disk.  During recovery all changes since the last checkpoint are replayed,
>> >> then transactions that do not have a commit marker are rolled back.  This
>> >> is called roll forward/roll back recovery.
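
The roll forward/roll back sequence described above can be sketched with a toy
key-value "database". This is only an illustration under loud assumptions: the
record format, the function name roll_forward_then_back, and keeping old
values in the same log are all hypothetical simplifications, not how InnoDB
lays out its redo and undo logs.

```python
def roll_forward_then_back(checkpoint_state: dict, log: list) -> dict:
    """Toy roll forward / roll back recovery (hypothetical record format).

    Each log record is one of:
      ("write", txn_id, key, old_value, new_value)
      ("commit", txn_id)
    """
    state = dict(checkpoint_state)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Roll forward: replay every change logged since the checkpoint,
    # whether or not its transaction eventually committed.
    for rec in log:
        if rec[0] == "write":
            _, _txn, key, _old, new = rec
            state[key] = new

    # Roll back: undo the changes of transactions that have no commit
    # marker, newest first, restoring the logged old values.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            state[key] = old
    return state


log = [
    ("write", 1, "a", 1, 10),
    ("write", 2, "b", 2, 20),
    ("commit", 1),
    ("write", 2, "a", 10, 30),  # transaction 2 never commits
]
result = roll_forward_then_back({"a": 1, "b": 2}, log)
print(result)
```

Transaction 1's change survives, while both changes made by the uncommitted
transaction 2 are undone.
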
>> >>
>> >> --Justin
>> >>
>> >> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Justin,
>> >>>
>> >>> I was wondering whether an fsync is needed after each write. The
>> >>> operations are already in the log, so recovery can always be done from
>> >>> the log. The difference is that during recovery we need to go back
>> >>> further in the log, and it will take longer. But that way, I guess it
>> >>> would be hard to coordinate with the kernel flush thread.
>> >>>
>> >>> Xiaofei
>> >>>
>> >>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> InnoDB recovery cannot handle torn pages.  An fsync is required to
>> >>>> ensure that the page is fully written to disk.  This is also why the
>> >>>> doublewrite buffer is used.  Before pages are written down to disk, they
>> >>>> are first written sequentially into the doublewrite buffer.  This buffer
>> >>>> is synced, then async page writing can proceed.  If the database crashes,
>> >>>> the pages in flight will be rewritten from the doublewrite buffer.  The
>> >>>> detection mechanism for torn pages comes from an LSN, which is written
>> >>>> into the top and the bottom of the page.  If the LSNs at the top and
>> >>>> bottom do not match, the page is torn.
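
A minimal sketch of the detection and recovery idea described above, assuming
a simplified page layout: an 8-byte LSN near the start of the page and the low
32 bits of the same LSN in the last 4 bytes. The offsets and the helper names
(make_page, is_torn, restore_page) are assumptions for illustration only. Real
InnoDB pages also carry checksums, which this sketch ignores.

```python
import struct

PAGE_SIZE = 16 * 1024               # InnoDB's default page size
HEADER_LSN_OFFSET = 16              # assumed offset of the 8-byte header LSN
TRAILER_LSN_OFFSET = PAGE_SIZE - 4  # assumed: low 32 bits of the LSN end the page


def make_page(lsn: int, fill: int = 0) -> bytes:
    """Build a page whose header and trailer both carry the given LSN."""
    page = bytearray([fill]) * PAGE_SIZE
    struct.pack_into(">Q", page, HEADER_LSN_OFFSET, lsn)
    struct.pack_into(">I", page, TRAILER_LSN_OFFSET, lsn & 0xFFFFFFFF)
    return bytes(page)


def is_torn(page: bytes) -> bool:
    """A page is considered torn if its top and bottom LSN stamps disagree."""
    (header_lsn,) = struct.unpack_from(">Q", page, HEADER_LSN_OFFSET)
    (trailer_lsn,) = struct.unpack_from(">I", page, TRAILER_LSN_OFFSET)
    return (header_lsn & 0xFFFFFFFF) != trailer_lsn


def restore_page(data_page: bytes, doublewrite_copy: bytes) -> bytes:
    """Pick the page to continue recovery with after a crash."""
    if not is_torn(data_page):
        return data_page          # the in-place write completed or never started
    if is_torn(doublewrite_copy):
        raise ValueError("both copies are torn; the page cannot be restored")
    return doublewrite_copy       # whole copy that was synced before the write


old = make_page(lsn=100, fill=0xAA)
new = make_page(lsn=200, fill=0xBB)
# Simulate a crash halfway through overwriting the old page with the new one.
torn = new[: PAGE_SIZE // 2] + old[PAGE_SIZE // 2 :]
restored = restore_page(torn, new)
```

Detection alone only tells us the half-old, half-new page is unusable. It is
the intact copy in the doublewrite buffer that gives recovery a whole page to
continue from.
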
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> --Justin
>> >>>>
>> >>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
>> >>>> wrote:
>> >>>>>
>> >>>>> Laurynas,
>> >>>>>
>> >>>>> This is exactly what I was looking for. I went through these functions
>> >>>>> before. I had disabled the doublewrite buffer, so I didn't pay attention
>> >>>>> to the code under buf_dblwr... The reason I asked this question is that
>> >>>>> I didn't know how the recovery process works, so I was wondering whether
>> >>>>> it's necessary to fsync after each write. It's a performance concern.
>> >>>>> Anyway, thank you very much!
>> >>>>>
>> >>>>> Jan -- Thank you for your answer too!
>> >>>>>
>> >>>>> Xiaofei
>> >>>>>
>> >>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
>> >>>>> <laurynas.biveinis@xxxxxxxxx> wrote:
>> >>>>>>
>> >>>>>> Xiaofei -
>> >>>>>>
>> >>>>>> fsync is performed for all the flush types (LRU, flush list, single
>> >>>>>> page) if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC).
>> >>>>>> The apparent difference between sync and async is not because of the
>> >>>>>> sync difference itself, but because of the flush type difference. The
>> >>>>>> single page flush flushes one page and requests an fsync for its file.
>> >>>>>> Other flushes work in batches; they don't have to fsync for each
>> >>>>>> written page individually but rather sync once at the end. Doublewrite
>> >>>>>> complicates this further. If it is disabled, fsync will happen in
>> >>>>>> buf_dblwr_sync_datafiles, called from buf_dblwr_flush_buffered_writes,
>> >>>>>> called from buf_flush_common, called at the end of either an LRU or a
>> >>>>>> flush list flush. If doublewrite is enabled, fsync will happen in
>> >>>>>> buf_dblwr_update, called from buf_flush_write_complete.
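
The difference between a single page flush and a batch flush can be sketched
with plain file I/O. This is not InnoDB code: flush_single_page and
flush_batch are hypothetical helpers, and the sketch only illustrates "one
fsync per page" versus "one fsync at the end of the batch".

```python
import os
import tempfile

PAGE_SIZE = 16 * 1024


def flush_single_page(f, page_no: int, data: bytes) -> int:
    """Write one page and sync its file right away. Returns fsync calls made."""
    f.seek(page_no * PAGE_SIZE)
    f.write(data)
    f.flush()                 # push Python's buffer down to the OS
    os.fsync(f.fileno())      # ask the OS to make the write durable
    return 1


def flush_batch(f, pages: dict) -> int:
    """Write a batch of pages, then sync once at the end."""
    for page_no, data in sorted(pages.items()):
        f.seek(page_no * PAGE_SIZE)
        f.write(data)
    f.flush()
    os.fsync(f.fileno())      # a single sync covers every page in the batch
    return 1


with tempfile.TemporaryFile() as f:
    pages = {n: bytes([n]) * PAGE_SIZE for n in range(8)}
    per_page_syncs = sum(flush_single_page(f, n, d) for n, d in pages.items())
    batch_syncs = flush_batch(f, pages)
    f.seek(0)
    contents = f.read()

print(per_page_syncs, batch_syncs)
```

Both paths end with the same bytes on disk. The batch path simply amortises
the cost of the sync over all the pages it wrote.
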
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>> >>>>>> > Hi Laurynas,
>> >>>>>> >
>> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>> >>>>>> > <laurynas.biveinis@xxxxxxxxx> wrote:
>> >>>>>> >>
>> >>>>>> >> Xiaofei -
>> >>>>>> >>
>> >>>>>> >> > Does InnoDB maintain a dirty
>> >>>>>> >> > page table?
>> >>>>>> >>
>> >>>>>> >> You must be referring to the buffer pool flush_list.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > You are right. The flush_list can be used for recovery and
>> >>>>>> > checkpointing.
>> >>>>>> >
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> > Is fsync called to guarantee the page to be on persistent
>> >>>>>> >> > storage so that the dirty page table can be updated? If this is
>> >>>>>> >> > the case, when is the dirty page table updated for asynchronous
>> >>>>>> >> > IOs?
>> >>>>>> >>
>> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>> >>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > You are right that this is the place where it updates the dirty page
>> >>>>>> > information. But I still don't understand why the fsync is needed
>> >>>>>> > for synchronous IOs, but not for the AIOs. Jan Lindstrom said fsync
>> >>>>>> > is also called for other AIO operations. But I could only find that
>> >>>>>> > to be true in one of many AIO operations. Or maybe I am still
>> >>>>>> > missing something?
>> >>>>>> >
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> --
>> >>>>>> >> Laurynas
>> >>>>>> >
>> >>>>>> >
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Laurynas
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Mailing list: https://launchpad.net/~maria-discuss
>> >>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>> >>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>> >>>>> More help   : https://help.launchpad.net/ListHelp
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Laurynas
>>
>
>
