
maria-discuss team mailing list archive

Re: fsync necessary for synchronous page flush?

 

 If the device and the filesystem provide the guarantees, then yes:
http://www.percona.com/doc/percona-server/5.5/performance/atomic_fio.html,
but not in the general case.

2015-05-10 9:12 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> I came across some slides by the Percona CEO:
> https://www.percona.com/live/mysql-conference-2015/sites/default/files/slides/PLMCE2015-SSD-For-MySQL.pdf
> Page 45 says "Flash can avoid this with little cost due to internal
> design". Does this mean we can safely disable the doublewrite buffer? Thanks.
>
> Xiaofei
>
> On Sat, May 9, 2015 at 4:57 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx> wrote:
>>
>> Laurynas,
>>
>> We cannot recover from a torn page using only the redo log. But wouldn't
>> the undo log record enough information to recover from a torn page? The
>> undo log should have the old values of the affected rows, so shouldn't
>> that be enough to recover a torn page?
>>
>> Xiaofei
>>
>> On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis
>> <laurynas.biveinis@xxxxxxxxx> wrote:
>>>
>>> Xiaofei -
>>>
>>> We can indeed detect the torn page write without the doublewrite
>>> buffer (and WebScaleSQL has a patch utilising this observation). But
>>> we need not only to detect, but to recover the page as well. And
>>> without the doublewrite, if we discard the page, we have nothing to
>>> rebuild it from: the disk holds a half-old, half-new page, and the
>>> redo log records for that page are not enough to recover it.
>>>
>>> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>>> > Justin,
>>> >
>>> > I think the fsync I was concerned about and the torn page problem are
>>> > two different things. But now I have a question about the doublewrite
>>> > buffer. If we can detect a torn page by checking the top and bottom of
>>> > a page, why would we still need the doublewrite buffer? If the page is
>>> > consistent, we use it; otherwise, we just discard it. Maybe this is a
>>> > naive question, but please let me know. Thanks.
>>> >
>>> > Xiaofei
>>> >
>>> > On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> The log does not have whole pages.  Pages must not be torn for the
>>> >> recovery process to work.  An fsync is required when a page is
>>> >> written to disk.  During recovery all changes since the last
>>> >> checkpoint are replayed, then transactions that do not have a commit
>>> >> marker are rolled back.  This is called roll forward/roll back
>>> >> recovery.
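
To make the roll forward/roll back idea concrete, here is a minimal toy sketch in Python. It is not InnoDB's recovery code: the record shapes, the key/value "pages", and the function name `roll_forward_roll_back` are all assumptions made for illustration, and the old values are captured during replay instead of being read from a real undo log.

```python
def roll_forward_roll_back(checkpoint_state, log):
    """Toy recovery: replay every change, then undo uncommitted transactions."""
    state = dict(checkpoint_state)
    committed = set()
    undo = {}  # txn id -> list of (key, old value), in the order applied

    # Roll forward: replay all changes logged since the checkpoint.
    for record in log:
        if record[0] == "commit":
            committed.add(record[1])
        else:
            _, txn, key, new_value = record
            undo.setdefault(txn, []).append((key, state.get(key)))
            state[key] = new_value

    # Roll back: undo transactions without a commit marker, newest change first.
    for txn, changes in undo.items():
        if txn in committed:
            continue
        for key, old_value in reversed(changes):
            if old_value is None:
                state.pop(key, None)
            else:
                state[key] = old_value
    return state


checkpoint = {"a": 1, "b": 2}
log = [
    ("write", "T1", "a", 10),
    ("write", "T2", "b", 20),  # T2 never commits
    ("commit", "T1"),
]
recovered = roll_forward_roll_back(checkpoint, log)
```

The sketch assumes that concurrent transactions never touch the same key, which is what row locking would normally guarantee; without that, undoing per transaction would not be safe.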
>>> >>
>>> >> --Justin
>>> >>
>>> >> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
>>> >> wrote:
>>> >>>
>>> >>> Justin,
>>> >>>
>>> >>> I was wondering whether an fsync is needed after each write. The
>>> >>> operations are already in the log, so recovery can always be done
>>> >>> from the log. The difference is that during recovery we would need to
>>> >>> go back further in the log, and it would take longer. But that way, I
>>> >>> guess it would be hard to coordinate with the kernel flush thread.
>>> >>>
>>> >>> Xiaofei
>>> >>>
>>> >>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> InnoDB recovery cannot handle torn pages.  An fsync is required to
>>> >>>> ensure that the page is fully written to disk.  This is also why the
>>> >>>> doublewrite buffer is used.  Before pages are written to disk, they
>>> >>>> are first written sequentially into the doublewrite buffer.  This
>>> >>>> buffer is synced, then async page writing can proceed.  If the
>>> >>>> database crashes, the pages in flight will be rewritten from the
>>> >>>> doublewrite buffer.  The detection mechanism for torn pages comes
>>> >>>> from an LSN, which is written into the top and the bottom of the
>>> >>>> page.  If the LSNs at the top and bottom do not match, the page is
>>> >>>> torn.
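
The detection and restore steps described above can be sketched roughly as follows. This is a toy model under stated assumptions, not InnoDB's on-disk format: the 64-byte page, the 8-byte stamp layout, and the names `make_page`, `is_torn`, and `restore_page` are all hypothetical.

```python
import struct

PAGE_SIZE = 64                # toy size; real InnoDB pages default to 16 KiB
STAMP = struct.Struct(">Q")   # 8-byte LSN stamp (this layout is an assumption)


def make_page(lsn, payload):
    """Build a page with the same LSN stamped at the top and at the bottom."""
    body_len = PAGE_SIZE - 2 * STAMP.size
    body = payload.ljust(body_len, b"\0")[:body_len]
    return STAMP.pack(lsn) + body + STAMP.pack(lsn)


def is_torn(page):
    """A page is torn if it is the wrong size or its two stamps disagree."""
    if len(page) != PAGE_SIZE:
        return True
    return page[:STAMP.size] != page[-STAMP.size:]


def restore_page(data_page, doublewrite_copy):
    """Return a usable page, falling back to the doublewrite copy if torn."""
    if not is_torn(data_page):
        return data_page
    if is_torn(doublewrite_copy):
        raise ValueError("both copies are torn; the page cannot be restored")
    return doublewrite_copy


old = make_page(100, b"old row data")
new = make_page(200, b"new row data")
# Simulate a crash after only the first half of the new page reached the disk.
torn = new[: PAGE_SIZE // 2] + old[PAGE_SIZE // 2:]
```

Because the doublewrite copy is synced before the in-place write begins, at most one of the two copies should be torn after a crash, which is the property `restore_page` relies on.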
>>> >>>>
>>> >>>> Regards,
>>> >>>>
>>> >>>> --Justin
>>> >>>>
>>> >>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du
>>> >>>> <xiaofei.du008@xxxxxxxxx>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Laurynas,
>>> >>>>>
>>> >>>>> This is exactly what I was looking for. I went through these
>>> >>>>> functions before. I had disabled the doublewrite buffer, so I didn't
>>> >>>>> pay attention to the code under buf_dblwr... The reason I asked this
>>> >>>>> question is that I didn't know how the recovery process works, so I
>>> >>>>> was wondering whether it's necessary to fsync after each write. It's
>>> >>>>> a performance concern. Anyway, thank you very much!
>>> >>>>>
>>> >>>>> Jan -- Thank you for your answer too!
>>> >>>>>
>>> >>>>> Xiaofei
>>> >>>>>
>>> >>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
>>> >>>>> <laurynas.biveinis@xxxxxxxxx> wrote:
>>> >>>>>>
>>> >>>>>> Xiaofei -
>>> >>>>>>
>>> >>>>>> fsync is performed for all the flush types (LRU, flush list, single
>>> >>>>>> page) if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC).
>>> >>>>>> The apparent difference between sync and async is not because of
>>> >>>>>> the sync difference itself, but because of the flush type
>>> >>>>>> difference. The single page flush flushes one page and requests an
>>> >>>>>> fsync for its file. Other flushes work in batches: they don't have
>>> >>>>>> to fsync for each written page individually, but rather sync once at
>>> >>>>>> the end. Then doublewrite complicates this further. If it is
>>> >>>>>> disabled, fsync will happen in buf_dblwr_sync_datafiles, called from
>>> >>>>>> buf_dblwr_flush_buffered_writes, called from buf_flush_common, called
>>> >>>>>> at the end of either an LRU or a flush list flush. If doublewrite is
>>> >>>>>> enabled, fsync will happen in buf_dblwr_update, called from
>>> >>>>>> buf_flush_write_complete.
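
The difference between the single page flush and the batch flushes can be illustrated with a small Python sketch. It only models the number of fsync calls and has nothing to do with the actual InnoDB functions named above; `flush_single_page`, `flush_batch`, and the 4096-byte page size are assumptions for the example.

```python
import os
import tempfile

PAGE_SIZE = 4096  # toy value, not InnoDB's page size


def write_page(fd, page_no, page):
    """Position the file offset and write one page (small regular-file write)."""
    os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)
    os.write(fd, page)


def flush_single_page(fd, page_no, page):
    """Write one page and fsync right away. Returns the number of fsyncs."""
    write_page(fd, page_no, page)
    os.fsync(fd)
    return 1


def flush_batch(fd, pages):
    """Write every page in the batch, then fsync once at the end."""
    for page_no, page in sorted(pages.items()):
        write_page(fd, page_no, page)
    os.fsync(fd)
    return 1


pages = {n: bytes([n]) * PAGE_SIZE for n in range(8)}
with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    per_page_syncs = sum(flush_single_page(fd, n, p) for n, p in pages.items())
    batch_syncs = flush_batch(fd, pages)
    file_size = os.fstat(fd).st_size
```

Flushing the eight pages one at a time costs eight fsync calls, while the batch path costs one, which is the performance difference the flush types account for.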
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>>> >>>>>> > Hi Laurynas,
>>> >>>>>> >
>>> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>>> >>>>>> > <laurynas.biveinis@xxxxxxxxx> wrote:
>>> >>>>>> >>
>>> >>>>>> >> Xiaofei -
>>> >>>>>> >>
>>> >>>>>> >> > Does InnoDB maintain a dirty
>>> >>>>>> >> > page table?
>>> >>>>>> >>
>>> >>>>>> >> You must be referring to the buffer pool flush_list.
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > You are right. The flush_list can be used for recovery and
>>> >>>>>> > checkpointing.
>>> >>>>>> >
>>> >>>>>> >>
>>> >>>>>> >>
>>> >>>>>> >> > Is fsync called to guarantee the page to be on persistent
>>> >>>>>> >> > storage so that the dirty page table can be updated? If this
>>> >>>>>> >> > is the case, when is the dirty page table updated for
>>> >>>>>> >> > asynchronous IOs?
>>> >>>>>> >>
>>> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>>> >>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>> > You are right that this is the place where it updates the dirty
>>> >>>>>> > page information. But I still don't understand why the fsync is
>>> >>>>>> > needed for synchronous IOs, but not for the AIOs. Jan Lindstrom
>>> >>>>>> > said fsync is also called for other AIO operations. But I could
>>> >>>>>> > only find that to be true in one of many AIO operations. Or maybe
>>> >>>>>> > I am still missing something?
>>> >>>>>> >
>>> >>>>>> >>
>>> >>>>>> >>
>>> >>>>>> >> --
>>> >>>>>> >> Laurynas
>>> >>>>>> >
>>> >>>>>> >
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Laurynas
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> _______________________________________________
>>> >>>>> Mailing list: https://launchpad.net/~maria-discuss
>>> >>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>>> >>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>> >>>>> More help   : https://help.launchpad.net/ListHelp
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>>
>>>
>>> --
>>> Laurynas
>>
>>
>



-- 
Laurynas

