
maria-discuss team mailing list archive

Re: fsync necessary for synchronous page flush?

 

Justin,

I think the fsync I was concerned about and the torn page problem are two
different things. But now I have a question about the doublewrite buffer. If
we can detect a torn page by checking the top and bottom of a page, why
would we still need the doublewrite buffer? If the page is consistent, we
use it; otherwise, we just discard it. Maybe this is a naive question, but
please let me know. Thanks.

Xiaofei

On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@xxxxxxxxx> wrote:

> Hi,
>
> The log does not have whole pages.  Pages must not be torn for the
> recovery process to work.  An fsync is required when a page is written to
> disk.  During recovery, all changes since the last checkpoint are replayed,
> then transactions that do not have a commit marker are rolled back.  This
> is called roll forward/roll back recovery.
>
> --Justin
>
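Justin's roll forward/roll back description can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not InnoDB's implementation: `Page`, `LogRec` and `recover` are hypothetical names, each page holds a single integer, and rollback simply inverts the logged delta (InnoDB uses separate undo logs for that). It shows why recovery depends on intact pages: replay reads `page_lsn` and the page contents from disk and applies changes on top of them.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical page: one integer payload plus the LSN of the last
    # change applied to it (stands in for the page header LSN).
    page_lsn: int
    value: int


@dataclass
class LogRec:
    # Hypothetical log record: kind is "change" (add delta to a page on
    # behalf of a transaction) or "commit" (the transaction's commit marker).
    lsn: int
    kind: str
    trx_id: int
    page_no: int = -1
    delta: int = 0


def recover(pages, log, checkpoint_lsn):
    committed = {r.trx_id for r in log if r.kind == "commit"}

    # Roll forward: replay changes after the checkpoint that the page has
    # not seen yet. This trusts page_lsn and value as read from disk; on a
    # torn page both could be garbage, and the replay would build on it.
    for r in log:
        if r.kind == "change" and r.lsn > checkpoint_lsn:
            page = pages[r.page_no]
            if page.page_lsn < r.lsn:
                page.value += r.delta
                page.page_lsn = r.lsn

    # Roll back: undo changes of transactions with no commit marker, newest
    # first. (Simplification: InnoDB uses separate undo logs for this.)
    for r in reversed(log):
        if r.kind == "change" and r.trx_id not in committed:
            pages[r.page_no].value -= r.delta
    return pages


# Page 0 already contains the change at LSN 5; page 1 was never flushed.
pages = [Page(page_lsn=5, value=100), Page(page_lsn=0, value=0)]
log = [
    LogRec(5, "change", trx_id=1, page_no=0, delta=100),
    LogRec(8, "commit", trx_id=1),
    LogRec(12, "change", trx_id=2, page_no=0, delta=7),
    LogRec(15, "change", trx_id=3, page_no=1, delta=50),  # trx 3 never commits
    LogRec(18, "commit", trx_id=2),
]
recover(pages, log, checkpoint_lsn=10)
print([p.value for p in pages])  # [107, 0]
```

Because log records are changes applied on top of whatever page image is on disk, a torn image would make both the `page_lsn` check and the applied delta operate on inconsistent data.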
> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
> wrote:
>
>> Justin,
>>
>> I was wondering whether fsync is needed after each write. The
>> operations are already in the log, so recovery can always be done from the
>> log. The difference is that during recovery we need to go back further in
>> the log, and it will take longer. But that way, I guess it would be hard
>> to coordinate with the kernel flush thread.
>>
>> Xiaofei
>>
>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>> wrote:
>>
>>> Hi,
>>>
>>> InnoDB recovery cannot handle torn pages.  An fsync is required to
>>> ensure that the page is fully written to disk.  This is also why the
>>> doublewrite buffer is used.  Before pages are written to disk, they
>>> are first written sequentially into the doublewrite buffer.  This buffer is
>>> synced, then async page writing can proceed.  If the database crashes, the
>>> pages in flight will be rewritten from the doublewrite buffer.  The detection
>>> mechanism for torn pages comes from an LSN, which is written into the top
>>> and the bottom of the page.  If the LSNs at the top and bottom do not match,
>>> the page is torn.
>>>
>>> Regards,
>>>
>>> --Justin
>>>
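A minimal Python sketch of the torn-page check and doublewrite restore Justin describes. The layout is an assumption loosely modelled on InnoDB (an 8-byte LSN at offset 16 and the low 32 bits of that LSN in the last 4 bytes of a 16 KiB page), and `make_page`, `is_torn` and `recover_torn_pages` are hypothetical helpers. It also bears on Xiaofei's question: the check only detects a torn page; because the log does not contain whole pages, an intact copy has to come from the doublewrite area.

```python
import struct

# Assumed, simplified layout loosely modelled on InnoDB: the 8-byte page LSN
# sits at offset 16 and the low 32 bits of the same LSN in the last 4 bytes.
PAGE_SIZE = 16 * 1024
LSN_OFFSET = 16
TRAILER_OFFSET = PAGE_SIZE - 4


def make_page(lsn, fill):
    page = bytearray([fill]) * PAGE_SIZE
    struct.pack_into(">Q", page, LSN_OFFSET, lsn)
    struct.pack_into(">I", page, TRAILER_OFFSET, lsn & 0xFFFFFFFF)
    return page


def is_torn(page):
    (head_lsn,) = struct.unpack_from(">Q", page, LSN_OFFSET)
    (tail_lsn,) = struct.unpack_from(">I", page, TRAILER_OFFSET)
    return (head_lsn & 0xFFFFFFFF) != tail_lsn


def recover_torn_pages(datafile, dblwr):
    # dblwr holds (page_no, image) pairs that were fsynced before the
    # in-place writes began, so a torn home page has an intact copy here.
    for page_no, copy in dblwr:
        if is_torn(datafile[page_no]) and not is_torn(copy):
            datafile[page_no] = bytearray(copy)


datafile = {7: make_page(lsn=100, fill=0xAA)}  # old image of page 7
new_page = make_page(lsn=200, fill=0xBB)

# Step 1: write the new image to the doublewrite area (assumed durable).
dblwr = [(7, bytes(new_page))]
# Step 2: a crash interrupts the in-place write halfway through the page.
half = PAGE_SIZE // 2
datafile[7] = new_page[:half] + datafile[7][half:]

print(is_torn(datafile[7]))  # True: header says 200, trailer says 100
recover_torn_pages(datafile, dblwr)
print(is_torn(datafile[7]), datafile[7] == new_page)  # False True
```

If the crash happens during step 1 instead, the doublewrite copy may be the torn one, but the home page still holds its old, consistent image, which redo can roll forward. InnoDB also stores page checksums; this sketch compares only the LSNs.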
>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
>>> wrote:
>>>
>>>> Laurynas,
>>>>
>>>> This is exactly what I was looking for. I went through these functions
>>>> before, but I had disabled the doublewrite buffer, so I didn't pay
>>>> attention to the code under buf_dblwr... I asked this question because I
>>>> didn't know how the recovery process works, so I was wondering whether it
>>>> is necessary to fsync after each write. It's a performance concern.
>>>> Anyway, thank you very much!
>>>>
>>>> Jan -- Thank you for your answer too!
>>>>
>>>> Xiaofei
>>>>
>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis <
>>>> laurynas.biveinis@xxxxxxxxx> wrote:
>>>>
>>>>> Xiaofei -
>>>>>
>>>>> fsync is performed for all the flush types (LRU, flush list, single
>>>>> page) if it is requested (innodb_flush_method != O_DIRECT_NO_FSYNC). The
>>>>> apparent difference between sync and async is not caused by the sync
>>>>> difference itself, but by the flush type difference. The single page
>>>>> flush flushes one page and requests an fsync for its file. Other flushes
>>>>> work in batches; they don't have to fsync for each written page
>>>>> individually, but rather sync once at the end. Doublewrite complicates
>>>>> this further. If it is disabled, fsync happens in
>>>>> buf_dblwr_sync_datafiles, called from buf_dblwr_flush_buffered_writes,
>>>>> called from buf_flush_common, which runs at the end of either an LRU or
>>>>> a flush list flush. If doublewrite is enabled, fsync happens in
>>>>> buf_dblwr_update, called from buf_flush_write_complete.
>>>>>
>>>>>
>>>>>
>>>>>
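Laurynas's point about flush types can be sketched in Python: a single page flush pays one fsync per page, while a batch flush writes every page and syncs the file once. `DataFile`, `single_page_flush` and `batch_flush` are hypothetical names, the 16 KiB page size is assumed, and the doublewrite path (buf_dblwr_*) is not modelled.

```python
import os
import tempfile

PAGE_SIZE = 16 * 1024  # assumed page size


class DataFile:
    """Hypothetical data-file wrapper that counts fsync calls."""

    def __init__(self, path):
        self.f = open(path, "r+b")
        self.fsyncs = 0

    def write_page(self, page_no, data):
        self.f.seek(page_no * PAGE_SIZE)
        self.f.write(data)

    def sync(self):
        self.f.flush()             # hand Python's buffer to the OS
        os.fsync(self.f.fileno())  # ask the OS to make it durable
        self.fsyncs += 1


def single_page_flush(df, page_no, data):
    # One page, then an fsync for its file.
    df.write_page(page_no, data)
    df.sync()


def batch_flush(df, pages):
    # LRU / flush-list style: write the whole batch, then sync once.
    for page_no, data in pages.items():
        df.write_page(page_no, data)
    df.sync()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t1.ibd")  # hypothetical file name
    with open(path, "wb") as f:
        f.truncate(8 * PAGE_SIZE)
    df = DataFile(path)
    for n in range(4):
        single_page_flush(df, n, bytes([n]) * PAGE_SIZE)
    single_fsyncs = df.fsyncs
    df.fsyncs = 0
    batch_flush(df, {n: bytes([n]) * PAGE_SIZE for n in range(4, 8)})
    batch_fsyncs = df.fsyncs
    df.f.close()

print(single_fsyncs, batch_fsyncs)  # 4 1
```

The `sync()` step is the one that, per Laurynas, is skipped when innodb_flush_method is O_DIRECT_NO_FSYNC.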
>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>>>>> > Hi Laurynas,
>>>>> >
>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>>>>> > <laurynas.biveinis@xxxxxxxxx> wrote:
>>>>> >>
>>>>> >> Xiaofei -
>>>>> >>
>>>>> >> > Does InnoDB maintain a dirty
>>>>> >> > page table?
>>>>> >>
>>>>> >> You must be referring to the buffer pool flush_list.
>>>>> >
>>>>> >
>>>>> > You are right. The flush_list can be used for recovery and
>>>>> > checkpointing.
>>>>> >
>>>>> >>
>>>>> >>
>>>>> >> > Is fsync called to guarantee that the page is on persistent
>>>>> >> > storage so that the dirty page table can be updated? If this is
>>>>> >> > the case, when is the dirty page table updated for asynchronous IOs?
>>>>> >>
>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>>>>> >
>>>>> >
>>>>> > You are right that this is the place where it updates the dirty page
>>>>> > information. But I still don't understand why the fsync is needed for
>>>>> > synchronous IOs but not for AIOs. Jan Lindstrom said fsync is also
>>>>> > called for other AIO operations, but I could only find that to be true
>>>>> > for one of many AIO operations. Or maybe I am still missing something?
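The dirty-page bookkeeping discussed here (the buffer pool flush_list and buf_flush_write_complete) can be sketched in Python. `FlushList` and its methods are hypothetical, and a dict stands in for InnoDB's flush_list, which InnoDB keeps ordered by modification LSN rather than scanning. The sketch shows that a page leaves the dirty set only when its write completes, whether that completion is reported synchronously or from the async I/O completion path, and that the checkpoint can only advance past changes no longer pending in the dirty set.

```python
class FlushList:
    """Hypothetical dirty-page table: page number -> oldest_modification,
    the LSN of the first change not yet written to disk."""

    def __init__(self):
        self.oldest_modification = {}

    def on_modify(self, page_no, lsn):
        # Only the first change since the page was last flushed counts.
        self.oldest_modification.setdefault(page_no, lsn)

    def on_write_complete(self, page_no):
        # Runs when the page write has finished (directly for a sync write,
        # from the I/O completion handler for async I/O). A checkpoint taken
        # after this assumes the write is durable, so a real engine must make
        # sure an fsync has happened before writing the checkpoint.
        self.oldest_modification.pop(page_no, None)

    def checkpoint_lsn(self, current_lsn):
        # Log older than the oldest unflushed change is no longer needed.
        return min(self.oldest_modification.values(), default=current_lsn)


fl = FlushList()
fl.on_modify(3, 100)
fl.on_modify(5, 120)
fl.on_modify(3, 130)           # page 3 is already dirty since LSN 100
print(fl.checkpoint_lsn(150))  # 100
fl.on_write_complete(3)
print(fl.checkpoint_lsn(150))  # 120
fl.on_write_complete(5)
print(fl.checkpoint_lsn(150))  # 150
```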
>>>>> >
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Laurynas
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Laurynas
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
