maria-discuss team mailing list archive

Re: fsync necessary for synchronous page flush?

 

Undo logs cover only a subset of a database instance. And, since their
purpose is different, the undo records might already be purged by the
time of crash recovery.
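
To make the detect-versus-recover distinction from the quoted thread
concrete, here is a minimal Python sketch. It is not InnoDB code: the
page layout (one 8-byte stamp at each end of the page), the 16 KiB page
size, and the names make_page, is_torn and recover_page are assumptions
made up for illustration. It only shows that matching stamps can detect
a torn page, while repairing it needs an intact full copy such as the
one kept in the doublewrite buffer.

```python
import struct
from typing import Optional

PAGE_SIZE = 16384            # assumed page size for this sketch
STAMP = struct.Struct(">Q")  # simplified 8-byte stamp, not the real layout


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a page with the same LSN stamp at the top and the bottom."""
    body_len = PAGE_SIZE - 2 * STAMP.size
    if len(payload) > body_len:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(body_len, b"\0")
    return STAMP.pack(lsn) + body + STAMP.pack(lsn)


def is_torn(page: bytes) -> bool:
    """A page is considered torn if its top and bottom stamps differ."""
    if len(page) != PAGE_SIZE:
        return True
    head = STAMP.unpack_from(page, 0)[0]
    tail = STAMP.unpack_from(page, PAGE_SIZE - STAMP.size)[0]
    return head != tail


def recover_page(data_page: bytes, dblwr_copy: Optional[bytes]) -> bytes:
    """Return a consistent page image, or fail if none is available."""
    if not is_torn(data_page):
        return data_page
    # Detection alone is not enough: a full, intact copy is needed.
    if dblwr_copy is not None and not is_torn(dblwr_copy):
        return dblwr_copy
    raise RuntimeError("torn page and no intact doublewrite copy")


# Simulate a crash halfway through overwriting an old page with a new one.
old_page = make_page(100, b"old row values")
new_page = make_page(200, b"new row values")
torn_page = new_page[: PAGE_SIZE // 2] + old_page[PAGE_SIZE // 2 :]
```

With a doublewrite copy, recover_page(torn_page, new_page) returns the
new page image. Without one, recover_page(torn_page, None) raises,
which is the "we have nothing" case described in the thread.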

2015-05-10 2:57 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> Laurynas,
>
> We cannot recover from a torn page using only the redo log. But wouldn't the
> undo log record enough information for recovery in the case of a torn page?
> The undo log should have the old values of the affected rows. So shouldn't
> that be enough to recover a torn page?
>
> Xiaofei
>
> On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis
> <laurynas.biveinis@xxxxxxxxx> wrote:
>>
>> Xiaofei -
>>
>> We can indeed detect the torn page write without the doublewrite
>> buffer (and WebScaleSQL has a patch utilising this observation). But
>> we need not only to detect the torn page, but to recover it as well.
>> Without the doublewrite buffer, if we discard the page, we have nothing
>> left: only a half-old, half-new page on disk, and the redo log records
>> for that page are not enough to rebuild it.
>>
>> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>> > Justin,
>> >
>> > I think the fsync I was concerned about and the torn page problem are
>> > two different things. But now I have a question about the doublewrite
>> > buffer. If we can detect a torn page by checking the top and bottom of
>> > a page, why would we still need the doublewrite buffer? If the page is
>> > consistent, we use it; otherwise, we just discard it. Maybe this is a
>> > naive question, but please let me know. Thanks.
>> >
>> > Xiaofei
>> >
>> > On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> The log does not have whole pages.  Pages must not be torn for the
>> >> recovery process to work.  An fsync is required when a page is written
>> >> to disk.  During recovery all changes since the last checkpoint are
>> >> replayed, then transactions that do not have a commit marker are rolled
>> >> back.  This is called roll forward/roll back recovery.
>> >>
>> >> --Justin
>> >>
>> >> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
>> >> wrote:
>> >>>
>> >>> Justin,
>> >>>
>> >>> I was wondering whether fsync is needed each time after a write. The
>> >>> operations are already in the log, so recovery can always be done from
>> >>> the log. The difference is that during recovery we would need to go
>> >>> back further in the log, and it would take longer. But done that way,
>> >>> I guess it would be hard to coordinate with the kernel flush thread.
>> >>>
>> >>> Xiaofei
>> >>>
>> >>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@xxxxxxxxx>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> InnoDB recovery cannot handle torn pages.  An fsync is required to
>> >>>> ensure that the page is fully written to disk.  This is also why the
>> >>>> doublewrite buffer is used.  Before pages are written to their place
>> >>>> on disk, they are first written sequentially into the doublewrite
>> >>>> buffer.  This buffer is synced, then async page writing can proceed.
>> >>>> If the database crashes, the pages in flight will be rewritten from
>> >>>> the doublewrite buffer.  The detection mechanism for torn pages comes
>> >>>> from an LSN, which is written into the top and the bottom of the
>> >>>> page.  If the LSNs at the top and bottom do not match, the page is
>> >>>> torn.
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> --Justin
>> >>>>
>> >>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
>> >>>> wrote:
>> >>>>>
>> >>>>> Laurynas,
>> >>>>>
>> >>>>> This is exactly what I was looking for. I went through these
>> >>>>> functions before. I had disabled the doublewrite buffer, so I didn't
>> >>>>> pay attention to the code under buf_dblwr... The reason I asked this
>> >>>>> question is that I didn't know how the recovery process works, so
>> >>>>> I was wondering whether it's necessary to fsync after each write.
>> >>>>> It's a performance concern. Anyway, thank you very much!
>> >>>>>
>> >>>>> Jan -- Thank you for your answer too!
>> >>>>>
>> >>>>> Xiaofei
>> >>>>>
>> >>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
>> >>>>> <laurynas.biveinis@xxxxxxxxx> wrote:
>> >>>>>>
>> >>>>>> Xiaofei -
>> >>>>>>
>> >>>>>> fsync is performed for all the flush types (LRU, flush list, single
>> >>>>>> page) if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC).
>> >>>>>> The apparent difference between sync and async is not because of the
>> >>>>>> sync difference itself, but because of the flush type difference. The
>> >>>>>> single page flush flushes one page, and requests an fsync for its
>> >>>>>> file. Other flushes work in batches: they don't have to fsync for
>> >>>>>> each written page individually but rather sync once at the end.
>> >>>>>> Doublewrite complicates this further. If it is disabled, fsync will
>> >>>>>> happen in buf_dblwr_sync_datafiles called from
>> >>>>>> buf_dblwr_flush_buffered_writes called from buf_flush_common called
>> >>>>>> at the end of either LRU or flush list flush. If doublewrite is
>> >>>>>> enabled, fsync will happen in buf_dblwr_update called from
>> >>>>>> buf_flush_write_complete.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
>> >>>>>> > Hi Laurynas,
>> >>>>>> >
>> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>> >>>>>> > <laurynas.biveinis@xxxxxxxxx> wrote:
>> >>>>>> >>
>> >>>>>> >> Xiaofei -
>> >>>>>> >>
>> >>>>>> >> > Does InnoDB maintain a dirty
>> >>>>>> >> > page table?
>> >>>>>> >>
>> >>>>>> >> You must be referring to the buffer pool flush_list.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > You are right. The flush_list can be used for recovery and
>> >>>>>> > checkpoint.
>> >>>>>> >
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> > Is fsync called to guarantee the page to be on persistent
>> >>>>>> >> > storage so that the dirty page table can be updated? If this
>> >>>>>> >> > is the case, when is the dirty page table updated for
>> >>>>>> >> > asynchronous IOs?
>> >>>>>> >>
>> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>> >>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > You are right that this is the place where it updates the dirty
>> >>>>>> > page information. But I still don't understand why the fsync is
>> >>>>>> > needed for synchronous IOs, but not for the AIOs. Jan Lindstrom
>> >>>>>> > said fsync is also called for other AIO operations. But I could
>> >>>>>> > only find it to be true in one of many AIO operations. Or maybe I
>> >>>>>> > am still missing something?
>> >>>>>> >
>> >>>>>> >>
>> >>>>>> >>
>> >>>>>> >> --
>> >>>>>> >> Laurynas
>> >>>>>> >
>> >>>>>> >
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Laurynas
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> Mailing list: https://launchpad.net/~maria-discuss
>> >>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
>> >>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>> >>>>> More help   : https://help.launchpad.net/ListHelp
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Laurynas
>
>



-- 
Laurynas

