maria-discuss team mailing list archive

Re: fsync necessary for synchronous page flush?

 

Laurynas,

We cannot recover from a torn page using only the redo log. But wouldn't the
undo log record enough information for recovery in that case? The undo log
should hold the old values of the affected rows, so shouldn't it be possible
to recover a torn page using information from the undo log?

Xiaofei

On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis <
laurynas.biveinis@xxxxxxxxx> wrote:

> Xiaofei -
>
> We can indeed detect the torn page write without the doublewrite
> buffer (and WebScaleSQL has a patch utilising this observation). But
> we need not only to detect, but to recover the page as well. And
> without the doublewrite, if we discard the page, we have nothing: a
> half-old half-new page on the disk and the redo log records for that
> page are not enough to recover it.
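A minimal Python sketch of the point above, assuming a simplified page layout: an 8-byte LSN stamped at the first and last 8 bytes of a 16 KiB page (InnoDB's real header and trailer fields differ). `make_page`, `is_torn`, `recover_page` and `UnrecoverablePage` are hypothetical names for illustration only. Detection is a simple comparison, but recovery needs an intact full-page copy from somewhere, such as a doublewrite slot; without one there is no consistent base image for the redo records to be applied to.

```python
import struct
from typing import Optional

PAGE_SIZE = 16 * 1024  # assumed page size
LSN_FMT = ">Q"         # 8-byte big-endian LSN (simplified, not InnoDB's exact layout)
LSN_LEN = struct.calcsize(LSN_FMT)


class UnrecoverablePage(Exception):
    """Raised when a torn page has no intact copy to restore from."""


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a page with the same LSN stamped at its top and bottom."""
    body_len = PAGE_SIZE - 2 * LSN_LEN
    body = payload[:body_len].ljust(body_len, b"\0")
    stamp = struct.pack(LSN_FMT, lsn)
    return stamp + body + stamp


def is_torn(page: bytes) -> bool:
    """The page is torn if the header and trailer LSNs disagree."""
    head = struct.unpack_from(LSN_FMT, page, 0)[0]
    tail = struct.unpack_from(LSN_FMT, page, PAGE_SIZE - LSN_LEN)[0]
    return head != tail


def recover_page(disk_page: bytes, dblwr_copy: Optional[bytes]) -> bytes:
    """Return a consistent page image that redo can then be applied to."""
    if not is_torn(disk_page):
        return disk_page
    if dblwr_copy is not None and not is_torn(dblwr_copy):
        # A full-page copy replaces the half-old, half-new image.
        return dblwr_copy
    # Detection alone is not enough: redo records describe changes to a
    # page, so without a consistent base image there is nothing to replay onto.
    raise UnrecoverablePage("torn page and no intact doublewrite copy")
```

For example, splicing the first half of a page stamped with LSN 200 onto the second half of one stamped with LSN 100 yields a page that `is_torn` flags; `recover_page` can only return something useful if an intact copy is supplied.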
>
> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> > Justin,
> >
> > I think the fsync I was concerned about and the torn page problem are two
> > different things. But now I have a question about the doublewrite buffer.
> > If we can detect a torn page by checking the top and bottom of a page, why
> > would we still need the doublewrite buffer? If the page is consistent, we
> > use it; otherwise, we just discard it. Maybe this is a naive question, but
> > please let me know. Thanks.
> >
> > Xiaofei
> >
> > On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@xxxxxxxxx>
> > wrote:
> >>
> >> Hi,
> >>
> >> The log does not contain whole pages. Pages must not be torn for the
> >> recovery process to work. An fsync is required when a page is written to
> >> disk. During recovery, all changes since the last checkpoint are
> >> replayed, then transactions that do not have a commit marker are rolled
> >> back. This is called roll forward/roll back recovery.
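A toy Python sketch of roll forward/roll back recovery as described above. It assumes a hypothetical record format (`LogRecord` with `update` and `commit` kinds), a key-value dict standing in for pages, and that no transaction was active across the checkpoint. For brevity the before-image travels inside the log record; in InnoDB it lives in undo log pages, whose changes are themselves redo-logged.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LogRecord:
    lsn: int
    trx_id: int
    kind: str          # "update" or "commit" (hypothetical record kinds)
    key: str = ""
    old: Any = None    # before-image, used for roll back
    new: Any = None    # after-image, used for roll forward


def recover(pages: Dict[str, Any], log: List[LogRecord],
            checkpoint_lsn: int) -> Dict[str, Any]:
    """Roll forward from the checkpoint, then roll back uncommitted work."""
    state = dict(pages)
    committed = set()
    replayed = []
    # Roll forward: replay every change logged after the checkpoint,
    # whether or not its transaction committed.
    for rec in log:
        if rec.lsn <= checkpoint_lsn:
            continue  # already reflected in the pages on disk
        if rec.kind == "commit":
            committed.add(rec.trx_id)
        elif rec.kind == "update":
            state[rec.key] = rec.new
            replayed.append(rec)
    # Roll back: undo changes of transactions without a commit marker,
    # newest change first.
    for rec in reversed(replayed):
        if rec.trx_id not in committed:
            state[rec.key] = rec.old
    return state
```

Replaying everything first and only then rolling back keeps the two phases independent: the roll-forward pass does not need to know which transactions will eventually commit.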
> >>
> >> --Justin
> >>
> >> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
> >> wrote:
> >>>
> >>> Justin,
> >>>
> >>> I was wondering whether fsync is needed after each write. The operations
> >>> are already in the log, so recovery can always be done from the log. The
> >>> difference is that during recovery we would need to go further back in
> >>> the log, so it would take longer. But done that way, I guess it would be
> >>> hard to coordinate with the kernel flush thread.
> >>>
> >>> Xiaofei
> >>>
> >>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@xxxxxxxxx>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> InnoDB recovery cannot handle torn pages. An fsync is required to
> >>>> ensure that the page is fully written to disk. This is also why the
> >>>> doublewrite buffer is used. Before pages are written to disk, they are
> >>>> first written sequentially into the doublewrite buffer. This buffer is
> >>>> synced, and then asynchronous page writing can proceed. If the database
> >>>> crashes, the pages in flight are restored from the doublewrite buffer.
> >>>> Torn pages are detected using the LSN, which is written to both the top
> >>>> and the bottom of the page. If the LSNs at the top and bottom do not
> >>>> match, the page is torn.
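The write ordering described above can be sketched in Python as follows, assuming a POSIX system (`os.pwrite` is Unix-only) and hypothetical file descriptors for a doublewrite area and a data file. `flush_batch_with_doublewrite` is an illustrative name, not an InnoDB function. The essential rule is that the doublewrite copies are synced before any in-place write begins, so a page torn by a crash always has an intact copy to be restored from.

```python
import os
from typing import Dict


def flush_batch_with_doublewrite(dblwr_fd: int, data_fd: int,
                                 pages: Dict[int, bytes],
                                 page_size: int) -> None:
    """Flush a batch of dirty pages using the doublewrite ordering."""
    batch = sorted(pages.items())
    # 1. Write the whole batch sequentially into the doublewrite area.
    for slot, (_page_no, image) in enumerate(batch):
        os.pwrite(dblwr_fd, image, slot * page_size)
    # 2. Make those copies durable before touching the data file.
    os.fsync(dblwr_fd)
    # 3. Write each page in place; a crash here may tear a page, but an
    #    intact copy already exists in the doublewrite area.
    for page_no, image in batch:
        os.pwrite(data_fd, image, page_no * page_size)
    # 4. Sync the data file; only then may the doublewrite slots be reused.
    os.fsync(data_fd)
```

A real implementation also has to record which page each slot holds so that recovery can match copies to their locations; this sketch leaves that out.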
> >>>>
> >>>> Regards,
> >>>>
> >>>> --Justin
> >>>>
> >>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>> Laurynas,
> >>>>>
> >>>>> This is exactly what I was looking for. I had gone through these
> >>>>> functions before, but I had disabled the doublewrite buffer, so I
> >>>>> didn't pay attention to the code under buf_dblwr... I asked this
> >>>>> question because I didn't know how the recovery process works, so I
> >>>>> was wondering whether it's necessary to fsync after each write. It's a
> >>>>> performance concern. Anyway, thank you very much!
> >>>>>
> >>>>> Jan -- Thank you for your answer too!
> >>>>>
> >>>>> Xiaofei
> >>>>>
> >>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
> >>>>> <laurynas.biveinis@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Xiaofei -
> >>>>>>
> >>>>>> fsync is performed for all the flush types (LRU, flush list, single
> >>>>>> page) if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC).
> >>>>>> The apparent difference between sync and async is not caused by the
> >>>>>> sync difference itself, but by the flush type difference. The single
> >>>>>> page flush flushes one page and requests an fsync for its file. Other
> >>>>>> flushes work in batches: they don't fsync for each written page
> >>>>>> individually but rather sync once at the end. Doublewrite then
> >>>>>> complicates this further. If it is disabled, fsync happens in
> >>>>>> buf_dblwr_sync_datafiles, called from buf_dblwr_flush_buffered_writes,
> >>>>>> called from buf_flush_common, which runs at the end of either an LRU
> >>>>>> or a flush list flush. If doublewrite is enabled, fsync happens in
> >>>>>> buf_dblwr_update, called from buf_flush_write_complete.
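A small Python sketch of the fsync accounting described above, assuming a POSIX system. `flush_pages` is a hypothetical helper: with `single_page=True` it mimics repeated single page flushes (one fsync per page), otherwise a batch flush (one fsync at the end). The `flush_method` argument only models the skipped fsync of O_DIRECT_NO_FSYNC; it does not open the file with O_DIRECT.

```python
import os
from typing import Dict


def flush_pages(fd: int, pages: Dict[int, bytes], page_size: int,
                single_page: bool = False, flush_method: str = "fsync") -> int:
    """Write pages and return how many fsync calls were issued."""
    do_sync = flush_method != "O_DIRECT_NO_FSYNC"
    fsyncs = 0
    for page_no, image in sorted(pages.items()):
        os.pwrite(fd, image, page_no * page_size)
        if single_page and do_sync:
            os.fsync(fd)  # single page flush: sync right after the write
            fsyncs += 1
    if not single_page and do_sync:
        os.fsync(fd)      # batch (LRU / flush list) flush: one sync at the end
        fsyncs += 1
    return fsyncs
```

Flushing three pages one at a time costs three fsync calls, while the same three pages flushed as a batch cost one, which is why the batch flushes appear not to sync at all when looked at page by page.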
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> >>>>>> > Hi Laurynas,
> >>>>>> >
> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
> >>>>>> > <laurynas.biveinis@xxxxxxxxx> wrote:
> >>>>>> >>
> >>>>>> >> Xiaofei -
> >>>>>> >>
> >>>>>> >> > Does InnoDB maintain a dirty page table?
> >>>>>> >>
> >>>>>> >> You must be referring to the buffer pool flush_list.
> >>>>>> >
> >>>>>> >
> >>>>>> > You are right. The flush_list can be used for recovery and
> >>>>>> > checkpointing.
> >>>>>> >
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> > Is fsync called to guarantee that the page is on persistent
> >>>>>> >> > storage so that the dirty page table can be updated? If this is
> >>>>>> >> > the case, when is the dirty page table updated for asynchronous
> >>>>>> >> > IOs?
> >>>>>> >>
> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
> >>>>>> >> called from buf_page_io_complete in buf0buf.cc.
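A toy Python model of the flush_list bookkeeping discussed above, under several assumptions: `FlushList` and `async_flush` are hypothetical names, a worker thread stands in for an asynchronous I/O handler thread, and a page is not modified again while its write is in flight. The completion step removes the page from the list, and the checkpoint LSN is the oldest unflushed modification. Because a completed write is not necessarily durable, this sketch fsyncs the data file before handing out a checkpoint LSN; that is one way to reconcile per-page completion with a batched fsync, not a description of InnoDB's exact code path.

```python
import os
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict


class FlushList:
    """Dirty pages mapped to the LSN of their oldest unflushed change."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._oldest: Dict[int, int] = {}

    def mark_dirty(self, page_no: int, lsn: int) -> None:
        with self._lock:
            # Only the first change since the last flush matters.
            self._oldest.setdefault(page_no, lsn)

    def write_complete(self, page_no: int) -> None:
        with self._lock:
            self._oldest.pop(page_no, None)

    def checkpoint_lsn(self, data_fd: int, current_lsn: int) -> int:
        with self._lock:
            oldest = min(self._oldest.values(), default=current_lsn)
        # Sync after taking the snapshot: every page already removed from
        # the list finished its write before this point, so the fsync
        # makes it durable before redo older than `oldest` is discarded.
        os.fsync(data_fd)
        return oldest


def async_flush(executor: ThreadPoolExecutor, data_fd: int,
                flush_list: FlushList, page_no: int, image: bytes,
                page_size: int) -> Future:
    """Submit a page write; the completion step runs on the I/O thread."""
    def io_task() -> None:
        os.pwrite(data_fd, image, page_no * page_size)
        # Completion step, loosely analogous to buf_flush_write_complete.
        flush_list.write_complete(page_no)
    return executor.submit(io_task)
```

While a page stays on the list, the checkpoint cannot move past its oldest modification, so recovery has to start further back in the log; removing pages only as their writes complete is what lets the checkpoint advance.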
> >>>>>> >
> >>>>>> >
> >>>>>> > You are right that this is the place where it updates the dirty page
> >>>>>> > information. But I still don't understand why fsync is needed for
> >>>>>> > synchronous IOs but not for AIOs. Jan Lindstrom said fsync is also
> >>>>>> > called for other AIO operations, but I could only find it to be true
> >>>>>> > in one of many AIO operations. Or maybe I am still missing something?
> >>>>>> >
> >>>>>> >>
> >>>>>> >>
> >>>>>> >> --
> >>>>>> >> Laurynas
> >>>>>> >
> >>>>>> >
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Laurynas
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Mailing list: https://launchpad.net/~maria-discuss
> >>>>> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> >>>>> Unsubscribe : https://launchpad.net/~maria-discuss
> >>>>> More help   : https://help.launchpad.net/ListHelp
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
>
> --
> Laurynas
>
