maria-discuss team mailing list archive

Re: fsync necessary for synchronous page flush?

 

Laurynas,

Thank you for your explanation; it helps me a lot. I appreciate your help!
Thanks to everyone else for their help as well!

Xiaofei

On Sun, May 10, 2015 at 9:53 PM, Laurynas Biveinis <
laurynas.biveinis@xxxxxxxxx> wrote:

> Undo logs log only a subset of a database instance. And, since their
> purpose is different, by the time of crash recovery the undo logs
> might be purged.
>
> 2015-05-10 2:57 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> > Laurynas,
> >
> > We cannot recover from a torn page only using redo log. But wouldn't undo
> > log record enough information for recovery in the case of a torn page? Undo
> > log should have old values of affected rows. So shouldn't it be enough to
> > recover a torn page using information from undo log?
> >
> > Xiaofei
> >
> > On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis
> > <laurynas.biveinis@xxxxxxxxx> wrote:
> >>
> >> Xiaofei -
> >>
> >> We can indeed detect the torn page write without the doublewrite
> >> buffer (and WebScaleSQL has a patch utilising this observation). But
> >> we need not only to detect, but to recover the page as well. And
> >> without the doublewrite, if we discard the page, we have nothing: a
> >> half-old half-new page on the disk and the redo log records for that
> >> page are not enough to recover it.
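
The point about detection versus recovery can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Page` class, the `redo_append` record type, and the four-slot layout are all hypothetical and are not InnoDB structures. The one thing it takes from the thread is that the log does not hold whole pages, so a redo record describes a change relative to a consistent page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """Toy page: a record count in the header plus a fixed slot area."""
    n: int = 0
    slots: List[Optional[str]] = field(default_factory=lambda: [None] * 4)


def redo_append(page: Page, value: str) -> Page:
    """Replay a hypothetical 'append' redo record.

    The record does not carry a page image; it relies on page.n
    pointing at the first free slot of a consistent page.
    """
    slots = list(page.slots)
    slots[page.n] = value
    return Page(n=page.n + 1, slots=slots)


old = Page(n=2, slots=["a", "b", None, None])
new = redo_append(old, "c")  # the page image the server tried to write

# Crash mid-write: the header made it to disk, the slot area did not.
torn = Page(n=new.n, slots=list(old.slots))

replayed_on_old = redo_append(old, "c")    # consistent base: correct result
replayed_on_torn = redo_append(torn, "c")  # torn base: record lands in the wrong slot
```

Replaying onto the old page reproduces the intended page, while replaying onto the torn page writes "c" into slot 3 and leaves slot 2 empty. A full copy of the page, which is what the doublewrite buffer keeps, avoids the problem because recovery then starts from a consistent image.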
> >>
> >> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> >> > Justin,
> >> >
> >> > I think the fsync I was concerned about and the torn page problem are two
> >> > different things. But now I have a question about the doublewrite buffer.
> >> > If we can detect a torn page by checking the top and bottom of a page, why
> >> > would we still need the doublewrite buffer? If the page is consistent, then
> >> > we use it; otherwise, we just discard it. Maybe this is a naive question,
> >> > but please let me know. Thanks.
> >> >
> >> > Xiaofei
> >> >
> >> > On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenlion@xxxxxxxxx>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> The log does not have whole pages.  Pages must not be torn for the
> >> >> recovery process to work.  A fsync is required when a page is written to
> >> >> disk.  During recovery all changes since the last checkpoint are replayed,
> >> >> then transactions that do not have a commit marker are rolled back.  This
> >> >> is called roll forward/roll back recovery.
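
A minimal sketch of roll forward/roll back recovery as described above. The record shapes and the `crash_recover` function are hypothetical. As a simplifying assumption, each change record carries the old value so the rollback step is self-contained; InnoDB itself uses undo logs for the rollback.

```python
from typing import Dict, List, Tuple

# ("change", txn, key, old_value, new_value) or ("commit", txn)
LogRecord = Tuple


def crash_recover(checkpoint: Dict[str, int], log: List[LogRecord]) -> Dict[str, int]:
    """Replay every change since the checkpoint, then undo uncommitted ones."""
    state = dict(checkpoint)
    committed = set()
    changes = []

    # Roll forward: apply all changes in log order, committed or not.
    for rec in log:
        if rec[0] == "change":
            _, txn, key, old, new = rec
            state[key] = new
            changes.append(rec)
        elif rec[0] == "commit":
            committed.add(rec[1])

    # Roll back: undo changes of transactions without a commit marker,
    # newest first.
    for _, txn, key, old, new in reversed(changes):
        if txn not in committed:
            state[key] = old

    return state


example_log = [
    ("change", "t1", "x", 1, 5),
    ("change", "t2", "y", 1, 9),
    ("commit", "t1"),
]
recovered = crash_recover({"x": 1, "y": 1}, example_log)
```

In this example `t1` committed and `t2` did not, so the recovered state keeps the new value of `x` and restores the old value of `y`.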
> >> >>
> >> >> --Justin
> >> >>
> >> >> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
> >> >> wrote:
> >> >>>
> >> >>> Justin,
> >> >>>
> >> >>> I was wondering whether fsync is needed after every write. The
> >> >>> operations are already in the log, so recovery can always be done from
> >> >>> the log. The difference is that during recovery we need to go back
> >> >>> further in the log, and it will take longer. But in that way, I guess it
> >> >>> would be hard to coordinate with the kernel flush thread.
> >> >>>
> >> >>> Xiaofei
> >> >>>
> >> >>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenlion@xxxxxxxxx>
> >> >>> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> InnoDB recovery cannot handle torn pages.  An fsync is required to
> >> >>>> ensure that the page is fully written to disk.  This is also why the
> >> >>>> doublewrite buffer is used.  Before pages are written down to disk, they
> >> >>>> are first written sequentially into the doublewrite buffer.  This buffer
> >> >>>> is synced, then async page writing can proceed.  If the database
> >> >>>> crashes, the pages in flight will be rewritten from the doublewrite
> >> >>>> buffer.  The detection mechanism for torn pages comes from an LSN, which
> >> >>>> is written into the top and the bottom of the page.  If the LSN at the
> >> >>>> top and bottom do not match, the page is torn.
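
The detection and repair steps described above can be sketched in a few lines. Everything here is an illustrative assumption: the 32-byte page, the 8-byte LSN stamped identically at both ends, and the function names are made up for the example and do not reproduce InnoDB's on-disk format or its code.

```python
import struct

PAGE_SIZE = 32   # toy size; real pages are far larger
LSN_FMT = ">Q"   # 8-byte big-endian LSN (hypothetical layout)
LSN_LEN = struct.calcsize(LSN_FMT)


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a page with the same LSN stamped at the top and the bottom."""
    body_len = PAGE_SIZE - 2 * LSN_LEN
    body = payload.ljust(body_len, b"\x00")[:body_len]
    stamp = struct.pack(LSN_FMT, lsn)
    return stamp + body + stamp


def is_torn(page: bytes) -> bool:
    """A page is torn if the LSN at the top and the bottom do not match."""
    return page[:LSN_LEN] != page[-LSN_LEN:]


def torn_write(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate a crash mid-write: the first `cut` bytes are new, the rest old."""
    return new[:cut] + old[cut:]


def restore_from_doublewrite(datafile_page: bytes, doublewrite_copy: bytes) -> bytes:
    """Replace a torn data-file page with its intact doublewrite copy."""
    if is_torn(datafile_page) and not is_torn(doublewrite_copy):
        return doublewrite_copy
    return datafile_page


old_page = make_page(100, b"old row data")
new_page = make_page(200, b"new row data")
on_disk = torn_write(old_page, new_page, cut=16)
repaired = restore_from_doublewrite(on_disk, new_page)
```

The doublewrite copy is only trustworthy because it was synced before the data-file write began, which is the ordering the paragraph above describes.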
> >> >>>>
> >> >>>> Regards,
> >> >>>>
> >> >>>> --Justin
> >> >>>>
> >> >>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du008@xxxxxxxxx>
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> Laurynas,
> >> >>>>>
> >> >>>>> This is exactly what I was looking for. I went through these functions
> >> >>>>> before. I disabled double write buffer, so I didn't pay attention to
> >> >>>>> code under buf_dblwr... The reason I asked this question is because I
> >> >>>>> didn't know how the recovery process works, so I was wondering if it's
> >> >>>>> necessary to fsync after each write. It's a performance concern. Anyway,
> >> >>>>> thank you very much!
> >> >>>>>
> >> >>>>> Jan -- Thank you for your answer too!
> >> >>>>>
> >> >>>>> Xiaofei
> >> >>>>>
> >> >>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis
> >> >>>>> <laurynas.biveinis@xxxxxxxxx> wrote:
> >> >>>>>>
> >> >>>>>> Xiaofei -
> >> >>>>>>
> >> >>>>>> fsync is performed for all the flush types (LRU, flush list, single
> >> >>>>>> page) if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC).
> >> >>>>>> The apparent difference in sync and async is not because of the sync
> >> >>>>>> difference itself, but because of the flush type difference. The
> >> >>>>>> single page flush flushes one page, and requests a fsync for its file.
> >> >>>>>> Other flushes flush in batches, so they don't have to fsync for each
> >> >>>>>> written page individually but rather sync once at the end. Then
> >> >>>>>> doublewrite complicates this further. If it is disabled, fsync will
> >> >>>>>> happen in buf_dblwr_sync_datafiles called from
> >> >>>>>> buf_dblwr_flush_buffered_writes called from buf_flush_common called at
> >> >>>>>> the end of either LRU or flush list flush. If doublewrite is enabled,
> >> >>>>>> fsync will happen in buf_dblwr_update called from
> >> >>>>>> buf_flush_write_complete.
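
The difference between the single page flush and the batch flushes comes down to how many fsync calls are issued. The following is a minimal sketch of that idea only: the function names are hypothetical, it uses plain POSIX `pwrite` and `fsync` on a temporary file, and it does not model the doublewrite buffer or any of the buf_* functions named above.

```python
import os
import tempfile
from typing import Dict, Tuple

PAGE_SIZE = 4096  # arbitrary page size for the example


def flush_single_page(fd: int, page_no: int, data: bytes) -> int:
    """Write one page and sync its file right away. Returns fsyncs issued."""
    os.pwrite(fd, data, page_no * PAGE_SIZE)
    os.fsync(fd)
    return 1


def flush_batch(fd: int, pages: Dict[int, bytes]) -> int:
    """Write a batch of pages, then sync once at the end. Returns fsyncs issued."""
    for page_no, data in pages.items():
        os.pwrite(fd, data, page_no * PAGE_SIZE)
    os.fsync(fd)
    return 1


def demo() -> Tuple[int, int]:
    """Flush the same 8 pages both ways and return the two fsync counts."""
    pages = {i: bytes([i]) * PAGE_SIZE for i in range(8)}
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        per_page = sum(flush_single_page(fd, n, d) for n, d in pages.items())
        batched = flush_batch(fd, pages)
    return per_page, batched
```

Both paths leave every page durable once they return; the batch path simply amortises the cost of the sync over all the pages it wrote.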
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du008@xxxxxxxxx>:
> >> >>>>>> > Hi Laurynas,
> >> >>>>>> >
> >> >>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
> >> >>>>>> > <laurynas.biveinis@xxxxxxxxx> wrote:
> >> >>>>>> >>
> >> >>>>>> >> Xiaofei -
> >> >>>>>> >>
> >> >>>>>> >> > Does InnoDB maintain a dirty
> >> >>>>>> >> > page table?
> >> >>>>>> >>
> >> >>>>>> >> You must be referring to the buffer pool flush_list.
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > You are right. The flush_list can be used for recovery and
> >> >>>>>> > checkpoint.
> >> >>>>>> >
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >> > Is fsync called to guarantee the page to be on persistent
> >> >>>>>> >> > storage so that the dirty page table can be updated? If this is the
> >> >>>>>> >> > case, when is the dirty page table updated for asynchronous IOs?
> >> >>>>>> >>
> >> >>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
> >> >>>>>> >> called from buf_page_io_complete in buf0buf.cc.
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > You are right that this is the place where it updates the dirty page
> >> >>>>>> > information. But I still don't understand why the fsync is needed for
> >> >>>>>> > synchronous IOs, but not for the AIOs. Jan Lindstrom said fsync is also
> >> >>>>>> > called for other AIO operations. But I could only find it to be true in
> >> >>>>>> > one of many AIO operations. Or maybe I am missing something still?
> >> >>>>>> >
> >> >>>>>> >>
> >> >>>>>> >>
> >> >>>>>> >> --
> >> >>>>>> >> Laurynas
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Laurynas
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Laurynas
> >
> >
>
>
>
> --
> Laurynas
>
