
maria-developers team mailing list archive

Re: Rev 3712: MDEV-4338 : Support atomic option on directFS/FusionIO

 


> -----Original Message-----
> From: Sergei Golubchik [mailto:serg@xxxxxxxxxxxx]
> Sent: Thursday, April 4, 2013 23:12
> To: Vladislav Vaintroub
> Cc: maria-developers@xxxxxxxxxxxxxxxxxxx
> Subject: Re: Rev 3712: MDEV-4338 : Support atomic option on
> directFS/FusionIO
> 

Hi Serg,

<skip>

> > > it'd be nice if you could try os_file_set_atomic_writes() here to
> > > see if that works. This function creates and opens quite a few
> > > files, you could use one of them to mark it for atomic writes, and
> > > if that would fail, you'd disable atomic writes, and wouldn't change
> > > innobase_file_flush_method and innobase_use_doublewrite.
> >
> > It is tricky to do it before files are opened the first time, without
> > moving lot of code around - there is non-trivial parsing of tablespace
> > names later on in open_or_create_data_files(), just to figure out
> > directories and filenames.
> 
> I mean, you can try your function on any other file, not necessarily
> on the tablespace. Even on a temporary file, like with
 
>   int fd = mysql_tmpfile("ib");
>   if (os_file_set_atomic_writes(fd)) ...
>   my_close(fd);

Right, but the subtlety here is that the test file needs to be on the same
device as the ibdata1 tablespace, and figuring out the correct directory is
not trivial.

Apart from the easy default case, where ibdata lands in datadir, there is
innodb_data_home_dir, as well as innodb_data_file_path (this one needs to be
parsed). Not to forget possible symbolic links: imagine ibdata1 is placed on
an atomic-capable filesystem and symlinked into datadir.

So I'd still say this is tricky.
 
> > > Btw, why not to use posix_fallocate whenever it's available?
> > > Or, at least, with its own --innodb-use-fallocate option?
> >
> > Yes, I guess it (the new option) is a good idea.  I created a followup
> > patch that introduces the option
> > http://lists.askmonty.org/pipermail/commits/2013-April/004569.html . I
> > set default to ON.  What do you think?
> 
> There are two related threads:
> 
> https://lists.launchpad.net/maria-developers/msg05068.html
> and the one in internals@ mysql list, starting from
> http://lists.mysql.com/internals/38679
> 
> In particular, I noticed this part
> "
>   I relied on that fallocate does not need fsync since metadata is
>   protected by filesystem journal. But I am not confident whether it is
>   true. I'm wondering if this patch may lead InnoDB committing schema to
>   not function normally.
> 
> What do you know about it, does one need to sync after posix_fallocate()?
> What if the filesystem is not journalling?

This is a good question, and I do not think I have enough knowledge to
answer it :)

Note that fsync() is done anyway almost everywhere after
os_set_file_size():
1. os_set_file_size() during creation of a single tablespace (file-per-table,
I believe) is followed almost immediately by os_file_flush().
2. os_set_file_size() when a tablespace is extended is followed by
fil_flush.

When InnoDB starts up for the first time (bootstrap), it is a little bit
different: the log file or tablespace is created, os_set_file_size() is
called, and then the file is closed but not flushed. My feeling is that
this is OK, hoping that the metadata will be flushed at least on close().
And even if not, in what is probably the worst scenario, a machine crash
during bootstrap, no user data is lost, as there is no user data yet.


Having said all this, I would not mind adding an extra fsync() to the
function, if this makes someone feel safer. I think the overhead of it
would be minimal.
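
For reference, a minimal sketch of what "fallocate plus an extra fsync()" could look like. It assumes a POSIX system that provides posix_fallocate(); extend_file_durably() is a hypothetical name and is not InnoDB's os_set_file_size(). Whether the fsync() is strictly required on a journalling filesystem remains the open question above; the sketch simply takes the conservative route.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Makes sure the file is at least `size` bytes long, then flushes the
 * file to stable storage. Returns 0 on success or an errno-style error
 * code on failure.
 */
int extend_file_durably(int fd, off_t size)
{
    int err;

    assert(fd >= 0 && size > 0);

    /* posix_fallocate() returns the error number directly and does not
       set errno. */
    err = posix_fallocate(fd, 0, size);
    if (err != 0)
        return err;

    /* fsync() returns -1 and sets errno on failure. */
    if (fsync(fd) != 0)
        return errno;

    return 0;
}
```

The cost is one extra fsync() per size change, which only happens on file creation and tablespace extension.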


> Regards,
> Sergei



