
maria-developers team mailing list archive

Re: pre-allocating binlog to speed up sync_binlog=1


Sergei Golubchik <serg@xxxxxxxxxxxx> writes:

> On Dec 14, Arjen Lentz wrote:

>> Can we adopt/implement http://forge.mysql.com/worklog/task.php?id=4925 in 
>> MariaDB?

> P.S. These are the changesets:
>
> final version:
>   http://lists.mysql.com/commits/113309
>   http://lists.mysql.com/commits/113306
>   http://lists.mysql.com/commits/113307
> review comments:
>   http://lists.mysql.com/commits/116965
>   http://lists.mysql.com/commits/121478

The implementation of this makes me very uneasy.

The problem is that I see nothing that properly handles partial writes into
the binlog, at least from a quick read-through. Neither in the worklog nor in
the patch. Just the fact that this is not clearly described up-front in the
worklog is very worrying!

The worklog says this:

"For replication threads, when reading the latest binary log, getting actual
size information is needed to check EOF [...] If binlog size is not set, 4KB
is read so bogus data is read if actual binlog size is smaller than 4KB. This
makes slave i/o thread terminated)"

But there is no guarantee that "bogus data" will be detected as such. We don't
even have a checksum on events. So basically, after a crash the last binlog
event may be corrupt, with no sure way to detect this corruption.

In other words, we lose crash recovery, which is the whole point of setting
sync_binlog=1 in the first place.
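To make the concern concrete, here is a minimal sketch in C. It uses a hypothetical record layout (a 4-byte little-endian length, a 4-byte CRC32 of the payload, then the payload), which is NOT the actual binlog event format, and the helper names rec_put() and find_log_end() are likewise hypothetical. A crash is simulated in a zero-filled, pre-allocated buffer: the second record's header reaches the "disk" but only 3 of its 10 payload bytes do. A reader that checks only the length field accepts the torn record, because the pre-allocated zeros happily fill out its payload. A per-record checksum makes it stop at the last intact record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

#define REC_HDR 8 /* hypothetical header: 4-byte length + 4-byte CRC32 */

/* Write one record at buf+off and return the offset just past it.
   The caller must ensure there is room for REC_HDR + len bytes. */
static size_t rec_put(uint8_t *buf, size_t off, const uint8_t *payload,
                      uint32_t len)
{
    /* A zero length would be indistinguishable from pre-allocated zeros. */
    assert(len > 0);
    put_le32(buf + off, len);
    put_le32(buf + off + 4, crc32_ieee(payload, len));
    memcpy(buf + off + REC_HDR, payload, len);
    return off + REC_HDR + len;
}

/* Scan a (possibly pre-allocated, zero-filled) log image and return the
   offset just past the last record accepted. With verify_crc == 0 only
   the length field is checked, as a reader without checksums would do. */
static size_t find_log_end(const uint8_t *buf, size_t size, int verify_crc)
{
    size_t off = 0;
    while (size - off >= REC_HDR) {
        uint32_t len = get_le32(buf + off);
        if (len == 0 || len > size - off - REC_HDR)
            break;  /* zero fill, or a length that runs past the file */
        if (verify_crc &&
            get_le32(buf + off + 4) != crc32_ieee(buf + off + REC_HDR, len))
            break;  /* torn or garbage payload */
        off += REC_HDR + len;
    }
    return off;
}
```

With the simulated crash described above, the checksumming scan reports the end of the log at offset 13 (just after the intact first record), while the length-only scan reports 31, treating the half-written second record as valid. Note that a checksum only turns silent corruption into a detectable, truncatable tail; it does not by itself decide whether the torn transaction was committed.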

[I would love to learn that I am wrong, as this is a very nice feature. But
the whole reason fsync() is slow when appending to files is handling the
difficult issue of partial writes, so I would be really curious how the patch
manages to handle this properly.]
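For readers unfamiliar with the technique under discussion, the following is a generic POSIX sketch of the pre-allocation idea, assuming the goal is that appends never change the file size. It is not the code from the worklog or the changesets above, and the helper names are hypothetical. The file is filled with zeros once and fsync()ed, so its final size is durable up front; each append then lands inside that region and is made durable with fdatasync(), which is not required to flush metadata (such as timestamps) that is not needed to read the data back, whereas a size change would be. The sketch writes zeros explicitly rather than calling posix_fallocate(), since on some filesystems fallocated extents are marked unwritten and the first write into them is itself a metadata update. The flip side, which is exactly the worry raised above, is that the file size no longer tells a reader where the log ends.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, fill it with `size` zero bytes and
   fsync once so the final file size is durable. Returns an fd or -1. */
static int log_create_preallocated(const char *path, off_t size)
{
    static const char zeros[4096];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    for (off_t off = 0; off < size; ) {
        size_t chunk = (size - off) < (off_t)sizeof zeros
                           ? (size_t)(size - off) : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            close(fd);
            return -1;
        }
        off += n;
    }
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: append `len` bytes at *pos inside the pre-allocated
   region of `capacity` bytes, then fdatasync(). The file size never
   changes, so no size update has to be committed. Returns 0 or -1. */
static int log_append_durable(int fd, off_t capacity, off_t *pos,
                              const void *data, size_t len)
{
    assert(*pos >= 0);
    if ((off_t)len > capacity - *pos) {
        errno = ENOSPC;  /* would grow the file; caller must rotate */
        return -1;
    }
    const char *p = data;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, *pos + (off_t)done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    if (fdatasync(fd) != 0)
        return -1;
    *pos += (off_t)len;
    return 0;
}
```

If the machine crashes in the middle of log_append_durable(), the region after the last fully synced append may contain any prefix of the new bytes followed by zeros, and nothing in the file records how far the write got. That is why the reader side needs a reliable way, such as a per-event checksum, to tell a torn tail from valid data.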

 - Kristian.


