maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #07997
Re: JIRA ping
Hi again,
i addressed your comments and I uploaded 2 new patches to JIRA.
1) a new "complete patch"
2) a patch that is changes from v1 to v2.
comments inline below:
looking forward to more feedback
/Jonas
On Mon, Dec 8, 2014 at 2:01 PM, Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
wrote:
> Jonas Oreland <jonaso@xxxxxxxxxx> writes:
>
> > I light of recent "I did see any of your comments until by accident just
> > now",
> > I now mail you directly that I updated both MDEV-162 and MDEV-7257.
>
> Sorry for the late answer, I do not follow _all_ Jira activity, so I did
> not
> see your comments on those issues (I do read all of maria-developers@
> though).
>
> I think the patch looks good. That binlog send code has long needed a
> cleanup,
> lots of duplicated code in there and other ugly stuff.
>
> A few comments:
>
> > diff --git a/mysql-test/suite/rpl/r/rpl_checksum.result
> b/mysql-test/suite/rpl/r/rpl_checksum.result
> > index d88258f..b736dbc 100644
> > --- a/mysql-test/suite/rpl/r/rpl_checksum.result
> > +++ b/mysql-test/suite/rpl/r/rpl_checksum.result
>
> > -Last_IO_Error = 'Got fatal error 1236 from master when reading data
> from binary log: 'Slave can not handle replication events with the checksum
> that master is configured to log; the first event 'master-bin.000009' at
> 367, the last event read from 'master-bin.000010' at 248, the last byte
> read from 'master-bin.000010' at 248.''
> > +Last_IO_Error = 'Got fatal error 1236 from master when reading data
> from binary log: 'Slave can not handle replication events with the checksum
> that master is configured to log; the first event 'master-bin.000009' at
> 367, the last event read from 'master-bin.000010' at 367, the last byte
> read from 'master-bin.000010' at 411.''
>
> Why does the binlog position change here? (It's not necessarily a problem,
> and
> usually it's a bad idea anyway for test cases to depend on binlog
> positions. But it was hard to follow the details of the patch due to so
> much
> code moving around, so I want to understand why...)
>
thanks!
the problem was that I didn't update info->last_pos and linfo->pos
correctly in send_format_descriptor_event
so the offsets were wrong (in the old file)...how ever the result file
still changes, since the old code was also incorrect :-)
The new error is:
Last_IO_Error = 'Got fatal error 1236 from master when reading data from
binary log: 'Slave can not handle replication events with the checksum that
master is configured to log; the first event 'master-bin.000009' at 367,
the last event read from 'master-bin.000010' at 4, the last byte read from
'master-bin.000010' at 248.''
which is entirely correct, the last read event started at offset 4 and
ended at offset 248...and the slave requested to start at master-bin.000009
at 367
---
this also made me find a bug that when seeking i forgot to update linfo->pos
> > diff --git a/sql/log.cc b/sql/log.cc
> > index 2c20ea3..d10c1ad 100644
> > --- a/sql/log.cc
> > +++ b/sql/log.cc
>
> > + /**
> > + * TODO(jonaso): Check with Kristian, should last_commit_pos really
> > + * be updated for relay logs!!!
> > + */
> > + if (!is_relay_log)
> > + {
> > + /* update binlog_end_pos so that it can be read by after sync
> hook */
> > + reset_binlog_end_pos(log_file_name, offset);
> > + }
> > +
> > mysql_mutex_lock(&LOCK_commit_ordered);
> > strmake_buf(last_commit_pos_file, log_file_name);
> > - last_commit_pos_offset= my_b_tell(&log_file);
> > + last_commit_pos_offset= offset;
> > mysql_mutex_unlock(&LOCK_commit_ordered);
>
> I agree, it is not necessary to update last_commit_pos for relay log. So it
> could be moved into if (!is_relay_log) { ... }.
>
done.
>
> > @@ -4797,6 +4815,10 @@ void MYSQL_BIN_LOG::make_log_name(char* buf,
> const char* log_ident)
> >
> > bool MYSQL_BIN_LOG::is_active(const char *log_file_name_arg)
> > {
> > + /**
> > + * there should/must be a mysql_mutex_assert_owner(&LOCK_log) here...
> > + * but code violates this all over! (scary monsters and super creeps!)
> > + */
> > return !strcmp(log_file_name, log_file_name_arg);
> > }
>
> Hm. I guess this isn't a new problem? Though with the reduced locking of
> LOCK_log, it might have a higher risk of triggering?
>
> But if I understand the patch correctly, then it actually fixes this with
> respect to the binlog. Because now it uses the new binlog_end_pos, it seems
> there are no more calls to is_active() on the master.
>
> And it looks like on the slave side, it does hold LOCK_log while doing
> is_active().
>
> So is this comment still appropriate? And if so, maybe it could be
> clarified
> instead to specify exactly where the code is that violates
>
i'll recheck.
...time passes...
...checking...
...time passes...
ok, I readded the assert,
and the code crashed on the slave side.
I updated comment with stack trace,
but left it at that.
> > @@ -7387,6 +7434,10 @@
> MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
> >
> > DEBUG_SYNC(leader->thd, "commit_before_get_LOCK_commit_ordered");
> > mysql_mutex_lock(&LOCK_commit_ordered);
> > + /**
> > + * TODO(jonaso): Check with Kristian,
> > + * if we rotate:d above, this offset is "wrong"
> > + */
> > last_commit_pos_offset= commit_offset;
>
> Agree, good catch, I think.
> If we rotate, we already update last_commit_pos_file and _offset. So the
> code
> needs to remember if it rotated, and skip the errorneous offset update in
> this
> case.
>
yes,
> Do you want to fix this, as part of the patch? Or should I take it?
>
you take
>
> > @@ -811,6 +817,67 @@ class MYSQL_BIN_LOG: public TC_LOG, private
> MYSQL_LOG
>
> > + mysql_mutex_assert_not_owner(&LOCK_binlog_end_pos);
> > + lock_binlog_end_pos();
>
> You shouldn't need the assert_not_owner() I think, safe_mutex will already
> complain if a thread tries to multiple lock a mutex.
>
i see it as easy to read documentation...
>
> > + The difference between this and last_commit_pos_{file,offset} is
> that
> > + the commit position is updated later. If semi-sync wait point is set
> > + to WAIT_AFTER_SYNC, the commit pos is update after semi-sync-ack has
> > + been received and the end point is updated after the write as it's
> needed
> > + for the dump threads to be able to semi-sync the event.
> > + */
> > + my_off_t binlog_end_pos;
> > + char binlog_end_pos_file[FN_REFLEN];
>
> Right... last_commit_pos is about commits in the storage engine. While
> binlog_end_pos is about writes to the binlog. Agree.
>
> > + strncpy(info->start_log_file_name, log_ident,
> > + sizeof(info->start_log_file_name));
> > + info->start_log_file_name[sizeof(info->start_log_file_name)-1]= 0;
>
> I think the better way to do this in MySQL/MariaDB sources is
>
> strmake(info->start_log_file_name, log_ident,
> sizeof(info->start_log_file_name)-1);
>
>
done
> ------
>
> So I think the patch looks good. And getting rid of LOCK_log for every
> event
> sent from master to a slave, that can't be bad. So let's include it.
>
> So how do we do it? You mentioned wanting to avoid large intrusive diff in
> your tree relative to MariaDB upstream, which makes sense. But I suppose
> this
> is a large change to put in stable 10.0... Is it sufficient to include in
> 10.1/10.2, so that any new changes (which will happen only on top of this
> patch) will not conflict?
>
> (I think I also will want it in my 10.0-based replication tree, which will
> not
> be pushed to 10.0 main, but now includes the new parallel replication
> stuff...).
>
> - Kristian.
>
i think i'm fine either way, put it wherever you like.
(at least currently I'm mainly interested in the review feedback)
/Jonas
Follow ups
References