maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09860
Re: Status of InnoDB 5.7 merge to 10.2
Jan Lindström <jan.lindstrom@xxxxxxxxxxx> writes:
> I have reached to situation where I need help fixing remaining test
> failures and mentoring is current status good enough for 10.2 Beta. I need
> help on test failures on parts where I do not follow why test is failing or
> server crashes I do not know how to fix.
>
> First of all current merge result is on bb-10.2-jan with
> commit ca4f194460da469e29624d930e6d0027349d047c
>
> Higher level of status is on https://jira.mariadb.org/browse/MDEV-10024
>
> Note that mtr thinks currently that innodb_plugin compiled using static
> linking is xtradb (but not have_xtradb as xtradb is not compiled).
> Furthermore galera is not compiled (waiting for codership).
>
> Here are current issues where I need help and possible persons who could
> help:
>
> (1) Parallel replication lock wait/deadlock handling on InnoDB 5.7 with
> refactored lock0lock.cc does not work
> https://jira.mariadb.org/browse/MDEV-10550 (Monty, Kristian)
>From a quick look at the patch, this seems to be because of code removed
from innobase_kill_query(). Specifically:
> @@ -4799,65 +5511,11 @@ innobase_kill_query(
> DBUG_ENTER("innobase_kill_query");
> DBUG_ASSERT(hton == innodb_hton_ptr);
> - if (!wsrep_thd_is_BF(trx->mysql_thd, FALSE) &&
> - trx->abort_type == TRX_SERVER_ABORT) {
> - ut_ad(!lock_mutex_own());
> - lock_mutex_enter();
When parallel replication discovers a case where a transaction T1 goes to
wait on a row lock held by a later (in the replication stream) transaction
T2, T2 will be killed to avoid a deadlock. In this situation, we are already
holding lock mutex. So this code is there in 10.1 to prevent re-taking the
mutex, code that is removed here in your patch.
> + if (trx != NULL) {
> + /* Cancel a pending lock request if there are any */
> + lock_trx_handle_wait(trx);
> +lock_trx_handle_wait(
> +/*=================*/
> + trx_t* trx) /*!< in/out: trx lock state */
> {
> dberr_t err;
> - ulint heap_no;
>
> - ut_ad(!dict_index_is_clust(index));
> - ut_ad(!dict_index_is_online_ddl(index));
> - ut_ad(block->frame == page_align(rec));
> - ut_ad(page_rec_is_user_rec(rec) || page_rec_is_supremum(rec));
> - ut_ad(rec_offs_validate(rec, index, offsets));
> - ut_ad(mode == LOCK_X || mode == LOCK_S);
> + lock_mutex_enter();
But in the new code here, the protection on lock_mutex_enter is no longer
there, lock_mutex_enter() is done unconditionally.
The code which sets the flag TRX_SERVER_ABORT seems to be still there,
though:
> +void
> +lock_report_waiters_to_mysql(
> +/*=======================*/
> + struct thd_wait_reports* waitee_buf_ptr, /*!< in: set of trxs */
> + THD* mysql_thd, /*!< in: THD */
> + trx_id_t victim_trx_id) /*!< in: Trx selected
> + as deadlock victim, if
> + any */
> +{
> + struct thd_wait_reports* p;
> + struct thd_wait_reports* q;
> + ulint i;
> +
> + p = waitee_buf_ptr;
> + while (p) {
> + i = 0;
> + while (i < p->used) {
> + trx_t *w_trx = p->waitees[i];
> + /* There is no need to report waits to a trx already
> + selected as a victim. */
> + if (w_trx->id != victim_trx_id) {
> + /* If thd_report_wait_for() decides to kill the
> + transaction, then we will get a call back into
> + innobase_kill_query. We mark this by setting
> + current_lock_mutex_owner, so we can avoid trying
> + to recursively take lock_sys->mutex. */
> + w_trx->abort_type = TRX_REPLICATION_ABORT;
> + thd_report_wait_for(mysql_thd, w_trx->mysql_thd);
> + w_trx->abort_type = TRX_SERVER_ABORT;
abort_type is set to TRX_REPLICATION_ABORT around thd_report_wait_for(),
during which innobase_kill_query() may be invoked with lock mutex already
held.
So it would appear that a first attempt should be to restore the protection
around lock_mutex_enter()...
Note that, if w_trx->abort_type is not functional in your current patch, the
parallel replication code was originally working without this, you can see
in the snippet above the comment about current_lock_mutex_owner, which was
(mistakenly) not removed when this code was changed to use Galera stuff. So
you can check git history and go back to the original code (using
current_lock_mutex_owner) if necessary, perhaps. Or maybe w_trx->abort_type
is fine, I'm not familiar with that change.
> (2) mysqld: /home/jan/mysql/10.2/sql/handler.cc:2692: int
> handler::ha_index_first(uchar*): Assertion `table_share->tmp_table !=
> NO_TMP_TABLE || m_lock_type != 2' failed.
> https://jira.mariadb.org/browse/MDEV-10549 (Serg)
>
> (3) Some of the debug sync waits do not work InnoDB 5.7
> https://jira.mariadb.org/browse/MDEV-10548 (Monty, Serg)
>
> (4) Test multi_update_innodb fails with InnoDB 5.7
> https://jira.mariadb.org/browse/MDEV-10550 (Petrunia)
>
> (5) Test innodb.defrag_mdl-9155 hangs on InnoDB 5.7
> https://jira.mariadb.org/browse/MDEV-10551 (Jan, Serg)
>
> (6) IS tables are not found on 10.2 InnoDB 5.7
> https://jira.mariadb.org/browse/MDEV-10200 (Jan)
Hope this helps,
- Kristian.
Follow ups