maria-developers team mailing list archive
-
maria-developers team
-
Mailing list archive
-
Message #09587
Re: [Commits] a6fb98f: MDEV-6368: assertion xid_seqno > trx_sys_cur_xid_seqno
Hi Jan,
Thanks for the review.
On Tue, May 3, 2016 at 3:06 AM, Jan Lindström <jan.lindstrom@xxxxxxxxxxx>
wrote:
> Hi Nirbhay,
>
> I'm not expert on this code area
>
I am also hoping to get some remarks from Seppo/Teemu on this.
> but some questions, comments below.
>
Addressed : http://lists.askmonty.org/pipermail/commits/2016-May/009343.html
See my comments inline.
>
> On Tue, May 3, 2016 at 1:26 AM, Nirbhay Choubey <nirbhay@xxxxxxxxxxx>
> wrote:
>
>> revision-id: a6fb98fd74ab6f14fab0c1ef4630b010ff28880f
>> (mariadb-10.1.13-6-ga6fb98f)
>> parent(s): f8cdb9e7e8fb692f8eb3b9bb4792cd60653e0eb8
>> author: Nirbhay Choubey
>> committer: Nirbhay Choubey
>> timestamp: 2016-05-02 18:26:58 -0400
>> message:
>>
>> MDEV-6368: assertion xid_seqno > trx_sys_cur_xid_seqno
>>
>> - Validate the specified wsrep_start_position value by also
>> checking the return status of wsrep->sst_received. This also
>> ensures that changes in wsrep_start_position is not allowed
>> when the node is not in JOINING state.
>> - Do not allow decrease in seqno within same UUID.
>> - The initial checkpoint in SEs should be [0...:-1].
>>
>>
>> diff --git a/sql/wsrep_sst.cc b/sql/wsrep_sst.cc
>> index 4f6cc51..44fdd03 100644
>> --- a/sql/wsrep_sst.cc
>> +++ b/sql/wsrep_sst.cc
>> @@ -264,42 +264,91 @@ void wsrep_sst_complete (const wsrep_uuid_t*
>> sst_uuid,
>> mysql_mutex_unlock (&LOCK_wsrep_sst);
>> }
>>
>> -void wsrep_sst_received (wsrep_t* const wsrep,
>> +/*
>> + If wsrep provider is loaded, inform that the new state snapshot
>> + has been received. Also update the local checkpoint.
>> +
>> + @param wsrep [IN] wsrep handle
>> + @param uuid [IN] Initial state UUID
>> + @param seqno [IN] Initial state sequence number
>> + @param state [IN] Always NULL, also ignored by wsrep
>> provider (?)
>> + @param state_len [IN] Always 0, also ignored by wsrep
>> provider (?)
>> + @param implicit [IN] Whether invoked implicitly due to
>> SST
>> + (true) or explicitly because if
>> change
>> + in wsrep_start_position by user
>> (false).
>> + @return false Success
>> + true Error
>> +
>> +*/
>> +bool wsrep_sst_received (wsrep_t* const wsrep,
>> const wsrep_uuid_t& uuid,
>> wsrep_seqno_t const seqno,
>> const void* const state,
>> - size_t const state_len)
>>
>
> Why not const size_t state_len ? Use of const here is confusing.
>
Updated.
>
>
>> + size_t const state_len,
>> + bool const implicit)
>> {
>> - wsrep_get_SE_checkpoint(local_uuid, local_seqno);
>> + /*
>> + To keep track of whether the local uuid:seqno should be updated.
>> Also, note
>> + that local state (uuid:seqno) is updated/checkpointed only after we
>> get an
>> + OK from wsrep provider. By doing so, the values remain consistent
>> across
>> + the server & wsrep provider.
>> + */
>> + bool do_update= false;
>> +
>> + // Get the locally stored uuid:seqno.
>> + wsrep_get_SE_checkpoint(local_uuid, local_seqno);
>>
>
> What if this fails, no error handling ?
>
Done.
>
>
>>
>> - if (memcmp(&local_uuid, &uuid, sizeof(wsrep_uuid_t)) ||
>> - local_seqno < seqno || seqno < 0)
>> + if (memcmp(&local_uuid, &uuid, sizeof(wsrep_uuid_t)) ||
>> + local_seqno < seqno)
>> + {
>> + do_update= true;
>> + }
>> + else if (local_seqno > seqno)
>> + {
>> + WSREP_WARN("SST position can't be set in past. Requested: %lld,
>> Current: "
>> + " %lld.", (long long)seqno, (long long)local_seqno);
>> + /*
>> + If we are here because of SET command, simply return true (error)
>> instead of
>> + aborting.
>> + */
>> + if (implicit)
>> {
>> - wsrep_set_SE_checkpoint(uuid, seqno);
>> - local_uuid = uuid;
>> - local_seqno = seqno;
>> + WSREP_WARN("Can't continue.");
>> + unireg_abort(1);
>> }
>> - else if (local_seqno > seqno)
>> + else
>> {
>> - WSREP_WARN("SST postion is in the past: %lld, current: %lld. "
>> - "Can't continue.",
>> - (long long)seqno, (long long)local_seqno);
>> - unireg_abort(1);
>> + return true;
>> }
>> + }
>>
>> #ifdef GTID_SUPPORT
>> - wsrep_init_sidno(uuid);
>> + wsrep_init_sidno(uuid);
>> #endif /* GTID_SUPPORT */
>>
>> - if (wsrep)
>> - {
>> - int const rcode(seqno < 0 ? seqno : 0);
>> - wsrep_gtid_t const state_id = {
>> - uuid, (rcode ? WSREP_SEQNO_UNDEFINED : seqno)
>> - };
>> + if (wsrep)
>> + {
>> + int const rcode(seqno < 0 ? seqno : 0);
>>
>
> When seqno can be < 0 ?
>
While I cannot think of a use case for anything below -1, but
-1 is the seqno assigned during initialization.
>
>> + wsrep_gtid_t const state_id= {uuid,
>> + (rcode ? WSREP_SEQNO_UNDEFINED : seqno)};
>> +
>> + wsrep_status_t ret= wsrep->sst_received(wsrep, &state_id, state,
>> + state_len, rcode);
>>
>> - wsrep->sst_received(wsrep, &state_id, state, state_len, rcode);
>> + if (ret != WSREP_OK)
>> + {
>> + return true;
>> }
>> + }
>> +
>> + // Now is the good time to update the local state and checkpoint.
>> + if (do_update) {
>> + wsrep_set_SE_checkpoint(uuid, seqno);
>>
>
> Again, I would add error handling.
>
Done.
>
>
>> + local_uuid= uuid;
>> + local_seqno= seqno;
>> + }
>> +
>> + return false;
>> }
>>
>> // Let applier threads to continue
>> @@ -308,7 +357,7 @@ void wsrep_sst_continue ()
>> if (sst_needed)
>> {
>> WSREP_INFO("Signalling provider to continue.");
>> - wsrep_sst_received (wsrep, local_uuid, local_seqno, NULL, 0);
>> + wsrep_sst_received (wsrep, local_uuid, local_seqno, NULL, 0, true);
>>
>
> You should check the return code.
>
Done.
>
>
>> }
>> }
>>
>> diff --git a/sql/wsrep_var.cc b/sql/wsrep_var.cc
>> index 9c01e54..bd94190 100644
>>
>>
>> bool wsrep_start_position_update (sys_var *self, THD* thd, enum_var_type
>> type)
>> {
>> - WSREP_INFO ("wsrep_start_position var submitted: '%s'",
>> - wsrep_start_position);
>> - // since this value passed wsrep_start_position_check, don't check
>> anything
>> - // here
>> - wsrep_set_local_position (wsrep_start_position, true);
>> - return 0;
>> + // Print a confirmation that wsrep_start_position has been updated.
>> + WSREP_INFO ("wsrep_start_position set to '%s'", wsrep_start_position);
>> + return false;
>>
>
> Hmm, normally I thought that system variable set would go like if
> (variable_check()) variable_update()
> but now update does not do nothing ?
>
variable_update() (or variable's on_update()), is actually called after the
actual
update of the system variable. So the sequence is roughly like:
on_check() -> global_update()/session_update() -> on_update().
See sql_set_variables() and sys_var::update().
So, since wsrep_set_local_position() can fail, I have moved it under
on_check, such that
the actual variable is updated only after setting of local position is
successful. And thus,
on_update essentially becomes a no-op.
>
>> }
>
>
>> void wsrep_start_position_init (const char* val)
>> @@ -164,7 +194,7 @@ void wsrep_start_position_init (const char* val)
>> return;
>> }
>>
>> - wsrep_set_local_position (val, false);
>> + wsrep_set_local_position (val, strlen(val), false);
>>
>
> Add error handling.
>
Done.
>
>
>> }
>>
>> static bool refresh_provider_options()
>> diff --git a/storage/innobase/trx/trx0sys.cc
>> b/storage/innobase/trx/trx0sys.cc
>> index 76c8613..1625e8d 100644
>> --- a/storage/innobase/trx/trx0sys.cc
>> +++ b/storage/innobase/trx/trx0sys.cc
>> @@ -349,10 +349,10 @@ void read_wsrep_xid_uuid(const XID* xid, unsigned
>> char* buf)
>> unsigned char xid_uuid[16];
>> long long xid_seqno = read_wsrep_xid_seqno(xid);
>> read_wsrep_xid_uuid(xid, xid_uuid);
>> - if (!memcmp(xid_uuid, trx_sys_cur_xid_uuid, 8))
>> + if (!memcmp(xid_uuid, trx_sys_cur_xid_uuid, 16) &&
>> + (xid_seqno > -1))
>>
>
> Does this mean that you ignore uuid if it is exactly the same, or is there
> error handling on else?
>
If UUID remains same, we only have to update the seqno.
Best,
Nirbhay
> R: Jan
>
>
References