maria-developers team mailing list archive

Thread
Date

Re: 489a7fba324: MDEV-29322 ASAN heap-use-after-free in Query_log_event::do_apply_event

To: maria-developers@xxxxxxxxxxxxxxxxxxx
From: Kristian Nielsen <knielsen@xxxxxxxxxxxxxxx>
Date: Sun, 04 Sep 2022 11:38:31 +0200
Cc: Andrei Elkin <andrei.elkin@xxxxxxxxxxx>
In-reply-to: <YxIFkzGYuukU/nlo@pweza> (Sergei Golubchik's message of "Fri, 2 Sep 2022 15:30:59 +0200")
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Hi Sergei, Andrei,

The commit message had me confused at first - because there are
mechanisms in parallel replication to ensure that things stay alive as long
as needed; and because if description_event_for_exec points to invalid
(freed) memory, then it would seem to indicate a deeper problem than can be
solved by just copying a single value out.

So I looked into it a bit (thanks Andrei for doing a testcase BTW!). And the
real problem is not use-after-free, it is that
rli->relay_log.description_event_for_exec is owned by the SQL driver thread
and updated without any thread synchronisation; it is wrong to access it
from a worker thread.

So I checked and found that there is actually another case of the same
problem not fixed in the patch, in Execute_load_log_event::do_apply_event():

int Execute_load_log_event::do_apply_event(rpl_group_info *rgi)
  if (!(lev= (Load_log_event*)
        Log_event::read_log_event(&file,
                                  rli->relay_log.description_event_for_exec,
                                  opt_slave_sql_verify_checksum)) ||

It looks to me that this will have the same problem - when run in the worker
thread, the value of description_event_for_exec will be for whatever point
in the log the SQL thread is currently reading, not the one corresponding to
the LOAD_EVENT. So Load_log_event will need to take a copy of the
description_event_for_exec, just like Query_log_event needed the
options_written_to_binlog value.

An alternative approach to fix could be to use the inuse_relaylog. Parallel
replication keeps a list of these (one for each relay log having workers
active), and keeps each element alive until all workers are done with that
relay log.

Currently inuse_relaylog mostly has the name of the relay log file, used for
transaction retry in optimistic parallel replication. But it could be
extended with a copy of description_event_for_exec which would be available
to do_apply_event() in rgi->relaylog.

That is perhaps the "correct" fix. But as long as it's only copying a 32-bit
value to query event (plus a similar fix for LOAD_EVENT), maybe the fix in
the patch is fine as well.

Some other smaller comments:

The commit message is misleading. The problem is that worker threads are
accessing an FD owned by and updated by the SQL driver thread. This FD can
be from any random binlog following the current query of the worker thread,
as Sergei pointed out. The FD being freed while the worker thread is using
its copy is just one symptom, more common will be to use a too new FD, which
is also what triggers an error in Andrei's testcase.

And it would be good to put a comment in the code on
description_event_for_exec that it must _not_ be accessed from worker
threads. The name is somewhat misleading ("exec" suggests exactly worker
thread operation), maybe it should be renamed to something like
"description_event_for_sql_thread", and description_event_for_queue could be
called "description_event_for_io_thread", or something.

BTW, the testcase only fails sporadically (without the fix), becase it
depends on whether the SQL thread has had time to read ahead to a new FD
when the CREATE TABLE t1 runs in the worker thread. There's a wait in the
testcase which seems to be intended to ensure this condition:

# Wait for the last event recieve before to unlock t2
--let $wait_condition= SELECT COUNT(*) > 0 FROM information_schema.processlist WHERE state = "Waiting for worker threads to be idle"

But apparently it's ineffective, I had to run the test ~50 times for the
error to trigger. Just mentioning it, maybe it's worth spending time on.

Hope this helps,

 - Kristian.

Sergei Golubchik <serg@xxxxxxxxxxx> writes:

> Hi, Andrei,
>
> I still don't understand this.
>
> 1. Why rgi->options_to_bin_log is only set for GTID_EVENT?
>    Is there a guarantee that there always be a GTID_EVENT before a
>    QUERY_EVENT? What if gtids aren't enabled?
>
> 2. how do you guarantee that all query events for a previous value of
>    options_written_to_bin_log have been applied before updating it
>    in the rgi?
>
> 3. and if you do guarantee it, why is it atomic? No query events =
>    nobody reads options_written_to_bin_log concurrently.
>
> And why not to do a simple patch, like the one I've attached?
> (note, it's an intentionally minimal patch to show the fix, practically
> I'd accompany it with a small cleanup patch)
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and security@xxxxxxxxxxx

Follow ups

Re: 489a7fba324: MDEV-29322 ASAN heap-use-after-free in Query_log_event::do_apply_event
From: Kristian Nielsen, 2022-09-04

References

Re: 489a7fba324: MDEV-29322 ASAN heap-use-after-free in Query_log_event::do_apply_event
From: Sergei Golubchik, 2022-09-01
Re: 489a7fba324: MDEV-29322 ASAN heap-use-after-free in Query_log_event::do_apply_event
From: Sergei Golubchik, 2022-09-02