← Back to team overview

maria-developers team mailing list archive

Missing locking around THD::set_db() ?

 

Hi,

I was looking at a Valgrind warning in Buildbot (appended below).

What happens seems to be this:

Thread 1 is running SHOW PROCESSLIST, it grabs the pointer THD::db to the
current database of thread 2.

Thread 2 then does THD::set_db(), freeing the old THD::db pointer and
allocating a new one with the new data.

Thread 1 then resumes, doing strdup() of the _old_, now invalid, THD::db
pointer, which reads garbage data (or could even segfault if we get really
unlucky).

This seems like a genuine bug. I see absolutely no locking protecting against
this race :-(

Any suggestions for how to deal with this?

The only locking I see in mysqld_list_processes() is holding
LOCK_thread_count, but taking that for every THD::set_db() sounds quite bad
for scalability (for example, set_db() is run by every query replicated by a
slave thread).

Should we use LOCK_thd_data also for THD::db ?

Or maybe we could have a fixed buffer in THD for the db? Seems we would need
193 bytes. Then SHOW PROCESSLIST could sometimes show a random mix of old and
new db name if we race, but at least we would not read invalid memory or crash
due to mummap() ...

Any better ideas?

 - Kristian.

-----------------------------------------------------------------------

Original valgrind warning, for reference:

rpl.rpl_stop_slave 'mix,xtradb'          w5 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2013-04-22 22:05:12
line
==9897== Thread 20:
==9897== Invalid read of size 1
==9897==    at 0x4C2826B: memcpy (mc_replace_strmem.c:878)
==9897==    by 0xBAF9E8: strmake_root (my_alloc.c:424)
==9897==    by 0x63C74E: mysqld_list_processes(THD*, char const*, bool) (sql_class.h:771)
==9897==    by 0x5C8574: mysql_execute_command(THD*) (sql_parse.cc:3149)
==9897==    by 0x5CC844: mysql_parse(THD*, char*, unsigned int, Parser_state*) (sql_parse.cc:5759)
==9897==    by 0x5CDDC9: dispatch_command(enum_server_command, THD*, char*, unsigned int) (sql_parse.cc:1068)
==9897==    by 0x5CE4BD: do_command(THD*) (sql_parse.cc:794)
==9897==    by 0x6A4E23: do_handle_one_connection(THD*) (sql_connect.cc:1266)
==9897==    by 0x6A4F23: handle_one_connection (sql_connect.cc:1181)
==9897==    by 0x8F0027: pfs_spawn_thread (pfs.cc:1015)
==9897==    by 0x4E3506F: start_thread (in /lib64/libpthread-2.9.so)
==9897==    by 0x62F713C: clone (in /lib64/libc-2.9.so)
==9897==  Address 0x1a06a973 is 3 bytes inside a block of size 5 free'd
==9897==    at 0x4C257D8: free (vg_replace_malloc.c:446)
==9897==    by 0xBB96BB: my_free (my_malloc.c:115)
==9897==    by 0x838C86: Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned int) (sql_class.h:2846)
==9897==    by 0x53904C: apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*) (log_event.h:1230)
==9897==    by 0x5446DD: exec_relay_log_event(THD*, Relay_log_info*) (slave.cc:2766)
==9897==    by 0x545BD9: handle_slave_sql (slave.cc:3627)
==9897==    by 0x8F0027: pfs_spawn_thread (pfs.cc:1015)
==9897==    by 0x4E3506F: start_thread (in /lib64/libpthread-2.9.so)
==9897==    by 0x62F713C: clone (in /lib64/libc-2.9.so)


Follow ups