maria-discuss team mailing list archive

Thread
Date

Re: DELETE query hanging on Statistics (10.6.12)

To: maria-discuss@xxxxxxxxxxxxxxxxxxx
From: mariadb@xxxxxxxxxxxxxx
Date: Thu, 13 Apr 2023 09:34:18 -0400
In-reply-to: <c9dd7e3f-e524-2a9f-4f58-ae17c22456e9@Biblestuph.com>
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1



On 3/18/2023 1:28 PM, mariadb@xxxxxxxxxxxxxx wrote:

On 3/6/2023 3:58 AM, Sergei Golubchik wrote:
Hi, Dan,

On Mar 03, mariadb@xxxxxxxxxxxxxx wrote:
With this table:

CREATE TABLE t1 (
   label varchar(100) NOT NULL,
   dt datetime NOT NULL,
   success tinyint(1) NOT NULL,
   PRIMARY KEY (label,dt),
   KEY dt (dt),
   KEY success (success,dt)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC

I have a DELETE query that has been stuck in Statistics for nearly an
hour now:

DELETE
      MyDB.t1
FROM
      MyDB.t1
WHERE
      dt <= '2023-02-21 08:06:46'
      AND dt >= '2023-02-21 08:03:48'

This statement runs several times per hour to trim the table (and has
for years). It has run fine against my 10.6.12 DB (this is a downstream
replica DB) for a couple weeks now(when I upgraded the replica); but
suddenly today produced the hang. Any ideas why? The table has nearly 15
million rows; but the rows are more or less evenly spaced acrossdatetimes.
Is there any chance you could run https://poormansprofiler.org/ (or one
of the alternatives) when this happens again?
Thanks for your reply. Later that same day I ended up having to forcethe server to die (kill -9); the thread never did let go and never leftstatistics; and since it was a replica thread it was preventing thereplica from getting updated. I know that the server was getting someCPU while the thread was hung, but that may have only been the slave IOthread (which AFAICT was still running fine). And the server itself wasresponsive, so I was able to FLUSH all the tables (at least all but theone that it was working on) before I pulled the plug.
After I killed the server I restarted with skip-slave and immediatelyANALYZEd the table in question, as well as all the other tables, beforestarting the slave again. I've not seen the issue since (and honestlyhope I don't) but if so I will attempt to use the profiling techniqueyou mentioned and report back here.

It just happened again today. A different replica this time, but theexact same statement. Here's the "Poor Man's Profile." Hope it is helpful:

26pthread_cond_timedwait@@GLIBC_2.3.2,Thread_cache::park(),do_handle_one_connection(CONNECT*,,handle_one_connection,pfs_spawn_thread,start_thread,clone2pthread_cond_timedwait@@GLIBC_2.3.2,MYSQL_BIN_LOG::wait_for_update_binlog_end_pos(THD*,,mysql_binlog_send(THD*,,dispatch_command(enum_server_command,,do_command(THD*,,do_handle_one_connection(CONNECT*,,handle_one_connection,pfs_spawn_thread,start_thread,clone1syscall,tpool::aio_linux::getevent_thread_routine(tpool::aio_linux*),??,start_thread,clone1syscall,ssux_lock_impl<true>::wr_wait(unsigned,buf_page_get_low(page_id_t,,buf_page_get_gen(page_id_t,,btr_block_get(dict_index_t,btr_can_merge_with_page(btr_cur_t*,,btr_compress(btr_cur_t*,,btr_cur_compress_if_useful(btr_cur_t*,,btr_cur_pessimistic_delete(dberr_t*,,btr_cur_node_ptr_delete(btr_cur_t*,,btr_compress(btr_cur_t*,,btr_cur_compress_if_useful(btr_cur_t*,,btr_cur_pessimistic_delete(dberr_t*,,row_purge_remove_sec_if_poss_tree(purge_node_t*,,row_purge_record_func(purge_node_t*,,row_purge_step(que_thr_t*),que_run_threads(que_thr_t*),trx_purge(unsigned,purge_coordinator_callback(void*),tpool::task_group::execute(tpool::task*),tpool::thread_pool_generic::worker_main(tpool::worker_data*),??,start_thread,clone1syscall,srw_mutex_impl<true>::wait_and_lock(),buf_page_get_low(page_id_t,,buf_page_get_gen(page_id_t,,btr_block_get(dict_index_t,btr_estimate_n_rows_in_range(dict_index_t*,,ha_innobase::records_in_range(unsigned,handler::multi_range_read_info_const(unsigned,DsMrr_impl::dsmrr_info_const(unsigned,check_quick_select(PARAM*,,SQL_SELECT::test_quick_select(THD*,,JOIN::optimize_inner(),JOIN::optimize(),mysql_select(THD*,,mysql_execute_command(THD*,,mysql_parse(THD*,,Query_log_event::do_apply_event(rpl_group_info*,,apply_event_and_update_pos_apply(Log_event*,,rpt_handle_event(rpl_parallel_thread::queued_event*,,handle_rpl_parallel_thread,pfs_spawn_thread,start_thread,clone

      1 sigwaitinfo,signal_hand,pfs_spawn_thread,start_thread,clone

1pthread_cond_wait@@GLIBC_2.3.2,rpl_parallel_entry::choose_thread(rpl_group_info*,,rpl_parallel::do_event(rpl_group_info*,,handle_slave_sql,pfs_spawn_thread,start_thread,clone1pthread_cond_wait@@GLIBC_2.3.2,MYSQL_BIN_LOG::queue_for_group_commit(MYSQL_BIN_LOG::group_commit_entry*),MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*),MYSQL_BIN_LOG::write_transaction_to_binlog(THD*,,binlog_flush_cache(THD*,,MYSQL_BIN_LOG::log_and_order(THD*,,ha_commit_trans(THD*,,trans_commit(THD*),Xid_log_event::do_commit(),Xid_apply_log_event::do_apply_event(rpl_group_info*),apply_event_and_update_pos_apply(Log_event*,,rpt_handle_event(rpl_parallel_thread::queued_event*,,handle_rpl_parallel_thread,pfs_spawn_thread,start_thread,clone1pthread_cond_wait@@GLIBC_2.3.2,handle_manager,pfs_spawn_thread,start_thread,clone1pthread_cond_wait@@GLIBC_2.3.2,buf_flush_page_cleaner(),??,start_thread,clone1pthread_cond_wait@@GLIBC_2.3.2,binlog_background_thread,pfs_spawn_thread,start_thread,clone1pthread_cond_timedwait@@GLIBC_2.3.2,tpool::thread_pool_generic::wait_for_tasks(std::unique_lock<std::mutex>&,,tpool::thread_pool_generic::get_task(tpool::worker_data*,,tpool::thread_pool_generic::worker_main(tpool::worker_data*),??,start_thread,clone1pthread_cond_timedwait@@GLIBC_2.3.2,timer_handler,start_thread,clone1pthread_cond_timedwait@@GLIBC_2.3.2,my_service_thread_sleep,ma_checkpoint_background,start_thread,clone1poll,vio_io_wait,vio_socket_io_wait,vio_ssl_read,my_real_read(st_net*,,my_net_read_packet_reallen,cli_safe_read_reallen,handle_slave_io,pfs_spawn_thread,start_thread,clone1poll,vio_io_wait,vio_socket_io_wait,vio_read,my_real_read(st_net*,,my_net_read_packet_reallen,my_net_read_packet,do_command(THD*,,do_handle_one_connection(CONNECT*,,handle_one_connection,pfs_spawn_thread,start_thread,clone1poll,handle_connections_sockets(),mysqld_main(int,,__libc_start_main,_start

Also I find that I cannot kill the query using KILL 12345; the KILL
statement just comes back immediately without reporting an error and the
statement continues. Is there no way to kill this statement without
crashing the DB? Again, this is a replica statement that is executing. I
had planned to stop the query, execute an ANALYZE TABLE t1 and then try
it again, but I don't seem to be able to kill the query.


KILL always returns immediately. Killing in MariaDB is cooperative, it
simply tells the thread "abort execution at your earliest convenience"
and it's up to the executing thread to decide when to abort. KILL
doesn't wait for it.

Yah, that's kind of what I figured. Is there a recommended (better)course of action other than kill -9 if this happens again?


Regards,
Sergei
VP of MariaDB Server Engineering
and security@xxxxxxxxxxx


_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Follow ups

Re: DELETE query hanging on Statistics (10.6.12)
From: Marko Mäkelä, 2023-04-14

References

DELETE query hanging on Statistics (10.6.12)
From: mariadb, 2023-03-03
Re: DELETE query hanging on Statistics (10.6.12)
From: Sergei Golubchik, 2023-03-06
Re: DELETE query hanging on Statistics (10.6.12)
From: mariadb, 2023-03-18