
Re: Deadlocks in R/W phase of Sysbench

 

Thanks. I was hoping there would be an easy fix, for example if you had used
serializable isolation, but you haven't done that.
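
For reference, the effective isolation level is easy to confirm from the client;
a minimal check, assuming the root user and local socket used elsewhere in this
thread (REPEATABLE READ is InnoDB's default unless it has been changed):

    mysql -uroot --socket=/var/lib/mysql/mysql.sock \
          -e "SELECT @@global.tx_isolation, @@session.tx_isolation;"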

On Fri, Jun 9, 2017 at 12:27 AM, J. Cassidy <sean@xxxxxxxxxxxx> wrote:

> All,
>
> as discussed yesterday, here is my setup and configuration.
>
>
> *************************************************************
> 16 VCPU 64GB Memory, DB size 12 GB (In RAM), MariaDB 10.1.24*
> Single Image. No clustering or outside network interaction  *
> SuSE SLES 12 SP2 using a 4.4.49-92 kernel                   *
> *************************************************************
> ### Sysbench R/W - 64 Threads
>
> sysbench 0.5:  multi-threaded system evaluation benchmark
>
> Running the test with following options:
> Number of threads: 64
> Report intermediate results every 2 second(s)
> Initializing random number generator from seed (42).
>
>
> Threads started!
>
> [   2s] threads: 64, tps: 51141.62, reads: 717023.15, writes: 204757.48,
> response time: 4.72ms (99%), errors: 54.00, reconnects:  0.00
> [   4s] threads: 64, tps: 47938.19, reads: 671195.72, writes: 191759.28,
> response time: 4.65ms (99%), errors: 5.00, reconnects:  0.00
> [   6s] threads: 64, tps: 46908.41, reads: 656790.25, writes: 187640.64,
> response time: 4.65ms (99%), errors: 2.50, reconnects:  0.00
> [   8s] threads: 64, tps: 45940.12, reads: 643191.64, writes: 183781.47,
> response time: 4.44ms (99%), errors: 3.00, reconnects:  0.00
> [  10s] threads: 64, tps: 45789.91, reads: 641083.20, writes: 183161.13,
> response time: 4.77ms (99%), errors: 2.00, reconnects:  0.00
> [  12s] threads: 64, tps: 45408.37, reads: 635825.16, writes: 181672.97,
> response time: 4.80ms (99%), errors: 1.50, reconnects:  0.00
> [  14s] threads: 64, tps: 44319.93, reads: 620440.07, writes: 177262.74,
> response time: 4.50ms (99%), errors: 0.50, reconnects:  0.00
> [  16s] threads: 64, tps: 44634.78, reads: 624843.97, writes: 178523.13,
> response time: 4.71ms (99%), errors: 0.50, reconnects:  0.00
> .
> .
> .
> [ 572s] threads: 64, tps: 24980.71, reads: 349707.39, writes: 99924.32,
> response time: 5.92ms (99%), errors: 0.00, reconnects:  0.00
> [ 574s] threads: 64, tps: 25337.03, reads: 354796.98, writes: 101355.13,
> response time: 6.19ms (99%), errors: 0.00, reconnects:  0.00
> [ 576s] threads: 64, tps: 25196.31, reads: 352683.33, writes: 100777.24,
> response time: 6.49ms (99%), errors: 0.00, reconnects:  0.00
> [ 578s] threads: 64, tps: 25235.61, reads: 353317.05, writes: 100941.94,
> response time: 5.94ms (99%), errors: 0.00, reconnects:  0.00
> [ 580s] threads: 64, tps: 25241.11, reads: 353395.59, writes: 100956.96,
> response time: 6.09ms (99%), errors: 0.00, reconnects:  0.00
> [ 582s] threads: 64, tps: 25146.18, reads: 352056.04, writes: 100599.73,
> response time: 5.96ms (99%), errors: 0.00, reconnects:  0.00
> [ 584s] threads: 64, tps: 25242.59, reads: 353362.72, writes: 100978.85,
> response time: 6.21ms (99%), errors: 0.00, reconnects:  0.00
> [ 586s] threads: 64, tps: 25297.02, reads: 354187.80, writes: 101166.58,
> response time: 5.89ms (99%), errors: 0.00, reconnects:  0.00
> [ 588s] threads: 64, tps: 25070.60, reads: 350944.86, writes: 100283.39,
> response time: 6.33ms (99%), errors: 0.00, reconnects:  0.00
> [ 590s] threads: 64, tps: 25115.20, reads: 351666.84, writes: 100466.81,
> response time: 6.21ms (99%), errors: 0.00, reconnects:  0.00
> [ 592s] threads: 64, tps: 25165.11, reads: 352314.52, writes: 100668.94,
> response time: 5.98ms (99%), errors: 0.00, reconnects:  0.00
> [ 594s] threads: 64, tps: 25165.99, reads: 352322.89, writes: 100703.97,
> response time: 6.30ms (99%), errors: 0.00, reconnects:  0.00
> [ 596s] threads: 64, tps: 24854.73, reads: 347945.66, writes: 99370.90,
> response time: 6.17ms (99%), errors: 0.00, reconnects:  0.00
> [ 598s] threads: 64, tps: 24888.16, reads: 348463.80, writes: 99556.16,
> response time: 6.21ms (99%), errors: 0.00, reconnects:  0.00
> [ 600s] threads: 64, tps: 25158.22, reads: 352195.58, writes: 100633.38,
> response time: 5.94ms (99%), errors: 0.00, reconnects:  0.00
> OLTP test statistics:
>     queries performed:
>         read:                            256244548
>         write:                           73212536
>         other:                           36606172
>         total:                           366063256
>     transactions:                        18302990 (30454.11 per sec.)
>     read/write requests:                 329457084 (548179.44 per sec.)
>     other operations:                    36606172 (60908.54 per sec.)
>     ignored errors:                      192    (0.32 per sec.)
>     reconnects:                          0      (0.00 per sec.)
>
> General statistics:
>     total time:                          601.0023s
>     total number of events:              18302990
>     total time taken by event execution: 38450.5507s
>     response time:
>          min:                                  0.42ms
>          avg:                                  2.10ms
>          max:                                 62.26ms
>          approx.  99 percentile:               5.62ms
>
> Threads fairness:
>     events (avg/stddev):           285984.2188/3788.79
>     execution time (avg/stddev):   600.7899/0.00
>
>
> =================================================================
>
> ### Excerpt from MariaDB Error Log (thousands of similar messages).
> .
> .
> .
> 2017-06-08  8:10:44 4395980287248 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4353081699600 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4353087458576 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4360227998992 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4353073515792 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4390705949968 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4360227392784 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4353087458576 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4353073515792 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> 2017-06-08  8:10:44 4353087458576 [ERROR] mysqld: Deadlock found when
> trying to get lock; try restarting transaction
> .
> .
> .
>
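For reference, the most recent deadlock (the two statements involved, the locks
they waited on, and which transaction was rolled back) can be pulled from
InnoDB's monitor output, and MariaDB can also be told to write every deadlock to
the error log in full; a minimal sketch, assuming the root user and local socket
used elsewhere in this thread:

    # print only the "LATEST DETECTED DEADLOCK" section of the monitor output
    mysql -uroot --socket=/var/lib/mysql/mysql.sock -e "SHOW ENGINE INNODB STATUS\G" \
      | sed -n '/LATEST DETECTED DEADLOCK/,/^TRANSACTIONS$/p'

    # log full details of every subsequent deadlock to the error log
    mysql -uroot --socket=/var/lib/mysql/mysql.sock -e "SET GLOBAL innodb_print_all_deadlocks = ON;"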
> =================================================================
> The configuration below has been tried and tested and works across
> different architectures, e.g. S390X and X86-64.
>
>
> [mysqld_safe]
> user=root
> password=''
> [mysqld]
> symbolic-links=0
> open_files_limit = 65535 # was 102400
> max_allowed_packet = 16M # was 768M
> max-connect-errors = 1000000
> connect_timeout = 120   # was 60
> performance-schema = false
> net_read_timeout = 720
> log-warnings=2
> #####files and sockets
> # innodb_log_group_home_dir= /data-lun22/logs
> pid-file=/tmp/mysqld.pid.sysbench
> port            = 3306
> # log-error=/var/log/mariadb/mariadb-error-john.log
> log-error=/var/log/mariadb/mariadb-error-john.log
> general_log_file=/var/log/mariadb/mariadb-john.log
> long_query_time = 0
> slow_query_log = 0
> # slow_query_log_file = /var/log/mariadb/mariadb-slowquery.log
> # general-log
> # general_log_file=/var/log/mariadb/mariadb-general.log
> ##### MariaDB temporary tables
> # tmpdir in RAM
> tmpdir = /mnt/tmpdir
> #####non innodb options (fixed)
> max_connections = 4000
> back_log = 150
> table_open_cache = 8000 # was 2048 # was 4000 # was 8000
> key_buffer_size = 16M
> query_cache_type = 0 #was 1 # was 0
> join_buffer_size = 32K
> sort_buffer_size = 32K
> autocommit = 0
> thread-cache-size = 50
> table-definition-cache = 1024
> table-open-cache = 2048
> ####fixed innodb options
> innodb_file_per_table = true
> innodb_open_files = 4096 # was 8192 # was 4096
> innodb_data_file_path = ibdata1:250M:autoextend
> innodb_flush_log_at_trx_commit = 2
> innodb_flush_method = O_DIRECT_NO_FSYNC
> innodb_log_buffer_size = 256M
> innodb_log_file_size = 4G # was 14G # was 4G
> innodb_log_files_in_group = 2
> innodb_buffer_pool_size = 36G #was 8G # was 48G
> innodb_buffer_pool_instances = 8 # 16  # was 6 # was 32
> innodb_adaptive_hash_index_partitions = 32
> innodb_thread_concurrency = 0
> ####tuning for RAM disk
> innodb_adaptive_flushing = 1
> innodb_flush_neighbors = 0
> innodb_io_capacity = 8000 # was 20000
> innodb_io_capacity_max = 40000
> innodb_lru_scan_depth = 4096
> innodb_purge_threads = 2
> innodb_read_io_threads = 2
> innodb_write_io_threads = 2
>
>
> <*> Note: NMON reports 35% memory usage in the middle of the above-mentioned
> benchmark, ergo the DB is in cache. No paging.
>
> ++  Note: The R/W TPS rates are "respectable" based on observations from
> countless other runs on other configurations/architectures.
>          The TPS rates would obviously improve if the deadlock situation
> outlined above were resolved.
>
>
> So there you have it. Any tips / pointers that would enlighten me are much
> appreciated.
>
> Regards,
>
>
> JC
>
>
> Hi Sergey,
>
> I do not think I have a problem with my config(s).
>
> I am not on the machine at the moment.
> From memory, with 16 Cores / 32 VCPUs, 64 GB memory, buffer pools, logs
> etc. tailored to this configuration, the TPS rates are:
>
> 64 threads - R/O 73K R/W 26K
> 128 threads - R/O 71K R/W 23K.
>
> I am putting the DB in RAM of course to obviate I/O issues. I am
> interested in seeing what is happening with the CPU caches and nest.
>
> NMON tells me that for the R/O runs I have approximately 85% User, 12-14%
> System and the rest Idle - this stays consistent throughout the run (300
> seconds R/O, 600 seconds R/W) which is reasonable.
>
> On the R/W runs, NMON's CPU usage display shows that some CPUs are barely
> firing, or only firing intermittently, and the granularity I had with R/O is gone.
> This would seem to tie in with what I see in the MariaDB log - headwinds,
> violent storms and deadlocks.
>
> Any further information you require, please let me know.
>
>
>
> Regards,
>
>
> JC
>
> Hi!
>
> I just tried this on x86: 40 threads against a single table having 10k rows.
> No deadlocks, "tps: 12372.40, reads: 173219.52, writes: 49490.51".
>
> This is with the latest snapshot of 10.1 and sysbench 1.0.
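The sysbench 1.0 command line differs from the 0.5 one used elsewhere in this
thread; a roughly equivalent invocation, with the table count, table size and
thread count taken from the description above, connection options omitted, and
an arbitrary 60-second duration, might look like:

    sysbench oltp_read_write --tables=1 --table-size=10000 prepare
    sysbench oltp_read_write --tables=1 --table-size=10000 --threads=40 \
             --report-interval=2 --time=60 run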
>
> I believe you shouldn't see deadlocks on such a big data set. Probably
> something is wrong with your configs?
>
> Regards,
> Sergey
>
> On Thu, Jun 08, 2017 at 04:51:31PM +0200, J. Cassidy wrote:
> >
> >
> > Hello Mark,
> >
> > appreciate the reply.
> >
> > The OLTP table size and OLTP tables count give me a DB size of approx. 12 GB.
> > This is what I want. I have in the meantime looked at some older logs and see
> > that whatever number of threads I specify (8, 16, 32, 64 or 128), the deadlock
> > messages are still surfacing in the R/W phase of Sysbench. I even dropped in
> > two different MariaDB release levels (10.0.19, 10.1.24) to see whether it
> > would make a difference; the deadlocks are still there. I am using Sysbench
> > 0.5 for these tests. I am currently using MariaDB 10.1.24 - built directly on
> > the machine. S390x by the way, but the problem is also occurring on an X86-64
> > box.
> >
> > Regards,
> >
> >
> > JC
> >
> >
> >
> > I run sysbench frequently with MySQL (not MariaDB yet) and don't recall ever
> > seeing this error. But I use much larger values for --oltp-table-size. You
> > used 85937 and I use 1M or larger. Maybe there is too much data contention
> > with smaller tables, even if a large number of tables is used (128 in your
> > case). I use a smaller number of tables - between 4 and 20.
> >
> > On Thu, Jun 8, 2017 at 12:48 AM, J. Cassidy <sean@xxxxxxxxxxxx> wrote:
> > Hello all,
> >
> >
> > I have used Sysbench extensively to test MariaDB CPU load factors between
> > various architectures.
> >
> > I am noticing massive amounts of "[ERROR] mysqld: Deadlock found when
> > trying to get lock; try restarting transaction" in the MariaDB error log
> > during the R/W phase. The Sysbench R/W TPS rates are still respectable for
> > my purposes, but how do I correct this condition? Can I ignore this, or
> > must I live with it?
> > I have tried three different MariaDB releases (10.0.19, 10.1.12, 10.1.24)
> > to narrow things down, but this message (x 1000s) is happening across all
> > releases in the Sysbench R/W phase.
> >
> > Here is the Sysbench command that drives the workload against MariaDB -
> >
> > sysbench --test=lua/oltp.lua --oltp_tables_count=128 --oltp-table-size=85937
> > --rand-seed=42 --rand-type=uniform --num-threads=128 --oltp-read-only=off
> > --report-interval=2 --mysql-socket=/var/lib/mysql/mysql.sock --max-time=201
> > --max-requests=0 --mysql-user=root --percentile=99 run
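
A variant along the lines suggested further down in this thread (fewer, larger
tables) would only change the two OLTP sizing options; the 8 tables and
1,000,000 rows below are purely illustrative:

    sysbench --test=lua/oltp.lua --oltp_tables_count=8 --oltp-table-size=1000000 \
             --rand-seed=42 --rand-type=uniform --num-threads=128 --oltp-read-only=off \
             --report-interval=2 --mysql-socket=/var/lib/mysql/mysql.sock --max-time=201 \
             --max-requests=0 --mysql-user=root --percentile=99 run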
> >
> > Would anyone have a tip / idea / pointer?
> >
> >
> > Regards,
> >
> >
> > JC
> >
> >
> > John Cassidy
> >
> > Obere Bühlstrasse 21
> > 8700 Küsnacht (ZH)
> > Switzerland / Suisse / Schweiz
> >
> > Mobile: +49 152 58961601 (Germany)
> > Mobile: +352 621 577 149 (Luxembourg)
> > Mobile: +41 78 769 17 97 (CH)
> > Landline: +41 44 509 1957
> >
> > http://www.jdcassidy.eu
> >
> > "Aut viam inveniam aut faciam" - Hannibal.
> > --Mark Callaghan
> > mdcallag@xxxxxxxxx
> >
> > John Cassidy
> >
> > Obere Bühlstrasse 21
> > 8700 Küsnacht (ZH)
> > Switzerland / Suisse / Schweiz
> >
> > Mobile: +49 152 58961601 (Germany)
> > Mobile: +352 621 577 149 (Luxembourg)
> > Mobile: +41 78 769 17 97 (CH)
> > Landline: +41 44 509 1957
> >
> > http://www.jdcassidy.eu
> >
> > "Aut viam inveniam aut faciam" - Hannibal.
>
>
>
>
>
> John Cassidy
>
> Obere Bühlstrasse 21
> 8700 Küsnacht (ZH)
> Switzerland / Suisse / Schweiz
>
>
> Mobile: +49 152 58961601 (Germany)
> Mobile: +352 621 577 149 (Luxembourg)
> Mobile: +41 78 769 17 97 (CH)
> Landline: +41 44 509 1957
>
> http://www.jdcassidy.eu
>
> "Aut viam inveniam aut faciam" - Hannibal.
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>
>


-- 
Mark Callaghan
mdcallag@xxxxxxxxx
