← Back to team overview

maria-discuss team mailing list archive

Re: Deadlocks in R/W phase of Sysbench

 

Re,

Yep  get this but just that io a supposed to be fast but still have to hit FS moved from PB to redo log to tblspace 
In  a perfect fast world all layer should have equal ressources so no reasons not giving as many threads to write io vs doing the work itself of modifying pages. 

Disabling  auto commit with sysbench is suspicious and intersting : never question myself on this topic before ?  

/svar
Stéphane Varoqui, Senior Consultant
Phone: +33 695-926-401, skype: svaroqui
http://www.mariadb.com <http://www.mariadb.com/>




> Le 9 juin 2017 à 16:47, J. Cassidy <sean@xxxxxxxxxxxx> a écrit :
> 
> Salut Stephane,
> 
> I/O does not enter into the equation, the database is in memory (RAM).
> 
> 
> Regards,
> 
> 
> JC
> 
> 
> Hello,
>  
> What IO scheduler are you using for FS and witch one ?
> deadline or loop is a must do. Also mounting with noatime can help for write
>  
> Would you try such settings i found to make a difference
> autocommit = 1
>  
>  
> innodb_adaptive_hash_index=0
> innodb_max_dirty_page_pct=20
> innodb_write_io_threads =64
> innodb_log_file_size = 1024M
> innodb_log_file_in_group = 4
> innodb_log_files_in_group = 4
> innodb_thread_concurrency = 0
> innodb_purge_threads = 8
> innodb_change_buffering=none
> innodb_open_files = 16384
> innodb_file_per_table=1
> innodb_autoinc_lock_mode = 2
>  
> And comment
> # innodb_lru_scan_depth = 4096
>  
>  
> Stéphane Varoqui, Senior Consultant
> 
> Phone: +33 695-926-401, skype: svaroqui
> http://www.mariadb.com <http://www.mariadb.com/>
>  
> 
>  
>  
> Le 9 juin 2017 à 16:22, J. Cassidy sean@xxxxxxxxxxxx> a écrit :
>  
> Mark,
> 
> 
> still scratching head...
> 
> 
> Regards,
> 
> 
> JC
> 
> 
>  
> Thanks. I was hoping there was an easy fix if you did something like used serializable isolation, but you haven't done that.
> On Fri, Jun 9, 2017 at 12:27 AM, J. Cassidy sean@xxxxxxxxxxxx <mailto:sean@xxxxxxxxxxxx>> wrote:
> All,
> 
> 
> as discussed yesterday, here is my setup and configuration.
> 
> 
> *************************************************************
> 16 VCPU 64GB Memory, DB size 12 GB (In RAM), MariaDB 10.1.24*
> Single Image. No clustering or outside network interaction *
> SuSE SLES 12 SP2 using a 4.4.49-92 kernel *
> *************************************************************
> ### Sysbench R/W - 64 Threads
> 
> sysbench 0.5: multi-threaded system evaluation benchmark
> 
> Running the test with following options:
> Number of threads: 64
> Report intermediate results every 2 second(s)
> Initializing random number generator from seed (42).
> 
> 
> Threads started!
> 
> [ 2s] threads: 64, tps: 51141.62, reads: 717023.15, writes: 204757.48, response time: 4.72ms (99%), errors: 54.00, reconnects: 0.00
> [ 4s] threads: 64, tps: 47938.19, reads: 671195.72, writes: 191759.28, response time: 4.65ms (99%), errors: 5.00, reconnects: 0.00
> [ 6s] threads: 64, tps: 46908.41, reads: 656790.25, writes: 187640.64, response time: 4.65ms (99%), errors: 2.50, reconnects: 0.00
> [ 8s] threads: 64, tps: 45940.12, reads: 643191.64, writes: 183781.47, response time: 4.44ms (99%), errors: 3.00, reconnects: 0.00
> [ 10s] threads: 64, tps: 45789.91, reads: 641083.20, writes: 183161.13, response time: 4.77ms (99%), errors: 2.00, reconnects: 0.00
> [ 12s] threads: 64, tps: 45408.37, reads: 635825.16, writes: 181672.97, response time: 4.80ms (99%), errors: 1.50, reconnects: 0.00
> [ 14s] threads: 64, tps: 44319.93, reads: 620440.07, writes: 177262.74, response time: 4.50ms (99%), errors: 0.50, reconnects: 0.00
> [ 16s] threads: 64, tps: 44634.78, reads: 624843.97, writes: 178523.13, response time: 4.71ms (99%), errors: 0.50, reconnects: 0.00
> .
> .
> .
> [ 572s] threads: 64, tps: 24980.71, reads: 349707.39, writes: 99924.32, response time: 5.92ms (99%), errors: 0.00, reconnects: 0.00
> [ 574s] threads: 64, tps: 25337.03, reads: 354796.98, writes: 101355.13, response time: 6.19ms (99%), errors: 0.00, reconnects: 0.00
> [ 576s] threads: 64, tps: 25196.31, reads: 352683.33, writes: 100777.24, response time: 6.49ms (99%), errors: 0.00, reconnects: 0.00
> [ 578s] threads: 64, tps: 25235.61, reads: 353317.05, writes: 100941.94, response time: 5.94ms (99%), errors: 0.00, reconnects: 0.00
> [ 580s] threads: 64, tps: 25241.11, reads: 353395.59, writes: 100956.96, response time: 6.09ms (99%), errors: 0.00, reconnects: 0.00
> [ 582s] threads: 64, tps: 25146.18, reads: 352056.04, writes: 100599.73, response time: 5.96ms (99%), errors: 0.00, reconnects: 0.00
> [ 584s] threads: 64, tps: 25242.59, reads: 353362.72, writes: 100978.85, response time: 6.21ms (99%), errors: 0.00, reconnects: 0.00
> [ 586s] threads: 64, tps: 25297.02, reads: 354187.80, writes: 101166.58, response time: 5.89ms (99%), errors: 0.00, reconnects: 0.00
> [ 588s] threads: 64, tps: 25070.60, reads: 350944.86, writes: 100283.39, response time: 6.33ms (99%), errors: 0.00, reconnects: 0.00
> [ 590s] threads: 64, tps: 25115.20, reads: 351666.84, writes: 100466.81, response time: 6.21ms (99%), errors: 0.00, reconnects: 0.00
> [ 592s] threads: 64, tps: 25165.11, reads: 352314.52, writes: 100668.94, response time: 5.98ms (99%), errors: 0.00, reconnects: 0.00
> [ 594s] threads: 64, tps: 25165.99, reads: 352322.89, writes: 100703.97, response time: 6.30ms (99%), errors: 0.00, reconnects: 0.00
> [ 596s] threads: 64, tps: 24854.73, reads: 347945.66, writes: 99370.90, response time: 6.17ms (99%), errors: 0.00, reconnects: 0.00
> [ 598s] threads: 64, tps: 24888.16, reads: 348463.80, writes: 99556.16, response time: 6.21ms (99%), errors: 0.00, reconnects: 0.00
> [ 600s] threads: 64, tps: 25158.22, reads: 352195.58, writes: 100633.38, response time: 5.94ms (99%), errors: 0.00, reconnects: 0.00
> OLTP test statistics:
> queries performed:
> read: 256244548
> write: 73212536
> other: 36606172
> total: 366063256
> transactions: 18302990 (30454.11 per sec.)
> read/write requests: 329457084 (548179.44 per sec.)
> other operations: 36606172 (60908.54 per sec.)
> ignored errors: 192 (0.32 per sec.)
> reconnects: 0 (0.00 per sec.)
> 
> General statistics:
> total time: 601.0023s
> total number of events: 18302990
> total time taken by event execution: 38450.5507s
> response time:
> min: 0.42ms
> avg: 2.10ms
> max: 62.26ms
> approx. 99 percentile: 5.62ms
> 
> Threads fairness:
> events (avg/stddev): 285984.2188/3788.79
> execution time (avg/stddev): 600.7899/0.00
> 
> 
> =================================================================================================================
> 
> ### Excerpt from MariaDB Error Log (Thousands of similiar messages).
> .
> .
> .
> 2017-06-08 8:10:44 4395980287248 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4353081699600 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4353087458576 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4360227998992 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4353073515792 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4390705949968 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4360227392784 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4353087458576 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4353073515792 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> 2017-06-08 8:10:44 4353087458576 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
> .
> .
> .
> 
> ===================================================================================================================
> The configuration below has been tried and tested and works across on different architectures e.g. S390X and S390X.
> 
> 
> [mysqld_safe]
> user=root
> password=''
> [mysqld]
> symbolic-links=0
> open_files_limit = 65535 # was 102400
> max_allowed_packet = 16M # was 768M
> max-connect-errors = 1000000
> connect_timeout = 120 # was 60
> performance-schema = false
> net_read_timeout = 720
> log-warnings=2
> #####files and sockets
> # innodb_log_group_home_dir= /data-lun22/logs
> pid-file=/tmp/mysqld.pid.sysbench
> port = 3306
> # log-error=/var/log/mariadb/mariadb-error-john.log
> log-error=/var/log/mariadb/mariadb-error-john.log
> general_log_file=/var/log/mariadb/mariadb-john.log
> long_query_time = 0
> slow_query_log = 0
> # slow_query_log_file = /var/log/mariadb/mariadb-slowquery.log
> # general-log
> # general_log_file=/var/log/mariadb/mariadb-general.log
> ##### MariaDB temporary tables
> # tmpdir in RAM
> tmpdir = /mnt/tmpdir
> #####non innodb options (fixed)
> max_connections = 4000
> back_log = 150
> table_open_cache = 8000 # was 2048 # was 4000 # was 8000
> key_buffer_size = 16M
> query_cache_type = 0 #was 1 # was 0
> join_buffer_size = 32K
> sort_buffer_size = 32K
> autocommit = 0
> thread-cache-size = 50
> table-definition-cache = 1024
> table-open-cache = 2048
> ####fixed innodb options
> innodb_file_per_table = true
> innodb_open_files = 4096 # was 8192 # was 4096
> innodb_data_file_path = ibdata1:250M:autoextend
> innodb_flush_log_at_trx_commit = 2
> innodb_flush_method = O_DIRECT_NO_FSYNC
> innodb_log_buffer_size = 256M
> innodb_log_file_size = 4G # was 14G # was 4G
> innodb_log_files_in_group = 2
> innodb_buffer_pool_size = 36G #was 8G # was 48G
> innodb_buffer_pool_instances = 8 # 16 # was 6 # was 32
> innodb_adaptive_hash_index_partitions = 32
> innodb_thread_concurrency = 0
> ####tuning for RAM disk
> innodb_adaptive_flushing = 1
> innodb_flush_neighbors = 0
> innodb_io_capacity = 8000 # was 20000
> innodb_io_capacity_max = 40000
> innodb_lru_scan_depth = 4096
> innodb_purge_threads = 2
> innodb_read_io_threads = 2
> innodb_write_io_threads = 2
> 
> 
> Note NMON reports 35% memory usage in the middle of the abovementioned Banchmark ergo the DB is in cache. No paging.
> 
> ++ Note The R/W TPS rate are "respectable" based on observations from countless other runs ran on other configurations/architecture.
> The TPS rates would obviously improve if the deadlock situation outlined above was resolved.
> 
> 
> So there you have it. Any tips / pointers that would enlighten me are much appreciated.
> 
> Regards,
> 
> 
> JC
> 
> 
> Privet Sergey,
> 
> do not think I have a problem with my config(s).
> 
> I am not on the machine at the moment.
> From memory, with 16 Cores / 32 VCPU, 64 GB memory, buffer pools, logs etc. tailored to this configuration, the TPS rate is
> 
> 64 threads - R/O 73K R/W 26K
> 128 threads - R/O 71K R/W 23K.
> 
> I am putting the DB in RAM of course to obviate I/O issues. I am interested in seeing what is happening with the CPU caches and nest..
> 
> NMON tells me that for the R/O runs I have approximately 85% User, 12-14% System and the rest Idle - this stays consistent throughout the run (300 seconds R/O, 600 seconds R/W) which is reasonable.
> 
> On the R/W runs, NMONs CPU usage display shows that some CPUs are barely firing, or only intermittently and the granularity I had with R/O is gone. This would seem to tie in with what I see in the MariaDB log - headwinds, violent storms and deadlocks.
> 
> Any further information you require, please let me know.
> 
> 
> 
> Regards,
> 
> 
> JC
> 
> Hi!
> 
> I just tried this on x86: 40 threads against single table having 10k rows.
> No deadlocks, "tps: 12372.40, reads: 173219.52, writes: 49490.51".
> 
> This is with latest snapshot of 10.1 and sysbench 1.0.
> 
> I believe you shouldn't see deadlocks on such a big data set. Probably
> 
> something is wrong with your configs?
> 
> Regards,
> Sergey
> 
> On Thu, Jun 08, 2017 at 04:51:31PM +0200, J. Cassidy wrote:
> >
> >
> > Hello Mark,
> >
> > appreciate the reply.
> >
> > The OLTP table size and
> > OLTP size give me a DB size of appx 12GB. This is what I want. I have in the
> > meantime, looked at some older logs and see that whatever amount of threads I
> > specify (8, 16,32, 64 or 128), the deadlock messages are still surfacing in the
> > R/W phase
> > of Sysbench. I even dropped in two different MariaDB release levels
> > (10.0.19, 10.1.24) to see whether it would make a difference, deadlock still
> > there. I am using Sysbench 0.5 for these tests.
> > I am currently using MariaDB
> > 10.1.24 - built directly on the machine. S390x by the way, but the problem is
> > also
> > occuring on an X86-64 box.
> >
> > Regards,
> >
> >
> > JC
> >
> >
> >
> > I run sysbench frequently with MySQL
> > (not MariaDB yet) and don&#39;t recall ever seeing this error. But I use much
> > larger values for --oltp-table-size. You used 85937 and I use 1M or larger.
> > Maybe
> > there is too much data contention with smaller tables, even if a large number of
> > tables is used (128 in your case). I use a smaller number of tables - between 4
> > and 20. On Thu, Jun 8, 2017 at 12:48 AM, J. Cassidy sean@xxxxxxxxxxxx <mailto:sean@xxxxxxxxxxxx>> wrote:
> > Hello all,
> >
> >
> > have used Sysbench extensively to test MariaDB CPU load factors between various
> > architectures.
> >
> > I am noticing massive amounts of "[ERROR] mysqld:
> > Deadlock found when trying to get lock; try restarting transaction" in the
> > MariaDB error log during the R/W phase. The Sysbench R/W TPS rates are still
> > respectable for my purposes, but how do I
> > correct this condition?? Can I
> > ignore this / must live with it?
> > I have tried three different MariaDB
> > releases (10.0.19, 10.1.12, 10.1.24) to narrow things down, but this message (x
> > 1000&#39;s) is happening across all releases in the Sysbench R/W phase.
> >
> > Here is the Sysbench command that drives the workload against MariaDB -
> >
> > sysbench --test=lua/oltp.lua --oltp_tables_count=128 --oltp-table-size=85937
> > --rand-seed=42 --rand-type=uniform --num-threads=128 --oltp-read-only=off
> > --report-interval=2 --mysql-socket=/var/lib/mysql/mysql.sock --max-time=201
> > --max-requests=0 --mysql-user=root --percentile=99 run
> >
> > Would anyone
> > have a tip / idea /pointer?
> >
> >
> > Regards,
> >
> >
> > JC
> >
> >
> > John Cassidy
> >
> > Obere Bühlstrasse 21
> > 8700 Küsnacht
> > (ZH)
> > Switzerland / Suisse / Schweiz
> >
> >
> > Mobile: +49 152 58961601 <x-msg://59/images/blank.png> (Germany)
> > Mobile: +352 621 577 149 (Luxembourg)
> > Mobile: +41 78 769 17 97 <x-msg://59/images/blank.png> (CH)
> > Landline: +41 44 509 1957
> >
> > http://www.jdcassidy.eu <http://www.jdcassidy.eu/>
> >
> > "Aut viam inveniam aut
> > faciam" - Hannibal.
> > _______________________________________________
> > Mailing list: https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> > Post to :
> > maria-discuss@xxxxxxxxxxxxxxxxxxx <mailto:maria-discuss@xxxxxxxxxxxxxxxxxxx>
> > Unsubscribe : https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> > More help : https://help.launchpad.net/ListHelp <https://help.launchpad.net/ListHelp>
> > --Mark Callaghan
> > mdcallag@xxxxxxxxx <mailto:mdcallag@xxxxxxxxx>
> >
> > John Cassidy
> >
> > Obere Bühlstrasse 21
> > 8700
> > Küsnacht (ZH)
> > Switzerland / Suisse / Schweiz
> >
> >
> > Mobile:
> > +49 152 58961601 (Germany)
> > Mobile: +352 621 577 149 <x-msg://59/images/blank.png> (Luxembourg)
> > Mobile:
> > +41 78 769 17 97 <x-msg://59/images/blank.png> (CH)
> > Landline: +41 44 509 1957 <x-msg://59/images/blank.png>
> >
> > http://www.jdcassidy.eu <http://www.jdcassidy.eu/>
> >
> > "Aut viam inveniam aut faciam" -
> > Hannibal.
> 
> > _______________________________________________
> > Mailing list: https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> > Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx <mailto:maria-discuss@xxxxxxxxxxxxxxxxxxx>
> > Unsubscribe : https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> > More help : https://help.launchpad.net/ListHelp <https://help.launchpad.net/ListHelp>
> 
> 
> 
> 
> John Cassidy
> 
> Obere Bühlstrasse 21
> 8700 Küsnacht (ZH)
> Switzerland / Suisse / Schweiz
> 
> 
> Mobile: +49 152 58961601 <x-msg://59/images/blank.png> (Germany)
> Mobile: +352 621 577 149 <x-msg://59/images/blank.png> (Luxembourg)
> Mobile: +41 78 769 17 97 <x-msg://59/images/blank.png> (CH)
> Landline: +41 44 509 1957 <x-msg://59/images/blank.png>
> 
> http://www.jdcassidy.eu <http://www.jdcassidy.eu/>
> 
> "Aut viam inveniam aut faciam" - Hannibal.
> 
> John Cassidy
> 
> Obere Bühlstrasse 21
> 8700 Küsnacht (ZH)
> Switzerland / Suisse / Schweiz
> 
> 
> Mobile: +49 152 58961601 <x-msg://59/images/blank.png> (Germany)
> Mobile: +352 621 577 149 <x-msg://59/images/blank.png> (Luxembourg)
> Mobile: +41 78 769 17 97 <x-msg://59/images/blank.png> (CH)
> Landline: +41 44 509 1957 <x-msg://59/images/blank.png>
> 
> http://www.jdcassidy.eu <http://www.jdcassidy.eu/>
> 
> "Aut viam inveniam aut faciam" - Hannibal.
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx <mailto:maria-discuss@xxxxxxxxxxxxxxxxxxx>
> Unsubscribe : https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> More help : https://help.launchpad.net/ListHelp <https://help.launchpad.net/ListHelp>
>  
> --
> Mark Callaghan
> mdcallag@xxxxxxxxx <mailto:mdcallag@xxxxxxxxx>
> 
> John Cassidy
> 
> Obere Bühlstrasse 21
> 8700 Küsnacht (ZH)
> Switzerland / Suisse / Schweiz
> 
> 
> Mobile: +49 152 58961601 (Germany)
> Mobile: +352 621 577 149 (Luxembourg)
> Mobile: +41 78 769 17 97 (CH)
> Landline: +41 44 509 1957
> 
> http://www.jdcassidy.eu <http://www.jdcassidy.eu/>
> 
> "Aut viam inveniam aut faciam" - Hannibal._______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> Post to : maria-discuss@xxxxxxxxxxxxxxxxxxx <mailto:maria-discuss@xxxxxxxxxxxxxxxxxxx>
> Unsubscribe : https://launchpad.net/~maria-discuss <https://launchpad.net/~maria-discuss>
> More help : https://help.launchpad.net/ListHelp <https://help.launchpad.net/ListHelp>
> 
> John Cassidy
> 
> Obere Bühlstrasse 21
> 8700 Küsnacht (ZH)
> Switzerland / Suisse / Schweiz
> 
> 
> Mobile: +49 152 58961601 (Germany)
> Mobile: +352 621 577 149 (Luxembourg)
> Mobile: +41 78 769 17 97 (CH)
> Landline: +41 44 509 1957
> 
> http://www.jdcassidy.eu
> 
> "Aut viam inveniam aut faciam" - Hannibal.


Follow ups

References