← Back to team overview

maria-discuss team mailing list archive

Re: Slave corruption on myrocks update

 

Update: I was looking through the open tickets on JIRA, this could be a
match: https://jira.mariadb.org/browse/MDEV-20456

Am Do., 19. Sept. 2019 um 17:07 Uhr schrieb Jonas Krauss <
jkrauss82@xxxxxxxxx>:

> Hi all,
>
> I am trying to upgrade MariaDB to a more recent version which includes
> https://jira.mariadb.org/browse/MDEV-19795 and thus a newer rocksdb
> version. We are running a mixed database of innodb and rocksdb tables.
>
> We are currently using 10.2.21-MariaDB-10.2.21+maria~stretch-log /
> mariadb.org binary distribution (on the master). The reported rocksdb
> version is 5.14.0, compile date Dec 30 2018.
>
> I have set up a new slave with the database hard-copied from the master's
> filesystem (with mariabackup) and MariaDB versions 10.4.7 / 10.3.17 and
> 10.2.26. As a first thing before replication starts I of course run
> mysql_upgrade, which goes through just fine. Once the replication catches
> up (row-based), it eventually hits some kind of wall and rocksdb detects a
> checksum mismatch during compaction, e.g.:
>
> 2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN]
> [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption:
> block checksum mismatch: expected 1973625368, got 2324967102  in
> /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907
>
> MariaDB then crashes with the dreaded exit code 6 (see log below). The
> database is unusable afterwards and does not start up anymore.
>
> The same procedure for setting up a slave and replicating works fine when
> staying at 10.2.21.
>
> I suspect there might be an incompatibility between rocksdb 5.14.0 and
> 6.2.0 (version used in the newer MariaDBs). However, I have read through
> the changelog at https://github.com/facebook/rocksdb/releases and have
> found no clear candidate for that.
>
> Any help or ideas are greatly appreciated. I can also provide more logs on
> request.
>
> Thanks
>
> Jonas
>
> --------------
> *RocksDB options from my.cnf:*
>
> plugin-load-add = ha_rocksdb.so
> default-storage-engine = rocksdb
> default-tmp-storage-engine = MyISAM
> transaction-isolation = READ-COMMITTED
> rocksdb_unsafe_for_binlog = 1 # enables statement based replication
> rocksdb_datadir = /data/mysql/rocksdb
> rocksdb_wal_dir = /var/mysql/rocksdb
> rocksdb_tmpdir = /var/mysql/rocksdb
>
> rocksdb_flush_log_at_trx_commit = 0
> rocksdb_use_direct_io_for_flush_and_compaction = 0
> rocksdb_use_direct_reads = 0
>
> rocksdb_max_open_files = -1
> rocksdb_max_background_jobs = 1
> rocksdb_max_total_wal_size = 5G
> rocksdb_block_size = 64K
> rocksdb_block_cache_size = 16G
> rocksdb_table_cache_numshardbits = 6
> rocksdb_new_table_reader_for_compaction_inputs = 1
> rocksdb_compaction_readahead_size = 4M
>
> # rate limiter
> rocksdb_bytes_per_sync = 4M
> rocksdb_wal_bytes_per_sync = 4M
> rocksdb_rate_limiter_bytes_per_sec = 80M # MB/s. Increase if you're
> running on higher spec machines
>
> # triggering compaction if there are many sequential deletes
> rocksdb_compaction_sequential_deletes_count_sd = 1
> rocksdb_compaction_sequential_deletes = 199999
> rocksdb_compaction_sequential_deletes_window = 200000
>
>
> rocksdb_default_cf_options=write_buffer_size=128m;max_write_buffer_number=4;target_file_size_base=256m;max_bytes_for_level_base=2560m;target_file_size_multiplier=2;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;compression=kLZ4Compression;bottommost_compression=kZlibCompression;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio
>
> ---------------------------
> *MariaDB error log:*
>
> 190919 16:15:31 [ERROR] mysqld got signal 6 ;
> This could be because you hit a bug. It is also possible that this binary
> or one of the libraries it was linked against is corrupt, improperly built,
> or misconfigured. This error can also be caused by malfunctioning hardware.
>
> To report this bug, see https://mariadb.com/kb/en/reporting-bugs
>
> We will try our best to scrape up some info that will hopefully help
> diagnose the problem, but since we have already crashed,
> something is definitely wrong and this may fail.
>
> Server version: 10.2.27-MariaDB-1:10.2.27+maria~bionic-log
> key_buffer_size=134217728
> read_buffer_size=131072
> max_used_connections=20
> max_threads=3002
> thread_count=37
> It is possible that mysqld could use up to
> key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads =
> 6727453 K  bytes of memory
> Hope that's ok; if not, decrease some variables in the equation.
>
> Thread pointer: 0x0
> Attempting backtrace. You can use the following information to find out
> where mysqld died. If you see no messages after this, something went
> terribly wrong...
> stack_bottom = 0x0 thread_stack 0x30000
> *** buffer overflow detected ***: /usr/sbin/mysqld terminated
> terminate called after throwing an instance of 'std::system_error'
>   what():  Resource deadlock avoided
> 2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: There was a
> corruption detected in RockDB files. Check error log emitted earlier for
> more details.
> 2019-09-19 16:15:39 140198954641408 [ERROR] RocksDB: The server will exit
> normally and stop restart attempts. Remove
> /data/mysql/rocksdb/ROCKSDB_CORRUPTED file from data directory and start
> mysqld manually.
>
>
> --------------------
> *RocksDB error log (last 10 lines):*
>
> 2019/09/19-16:15:22.696889 7f7ae33ff700 EVENT_LOG_v1 {"time_micros":
> 1568902522696836, "cf_name": "default", "job": 41, "event":
> "table_file_creation", "file_number": 181274, "file_size": 180342221,
> "table_properties": {"data_size": 170893376, "index_size": 333943,
> "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key":
> 0, "index_value_is_delta_encoded": 0, "filter_size": 9317061,
> "raw_key_size": 388460606, "raw_average_key_size": 52, "raw_value_size":
> 191015485, "raw_average_value_size": 25, "num_data_blocks": 5571,
> "num_entries": 7455469, "num_deletions": 585078, "num_merge_operands": 0,
> "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0,
> "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name":
> "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10",
> "merge_operator": "nullptr", "prefix_extractor_name": "nullptr",
> "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4",
> "compression_options": "window_bits=-14; level=6; strategy=0;
> max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time":
> 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902510,
> "__indexstats__": "[...37  records...]"}}
> 2019/09/19-16:15:25.772863 7f7ae33ff700
> [compaction/compaction_job.cc:1332] [default] [JOB 41] Generated table
> #181275: 770219 keys, 129144877 bytes
> 2019/09/19-16:15:25.772927 7f7ae33ff700 EVENT_LOG_v1 {"time_micros":
> 1568902525772887, "cf_name": "default", "job": 41, "event":
> "table_file_creation", "file_number": 181275, "file_size": 129144877,
> "table_properties": {"data_size": 128134168, "index_size": 86084,
> "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key":
> 0, "index_value_is_delta_encoded": 0, "filter_size": 962885,
> "raw_key_size": 15404380, "raw_average_key_size": 20, "raw_value_size":
> 167199905, "raw_average_value_size": 217, "num_data_blocks": 2752,
> "num_entries": 770219, "num_deletions": 0, "num_merge_operands": 0,
> "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0,
> "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name":
> "default", "column_family_id": 0, "comparator": "RocksDB_SE_v3.10",
> "merge_operator": "nullptr", "prefix_extractor_name": "nullptr",
> "property_collectors": "[Rdb_tbl_prop_coll_factory]", "compression": "LZ4",
> "compression_options": "window_bits=-14; level=6; strategy=0;
> max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time":
> 1568902340, "oldest_key_time": 0, "file_creation_time": 1568902522,
> "__indexstats__": "[...1  records...]"}}
> 2019/09/19-16:15:25.883030 7f7ae33ff700
> [compaction/compaction_job.cc:1395] [default] [JOB 41] Compacted 1@3 + 7@4
> files to L4 => 1889443497 bytes
> 2019/09/19-16:15:26.573155 7f7ae33ff700 (Original Log Time
> 2019/09/19-16:15:26.571946) [compaction/compaction_job.cc:763] [default]
> compacted to: base level 3 level multiplier 10.00 max bytes base 2684354560
> files[0 0 0 9 23 139 1185] max score 1.00 (2 files need compaction),
> MB/sec: 19.1 rd, 18.6 wr, level 4, files in(1, 7) out(9) MB in(278.5,
> 1579.4) out(1801.9), read-write-amplify(13.1) write-amplify(6.5) OK,
> records in: 67833049, records dropped: 256224 output_compression: LZ4
> 2019/09/19-16:15:26.573173 7f7ae33ff700 (Original Log Time
> 2019/09/19-16:15:26.572001) EVENT_LOG_v1 {"time_micros": 1568902526571979,
> "job": 41, "event": "compaction_finished", "compaction_time_micros":
> 101734971, "compaction_time_cpu_micros": 48728676, "output_level": 4,
> "num_output_files": 9, "total_output_size": 1889443497,
> "num_input_records": 67741208, "num_output_records": 67484984,
> "num_subcompactions": 1, "output_compression": "LZ4",
> "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
> "lsm_state": [0, 0, 0, 9, 23, 139, 1185]}
> 2019/09/19-16:15:26.573342 7f7ae33ff700
> [compaction/compaction_job.cc:1645] [default] [JOB 42] Compacting 1@4 +
> 15@5 files to L5, score 0.00
> 2019/09/19-16:15:26.573366 7f7ae33ff700
> [compaction/compaction_job.cc:1649] [default] Compaction start summary:
> Base version 44 Base level 4, inputs: [181272(167MB)], [180583(535MB)
> 180788(537MB) 180963(12MB) 180964(513MB) 181094(21MB) 181095(567MB)
> 181096(556MB) 181097(353MB) 181098(367MB) 181099(465MB) 181101(559MB)
> 181102(390MB) 181103(579MB) 181104(353MB) 181105(576MB)]
> 2019/09/19-16:15:26.573462 7f7ae33ff700 EVENT_LOG_v1 {"time_micros":
> 1568902526573441, "job": 42, "event": "compaction_started",
> "compaction_reason": "ManualCompaction", "files_L4": [181272], "files_L5":
> [180583, 180788, 180963, 180964, 181094, 181095, 181096, 181097, 181098,
> 181099, 181101, 181102, 181103, 181104, 181105], "score": 0,
> "input_data_size": 6878343156}
> 2019/09/19-16:15:31.512013 7f7ae33ff700 [WARN]
> [db_impl/db_impl_compaction_flush.cc:2726] Compaction error: Corruption:
> block checksum mismatch: expected 1973625368, got 2324967102  in
> /data/mysql/rocksdb/180583.sst offset 153437494 size 6567907
>

References