
maria-discuss team mailing list archive

Re: Performance tuning sought for MariaDB

 

Opening many connections will cause issues, especially if each thread is
running a sub-optimal query. Try to decrease the number of connections;
one way to do that is to funnel work through a message queue.
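
On the server side, a minimal my.cnf sketch for capping concurrency (the
values are assumptions for a 12-core box and should be tuned against your
workload):

  [mysqld]
  thread_handling  = pool-of-threads
  # Size the pool to the machine, not to the client count;
  # a small multiple of the 12 cores is a common starting point.
  thread_pool_size = 12
  # Cap connections far below the ~1300 threads seen today.
  max_connections  = 200

Clients that cannot get a connection then wait in the application's queue
instead of piling up inside the server.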

Tuning the database I/O is important: ensure you are using a separate mount
point for /var/lib/mysql and mount it with noatime. Resize
innodb_buffer_pool_size so that it can hold your working set. Then run some
tests with innodb_flush_method = "O_DIRECT".
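
For example (the device name and pool size below are assumptions; the 101 MB
table mentioned further down would fit in far less, but with 24 GB of RAM a
few GB of headroom is cheap):

  # /etc/fstab - hypothetical dedicated device for the datadir
  /dev/sdb1  /var/lib/mysql  ext4  defaults,noatime  0  2

  # my.cnf excerpt
  [mysqld]
  innodb_buffer_pool_size = 4G
  innodb_flush_method     = O_DIRECT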

From your description, data S is the key, which in turn is composed of T and
A. Can the data be split by the application, with T and A stored
separately? Maybe use persistent virtual columns and index those instead.
Recall that in InnoDB the primary key is a clustered index, so the table
is written to disk sorted by column S. Inserts and updates may require
moving rows around, slowing the SQL statements (high insert/update
concurrency will make this worse). If column S is the primary key and
is large, every secondary index will carry a copy of the primary key, making
all indexes large and increasing the load on the system I/O.
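
A sketch of that idea, with hypothetical table and column names and a
made-up SUBSTRING_INDEX rule for deriving T and A from S (adjust the
expressions to however S is actually composed). The short surrogate primary
key keeps the clustered index compact and roughly append-only:

  -- Hypothetical schema; names and extraction rules are assumptions.
  CREATE TABLE entries (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- small clustered key
      s  VARCHAR(255)    NOT NULL,                 -- the original key string
      t  VARCHAR(100) AS (SUBSTRING_INDEX(s, ':', 1))  PERSISTENT,
      a  VARCHAR(100) AS (SUBSTRING_INDEX(s, ':', -1)) PERSISTENT,
      d  INT NOT NULL DEFAULT 0,
      PRIMARY KEY (id),
      UNIQUE KEY uk_s (s),  -- lookups by S still use an index, but secondary
      KEY k_t (t),          -- indexes now append only the short id, not S
      KEY k_a (a)
  ) ENGINE=InnoDB;

One caveat if you later try the hash partitioning suggested below: the
partitioning columns must be part of every unique key, so a partitioned
variant of this table would need its keys reworked rather than simply adding
PARTITION BY KEY.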



On Mon, Oct 7, 2019 at 7:13 PM JCA <1.41421@xxxxxxxxx> wrote:

> Thanks for your feedback. Please see my comments  interspersed below.
>
> On Mon, Oct 7, 2019 at 10:38 AM Guillaume Lefranc <guillaume@xxxxxxxxxxxx>
> wrote:
>
>> Hello,
>>
>> thread-handling=pool-of-threads
>>> max_connections = 1000
>>> table_open_cache = 800
>>> query_cache_type = 0
>>> innodb_buffer_pool_size = 512M
>>> innodb_buffer_pool_instances = 10
>>> innodb_adaptive_hash_index_partitions = 20
>>> innodb_lock_wait_timeout = 5000
>>>
>> No need to use buffer pool instances with only 512M of buffer. You said
>> you have 24GB of RAM - why not increase the buffer size? How big is your
>> table on the disk right now? If you want the best performance, it must be
>> held in the buffer.
>>
>
> It does not seem to be that big - about 101 MB. I used the following
> command:
>
>  SELECT table_name AS `Table`,
>         round(((data_length + index_length) / 1024 / 1024), 2) AS `Size (MB)`
>  FROM information_schema.TABLES
>  WHERE table_schema = "<my-db-name>";
>
> I had to look it up.
>
>
>>> With this, my application can keep up with the FIFO writer, but -
>>> depending on the circumstances - my database can't. As I am writing this,
>>> there are over 1300 threads connected to my database; any command that I
>>> issue at the mysql CLI takes over one minute to return. I am keeping track
>>> of how long each thread takes to complete, and that is of the order of
>>> hundreds of seconds - sometimes thousands. Each thread is itself simple, in
>>> that it just issues a couple of simple MariaDB commands.  Currently my
>>> table consists of 1.6 million entries, and growing - on this basis, I
>>> expect that things will only get worse. Each entry, however, will never
>>> require more than a couple of hundred bytes of storage. The operations that
>>> can be undertaken on entries are insertion, deletion and modification, the
>>> latter being straightforward - like e.g. incrementing a counter or
>>> replacing a short string.
>>>
>>
>> You are not considering several factors:
>> a) Limiting the number of concurrent threads - with 1300 threads you are
>> creating concurrency races and locking issues. Try limiting your threads to
>> a small multiple of the number of cores.
>>
>
>    I will. Notice, though, that most of the time most of the cores are
> idle anyway.
>
>
>> b) You're writing to a disk system. The number of CPU cores won't matter
>> if you saturate the disk. You say nothing about the disk - is it an SSD,
>> an HDD, etc.? Note that HDDs are severely handicapped when it comes to
>> concurrent IO operations.
>>
>
>    It is an HDD. I am sure it will be a factor in making things slower.
>
>
>> c) Given the point above, you could try relaxing the commit-to-disk policy
>> if integrity of the data is not of utmost importance, for example by adding
>> "innodb_flush_log_at_trx_commit = 2" to your config.
>>
>      Thanks - I'll try that.
>
>
>>> My system has 24 GB of RAM and 12 cores. Occasionally all the cores are
>>> fully busy with MariaDB activity, but most of the time barely one or two
>>> are.
>>
>> d) CPU in a DB system is mostly used for queries (index scans) and some
>> for reindexing, but you're only doing writes, so unless you have dozens
>> of indexes the CPU is unlikely to be your bottleneck.
>> e) Table design - is your table design efficient? How many rows per second
>> are you writing? Maybe you could also benefit from hash partitioning or
>> clever data types.
>>
>>>
>>> I am a newbie when it comes to interacting with MariaDB - please,  bear
>>> with me. I know I must use a single database and a single table. I also
>>> know - because of the nature of the data that are being written to the FIFO
>>> - that the  probability for two different threads to be operating on the
>>> same entry in the table at the same time is negligible - i.e. for all
>>> practical purposes, that will not happen.
>>> What I need is advice on how to configure my instance of MariaDB to
>>> perform optimally in the scenario above. In particular, I would like for it
>>> to make better use of all the cores available - in essence, to parallelize
>>> the database operations as much as possible.
>>>
>> f) Congratulations, you have managed to contradict yourself in two
>> sentences.
>>
>> If your workload isn't parallel by design, you will not make use of the
>> available cores. Also, your workload sounds IO-bound to me - there's a
>> strong chance the disk is the bottleneck.
>> g) "I know I must use a single database and a single table." How do you
>> know this if you are a newbie? No offense meant, but a monolithic design
>> is usually not the best way to leverage performance....
>>
>      The data that I have consist of a string S, two string attributes T
> and A, and an integer attribute D. String S is what I have to use as a key
> in order to insert/delete/modify entries. Can this advantageously be
> stored across several tables?  The S strings have nothing in common, beyond
> the attributes that I mentioned. As for the parallelism, the essence is
> what I pointed out - namely, that no two threads will work on the same
> entry (as accessed by S) at the same time.  They can of course be adding
> new entries, or deleting existing ones, at the same time, with the proviso
> that the insertion operations will always be for different values of S, and
> analogously for the deletion operations. I am indeed open to suggestions
> about a better design.
>
>
>> -GL
>>
> _______________________________________________
> Mailing list: https://launchpad.net/~maria-discuss
> Post to     : maria-discuss@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>
