
maria-discuss team mailing list archive

Re: Performance tuning sought for MariaDB

 

On Mon, Oct 7, 2019 at 12:12 PM Kenneth Penza <kpenza@xxxxxxxxx> wrote:

> Opening many connections will cause issues, especially if each thread is
> running a sub-optimal query. Try to decrease the number of connections;
> one way can be to use message queues.
>
> Tuning the database I/O is important: ensure you are using a separate
> mount point for /var/lib/mysql and mount it with noatime. Resize
> innodb_buffer_pool_size to ensure that it can hold your workload.
> Subsequently, perform some tests with innodb_flush_method = "O_DIRECT".
>
> From your description, data S is the key which in turn is composed of T
> and A. Can the data be stripped by the application and T and A stored
> separately? Maybe use persistent virtual columns and index those instead.
>
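
Regarding the I/O tuning: for concreteness, this is roughly what I take the
suggestion to mean. A sketch only - the device name and the buffer size are
my own placeholders, not tested values:

  # /etc/fstab - separate mount point for the datadir, no atime updates
  /dev/sdb1  /var/lib/mysql  ext4  defaults,noatime  0  2

  # my.cnf, [mysqld] section
  innodb_buffer_pool_size = 8G        # sized to hold the working set
  innodb_flush_method     = O_DIRECT  # bypass the OS page cache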

       I don't think I understand. It is not that S is composed of T and A;
what I am reading from the FIFO is two items: S and A. T is derived from S
and A, together with a timestamp.

> Recall that in InnoDB the primary key is a clustered index, thus the table
> is written to disk sorted by column S. Inserts and updates may require
> moving rows around, slowing the SQL statements (high insert/update
> concurrency will worsen the situation). If column S is the primary key and
> is large, all other indexes will include the primary key, causing
> indexes to be large and increasing the load on the system I/O.
>

      There is something to what you are saying, for I have a high
insert/update concurrency. I will definitely try to understand the details
of what you are suggesting.
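
If I understand the clustered-index point, one way to act on it would be to
make the clustered key short and reach the rows through a secondary index on
S. A sketch only - the table name, column sizes and defaults are made up,
with the attributes as I described them:

  -- Short AUTO_INCREMENT clustered key; secondary indexes then carry the
  -- small id instead of the long string S.
  CREATE TABLE entries (
      id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      s   VARCHAR(191) NOT NULL,    -- key string read from the FIFO
      t   VARCHAR(191),             -- derived from S, A and a timestamp
      a   VARCHAR(191),             -- attribute read from the FIFO
      d   INT NOT NULL DEFAULT 0,   -- integer attribute
      PRIMARY KEY (id),
      UNIQUE KEY uk_s (s)           -- lookups by S go through this index
  ) ENGINE=InnoDB;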

>
> On Mon, Oct 7, 2019 at 7:13 PM JCA <1.41421@xxxxxxxxx> wrote:
>
>> Thanks for your feedback. Please see my comments interspersed below.
>>
>> On Mon, Oct 7, 2019 at 10:38 AM Guillaume Lefranc <guillaume@xxxxxxxxxxxx>
>> wrote:
>>
>>> Hello,
>>>
>>> thread-handling=pool-of-threads
>>>> max_connections = 1000
>>>> table_open_cache = 800
>>>> query_cache_type = 0
>>>> innodb_buffer_pool_size = 512M
>>>> innodb_buffer_pool_instances = 10
>>>> innodb_adaptive_hash_index_partitions = 20
>>>> innodb_lock_wait_timeout = 5000
>>>>
>>> No need to use buffer pool instances with only 512M of buffer. You said
>>> you have 24GB of RAM - why not increase the buffer size? How big is your
>>> table on disk right now? If you want the best performance, it must be
>>> held in the buffer pool.
>>>
>>
>> It does not seem to be that big - about 101 MB. I used the following
>> command:
>>
>>   SELECT table_name AS `Table`,
>>          round(((data_length + index_length) / 1024 / 1024), 2) AS `Size (MB)`
>>     FROM information_schema.TABLES
>>    WHERE table_schema = "<my-db-name>";
>>
>> I had to look it up.
>>
>>
>>>> With this, my application can keep up with the FIFO writer, but -
>>>> depending on the circumstances - my database can't. As I am writing this,
>>>> there are over 1300 threads connected to my database; any command that I
>>>> issue at the mysql CLI takes over one minute to return. I am keeping track
>>>> of how long each thread takes to complete, and that is of the order of
>>>> hundreds of seconds - sometimes thousands. Each thread is itself simple, in
>>>> that it just issues a couple of simple MariaDB commands. Currently my
>>>> table consists of 1.6 million entries, and growing - on this basis, I
>>>> expect that things will get only worse. Each entry, however, will never
>>>> require more than a couple of hundred bytes of storage. The operations that
>>>> can be undertaken on entries are insertion, deletion and modification, the
>>>> latter being straightforward - like e.g. incrementing a counter or
>>>> replacing a short string.
>>>>
>>>
>>> You are not considering several factors:
>>> a) limiting the number of concurrent threads - with 1300 threads you are
>>> creating concurrency races and locking issues. Try limiting your threads
>>> to a small multiple of the number of cores.
>>>
>>
>>    I will. Notice, though, that most of the time most of the cores are
>> idle anyway.
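>>
>> If I read a) correctly, this is a sketch of what I should try - the
>> numbers below are guesses for a 12-core box, not tested values:
>>
>>   # my.cnf, [mysqld] section
>>   thread-handling  = pool-of-threads
>>   thread_pool_size = 12     # roughly one thread group per core
>>   max_connections  = 200    # far below the ~1300 connections seen now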
>>
>>
>>> b) you're writing to a disk system. The number of CPU cores won't
>>> matter if you saturate the disk. You say nothing about the disk - whether
>>> it's an SSD, an HDD, etc. Note that HDDs are severely handicapped when it
>>> comes to concurrent IO operations.
>>>
>>
>>    It is an HDD. I am sure it will be a factor in making things slower.
>>
>>
>>> c) given the factors above, you could maybe try relaxing commits to disk,
>>> if integrity of the data is not of utmost importance, by adding
>>> "innodb_flush_log_at_trx_commit = 2" to your config. The redo log is then
>>> flushed to disk roughly once per second instead of at every commit, so an
>>> OS crash can lose about the last second of transactions.
>>>
>>>      Thanks - I'll try that.
>>
>>
>>>> My system has 24 GB of RAM and 12 cores. Occasionally all the cores are
>>>> fully busy with MariaDB activity, but most of the time barely one or two
>>>> are.
>>>
>>> d) CPU in a DB system will mostly be used for queries (index scans) and
>>> some for reindexing, but you're only doing writes, so unless you have
>>> dozens of indexes the CPU is unlikely to be your bottleneck.
>>> e) Table design - is your table design efficient? How many rows per
>>> second are you writing? Maybe you could also benefit from hash
>>> partitioning or clever data types.
>>>
>>>>
>>>> I am a newbie when it comes to interacting with MariaDB - please, bear
>>>> with me. I know I must use a single database and a single table. I also
>>>> know - because of the nature of the data that are being written to the
>>>> FIFO - that the probability of two different threads operating on the
>>>> same entry in the table at the same time is negligible - i.e. for all
>>>> practical purposes, that will not happen.
>>>> What I need is advice on how to configure my instance of MariaDB to
>>>> perform optimally in the scenario above. In particular, I would like for it
>>>> to make better use of all the cores available - in essence, to parallelize
>>>> the database operations as much as possible.
>>>
>>> f) Congratulations, you have managed to contradict yourself in two
>>> sentences.
>>>
>>> If your workload isn't parallel by design you will not make use of the
>>> available cores. Also, your workload sounds IO-bound to me - there's a
>>> strong chance the disk is the bottleneck.
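>>>
>>> One quick way to check while the load is running is to watch the disk
>>> with iostat (from the sysstat package); sustained %util near 100 on the
>>> device holding /var/lib/mysql means the HDD is the limit:
>>>
>>>   iostat -x 1
>>>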
>>> g) "I know I must use a single database and a single table." How do you
>>> know this if you are a newbie? No offense meant, but most of the time a
>>> monolithic design is not the best way to leverage performance.
>>>
>>      The data that I have consist of a string S, two string attributes T
>> and A, and an integer attribute D. String S is what I have to use as a key
>> in order to insert/delete/modify entries. Can this advantageously be
>> stored across several tables? The S strings have nothing in common, beyond
>> the attributes that I mentioned. As for the parallelism, the essence is
>> what I pointed out - namely, that no two threads will work on the same
>> entry (as accessed by S) at the same time. They can of course be adding
>> new entries, or deleting existing ones, at the same time, with the proviso
>> that the insertion operations will always be for different values of S,
>> and analogously for the deletion operations. I am indeed open to
>> suggestions about a better design.
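>>
>> Regarding point (e): would hash partitioning along these lines make sense
>> here? A sketch only - my_table is a placeholder and the partition count is
>> arbitrary; as far as I can tell this requires S to be part of every unique
>> key on the table (e.g. the primary key, as in my current design):
>>
>>   ALTER TABLE my_table
>>       PARTITION BY KEY (s)
>>       PARTITIONS 12;   -- e.g. one per core; not a tuned value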
>>
>>
>>> -GL
>>>
>>
>
