
maria-discuss team mailing list archive

Re: MariaDB server horribly slow on start


I'll look into ZFS and check all tables asap, thanks.

Note that with 2 DCs, the two sites will be synced via replication. They will not be part of a single Galera cluster but will be 2 separate Galera clusters. At the moment, the spare DC is a replication slave of the first DC. Only the first DC is active. The second can be used for read-only purposes by a read-only data API, but it's barely used. The spare DC is there in case the active one goes down at our service provider (it has happened in the past: the fire at OVH!)

The biggest problem was really a starting node slowing down the rest of the cluster. Second to that is the replication slave stopping for unknown reasons. Other than those, I could set up replication both ways to keep both DCs synced (not in real time, but synced), switch the load balancing to source-IP affinity instead of round robin, and it should work as planned.
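To illustrate the difference between the two load-balancing modes mentioned above, here is a minimal sketch. The backend names are hypothetical placeholders, and SHA-256 modulo hashing is just one simple way to get source-IP affinity, not necessarily what the OVH or any other load balancer does internally. The point is that with two-way asynchronous replication, pinning a client to one backend means it keeps reading from the node it wrote to, instead of possibly landing on a node that has not yet received its writes.

```python
import hashlib
from itertools import cycle

# Hypothetical backend names; the real nodes are not named in the thread.
BACKENDS = ["dc1-node1", "dc1-node2", "dc1-node3"]

def pick_by_source_ip(client_ip: str, backends: list[str]) -> str:
    """Source-IP affinity: the same client IP always maps to the same backend
    (as long as the backend list does not change)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

def round_robin(backends: list[str]):
    """Round robin: hand out backends in turn, regardless of who the client is."""
    return cycle(backends)
```

One caveat of plain modulo hashing: adding or removing a backend remaps most clients at once; consistent hashing is the usual way to limit that.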

Since we have experienced too many issues, and writing the same data to 6-8 nodes seems counterproductive (given the NVMe write endurance limits), we're thinking about moving our big customers to dedicated servers, completely separated from the rest. However, this would require implementing the redirection in our Android application and having dedicated URLs per customer, and our customers are on board with it.

Anyway, one step at a time, first the issues that are causing huge problems, then we can focus on a better solution.

-----Original Message-----
From: Gordan Bobic <gordan.bobic@xxxxxxxxx> 
Sent: Thursday, July 28, 2022 11:35
To: Cédric Counotte <cedric.counotte@xxxxxxxxxx>
Cc: jocelyn fournier <jocelyn.fournier@xxxxxxxxx>; Marko Mäkelä <marko.makela@xxxxxxxxxxx>; Mailing-List mariadb <maria-discuss@xxxxxxxxxxxxxxxxxxx>; Pierre LAFON <pierre.lafon@xxxxxxxxxx>
Subject: Re: [Maria-discuss] MariaDB server horribly slow on start

On Thu, Jul 28, 2022 at 12:07 PM Cédric Counotte <cedric.counotte@xxxxxxxxxx> wrote:
> Well, one server crashed twice a few days ago and I asked my service provider (OVH) to look into it, but they asked me to test the hardware myself. I found an NVMe disk with 17000+ errors and am still waiting for their feedback on this.

It sounds like you need:
1) ZFS
2) Better monitoring

> Only our 2 oldest servers are experiencing crashes (and they are only 6 months old!), and it turns out the NVMe disks in their RAID have very different amounts of data written: one disk has 58TB (not a replacement) while the other is at 400+TB within the same RAID! All other servers have identical written data sizes on both disks of their RAID, so it seems we got used disks and those are the ones having issues.

Welcome to the cloud. But this is not a bad thing: it's better than having multiple disks in the same array fail at the same time.
ZFS would help you by catching those errors before the database ingests them. In normal non-ZFS RAID, it is plausible and even quite probable that the corrupted data will be loaded from disk and propagate to other nodes, either via a state transfer or via corrupted binlogs.
ZFS prevents that by verifying every block's checksum at read time and repairing any errors that show up from the redundant copies on other disks.
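The read-time verify-and-repair behaviour described above can be illustrated with a toy two-way mirror. This is a conceptual model only, not how ZFS is implemented: SHA-256 stands in for whichever checksum algorithm the pool uses, and the checksum is kept apart from the data, loosely mirroring how ZFS stores checksums in the parent block pointer rather than next to the block. A plain RAID1 without checksums has no way to tell which copy is good and may happily return the corrupted one.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # Stand-in checksum; ZFS supports several algorithms.
    return hashlib.sha256(data).digest()

class ToyMirror:
    """Two-way mirror that verifies a block's checksum on every read
    and rewrites any copy that fails verification (self-healing)."""

    def __init__(self):
        self.disks = [{}, {}]   # one dict per disk: block_id -> bytes
        self.checksums = {}     # block_id -> expected checksum, stored apart from the data

    def write(self, block_id, data: bytes):
        for disk in self.disks:
            disk[block_id] = data
        self.checksums[block_id] = checksum(data)

    def read(self, block_id) -> bytes:
        expected = self.checksums[block_id]
        good = None
        for disk in self.disks:
            if checksum(disk[block_id]) == expected:
                good = disk[block_id]
                break
        if good is None:
            # Every copy is bad: report an error instead of returning corrupt data.
            raise OSError(f"block {block_id}: no copy matches its checksum")
        # Repair any copy that differs from the verified one.
        for disk in self.disks:
            if disk[block_id] != good:
                disk[block_id] = good
        return good
```

If every copy is bad, the read fails loudly, which is exactly what keeps corrupted pages out of the database and out of state transfers and binlogs.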

Under the current circumstances, I wouldn't trust your data integrity until you run a full extended table check on all tables on all nodes.
And probably run pt-table-checksum on all the tables between the nodes to make sure.
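For the per-node extended check, `mysqlcheck --all-databases --extended` (also shipped as `mariadb-check`) is the off-the-shelf route. If you would rather script it, here is a minimal sketch that only builds the `CHECK TABLE ... EXTENDED` statements from a list of `(schema, table)` pairs. It does not connect to the server, and the helper names are hypothetical. The statements need to be run on every node separately, since each node's copy of the data can differ, and the extended check can take a long time on large tables.

```python
def quote_ident(name: str) -> str:
    """Quote a MariaDB identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"

# Virtual/system schemas that hold no user data worth checking.
SKIP_SCHEMAS = {"information_schema", "performance_schema", "sys"}

def check_table_statements(tables):
    """Build one CHECK TABLE ... EXTENDED statement per (schema, table) pair.

    `tables` would typically be the result of:
      SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.TABLES
      WHERE TABLE_TYPE = 'BASE TABLE';
    """
    statements = []
    for schema, table in tables:
        if schema.lower() in SKIP_SCHEMAS:
            continue
        statements.append(
            f"CHECK TABLE {quote_ident(schema)}.{quote_ident(table)} EXTENDED;"
        )
    return statements
```

This only covers consistency within each node; comparing data between nodes is what pt-table-checksum is for.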

> I still haven't had time to produce a crash dump and file an issue with it (to confirm the cause), as I kept having to deal with server restarts, trying to reduce the slowdown that lasts 30 minutes to one hour.

You need to be careful with that - state transfer from a node with failing disks can actually result in the corrupted data propagating to the node being bootstrapped.

> There were issues with the slave thread crashing, for which I posted an issue and had to update MariaDB to resolve. There are still issues with slave threads stopping for no apparent reason, so I have written a script to restart them and posted an issue about that.

I don't think you can meaningfully debug anything until you have verified that your hardware is reliable.
Do your OVH servers have ECC memory?

> The original objective was to have 2 usable clusters in different sites, synced with each other using replication; however, all those issues have prevented us from moving forward with this.

With 4 nodes across 2 DCs, you are going to lose writability if you lose a DC even if it is the secondary DC.
Your writes are also going to be very slow because with 4 nodes, all writes have to be acknowledged by 3 nodes - and the 3rd node is always going to be slow because it is connected over a WAN.
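The writability point comes down to Galera's majority rule: a partition remains Primary (and so writable) only if it holds strictly more than half of the cluster's total weight. Here is a simplified sketch of that arithmetic, assuming every node has the default `pc.weight` of 1 and ignoring the refinement that nodes which leave gracefully are not counted. The function names are hypothetical.

```python
def has_quorum(surviving_weight: int, total_weight: int) -> bool:
    """Simplified Galera majority test: strictly more than half the weight must survive."""
    return 2 * surviving_weight > total_weight

def survives_dc_loss(nodes_per_dc: dict[str, int]) -> dict[str, bool]:
    """For each DC, report whether the remaining nodes keep quorum if that DC is lost
    (one vote per node)."""
    total = sum(nodes_per_dc.values())
    return {dc: has_quorum(total - n, total) for dc, n in nodes_per_dc.items()}
```

With 2 nodes in each DC, losing either DC leaves exactly half of the votes, which is not a majority, so the surviving side stops accepting writes. A 3+1 split only survives losing the smaller site.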
I would seriously question whether Galera is the correct solution for you.
And that's before you add writing to multiple nodes, which will make things far worse.

> Not to mention that we are now using the OVH load balancer, and that piece of hardware sometimes thinks all our servers are down and starts returning error 503 to our customers while our servers are running just fine (no restart, no issue, nothing). So that's one more issue to deal with, for which we'll get a dedicated server and configure our own load balancer that we can control.

I think you need to take a long hard look at what you are trying to achieve and re-assess:
1) Whether it is actually achievable sensibly within the constraints you imposed
2) What the best workable compromise is between what you want and what you can reasonably have

Right now, I don't think you have a solution that is likely to be workable.
