maria-discuss team mailing list archive
-
maria-discuss team
-
Mailing list archive
-
Message #05589
Re: fsync alternative
Hi Jure, maybe you can try using toku engine. I noticed that with large keycaches syncing it to disk can sometimes take up to a minute for myisam tables. Also innodb is generally heavy on I/O. Toku on the other hand is heavier on CPU/memory but seems very light on I/O (from my experience), so maybe that would help... Rocksdb should be light on random I/O access as it's using LSM, haven't used it in production yet and not sure how mature it is but may work... You can try to disable binlogs/fsync almost altogether if you don't care about data consistency, it's very unrecommended. And if you're using replication skip the 3'rd line because it'll break it, if you're not using replication remove _xxx. SET GLOBAL innodb_flush_log_at_trx_commit = 0; SET GLOBAL sync_binlog = 0; SET GLOBAL log_bin_xxx = 0; But i believe SAN is really the issue. Benchmark this SAN (throughput / iops)... if not conclusive try running from local SSD on raid10 or something similar... raid5/6 works horribly with SSDs and databases in general due to trim support and write amplification (not sure what raid you're using)... www.percona.com www.percona.com www.percona.com www.percona.com For kernel's pagecache/fsync comment, i think it's working as expected and you're missing write-behind not OS file cache... I guess without fsync it'd be totally impossible to have the data in consistent state after a crash, so you could not only loose couple of second's worth of data but whole table, as writes could be postponed and re-ordered, especially on SSDs... So for example if you'd do copy-on-write to make changes atomic, the pointer to data block could be written before the data block is (fully) there on disk and you'd get a structure pointing to not-existing or random data. To be sure you're having anything which is making sense in your files you'd need to 1. write your data 2. ensure it's on the device in full (fsync) 3. atomically change the data pointer from OLD to NEW data, which is probably using assumption that block-aligned writes below 512b should be atomic, or some log-like structure Maria is probably doing it little differently, but concept is for sure similar if you don't want your software to be bound to some single, specific, probably very expensive and propertialy mix of software and hardware which can support some basic atomic I/O. For IO over network... actually had a huge issues even with running simple readonly sqlite over network-attached I/O via NFS. I believe such kind of setup when your throughput is high will at least need dedicated 10Gbps network link. And still SAN/NAS can have "normal" throughput on 10Gb, but latency will probably be much higher (so slower fsync's). I'd try to move this SQL server outside of VM and to local storage. Best, Slawomir. Dnia 7 września 2019 21:38 Jure Sah <e@xxxxxxxxxx> napisał(a): Hi, On 7. 09. 19 21:03, Jan Steinman wrote: I’m using an inexpensive Mac Mini, maxed out with RAM, and a 2GB SSD, running NOTHING but MariaDB. I even run it headless, which means all the UI processes stay in sleep(3). When I was having web server performance issues, that was the one thing that improved things the most. And that was after wasting a lot of time trying to tweak MariaDB variables. I work for an ISP. The system they use is connected to a dedicated SSD RAID array on SAN. On the particular virtual machine the MariaDB is on the same server as the Apache webserver. The main issue is that the website gets a lot of traffic. It has some CMS-based website on it and the session table is grinding away constantly. Normally it's not a problem, but once when a malfunction on the SAN network caused degraded performance (not downtime mind you!), with the usual traffic the website was simply down with Gateway timeout from memcached. The workaround was to temporarily move the database to a ramdisk (tmpfs). My employer considers stepping outside the recommendations of the open source community to be not worth the risk, and the issue was resolved since. However I know that I shouldn't have had to make the workaround with the ramdisk, because the kernel page cache was supposed to take care of that by itself. The reason this doesn't work is the fsyncs used by MariaDB, which effectively disables any advantage offered by the page cache. I think that is a great shame so I am trying to see if there is anything I can do about that. Perhaps there is a better alternative out there but nobody has as of yet tried to make it work. I'm also not entirely convinced there is a reasonable argument for being so particular about fsyncing everything on typical server hardware (servers are usually UPS backed). The argument about battery-backed RAID controllers, while this is something we have, is something I find particularly unreasonable, since as far as I know fsync isn't even supported by most RAID controller drivers in Linux. There's an email on the Kernel mailing list that it does exist, but that Linux doesn't use it. Disabling fsync boosts performance of a typical MySQL server by something like a factor of 3. Are you really sure about that? I don’t know the MySQL implementation details, but typically, sync(2) is run from a dedicated thread per file, and so should not block anything, unless you’ve run out of threads or something, in which case, tuning may help. I found the measurement by some blogger on the internet. I'm not sure what hardware they tried this on, it seems to me it would depend. In the aforementioned malfunction it was pretty clear that MariaDB's performance is capped by storage. It makes sense too, if you are fsyncing on write operations, it's going to block if the storage is slow. I'm not excluding the possibility that there is a setting here that is making my system behave the way that it is. LP, Jure ______________________________ Mailing list: launchpad.net launchpad.net Post to : maria-discuss@lists.launchpad. Unsubscribe : launchpad.net launchpad.net More help : help.launchpad.net help.launchpad.net
References