Investigate MySQL Performance issues on Rosetta (l10n.kde.org) - Possible Server Replacement Needed
Not too long ago, we received reports that the MySQL performance on Rosetta had deteriorated with several changes tried by @ltoscano yielding either no change or making the issue worse.
Among these most notably was that switching to InnoDB reduced performance, which is counter-intuitive given that InnoDB supports row-level locking compared with the table-level locking that MyISAM is limited to.
Analysis of a MySQLTuner report gives some insights as to the potential causes for the issue here:
>> MySQLTuner 1.7.21 - Major Hayden <major@mhtx.net>
>> Bug reports, feature requests, and downloads at http://mysqltuner.pl/
>> Run with '--help' for additional options and output filtering
[--] Skipped version check for MySQLTuner script
[OK] Logged in using credentials from Debian maintenance account.
[OK] Currently running supported MySQL version 5.7.33-0ubuntu0.18.04.1
[OK] Operating on 64-bit architecture
-------- Log file Recommendations ------------------------------------------------------------------
[OK] Log file /var/log/mysql/error.log exists
[--] Log file: /var/log/mysql/error.log(0B)
[--] Log file /var/log/mysql/error.log is empty. Assuming log-rotation. Use --server-log={file} for explicit file
-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +BLACKHOLE +CSV -FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA
[--] Data in MyISAM tables: 9.7G (Tables: 14)
[--] Data in InnoDB tables: 16.0K (Tables: 1)
[OK] Total fragmented tables: 0
-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.
-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[--] There are 620 basic passwords in the list.
-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 17d 15h 25m 23s (18M q [12.444 qps], 16K conn, TX: 57G, RX: 7G)
[--] Reads / Writes: 0% / 100%
[--] Binary logging is disabled
[--] Physical Memory : 5.8G
[--] Max MySQL memory : 2.7G
[--] Other process memory: 0B
[--] Total buffers: 192.0M global + 17.1M per thread (151 max threads)
[--] P_S Max memory usage: 72B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 362.6M (6.08% of installed RAM)
[OK] Maximum possible memory usage: 2.7G (46.45% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (0/18M)
[OK] Highest usage of available connections: 6% (10/151)
[OK] Aborted connections: 0.00% (0/16529)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 1% (377 temp sorts / 36K sorts)
[OK] No joins without indexes
[!!] Temporary tables created on disk: 26% (6K on disk / 24K total)
[OK] Thread cache hit rate: 99% (14 created / 16K connections)
[OK] Table cache hit rate: 99% (811 open / 818 opened)
[OK] table_definition_cache(1400) is upper than number of tables(294)
[OK] Open file limit used: 5% (296/5K)
[OK] Table locks acquired immediately: 99% (10M immediate / 10M locks)
-------- Performance schema ------------------------------------------------------------------------
[--] Memory used by P_S: 72B
[--] Sys schema is installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is disabled.
-------- MyISAM Metrics ----------------------------------------------------------------------------
[OK] Key buffer used: 100.0% (16M used / 16M cache)
[!!] Key buffer size / total MyISAM indexes: 16.0M/2.1G
[!!] Read Key buffer hit rate: 91.1% (4B cached / 398M reads)
[!!] Write Key buffer hit rate: 50.8% (652M cached / 331M writes)
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 128.0M/16.0K
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (75 %): 48.0M * 2/128.0M should be equal to 25%
[OK] InnoDB buffer pool instances: 1
[--] Number of InnoDB Buffer Pool Chunk : 1 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.72% (148937 hits/ 149352 total)
[!!] InnoDB Write Log efficiency: 0% (2 hits/ 0 total)
[OK] InnoDB log waits: 0.00% (0 waits / 2 writes)
-------- Aria Metrics ------------------------------------------------------------------------------
[--] Aria Storage Engine not available.
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: ROW
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[--] This is a standalone server
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries which have no LIMIT clause
Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: https://bit.ly/2TcGgtU
Variables to adjust:
tmp_table_size (> 16M)
max_heap_table_size (> 16M)
key_buffer_size (> 2.1G)
innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
In terms of InnoDB performance, it didn't stand a chance with only 128MB of buffer pool space when compared with 9.1GB of data. Unfortunately Rosetta only has 8GB of RAM available, so we're unable to properly provision that on the current system.
Looking at the commands we frequently see in mysqladmin processlist
and comparing this with the behaviour of the system indicates that either:
- Some of the queries are improperly scoped and are collecting many thousands of rows of results.
- General disk activity on the system, combined with the too small buffer pools for both InnoDB and MyISAM mean that the system is having to go to underlying storage too often to serve certain queries.
I don't think that (1) is the case here, as we're not hitting PHP memory limits, which is what I would expect to happen if that were happening.
While increasing the key_buffer_size may help, it only stores index information unfortunately and not actual database contents - so we may need to look at provisioning a more capable server for this. It is worth noting that the system is experiencing fairly high load and takes over an hour to complete backups - even though each day we are only backing up less than 1GB of data after compression - so the overall IO limits currently imposed on Rosetta may simply be too low for the current workload of the machine.
If we were to provision a new machine for this, an AX41-NVME (https://www.hetzner.com/dedicated-rootserver/ax41-nvme) should be sufficient to handle the existing load of Rosetta, as well as Capona (container on Anepsion used for signing which will need a home once Anepsion is cancelled) and some other workloads (such as the one potentially created by #27)
It should be noted that the new machine option would likely be a significant lift in performance in general (so scripty might actually complete overnight rather than dragging into the next day...).