
Percona Server for MongoDB 3.2.9-2.1 is now available

MySQL Performance Blog - Mon, 2016-09-19 16:17

Percona announces the release of Percona Server for MongoDB 3.2.9-2.1 on September 19, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.9-2.1 is an enhanced, open-source, fully compatible, highly scalable database with zero maintenance downtime, supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with the MongoRocks, Percona Memory Engine, and PerconaFT storage engines, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

Note:

We deprecated the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.9. There are no additional improvements or new features on top of those upstream fixes.

The release notes are available in the official documentation.



How X Plugin Works Under the Hood

MySQL Performance Blog - Fri, 2016-09-16 19:59

In this blog post, we’ll look at what MySQL does under the hood to transform NoSQL requests to SQL (and then store them in InnoDB transactional engine) when using the X Plugin.

X Plugin allows MySQL to function as a document store. We don’t need to define any schema or use SQL language while still being a fully ACID database. Sounds like magic – but we know the only thing that magic does is make planes fly!
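To make the "magic" concrete, here is a minimal sketch of what happens on the SQL side (based on the MySQL 5.7 document store; the schema and collection names are hypothetical):

### Load the X Plugin (one time, as a privileged user):
mysql -uroot -p -e "INSTALL PLUGIN mysqlx SONAME 'mysqlx.so';"

### After a client creates a collection "mycoll" through the X DevAPI,
### the "schema-less" collection is really an InnoDB table with a JSON
### column and a generated _id primary key:
mysql -uroot -p -e "SHOW CREATE TABLE test.mycoll\G"
### CREATE TABLE `mycoll` (
###   `doc` json DEFAULT NULL,
###   `_id` varchar(32) GENERATED ALWAYS AS
###         (json_unquote(json_extract(`doc`,'$._id'))) STORED NOT NULL,
###   PRIMARY KEY (`_id`)
### ) ENGINE=InnoDB

Document CRUD calls arrive over the X Protocol (port 33060 by default) and are translated into SQL against tables like this one, which is why the result is still fully ACID.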


Consul, ProxySQL and MySQL HA

MySQL Performance Blog - Fri, 2016-09-16 15:20

When it comes to “decision time” about which type of MySQL HA (high-availability) solution to implement, and how to architect the solution, many questions come to mind. The most important questions are:

  • “What are the best tools to provide HA and Load Balancing?”
  • “Should I be deploying this proxy tool on my application servers or on a standalone server?”

Ultimately, the best tool really depends on the needs of your application and your environment. You might already be using specific tools such as Consul or MHA, or you might be looking to implement tools that provide richer features. The dilemma of deploying a proxy instance per application host versus a standalone proxy instance is usually a trade-off between a less effective load-balancing algorithm and a single point of failure. Neither is desirable, but there are ways to implement a solution that balances all aspects.

In this article, we’ll go through a solution that is suitable for an application that has not been coded to split reads and writes over separate MySQL instances. An application like this relies on a proxy or third-party tool to split reads/writes, and preferably on a solution that is highly available at the proxy layer. The solution described here comprises ProxySQL, Consul and Master High Availability (MHA). Within this article, we’ll focus on the configuration required for ProxySQL and Consul, since there are many articles that cover MHA configuration (such as Miguel’s recent MHA Quick Start Guide blog post).

When deploying Consul in production, a minimum of three instances is recommended. In this example, the Consul agents run on the application server (appserver) as well as on the two ProxySQL servers, mysql1 and mysql2 (which act as the HA proxy pair). This is not a hard requirement, and these instances can easily run on another host or Docker container. MySQL is deployed locally on mysql1 and mysql2, though this could just as well be 1..n separate standalone DB server instances.

So let’s move on to the actual configuration of this HA solution, starting with Consul.

Installation of Consul:

Firstly, we’ll need to install the required packages, download the Consul archive and perform the initial configuration. We’ll need to perform the same installation on each of the nodes (i.e., appserver, mysql1 and mysql2).

### Install pre-requisite packages:
sudo yum -y install wget unzip bind-utils dnsmasq

### Install Consul:
sudo useradd consul
sudo mkdir -p /opt/consul /etc/consul.d
sudo touch /var/log/consul.log /etc/consul.d/proxysql.json
cd /opt/consul
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
sudo unzip consul_0.6.4_linux_amd64.zip
sudo ln -s /opt/consul/consul /usr/bin/consul
sudo chown consul:consul -R /etc/consul* /opt/consul* /var/log/consul.log

Configuration of Consul on Application Server (used as ‘bootstrap’ node):

Now that we’re done with the installation on each of the hosts, let’s continue with the configuration. In this example, we’ll bootstrap the Consul cluster using “appserver”:

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent1",
  "server": true,
  "ui": true,
  "bootstrap": true,
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.119"   ## Add server IP here
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
    "id": "proxy1",
    "name": "proxysql",
    "address": "192.168.1.120",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
      "interval": "3s"
    }
  },
  {
    "id": "proxy2",
    "name": "proxysql",
    "address": "192.168.1.121",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
      "interval": "3s"
    }
  }
]}
######
### Start Consul agent
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'

### Setup DNSMASQ (as root)
echo "server=/consul/127.0.0.1#8600" > /etc/dnsmasq.d/10-consul
service dnsmasq restart

### Remember to add localhost as a DNS server (this step can vary
### depending on how your DNS servers are managed); here we just
### add the following line to /etc/resolv.conf:
sudo vi /etc/resolv.conf
#... snippet ...#
nameserver 127.0.0.1
#... snippet ...#

### Restart dnsmasq
sudo service dnsmasq restart

The service should now be started, and you can verify this in the log at “/var/log/consul.log”.
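As an optional sanity check (a sketch, assuming the agent came up cleanly), confirm membership and query Consul’s built-in DNS interface directly on its native port:

### List cluster members known to this agent:
consul members

### Query Consul's DNS interface on port 8600, bypassing dnsmasq:
dig @127.0.0.1 -p 8600 proxysql.service.consul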

Configuration of Consul on Proxy Servers:

The next item is to configure each of the proxy Consul agents. Note that the “agent name” and the “IP address” need to be updated for each host (values for both must be unique):

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent2",              ### Agent node name must be unique
  "server": true,
  "ui": true,
  "bootstrap": false,                 ### Disable bootstrap on joiner nodes
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.xxx",  ### Set to local instance IP
  "dns_config": { "only_passing": true }
}
######
$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
    "id": "proxy1",
    "name": "proxysql",
    "address": "192.168.1.120",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
      "interval": "3s"
    }
  },
  {
    "id": "proxy2",
    "name": "proxysql",
    "address": "192.168.1.121",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
      "interval": "3s"
    }
  }
]}
######
### Start Consul agent:
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'

### Join the Consul cluster, specifying the 1st node's IP, e.g.:
$ consul join 192.168.1.119

### Verify logs and look out for the following messages:
$ cat /var/log/consul.log
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'agent2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.1.120 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas:
==> Log data will now stream in as it occurs:
# ... snippet ...
    2016/09/05 19:48:04 [INFO] agent: Synced service 'consul'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql1'
    2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql2'
# ... snippet ...

At this point, we have Consul installed, configured and running on each of our hosts (appserver, mysql1 and mysql2). Now it’s time to install and configure ProxySQL on mysql1 and mysql2.

Installation & Configuration of ProxySQL:

The same procedure should be run on both mysql1 and mysql2 hosts:

### Install ProxySQL packages and initialise ProxySQL DB
sudo yum -y install https://github.com/sysown/proxysql/releases/download/v1.2.2/proxysql-1.2.2-1-centos7.x86_64.rpm
sudo service proxysql initial
sudo service proxysql stop

### Edit the ProxySQL configuration file to update username / password
vi /etc/proxysql.cnf
###
admin_variables=
{
  admin_credentials="admin:admin"
  mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
}
###

### Start ProxySQL
sudo service proxysql start

### Connect to ProxySQL and configure
mysql -P6032 -h127.0.0.1 -uadmin -padmin

### First we create a replication hostgroup:
mysql> INSERT INTO mysql_replication_hostgroups VALUES (10,11,'Standard Replication Groups');

### Add both nodes to hostgroup 11 (ProxySQL will automatically put the writer node in hostgroup 10):
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.120',11,3306,1000);
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.121',11,3306,1000);

### Save server configuration
mysql> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

### Add query rules for RW split
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .* FOR UPDATE', 10, NULL, 1);
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .*', 11, NULL, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

### Finally configure ProxySQL user and save configuration
mysql> INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('root','123',1,10,'test');
mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
mysql> EXIT;

MySQL Configuration:

We also need to perform one configuration step on the MySQL servers in order to create a user for ProxySQL to monitor the instances:

### Create ProxySQL's monitor user on the master MySQL server
### (the default username and password is monitor/monitor)
mysql -h192.168.1.120 -P3306 -uroot -p123 -e "GRANT USAGE ON *.* TO monitor@'%' IDENTIFIED BY 'monitor';"
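To confirm that monitoring actually works, you can inspect the monitor module’s logs from the ProxySQL admin interface (a quick sketch; your timestamps will differ):

### On the ProxySQL admin interface (port 6032):
SELECT hostname, port, time_start_us, connect_error
  FROM monitor.mysql_server_connect_log ORDER BY time_start_us DESC LIMIT 4;
SELECT hostname, port, time_start_us, ping_error
  FROM monitor.mysql_server_ping_log ORDER BY time_start_us DESC LIMIT 4;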

We can view the configuration of the monitor user on the ProxySQL host by checking the global variables on the admin interface:

mysql> SHOW VARIABLES LIKE 'mysql-monitor%';
+----------------------------------------+---------+
| Variable_name                          | Value   |
+----------------------------------------+---------+
| mysql-monitor_enabled                  | true    |
| mysql-monitor_connect_timeout          | 200     |
| mysql-monitor_ping_max_failures        | 3       |
| mysql-monitor_ping_timeout             | 100     |
| mysql-monitor_replication_lag_interval | 10000   |
| mysql-monitor_replication_lag_timeout  | 1000    |
| mysql-monitor_username                 | monitor |
| mysql-monitor_password                 | monitor |
| mysql-monitor_query_interval           | 60000   |
| mysql-monitor_query_timeout            | 100     |
| mysql-monitor_slave_lag_when_null      | 60      |
| mysql-monitor_writer_is_also_reader    | true    |
| mysql-monitor_history                  | 600000  |
| mysql-monitor_connect_interval         | 60000   |
| mysql-monitor_ping_interval            | 10000   |
| mysql-monitor_read_only_interval       | 1500    |
| mysql-monitor_read_only_timeout        | 500     |
+----------------------------------------+---------+

Testing Consul:

Now that Consul and ProxySQL are configured we can do some tests from the “appserver”. First, we’ll verify that the hosts we’ve added are both reporting [OK] on our DNS requests:

$ dig @127.0.0.1 -p 53 proxysql.service.consul

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @127.0.0.1 -p 53 proxysql.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9975
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;proxysql.service.consul.      IN      A

;; ANSWER SECTION:
proxysql.service.consul. 0     IN      A       192.168.1.121
proxysql.service.consul. 0     IN      A       192.168.1.120

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Sep 05 19:32:12 UTC 2016
;; MSG SIZE  rcvd: 158

As you can see from the output above, DNS reports both 192.168.1.120 and 192.168.1.121 as available for the ProxySQL service. As soon as a ProxySQL check fails, that node will no longer appear in the output above.
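To see the failover behavior (a quick sketch of a failure test), stop ProxySQL on one proxy host and repeat the lookup; within roughly one check interval (3s here) the corresponding A record should disappear:

### On mysql2: simulate a proxy failure
sudo service proxysql stop

### On appserver: only the healthy proxy should now be returned
dig @127.0.0.1 proxysql.service.consul +short
### 192.168.1.120

### Restore the failed proxy when done
sudo service proxysql start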

We can also view the status of our cluster and agents through the Consul web UI, which runs on port 8500 on all the Consul servers in this configuration (e.g., http://192.168.1.120:8500/).

Testing ProxySQL:

Now that we have this configured, we can run some basic tests to verify that ProxySQL is load balancing our connections:

[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e "select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql1.localdomain |
+--------------------+
[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e "select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql2.localdomain |
+--------------------+

Perfect! We’re ready to use the hostname “proxysql.service.consul” to connect to our MySQL instances through a round-robin load-balanced, highly available proxy layer. If one of the two ProxySQL instances fails, we’ll continue communicating with the database through the other. Of course, this configuration is not limited to two hosts, so feel free to add as many as you need. Be aware that in this example the two hosts’ replication hierarchy is managed by MHA in order to allow for master/slave promotion. When an automatic or manual failover is performed with MHA, ProxySQL automatically detects the change in replication topology and redirects writes to the newly promoted master instance.

To make this configuration more durable, we encourage you to create a more intelligent Consul check, i.e., a check that verifies more than just the availability of the MySQL service (for example, one that selects some data from a table). It is also recommended to fine-tune the check interval to suit the requirements of your application.
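For example, a slightly smarter check could run a real query through the proxy instead of a bare ping. The sketch below is hypothetical (the script path and heartbeat table are assumptions); Consul treats exit code 0 as passing, 1 as warning, and anything else as failing:

#!/bin/bash
### /opt/consul/proxysql_check.sh (hypothetical) - reference it from the
### "check"->"script" field in /etc/consul.d/proxysql.json instead of the
### mysqladmin ping used above; adjust --host per service entry.
### Fails unless a real read succeeds through ProxySQL.
mysql --host=192.168.1.120 --port=6033 --user=root --password=123 -NB \
      -e "SELECT 1 FROM test.heartbeat LIMIT 1" >/dev/null 2>&1 || exit 2
exit 0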


ProxySQL and Percona XtraDB Cluster (Galera) Integration

MySQL Performance Blog - Thu, 2016-09-15 22:37

In this post, we’ll discuss how an integrated ProxySQL and Percona XtraDB Cluster (Galera) helps manage node states and failovers.

ProxySQL is designed not to perform any specialized operations on the servers with which it communicates. Instead, it uses an event scheduler to extend its functionality and cover any special needs.

Given that ProxySQL does not natively manage specialized products like Percona XtraDB Cluster, supporting them requires designing and implementing good, efficient extensions.

In this article, I will illustrate how Percona XtraDB Cluster/Galera can be integrated with ProxySQL to get the best from both.

Brief digression

Before discussing their integration, we need to review a couple of important ProxySQL concepts. ProxySQL has a key logical component: the Hostgroup (HG).

A hostgroup is a relation of:

+------------+       +------------------------+
| Host group +------>| Server (1:N)           |
+------------+       +------------------------+

In ProxySQL, QueryRules (QR) can be directly mapped to an HG. Using QRs, you can define that a specific user goes ONLY to that HG. For instance, you may want user app1_user to go only to servers A-B-C. Simply set a QR that says app1_user has destination hostgroup 5, where HG 5 contains servers A-B-C:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',5,3306,10);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('app1_user',5,1);

Easy, isn’t it?
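One operational note, consistent with the ProxySQL setup shown earlier in this digest: changes like these live in the admin database until you load them to runtime, so after inserts like the ones above (and in the RHG example below), remember to run:

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;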

Another important concept in ProxySQL also related to HG is ReplicationHostgroup(s) (RHG). This is a special HG that ProxySQL uses to automatically manage the nodes that are connected by replication and configured in Write/Read and Read_only mode.

What does this mean? Let’s say you have four nodes A-B-C-D, connected by standard asynchronous replication. A is the master and B-C-D are the slaves. What you want is to have your application pointing writes to server A, and reads to B-C (keeping D as a backup slave). Also, you don’t want to have any reads go to B-C if the replication delay is more than two seconds.

Using an RHG in conjunction with HGs, ProxySQL can manage all of this for you. Simply instruct the proxy to:

  1. Use RHG
  2. Define the value of the maximum latency

Using the example above:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.5',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.6',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.7',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.8',10,3306,10,2);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('app1_user',5,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('app1_user',6,1);
INSERT INTO mysql_replication_hostgroups VALUES (5,6,'');

From now on, ProxySQL will split the R/W traffic using the RHG and the nodes defined in HG 5. The flexibility introduced by using HGs is obviously not limited to what I mention here. It will play a good part in the integration of Percona XtraDB Cluster and ProxySQL, as I illustrate below.

Percona XtraDB Cluster/Galera Integration

In an XtraDB cluster, a node has many different states and conditions that affect whether and how your application operates on it.

The most common one is when a node becomes a DONOR. If you’ve ever installed Percona XtraDB Cluster (or any Galera implementation), you’ve faced the situation where a node that becomes a DONOR changes its state to DESYNC. If the node is under a heavy load, the DONOR process might affect the node itself.

But that is just one of the possible node states:

  • A node can be JOINED but not synced
  • It can have wsrep_rejectqueries, wsrep_donorrejectqueries, or wsrep_ready (off) set
  • It can be in a different segment
  • The number of nodes per segment is relevant.

To show what can be done and how, we will use the following setup:

  • Five nodes
  • Two segments
  • Applications requiring R/W split

And two options:

  • Single writer node
  • Multiple writer nodes

We’ll analyze how the proxy behaves under the use of a script run by the ProxySQL scheduler.

The use of a script is necessary for ProxySQL to respond correctly to Percona XtraDB Cluster state changes. ProxySQL ships with two scripts for Galera, but both are too basic and don’t consider a lot of relevant conditions. I’ve written a more complete script, galera_check.pl, available at https://github.com/Tusamarco/proxy_sql_tools

This script is a prototype and requires QA and debugging, but it is still more powerful than the default ones.

The script is designed to manage X number of nodes that belong to a given HG. It works per HG, performing isolated actions/checks on each one, and it is not possible to have more than one check running on the same HG. To prevent duplicates, each check creates a lock file {proxysql_galera_check_${hg}.pid}.

galera_check connects to the ProxySQL node and retrieves all the information regarding the nodes and the ProxySQL configuration. It then checks each node in parallel, retrieving its status and configuration. galera_check analyzes and manages the following node states:

  • read_only
  • wsrep_status
  • wsrep_rejectqueries
  • wsrep_donorrejectqueries
  • wsrep_connected
  • wsrep_desinccount
  • wsrep_ready
  • wsrep_provider
  • wsrep_segment
  • Number of nodes per segment
  • Retry loop

As mentioned, the number of nodes inside a segment is relevant. If a node is the only one in a segment, the check behaves accordingly. For example, if a node is the only one in the MAIN segment, it will not put the node in OFFLINE_SOFT when the node becomes a donor, to prevent the cluster from becoming unavailable for applications.

The script allows you to declare a segment as MAIN, which is quite useful when managing production and DR sites, as the script manages the segment acting as main in a more conservative way. The check can be configured to perform retries after a given interval, where the interval is the time defined in the ProxySQL scheduler. As such, if the check is set to have two retries for UP and three for DOWN, it will loop through those retries before changing anything.

The retry loop matters because Percona XtraDB Cluster/Galera performs some actions under the hood, some of them not totally correct, and retries are useful in the uncommon circumstances where Galera behaves oddly. For example, whenever a node is set to READ_ONLY=1, Galera desyncs and resyncs it. A check that doesn’t take this into account would set the node to OFFLINE and back for no reason.

Another important differentiation of this check is that it uses special HGs for maintenance, all in the 9000 range. So if a node belongs to HG 10 and the check needs to put it in maintenance mode, the node is moved to HG 9010. Once all is normal again, the node is put back into its original HG.

This check does NOT modify any node states. This means it will NOT modify any variables or settings in the original node. It will ONLY change node states in ProxySQL.

Multi-writer mode

The recommended way to use Galera is in multi-writer mode. You can then play with the weights to have one node act as the MAIN node and prevent/reduce certification failures and brute-force aborts (BF aborts) from Percona XtraDB Cluster. Use this configuration:

DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=500;
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',501,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',501,3306,1);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

In this test, we will NOT use Replication HostGroup. We will do that later when testing a single writer. For now, we’ll focus on multi-writer.

Segment 1 covers HG 500 and HG 501, while segment 2 covers only HG 501. The weights of the servers in HG 500 progress from 1 to 1 billion in order to reduce possible random writes to the non-main nodes.

As such, the nodes and weights are:

  • HG 500
    • S1 192.168.1.5 – 1,000,000,000
    • S1 192.168.1.6 – 1,000,000
    • S1 192.168.1.7 – 100
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1
  • HG 501
    • S1 192.168.1.5 – 100
    • S1 192.168.1.6 – 1,000,000,000
    • S1 192.168.1.7 – 1,000,000,000
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1

The following command shows what ProxySQL is doing:

watch -n 1 'mysql -h 127.0.0.1 -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (500,501,9500,9501) order by hostgroup_id,hostname ;"'

Download the check from GitHub (https://github.com/Tusamarco/proxy_sql_tools) and activate it in ProxySQL. Be sure to set the parameters that match your installation:

delete from scheduler where id=10;
INSERT INTO scheduler (id,active,interval_ms,filename,arg1)
  VALUES (10,0,2000,"/var/lib/proxysql/galera_check.pl",
          "-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog");
LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK;

If you want to activate it:

update scheduler set active=1 where id=10;
LOAD SCHEDULER TO RUNTIME;

The following is the kind of scenario we have:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0        | 0          | 0           | 413 |
| 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0        | 0          | 0           | 420 |
| 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0        | 0          | 0           | 227 |
| 500 | 192.168.1.6 | 3306 | ONLINE | 0 | 10 | 10 | 0 | 12654    | 1016975    | 0           | 230 |
| 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 9  | 29 | 0 | 107358   | 8629123    | 0           | 206 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 4  | 6  | 0 | 12602425 | 613371057  | 34467286486 | 413 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 6  | 7  | 0 | 12582617 | 612422028  | 34409606321 | 420 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 6  | 6  | 0 | 18580675 | 905464967  | 50824195445 | 227 |
| 501 | 192.168.1.6 | 3306 | ONLINE | 0 | 6  | 14 | 0 | 18571127 | 905075154  | 50814832276 | 230 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1  | 10 | 0 | 169570   | 8255821    | 462706881   | 206 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

To generate a load, use the following commands (or whatever you like, but use a different one for read-only and reads/writes):

### Write load
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 \
  --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql \
  --oltp-tables-count=50 --oltp-table-size=50000 --max-requests=0 --max-time=9000 \
  --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform \
  --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 \
  --report-interval=10 --mysql-ignore-errors=all run

### Read-only load
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 \
  --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql \
  --oltp-tables-count=50 --oltp-table-size=50000 --max-requests=0 --max-time=9000 \
  --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query \
  --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

The most common thing that can happen to a cluster node is to become a donor. This is a planned activity for Percona XtraDB Cluster, and it is supposed to be managed in the least harmful way.

We’re going to simulate crashing a node and forcing it to elect our main node as DONOR (the one with the highest WEIGHT).

To do so, we need to have the parameter wsrep_sst_donor set.
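wsrep_sst_donor takes the wsrep_node_name of the preferred donor; it can be set in my.cnf or, as a sketch (assuming 192.168.1.5 is named node1), at runtime:

### On each node that should prefer node1 as its donor:
SET GLOBAL wsrep_sst_donor='node1';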

show global variables like 'wsrep_sst_donor';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_sst_donor | node1 |  <---
+-----------------+-------+

Activate the check if not already done:

update scheduler set active=1 where id=10;

And now run traffic. Check load:

select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host;
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0  | 30 | 0 | 112662   | 9055479   | 0           | 120 | <--- our Donor
| 500 | 192.168.1.6 | 3306 | ONLINE | 0  | 10 | 10 | 0 | 12654    | 1016975   | 0           | 111 |
| 500 | 192.168.1.7 | 3306 | ONLINE | 0  | 0  | 0  | 0 | 0        | 0         | 0           | 115 |
| 500 | 192.168.1.8 | 3306 | ONLINE | 0  | 0  | 0  | 0 | 0        | 0         | 0           | 316 |
| 500 | 192.168.1.9 | 3306 | ONLINE | 0  | 0  | 0  | 0 | 0        | 0         | 0           | 329 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 0  | 1  | 10 | 0 | 257271   | 12533763  | 714473854   | 120 |
| 501 | 192.168.1.6 | 3306 | ONLINE | 0  | 10 | 18 | 0 | 18881582 | 920200116 | 51688974309 | 111 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 3  | 6  | 9  | 0 | 18927077 | 922317772 | 51794504662 | 115 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 0  | 1  | 8  | 0 | 12595556 | 613054573 | 34447564440 | 316 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 1  | 3  | 6  | 0 | 12634435 | 614936148 | 34560620180 | 329 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Now on one of the nodes:

  1. Kill mysql
  2. Remove the content of the data directory
  3. Restart the node

The node will go into SST, and our galera_check script will manage it:

+--------------+-------------+--------------+------------+--------------------------------------------------+
| hostgroup_id | hostname    | status       | weight     | comment                                          |
+--------------+-------------+--------------+------------+--------------------------------------------------+
| 500 | 192.168.1.5 | OFFLINE_SOFT | 1000000000 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <---- the donor
| 500 | 192.168.1.6 | ONLINE       | 1000000    |                                                  |
| 500 | 192.168.1.7 | ONLINE       | 100        |                                                  |
| 500 | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 500 | 192.168.1.9 | ONLINE       | 1          |                                                  |
| 501 | 192.168.1.5 | OFFLINE_SOFT | 100        | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501 | 192.168.1.6 | ONLINE       | 1000000000 |                                                  |
| 501 | 192.168.1.7 | ONLINE       | 1000000000 |                                                  |
| 501 | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 501 | 192.168.1.9 | ONLINE       | 1          |                                                  |
+--------------+-------------+--------------+------------+--------------------------------------------------+

We can also check the galera_check log and see what happened:

2016/09/02 16:13:27.298:[WARN] Move node:192.168.1.5;3306;500;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:13:27.303:[WARN] Move node:192.168.1.5;3306;501;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

The node will remain in OFFLINE_SOFT while the other node (192.168.1.6, which has the second-highest weight) serves the writes, for as long as the node remains in DONOR state.

All as expected: the node was set to the OFFLINE_SOFT state, which means existing connections were allowed to finish while the node accepted no NEW connections.

As soon as the node stopped sending data to the joiner, it was moved back and traffic restarted:

2016/09/02 16:14:58.239:[WARN] Move node:192.168.1.5;3306;500;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:14:58.243:[WARN] Move node:192.168.1.5;3306;501;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE | 6 | 1 | 37    | 0   | 153882   | 12368557   | 0           | 72  | <---
| 500 | 192.168.1.6 | 3306 | ONLINE | 1 | 9 | 10    | 0   | 16008    | 1286492    | 0           | 42  |
| 500 | 192.168.1.7 | 3306 | ONLINE | 1 | 0 | 1     | 0   | 1398     | 112371     | 0           | 96  |
| 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 24545 | 791 | 24545    | 122725     | 0           | 359 |
| 500 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1     | 0   | 15108    | 1214366    | 0           | 271 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 11    | 0   | 2626808  | 128001112  | 7561278884  | 72  |
| 501 | 192.168.1.6 | 3306 | ONLINE | 5 | 7 | 20    | 0   | 28629516 | 1394974468 | 79289633420 | 42  |
| 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 8 | 10    | 0   | 29585925 | 1441400648 | 81976494740 | 96  |
| 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 16779 | 954 | 12672983 | 616826002  | 34622768228 | 359 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 4 | 6     | 0   | 13567512 | 660472589  | 37267991677 | 271 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

This was easy, and more or less managed by the standard script. But what would happen if my donor was set to DO NOT serve query when in the DONOR state?

Wait, what?? Yes, Percona XtraDB Cluster (and Galera in general) can be set to refuse any query when the node goes into DONOR state. If not managed, this can cause issues, as the node simply rejects queries while ProxySQL still sees it as alive.

Let me show you:

show global variables like 'wsrep_sst_donor_rejects_queries';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| wsrep_sst_donor_rejects_queries | ON    |
+---------------------------------+-------+

For the moment, let’s deactivate the check. Then do the same stop, delete the data dir, and restart the node; SST takes place.
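Deactivation mirrors the activation step shown earlier:

update scheduler set active=0 where id=10;
LOAD SCHEDULER TO RUNTIME;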

Sysbench will report:

ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3
ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3

But ProxySQL?

+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE  | 0 | 0  | 101    | 0    | 186331   | 14972717   | 0            | 118 | <-- no writes in either HG
| 500 | 192.168.1.6 | 3306 | ONLINE  | 0 | 9  | 10     | 0    | 20514    | 1648665    | 0            | 171 |
| 500 | 192.168.1.7 | 3306 | ONLINE  | 0 | 1  | 3      | 0    | 5881     | 472629     | 0            | 134 |
| 500 | 192.168.1.8 | 3306 | ONLINE  | 0 | 0  | 205451 | 1264 | 205451   | 1027255    | 0            | 341 |
| 500 | 192.168.1.9 | 3306 | ONLINE  | 0 | 1  | 2      | 0    | 15642    | 1257277    | 0            | 459 |
| 501 | 192.168.1.5 | 3306 | ONLINE  | 1 | 0  | 13949  | 0    | 4903347  | 238627310  | 14089708430  | 118 |
| 501 | 192.168.1.6 | 3306 | ONLINE  | 2 | 10 | 20     | 0    | 37012174 | 1803380964 | 103269634626 | 171 |
| 501 | 192.168.1.7 | 3306 | ONLINE  | 2 | 11 | 13     | 0    | 38782923 | 1889507208 | 108288676435 | 134 |
| 501 | 192.168.1.8 | 3306 | SHUNNED | 0 | 0  | 208452 | 1506 | 12864656 | 626156995  | 34622768228  | 341 |
| 501 | 192.168.1.9 | 3306 | ONLINE  | 1 | 3  | 6      | 0    | 14451462 | 703534884  | 39837663734  | 459 |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

mysql> select * from mysql_server_connect_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9') order by time_start_us desc limit 10;
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                                                                          |
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827444621954 | 1359 | NULL |
| 192.168.1.8 | 3306 | 1472827444618883 | 0    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827444615819 | 433  | NULL |
| 192.168.1.6 | 3306 | 1472827444612722 | 538  | NULL |
| 192.168.1.5 | 3306 | 1472827444606560 | 473  | NULL | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827384621463 | 1286 | NULL |
| 192.168.1.8 | 3306 | 1472827384618442 | 0    | Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 107 |
| 192.168.1.7 | 3306 | 1472827384615317 | 419  | NULL |
| 192.168.1.6 | 3306 | 1472827384612241 | 415  | NULL |
| 192.168.1.5 | 3306 | 1472827384606117 | 454  | NULL | <-- donor is seen as up
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+

select * from mysql_server_ping_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9') order by time_start_us desc limit 10;
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827475062217 | 311 | NULL |
| 192.168.1.8 | 3306 | 1472827475060617 | 0   | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827475059073 | 108 | NULL |
| 192.168.1.6 | 3306 | 1472827475057281 | 102 | NULL |
| 192.168.1.5 | 3306 | 1472827475054188 | 74  | NULL | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827445061877 | 491 | NULL |
| 192.168.1.8 | 3306 | 1472827445060254 | 0   | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827445058688 | 53  | NULL |
| 192.168.1.6 | 3306 | 1472827445057124 | 131 | NULL |
| 192.168.1.5 | 3306 | 1472827445054015 | 98  | NULL | <-- donor is seen as up
+-------------+------+------------------+----------------------+------------------------------------------------------+

As you can see, all seems OK. Let’s turn on galera_check and see what happens when we run some read and write loads.

And now let me do the stop-delete-restart-SST process again:

kill -9 <mysqld_safe_pid> <mysqld_pid>
rm -fr data/*
rm -fr logs/*
sleep 2
./start

As soon as the node goes down, ProxySQL shuns the node.

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE  | 7 | 3  | 34 | 0 | 21570   | 1733833   | 0           | 146 |
| 500 | 192.168.1.6 | 3306 | ONLINE  | 1 | 8  | 12 | 0 | 9294    | 747063    | 0           | 129 |
| 500 | 192.168.1.7 | 3306 | ONLINE  | 1 | 0  | 4  | 0 | 3396    | 272950    | 0           | 89  |
| 500 | 192.168.1.8 | 3306 | SHUNNED | 0 | 0  | 1  | 6 | 12      | 966       | 0           | 326 | <-- crashed
| 500 | 192.168.1.9 | 3306 | ONLINE  | 1 | 0  | 2  | 0 | 246     | 19767     | 0           | 286 |
| 501 | 192.168.1.5 | 3306 | ONLINE  | 0 | 1  | 2  | 0 | 772203  | 37617973  | 2315131214  | 146 |
| 501 | 192.168.1.6 | 3306 | ONLINE  | 9 | 3  | 12 | 0 | 3439458 | 167514166 | 10138636314 | 129 |
| 501 | 192.168.1.7 | 3306 | ONLINE  | 1 | 12 | 13 | 0 | 3183822 | 155064971 | 9394612877  | 89  |
| 501 | 192.168.1.8 | 3306 | SHUNNED | 0 | 0  | 1  | 6 | 11429   | 560352    | 35350726    | 326 | <-- crashed
| 501 | 192.168.1.9 | 3306 | ONLINE  | 0 | 1  | 1  | 0 | 312253  | 15227786  | 941110520   | 286 |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Immediately after, galera_check identifies that the node is requesting SST and that the DONOR is our writer. Given that the donor is NOT the only writer in the HG and has wsrep_sst_donor_rejects_queries active, it cannot simply be set to OFFLINE_SOFT; and we do not want ProxySQL to consider it OFFLINE_HARD (because it is not).

As such, the script moves it to a special HG:

2016/09/04 16:11:22.091:[WARN] Move node:192.168.1.5;3306;500;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9500 WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:11:22.097:[WARN] Move node:192.168.1.5;3306;501;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9501 WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| hostgroup_id | hostname    | port | status | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                          |
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| 500  | 192.168.1.6 | 3306 | ONLINE | 1000000    | 0 | 1000 | 0 | 0 | 0 |                                                  |
| 500  | 192.168.1.7 | 3306 | ONLINE | 100        | 0 | 1000 | 0 | 0 | 0 |                                                  |
| 500  | 192.168.1.8 | 3306 | ONLINE | 1          | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 500  | 192.168.1.9 | 3306 | ONLINE | 1          | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501  | 192.168.1.6 | 3306 | ONLINE | 1000000000 | 0 | 1000 | 0 | 0 | 0 |                                                  |
| 501  | 192.168.1.7 | 3306 | ONLINE | 1000000000 | 0 | 1000 | 0 | 0 | 0 |                                                  |
| 501  | 192.168.1.9 | 3306 | ONLINE | 1          | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 9500 | 192.168.1.5 | 3306 | ONLINE | 1000000000 | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
| 9501 | 192.168.1.5 | 3306 | ONLINE | 100        | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+

The Donor continues to serve the Joiner, but applications won’t see it.

What applications see is also very important. Applications doing WRITEs will see:

[ 10s] threads: 10, tps: 9.50, reads: 94.50, writes: 42.00, response time: 1175.77ms (95%), errors: 0.00, reconnects: 0.00
...
[ 40s] threads: 10, tps: 2.80, reads: 26.10, writes: 11.60, response time: 3491.45ms (95%), errors: 0.00, reconnects: 0.10
[ 50s] threads: 10, tps: 4.80, reads: 50.40, writes: 22.40, response time: 10062.13ms (95%), errors: 0.80, reconnects: 351.60 <--- Main writer moved to another HG
[ 60s] threads: 10, tps: 5.90, reads: 53.10, writes: 23.60, response time: 2869.82ms (95%), errors: 0.00, reconnects: 0.00
...

When the writer shifts from one node to another, the application has to manage the RETRY, but it is only for a short time and causes limited impact on the production flow.

Application readers see no errors:

[ 10s] threads: 10, tps: 0.00, reads: 13007.31, writes: 0.00, response time: 9.13ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 0.00, reads: 9613.70, writes: 0.00, response time: 10.66ms (95%), errors: 0.00, reconnects: 0.20 <-- just a glitch in reconnect
[ 60s] threads: 10, tps: 0.00, reads: 10807.90, writes: 0.00, response time: 11.07ms (95%), errors: 0.00, reconnects: 0.20
[ 70s] threads: 10, tps: 0.00, reads: 9082.61, writes: 0.00, response time: 23.62ms (95%), errors: 0.00, reconnects: 0.00
...
[ 390s] threads: 10, tps: 0.00, reads: 13050.80, writes: 0.00, response time: 8.97ms (95%), errors: 0.00, reconnects: 0.00

When the donor finishes providing SST, it comes back, and the script manages it: galera_check puts it back in the right HG:

2016/09/04 16:12:34.266:[WARN] Move node:192.168.1.5;3306;9500;1010 SQL: UPDATE mysql_servers SET hostgroup_id=500 WHERE hostgroup_id=9500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:12:34.270:[WARN] Move node:192.168.1.5;3306;9501;1010 SQL: UPDATE mysql_servers SET hostgroup_id=501 WHERE hostgroup_id=9501 AND hostname='192.168.1.5' AND port='3306'

The crashed node is restarted by the SST process, and the node comes up. But if the level of load in the cluster is mid/high, it will remain in the JOINED state for some time, becoming visible to ProxySQL again. ProxySQL, however, will not correctly recognize this state:

2016-09-04 16:17:15 21035 [Note] WSREP: 3.2 (node4): State transfer from 1.1 (node1) complete.
2016-09-04 16:17:15 21035 [Note] WSREP: Shifting JOINER -> JOINED (TO: 254515)

To avoid this issue, the script moves it to the special HG, allowing it to recover without interfering with the real load.

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500  | 192.168.1.5 | 3306 | ONLINE | 6 | 2  | 15 | 0 | 3000    | 241060    | 0           | 141 |
| 500  | 192.168.1.6 | 3306 | ONLINE | 1 | 9  | 13 | 0 | 13128   | 1055268   | 0           | 84  |
| 500  | 192.168.1.7 | 3306 | ONLINE | 1 | 0  | 4  | 0 | 3756    | 301874    | 0           | 106 |
| 500  | 192.168.1.9 | 3306 | ONLINE | 1 | 0  | 2  | 0 | 4080    | 327872    | 0           | 278 |
| 501  | 192.168.1.5 | 3306 | ONLINE | 1 | 0  | 2  | 0 | 256753  | 12508935  | 772048259   | 141 |
| 501  | 192.168.1.6 | 3306 | ONLINE | 4 | 8  | 12 | 0 | 5116844 | 249191524 | 15100617833 | 84  |
| 501  | 192.168.1.7 | 3306 | ONLINE | 2 | 11 | 13 | 0 | 4739756 | 230863200 | 13997231724 | 106 |
| 501  | 192.168.1.9 | 3306 | ONLINE | 0 | 1  | 1  | 0 | 496524  | 24214563  | 1496482104  | 278 |
| 9500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0       | 0         | 0           | 331 | <-- Joined, not synced
| 9501 | 192.168.1.8 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0       | 0         | 0           | 331 | <-- Joined, not synced
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Once the node fully recovers, galera_check puts it back in the original HG, ready to serve requests:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 1  | 15 | 0 | 3444    | 276758    | 0           | 130 |
| 500 | 192.168.1.6 | 3306 | ONLINE | 0 | 9  | 13 | 0 | 13200   | 1061056   | 0           | 158 |
| 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 0  | 4  | 0 | 3828    | 307662    | 0           | 139 |
| 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0       | 0         | 0           | 0   | <-- up again
| 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 0  | 2  | 0 | 4086    | 328355    | 0           | 336 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1  | 2  | 0 | 286349  | 13951366  | 861638962   | 130 |
| 501 | 192.168.1.6 | 3306 | ONLINE | 0 | 12 | 12 | 0 | 5239212 | 255148806 | 15460951262 | 158 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 13 | 13 | 0 | 4849970 | 236234446 | 14323937975 | 139 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 0  | 0  | 0 | 0       | 0         | 0           | 0   | <-- up again
| 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1  | 1  | 0 | 507910  | 24768898  | 1530841172  | 336 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

A summary of the logical steps is:

                    +---------+
                    |  Crash  |
                    +----+----+
                         |
                         v
                +--------+-------+
                |    ProxySQL    |
                |  shun crashed  |
                |      node      |
                +--------+-------+
                         |
                         v
    +--------------------+--------------------+
    |    Donor has one of the following?      |
    |    wsrep_sst_donor_rejects_queries      |
    |                  OR                     |
    |    wsrep_reject_queries                 |
    +---------+----------------------+--------+
              |No                    |Yes
              v                      v
    +---------+------+      +--------+-------+
    |  Galera_check  |      |  Galera_check  |
    |  put the donor |      |  put the donor |
    | in OFFLINE_SOFT|      |  in special HG |
    +---------+------+      +--------+-------+
              |                      |
              v                      v
    +---------+----------------------+--------+
    |              Donor SST ends             |
    +---------+----------------------+--------+
              |                      |
              v                      v
    +---------+------+      +--------+-------+
    |  Galera_check  |      |  Galera_check  |
    |  put the donor |      |  put the donor |
    |     ONLINE     |      | in original HG |
    +----------------+      +----------------+

    +-----------------------------------------+
    |             Crashed SST ends            |
    +--------------------+--------------------+
                         |
                         v
    +--------------------+--------------------+
    |    Crashed node back but not sync?      |
    +---------+----------------------+--------+
              |No                    |Yes
              v                      v
    +---------+------+      +--------+-------+
    |  Galera_check  |      |  Galera_check  |
    |  put the node  |      |  put the node  |
    |  back orig. HG |      |  in special HG |
    +---------+------+      +--------+-------+
              |                      |
              +----------+-----------+
                         v
                    +----+----+
                    |   END   |
                    +---------+

As mentioned, galera_check can manage several node states.

Another case is when we need a node to reject ANY query. We might need that for several reasons, including preparing the node for maintenance (or whatever else).

In Percona XtraDB Cluster (and other Galera implementations) we can set the value of wsrep_reject_queries to:

  • NONE
  • ALL
  • ALL_KILL

Let’s see how it works. Run some load, then on the main writer node (192.168.1.5):

set global wsrep_reject_queries=ALL;

This blocks any new query execution while allowing already-running queries to complete. Do a simple select on the node:

(root@localhost:pm) [test]>select * from tbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

ProxySQL won’t see these conditions:

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005467628001 | 35                   | NULL       | <--- ping ok
| 192.168.1.5 | 3306 | 1473005437628014 | 154                  | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005467369575 | 246                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005407369441 | 353                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+

The script galera_check will instead manage it:

+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | OFFLINE_SOFT | 0  | 0  | 8343 | 0 | 10821   | 240870    | 0           | 93  | <--- galera_check put it OFFLINE
| 500 | 192.168.1.6 | 3306 | ONLINE       | 10 | 0  | 15   | 0 | 48012   | 3859402   | 0           | 38  | <--- writer
| 500 | 192.168.1.7 | 3306 | ONLINE       | 0  | 1  | 6    | 0 | 14712   | 1182364   | 0           | 54  |
| 500 | 192.168.1.8 | 3306 | ONLINE       | 0  | 1  | 2    | 0 | 1092    | 87758     | 0           | 602 |
| 500 | 192.168.1.9 | 3306 | ONLINE       | 0  | 1  | 4    | 0 | 5352    | 430152    | 0           | 238 |
| 501 | 192.168.1.5 | 3306 | OFFLINE_SOFT | 0  | 0  | 1410 | 0 | 197909  | 9638665   | 597013919   | 93  |
| 501 | 192.168.1.6 | 3306 | ONLINE       | 2  | 10 | 12   | 0 | 7822682 | 380980455 | 23208091727 | 38  |
| 501 | 192.168.1.7 | 3306 | ONLINE       | 0  | 13 | 13   | 0 | 7267507 | 353962618 | 21577881545 | 54  |
| 501 | 192.168.1.8 | 3306 | ONLINE       | 0  | 1  | 1    | 0 | 241641  | 11779770  | 738145270   | 602 |
| 501 | 192.168.1.9 | 3306 | ONLINE       | 1  | 0  | 1    | 0 | 756415  | 36880233  | 2290165636  | 238 |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the script puts the node in OFFLINE_SOFT, given that wsrep_reject_queries=ALL means "do not accept new queries, but complete the existing ones", which matches the OFFLINE_SOFT semantics.

The script also manages the case of set global wsrep_reject_queries=ALL_KILL;. From ProxySQL's point of view, this condition doesn't exist either:

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005827629069 | 59                   | NULL       | <--- ping ok
| 192.168.1.5 | 3306 | 1473005797628988 | 57                   | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005827370084 | 370                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005767369915 | 243                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 9500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <--- galera_check put it in the special HG
| 9501 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

The difference here is that the script moves the node to the special HG to isolate it, instead of leaving it in its original HG.

The integration between ProxySQL and Percona XtraDB Cluster (Galera) works perfectly for multi-writer setups if you have a script like galera_check that correctly manages the different Percona XtraDB Cluster/Galera states.

ProxySQL and PXC using Replication HostGroup

Sometimes we might need 100% of the writes to go to only one node at a time. As explained above, ProxySQL uses weights to redirect a percentage of the load to a specific node.

In most cases, it is enough to set the weight of the main writer to a very high value (like 10 billion) and the next node to one thousand, to almost achieve a single writer.

But this is not 100% effective: it still allows ProxySQL to send a query to the other node(s) once every X times. To keep it consistent with the ProxySQL logic, the solution is to use replication hostgroups.

Replication HGs are special HGs that the proxy treats as connected for R/W operations. ProxySQL analyzes the value of the read_only variable and assigns to the reader HG the nodes that have it enabled.

The node with READ_ONLY=0 resides in both HGs. As such, the first thing we need to do is tell ProxySQL that HG 500 and HG 501 are replication HGs:

INSERT INTO mysql_replication_hostgroups VALUES (500,501,'');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

select * from mysql_replication_hostgroups;
+------------------+------------------+---------+
| writer_hostgroup | reader_hostgroup | comment |
+------------------+------------------+---------+
| 500              | 501              |         |
+------------------+------------------+---------+

Now, whenever I set the value of READ_ONLY on a node, ProxySQL will move the node accordingly. Let's see how. Current state:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 6 | 1 | 7 | 0 | 16386 | 1317177 | 0 | 97 | | 500 | 192.168.1.6 | 3306 | ONLINE | 1 | 9 | 15 | 0 | 73764 | 5929366 | 0 | 181 | | 500 | 192.168.1.7 | 3306 | ONLINE | 1 | 0 | 6 | 0 | 18012 | 1447598 | 0 | 64 | | 500 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 2 | 0 | 1440 | 115728 | 0 | 341 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1210029 | 58927817 | 3706882671 | 97 | | 501 | 192.168.1.6 | 3306 | ONLINE | 1 | 11 | 12 | 0 | 16390790 | 798382865 | 49037691590 | 181 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 12 | 13 | 0 | 15357779 | 748038558 | 45950863867 | 64 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1247662 | 60752227 | 3808131279 | 341 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1766309 | 86046839 | 5374169120 | 422 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Set global READ_ONLY=1 on the following nodes: 192.168.1.6/7/8/9.

After:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 20 | 0 | 25980 | 2088346 | 0 | 93 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1787979 | 87010074 | 5473781192 | 93 | | 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 8 | 12 | 0 | 18815907 | 916547402 | 56379724890 | 79 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 12 | 13 | 0 | 17580636 | 856336023 | 52670114510 | 131 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 15324 | 746109 | 46760779 | 822 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 16210 | 789999 | 49940867 | 679 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

If a reader node crashes in this scenario, the application will not suffer at all, given the redundancy.

But if the writer crashes, THEN the issue exists, because there will be NO node available to manage the failover. The solution is either to perform the node election manually, or to have the script elect the node with the lowest read weight in the same segment as the new writer.

Below is what happens when a node crashes (bird's-eye view):

Crash
  -> ProxySQL shuns the crashed node
  -> Does the writer HG still have another active node?
       Yes -> ProxySQL redirects the load there -> END
       No  -> ProxySQL stops serving writes
              -> ProxySQL checks READ_ONLY on the reader HG
              -> Is there any node with READ_ONLY = 0?
                   Yes -> ProxySQL moves that node to the writer HG -> END
                   No  -> ProxySQL continues to refuse writes and checks again

The script should act immediately after ProxySQL SHUNs the node, simply replacing READ_ONLY=1 with READ_ONLY=0 on the reader node with the lowest read weight.

ProxySQL will do the rest, copying the node into the writer HG while keeping the low weight, so that WHEN/IF the original node comes back, the new node will not compete with it for traffic.
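To make the election concrete, a helper query along these lines (a sketch against the ProxySQL admin interface, mirroring what galera_check does internally) would pick the candidate:

-- Pick the ONLINE reader with the lowest weight in the reader HG (501);
-- READ_ONLY=0 is then set on that node to promote it.
SELECT hostname, port, weight
  FROM runtime_mysql_servers
 WHERE hostgroup_id = 501
   AND status = 'ONLINE'
 ORDER BY weight ASC
 LIMIT 1;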

With that special function included in the check, the script supports automatic failover. This experimental feature is active only if explicitly enabled in the parameters that the scheduler passes to the script: to activate it, add --active_failover to the scheduler entry. My recommendation is to keep two entries in the scheduler, activate the one with --active_failover for testing, and remember to deactivate the other one.
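For example (a sketch assuming scheduler entry IDs 10 and 20, as in the configuration shown later in this post), switching between the two entries from the ProxySQL admin interface looks like this:

-- Enable the entry that runs galera_check with --active_failover,
-- and disable the plain one.
UPDATE scheduler SET active = 1 WHERE id = 10;
UPDATE scheduler SET active = 0 WHERE id = 20;
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;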

Let's see the manual procedure first.

The process is:

  1. Generate some load
  2. Kill the writer node
  3. Manually elect a reader as writer
  4. Recover crashed node

Current load:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 30324 | 2437438 | 0 | 153 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1519612 | 74006447 | 4734427711 | 153 | | 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 8 | 12 | 0 | 7730857 | 376505014 | 24119645457 | 156 | | 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 10 | 12 | 0 | 7038332 | 342888697 | 21985442619 | 178 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 612523 | 29835858 | 1903693835 | 337 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 611021 | 29769497 | 1903180139 | 366 | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 1565987 | 76267703        | 4879938857      | 119        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8023216 | 390742215       | 25033271548     | 112        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 7306838 | 355968373       | 22827016386     | 135        |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 638326  | 31096065        | 1984732176      | 410        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 636857  | 31025014        | 1982213114      | 328        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640798571 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070610798464 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+----------------------+------------------------------------------------------+

+-------------+------+------------------+-------------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                        |
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640779903 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070580779977 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+-------------------------+------------------------------------------------------+

When the node is killed, ProxySQL shuns it and reports issues with the checks (connect and ping). During this time frame, an application that is not designed to manage retries (and possibly queue requests) will experience errors, and may crash.

Sysbench reports the errors:

Writes

[ 10s] threads: 10, tps: 6.70, reads: 68.50, writes: 30.00, response time: 1950.53ms (95%), errors: 0.00, reconnects: 0.00
...
[1090s] threads: 10, tps: 4.10, reads: 36.90, writes: 16.40, response time: 2226.45ms (95%), errors: 0.00, reconnects: 1.00 <-+ killing the node
[1100s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |
[1110s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |
[1120s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |
[1130s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |-- Gap waiting for a node to become
[1140s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |   READ_ONLY=0
[1150s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |
[1160s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00       |
[1170s] threads: 10, tps: 4.70, reads: 51.30, writes: 22.80, response time: 80430.18ms (95%), errors: 0.00, reconnects: 0.00 <-+
[1180s] threads: 10, tps: 8.90, reads: 80.10, writes: 35.60, response time: 2068.39ms (95%), errors: 0.00, reconnects: 0.00
...
[1750s] threads: 10, tps: 5.50, reads: 49.80, writes: 22.80, response time: 2266.80ms (95%), errors: 0.00, reconnects: 0.00   -- No additional errors

I decided to promote node 192.168.1.6. The reader weights were equal, so in this setup the choice made no difference.

(root@localhost:pm) [(none)]>set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

Checking ProxySQL:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 1848 | 148532 | 0 | 40 | | 501 | 192.168.1.5 | 3306 | SHUNNED | 0 | 0 | 1 | 72 | 1565987 | 76267703 | 4879938857 | 38 | | 501 | 192.168.1.6 | 3306 | ONLINE | 2 | 10 | 12 | 0 | 8843069 | 430654903 | 27597990684 | 40 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 11 | 12 | 0 | 8048826 | 392101994 | 25145582384 | 83 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 725820 | 35371512 | 2259974847 | 227 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 723582 | 35265066 | 2254824754 | 290 | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

As soon as the READ_ONLY value is modified, ProxySQL moves the node to the writer HG, and writes can take place again. At this point, production activity is recovered.

Reads had just a minor glitch:

Reads

[ 10s] threads: 10, tps: 0.00, reads: 20192.15, writes: 0.00, response time: 6.96ms (95%), errors: 0.00, reconnects: 0.00
...
[ 410s] threads: 10, tps: 0.00, reads: 16489.03, writes: 0.00, response time: 9.41ms (95%), errors: 0.00, reconnects: 2.50
...
[ 710s] threads: 10, tps: 0.00, reads: 18789.40, writes: 0.00, response time: 6.61ms (95%), errors: 0.00, reconnects: 0.00

The glitch happened when node 192.168.1.6 was copied over to HG 500, but there were no interruptions or errors. At this point, let's bring back the crashed node, which on rejoining elects Node2 (192.168.1.6) as its donor.

This was a Percona XtraDB Cluster/Galera choice, and we have to accept and manage it.

Note that other, more basic scripts would put the node in OFFLINE_SOFT, given that the node will become a DONOR.
galera_check instead recognizes that Node2 (192.168.1.6) is the only active node in the segment for the writer HG, while it is not the only one present in the reader HG.

As such, it puts the node in OFFLINE_SOFT only for the reader HG, trying to reduce the load on the node, but keeps it active in the writer HG to prevent a service interruption.

Restart the node and ask for a donor:

2016-09-05 12:21:43 8007 [Note] WSREP: Flow-control interval: [67, 67]
2016-09-05 12:21:45 8007 [Note] WSREP: Member 1.1 (node1) requested state transfer from '*any*'. Selected 0.1 (node2)(SYNCED) as donor.
2016-09-05 12:21:46 8007 [Note] WSREP: (ef248c1f, 'tcp://192.168.1.8:4567') turning message relay requesting off
2016-09-05 12:21:52 8007 [Note] WSREP: New cluster view: global state: 234bb6ed-527d-11e6-9971-e794f632b140:324329, view# 7: Primary, number of nodes: 5, my index: 3, protocol version 3

galera_check sets 192.168.1.6 to OFFLINE_SOFT only for the reader HG, and ProxySQL uses the other nodes to serve reads.

+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 7746 | 622557 | 0 | 86 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 1 | 147 | 1565987 | 76267703 | 4879938857 | 38 | | 501 | 192.168.1.6 | 3306 | OFFLINE_SOFT | 0 | 0 | 12 | 0 | 9668944 | 470878452 | 30181474498 | 86 | <-- Node offline | 501 | 192.168.1.7 | 3306 | ONLINE | 9 | 3 | 12 | 0 | 10932794 | 532558667 | 34170366564 | 62 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 816599 | 39804966 | 2545765089 | 229 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 814893 | 39724481 | 2541760230 | 248 | +-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

When the SST donor task is over, galera_check moves 192.168.1.6 back ONLINE, as expected. At the same time, it moves the recovering node to the special HG, to avoid having it included in any activity until it is ready.

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer from 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 1554 | 124909 | 0 | 35 | | 501 | 192.168.1.6 | 3306 | ONLINE | 2 | 8 | 22 | 0 | 10341612 | 503637989 | 32286072739 | 35 | | 501 | 192.168.1.7 | 3306 | ONLINE | 3 | 9 | 12 | 0 | 12058701 | 587388598 | 37696717375 | 13 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 890102 | 43389051 | 2776691164 | 355 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 887994 | 43296865 | 2772702537 | 250 | | 9500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57 | <-- Special HG for recover | 9501 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57 | <-- Special HG for recover +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Once the node is finally in SYNC with the group, it is put back ONLINE in both the reader HG and the writer HG:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <-- Back on line | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 402 | 32317 | 0 | 68 | | 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 6285 | 305823 | 19592814 | 312 | <-- Back on line | 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 6 | 22 | 0 | 10818694 | 526870710 | 33779586475 | 68 | | 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 12 | 12 | 0 | 12492316 | 608504039 | 39056093665 | 26 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 942023 | 45924082 | 2940228050 | 617 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 939975 | 45834039 | 2935816783 | 309 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ +--------------+-------------+------+--------+------------+ | hostgroup_id | hostname | port | status | weight | +--------------+-------------+------+--------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 100 | | 500 | 192.168.1.6 | 3306 | ONLINE | 1000000000 | | 501 | 192.168.1.5 | 3306 | ONLINE | 100 | | 501 | 192.168.1.6 | 3306 | ONLINE | 1000000000 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1000000000 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | +--------------+-------------+------+--------+------------+

But given that it comes back with its reader weight, it will NOT compete with the previously elected writer.

The recovered node stays on "hold", waiting for a DBA to act: either promote it back, or set it to READ_ONLY and thereby fully remove it from the writer HG.

Let's see the automatic procedure now.

This time, the script performs the failover itself. The process is:

  1. Generate some load
  2. Kill the writer node
  3. Script will do auto-failover
  4. Recover crashed node

Check our scheduler config:

+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+ | id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment | +----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+ | 10 | 1 | 2000 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --active_failover --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | | <--- Active | 20 | 0 | 1500 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | | +----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+

The active entry is the one with auto-failover. Start the load and check the current state:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 952 | 76461 | 0 | 0 | | 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 53137 | 2587784 | 165811100 | 167 | | 501 | 192.168.1.6 | 3306 | ONLINE | 5 | 5 | 11 | 0 | 283496 | 13815077 | 891230826 | 109 | | 501 | 192.168.1.7 | 3306 | ONLINE | 3 | 7 | 10 | 0 | 503516 | 24519457 | 1576198138 | 151 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 21952 | 1068972 | 68554796 | 300 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 21314 | 1038593 | 67043935 | 289 | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 60 | 4826 | 0 | 0 | | 501 | 192.168.1.5 | 3306 | SHUNNED | 0 | 0 | 1 | 11 | 177099 | 8626778 | 552221651 | 30 | | 501 | 192.168.1.6 | 3306 | ONLINE | 3 | 7 | 11 | 0 | 956724 | 46601110 | 3002941482 | 49 | | 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 8 | 10 | 0 | 1115685 | 54342756 | 3497575125 | 42 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 76289 | 3721419 | 240157393 | 308 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 75803 | 3686067 | 236382784 | 231 | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

When the node is killed, it is SHUNNED; but this time, the script has already set the new node 192.168.1.6 to ONLINE. See the script log:

2016/09/08 14:04:02.494:[INFO] END EXECUTION Total Time:102.347850799561
2016/09/08 14:04:04.478:[INFO] This Node Try to become a WRITER set READ_ONLY to 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] This Node NOW HAS READ_ONLY = 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] END EXECUTION Total Time:71.8140602111816

More importantly, let’s look at the application experience:

Writes

[ 10s] threads: 10, tps: 9.40, reads: 93.60, writes: 41.60, response time: 1317.41ms (95%), errors: 0.00, reconnects: 0.00
[ 20s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1350.96ms (95%), errors: 0.00, reconnects: 0.00
[ 30s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1317.81ms (95%), errors: 0.00, reconnects: 0.00
[ 40s] threads: 10, tps: 7.80, reads: 70.20, writes: 31.20, response time: 1407.51ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 2259.35ms (95%), errors: 0.00, reconnects: 0.00
[ 60s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 3275.78ms (95%), errors: 0.00, reconnects: 0.00
[ 70s] threads: 10, tps: 5.70, reads: 60.30, writes: 26.80, response time: 1492.56ms (95%), errors: 0.00, reconnects: 1.00 <-- just a reconnect experience
[ 80s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 7959.74ms (95%), errors: 0.00, reconnects: 0.00
[ 90s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 2109.03ms (95%), errors: 0.00, reconnects: 0.00
[ 100s] threads: 10, tps: 6.40, reads: 57.60, writes: 25.60, response time: 1883.96ms (95%), errors: 0.00, reconnects: 0.00
[ 110s] threads: 10, tps: 5.60, reads: 50.40, writes: 22.40, response time: 2167.27ms (95%), errors: 0.00, reconnects: 0.00

With no errors and no huge delay, our application (designed to manage reconnections) experienced only a glitch and had to reconnect.

Reads had no errors or reconnects.

ProxySQL managed the connection errors, and given that it found five in one second, it SHUNNED the node. The galera_check script was then able to promote a reader, and since this is a failover, there was no delay from a retry loop. The whole thing happened in such a brief time that the application barely noticed it.

Obviously, an application with thousands of connections per second will experience a larger impact, but the time window will be very narrow. Once the failed node is ready to come back, we can either start it with READ_ONLY=1, so that it comes back as a reader, or keep it as it is, so that it comes back as the writer.

Either way, the script manages the case just as it did in the previous (manual) exercise.

Conclusions

ProxySQL and galera_check, when working together, are quite efficient in managing the cluster and its different scenarios. When using single-writer mode, automating the manual part of the failover dramatically improves production recovery — going from a few minutes to seconds or less.

The multi-writer mode remains the preferred and most recommended way to use ProxySQL/Percona XtraDB Cluster, given that it performs failover without the need for additional scripts or extensions. It's also the preferred method if a script is required to manage the integration with ProxySQL.

In both cases, the use of a script can identify the multiple states of Percona XtraDB Cluster and the mutable node scenario. It is a crucial part of the implementation, without which ProxySQL might not behave correctly.

Categories: MySQL

Percona XtraDB Cluster 5.6.30-25.16.2 is now available (CVE-2016-6662 fix)

MySQL Performance Blog - Thu, 2016-09-15 13:53

Percona  announces the new release of Percona XtraDB Cluster 5.6 on September 15, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.2 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25

This release provides a fix for CVE-2016-6662. More information about this security issue can be found here.

Bug Fixed:

  • For security reasons, ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Categories: MySQL

MySQL Default Configuration Changes between 5.6 and 5.7

MySQL Performance Blog - Wed, 2016-09-14 22:26

In this blog post, we’ll discuss the MySQL default configuration changes between 5.6 and 5.7.

MySQL 5.7 has added a variety of new features that might excite you. However, there are also changes in the current variables that you might have overlooked. MySQL 5.7 updated nearly 40 of the defaults from 5.6. Some of the changes could severely impact your server performance, while others might go unnoticed. I’m going to go over each of the changes and what they mean.

The change that can have the largest impact on your server is likely sync_binlog. My colleague, Roel Van de Paar, wrote about this impact in depth in another blog post, so I won’t go in much detail. Sync_binlog controls how MySQL flushes the binlog to disk. The new value of 1 forces MySQL to write every transaction to disk prior to committing. Previously, MySQL did not force flushing the binlog, and trusted the OS to decide when to flush the binlog.

(https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/)

Variables     5.6.29   5.7.11
sync_binlog   0        1

 

The performance schema variables stand out as unusual, as many have a default of -1. MySQL uses this notation to call out variables that are automatically adjusted. The only performance schema variable change that doesn't adjust itself is performance_schema_max_file_classes: the number of file instruments used for the performance schema. It's unlikely you will ever need to alter it.

Variables                                  5.6.29   5.7.11
performance_schema_accounts_size           100      -1
performance_schema_hosts_size              100      -1
performance_schema_max_cond_instances      3504     -1
performance_schema_max_file_classes        50       80
performance_schema_max_file_instances      7693     -1
performance_schema_max_mutex_instances     15906    -1
performance_schema_max_rwlock_instances    9102     -1
performance_schema_max_socket_instances    322      -1
performance_schema_max_statement_classes   168      -1
performance_schema_max_table_handles       4000     -1
performance_schema_max_table_instances     12500    -1
performance_schema_max_thread_instances    402      -1
performance_schema_setup_actors_size       100      -1
performance_schema_setup_objects_size      100      -1
performance_schema_users_size              100      -1

 

The optimizer_switch and sql_mode variables each contain a set of flags that can be individually enabled, each causing a slightly different behavior. MySQL 5.7 enables additional flags in both variables, increasing their strictness and security. These additions make the optimizer more efficient in determining how to correctly interpret your queries.

Three flags have been added to the optimizer_switch defaults in MySQL 5.7, with the intent to increase the optimizer's efficiency: duplicateweedout=on, condition_fanout_filter=on, and derived_merge=on. duplicateweedout is part of the optimizer's semi-join materialization strategy, condition_fanout_filter controls the use of condition filtering, and derived_merge controls the merging of derived tables and views into the outer query block.

https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

http://www.chriscalender.com/tag/condition_fanout_filter/

The additions to SQL mode do not affect performance directly, but they improve the way you write queries (which can increase performance). One notable change: all fields in a SELECT ... GROUP BY statement must either be aggregated using a function like SUM, or appear in the GROUP BY clause. MySQL will no longer assume how they should be grouped, and will raise an error if a field is missing. STRICT_TRANS_TABLES has a different effect depending on whether the table uses a transactional engine.

Statements are rolled back on transactional tables if there is an invalid or missing value in a data change statement. For tables that do not use a transactional engine, MySQL's behavior depends on the row in which the invalid data occurs. If it is the first row, the behavior matches that of a transactional engine. If not, the invalid value is converted to the closest valid value (or the column default), a warning is generated, and the data is still inserted.
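As a quick illustration of ONLY_FULL_GROUP_BY (a sketch using a hypothetical table):

CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL(10,2));

-- Rejected under ONLY_FULL_GROUP_BY: `amount` is neither aggregated
-- nor listed in the GROUP BY clause.
SELECT customer_id, amount FROM orders GROUP BY customer_id;

-- Accepted: every selected column is either aggregated or grouped.
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;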

Variables: optimizer_switch

5.6.29: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
        index_merge_intersection=on, engine_condition_pushdown=on,
        index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
        block_nested_loop=on, batched_key_access=off, materialization=on,
        semijoin=on, loosescan=on, firstmatch=on,
        subquery_materialization_cost_based=on, use_index_extensions=on

5.7.11: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
        index_merge_intersection=on, engine_condition_pushdown=on,
        index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
        block_nested_loop=on, batched_key_access=off, materialization=on,
        semijoin=on, loosescan=on, firstmatch=on, duplicateweedout=on,
        subquery_materialization_cost_based=on, use_index_extensions=on,
        condition_fanout_filter=on, derived_merge=on

Variables: sql_mode

5.6.29: NO_ENGINE_SUBSTITUTION

5.7.11: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE,
        NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER,
        NO_ENGINE_SUBSTITUTION

 

There have been a couple of variable changes surrounding the binlog. MySQL 5.7 updated binlog_error_action so that if there is an error while writing to the binlog, the server aborts. These kinds of incidents are rare, but they have a big impact on your application and replication when they occur, as the server will not perform any further transactions until corrected.

The default binlog format was changed to ROW, instead of the previously used statement format. The statement format writes less data to the logs. However, many statements cannot be replicated correctly, including "update ... order by rand()". These non-deterministic statements could produce different result sets on the master and slave. The change to ROW format writes more data to the binlog, but the information is more accurate and ensures correct replication.

MySQL has begun to focus on replication using GTIDs instead of the traditional binlog position. When MySQL is started or restarted, it must generate a list of the previously used GTIDs. If binlog_gtid_simple_recovery is OFF (FALSE), the server starts with the newest binlog and iterates backwards through the binlog files, searching for a previous_gtids_log_event. With it set to ON (TRUE), the server only reviews the newest and oldest binlog files and computes the used GTIDs. binlog_gtid_simple_recovery makes it much faster to identify the binlogs, especially if there are a large number of binary logs without GTID events. However, in specific cases it could cause gtid_executed and gtid_purged to be populated incorrectly. This should only happen when the newest binary log was generated by MySQL 5.7.5 or older, or if a SET GTID_PURGED statement was run on MySQL earlier than version 5.7.7.
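To check where these values stand on a running server:

SHOW GLOBAL VARIABLES LIKE 'binlog_gtid_simple_recovery';
SHOW GLOBAL VARIABLES LIKE 'gtid_executed';
SHOW GLOBAL VARIABLES LIKE 'gtid_purged';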

Another replication-related variable updated in 5.7 is slave_net_timeout, which is lowered to only 60 seconds. Previously, the replication thread would not consider its connection to the master broken until the problem had existed for at least an hour. This change informs you much sooner if there is a connectivity problem, and ensures replication does not fall significantly behind before alerting you.

Variables                     5.6.29         5.7.11
binlog_error_action           IGNORE_ERROR   ABORT_SERVER
binlog_format                 STATEMENT      ROW
binlog_gtid_simple_recovery   OFF            ON
slave_net_timeout             3600           60

 

InnoDB buffer pool changes impact how long starting and stopping the server takes. innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup are used together to save you from having to "warm up" the server. As the names suggest, they cause a buffer pool dump at shutdown and a load at startup. Even if you have a buffer pool of hundreds of gigabytes, you will not need to reserve the same amount of space on disk, as the data written is much smaller. The only thing written to disk is the information necessary to locate the actual data: the tablespace and page IDs.
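The same mechanism can also be triggered manually at any time, for example:

-- Dump the buffer pool contents (tablespace and page IDs, not the data
-- itself) right now, and check the progress:
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SHOW STATUS LIKE 'Innodb_buffer_pool_dump_status';

-- Reload a previously taken dump:
SET GLOBAL innodb_buffer_pool_load_now = ON;
SHOW STATUS LIKE 'Innodb_buffer_pool_load_status';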

Variables                             5.6.29   5.7.11
innodb_buffer_pool_dump_at_shutdown   OFF      ON
innodb_buffer_pool_load_at_startup    OFF      ON

 

MySQL 5.7 turned some of the options implemented in InnoDB during 5.6 and earlier into defaults. InnoDB's checksum algorithm was updated from innodb to crc32, allowing you to benefit from the hardware acceleration available in recent Intel CPUs.

The Barracuda file format has been available since 5.5, but had many improvements in 5.6. It is now the default in 5.7. The Barracuda format allows you to use the compressed and dynamic row formats. My colleague Alexey has written about the utilization of the compressed format and the results he saw when optimizing a server: https://www.percona.com/blog/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/

The innodb_large_prefix defaults to "on", and when combined with the Barracuda file format allows for larger index key prefixes, up to 3072 bytes. This lets larger text fields benefit from an index. If it is set to "off", or the row format is neither dynamic nor compressed, any index prefix larger than 767 bytes is silently truncated. MySQL introduced larger InnoDB page sizes (32k and 64k) in 5.7.6.
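As a sketch (hypothetical table), the index prefix below exceeds the old 767-byte limit and is only accepted with innodb_large_prefix=ON and a dynamic or compressed row format:

CREATE TABLE long_prefix_demo (
  txt VARCHAR(1024)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARSET=latin1;

-- 768 one-byte latin1 characters = 768 bytes, just over the old limit.
CREATE INDEX idx_txt ON long_prefix_demo (txt(768));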

MySQL 5.7 increased the innodb_log_buffer_size value as well. InnoDB uses the log buffer to buffer transactions before flushing them to the redo log files on disk. The increased size allows the log to flush to disk less often, reducing IO, and allows larger transactions to fit in the buffer without having to write to disk before committing.

MySQL moved InnoDB's purge operations to a background thread back in 5.5, in order to reduce thread contention. MySQL 5.7 increases the default to four purge threads, but this can be changed to anywhere from 1 to 32 threads.

MySQL 5.7 now enables innodb_strict_mode by default, turning some of the warnings into errors. Syntax errors in create table, alter table, create index, and optimize table statements generate errors and force the user to correct them prior to running. It also enables a record size check, ensuring that insert or update statements will not fail due to the record being too large for the selected page size.

Variables                   5.6.29     5.7.11
innodb_checksum_algorithm   innodb     crc32
innodb_file_format          Antelope   Barracuda
innodb_file_format_max      Antelope   Barracuda
innodb_large_prefix         OFF        ON
innodb_log_buffer_size      8388608    16777216
innodb_purge_threads        1          4
innodb_strict_mode          OFF        ON

 

MySQL has increased the number of times the optimizer dives into the index when evaluating equality ranges. If the optimizer needs more dives than eq_range_index_dive_limit (defaulted to 200 in MySQL 5.7), it falls back to the existing index statistics. You can adjust this limit from 0 (eliminating index dives) to 4294967295. This can have a significant impact on query performance, since table statistics are based on the cardinality of a random sample. Falling back to statistics can cause the optimizer to estimate a much larger set of rows to review than it would with index dives, changing the execution plan the optimizer chooses for the query.
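The limit can be tuned per session; a sketch (hypothetical table and column names):

-- Allow index dives for IN() lists of up to 500 values before falling
-- back to index statistics.
SET SESSION eq_range_index_dive_limit = 500;
SELECT * FROM t WHERE indexed_col IN (1, 2, 3);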

MySQL 5.7 deprecated log_warnings. The new preference is to use log_error_verbosity. By default this is set to 3, which logs errors, warnings, and notes to the error log. You can set it to 1 (log errors only) or 2 (log errors and warnings). When consulting the error log, verbosity is often a good thing; however, it increases the IO and disk space needed for the error log.

Variables                   5.6.29   5.7.11
eq_range_index_dive_limit   10       200
log_warnings                1        2

 

There are many changes to the defaults in 5.7, but most of these options have existed for a long time and should be familiar to users. They have been widely used in production, which is why they are now the defaults pushing MySQL forward. Remember, however, that you can still edit these variables and configure them to ensure your server works its best for your data.

Categories: MySQL

pmp-check-pt-table-checksum Percona Monitoring Plugin

MySQL Performance Blog - Wed, 2016-09-14 20:52

Recently, I worked on a customer case where the customer needed to monitor the checksum via Nagios monitoring. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal. I thought it was worth a blogpost.

pmp-check-pt-table-checksum alerts you when the pt-table-checksum tool from Percona Toolkit finds data drift on a replication slave. pmp-check-pt-table-checksum reads the slave's checksum table, which holds the results of the last checksum run performed by pt-table-checksum. By default, the plugin queries the percona.checksums table to fetch information about data discrepancies. You can override this behavior with the "-T" option. Check the pmp-check-pt-table-checksum documentation for details.

Let’s demonstrate checksum monitoring via Nagios. My setup contains a master with two slave(s) connected, as follows:

  • Host 10.0.3.131 is master.
  • Host 10.0.3.83 is slave1
  • Host 10.0.3.36 is slave2

I intentionally generated more data on the master so pt-table-checksum can catch the differences on the slave(s). Here’s what it looks like:

mysql-master> SELECT * FROM test.t1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+
10 rows in set (0.00 sec)

mysql-slave1> SELECT * FROM test.t1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
+------+
5 rows in set (0.00 sec)

mysql-slave2> SELECT * FROM test.t1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
+------+
5 rows in set (0.00 sec)

As you can see, slave1 and slave2 are different from the master: the master has ten rows while the slave(s) have five rows each (table t1).

Then, I executed pt-table-checksum from the master to check for data discrepancies:

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
            TS ERRORS  DIFFS  ROWS  CHUNKS  SKIPPED   TIME  TABLE
08-25T04:57:10      0      1    10       1        0  0.018  test.t1

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
Differences on slave1
TABLE    CHUNK  CNT_DIFF  CRC_DIFF  CHUNK_INDEX  LOWER_BOUNDARY  UPPER_BOUNDARY
test.t1      1        -5         1

Differences on slave2
TABLE    CHUNK  CNT_DIFF  CRC_DIFF  CHUNK_INDEX  LOWER_BOUNDARY  UPPER_BOUNDARY
test.t1      1        -5         1

pt-table-checksum correctly identifies the differences for the test.t1 table on slave1 and slave2. Now, you can use the pmp-check-pt-table-checksum  Percona checksum monitoring plugin. Let’s try to run it locally (via CLI) from the Nagios host.

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1

[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1

NOTE: The checksum_user database user needs SELECT privileges on the checksum table (percona.checksums) on the slave(s) in order for the plugin to alert on checksum differences.
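A minimal grant for the monitoring user might look like this (a sketch reusing the credentials from this example; run it on the master so it replicates to the slaves):

GRANT SELECT ON percona.checksums TO 'checksum_user'@'%' IDENTIFIED BY 'checksum_password';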

On the Nagios monitoring server, you need to add the pmp-check-pt-table-checksum command to the commands.cfg file:

define command{
    command_name    pmp-check-pt-table-checksum
    command_line    $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$
}

NOTE: I used “-c” option for pmp-check-pt-table-checksum, which raises a critical error instead of a warning.

And, on the existing hosts.cfg file (i.e., slave1.cfg and slave2.cfg), you need to add a monitoring command accordingly as below:

define service{
    use                  generic-service
    host_name            slave1
    service_description  Checksum Status
    check_command        pmp-check-pt-table-checksum!1
}

In this command, "1" is the argument passed to "-c $ARG1$", so pmp-check-pt-table-checksum raises a critical error when one or more chunks on the slave(s) differ from the master.

Last but not least, restart the Nagios daemon on the monitoring host to apply the change.

Below is how it looks in the Nagios web interface:


I also think the “INTERVAL” option is useful:

-i INTERVAL     Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

It makes sure that the chunks in the checksum table are recent. Used the other way around, it tells you how old your chunks are. This option verifies that the checksum cron executes within a defined number of days. Let's say you have a pt-table-checksum cron running once per week: setting INTERVAL to 14 or 21 alerts you if the chunks are older than that number of days.
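For example, to also alert when the most recent checksum chunks are older than two weeks:

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306 -i 14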

Conclusion:

Percona Monitoring Plugins for MySQL are very useful and easy to embed in your centralized monitoring dashboard. You can schedule pt-table-checksum via a cron job, and get reports regarding master/slave(s) data drift (if any) from one global dashboard on the monitoring host. There are various plugins available from Percona, e.g., the processlist plugin and the replication delay plugin. Along with these, Percona offers Cacti and Zabbix templates to graph various MySQL activities.

Categories: MySQL

Webinar Thursday Sept. 15: Identifying and Solving Database Performance Issues with PMM

MySQL Performance Blog - Wed, 2016-09-14 13:51

Please join Roman Vynar, Lead Platform Engineer on Thursday, September 15, 2016 at 10 am PDT (UTC-7) for a webinar on Identifying and Solving Database Performance Issues with PMM.

Database performance is the key to high-performance applications. Gaining visibility into the database is the key to improving database performance. Percona’s Monitoring and Management (PMM) provides the insight you need into your database environment.

In this webinar, we will demonstrate how using PMM for query analytics, in combination with database and host performance metrics, can more efficiently drive tuning, issue management and application development. Using PMM can result in faster resolution times, more focused development and a more efficient IT team.

Register for the webinar here.

Roman Vynar, Lead Platform Engineer. Roman is a Lead Platform Engineer at Percona. He joined the company to establish and develop the Remote DBA service from scratch. Over time, the growing service successfully expanded to Managed Services. Roman develops monitoring tools, automated scripts, a backup solution, and a notification and incident tracking web system, and currently leads the Percona Monitoring and Management project.
Categories: MySQL

Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database

MySQL Performance Blog - Wed, 2016-09-14 12:06

This blog post discusses how you can protect your e-commerce database from a high traffic disaster.

Databases power today’s e-commerce. Whether it’s listing items on your site, contacting your distributor for inventory, tracking shipments, payments, or customer data, your database must be up, running, tuned and available for your business to be successful.

There is no time that this is more important than high-volume traffic days. There are specific events that occur throughout the year (such as Black Friday, Cyber Monday, or Singles Day) that you know are going to put extra strain on your database environment. But these are the specific times that your database can’t go down – these are the days that can make or break your year!

So what can you do to guarantee that your database environment is up to the challenge of handling high traffic events? Are there ways of preparing for this type of traffic?

Yes, there are! In this blog post, we’ll look at some of the factors that can help prepare your database environment to handle large amounts of traffic.

Synchronous versus Asynchronous Applications

Before moving to strategies, we need to discuss the difference between synchronous and asynchronous applications.

In most web-based applications, user input starts a number of requests for resources. Once the server answers the requests, no communication stops until the next input. This type of communication between a client and server is called synchronous communication.

Synchronous communication restricts application updates to the request/response cycle. Even synchronous applications designed to automatically refresh application server information at regular intervals have consistent periods of delay between data refreshes. While such delays usually aren't an issue, some applications (for example, stock-trading applications) rely on continuously updated information for optimum functionality and usability.

Web 2.0-based applications address this issue by using asynchronous communication. Asynchronous applications deliver continuously updated data to users. Asynchronous applications separate client requests from application updates, so multiple asynchronous communications between the client and server can occur simultaneously or in parallel.

The strategy you use to scale the two types of applications to meet growing user and traffic demands will differ.

Scaling a Synchronous/Latency-sensitive Application

When it comes to synchronous applications, you really have only one option for scaling performance: sharding. With sharding, the tables are divided and distributed across multiple servers, which reduces the total number of rows in each table. This consequently reduces index size, and generally improves search performance.

A shard can also be located on its own hardware, with different shards added to different machines. This database distribution over a large multiple of machines spreads the load out, also improving performance. Sharding allows you to scale read and write performance when latency is important.

Generally speaking, it is better to avoid synchronous applications when possible – they limit your scalability options.

Scaling an Asynchronous Application

When it comes to scaling asynchronous applications, we have many more options than with synchronous applications. You should try and use asynchronous applications whenever possible:

  • Secondary/Slave hosts. Replication can be used to add more hardware for read traffic. Replication usually employs a master/slave relationship between a designated “original” server and copies of the server. The master logs and then distributes the updates to the slaves. This setup allows you to distribute the read load across more than one machine.
  • Caching. Database caching (tables, data, and models – caching summaries of data) improves scalability by distributing the query workload from expensive (overhead-wise) backend processes to multiple cheaper ones. It allows more flexibility for data processing: for example premium user data can be cached, while regular user data isn’t.

    Caching also improves data availability by providing applications that don’t depend on backend services continued service. It also allows for improved data access speeds by localizing the data and avoiding roundtrip queries. There are some specific caching strategies you can use:

    • Pre-Emptive Caching. Ordinarily, an object gets cached the first time it is requested (or if cached data isn’t timely enough). Preemptive caching instead generates cached versions before an application requests them. Typically this is done by a cron process.
    • Hit/Miss Caching. A cache hit occurs when an application or software requests data. First, the central processing unit (CPU) looks for the data in its closest memory location, which is usually the primary cache. If the requested data is found in the cache, it is a cache hit; if not, it is a cache miss. A cache hit serves data more quickly, as the data can be retrieved by reading the cache memory. Cache hits also apply to disk caches, where requested data stored by an earlier query is served again. A cache miss slows down the overall process: after a miss, the central processing unit (CPU) looks in a higher-level store, such as random access memory (RAM), for the data, and a new entry is created and copied into the cache before it can be accessed by the processor.
    • Client-side Caching. Client-side caching allows server data to be copied and cached on the client computer. Client side caching reduces load times by several factors. 
  • Queuing Updates. Queues are used to order queries (and other database functions) in a timely fashion. There are queues for asynchronously sending notifications like email and SMS in most websites. E-commerce sites have queues for storing, processing and dispatching orders. How your database handles queues can affect your performance:
    • Batching. Batch processing can be used for efficient bulk database updates and automated transaction processing, as opposed to interactive online transaction processing (OLTP) applications.
    • Fan-Out Updates. Fan-out duplicates data in the database. When data is duplicated it eliminates slow joins and increases read performance.
Efficient Usage of Data at Scale

As you scale up in terms of database workload, you need to be able to avoid bad queries or patterns from your applications.

  • Moving expensive queries out of the user request path. Even if your database server uses powerful hardware, its performance can be negatively affected by a handful of expensive queries. Even a single bad query can cause serious performance issues for your database. Make sure to use monitoring tools to track down the queries that are taking up the most resources.
  • Using caching to offload database traffic. Cache data away from the database using something like memcached. This is usually done at the application layer, and is highly effective.
  • Counters and In-Memory Stores. Use memory counters to monitor performance hits: pages/sec, faults/sec, available bytes, total server, target server memory, etc. Percona’s new in-memory storage engine for MongoDB also can help.
  • Connection Pooling. A connection pool is a set of cached database connections that can be reused for future requests to the database. Connection pools can improve the performance of executing commands on a database.
Scaling Out (Horizontal) Tricks

Scaling horizontally means adding more nodes to a system, such as adding a new server to a database environment to a distributed software application. For example, scaling out from one Web server to three.

  • Pre-Sharding Data for Flexibility. Pre-sharding the database across the server instances allows you to have the entire environment resources available at the start of the event, rather than having to rebalance during peak event traffic.
  • Using “Kill Switches” to Control Traffic. The idea of a kill switch is a single point where you can stop the flow of data to a particular node. Strategically set up kill switches allow you to stop a destructive workload that begins to impact the entire environment.
  • Limiting Graph Structures. By limiting the size or complexity of graph structures in the database, you will simplify data lookups and data size.
Scaling with Hardware (Vertical Scaling)

Another option to handle the increased traffic load is adding more hardware to your environment: more servers, more CPUs, more memory, etc. This, of course, can be expensive. One option here is to pre-configure your testing environment to become part of the production environment if necessary. Another is to pre-configure more Database-as-a-Service (DaaS) instances for the event (if you are a using cloud-based services).

Whichever method, be sure you verify and test your extra servers and environment before your drop-dead date.

Testing Performance and Capacity

As always, in any situation where your environment is going to be stressed beyond usual limits, testing under real-world conditions is a key factor. This includes not only testing for raw traffic levels, but also the actual workloads that your database will experience, with the same volume and variety of requests.

Knowing Your Application and Questions to Ask at Development Time

Finally, it’s important that you understand what applications will be used and querying the database. This sort of common sense idea is often overlooked, especially when teams (such as the development team and the database/operations team) get siloed and don’t communicate.

Get to know who is developing the applications that are using the database, and how they are doing it. As an example, a while back I had the opportunity to speak with a team of developers, mostly to just understand what they were doing. In the process of whiteboarding the app with them, we discovered a simple query issue that – now that we were aware of it – took little effort to fix. These sorts of interactions, early in the process, can save a great deal of headache down the line.

Conclusion

There are many strategies that can help you prepare for high traffic events that will impact your database. I’ve covered a few here briefly. For an even more thorough look at e-commerce database strategies, attend my webinar “Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster” on Thursday, September 22, 2016 10:00 am Pacific Time.

Register here.

Categories: MySQL

MySQL CDC, Streaming Binary Logs and Asynchronous Triggers

MySQL Performance Blog - Tue, 2016-09-13 22:21

In this post, we’ll look at MySQL CDC, streaming binary logs and asynchronous triggers.

What is Change Data Capture and why do we need it?

Change Data Capture (CDC) tracks data changes (usually close to realtime). In MySQL, the easiest and probably most efficient way to track data changes is to use binary logs. However, other approaches exist. For example:

  • General log or Audit Log Plugin (which logs all queries, not just the changes)
  • MySQL triggers (not recommended, as it can slow down the application — more below)

One of the first implementations of CDC for MySQL was the FlexCDC project by Justin Swanhart. Nowadays, there are a lot of CDC implementations (see mysql-cdc-projects wiki for a long list).

CDC can be implemented for various tasks, such as auditing, copying data to another system or processing (and reacting to) events. In this blog post, I will demonstrate how to use a CDC approach to stream MySQL binary logs, process events and save (stream) them to another MySQL instance (or MongoDB). In addition, I will show how to implement asynchronous triggers by streaming binary logs.

Streaming binary logs 

You can read binary logs using the mysqlbinlog utility by adding “-vvv” (the verbose option). mysqlbinlog can also show a human-readable version of ROW-based replication events. For example:

# mysqlbinlog -vvv /var/lib/mysql/master.000001
BINLOG '
JxiqVxMBAAAALAAAAI7LegAAAHQAAAAAAAEABHRlc3QAAWEAAQMAAUTAFAY=
JxiqVx4BAAAAKAAAALbLegAAAHQAAAAAAAEAAgAB//5kAAAAedRLHg==
'/*!*/;
### INSERT INTO `test`.`a`
### SET
###   @1=100 /* INT meta=0 nullable=1 is_null=0 */
# at 8047542
#160809 17:51:35 server id 1  end_log_pos 8047573 CRC32 0x56b36ca5  Xid = 24453
COMMIT/*!*/;

Starting with MySQL 5.6, mysqlbinlog can also read the binary log events from a remote master (acting as a “fake” replication slave).

Reading binary logs is a great basis for CDC. However, there are still some challenges:

  1. ROW-based replication is probably the easiest way to get the RAW changes; otherwise we would have to parse SQL. At the same time, ROW-based replication binary logs don't contain the table metadata, i.e., they do not record field names, only field numbers (in the example above, “@1” is the first field in table “a”).
  2. We will need to somehow record and store the binary log positions so that the tool can be restarted at any time and proceed from the last position (like a MySQL replication slave); a minimal reader sketch addressing both points follows this list.
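To make both challenges concrete, here is a minimal reader sketch using the third-party python-mysql-replication library (one of the implementations on the mysql-cdc-projects list); the connection settings, server_id and checkpoint values are placeholders:

from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
)

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306,
                         "user": "repl", "passwd": "xxx"},
    server_id=100,             # must be unique, just like a replication slave
    resume_stream=True,
    log_file="master.000001",  # restored from our own saved checkpoint
    log_pos=4,
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,
)

for event in stream:
    for row in event.rows:
        print(event.schema, event.table, row)  # challenge 1: raw ROW changes
    # Challenge 2: persist stream.log_file / stream.log_pos somewhere durable
    # so that a restart can resume from the last processed event.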

Maxwell’s daemon (Maxwell = Mysql + Kafka), an application recently released by Zendesk, reads MySQL binlogs and writes row updates as JSON (it can write to Kafka, which is its primary goal, but can also write to stdout and can be extended for other purposes). Maxwell stores the metadata about MySQL tables and binary log events (and other metadata) inside MySQL, so it solves the potential issues from the above list.

Here is a quick demo of Maxwell:

Session 1 (Insert into MySQL):

mysql> insert into a (i) values (151);
Query OK, 1 row affected (0.00 sec)

mysql> update a set i = 300 limit 5;
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5  Changed: 5  Warnings: 0

Session 2 (starting Maxwell):

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout
16:00:15,303 INFO Maxwell - Maxwell is booting (StdoutProducer), starting at BinlogPosition[master.000001:15494460]
16:00:15,327 INFO TransportImpl - connecting to host: 127.0.0.1, port: 3306
16:00:15,350 INFO TransportImpl - connected to host: 127.0.0.1, port: 3306, context: AbstractTransport.Context[threadId=9,...
16:00:15,350 INFO AuthenticatorImpl - start to login, user: maxwell, host: 127.0.0.1, port: 3306
16:00:15,354 INFO AuthenticatorImpl - login successfully, user: maxwell, detail: OKPacket[packetMarker=0,affectedRows=0,insertId=0,serverStatus=2,warningCount=0,message=<null>]
16:00:15,533 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at BinlogPosition[master.000001:3921])
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"commit":true,"data":{"i":300},"old":{"i":150}}

As we can see in this example, Maxwell gets the events from the MySQL replication stream and outputs them to stdout (if we change the producer, it can save them to Apache Kafka).

Saving binlog events to MySQL document store or MongoDB

If we want to save the events somewhere else, we can use MongoDB or MySQL JSON fields and the document store (as Maxwell provides us with JSON documents). For a simple proof of concept, I've created a nodeJS script to implement a CDC “pipeline”:

var mysqlx = require('mysqlx');
var mySession = mysqlx.getSession({
    host: '10.0.0.2',
    port: 33060,
    dbUser: 'root',
    dbPassword: 'xxx'
});

process.on('SIGINT', function() {
    console.log("Caught interrupt signal. Exiting...");
    process.exit()
});

process.stdin.setEncoding('utf8');
process.stdin.on('readable', () => {
    var chunk = process.stdin.read();
    if (chunk != null) {
        process.stdout.write(`data: ${chunk}`);
        mySession.then(session => {
            session.getSchema("mysqlcdc").getCollection("mysqlcdc")
                .add(JSON.parse(chunk))
                .execute(function (row) {
                    // can log something here
                }).catch(err => {
                    console.log(err);
                })
                .then(function (notices) {
                    console.log("Wrote to MySQL: " + JSON.stringify(notices))
                });
        }).catch(function (err) {
            console.log(err);
            process.exit();
        });
    }
});

process.stdin.on('end', () => {
    process.stdout.write('end');
    process.stdin.resume();
});

And to run it we can use the pipeline:

./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR | node ./maxwell_to_mysql.js

The same approach can be used to save the CDC events to MongoDB with mongoimport:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR |mongoimport -d mysqlcdc -c mysqlcdc --host localhost:27017

Reacting to binary log events: asynchronous triggers

In the above example, we only recorded the binary log events. Now we can add “reactions”.

One of the practical applications is re-implementing MySQL triggers to something more performant. MySQL triggers are executed for each row, and are synchronous (the query will not return until the trigger event finishes). This was known to cause poor performance, and can significantly slow down bulk operations (i.e., “load data infile” or “insert into … values (…), (…)”). With triggers, MySQL will have to process the “bulk” operations row by row, killing the performance. In addition, when using statement-based replication, triggers on the slave can slow down the replication thread (it is much less relevant nowadays with ROW-based replication and potentially multithreaded slaves).

With the ability to read binary logs from MySQL (using Maxwell), we can process the events and re-implement triggers — now in asynchronous mode — without delaying MySQL operations. As Maxwell gives us a JSON document with the “new” and “old” values (with the default option binlog_row_image=FULL, MySQL records the previous values for updates and deletes) we can use it to create triggers.

Not all triggers can be easily re-implemented based on the binary logs. However, in my experience most of the triggers in MySQL are used for:

  • auditing (if you deleted a row, what was the previous value, who did it and when)
  • enriching the existing table (i.e., updating a field in the same table)

Here is a quick algorithm for how to re-implement the triggers with Maxwell:

  • Find the trigger table and trigger event text (SQL)
  • Create an app or a script to parse JSON for the trigger table
  • Create a new version of the SQL changing the NEW.<field> to “data.field” (from JSON) and OLD.<field> to “old.field” (from JSON)

For example, if I want to audit all deletes in the “transactions” table, I can do it with Maxwell and a simple Python script (do not use this in production, it is a very basic sample):

import json, sys

line = sys.stdin.readline()
while line:
    print line,
    obj = json.loads(line)
    if obj["type"] == "delete":
        print "INSERT INTO transactions_delete_log VALUES ('" + str(obj["data"]) + "', Now() )"
    line = sys.stdin.readline()

MySQL:

mysql> delete from transactions where user_id = 2;
Query OK, 1 row affected (0.00 sec)

Maxwell pipeline:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR | python trigger.py
{"database":"test","table":"transactions","type":"delete","ts":1472942384,"xid":214395,"commit":true,"data":{"id":2,"user_id":2,"value":2,"last_updated":"2016-09-03 22:39:31"}}
INSERT INTO transactions_delete_log VALUES ('{u'last_updated': u'2016-09-03 22:39:31', u'user_id': 2, u'id': 2, u'value': 2}', Now() )

Maxwell limitations

Maxwell was designed for MySQL 5.6 with ROW-based replication. Although it can work with MySQL 5.7, it does not support the new MySQL 5.7 data types (i.e., JSON fields). Maxwell does not support GTID and can't fail over based on GTID (though it can parse events with GTIDs).

Conclusion

Streaming MySQL binary logs (for example, with the Maxwell application) can help implement CDC for auditing and other purposes, as well as asynchronous triggers (removing MySQL-level triggers can increase MySQL performance).

Categories: MySQL

ProxySQL and MHA Integration

MySQL Performance Blog - Tue, 2016-09-13 18:03

This blog post discusses ProxySQL and MHA integration, and how they work together.

MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with the ProxySQL process. This means you can count on MHA's standard features to manage the failover, and on ProxySQL to manage the traffic and shift it from one server to another.

This is one of the main differences between using MHA with a VIP and using MHA with ProxySQL: with MHA/ProxySQL, there is no need to move IPs or redefine DNS entries.

The following is an example of an MHA configuration file for use with ProxySQL:

[server default]
user=mha
password=mha
ssh_user=root
repl_password=replica
manager_log=/tmp/mha.log
manager_workdir=/tmp
remote_workdir=/tmp
master_binlog_dir=/opt/mysql_instances/mha1/logs
client_bindir=/opt/mysql_templates/mysql-57/bin
client_libdir=/opt/mysql_templates/mysql-57/lib
master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
log_level=debug

[server1]
hostname=mha1r
ip=192.168.1.104
candidate_master=1

[server2]
hostname=mha2r
ip=192.168.1.107
candidate_master=1

[server3]
hostname=mha3r
ip=192.168.1.111
candidate_master=1

[server4]
hostname=mha4r
ip=192.168.1.109
no_master=1

NOTE: Be sure to comment out the “FIX ME ” lines in the sample/scripts.

After that, just install MHA as you normally would.

In ProxySQL, be sure to have all the MHA users and servers defined.

When using ProxySQL with standard replication, it’s important to set additional privileges for the ProxySQL monitor user. It must also have “Replication Client” set or it will fail to check the SLAVE LAG. The servers MUST have a defined value for the attribute max_replication_lag, or the check will be ignored.

As a reminder:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

insert into mysql_query_rules (username,destination_hostgroup,active) values('mha_W',600,1);
insert into mysql_query_rules (username,destination_hostgroup,active) values('mha_R',601,1);
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_W','test',1,600,'test_mha');
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_R','test',1,601,'test_mha');
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

OK, now that all is ready,  let’s rock’n’roll!

Controlled fail-over

First of all, the masterha_manager should not be running or you will get an error.

Now let’s start some traffic:

Write:

sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run

Read only:

sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

Let it run for a bit, then check:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Now perform the failover. To do this, instruct MHA to do a switch, and to set the OLD master as a new slave:

masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0

Check what happened:

[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects: 0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects: 0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects: 1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects: 0.00

Check ProxySQL:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the servers weren't behind the master, and the switch happened quite fast.

We can see that the WRITE operations, which are normally an issue given the need to move around a VIP or change name resolution, suffered only a limited hiccup.

Read operations were not affected at all. Nice, eh?

Do you know how long it takes to do a switch under these conditions?

real 0m2.710s

Yes, 2.7 seconds.

This is more evidence that most of the time spent in an MHA-based switch is caused by the need to redirect traffic from A to B over the network.

Crash fail-over

What happens if, instead of an easy switch, we have to cover a real failover?

First of all, let’s start masterha_manager:

nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1

Then let's start a load again. Finally, go to the MySQL node that hosts the master (xxx.xxx.xxx.107):

ps aux | grep mysql
mysql 18755 0.0 0.0 113248 1608 pts/0 S Aug28 0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql 21975 3.2 30.4 4398248 941748 pts/0 Sl Aug28 93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306

And kill the MySQL process:

kill -9 21975 18755

As before, check what happened on the application side:

[  80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects: 0.00
[  90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects: 0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects: 0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects: 0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects: 0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.40 <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects: 0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects: 0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects: 0.00

So it takes ~10 seconds to perform failover.

To understand this better, let's see what happened in MHA-land:

Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s).. <--- Finally MHA decides to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover

MHA sees the server failing at xx:47, but because of the retry and checks validation, it actually fully acknowledges the downtime at xx:56 (~8 seconds after).

To perform the whole failover, it only took ~2 seconds (again). Because no movable IPs or DNS changes were involved, the operations were fast. This is true when the servers have the binary logs available, but it's a different story if MHA also has to manage and push data from the binary logs to MySQL.

As you can see, ProxySQL can also help reduce the timing for this scenario, totally skipping the network-related operations. These operations are the ones causing the most trouble in these cases.

Categories: MySQL

Percona Monitoring and Management (PMM) is now available

MySQL Performance Blog - Tue, 2016-09-13 10:00

Percona announces the availability of Percona Monitoring and Management (PMM), an open source software database monitoring and management tool. Completely open source and free to download and use, Percona Monitoring and Management provides point-in-time visibility and historical trending of database performance that enables DBAs and application developers to optimize the performance of their MySQL and MongoDB databases.

Percona Monitoring and Management combines several best-of-breed tools, including Grafana, Prometheus, and Consul, in a single, easy-to-manage virtual appliance, along with Percona-developed query analytics, administration, API, agent and exporter components. Percona Monitoring and Management monitors and provides actionable performance data for MySQL variants, including Oracle MySQL Community Edition, Oracle MySQL Enterprise Edition, Percona Server for MySQL, and MariaDB, as well as MongoDB variants, including MongoDB Community Edition, and Percona Server for MongoDB.

PMM is an on-premises solution that keeps all of your performance and query data inside the confines of your environment, with no requirement for any data to cross the Internet.

Percona Monitoring and Management Highlights:

  • Provides query and metric information that enables administrators to optimize database performance
  • Displays current queries and highlights potential query issues to enable faster issue resolution
  • Maps queries against metrics to help make informed decisions about crucial database resources: platform needs, system growth, team focus and the most important database activities.

PMM provides database maintenance teams with better visibility into database and query activity, in order to implement actionable strategies and issue resolution more quickly. More information allows you to concentrate efforts on the areas that yield the highest value.

Like prior versions, PMM is distributed through Docker Hub and is free to download. Full instructions for downloading and installing the server and client are available in the documentation.

A PMM demonstration is available at pmmdemo.percona.com. We have also implemented forums for PMM discussions.

There will also be a webinar with Percona's Roman Vynar, Lead Platform Engineer, on Thursday, September 15, 2016 at 10:00 am PDT (UTC-7) about “Identifying and Solving Database Performance Issues with PMM.” Register here for the webinar to learn more about PMM. Can't attend the webinar? We've got you covered! Register anyway, and we'll send you the recording and slides even if you can't attend.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

As always, thanks for your continued support of Percona!

Categories: MySQL

Is Your Database Affected by CVE-2016-6662?

MySQL Performance Blog - Mon, 2016-09-12 22:55

In this blog post, I will discuss the CVE-2016-6662 vulnerability, how to tell if it affects you, and how to prevent the vulnerability from affecting you if you have an older version of MySQL.

I’ll also list which MySQL versions include the vulnerability fixes.

As we announced in a previous post, there are certain scenarios in Percona Server (and MySQL) that can allow a remote root code execution (CVE-2016-6662).

Vulnerability approach

The website legalhackers.com contains the full, current explanation of the CVE-2016-6662 vulnerability.

To summarize, the methods used to gain root privileges require multiple conditions:

  1. A remote (or even local) MySQL user that has FILE permissions (or SUPER, which encompasses all of them).
  2. Improper OS files/directories permissions around MySQL configuration files that allow the MySQL system user access to modify or create new configuration files.
  3. Several techniques exist to alter the MySQL configuration so that it includes loading a malicious shared library.
    The techniques currently described require FILE or SUPER privileges, but also include the currently undisclosed CVE-2016-6663 (which demonstrates how to alter the configuration without FILE privileges).
  4. Have that malicious shared library loaded when MySQL restarts, which includes the code that allows privilege escalation.
Fixed versions

MySQL fixes

MySQL seems to have already released versions that include the security fixes.

This is coming from the release notes in MySQL 5.6.33:

  • For mysqld_safe, the argument to --malloc-lib now must be one of the directories /usr/lib, /usr/lib64, /usr/lib/i386-linux-gnu, or /usr/lib/x86_64-linux-gnu. In addition, the --mysqld and --mysqld-version options can be used only on the command line and not in an option file. (Bug #24464380)
  • It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
  • Privilege escalation was possible by exploiting the way REPAIR TABLE used temporary files. (Bug #24388746)

You aren’t affected if you use version 5.5.52, 5.6.33 or 5.7.15.

Release notes: 5.5.52, 5.6.33, 5.7.15

Percona Server

The way Percona increased security was by limiting which libraries are allowed to be loaded with LD_PRELOAD (including --malloc-lib), and limiting them to /usr/lib/, /usr/lib64 and the MySQL installation base directory.

This means only locations that are accessible by root users can load shared libraries.

The following Percona Server versions have this fix:

We are working on releasing new Percona XtraDB Cluster versions as well.

Future Percona Server releases will include all fixes from MySQL.

MariaDB

MariaDB has fixed the issue in 5.5.51, 10.1.17 and 10.0.27.

I have an older MySQL version, what should I do now?

It is possible to change the database configuration so that it isn’t affected anymore (without changing your MySQL versions and restarting your database). There are several options, each of them focusing on one of the conditions required for the vulnerability to work.

Patch mysqld_safe Manually

Just before publishing this, a blog post came out with another alternative for patching your server: https://www.psce.com/blog/2016/09/12/how-to-quickly-patch-mysql-server-against-cve-2016-6662/.

Database user permissions

One way to avoid the vulnerability is to make sure no remote user has SUPER or FILE privileges (a quick audit sketch follows below).
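For a quick check, here is a minimal sketch using MySQL Connector/Python (host and credentials are placeholders); it lists the accounts holding the privileges the published exploitation techniques rely on:

import mysql.connector

# Connect as an administrative user (placeholder credentials).
cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="xxx")
cur = cnx.cursor()

# Accounts with FILE and/or SUPER are the ones to review.
cur.execute(
    "SELECT user, host, File_priv, Super_priv "
    "FROM mysql.user WHERE File_priv = 'Y' OR Super_priv = 'Y'"
)
for user, host, file_priv, super_priv in cur:
    print("%s@%s  FILE=%s SUPER=%s" % (user, host, file_priv, super_priv))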

However, CVE-2016-6663 mentions there is a way to do this without any FILE privileges (likely related to the REPAIR TABLE issue mentioned in the MySQL release notes).

Configuration files permissions

The vulnerability needs to be able to write to some MySQL configuration files. Prevent that and you are secure.

Make sure you configure permissions for various config files as follows:

  • MySQL reads configuration files from different paths, including from your datadir
    • Create an (empty) my.cnf  and .my.cnf in the datadir (usually /var/lib/mysql) and make root the owner/group with 0644 permissions.
    • Other locations to look into: /etc/my.cnf, /etc/mysql/my.cnf, /usr/etc/my.cnf, ~/.my.cnf (mysqld --help --verbose shows you where mysqld will look)
  • This also includes !includedir paths defined in your current configurations — make sure they are not writeable by the mysql user as well
  • No config files should be writable by the mysql user (change ownership and permissions); the sketch after this list shows one way to audit this
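To illustrate, here is a minimal sketch in Python that flags config files the mysql user could modify (the path list and the “mysql” user name are assumptions; adapt them to your layout, and note the group-writable check is deliberately conservative):

import os
import pwd
import stat

# Locations from the list above; add any !includedir paths you use.
PATHS = ["/etc/my.cnf", "/etc/mysql/my.cnf", "/usr/etc/my.cnf",
         "/var/lib/mysql/my.cnf", "/var/lib/mysql/.my.cnf"]

mysql_uid = pwd.getpwnam("mysql").pw_uid

for path in PATHS:
    if not os.path.exists(path):
        print("%s: missing (consider creating an empty, root-owned file)" % path)
        continue
    st = os.stat(path)
    # Owned by mysql, or writable by group/others: either is a red flag.
    risky = (st.st_uid == mysql_uid or
             st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    print("%s: %s" % (path, "WRITABLE, fix ownership/permissions!" if risky else "ok"))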
Categories: MySQL

Percona Live Europe featured talk with Ronald Bradford — Securing your MySQL/MariaDB data

MySQL Performance Blog - Mon, 2016-09-12 15:59

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Ronald Bradford, Founder & CEO of EffectiveMySQL. His talk will be on Securing your MySQL/MariaDB data. This talk will answer questions like:

  • How do you apply the appropriate filesystem permissions?
  • How do you use TLS/SSL for connections, and are they good for replication?
  • Encrypting all your data at rest
  • How to monitor your database with the audit plugin

. . . and more. I had a chance to speak with Ronald and learn a bit more about database security:

PerconaGive me a brief history of yourself: how you got into database development, where you work, what you love about it?

Ronald: My first introduction to relational theory and databases was with the writings of C.J. Date and Michael Stonebraker while using the Ingres RDBMS in 1988. For 28 years, my industry experience in the database field has covered a number of relational and non-relational products, including MySQL – which I started using at my first startup in 1999. For the last 17 years, I have enjoyed contributing to the MySQL ecosystem in many ways. I’ve consulted with hundreds of organizations, both small and large, that rely on MySQL to deliver strategic value to their business customers. I have given over 130 presentations in the past ten years across six continents and published a number of books and blog articles from my experiences with MySQL and open source. I am also the organizer of the MySQL Meetup group in New York City.

My goals have always been to help educate the current generation of software engineers to appreciate, use and maximize the right product for the job. I always hope that MySQL is the right solution, but recommend other options when it is not.

I am presently looking for my next opportunity to help organizations develop a strategic and robust data infrastructure that ensures business continuity for growing needs – ensuring a reliable and consistent user experience.

Percona: Your talk is called “Securing your MySQL/MariaDB data.” Why is securing your database important, and what are the real-world ramifications for a database security breach?

Ronald: We secure the belongings in our home, we secure the passengers in our car, we secure the possessions we carry on us. Data is a valuable asset for many organizations, and for some it is the only asset of value for continued operation. Should such important business information not have the same value as people or possessions?

Within any industry, you want to be the disruptor and not the disrupted. The press coverage on any alleged or actual data breach generally leads to a loss of customer confidence. This in turn can directly affect your present and future business viability – enabling competitors to take advantage of the situation. Data security should be as important as data recovery and system performance. Today we hear about data breaches on a weekly basis – everything from government departments to large retail stores. We often do not hear of the data breaches that can occur with smaller organizations, who also have your information: your local medical provider, or a school or university that holds your personal information.

A data breach can be much more impactful than data loss. It can be harder to detect and assess the long-term impact of a security breach because there might be unauthorized access over a longer time period. Often there are insufficient audit trails and logs to validate the impact of any security breach. Inadequate access controls can also lead to unauthorized data access both internally and externally. Many organizations fail to manage risk by not providing a “least privileges required approach” for any access to valuable data by applications or staff.

Any recent real-world example highlights the potential of insufficient data security, and therefore the increased risk of your personal information being used illegally. What is your level of confidence about security when you register with a new service and then you receive an email with your login and password in clear text? If your password is not secure, your personal data is also not secure and now it’s almost impossible for your address, phone number and other information to be permanently removed from this insecure site.

Percona: Are there significant differences between security for on-premise and cloud-based databases? What are they?

Ronald: There should be no differences in protecting your data within MySQL regardless of where this is stored.  When using a cloud-based database there is the additional need to have a shared responsibility with your cloud provider ensuring their IaaS and provided services have adequate trust and verification. For example, you need to ensure that provisioned disk and memory is adequately zeroed after use, and also ensure that adequate separation exists between hosts and clients on dedicated equipment in a virtualized environment. While many providers state these security and compliance processes, there have been instances where data has not been adequately protected.

Just as you may trust an internal department with additional security in the physical and electronic access to the systems that hold your data, you should “trust but verify” your cloud provider’s capacity to protect your data and that these providers continue to assess risk regularly and respond appropriately.

Percona: What is changing in database security that keeps you awake at night? What things does the market need to address immediately?

Ronald: A discussion with a CTO recently indicated he was worried about how their infrastructure would support high availability: what is the impact of any outage, and how does the organization know if he is prepared enough? Many companies, regardless of their size, are not prepared for either a lack of availability or a security breach.

The recent Delta is an example of an availability outage that cost the company many millions of dollars. Data security should be considered with the exact same concern, however it is often the poor cousin to availability. Disaster recovery is a commonly used term for addressing the potential loss of access to data, but there is not a well-known term or common processes for addressing data protection.

You monitor the performance of your system for increased load and for slow queries. When did you last monitor the volume of access to secure data to look for unexpected patterns or anomalies? A data breach can be a single SQL statement that is not an expected application traffic pattern. How can you protect your data in this situation? We ask developers to write unit tests to improve code coverage. Does your organization ask developers to write tests to perform SQL injection, or write SQL statements that should not be acceptable to manipulate data and are therefore correctly identified, alerted and actioned? Many organizations run load and volume testing regularly, but few organizations run security drills as regularly.

As organizations continue to address the growing data needs in the digital age, ongoing education and awareness are very important. There is often very little information in the MySQL ecosystem about validating data security, determining what is applicable security monitoring, and what is the validation and verification of authorized and unauthorized data access. What also needs to be addressed is the use (and abuse) of available security in current and prior MySQL versions. The key advancements in MySQL 5.6 and MySQL 5.7, combined with a lack of a migration path for organizations, is a sign that ongoing security improvements are not considered as important as other features.

Percona: What are looking forward to the most at Percona Live Europe this year?

Ronald: Percona Live Europe is a chance for all attendees, including myself, to see, hear and share in the wide industry use of MySQL today (and the possibilities tomorrow).

With eight sessions per time slot, I often wish for the ability to be in multiple places at  once! Of particular interest to myself are new features that drive innovation of the product, such as MySQL group replication.

I am also following efforts related to deploying your application stack in containers using Docker. Solving the state and persistence needs of a database is very different to providing application micro-services. I hope to get a better appreciation for finding a balance between the use of containers, VMs and dedicated hardware in a MySQL stack that promotes accelerated development, performance, business continuity and security.

You can read more about Ronald and his thoughts on database security at ronaldbradford.com.

Want to find out more about Ronald, MySQL/MariaDB and security? Register for Percona Live Europe 2016, and come see his talk Securing your MySQL/MariaDB data.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Categories: MySQL

Percona Server Critical Update CVE-2016-6662

MySQL Performance Blog - Mon, 2016-09-12 14:06

This blog is an announcement for a Percona Server update with regards to CVE-2016-6662.

We have added a fix for CVE-2016-6662 in the following releases:

From seclist.org:

An independent research has revealed multiple severe MySQL vulnerabilities. This advisory focuses on a critical vulnerability with a CVEID of CVE-2016-6662. The vulnerability affects MySQL servers in all version branches (5.7, 5.6, and 5.5) including the latest versions, and could be exploited by both local and remote attackers.

Both the authenticated access to MySQL database (via network connection or web interfaces such as phpMyAdmin) and SQL Injection could be used as exploitation vectors. Successful exploitation could allow attackers to execute arbitrary code with root privileges which would then allow them to fully compromise the server on which an affected version of MySQL is running.

This is a CRITICAL update, and the fix mitigates the potential for remote root code execution.

We encourage our users to update to the latest version of their particular fork as soon as possible, ensuring they have appropriate change management procedures in place beforehand so they can test the update before placing it into production.

Percona would like to thank Dawid Golunski of http://legalhackers.com/ for disclosing this vulnerability in the MySQL software, and working with us to resolve this problem.

Categories: MySQL

Don’t Spin Your Data, Use SSDs!

MySQL Performance Blog - Fri, 2016-09-09 20:49

This blog post discusses the advantages of SSDs over HDDs for database environments.

For years now, I’ve been telling audiences for my MySQL Performance talk the following: if you are running an I/O-intensive database on spinning disks you’re doing it wrong. But there are still a surprising number of laggards who aren’t embracing SSD storage (whether it’s for cost or reliability reasons).

Let’s look at cost first. As I write this now (September 2016), high-performance server-grade spinning hard drives run for about $240 for 600GB (or $0.40 per GB).  Of course, you can get an 8TB archive drive at about same price (about $0.03 per GB), but it isn’t likely you’d use something like that for your operational database. At the same time, you can get a Samsung 850 EVO drive for approximately $300 (or $0.30 per GB), which is cheaper than the server-grade spinning drive!  

While it’s not the best drive money can buy, it is certainly an order of magnitude faster than any spinning disk drive!

(I’m focusing on the cost per GB rather than the cost of the number of IOPS per drive as SSDs have overtaken HDDs years ago when it comes to IOPS/$.)

If we take a look at Amazon EBS pricing, we will find that Amazon has moved to SSD volumes by default as “General Purpose” storage (gp2). Prices for this volume type run about 2x higher per GB than high-performance HDD-based volumes (st1) and provisioned IOPS volumes. The best volumes for databases will likely run you 4x higher than HDD.

This appears to be a significant cost difference, but keep in mind you can get much more IOPS at much better latency from these volumes. They also handle IO spikes better, which is very important for real workloads.

Whether we’re looking at a cloud or private environment, it is wrong just to look at the cost of the storage alone – you must look at the whole server cost. When using an SSD, you might not need to buy a RAID card with battery-backed-up (BBU) cache, as many SSDs have similar functions built in.

(For some entry-level SSDs, there might be an advantage to purchasing a RAID with BBU, but it doesn’t affect performance nearly as much as for HDDs. This works out well, however, as entry level SSDs aren’t going to cost that much to begin with and won’t make this setup particularly costly, relative to a higher-end SSD.)  

Some vendors can charge insane prices for SSDs, but this is where you should negotiate, and use your power to choose an alternative vendor.

Some folks are concerned they can't get as much storage per server with SSDs because they are smaller. This was the case a few years back, but not any more. You can easily find a 2TB 2.5” SSD drive, which is larger than the available 2.5” spinning drives. You can go as high as 13TB in the 2.5” form factor.

There is a bit of a challenge if you're looking at the NVMe (PCI-E) cards, as you typically can't have as many of those per server as you could using spinning disks, but the situation is changing here as well with the 6.4TB SX300 from Sandisk/FusionIO or the PM1725 from Samsung. Directly attached storage provides extremely high performance and 10TB-class sizes.

To tie multiple storage units together, you can use hardware RAID, software RAID or LVM striping; some file systems (such as ZFS) can also take care of this for you.

Where do we stand with SSD reliability? In my experience, modern SSDs (even inexpensive ones) are pretty reliable, particularly for online data storage. The shelf life of unpowered SSDs is likely to be less than HDDs, but we do not really keep servers off for long periods of time when running database workloads. Most SSDs also do something like RAID internally (it's called RAIN), in addition to error correction codes that protect your data from the complete failure of a single flash chip.

In truth, focusing on storage-level redundancy is overrated for databases. We want to protect most critical applications from complete database server failure, which means using some form of replication, storing several copies of data. In this case, you don’t need bulletproof storage on a single server – just a replication setup where you won’t lose the data and any server loss is easy to handle. For MySQL, solutions like Percona XtraDB Cluster come handy. You can use external tools such as Orchestrator or MHA to make MySQL replication work.  

When it comes to comparing SSD vs. HDD performance, whatever you do with SSDs they will likely still perform better than HDDs. Your RAID5 and RAID6 arrays made from SSDs will beat your RAID10 and RAID0 made from HDDs (unless your RAID card is doing something nasty).

Another concern with SSD reliability is write endurance. SSDs indeed have a specified amount of writes they can handle (after which they are likely to fail). If you’re thinking about replacing HDDs with SSDs, examine how long SSDs would endure under a comparable write load.  

If we're looking at a high HDD write workload, a single device is likely to handle 200 write IOPS of 16KB (when running InnoDB). Let's double that. That comes to 6.4MB/sec, which gives us 527GB/day (doing this 24/7). Even with the inexpensive Samsung 850 Pro, we get 300TB of official write endurance – enough for 1.5 years. And in reality, drives tend to last well beyond their official specs.
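The arithmetic from the previous paragraph, spelled out (the numbers are the same rough estimates as above):

# Rough endurance math, reproducing the estimates above.
write_iops    = 2 * 200            # double a heavy 200 write IOPS HDD workload
io_size       = 16 * 1024          # 16KB InnoDB page writes
bytes_per_sec = write_iops * io_size           # ~6.4MB/sec
bytes_per_day = bytes_per_sec * 86400          # ~527GB/day, running 24/7
endurance     = 300 * 10**12       # 300TB official write endurance (850 Pro)

print(endurance / float(bytes_per_day) / 365)  # ~1.5 years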

If you don’t like living on the edge, more expensive server-grade storage options have much better endurance. For example, 6.4TB SX300 offers almost 100x more endurance at 22 Petabytes written.

In my experience, people often overestimate how many writes their application performs on a sustained basis. The best approach is to do the math, but also to monitor the drive status with a SMART utility or vendor tool. These tools can alert you in advance when a drive is wearing out (a small sketch follows below).
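As a sketch of the monitoring side (assuming smartmontools is installed, and keeping in mind that wear attribute names vary by vendor and model):

import subprocess

# Dump the vendor SMART attributes for the drive and grep for wear counters,
# e.g., Wear_Leveling_Count (Samsung) or Media_Wearout_Indicator (Intel).
out = subprocess.check_output(["smartctl", "-A", "/dev/sda"]).decode()
for line in out.splitlines():
    if "Wear" in line:
        print(line)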

Whatever your workload is, you will likely find an SSD solution that offers you enough endurance while significantly exceeding the performance of an HDD-based solution.

Finally, there is a third and very important component of SSD reliability for operational database workloads: not losing your data during a power failure. Many “consumer-grade” SSDs come with drive write cache enabled by default, but without proper power loss protection. This means you can lose some writes during a power failure, causing data loss or database corruption.

Disabling write cache is one option, though it can severely reduce write performance and does not guarantee data won’t be lost. Using enterprise-grade SSDs from a reputable vendor is another option, and testing SSDs yourself might be a good idea if you’re on a budget.  

Conclusion

When it comes to operational databases, whether your workload is on-premises or in the cloud: don't spin your data – use SSDs! There are choices and options for almost any budget and every workload.

Categories: MySQL

Basic Housekeeping for MySQL Indexes

MySQL Performance Blog - Fri, 2016-09-09 17:44

In this blog post, we’ll look at some of the basic housekeeping steps for MySQL indexes.

We all know that indexes can be the difference between a high-performance database and a bad/slow/painful query ride. It's a critical part that deserves some housekeeping once in a while. So, what should you check? In no particular order, here are some things to look at:

1. Unused indexes

With sys schema, it is pretty easy to find unused indexes: use the schema_unused_indexes view.

mysql> select * from sys.schema_unused_indexes;
+---------------+-----------------+-------------+
| object_schema | object_name     | index_name  |
+---------------+-----------------+-------------+
| world         | City            | CountryCode |
| world         | CountryLanguage | CountryCode |
+---------------+-----------------+-------------+
2 rows in set (0.01 sec)

This view is based on the performance_schema.table_io_waits_summary_by_index_usage table, which will require enabling the Performance Schema, the events_waits_current consumer and the wait/io/table/sql/handler instrument. PRIMARY (key) indexes are ignored.

If you don’t have them enabled, just execute these queries:

update performance_schema.setup_consumers set enabled = 'yes' where name = 'events_waits_current';
update performance_schema.setup_instruments set enabled = 'yes' where name = 'wait/io/table/sql/handler';

Quoting the documentation:

“To trust whether the data from this view is representative of your workload, you should ensure that the server has been up for a representative amount of time before using it.”

And by representative amount, I mean representative: 

  • Do you have a weekly job? Wait at least one week
  • Do you have monthly reports? Wait at least one month
  • Don’t rush!

Once you’ve found unused indexes, remove them.
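If there are many of them, a small script can generate the DROP statements for review. This is a sketch using MySQL Connector/Python with placeholder credentials; it only prints the statements, it does not execute them:

import mysql.connector

cnx = mysql.connector.connect(host="127.0.0.1", user="root", password="xxx")
cur = cnx.cursor()
cur.execute("SELECT object_schema, object_name, index_name "
            "FROM sys.schema_unused_indexes")
for schema, table, index in cur:
    # Print, don't execute: a human should approve every drop.
    print("ALTER TABLE `%s`.`%s` DROP INDEX `%s`;" % (schema, table, index))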

2. Duplicated indexes

You have two options here:

  • pt-duplicate-key-checker
  • the schema_redundant_indexes view from sys_schema

The pt-duplicate-key-checker is part of Percona Toolkit. The basic usage is pretty straightforward:

[root@e51d333b1fbe mysql-sys]# pt-duplicate-key-checker
# ########################################################################
# world.CountryLanguage
# ########################################################################

# CountryCode is a left-prefix of PRIMARY
# Key definitions:
#   KEY `CountryCode` (`CountryCode`),
#   PRIMARY KEY (`CountryCode`,`Language`),
# Column types:
#   `countrycode` char(3) not null default ''
#   `language` char(30) not null default ''
# To remove this duplicate index, execute:
ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`;

# ########################################################################
# Summary of indexes
# ########################################################################

# Size Duplicate Indexes   2952
# Total Duplicate Indexes  1
# Total Indexes            37

Now, the schema_redundant_indexes view is also easy to use once you have sys schema installed. The difference is that it is based on the information_schema.statistics table:

mysql> select * from schema_redundant_indexes\G
*************************** 1. row ***************************
              table_schema: world
                table_name: CountryLanguage
      redundant_index_name: CountryCode
   redundant_index_columns: CountryCode
redundant_index_non_unique: 1
       dominant_index_name: PRIMARY
    dominant_index_columns: CountryCode,Language
 dominant_index_non_unique: 0
            subpart_exists: 0
            sql_drop_index: ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`
1 row in set (0.00 sec)

Again, once you find the redundant index, remove it.

3. Potentially missing indexes

The statements summary tables from the performance schema have several interesting fields. For our case, two of them are pretty important: NO_INDEX_USED (means that the statement performed a table scan without using an index) and NO_GOOD_INDEX_USED (“1” if the server found no good index to use for the statement, “0” otherwise).

Sys schema has one view that is based on the performance_schema.events_statements_summary_by_digest table, and is useful for this purpose: statements_with_full_table_scans, which lists all normalized statements that have done a table scan.

For example:

mysql> select * from world.CountryLanguage where isOfficial = 'F';
746 rows in set (0.00 sec)

mysql> select * from statements_with_full_table_scans\G
*************************** 1. row ***************************
                   query: SELECT * FROM `world` . `Count ... guage` WHERE `isOfficial` = ?
                      db: world
              exec_count: 1
           total_latency: 739.87 us
     no_index_used_count: 1
no_good_index_used_count: 0
       no_index_used_pct: 100
               rows_sent: 746
           rows_examined: 984
           rows_sent_avg: 746
       rows_examined_avg: 984
              first_seen: 2016-09-05 19:51:31
               last_seen: 2016-09-05 19:51:31
                  digest: aa637cf0867616c591251fac39e23261
1 row in set (0.01 sec)

The above query doesn’t use an index because there was no good index to use, and thus was reported. See the explain output:

mysql> explain select * from world.CountryLanguage where isOfficial = 'F'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Note that the “query” field reports the query digest (more like a fingerprint) instead of the actual query.

In this case, the CountryLanguage table is missing an index over the “isOfficial” field. It is your job to decide whether it is worth it to add the index or not.

4. Multiple column indexes order

It was explained before that a multiple-column index beats an index merge in all cases where such an index can be used, even if sometimes you might have to use index hints to make it work.

But when using them, don’t forget that the order matters. MySQL will only use a multi-column index if at least one value is specified for the first column in the index.

For example, consider this table:

mysql> show create table CountryLanguage\G
*************************** 1. row ***************************
       Table: CountryLanguage
Create Table: CREATE TABLE `CountryLanguage` (
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `Language` char(30) NOT NULL DEFAULT '',
  `IsOfficial` enum('T','F') NOT NULL DEFAULT 'F',
  `Percentage` float(4,1) NOT NULL DEFAULT '0.0',
  PRIMARY KEY (`CountryCode`,`Language`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

A query against the field “Language” won’t use an index:

mysql> explain select * from CountryLanguage where Language = 'English'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Simply because "Language" is not a leftmost prefix of the Primary Key. If we add the "CountryCode" field to the condition, the index will be used:

mysql> explain select * from CountryLanguage where Language = 'English' and CountryCode = 'CAN'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: const
possible_keys: PRIMARY,CountryCode
          key: PRIMARY
      key_len: 33
          ref: const,const
         rows: 1
        Extra: NULL

Now you'll also have to consider the selectivity of the fields involved. Which is the preferred order?

In this case, the “Language” field has a higher selectivity than “CountryCode”:

mysql> select count(distinct CountryCode)/count(*), count(distinct Language)/count(*) from CountryLanguage;
+--------------------------------------+-----------------------------------+
| count(distinct CountryCode)/count(*) | count(distinct Language)/count(*) |
+--------------------------------------+-----------------------------------+
|                               0.2368 |                            0.4644 |
+--------------------------------------+-----------------------------------+

So in this case, if we create a multi-column index, the preferred order will be (Language, CountryCode).
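A sketch of such an index (the name is ours; remember it will only serve queries that filter on Language, or on Language plus CountryCode):

mysql> ALTER TABLE world.CountryLanguage ADD INDEX idx_lang_code (Language, CountryCode);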

Placing the most selective columns first is a good idea when there is no sorting or grouping to consider, and thus the purpose of the index is only to optimize WHERE lookups. You might need to choose the column order so that the index is as selective as possible for the queries you'll run most often.

Now, is this good enough? Not really. What about special cases where the table doesn't have an even distribution, and a single value appears far more often than all the others? In that case, no index will be good enough. Be careful not to assume that average-case performance is representative of special-case performance. Special cases can wreck performance for the whole application.

In conclusion, we depend heavily on proper indexes. Give them some love and care once in a while, and the database will be very grateful.

All the examples were done with the following MySQL and Sys Schema version:

mysql> select * from sys.version;
+-------------+-----------------+
| sys_version | mysql_version   |
+-------------+-----------------+
| 1.5.1       | 5.6.31-77.0-log |
+-------------+-----------------+

Categories: MySQL

MySQL Replication Troubleshooting: Q & A

MySQL Performance Blog - Thu, 2016-09-08 18:53

In this blog, I will provide answers to the Q & A for the MySQL Replication Troubleshooting webinar.

First, I want to thank everybody for attending the August 25 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: Hi Sveta. One question: how is it possible to get N previous events using the SHOW BINLOG EVENTS command? For example, the position is 999 and I want to analyze the previous five events. Is it possible?

A: No, there is no such option. You cannot get the previous five events using SHOW BINLOG EVENTS. However, you can use mysqlbinlog with the --stop-position option and tail its output.
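A sketch of that approach (the binary log file name and the amount to tail are placeholders):

shell> mysqlbinlog --stop-position=999 mysql-bin.000001 | tail -n 200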

Q: We are having issues with inconsistencies over time. We also have a lot of “waiting for table lock” statuses during high volume usage. Would changing these tables to InnoDB help the replicated database remain consistent?

A: Do you use MyISAM? Switching to InnoDB might help, but it depends on what types of queries you use. For example, if you often use the LOCK TABLE command, that will cause a "waiting for table lock" status for InnoDB too. Regarding data consistency between the master and slave, you need to use row-based replication.
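A minimal sketch of switching to row-based replication (SET GLOBAL only affects new sessions, so also persist binlog_format = ROW in my.cnf so it survives a restart):

mysql> SET GLOBAL binlog_format = 'ROW';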

Q: For semi-sync replication, what’s the master’s behavior when the master never received ACK from any of the slaves?

A: It will time out after rpl_semi_sync_master_timeout milliseconds, and then switch to asynchronous replication.
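You can inspect and adjust that timeout; a sketch (the value is in milliseconds, and 10000 is the default):

mysql> SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_timeout';
mysql> SET GLOBAL rpl_semi_sync_master_timeout = 10000;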

Q: We're using MySQL on r3.4xlarge EC2 instances (16 CPU). We use RBR, with innodb_read_io_threads and innodb_write_io_threads set to 4. We often experience lags. Would increasing these to eight offer better IO for slaves? What other parameters could boost slave IO?

A: Yes, an increased number of IO threads would most likely improve performance. Other parameters that could help are similar to the ones discussed in the "InnoDB Troubleshooting" and "Introduction to Troubleshooting Performance: What Affects Query Execution?" webinars. You need to pay attention to the InnoDB options that affect IO (innodb_thread_concurrency, innodb_flush_method, innodb_flush_log_at_trx_commit, innodb_flush_log_at_timeout) and general IO options, such as sync_binlog.
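A my.cnf sketch with illustrative values only (the InnoDB IO thread settings are not dynamic, so a restart is required; note that relaxing the flush settings trades crash safety for speed, which is usually only acceptable on a slave you can rebuild):

[mysqld]
innodb_read_io_threads         = 8
innodb_write_io_threads        = 8
innodb_flush_log_at_trx_commit = 2
sync_binlog                    = 0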

Q: How many masters can I have working together?

A: What do you mean by "how many masters can [you] have working together"? Do you mean circular replication or a multi-master setup? In either case, the only limitation is hardware. For a multi-master setup, you should ensure that the slave has enough resources to process all requests. For circular replication, ensure that each of the masters in the chain can handle the increasing number of writes as they replicate down the chain, without leading to a permanently increasing slave lag.

Q: What’s the best way to handle auto_increment?

A: Follow the advice in the user manual: set auto_increment_offset to a unique value on each of the servers, set auto_increment_increment to the number of servers, and never update auto-incremented columns manually.
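A my.cnf sketch for two servers writing into the same replication topology (the values follow the rule above and persist across restarts):

# on server 1
auto_increment_increment = 2
auto_increment_offset    = 1

# on server 2
auto_increment_increment = 2
auto_increment_offset    = 2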

Q: I configured multi-threaded replication. Sometimes the replication lag keeps increasing while the slave is doing "invalidating query cache entries (table)". How should I fine-tune it?

A: The status "invalidating query cache entries (table)" means that the slave SQL thread is invalidating query cache entries for tables changed by the command it is currently executing. To avoid this issue, keep the query cache small (not larger than 512 MB) and de-fragment it from time to time using the FLUSH QUERY CACHE command.
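A sketch of both steps (the 128 MB size is illustrative; FLUSH QUERY CACHE de-fragments the cache without dropping the cached queries):

mysql> SET GLOBAL query_cache_size = 128 * 1024 * 1024;
mysql> FLUSH QUERY CACHE;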

Q: Sometimes when IO is slow and during lag, we see the thread states "Reading event from the relay log" and "Waiting for master to send event". How do we troubleshoot this to get more details?

A: The "Waiting for master to send event" state shows that the slave IO thread sent a request for a new event, and is waiting for the event from the master. If you believe it hasn’t received the event in a timely fashion, check the error log files on both the master and slave for connection errors. If there is no error message, or if the message doesn’t provide enough information to solve the issue, use the network troubleshooting methods discussed in the “Troubleshooting hardware resource usage” webinar.
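As a first step on the slave, check Slave_IO_State, Last_IO_Errno and Last_IO_Error in the replication status output; a sketch:

mysql> SHOW SLAVE STATUS\G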


Categories: MySQL

Percona is Hiring: Director of Platform Engineering

MySQL Performance Blog - Thu, 2016-09-08 18:20

Percona is hiring a Director of Platform Engineering. Find out more!

At Percona, we recognize you need much more than just a database server to successfully run a database-powered infrastructure. You also need strong tools that deploy, manage and monitor the software. Percona’s Platform Engineering group is responsible just for that. They build next-generation open source solutions for the deployment, monitoring and management of open source databases.

This team is currently responsible for products such as Percona Toolkit, Percona Monitoring Plugins, and Percona Monitoring and Management.

Percona builds products that advance state-of-the-art open source software. Our products help our customers monitor and manage their databases. They help our services team serve customers faster, better and more effectively.

The leader of the Platform Engineering group needs a strong vision, as well as an understanding of market trends and best practices for automation, monitoring and management, both in the cloud and on premises. This person must have a technical operations background and experience building and leading engineering teams that have efficiently delivered high-quality software. The ideal candidate will also understand the nature of open source software development and have experience working with distributed teams.

This position is for a "player coach" – you will get your hands dirty writing code, performing quality assurance, writing great documentation and assisting customers with troubleshooting.

We are not looking for extensive experience with a particular programming language, but qualified candidates should be adept at learning new programming languages. Currently, our teams use a combination of Perl, Python, Go and JavaScript.

The Director of Platform Engineering reports to Vadim Tkachenko, CTO and VP of Engineering. They will also work closely with myself, other senior managers and experts at Percona.

Interested? Please apply here on Percona’s website.

Categories: MySQL

Percona Live Europe featured talk with Igor Canadi — Everything you wanted to know about MongoRocks

MySQL Performance Blog - Wed, 2016-09-07 17:47

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Igor Canadi, Software Engineer at Facebook, Inc. His talk will be on Everything you wanted to know about MongoRocks. MongoRocks is MongoDB with RocksDB storage engine. It was developed by Facebook, where it’s used to power mobile backend as a service provider Parse.

I had a chance to speak with Igor and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Igor: After I finished my undergrad at the University of Zagreb in Croatia, I joined the University of Wisconsin-Madison's Masters program. Even though UW-M is famous for its work on databases, during my two years there I worked in a different area. However, when I joined Facebook after school, I heard of a cool new project called RocksDB. Everything about building a new storage engine sounded exciting to me, although I had zero idea how thrilling the ride would actually be. The best part was working with and getting to know amazing people from Facebook, Parse, MongoDB, Percona, and many other companies that are using or experimenting with RocksDB.

Percona: Your talk is called “Everything you wanted to know about MongoRocks.” Briefly, what is MongoRocks and why did it get developed?

Igor: Back in 2014, MongoDB announced that they were building a pluggable storage engine API, which would enable MongoDB users to seamlessly choose the storage engine that works best for their workload. Their first prototype actually used RocksDB as the storage engine, which was very exciting for us. However, soon after they acquired WiredTiger, another great storage engine, and decided to abandon the MongoDB+RocksDB project. At the same time, Parse was running into scaling challenges with their MongoDB deployment. We decided to help out and take over the development of MongoRocks. We started rolling it out at Parse in March 2015 and completed the rollout in October. Running MongoRocks instead of MongoDB with the MMap storage engine resulted in much greater efficiency and lower latencies in some scenarios. Some of the experiences are captured in Parse's blog posts: http://blog.parse.com/announcements/mongodb-rocksdb-parse/ and http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-so-fast-it-makes-your-head-spin/

Percona: What are the workloads and database environments that are best suited for a MongoRocks deployment? Do you see an expansion of the solution to encompass other scenarios?

Igor: Generally speaking, MongoRocks should compress really well. Over the years of using LSM engines, we learned that their compression rates are hard to beat. The difference can sometimes be substantial. For example, many benchmarks of MyRocks, which is MySQL with the RocksDB storage engine, have shown that compressed InnoDB uses twice as much space as compressed RocksDB. With better compression, more of your data fits in memory, which could also improve read latencies and lower the stress on storage media. However, this is a tricky question to answer generally. It really depends on the metrics you care about. One great thing about Mongo and different storage engines is that the replication format is the same across all of them, so it's simple to try it out and see how it performs under your workload. You can just add an additional node that uses RocksDB to your replica set and monitor the metric you care about on that node.

Percona: What are the unique database requirements at Facebook that keep you awake at night? What would you most like to see feature-wise in MongoDB in the near future (or any database technology)?

Igor: One of the most exciting database projects that we’re working on at Facebook is MyRocks, which I mentioned previously. Currently, we use MySQL with InnoDB to store our Facebook graph and we are experimenting with replacing that with MyRocks. The main motivation behind the project is 2x better compression rates, but we also see better performance in some areas. If you’re attending Percona Live Europe I encourage you to attend either Mark Callaghan’s talk on MyRocks, or Yoshinori’s 3-hour tutorial to learn more.

Percona: What are you looking forward to the most at Percona Live Europe this year?

Igor: The best part of attending conferences is the people. I am looking forward to seeing old friends and meeting new ones. If you like to talk storage engines, hit me up!

You can read more about Igor's thoughts on MongoRocks on his Twitter feed.

Want to find out more about Igor, Facebook and MongoRocks? Register for Percona Live Europe 2016, and come see his talk Everything you wanted to know about MongoRocks.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conference has a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Categories: MySQL