All You Need to Know About GCache (Galera-Cache)

MySQL Performance Blog - Thu, 2016-11-17 01:21

This blog discusses some important aspects of GCache.

Why do we need GCache?

Percona XtraDB Cluster is a multi-master topology, where a transaction executed on one node is replicated on another node(s) of the cluster. This transaction is then copied over from the group channel to Galera-Cache followed by apply action.

The cache can be discarded immediately once the transaction is applied, but retaining it can help promote a node as a DONOR node serving write-sets for a newly booted node.

So in short, GCache acts as a temporary storage for replicated transactions.

How is GCache managed?

Naturally, the first choice to cache these write-sets is to use memory allocated pool, which is governed by gcache.mem_store. However, this is deprecated and buggy and shouldn’t be used.

Next on the list is on-disk files. Galera has two types of on-disk files to manage write-sets:

  • RingBuffer File:
    • A circular file (aka RingBuffer file). As the name suggests, this file is re-usable in a circular queue fashion, and is pre-created when the server starts. The size of this file is preconfigured and can’t be changed dynamically, so selecting a proper size for this file is important.
    • The user can set the size of this file using gcache.size. (There are multiple blogs about how to estimate size of the Galera Cache, which is generally linked to downtime. If properly planned, the next booting node will find all the missing write-sets in the cache, thereby avoiding need for SST.)
    • Write-sets are appended to this file and, when needed, the file is re-cycled for use.
  • On-demand page store:
    • If the transaction write-set is large enough not to fit in a RingBuffer File (actually large enough not to fit in half of the RingBuffer file) then an independent page (physical disk file) is allocated to cache the write-sets.
    • Again there are two types of pages:
      • Page with standard size: As defined by gcache.page_size (default=128M).
      • Page with non-standard page size: If the transaction is large enough not to fit into a standard page, then a non-standard page is created for the transaction. Let’s say gcache.page_size=1M and transaction write_set = 1.5M, then a separate page (in turn on-disk file) will be created with a size of 1.5M.

How long are on demand pages retained? This is controlled using following two variables:

  • gcache.keep_pages_size
    • keep_pages_size defines total size of allocated pages to keep. For example, if keep_pages_size = 10M then N pages that add up to 10M can be retained. If N pages add to more than 10M, then pages are removed from the start of the queue until the size falls below set threshold. A size of 0 means don’t retain any page.
  • gcache.keep_pages_count (PXC specific)
    • But before pages are actually removed, a second check is done based on page_count. Let’s say keep_page_count = N+M, then even though N pages adds up to 10M, they will be retained as the page_count threshold is not yet hit. (The exception to this is non-standard pages at the start of the queue.)

So in short, both condition must be satisfied. The recommendation is to use whichever condition is applicable in the user environment.

Where are GCache files located?

The default location is the data directory, but this can be changed by setting gcache.dir. Given the temporary nature of the file, and iterative read/write cycle, it may be wise to place these files in a faster IO disk. Also, the default name of the file is gcache.cache. This is configurable by setting

What if one of the node is DESYNCED and PAUSED?

If a node desyncs, it will continue to received write-sets and apply them, so there is no major change in gcache handling.

If the node is desynced and paused, that means the node can’t apply write-sets and needs to keep caching them. This will, of course, affect the desynced/paused node and the node will continue to create on-demand page store. Since one of the cluster nodes can’t proceed, it will not emit a “last committed” message. In turn, other nodes in the cluster (that can purge the entry) will continue to retain the write-sets, even if these nodes are not desynced and paused.

Categories: MySQL

Is Docker Good for Your Database?

MySQL Performance Blog - Wed, 2016-11-16 16:27

This blog post reviews the appropriateness of Docker and other container solutions for your database environment.

A few weeks back, I wrote a fairly high-level blog post about containers. It covered what you should consider when thinking about using Docker, rkt, LXC, etc. I hope you’ve taken the chance to give it a quick read. It’s a good way to understand some of the disciplines you need to consider before moving to a new technology. However, it sparked a conversation in our Solutions Engineering team. Hopefully, the same one that you’re having in your organization: should customers run their database in containers?

Before we start, I’ll admit that Percona uses containers. Percona Monitoring and Management (PMM for short) presents all of the pretty graphs and query analytics by running in a Docker container. We made that choice because the integration between the components is where we could provide the most value to users. Docker lets us distribute a single ready-to-go unit of awesomeness. In short, it has huge potential on the application side of your environment. 

However, for databases… here are some of our recommendations:

Quick n Dirty

Decision = NOT FOR DBs (as it sits right now)

This is not the case for every environment. It is the default that we think is the best recommendation for the majority of our customers. Please note, that I am only making this recommendation for your database. If you’re using microservices for your application today, then it could make more sense to containerize your database depending on the load characteristics of your database, your scaling needs and the skillset you currently have.


Lack of Synergy

Before you decide to shoot me, please take some time to understand where we’re coming from. First of all, people designed container solutions to deal with stateless applications that have ephemeral data. Containers spin up a quick microservice and then destroy it. This includes all the components of that container (including its cache and data). The transient nature of containers is because all of the components and services of that container are considered to be part of the container (essentially it’s all or nothing). Serving the container a data volume owned by the underlying OS by punching a hole through the container can be very challenging. Current methods are too unreliable for most databases.

Most of the development efforts put into the various solutions had one goal in mind: statelessness. There are solutions that can help keep your data persistent, but they are very quickly evolving. From what we can tell, they require a high level of complexity, that negate any efficiency gains due to increased operational complexity (and risk). To further my point, this is precisely the conclusion that we’ve come to time and again when we’ve reviewed any “real world” information about the use of containers (especially Docker).

They’re Just Not Stable Yet

These container solutions are meant for quick development and deployment of applications that are broken into tiny components: microservices. Normally, these applications evolve very quickly in organizations that are very software/developer driven. That seems to be how these container solutions (again, especially Docker) are developed as well. New features are pushed out with little testing and design. The main focus seems to be the latest featureset and being first to market. They “beg for forgiveness” instead of “ask for permission.” On top of that, backward compatibility (from what we can tell) is a distant concern (and even that might be an overstatement). This means that you’re going to have to have a mature Continuous Delivery and testing environment as well as a known and tested image repository for your containers.

These are awesome tools to have for the right use cases, but they take time, money, resources and experience. In speaking with many of our customers, this is just not where they’re at as an organization. Their businesses aren’t designed around software development, and they simply don’t have the checkbooks to support the resources needed to keep this hungry machine fed. Rather, they are looking for something stable and performant that can keep their users happy 24×7. I know that we can give them a performant, highly-available environment requires much less management if we strip out containers.

Is There Hope?

Absolutely, in fact, there’s a lot more than hope. There are companies running containers (including databases) at massive scale today! These are the types of companies that have very mature processes. Their software development is a core part of their business plan and value proposition. You probably know who I’m talking about: Uber, Google, Facebook (there are more, these are a just a few). There’s even a good rundown of how you can get persistence in containers from Joyent. But as I said before, the complexity needed to get the basic features necessary to keep your data alive and available (the most basic use of a database) is much too high. When containers have a better and more stable solution for persistent storage volumes, they will be one step closer to being ready, in my opinion. Even then, containerizing databases in most organizations that aren’t dealing with large scale deployments (50+ nodes) with wildly varying workloads is probably unnecessary.

Don’t’ Leave Us Hanging…

I realize that the statement “you’re probably not ready to containerize your database” does not constitute a solution. So here it is: the Solutions Engineering team (SolEng for short) has you covered. Dimitri Vanoverbeke is in the process of a great blog series on configuration management. Configuration management solutions can greatly increase the repeatability of your infrastructure, and make sure that your IT/App Dev processes are repeatable in the physical configuration of your environment. Automating this process can lead to great gains. However, this should make use of a mature development/testing process as part of your application development lifecycle. The marriage of process and technology creates stable applications and happy customers.

Besides configuration management as an enhanced solution, there are some services that can make the life of your operations team much easier. Service discovery and health checking come to mind. My favorite solution is Consul, which we use extensively in PMM for configuration and service metadata. Consul can make sure that your frontend applications and backend infrastructure are working from a real-time snapshot of the state of your services.


There is a lot to think about when it comes to managing an environment, especially when your application develops at a quick pace. With the crafty use of available solutions, you can reduce the overhead that goes into every release. On top of that, you can increase resiliency and availability. If you need our help, please reach out. We’d love to help you!

Categories: MySQL

Webinar Thursday, November 17: MySQL High Availability with Percona XtraDB Cluster 5.7

MySQL Performance Blog - Tue, 2016-11-15 18:37

Join Percona’s Percona XtraDB Cluster Lead Software Engineer Krunal Bauskar for a webinar on Thursday, November 17, 2016, at 7:30 am PST on MySQL High Availability with Percona XtraDB Cluster 5.7.

Percona XtraDB Cluster 5.7 is our brand new MySQL 5.7 compatible Galera-based high availability (HA) solution. Whether you’re new to MySQL clustering technology, or experienced with Galera-based replication, this tutorial provides great insights into working with the software, including:

  • New and unique Features XtraDB Cluster 5.7, including Cluster Safe Mode, instrumentation with Performance Schema and extended support for encrypted tablespace in multi-master topology
  • Seamless integration with ProxySQL for better HA and read/write splitting
  • Improved security with native data at rest encryption and secure networking
  • Native integration with Docker, optimized for Container World
  • Monitoring with Percona Monitoring and Management (PMM)
  • Improved stability with many critical bug fixes and improved error messaging

This tutorial will demonstrate how to set up XtraDB Cluster, complete with High Availability Proxy and Monitoring, as well as perform the most important MySQL high availability management operations.

Register for this webinar here.

Krunal Bauskar, Percona XtraDB Cluster Lead Software Engineer

Krunal joined Percona in September 2015. Before joining Percona, he worked as part of the InnoDB team at MySQL/Oracle. He authored most of the temporary table revamp work, in addition to many other features. In the past, he worked with Yahoo! Labs researching big data issues, as well as working for a database startup that is now part of Teradata. His interests mainly include data management at any scale – which he has been working at for more than decade now.

Categories: MySQL

Percona Monitoring and Management 1.0.6 is now available

MySQL Performance Blog - Tue, 2016-11-15 17:07

Percona announces the release of Percona Monitoring and Management 1.0.6 on November 15, 2016.

The instructions for installing or upgrading Percona Monitoring and Management 1.0.6 are available in the documentation. Detailed release notes are available here.

New in PMM Server:

  • Prometheus 1.2.2
  • External static files are now local for PMM home page
  • Metrics Monitor improvements:
    • Added Amazon RDS OS Metrics dashboard and CloudWatch data source.
    • Added the PMM Server host to metrics monitoring.
    • Refactored MongoDB dashboards.
    • Added File Descriptors graph to System Overview dashboard.
    • Added Mountpoint Usage graph to Disk Space dashboard.
  • Query Analytics improvements:
    • QAN data is now purged correctly.
    • QAN data retention is made configurable with QUERIES_RETENTION option. The default is eight days.
    • Various small fixes to Query Analytics.

New in PMM Client:

  • Fixes for mysql:queries service using Performance Schema as query source:
    • Fixed crash when DIGEST_TEXT is NULL.
    • Removed iteration over all query digests on startup.
    • Added sending of query examples to QAN if available (depends on the workload).
  • Added query source information for mysql:queries service in pmm-admin list output.
  • Added purge command to purge metrics data on the server.
  • Updated mongodb_exporter with RocksDB support and various fixes.
  • Removed --nodetype and --replset flags for mongodb:metrics. The --cluster flag is now optional.
    It is recommended to re-add mongodb:metrics service and purge existing MongoDB metrics using the purgecommand.
  • Enabled monitoring of file descriptors (requires re-adding linux:metrics service).
  • Improved full uninstallation when PMM Server is unreachable.
  • Added time drift check between server and client to pmm-admin check-network output.

Live demo of PMM is available at

We welcome your feedback and questions on our PMM forum.

About Percona Monitoring and Management
Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. It is developed by Percona in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

Categories: MySQL

Using Vault with MySQL

MySQL Performance Blog - Tue, 2016-11-15 00:31

Using Vault with MySQL

In my previous post I discussed using GPG to secure your database credentials. This relies on a local copy of your MySQL client config, but what if you want to keep the credentials stored safely along with other super secret information? Sure, GPG could still be used, but there must be an easier way to do this.

This post will look at a way to use Vault to store your credentials in a central location and use them to access your database. For those of you that have not yet come across Vault, it is a great way to manage your secrets – securing, storing and tightly controlling access. It has the added benefits of being able to handle leasing, key revocation, key rolling and auditing.

During this blog post we’ll accomplish the following tasks:

  1. Download the necessary software
  2. Get a free SAN certificate to use for Vault’s API and automate certificate renewal
  3. Configure Vault to run under a restricted user and secure access to its files and the API
  4. Create a policy for Vault to provide access control
  5. Enable TLS authentication for Vault and create a self-signed client certificate using OpenSSL to use with our client
  6. Add a new secret to Vault and gain access from a client using TLS authentication
  7. Enable automated, expiring MySQL grants


Before continuing onwards, I should drop in a quick note to say that the following is a quick example to show you how you can get Vault up and running and use it with MySQL, it is not a guide to production setup and does not cover High Availability (HA) implementations, etc.

Download time

We will be using some tools in addition to Vault, Let’s Encrypt, OpenSSL and json_pp (a command line utility using JSON::PP). For this post we’ll be using Ubuntu 16.04 LTS and we’ll presume that these aren’t yet installed.

$ sudo apt-get install letsencrypt openssl libjson-pp-perl

If you haven’t already heard of Let’s Encrypt then it is a free, automated, and open Certificate Authority (CA) enabling you to secure your website or other services without paying for an SSL certificate; you can even create Subject Alternative Name (SAN) certificates to make your life even easier, allowing one certificate to be used a number of different domains. The Electronic Frontier Foundation (EFF) provide Certbot, the recommended tool to manage your certificates, which is the new name for the letsencrypt software. If you don’t have letsencrypt/certbot in your package manager then you should be able to use the quick install method. We’ll be using json_pp to prettify the JSON output from the Vault API and openssl to create a client certificate.

We also need to download Vault, choosing the binary relevant for your Operating System and architecture. At the time of writing this, the latest version of Vault is 0.6.2, so the following steps may need adjusting if you use a different version.

# Download Vault (Linux x86_64), SHA256SUMS and signature $ wget # Import the GPG key $ gpg --keyserver --recv-keys 51852D87348FFC4C # Verify the checksums $ gpg --verify vault_0.6.2_SHA256SUMS.sig gpg: assuming signed data in `vault_0.6.2_SHA256SUMS' gpg: Signature made Thu 06 Oct 2016 02:08:16 BST using RSA key ID 348FFC4C gpg: Good signature from "HashiCorp Security <>" gpg: WARNING: This key is not certified with a trusted signature! gpg: There is no indication that the signature belongs to the owner. Primary key fingerprint: 91A6 E7F8 5D05 C656 30BE F189 5185 2D87 348F FC4C # Verify the download $ sha256sum --check <(fgrep vault_0.6.2_SHA256SUMS) OK # Extract the binary $ sudo unzip -j -d /usr/local/bin Archive: inflating: /usr/local/bin/vault

Let’s Encrypt… why not?

We want to be able to access Vault from wherever we are, we can put additional security in place to prevent unauthorised access, so we need to get ourselves encrypted. The following example shows the setup on a public server, allowing the CA to authenticate your request. More information on different methods can be found in the Certbot documentation.

$ sudo letsencrypt --webroot -w /home/www/vhosts/default/public -d -d #IMPORTANT NOTES: # - Congratulations! Your certificate and chain have been saved at # /etc/letsencrypt/live/ Your cert will # expire on 2017-01-29. To obtain a new or tweaked version of this # certificate in the future, simply run certbot again. To # non-interactively renew *all* of your certificates, run "certbot # renew" # - If you like Certbot, please consider supporting our work by: # # Donating to ISRG / Let's Encrypt: # Donating to EFF: #

That’s all it takes to get a SAN SSL certificate! The server that this was executed has a public webserver serving the domains that the certificates were requested for. During the request process a file is place in the specified webroot and is used to authenticate the domain(s) for the request. Essentially, the command said: and use /home/www/vhosts/default/public for the document root, so place your files there

Let’s Encrypt CA issues short-lived certificates (90 days), so you need to keep renewing them, but don’t worry as that is as easy as it was to create them in the first place! You can test that renewal works OK as follows (which will renew all certificates that you have without --dry-run):

$ sudo letsencrypt renew --dry-run # #------------------------------------------------------------------------------- #Processing /etc/letsencrypt/renewal/ #------------------------------------------------------------------------------- #** DRY RUN: simulating 'letsencrypt renew' close to cert expiry #** (The test certificates below have not been saved.) # #Congratulations, all renewals succeeded. The following certs have been renewed: # /etc/letsencrypt/live/ (success) #** DRY RUN: simulating 'letsencrypt renew' close to cert expiry #** (The test certificates above have not been saved.) # #IMPORTANT NOTES: # - Your account credentials have been saved in your Certbot # configuration directory at /etc/letsencrypt. You should make a # secure backup of this folder now. This configuration directory will # also contain certificates and private keys obtained by Certbot so # making regular backups of this folder is ideal.

Automating renewal

The test run for renewal worked fine, so we can now go and schedule this to take place automatically. I’m using systemd so the following example uses timers, but cron or similar could be used too. Here’s how to make systemd run the scheduled renew for you, running at 0600 – the rewew process will automatically proceed for any previously-obtained certificates that expire in less than 30 days.

$ sudo cat <<EOF > /etc/systemd/system/cert-renewal.service [Unit] Description=SSL renewal [Service] Type=simple ExecStart=/usr/bin/letsencrypt renew --quiet User=root Group=root EOF $ sudo cat <<EOF > /etc/systemd/system/cert-renewal.timer [Unit] Description=Automatic SSL renewal [Timer] OnCalendar=*-*-* 06:00:00 Persistent=true [Install] EOF $ sudo systemctl enable cert-renewal.timer Created symlink from /etc/systemd/system/ to /etc/systemd/system/cert-renewal.timer. $ sudo systemctl start cert-renewal.timer $ sudo systemctl list-timers NEXT LEFT LAST PASSED UNIT ACTIVATES Tue 2016-11-01 06:00:00 UTC 6h left n/a n/a cert-renewal.timer cert-renewal.service

Getting started with Vault

Firstly, a quick reminder that this is not an in-depth review, how-to or necessarily best-practice Vault installation as that is beyond the scope of this post. It is just to get you going to test things out, so please read up on the Vault documentation if you want to use it more seriously.

Whilst there is a development server that you can fire up with the command vault server -dev to get yourself testing a little quicker, we’re going to take a little extra time and configure it ourselves and make the data persistent. Vault supports a number of backends for data storage, including Zookeeper, Amazon S3 and MySQL, however the 3 maintained by HashiCorp are consul, file and inmem. The memory storage backend does not provide persistent data, so whilst there could possibly be uses for this it is really only useful for development and testing – it is the storage backend used with the -dev option to the server command. Rather than tackle the installation and configuration of Consul during this post, we’ll use file storage instead.

Before starting the server we’ll create a config, which can be written in one of 2 formats – HCL (HashiCorp Configuration Language) or JSON (JavaScript Object Notation). We’ll use HCL as it is a little cleaner and saves us a little extra typing!

# Create a system user $ sudo useradd -r -g daemon -d /usr/local/vault -m -s /sbin/nologin -c "Vault user" vault $ id vault uid=998(vault) gid=1(daemon) groups=1(daemon) # Create a config directory remove global access $ sudo mkdir /etc/vault /etc/ssl/vault $ sudo chown vault.root /etc/vault /etc/ssl/vault $ sudo chmod 750 /etc/vault /etc/ssl/vault $ sudo chmod 700 /usr/local/vault # Copy the certficates and key $ sudo cp -v /etc/letsencrypt/live/*pem /etc/ssl/vault /etc/letsencrypt/live/ -> /etc/ssl/vault/cert.pem /etc/letsencrypt/live/ -> /etc/ssl/vault/chain.pem /etc/letsencrypt/live/ -> /etc/ssl/vault/fullchain.pem /etc/letsencrypt/live/ -> /etc/ssl/vault/privkey.pem # Create a combined PEM certificate $ sudo cat /etc/ssl/vault/{cert,fullchain}.pem /etc/ssl/vault/fullcert.pem # Write the config to file $ cat <<EOF | sudo tee /etc/vault/demo.hcl listener "tcp" { address = "" tls_disable = 0 tls_cert_file = "/etc/ssl/vault/fullcert.pem" tls_key_file = "/etc/ssl/vault/privkey.pem" } backend "file" { path = "/usr/local/vault/data" } disable_mlock = true EOF

So, we’ve now set up a user and some directories to store the config, SSL certificate and key, and also the data, restricting access to the vault user. The config that we wrote specifies that we will use the file backend, storing data in /usr/local/vault/data, and the listener that will be providing TLS encryption using our certificate from Let’s Encrypt. The final setting, disable_mlock is not recommended for production and is being used to avoid some extra configuration during this post. More details about the other options available for configuration can be found in the Server Configuration section of the online documentation.

Please note that the Vault datadir should be kept secured as it contains all of the keys and secrets. In the example, we have done this by placing it in the vault user’s home directory and only allowing the vault user access. You can take this further by restricting local access (via logins) and access control lists

Starting Vault

Time to start the server and see if everything is looking good!

$ sudo -su vault vault server -config=/etc/vault/demo.hcl >/tmp/vault-debug.log 2>&1 & $ jobs [1] + running sudo -su vault vault server -config=/etc/vault/demo.hcl > /tmp/vault-debug.lo $ VAULT_ADDR= vault status Error checking seal status: Error making API request. URL: GET Code: 400. Errors: * server is not yet initialized

Whilst it looks like something is wrong (we need to initialize the server), it does mean that everything is otherwise working as expected. So, we’ll initialize Vault, which is a pretty simple task, but you do need to make note/store some of the information that you will be given by the server during initialization – the unseal tokens and initial root key. You should distribute these to somewhere safe, but for now we’ll store them with the config.

# Change to vault user $ sudo su -l vault -s /bin/bash (vault)$ export VAULT_ADDR= VAULT_SSL=/etc/ssl/vault # Initialize Vault and save the token and keys (vault)$ vault init 2>&1 | egrep '^Unseal Key|Initial Root Token' >/etc/vault/keys.txt (vault)$ chmod 600 /etc/vault/keys.txt # Unseal Vault (vault)$ egrep -m3 '^Unseal Key' /etc/vault/keys.txt | cut -f2- -d: | tr -d ' ' | while read key do vault unseal -ca-cert=${VAULT_SSL}/fullchain.pem -client-cert=${VAULT_SSL}/client.pem -client-key=${VAULT_SSL}/privkey.pem ${key} done Sealed: true Key Shares: 5 Key Threshold: 3 Unseal Progress: 1 Sealed: true Key Shares: 5 Key Threshold: 3 Unseal Progress: 2 Sealed: false Key Shares: 5 Key Threshold: 3 Unseal Progress: 0 # Check Vault status (vault)$ vault status Sealed: false Key Shares: 5 Key Threshold: 3 Unseal Progress: 0 Version: 0.6.2 Cluster Name: vault-cluster-ebbd5ec7 Cluster ID: 61ae8f54-f420-09c1-90bb-60c9fbfa18a2 High-Availability Enabled: false

There we go, the vault is initialized and the status command now returns details and confirmation that it is up and running. It is worth noting here that each time you start Vault it will be sealed, which means that it cannot be accessed until 3 unseal keys have been used with vault unseal – for additional security here you would ensure that a single person cannot know any 3 keys, so that it always requires more than one person to (re)start the service.

Setting up a policy

Policies allow you to set access control restrictions to determine the data that authenticated users have access to. Once again the documents used to write policies are in either the HCL or JSON format. They are easy to write and apply, the only catch being that the policies associated with a token cannot be changed (added/removed) once the token has been issued; you need to revoke the token and apply the new policies. However, If you want to change the policy rules then this can be done on-the-fly as modifications apply on the next call to Vault.

When we initialized the server we were given the initial root key and we now need to use that in order to start configuring the server.

(vault)$ export VAULT_TOKEN=$(egrep '^Initial Root Token:' /etc/vault/keys.txt | cut -f2- -d: | tr -d ' ')

We will create a simple policy that allows us to read the MySQL secrets, but prevent access to the system information and commands

(vault)$ cat <<EOF > /etc/vault/demo-policy.hcl path "sys/*" { policy = "deny" } path "secret/mysql/*" { policy = "read" capabilities = ["list", "sudo"] } EOF (vault)$ vault policy-write demo /etc/vault/demo-policy.hcl Policy 'demo' written.

We have only added one policy here, but you should really create as many policies as you need to suitably control access amongst the variety of humans and applications that may be using the service. As with any kind of data storage planning how to store your data is important, as it will help you write more compact policies with the level of granularity that you require. Writing everything in /secrets at the top level will most likely bring you headaches, or long policy definitions!

TLS authentication for MySQL secrets

We’re getting close to adding our first secret to Vault, but first of all we need a way to authenticate our access. Vault provides an API for access to your stored secrets, along with wealth of commands with direct use of the vault binary as we are doing at the moment. We will now enable the cert authentication backend, which allows authentication using SSL/TLS client certificates

(vault)$ vault auth-enable cert Successfully enabled 'cert' at 'cert'!

Generate a client certificate using OpenSSL

The TLS authentication backend accepts certificates that are either signed by a CA or self-signed, so let’s quickly create ourselves a self-signed SSL certificate using openssl to use for authentication.

# Create working directory for SSL managment and copy in the config $ mkdir ~/.ssl && cd $_ $ cp /usr/lib/ssl/openssl.cnf . # Create a 4096-bit CA $ openssl genrsa -des3 -out ca.key 4096 Generating RSA private key, 4096 bit long modulus ...........++ ..........................................................................++ e is 65537 (0x10001) Enter pass phrase for ca.key: Verifying - Enter pass phrase for ca.key: $ openssl req -config ./openssl.cnf -new -x509 -days 365 -key ca.key -out ca.crt Enter pass phrase for ca.key: You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [GB]: State or Province Name (full name) [Some-State]: Locality Name (eg, city) [Some-Place]: Organization Name (eg, company) [Percona]: Organizational Unit Name (eg, section) [Demo]: Comon Name (e.g. server FQDN or YOUR name) [ceri]: Email Address []: # Create a 4096-bit Client Key and CSR $ openssl genrsa -des3 -out client.key 4096 Generating RSA private key, 4096 bit long modulus ......................++ ..................................++ e is 65537 (0x10001) Enter pass phrase for client.key: Verifying - Enter pass phrase for client.key: $ openssl req -config ./openssl.cnf -new -key client.key -out client.csr Enter pass phrase for client.key: You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [GB]: State or Province Name (full name) [Some-State]: Locality Name (eg, city) [Some-Place]: Organization Name (eg, company) [Percona]: Organizational Unit Name (eg, section) [Demo]: Comon Name (e.g. server FQDN or YOUR name) [ceri]: Email Address []: Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []: An optional company name []: # Self-sign $ openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt Signature ok subject=/C=GB/ST=Some-State/L=Some-Place/O=Percona/OU=Demo/CN=ceri/ Getting CA Private Key Enter pass phrase for ca.key: # Create an unencrypted copy of the client key $ openssl rsa -in client.key -out privkey.pem Enter pass phrase for client.key: writing RSA key # Copy the certificate for Vault access $ sudo cp client.crt /etc/ssl/vault/user.pem

OK, there was quite a lot of information there. You can edit openssl.cnf to set reasonable defaults for yourself and save time. In brief, we have created our own CA, created a self-signed certificate and then created a single PEM certificate with a decrypted key (this avoids specifying the password to use it – you may wish to leave the password in place to add more security, assuming that your client application can request the password.

Adding an authorisation certificate to Vault

Now that we have created a certificate and a policy we now need to allow authentication to occur using the certificate. We will give the token a 1-hour expiration and allow access to the MySQL secrets via the demo policy that we created in the previous step.

(vault)$ vault write auth/cert/certs/demo display_name=demo policies=demo certificate=@${VAULT_SSL}/user.pem ttl=3600 Success! Data written to: auth/cert/certs/demo $ curl --cert user.pem --key privkey.pem ${VAULT_ADDR}/v1/auth/cert/login -X POST {"request_id":"d5715ce1-2c6c-20c8-83ef-ce6259ad9110","lease_id":"","renewable":false,"lease_duration":0,"data":null,"wrap_info":null,"warnings":null,"auth":{"client_token":"e3b98fac-2676-9f44-fdc2-41114360d2fd","accessor":"4c5b4eb5-4faf-0b01-b732-39d309afd216","policies":["default","demo"],"metadata":{"authority_key_id":"","cert_name":"demo","common_name":"","subject_key_id":""},"lease_duration":600,"renewable":true}}

Awesome! We requested out first client token using an SSL client certificate, we are logged it and we were given our access token (client_token) in the response that provides us with a 1 hour lease (lease_duration) to go ahead and make requests as a client without reauthentication, but there is nothing in the vault right now.

Ssshh!! It’s secret!

“The time has come,” the Vault master said, “to encrypt many things: our keys and passwords and top-secret notes, our MySQL DSNs and strings.”

Perhaps the easiest way to use Vault with your application is to store information there as you would do in a configuration file and read it when the application first requires it. An example of such information is the Data Source Name (DSN) for a MySQL connection, or perhaps the information needed to dynamically generate a .my.cnf. As this is about using Vault with MySQL we will do exactly that and store the user, password and connection method as our first secret, reading it back using the command line tool to check that it looks as expected.

(vault)$ $ vault write secret/mysql/test password="mysupersecretpassword" user="percona" socket="/var/run/mysqld/mysqld.sock" Success! Data written to: secret/mysql/test (vault)$ vault read secret/mysql/test Key Value --- ----- refresh_interval 768h0m0s password mysupersecretpassword socket /var/run/mysqld/mysqld.sock user percona

A little while back (hopefully less than 1 hour ago!) we authenticated using cURL and gained a token, so now that we have something secret to read we can try it out. Fanfares and trumpets at the ready…

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 2f1fb630-cbe9-a8c9-5931-515a12d79291' ${VAULT_ADDR}/v1/secret/mysql/test -X GET 2>/dev/null | json_pp { "wrap_info" : null, "lease_id" : "", "request_id" : "c79033b1-f8f7-be89-4208-44d721a55804", "auth" : null, "data" : { "password" : "mysupersecretpassword", "socket" : "/var/run/mysqld/mysqld.sock", "user" : "percona" }, "lease_duration" : 2764800, "renewable" : false, "warnings" : null }

We did it! Now there is no longer the need to store passwords in your code or config files, you can just go and get them from Vault when you need them, such as when your application starts and holding them in memory, or on-demand if your application can tolerate any additional latency, etc. You would need to take further steps to make sure that your application is tolerant of Vault going down, as well as providing an HA setup of Vault to minimise the risk of the secrets being unavailable.

It doesn’t stop here though…

On-demand MySQL grants

Vault acts like a virtual filesystem and uses the generic storage backend by default, mounted as /secret, but due to powerful abstraction it is possible to use many other backends as mountpoints such as an SQL database, AWS IAM, HSMs and much more. We have kept things simple and been using the generic backend so far. You can view the available (mounted) backends using the mounts command:

(vault)$ vault mounts Path Type Default TTL Max TTL Description secret/ generic system system generic secret storage sys/ system n/a n/a system endpoints used for control, policy and debugging

We are now going to enable the MySQL backend, add the management connection (which will use the auth_socket plugin) and then request a new MySQL user that will auto-expire!

# Create a dedicated MySQL user account $ mysql -Bsse "CREATE USER vault@localhost IDENTIFIED WITH auth_socket; GRANT CREATE USER, SELECT, INSERT, UPDATE ON *.* TO vault@localhost WITH GRANT OPTION;" # Enable the MySQL backend and set the connection details (vault)$ vault mount mysql (vault)$ vault write mysql/config/connection connection_url="vault:vault@unix(/var/run/mysqld/mysqld.sock)/" Read access to this endpoint should be controlled via ACLs as it will return the connection URL as it is, including passwords, if any. # Write the template for the readonly role (vault)$ vault write mysql/roles/readonly sql="CREATE USER '{{name}}'@'%' IDENTIFIED WITH mysql_native_password BY '{{password}}' PASSWORD EXPIRE INTERVAL 1 DAY; GRANT SELECT ON *.* TO '{{name}}'@'%';" Success! Data written to: mysql/roles/readonly # Set the lease on MySQL grants (vault)$ vault write mysql/config/lease lease=1h lease_max=12h Success! Data written to: mysql/config/lease

Here you can see that a template is created so that you can customise the grants per role. We created a readonly role, so it just has SELECT access. We have set an expiration on the account so that MySQL will automatically mark the password as expired and prevent access. This is not strictly necessary since Vault will remove the user accounts that it created as it expires the tokens, but by adding an extra level in MySQL it would allow you to set the lease, which seems to be global, in Vault to a little longer than required and vary it by role using MySQL password expiration. You could also use it as a way of tracking which Vault-generated MySQL accounts are going to expire soon. The important part is that you ensure that the application is tolerant of reauthentication, whether it would hand off work whilst doing so, accept added latency, or perhaps the process would terminate and respawn.

Now we will authenticate and request our user to connect to the database with.

$ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' ${VAULT_ADDR}/v1/auth/cert/login -X POST 2>/dev/null | json_pp { "auth" : { "policies" : [ "default", "demo" ], "accessor" : "2e6d4b95-3bf5-f459-cd27-f9e35b9bed16", "renewable" : true, "lease_duration" : 3600, "metadata" : { "common_name" : "", "cert_name" : "demo", "authority_key_id" : "", "subject_key_id" : "" }, "client_token" : "018e6feb-65c4-49f2-ae30-e4fbba81e687" }, "lease_id" : "", "wrap_info" : null, "renewable" : false, "data" : null, "request_id" : "f00fe669-4382-3f33-23ae-73cec0d02f39", "warnings" : null, "lease_duration" : 0 } $ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 018e6feb-65c4-49f2-ae30-e4fbba81e687' ${VAULT_ADDR}/v1/mysql/creds/readonly -X GET 2>/dev/null | json_pp { "errors" : [ "permission denied" ] }

Oh, what happened? Well, remember the policy that we created earlier? We hadn’t allowed access to the MySQL role generator, so we need to update and apply the policy.

(vault)$ cat <<EOF | vault policy-write demo /dev/stdin path "sys/*" { policy = "deny" } path "secret/mysql/*" { policy = "read" capabilities = ["list", "sudo"] } path "mysql/creds/readonly" { policy = "read" capabilities = ["list", "sudo"] } EOF Policy 'demo' written.

Now that we have updated the policy to allow access to the readonly role (requests go via mysql/creds when requesting access) we can check that the policy has applied and whether we get a user account for MySQL.

# Request a user account $ curl --cert user.pem --key privkey.pem -H 'Content-type: application/json' -H 'X-Vault-Token: 018e6feb-65c4-49f2-ae30-e4fbba81e687' ${VAULT_ADDR}/v1/mysql/creds/readonly -X GET 2>/dev/null | json_pp { "request_id" : "7b45c9a1-bc46-f410-7af2-18c8e91f43de", "lease_id" : "mysql/creds/readonly/c661426c-c739-5bdb-cb7a-f51f74e16634", "warnings" : null, "lease_duration" : 3600, "data" : { "password" : "099c8f2e-588d-80be-1e4c-3c2e20756ab4", "username" : "read-cert-401f2c" }, "wrap_info" : null, "renewable" : true, "auth" : null } # Test MySQL access $ mysql -h localhost -u read-cert-401f2c -p Enter password: Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 17 Server version: 5.7.14-8-log Percona Server (GPL), Release '8', Revision '1f84ccd' Copyright (c) 2009-2016 Percona LLC and/or its affiliates Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql> show grants; +-----------------------------------------------+ | Grants for read-cert-401f2c@% | +-----------------------------------------------+ | GRANT SELECT ON *.* TO 'read-cert-401f2c'@'%' | +-----------------------------------------------+ 1 row in set (0.00 sec) # Display the full account information $ pt-show-grants --only='read-cert-401f2c'@'%' -- Grants dumped by pt-show-grants -- Dumped from server Localhost via UNIX socket, MySQL 5.7.14-8-log at 2016-11-08 23:28:37 -- Grants for 'read-cert-401f2c'@'%' CREATE USER IF NOT EXISTS 'read-cert-401f2c'@'%'; ALTER USER 'read-cert-401f2c'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*FF157E33408E1FBE707B5FF89C87A2D14E8430C2' REQUIRE NONE PASSWORD EXPIRE INTERVAL 1 DAY ACCOUNT UNLOCK; GRANT SELECT ON *.* TO 'read-cert-401f2c'@'%';

Hurrah! Now we don’t even need to go and create a user, the application can get one when it needs one. We’ve made the account auto-expire so that the credentials are only valid for 1 day, regardless of Vault expiration, and also we’ve reduced the amount of time that the token is valid, so we’ve done a pretty good job of limiting the window of opportunity for any rogue activity

We’ve covered quite a lot in this post, some detail for which has been left out to keep us on track. The online documentation for OpenSSL, Let’s Encrypt and Vault are pretty good, so you should be able to take a deeper dive should you wish to. Hopefully, this post has given a good enough introduction to Vault to get you interested and looking to test it out, as well as bringing the great Let’s Encrypt service to your attention so that there’s very little reason to not provide a secure online experience for your readers, customers and services.

Categories: MySQL

MongoDB Through a MySQL Lens

MySQL Performance Blog - Mon, 2016-11-14 19:56

This blog post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.

Delving into NoSQL coming from an exclusively SQL background can seem like a daunting task. I have worked with SQL in both small MySQL environments and large Oracle SQL environments. 

When is it a good choice?

MongoDB is an incredibly robust, scalable, and operator-friendly database solution. MongoDB is a good choice when your developers will also be responsible for the database environment. In small shops and startups, this might be the case. MongoDB stores information in BSON (binary JSON). BSON is the native JSON (JavaScript Object Notation) language used by MongoDB to retrieve information stored in BSON on the back end. JSON is easily relatable to other programming languages, and many developers will already have experience with it.

MongoDB is also a good option when you expect a great deal of your traffic to be writes. This is not to say that MySQL does not have good options when dealing with write-heavy environments, but MongoDB handles this with relative ease. Facebook designed the RocksDB storage engine for write-heavy environments, and performs well (with benchmark testing demonstrating this).

MongoDB is a good choice when you need a schemaless, or schema-flexible, data structure. MongoDB handles changes to your data organization with relative ease and grace. This is the selling point of NoSQL solutions. There have been many improvements in the MySQL world that make online schema changes possible, but the ease at which this is done in MongoDB has not yet been matched. The ability to create records without defining structure gives MongoDB added flexibility.

Another reason to choose MongoDB is its functionality with regards to replication setup, built-in sharding, and auto elections. Setting up a replicated environment in MongoDB is easy, and the auto-election process allows a secondary to take over in the event of a primary database failure. Built-in sharding allows for easy horizontal scaling, which can be more complicated to manage, setup and configure in a MySQL environment.

When should you choose something else?

MongoDB is a great choice for some use cases. It is also not a great choice for others. MongoDB might not be the right choice when your data is highly relational and structured. MongoDB does not support  transactions, but on a document level there is atomicity. There are configuration considerations to make for a replicated environment with regards to write concern, but these come at the cost of performance. Write concern verifies that replicas have written the information. By default, MongoDB sets the write concern to request acknowledgment from the primary only, not replicas. This can lead to consistency issues if there is a problem with the replicas.

How is the structure different?

Many concepts in the SQL world are relatable to the document structure of MongoDB. Let’s take a look at the high-level structure of a simple MongoDB environment to better understand how MongoDB is laid out.

The below chart relates MySQL to MongoDB (which is found in MongoDB’s documentation).

Another interesting note is the mongod process. This is a daemon that processes data requests, much the same as the mysqld process for MySQL. This is the process that listens for MongoDB requests, and manages access to the database. As with MySQL, there are a number of start-up options for the mongod process. One of the most important configuration options is --config which specifies a config file to use for your mongod instance. Slightly different from MySQL, this file uses YAML formatting. Below is an example config file for MongoDB. Please note this is to demonstrate formatting. It isn’t optimized for any production database.

By definition, MongoDB is a document store database. This chart gives you some idea of how that relates to the structure of MySQL or any SQL flavor. Instead of building a table and adding data, you can immediately insert documents into a collection without having to define a structure. This is one of the advantages in flexibility that MongoDB offers over MySQL. It is important to note that just because MongoDB offers this flexibility does not mean that organizing a highly functional production MongoDB database is effortless. Similar to choosing any database, thought should be put into the structure and goal of the database to avoid pitfalls down the line.

# mongod.conf, Percona Server for MongoDB # for documentation of all options, see: # # Where and how to store data. storage: dbPath: /var/lib/mongodb journal: enabled: true engine: rocksdb # where to write logging data. systemLog: destination: file logAppend: true path: /var/log/mongodb/mongod.log processManagement: fork: true pidFilePath: /var/run/ # network interfaces net: port: 27017 bindIp:

NOTE: YAML formatting does not handle tab. Use spaces to indent.

How is querying different?

Interacting with the database via the shell also offers something slightly different from SQL. JSON queries MongoDB. Again, this should be familiar to web developers which is one of the appeals of using MongoDB. Below is an example of a query translated from SQL to MongoDB. We have a user table with just usernames and an associated ID.


select username from user where id = 2;

In MongoDB:


In the JSON format, we specify the user collection to query from and then the ID associated with the document we are interested in. Finally, the field is specified from which we want the value. The result of this query would be the username of the user that has an ID of 2.

Final thoughts

MongoDB is not a silver bullet to your MySQL woes. As both databases continue to evolve, their weaknesses and strengths slowly start to blend together. Do not let the flexibility of MongoDB’s structure fool you into thinking that you must not have a plan for your database environment. This will surely lead to headaches down the road. The flexibility should allow for dynamic, fast changes not thoughtless ones. I encourage any MySQL user to get their hands on a MongoDB instance for testing purposes. MongoDB is a popular option in the e-commerce and gaming world because of its flexibility in schema design and its ability to scale horizontally with large amounts of data.

Categories: MySQL

Amazon AWS Service Tiers

MySQL Performance Blog - Fri, 2016-11-11 20:33

This blog post discusses the differences between the Amazon AWS service tiers.

Many people want to move to an Amazon environment but are unsure what AWS service makes the most sense (EC2, RDS, Aurora). For database services, the tiering at Amazon starts with EC2, then moves up to RDS, and on to Aurora. Amazon takes on more of the implementation and management of the database As you move up the tiers. This limits the optimization options. Obviously, moving up the tiers increases basic costs, but there are tradeoffs at each level to consider.

  • EC2 (Elastic Compute Cloud) is a basic cloud platform. It provides the user with complete control of the compute environment, while reducing your need to monitor and manage hardware. From a database perspective, you can do almost anything in EC2 that you could do running a database on your own hardware. You can tweak OS and database settings, plus do all of the normal database optimization work you would do in a bare metal environment. In EC2, you can run a single server, master/slave, or a cluster, and you can use MySQL, MongoDB, or any other product. You can use AWS Snapshot Manager to take backups, or you can use another backup tool. This option is ideal if you want all the flexibility of running your own hardware without the hassles of daily hardware maintenance.
  • RDS (Relational Data Service) makes it easy to set up a relational database in the cloud. It offers similar resizing capabilities to EC2, but also automates a lot of tasks. RDS supports Aurora (more on that later), Postgres, MySQL, MariaDB, Oracle, and MSSQL. RDS simplifies deployment and automates some maintenance tasks. This means that you are limited in terms of the tweaks that you can implement at the OS and database configuration level. This means you will focus on query and schema changes to optimize a database in this environment. RDS also includes automated backups and provides options for read replicas that you can spread across multiple availability zones. You must consider and manage all these are all items in the EC2 world. This choice is great if you are looking to implement a database but don’t want (or know how) to take on a lot of the tasks, such as backups and replication setup, that are needed for a stable and highly available environment.
  • Aurora is one of the database options available through RDS. You might hear people refer to it either as Aurora or RDS Aurora (they’re both the same). With Aurora, Amazon takes on even more of the configuration and management options. This limits your optimization capabilities even more. It also means that there are far fewer things to worry about since Amazon handles so much of the administration. Aurora is MySQL-compatible, and is great if you want the power and convenience of MySQL with a minimum of effort on the hardware side. Aurora is designed to automatically detect database crashes and restart without the need for crash recovery or to rebuild the database cache. If the entire instance fails, Aurora will automatically failover to one of up to 15 read replicas.

With data in the cloud, security becomes a bigger concern. You continue to govern access to your content, platform, applications, systems ,and networks, just like you would with data stored in your own datacenter. Amazon’s cloud offerings also support highly secure environments, like HIPAA and PCI compliance. They have designed the cloud environment to be a secure database environment while maintaining the necessary access for use and administration, even in these more regulated environments.

Storing data in the cloud is becoming more common. Amazon offers multiple platform options and allows for easy scalability, availability, and reliability.

Categories: MySQL

Is Your Query Cache Really Disabled?

MySQL Performance Blog - Fri, 2016-11-11 18:46

This blog post was motivated by an internal discussion about how to fully disable query cache in MySQL.

According to the manual, we should be able to disable “Query Cache” on the fly by changing query_cache_type to 0, but as we will show this is not fully true. This blog will show you how to properly disable “query cache,” and how common practices might not be as good as we think.

Can we just disable it by changing variables, or does it requires a restart to avoid the global mutex? Let’s see how it works.

Some Query Cache context

The query cache stores the text of a “Select” statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.

But cacheable queries take out an “exclusive lock” on MySQL’s query cache. In addition, any insert, update, delete or other modifications to a table causes any relevant entries in the query cache to be flushed. If you see many “Waiting for query cache lock” in the processlist, you might be suffering from this exclusive lock. In this blog post, you can see how this global mutex in high concurrency can cause performance degradation.

If we are facing with this situation, how can we disable it?

Disabling Query Cache

There are two options that you can change: query_cache_type and query_cache_size.

So if we change query_cache_size to “0”, does it means the cache is disabled? Or we also have to change query_cache_type? Or both? And does MySQL require a restart to avoid the global mutex?

The source code shows us this:

int Query_cache::send_result_to_client(THD *thd, const LEX_CSTRING &sql) { ulonglong engine_data; Query_cache_query *query; #ifndef EMBEDDED_LIBRARY Query_cache_block *first_result_block; #endif Query_cache_block *result_block; Query_cache_block_table *block_table, *block_table_end; char *cache_key= NULL; size_t tot_length; Query_cache_query_flags flags; DBUG_ENTER("Query_cache::send_result_to_client"); /* Testing 'query_cache_size' without a lock here is safe: the thing we may loose is that the query won't be served from cache, but we save on mutex locking in the case when query cache is disabled. See also a note on double-check locking usage above. */ if (is_disabled() || thd->locked_tables_mode || thd->variables.query_cache_type == 0 || query_cache_size == 0) goto err; ...

MySQL is going to check if the query cache is enabled before it locks it. It is checking four conditions, and one of them has to be true. The last three could be obvious, but what is the “is_disabled()” function? Following the source code, we can find the next: sql_cache.h

void disable_query_cache(void) { m_query_cache_is_disabled= TRUE; } ... bool is_disabled(void) { return m_query_cache_is_disabled; }

void Query_cache::init() { DBUG_ENTER("Query_cache::init"); mysql_mutex_init(key_structure_guard_mutex, &structure_guard_mutex, MY_MUTEX_INIT_FAST); mysql_cond_init(key_COND_cache_status_changed, &COND_cache_status_changed); m_cache_lock_status= Query_cache::UNLOCKED; initialized = 1; /* If we explicitly turn off query cache from the command line query cache will be disabled for the reminder of the server life time. This is because we want to avoid locking the QC specific mutex if query cache isn't going to be used. */ if (global_system_variables.query_cache_type == 0) query_cache.disable_query_cache(); DBUG_VOID_RETURN; }

If the global_system_variables.query_cache_type == 0 condition is true it is going to call the disable_query_cache  function, which sets m_query_cache_is_disabled = True, so is_disabled going to be “True”. That means if we are setting query_cache_type to 0 in runtime, that should eliminate the global mutex. Let’s run some tests to confirm this and see if the global mutex disappears after changing query_cache_type to 0.

Running tests

Context on the tests:

  1. We ran simple OLTP tests using sysbench as follows:

sysbench --test="/usr/share/doc/sysbench/tests/db/oltp.lua" --report-interval=1 --max-time=120 --oltp-read-only=off --max-requests=0 --num-threads=4 --oltp-table-size=2000000 --mysql-host=localhost --mysql-db=test --db-driver=mysql --mysql-user=root run

  1. Important portion of my.cnf file:

query_cache_type =1 query_cache_limit = 1M query_cache_size =1G performance_schema_instrument='wait/synch/%Query_cache%=COUNTED'

So basically the tests were run for two minutes each while playing with query_cache_type and query_cache_size.

  1. Started MySQL with query_cache_type = 1 and query_cache_size=1G.
  2. Change query_cache_type to 0. As we can see nothing changed, MySQL is still using the query cache.
  3. But when we stopped sysbench and started again (closing and opening new connections), we can see there are no more inserts going into query cache. But we still can see the queries like “Not Cached” that means changing the query_cache_type applies only for the new connections, and we still can see some mutex.
  4. Restarted MySQL with query_cache_type = 0 and query_cache_size=0. Finally we disabled the query cache and all the mutex is disappeared.
  5. Restarted MySQL with query cache enabled.
  6. We changed query_cache_size=0 and it almost worked, we could disable query cache on the fly, but as we can see there is still some mutex activity.
  7. Changing query_cache_type=0 and restarting sysbench does not have any effect on the mutex.

So the only way to stop any activity around query cache requires restarting MySQL with query_cache_type = 0  and query_cache_size=0. Disabling it or even set it to “0” on runtime is not completely stopping mutex activity.

But why do we still need query_cache_size while in theory query_cache_type should be enough?

As referenced above, the manual says if query_cache_type = 0:

Do not cache results in or retrieve results from the query cache. Note that this does not deallocate the query cache buffer. To do that, you should set query_cache_size to 0.

Based on our test, if we change query_cache_type to 0, it still hits the cache.

So you might think “well, I don’t enable the query cache and use defaults to keep it disabled.” Keep reading, because you might be wrong. According to manual, starting from 5.6.8 query_cache_type=0 is set by default, but query_cache_size= 1048576  (1MB). This means that if we keep default configuration, we will still see activity in the query cache as follows:

mysql -e "show global status like 'qca%';" +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1031320 | | Qcache_hits | 0 | | Qcache_inserts | 0 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 423294 | | Qcache_queries_in_cache | 0 | | Qcache_total_blocks | 1 | +-------------------------+---------+

But if we just add query_cache_size=0  to my.cnf and check again (of course after restarting server):

mysql -e "show global status like 'qca%';" +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Qcache_free_blocks | 0 | | Qcache_free_memory | 0 | | Qcache_hits | 0 | | Qcache_inserts | 0 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 0 | | Qcache_queries_in_cache | 0 | | Qcache_total_blocks | 0 | +-------------------------+-------+

We finally get no query cache related activity at all. How much overhead is caused by this? We’re not fully sure because we didn’t perform benchmarks, but we like to see no activity when we don’t want to.
Now we’re wondering if this case requires a bug report. Stay tuned, we will publish results in the post soon.

Digging more code

Let’s have a look on store_query function. MySQL uses this function to store queries in the query cache. If we read the code we can find this:

if (thd->locked_tables_mode || query_cache_size == 0) DBUG_VOID_RETURN;

It only checks the query_cache_size, it does not check the type. Store_query is called in handle_query, which also does not check the query_chache_type.


There is some contradiction between checking the query cache and storing the data in the query cache, which needs further investigation. But as we can see it is not possible to fully disable the query cache on the fly by changing query_cache_type  or/and query_cache_size to 0. Based on the code and the tests, if you want to make sure the query cache is fully disabled, change query_cache_size and query_cache_type to 0 and restart MySQL.

Is a known fact that query cache can be a big point of contention, and we are not trying to benchmark the performance overhead since this mostly depends on the workload type. However, we still can see some overhead if the query cache is not fully disabled when MySQL is started.

Categories: MySQL

Thoughts About Column Compression, with Optional Predefined Dictionary

MySQL Performance Blog - Thu, 2016-11-10 23:05

This blog discusses column compression with an optional predefined dictionary.

Compression, more compression with different algorithms, compress again, compress multiple times!

Categories: MySQL

Database Daily Ops Series: GTID Replication

MySQL Performance Blog - Thu, 2016-11-10 17:19

This post discusses ways of fixing broken GTID replication.

This blog series is all about the daily stories we have in Managed Services, dealing with customers’ environments (mostly when we need to quickly restore a service level within the SLA time).

One of the issues we encounter daily is replication using the GTID protocol. While there are a lot of blogs written about this subject, I would like to just highlight GTID replication operations, and the way you can deal with broken replication.

Most of the time we face way more complex scenarios then the one I’m about to present as an example, but the main goal of this blog is to quickly highlight the tools that can be used to fix issues to resume replication.

After reading this blog, you might ask yourself “Now, we know how to fix replication, but what about consistency?” The next blog will be entirely focused on that matter, data consistency!

Little less talk, little more action…

Replication is broken, and the SHOW SLAVE STATUS command output looks like below:

mysql> show slave statusG *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host:                   Master_User: repl                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: mysqld-bin.000005           Read_Master_Log_Pos: 879                Relay_Log_File: mysqld-relay-bin.000009                 Relay_Log_Pos: 736         Relay_Master_Log_File: mysqld-bin.000005              Slave_IO_Running: Yes             Slave_SQL_Running: No               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 1062                    Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'                  Skip_Counter: 0           Exec_Master_Log_Pos: 634               Relay_Log_Space: 1155               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 1062                Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'   Replicate_Ignore_Server_Ids:              Master_Server_Id: 3                   Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806              Master_Info_File: mysql.slave_master_info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State:            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp: 161108 16:47:53                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3             Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3, 81a567a8-5852-11e6-92cb-0800274fb806:1                 Auto_Position: 1 1 row in set (0.00 sec)
When a slave configured to replicate using the GTID protocol breaks, pay attention to the SHOW SLAVE STATUS command output. You will find the Retrieved_Gtid_Set and Executed_Gtid_Set in the listed columns. You can see that the last global transaction ID retrieved from the current master was not executed (it’s going to appear on the Retrieved_Gtid_Set but not on the Executed_Gtid_Set, following the GTID format).

That means that the slave has retrieved a transaction that, for some other reason, it couldn’t execute. That’s the global transaction ID you need if you want to inject a fake transaction and get replication resumed. The fake transaction you inject takes the place of the one that has an SQL that cannot be executed due to an error found in Last_Error from the SHOW SLAVE STATUS.

Let’s analyze it: #: replication is broken due to error 1062, when the primary key of a particular table is violated Last_Errno: 1062 Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'   #: you can identify what is the global transaction id with problems, so, getting the replication streaming broken            Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3             Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3, 81a567a8-5852-11e6-92cb-0800274fb806:1
As shown above, we can see clearly that the transaction causing issues is global transaction ID number 4, coming from master under UUID 46fdb7ad-5852-11e6-92c9-0800274fb806. You can use the SHOW RELAYLOG EVENTS to try and verify that is the transaction’s query that is causing problems:

mysql> show relaylog events in 'mysqld-relay-bin.000009' from 736G *************************** 1. row ***************************    Log_name: mysqld-relay-bin.000009         Pos: 736  Event_type: Gtid   Server_id: 3 End_log_pos: 682        Info: SET @@SESSION.GTID_NEXT= '46fdb7ad-5852-11e6-92c9-0800274fb806:4' *************************** 2. row ***************************    Log_name: mysqld-relay-bin.000009         Pos: 784  Event_type: Query   Server_id: 3 End_log_pos: 755        Info: BEGIN *************************** 3. row ***************************    Log_name: mysqld-relay-bin.000009         Pos: 857  Event_type: Query   Server_id: 3 End_log_pos: 848        Info: insert into wb.t1 set i=1 *************************** 4. row ***************************    Log_name: mysqld-relay-bin.000009         Pos: 950  Event_type: Xid   Server_id: 3 End_log_pos: 879        Info: COMMIT /* xid=66 */ 4 rows in set (0.00 sec)
Before fixing and resuming the replication stream, we need to check why that INSERT query breaks replication. Let’s SELECT data and check the structure of table wb.t1:

mysql> select * from wb.t1; +---+ | i | +---+ | 1 | +---+ 1 row in set (0.00 sec)   mysql> show create table wb.t1; +-------+-----------------------------------------------------+ | Table | Create Table                                                                                                       | +-------+-----------------------------------------------------+ | t1    | CREATE TABLE `t1` (   `i` int(11) NOT NULL DEFAULT '0',   PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+-----------------------------------------------------+ 1 row in set (0.01 sec)
It’s clear that something is wrong somewhere else than just the database. It’s time to fix and resume replication, and catch up with the master data. GTID replication data uses the below tools:

  • pt-slave-restart
  • mysqlslavetrx
  • inject a fake/empty transaction


One of the easiest ways of resuming replication on slaves when replication is broken is using pt-slave-restart, which is part of Percona Toolkit. Once you find the above facts (mainly the master UUID of the problematic global transaction ID that broke slave replication), you can move forward using pt-slave-restart  with the GTID flag —-master-uuid. This passes the master’s UUID and it skips all global transactions breaking replication on a specific slave server, as you can see below:

[root@dbops02 ~]# pt-slave-restart --master-uuid 46fdb7ad-5852-11e6-92c9-0800274fb806 --host=localhost -u root 2016-11-08T17:24:09 h=localhost,u=root mysqld-relay-bin.000009         736 1062 2016-11-08T17:24:25 h=localhost,u=root mysqld-relay-bin.000010         491 1062 2016-11-08T17:24:34 h=localhost,u=root mysqld-relay-bin.000010         736 1062 2016-11-08T17:24:35 h=localhost,u=root mysqld-relay-bin.000010         981 1062 2016-11-08T17:24:36 h=localhost,u=root mysqld-relay-bin.000010        1226 1062
With the resources provided by pt-slave-restart, together with the above info, replication should resume. If you don’t have the Percona Toolkit package setup on your servers, make sure you follow these steps. It’s easier if you add the Percona Repository to your servers (you can use the Package Manager to install it for Debian-based and for RedHat-based systems).


To use mysqlslavetrx (which is part of MySQL Utilities developer by Oracle), I recommend you read the article written by Daniel Guzman, and install MySQL Utilities on your database servers. Using it to skip problematic transactions and inject fake ones is pretty straightforward as well .

So, find the below on the slave side:

         Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3             Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-8, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3, 81a567a8-5852-11e6-92cb-0800274fb806:1
Then use GTID_SUBTRACT as the first parameter, which you can find on Retrieved_Gtid_Set and the subset, which you can grab from the Executed_Gtid_Set. Use the UUID of the direct master in the function to find the global transaction IDs executed on slave:

#: this below function is pretty cool and will exactly shows you if the slave is lacking some #: GTIDs master have and vice-versa - this is going to help out using mysqlslavetrx to put #: master and slave consistently with the same binary logs contents mysql> SELECT GTID_SUBTRACT('46fdb7ad-5852-11e6-92c9-0800274fb806:1-13','46fdb7ad-5852-11e6-92c9-0800274fb806:1-8') gap; *************************** 1. row *************************** gap: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 1 row in set (0.00 sec)
Now we can use mysqlslavetrx to insert a fake transaction on the slave to resume replication, as below:

 [root@dbops02 mysql-utilities-1.6.2]# mysqlslavetrx --gtid-set=46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 --verbose --slaves=wb@localhost:3306 WARNING: Using a password on the command line interface can be insecure. # # GTID set to be skipped for each server: # - localhost@3306: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 # # Injecting empty transactions for 'localhost:3306'... # - 46fdb7ad-5852-11e6-92c9-0800274fb806:9 # - 46fdb7ad-5852-11e6-92c9-0800274fb806:10 # - 46fdb7ad-5852-11e6-92c9-0800274fb806:11 # - 46fdb7ad-5852-11e6-92c9-0800274fb806:12 # - 46fdb7ad-5852-11e6-92c9-0800274fb806:13 # #...done. #

When you get back to the MySQL client on the slave, you’ll see that the retrieved and executed out of SHOW SLAVE STATUS will point that they are in the same position:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3             Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3, 81a567a8-5852-11e6-92cb-0800274fb806:1
After, make sure you start the slave (mysqlslavetrx won’t start the replication as previous tool does).

Inject a Fake Transaction

Fake transactions are called empty transactions as well, but the fact is if a global transaction is affecting a slave, you must fake empty transactions that won’t affect data to resume replication and carry on processing the data streaming from the master aka replication! We need to know that it’s not going to affect future slaves, especially if this server becomes a new master after a failover/switchover process. You can get more information about Errant Transactions here and here.

mysql> stop slave; Query OK, 0 rows affected (0.01 sec)   mysql> set gtid_next='46fdb7ad-5852-11e6-92c9-0800274fb806:14'; Query OK, 0 rows affected (0.00 sec)   mysql> begin; commit; Query OK, 0 rows affected (0.00 sec)   Query OK, 0 rows affected (0.00 sec)   mysql> set gtid_next=automatic; Query OK, 0 rows affected (0.00 sec) Now, when you check retrieved and executed out of SHOW SLAVE STATUS, you can see the below:            Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3             Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3, 81a567a8-5852-11e6-92cb-0800274fb806:1
It’s time to start slave (and be happy)!

mysql> show slave statusG *************************** 1. row ***************************                Slave_IO_State: Waiting for master to send event                   Master_Host:                   Master_User: repl                   Master_Port: 3306                 Connect_Retry: 60               Master_Log_File: mysqld-bin.000005           Read_Master_Log_Pos: 3329                Relay_Log_File: mysqld-relay-bin.000011                 Relay_Log_Pos: 491         Relay_Master_Log_File: mysqld-bin.000005              Slave_IO_Running: Yes             Slave_SQL_Running: Yes               Replicate_Do_DB:           Replicate_Ignore_DB:            Replicate_Do_Table:        Replicate_Ignore_Table:       Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:                    Last_Errno: 0                    Last_Error:                  Skip_Counter: 0           Exec_Master_Log_Pos: 3329               Relay_Log_Space: 3486               Until_Condition: None                Until_Log_File:                 Until_Log_Pos: 0            Master_SSL_Allowed: No            Master_SSL_CA_File:            Master_SSL_CA_Path:               Master_SSL_Cert:             Master_SSL_Cipher:                Master_SSL_Key:         Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No                 Last_IO_Errno: 0                 Last_IO_Error:                Last_SQL_Errno: 0                Last_SQL_Error:   Replicate_Ignore_Server_Ids:              Master_Server_Id: 3                   Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806              Master_Info_File: mysql.slave_master_info                     SQL_Delay: 0           SQL_Remaining_Delay: NULL       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it            Master_Retry_Count: 86400                   Master_Bind:       Last_IO_Error_Timestamp:      Last_SQL_Error_Timestamp:                Master_SSL_Crl:            Master_SSL_Crlpath:            Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3             Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14, 4fbe2d57-5843-11e6-9268-0800274fb806:1-3, 81a567a8-5852-11e6-92cb-0800274fb806:1                 Auto_Position: 1 1 row in set (0.00 sec)

Categories: MySQL

Orchestrator and ProxySQL

MySQL Performance Blog - Wed, 2016-11-09 20:19

In this blog post, I am going to show you how can you use Orchestrator and ProxySQL together.

In my previous blog post, I showed how to use bash scripts and move virtual IPs with Orchestrator. As in that post, I assume you already have Orchestrator working. If not, you can find the installation steps here.

In the case of a failover, Orchestrator changes the MySQL topology and promotes a new master. But who lets the application know about this change? This is where ProxySQL helps us.


You can find the ProxySQL install steps here. In our test, we use the following topology:

For this topology we need the next rules in “ProxySQL”:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('',601,3306,1000,10); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('',601,3306,1000,10); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('',601,3306,1000,0); INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('',600,3306,1000,0); INSERT INTO mysql_replication_hostgroups VALUES (600,601,''); LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; insert into mysql_query_rules (username,destination_hostgroup,active) values('testuser_w',600,1); insert into mysql_query_rules (username,destination_hostgroup,active) values('testuser_r',601,1); insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('testuser_rw',601,1,3,'^SELECT'); LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK; insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_w','Testpass1.',1,600,'test',1); insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_r','Testpass1.',1,601,'test',1); insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('testuser_rw','Testpass1.',1,600,'test',1); LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

See the connection pool:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc; +-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 600 | | 3306 | ONLINE | 4 | 0 | 4 | 0 | 2833 | 224351 | 0 | 3242 | | 601 | | 3306 | ONLINE | 1 | 1 | 11 | 0 | 275443 | 11785750 | 766914785 | 431 | | 601 | | 3306 | ONLINE | 1 | 1 | 10 | 0 | 262509 | 11182777 | 712120599 | 1343 | | 601 | | 3306 | ONLINE | 1 | 1 | 2 | 0 | 40598 | 1733059 | 111830195 | 3242 | +-----------+----------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ 4 rows in set (0.00 sec)

It shows us “” is in “hostgroup” 600, which means that server is the master.

How does ProxySQL decide who the new master is?

ProxySQL does not know what the topology looks like, which is really important. ProxySQL is monitoring the “read_only” variables on the MySQL servers, and the server where read_only=off is going to get the writes. If the old master went down and we changed our topology, we have to change the read_only variables on the new master. Of course, applications like MHA or Orchestrator can do that for us.

We have two possibilities here: the master went down, or we want to promote a new master.

Master is down

If the master goes down, Orchestrator is going to change the topology and set the read_only = OFF on the promoted master. ProxySQL is going to realize the master went down and send the write traffic to the server where read_only=OFF.

Let’s do a test. After we stopped MySQL on “”, Orchestrator promoted “” as the new master. ProxySQL is using it now as a master:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc; +-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 600 | | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 790 | | 601 | | 3306 | ONLINE | 0 | 0 | 13 | 0 | 277953 | 11894400 | 774312665 | 445 | | 601 | | 3306 | ONLINE | 0 | 0 | 10 | 0 | 265056 | 11290802 | 718935768 | 790 | | 601 | | 3306 | SHUNNED | 0 | 0 | 2 | 0 | 42961 | 1833016 | 117959313 | 355 | +-----------+----------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ 4 rows in set (0.00 sec)

This happens quickly and does not require any application, VIP or DNS modification.

Promoting a new Master

When we perform a graceful-master-takeover with Orchestrator, it promotes a slave as a new master, removes the old master from the replicaset and sets read_only=ON.

From Orchestrator’s point of view, this is great. It promoted a slave as a new master, and old master is not part of the replicaset anymore. But as I mentioned earlier, ProxySQL does not know what the replicaset looks like.

It only knows we changed the read_only variables on some servers. It is going to send reads to the old master, but it does not have up-to-date data anymore. This is not good at all.

We have two options to avoid this.

Remove master from read hostgroup

If the master is not part of the read hostgroup, ProxySQL won’t send any traffic there after we promote a new master. But in this case, if we lose the slaves, ProxySQL cannot redirect the reads to the master. If we have a lot of slaves, and the replication stopped on the saves because of an error or mistake, the master probably won’t be able to handle all the read traffic. But if we only have a few slaves, it would be good if the master can also handle reads if there is an issue on the slaves.

Using Scheduler

In this great blog post from Marco Tusa, we can see that ProxySQL can use “Schedulers”. We can use the same idea here as well. I wrote a script based on Marco’s that can recognize if the old master is no longer a part of the replicaset.

The script checks the followings:

  • read_only=ON – the server is read-only (on the slave servers, this has to be ON)
  • repl_lag  is NULL – on the master, this should be NULL (if the seconds_behind_master is not defined, ProxySQL will report repl_lag is NULL)

If the read_only=ON, it means the server is not the master at the moment. But if the repl_lag is NULL, it means the server is not replicating from anywhere, and it probably was a master. It has to be removed from the Hostgroup.

Adding a Scheduler

INSERT INTO scheduler (id,interval_ms,filename,arg1) values (10,2000,"/var/lib/proxysql/","-u=admin -p=admin -h= -G=601 -P=6032 --debug=0 --log=/var/lib/proxysql/server_check"); LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

The script has parameters like username, password or port. But we also have to define the read Hostgroup (-G).

Let’s see what happens with ProsySQL after we run the command orchestrator -c graceful-master-takeover -i rep1 -d rep2 :

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc; +-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 600 | | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 504 | | 601 | | 3306 | ONLINE | 0 | 2 | 2 | 0 | 6784 | 238075 | 2175559 | 454 | | 601 | | 3306 | ONLINE | 0 | 0 | 2 | 0 | 6761 | 237409 | 2147005 | 504 | | 601 | | 3306 | OFFLINE_HARD | 0 | 0 | 2 | 0 | 6170 | 216001 | 0 | 435 | +-----------+----------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ 4 rows in set (0.00 sec)

As we can see, the status changed to OFFLINE_HARD:

mysql> select * from mysql_servers; +--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+ | hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment | +--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+ | 601 | | 3306 | ONLINE | 1000 | 0 | 1000 | 10 | 0 | 0 | | | 601 | | 3306 | ONLINE | 1000 | 0 | 1000 | 10 | 0 | 0 | | | 9601 | | 3306 | ONLINE | 1000 | 0 | 1000 | 0 | 0 | 0 | | | 600 | | 3306 | ONLINE | 1000 | 0 | 1000 | 10 | 0 | 0 | | +--------------+----------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+ 4 rows in set (0.00 sec)

This is because we changed the “hostgroup_id” to 9601. This is what we wanted so that the old master won’t get more traffic.


Because ProxySQL redirects the traffic based on the read_only  variables, it is important to start the servers with read_only=ON (even on the master). In that case, we can avoid getting writes on many servers at the same time.

If we want to use graceful-master-takeover with Orchestrator, we have to use a scheduler that can remove the old master from the read Hostgroup.

Categories: MySQL

Checking if a Slave Has Applied a Transaction from the Master

MySQL Performance Blog - Tue, 2016-11-08 21:08

In this blog post, we will discuss how we can verify if an application transaction executed on the master has been applied to the slaves.

In summary, is a good practice to alleviate the load on the master by doing reads on slaves. It is acceptable in most of the cases to just connect on slaves and issue selects. But there are some cases we need to ensure that the data we just applied on our master has been applied on the slaves before we query it.

One way to do this is using a built-in function called MASTER_POS_WAIT. This function receives a binary log name and position. It will block the query until the slave applies transactions up to that point, or timeout. Here is one example of how to use it:

-- insert our data on master master [localhost] {msandbox} (test) > INSERT INTO test VALUES (); Query OK, 1 row affected (0.00 sec) -- get the binlog file and position from master master [localhost] {msandbox} (test) > SHOW MASTER STATUS; +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | mysql-bin.000005 | 1591 | | | | +------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) -- connect on slave and run MASTER_POS_WAIT passing the binlog name and position slave [localhost] {msandbox} ((none)) > SELECT NOW(); SELECT MASTER_POS_WAIT('mysql-bin.000005', 1591); SELECT NOW(); +---------------------+ | NOW() | +---------------------+ | 2016-10-20 18:24:24 | +---------------------+ 1 row in set (0.00 sec) -- it will wait until the slave apply up to that point +-------------------------------------------+ | MASTER_POS_WAIT('mysql-bin.000005', 1591) | +-------------------------------------------+ | 1 | +-------------------------------------------+ 1 row in set (3.82 sec) +---------------------+ | NOW() | +---------------------+ | 2016-10-20 18:24:28 | +---------------------+ 1 row in set (0.00 sec)

Blocking the connection until the slave is in sync with the coordinate passed as a parameter on MASTER_POS_WAIT might not be affordable to all applications, however.

As an alternative, MySQL 5.6+ makes use of relay_log_info_repository configuration. If we set this variable to TABLE, MySQL stores the slave status information in the slave_relay_log_info table under mysql database. We must configure the sync_relay_log_info variable, and set to 1 in case we use non-transactional tables such as MyISAM. It forces slave_relay_log_info  to sync after each statement. So edit my.cnf on slaves:

relay_log_info_repository=TABLE sync_relay_log_info=1

Now we can query slave_relay_log_info directly to see if the slave we are connected to already applied the transaction we need:

master [localhost] {msandbox} (test) > INSERT INTO test VALUES (NULL); Query OK, 1 row affected (0.00 sec) master [localhost] {msandbox} (test) > SHOW MASTER STATUS; +------------------+----------+--------------+------------------+-------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+-------------------+ | mysql-bin.000003 | 366 | | | | +------------------+----------+--------------+------------------+-------------------+ 1 row in set (0.00 sec) slave1 [localhost] {msandbox} ((none)) > SELECT COUNT(*) FROM mysql.slave_relay_log_info WHERE (Master_log_name > 'mysql-bin.000003') OR ( Master_log_name = 'mysql-bin.000003' AND Master_log_pos >= '366' ); +----------+ | COUNT(*) | +----------+ | 1 | +----------+ 1 row in set (0.00 sec)


You can use relay_log_info_repository as a replacement for MASTER_POS_WAIT to check if a slave has applied a particular transaction. Since it won’t block your thread (in case the slave is not in sync) you will be able to either abort the operation or disconnect and move to the next slave.

Categories: MySQL

Percona Live 2017 Call for Papers Extended Deadline: November 13th

MySQL Performance Blog - Tue, 2016-11-08 18:01

We’ve extended the Percona Live Open Source Database Conference 2017 call for papers deadline until November 13th!

If you haven’t submitted already, please consider doing so. Speaking at Percona Live is a great way to build your personal and company brands, and if selected you will receive a complimentary full conference pass! For Percona Live 2017, we’re not just looking for MySQL and MongoDB topics, but also talks on other open source databases. 

The Percona Live 2017 Call for Papers is open until November 13, 2016. Do you have a MySQL, MongoDB, PostgreSQL or open source database use case to share, a skill to teach, or a big idea to discuss? We invite you to submit your speaking proposal for either breakout or tutorial sessions. This conference provides an opportunity to network with peers and technology professionals. It brings together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience.

Percona Live 2017 is looking for topics for Breakout Sessions, Tutorial Sessions, and Lightning Talks:

  • Breakout Session. Make submissions detailed, and clearly indicate the topic and content of your proposal for the Conference Committee. Sessions should either be 25 minutes or 50 minutes in length, including Q&A.
  • Tutorial Session. Make submissions detailed, and include an agenda for review by the Conference Committee. Tutorial sessions should present immediate and practical applications of in-depth knowledge of MySQL, MongoDB and open source database technologies. They should be presented at a level between a training class and a conference breakout session. Attendees are expected to have their laptops to work through detailed and potentially hands-on presentations. Tutorials will be 3 hours in length including Q&A. If you would like to submit your proposal as a full day, 6-hour tutorial, please indicate this in your submission.
  • Lightning Talks. Lightning talks are five-minute presentations focusing on one key point that will be of interest to the community. Talks can be technical, lighthearted, fun or otherwise entertaining submissions. These can include new ideas, a successful project, a cautionary story, quick tip or demonstration. This session is an opportunity for ideas to get the attention they deserve. The rules for this session are easy: five minutes and only five minutes. Use this time wisely to present the pertinent message of the subject matter and have fun doing so!

Submit your topics as soon as you can, the period closes on November 13, 2016!

Percona Live Open Source Database Conference 2017: Santa Clara, CA

The Percona Live Open Source Database Conference 2017 is the premier event for the diverse and active open source database community, as well as organizations that develop and use open source database software.

The conference will feature one day of tutorials and three days of keynote talks and breakout sessions related to open source databases and software. Learn about the hottest topics, building and maintaining high-performing deployments and what top industry leaders have to say.

The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Register for Percona Live 2017 now! Super Saver registration lasts until Nov 13. This year’s Percona Live Europe sold out, and we’re looking to do the same for Percona Live 2017. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.

Percona Live 2017 sponsorship opportunities are available now. Click here to find out how to sponsor.

Categories: MySQL

Updating Percona XtraDB Cluster from 5.6.24-72.2 to 5.6.32-25.17

MySQL Performance Blog - Mon, 2016-11-07 19:35

This blog describes how to upgrade Percona XtraDB Cluster in place from 5.6.24-72.2 to 5.6.32-25.17.

This very hands-on blog is the result of some questions such as “can I perform an in-place upgrade for Percona XtraDB Cluster” coming in. We have done these minor upgrades for Percona Managed Services customers running Percona XtraDB Cluster with lots of nodes, and I think it’s feasible to smoothly do it – if we pay special attention to some specific points I’ll call out. The main concern you should have is that if you have a big dataset, you should avoid SST (which consumes a lot of time if a node rebuild is needed).

Make sure you have all the steps very clear in order to avoid spending too much time when updating packages. The crucial point is Galera’s API GCache size. If you’re executing this when part of the cluster is online, and writes cannot be avoided, check first if the current configuration for the GCache can avoid nodes being written to SST while shutting down Percona Server on each of the nodes, updating packages and finally getting Percona Server back up online again.

A blog post written by Miguel Angel Nieto provides instructions on how to check the GCache file’s size and make sure it’s covering all the transactions for the time you need to take the node out. After increasing the size of the GCache, if the new node finds all the missing transactions on the donor’s GCache, it goes to IST. If not, it will need to use SST.

You can read more about the difference between IST and SST in the Galera API documentation.

Little less talk, little more action…

At this point, we need to update the packages one cluster node at a time. The cluster needs to stay up. I’m going to use a cluster with three nodes. Node 01 is dedicated to writes, while nodes 02 and 03 are dedicated to scaling the cluster’s reads (all are running 5.6.24-72.2). Just for the reference, it’s running on CentOS 6.5, and I’m going to use yum, but you can convert that to any other package manager depending on the Linux distort you’re running. This is the list of nodes and the packages we need to update:

#: servers are like below (writes) node01::, Server version: 5.6.24-72.2 Percona XtraDB Cluster (GPL) (reads) node02::, Server version: 5.6.24-72.2 Percona XtraDB Cluster (GPL) (reads) node03::, Server version: 5.6.24-72.2 Percona XtraDB Cluster (GPL) #: packages currently installed [vagrant@node02 ~]$ sudo rpm -qa | grep Percona Percona-XtraDB-Cluster-client-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-server-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-galera-3-3.15-1.rhel6.x86_64 Percona-XtraDB-Cluster-shared-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-devel-56-5.6.24-72.2.el6.x86_64
Before updating the packages above, make sure you update the XtraBackup package in case you have configured the variable wsrep_sst_method as xtrabackup-v2, this avoids the error below:

WSREP_SST: [ERROR] FATAL: The innobackupex version is 2.3.4. Needs xtrabackup-2.3.5 or higher to perform SST (2016102620:47:15.307) 2016-10-26 20:47:15 5227 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '5227'  ''
So, on all three nodes, update percona-xtrabackup to make sure we’re running the latest version:

[root@node02 vagrant]# yum update percona-xtrabackup Loaded plugins: fastestmirror, versionlock Determining fastest mirrors ... --> Running transaction check ---> Package percona-xtrabackup.x86_64 0:2.3.4-1.el6 will be updated ---> Package percona-xtrabackup.x86_64 0:2.3.5-1.el6 will be an update
With that, take out of the cluster one node at a time, update all old binaries using yum update and start mysqld back up online. You don’t need to run mysql_upgrade in this case. When you start mysqld with the newer binaries in place, depending on the size of configured cache, it’s going to perform either an IST or SST.

As you’re going to take the node out of rotation and out of the cluster, you don’t need to worry about configuring it as read_only. If you can do that in a maintenance window, where no one is writing data to the main node, it’s the best scenario. You won’t need to worry about SST, as in most cases the dataset is too big (TB++) and the SST time can be some hours (an overnight streaming in my experience).

Let’s take out node02 and update the packages: #: let's take out node02 to update packages [vagrant@node02 ~]$ sudo /etc/init.d/mysql stop Shutting down MySQL (Percona XtraDB Cluster).... SUCCESS! [vagrant@node02 ~]$ sudo yum update Percona-XtraDB-Cluster-client-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-server-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-galera-3-3.15-1.rhel6.x86_64 Percona-XtraDB-Cluster-shared-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-devel-56-5.6.24-72.2.el6.x86_64 ... Setting up Update Process Resolving Dependencies --> Running transaction check ---> Package Percona-XtraDB-Cluster-client-56.x86_64 1:5.6.24-72.2.el6 will be updated ---> Package Percona-XtraDB-Cluster-client-56.x86_64 1:5.6.32-25.17.1.el6 will be an update ---> Package Percona-XtraDB-Cluster-devel-56.x86_64 1:5.6.24-72.2.el6 will be updated ---> Package Percona-XtraDB-Cluster-devel-56.x86_64 1:5.6.32-25.17.1.el6 will be an update ---> Package Percona-XtraDB-Cluster-galera-3.x86_64 0:3.15-1.rhel6 will be updated ---> Package Percona-XtraDB-Cluster-galera-3.x86_64 0:3.17-1.rhel6 will be an update ---> Package Percona-XtraDB-Cluster-server-56.x86_64 1:5.6.24-72.2.el6 will be updated ---> Package Percona-XtraDB-Cluster-server-56.x86_64 1:5.6.32-25.17.1.el6 will be an update ---> Package Percona-XtraDB-Cluster-shared-56.x86_64 1:5.6.24-72.2.el6 will be updated ---> Package Percona-XtraDB-Cluster-shared-56.x86_64 1:5.6.32-25.17.1.el6 will be an update #: new packages in place after yum update - here, make sure you run yum clean all before yum update [root@node02 ~]# rpm -qa | grep Percona Percona-XtraDB-Cluster-shared-56-5.6.32-25.17.1.el6.x86_64 Percona-XtraDB-Cluster-galera-3-3.17-1.rhel6.x86_64 Percona-XtraDB-Cluster-devel-56-5.6.32-25.17.1.el6.x86_64 Percona-XtraDB-Cluster-client-56-5.6.32-25.17.1.el6.x86_64 Percona-XtraDB-Cluster-server-56-5.6.32-25.17.1.el6.x86_64
Now start node02, knowing that it’s going to join the cluster, but with updated packages:

[root@node02 vagrant]# /etc/init.d/mysql start Starting MySQL (Percona XtraDB Cluster)...State transfer in progress, setting sleep higher .. SUCCESS! #: here you can see that the state transfer was required due to different states from cluster and current node #: this is gonna test the wsrep_sst_method to make sure it’s working well after updating percona-xtrabackup #: to latest version available 2016-10-26 21:51:38 3426 [Note] WSREP: State transfer required:  Group state: 63788863-1f8c-11e6-a8cc-12f338870ac3:52613  Local state: 63788863-1f8c-11e6-a8cc-12f338870ac3:52611 2016-10-26 21:51:38 3426 [Note] WSREP: New cluster view: global state: 63788863-1f8c-11e6-a8cc-12f338870ac3:52613, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3 2016-10-26 21:51:38 3426 [Warning] WSREP: Gap in state sequence. Need state transfer. 2016-10-26 21:51:38 3426 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '3426'  '' ' WSREP_SST: [INFO] Streaming with xbstream (20161026 21:51:39.023) WSREP_SST: [INFO] Using socat as streamer (20161026 21:51:39.025) WSREP_SST: [INFO] Evaluating timeout -s9 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} )(20161026 21:51:39.100) 2016-10-26 21:51:39 3426 [Note] WSREP: Prepared SST request: xtrabackup-v2| ... 2016-10-26 21:51:39 3426 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 52613) 2016-10-26 21:51:39 3426 [Note] WSREP: Requesting state transfer: success, donor: 1 WSREP_SST: [INFO] Proceeding with SST (20161026 21:51:39.871) WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (2016102621:51:39.873) WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20161026 21:51:39.876) ... WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20161026 21:51:55.826) WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf  --defaults-group=mysqld --no-version-check  --datadir=/var/lib/mysql/ --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (2016102621:51:55.829) WSREP_SST: [INFO] Move successful, removing /var/lib/mysql//.sst (20161026 21:51:55.859) ... Version: '5.6.32-78.1-56'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel78.1,Revision 979409a, WSREP version 25.17, wsrep_25.17 2016-10-26 21:51:56 3426 [Note] WSREP: 0.0 (pxc01): State transfer from 1.0 (pxc01) complete. 2016-10-26 21:51:56 3426 [Note] WSREP: Shifting JOINER -> JOINED (TO: 52613) 2016-10-26 21:51:56 3426 [Note] WSREP: Member 0.0 (pxc01) synced with group. 2016-10-26 21:51:56 3426 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 52613) 2016-10-26 21:51:56 3426 [Note] WSREP: Synchronized with group, ready for connections 2016-10-26 21:51:56 3426 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
As you can see above, node02 is back in the cluster. Additionally, it’s important to see that both the Percona-Server packages and the Galera API packages were updated. When the node is up and part of the cluster, you should see a new API version in the output of a SHOW GLOBAL STATUS LIKE ‘wsrep%’ command:

#: node02, the one we just updated [root@node02 mysql]# mysql -e "show global status like 'wsrep_provider_version'G" *************************** 1. row *************************** Variable_name: wsrep_provider_version         Value: 3.17(r447d194) #: node01 not updated yet [root@node01 mysql]# mysql -e "show global status like 'wsrep_provider_version'G" *************************** 1. row *************************** Variable_name: wsrep_provider_version         Value: 3.15(r5c765eb)

Summarizing the procedure until now, the cluster packages update plan is:
  1. Take nodes out of rotation one at a time
  2. Shutdown mysqld on each node in order
  3. Update the below packages (or the ones corresponding to what you’re running):

[vagrant@node02 ~]$ sudo rpm -qa | grep Percona Percona-XtraDB-Cluster-client-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-server-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-galera-3-3.15-1.rhel6.x86_64 Percona-XtraDB-Cluster-shared-56-5.6.24-72.2.el6.x86_64 Percona-XtraDB-Cluster-devel-56-5.6.24-72.2.el6.x86_64

  1. Update percona-xtrabackup on all the cluster’s nodes to avoid issues (as explained above):

WSREP_SST: [ERROR] FATAL: The innobackupex version is 2.3.4. Needs xtrabackup-2.3.5 or higher to perform SST (2016102620:47:15.307) ... [root@node01 ~]# yum update percona-xtrabackup ... [root@node02 ~]# xtrabackup --version xtrabackup version 2.3.5 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 45cda89)

  1. Start mysqld back online to grab the cluster’s current state

After finishing up with each node’s packages update, make sure you check the main node to see if they have joined the cluster. On node01, you can enter the below query to return the main status variables. This checks the current status of node01 and the cluster size:

mysql> SELECT @@HOSTNAME AS HOST, NOW() AS `DATE`, VARIABLE_NAME,VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_cluster_state_uuid','wsrep_cluster_conf_id','wsrep_cluster_size','wsrep_cluster_status','wsrep_local_state_comment')G *************************** 1. row *************************** HOST: node01 DATE: 2016-10-27 18:14:42 VARIABLE_NAME: WSREP_LOCAL_STATE_COMMENT VARIABLE_VALUE: Synced *************************** 2. row *************************** HOST: node01 DATE: 2016-10-27 18:14:42 VARIABLE_NAME: WSREP_CLUSTER_CONF_ID VARIABLE_VALUE: 10 *************************** 3. row *************************** HOST: node01 DATE: 2016-10-27 18:14:42 VARIABLE_NAME: WSREP_CLUSTER_SIZE VARIABLE_VALUE: 3 *************************** 4. row *************************** HOST: node01 DATE: 2016-10-27 18:14:42 VARIABLE_NAME: WSREP_CLUSTER_STATE_UUID VARIABLE_VALUE: 1e0b9725-9c5e-11e6-886d-7708872d6aa5 *************************** 5. row *************************** HOST: node01 DATE: 2016-10-27 18:14:42 VARIABLE_NAME: WSREP_CLUSTER_STATUS VARIABLE_VALUE: Primary 5 rows in set (0.00 sec)

Check the other nodes as well:

#: node02 [root@node02 mysql]# mysql -e "show global status like 'wsrep_local_state%'G" *************************** 1. row *************************** Variable_name: wsrep_local_state_uuid Value: 1e0b9725-9c5e-11e6-886d-7708872d6aa5 *************************** 2. row *************************** Variable_name: wsrep_local_state Value: 4 *************************** 3. row *************************** Variable_name: wsrep_local_state_comment Value: Synced #: node03 [root@node03 ~]# mysql -e "show global status like 'wsrep_local_state%'G" *************************** 1. row *************************** Variable_name: wsrep_local_state_uuid Value: 1e0b9725-9c5e-11e6-886d-7708872d6aa5 *************************** 2. row *************************** Variable_name: wsrep_local_state Value: 4 *************************** 3. row *************************** Variable_name: wsrep_local_state_comment Value: Synced


Categories: MySQL

Forestry, a Static Site CMS

Xaprb, home of innotop - Sat, 2016-11-05 00:33

Forestry is a content management system that runs in your browser. On the backend, it integrates with a static site generator such as Hugo, letting you edit your static website in your browser just like any other content management system. I am writing this blog post with it on my iPad, using voice dictation.

Setup was very easy, and although I had a slight hiccup importing my existing site, which is large, their support team was responsive and got that taken care of quickly.

Editing appears to be very straightforward so far, and I do not see any problems that seem to be showstoppers. I am seeing some unwanted newlines and capitalization behavior in the markdown editor, but that doesn’t bother me too much.

Uploading an image may prove to be slightly challenging. It feels 99% functional, but the last 1% is confusing. I am not sure where the image I uploaded has gone, and when I publish this I am not sure if it will be broken.

It is worth mentioning that the location where the source code for my site is hosted has not changed. All I did was authorize Forestry to access and change that source code.

I will not exercise the full set of capabilities of this product. I host my site using an amazing static site hosting service called Netlify. They take care of watching for changes in the repository, fetching the changes, building the site from that source, and publishing that on their content delivery network. If I were not using this service, I think Forestry could cover some of those needs.

Pic credit

Categories: MySQL

Forestry, a Static Site CMS

Xaprb, home of innotop - Sat, 2016-11-05 00:33

Forestry is a content management system that runs in your browser. On the backend, it integrates with a static site generator such as Hugo, letting you edit your static website in your browser just like any other content management system. I am writing this blog post with it on my iPad, using voice dictation.

Setup was very easy, and although I had a slight hiccup importing my existing site, which is large, their support team was responsive and got that taken care of quickly.

Editing appears to be very straightforward so far, and I do not see any problems that seem to be showstoppers. I am seeing some unwanted newlines and capitalization behavior in the markdown editor, but that doesn’t bother me too much.

Uploading an image may prove to be slightly challenging. It feels 99% functional, but the last 1% is confusing. I am not sure where the image I uploaded has gone, and when I publish this I am not sure if it will be broken.

It is worth mentioning that the location where the source code for my site is hosted has not changed. All I did was authorize Forestry to access and change that source code.

I will not exercise the full set of capabilities of this product. I host my site using an amazing static site hosting service called Netlify. They take care of watching for changes in the repository, fetching the changes, building the site from that source, and publishing that on their content delivery network. If I were not using this service, I think Forestry could cover some of those needs.

Pic credit

Categories: MySQL

Changing the Tablespace Directory with pt-online-schema-change

MySQL Performance Blog - Fri, 2016-11-04 19:38

In this blog, we’ll discuss changing the tablespace directory using pt-online-schema-change.

One of the most annoying situations in the life of a DBA is realizing that the disk where the datadir resides is running out of space. If you’re lucky enough to run over an LVM volume or a RAID (depending on the level, though), it is easy to add disk space. But what if you are not that lucky, and your datadir is running on a single disk? Not so funny!

That is the exact situation we recently faced with a customer, for both the master and slave server. When trying to figure out a solution we saw that:

  • There was enough space on a different partition within the same server.
  • The tables have their own tablespace (innodb_file_per_table = on)
  • The MySQL version was 5.6.

We proceed to move some of the tables to the other partition to make room in the datadir, by using the tablespace placing feature:

One note before we continue: if you are using a version equal or lower than 5.6.29, and innodb_flush_method = O_DIRECT, there’s a bug that the CREATE TABLE....DATA DIRECTORY = '/another/directory/' won’t work. See: This was fixed on 5.6.30.

In the slave, we were able to stop the replication and move the tables. A problem occurred when we wanted to do the same on the master, since no downtime was allowed.

This is where pt-online-schema-change came to the rescue!

We could use pt-osc to do the table placing without downtime, but there’s a catch: pt-osc only works when what you want to do is possible by using an ALTER TABLE statement, and in order to use the CREATE TABLE....DATA DIRECTORY = '/another/directory'  you need to use a CREATE TABLE statement.

What to do, then? Add a new feature to pt-online-schema-change: --data-dir="/new/directory" 

With the help of the main developer of the Percona Toolkit, Carlos Salguero, adding this new feature was possible in record time. Now moving the tablespace to another place without downtime is possible.

The new feature will be available with version 2.2.20 of Percona Toolkit, but until the release the code is available at the GitHub repository:

Moving the table is just a matter of executing pt-online-schema-change  --data-dir="/new/datadir" --execute

Let’s see an example. The following table resides in the default datadir:

mysql> show create table sbtest5; *************************** 1. row *************************** Table: sbtest5 Create Table: CREATE TABLE `sbtest5` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k_5` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000 1 row in set (0.00 sec)

Now, let’s move it to the directory /opt/datadir, which owner is the MySQL user:

[root@ps56-1 percona]# pt-online-schema-change --data-dir="/opt/datadir" --execute D=percona,t=sbtest5 No slaves found. See --recursion-method if host ps56-1 has slaves. Not checking slave lag because no slaves were found and --check-slave-lag was not specified. Operation, tries, wait: analyze_table, 10, 1 copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `percona`.`sbtest5`... Creating new table... Created new table percona._sbtest5_new OK. 2016-11-01T19:22:27 Creating triggers... 2016-11-01T19:22:27 Created triggers OK. 2016-11-01T19:22:27 Copying approximately 1000 rows... 2016-11-01T19:22:27 Copied rows OK. 2016-11-01T19:22:27 Analyzing new table... 2016-11-01T19:22:27 Swapping tables... 2016-11-01T19:22:28 Swapped original and new tables OK. 2016-11-01T19:22:28 Dropping old table... 2016-11-01T19:22:28 Dropped old table `percona`.`_sbtest5_old` OK. 2016-11-01T19:22:28 Dropping triggers... 2016-11-01T19:22:28 Dropped triggers OK. Successfully altered `percona`.`sbtest5`.

Okay, all good. Let’s see the new table definition:

mysql> show create table sbtest5; *************************** 1. row *************************** Table: sbtest5 Create Table: CREATE TABLE `sbtest5` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `k` int(10) unsigned NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k_5` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000 DATA DIRECTORY='/opt/datadir/'

DATA DIRECTORY='/opt/datadir/' is in the right place!

Categories: MySQL

Orchestrator: Moving VIPs During Failover

MySQL Performance Blog - Thu, 2016-11-03 23:00

In this post, I’ll discuss how to moving VIPs during a failover using Orchestrator.

In our previous post, we showed you how Orchestrator works. In this post, I am going to give you a proof-of-concept on how Orchestrator can move VIPs in case of failover. For this post, I’m assuming the Orchestrator is already installed and able to manage the topology.


Orchestrator is a topology manager. Nothing less nothing more. In the case of failover, it will reorganize the topology, promote a new master and connect the slaves to it. But it won’t do any DNS changes, and it won’t move VIPs (or anything else).

However, Orchestrator supports hooks. Hooks are external scripts that can be invoked through the recovery process. There are six different hooks:

  • OnFailureDetectionProcesses
  • PreFailoverProcesses
  • PostIntermediateMasterFailoverProcesses
  • PostMasterFailoverProcesses
  • PostFailoverProcesses
  • PostUnsuccessfulFailoverProcesses

More details are in the Orchestrator manual.

With these hooks, we can call our own external scripts, which fit in our architecture and can make modifications or let the application knows who is the new master.

There are different ways to do this:

  • Updating a CNAME: if a CNAME is pointing to the master, an external script can easily do a DNS update after failover.
  • Moving a VIP to the new master: this solution is similar to a MHA and MHA-helper script (this post will discuss this solution).

When Orchestrator calls an external script, it can also use parameters. Here is an example using the parameters available with “PostFailoverProcesses”:

{failureType}, {failureDescription}, {failedHost}, {failureCluster}, {failureClusterAlias}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countSlaves}, {slaveHosts}, {isDowntimed}, {isSuccessful}, {lostSlaves}

Without these parameters, we wouldn’t know who the new master is and which host died.

Moving VIPs

As I already mentioned, in this post I am going to show you how can you move VIPs with Orchestrator. I think many people are familiar with MHA. This solution is a bit similar to what MHA and MHA-helper does.

The main requirement is, at the same time, the main disadvantage. This solution requires SSH access from the Orchestrator node to the MySQL servers.

Adding User

First we have to add a user on the MySQL servers and Orchestrator node (you can change the username):

useradd -m orchuser -s /bin/bash

Adding sudo permissions:

vi /etc/sudoers.d/orch Defaults !requiretty orchuser ALL=(ALL) NOPASSWD: /usr/sbin/arping,/sbin/ip,/bin/ping

We have to add the public key from the Orchestrator node on the MySQL servers to the “/home/orchuser/.ssh/authorized_keys” file.

Now we can SSH from the Orchestrator server to the others without a password:

ssh orchuser@

Failover Script

Now we need a failover script. I wrote two small bash scripts that can do it for us.

The first one called Orchestrator calls this script like so:

vi /etc/orchestrator.conf.json ... "PostFailoverProcesses": [ "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log", "/usr/local/bin/ {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log" ], ...

Because Orchestrator can handle multiple clusters, we have to define some cluster parameters:

vi /usr/local/bin/ ... rep=( eth0 "" orchuser )

Where “rep” is the name of the cluster, “eth0” is the name of the interface where the VIP should be added, “” is the VIP on this cluster and “orchuser” is the SSH user. If we have multiple clusters, we have to add more arrays like this with the cluster details.

Orchestrator executes this script with parameters:

/usr/local/bin/ DeadMaster rep mysql1 mysql2

After the script recognized the cluster, it calls the next script.

The next script is named This is called by “” and it is going to move the VIP to the new master. It is executed like this:

/usr/local/bin/ -d 1 -n mysql2 -i eth0 -I -u orchuser -o mysql1

  • -d 1 the master is dead
  • -n mysql2is the new master
  • -i eth0 the network interface
  • -I is the VIP
  • -u orchuser is the SSH user
  • -o mysql1 is the old master

The script requires the “arping” and “mail” commands.


With these two small scripts, Orchestrator is able to move VIPs from the master to the new master, and the application can work again. However this script is not production ready, and there could be cases that it cannot handle. You can test it, but use it at your own risk.

I would appreciate any comments or pull requests so we can make it better. But stay tuned: in my next blog post I am going to show you how Orchestrator can work with “ProxySQL.”

Categories: MySQL

Percona responds to CVE-2016-6663 and CVE-2016-6664

MySQL Performance Blog - Wed, 2016-11-02 17:21

Percona has addressed CVE-2016-6663 and CVE-2016-6664 in releases of Percona Server for MySQL and Percona XtraDB Cluster.

Percona is happy to announce that the following vulnerabilities are fixed in current releases of Percona Server for MySQL and Percona XtraDB Cluster:

  • CVE-2016-6663: allows a local system user with access to the affected database in the context of a low-privileged account (CREATE/INSERT/SELECT grants) to escalate their privileges and execute arbitrary code as the database system user (typically “mysql”).
  • CVE-2016-6664: can let attackers who have gained access to mysql system user to further escalate their privileges to root user allowing them to fully compromise the system.

Users should upgrade to their relevant incremental release.

Percona Server

Percona XtraDB Cluster

Users should update as soon as is practical to ensure protection from these vulnerabilities.

Percona would like to thank Dawid Golunski ( for disclosing this issue.

Categories: MySQL

Webinar Thursday November 3: The MySQL Ecosystem in 2016

MySQL Performance Blog - Wed, 2016-11-02 16:06

Join Percona’s Chief Evangelist Colin Charles on Thursday, November 3, 2016, at 10 am PDT/ 1:00pm EDT (UTC-7) as he presents “The MySQL Ecosystem in 2016.”

MySQL is a unique adult (now 21 years old) in many ways. It supports plugins. It supports storage engines. It is also owned by Oracle, thus birthing two branches of the popular opensource database: Percona Server and MariaDB Server. It also once spawned a fork: Drizzle. Lately, a consortium of web scale users (think a chunk of the top ten sites out there) have spawned WebScaleSQL.

You’re a busy DBA having to maintain this mix of technologies. Or you’re a CIO planning to choose one branch. How do you go about picking? Supporting multiple databases? Find out more in this talk. We will also take a deep-dive into what feature differences exist between MySQL/Percona Server/MariaDB/WebScaleSQL, and how distributions package the various databases differently. Within the hour, we’ll inform you about the past, the present and hopefully make you knowledgeable enough to know what to pick in the future.

Note, there will also be coverage of the various trees around WebScaleSQL, like the Facebook tree, the Alibaba tree as well as the Twitter tree.

Register for The MySQL Ecosystem in 2016 webinar here.

Colin Charles, Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, and worked at MySQL since 2005, Colin has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and projects. He’s well known within many open source communities, and has spoken on the conference circuit.

Categories: MySQL
Syndicate content