How to Manually Build Percona Server for MySQL RPM Packages

MySQL Performance Blog - Fri, 2017-01-20 17:58

In this blog, we’ll look at how to manually build Percona Server for MySQL RPM packages.

Several customers and other people from the open source community have asked us how they could make their own Percona Server for MySQL RPM binaries from scratch.

This request is often made by companies that want to add custom patches to our release. To do this, you need to make some modifications to the percona-server.spec file in the source tree, and some preparation is necessary.

This post covers how you can make your own RPMs from GIT or source tarball so that you can build RPMs from your own modified branch, or by applying patches. In this example, we’ll build Percona Server 5.7.16-10.

Making your own RPMs is not a recommended practice, and should rarely be necessary.

Prepare the Source Using GIT Repository

We can fetch percona/percona-server from GitHub (or your own fork). As we build Percona Server 5.7.16-10, we create a new branch based on the tag of that version:

$ git clone
Cloning into 'percona-server'...
remote: Counting objects: 1216597, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 1216597 (delta 1), reused 1 (delta 1), pack-reused 1216592
Receiving objects: 100% (1216597/1216597), 510.50 MiB | 5.94 MiB/s, done.
Resolving deltas: 100% (997331/997331), done.
Checking out files: 100% (28925/28925), done.
$ cd percona-server/
$ git checkout -b build-Percona-Server-5.7.16-10 Percona-Server-5.7.16-10
Switched to a new branch 'build-Percona-Server-5.7.16-10'
$ git submodule init
Submodule 'Percona-TokuBackup' registered for path 'plugin/tokudb-backup-plugin/Percona-TokuBackup'
Submodule 'storage/rocksdb/rocksdb' registered for path 'storage/rocksdb/rocksdb'
Submodule 'PerconaFT' registered for path 'storage/tokudb/PerconaFT'
$ git submodule update
Cloning into 'plugin/tokudb-backup-plugin/Percona-TokuBackup'...
...
Submodule path 'plugin/tokudb-backup-plugin/Percona-TokuBackup': checked out '1b9bb16ad74588601d8fefe46c74cc1dac1dd1d5'
Cloning into 'storage/rocksdb/rocksdb'...
...
Submodule path 'storage/rocksdb/rocksdb': checked out '6a17b07ca856e573fabd6345d70787d4e481f57b'
Cloning into 'storage/tokudb/PerconaFT'...
...
Submodule path 'storage/tokudb/PerconaFT': checked out '0c1b53909bc62a4d9b420116ec8833c78c0c6e8e'

Downloading Source Tarball

An alternative way is to download the source tarball, which you can find at

Extract the source tarball, as the RPM spec file is located there:

$ wget
$ tar -xzvf percona-server-5.7.16-10.tar.gz
$ cd percona-server-5.7.16-10

Making Changes with Patch Files

If you need to make any changes to the source code, you can either use your own GitHub fork or you can apply patches. If you use the former, then you can skip this section.

Why Patches?

Why would we want to use patch files? Because you won’t need to maintain your own fork. You can just build the RPM with the Percona Server source tarball and the patch file.

Create Patch Files

If you do not want to use your own fork in GitHub, you can also create some patch files and modify the RPM spec file to include them.

  1. Create your changes to the source files
  2. Use diff to create the patch file:
    $ diff -ru FILE-orig FILE >| ~/custom.patch
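
As a self-contained sketch of that workflow (the /tmp/patch-demo directory and my_default.h file names are hypothetical stand-ins for your real source tree and file):

```shell
# Work on a pristine copy of the original, make the change, then diff the two.
mkdir -p /tmp/patch-demo && cd /tmp/patch-demo
printf 'max_connections = 151\n' > my_default.h.orig
sed 's/151/500/' my_default.h.orig > my_default.h        # the custom change
diff -u my_default.h.orig my_default.h > custom.patch || true  # diff exits 1 when files differ
cat custom.patch
```

The resulting custom.patch is what gets referenced from the spec file in the next section.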
Add Patch to RPM Spec File

In order for the patch to be applied during the RPM build, edit the ./build-ps/percona-server.spec file and add the two lines for the custom patch (the Patch1: declaration and the matching %patch1 macro):

...
Source91:
Patch0: mysql-5.7-sharedlib-rename.patch
Patch1: custom.patch
BuildRequires: cmake >= 2.8.2
...
%prep
%setup -q -T -a 0 -a 10 -c -n %{src_dir}
pushd %{src_dir}
%patch0 -p1
%patch1 -p1
%build
...

Note that you have to number the patches; in this case, the custom patch became Patch1.

Creating New Source Tarball

If you use your own GitHub fork, or you made manual changes to the source (and you’re not using patch files), you should use that to create your own source tarball.

First, change the Percona Server version number. In this case, we are naming it 10custom to indicate this is not a standard build. You can adapt as you wish, just make sure the VERSION file looks something like this:
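
The file contents did not survive the original formatting here. In the 5.7 source tree, VERSION is a small key=value file, so with our custom suffix it would look roughly like this (the -10custom extra is our assumption for this build):

```text
MYSQL_VERSION_MAJOR=5
MYSQL_VERSION_MINOR=7
MYSQL_VERSION_PATCH=16
MYSQL_VERSION_EXTRA=-10custom
```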


Then make the source tarball:

$ cmake . -DDOWNLOAD_BOOST=1 -DWITH_BOOST=build-ps/boost
$ make dist
...
-- Source package ~/percona-server/percona-server-5.7.16-10custom.tar.gz created
Built target dist

Now you have the tarball in your source directory, but we won't use it yet: we need to add the TokuDB submodule sources to it first. make dist also left the uncompressed directory in place, which we will use to recreate the tarball once the TokuDB parts are included:

$ rm percona-server-5.7.16-10custom.tar.gz
$ cp -R storage/tokudb/PerconaFT/* percona-server-5.7.16-10custom/storage/tokudb/PerconaFT/
$ cp -R plugin/tokudb-backup-plugin/Percona-TokuBackup/* percona-server-5.7.16-10custom/plugin/tokudb-backup-plugin/Percona-TokuBackup/
$ tar --owner=0 --group=0 --exclude=.git -czf percona-server-5.7.16-10custom.tar.gz percona-server-5.7.16-10custom

Preparing the Build Environment

Environment Requirements

Make sure the build host has at least 10GB of free space and at least 4GB of RAM, or the build will fail at some point.
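
On a Linux host, a quick way to check both against those thresholds:

```shell
# Free space (in 1K blocks) on the filesystem holding the build tree; want >= ~10485760 (10GB)
df -k ~

# Total RAM in kB; want >= ~4194304 (4GB)
grep MemTotal /proc/meminfo
```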

Install Dependencies

To build the RPM, we need to prepare our build environment and ensure the necessary build dependencies are installed:

$ sudo yum install epel-release
$ sudo yum install git gcc gcc-c++ openssl check cmake bison boost-devel asio-devel libaio-devel ncurses-devel readline-devel pam-devel wget perl-Env time numactl-devel rpmdevtools rpm-build

Prepare RPM Build Tree

Next, we need to prepare our build directory structure; in this case, we will create it in ~/rpmbuild:

$ cd ~/
$ rpmdev-setuptree

Download Boost Source

We also need to download the Boost source and put it in the ~/rpmbuild/SOURCES/ directory:

$ cd ~/rpmbuild/SOURCES
$ wget

Move Files to the RPM Build Tree

Copy the source tarball:

$ cp percona-server-5.7.16-10custom.tar.gz ~/rpmbuild/SOURCES/

Also copy the ./build-ps/rpm/* files:

$ cp ~/percona-server/build-ps/rpm/* ~/rpmbuild/SOURCES

If you have any patch files, move them to the ~/rpmbuild/SOURCES/ directory as well:

$ cp ~/custom.patch ~/rpmbuild/SOURCES/

Then the percona-server.spec file goes into the ~/rpmbuild/SPECS directory:

$ cp ~/percona-server/build-ps/percona-server.spec ~/rpmbuild/SPECS/

Setting Correct Versions in the Spec File

$ cd ~/rpmbuild/SPECS/
$ sed -i s:@@MYSQL_VERSION@@:5.7.16:g percona-server.spec
$ sed -i s:@@PERCONA_VERSION@@:10custom:g percona-server.spec
$ sed -i s:@@REVISION@@:f31222d:g percona-server.spec
$ sed -i s:@@RPM_RELEASE@@:1:g percona-server.spec
$ sed -i s:@@BOOST_PACKAGE_NAME@@:boost_1_59_0:g percona-server.spec

Note the @@PERCONA_VERSION@@ contains our 10custom version number.
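
To illustrate what the sed pass does, here is a miniature, self-contained version against a one-line stand-in for the spec file (demo.spec is a hypothetical example, not the real spec):

```shell
cd /tmp
printf 'Version: @@MYSQL_VERSION@@-@@PERCONA_VERSION@@\n' > demo.spec
# Colon-delimited s commands avoid escaping the @ placeholders
sed -i 's:@@MYSQL_VERSION@@:5.7.16:g; s:@@PERCONA_VERSION@@:10custom:g' demo.spec
cat demo.spec   # -> Version: 5.7.16-10custom
```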

You can add your changelog information in the %changelog section.

Building RPM Packages

For Percona Server, making the RPMs is a two-step process. First, we need to make the SRPMS:

$ cd ~/
$ rpmbuild -bs --define 'dist .generic' rpmbuild/SPECS/percona-server.spec
Wrote: /home/vagrant/rpmbuild/SRPMS/Percona-Server-57-5.7.16-10custom.1.generic.src.rpm

And then we can build the binary RPMs from the SRPMS:

$ rpmbuild --define 'dist .el7' --rebuild rpmbuild/SRPMS/Percona-Server-57-5.7.16-10custom.1.generic.src.rpm
...
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-server-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-client-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-test-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-devel-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-shared-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-shared-compat-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-tokudb-57-5.7.16-10custom.1.el7.x86_64.rpm
Wrote: /home/vagrant/rpmbuild/RPMS/x86_64/Percona-Server-57-debuginfo-5.7.16-10custom.1.el7.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.xmcI8e
+ umask 022
+ cd /home/vagrant/rpmbuild/BUILD
+ cd percona-server-5.7.16-10custom
+ /usr/bin/rm -rf /home/vagrant/rpmbuild/BUILDROOT/Percona-Server-57-5.7.16-10custom.1.generic.x86_64
+ exit 0
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.UXUTmh
+ umask 022
+ cd /home/vagrant/rpmbuild/BUILD
+ rm -rf percona-server-5.7.16-10custom
+ exit 0

Once the build is done, you can find the RPMs in the RPMS directory:

$ ls -1 ~/rpmbuild/RPMS/x86_64/
Percona-Server-57-debuginfo-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-client-57-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-devel-57-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-server-57-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-shared-57-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-shared-compat-57-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-test-57-5.7.16-10custom.1.el7.x86_64.rpm
Percona-Server-tokudb-57-5.7.16-10custom.1.el7.x86_64.rpm

Categories: MySQL

Setup ProxySQL for High Availability (not a Single Point of Failure)

MySQL Performance Blog - Thu, 2017-01-19 22:05

In this blog post, we’ll look at how to set up ProxySQL for high availability.

During the last few months, we’ve had a lot of opportunities to present and discuss a very powerful tool that will become more and more used in the architectures supporting MySQL: ProxySQL.

ProxySQL is becoming more flexible, solid and performant, and it is used more widely every day. You can use ProxySQL for high availability.

The tool is a winner when compared to similar ones, and we should all have a clear(er) idea of how to integrate it in our architectures in order to achieve the best results.

The first thing to keep in mind is that ProxySQL doesn’t natively support any high availability solution. We can set up a cluster of MySQL servers and achieve four or even five nines of HA. But if we include ProxySQL as it is, and as a single block, our HA has a single point of failure (SPOF) that can drag us down in the case of a crash.

To solve this, the most common solution is setting up ProxySQL as part of a tile architecture, where Application/ProxySQL are deployed together.

This is a good solution for some cases, and it certainly reduces network hops, but it might be less than practical when our architecture has a very high number of tiles (hundreds of application servers, which is not so unusual nowadays).
In that case, managing ProxySQL is challenging. But more problematically, ProxySQL must perform several checks on the destination servers (MySQL). If we have 400 instances of ProxySQL, we end up keeping our databases busy just performing the checks.

In short, this is not a smart move.

Another possible approach is to have two layers of ProxySQL, one close to the application and another in the middle to connect to the database.

I personally don’t like this approach for many reasons. The most relevant reasons are that this approach creates additional complexity in the management of the platform, and it adds network hops.

So what can be done?

I love the KISS principle because I am lazy and don’t want to reinvent a wheel someone else has already invented. I also like it when my customers don’t need to depend on me or any other colleagues after I am done and gone. They must be able to manage, understand and fix their environment by themselves.

To keep things simple, here is my checklist:

  • Exclude the cases where a tile (application/ProxySQL) makes sense
  • Exclude the “in the cloud” cases where tools like ELB (Elastic Load Balancer) exist
  • Exclude architecture that already includes a balancer

What can I use for the remaining cases?

The answer comes from combining existing solutions and existing building blocks: Keepalived + ProxySQL + MySQL.

For an explanation of Keepalived, visit the project’s website.

Short description
“Keepalived is a routing software written in C. The main goal of this project is to provide simple and robust facilities for load balancing and high-availability to Linux system and Linux-based infrastructures. The load balancing framework relies on well-known and widely used Linux Virtual Server (IPVS) kernel module providing Layer4 load balancing. Keepalived implements a set of checkers to dynamically and adaptively maintain and manage load-balanced server pool according to their health. On the other hand, high-availability is achieved by VRRP protocol. VRRP is a fundamental brick for router failover. Also, Keepalived implements a set of hooks to the VRRP finite state machine providing low-level and high-speed protocol interactions. Keepalived frameworks can be used independently or all together to provide resilient infrastructures.”

Bingo! This is exactly what we need for our ProxySQL setup.

Below, I will explain how to set up:

  • A simple solution based on a single VIP
  • A more complex solution using multiple VIPs
  • An even more complex solution using virtual VIPs and virtual servers

All we want to do is to prevent ProxySQL from becoming a SPOF. And while doing that, we need to reduce network hops as much as possible (keeping the solution SIMPLE).

Another important concept to keep in mind is that ProxySQL (re)start takes place in less than a second. This means that if it crashes, assuming it can be restarted by the angel process, doing that is much more efficient than any kind of failover mechanism. As such, your solution plan should keep in mind the ~1 second time of ProxySQL restart as a baseline.

Ready? Let’s go.


Choose three machines to host the combination of Keepalived and ProxySQL.

In the following example, I will use three machines for ProxySQL and Keepalived, and three hosting Percona XtraDB Cluster. You can place the Keepalived+ProxySQL pairs wherever you like (even on the same Percona XtraDB Cluster boxes).

For the following examples, we will have:

PXC:
  node1 galera1h1n5
  node2 galera2h2n21
  node3 galera1h3n31

ProxySQL-Keepalived:
  test1
  test2
  test3

VIP: /89/90

To check, I will use this table (please create it in your MySQL server):

DROP TABLE test.`testtable2`;
CREATE TABLE test.`testtable2` (
  `autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(100) COLLATE utf8_bin NOT NULL,
  `b` varchar(100) COLLATE utf8_bin NOT NULL,
  `host` varchar(100) COLLATE utf8_bin NOT NULL,
  `userhost` varchar(100) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`autoInc`)
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

And this bash TEST command to use later:

while [ 1 ]; do
  export mydate=$(date +'%Y-%m-%d %H:%M:%S.%6N')
  mysql --defaults-extra-file=./my.cnf -h -P 3311 --skip-column-names -b -e \
    "BEGIN; set @userHost='a'; select concat(user,'_', host) into @userHost from information_schema.processlist where user = 'load_RW' limit 1; insert into test.testtable2 values(NULL,'$mydate',SYSDATE(6),@@hostname,@userHost); commit; select * from test.testtable2 order by 1 DESC limit 1"
  sleep 1
done

  1. Install ProxySQL
  2. Install Keepalived (yum install keepalived; apt-get install keepalived)
  3. Set up ProxySQL users and servers

Once you have ProxySQL up (running the same configuration on all ProxySQL nodes is much simpler), connect to the Admin interface and execute the following:

DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=500;
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('',501,3306,1000000000);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

DELETE FROM mysql_users WHERE username='load_RW';
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('load_RW','test',1,500,'test',1);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;

DELETE FROM mysql_query_rules WHERE rule_id IN (200,201);
INSERT INTO mysql_query_rules (rule_id,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(200,'load_RW',501,1,3,'^SELECT.*FOR UPDATE',1);
INSERT INTO mysql_query_rules (rule_id,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(201,'load_RW',501,1,3,'^SELECT ',1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
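
After loading, it is worth confirming that the runtime tables picked up the changes. A quick check from the same Admin interface (using ProxySQL's runtime_ tables):

```sql
-- Confirm servers, users and rules are active at runtime
SELECT hostgroup_id, hostname, port, status, weight FROM runtime_mysql_servers;
SELECT username, active, default_hostgroup FROM runtime_mysql_users;
SELECT rule_id, active, match_digest, destination_hostgroup FROM runtime_mysql_query_rules;
```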

Create a my.cnf file in your default dir with:

[mysql]
user=load_RW
password=test

Simple Setup using a single VIP, 3 ProxySQL and 3 Galera nodes

First, setup the Keepalived configuration file (/etc/keepalived/keepalived.conf):

global_defs {
    # Keepalived process identifier
    lvs_id proxy_HA
}

# Script used to check if ProxySQL is running
vrrp_script check_proxy {
    script "killall -0 proxysql"
    interval 2
    weight 2
}

# Virtual interface
# The priority specifies the order in which the assigned interface takes over in a failover
vrrp_instance VI_01 {
    state MASTER
    interface em1
    virtual_router_id 51
    priority <calculate on the WEIGHT for each node>

    # The virtual IP address shared between the load balancers
    virtual_ipaddress {
        dev em1
    }
    track_script {
        check_proxy
    }
}

Given the above, and given I want to have TEST1 as the main priority, we will set the node priorities as:

test1 = 101
test2 = 100
test3 = 99

Modify the config in each node with the above values and (re)start Keepalived.

If all is set correctly, you will see the following in the system log of the TEST1 machine:

Jan 10 17:56:56 mysqlt1 systemd: Started LVS and VRRP High Availability Monitor.
Jan 10 17:56:56 mysqlt1 Keepalived_healthcheckers[6183]: Configuration is using : 6436 Bytes
Jan 10 17:56:56 mysqlt1 Keepalived_healthcheckers[6183]: Using LinkWatch kernel netlink reflector...
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: Configuration is using : 63090 Bytes
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: Using LinkWatch kernel netlink reflector...
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Jan 10 17:56:56 mysqlt1 Keepalived_vrrp[6184]: VRRP_Script(check_proxy) succeeded
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received lower prio advert, forcing new election
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Received higher prio advert
Jan 10 17:56:57 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering BACKUP STATE
Jan 10 17:56:58 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) forcing a new MASTER election
...
Jan 10 17:57:00 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Transition to MASTER STATE
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Entering MASTER STATE <-- MASTER
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) setting protocol VIPs.
Jan 10 17:57:01 mysqlt1 Keepalived_healthcheckers[6183]: Netlink reflector reports IP added
Jan 10 17:57:01 mysqlt1 avahi-daemon[937]: Registering new address record for on em1.IPv4.
Jan 10 17:57:01 mysqlt1 Keepalived_vrrp[6184]: VRRP_Instance(VI_01) Sending gratuitous ARPs on em1 for

In the other two machines:

Jan 10 17:56:59 mysqlt2 Keepalived_vrrp[13107]: VRRP_Instance(VI_01) Entering BACKUP STATE <---

This means the node is standing by as a backup.

Open Source Databases on Big Machines: Does READ COMMITTED Scale on 144 Cores?

MySQL Performance Blog - Thu, 2017-01-19 21:12

In the second post in my series on open source databases on big machines, we’ll look at whether READ COMMITTED scales with multiple cores.

The default transaction isolation level for InnoDB is REPEATABLE READ. A more permissive level is READ COMMITTED, which is generally known to work well. While the REPEATABLE READ level maintains the transaction history up to the start of the transaction, READ COMMITTED maintains it only up to the start of the current statement. Peter Zaitsev described the differences between how these modes are handled in this blog post. Both can theoretically cause performance slowdowns, but READ COMMITTED is usually seen as fast-working, at least on a typical MySQL machine (owned by Percona):

The default transaction isolation mode for PostgreSQL is also READ COMMITTED. Originally I wanted to use this mode for MySQL tests as well. But when I tested on a machine with 144 cores, I found that after 36 threads REPEATABLE READ continued to scale while READ COMMITTED slowed down. It then got stuck at about 3x slower results for standard OLTP RW tests.
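
For reference, switching the server between the two levels for such a comparison is a one-liner (tx_isolation is the variable name in MySQL 5.6/5.7):

```sql
-- Run the benchmark once under each setting and compare throughput;
-- new sessions pick up the global value
SET GLOBAL tx_isolation = 'READ-COMMITTED';
SELECT @@global.tx_isolation;
SET GLOBAL tx_isolation = 'REPEATABLE-READ';
```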

Dimitri Kravtchuk wrote about performance issues with READ COMMITTED in 2015, but he tested with 40 cores that time. My tests show that there is a huge difference after 40 cores.

I tested this originally with Percona Server 5.7.15 and recently re-tested with Oracle’s MySQL versions 5.6.35 and 5.7.17. I confirmed that the bug exists in these versions as well, and reported it. I hope the Oracle MySQL team fixes it. The good news is that while 5.6 stopped scaling after 16 threads, 5.7 improves this to 36 threads.

Results for 5.6.35:

Results for 5.7.17:

Machine details:

PostgreSQL Professional and Freematiq machine (tests for the MySQL 5.6.35, Percona Server 5.7.15 and MySQL 5.7.17 servers):

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

Percona machine (test for Percona 5.7.15 server):

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Test: SysBench OLTP RW test, converted to use prepared statements, as described in this post.

MySQL Options: described in this post.


Elasticsearch Ransomware: Open Source Database Security Part 2

MySQL Performance Blog - Wed, 2017-01-18 18:15

In this blog post, we’ll look at a new Elasticsearch ransomware outbreak and what you can do to prevent it happening to you.

Mere weeks after reports of MongoDB servers getting hacked and infected with ransomware, Elasticsearch clusters are experiencing the same difficulties. David Murphy’s blog discussed the situation and the solution for MongoDB servers. In this blog post, we look at how you can prevent ransomware attacks on your Elasticsearch clusters.

First off, what is Elasticsearch? Elasticsearch is an open source distributed index based on Apache Lucene. It provides a full-text search with an HTTP API, using schemaless JSON documents. By its nature, it is also distributed and redundant. Companies use Elasticsearch with logging via the ELK stack and data-gathering software, to assist with data analytics and visualizations. It is also used to back search functionality in a number of popular apps and web services.

In this new scenario, the ransomware completely wiped away the cluster data and replaced it with the following warning index:


As with the MongoDB situation, this isn’t a flaw in the Elasticsearch software. This vulnerability stems from not correctly using the security settings provided by Elasticsearch. As the PCWorld article sums up:

According to experts, there is no reason to expose Elasticsearch clusters to the internet. In response to these recent attacks, search technologies and distributed systems architect Itamar Syn-Hershko has published a blog post with recommendations for securing Elasticsearch deployments.

The blog post they reference has excellent advice and examples of how to protect your Elasticsearch installations from exploitation. To summarize its advice (from the post itself):

Whatever you do, never expose your cluster nodes to the web directly.

So how do you prevent hackers from getting into your Elasticsearch cluster? Using the advice from Syn-Hershko’s blog, here are some bullet points for shoring up your Elasticsearch security:

  • HTTP-enabled nodes should only listen to private IPs. You can configure which IPs Elasticsearch listens to: localhost, private IPs, public IPs or several of these options. The network.bind_host and related network.* settings control this (manual). Never set Elasticsearch to listen to a public IP or a publicly accessible DNS name.
  • Use proxies to communicate with clients. You should pass any application queries to Elasticsearch through some sort of software that can filter requests, perform audit-logging and password-protect the data. Your client-side javascript shouldn’t talk to Elastic directly, and should only communicate with your server-side software. That software can translate all client-side requests to Elasticsearch DSL, execute the query, and then send the response in a format the clients expect.
  • Don’t use default ports. Once again for clarity: DON’T USE DEFAULT PORTS. You can easily change Elasticsearch’s default ports by modifying the elasticsearch.yml file. The relevant parameters are http.port and transport.tcp.port (manual).
  • Disable HTTP if you don’t need it. Only Elasticsearch client nodes should enable HTTP, and your private network applications should be the only ones with access to them. You can completely disable the HTTP module by setting http.enabled to false (manual).
  • Secure publicly available client nodes. You should protect your Elasticsearch client and any UI it communicates with (such as Kibana and Kopf) behind a VPN. If you choose to allow some nodes access to the public network, use HTTPS and don’t transmit data and credentials as plain-text. You can use plugins like Elastic’s Shield or SearchGuard to secure your cluster.
  • Disable scripting (pre-5.x). Malicious scripts can hack clusters via the Search API. Earlier versions of Elasticsearch allowed unsecured scripts to access the software. If you are using an older version (pre-5.x), upgrade to a newer version or disable dynamic scripting completely.
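
Several of these bullet points map directly onto settings in elasticsearch.yml. A hedged sketch (the port numbers are arbitrary examples; adapt them to your deployment and check the manual for your version):

```yaml
# elasticsearch.yml hardening sketch
network.bind_host: 127.0.0.1   # listen on loopback/private addresses only
http.port: 9280                # anything but the default 9200
transport.tcp.port: 9380       # anything but the default 9300
http.enabled: false            # on nodes that don't need the HTTP API
```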

Go to Syn-Hershko’s blog for more details.

This should get you started on correctly protecting yourself against Elasticsearch ransomware (and other security threats). If you want to have someone review your security, please contact us.

Webinar Wednesday January 18, 2017: Lessons from Database Failures

MySQL Performance Blog - Tue, 2017-01-17 18:09

Join Percona’s Chief Evangelist Colin Charles on Wednesday, January 18, 2017, at 7:00 am (PST) / 10:00 am (EST) (UTC-8) as he presents “Lessons from Database Failures.”

MySQL failures at scale can teach a great deal. MySQL failures can lead to a discussion about such topics as high availability (HA), geographical redundancy and automatic failover. In this webinar, Colin will present case study material (how automatic failover caused Github to go offline, why Facebook uses assisted failover rather than fully automated failover, and other scenarios) to look at how the MySQL world is making things better. One way, for example, is using semi-synchronous replication to run fully scalable services.

The webinar will begin with an obvious example of how a business died due to incorrect MySQL backup procedures. The agenda includes backups (and verification), replication (and failover) and security (and encryption).

The webinar will cover a mix of big “fail whale” problems from the field, and how you should avoid them by properly architecting solutions.

Register for the webinar here.

Colin Charles is the Chief Evangelist at Percona. Previously, he was part of the founding team of MariaDB Server in 2009. Before that, he worked at MySQL since 2005. Colin has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and projects. He’s well known within the APAC open source communities and has spoken at many conferences.

Percona Live Featured Tutorial with Morgan Tocker — MySQL 8.0 Optimizer Guide

MySQL Performance Blog - Mon, 2017-01-16 23:06

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Morgan Tocker, MySQL Product Manager at Oracle. His tutorial is “MySQL 8.0 Optimizer Guide.” Many users who follow MySQL development are aware that recent versions introduced a number of improvements to query execution (via the addition of new execution strategies and an improved cost model). But what we don’t talk enough about is that the diagnostic features are also significantly better. I had a chance to speak with Morgan and learn a bit more about the MySQL Optimizer:

Percona: How did you get into database technology? What do you love about it?

Morgan: I started my career as a web developer, mainly focusing on the front end area. As the team I worked on grew and required different skills, I tried my hand at the back end programming. This led me to databases.

I think what I enjoyed about databases at the time was that front end design was a little bit too subjective for my tastes. With databases, you could prove what was “correct” by writing a simple micro-benchmark.  I joined the MySQL team in January 2006, and rejoined it again in 2013 after a five-year hiatus.

I don’t quite subscribe to this same view on micro benchmarks today, since it is very easy to accidentally (or intentionally) write a naïve benchmark. But I am still enjoying myself.

Percona: Your tutorial is called “MySQL 8.0 Optimizer Guide.” What exactly is the MySQL optimizer, and what new things have been added in MySQL 8.0?

Morgan: Because SQL is declarative (i.e., you state “what you want” rather than “do this then that”), there is a process that has to happen internally to prepare a query for execution. I like to describe it as similar to what happens when you enter an address in a GPS navigator. Some software then spits out the best steps on how to get there. In a database server, the Optimizer is that software code area.

There are a number of new optimizer features in MySQL 8.0, both in terms of new syntax supported and performance improvements to existing queries. These will be covered in some talks at the main conference (and also my colleague Øystein’s tutorial). The goal of my tutorial is to focus more on diagnostics than the enhancements themselves.

Percona: How can you use diagnostics to improve queries?

Morgan: To put it in numbers: it is not uncommon to see a user obsess over a configuration change that may yield a 2x improvement, and not spot the 100x improvement available by adding an index!

I like to say that users do not get the performance that they are entitled to if and when they lack the visibility and diagnostics available to them:

  • In MySQL 5.6, an optimizer trace diagnostic was added so that you can now see not only why the optimizer arrived at a particular execution plan, but also why other options were avoided.
  • In MySQL 5.7, the EXPLAIN FORMAT=JSON command now includes the cost information (the internal formula used for calculations). My experience has been that sharing this detail actually makes the optimizer a lot easier to teach.

Good diagnostics by themselves do not make the improvements, but they bring required changes to the surface. On most systems, there are opportunities for improvements (indexes, hints, slight changes to queries, etc.).
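
As a concrete sketch of the two diagnostics mentioned above (the orders table and query are hypothetical):

```sql
-- Optimizer trace (5.6+): see why a plan was chosen and what was rejected
SET optimizer_trace = 'enabled=on';
SELECT * FROM orders WHERE customer_id = 42;
SELECT trace FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';

-- Cost information per plan step (5.7+)
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42;
```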

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Morgan: Visibility into running systems has been a huge priority for the MySQL Engineering team over the last few releases. I think in many cases, force-of-habit leaves users using an older generation of diagnostics (EXPLAIN versus EXPLAIN FORMAT=JSON). My goal is to show them the light using the state-of-the-art stack. This is why I called my talk an 8.0 guide, even though much of it is still relevant for 5.7 and 5.6.

I also have a companion website to my tutorial at

Percona: What are you most looking forward to at Percona Live?

Morgan: I’m excited to talk to users about MySQL 8.0, and not just in an optimizer sense. The MySQL engineering team has made a large investment in improving the reliability of MySQL with the introduction of a native data dictionary. I expect it will be the subject of many discussions, and a great opportunity for feedback.

There is also the social aspect for me, too. It will be 11 years since I first attended the predecessor to Percona Live. I have a lot of fond memories, and enjoy catching up with new friends and old over a beer!

You can find out more about Morgan Tocker and his work with the Optimizer at his tutorial website. Want to find out more about Morgan and MySQL query optimization? Register for Percona Live Data Performance Conference 2017, and see his MySQL 8.0 Optimizer Guide tutorial. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Categories: MySQL

Ad-hoc Data Visualization and Machine Learning with mysqlshell

MySQL Performance Blog - Mon, 2017-01-16 22:00

In this blog post, I am going to show how we can use mysqlshell to run ad-hoc data visualizations and use machine learning to predict new outcomes from the data.

Some time ago Oracle released MySQL Shell, a command line client to connect to MySQL using the X protocol. It allows us to use Python or JavaScript scripting capabilities. This unties us from the limitations of SQL, and the possibilities are infinite. It means that MySQL can not only read data from the tables, but also learn from it and predict new values from features never seen before.

Some disclaimers:

  • This is not a post about how to install mysqlshell or enable the X plugin. It should be already installed. Follow the first link if instructions are needed.
  • The idea is to show some of the things that can be done from the shell. Don’t expect the best visualizations or a perfectly tuned Supervised Learning algorithm.

It is possible to start mysqlshell with either the JavaScript or the Python interpreter. Since we are going to use Pandas, NumPy and Scikit, Python will be our choice. There is an incompatibility between mysqlshell and Python > 2.7.10 that gives an error when loading some external libraries, so make sure you use 2.7.10.

We’ll work with the “employees” database that can be downloaded here. In order to make everything easier and avoid several lines of data parsing, I have created a new table that summarizes the data we are going to work with, generated using the following structure and query:

mysql> show create table data\G
*************************** 1. row ***************************
Create Table: CREATE TABLE `data` (
  `emp_no` int(11) NOT NULL,
  `age` int(11) DEFAULT NULL,
  `hired` int(11) DEFAULT NULL,
  `gender` int(11) DEFAULT NULL,
  `salary` int(11) DEFAULT NULL,
  `department` int(11) DEFAULT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> INSERT INTO data
       SELECT employees.emp_no,
              YEAR(now()) - YEAR(birth_date) as age,
              YEAR(now()) - YEAR(hire_date) as hired,
              IF(gender='M',0,1) as gender,
              max(salary) as salary,
              RIGHT(dept_no,1) as department
       from employees, salaries, dept_emp
       WHERE employees.emp_no = salaries.emp_no
         and employees.emp_no = dept_emp.emp_no
         and dept_emp.to_date="9999-01-01"
       GROUP BY emp_no, dept_emp.dept_no;

mysql> select * from data limit 5;
+--------+------+-------+--------+--------+------------+
| emp_no | age  | hired | gender | salary | department |
+--------+------+-------+--------+--------+------------+
|  10001 |   64 |    31 |      0 |  88958 |          5 |
|  10002 |   53 |    32 |      1 |  72527 |          7 |
|  10003 |   58 |    31 |      0 |  43699 |          4 |
|  10004 |   63 |    31 |      0 |  74057 |          4 |
|  10005 |   62 |    28 |      0 |  94692 |          3 |
+--------+------+-------+--------+--------+------------+
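As a sanity check, the per-row transformations performed by that INSERT … SELECT (age and tenure derived from dates, gender encoded as 0/1, department taken from the last character of dept_no) can be reproduced in plain Python. The helper name is ours, and the birth/hire dates used are the ones for employee 10001 in the sample database:

```python
from datetime import date

def summarize(emp_no, birth_date, hire_date, gender, max_salary, dept_no, today=None):
    """Mirror the feature engineering done by the INSERT ... SELECT above."""
    today = today or date.today()
    return {
        "emp_no": emp_no,
        "age": today.year - birth_date.year,      # YEAR(now()) - YEAR(birth_date)
        "hired": today.year - hire_date.year,     # YEAR(now()) - YEAR(hire_date)
        "gender": 0 if gender == "M" else 1,      # IF(gender='M',0,1)
        "salary": max_salary,                     # max(salary) per employee
        "department": int(dept_no[-1]),           # RIGHT(dept_no,1)
    }

row = summarize(10001, date(1953, 9, 2), date(1986, 6, 26), "M", 88958, "d005",
                today=date(2017, 1, 16))
print(row)
```

Run inside mysqlshell's Python mode (or any Python 2.7/3 interpreter), this reproduces the first row of the sample output above.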

The data is:

  • Age: the age of the employee
  • Hired: the number of years working in the company
  • Gender: 0 Male, 1 Female
  • Salary: the salary
Categories: MySQL

MongoDB 3.4 Views

MySQL Performance Blog - Fri, 2017-01-13 23:07

This blog post covers MongoDB 3.4 views, one of the more recent MongoDB features.

Views are often used in relational databases to achieve both data security and a high level of abstraction, making it easier to retrieve data. Unlike regular tables, views neither have a physical schema nor use disk space. They execute a pre-specified query. There are exceptions (such as materialized views and pre-executed views), but as a default the engine actually executes a query and then sends the result set as a single table when a view is used.

In MySQL, a simple view can be defined as:

create database percona;
use percona;
create view user_hosts as select user, host from mysql.user;
select * from user_hosts;
+------------------+-----------+
| user             | host      |
+------------------+-----------+
| myuser           | %         |
...

The view above exposes only the user and host fields rather than all of the table’s fields; anyone who queries it sees just those two columns.

This feature was not available in previous MongoDB versions. All we could do was either deny reads in a collection (which would make it useless to the user) or allow reads to the entire collection (which was pretty unsafe).

The views feature request was open for a while, and as we can see there was a considerable number of votes to make this feature available:

MongoDB 3.4 views are non-materialized views, and behind the scenes the engine runs an aggregation. Creating a view requires that we specify a collection or a previous existing view. When a view is the source collection from another view, it allows us to execute a chained aggregation.

To create a view, we should use the db.createView(‘view_name’,’source’,[pipeline]) command, specifying the view name, the view source collection and the aggregation pipeline. This aggregation pipeline, as well as the other parameters, is saved in the system.views collection. This is the only space that the view will use in the system. A new document is saved in the system.views collection for each view created.

Although views seem very easy to create, there are a few pitfalls to be aware of when using them.

Since views always run an aggregation, an index is needed to cover the aggregation’s $match stage; otherwise, expect slow responses caused by full collection scans.

Cascading aggregations (creating views of views) can be slow, as the view does not have any data and therefore cannot be indexed. MongoDB neither checks the collection fields nor the collection existence before creating the view. If there is no collection, the view returns an empty cursor.

Views appear as collections when we list them. The show collections command displays views alongside regular collections, but such “collections” are read-only. To drop a view, we simply execute db.view_name.drop(). The view’s document is removed from system.views, but the source data remains untouched, because dropping a view only removes the code that generates the view result.

How to create views:

In this step-by-step, we will create a view and create a user restricted to read privileges on views only:

1. Populate collection:

$ mongo --authenticationDatabase admin -u foo -p
> use financial
switched to db financial
> db.employee.insert({FirstName : 'John', LastName: 'Test', position : 'CFO', wage : 180000.00 })
WriteResult({ "nInserted" : 1 })
> db.employee.insert({FirstName : 'John', LastName: 'Another Test', position : 'CTO', wage : 210000.00 })
WriteResult({ "nInserted" : 1 })
> db.employee.insert({FirstName : 'Johnny', LastName: 'Test', position : 'COO', wage : 180000.00 })
WriteResult({ "nInserted" : 1 })

2. Create view that only shows full names:

> use financial
> db.createView('employee_names', 'employee', [{ $project : { _id : 0, "fullname" : {$concat : ["$FirstName", " ", "$LastName"]}}}])
{ "ok" : 1 }
> show collections
employee
employee_names
system.views
> db.system.views.find()
> db.employee_names.find()
{ "fullname" : "John Test" }
{ "fullname" : "John Another Test" }
{ "fullname" : "Johnny Test" }
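What that $project stage does can be mimicked in plain Python, using the three documents inserted in step 1 — each input document is reduced to a single computed fullname field:

```python
employees = [
    {"FirstName": "John", "LastName": "Test", "position": "CFO", "wage": 180000.00},
    {"FirstName": "John", "LastName": "Another Test", "position": "CTO", "wage": 210000.00},
    {"FirstName": "Johnny", "LastName": "Test", "position": "COO", "wage": 180000.00},
]

# Equivalent of {$project: {_id: 0, fullname: {$concat: ["$FirstName", " ", "$LastName"]}}}
view = [{"fullname": d["FirstName"] + " " + d["LastName"]} for d in employees]
print(view)
```

Note how the position and wage fields never reach the reader of the view, which is exactly the data-security property views are used for.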

3. Create a user-defined role that only gives access to the views:

Create a file “createviewOnlyRole.js” containing the following JavaScript code:

use financial
db_name = db.toString()
priv = []
db.system.views.find({}, {"_id": 1, "viewOn": 1}).forEach(function (view) {
    database_collection = view['_id'].split('.')
    database = database_collection[0]
    database_collection.splice(0, 1)
    coll = database_collection.join('.')
    priv.push({"resource": {"db": database, "collection": coll}, "actions": ["find"]})
})
var viewrole = db.getRole(db_name + '_readAnyView')
if (viewrole == null) {
    db.runCommand({ createRole: db_name + "_readAnyView", "privileges": priv, roles: [] });
} else {
    db.runCommand({ updateRole: db_name + "_readAnyView", "privileges": priv, roles: [] });
}
print('access granted to:')
printjson(priv)
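One subtle detail in that script: the view’s _id in system.views has the form “database.collection”, and the collection name may itself contain dots, so only the first dot separates the database from the view name. A minimal Python sketch of the same parsing (the second view name is hypothetical, just to exercise the dotted case):

```python
def view_to_privilege(view_id):
    """Turn a system.views _id like 'financial.employee_names' into a
    role privilege granting find on that exact view."""
    database, _, collection = view_id.partition(".")  # split on the FIRST dot only
    return {
        "resource": {"db": database, "collection": collection},
        "actions": ["find"],
    }

priv = [view_to_privilege(v) for v in ["financial.employee_names", "financial.report.2017"]]
print(priv)
```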

Then authenticate and use the desired database to create this role. In our case:

use financial
load('createviewOnlyRole.js')

4. Create a new user assigned to the readAnyView role. This new user is only able to query against views, and they must know the view name because no other privileges are granted:

use financial
db_name = db.toString()
db.createUser({
    user: "userReadViews",
    pwd: "123",
    roles: [ db_name + "_readAnyView" ]
})

Notes: If you receive an error when trying to execute the .js file, please create a new role that grants find in the system.views collection:

use admin
db.runCommand({
    createRole: "readViewCollection",
    privileges: [
        { resource: { db: "", collection: "system.views" }, actions: [ "find" ] }
    ],
    roles: []
})
db.grantRoleToUser('<your_user>', ['readViewCollection'])

For more information about user-defined roles, please check the user-defined roles documentation.

This should help explain MongoDB 3.4 views. Please feel free to contact me @AdamoTonete or @percona for any questions and suggestions.

Categories: MySQL

The Impact of Swapping on MySQL Performance

MySQL Performance Blog - Fri, 2017-01-13 17:27

In this blog, I’ll look at the impact of swapping on MySQL performance. 

It’s common sense that when you’re running MySQL (or really any other DBMS) you don’t want to see any I/O in your swap space. Scaling the cache size (using innodb_buffer_pool_size in MySQL’s case) is standard practice to make sure there is enough free memory so swapping isn’t needed.   

But what if you make some mistake or miscalculation, and swapping happens? How much does it really impact performance? This is exactly what I set out to investigate.

My test system has the following:

  • 32GB of physical memory
  • OS (and swap space) on a (pretty old) Intel 520 SSD device
  • Database stored on Intel 750 NVMe storage

To simulate a worst case scenario, I’m using Uniform Sysbench Workload:

sysbench --test=/usr/share/doc/sysbench/tests/db/select.lua \
  --report-interval=1 --oltp-table-size=700000000 --max-time=0 \
  --oltp-read-only=off --max-requests=0 --num-threads=64 \
  --rand-type=uniform --db-driver=mysql --mysql-password=password \
  --mysql-db=test_innodb run

To better visualize the metrics that matter for this test, I have created a custom graph in our Percona Monitoring and Management (PMM) tool. It shows disk IO performance and swapping activity on the same graph.

Here are the baseline results for innodb_buffer_pool_size=24GB, a reasonable ballpark value for a system with 32GB of memory.

As you can see in the baseline scenario, there is almost no swapping, with around 600MB/sec read from the disk. This gives us about 44K QPS. The 95% query response time (reported by sysbench) is about 3.5ms.

Next, I changed the configuration to innodb_buffer_pool_size=32GB, which is the total amount of memory available. As memory is required for other purposes, it caused swapping activity:

We can see that performance stabilizes after a bit at around 20K QPS, with some 380MB/sec disk IO and 125MB/sec swap IO. The 95% query response time has grown to around 9ms.

Now let’s look at an even worse case. This time, we’ll set our configuration to innodb_buffer_pool_size=48GB (on a 32GB system).

Now we have around 6K QPS. Disk IO has dropped to 250MB/sec, and swap IO is up to 190MB/sec. The 95% query response time is around 35ms. As the graph shows, the performance becomes more variable, confirming the common assumption that intense swapping affects system stability.

Finally, let’s remember that MySQL 5.7 has the Online Buffer Pool Resize feature, which was created to solve exactly this kind of problem (among others): it lets you change the buffer pool size at runtime if you accidentally set it too large. As we have tested innodb_buffer_pool_size=24GB and demonstrated it worked well, let’s scale back to that value:

mysql> set global innodb_buffer_pool_size=24*1024*1024*1024; Query OK, 0 rows affected (0.00 sec)

Now the graph shows both good and bad news. The good news is that the feature works as intended, and after the resize completes we get close to the same results as before our swapping experiment. The bad news is that everything pretty much grinds to a halt for 15 minutes or so while the resize occurs. There is almost no IO activity or intensive swapping while the buffer pool resize is in progress.

I also performed other sysbench runs for selects using Pareto random type rather than Uniform type, creating more realistic testing (skewed) data access patterns. I further performed update key benchmarks using both Uniform and Pareto access distribution.

You can see the results below:

As you can see, the results for selects are as expected. Accesses with Pareto distributions are better and are affected less – especially by minor swapping.  

If you look at the update key results, though, you find that minor swapping causes performance to improve for Pareto distribution. The results at 48GB of memory are pretty much the same.

Before you say that that is impossible, let me provide an explanation: I limited innodb_max_purge_lag on this system to avoid unbound InnoDB history length growth. These workloads tend to be bound by InnoDB purge performance. It looks like swapping has impacted the user threads more than it did the purge threads, causing such an unusual performance profile. This is something that might not be repeatable between systems.


When I started, I expected severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.  

While you should continue to plan your capacity so that there is no constant swapping on the database system, these results show that a few MB/sec of swapping activity is not going to have a catastrophic impact.

This assumes your swap space is on an SSD, of course! SSDs handle random IO (which is what paging activity usually is) much better than HDDs.

Categories: MySQL

CVE-2016-6225: Percona Xtrabackup Encryption IV Not Being Set Properly

MySQL Performance Blog - Thu, 2017-01-12 21:34

If you are using Percona XtraBackup with xbcrypt to create encrypted backups, and are using versions older than 2.3.6 or 2.4.5, we advise that you upgrade Percona XtraBackup.

Note: this does not affect encryption of encrypted InnoDB tables.


Percona XtraBackup versions older than 2.3.6 or 2.4.5 suffered an issue of not properly setting the Initialization Vector (IV) for encryption. This could allow someone to carry out a Chosen-Plaintext Attack, which could recover decrypted content from the encrypted backup files without the need for a password.


Percona XtraBackup carries backward compatibility to allow for the decryption of older backup files. However, encrypted backup files produced by the versions that have the fix will not be compatible with older versions of Percona XtraBackup.


Access to the encrypted files must already be present for exploitation to occur. So long as you adequately protect the encrypted files, we don’t expect this issue to adversely affect users.


Percona would like to thank and give credit to Ken Takara for discovering this issue and working it through to PoC exploitation.

More Information Release Notes
Categories: MySQL

The Percona Online Store: Get Database Help Now with Support and Health Audit Services

MySQL Performance Blog - Thu, 2017-01-12 21:10

We are proud to announce the new Percona online store!

Keeping your database environment tuned, optimized and high-performance is key to achieving business goals. If your database goes down, so does your business. Percona experts have a long history of helping enterprises ensure their databases are running smoothly. With Percona, you can meet today’s workloads, and prepare for future workloads before they impact performance.

Now we’ve made it even easier to get Percona database services: visit Percona’s new online store! The webstore is perfect for ordering a health audit and immediate, smaller-scale database support. Simply select your service type, answer a few questions about your environment, and then submit. A Percona expert will be in touch.

The webstore makes it fast and easy to purchase Percona Services, with recurring monthly credit card payments. Shop now for Percona’s highly responsive, effective and affordable support and service options, including MySQL Standard Support, MongoDB Standard Support and a MySQL Health Audit.

Percona has some of the best reviews and one of the highest renewal rates in the industry. We can help you increase your uptime, be more productive, reduce your support budget and implement fixes for performance issues faster.

Check out the new Percona online store here!

Categories: MySQL

How to Replace MySQL with Percona Server on a CPanel, WHM VPS or Dedicated Server

MySQL Performance Blog - Wed, 2017-01-11 18:41

In this blog post, we’ll look at how to replace MySQL with Percona Server for MySQL on a CPanel, WHM VPS or dedicated server.

In general, CPanel and WHM have been leaning towards support of MariaDB over other flavors. This is partly due to the upstream repos replacing the MySQL package with MariaDB (for example, on CentOS).

MySQL 5.6 is still supported though, which means they are keeping support for core MySQL products. But if you want to get some extra performance enhancements or enterprise features for free, without getting too many bells and whistles, you might want to install Percona Server.

I’ve done this work on a new dedicated server with the latest WHM and CPanel on CentOS 7, with MySQL 5.6 installed. Besides the backup, this is a fairly quick process.

It’s pretty simple. From the Percona Server for MySQL 5.7 installation doc, we can get the YUM repo. (Run commands as root if you can, otherwise as sudo.)

yum install

Now that we have the repo, let’s install Percona XtraBackup in case we need to roll this back at any point:

yum install percona-xtrabackup

This server had a drive mounted at /backup, so I created the backup with the following commands:

xtrabackup --target-dir=/backup/xtrabackup --backup
xtrabackup --target-dir=/backup/xtrabackup --prepare

Now that we have a good backup, let’s remove MySQL:

service mysql stop
yum remove MySQL* mysql*

Depending on your dependency chain, this could remove Percona XtraBackup, but that can be fixed. Let’s accept this uninstall.

Let’s install Percona Server for MySQL 5.7 and Percona Toolkit:

yum install Percona-Server-server-57 percona-toolkit percona-xtrabackup

Now that it’s installed, ensure the mysql service is running. If it isn’t, start it. Now let’s upgrade:


NOTE: This works if you can log in as root without a password; if you can’t, you will need to specify the -u and -p flags.

Once you run the upgrade, restart the mysql service:

service mysql restart

And there you go, you are now running on Percona Server for MySQL 5.7. If your managed providers tell you it’s not supported, don’t worry! It works as long as CPanel supports MySQL 5.6.

If you have any issues, just restore the backup.

NOTE: One thing to keep in mind is that 5.7 breaks CPanel’s ability to create users in MySQL. I believe this is due to the changes to the mysql.user table. If this is an issue for you, you can always use Percona Server for MySQL 5.6.

Categories: MySQL

Reinstall MySQL and Preserve All MySQL Grants and Users

MySQL Performance Blog - Wed, 2017-01-11 16:35

In this blog post, we’ll look at how to preserve all MySQL grants and users after reinstalling MySQL.

Every so often, I need to reinstall a MySQL version from scratch and preserve all the user accounts and their permissions (or move the same users and privileges to another server).

As of MySQL 5.7, MySQL does not make this easy! MySQL SHOW GRANTS only shows permissions for one user, and the method suggested on StackExchange – dumping tables containing grants information directly – is not robust (as Rick James mentions in the comments). It also doesn’t work between different MySQL versions.

This problem is easily solved, however, with the pt-show-grants tool from Percona Toolkit (which serves pretty much as a mysqldump for user privileges).  

All you need to do is:

  1. On the source, or to backup MySQL privileges, run:

pt-show-grants > grants.sql

  2. On the target, or to restore MySQL privileges, run:

mysql  < grants.sql

  3. If you would like to clean up the old privileges from MySQL before loading new ones, use:

pt-show-grants --drop  --ignore root@localhost | grep "^DROP USER " | mysql

This removes all users except root, which you will need in order to connect back and load the new privileges.

With Percona Toolkit, preserving your grants and user privileges is easy!

Categories: MySQL

Webinar Thursday, January 12: Percona Software News and Roadmap Update for Q1 2017

MySQL Performance Blog - Tue, 2017-01-10 19:44

Please join Percona CEO Peter Zaitsev for a webinar on Thursday, January 12, 2017 at 11 am PST/ 2 pm EST (UTC-8) for a discussion on the Percona Software News and Roadmap Update for Q1 2017.

In this webinar, Peter will discuss what’s new in Percona open source software. This will include Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.

During this webinar Peter will talk about newly released features in Percona software, show a few quick demos and share with you highlights from the Percona open source software roadmap.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register for the Percona Software News and Roadmap Update webinar here.

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization is one of Percona’s most popular downloads.


Categories: MySQL

How to Move a MySQL Partition from One Table to Another

MySQL Performance Blog - Tue, 2017-01-10 18:00

In this blog post we’ll look at how to move a MySQL partition from one table to another, for MySQL versions before 5.7.

Up to version 5.7, MySQL had a limitation that made it impossible to directly exchange partitions between partitioned tables. Now and then, we get questions about how to import an .ibd for use as a partition in a table, as well as how to exchange partitions with another partitioned table. Below are step-by-step instructions on how to move a partition from one table to another.

In this example, one of our customers had two tables with the following structures:

CREATE TABLE live_tbl (
  some_id bigint(20) NOT NULL DEFAULT '0',
  summary_date date NOT NULL,
  PRIMARY KEY (some_id, summary_date)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/*!50500 PARTITION BY RANGE COLUMNS(summary_date)
(PARTITION p201203 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB,
 PARTITION p201204 VALUES LESS THAN ('2012-05-01') ENGINE = InnoDB,
 PARTITION p201205 VALUES LESS THAN ('2012-06-01') ENGINE = InnoDB,
 PARTITION p201206 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB,
 PARTITION future VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;

CREATE TABLE archive_tbl (
  some_id bigint(20) NOT NULL DEFAULT '0',
  summary_date date NOT NULL,
  PRIMARY KEY (some_id, summary_date)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/*!50500 PARTITION BY RANGE COLUMNS(summary_date)
(PARTITION p201109 VALUES LESS THAN ('2011-10-01') ENGINE = InnoDB,
 PARTITION p201110 VALUES LESS THAN ('2011-11-01') ENGINE = InnoDB,
 PARTITION p201111 VALUES LESS THAN ('2011-12-01') ENGINE = InnoDB,
 PARTITION p201112 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB,
 PARTITION p201201 VALUES LESS THAN ('2012-02-01') ENGINE = InnoDB,
 PARTITION p201202 VALUES LESS THAN ('2012-03-01') ENGINE = InnoDB,
 PARTITION future VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;

And their (likely obvious) goal is to move (not copy) the oldest partition from live_tbl to archive_tbl. To achieve this, we came up with the following procedure:

For the following, we assume:

  • The datadir is “/var/lib/mysql/”
  • MySQL Server is run by “mysql” Linux user
  • “p201203” is the partition name you want to move
  • “live_tbl” is the source table from which you want to move the partition
  • “archive_tbl” is the destination table to where you want to move the partition
  • “dest_tbl_tmp” is the temporary table we will create, using the same CREATE TABLE criteria as in the live_tbl
  • “thedb” is the database name
1. Copy the .ibd data file from that particular partition

First, make sure you flush any pending changes to disk and that the table is locked, so that binary table copies can be made while the server is running. Keep in mind that the table will be locked while you copy the .ibd file. All reads/writes during that time will be blocked.

Important: Don’t close this session or the lock will be released.

mysql> USE thedb;
mysql> FLUSH TABLE live_tbl FOR EXPORT;

Open another session, and copy the .ibd file to a temporary folder.

shell> cp /var/lib/mysql/thedb/live_tbl#P#p201203.ibd /tmp/dest_tbl_tmp.ibd
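InnoDB names per-partition tablespace files with a #P# separator between table name and partition name, which is why the path above ends in live_tbl#P#p201203.ibd. A small helper sketching the convention (the function name is ours; values match this post's assumptions):

```python
def partition_ibd_path(datadir, db, table, partition):
    """Path of an InnoDB partition tablespace file:
    <datadir>/<db>/<table>#P#<partition>.ibd"""
    return f"{datadir.rstrip('/')}/{db}/{table}#P#{partition}.ibd"

path = partition_ibd_path("/var/lib/mysql/", "thedb", "live_tbl", "p201203")
print(path)  # /var/lib/mysql/thedb/live_tbl#P#p201203.ibd
```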

After you copy the .ibd file to the temporary folder, go back to the MySQL session and unlock the table so that all reads and writes to that particular table are allowed again.


2. Prepare a temporary table to import the tablespace

Create a temporary table exactly like the one into which you want to import the partition. Remove the partitioning on it and discard the tablespace so that it is ready for the .ibd import.

mysql> CREATE TABLE dest_tbl_tmp LIKE archive_tbl;
mysql> ALTER TABLE dest_tbl_tmp REMOVE PARTITIONING;
mysql> ALTER TABLE dest_tbl_tmp DISCARD TABLESPACE;

3. Import the tablespace to the temporary table

Place the .ibd file in the appropriate folder, set the correct permissions and ownership and then import the tablespace to the temporary table.

shell> cp /tmp/dest_tbl_tmp.ibd /var/lib/mysql/thedb/
shell> chmod 660 /var/lib/mysql/thedb/dest_tbl_tmp.ibd
shell> chown mysql.mysql /var/lib/mysql/thedb/dest_tbl_tmp.ibd
mysql> ALTER TABLE dest_tbl_tmp IMPORT TABLESPACE;

4. Swap the tablespace with the destination table’s partition tablespace

Partition according to your own schema. (This is just an example using date values. In our case, we have to REORGANIZE PARTITION to accommodate a new LESS THAN range before the MAXVALUE.)

mysql> ALTER TABLE archive_tbl REORGANIZE PARTITION future INTO (
         PARTITION p201203 VALUES LESS THAN ('2012-04-01'),
         PARTITION future VALUES LESS THAN (MAXVALUE)
       );
mysql> ALTER TABLE archive_tbl EXCHANGE PARTITION p201203 WITH TABLE dest_tbl_tmp;

5. Check that the partitions are correctly exchanged before dropping the one from the source table

SELECT * FROM archive_tbl;
SELECT * FROM dest_tbl_tmp;
SELECT * FROM live_tbl;
ALTER TABLE live_tbl DROP PARTITION p201203;

For more information on why these steps are needed, please check the following documentation link for ALTER TABLE … EXCHANGE PARTITION:

In MySQL version 5.7, it is possible to exchange partitions without the unpartitioned table step, as described in the following link:

There are bugs related to the steps in this guide that might be useful to take into consideration:

Categories: MySQL

MongoDB PIT Backups: Part 2

MySQL Performance Blog - Tue, 2017-01-10 00:14

This blog post is the second in a series covering MongoDB PIT backups. You can find the first part here.

Sharding Makes Everything Fun(ner)

The first blog post in this series looked at MongoDB backups in a simple single-replica set environment. In this post, we’ll look at the scale-out use case. When sharding, we have exactly the same problem as we do on a single replica set. However, now the problem is multiplied by the number of replica sets in the cluster. Additionally, we have a bonus problem: each replica set has unique data. That means to get a truly consistent snapshot of the cluster, we need to orchestrate our backups to capture a single consistent point in time. Just so we’re on the same page, that means that every replica set needs to stop their backups at, or near, the same time that the slowest replica set stops. Are you sufficiently confused now? Let me get to a basic concept that I forgot to cover in the first post, and then I’ll give you a simple description of the problem.

Are you Write Concerned?

So far, I’ve neglected to talk about the very important role of “write concern” when taking consistent backups. In MongoDB, the database is not durable by default. By “durable,” I mean “on disk” when the database acknowledges receipt of an operation from your application. There are several likely reasons for this; the biggest one originally was probably throughput, given a lack of concurrency.

However, the side effect is possible data loss due to loss of operations applied only in memory. Changing the write concern to “journaled” (j : true) will change this behavior so that MongoDB journals changes before acknowledging them (you also need to be running with journal enabled).

TIP: For true durability in a replica set, you should use a write concern of “majority” for operations and the writeConcernMajorityJournalDefault : true on all replica set members (new to v3.4). This has the added benefit of greatly decreasing the chance of rollback after an election.

Wow, you’re inconsistent

At the risk of being repetitive, the crux of this issue is that we need to run a backup on every shard (replica set). This is necessary because every shard has a different piece of the data set. Each piece of that data set is necessary to get an entire picture of the data set for the cluster (and thus, your application). Since we’re using mongodump, we’ll only have a consistent snapshot at the point in time when the backup completes. This means we must end each shard’s backup at a consistent point in time. We cannot expect that the backup will complete in exactly the same amount of time on every shard, which is what we’ll need for a consistent point in time across the cluster. This means that Shard1 might have a backup that is consistent to 12:05 PM, and another shard that is consistent to 12:06 PM. In a high traffic environment (the kind that’s likely to need horizontal scale), this could mean thousands of lost documents. Here’s a diagram:

MongoDB PIT Backups


Here’s the math to illustrate the problem:

  • Shard1’s backup will contain 30,000 documents ((100 docs/sec * 60 secs/min) * 5 mins)
  • Shard2’s backup will contain 36,000 documents ((100 docs/sec * 60 secs/min) * 6 mins)

In this example, to get a consistent point in time you’d need to remove all insert, update and delete operations that happened on Shard2 after the time that Shard1’s backup completed (6,000 documents). This means examining the timestamp of every operation in the oplog and reversing its operation. That’s a very intensive process, and it will be unique for every mongodump that’s executed. Furthermore, it is a pretty tricky thing to do. The repeatable and much more efficient method is to have backups that finish in a consistent state, ready to restore when needed.
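The arithmetic above can be sketched in a few lines (the 100-documents-per-second rate is the hypothetical workload from the example):

```python
# Hypothetical sharded workload: each shard receives 100 documents per second.
DOCS_PER_SEC = 100

def docs_in_backup(minutes):
    """Documents written on a shard while its mongodump runs for `minutes` minutes."""
    return DOCS_PER_SEC * 60 * minutes

shard1 = docs_in_backup(5)   # Shard1's dump finishes at 12:05 -> 30000 documents
shard2 = docs_in_backup(6)   # Shard2's dump finishes at 12:06 -> 36000 documents

# The gap between the two snapshots: operations you'd have to reverse on
# Shard2 to roll it back to Shard1's point in time.
print(shard2 - shard1)  # 6000
```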

Luckily, Percona has you covered!

You’re getting more consistent

Having data is important, but knowing what data you have is even more important. Here’s how you can be sure you know what you have in your MongoDB backups:

David Murphy has released his MongoDB Consistent Backup Tool in the Percona Labs GitHub account, and has written a very informative blog post about it. My goal with these blog posts is to make it even easier to understand the problem and how to solve it. We’ve already had an exhaustive discussion about the problem on both small and large scales. How about the solution?

It’s actually pretty simple. The solution, at a basic level, is to use a simple algorithm to decide when a cluster-wide consistent point-in-time can be reached. In the MongoDB Consistent Backup tool, this is done by the backup host kicking off backups on a “known good member” of each shard (that’s a pretty cool feature by itself) and then tracking the progress of each dump. At the same time the backup is kicked off, the backup host kicks off a separate thread that tails the oplog on each “known good member” until the mongodump on the slowest shard completes. By using this method, we have a very simple way of deciding when we can get a cluster-wide consistent snapshot. In other words, when the slowest member completes their piece of the workload. Here’s the same workload from Figure 4, but with the MongoDB Consistent Backup Tool methodology:

MongoDB PIT Backups
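As a rough sketch of that decision logic (the function names here are illustrative, not the tool’s actual API): the shared consistent point is simply the end time of the slowest dump, and each faster shard must replay its oplog slice from its own dump end up to that shared point.

```python
# Simplified model of the consistent-backup idea: dump every shard, keep
# tailing every shard's oplog until the SLOWEST dump completes, then replay
# each tail up to one shared timestamp.

def cluster_consistent_point(dump_end_times):
    """All oplog tails stop at the moment the slowest mongodump completes."""
    return max(dump_end_times.values())

def ops_to_replay(dump_end_times, consistent_point):
    """Per shard: the oplog window (own dump end -> shared point) that must be
    applied on top of its dump to reach the cluster-wide timestamp."""
    return {shard: (end, consistent_point)
            for shard, end in dump_end_times.items()}

# Shard1's dump ends at t=300s (12:05), Shard2's at t=360s (12:06).
ends = {"shard1": 300, "shard2": 360}
point = cluster_consistent_point(ends)   # 360: everyone is consistent to 12:06
windows = ops_to_replay(ends, point)     # shard1 replays its 300..360s oplog slice
```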


TIP: The amount of time that it takes to perform these backups is often decided by two factors:

  1. How evenly the data is distributed across the shards (how well balanced the cluster is)
  2. How much data each shard contains (the per-shard data volume)

The takeaway here is that you may need to shard so that each shard has a manageable volume of data. This allows you to hit your backup/restore windows more easily.

…The Proverbial “Monkey Wrench”

There’s always a “gotcha” just when you think you’ve got your mind around any difficult concept. Of course, this is no different.

There is one very critical concept in sharding that we didn’t cover: tracking what data lies on which shard. This is important for routing the workload to the right place and for balancing the data across the shards. In MongoDB, this is handled by the config servers. If you cannot reach (or recover) your config servers, your entire cluster is lost! For obvious reasons, you need to back them up as well. With the Percona Labs MongoDB Consistent Backup Tool, there are actually two modes used to back up config servers: v3.2 and greater, and legacy. The reason is that in v3.2, config servers went from mirrors to a standard replica set. In v3.2 mode, we just treat the config servers like another replica set: they have their own mongodump and oplog tail thread, and they get a backup that is consistent to the same point in time as all other shards in the cluster. If you’re on a version of MongoDB prior to v3.2, and you’re interested in an explanation of legacy mode, please refer back to David’s blog post.

The Wrap Up

We’ve examined the problems with getting consistent backups in a running MongoDB environment in this and the previous blog posts. Whether you have a single replica set or a sharded cluster, you should have a much better understanding of what the problems are and how Percona has you covered. If you’re still confused, or you’d just like to ask some additional questions, drop a comment in the section below. Or shoot me a tweet @jontobs, and I’ll make sure to get back to you.

Categories: MySQL

MySQL 8.0.1: The Next Development Milestone

MySQL Performance Blog - Mon, 2017-01-09 16:49

This post discusses the next MySQL development milestone: MySQL 8.0.1.

From the outset, MySQL 8.0 has received plenty of attention. Both this blog (see the MySQL 8.0 search) and other sites around the Internet have covered it. Early reviews seem positive (including my own MySQL 8.0 early bugs review). There is plenty of excitement about the new features.

As for early feedback on MySQL 8.0, Peter Zaitsev (Percona CEO) listed a set of recommendations for benchmarking MySQL 8.0. I hope these get reviewed and implemented.

MySQL reached the current development milestone, 8.0.0, which became available for download on September 12, 2016. Its release immediately came with a detailed review by Geir Hoydalsvik from MySQL. If you haven’t had the opportunity to do so yet, you can also review the MySQL 8.0 release notes.

It now looks like we’re nearing 8.0.1, the next development milestone. I don’t have insider information, but the publicly visible signs make it quite clear that a release is approaching.

Regarding timing, it’s interesting to note that the “What Is New in MySQL 8.0” page was updated on the 6th of January.

It looks like the release might come soon. So, restrain your excitement for a few days (or weeks?) more. Maybe you’ll soon be able to check out the all-new MySQL 8.0.1!

PS: If MySQL quality interests you, have a look at this recent – and very interesting – change made to the MTR (MySQL Test Run, the MySQL test suite) program. I believe it improves quality for everyone who runs MySQL (including its forks). The tests (which are run worldwide, often for each code change made) will now test the product with its own defaults.


Archiving MySQL and MongoDB Data

MySQL Performance Blog - Fri, 2017-01-06 21:07

This post discusses archiving MySQL and MongoDB data, and determining what, when and how to archive data.

Many people store infrequently used data. This data is taking up storage space and might make your database slower than it could be. Archiving data can be a huge benefit, both regarding the performance impact and storage savings.

Why archive?

One of the reasons for archiving data is freeing up space on your database volumes. You can store archived data on slower, less expensive storage devices, and current data on the faster database drives. Archiving old data makes backups and restores run faster since they need to process less data. Last, but by no means least, archiving data has the benefit of making your queries perform more efficiently since they do not need to process through old data.

What do you archive?

That is the big question. Archiving too much is just as detrimental as not archiving enough (or at all). As you’ll see, finding this balance requires foresight and planning. Fortunately, you can tweak your archiving scheme to make it work better as time goes by.

Some people feel that keeping all the data in their database, even if they don’t access that data frequently, is the best way to go. If you are lucky enough to have vast quantities of storage, and a database that is performing well, keeping all of the data in your database might be a good idea. Even with lots of storage, archiving off some data that you don’t use regularly might have advantages. We all know someone whose desk is piled with stacks of paper. When they need something, they tell us that they know where everything is. Even if they can find the requested item, they need to work through the piles of stuff to locate it. They also have to decide where to put new items so that they can be easily found. In your database, this equates to slower queries and potentially slower writes. Clearing out some of the less frequently accessed data will have a beneficial effect overall.

At the other end of the spectrum are the people who want to archive in a manner that is too aggressive. This means that any request for older data must access the archive location, which might be slower and more burdensome, causing those queries to run slowly. In addition, new data written into the database will have to go through an archive process fairly quickly, which might slow down the database. This is the person who puts each and every item they own into storage. It makes for a clean home, but it’s tough to find many of the items that you own. In our database, this means that most queries are run against archived data, and archive processes are frequently running. This too can slow down performance overall.

The best archiving solution is one that meets both the needs of efficient use of storage and efficiency of queries and inserts. You want to be able to write new data quickly, access frequently used data promptly, and still be able to get the information that might not often be used. There is no simple answer here: each company will have different needs and requirements. For some companies, regulations might govern how long data must be stored. With these sorts of requirements in place, you should look to place data that isn’t accessed often on a storage medium that is lower in cost (and often slower in performance). It is still there, but it is not crowding out the more commonly used data. Other companies might query or manipulate data shortly after it is loaded into the database, and they might be able to archive more often.

When do you archive?

This is another big consideration. Do you archive data daily, weekly, monthly, annually or on some other schedule? The basic answer is that it doesn’t matter what the schedule is. It matters that there is some sort of schedule, and that archiving is happening as expected. Keeping to a schedule allows everyone to know that the data is being archived as expected, and will avoid any “gee, we completely forgot about doing that” issues from arising.

Frequent archiving (daily or weekly) is good when you have high data volumes and normally need to access only the latest data in your queries. Think of stock data. Queries to pull trade volumes and pricing over a short time period are more frequent than queries that would analyze a stock’s performance over time. Therefore, archiving old data can be helpful since it keeps the frequently accessed table’s data easily accessible, but still accommodates the need to get at data for longer time spans. With high data volume, you might need to archive often so that one archive process can complete before another is started.

Less frequent archiving might be used when you have longer term projects or if you find your current database is performing reasonably well. In these cases, archiving monthly, quarterly, or annually might make sense. This is like cleaning out your garage or attic. You might do it, but you probably don’t do it every week. The amount of stuff being stored, along with the space to store it in, might determine how often you do this type of cleanup.

How do you go about archiving MySQL and MongoDB data?

There are lots of possibilities here as well. If well planned, it can be an easy implementation. But like many things, figuring this out is usually done once things have gotten a little out of control.

You can archive data using a standard backup, moving it to another table in the database, exporting to a flat file, or moving it to another database altogether. The end goal is a cleaner production environment that still allows access to the archived data if it is needed. The method for performing the archive determines the method used to bring that data back to a state in which it can be queried. One of the considerations must be how much time you are willing and able to invest in making that data available again.

  1. You can use your standard backup method to create and manage your archive, but this is a solution that is cumbersome and prone to error. You can perform a backup and then delete the unwanted data from your table(s). Now, the deleted data is only stored in your backup and must be restored in order to be queried. You should restore to another database for this purpose so that you keep your production environment clean. With this option, you also have to consider the methods for recovering the space used by the deleted rows. It also opens the possibility of someone restoring to the original database, which can cause a host of problems. With MongoDB, mongodump (version 3.2 and later) has an --archive option that writes the dump to an archive location that you specify.
  2. Another possibility is to move the data to another MySQL table or MongoDB collection in the existing database (i.e., moving from the transactions table to transactions_archived). This is a fast and efficient way to backup the data, and it allows for easy querying since the data still resides in the database. Of course, this assumes that you have enough storage space to accommodate the active and the archive tables/collections.
  3. You can also export the data to be archived to a flat file and then delete it from the original table or collection. This is workable if the data needs to be kept available but is unlikely to be regularly needed. (It will need to be imported in order to query it.) This method also comes with all the caveats about needing to delete and recover the space of the archived records, issues with importing into the original database (and ruining all the good archiving work you’ve done), and the possibility of deleting the flat file.
  4. Alternatively, you can move the data to another database. This too can be an effective method for archiving data, and can also allow that data to be made available to others for query analysis. Once again, all warnings about recovering the space apply, but the good thing here is that the data does not need to be restored to be queried. It is simply queried through the other database.
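A minimal sketch of option 2, using SQLite as a stand-in for MySQL (the `transactions` table and its columns are invented for illustration): copy rows older than a cutoff into the archive table and delete them from the source, inside a single transaction so the move is atomic.

```python
import sqlite3

# Illustrative schema: a live `transactions` table and its archive twin.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL);
    CREATE TABLE transactions_archived (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL);
    INSERT INTO transactions VALUES
        (1, '2015-06-01', 10.0), (2, '2016-11-15', 20.0), (3, '2017-01-03', 30.0);
""")

cutoff = '2016-01-01'
with conn:  # one transaction: copy the old rows, then delete the same rows
    conn.execute("INSERT INTO transactions_archived "
                 "SELECT * FROM transactions WHERE created_at < ?", (cutoff,))
    conn.execute("DELETE FROM transactions WHERE created_at < ?", (cutoff,))

live = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]               # 2
archived = conn.execute("SELECT COUNT(*) FROM transactions_archived").fetchone()[0]  # 1
```

The data stays queryable in the same database, at the cost of the extra storage the archive table consumes.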

Another option for archiving MySQL data is a tool like pt-archiver. pt-archiver is a component of the open source Percona Toolkit that nibbles old data from a source table and moves it to a target table. The target can be in the current database or in an archive database. It is designed for use with an up-and-running database and has minimal impact on the overall performance of the database. Because it works slowly and can run continuously, it can archive data regularly and cleanly. One warning is that it does delete the data from the source table, so you should test it before running it in production. pt-archiver works with MySQL data only. It is also important to note that removing large quantities of data might cause InnoDB fragmentation. Running OPTIMIZE TABLE to recover the space resolves this; as of MySQL 5.7.4, this is no longer a locking action.

So now what?

Unless you are in the enviable position where archiving MySQL and MongoDB data isn’t an issue, the first step is to come up with an archiving scheme. This will likely involve many different people since there can be an impact across the entire organization. Determine what can and should be archived, and then determine how best to archive the data. Document the process and test it before putting it into production. In the end, your database and your users will thank you.


Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads

MySQL Performance Blog - Fri, 2017-01-06 16:33

This blog compares how PostgreSQL and MySQL handle millions of queries per second.

Anastasia: Can open source databases cope with millions of queries per second? Many open source advocates would answer “yes.” However, assertions aren’t enough for well-grounded proof. That’s why in this blog post, we share the benchmark testing results from Alexander Korotkov (CEO of Development, Postgres Professional) and Sveta Smirnova (Principal Technical Services Engineer, Percona). The comparative research of PostgreSQL 9.6 and MySQL 5.7 performance will be especially valuable for environments with multiple databases.

The idea behind this research is to provide an honest comparison for the two popular RDBMSs. Sveta and Alexander wanted to test the most recent versions of both MySQL and PostgreSQL with the same tool, under the same challenging workloads and using the same configuration parameters (where possible). However, because both PostgreSQL and MySQL ecosystems evolved independently, with standard testing tools (pgbench and SysBench) used for each database, it wasn’t an easy journey.

The task fell to database experts with years of hands-on experience. Sveta has worked as a Senior Principal Technical Support Engineer in the Bugs Verification Group of the MySQL Support Group at Oracle for more than eight years, and since 2015 has worked as a Principal Technical Services Engineer at Percona. Alexander Korotkov is a PostgreSQL major contributor, and the developer of a number of PostgreSQL features – including the CREATE ACCESS METHOD command, generic WAL interface, lockfree Pin/UnpinBuffer, index-based search for regular expressions and much more. So we have a pretty decent cast for this particular play!

Sveta: Dimitri Kravtchuk regularly publishes detailed benchmarks for MySQL, so my main task wasn’t confirming that MySQL can do millions of queries per second. As our graphs will show, we’ve passed that mark already. As a Support Engineer, I often work with customers who have heterogeneous database environments in their shops and want to know about the impact of migrating jobs from one database to another. So I saw the chance to work with Postgres Professional and identify both the strong and weak points of the two databases as an excellent opportunity.

We wanted to test both databases on the same hardware, using the same tools and tests. We expected to test base functionality, and then work on more detailed comparisons. That way we could compare different real-world use case scenarios and popular options.

Spoiler: We are far from the final results. This is the start of a blog series.

OpenSource Databases on Big Machines, Series 1: “That Was Close…”

PostgreSQL Professional together with Freematiq provided two modern, powerful machines for tests.

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

I also used a smaller Percona machine.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Note that machines with smaller numbers of CPU cores and faster disks are more common for MySQL installations than machines with larger numbers of cores.

The first thing we needed to agree on is which tool to use. A fair comparison only makes sense if the workloads are as close as possible.

The standard PostgreSQL tool for performance tests is pgbench, while for MySQL it’s SysBench. SysBench supports multiple database drivers and scriptable tests in the Lua programming language, so we decided to use this tool for both databases.

The initial plan was to convert pgbench tests into SysBench Lua syntax, and then run standard tests on both databases. After initial results, we modified our tests to better examine specific MySQL and PostgreSQL features.

I converted pgbench tests into SysBench syntax, and put the tests into an open-database-bench GitHub repository.

And then we both faced difficulties.

As I wrote already, I also ran the tests on a Percona machine. For this converted test, the results were almost identical:

Percona machine:

OLTP test statistics: transactions: 1000000 (28727.81 per sec.) read/write requests: 5000000 (143639.05 per sec.) other operations: 2000000 (57455.62 per sec.)

Freematiq machine:

OLTP test statistics: transactions: 1000000 (29784.74 per sec.) read/write requests: 5000000 (148923.71 per sec.) other operations: 2000000 (59569.49 per sec.)

I started investigating. The only place where the Percona machine was better than Freematiq’s was disk speed. So I started running the pgbench read-only test, which was identical to SysBench’s point select test with full dataset in memory. But this time SysBench used 50% of the available CPU resources:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4585 smirnova 20 0 0,157t 0,041t 9596 S 7226 1,4 12:27.16 mysqld 8745 smirnova 20 0 1266212 629148 1824 S 7126 0,0 9:22.78 sysbench

Alexander, in turn, had issues with SysBench, which could not create a high load on PostgreSQL when prepared statements were used:

93087 korotkov 20 0 9289440 3,718g 2964 S 242,6 0,1 0:32.82 sysbench 93161 korotkov 20 0 32,904g 81612 80208 S 4,0 0,0 0:00.47 postgres 93116 korotkov 20 0 32,904g 80828 79424 S 3,6 0,0 0:00.46 postgres 93118 korotkov 20 0 32,904g 80424 79020 S 3,6 0,0 0:00.47 postgres 93121 korotkov 20 0 32,904g 80720 79312 S 3,6 0,0 0:00.47 postgres 93128 korotkov 20 0 32,904g 77936 76536 S 3,6 0,0 0:00.46 postgres 93130 korotkov 20 0 32,904g 81604 80204 S 3,6 0,0 0:00.47 postgres 93146 korotkov 20 0 32,904g 81112 79704 S 3,6 0,0 0:00.46 postgres

We contacted SysBench author Alexey Kopytov, and he fixed the MySQL issue. The solution is:

  • Use SysBench with the options --percentile=0 --max-requests=0  (reasonable CPU usage)
  • Use the concurrency_kit branch (better concurrency and Lua processing)
  • Rewrite Lua scripts to support prepared statements (pull request:
  • Start both SysBench and mysqld with the jemalloc or tcmalloc library pre-loaded

A fix for PostgreSQL is on the way. For now, Alexander converted a standard SysBench test into pgbench format and we stuck with it. Not much new for MySQL, but at least we had a baseline for comparison.

The next difficulty I faced was the default operating system parameters. To make the long story short, I changed them to the recommended ones (described below):

vm.swappiness=1 cpupower frequency-set --governor performance kernel.sched_autogroup_enabled=0 kernel.sched_migration_cost_ns=5000000 vm.dirty_background_bytes=67108864 vm.dirty_bytes=536870912 IO scheduler [deadline]

The same parameters were better for PostgreSQL performance as well. Alexander set his machine similarly.

After solving these issues we learned and implemented the following:

  • We cannot use a single tool (for now)
  • Alexander wrote a test for pgbench, imitating the standard SysBench tests
  • We are still not able to write custom tests because we use different tools

But we could use these tests as a baseline. After work done by Alexander, we stuck with the standard SysBench tests. I converted them to use prepared statements, and Alexander converted them into pgbench format.

I should mention that I was not able to get the same results as Dimitri for the Read Only and Point Select tests. They are close, but slightly slower. We need to investigate if this is the result of different hardware, or my lack of performance testing abilities. The results from the Read-Write tests are similar.

Another difference was between the PostgreSQL and MySQL tests. MySQL users normally have many connections. Setting the variable max_connections to allow thousands of parallel connections is not rare nowadays. While not recommended, people use this option even without the thread pool plugin. In real life, most of these connections are sleeping. But there is always a chance they will all get used in cases of increased website activity.

For MySQL, I tested up to 1024 connections. I used powers of two and multiples of the number of cores: 1, 2, 4, 8, 16, 32, 36, 64, 72, 128, 144, 256, 512 and 1024 threads.
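That series is just the powers of two up to 1024, merged with a few step counts tied to the 72-core Freematiq machine (36 = half the cores, 72 = all cores, 144 = with hyperthreading):

```python
# Reconstruct the MySQL thread-count series used in the benchmark.
powers = {2 ** n for n in range(11)}   # 1, 2, 4, ..., 1024
core_steps = {36, 72, 144}             # multiples tied to the 72-core machine
threads = sorted(powers | core_steps)
print(threads)
# [1, 2, 4, 8, 16, 32, 36, 64, 72, 128, 144, 256, 512, 1024]
```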

For Alexander, it was more important to test in smaller steps. He started from one thread and increased by 10 threads, until 250 parallel threads were reached. So you will see a more detailed graph for PostgreSQL, but no results after 250 threads.

Here are our comparison results.


  • pgsql-9.6 is standard PostgreSQL
  • pgsql-9.6 + pgxact-align is PostgreSQL with this patch (more details can be found in this blog post)
  • MySQL-5.7 Dimitri is Oracle’s MySQL Server
  • MySQL-5.7 Sveta is Percona Server 5.7.15



Sync commit in PostgreSQL is a feature, similar to innodb_flush_log_at_trx_commit=1 in InnoDB, and async commit is similar to innodb_flush_log_at_trx_commit=2.

You see that the results are very similar: both databases are developing very fast and work with modern hardware well.

MySQL results are shown with up to 1024 threads for reference.


OLTP RW with innodb_flush_log_at_trx_commit set to 1 and 2

After receiving these results, we did a few feature-specific tests that will be covered in separate blog posts.

More Information

MySQL Options for OLTP RO and Point SELECT tests:

# general table_open_cache = 8000 table_open_cache_instances=16 back_log=1500 query_cache_type=0 max_connections=4000 # files innodb_file_per_table innodb_log_file_size=1024M innodb_log_files_in_group=3 innodb_open_files=4000 # Monitoring innodb_monitor_enable = '%' performance_schema=OFF #cpu-bound, matters for performance #Percona Server specific userstat=0 thread-statistics=0 # buffers innodb_buffer_pool_size=128000M innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex innodb_log_buffer_size=64M # InnoDB-specific innodb_checksums=1 #Default is CRC32 in 5.7, very fast innodb_use_native_aio=1 innodb_doublewrite= 1 # innodb_stats_persistent = 1 innodb_support_xa=0 #(We are read-only, but this option is deprecated) innodb_spin_wait_delay=6 #(Processor and OS-dependent) innodb_thread_concurrency=0 join_buffer_size=32K innodb_flush_log_at_trx_commit=2 sort_buffer_size=32K innodb_flush_method=O_DIRECT_NO_FSYNC innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_page_cleaners=4 # perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 4 innodb_write_io_threads = 4 innodb_io_capacity=2000 innodb_io_capacity_max=4000 innodb_purge_threads=4 innodb_max_purge_lag_delay=30000000 innodb_max_purge_lag=0 innodb_adaptive_hash_index=0 (depends on workload, always check)

MySQL Options for OLTP RW:

#Open files table_open_cache = 8000 table_open_cache_instances = 16 query_cache_type = 0 join_buffer_size=32k sort_buffer_size=32k max_connections=16000 back_log=5000 innodb_open_files=4000 #Monitoring performance-schema=0 #Percona Server specific userstat=0 thread-statistics=0 #InnoDB General innodb_buffer_pool_load_at_startup=1 innodb_buffer_pool_dump_at_shutdown=1 innodb_numa_interleave=1 innodb_file_per_table=1 innodb_file_format=barracuda innodb_flush_method=O_DIRECT_NO_FSYNC innodb_doublewrite=1 innodb_support_xa=1 innodb_checksums=1 #Concurrency innodb_thread_concurrency=144 innodb_page_cleaners=8 innodb_purge_threads=4 innodb_spin_wait_delay=12 #Good value for RO is 6, for RW and RC is 192 innodb_log_file_size=8G innodb_log_files_in_group=16 innodb_buffer_pool_size=128G innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex innodb_io_capacity=18000 innodb_io_capacity_max=36000 innodb_flush_log_at_timeout=0 innodb_flush_log_at_trx_commit=2 innodb_flush_sync=1 innodb_adaptive_flushing=1 innodb_flush_neighbors = 0 innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_adaptive_hash_index=0 innodb_change_buffering=none #can be inserts, workload-specific optimizer_switch="index_condition_pushdown=off" #workload-specific

MySQL SysBench parameters:

LD_PRELOAD=/data/sveta/5.7.14/lib/mysql/  /data/sveta/sbkk/bin/sysbench  [ --test=/data/sveta/sysbench/sysbench/tests/db/oltp_prepared.lua | --test=/data/sveta/sysbench/sysbench/tests/db/oltp_simple_prepared.lua ]  --db-driver=mysql --oltp-tables-count=8 --oltp-table-size=10000000 --mysql-table-engine=innodb --mysql-user=msandbox --mysql-password=msandbox  --mysql-socket=/tmp/mysql_sandbox5715.sock --num-threads=$i --max-requests=0 --max-time=300 --percentile=0 [--oltp-read-only=on --oltp-skip-trx=on]
PostgreSQL pgbench parameters:

$ git clone $ cd pg_oltp_bench $ make USE_PGXS=1 $ sudo make USE_PGXS=1 install $ psql DB -f oltp_init.sql $ psql DB -c "CREATE EXTENSION pg_oltp_bench;" $ pgbench -c 100 -j 100 -M prepared -f oltp_ro.sql -T 300 -P 1 DB $ pgbench -c 100 -j 100 -M prepared -f oltp_rw.sql -T 300 -P 1 DB

Features in MySQL 5.7 that significantly improved performance:

  • InnoDB: transaction list optimization
  • InnoDB: Reduce lock_sys_t::mutex contention
  • InnoDB: fix index->lock contention
  • InnoDB: faster and parallel flushing
    • Multiple page cleaner threads: WL #6642
    • Reduced number of pages which needs to be flushed: WL #7047
    • Improved adaptive flushing: WL #7868
  • MDL (Meta-Data Lock) scalability
    • Remove THR_LOCK::mutex for InnoDB: Wl #6671
    • Partitioned LOCK_grant
    • Number of partitions is constant
    • Thread ID used to assign partition
    • Lock-free MDL lock acquisition for DML

Anastasia: The initial findings of this research were announced at Percona Live Amsterdam 2016. More findings were added to the second version of the same talk given at Moscow HighLoad++ 2016. Hopefully the third iteration of this talk will be available at Percona Live Open Source Database Conference 2017 in Santa Clara. Stay tuned: the Percona Live Committee is working on the program!


MongoDB Ransomware: Not Likely, But How Do You Know?

MySQL Performance Blog - Thu, 2017-01-05 20:07

In this blog post, we’ll look at some of the concerns recently seen around MongoDB ransomware and security issues.

Security blogs and magazines have recently been aflutter with the news that a hacker is stealing data from MongoDB installations and demanding bitcoins to get the data back. This sounds pretty bad at first glance, but let’s examine the facts.

The hacker needs a few things to pull this off:

  1. MongoDB is running on default ports
  2. MongoDB is not using authentication
  3. MongoDB is accessible on the Internet with no security groups or firewalls

If this sounds familiar, you might remember a similar flurry occurred last year when people counted the number of open MongoDB installs on the web. That required these same conditions to all be true. This also means the solution is the same: you simply need to make sure you follow the normal security practices of locking down ports and using authentication. Not so scary after all, right?

What does this hack look like?

Finding out if this happened is simple: your data is removed and gone! In its place, you will find a “WARNING” database, which holds a “WARNING” collection. This collection has a document that looks like:

{ "_id" : ObjectId("5859a0370b8e49f123fcc7da"), "mail" : "", "note" : "SEND 0.2 BTC TO THIS ADDRESS 13zaxGVjj9MNc2jyvDRhLyYpkCh323MsMq AND CONTACT THIS EMAIL WITH YOUR IP OF YOUR SERVER TO RECOVER YOUR DATABASE !" }

To fix this, hopefully, you have backups. If you don’t, you might want to look at how to get consistent backups going forward. Otherwise, you will need to send the hackers the 0.2 bitcoins (~200 USD) to get your data back.

So, backup!

But this brings us to the real question: can you be hijacked? It’s pretty easy to check:

  1. Do you have authentication on? Try running this command:

rs1:PRIMARY> if (db.adminCommand('getCmdLineOpts').parsed.security === undefined || db.adminCommand('getCmdLineOpts').parsed.security.authorization === undefined || db.adminCommand('getCmdLineOpts').parsed.security.authorization == "disabled"){ print("Auth not enabled!")}else{print("You're safe!")} Auth not enabled!

  1. Are you running on a non-default port? Simply run this command (if you’re using 27017 or 29017, you’re using a default port):

rs1:PRIMARY> db.adminCommand('getCmdLineOpts').parsed.net.port 27001

The last part is a bit harder to check if the other two are both false. You will need to spin up a server outside of your environment and test the connection. I suggest an Amazon EC2 micro instance (it’s very inexpensive – free if you use a new account), and it’s simple to install a MongoDB client on it. Check your setup:

  1. Login to Amazon and launch an EC2 node.
  2. Open a shell to this node (this can be done via their website).
  3. Get MongoDB’s binaries:

wget -q --show-progress gzip -d mongodb-linux-x86_64-amazon-3.4.1.tgz mkdir 3.4 tar xf mongodb-linux-x86_64-amazon-3.4.1.tar -C 3.4 --strip-components=1

  1. Try and connect to your MongoDB Server

./3.4/bin/mongo --host <your_host_name> --port <your_mongod_port>

If this connects, and you can run “db.serverStatus()”, you are at risk and should enable authentication ASAP!
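The same reachability test can be scripted with a plain TCP probe (the hostname below is a placeholder, not a real server). Run it from a host outside your network, such as the EC2 node above; a successful connection combined with disabled auth means you are exposed:

```python
import socket

def mongod_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds -- a rough proxy
    for 'this mongod is reachable from wherever this script runs'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False

# Placeholder hostname: substitute your server's public address.
print(mongod_reachable("your.db.host.example", 27017))
```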

We will have a blog out shortly on the particulars of creating a user. To enable authentication, you simply need to add “--auth” to your startup options, or the following to your YAML config file:

security: authorization: enabled

This should get you started on correctly protecting yourself against MongoDB ransomware (and other security threats). If you want to have someone review your security, or even help you use LDAP to tie into your main authentication systems, please contact us.
