CVE-2016-6225: Percona Xtrabackup Encryption IV Not Being Set Properly

MySQL Performance Blog - Thu, 2017-01-12 21:34

If you are using Percona XtraBackup with xbcrypt to create encrypted backups, and are using versions older than 2.3.6 or 2.4.5, we advise that you upgrade Percona XtraBackup.

Note: this does not affect encryption of encrypted InnoDB tables.


Percona XtraBackup versions older than 2.3.6 or 2.4.5 suffered an issue of not properly setting the Initialization Vector (IV) for encryption. This could allow someone to carry out a Chosen-Plaintext Attack, which could recover decrypted content from the encrypted backup files without the need for a password.


Percona XtraBackup carries backward compatibility to allow for the decryption of older backup files. However, encrypted backup files produced by the versions that have the fix will not be compatible with older versions of Percona XtraBackup.


Access to the encrypted files must already be present for exploitation to occur. So long as you adequately protect the encrypted files, we don’t expect this issue to adversely affect users.


Percona would like to thank and give credit to Ken Takara for discovering this issue and working it through to PoC exploitation.

More Information Release Notes
Categories: MySQL

The Percona Online Store: Get Database Help Now with Support and Health Audit Services

MySQL Performance Blog - Thu, 2017-01-12 21:10

We are proud to announce the new Percona online store!

Keeping your database environment tuned, optimized and high-performance is key to achieving business goals. If your database goes down, so does your business. Percona experts have a long history of helping enterprises ensure their databases are running smoothly. With Percona, you can meet today’s workloads, and prepare for future workloads before they impact performance.

Now we’ve made it even easier to get Percona database services: visit Percona’s new online store! The webstore is perfect for ordering a health audit and immediate, smaller-scale database support. Simply select your service type, answer a few questions about your environment, and then submit. A Percona expert will be in touch.

The webstore makes it fast and easy to purchase Percona Services, with recurring monthly credit card payments. Shop now for Percona’s highly responsive, effective and affordable support and service options, including MySQL Standard Support, MongoDB Standard Support and a MySQL Health Audit.

Percona has some of the best reviews and one of the highest renewal rates in the industry. We can help you increase your uptime, be more productive, reduce your support budget and implement fixes for performance issues faster.

Check out the new Percona online store here!

Categories: MySQL

How to Replace MySQL with Percona Server on a CPanel, WHM VPS or Dedicated Server

MySQL Performance Blog - Wed, 2017-01-11 18:41

In this blog post, we’ll look at how to replace MySQL with Percona Server for MySQL on a CPanel, WHM VPS or dedicated server.

In general, CPanel and WHM have been leaning towards support of MariaDB over other flavors. This is partly due to the upstream repos replacing the MySQL package with MariaDB (for example, on CentOS).

MySQL 5.6 is still supported though, which means they are keeping support for core MySQL products. But if you want to get some extra performance enhancements or enterprise features for free, without getting too many bells and whistles, you might want to install Percona Server.

I’ve done this work on a new dedicated server with the latest WHM and CPanel on CentOS 7, with MySQL 5.6 installed. Besides the backup, this is a fairly quick process.

It’s pretty simple. From the Percona Server for MySQL 5.7 installation doc, we can get the YUM repo. (Run commands as root if you can, otherwise as sudo.)

yum install

Now that we have the repo, let’s install Percona XtraBackup in case we need to roll this back at any point:

yum install percona-xtrabackup

This server had a drive mounted at /backup, so I created the backup with the following commands:

xtrabackup --target-dir=/backup/xtrabackup --backup xtrabackup --target-dir=/backup/xtrabackup --prepare

Now that we have a good backup, let’s remove MySQL:

service mysql stop yum remove MySQL* mysql*

Depending on your dependency chain, this could remove Percona XtraBackup, but that can be fixed. Let’s accept this uninstall.

Let’s install Percona Server for MySQL 5.7 and Percona Toolkit:

yum install Percona-Server-server-57 percona-toolkit percona-xtrabackup

Now that it’s installed ensure the mysql service is running. If it isn’t, start it. Now let’s upgrade:


NOTE. This works if you can log in as root without a password; if you can’t, you will need to specify the -u and -p flags.

Once you run the upgrade, restart the mysql service:

service mysql restart

And there you go, you are now running on Percona Server for MySQL 5.7. If your managed providers tell you it’s not supported, don’t worry! It works as long as CPanel supports MySQL 5.6.

If you have any issues, just restore the backup.

NOTE: One thing to keep in mind is that 5.7 breaks CPanel’s ability to create users in MySQL. I believe this is due to the changes to the mysql.user table. If this is an issue for you, you can always use Percona Server for MySQL 5.6.

Categories: MySQL

Reinstall MySQL and Preserve All MySQL Grants and Users

MySQL Performance Blog - Wed, 2017-01-11 16:35

In this blog post, we’ll look at how to preserve all MySQL grants and users after reinstalling MySQL.

Every so often, I need to reinstall a MySQL version from scratch and preserve all the user accounts and their permissions (or move the same users and privileges to another server).

As of MySQL 5.7, MySQL does not make this easy! MySQL SHOW GRANTS only shows permissions for one user, and the method suggested on StackExchange – dumping tables containing grants information directly – is not robust (as Rick James mentions in the comments). It also doesn’t work between different MySQL versions.

This problem is easily solved, however, with the pt-show-grants tool from Percona Toolkit (which serves pretty much as a mysqldump for user privileges).  

All you need to do is:

  1. On the source, or to backup MySQL privileges, run:

pt-show-grants > grants.sql

  1. On the target, or to restore MySQL privileges, run:

mysql  < grants.sql

  1. If you would like to clean up the old privileges from MySQL before loading new ones, use:

pt-show-grants --drop  --ignore root@localhost | grep "^DROP USER " | mysql

This removes all the users (except the root user, which you will need to connect back and load new privileges).

With Percona Toolkit, preserving your grants and user privileges is easy!

Categories: MySQL

Webinar Thursday, January 12: Percona Software News and Roadmap Update for Q1 2017

MySQL Performance Blog - Tue, 2017-01-10 19:44

Please join Percona CEO Peter Zaitsev for a webinar on Thursday, January 12, 2017 at 11 am PST/ 2 pm EST (UTC-8) for a discussion on the Percona Software News and Roadmap Update for Q1 2017.

In this webinar, Peter will discuss what’s new in Percona open source software. This will include Percona Server for MySQL and MongoDB, Percona XtraBackup, Percona Toolkit, Percona XtraDB Cluster and Percona Monitoring and Management.

During this webinar Peter will talk about newly released features in Percona software, show a few quick demos and share with you highlights from the Percona open source software roadmap.

Peter will also talk about new developments in Percona commercial services and finish with a Q&A.

Register for the Percona Software News and Roadmap Update webinar here.

Peter Zaitsev co-founded Percona and assumed the role of CEO in 2006. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the business. With over 150 professionals in 20 plus countries, Peter’s venture now serves over 3000 customers – including the “who’s who” of internet giants, large enterprises and many exciting startups.

Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. Peter is a co-author of High Performance MySQL: Optimization, Backups, and Replication, one of the most popular books on MySQL performance. Peter frequently speaks as an expert lecturer at MySQL and related conferences, and regularly posts on the Percona Data Performance Blog. He has also been tapped as a contributor to Fortune and DZone, and his recent ebook Practical MySQL Performance Optimization is one of Percona’s most popular downloads.


Categories: MySQL

How to Move a MySQL Partition from One Table to Another

MySQL Performance Blog - Tue, 2017-01-10 18:00

In this blog post we’ll look at how to move a MySQL partition from one table to another, for MySQL versions before 5.7.

Up to version 5.7, MySQL had a limitation that made it impossible to directly exchange partitions between partitioned tables. Now and then, we get questions about how to import an .ibd for use as a partition in a table, as well as how to exchange partitions with another partitioned table. Below is step-by-step instructions on how to move a partition from one table to another.

In this example, one of our customers had two tables with the following structures:

CREATE TABLE live_tbl ( some_id bigint(20) NOT NULL DEFAULT '0', summary_date date NOT NULL, PRIMARY KEY (some_id,summary_date) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci /*!50500 PARTITION BY RANGE COLUMNS(summary_date) (PARTITION p201203 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB, PARTITION p201204 VALUES LESS THAN ('2012-05-01') ENGINE = InnoDB, PARTITION p201205 VALUES LESS THAN ('2012-06-01') ENGINE = InnoDB, PARTITION p201206 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB, PARTITION future VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;

CREATE TABLE archive_tbl ( some_id bigint(20) NOT NULL DEFAULT '0', summary_date date NOT NULL, PRIMARY KEY (some_id,summary_date) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci /*!50500 PARTITION BY RANGE COLUMNS(summary_date) (PARTITION p201109 VALUES LESS THAN ('2011-10-01') ENGINE = InnoDB, PARTITION p201110 VALUES LESS THAN ('2011-11-01') ENGINE = InnoDB, PARTITION p201111 VALUES LESS THAN ('2011-12-01') ENGINE = InnoDB, PARTITION p201112 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB, PARTITION p201201 VALUES LESS THAN ('2012-02-01') ENGINE = InnoDB, PARTITION p201202 VALUES LESS THAN ('2012-03-01') ENGINE = InnoDB, PARTITION future VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */;

And their (likely obvious) goal is to move (not copy) the oldest partition from live_tbl to archive_tbl. To achieve this, we came up with the following procedure:

For the following, we assume:

  • The datadir is “/var/lib/mysql/”
  • MySQL Server is run by “mysql” Linux user
  • “p201203” is the partition name you want to move
  • “live_tbl is the source table from where you want to move the partition
  • “archive_tbl” is the destination table to where you want to move the partition
  • “dest_tbl_tmp” is the temporary table we will create, using the same CREATE TABLE criteria as in the live_tbl
  • “thedb” is the database name
1. Copy the .ibd data file from that particular partition

First, make sure you flush any pending changes to disk and that the table is locked, so that binary table copies can be made while the server is running. Keep in mind that the table will be locked while you copy the .ibd file. All reads/writes during that time will be blocked.

Important: Don’t close this session or the lock will be released.

mysql> USE thedb mysql> FLUSH TABLE live_tbl FOR EXPORT;

Open another session, and copy the .ibd file to a temporary folder.

shell> cp /var/lib/mysql/thedb/live_tbl#P#p201203.ibd /tmp/dest_tbl_tmp.ibd

After you copy the .ibd file to the temporary folder, go back to the MySQL session and unlock the table so that all reads and writes to that particular table are allowed again.


2. Prepare a temporary table to import the tablespace

Create a temporary table exactly like the one into which you want to import the partition. Remove the partitioning on it and discard the tablespace so that it is ready for the .ibd import.

mysql> CREATE TABLE dest_tbl_tmp LIKE archive_tbl; mysql> ALTER TABLE dest_tbl_tmp REMOVE PARTITIONING; mysql> ALTER TABLE dest_tbl_tmp DISCARD TABLESPACE;

3.  Import the tablespace to the temporary table

Place the .ibd file in the appropriate folder, set the correct permissions and ownership and then import the tablespace to the temporary table.

shell> cp /tmp/dest_tbl_tmp.ibd /var/lib/mysql/thedb/ shell> chmod 660 /var/lib/mysql/thedb/dest_tbl_tmp.ibd shell> chown mysql.mysql /var/lib/mysql/thedb/dest_tbl_tmp.ibd mysql> ALTER TABLE dest_tbl_tmp IMPORT TABLESPACE;

4. Swap the tablespace with the destination table’s partition tablespace

Partition according to your own schema. (This is just an example using date values. In our case, we have to REORGANIZE PARTITION to accommodate a new LESS THAN range before the MAXVALUE.)

mysql> ALTER TABLE archive_tbl REORGANIZE PARTITION future INTO ( PARTITION p201203 VALUES LESS THAN ('2012-04-01'), PARTITION future VALUES LESS THAN (MAXVALUE) ); mysql> ALTER TABLE archive_tbl EXCHANGE PARTITION p201203 WITH TABLE dest_tbl_tmp;

5. Check that the partitions are correctly exchanged before dropping the one from the source table

SELECT * FROM archive_tbl; SELECT * FROM dest_tbl_tmp; SELECT * FROM live_tbl; ALTER TABLE live_tbl DROP PARTITION p201203;

For more information on why these steps are needed, please check the following documentation link for ALTER TABLE … EXCHANGE PARTITION:

In MySQL version 5.7, it is possible to exchange partitions without the unpartitioned table step, as described in the following link:

There are bugs related to the steps in this guide that might be useful to take into consideration:

Categories: MySQL

MongoDB PIT Backups: Part 2

MySQL Performance Blog - Tue, 2017-01-10 00:14

This blog post is the second in a series covering MongoDB PIT backups. You can find the first part here.

Sharding Makes Everything Fun(ner)

The first blog post in this series looked at MongoDB backups in a simple single-replica set environment. In this post, we’ll look at the scale-out use case. When sharding, we have exactly the same problem as we do on a single replica set. However, now the problem is multiplied by the number of replica sets in the cluster. Additionally, we have a bonus problem: each replica set has unique data. That means to get a truly consistent snapshot of the cluster, we need to orchestrate our backups to capture a single consistent point in time. Just so we’re on the same page, that means that every replica set needs to stop their backups at, or near, the same time that the slowest replica set stops. Are you sufficiently confused now? Let me get to a basic concept that I forgot to cover in the first post, and then I’ll give you a simple description of the problem.

Are you Write Concerned?

So far, I’ve neglected to talk about the very important role of “write concern” when taking consistent backups. In MongoDB, the database is not durable by default. By “durable,” I mean “on disk” when the database acknowledges receipt of an operation from your application. There are most likely several reasons for this. Most likely the biggest one originally was probably throughput given a lack of concurrency.

However, the side effect is possible data loss due to loss of operations applied only in memory. Changing the write concern to “journaled” (j : true) will change this behavior so that MongoDB journals changes before acknowledging them (you also need to be running with journal enabled).

TIP: For true durability in a replica set, you should use a write concern of “majority” for operations and the writeConcernMajorityJournalDefault : true on all replica set members (new to v3.4). This has the added benefit of greatly decreasing the chance of rollback after an election.

Wow, you’re inconsistent

At the risk of being repetitive, the crux of this issue is that we need to run a backup on every shard (replica set). This is necessary because every shard has a different piece of the data set. Each piece of that data set is necessary to get an entire picture of the data set for the cluster (and thus, your application). Since we’re using mongodump, we’ll only have a consistent snapshot at the point in time when the backup completes. This means we must end each shard’s backup at a consistent point in time. We cannot expect that the backup will complete in exactly the same amount of time on every shard, which is what we’ll need for a consistent point in time across the cluster. This means that Shard1 might have a backup that is consistent to 12:05 PM, and another shard that is consistent to 12:06 PM. In a high traffic environment (the kind that’s likely to need horizontal scale), this could mean thousands of lost documents. Here’s a diagram:

MongoDB PIT Backups


Here’s the math to illustrate the problem:

  • Shard1’s backup will contain 30,000 documents ((100 docs * 60 secs) * 5 mins)
  • Shard2’s backup will contain 36,000 documents ((100 docs * 60 secs) * 6 mins)

In this example, to get a consistent point in time you’d need to remove all insert, update and delete operations that happened on Shard 2 from the time that Shard 1’s backup completed (6,000 documents). This means examining the timestamp of every operation in the oplog and reversing it’s operation. That’s a very intensive process, and will be unique for every mongodump that’s executed. Furthermore, this is a pretty tricky thing to do. The repeatable and much more efficient method is to have backups that finish in a consistent state, ready to restore when needed.

Luckily, Percona has you covered!

You’re getting more consistent

Having data is important, but knowing what data you have is even more important. Here’s how you can be sure you know what you have in your MongoDB backups:

David Murphy has released his MongoDB Consistent Backup Tool in the Percona Labs github account, and has written a very informative blog post about it. My goal with these blog posts is to make it even easier to understand the problem and how to solve it. We’ve already had an exhaustive discussion about the problem on both small and large scales. How about the solution?

It’s actually pretty simple. The solution, at a basic level, is to use a simple algorithm to decide when a cluster-wide consistent point-in-time can be reached. In the MongoDB Consistent Backup tool, this is done by the backup host kicking off backups on a “known good member” of each shard (that’s a pretty cool feature by itself) and then tracking the progress of each dump. At the same time the backup is kicked off, the backup host kicks off a separate thread that tails the oplog on each “known good member” until the mongodump on the slowest shard completes. By using this method, we have a very simple way of deciding when we can get a cluster-wide consistent snapshot. In other words, when the slowest member completes their piece of the workload. Here’s the same workload from Figure 4, but with the MongoDB Consistent Backup Tool methodology:

MongoDB PIT Backups


TIP: The amount of time that it takes to perform these backups is often decided by two factors:

  1. How evenly distributed the data is across the shards (balanced)
  2. How much data each shard contains (whether or not it’s balanced).

The takeaway here is that you may need to shard so that each shard has a manageable volume of data. This allows you to hit your backup/restore windows more easily.

…The Proverbial “Monkey Wrench”

There’s always a “gotcha” just when you think you’ve got your mind around any difficult concept. Of course, this is no different.

There is one very critical concept in sharding that we didn’t cover: tracking what data lies on which shard. This is important for routing the workload to the right place, and balancing the data across the shards. In MongoDB, this is completed by the config servers. If you cannot reach (or recover) your config servers, your entire cluster is lost! For obvious reasons, you need to back them up as well. With the Percona Labs MongoDB Consistent Backup Tool, there are actually two modes used to backup config servers: v3.2 and greater, and legacy. The reason is that in v3.2, config servers went from mirrors to a standard replica set. In v3.2 mode, we just treat the config servers like another replica set. They have their own mongodump and oplog tail thread. They get a backup that is consistent to the same point in time as all other shards in the cluster. If you’re on a version of MongoDB prior to v3.2, and you’re interested in an explanation of legacy mode, please refer back to David’s blog post.

The Wrap Up

We’ve examined the problems with getting consistent backups in a running MongoDB environment in this and the previous blog posts. Whether you have a single replica set or a sharded cluster, you should have a much better understanding of what the problems are and how Percona has you covered. If you’re still confused, or you’d just like to ask some additional questions, drop a comment in the section below. Or shoot me a tweet @jontobs, and I’ll make sure to get back to you.

Categories: MySQL

MySQL 8.0.1: The Next Development Milestone

MySQL Performance Blog - Mon, 2017-01-09 16:49

This post discusses the next MySQL development milestone: MySQL 8.0.1.

From the outset, MySQL 8.0 has received plenty of attention. Both this blog (see the MySQL 8.0 search) and other sites around the Internet have covered it. Early reviews seem positive (including my own MySQL 8.0 early bugs review). There is plenty of excitement about the new features.

As for early feedback on MySQL 8.0, Peter Zaitsev (Percona CEO) listed a set of recommendations for benchmarking MySQL 8.0. I hope these get reviewed and implemented.

MySQL achieved the current development milestone (available for download on on September 12, 2016. Its release immediately came with a detailed review by Geir Hoydalsvik from MySQL. If you haven’t had the opportunity to do so yet, you can also review the MySQL 8.0 release notes.

It now looks like we’re nearing 8.0.1, the next development milestone. I don’t have insider information, but it’s quite clear when navigating that:

Regarding timing, it’s interesting to note that the “What Is New in MySQL 8.0” page was updated on the 6th of January.

It looks like the release might come soon. So, restrain your excitement for a few days (or weeks?) more. Maybe you’ll be able to checkout the all new MySQL 8.0.1!

PS: If MySQL quality interests you, have a look at this recent – and very interesting – change made to the MTR (MySQL Test Run, the MySQL test suite) program. I believe it improves quality for everyone who runs MySQL (including its forks). The tests (which are run worldwide, often for each code change made) will now test the product with its own defaults.

Categories: MySQL

Archiving MySQL and MongoDB Data

MySQL Performance Blog - Fri, 2017-01-06 21:07

This post discusses archiving MySQL and MongoDB data, and determining what, when and how to archive data.

Many people store infrequently used data. This data is taking up storage space and might make your database slower than it could be. Archiving data can be a huge benefit, both regarding the performance impact and storage savings.

Why archive?

One of the reasons for archiving data is freeing up space on your database volumes. You can store archived data on slower, less expensive storage devices, and current data on the faster database drives. Archiving old data makes backups and restores run faster since they need to process less data. Last, but by no means least, archiving data has the benefit of making your queries perform more efficiently since they do not need to process through old data.

What do you archive?

That is the big question. Archiving too much is just as detrimental as not archiving enough (or at all). As you’ll see, finding this balance requires foresight and planning. Fortunately, you can tweak your archiving scheme to make it work better as time goes by,

Some people feel that keeping all the data in their database, even if they don’t access that data frequently, is the best way to go. If you are lucky enough to have vast quantities of storage, and a database that is performing well, keeping all of the data in your database might be a good idea. Even with lots of storage, archiving off some data that you don’t use regularly might have advantages. We all know someone whose desk is piled with stacks of paper. When they need something, they tell us that they know where everything is. Even if they can find the requested item, they need to work through the piles of stuff to locate it. They also have to decide where to put new items so that they can be easily found. In your database, this equates to slower queries and potentially slower writes. Clearing out some of the less frequently accessed data will have a beneficial effect overall.

At the other end of the spectrum are the people who want to archive in a manner that is too aggressive. This means that any requests for data must access the archive location This might be slower and more burdensome, causing the queries to run slowly. In addition, new data written into the database will have to go through an archive process fairly quickly, which might slow down the database. This is the person who puts each and every item they own into storage. It makes for a clean home, but it’s tough to find many of the items that you own. In our database, this means that most queries are run against archived data, and archive processes are frequently running. This too can slow down performance overall.

The best archiving solution is one that meets both the needs of efficient use of storage and efficiency of queries and inserts. You want to be able to write new data quickly, access frequently used data promptly, and still be able to get the information that might not often be used. There is no simple answer here: each company will have different needs and requirements. For some companies, regulations might govern how long data must be stored. With these sorts of requirements in place, you should look to place data that isn’t accessed often on a storage medium that is lower in cost (and often slower in performance). It is still there, but it is not crowding out the more commonly used data. Other companies might query or manipulate data shortly after it is loaded into the database, and they might be able to archive more often.

When do you archive?

This is another big consideration. Do you archive data daily, weekly, monthly, annually or on some other schedule? The basic answer is that it doesn’t matter what the schedule is. It matters that there is some sort of schedule, and that archiving is happening as expected. Keeping to a schedule allows everyone to know that the data is being archived as expected, and will avoid any “gee, we completely forgot about doing that” issues from arising.

Frequent archiving (daily or weekly) is good when you have high data volumes and normally need to access only the latest data in your queries. Think of stock data. Queries to pull trade volumes and pricing over a short time period are more frequent than queries that would analyze a stock’s performance over time. Therefore, archiving old data can be helpful since it keeps the frequently accessed table’s data easily accessible, but still accommodates the need to get at data for longer time spans. With high data volume, you might need to archive often so that one archive process can complete before another is started.

Less frequent archiving might be used when you have longer term projects or if you find your current database is performing reasonably well. In these cases, archiving monthly, quarterly, or annually might make sense. This is like cleaning out your garage or attic. You might do it, but you probably don’t do it every week. The amount of stuff being stored, along with the space to store it in, might determine how often you do this type of cleanup.

How do you go about archiving MySQL and MongoDB data?

There are lots of possibilities here as well. If well planned, it can be an easy implementation. But like many things, figuring this out is usually done once things have gotten a little out of control.

You can archive data using a standard backup, moving it to another table in the database, exporting to a flat file, or moving it to another database altogether. The end goal is a cleaner production environment that still allows access to the archived data if it is needed. The method for performing the archive determines the method used to bring that data back to a state in which it can be queried. One of the considerations must be how much time you are willing and able to invest in making that data available again.

  1. You can use your standard backup method to create and manage your archive, but this is a solution that is cumbersome and prone to error. You can perform a backup and then delete the unwanted data from your table(s). Now, the deleted data is only stored in your backup and must be restored in order to be queried. You should restore to another database for this purpose so that you keep your production environment clean. With this option, you also have to consider the methods for recovering space used by deleted files. This opens to the possibility of someone restoring to the original database, which can cause a host of problems. With MongoDB, there is an optional –archive option that moves the data to an archive location that you specify. MongoDB version 3.2 added this option.
  2. Another possibility is to move the data to another MySQL table or MongoDB collection in the existing database (i.e., moving from the transactions table to transactions_archived). This is a fast and efficient way to backup the data, and it allows for easy querying since the data still resides in the database. Of course, this assumes that you have enough storage space to accommodate the active and the archive tables/collections.
  3. You can also export the data to be archived to a flat file and then delete it from the original table or collection. This is workable if the data needs to be kept available but is unlikely to be regularly needed. (It will need to be imported in order to query it.) This method also comes with all the caveats about needing to delete and recover the space of the archived records, issues with importing into the original database (and ruining all the good archiving work you’ve done, and the possibility of deleting the flat file.
  4. Alternatively, you can move the data to another database. This too can be an effective method for archiving data, and can also allow that data to be made available to others for query analysis. Once again, all warnings about recovering the space apply, but the good thing here is that the data does not need to be restored to be queried. It is simply queried through the other database.

Another option for archiving MySQL data is a tool like pt-archiver. pt-archiver is a component of the Percona Toolkit that nibbles old data from a source table and moves it to a target table. The target can be in the current or an archive database. It is designed for use with an up-and-running database. It has minimal impact on the overall performance of the database. It is part of the Percona Toolkit, so it is available as an open source download. It has the benefit of slowly working to move the data and is always running. This allows it to archive data regularly and cleanly. One warning is that it does delete the data from the source table, so you should test it before running it in production. pt-archiver works with MySQL data only. It is also important to note that removing large quantities of data might cause InnoDB fragmentation. Running OPTIMIZE TABLE to recover the space resolves this. As of version 5.7.4, this is no longer a locking action.

So now what?

Unless you are in the enviable position where archiving MySQL and MongoDB data isn’t an issue, the first step is to come up with an archiving scheme. This will likely involve many different people since there can be an impact across the entire organization. Determine what can and should be archived, and then determine how best to archive the data. Document the process and test it before putting it into production. In the end, your database and your users will thank you.

Categories: MySQL

Millions of Queries per Second: PostgreSQL and MySQL’s Peaceful Battle at Today’s Demanding Workloads

MySQL Performance Blog - Fri, 2017-01-06 16:33

This blog compares how PostgreSQL and MySQL handle millions of queries per second.

Anastasia: Can open source databases cope with millions of queries per second? Many open source advocates would answer “yes.” However, assertions aren’t enough for well-grounded proof. That’s why in this blog post, we share the benchmark testing results from Alexander Korotkov (CEO of Development, Postgres Professional) and Sveta Smirnova (Principal Technical Services Engineer, Percona). The comparative research of PostgreSQL 9.6 and MySQL 5.7 performance will be especially valuable for environments with multiple databases.

The idea behind this research is to provide an honest comparison for the two popular RDBMSs. Sveta and Alexander wanted to test the most recent versions of both MySQL and PostgreSQL with the same tool, under the same challenging workloads and using the same configuration parameters (where possible). However, because both PostgreSQL and MySQL ecosystems evolved independently, with standard testing tools (pgbench and SysBench) used for each database, it wasn’t an easy journey.

The task fell to database experts with years of hands-on experience. Sveta has worked as a Senior Principal Technical Support Engineer in the Bugs Verification Group of the MySQL Support Group at Oracle for more than eight years, and since 2015  has worked as a Principal Technical Services Engineer at Percona. Alexander Korotkov is a PostgreSQL major contributor, and the developer of a number PostgreSQL features – including the CREATE ACCESS METHOD command, generic WAL interface, lockfree Pin/UnpinBuffer, index-based search for regular expressions and much more. So we have a pretty decent cast for this particular play!

SvetaDimitri Kravtchuk regularly publishes detailed benchmarks for MySQL, so my main task wasn’t confirming that MySQL can do millions of queries per second. As our graphs will show, we’ve passed that mark already. As a Support Engineer, I often work with customers who have heterogeneous database environments in their shops, and want to know about the impact of migrating jobs from one database to another. So instead, I found the chance to work with the Postgres Professional company and identify both the strong and weak points of the two databases an excellent opportunity.

We wanted to test both databases on the same hardware, using the same tools and tests. We expected to test base functionality, and then work on more detailed comparisons. That way we could compare different real-world use case scenarios and popular options.

Spoiler: We are far from the final results. This is the start of a blog series.

OpenSource Databases on Big Machines, Series 1: “That Was Close…”

PostgreSQL Professional together with Freematiq provided two modern, powerful machines for tests.

Hardware configuration:

Processors: physical = 4, cores = 72, virtual = 144, hyperthreading = yes
Memory: 3.0T
Disk speed: about 3K IOPS
OS: CentOS 7.1.1503
File system: XFS

I also used a smaller Percona machine.

Hardware configuration:

Processors: physical = 2, cores = 12, virtual = 24, hyperthreading = yes
Memory: 251.9G
Disk speed: about 33K IOPS
OS: Ubuntu 14.04.5 LTS
File system: EXT4

Note that machines with smaller numbers of CPU cores and faster disks are more common for MySQL installations than machines with larger numbers of cores.

The first thing we needed to agree on is which tool to use. A fair comparison only makes sense if the workloads are as close as possible.

The standard PostgreSQL tool for performance tests is pgbench, while for MySQL it’s SysBench. SysBench supports multiple database drivers and scriptable tests in the Lua programming language, so we decided to use this tool for both databases.

The initial plan was to convert pgbench tests into SysBench Lua syntax, and then run standard tests on both databases. After initial results, we modified our tests to better examine specific MySQL and PostgreSQL features.

I converted pgbench tests into SysBench syntax, and put the tests into an open-database-bench GitHub repository.

And then we both faced difficulties.

As I wrote already, I also ran the tests on a Percona machine. For this converted test, the results were almost identical:

Percona machine:

OLTP test statistics: transactions: 1000000 (28727.81 per sec.) read/write requests: 5000000 (143639.05 per sec.) other operations: 2000000 (57455.62 per sec.)

Freematiq machine:

OLTP test statistics: transactions: 1000000 (29784.74 per sec.) read/write requests: 5000000 (148923.71 per sec.) other operations: 2000000 (59569.49 per sec.)

I started investigating. The only place where the Percona machine was better than Freematiq’s was disk speed. So I started running the pgbench read-only test, which was identical to SysBench’s point select test with full dataset in memory. But this time SysBench used 50% of the available CPU resources:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4585 smirnova 20 0 0,157t 0,041t 9596 S 7226 1,4 12:27.16 mysqld 8745 smirnova 20 0 1266212 629148 1824 S 7126 0,0 9:22.78 sysbench

Alexander, in turn, had issues with SysBench, which could not create a high load on PostgreSQL when prepared statements were used:

93087 korotkov 20 0 9289440 3,718g 2964 S 242,6 0,1 0:32.82 sysbench 93161 korotkov 20 0 32,904g 81612 80208 S 4,0 0,0 0:00.47 postgres 93116 korotkov 20 0 32,904g 80828 79424 S 3,6 0,0 0:00.46 postgres 93118 korotkov 20 0 32,904g 80424 79020 S 3,6 0,0 0:00.47 postgres 93121 korotkov 20 0 32,904g 80720 79312 S 3,6 0,0 0:00.47 postgres 93128 korotkov 20 0 32,904g 77936 76536 S 3,6 0,0 0:00.46 postgres 93130 korotkov 20 0 32,904g 81604 80204 S 3,6 0,0 0:00.47 postgres 93146 korotkov 20 0 32,904g 81112 79704 S 3,6 0,0 0:00.46 postgres

We contacted SysBench author Alexey Kopytov, and he fixed MySQL issue. The solution is:

  • Use SysBench with the options --percentile=0 --max-requests=0  (reasonable CPU usage)
  • Use the concurrency_kit branch (better concurrency and Lua processing)
  • Rewrite Lua scripts to support prepared statements (pull request:
  • Start both SysBench and mysqld with the jemalloc or tmalloc library pre-loaded

A fix for PostgreSQL is on the way. For now, Alexander converted a standard SysBench test into pgbench format and we stuck with it. Not much new for MySQL, but at least we had a baseline for comparison.

The next difficulty I faced was the default operating system parameters. To make the long story short, I changed them to the recommended ones (described below):

vm.swappiness=1 cpupower frequency-set --governor performance kernel.sched_autogroup_enabled=0 kernel.sched_migration_cost_ns= 5000000 vm.dirty_background_bytes=67108864 vm.dirty_bytes=536870912 IO scheduler [deadline]

The same parameters were better for PostgreSQL performance as well. Alexander set his machine similarly.

After solving these issues we learned and implemented the following:

  • We cannot use a single tool (for now)
  • Alexander wrote a test for pgbench, imitating the standard SysBench tests
  • We are still not able to write custom tests because we use different tools

But we could use these tests as a baseline. After work done by Alexander, we stuck with the standard SysBench tests. I converted them to use prepared statements, and Alexander converted them into pgbench format.

I should mention that I was not able to get the same results as Dimitri for the Read Only and Point Select tests. They are close, but slightly slower. We need to investigate if this is the result of different hardware, or my lack of performance testing abilities. The results from the Read-Write tests are similar.

Another difference was between the PostgreSQL and MySQL tests. MySQL users normally have many connections. Setting the value of the variable max_conenctions, and limiting the total number of parallel connections to thousands is not rare nowadays. While not recommended, people use this option even without the thread pool plugin. In real life, most of these connections are sleeping. But there is always a chance they all will get used in cases of increased website activity.

For MySQL I tested up to 1024 connections. I used powers of two and multiplies of the number of cores: 1, 2, 4, 8, 16, 32, 36, 64, 72, 128, 144, 256, 512 and 1024 threads.

For Alexander, it was more important to test in smaller steps. He started from one thread and increased by 10 threads, until 250 parallel threads were reached. So you will see a more detailed graph for PostgreSQL, but no results after 250 threads.

Here are our comparison results.


  • pgsql-9.6 is standard PostgreSQL
  • pgsql-9.6 + pgxact-align is PostgreSQL with this patch (more details can be found in this blog post)
  • MySQL-5.7 Dimitri is Oracle’s MySQL Server
  • MySQL-5.7 Sveta is Percona Server 5.7.15



Sync commit in PostgreSQL is a feature, similar to innodb_flush_log_at_trx_commit=1 in InnoDB, and async commit is similar to innodb_flush_log_at_trx_commit=2.

You see that the results are very similar: both databases are developing very fast and work with modern hardware well.

MySQL results which show 1024 threads for reference.


OLTP RW with innodb_flush_log_at_trx_commit set to 1 and 2

After receiving these results, we did a few feature-specific tests that will be covered in separate blog posts.

More Information

MySQL Options for OLTP RO and Point SELECT tests:

# general table_open_cache = 8000 table_open_cache_instances=16 back_log=1500 query_cache_type=0 max_connections=4000 # files innodb_file_per_table innodb_log_file_size=1024M innodb_log_files_in_group=3 innodb_open_files=4000 # Monitoring innodb_monitor_enable = '%' performance_schema=OFF #cpu-bound, matters for performance #Percona Server specific userstat=0 thread-statistics=0 # buffers innodb_buffer_pool_size=128000M innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex innodb_log_buffer_size=64M # InnoDB-specific innodb_checksums=1 #Default is CRC32 in 5.7, very fast innodb_use_native_aio=1 innodb_doublewrite= 1 # innodb_stats_persistent = 1 innodb_support_xa=0 #(We are read-only, but this option is deprecated) innodb_spin_wait_delay=6 #(Processor and OS-dependent) innodb_thread_concurrency=0 join_buffer_size=32K innodb_flush_log_at_trx_commit=2 sort_buffer_size=32K innodb_flush_method=O_DIRECT_NO_FSYNC innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_page_cleaners=4 # perf special innodb_adaptive_flushing = 1 innodb_flush_neighbors = 0 innodb_read_io_threads = 4 innodb_write_io_threads = 4 innodb_io_capacity=2000 innodb_io_capacity_max=4000 innodb_purge_threads=4 innodb_max_purge_lag_delay=30000000 innodb_max_purge_lag=0 innodb_adaptive_hash_index=0 (depends on workload, always check)

MySQL Options for OLTP RW:

#Open files table_open_cache = 8000 table_open_cache_instances = 16 query_cache_type = 0 join_buffer_size=32k sort_buffer_size=32k max_connections=16000 back_log=5000 innodb_open_files=4000 #Monitoring performance-schema=0 #Percona Server specific userstat=0 thread-statistics=0 #InnoDB General innodb_buffer_pool_load_at_startup=1 innodb_buffer_pool_dump_at_shutdown=1 innodb_numa_interleave=1 innodb_file_per_table=1 innodb_file_format=barracuda innodb_flush_method=O_DIRECT_NO_FSYNC innodb_doublewrite=1 innodb_support_xa=1 innodb_checksums=1 #Concurrency innodb_thread_concurrency=144 innodb_page_cleaners=8 innodb_purge_threads=4 innodb_spin_wait_delay=12 Good value for RO is 6, for RW and RC is 192 innodb_log_file_size=8G innodb_log_files_in_group=16 innodb_buffer_pool_size=128G innodb_buffer_pool_instances=128 #to avoid wait on InnoDB Buffer Pool mutex innodb_io_capacity=18000 innodb_io_capacity_max=36000 innodb_flush_log_at_timeout=0 innodb_flush_log_at_trx_commit=2 innodb_flush_sync=1 innodb_adaptive_flushing=1 innodb_flush_neighbors = 0 innodb_max_dirty_pages_pct=90 innodb_max_dirty_pages_pct_lwm=10 innodb_lru_scan_depth=4000 innodb_adaptive_hash_index=0 innodb_change_buffering=none #can be inserts, workload-specific optimizer_switch="index_condition_pushdown=off" #workload-specific

MySQL SysBench parameters:

LD_PRELOAD=/data/sveta/5.7.14/lib/mysql/  /data/sveta/sbkk/bin/sysbench  [ --test=/data/sveta/sysbench/sysbench/tests/db/oltp_prepared.lua | --test=/data/sveta/sysbench/sysbench/tests/db/oltp_simple_prepared.lua ]  --db-driver=mysql --oltp-tables-count=8 --oltp-table-size=10000000 --mysql-table-engine=innodb --mysql-user=msandbox --mysql-password=msandbox  --mysql-socket=/tmp/mysql_sandbox5715.sock --num-threads=$i --max-requests=0 --max-time=300 --percentile=0 [--oltp-read-only=on --oltp-skip-trx=on]
PostgreSQL pgbench parameters:

$ git clone $ cd pg_oltp_bench $ make USE_PGXS=1 $ sudo make USE_PGXS=1 install $ psql DB -f oltp_init.sql $ psql DB -c "CREATE EXTENSION pg_oltp_bench;" $ pgbench -c 100 -j 100 -M prepared -f oltp_ro.sql -T 300 -P 1 DB $ pgbench -c 100 -j 100 -M prepared -f oltp_rw.sql -T 300 -P 1 DB

Features in MySQL 5.7 that significantly improved performance:

  • InnoDB: transaction list optimization
  • InnoDB: Reduce lock_sys_t::mutex contention
  • InnoDB: fix index->lock contention
  • InnoDB: faster and parallel flushing
    • Multiple page cleaner threads: WL #6642
    • Reduced number of pages which needs to be flushed: WL #7047
    • Improved adaptive flushing: WL #7868
  • MDL (Meta-Data Lock) scalability
    • Remove THR_LOCK::mutex for InnoDB: Wl #6671
    • Partitioned LOCK_grant
    • Number of partitions is constant
    • Thread ID used to assign partition
    • Lock-free MDL lock acquisition for DML

Anastasia: The initial findings of this research were announced at Percona Live Amsterdam 2016. More findings were added to the second version of the same talk given at Moscow HighLoad++ 2016. Hopefully the third iteration of this talk will be available at Percona Live Open Source Database Conference 2017 in Santa Clara. Stay tuned: the Percona Live Committee is working on the program!



















Categories: MySQL

MongoDB Ransomware: Not Likely, But How Do You Know?

MySQL Performance Blog - Thu, 2017-01-05 20:07

In this blog post, we’ll look at some of the concerns recently seen around MongoDB ransomware and security issues.

Security blogs and magazines have recently been aflutter with the news that a hacker is stealing data from MongoDB instantiations and demanding bitcoins to get the data back. This sounds pretty bad at first glance, but let’s examine the facts.

The hacker needs a few things to pull this off:

  1. MongoDB is running on default ports
  2. MongoDB is not using authentication
  3. MongoDB is accessible on the Internet with no security groups or firewalls

If this sounds familiar, you might remember a similar flurry occurred last year when people counted the number of open MongoDB installs on the web. That required these same conditions to all be true. This also means the solution is the same: you simply need to make sure you follow the normal security practices of locking down ports and using authentication. Not so scary after all, right?

What does this hack look like?

Finding out if this happened is simple: your data is removed and gone! In its place, you will find a “WARNING” database, which holds a “WARNING” collection. This collection has a document that looks like:

{ "_id" : ObjectId("5859a0370b8e49f123fcc7da"), "mail" : "", "note" : "SEND 0.2 BTC TO THIS ADDRESS 13zaxGVjj9MNc2jyvDRhLyYpkCh323MsMq AND CONTACT THIS EMAIL WITH YOUR IP OF YOUR SERVER TO RECOVER YOUR DATABASE !" }

To fix this, hopefully, you have backups. If you don’t, you might want to look at on how to get consistent backups. If not, you will need to send the hackers the 0.2 bitcoins (~200 USD) to get your data back.

So, backup!

But this brings us to the real question: can you be hijacked? It’s pretty easy to check:

  1. Do you have authentication on? Try running this command:

rs1:PRIMARY> if (db.adminCommand('getCmdLineOpts') === undefined || db.adminCommand('getCmdLineOpts') === undefined || db.adminCommand('getCmdLineOpts') == "disabled"){ print("Auth not enabled!")}else{print("Your safe!")} Auth not enabled!

  1. Are you running on a non-default port? Simply run this command (if you’re using 27017 or 29017, you’re using a default port):

rs1:PRIMARY> db.adminCommand('getCmdLineOpts') 27001

The last part is a bit harder if the other two are both false. You will need to spin up a server outside of your environment and test the connection. I suggest an Amazon EC2 Micro instance (it’s very inexpensive – free if you use a new account). It’s simple to install a MongoDB client on. Check your setup:

  1. Login to Amazon and launch an EC2 node.
  2. Open a shell to this node (this can be done via their website).
  3. Get MongoDB’s binaries:

wget -q --show-progress gzip -d mongodb-linux-x86_64-amazon-3.4.1.tgz tar xf mongodb-linux-x86_64-amazon-3.4.1.tar -C 3.4 --strip-components=1

  1. Try and connect to your MongoDB Server

./3.4/bin/mongo --host <your_host_name> --port <your_mongod_port>

If this connects, and you can run “db.serverStatus()”, you are at risk and should enable authentication ASAP!

We will have a blog out shortly on the particulars of creating a user. To enable authentication, you simply need to add “–auth” to your startup, or the following to your YAML config file:

security: authorization:1

This should get you started on correctly protecting yourself against MongoDB ransomware (and other security threats). If you want to have someone review your security, or even help you use LDAP to tie into your main authentication systems, please contact us.

Categories: MySQL

MongoDB 3.4: Sharding Improvements

MySQL Performance Blog - Wed, 2017-01-04 21:37

In this blog post, we will discuss some of the Sharding improvements in the recent MongoDB 3.4 GA release.


Let’s go over what MongoDB Sharding “is” at a simplified, high level.

The concept of “sharding” exists to allow MongoDB to scale to very large data sets that may exceed the available resources of a single node or replica set. When a MongoDB collection is sharding-enabled, it’s data is broken into ranges called “chunks.” These are intended to be evenly distributed across many nodes or replica sets (called “shards”). MongoDB computes the ranges of a given chunk based on a mandatory document-key called a “shard key.” The shard key is used in all read and write queries to route a database request to the right shard.

The MongoDB ecosystem introduced additional architectural components so that this could happen:

  1. A shard. A single MongoDB node or replica set used for storing the cluster data. There are usually many shards in a cluster and more shards can be added/removed to scale.
  2. “mongos” router. A sharding-aware router for handling client traffic. There can be one or more mongos instances in a cluster.
  3. The “config servers”. Used for storing the cluster metadata. Config servers are essentially regular MongoDB servers dedicated to storing the cluster metadata within the “config” database. Database traffic does not access these servers, only the mongos.

Under sharding, all client database traffic is directed to one or more of the mongos router process(es), which use the cluster metadata, such as the chunk ranges, the members of the cluster, etc., to service requests while keeping the details of sharding transparent to the client driver. Without the cluster metadata, the routing process(es) do not know where the data is, making the config servers a critical component of sharding. Due to this, at least three config servers are required for full fault tolerance.

Sharding: Chunk Balancer

To ensure chunks are always balanced among the cluster shards, a process named the “chunk balancer” (or simply “balancer”) runs periodically, moving chunks from shard to shard to ensure data is evenly distributed. When a chunk is balanced, the balancer doesn’t actually move any data, it merely coordinates the transfer between the source and destination shard and updates the cluster metadata when chunks have moved.

Before MongoDB 3.4, the chunk balancer would run on whichever mongos process could acquire a cluster-wide balancer lock first. From my perspective this was a poor architectural decision for these reasons:

  1. Predictability. Due to the first-to-lock nature, the mongos process running the balancer is essentially chosen at random. This can complicate troubleshooting as you try to chase down which mongos process is the active balancer to see what it is doing, it’s logs, etc. As a further example: it is common in some deployments for the mongos process to run locally on application servers and in large organizations it is common for a DBA to not have access to application hosts – something I’ve ran into many times myself.
  2. Efficiency. mongos was supposed to be a stateless router, not a critical administrative process! As all client traffic passes in-line through the mongos process, it is important for it to be as simple, reliable and efficient as possible.
  3. Reliability. in order to operate, the mongos process must read and write cluster metadata hosted within the config servers. As mongos is almost always running on a physically separate host from the config servers, any disruption (network, hardware, etc) in between the balancer and config server nodes will break balancing!

Luckily, MongoDB 3.4 has come to check this problem (and many others) off of my holiday wish list!

MongoDB 3.4: Chunk Balancer Moved to Config Servers

In MongoDB 3.4, the chunk balancer was moved to the Primary config server, bringing these solutions to my concerns about the chunk balancer:

  1. Predictability. The balancer is always running in a single, predictable place: the primary config server.
  2. Efficiency. Removing the balancer from “mongos” allows it to worry about routing only. Also, as config servers are generally dedicated nodes that are never directly hit by client database traffic, in my opinion this is a more efficient place for the balancer to run.
  3. Reliability. Perhaps the biggest win I see with this change is the balancer can no longer lose connectivity with the cluster metadata that is stored on separate hosts. The balancer now runs inside the same node as the metadata!
  4. Centralized. As a freebie, now all the background/administrative components of Sharding are in one place!

Note: although we expect the overhead of the balancer to be negligible, keep in mind that a minor overhead is added to the config server Primary-node due to this change.

See more about this change here:

MongoDB 3.4: Required Config Server Replica Set

In MongoDB releases before 3.2, the set of cluster config servers received updates using a mode called Sync Cluster Connection Config (SCCC) to ensure all nodes received the same change. This essentially meant that any updates to cluster metadata would be sent N x times from the mongos to the config servers in a fan-out pattern. This is another legacy design choice that always confused me, considering MongoDB already has facilities for reliably replicating data: MongoDB Replication. Plus without transactions in MongoDB, there are some areas where SCCC can fail.

Luckily MongoDB 3.2 introduced Replica-Set based config servers as an optional feature. This moved us away from the SCCC fan-out mode to traditional replication and write concerns for consistency. This brought many benefits: rebuilding a config server node became simpler, backups became more straightforward and flexible and the move towards a consistent method of achieving consistent updates simplified the architecture.

MongoDB 3.4 requires Replica-Set based config servers, and removed the SCCC mode entirely. This might require some changes for some, but I think the benefits outweigh the cost. For more details on how to upgrade from SCCC to Replica-Set based config servers, see this article.

Note: the balancer in MongoDB 3.4 always runs on the config server that is the ‘PRIMARY’ of the replica set.

MongoDB 3.4: Parallel Balancing

As intended, MongoDB Sharded Clusters can get very big, with 10s, 100s or even 1000s of shards. Historically MongoDB’s balancer worked in serial, meaning it could only coordinate 1 x chunk balancing round at any given time within the cluster. On very large clusters, this limitation poses a huge throughput limitation on balancing: all chunk moves have to wait in a serial queue.

In MongoDB 3.4, the chunk balancer can now perform several chunk moves in parallel given they’re between a unique source and destination shard. Given shards: A, B, C and D, this means that a migration from A -> B can now happen at the same time as a migration from C -> D as they’re mutually exclusive source and destination shards. Of course, you need four or more shards to really see the benefit of this change.

Of course, on large clusters this change could introduce a significant change in network bandwidth usage. This is due to the ability for several balancing operations to occur at once. Be sure to test your network capacity with this change.

See more about this change here:


Of course, there were many other improvements to sharding and other areas in 3.4. We hope to cover more in the future. These are just some of my personal highlights.

For more information about what has changed in the new GA release, see: MongoDB 3.4 Release Notes. Also, please check out our beta release of Percona Server for MongoDB 3.4. This includes all the improvements in MongoDB 3.4 plus additional storage engines and features.


Categories: MySQL

Enabling and Disabling Jemalloc on Percona Server

MySQL Performance Blog - Tue, 2017-01-03 20:29

This post discusses enabling and disabling jemalloc on Percona Server for MySQL.

The benefits of jemalloc versus glibc for use with MySQL have been widely discussed. With jemalloc (along with Transparent Huge Pages disabled) you have less memory fragmentation, and thus more efficient resource management of the available server memory.

For standard installations of Percona Server 5.6+ (releases starting with 5.6.19-67.0), the only thing needed to use jemalloc as the memory library for mysqld is for it to be installed on the server.

Enabling Jemalloc on Percona Server

First thing first: install Jemalloc.

The library is available on the Percona repository, which is available for both apt and yum package management:

Once you have the repo, just run the install command (according to your OS) to install it:

yum install jemalloc / apt-get install libjemalloc1

Now that you have the jemalloc package installed, all it takes to start using it is…..

  • Restart the server.

That’s it! No modifications needed on the my.cnf file or anywhere else. Plain and simple!

Disabling Jemalloc on Percona Server

If for any reason you need to disable jemalloc and go back to the default library, you have two options: remove the jemalloc package (not too practical), or add the following line to the [mysqld_safe] section of the my.cnf file:

malloc-lib =

In other words, an empty path. That will do the trick. Note that commenting or removing the “malloc-lib” parameter on the cnf file won’t work.

How to Know if Jemalloc is Being Used?

There are couple of ways you can verify this, but the less invasive way is by using the pt-mysql-summary (version 2.2.20 and higher) tool from the Percona Toolkit:

root@reports:~# pt-mysql-summary | grep -A5 -i "memory management" # Memory management library ################################## jemalloc enabled in MySQL config for process with ID 5122 Using jemalloc from /usr/lib/x86_64-linux-gnu/ # The End ####################################################

Categories: MySQL

Product Market Fit

Xaprb, home of innotop - Sat, 2016-12-31 15:26

The way I think about product/market fit has changed a lot over the years. I view it differently than I used to.

I used to think of product/market fit along the lines of: users are enthusiastic, word of mouth is positive, people are delighted, the app is a joy to use, it makes hard things easy, it nails the use case—the app solves a problem or meets a need in such a way that it makes users happy. I’ve built the right thing to do the right thing in the right way. Obviously product/market fit, right?

Nope. That’s service/user fit, not product/market fit. It’s not a bad thing, but it’s totally different. It took me a while to see that.

Let’s dig into what those three words (product, market, fit) mean. The following discussion is especially informed by my experience in business-to-business, but the general concepts are just as true in B2C or other types of businesses.


What is a product? A lot of people, especially engineers, especially past-me, think the product is the thing they build. I used to think of the product as the app, the service, the UI, the UX, the functionality. This couldn’t be further from the truth. The product isn’t what you build. The product is what you sell. It’s easy to confuse one for the other, but important to understand the difference.

Let’s take whiskey stones as an example. Look at listings on Amazon and you’ll see sellers promoting the benefits of whiskey stones: chills your bourbon without diluting it with meltwater, no mess, reusable, convenient, etc. Sounds legit, right?

But what’s the product, really? Is it the stones? No. To quote the famous line, people don’t buy quarter-inch drill bits, they buy quarter-inch holes. The product is the result or outcome. Ostensibly chilled-but-not-ruined whiskey, but we’ll revisit that later.

If I use a dating app as an example, the product isn’t the app, it’s companionship. If I use VividCortex, the product isn’t database monitoring, it’s improved revenue through a more responsive and efficient app or service. If the example is group chat, the product isn’t chat rooms, it’s a more connected and efficient workforce with less need for meetings. (Arguably.)


Similarly, I used to think of market very differently. I used to think the market was the user, but now I recognize that users don’t have to be the buyers. In Geoffrey Moore’s classic Crossing the Chasm, he defines a market as “a set of actual or potential customers, for a given set of products or services, who have a common set of needs or wants, and who reference each other when making a buying decision.”

There are no fluff words in that definition. Importantly, the market is who you sell to. The market is the buyer, not the user.

Who is Facebook’s market? It isn’t the billions of users on their service. It’s the advertisers! If you’re selling a dating app, your users and buyers are the same. If you’re creating a free dating app that’s ad-supported, your users and market are not the same. When they’re the same, it’s just a coincidence. Many or perhaps even most products are sold to someone other than the users.

At VividCortex, for example, our users are developers, DBAs, technical operations teams, SREs, and the like. And, sure, these are our champions for promoting a sale internally, too. But we don’t sell to them, we sell to VPs and the C-suite and so on. In fact, we sell to organizations, not individuals.

A clear understanding of the market helps inform a clear understanding of the product. When you think about who buys whiskey stones, the difference becomes obvious. If you talk to a whiskey drinker, or just read the reviews on Amazon, you hear something different than the product listing would like you to believe. “One star. My brother-in-law gave this to me. I know he meant well, but don’t give these as a gift. They’re just a hazard. They’ll smash your lips, break your teeth, and you’ll end up paying thousands of dollars for dental repairs. And they don’t chill well.”

Who buys whiskey stones? Not whiskey drinkers. The actual market is their brothers-in-law. And that helps clarify what the product really is. The real product here, what the whiskey stone makers are selling, is a warm fuzzy feeling for people who think their brother-in-law is hard to buy for. The product listing, with its supposed benefits, is not designed to appeal to the drinkers, but to their relatives. And it literally plays to their ignorance about whiskey drinking.


You have product/market fit when you have clearly identified the product, as just explained, and it fits your market’s needs. Whiskey stone sellers have done a great job of selling warm fuzzies to in-laws; they have very good product/market fit. (What happens after the sale is a different matter.)

A really important part of fit includes how your market buys. If you haven’t aligned your delivery mechanism with your market’s purchasing mechanisms, you don’t have product/market fit. For example, if you sell to the government, you probably have to go through approved channels and meet various types of regulatory approval. Otherwise nobody in the government can actually buy what you’re selling.

To put this in terms of VividCortex again: we initially built a database monitoring service that made DBAs happy. Great! Nice service/user fit. Then we had to figure out how to sell it. Creating actual product/market fit required creating a sales team, enunciating our security stance, aligning our processes with the finance departments of our customers, and so on. This was way more than just the app or service; that’s a small fraction of the whole product—the tip of the proverbial iceberg. This is why I said sales is a key part of a company’s core IP. In the end, we’re selling the whole package—our entire company—to each buyer.

I remember the first time we “closed” a deal with a big company. We had all the approvals, legal signoff, even a signature on the order form! The money was as good as in-the-bank. The next step was to talk to someone in the procurement department. No big deal, it sounded like we were just talking to finance in the usual way, figuring out who could issue the purchase order and giving them the bank wiring details. NO. This time it was different. They renegotiated the entire deal from scratch, put the hammer down on pricing, and killed the deal, refusing to let the company buy what dozens of people had worked hard to bring into their team for months. What the actual!?!?!

This was our first customer with a procurement department. As I was about to learn, big companies use these because there’s no chance all of their thousands of employees are going to negotiate good deals when they’re across the table from a salesperson with decades of experience. So companies put in gatekeepers and give them commissions based on how much they’re able to browbeat vendors. These people can be absolute bulldogs (but usually in a very nice, apparently helpful but my-hands-are-tied way). The definition of their job is to get the vendor on their knees begging.

“It’s the end of the quarter and you’ve been using our service on hundreds of servers for 6 months already! Pleeeeease!!!”
I’m going on vacation, be back in a couple of weeks.
“Who’ll handle our case in your absence?”
OUT OF OFFICE: RE: VividCortex Contract

Your worst nightmare. But you know what? This is part of product/market fit. If you haven’t built your company to sell through your customers’ purchasing processes, you don’t have product/market fit. Your market is your buyers, and procurement is ultimately the buyer in this example.

Revisiting Moore’s definition, it’s also helpful to remember that your market isn’t individual buyers. It’s actually a group of buyers, plural, with commonalities. This points out that one-off deals don’t constitute “fit.” Fit is achieved when your product, taken as a whole (i.e. not only what you sell, but how you deliver it to your market) fits your market as a whole. You’re not trying to create one-by-one fit, you’re trying to create something scalable and repeatable.


Product/market fit is very different from what I thought when I founded VividCortex. I was really thinking about service/user fit without knowing it. Product/market fit isn’t when your users are getting value from your app. It’s when you sell the value you create, and your market—a set of customers with common needs and who have herd buying behavior—purchases and acquires it effectively.

All three words (product, market, fit) require clear understanding to succeed.

Pic Credit

Categories: MySQL

Query Language Type Overview

MySQL Performance Blog - Thu, 2016-12-29 22:46

This blog provides a query language type overview.

The idea for this blog originated from some customers asking me questions. When working in a particular field, you often a dedicated vocabulary that makes sense to your peers. It often includes phrases and abbreviations because it’s efficient. It’s no different in the database world. Much of this language might make sense to DBA’s, but it might sound like “voodoo” to people not used to it. The overview below covers the basic types of query languages inside SQL. I hope it clarifies what they mean, how they’re used and how you should interpret them.

DDL (Data Definition Language)

A database schema is a visualization of information. It contains the data structure separated by tables structures, views and anything that contains structure for your data. It defines how you want to store and visualize the information.

It’s like a skeleton, defining how data is organized. Any action that creates/updates/changes this skeleton is DDL.

Do you remember spreadsheets? A table definition describes something like:

Account number Account name Account owner Creation date Amount Sorted ascending Unique, indexed Date, indexed Number, linked with transactions

Whenever you want to create a table like this, you must use a DDL query. For example:

CREATE TABLE Accounts ( Account_number Bigint(16) , Account_name varchar(255), Account_name varchar(255), Creation_date date, Amount Bigint(16), PRIMARY KEY (Account_number), UNIQUE(Account_name), FOREIGN KEY (Amount) REFERENCES transactions(Balancevalue) );

CREATE, ALTER, DROP, etc.: all of these types of structure modification queries are DDL queries!

Defining the structure of the tables is important as this defines how you would potentially access the information stored in the database while also defining how you might visualize it.

Why should you care that much?

DDL queries define the structure on which you develop your application. Your structure will also define how the database server searches for information in a table, and how it is linked to other tables (using foreign keys, for example).

You must design your MySQL schema before adding information to it (unlike NoSQL solutions such as MongoDB). MySQL might be more rigid in this manner, but it often makes sense to design the pattern for how you want to store your information and query it properly.

Due to the rigidity of an RDBMS system, changing the data structure (or table schema) requires the system to rebuild the actual table in most cases. This is potentially problematic for performance or table availability (locking). Often this is a “hot” procedure (since MySQL 5.6), requiring no downtime for active operations. Additionally, tools like pt-osc or other open source solutions can be used for migrating the data structure to a new format without requiring downtime.

An example:

ALTER TABLE accounts ADD COLUMN wienietwegisisgezien varchar(20)

DML (Data Manipulation Language)

Data manipulation is what it sounds like: working with information inside a structure. Inserting information and deleting information (adding rows, deleting rows) are examples of data manipulation.

An example:

INSERT into resto_visitor values(5,'Julian',’highway 5’,12); UPDATE resto_visitor set name='Evelyn',age=17 where id=103;

Sure, but why should I use it?

Having a database environment makes no sense unless you insert and fetch information out of it. Remember that databases are plentiful in the world: whenever you click on a link on your favorite blog website, it probably means you are fetching information out of a database (and that data was at one time inserted or modified).

Interacting with a database requires that you write DML queries.

DCL (Data Control Language)

Data control language is anything that is used for administrating access to the database content. For example, GRANT queries:

GRANT ALL PRIVILEGES ON database.table to ‘jeffbridges’@’ourserver’;

Well that’s all fine, but why another subset “language” in SQL?

As a user of database environments, at some point you’ll get access permission from someone performing a DCL query. Data control language is used to define authorization rules for accessing the data structures (tables, views, variables, etc.) inside MySQL.

TCL (Transaction Control Language) Queries

Transaction control language queries are used to control transactional processing in a database. What do we mean by transactional processes? Transactional processes are typically bundled DML queries. For example:


This gives you the ability to perform or rollback a complete action. Only storage engines offering transaction support (like InnoDB) can work with TCL.

Yet another term, but why?

Ever wanted to combine information and perform it as one transaction? In some circumstances, for example, it makes sense to make sure you perform an insert first and then perform an update. If you don’t use transactions, the insert might fail and the associated update might be an invalid entry. Transactions make sure that either the complete transaction (a group of DML queries) takes place, or it’s completely rolled back (this is also referred to as atomicity).


Hopefully this blog post helps you understand some of the “insider” database speech. Post comments below.

Categories: MySQL

Percona Live Featured Tutorial with Øystein Grøvlen — How to Analyze and Tune MySQL Queries for Better Performance

MySQL Performance Blog - Thu, 2016-12-29 16:52

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Øystein Grøvlen, Senior Principal Software Engineer at Oracle. His tutorial is on How to Analyze and Tune MySQL Queries for Better Performance. SQL query performance plays a big role in application performance. If some queries execute slowly, these queries or the database schema may need tuning. I had a chance to speak with Øystein and learn a bit more about the MySQL query tuning:

Percona: How did you get into database technology? What do you love about it?

Øystein: I got into database technology during my Ph.D. studies. I got in touch with a research group in Trondheim, Norway, that did research on highly available distributed database systems. I ended up writing a thesis on query processing in such database systems.

What I love most about my job on the MySQL Optimizer Team is that it involves a lot of problem-solving. Why is a query so slow? What can we do to improve it? I have always been very interested in sports results and statistics. Working with query execution times gives me much of the same feeling. Searching for information is another interest of mine, and that is really what query execution is about.

Percona: What impacts database performance the most?

Øystein: From my point of view – mainly concerned with the performance of read-only queries – the most important performance metric is how much data needs to be accessed in order to answer a query. For update-intensive workloads, it is often about concurrency issues. For SELECT statements, the main thing is to not access more data than necessary.

Users should make sure to design their database schema so that the database system can efficiently access the needed data. This includes creating the right indexes. As MySQL developers, we need to develop the right algorithms to support efficient retrieval. We also need to provide a query optimizer that can pick the best query execution plan.

Of course, there are other performance aspects that are important. Especially if your data cannot fit in your database buffer pool. In that case, the order in which you access the data becomes more important. The best query plan when your data is disk-bound is not necessarily the same as when all data is in memory.

Percona: Your tutorial is called “How to Analyze and Tune MySQL Queries for Better Performance.” What are the most recent MySQL updates that help with tuning queries?

Øystein: I think the biggest improvements came in MySQL 5.6, with increased observability through performance schema and new variants of
EXPLAIN (Structured EXPLAIN (JSON format) and visual EXPLAIN in MySQL Workbench). We also added Optimizer Trace, which gives insight into how the optimizer arrived at a certain query plan. All this made it easier to identify queries that need tuning, understand how a query is executed and what might be done to improve it.

In MySQL 5.7, we added a new syntax for optimizer hints, and provided a lot of new hints that can be used to influence the optimizer to change a non-optimal query plan. We also provided a query rewrite plugin that makes it possible to tune queries even when it is not possible to change the application.

MySQL 5.7 also came with improvements to EXPLAIN. It is now possible to get the query plan for a running query, and Structured EXPLAIN shows both estimated total query cost and the cost per table. A more experimental feature allows you to provide your own cost constants to the optimizer.  This way, you can configure the optimizer to better suit your particular system.

For MySQL 8.0 we are continuing to improve tunability by adding more optimizer hints.  At the same time, we are working hard on features that will reduce the need for tuning. Histograms and awareness of whether data is in memory or on disk make the optimizer able to pick better query plans.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Øystein: While the query optimizer in most cases will come up with a good query plan, there are some cases where it won’t generate the most optimal query plan. This tutorial will show how you can identify which queries need tuning, how you can further investigate the issues and what types of tuning options you have for different types of queries. By attending this tutorial, you will learn how to improve the performance of applications through query tuning.

Percona: What are you most looking forward to at Percona Live?

Øystein: I am looking forward to interacting with MySQL users, discussing the query performance issues they might have, and learning how I can help with their issues.

You can find out more about Øystein Grøvlen and his work with databases at his blog, or follow him on Twitter: @ogrovlen. Want to find out more about Øystein and MySQL query optimization? Register for Percona Live Data Performance Conference 2017, and see his tutorial How to Analyze and Tune MySQL Queries for Better Performance. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Categories: MySQL

Quickly Troubleshoot Metadata Locks in MySQL 5.7

MySQL Performance Blog - Wed, 2016-12-28 19:52

In a previous article, Ovais demonstrated how a DDL can render a table blocked from new queries. In another article, Valerii introduced performance_schema.metadata_locks, which is available in MySQL 5.7 and exposes metadata lock details. Given this information, here’s a quick way to troubleshoot metadata locks by creating a stored procedure that can:

  • Find out which thread(s) have the metadata lock
  • Determine which thread has been waiting for it the longest
  • Find other threads waiting for the metadata lock
Setting up instrumentation

First, you need to enable instrumentation for metadata locks:

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';

Second, you need to add this stored procedure:

USE test; DROP PROCEDURE IF EXISTS procShowMetadataLockSummary; delimiter // CREATE PROCEDURE procShowMetadataLockSummary() BEGIN DECLARE table_schema VARCHAR(64); DECLARE table_name VARCHAR(64); DECLARE id bigint; DECLARE time bigint; DECLARE info longtext; DECLARE curMdlCount INT DEFAULT 0; DECLARE curMdlCtr INT DEFAULT 0; DECLARE curMdl CURSOR FOR SELECT * FROM tmp_blocked_metadata; DROP TEMPORARY TABLE IF EXISTS tmp_blocked_metadata; CREATE TEMPORARY TABLE IF NOT EXISTS tmp_blocked_metadata ( table_schema varchar(64), table_name varchar(64), id bigint, time bigint, info longtext, PRIMARY KEY(table_schema, table_name) ); REPLACE tmp_blocked_metadata(table_schema,table_name,id,time,info) SELECT mdl.OBJECT_SCHEMA, mdl.OBJECT_NAME, t.PROCESSLIST_ID, t.PROCESSLIST_TIME, t.PROCESSLIST_INFO FROM performance_schema.metadata_locks mdl JOIN performance_schema.threads t ON mdl.OWNER_THREAD_ID = t.THREAD_ID WHERE mdl.LOCK_STATUS='PENDING' and mdl.LOCK_TYPE='EXCLUSIVE' ORDER BY mdl.OBJECT_SCHEMA,mdl.OBJECT_NAME,t.PROCESSLIST_TIME ASC; OPEN curMdl; SET curMdlCount = (SELECT FOUND_ROWS()); WHILE (curMdlCtr < curMdlCount) DO FETCH curMdl INTO table_schema, table_name, id, time, info; SELECT CONCAT_WS(' ','PID',t.PROCESSLIST_ID,'has metadata lock on', CONCAT(mdl.OBJECT_SCHEMA,'.',mdl.OBJECT_NAME), 'with current state', CONCAT_WS('','[',t.PROCESSLIST_STATE,']'), 'for', t.PROCESSLIST_TIME, 'seconds and is currently running', CONCAT_WS('',"[",t.PROCESSLIST_INFO,"]")) AS 'Process(es) that have the metadata lock' FROM performance_schema.metadata_locks mdl JOIN performance_schema.threads t ON t.THREAD_ID = mdl.OWNER_THREAD_ID WHERE mdl.LOCK_STATUS='GRANTED' AND mdl.OBJECT_SCHEMA = table_schema and mdl.OBJECT_NAME = table_name AND mdl.OWNER_THREAD_ID NOT IN(SELECT mdl2.OWNER_THREAD_ID FROM performance_schema.metadata_locks mdl2 WHERE mdl2.LOCK_STATUS='PENDING' AND mdl.OBJECT_SCHEMA = mdl2.OBJECT_SCHEMA and mdl.OBJECT_NAME = mdl2.OBJECT_NAME); SELECT CONCAT_WS(' ','PID', id, 'has been waiting for metadata lock on',CONCAT(table_schema,'.', table_name),'for', time, 'seconds to execute', CONCAT_WS('','[',info,']')) AS 'Oldest process waiting for metadata lock'; SET curMdlCtr = curMdlCtr + 1; SELECT CONCAT_WS(' ','PID', t.PROCESSLIST_ID, 'has been waiting for metadata lock on',CONCAT(table_schema,'.', table_name),'for', t.PROCESSLIST_TIME, 'seconds to execute', CONCAT_WS('','[',t.PROCESSLIST_INFO,']')) AS 'Other queries waiting for metadata lock' FROM performance_schema.metadata_locks mdl JOIN performance_schema.threads t ON t.THREAD_ID = mdl.OWNER_THREAD_ID WHERE mdl.LOCK_STATUS='PENDING' AND mdl.OBJECT_SCHEMA = table_schema and mdl.OBJECT_NAME = table_name AND mdl.OWNER_THREAD_ID AND t.PROCESSLIST_ID <> id ; END WHILE; CLOSE curMdl; END// delimiter ;


Now, let’s call the procedure to see if there are threads waiting for metadata locks:

mysql> CALL test.procShowMetadataLockSummary(); +----------------------------------------------------------------------------------------------------------------+ | Process(es) that have the metadata lock | +----------------------------------------------------------------------------------------------------------------+ | PID 10 has metadata lock on sbtest.sbtest with current state [] since 274 seconds and is currently running [] | | PID 403 has metadata lock on sbtest.sbtest with current state [] since 291 seconds and is currently running [] | +----------------------------------------------------------------------------------------------------------------+ 2 rows in set (0.00 sec) +------------------------------------------------------------------------------------------------------------------------+ | Oldest process waiting for metadata lock | +------------------------------------------------------------------------------------------------------------------------+ | PID 1264 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [truncate table sbtest.sbtest] | +------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) +---------------------------------------------------------------------------------------------------------------------------+ | Other queries waiting for metadata lock | +---------------------------------------------------------------------------------------------------------------------------+ | PID 1269 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] | | PID 1270 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] | | PID 1271 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] | | PID 1272 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] | | PID 1273 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?] | +---------------------------------------------------------------------------------------------------------------------------+ 5 rows in set (0.00 sec)

So, as you can see above, you have several choices. You could (a) do nothing and wait for threads 10 and 403 to complete and then thread 1264 can get the lock.

If you can’t wait, you can (b) kill the threads that have the metadata lock so that the TRUNCATE TABLE in thread 1264 can get the lock. Although, before you decide to kill threads 10 and 403, you should check SHOW ENGINE INNODB STATUS to see if the undo log entries for those threads are high. If they are, rolling back these transactions might take a long time.

Lastly, you can instead (c) kill the DDL thread 1264 to free up other queries. You should then reschedule the DDL to run during offpeak hours.

Happy metadata lock hunting!

Categories: MySQL

Using Percona XtraBackup on a MySQL Instance with a Large Number of Tables

MySQL Performance Blog - Wed, 2016-12-28 16:51

In this blog post, we’ll find out how to use Percona XtraBackup on a MySQL instance with a large number of tables.

As of Percona Xtrabackup 2.4.5, you are required to have enough open files to open every single InnoDB tablespace in the instance you’re trying to back up. So if you’re running innodb_file_per_table=1, and have a large number of tables, you’re very likely to see Percona XtraBackup fail with the following error message:

InnoDB: Operating system error number 24 in a file operation. InnoDB: Error number 24 means 'Too many open files' InnoDB: Some operating system error numbers are described at InnoDB: File ./sbtest/sbtest132841.ibd: 'open' returned OS error 124. Cannot continue operation InnoDB: Cannot continue operation.

If you run into this issue, here is what you need to do:

  1. Find out how many files you need:

root@ts140i:~# find /var/lib/mysql/ -name "*.ibd" | wc -l 1000005

I would add at least another 1000 to this number for system tablespace and other miscellaneous open file needs. You might want to go even higher to accommodate for a growing number of tables.

  1. Check the maximum number of files you can keep open in the system. If this number is too small Percona Xtrabackup might monopolize the open files in the system, causing other processes to fail when they try to open files. This can cause MySQL Server to crash, and other processes to fail.

root@ts140i:/mnt/data/backup# cat /proc/sys/fs/file-max 3262006

If you need to, here is how to  increase the number:

sysctl -w fs.file-max=5000000 echo "fs.file-max=5000000" >> /etc/sysctl.conf

  1. Increase the limit on the number of files the Percona XtraBackup process can open:

The best way to do this is using --open-files-limit option. For example, you can specify the following in your my.cnf:

[xtrabackup] open-files-limit=2000000

Alternatively, you can pass it as a command-line option, or run ulimit -n 2000000 before running the backup command.

You need to be sure your user account has permissions to set open files limit this high. If you are doing backups under the “root” user, it shouldn’t be a problem. Otherwise, you might need to adjust the limits in  /etc/security/limits.conf:

mysql hard nofile 2000000 mysql soft nofile 2000000

Specifying a “soft” limit in this file eliminates the need to run ulimit before Percona XtraBackup, or specifying it in the configuration.

  1. There is one more possible limit to overcome. Even running as a root user, you might get the following error message:

root@ts140i:/mnt/data/backup# ulimit -n 2000000 -su: ulimit: open files: cannot modify limit: Operation not permitted

If this happens, you might need to increase the kernel limit on the number of processes any can have:

pz@ts140i:~$ cat /proc/sys/fs/nr_open 1048576

The limit I have on this system is slightly above 1 million. You can increase it using the following:

sysctl -w fs.nr_open=2000000 echo "fs.nr_open=2000000" >> /etc/sysctl.conf

With these configuration adjustments, you should be able to use Percona XtraBackup to backup MySQL instances containing millions of tables without problems.

What if you can’t allow Percona XtraBackup to open that many files? Then there is the option –close-files that won’t normally require increasing the limit to the number of open files. Using this option, however, might cause the backup corruption if you’re doing DDL operations during the backup.

From where does this strange limitation requiring you to keep all tablespaces open come? It comes from this issue. In some cases, DDL operations such as RENAME TABLE might cause the wrong file to be copied, and unable to be caught up by replying to InnoDB redo logs. Keeping the file open clearly shows which file corresponds to a given tablespace at the start of a backup process, and gets handled correctly.

This problem is not unique to Percona XtraBackup. If anything, Percona Xtrabackup goes the extra mile to ensure database backups are safe.  For comparison, MySQL Enterprise Backup 4.0  simply states:  

Do not run the DDL operations ALTER TABLE, TRUNCATE TABLE, OPTIMIZE TABLE, REPAIR TABLE, RESTORE TABLE or CREATE INDEX while a backup operation is going on. The resulting backup might become corrupted.”

Categories: MySQL

Webinar Thursday December 29: JSON in MySQL 5.7

MySQL Performance Blog - Tue, 2016-12-27 22:44

Please join Percona’s Consultant David Ducos on Thursday, December 29, 2016 at 10 am PST/ 1:00 pm EST (UTC-8) as he presents JSON in MySQL 5.7.

Since it was implemented in MySQL 5.7, we can use JSON as a data type. In this webinar, we will review some of the useful functions that have been added to work with JSON.

We will examine and analyze how JSON works internally, and take into account some of the costs related to employing this new technology. 

At the end of the webinar, you will know the answers to the following questions: 

  • What is JSON?
  • Why don’t we keep using VARCHAR?
  • How does it work? 
  • What are the costs?
  • What limitations should we take into account?
  • What are the benefits of using MySQL JSON support?

Register for the webinar here.

David Ducos, Percona Consultant

David studied Computer Science at the National University of La Plata, and has worked as a Database Consultant since 2008. He worked for three years in a worldwide platform of free classifieds, until starting work for Percona in November 2014 as part of the Consulting team.

Categories: MySQL

Don’t Let a Leap Second Leap on Your Database!

MySQL Performance Blog - Tue, 2016-12-27 21:00

This blog discusses how to prepare your database for the new leap second coming in the new year.

At the end of this year, on December 31, 2016, a new leap second gets added. Many of us remember the huge problems this caused back in 2012. Some of our customers asked how they should prepare for this year’s event to avoid any unexpected problems.

It’s a little late, but I thought discussing the issue might still be useful.

The first thing is to make sure your systems avoid the issue with abnormally high CPU usage. This was an problem in 2012 due to a Linux kernel bug. After the leap second was added, CPU utilization sky-rocketed on many systems, taking down many popular sites. This issue was addressed back in 2012, and similar global problems did not occur in 2015 thanks to those fixes. So it is important to make sure you have an up-to-date Linux kernel version.

It’s worth knowing that in the case of any unpredicted system misbehavior from the leap second problem, the quick remedy for the CPU overheating was restarting services or rebooting servers (in the worst case).

(Please do not reboot the server without being absolutely sure that your serious problems started exactly when the leap second was added.)

The following are examples of bug records:

The second thing is to add proper support for the upcoming event. Leap second additions are announced some time before they are implemented, as it isn’t known exactly when the next one will occur for sure.

Therefore, you should upgrade your OS tzdata package to prepare your system for the upcoming leap second. This document shows how to check if your OS is already “leap second aware”:

zdump -v right/America/Los_Angeles | grep Sat.Dec.31.*2016

A non-updated system returns an empty output. On an updated OS, you should receive something like this:

right/America/Los_Angeles Sat Dec 31 23:59:60 2016 UTC = Sat Dec 31 15:59:60 2016 PST isdst=0 gmtoff=-28800 right/America/Los_Angeles Sun Jan 1 00:00:00 2017 UTC = Sat Dec 31 16:00:00 2016 PST isdst=0 gmtoff=-28800

If your systems use the NTP service though, the above is not necessary (as stated in Still, you should make sure that the NTP services you use are also up-to-date.

With regards to leap second support in MySQL there is nothing to do, regardless of the version. MySQL doesn’t allow an extra second numeration within the 60 seconds part of timestamp datatype, so you should expect rows with 59 instead of 60 seconds when the additional second is added, as described here:

Similarly, MongoDB expects no serious problems either.

Let’s “smear” the second

Many big Internet properties, however, introduced a technique to adapt to the leap second change more gracefully and smoothly, called Leap Smear or Slew. Instead of introducing the additional leap second immediately, the clock slows down a bit, allowing it to gradually get in sync with the new time. This way there is no issue with extra abnormal second notation, etc.

This solution is used by Google, Amazon, Microsoft, and others. You can find a comprehensive document about Google’s use here:

You can easily introduce this technique with the ntpd -x or Chronyd slew options, which are nicely explained in this document:


Make sure you have your kernel up-to-date, NTP service properly configured and consider using the Slew/Smear technique to make the change easier. After the kernel patches in 2012, no major problems happened in 2015. We expect none this year either (especially if you take time to properly prepare).

Categories: MySQL
Syndicate content