Amazon RDS and pt-online-schema-change

MySQL Performance Blog - Fri, 2016-07-01 17:30

In this blog post, I discuss some of the insights needed when using Amazon RDS and pt-online-schema-change together.

The pt-online-schema-change tool runs DDL queries (ALTER) online so that the table is not locked for reads and writes. It is a commonly used tool by community users and customers. Using it on Amazon RDS requires knowing about some specific details. First, a high-level explanation of how the tool works.

This is an example from the documentation:

pt-online-schema-change --alter "ADD COLUMN c1 INT" D=sakila,t=actor

The tool runs an ALTER on the table “actor” from the database “sakila.” The alter adds a column named “c1” of type “integer.” In the background, the tool creates a new empty table similar to “actor” but with the new column already added. It then creates triggers on the original table to update the corresponding rows in the new table. After, it starts copying rows to the new table (this is the phase that takes the longest amount of time). When the copy is done, the tables are swapped, triggers removed and the old table dropped.

As we can see, it is a tool that uses the basic features of MySQL. You can run it on MySQL, Percona Server, MariaDB, Amazon RDS and so on. But when using Amazon, there is a hidden issue: you don’t have SUPER privileges. This means that if you try to run the tool on an RDS with binary logs enabled, you could get the following error:

DBD::mysql::db do failed: You do not have the SUPER privilege and binary logging is enabled (you might want to use the less safe log_bin_trust_function_creators variable) [for Statement "CREATE TRIGGER `pt_osc_db_table_del` AFTER DELETE ON `db`.`table` FOR EACH ROW DELETE IGNORE FROM `db`.`_table_new` WHERE `db`.`_table_new`.`table_id` <=> OLD.`table_id` AND `db`.`_table_new`.`account_id` <=> OLD.`account_id`"] at /usr/bin/pt-online-schema-change line 10583.

The following documentation page explains the reason for this message:

The bottom line is creating triggers on a server with binary logs enabled requires a user with SUPER privileges (which is impossible in Amazon RDS). The error message specifies the workaround. We need to enable the variable log_bin_trust_function_creators. Enabling it is like saying to the server:

“I trust regular users’ triggers and functions, and that they won’t cause problems, so allow my users to create them.”

Since the database functionality won’t change, it becomes a matter of trusting your users. log_bin_trust_function_creators is a global variable that can be changed dynamically:

mysql> SET GLOBAL log_bin_trust_function_creators = 1;

Run the tool again. This time, it will work. After the ALTER is done, you can change the variable to 0 again.

After you’re done with the ALTER process, you can change the variable to “0” again.

Categories: MySQL

Rescuing a crashed pt-online-schema-change with pt-archiver

MySQL Performance Blog - Thu, 2016-06-30 21:20

This article discusses how to salvage a crashed pt-online-schema-change by leveraging pt-archiver and executing queries to ensure that the data gets accurately migrated. I will show you how to continue the data copy process, and how to safely close out the pt-online-schema-change via manual operations such as RENAME TABLE and DROP TRIGGER commands. The normal process to recover from a crashed pt-online-schema-change is to drop the triggers on your original table and drop the new table created by the script. Then you would restart pt-online-schema-change. In this case, this wasn’t possible.

A customer recently needed to add a primary key column to a very busy table (with around 200 million rows). The table only had a unique key on one column (called our_id below). The customer had concerns about slave lag, and wanted to ensure there was little or no lag. This, as well as the fact that you can’t add a primary key as an online DDL in MySQL and Percona Server 5.6, meant the obvious answer was using pt-online-schema-change.

Due to the sensitivity of their environment, they could only afford one short window for the initial metadata locks, and needed to manually do the drop swap that pt-online-schema-change normally does automatically. This is where no-drop-triggers and no-swap-tables come in. The triggers will theoretically run indefinitely to keep the new and old tables in sync once pt-online-schema-change is complete. We crafted the following command:

pt-online-schema-change --execute --alter-foreign-keys-method=auto --max-load Threads-running=30 --critical-load Threads_running=55 --check-slave-lag mysql-slave1,mysql-slave2,mysql-slave3 --max−lag=10 --chunk-time=0.5 --set-vars=lock_timeout=1 --tries="create_triggers:10:2,drop_triggers:10:2" --no-drop-new-table --no-drop-triggers --no-swap-tables --chunk-index "our_id" --alter "ADD newcol BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST" D=website,t=largetable --nocheck-plan

You can see some of the specifics of other flags and why we used them in the Percona Toolkit Manual.

Once we ran the command the customer got concerned, as their monitoring tools weren’t showing any work done (which is by design, pt-online-schema-change doesn’t want to hurt your running environment). The customer ran strace -p to verify it was working. This wasn’t a great choice as it crashed pt-online-schema-change.

At this point, we knew that the application (and management) would not allow us to take new metadata locks to create triggers on the table, as we had passed our metadata lock window.

So how do we recover?

First, let’s start with a clean slate. We issued the following commands to create a new table, where __largetable_new is the table created by pt-online-schema-change:

CREATE TABLE mynewlargetable LIKE __largetable_new; RENAME TABLE __largetable_new TO __largetable_old, mynewlargetable TO __largetable_new; DROP TABLE __largetable_old;

Now the triggers on the original table, largetable are updating the new empty table that has our new schema.

Now let’s address the issue of actually moving the data that’s already in largetable to __largetable_new. This is where pt-archiver comes in. We crafted the following command:

pt-archiver --execute --max-lag=10 --source D=website,t=largetable,i=our_id --dest D=website,t=__largetable_new --where "1=1" --no-check-charset --no-delete --no-check-columns --txn-size=500 --limit=500 --ignore --statistics

We use pt-archiver to slowly copy records non-destructively to the new table based on our_id and WHERE 1=1 (all records). At this point, we periodically checked the MySQL data directory over the course of a day with ls -l to compare table sizes.

Once the table files were close to the same size, we ran counts on the tables. We noticed something interesting: the new table had thousands more records than the original table.

This concerned us. We wondered if our “hack” was a mistake. At this point we ran some verification queries:

select min(our_id) from __largetable_new; select max(our_id) from __largetable_new; select min(our_id) from largetable; select max(our_id) from largetable;

We learned that there were older records that didn’t exist in the live table. This means that pt-archiver and the DELETE trigger may have missed each other (i.e., pt-archiver was already in a transaction but hadn’t written records to the new table until after the DELETE trigger already fired).

We verified with more queries:

SELECT COUNT(*) FROM largetable l WHERE NOT EXISTS (SELECT our_id FROM __largetable_new n WHERE n.our_id=l.our_id);

They returned nothing.

SELECT COUNT(*) FROM __largetable_new n WHERE NOT EXISTS (SELECT our_id FROM largetable l WHERE n.our_id=l.our_id);

Our result showed 4000 extra records in the new table. This shows that we ended up with extra records that were deleted from the original table. We ran other queries based on their data to verify as well.

This wasn’t a huge issue for our application, and it could have been easily dealt with using a simple DELETE query based on the unique index (i.e., if it doesn’t exist in the original table, delete it from the new one).

Now to complete the pt-online-schema-change actions. All we need to do is the atomic rename or drop swap. This should be done as soon as possible to avoid running in a degraded state, where all writes to the old table are duplicated on the new one.

RENAME TABLE largetable TO __largetable_old , __largetable_new TO largetable;

Then drop the triggers for safety:

DROP TRIGGER pt_osc_website_largetable_ins; DROP TRIGGER pt_osc_website_largetable_upd; DROP TRIGGER pt_osc_website_largetable_del;

At this point it is safer to wait for the old table to clear out of the buffer pool before dropping it, just to ensure there is no impact on the server (maybe a week to be safe). You can check information_schema for a more accurate reading on this:

SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE WHERE TABLE_NAME = '`website`.`__largetable_old`'; +----------+ | count(*) | +----------+ | 279175 | +----------+ 1 row in set (8.94 sec)

Once this goes to 0 you can issue:

DROP TABLE __largetable_old;

Categories: MySQL

2016 MySQL User Group Leaders Summit

MySQL Performance Blog - Wed, 2016-06-29 17:20

In this post, I’ll share my experience attending the annual MySQL User Group Leaders Summit in Bucharest, Romania.

The MySQL User Group Leaders Summit gathers together as many of the global MySQL user group leaders as possible. At the summit, we discuss further actions on how we can better act for their local communities. This year, it focused primarily on cloud technologies.

As the Azerbaijan MySQL User Group leader, I felt a keen responsibility to go. I wanted to represent our group and learn as much as possible to take back to with me. Mingling and having conversations with other group leaders helps give me more ideas about how to spread the MySQL word!

The Conference

I attended three MySQL presentations:

  • Guided tour on the MySQL source code. In this session, we reviewed the layout of the MySQL code base, roughly following the query execution path. We also covered how to extend MySQL with both built-in and pluggable add-ons.
  • How profiling SQL works in MySQL. This session gave an overview of the performance monitoring tools in MySQL: performance counters, performance schema and SYS schema. It also covered some of the details in analyzing MySQL performance with performance_schema.
  • What’s New in MySQL 5.7 Security. This session presented an overview of the new MySQL Server security-related features, as well as the MySQL 5.6 Enterprise edition tools. This session detailed the shifting big picture of secure deployments, along with all of the security-related MySQL changes.

I thought that the conference was very well organized, with uniformly great discussions. We also participated in some city activities and personal interactions. I even got to see Le Fred!

I learned a lot from the informative sessions I attended. The MySQL source code overview showed me the general paths of MySQL source code, including the most important directories, the most important functions and classes. The session about MySQL profiling instrumentation sessions informed us of the great MySQL profiling improvements. It reviewed some useful tools and metrics that you can use to get info from the server. The last session about MySQL security covered improved defaults, tablespace encryption and authentication plugins.

In conclusion, my time was well spent. Meeting and communicating with other MySQL user group leaders gives me insight into the MySQL community. Consequently, I highly recommend everyone gets involved in your local user groups and attend get-togethers like the MySQL User Group Leaders Summit when you can find the time.

Below you can see some of the pics from the trip. Enjoy!





Categories: MySQL

Percona Server for MongoDB 3.2.7-1.1 is now available

MySQL Performance Blog - Wed, 2016-06-29 16:45

Percona announces the release of Percona Server for MongoDB 3.2.7-1.1 on June 29, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.7-1.1 is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. Based on MongoDB 3.2.7, it extends MongoDB with MongoRocks and PerconaFT storage engines, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.


The PerconaFT storage engine has been deprecated and will not be available in future releases.

This release includes all changes from MongoDB 3.2.7 as well as the following:

  • Fixed the software version incorrectly reported by the --version option.
  • Added recommended ulimit values for the mongod process

The release notes are available in the official documentation.


Categories: MySQL

Webinar Wednesday June 29: Percona XtraDB Cluster Reference Architecture

MySQL Performance Blog - Mon, 2016-06-27 18:56

Please join Jay Janssen for the webinar Percona XtraDB Cluster Reference Architecture Wednesday, June 29 at 10:00 AM PDT (UTC- 7).

A reference architecture shows a typical, common, best-practice deployment of a system with all the surrounding infrastructure. In the case of database clusters, this can include the hosting platform, load balancing, monitoring, backups, etc.

Percona published a commonly referred to Percona XtraDB Cluster reference architecture on the Percona blog in 2012 (which is included in the current manual). However, this architecture is out of date.

This talk will present a revised and updated Percona XtraDB Cluster reference architecture for 2016, including:

  • Load balancing
  • Read/Write splitting
  • Monitoring
  • Backups

This will include some variants, such as:

  • AWS hosting
  • WAN deployments
  • Async slaves

Register now.

Jay Janssen, Managing Principal Architect

Jay came to Percona in 2011 after working seven years for Yahoo! Jay worked in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. Jay holds a B.S. in Computer Science from Rochester Institute of Technology. He and his wife live with their four children in upstate New York.

Categories: MySQL

The need for parallel crash recovery in MySQL

MySQL Performance Blog - Mon, 2016-06-27 18:14

In this blog, I will discuss how parallel crash recovery in MySQL benefits several processes.

I recently filed an Oracle feature request to make crash recovery faster by running in multiple threads.

This might not seem very important, because MySQL does not crash that often. When it does crash, however, crash recovery can take 45 mins – as I showed in this post:

What is a big innodb_log_file_size?

Even in that case, it still might not be a big issue as you often failover to a slave.

However, crash recovery plays important part in the following processes:

  • Backups with Percona XtraBackup (and MySQL Enterprise Backups) and backups with filesystem snapshots.
    • Crash recovery is part of the backup process, and it is important to make the backup task faster.
  • State Snapshot Transfer in Percona XtraDB Cluster.
    • SST, either XtraBackup or rsync bases, also relies on the crash recovery process – so the faster it is done, the faster a new node joins the cluster.
    • It might seem that Oracle shouldn’t care about Percona XtraDB Cluster. But they are working on MySQL Group Replication. I suspect that when Group Replication copies data to the new node, it will also rely on some kind of snapshot technique. Unless they aren’t serious about this feature and will recommend mysqldump/mysqlpump for data copying).
  • My recent proof of concept for Automatic Slave propagation in Docker environment also uses Percona XtraBackup, and therefore crash recovery for new slaves.

In general, any process that involves MySQL/InnoDB data transfer will benefit from a faster crash recovery. In its current state uses just one thread to read and process data. This limits performance on modern hardware, which uses multiple CPU cores and fast SSD drives.

It is also important to consider that the crash recovery time affects how big log files can be. If we improve the crash recovery time, we can store very big InnoDB log files (which positively affects performance in general).

Percona is working on ways to make it faster. However, if faster recovery times are important to you environment, I encourage you to let Oracle know that you want to see parallel crash recovery in MySQL.

Categories: MySQL

Percona Toolkit 2.2.18 is now available

MySQL Performance Blog - Fri, 2016-06-24 16:18

Percona announces the availability of Percona Toolkit 2.2.18, released on June 24, 2016.

Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current stable release in the 2.2 series. It includes new features and bug fixes as well as improved MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New features:

  • 1537416: pt-stalk now sorts the output of transactions by id
  • 1553340: Added “Shared” memory info to pt-summary
  • PT-24: Added the --no-vertical-format option for pt-query-digest, allowing compatibility with non-standard MySQL clients that don’t support the G directive at the end of a statement

Bug fixes:

  • 1402776: Fixed error when parsing tcpdump capture with pt-query-digest
  • 1521880: Improved pt-online-schema-change plugin documentation
  • 1547225: Clarified the description of the --attribute-value-limit option for pt-query-digest
  • 1569564: Fixed all PERL-based tools to return a zero exit status when run with the --version option
  • 1576036: Fixed error that sometimes prevented to choose the primary key as index, when using the --where option for pt-table-checksum
  • 1585412: Fixed the inability of pt-query-digest to parse the general log generated by MySQL (and Percona Server) 5.7 instance
  • PT-36: Clarified the description of the --verbose option for pt-slave-restart

You can find release details in the release notes and the 2.2.18 milestone on Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

Categories: MySQL

Percona Server for MongoDB 3.0.12-1.7 is now available

MySQL Performance Blog - Fri, 2016-06-24 16:14

Percona announces the release of Percona Server for MongoDB 3.0.12-1.7 on June 24, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.0.12-1.7 is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. Based on MongoDB 3.0.12, it extends MongoDB with MongoRocks and PerconaFT storage engines, as well as features like external authentication and audit logging. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: The MongoRocks storage engine is still under development. There is currently no officially released version of MongoRocks that can be recommended for production.

This release includes all changes from MongoDB 3.0.12, and the following known issue that will be fixed in a future release:

  • The --version does not correctly report the software version. The effected binaries are:
    • bsondump
    • mongodump
    • mongoexport
    • mongofiles
    • mongoimport
    • mongooplog
    • mongorestore
    • mongostat
    • mongotop

The release notes are available in the official documentation.


Categories: MySQL

Migrate from MS SQL Server to MySQL

MySQL Performance Blog - Thu, 2016-06-23 22:56

In this blog series, I will share my experiences as I migrate commercial databases (i.e., Microsoft SQL or Oracle) to open source (MySQL). More specifically, we will look at how you can migrate from MS SQL Server to MySQL.

For this first blog post I’ve chosen Jira database as an example, and used three different tools to migrate Jira database in Microsoft SQL Server to MySQL:

  1. MySQL Workbench (opensource)
  2. Amazon DMS (cloud tool)
  3. Ispirer MnMTK 2015 (commercial tool)

When I started my research, I was under the impression that Jira database would be easy to migrate (no stored procedures, no triggers, etc.). It turned out that there were some problems that I was able to fix.

One of the reasons I chose Jira as opposed to some standard MS SQL database (such as AdventureWorks2014) is that it is a non-standard choice. Most of the software vendors use standard databases to test their software, and it works perfectly on those standard databases. Jira is not a usual choice and will be closer to real life.

MySQL Workbench

MySQL Workbench supports Microsoft SQL Server migration. The migration is straightforward except the issues with character sets. I have experienced the error “Could not successfully convert UCS-2 string to UTF-8”.

It turns out (with the help of Martin Brennan’s blog) that we will need to use “ODBC (FreeTDS)” drive for MS SQL, and enable sending Unicode data as UTF8:

After changing those settings, I was able to successfully migrate Jira database from MS SQL to MySQL.

Advantages and disadvantages:

  • Plus: free and open source tool, multi-platform
  • Plus: successful migration for Jira
  • Plus: supports multi-threaded migrations (increase worker tasks if needed, default value is 2)
  • Minus: needed some tweaks to work with character sets
  • Minus: not very easy to debug errors

Amazon DMS

AWS Database Migration Service supports migrating from MS SQL to MySQL, but the actual migration method is different from other tools. It uses the source database (MS SQL server in this case) replication feature to stream the data to the target database (MySQL). Amazon DMS starts a temporary “migration” instance that is used to stream data. Both the source and destination database can be in AWS (EC2 or RDS) or outside AWS (no restriction).

The important limitation for MS SQL migration: it only works with MS SQL Server versions that support replication (subscription service). It doesn’t work with MS SQL Express edition. Also, if the subscription service is not enabled the DMS can’t even see the schema(s) to migrate (full list of limitations for MS SQL Server migration).

I’ve also gotten errors around the “constraint” name being too long:

2016-04-02T18:20:23 [TARGET_LOAD ]E: Failed to execute statement: 'ALTER TABLE `dbo`.`AO_38321B_CUSTOM_CONTENT_LINK` ADD CONSTRAINT `AO_38321B_CUSTOM_CONTENT_LINK_pk_AO_38321B_CUSTOM_CONTENT_LINK_ID` PRIMARY KEY ( `ID` )' [122502] ODBC general error. (ar_odbc_stmt.c:4048)

The problem here is that “AO_38321B_CUSTOM_CONTENT_LINK_pk_AO_38321B_CUSTOM_CONTENT_LINK_ID” is too long a string for MySQL. At the same time, this name does not really matter as this is the PRIMARY KEY.

After changing the “constraint” name in MS SQL to smaller strings, I could migrate all tables to MySQL.

Amazon DMS notes: Amazon DMS lets you migrate from a database located anywhere (not necessarily in AWS) to another database located anywhere (not necessarily in AWS) — however, the traffic will go thru AWS. Thus the migration path is fastest and the most beneficial if either the source or target (or both) instances are in AWS (for example, ec2 or rds instances).

Advantages and disadvantages:

  • Plus: successful migration for Jira
  • Plus: Multithreaded execution (this is a huge advantage for migrating large databases)
  • Plus: In addition to migration, you can also use a replication link between the SQL Server and the new MySQL to fetch the new changes. This is a huge advantage when migrating a large database with lots of traffic and tight downtime requirements.
  • Minus: replication should be enabled to perform a migration, which means that migrating from SQL Server Express isn’t supported.
  • (Can be plus and minus): All traffic is going through a cloud environment.

Potential issues

(This section has been updated) I’ve figured out that the table structures generated by Workbench, Amazon DMS and SQLWays are different. For example:

Workbench generated:


AWS DMS generated:


SQLWays wizard generated:


In AWS DMS version, the index on CONTENT_KEY is missing and ID is not declared as auto_increment. However, the Workbench “lost” the character set. SQLWays wizard has created partial key (191 characters).

At the same time, I was able to start Jira on top of two versions of the MySQL database (migrated by using Workbench and Amazon DMS).

Ispirer MnMTK 2015

Ispirer MnMTK 2015 toolkit is a commercial (not open-source) software application that lets you migrate from MS SQL Server to MySQL (among other databases). Ispirer has provided me with a demo license so I can test the migration.

I was able to migrate the Jira database from MS SQL to MySQL with the Ispirer SQLWays Wizard:

One issue with this process is that SQL Ways Wizard relies on the MySQL command line utility (“mysql”), which should be in the path. If you do not have MySQL installed on the migration machine, or it is not in the path, the migration will fail:

To fix simply add the MySQL “bin” directory to the path. In addition, you can use the SQL Ways Wizard to generate scripts and run those scripts on the destination host where the utilities are installed.

Advantages and disadvantages:

  • Plus: successful migration for Jira
  • Plus: support from SQL ways: can work on fixing potential migration issues (requires paid license)
  • Plus: can convert stored procedures, triggers, foreign key constraints
  • Minus: commercial, not open source software.
  • Minus: only runs on Windows (however, target machine for the database migration can be Linux)
  • Minus: no multi-treaded migration support (can manually run multiple instances of SQL Ways)


All tools I tried finally worked, but at the same time I was surprised with the number of issues I found. Migrating a very simple database (no stored procedures, no triggers, no foreign key constraints) should be easier.

Another surprise was that all tools are focused on a nice GUI with “next” buttons. For migrating one database to another, I would prefer using a command line tool interface (may be similar to Percona toolkit or iconv):

# dbmigrate --source user:pass@sqlserverhost:1433 --target user:pass@mysqlhost:3309 --parallel 8 --verbose --overwrite

Actually, Ispirer MnMTK does have a command line migration utility included, but it only works on Windows.

Until somebody develops a better command line tool, any of the above solutions will help you migrate from MS SQL Server to MySQL.

Categories: MySQL

Troubleshooting configuration issues: Q & A

MySQL Performance Blog - Wed, 2016-06-22 23:18

In this blog, I will provide answers to the Q & A for the Troubleshooting configuration issues webinar.

First, I want to thank you for attending the June, 9 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: What are the predefined roles installed per default?

A: MySQL does not support roles by default. But, if you need roles you can either use MariaDB or emulate roles with Percona PAM Plugin and proxy users. But MySQL has pre-built user accounts. One of them is root@localhost, which has all available privileges, and anonymous account that can access the test database without a password. See the user manual for more information.

Q: How do you determine the root cause for a query that just hangs?

A: There are several possible reasons for a hanging query. They include:

  1. Poorly optimized query – it doesn’t really “hang,” but just requires a lot of time to execute. Dealing with these types of queries was discussed in the Troubleshooting Slow Queries webinar.
  2. The query is waiting for a lock set by another query or transaction. These issues were discussed in the Troubleshooting locking issues webinar
  3. A bug in MySQL.

When I see hanging queries I look into all the above possibilities. But I prefer to start at query optimization.

Q: Can we get the powerpoint slides?

A: I do not use PowerPoint to create slides. Slides are available in PDF format only.

Q: 1) Is it safety to use SQL_LOG_BIN = 0 for specific connections (statements), for example for DELETE, when we need to keep old data on a slave, but delete from master? What are side-effects? Can it break replication?

A: Using SQL_LOG_BIN = 0 itself is safe, but you need to understand what you are doing. For example, if you delete data in a table that has a unique key on the master, and then insert a row that has the same unique value that existed in one of rows you deleted, replication will fail with “Duplicate key” error.

Q: Is it reasonable to disable query_cache_type (set 0) on MySQL instances with very big (huge) amount of data?

A: Yes. I would recommend it.

Q: 3) How does the number of innodb_buffer_pool_instances affect performance? Does a lot of innodb_buffer_pool_instances = high performance?

A: InnoDB introduced buffer pool instances to reduce contention “as different threads read and write to cached pages“. However, they improve performance only if you have many concurrent threads inside InnoDB.

Q: I have a question, where can I download the threadpool plugin at Percona? I checked your download page and couldn’t find it. Is it bundled in the Percona official release? wW have 5.6.28,29 and 30 and there is no in the plugin directory. Can you let me know how to get it?

A: Percona built thread pool into Percona Server; a separate download isn’t necessary. See the user manual for instructions.


Categories: MySQL

Sneak peek at the Percona Live Europe Amsterdam 2016 talks

MySQL Performance Blog - Wed, 2016-06-22 17:46

On behalf of the Percona Live Conference Committee, I am excited to announce the sneak peek schedule for the Percona Live Europe Amsterdam 2016 talks!

Percona Live Europe will feature a variety of formal tracks and sessions related to MySQL, MongoDB and ODBMS from members of the open source community. With many slots to fill, there will be no shortage of great content this year. Though we won’t finalize the entire conference schedule until mid-July, this preview list of talks is sure to whet your appetite! So without further ado, here is the SNEAK PEEK SCHEDULE!

Want to Present at Percona Live Europe Amsterdam?

We are still looking for people to give talks! The committee has begun their work rating talks, but there is still time to submit. We are looking for a range of interesting talks and subjects around MySQL, MongoDB, and ODBMS. Some topics of interest include:

  • Database Virtualization
  • Integration or coexistence between SQL and NoSQL
  • Automating service life-cycle
  • Load balancing
  • Cluster control
  • Multiple Data Store Technologies and Management Solutions

But hurry, the call for papers closes July 18th! If selected, you get a free pass, and the chance to dazzle your open source community peers. Apply now!

Become a Percona Live Europe Amsterdam Sponsor

Sponsor the Percona Live Europe Amsterdam conference. There are still sponsorship opportunities at Percona Live Europe. Being a sponsor allows you to demonstrate thought leadership, promote brand awareness and support the open source community. Need a reason? Here are ten reasons to become a sponsor. Sponsor the Percona Live Europe now.

Sponsor Percona Live Europe now.

Categories: MySQL

Docker automatic MySQL slave propagation

MySQL Performance Blog - Tue, 2016-06-21 23:42

In this post, we’ll discuss Docker automatic MySQL slave propagation for help with scaling.

In my previous posts on the Docker environment, I covered Percona XtraDB Cluster. Percona XtraDB Cluster can automatically scale by conveniently adding new nodes using the highly automated State Snapshot Transfer. State Snapshot Transfer allows a new node to copy data from an existing node (I still want to see how this is possible with MySQL Group Replication).

This is not the case with regular MySQL Replication. With MySQL Replication, the slave setup still requires manual steps (well, unless you’ve already scripted it for your environment). At least these are “simple” steps (ha!). Percona XtraBackup can setup replication with less work (see this link for details:, but it still requires poking around and switching between servers.

However, nothing stops us from making it more automatic (similar to SST in Percona XtraDB Cluster), especially with Docker images. Why Docker? Because Docker provides a highly-controlled environment where we can orchestrate how scripts get executed. Severalnines provides a great intro into MySQL with Docker.

There are a few more components for this setup:

Before jumping to my solution, I should point to some work in this area by Joyent:

I propose my image, with sources on GitHub

First, we need to start a master node:

docker run -d -p 3306:3306 --net=replicaset_net --name=replicaset_master -e MYSQL_ROOT_PASSWORD=Theistareyk perconalab/ps-master-slave --innodb-buffer-pool-size=2G

I assume that we’ve created the network replicaset_net already, either bridge or overlay.

You can create a slave by pointing to the master container:

docker run -d -p 3306 --net=replicaset_net --name=replicaset_slave1 -e MYSQL_ROOT_PASSWORD=Theistareyk -e MASTER_HOST=replicaset_master perconalab/ps-master-slave --innodb-buffer-pool-size=2G

The started node will automatically connect to MASTER_HOST, copy the data and perform all the steps needed to start the slave.

You can even copy data from a running slave, instead of the master, like this:

docker run -d -p 3306 --net=replicaset_net --name=replicaset_slave2 -e MYSQL_ROOT_PASSWORD=Theistareyk -e MASTER_HOST=replicaset_master -e SLAVE_HOST=replicaset_slave1 perconalab/ps-master-slave --innodb-buffer-pool-size=2G

This node will copy data from SLAVE_HOST, but then will point itself to MASTER_HOST.

Docker Network lets you use container names "replicaset_master" and "replicaset_slave1" instead of IP addresses, which is very convenient.

As the result of above, we have one master and two slaves running. We can start as many slave nodes as needed.

Please remember, this is more proof-of-concept than “production ready” images, but it gives a good direction for implementation.

Categories: MySQL

Running Percona XtraDB Cluster nodes with Linux Network namespaces on the same host

MySQL Performance Blog - Mon, 2016-06-20 22:52

This post is a continuance of my Docker series, and examines Running Percona XtraDB Cluster nodes with Linux Network namespaces on the same host.

In this blog I want to look into a lower-level building block: Linux Network Namespace.

The same as with cgroups, Docker uses Linux Network Namespace for resource isolation. I was looking into cgroup a year ago, and now I want to understand more about Network Namespace.

The goal is to both understand a bit more about Docker internals, and to see how we can provide network isolation for different processes within the same host. You might need to isolate process when running several MySQL or MongoDB instances on the same server (which might come in handy during testing). In this case, I needed to test ProxySQL without Docker.

We can always use different ports for different MySQL instances (such as 3306, 3307, 3308), but it quickly gets complicated.

We could also use IP address aliases for an existing network interface, and use bind=<IP.ADD.RE.SS> for each instance. But since Percona XtraDB Cluster can use three different IP ports and network channels for communications, this also quickly gets complicated.

Linux Network Namespace provides greater network isolation for resources so that it can be a better fit for Percona XtraDB Cluster nodes. Now, setting up Network namespaces in and of itself can be confusing; my recommendation is if you can use Docker, use Docker instead. It provides isolation on process ID and mount points, and takes care of all the script plumbing to create and destroy networks. As you will see in our scripts, we need to talk about directory location for datadirs.

Let’s create a network for Percona XtraDB Cluster with Network Namespaces.

I will try to do the following:

  • Start four nodes of Percona XtraDB Cluster
  • For each node, create separate network namespace so the nodes will be able to allocate network ports 3306, 4567, 4568 without conflicts
  • Assign the nodes IP addresses:
  • Create a “bridge interface” for the nodes to communicate, using IP address

For reference, I took ideas from this post: Linux Switching – Interconnecting Namespaces

First, we must create the bridge interface on the host:

BRIDGE=br-pxc brctl addbr $BRIDGE brctl stp $BRIDGE off ip addr add dev $BRIDGE ip link set dev $BRIDGE up

Next, we create four namespaces (one per Percona XtraDB Cluster node) using the following logic:

for i in 1 2 3 4 do ip netns add pxc_ns$i ip link add pxc-veth$i type veth peer name br-pxc-veth$i brctl addif $BRIDGE br-pxc-veth$i ip link set pxc-veth$i netns pxc_ns$i ip netns exec pxc_ns$i ip addr add 10.200.10.$((i+1))/24 dev pxc-veth$i ip netns exec pxc_ns$i ip link set dev pxc-veth$i up ip link set dev br-pxc-veth$i up ip netns exec pxc_ns$i ip link set lo up ip netns exec pxc_ns$i ip route add default via done

We see the following interfaces on the host:

1153: br-pxc: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 32:32:4c:36:22:87 brd ff:ff:ff:ff:ff:ff inet scope global br-pxc valid_lft forever preferred_lft forever inet6 fe80::2ccd:6ff:fe04:c7d5/64 scope link valid_lft forever preferred_lft forever 1154: br-pxc-veth1@if1155: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-pxc state UP qlen 1000 link/ether c6:28:2d:23:3b:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 8 inet6 fe80::c428:2dff:fe23:3ba4/64 scope link valid_lft forever preferred_lft forever 1156: br-pxc-veth2@if1157: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-pxc state UP qlen 1000 link/ether 32:32:4c:36:22:87 brd ff:ff:ff:ff:ff:ff link-netnsid 12 inet6 fe80::3032:4cff:fe36:2287/64 scope link valid_lft forever preferred_lft forever 1158: br-pxc-veth3@if1159: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-pxc state UP qlen 1000 link/ether 8a:3a:c1:e0:8a:67 brd ff:ff:ff:ff:ff:ff link-netnsid 13 inet6 fe80::883a:c1ff:fee0:8a67/64 scope link valid_lft forever preferred_lft forever 1160: br-pxc-veth4@if1161: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-pxc state UP qlen 1000 link/ether aa:56:7f:41:1d:c3 brd ff:ff:ff:ff:ff:ff link-netnsid 11 inet6 fe80::a856:7fff:fe41:1dc3/64 scope link valid_lft forever preferred_lft forever

We also see the following network namespaces:

# ip netns pxc_ns4 (id: 11) pxc_ns3 (id: 13) pxc_ns2 (id: 12) pxc_ns1 (id: 8)

After that, we can check the namespace IP address:

# ip netns exec pxc_ns3 bash # ip addr 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 1159: pxc-veth3@if1158: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 4a:ad:be:6a:aa:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet scope global pxc-veth3 valid_lft forever preferred_lft forever inet6 fe80::48ad:beff:fe6a:aac6/64 scope link valid_lft forever preferred_lft forever

To enable communication from inside the network namespace to the external world, we should add some iptables rules, e.g.:

iptables -t nat -A POSTROUTING -s -o enp2s0f0 -j MASQUERADE iptables -A FORWARD -i enp2s0f0 -o $BRIDGE -j ACCEPT iptables -A FORWARD -o enp2s0f0 -i $BRIDGE -j ACCEPT

where enp2s0f0 is an interface that has an external IP address (by some reason modern Linux distros decided to use names like enp2s0f0 for network interfaces, instead old good "eth0").

To start a node (or mysqld instance) inside a network namespace, we should use ip netns exec prefix for commands.

For example to start Percona XtraDB Cluster first node, in the namespace pxc_ns1, with IP address, we use:

ip netns exec pxc_ns1 mysqld --defaults-file=node.cnf --datadir=/data/datadir/node1 --socket=/tmp/node1_mysql.sock --user=root --wsrep_cluster_name=cluster1

To start following nodes:

NODE=2 ip netns exec pxc_ns${NODE} mysqld --defaults-file=node${NODE}.cnf --datadir=/data/datadir/node${NODE} --socket=/tmp/node${NODE}_mysql.sock --user=root --wsrep_cluster_address="gcomm://" --wsrep_cluster_name=cluster1   NODE=3 ip netns exec pxc_ns${NODE} mysqld --defaults-file=node${NODE}.cnf --datadir=/data/datadir/node${NODE} --socket=/tmp/node${NODE}_mysql.sock --user=root --wsrep_cluster_address="gcomm://" --wsrep_cluster_name=cluster1   etc

As the result of this procedure, we have four Percona XtraDB Cluster nodes running in an individual network namespace, not worrying about IP address and ports conflicts. We also allocated a dedicated IP range for our cluster.

This procedure isn’t trivial, but it is easy to script. I also think provides a good understanding what Docker, LXC or other containerization technologies do behind the scenes with networks.


Categories: MySQL

Webinar Thursday June 23: Choosing a MySQL High Availability Solution Today

MySQL Performance Blog - Mon, 2016-06-20 19:28

Please join Percona, Technical Account Manager, Michael Patrick on Thursday, June 23, 2016 at 10 AM PDT (UTC-7) as he presents “Choosing a MySQL High Availability Solution Today.”

High availability (HA) is one of the solutions to improve performance, avoid data outages, and recover quickly from disasters. An HA environment helps guarantee that your database doesn’t have a single point of failure, accommodates rapid growth and exponentially increasing database size, and enables the applications that power your business.

Michael will discuss various topologies for achieving High Availability with MySQL.

Topics include:

  • Percona XtraDB Cluster
  • DRBD
  • MHA
  • MySQL Orchestrator

Each solution has advantages and challenges. Attendees will gain a deeper understanding of how to choose the best solution for their needs while avoiding some of the pitfalls of making the wrong choices. Avoid the costly mistakes that commonly cause outages and lost revenue. Plus get the latest and greatest developments in the technologies!

Register now.

Michael Patrick Technical Account Manager

Mike came to Percona in 2015 after working for a variety of large corporations running hundreds of MySQL and Percona XtraDB Clusters in production environments. He is skilled in performance tuning, server auditing, high availability, multi-data center replication, migration, and other MySQL-related activities. Mike holds a B.S. in Computer Science from East Tennessee State University. In his off time, he enjoys Martial Arts and Cave Exploration. He lives in East Tennessee with his wife and he has four children.

Categories: MySQL

InnoDB locks and transaction isolation level

MySQL Performance Blog - Fri, 2016-06-17 18:49

What is the difference between InnoDB locks and transaction isolation level? We’ll discuss it in this post.

Recently I received a question from a user about one of my earlier blog posts. Since it wasn’t sent as a comment, I will answer it here. The question:

> I am reading your article:

> Full table scan locks whole table.

> Some bad select (read) query can do full table scan on InnoDB, does it lock whole table please?

> My understanding was that SELECT (read) blocks another DML only in MyISAM.

To answer this question, we to need understand two different concepts: locking and transaction isolation level.

In the default transaction isolation mode for InnoDB, REPEATABLE READ and the lower TRANSACTION ISOLATION levels, SELECT  doesn’t block any DML unless it uses SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE. On slide 20 (31) of my presentation, I use the SELECT ... FOR UPDATE statement (this is why a lock is seen).

However, if transaction isolation mode is SERIALIZABLE, then SELECT can block updates. You can see this in the example below:

mysql1> set transaction isolation level serializable; Query OK, 0 rows affected (0,00 sec) mysql1> begin; Query OK, 0 rows affected (0,00 sec) mysql1> select * from employees join titles using(emp_no); c936e6fc4c6cbaf77679ba5013339dff - 443308 rows in set (29,69 sec) mysql2> begin; Query OK, 0 rows affected (0,00 sec) mysql2> update titles set title='Engineer' where emp_no=15504; ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Regarding the different level locks set by InnoDB, I recommend that you follow the links on slide 37 (67) of the presentation, especially “InnoDB Locking Explained With Stick Figures” by Bill Karwin. Innodb table locks are also explained very well in this blog post by Peter Zaitsev.

Categories: MySQL

MongoDB Security: Why pay for Enterprise when Open Source has you covered?

MySQL Performance Blog - Fri, 2016-06-17 16:55

Does ensuring MongoDB security justify the cost of the Enterprise version? In my opinion, the answer is no.

MongoDB Inc© blasted an email with a study showing that the average cost of a data breach can be $5.9M. You can find the key finding in IBM’s 2015 report here:

NH Learning Solutions

Key findings:

Data breaches cost the most in the US and Germany and the lowest in Brazil and India. The average per capita cost of a data breach is $217 in the US and $211 in Germany. The lowest cost is in Brazil ($78) and India ($56). The average total organizational cost in the US is $6.5 million and in Germany $4.9 million. The lowest organizational cost is in Brazil ($1.8 million) and India ($1.5 million).

So it might be even worse than you thought! The study presented a clear per-record cost, which helps companies determine the real cost to them. This brings us to the recently-published MongoDB Security Architecture white paper. While being a great document, it raises some questions that should be addressed. We should dissect exactly what paying for an enterprise build gets you that is not already available in the open source community version. 

The key security features in above white paper are:

  • Authentication. LDAP Authentication centralizes things with your company directory (for PCI)
  • Authorization. What role-based access controls the database provides
  • Encryption. Broken into “At-Rest” and “In-Transit” as part of regular PCI requirements
  • Governance. Document validation and even checking for sensitive data such as an SSN or birth data
  • Auditing. The ability to see who did what in the database (also required for PCI).

That list lets us break down each into why they are important, and is it something that should be free in the MongoDB ecosystem.


MongoDB has built-in users (off by default). It misses things, however, like password complexity, age-based rotation, centralization, and identification of user roles versus service functions. These are essential to passing PCI. PCI requires that people don’t use old passwords, easy-to-break passwords, and that user access gets revoked when there is a change in status (such as leaving a department or the company). Thankfully LDAP is an open-source project of its own. Many connectors allow the use of Windows Active Directory (AD) systems to talk with LDAP.

Using LDAP and AD, you can tie users in with your corporate directory. When they change roles or leave the company, they can be removed by HR from your database group. Practically, this means there are automated systems in place to ensure only those you want to access the data manually can do so, without accidently missing something. It is important to note that the MongoDB Community Edition© does not have LDAP support. For this you need MongoDB Inc’s© Enterprise build. You can also use Percona Server© for MongoDB. Percona’s build is open source and free, and we offer a support contract so that if you require support or help it is available.


Role-based Authorization or (RBAC) is core to MongoDB Community©, MongoDB Enterprise© and Percona Server for MongoDB. In Mongo 2.6+ you can use built-in roles, or even craft your own down to what actions someone might be able to do – only exposing exactly what you want users to be able to do with razor precision. This is such a core MongoDB feature that it is everywhere and in every build regardless of vendor.


As I mentioned before, this is broken into two areas of discussion: At-Rest and In-Transit

At-Rest.: This is defined as on-disk, and typically refers to data saved to an encrypted storage location. This is to prevent theft by physical means and create backups that are stored in a fashion not easily read by any third party. There are practical limits to this. The biggest are trusting your sys-admins and assuming that a hacker has not been able to get access to the system to pretend they are one of your admins. This is not an issue unique to MongoDB or Percona. Such systems used more globally work here as well. They might include things like LUKS/cryptfs, or might go into even more secure methods such as signing encryption keys with LDAP, Smart Cards, and RSA type tokens.

When doing this level of encryption, you need to consider things like automounting and decrypting of drives. However, the general point is this is not something new to your system administrators, and they can manage this requirement in the same way they manage it in other parts of the system. The added benefit is a single procedure for storage encryption, not one per whatever technology a particular function uses.

In-Transit.: To keep this simple, we are just talking about using SSL on all connections (it can be as simple as that in the end). Depending on your internal requirements,©  and Percona Server for MongoDB all additionally support custom Signing Authorities (CA), x509 clients and member certificates. The x509 system is very useful when you want to make sure only authorized machines are allowed to talk to your system because they can even attempt to send a user/password to the system.


Put simply; this is the ability to enforce complex standards on the system by using Document Validation. This is an important feature that is available to MongoDB Community©, MongoDB Enterprise© and Percona Server for MongoDB. Governance is about the insertion and updating of data. It is also useful for checking if a field name like bday, birthday, ssn, social, ect is defined. We are not limited to those: you could also do string regex’s on things like user_id to check for a $regex such as “^d{3}-d{2}-d{4}$” (which would be a Social Security Number), or a checking for a credit card number. These examples are ways your DBAs and security architects can help prevent developers from exposing the company to added risk.

You can also ensure schema changes only occur when properly vetted by your DBA staff, as the developer code could fail if they change the format of what you’re storing in the database. This brings an additional layer of control to MongoDB’s dynamic production schema (allowing itself to store anything even if it should not).


Central to any good security design – and required by PCI – is being able to track what user did what action in the database (very similar to how you need to do it on your actual servers). At this time, MongoDB’s Community© build can’t track this. However, both MongoDB Enterprise©  and Percona Server for MongoDB support this feature. Both work in similar ways, allowing you to filter output to a particular user, database, collection, or source location. This gives you a log to review in any security incident and, more importantly, shows your PCI auditor that you’ve taken the correct steps to both protect your database from an intrusion and understand and incursions depth (should one occur).

Hopefully, this has been a good overview of the security options in MongoDB Community© and Enterprise© versions and Percona Server for MongoDB. Even without an enterprise-style contract, you can fulfill all your PCI compliance needs and protect your company using reasonable and known methods. Please note Percona strives to bring enterprise features to the community, but not to the point of wanting to lock you into a non-open source build. If you need support with anything MongoDB, we have the community and its users as our first priority!

Categories: MySQL

Scaling Percona XtraDB Cluster with ProxySQL in Kubernetes

MySQL Performance Blog - Thu, 2016-06-16 18:38

How do you scale Percona XtraDB Cluster with ProxySQL in Kubernetes?

In my previous post I looked how to run Percona XtraDB Cluster in a Docker Swarm orchestration system, and today I want to review how can we do it in the more advanced Kubernetes environment.

There are already some existing posts from Patrick Galbraith ( and Raghavendra Prabhu ( on this topic. For this post, I will show how to run as many nodes as I want, see what happens if we add/remove nodes dynamically and handle incoming traffic with ProxySQL (which routes queries to one of working nodes). I also want to see if we can reuse the ReplicationController infrastructure from Kubernetes to scale nodes to a given number.

These goals should be easy to accomplish using our existing Docker images for Percona XtraDB Cluster (, and I will again rely on the running service discovery (right now the images only work with etcd).

The process of setting up Kubernetes can be pretty involved (but it can be done; check out the Kubernetes documentation to see how: It is much more convenient to use a cloud that supports it already (Google Cloud, for example). I will use Microsoft Azure, and follow this guide: Unfortunately the scripts from the guide install previous version of Kubernetes (1.1.2), which does not allow me to use ConfigMap. To compensate, I will duplicate the ENVIRONMENT variables definitions for Percona XtraDB Cluster and ProxySQL pods. This can be done more optimally in the recent version of Kubernetes.

After getting Kurbernetes running, starting Percona XtraDB Cluster with ProxySQL is easy using following pxc.yaml file (which you also can find with our Docker sources

apiVersion: v1 kind: ReplicationController metadata: name: pxc-rc app: pxc-app spec: replicas: 3 # tells deployment to run N pods matching the template selector: app: pxc-app template: # create pods using pod definition in this template metadata: name: pxc labels: app: pxc-app spec: containers: - name: percona-xtradb-cluster image: perconalab/percona-xtradb-cluster:5.6test ports: - containerPort: 3306 - containerPort: 4567 - containerPort: 4568 env: - name: MYSQL_ROOT_PASSWORD value: "Theistareyk" - name: DISCOVERY_SERVICE value: "" - name: CLUSTER_NAME value: "k8scluster2" - name: XTRABACKUP_PASSWORD value: "Theistare" volumeMounts: - name: mysql-persistent-storage mountPath: /var/lib/mysql volumes: - name: mysql-persistent-storage emptyDir: {} imagePullPolicy: Always --- apiVersion: v1 kind: ReplicationController metadata: name: proxysql-rc app: proxysql-app spec: replicas: 1 # tells deployment to run N pods matching the template selector: front: proxysql template: # create pods using pod definition in this template metadata: name: proxysql labels: app: pxc-app front: proxysql spec: containers: - name: proxysql image: perconalab/proxysql ports: - containerPort: 3306 - containerPort: 6032 env: - name: MYSQL_ROOT_PASSWORD value: "Theistareyk" - name: DISCOVERY_SERVICE value: "" - name: CLUSTER_NAME value: "k8scluster2" - name: MYSQL_PROXY_USER value: "proxyuser" - name: MYSQL_PROXY_PASSWORD value: "s3cret" --- apiVersion: v1 kind: Service metadata: name: pxc-service labels: app: pxc-app spec: ports: # the port that this service should serve on - port: 3306 targetPort: 3306 name: "mysql" - port: 6032 targetPort: 6032 name: "proxyadm" # label keys and values that must match in order to receive traffic for this service selector: front: proxysql

Here is the command to start the cluster:

kubectl create -f pxc.yaml

The command will start three pods with Percona XtraDB Cluster and one pod with ProxySQL.

Percona XtraDB Cluster nodes will register themselves in the discovery service and we will need to add them to ProxySQL (it can be done automatically with scripting, for now it is a manual task):

kubectl exec -it proxysql-rc-4e936

Increasing the cluster size can be done with the scale command:

kubectl scale --replicas=6 -f pxc.yaml

You can connect to the cluster using a single connection point with ProxySQL: You can find it this way:

kubectl describe -f pxc.yaml Name: pxc-service Namespace: default Labels: app=pxc-app Selector: front=proxysql Type: ClusterIP IP: Port: mysql 3306/TCP Endpoints: <none> Port: proxyadm 6032/TCP Endpoints: <none> Session Affinity: None

It exposes the endpoint IP address and two ports: 3306 for the MySQL connection and 6032 for the ProxySQL admin connection.

So you can see that scaling Percona XtraDB Cluster with ProxySQL in Kubernetes is pretty easy. In the next post, I want to run benchmarks in the different Docker network environments.

Categories: MySQL

Why MongoRocks: Deprecating PerconaFT and MongoDB Optimistic locking

MySQL Performance Blog - Thu, 2016-06-16 13:51

In this post, we’ll discuss the rationale behind deprecating PerconaFT and embracing RocksDB.

Why is Percona deprecating PerconaFT in favor of RocksDB?

Many of you may have seen Peter Zaitsev’s recent post about Percona embracing RocksDB and deprecating PerconaFT. I’m going to shed a bit more light on the issues between the locking models for PerconaFT’s and MongoDB’s core servers. When making this decision, we looked at how the differences between the engines measure up and impact other improvements we could make. In the end, we can do more for the community by focusing on engines that are in line with assumptions the core server makes every second in your daily operations.Then we have more resources available for improving the users’ experience by adding new tools, features, and improving the core server.

What is pessimistic locking?

Pessimistic locking locks an entity in the database for the entire time that it is actively used in application memory. A lock either limits or prevents other users from working with the entity in the database. A write lock indicates that the holder of the lock intends to update the entity and disallows anyone from reading, updating, or deleting the object. A read lock means that the owner of the lock does not want the object to change while it holds the lock, allowing others to read the entity but not update or delete it. The scope of a lock might be the entire database, a table, a collection of ranges of documents or a single document.

You can order pessimistic locks as follows (from broad to granular):

  1. Database locks
  2. Collection locks
  3. Range locks
  4. Document locks

The advantage of pessimistic locking is that changes to the database get made consistently and safely. The primary disadvantage is that this approach isn’t as scalable. The chance of waiting for a lock to be released increases when:

  • A system has a lot of users
  • The transactions (in MongoDB, there are transactions in the engine but not at the user level) involve a greater number of entities
  • When transactions are long-lived

Therefore, pessimistic locks limit the practical number of simultaneous users that your system can support.

What is optimistic locking?

In most database systems (NoSQL and RDBMS) expect collisions to be relatively uncommon. For example, although two clients are working with user objects, one might be working with the Bob Vader object while another works with the Luke Vader object. These won’t collide. In this case, optimistic locking becomes the most viable concurrency control strategy. If you accept the premise that collisions infrequently occur, instead of trying to prevent them you can choose to detect and then resolve them when they do occur.

MongoDB has something called a Write Conflict Exception (WCE). A WCE is like an engine-level deadlock. If a record inside the engine changes due to thread #1, thread #2 must wait for a time that it is safe to change the record, and retry then. Typically this occurs when a single document gets updated frequently. It can also occur when there are many updates, or there are ranges of locks happening concurrently. This is a perfect case of optimistic locking, preferring to resolve or retry operations when they occur rather than prevent them from happening.

Can you make these play well while limiting the amount of development resources needed?

These views are as polar opposite as you can get in the database world. In one view you lock as much as possible, preventing anyone else from making a change. In the other view you let things be as parallel as possible, and accept you will retry if two clients are updating the same document. With the nature of how many documents fit in a single block of memory, this has some real-world concerns. When you have more than one document in a memory block, you could have a situation where locking one document means 400% more documents get affected. For example, if we have an update using the IN operator with 25 entries, you could be blocking 125 documents (not 25 documents)!

That escalated rather quickly, don’t you think? Using optimistic locking in the same situation, you at most would have to retry five document write locks as the data changed. The challenge for optimistic locking is that if I have five clients that are all updating all documents, you get a flurry of updates. WCE’s come in and eventually resolve things. If you use pessimistic locking, everybody waits their turn, and each one would finish before the next could run.

Much of Percona’s engineering effort goes into what types of systems we should put in place to simulate cheap latches or locks in optimistic locking to allow pessimistic locking to work (without killing performance). This requires an enormous amount of work just to get on-par behavior from the system – specifically in update type workloads, given delete/inserts are very similar in the systems. As a result, we’ve spent more time improving the engine rather than adding additional variables and instrumentation.

Looking forward, MongoRocks aligns more to WiredTiger in its locking structure (they both run as log sequence merges or LSMs), and this means more time working on new optimizer patterns, building things to improve diagnostics or tuning the engine/system to your specific needs. We think you will be excited to see some of the progress we have been discussing for Mongo 3.4 (some of which might even make it directly into 3.2).

What is the MongoRocks anyhow and how does it compare to PerconaFT?

The last concept I want to cover is what RocksDB is exactly, what its future is and how it stacks up to PerconaFT. The most important news is Facebook is working on the core engine, which is used both by MyRocks and MongoRocks (you might have seen some of their talks on the engine). This means Percona can leverage some of the brilliant people working on RocksDB inside Facebook and focus instead on the API linking the engine into place, as well as optimizing how it uses the engine – rather than building the engine completely. Facebook is making some very significant bets on the backend use of RocksDB in several parts of the system, and potentially some user parts of the system (which have historically used InnoDB).

So what is RocksDB, and how does it work? Its core is an LSM system, which means it puts new data into the newest files as an append. Over time, the files get merged into five different levels (not covered here). As part of this process, when you have an update a new copy of the data is saved at the end of the latest file, and a memory table points a record to that location for the latest “version” of that data. In the background, the older records are marked for deletion by something called a “tombstone.” There is a background thread merging still-relevant data from old files into new files so that empty old files get unlinked from the filesystem.

This streamlines the process better than B-Tree’s constant rebalancing and empty blocks in files that need to be compacted or re-used over time. Being write-optimized means that, like PerconaFT previously, it will be faster for write-based operations than something like WiredTiger. (WiredTiger in some cases can be faster on reads, with MMAP being the fastest possible approach for reads.) This also means things like TTL’s can work pretty well in an LSM since all the items that were inserted in time order age out, and the engine can just delete the unneeded file. This solves some of the reasons people needed partitions to begin with, and it still allows sharding to work well.

We are also very excited about creating additional tools that let you query your backups in MongoRocks, as well as some of the very simple ways it will take binary-based backups quickly, consistently (even when sharded) and continually.

I hope this explains more about lock types and what their implications mean as a follow up to Peter’s blog post about the direction Percona is moving regarding PerconaFT and MongoRocks. If you want to ask any more questions, or would like another blog that covers some areas of the system more deeply, please let us know via this blog, email, twitter or even pigeon!

Categories: MySQL

Troubleshooting hardware resource usage webinar: Q & A

MySQL Performance Blog - Wed, 2016-06-15 22:12

In this blog, I provide answers to the Q & A for the Troubleshooting hardware resource usage webinar.

First, I want to thank everybody who attended the May 26 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: How did you find the memory IO LEAK?

A: Do you mean the replication bug I was talking about in the webinar? I wrote about this bug here. See also comments in the bug report itself.

Q: Do you have common formulas you use to tune MySQL?

A: There are best practices: relate thread concurrency to number of CPU cores you have, set InnoDB buffer pool size large enough so it can contain all your working dataset (which is not always possible), and do not set the Query Cache size larger than 512MB (or even better, turn it off) to avoid issues with global lock set when it needs to be de-fragmented. I prefer not to call them “formulas,” because all options need to be adjusted to match the workload. If this weren’t the case, MySQL Server would have an automatic configuration. There is also a separate webinar on configuration (Troubleshooting configuration issues) where I discuss these practices.

Q: Slide 11: is this real time? Can we get this info for a job that has already finished?

A: Yes, this is real time. No, it is not possible to get this info for a thread that does not exist.

Q: Slide 11: what do negative numbers mean?

A: Numbers are taken from the CURRENT_NUMBER_OF_BYTES_USED field for table memory_summary_by_thread_by_event_name in Performance Schema. These values, in turn, are calculated as (memory allocated by thread) – (memory freed by thread). Negative numbers here mean either a memory leak or incorrect calculation of memory used by the thread. I reported this behavior in the MySQL Bugs database. Please subscribe to the bug report and wait to see how InnoDB and Performance Schema developers answer.

Q: Are TokuDB memory usage stats recorded in the sys.memory_by_thread_by_current_bytes  table also?  Do we have to set something to enable this collection? I ran the query, but it shows 0 for everything.

A: TokuDB currently does not support Performance Schema, thus its memory statistics are not instrumented. See the user manual on how memory instrumentation works.

Q: With disk what we will check for disk I/O?

A: I quite don’t understand the question. Are you asking on which disk we should check IO statistics? For datadir and other disks, look at the locations where MySQL stores data and log files (if you set custom locations).

Q: How can we put CPU in parallel to process multiple client requests? Put multiple requests In memory by parallel way. By defining transaction. Or there any query hints?

A: We cannot directly put CPU in parallel, but we can affect it indirectly by tuning InnoDB threads-related options (innodb_threads_concurrency, innodb_read_io_threads, innodb_write_io_threads) and using the thread pool.

Q: Is there any information the Performance Schema that is not found in the SYS schema?

A: Yes. For example, sys schema does not have a view for statistics about prepared statements, while Performance Schema does, because sys schema takes its statement statistics from digest tables (which make no sense for prepared statements).

Q: What is your favorite tool to investigate a memory issue with a task/job that has already finished?

A: I don’t know that there is such a tool suitable for use in production. In a test environment, you can use valgrind or similar tools. You can also make core dumps of the mysqld process and investigate them after the issue is gone.

Categories: MySQL

Installing MongoDB 3.2 and upgrading MongoDB replica set

MySQL Performance Blog - Wed, 2016-06-15 20:55

In this post, we’ll examine a couple of ways for upgrading MongoDB replica set.

With the release of MongoDB 3.2, comes a rash of new features and improvements. One of these enhancements is improved replica sets. From MongoDB: “A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.”

Config servers are replica sets!

This is HUGE. It signals a significant advancement in backups, metadata stability and overall maturity. It is a very long-awaited feature that shows MongoDB is maturing. It means:

  • Mongos’ can retry connection vs error
  • Unified and consistent backups!
  • Up to 50 secondaries
    • Remove bugs with Mongos’ not near config servers!

How do we activate all these new awesome features? Let’s do it!

Upgrading to 3.2
  • Replace binaries and restart one secondary at a time
  • Then primaries as well
  • Restart configs in reverse order
    • If configdb=con1, con2, con3
      • Restart con3, con2, and then finally con1 with 3.2
      • Do con1 as FAST as possible, while the balancer is also disabled
    • You no longer need to restart a mongos –upgrade (as of 3.2)
    • Restart all mongos, this will reset ALL connections at some point (whether you do at once or space it out).
Upgrading the replset to the new protocol

This is by far the easiest upgrade bit but DON’T do it until you know your stable on 3.2. Log into each primary and run:

>cfg = rs.conf(); { "_id" : "r1", "version" : 2, "members" : [ { "_id" : 0, "host" : "localhost:17001" }, { "_id" : 1, "host" : "localhost:17002" }, { "_id" : 2, "host" : "localhost:17003", } ] } >cfg.protocolVersion=1; >rs.reconfig(cfg); { "ok" : 1, "$gleStats" : { "lastOpTime" : Timestamp(1464947003, 1), "electionId" : ObjectId("7fffffff0000000000000018") } }


>db.getSiblingDB(‘config’).shards.forEach(function(shard){ x = new Mongo(; /* Assumes no auth needed */ conf =x.getDB("local").system.replset.findOne() conf.protcolVersion=1;conf.version++; x.getDB(‘admin’).runCommand({ replSetReconfig: conf }); });

The quick upgrade scripts

Does what it says: kills every process and launches them on 3.2 binaries with no other changes.


Simply runs the quick rs.reconfig() on each primary, adds the new settings to enable to new replication features.

Let’s upgrade the configs the right way!

This is not included as part of a normal upgrade so only do this AFTER you’re stable and don’t do it before upgrading the protocolVersion we just talked about. (I mean it! Disregard this advice and your life will seriously not be awesome!)

Upgrading to a Config ReplicaSet ( the official way)
  1. Run rs.initiate on the first config in the list (must be 3.2.4+)
    • Must be a fully configured document with configsrv:true defined.
  2. Restart same config server adding
    • configsvrMode = sccc
    • replSet = <name used in rs.initiate()>
    • storageEngine= WiredTiger
  3. Start the new config servers for the other two nodes (should be a new dbpath and port)
  4. Add those nodes to the replSet and check their status
  5. Remove the second original config server from the running
  6. Restart the 1st node you set “sccc” on to not have that setting
  7. At this point, the 1st node will transition to removed if using MMAP.
  8. Restart a mongos with a new configdb line
    • –configdb <replSetName>/node1:port,node2:port,…
    • Only replset members should be listed
  9. Verify you can work and query through mongos
  10. Repeat on all mongos
  11. Remove the 1st node with rs.remove
  12. Shutdown final original config and enable balancer
There is also an easy way.

The easy way, with a small maintenance window, which lets you just restore a good backup and have a nice and simple rollback plan:

  1. Stop all mongos after backing up the config directory
  2. Run rs.initiate on first config server
  3. Stop the 2nd, then the 3rd, restarting them with an empty dbpath directory
  4. Check the rs.status now
  5. Stop the 1st config server and restart with an empty dbpath directory
  6. Check Status
  7. Restart all mongos, adding <replSetName>/ to the front of the configdb line.
  8. Done!

Oh look there is a quick script we have made for you:

    • Kill all  config and mongos processes
    • Restart the first config server on non-standard port
    • Mongodump config database
    • Restart c1 as WiredTiger, clearing that data path
    • Import dump back into first config server
    • Restart on normal port
    • Initialize Replica Set
    • Restart second and third config server after clearing dbpath folder
    • After the initial sync, start all the mongos.
    • Done and script exits!


Categories: MySQL
Syndicate content