How to improve InnoDB performance by 55% for write-bound loads

MySQL Performance Blog - Fri, 2014-05-23 10:00

During April’s Percona Live MySQL Conference and Expo 2014, I attended a talk on MySQL 5.7 performance an scalability given by Dimitri Kravtchuk, the Oracle MySQL benchmark specialist. He mentioned at some point that the InnoDB double write buffer was a real performance killer. For the ones that don’t know what the innodb double write buffer is, it is a disk buffer were pages are written before being written to the actual data file. Upon restart, pages in the double write buffer are rewritten to their data files if complete. This is to avoid data file corruption with half written pages. I knew it has an impact on performance, on ZFS since it is transactional I always disable it, but I never realized how important the performance impact could be. Back from PLMCE, a friend had dropped home a Dell R320 server, asking me to setup the OS and test it. How best to test a new server than to run benchmarks on it, so here we go!

ZFS is not the only transactional filesystem, ext4, with the option “data=journal”, can also be transactional. So, the question is: is it better to have the InnoDB double write buffer enabled or to use the ext4 transaction log. Also, if this is better, how does it compare with xfs, the filesystem I use to propose but which do not support transactions.


The goal is to stress the double write buffer so the load has to be write intensive. The server has a simple mirror of two 7.2k rpm drives. There is no controller write cache and the drives write caches are disabled. I decided to use the Percona tpcc-mysql benchmark tool and with 200 warehouses, the total dataset size was around 18G, fitting all within the Innodb buffer pool (server has 24GB). Here’re the relevant part of the my.cnf:

innodb_read_io_threads=4 innodb_write_io_threads=8 #To stress the double write buffer innodb_buffer_pool_size=20G innodb_buffer_pool_load_at_startup=ON innodb_log_file_size = 32M #Small log files, more page flush innodb_log_files_in_group=2 innodb_file_per_table=1 innodb_log_buffer_size=8M innodb_flush_method=O_DIRECT innodb_flush_log_at_trx_commit=0 skip-innodb_doublewrite #commented or not depending on test

So, I generated the dataset for 200 warehouses, added they keys but not the foreign key constraints, loaded all that in the buffer pool with a few queries and dumped the buffer pool. Then, with MySQL stopped, I did a file level backup to a different partition. I used the MySQL 5.6.16 version that comes with Ubuntu 14.04, at the time Percona server was not available for 14.04. Each benchmark followed this procedure:

  1. Stop mysql
  2. umount /var/lib/mysql
  3. comment or uncomment skip-innodb_doublewrite in my.cnf
  4. mount /var/lib/mysql with specific options
  5. copy the reference backup to /var/lib/mysql
  6. Start mysql and wait for the buffer pool load to complete
  7. start tpcc from another server

The tpcc_start I used it the following:

./tpcc_start -h10.2.2.247 -P3306 -dtpcc -utpcc -ptpcc -w200 -c32 -r300 -l3600 -i60

I used 32 connections, let the tool run for 300s of warm up, enough to reach a steady level of dirty pages, and then, I let the benchmark run for one hour, reporting results every minute.

ResultsTest:Double write bufferFile system optionsAverage NOPTM over 1hext4_dwYesrw690ext4_dionolock_dwYesrw,dioread_nolock668ext4_nodwNorw1107ext4trx_nodwNorw,data=journal1066xfs_dwYesxfs rw,noatime754


So, from the above table, the first test I did was the common ext4 with the Innodb double write buffer enabled and it yielded 690 new order transactions per minute (NOTPM). Reading the ext4 doc, I also wanted to try the “dioread_nolock” setting that is supposed to reduce mutex contention and this time, I got slightly less 668 NOTPM. The difference is within the measurement error and isn’t significant. Removing the Innodb double write buffer, although unsafe, boosted the throughput to 1107 NOTPM, a 60% increase! Wow, indeed the double write buffer has a huge impact. But what is the impact of asking the file system to replace the innodb double write buffer? Surprisingly, the performance level is only slightly lower at 1066 NOTPM and vmstat did report twice the amount writes. I needed to redo the tests a few times to convince myself. Getting a 55% increase in performance with the same hardware is not common except when some trivial configuration errors are made. Finally, I used to propose xfs with the Innodb double write buffer enabled to customers, that’s about 10% higher than ext4 with the Innodb double write buffer, close to what I was expecting. The graphic below presents the numbers in a more visual form.

TPCC NOTPM for various configurations

In term of performance stability, you’ll find below a graphic of the per minute NOTPM output for three of the tests, ext4 non-transactional with the double write buffer, ext4 transactional without the double write buffer and xfs with the double write buffer. The dispersion is qualitatively similar for all three. The values presented above are just the averages of those data sets.

TPCC NOTPM evolution over time


Innodb data corruption is not fun and removing the innodb double write buffer is a bit scary. In order to be sure it is safe, I executed the following procedure ten times:

  1. Start mysql and wait for recovery and for the buffer pool load to complete
  2. Check the error log for no corruption
  3. start tpcc from another server
  4. After about 10 minutes, physically unplug the server
  5. Plug back and restart the server

I observed no corruption. I was still a bit preoccupied, what if the test is wrong? I removed the “data=journal” mount option and did a new run. I got corruption the first time. So given what the procedure I followed and the number of crash tests, I think it is reasonable to assume it is safe to replace the InnoDB double write buffer by the ext4 transactional journal.

I also looked at the kernel ext4 sources and changelog. Up to recently, before kernel 3.2, O_DIRECT wasn’t supported with data=journal and MySQL would have issued a warning in the error log. Now, with recent kernels, O_DIRECT is mapped to O_DSYNC and O_DIRECT is faked, always for data=journal, which is exactly what is needed. Indeed, I tried “innodb_flush_method = O_DSYNC” and found the same results. With older kernels I strongly advise to use the “innodb_flush_method = O_DSYNC” setting to make sure files are opened is a way that will cause them to be transactional for ext4. As always, test thoroughfully, I only tested on Ubuntu 14.04.

Impacts on MyISAM

Since we are no longer really using O_DIRECT, even if set in my.cnf, the OS file cache will be used for InnoDB data. If the database is only using InnoDB that’s not a big deal but if MyISAM is significantly used, that may cause performance issues since MyISAM relies on the OS file cache so be warned.

Fast SSDs

If you have a SSD setup that doesn’t offer a transactional file system like the FusionIO directFS, a very interesting setup would be to mix spinning drives and SSDs. For example, let’s suppose we have a mirror of spinning drives handled by a raid controller with a write cache (and a BBU) and an SSD storage on a PCIe card. To reduce the write load to the SSD, we could send the file system journal to the spinning drives using the “journal_path=path” or “journal_dev=devnum” options of ext4. The raid controller write cache would do an awesome job at merging the write operations for the file system journal and the amount of write operations going to the SSD would be cut by half. I don’t have access to such a setup but it seems very promising performance wise.


Like ZFS, ext4 can be transactional and replacing the InnoDB double write buffer with the file system transaction journal yield a 55% increase in performance for write intensive workload. Performance gains are also expected for SSD and mixed spinning/SSD configurations.

The post How to improve InnoDB performance by 55% for write-bound loads appeared first on MySQL Performance Blog.

Categories: MySQL

Engineering behind EXPLAIN FORMAT=JSON (or lack thereof)

Sergey Petrunia's blog - Thu, 2014-05-22 22:11

MySQL 5.6 has added support for EXPLAIN FORMAT=JSON. The basic use case for that feature is that one can look at the JSON output and see more details about the query plan. More advanced/specific use cases are difficult, though. The problem is, you can’t predict what EXPLAIN FORMAT=JSON will produce. There is no documentation or any kind of convention regarding the contents of JSON document that you will get.

To make sure I’m not missing something, I looked at MySQL Workbench. MySQL Workbench has a feature called Visual Explain. If you want to use, prepare to seeing this a lot:

In Workbench 6.1.4 you get it for (almost?) any query with subquery. In Workbench 6.1.6 (released last week), some subqueries work, but it’s still easy to hit a query whose EXPLAIN JSON output confuses workbench.

Looking at the source code, this seems to be just the start of it. The code in MySQL Server is not explicitly concerned with having output of EXPLAIN FORMAT=JSON conform to any convention. Workbench also has a rather ad-hoc “parser” that walks over JSON tree and has certain arbitrary expectations about what nodes should be in various parts of the JSON document. When these two meet, bugs are a certainty. I suspect the real fun will start after a few releases of the Server (fixing stuff and adding new features) and Workbench (trying to catch up with new server while supporting old ones).

My personal interest in all this is that we want to support EXPLAIN JSON in MariaDB. MariaDB optimizer has extra features, so we will have to extend EXPLAIN JSON. I was looking for a way to do it in a compatible way. However, current state of EXPLAIN JSON in MySQL doesn’t give one a chance.

Categories: MySQL

A technical WebScaleSQL review and comparison with Percona Server

MySQL Performance Blog - Thu, 2014-05-22 10:00

The recent WebScaleSQL announcement has made quite a splash in the MySQL community over the last few weeks, and with a good reason. The collaboration between the major MySQL-at-scale users to develop a single code branch that addresses the needs of, well, web scale, is going to benefit the whole community. But I feel that the majority of community opinions and comments to date have been based on the announcement itself and the organizational matters only. What we have been missing is an actual look at the code. What actual new features and bug-fixes are there? Let’s take a look.

At the same time, as Percona is also a developer of an enhanced MySQL replacement database server, it’s natural to try to compare the two. So let’s try to do that as well, but an important caveat applies here. Both MySQL branches (a branch and an upstream-tracking fork would be more exact) are being developed with different goals and for different end users. WebScaleSQL is all things scale-performance: diagnostics, specific features-for a relatively narrow and highly proficient group of users. There are no binary releases, the code base is supposed to serve as a basis for further code branches, specific to each corporate contributor. On the other hand, Percona Server is a general-purpose server that is developed with broad input from Percona’s customers, professional services departments, and general community. Thus, it would be unfair to say that one of the branches should be considered better than the other just because a certain feature is missing or not reaching as far. The software serves different needs.

The rest of this post is an annotated list of WebScaleSQL-specific code commits: user-visible features, performance fixes, general fixes, and finally the stuff of interest to developers.

New features
  • Ability for clients to specify millisecond (as opposed to second in MySQL) read/write/connect timeouts. This patch also carries an internal cleanup to introduce a timeout data type to avoid second-milisecond unit conversion errors.
  • Super read-only when regular read-only is not enough, that is, when writes by SUPER users need to be prevented as well.

All of the above are absent from Percona Server but possible to merge if there is interest. I’d also note that this list is rather short at the moment with some obvious stuff missing, such as the user statistics patch. I’d expect this to change in near future.

Performance-related features and fixes

It is Web Scale, remember.

These two changes require understanding on the user’s part of what are the tradeoffs. They are missing in Percona Server, but, again, possible to merge if there is interest in them.

Again Percona Server does not carry these. Different from the previous ones, these should be safe for every single user and we could merge them without having to give any further thought to their merits. Oracle MySQL should do the same.

That is one general performance-related change that Percona Server has too.

InnoDB flushing performance fixes

These would belong to the previous section, but I’d like to highlight them separately. We spent a lot of effort to analyse and improve the 5.6 InnoDB flushing before the Percona Server 5.6 GA release and continue to do so in the point releases. The WebScaleSQL changes below show that we and they have discovered a lot of identical improvement areas independently, and provided different fixes for the same issues. For an overview of XtraDB 5.6 changes in this area, go here and here. Note that these changes are somewhat more extensive, especially for the high-concurrency cases.

  • Back port of 5.7 WL #7047 and a fix for MySQL bug #71411 (fixed in Percona Server too). WL #7047 reduces the buffer list scan complexity. We have identified the same issue but attempted to work around it with flushing heuristic tweaks.
  • Fix for MySQL bugs #70500 and #71988 to remove potential flushing instabilities. Both fixed in Percona Server.
  • Fix for MySQL bug #62534, enabling finer-grained setting of innodb_max_dirty_pages_pct and unbreaking it for value zero. Not fixed in Percona Server, but the Oracle fix should be coming in 5.6.19.
  • Fix for MySQL bug #70899, removing redundant flush on server startup, which should speed up crash recovery with large buffer pools. Not fixed in Percona Server. Oracle fix expected in 5.6.20.
  • Ability to specify idle system flushing rate. Absent in Percona Server. I believe it should be possible to get the same effect by tuning existing variables: setting innodb_io_capacity lower and innodb_io_capacity_max higher, but it needs experimenting before being able to tell for sure.
General fixes

Fixes for assorted MySQL bugs. None of them are present in Percona Server, they might be merged as needed. Our own list of assorted MySQL bugs we have fixed is here. I have omitted fixes for MySQL developer-specific bugs, these are listed in the next session.

  • Fix for MySQL bug #64751 – Make NO_UNSIGNED_SUBTRACTION SQL mode work for additions too, i.e. unifying the cases of “foo – 1″ and “foo + -1″.
  • Stop spawning one extra thread on server startup to work around a bug in glibc that was fixed in 2006. Interestingly I was not able to find any MySQL bug report for this. Anyone?
  • Preserve slave I/O thread connection settings if compression is used. Again, is there a MySQL bug report for this?
  • Fix for MySQL bug #64347 –  database option mix-up if lower_case_table_names = 0 and the database names differ only by case.
Build changes

Making MySQL play nicer with the system libs, and other assorted changes.

  • Static linking of semisync replication plugins, based on a MariaDB patch. This might have a performance angle to it – MySQL bug #70218?
  • Do not embed OpenSSL and zlib in the static libraries.
  • Fix building with system OpenSSL and zlib. Using system libraries whenever possible make packaging easier and more conformant to Linux distribution requirements. We have been working on this too.
  • Fix building with system libreadline (MySQL bug #63130, closed without fix). Likewise.
Developer changes

These are patches of interest to MySQL / WebScaleSQL developers and not immediately visible for end users. I’m omitting some things, such as testcase compatibility with various build options, testsuite timeout tweaks, and patches that integrate with tools used for project development: Jenkins, Phabricator, etc.

  • Switch to C++11 and C99, the newer C and C++ language versions. It’s a big change from development perspective and one that is possible to pull off only if the project does not need to support older systems and their compilers (or even the newest compilers on some platforms). This is precisely the kind of thing that is easiest to implement for WebScaleSQL than for everybody else. As for the benefits of the change, the project already makes use of C++11 memory model – see the next item.
  • An efficient atomic stat counter framework, using C++11. I wonder how its performance compares to that of get_sched_indexer_t, which is present in Oracle MySQL 5.6, but not used?
  • Making the Performance Schema MTR suite slightly more sane by not recording stuff that tests nothing and at the same time is prone to change. Performance Schema MTR bits are something I’m sure every single 5.6 branch developer has encountered. This particular commit fixes MySQL bug #68714. Fixed in Percona Server. This is useful if one configures the build to re-enable the Performance Schema.
  • More of the same. Half of that commit fixes MySQL bug #68635, which is fixed in Percona Server too but unfortunately was considered by upstream as not requiring any fixes.
  • Stabilising the MTR testsuite, SHOW PROCESSLIST bits. Is there a MySQL bug report for this?
  • Stabilising the MTR testsuit, missing ORDER BY in 5.6.17 bits. Likewise, is there a bug report for this one?
  • Fix for MTR breaking if there is a ‘@’ somewhere in the working directories. Jenkins CI likes to put ‘@’ there. Same question re. bug report?
  • Unbreak a bunch of tests in the parts suite. This looks to me like MySQL bug #69252, but it has been already fixed by Oracle. Is the WebScaleSQL fix still required?
  • Re-enable AIO if MTR –mem option is passed.
  • Stress tests in MTR.
  • Fix compilation with Bison 3, based on a MariaDB patch (MySQL bug #71250). Fixed in Percona Server.
  • Fix compilation warnings (more). A bug report?
  • Fix uninitialised variable use warnings as reported by AddressSanitizer. Is there a bug report?
  • Fix a potential out-of-bound access, found by AddressSanitizer. Is there a MySQL bug for this?
  • Fix CMake confusion of two different ways to ask for a debug build resulting in different builds (MySQL bug #70647, fixed in 5.7).

Notice that the last list is quite long, especially if compared to the list of user-visible features added to date. That makes perfect sense for the project at this stage: building a solid development foundation first so that the features can follow in good quality and reduced maintenance effort.  Add a whole bunch of performance fixes to make a big picture view for today: A solid foundation for further development; numerous performance fixes; a few general fixes and new features.

As for comparing to Percona Server, currently the biggest overlap is in the performance-InnoDB flushing-fixes. For the rest, we can merge from WebScaleSQL as necessary – if you think that a certain WebScaleSQL feature or a fix would benefit you, drop us a line to discuss the options. And of course we invite WebScaleSQL to take any our fixes or patches they would like.

The post A technical WebScaleSQL review and comparison with Percona Server appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training)

MySQL Performance Blog - Wed, 2014-05-21 10:00

I’ve had the opportunity to train lots of people on Percona XtraDB Cluster (PXC) over the last few years the product has existed.  This has taken the form of phone calls, emails, blog posts, webinars, and consulting engagements. This doesn’t count all the time I’ve spent grilling Codership on how things actually work.  But it has culminated in the PXC tutorial I have given annually at Percona Live MySQL Conference and Expo in Santa Clara, California for the last two years.  Baron even attended this year and had this say:

“Jay Janssen’s tutorial on Percona XtraDB Cluster was impressive. I can’t imagine how much time he must have spent preparing for that.”

(Baron: Thanks for this compliment, BTW)  The answer is: I haven’t kept track, but I’ve delivered it 3 times at conferences and it gets better each time.  I started working on it during the summer of 2012 for Percona Live NYC that year and have put in countless hours since making it better.  This year, despite my voice problems, I was really happy with how it came out.

We’ve had lots of requests for repeats of this tutorial and with more detail.  As such, I’ll be doing a 2-day training about MySQL HA with Percona XtraDB Cluster in sunny San Jose, California on July 16th -17th 2014.  This Percona MySQL Training session is appropriately titled, “MySQL High Availability With Percona XtraDB Cluster” and you can get more information and also register here.

This course will include an overview of how Percona XtraDB Cluster fits into the MySQL High Availability world, followed by a deep-dive into PXC.  All the content of the tutorial will be covered, but it will be spread out with more details on all the components and important bits of using PXC and Galera.

If you’re unfamiliar with my tutorial — it is intensively hands-on.  This will not be 2 days of lecture: You’ll be setting up and experimenting with a real cluster on the fly during the class. Expect plenty of command-line work and to leave with a much better sense of how to truly operate and use a Percona XtraDB Cluster.

Once again, you can check out the details on our website and register NOW to attend right here.

The post MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training) appeared first on MySQL Performance Blog.

Categories: MySQL

Database auditing alternatives for MySQL

MySQL Performance Blog - Tue, 2014-05-20 15:08

Database auditing is the monitoring of selected actions of database users. It doesn’t protect the database in case privileges are set incorrectly, but it can help the administrator detect mistakes.

Audits are needed for security. You can track data access and be alerted to suspicious activity. Audits are required for data integrity. They are the only way to validate that changes made to data are correct and legal.

There are several regulations that require database audits:

  • Sarbanes-Oxley (SOX) Act of 2002 is a US federal law that regulates how financial data must be handled and protected.
  • Payment Card Industry Data Security Standard, otherwise known as PCI-DSS is an international standard developed to protect cardholder’s data.
  • Health Insurance Portability and Accountability Act (HIPAA) enacted by the U.S. Congress to protect medical and personal information.

MySQL since version 5.5.3 provides the Audit Plugin API which can be used to write an Audit Plugin. The API provides notification for the following events:

  • messages written to general log (LOG)
  • messages written to error log (ERROR)
  • query results sent to client (RESULT)
  • logins (including failed) and disconnects (CONNECT)

All current audit plugins for MySQL provide an audit log as result of their work. They differ in record format, filtering capabilities and verbosity of log records.

McAfee MySQL Audit Plugin
This plugin is available for MySQL versions 5.1, 5.5, 5.6. It does not officially support Percona Server and MariaDB. It doesn’t use the Audit API and has better verbosity and better filtering features. This is achieved by binary patching the server at runtime inserting the hooks which extract data stored in known offsets in memory. Thus, the plugin is sensitive to any changes of server code.


  • json log format
  • log to file or UNIX socket (allows to log with syslog-ng)
  • filter logged events by users, databases and tables, commands (insert, update, delete)

Oracle Enterprise Audit Log Plugin
Oracle provides this audit plugin as a part of the MySQL Enterprise pack. It uses the MySQL Audit API and is able to log RESULT and CONNECT events. The plugin has support for two XML-based formats.


  • XML format
  • log to file
  • filter by event type

MariaDB Audit Plugin
MariaDB developers extended the MySQL Audit API by adding fields for existing events and adding new TABLE event which notifies of operation with tables (read, write, create, drop, alter). The plugin can still be used with MySQL and Percona Server but MariaDB’s additions will not be available.


  • CSV log format
  • log to file or syslog
  • filter by users, event types

Percona Server Audit Log feature
Percona has developed an audit log feature that is a part of Percona Server since 5.5.35-37.0 and 5.6.17-65.0. It’s goal is to be compatible with Oracle’s Enterprise Audit Plugin providing a similar set of features for Percona Server users. It asynchronously logs all queries and connections in order to “audit” Percona Server usage, without the overhead of the General Query Log. The Audit Log feature can be very beneficial for web applications that deal with sensitive data (e.g., credit card numbers or medical records) and require security compliance (e.g., HIPAA or SOX). Administrators of multi-tenant applications or MySQL as a service can easily audit data access from a security and performance standpoint when using the Audit Log feature in Percona Server. The Audit Log feature is helpful for investigating and troubleshooting issues and auditing performance, too. The Audit Log feature can be dynamically enabled (does not require a server restart).

The post Database auditing alternatives for MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

Errant transactions: Major hurdle for GTID-based failover in MySQL 5.6

MySQL Performance Blog - Mon, 2014-05-19 07:00

I have previously written about the new replication protocol that comes with GTIDs in MySQL 5.6. Because of this new replication protocol, you can inadvertently create errant transactions that may turn any failover to a nightmare. Let’s see the problems and the potential solutions.

In short
  • Errant transactions may cause all kinds of data corruption/replication errors when failing over.
  • Detection of errant transactions can be done with the GTID_SUBSET() and GTID_SUBTRACT() functions.
  • If you find an errant transaction on one server, commit an empty transaction with the GTID of the errant one on all other servers.
  • If you are using a tool to perform the failover for you, make sure it can detect errant transactions. At the time of writing, only mysqlfailover and mysqlrpladmin from MySQL Utilities can do that.
What are errant transactions?

Simply stated, they are transactions executed directly on a slave. Thus they only exist on a specific slave. This could result from a mistake (the application wrote to a slave instead of writing to the master) or this could be by design (you need additional tables for reports).

Why can they create problems that did not exist before GTIDs?

Errant transactions have been existing forever. However because of the new replication protocol for GTID-based replication, they can have a significant impact on all servers if a slave holding an errant transaction is promoted as the new master.

Compare what happens in this master-slave setup, first with position-based replication and then with GTID-based replication. A is the master, B is the slave:

# POSITION-BASED REPLICATION # Creating an errant transaction on B mysql> create database mydb; # Make B the master, and A the slave # What are the databases on A now? mysql> show databases like 'mydb'; Empty set (0.01 sec)

As expected, the mydb database is not created on A.

# GTID-BASED REPLICATION # Creating an errant transaction on B mysql> create database mydb; # Make B the master, and A the slave # What are the databases on A now? mysql> show databases like 'mydb'; +-----------------+ | Database (mydb) | +-----------------+ | mydb | +-----------------+

mydb has been recreated on A because of the new replication protocol: when A connects to B, they exchange their own set of executed GTIDs and the master (B) sends any missing transaction. Here it is the create database statement.

As you can see, the main issue with errant transactions is that when failing over you may execute transactions ‘coming from nowhere’ that can silently corrupt your data or break replication.

How to detect them?

If the master is running, it is quite easy with the GTID_SUBSET() function. As all writes should go to the master, the GTIDs executed on any slave should always be a subset of the GTIDs executed on the master. For instance:

# Master mysql> show master statusG *************************** 1. row *************************** File: mysql-bin.000017 Position: 376 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-0800272864ba:1-30, 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7 # Slave mysql> show slave statusG [...] Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-0800272864ba:1-29, 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9 # Now, let's compare the 2 sets mysql> > select gtid_subset('8e349184-bc14-11e3-8d4c-0800272864ba:1-29, 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9','8e349184-bc14-11e3-8d4c-0800272864ba:1-30, 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7') as slave_is_subset; +-----------------+ | slave_is_subset | +-----------------+ | 0 | +-----------------+

Hum, it looks like the slave has executed more transactions than the master, this indicates that the slave has executed at least 1 errant transaction. Could we know the GTID of these transactions? Sure, let’s use GTID_SUBTRACT():

select gtid_subtract('8e349184-bc14-11e3-8d4c-0800272864ba:1-29, 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9','8e349184-bc14-11e3-8d4c-0800272864ba:1-30, 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7') as errant_transactions; +------------------------------------------+ | errant_transactions | +------------------------------------------+ | 8e3648e4-bc14-11e3-8d4c-0800272864ba:8-9 | +------------------------------------------+

This means that the slave has 2 errant transactions.

Now, how can we check errant transactions if the master is not running (like master has crashed, and we want to fail over to one of the slaves)? In this case, we will have to follow these steps:

  • Check all slaves to see if they have executed transactions that are not found on any other slave: this is the list of potential errant transactions.
  • Discard all transactions originating from the master: now you have the list of errant transactions of each slave

Some of you may wonder how you can know which transactions come from the master as it is not available: SHOW SLAVE STATUS gives you the master’s UUID which is used in the GTIDs of all transactions coming from the master.

How to get rid of them?

This is pretty easy, but it can be tedious if you have many slaves: just inject an empty transaction on all the other servers with the GTID of the errant transaction.

For instance, if you have 3 servers, A (the master), B (slave with an errant transaction: XXX:3), and C (slave with 2 errant transactions: YYY:18-19), you will have to inject the following empty transactions in pseudo-code:

# A - Inject empty trx(XXX:3) - Inject empty trx(YYY:18) - Inject empty trx(YYY:19) # B - Inject empty trx(YYY:18) - Inject empty trx(YYY:19) # C - Inject empty trx(XXX:3)


If you want to switch to GTID-based replication, make sure to check errant transactions before any planned or unplanned replication topology change. And be specifically careful if you use a tool that reconfigures replication for you: at the time of writing, only mysqlrpladmin and mysqlfailover from MySQL Utilities can warn you if you are trying to perform an unsafe topology change.

The post Errant transactions: Major hurdle for GTID-based failover in MySQL 5.6 appeared first on MySQL Performance Blog.

Categories: MySQL

Introduction to the Percona Server Audit Log feature

MySQL Performance Blog - Fri, 2014-05-16 13:00

Percona has developed an Audit Log feature that is now included in Percona Server since the recent 5.5 and 5.6 releases. This implementation is alternative to the MySQL Enterprise Audit Log Plugin: Percona re-implemented the Audit Plugin code as GPL as Oracle’s code was closed source. This post is a quick introduction to this plugin.

There are two ways to install the Percona MySQL Audit Plugin:


or in my.cnf

[mysqld] plugin-load=""

Verify installation

mysql> SHOW PLUGINS\G ... *************************** 38. row ***************************   Name: audit_log Status: ACTIVE   Type: AUDIT Library: License: GPL 38 rows in set (0.00 sec)

Let’s see variables provided by the Percona MySQL Audit Plugin:

mysql> show global variables like 'audit%'; +--------------------------+--------------+ | Variable_name            | Value        | +--------------------------+--------------+ | audit_log_buffer_size    | 1048576      | | audit_log_file           | audit.log    | | audit_log_flush          | OFF          | | audit_log_format         | OLD          | | audit_log_policy         | ALL          | | audit_log_rotate_on_size | 0            | | audit_log_rotations      | 0            | | audit_log_strategy       | ASYNCHRONOUS | +--------------------------+--------------+ 7 rows in set (0.00 sec)

The Percona MySQL Audit Plugin can log using the memory buffer to deliver better performance. Messages will be written into memory buffer first and then flushed to file in background. A certain amount of events can be lost in case of server crash or power outage. Another option is to log directly to file without using memory buffer. There is also an option to fsync every event.

Set audit_log_strategy to control log flushing:

  • ASYNCHRONOUS log using memory buffer, do not drop events if buffer is full
  • PERFORMANCE log using memory buffer, drop events if buffer is full
  • SEMISYNCHRONOUS log directly to file, do not fsync every event
  • SYNCHRONOUS log directly to file, fsync every event

audit_log_buffer_size specifies the size of memory buffer, it makes sense only for ASYNCHRONOUS and PERFORMANCE strategy.

Variable audit_log_file specifies the file to log into. It’s value can be path relative to datadir or absolute path.

The Percona MySQL Audit Plugin can automatically rotate log file based on size. Set audit_log_rotate_size to enable this feature. File is rotated when log grew in size to specified amount of bytes. Set audit_log_rotations to limit the number of log files to keep.

It is possible to log only logins or only queries by setting audit_log_policy value.

Log file format
Lets see how audit records look like

OLD format (audit_log_format = OLD):

<AUDIT_RECORD  "NAME"="Connect"  "RECORD"="2_2014-04-21T12:34:32"  "TIMESTAMP"="2014-04-21T12:34:32 UTC"  "CONNECTION_ID"="1"  "STATUS"="0"  "USER"="root"  "PRIV_USER"="root"  "OS_LOGIN"=""  "PROXY_USER"=""  "HOST"="localhost"  "IP"=""  "DB"="" />

NEW format (audit_log_format = NEW):


The difference is that the audit record in the OLD format was written as a single element with attributes, while in the NEW format it is written as a single element with sub-elements.

A good idea of what each sub-element means can be found in Audit Plugin API documentation here:

Lets compare performance of different audit_log_strategy modes. I used readonly sysbench on my laptop for it. Workload is CPU-bound with dataset fit in buffer pool and I set number of sysbench threads to the amount for which count of transactions per seconds is maximum.

I got TPS drop for PERFORMANCE and ASYNCHRONOUS strategies around 7%, 9% for SEMISYNCHRONOUS and 98% for SYNCHRONOUS which shows that syncing every logged statement to disk is not the best thing for performance.


Of course any software has bugs and this plugin has plenty of them. Please give it a try and provide us your feedback. Report any issues here:

The post Introduction to the Percona Server Audit Log feature appeared first on MySQL Performance Blog.

Categories: MySQL

Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation)

MySQL Performance Blog - Fri, 2014-05-16 07:00

Github user Adrianlzt provided a python-twisted alternative version of pyclustercheck per discussion on issue 7.

Due to sporadic performance issues noted with the original implementation in SimpleHTTPserver, the benchmarks which I’ve included as part of the project on github use mutli-mechanize library,

  • cache time 1 sec
  • 2 x 100 thread pools
  • 60s ramp up time
  • 600s total duration
  • testing simulated node fail (always returns 503, rechecks mysql node on cache expiry)
  • AMD FX(tm)-8350 Eight-Core Processor
  • Intel 330 SSD
  • local loop back test (

The SimpleHTTPServer instance faired as follows:

Right away we can see around 500TPS throughput, however as can be seen in both response time graphs there are “outlying” transactions, something is causing the response time to spike dramatically  SimpleHTTPServer, how does the twisted alternative compare? (note the benchmarks are from the current HEAD with features re-added to avoid regression, caching and ipv6 support)


Ouch! We appear to have taken a performance hit, at least in terms of TPS -19% rough estimates however compare the response time graphs to find a much more consistent plot, we had outliers hitting near  70s for SimpleHTTP server, we’re always under 1s within twisted.

Great! So why isn’t this merged into the master branch as the main project and therfor bundled with Percona XtraDB Cluster (PXC)? The issue here is the required version of python-twisted; ipv6 support was introduced in issue 8 by user Nibler999 and to avoid regression I re-added support for ipv6 in this commit for twisted

ipv6 support for python-twisted is not in the version distributed to main server OS’s such as

  • EL6 python-twisted 8.x
  • Ubuntu 10.04 LTS python-twisted 11.x

What’s the issue here? Attempting to bind / listen to an ipv6 interface yields the following error: twisted.internet.error.CannotListenError: Couldn't listen on :::8000: [Errno -9] Address family for hostname not supported.

Due to this regression (breaking of ipv6 support) the twisted version can not at this time be merged into master, the twisted version however as can be seen from above is much more consistent and if you have the “cycles” to implement it (e.g. install twisted from pypy via pip / easy_install to get >= 12.x) and test it’s a promising alternative.

To illustrate this further the benchmark was made more gruling:

  • 5 x 100 thread pools
  • 60s ramp up
  • 600s total duration

First the twisted results, note the initial spike is due to a local python issue where it locked up creating a new thread in multi-mechanize:

Now the SimpleHTTPServer results:

Oh dear, as the load increases clearly we get some stability issues inside SimpleHTTP server…

Also worth noting is the timeouts

  • twisted: grep 'timed out' results.csv | wc -l == 0
  • SimpleHTTPServer: grep 'timed out' results.csv | wc -l == 470


… in the case of increased load the twisted model performs far more consistently under the same test conditions when compared against SimpleHTTPServer. I include the multi-mechanize scripts as part of the project on GitHub – as such you can recreate these tests yourself and gauge the performance to see if twisted or SimpleHTTP suits your needs.

The post Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation) appeared first on MySQL Performance Blog.

Categories: MySQL

High Availability with MySQL Fabric: Part I

MySQL Performance Blog - Thu, 2014-05-15 13:00

In our previous post, we introduced the MySQL Fabric utility and said we would dig deeper into it. This post is the first part of our test of MySQL Fabric’s High Availability (HA) functionality.

Today, we’ll review MySQL Fabric’s HA concepts, and then walk you through the setup of a 3-node cluster with one Primary and two Secondaries, doing a few basic tests with it. In a second post, we will spend more time generating failure scenarios and documenting how Fabric handles them. (MySQL Fabric is an extensible framework to manage large farms of MySQL servers, with support for high-availability and sharding.)

Before we begin, we recommend you read this post by Oracle’s Mats Kindahl, which, among other things, addresses the issues we raised on our first post. Mats leads the MySQL Fabric team.

Our lab

All our tests will be using our test environment with Vagrant (

If you want to play with MySQL Fabric, you can have these VMs running in your desktop following the instructions in the README file. If you don’t want full VMs, our colleague Jervin Real created a set of wrapper scripts that let you test MySQL Fabric using sandboxes.

Here is a basic representation of our environment.

Set up

To set up MyQSL Fabric without using our Vagrant environment, you can follow the official documentation, or check the ansible playbooks in our lab repo. If you follow the manual, the only caveat is that when creating the user, you should either disable binary logging for your session, or use a GRANT statement instead of CREATE USER. You can read here for more info on why this is the case.

A description of all the options in the configuration file can be found here. For HA tests, the one thing to mention is that, in our experience, the failure detector will only trigger an automatic failover if the value for failover_interval in the [failure_tracking] section is greater than 0. Otherwise, failures will be detected and written to the log, but no action will be taken.

MySQL configuration

In order to manage a mysqld instance with MySQL Fabric, the following options need to be set in the [mysqld] section of its my.cnf file:

log_bin gtid-mode=ON enforce-gtid-consistency log_slave_updates

Additionally, as in any replication setup, you must make sure that all servers have a distinct server_id.

When everything is in place, you can setup and start MySQL Fabric with the following commands:

[vagrant@store ~]$ mysqlfabric manage setup [vagrant@store ~]$ mysqlfabric manage start --daemon

The setup command creates the database schema used by MySQL Fabric to store information about managed servers, and the start one, well, starts the daemon. The –daemon option makes Fabric start as a daemon, logging to a file instead of to standard output. Depending on the port and file name you configured in fabric.cfg, this may need to be run as root.

While testing, you can make MySQL Fabric reset its state at any time (though it won’t change existing node configurations such as replication) by running:

[vagrant@store ~]$ mysqlfabric manage teardown [vagrant@store ~]$ mysqlfabric manage setup

If you’re using our Vagrant environment, you can run the script from your host OS (from the root of the vagrant-fabric repo) to do this for you, and also initialise the datadir of the three instances.

Creating a High Availability Cluster:

A High Availability Cluster is a set of servers using the standard Asynchronous MySQL Replication with GTID.

Creating a group

The first step is to create the group by running mysqlfabric with this syntax:

$ mysqlfabric group create <group_name>

In our example, to create the cluster “mycluster” you can run:

[vagrant@store ~]$ mysqlfabric group create mycluster Procedure : { uuid = 605b02fb-a6a1-4a00-8e24-619cad8ec4c7, finished = True, success = True, return = True, activities = }

Add the servers to the group

The second step is add the servers to the group. The syntax to add a server to a group is:

$ mysqlfabric group add <group_name> <host_name or IP>[:port]

The port number is optional and only required if distinct from 3306. It is important to mention that the clients that will use this cluster must be able to resolve this host or IP. This is because clients will connect directly both with MySQL Fabric’s XML-PRC server and with the managed mysqld servers. Let’s add the nodes to our group.

[vagrant@store ~]$ for i in 1 2 3; do mysqlfabric group add mycluster node$i; done Procedure : { uuid = 9d65c81c-e28a-437f-b5de-1d47e746a318, finished = True, success = True, return = True, activities = } Procedure : { uuid = 235a7c34-52a6-40ad-8e30-418dcee28f1e, finished = True, success = True, return = True, activities = } Procedure : { uuid = 4da3b1c3-87cc-461f-9705-28a59a2a4f67, finished = True, success = True, return = True, activities = }

Promote a node as a master

Now that we have all our nodes in the group, we have to promote one of them. You can promote one specific node or you can let MySQL Fabric to choose one for you.

The syntax to promote a specific node is:

$ mysqlfabric group promote <group_name> --slave_uuid='<node_uuid>'

or to let MySQL Fabric pick one:

$ mysqlfabric group promote <group_name>

Let’s do that:

[vagrant@store ~]$ mysqlfabric group promote mycluster Procedure : { uuid = c4afd2e7-3864-4b53-84e9-04a40f403ba9, finished = True, success = True, return = True, activities = }

You can then check the health of the group like this:

[vagrant@store ~]$ mysqlfabric group health mycluster Command : { success = True return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}} activities = }

One current limitation of the ‘health’ command is that it only identifies servers by their uuid. To get a list of the servers in a group, along with quick status summary, and their host names, use lookup_servers instead:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster Command : { success = True return = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'SECONDARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node2'}, {'status': 'PRIMARY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}] activities = }

We sent a merge request to use a Json string instead of the “print” of the object in the “return” field from the XML-RPC in order to be able to use that information to display the results in a friendly way. In the same merge, we have added the address of the servers in the health command too.

Failure detection

Now we have the three lab machines set up in a replication topology of one master (the PRIMARY server) and two slaves (the SECONDARY ones). To make MySQL Fabric start monitoring the group for problems, you need to activate it:

[vagrant@store ~]$ mysqlfabric group activate mycluster Procedure : { uuid = 230835fc-6ec4-4b35-b0a9-97944c18e21f, finished = True, success = True, return = True, activities = }

Now MySQL Fabric will monitor the group’s servers, and depending on the configuration (remember the failover_interval we mentioned before) it may trigger an automatic failover. But let’s start testing a simpler case, by stopping mysql on one of the secondary nodes:

[vagrant@node2 ~]$ sudo service mysqld stop Stopping mysqld: [ OK ]

And checking how MySQL Fabric report’s the group’s health after this:

[vagrant@store ~]$ mysqlfabric group health mycluster Command : { success = True return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': False, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}} activities = }

We can see that MySQL Fabric successfully marks the server as faulty. In our next post we’ll show an example of this by using one of the supported connectors to handle failures in a group, but for now, let’s keep on the DBA/sysadmin side of things, and try to bring the server back online:

[vagrant@node2 ~]$ sudo service mysqld start Starting mysqld: [ OK ] [vagrant@store ~]$ mysqlfabric group health mycluster Command : { success = True return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}} activities = }

So the server is back online, but Fabric still considers it faulty. To add the server back into rotation, we need to look at the server commands:

[vagrant@store ~]$ mysqlfabric help server Commands available in group 'server' are: server set_weight uuid weight [--synchronous] server lookup_uuid address server set_mode uuid mode [--synchronous] server set_status uuid status [--update_only] [--synchronous]

The specific command we need is set_status, and in order to add the server back to the group, we need to change it’s status twice: first to SPARE and then back to SECONDARY. You can see what happens if we try to set it to SECONDARY directly:

[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY Procedure : { uuid = 9a6f2273-d206-4fa8-80fb-6bce1e5262c8, finished = True, success = False, return = ServerError: Cannot change server's (e826d4ab-d889-11e3-86df-0800274fb806) status from (FAULTY) to (SECONDARY)., activities = }

So let’s try it the right way:

[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SPARE Procedure : { uuid = c3a1c244-ea8f-4270-93ed-3f9dfbe879ea, finished = True, success = True, return = True, activities = } [vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY Procedure : { uuid = 556f59ec-5556-4225-93c9-b9b29b577061, finished = True, success = True, return = True, activities = }

And check the group’s health again:

[vagrant@store ~]$ mysqlfabric group health mycluster Command : { success = True return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}}, 'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}} activities = }

In our next post, when we discuss how to use the Fabric aware connectors, we’ll also test other failure scenarios like hard VM shutdown and network errors, but for now, let’s try the same thing but on the PRIMARY node instead:

[vagrant@node3 ~]$ sudo service mysqld stop Stopping mysqld: [ OK ]

And let’s check the servers again:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster Command : { success = True return = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'}, {'status': 'PRIMARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node2'}, {'status': 'FAULTY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}] activities = }

We can see that MySQL Fabric successfully marked node3 as FAULTY, and promoted node2 to PRIMARY to resolve this. Once we start mysqld again on node3, we can add it back as SECONDARY using the same process of setting it’s status to SPARE first, as we did for node2 above.

Remember that unless failover_interval is greater than 0, MySQL Fabric will detect problems in an active group, but it won’t take any automatic action. We think it’s a good thing that the value for this variable in the documentation is 0, so that automatic failover is not enabled by default (if people follow the manual, of course), as even in mature HA solutions like Pacemaker, automatic failover is something that’s tricky to get right. But even without this, we believe the main benefit of using MySQL Fabric for promotion is that it takes care of reconfiguring replication for you, which should reduce the risk for error in this process, specially once the project becomes GA.

What’s next

In this post we’ve presented a basic replication setup managed by MySQL Fabric and reviewed a couple of failure scenarios, but many questions are left unanswered, among them:

  • What happens to clients connected with a Fabric aware driver when there is a status change in the cluster?
  • What happens when the XML-RPC server goes down?
  • How can we improve its availability?

We’ll try to answer these and other questions in our next post. If you have some questions of your own, please leave them in the comments section and we’ll address them in the next or other posts, depending on the topic.

The post High Availability with MySQL Fabric: Part I appeared first on MySQL Performance Blog.

Categories: MySQL

Why ALTER TABLE runs faster on Percona Server 5.5 vs. MySQL 5.5

MySQL Performance Blog - Thu, 2014-05-15 10:00

Some of us Perconians are at OpenStack summit this week in Atlanta. Matt Griffin, our director of product management, tweeted about the turbo-hipster CI talk about their experience of ALTER TABLEs running faster on Percona Server. Oracle’s Morgan Tocker then tweeted in response, asking why this was the case. I decided that the simplest way to answer that was here in this post.

The reason for this is the expand_fast_index_creation feature of Percona Server. I did a quick schema change on MySQL 5.5 and Percona Server 5.5 to demonstrate this (in the talk, the speaker mentioned that these versions were used).

The schema modifications in the talk could fall in 2 categories, the ones that could use fast index creation and the ones that could not.

I did the following tests on my laptop, on a sysbench tale with 300k records.

Vanilla MySQL 5.5:

mysql> alter table sbtest1 add index idx_c(c); Query OK, 0 rows affected (4.37 sec)

Percona Server 5.5:

mysql> alter table sbtest1 add index idx_c(c); Query OK, 0 rows affected (3.90 sec)

We know that this used fast index creation from the 0 rows affected. In this case, there is nor substantial difference between the 2 servers, also probably my laptop with CPU frewquency scaling doesn’t have the most consistent performance in the world.

For the second schema change, I added a column which copies the table.

Vanilla MySQL 5.5:

mysql> alter table sbtest1 add column d int default 0; Query OK, 300000 rows affected (37.05 sec) Records: 300000 Duplicates: 0 Warnings: 0

Percona Server 5.5:

mysql> alter table sbtest1 add column d int default 0; Query OK, 300000 rows affected (9.51 sec) Records: 300000 Duplicates: 0 Warnings: 0

The reason for this speed difference is that in case of Percona Server, for the table copy, the table is created only with a primary key, and the secondary indexes are built at the end of the process (rather than on the fly). For more details, check Alexey’s blog post on this topic.

This can be tuned further, by tuning innodb_merge_sort_block_size (in Percona Server 5.6, this is replaced by innodb_sort_buffer_size).

mysql> select @@innodb_merge_sort_block_size/1024/1024; +------------------------------------------+ | @@innodb_merge_sort_block_size/1024/1024 | +------------------------------------------+ | 1.00000000 | +------------------------------------------+ 1 row in set (0.00 sec) mysql> set innodb_merge_sort_block_size=8*1024*1024; Query OK, 0 rows affected (0.00 sec) mysql> alter table sbtest1 add column d int default 0; Query OK, 300000 rows affected (8.61 sec) Records: 300000 Duplicates: 0 Warnings: 0

So, in order to be accurate, schema changes are faster in Percona Server if they are table copies and if the tables have secondary indexes.

The post Why ALTER TABLE runs faster on Percona Server 5.5 vs. MySQL 5.5 appeared first on MySQL Performance Blog.

Categories: MySQL

Tips on benchmarking Go + MySQL

MySQL Performance Blog - Wed, 2014-05-14 16:00

We just released, as an open source release, our new percona-agent (, the agent to work with Percona Cloud Tools. This agent is written in Go.

I will give a webinar titled “Monitoring All MySQL Metrics with Percona Cloud Tools” on June 25 that will cover the new features in percona-agent and Percona Cloud Tools, where I will also explain how it works. You are welcome to register now and join me.

There will be more posts about percona-agent, but in the meantime I want to dedicate this one to Go, Go with MySQL and some performance topics.

I have had an interest in the Go programming language for a long time, but in initial versions I did not quite like the performance of the gorountine scheduler. See my report from more than two years ago on runtime: reduce scheduling contention for large $GOMAXPROCS.

Supposedly this performance issue was fixed in Go 1.1, so this is a good time to revisit my benchmark experiment.

A simple run of prime or fibonachi numbers calculation in N threas is quite boring, so I am going to run queries against Percona Server. Of course it adds some complication as there are more moving parts (i.e. go scheduler, go sql driver, MySQL by itself), but it just makes the experiment more interesting.

Source code of my benchmark: Go-pk-bench:
This is probably not the best example of how to code in Go, but that was not the point of this exercise. This post is really about some tips to take into account when writing an application in Go using a MySQL (Percona Server) database.

So, first, we will need a MySQL driver for Go. The one I used two years ago ( is quite outdated. It seems the most popular choice today is Go-MySQL-Driver, and this is the one we use for internal development. This driver is based on the standard Go “database/sql” package. This package kind of provides a standard Go-way to deal with SQL-like databases. “database/sql” seems to work out OK, with some questionable design decisions as for my taste. So using “database/sql” and Go-MySQL-Driver you will need to deal with some quirks like almost unmanageable connection pool.

The first thing you should take into account it is a proper setting of

If you do not do that, Go scheduler will use the default, which is 1. That binary will use one and only 1 CPU (so much for a modern concurrent language).

The command runtime.GOMAXPROCS(runtime.NumCPU())
will prescribe to use all available CPUs. Always remember to use this if you care about multi-threaded performance.

The next problem I faced in the benchmark is that when I ran queries in a loop, i.e. to repeat as much possible…

rows, err := db.Query("select k from sbtest"+strconv.Itoa(tab+1)+" where id = "+strconv.Itoa(i))

… very soon we ran out of TCP ports. Apparently “database/sql” and Go-MySQL-Driver and its smart connection pool creates a NEW CONNECTION for each query. I can explain why this happens, but using the following statement:


helps (I hope somebody with “database/sql” knowledge will explain what it is doing).

So after these adjustments we now can run the benchmark, which by query you see is quite simple – run primary key lookups against Percona Server which we know scales perfectly in this scenario (I used sysbench to create 64 tables 1mln rows each, all this fits into memory). I am going to run this benchmark with 1, 2, 4, 8, 16, 24, 32, 48, 64 user threads.

Below you can see graphs for MySQL Throughput and CPU Usage (both graph are built using new metrics graphing in Percona Cloud Tools)

MySQL Throughput (user threads are increasing from 1 to 64)

CPU Usage (user threads are increasing from 1 to 64)

I would say the result scales quite nicely, at least it is really much better than it was two years ago. It is interesting to compare with something, so there is a graph from an identical run, but now I will use sysbench + lua for main workload driver.

MySQL Throughput (sysbench, user threads are increasing from 1 to 64)

CPU Usage (sysbench, user threads are increasing from 1 to 64)

From the graphs (this is what I like them for), we can clearly see increases in User CPU utilization (and actually we are able to use CPUs on 100% in user+system usage) and it clearly corresponds to increased throughput.

And if you are a fan of raw numbers:

MySQL Throughput, q/s (more is better) Threads | Go-MySQL | sysbench 1 | 13,189 | 16,765 2 | 26,837 | 33,534 4 | 52,629 | 65,943 8 | 95,553 | 116,953 16 | 146,979 | 182,781 24 | 169,739 | 231,895 32 | 181,334 | 245,939 48 | 198,238 | 250,497 64 | 207,732 | 251,972

(one with a knowledge of Universal Scalability Law can draw a prediction till 1000 threads, I leave it as a homework)

So, in conclusion, I can say that Go+MySQL is able to show decent results, but it is still not as effective as plan raw C (sysbench), as it seems it spends some extra CPU time in system calls.

If you want to try these new graphs in Percona Cloud Tools and see how it works with your system – join the free beta!

The post Tips on benchmarking Go + MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

An interesting case in ORDER BY LIMIT optimization

Sergey Petrunia's blog - Wed, 2014-05-14 14:21

Recently, I was asked about an interesting case in ORDER BY … LIMIT optimization. Consider a table

create table tbl ( … KEY key1(col1, col2), PRIMARY KEY (pk) ) engine=InnoDB;

Consider queries like:

select * from tbl where col1=’foo’ and col2=123 order by pk limit 1; select * from tbl where col1=’bar’ and col2=123 order by pk limit 1;

These run nearly instantly. But, if one combines these two queries with col1='foo' and col1='bar' into one query with col1 IN ('foo','bar'):

select * from tbl where col1 IN (’foo’,'bar’) and col2=123 order by pk limit 1;

then the query is be orders of magnitude slower than both of the queries with col1=const.

The first thing to note when doing investigation is to note that the table uses InnoDB engine, which has extended_keys feature. This means, the index

KEY key1(col1, col2)

is actually

KEY key1(col1, col2, pk)

Once you have that, and also you have col1='foo' AND col2=123 in the WHERE clause, the optimizer is able to see that index `key1` produces records ordered by the `pk` column, i.e. in the order required by the ORDER BY clause. This allows to satisfy the LIMIT 1 part by reading just one row.

Now, if we change col1='foo' into col1 IN('foo','bar'), we will still be able to use index `key1`, but the rows we read will not be ordered by `pk`. They will come in two ordered batches:

'bar', 123, pkX 'bar', 123, pkX+1 'bar', 123, pkX+2 ... 'foo', 123, pkY 'foo', 123, pkY+1 'foo', 123, pkY+2

The query has ORDER BY pk LIMIT 1, but, since the rowset is not ordered by pk, the optimizer will have to read all of the rows, sort them, and find the row with the least value of `pk`.

Now, wouldn’t it be great if the optimizer was aware that the index scan returns two ordered batches? It would be able to read not more than #LIMIT rows from each batch. I can think of two possible execution strategies:

  1. Run something similar to index_merge strategy: start an index scan col1='foo' and an index scan on col1='bar'. Merge the two ordered streams until we’ve found #limit rows. This approach works well when you’re merging a few streams. If there are a lot of streams, the overhead of starting concurrent index scans will start to show up.
  2. Use the same index cursor to #LIMIT rows from the first batch, then from the second, and so forth. Merge these ordered streams using filesort’s merge pass or priority queue. This approach reads more rows than the first one, but we don’t have to create another index cursor.

Now, the question is whether this kind of queries is frequent enough to implement this optimization.

Categories: MySQL

max_allowed_packet and binary log corruption in MySQL

MySQL Performance Blog - Wed, 2014-05-14 08:00

The combination of max_allowed_packet variable and replication in MySQL is a common source of headaches. In a nutshell, max_allowed_packet is the maximum size of a MySQL network protocol packet that the server can create or read. It has a default value of 1MB (<= 5.6.5) or 4MB (>= 5.6.6) and a maximum size of 1GB. This adds some constraints in our replication environment:

  • The master server shouldn’t write events to the binary log larger than max_allowed_packet
  • All the slaves in the replication chain should have the same max_allowed_packet as the master server

Sometimes, even following those two basic rules we can have problems.

For example, there are situations (also called bugs) where the master writes more data than the max_allowed_packet limit causing the slaves to stop working. In order to fix this Oracle created a new variable called slave_max_allowed_packet. This new configuration variable available from 5.1.64, 5.5.26 and 5.6.6 overrides the max_allowed_packet value for slave threads. Therefore, regardless of the max_allowed_packet value the slaves’ threads will have 1GB limit, the default value of slave_max_allowed_packet. Nice trick that works as expected.

Sometimes even with that workaround we can get the max_allowed_packet error in the slave servers. That means that there is a packet larger than 1GB, something that shouldn’t happen in a normal situation. Why? Usually it is caused by a binary log corruption. Let’s see the following example:

Slave stops working with the following message:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master'

The important part is “got fatal error 1236 from master”. The master cannot read the event it wrote to the binary log seconds ago. To check the problem we can:

  • Use mysqlbinlog to read the binary log from the position it failed with –start-position.

This is an example taken from our Percona Forums:

#121003 5:22:26 server id 1 end_log_pos 398528 # Unknown event # at 398528 #960218 6:48:44 server id 1813111337 end_log_pos 1835008 # Unknown event ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8 DELIMITER ; # End of log file

Check the size of the event, 1953066613 bytes. Or the “Unknown event” messages. Something is clearly wrong there. Another usual thing to check is the server id that sometimes doesn’t correspond with the real value. In this example the person who posted the binary log event confirmed that the server id was wrong.

  • Check master’s error log.

[ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8

Again, the event is bigger than expected. There is no way the master and slave can read/write it, so the solution is to skip that event in the slave and rotate the logs on the master. Then, use pt-table-checksum to check data consistency.

MySQL 5.6 includes replication checksums to avoid problems with log corruptions. You can read more about it in Stephan’s blog post.


Errors on slave servers about max_allowed_packet can be caused by very different reasons. Although binary log corruption is not a common one, it is something worth checking when you have run out of ideas.

The post max_allowed_packet and binary log corruption in MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

Comparing query optimizer features in MariaDB 10.0 and MySQL 5.6

Sergey Petrunia's blog - Tue, 2014-05-13 07:37

MariaDB 10.0 had a stable release last month. It is a good time to take a look and see how it compares to the stable version of MySQL, MySQL 5.6 (as for Percona Server, it doesn’t have its own optimizer features).
Changelogs and release notes have all the details, but it’s difficult to see the big picture. So I went for diagrams, and the result is a short article titled What is the difference between MySQL and MariaDB query optimizers. It should give one a clue about what are the recent developments in query optimizers in MySQL world.

In case you’re interested in details about optimizer features in MariaDB 10.0, I’ve shared slides from a talk about MariaDB 10.0 query optimizer.

Categories: MySQL

Using Percona Server 5.6 with the Docker open-source engine

MySQL Performance Blog - Tue, 2014-05-13 07:00

There are a couple of posts about setting up Percona XtraDB Cluster on Vagrant and Percona Server on MySQL Sandbox – those are two of the top tools used by the Percona Support team for testing and bug processing among other things.

In this post, however, I will show you how to use Docker with Percona Server on Ubuntu 12.04.

As per Docker’s official site:

Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.

Docker containers can encapsulate any payload, and will run consistently on and between virtually any server. The same container that a developer builds and tests on a laptop will run at scale, in production*, on VMs, bare-metal servers, OpenStack clusters, public instances, or combinations of the above.

To install Docker on Ubuntu 12.04 you need to follow instructions from Docker’s official documentation:

After installing Docker, you may either download docker images via ‘docker pull’ and store docker images on your server so you can spin a new docker container in an instance or you may choose to do a ‘docker run’ on the terminal and implicitly download/store the specific docker image from and run the image afterward.

root@Perconallc-Support / # docker images | grep centos centos centos6 0b443ba03958 2 weeks ago 297.6 MB centos latest 0b443ba03958 2 weeks ago 297.6 MB centos 6.4 539c0211cd76 13 months ago 300.6 MB

Let us create a CentOS docker container by running the following command:

root@Perconallc-Support / # docker run -i -t centos:latest bash bash-4.1# cat /etc/redhat-release CentOS release 6.5 (Final)

As you may have noticed, we have just created a new interactive (-i) CentOS 6.5 docker container and ran bash in a single line of command. Detaching from the container is as easy as typing CTRL+p – CTRL+q, you’ll get to your terminal if you typed the right keys.

Verify the active containers:

root@Perconallc-Support / # docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 202765d754b7 centos:centos6 bash 11 minutes ago Up 11 minutes elegant_rosalind

To list all existing containers active or not use the following command:

root@Perconallc-Support / # docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 202765d754b7 centos:centos6 bash 12 minutes ago Up 12 minutes elegant_rosalind aae47d193a22 centos:6.4 bash 2 hours ago Exited (1) 2 hours ago boring_bardeen

To attach to the active docker container:

root@Perconallc-Support / # docker attach 202765d754b7 bash-4.1#

*Tip: Hit enter twice to get to the container’s bash prompt.*

Install Percona Server 5.6

Now that you have a working docker container you will then have to install the needed packages and repository.

bash-4.1# rpm -Uhv Retrieving Preparing... ########################################### [100%] 1:percona-release ########################################### [100%] bash-4.1# yum install Percona-Server-server-56 Percona-Server-client-56 Percona-Server-shared-56 Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: * extras: * updates: Setting up Install Process Resolving Dependencies --> Running transaction check ---> Package Percona-Server-client-56.x86_64 0:5.6.17-rel65.0.el6 will be installed --> Processing Dependency: /usr/bin/perl for package: Percona-Server-client-56-5.6.17-rel65.0.el6.x86_64 ---> Package Percona-Server-server-56.x86_64 0:5.6.17-rel65.0.el6 will be installed --> Processing Dependency: for package: Percona-Server-server-56-5.6.17-rel65.0.el6.x86_64 --> Processing Dependency: for package: Percona-Server-server-56-5.6.17-rel65.0.el6.x86_64 --> Processing Dependency: for package: Percona-Server-server-56-5.6.17-rel65.0.el6.x86_64 ---> Package Percona-Server-shared-56.x86_64 0:5.6.17-rel65.0.el6 will be installed --> Running transaction check ---> Package libaio.x86_64 0:0.3.107-10.el6 will be installed ---> Package perl.x86_64 4:5.10.1-136.el6 will be installed --> Processing Dependency: perl-libs = 4:5.10.1-136.el6 for package: 4:perl-5.10.1-136.el6.x86_64 --> Processing Dependency: perl-libs for package: 4:perl-5.10.1-136.el6.x86_64 --> Processing Dependency: perl(version) for package: 4:perl-5.10.1-136.el6.x86_64 --> Processing Dependency: perl(Pod::Simple) for package: 4:perl-5.10.1-136.el6.x86_64 --> Processing Dependency: perl(Module::Pluggable) for package: 4:perl-5.10.1-136.el6.x86_64 --> Processing Dependency: for package: 4:perl-5.10.1-136.el6.x86_64 --> Running transaction check ---> Package perl-Module-Pluggable.x86_64 1:3.90-136.el6 will be installed ---> Package perl-Pod-Simple.x86_64 1:3.13-136.el6 will be installed --> Processing Dependency: perl(Pod::Escapes) >= 1.04 for package: 1:perl-Pod-Simple-3.13-136.el6.x86_64 ---> Package perl-libs.x86_64 4:5.10.1-136.el6 will be installed ---> Package perl-version.x86_64 3:0.77-136.el6 will be installed --> Running transaction check ---> Package perl-Pod-Escapes.x86_64 1:1.04-136.el6 will be installed --> Finished Dependency Resolution Dependencies Resolved ======================================================================================================================================================================== Package Arch Version Repository Size ======================================================================================================================================================================== Installing: Percona-Server-client-56 x86_64 5.6.17-rel65.0.el6 percona 6.8 M Percona-Server-server-56 x86_64 5.6.17-rel65.0.el6 percona 19 M Percona-Server-shared-56 x86_64 5.6.17-rel65.0.el6 percona 714 k Installing for dependencies: libaio x86_64 0.3.107-10.el6 base 21 k perl x86_64 4:5.10.1-136.el6 base 10 M perl-Module-Pluggable x86_64 1:3.90-136.el6 base 40 k perl-Pod-Escapes x86_64 1:1.04-136.el6 base 32 k perl-Pod-Simple x86_64 1:3.13-136.el6 base 212 k perl-libs x86_64 4:5.10.1-136.el6 base 578 k perl-version x86_64 3:0.77-136.el6 base 51 k Transaction Summary ======================================================================================================================================================================== Install 10 Package(s) Total download size: 38 M Installed size: 158 M Is this ok [y/N]: y Downloading Packages: (1/10): Percona-Server-client-56-5.6.17-rel65.0.el6.x86_64.rpm | 6.8 MB 00:02 (2/10): Percona-Server-server-56-5.6.17-rel65.0.el6.x86_64.rpm | 19 MB 00:00 (3/10): Percona-Server-shared-56-5.6.17-rel65.0.el6.x86_64.rpm | 714 kB 00:00 (4/10): libaio-0.3.107-10.el6.x86_64.rpm | 21 kB 00:00 (5/10): perl-5.10.1-136.el6.x86_64.rpm | 10 MB 00:00 (6/10): perl-Module-Pluggable-3.90-136.el6.x86_64.rpm | 40 kB 00:00 (7/10): perl-Pod-Escapes-1.04-136.el6.x86_64.rpm | 32 kB 00:00 (8/10): perl-Pod-Simple-3.13-136.el6.x86_64.rpm | 212 kB 00:00 (9/10): perl-libs-5.10.1-136.el6.x86_64.rpm | 578 kB 00:00 (10/10): perl-version-0.77-136.el6.x86_64.rpm | 51 kB 00:00 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Total 7.4 MB/s | 38 MB 00:05 Running rpm_check_debug Running Transaction Test Transaction Test Succeeded Running Transaction Warning: RPMDB altered outside of yum. ** Found 2 pre-existing rpmdb problem(s), 'yum check' output follows: udev-147-2.51.el6.x86_64 has missing requires of /sbin/service udev-147-2.51.el6.x86_64 has missing requires of MAKEDEV >= ('0', '3.11', None) Installing : Percona-Server-shared-56-5.6.17-rel65.0.el6.x86_64 1/10 Installing : 1:perl-Pod-Escapes-1.04-136.el6.x86_64 2/10 Installing : 1:perl-Module-Pluggable-3.90-136.el6.x86_64 3/10 Installing : 4:perl-libs-5.10.1-136.el6.x86_64 4/10 Installing : 3:perl-version-0.77-136.el6.x86_64 5/10 Installing : 1:perl-Pod-Simple-3.13-136.el6.x86_64 6/10 Installing : 4:perl-5.10.1-136.el6.x86_64 7/10 Installing : Percona-Server-client-56-5.6.17-rel65.0.el6.x86_64 8/10 Installing : libaio-0.3.107-10.el6.x86_64 9/10 Installing : Percona-Server-server-56-5.6.17-rel65.0.el6.x86_64 10/10 ... Installed: Percona-Server-client-56.x86_64 0:5.6.17-rel65.0.el6 Percona-Server-server-56.x86_64 0:5.6.17-rel65.0.el6 Percona-Server-shared-56.x86_64 0:5.6.17-rel65.0.el6 Dependency Installed: libaio.x86_64 0:0.3.107-10.el6 perl.x86_64 4:5.10.1-136.el6 perl-Module-Pluggable.x86_64 1:3.90-136.el6 perl-Pod-Escapes.x86_64 1:1.04-136.el6 perl-Pod-Simple.x86_64 1:3.13-136.el6 perl-libs.x86_64 4:5.10.1-136.el6 perl-version.x86_64 3:0.77-136.el6 Complete! bash-4.1# /etc/init.d/mysql start Starting MySQL (Percona Server). SUCCESS! bash-4.1# mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.6.17-65.0-56 Percona Server (GPL), Release 65.0, Revision 587 Copyright (c) 2009-2014 Percona LLC and/or its affiliates Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>

There you have it, a freshly installed Percona Server 5.6.17 on CentOS 6.5 Linux container. Yay!

You can now create a new MySQL user to access the server outside the container:

mysql> grant all privileges on *.* to 'test_user'@'%' identified by 'test_pass'; Query OK, 0 rows affected (0.00 sec) mysql> flush privileges; Query OK, 0 rows affected (0.00 sec) root@Perconallc-Support / # mysql -h -utest_user -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 4 Server version: 5.6.17-65.0-56 Percona Server (GPL), Release 65.0, Revision 587 Copyright (c) 2009-2014 Percona LLC and/or its affiliates Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>

We’re basically following installation instructions for Percona Server from the documentation.

Just the same way as you would do on any server, you can install other equally important Percona software on the same container to complete your docker environment.

As an added bonus, you can contribute to the Docker Community by commiting your container and pushing it to the docker registry.


I’ve shown you how easy it is to spin a docker container and install Percona Server 5.6 in it. This is by far one of the fastest ways to create test/staging environments to simulate production servers. For further reading you may want to read Docker’s official documentation here.

I hope that this has piqued your interest to try it out yourself. And if that happens we would love to know how Percona Server and other Percona software perform on your docker environment. Cheers!

The post Using Percona Server 5.6 with the Docker open-source engine appeared first on MySQL Performance Blog.

Categories: MySQL

Practical MySQL performance optimization: May 14 Webinar

MySQL Performance Blog - Mon, 2014-05-12 10:00

Achieving the best possible MySQL Performance doesn’t have to be complicated. It’s all about knowing which tools are designed for the task at hand – along with some basic (yet often overlooked) best practices.

Join me Wednesday, May 14 at 10 a.m. Pacific for a free webinar titled, “Practical MySQL performance optimization.” I’ll be sharing the main areas for improving MySQL performance – along with what to specifically focus on in each. These will include:

  • Hardware
  • MySQL Configuration
  • Schema and Queries
  • Application Architecture

And as I mentioned earlier, I’ll also show you the best tools for the job and how to use them efficiently. This will help you optimize your time by focusing on the queries that are most important for your application.

At the end of this webinar, you will know how to optimize MySQL performance in the most practical way possible. The webinar is free but I recommend registering now to reserve your spot. I hope to see you on May 14!

The post Practical MySQL performance optimization: May 14 Webinar appeared first on MySQL Performance Blog.

Categories: MySQL

GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

MySQL Performance Blog - Fri, 2014-05-09 07:00

One of the MySQL 5.6 features many people are interested in is Global Transactions IDs (GTIDs). This is for a good reason: Reconnecting a slave to a new master has always been a challenge while it is so trivial when GTIDs are enabled. However, using GTIDs is not only about replacing good old binlog file/position with unique identifiers, it is also using a new replication protocol. And if you are not aware of it, it can bite.

Replication protocols: old vs new

The old protocol is pretty straightforward: the slave connects to a given binary log file at a specific offset, and the master sends all the transactions from there.

The new protocol is slightly different: the slave first sends the range of GTIDs it has executed, and then the master sends every missing transaction. It also guarantees that a transaction with a given GTID can only be executed once on a specific slave.

In practice, does it change anything? Well, it may change a lot of things. Imagine the following situation: you want to start replicating from trx 4, but trx 2 is missing on the slave for some reason.

With the old replication protocol, trx 2 will never be executed while with the new replication protocol, it WILL be executed automatically.

Here are 2 common situations where you can see the new replication protocol in action.

Skipping transactions

It is well known that the good old SET GLOBAL sql_slave_skip_counter = N is no longer supported when you want to skip a transaction and GTIDs are enabled. Instead, to skip the transaction with GTID XXX:N, you have to inject an empty transaction:

mysql> SET gtid_next = 'XXX:N'; mysql> BEGIN; COMMIT; mysql> SET gtid_next = 'AUTOMATIC';

Why can’t we use sql_slave_skip_counter? Because of the new replication protocol!

Imagine that we have 3 servers like the picture below:

Let’s assume that sql_slave_skip_counter is allowed and has been used on S2 to skip trx 2. What happens if you make S2 a slave of S1?

Both servers will exchange the range of executed GTIDs, and S1 will realize that it has to send trx 2 to S2. Two options then:

  • If trx 2 is still in the binary logs of S1, it will be sent to S2, and the transaction is no longer skipped.
  • If trx 2 no longer exists in the binary logs of S1, you will get a replication error.

This is clearly not safe, that’s why sql_slave_skip_counter is not allowed with GTIDs. The only safe option to skip a transaction is to execute a fake transaction instead of the real one.

Errant transactions

If you execute a transaction locally on a slave (called errant transaction in the MySQL documentation), what will happen if you promote this slave to be the new master?

With the old replication protocol, basically nothing (to be accurate, data will be inconsistent between the new master and its slaves, but that can probably be fixed later).

With the new protocol, the errant transaction will be identified as missing everywhere and will be automatically executed on failover, which has the potential to break replication.

Let’s say you have a master (M), and 2 slaves (S1 and S2). Here are 2 simple scenarios where reconnecting slaves to the new master will fail (with different replication errors):

# Scenario 1

# S1 mysql> CREATE DATABASE mydb; # M mysql> CREATE DATABASE IF NOT EXISTS mydb; # Thanks to 'IF NOT EXITS', replication doesn't break on S1. Now move S2 to S1: # S2 mysql> STOP SLAVE; CHANGE MASTER TO MASTER_HOST='S1'; START SLAVE; # This creates a conflict with existing data! mysql> SHOW SLAVE STATUS\G [...] Last_SQL_Errno: 1007 Last_SQL_Error: Error 'Can't create database 'mydb'; database exists' on query. Default database: 'mydb'. Query: 'CREATE DATABASE mydb' [...]

# Scenario 2

# S1 mysql> CREATE DATABASE mydb; # Now, we'll remove this transaction from the binary logs # S1 mysql> FLUSH LOGS; mysql> PURGE BINARY LOGS TO 'mysql-bin.000008'; # M mysql> CREATE DATABASE IF NOT EXISTS mydb; # S2 mysql> STOP SLAVE; CHANGE MASTER TO MASTER_HOST='S1'; START SLAVE; # The missing transaction is no longer available in the master's binary logs! mysql> SHOW SLAVE STATUS\G [...] Last_IO_Errno: 1236 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' [...]

As you can understand, errant transactions should be avoided with GTID-based replication. If you need to run a local transaction, your best option is to disable binary logging for that specific statement:

mysql> SET SQL_LOG_BIN = 0; mysql> # Run local transaction


GTIDs are a great step forward in the way we are able to reconnect replicas to other servers. But they also come with new operational challenges. If you plan to use GTIDs, make sure you correctly understand the new replication protocol, otherwise you may end up breaking replication in new and unexpected ways.

I’ll do more exploration about errant transactions in a future post.

The post GTIDs in MySQL 5.6: New replication protocol; new ways to break replication appeared first on MySQL Performance Blog.

Categories: MySQL

Percona XtraDB Cluster 5.5.37-25.10 is now available

MySQL Performance Blog - Thu, 2014-05-08 17:40

Percona is glad to announce the release of Percona XtraDB Cluster 5.5.37-25.10 on May 8, 2014. Binaries are available from the downloads area or from our software repositories.

Based on Percona Server 5.5.37-35.0 including all the bug fixes in it, Galera Replicator 2.10 (including fixes in 2.9 and 2.10 milestones) and on wsrep API 25.10 (including fixes in 25.10), Percona XtraDB Cluster 5.5.37-25.10 is now the current stable release. All of Percona‘s software is open-source and free.

New Features:

  • Percona XtraDB Cluster now supports stream compression/decompression with new xtrabackup-sst compressor/decompressor options.
  • Garbd init script and configuration files have been packaged for CentOS and Debian, in addition, in Debian garbd is packaged separately in percona-xtradb-cluster-garbd-2.x package.
  • New meta packages are now available in order to make the Percona XtraDB Cluster installation easier.
  • Debian/Ubuntu debug packages are now available for Galera and garbd.
  • New SST options have been implemented: inno-backup-opts, inno-apply-opts, inno-move-opts which pass options to backup, apply and move stages of innobackupex.
  • Initial configurable timeout, of 100 seconds, to receive a first packet via SST has been implemented, so that if donor dies somewhere in between, joiner doesn’t hang. Timeout can be configured with the sst-initial-timeout variable.
  • The joiner would wait and not fall back to choosing other potential donor nodes (not listed in wsrep_sst_donor) by their state. This happened even when comma was added at the end. This fixes it for that particular case.
  • Support for Query Cache has been implemented.
  • Percona XtraDB Cluster packages are now available for Ubuntu 14.04.

Bugs Fixed:

  • To avoid disabling Parallel Apply in case of SAVEPOINT or ROLLBACK TO SAVEPOINT the wsrep_cert_deps_distance was set to 1.0 at all times. Bug fixed #1277703.
  • First connection would hang after changing the wsrep_cluster_address variable. Bug fixed #1022250.
  • When gmcast.listen_addr variable was set manually it did not allow node’s own address in gcomm address list. Bug fixed #1099478.
  • wsrep_sst_rsync would silently fail on joiner when rsync server port was already taken. Bug fixed #1099783.
  • Server would segfault on INSERT DELAYED with wsrep_replicate_myisam set to 1 due to unchecked dereference of NULL pointer. Bug fixed #1165958.
  • When grastate.dat file was not getting zeroed appropriately it would lead to RBR error during the IST. Bug fixed #1180791.
  • Due to the Provides: line in Percona XtraDB Cluster (which provides Percona-Server-server), the command yum install Percona-Server-server would install Percona XtraDB Cluster instead of the expected Percona Server. Bug fixed #1201499.
  • Replication of partition tables without binlogging enabled failed, partition truncation didn’t work because of lack of TO isolation there. Bug fixed #1219605.
  • Exception during group merge after partitioning event has been fixed. Bug fixed #1232747.
  • Default value for binlog_format is now ROW. This is done so that Percona XtraDB Cluster is not started with wrong defaults leading to non-deterministic outcomes like crash. Bug fixed #1243228.
  • CREATE TABLE AS SELECT was not replicated, if the select result set was empty. Bug fixed #1246921.
  • INSERT would return deadlock instead of duplicate key on secondary unique key collision. Bug fixed #1255147.
  • Joiner node would not initialize storage engines if rsync was used for SST and the first view was non-primary. Bug fixed #1257341.
  • Table level lock conflict resolving was releasing the wrong lock. Bug fixed #1257678.
  • Resolved the perl dependencies needed for Percona XtraDB Cluster 5.5. Bug fixed #1258563.
  • Obsolete dependencies have been removed from Percona XtraDB Cluster. Bug fixed #1259256.
  • GCache file allocation could fail if file size was a multiple of page size. Bug fixed #1259952.
  • Percona XtraDB Cluster didn’t validate the parameters of wsrep_provider_options when starting it up. Bug fixed #1260193.
  • Runtime checks have been added for dynamic variables which are Galera incompatible. Bug fixed #1262188.
  • Node would get stuck and required restart if DDL was performed after FLUSH TABLES WITH READ LOCK. Bug fixed #1265656.
  • xtrabackup-v2 is now used as default SST method in wsrep_sst_method. Bug fixed #1268837.
  • FLUSH TABLES WITH READ LOCK behavior on the same connection was changed to conform to MySQL behavior. Bug fixed #1269085.
  • Read-only detection has been added in clustercheck, which can be helpful during major upgrades (this is used by xinetd for HAProxy etc.) Bug fixed #1269469.
  • Binary log directory is now being cleanup as part of the XtraBackup SST. Bug fixed #1273368.
  • Deadlock would happen when NULL unique key was inserted. Workaround has been implemented to support NULL keys, by using the md5 sum of full row as key value. Bug fixed #1276424.

This release contains almost 100 fixed bugs, complete list of fixed bugs can be found in our release notes.

Please see the 5.5.37-25.10 releases notes for complete details on all of the new features and bug fixes. Help us improve quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraDB Cluster Errata can be found in our documentation.

The post Percona XtraDB Cluster 5.5.37-25.10 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Doing a rolling upgrade of Percona XtraDB Cluster from 5.5 to 5.6

MySQL Performance Blog - Thu, 2014-05-08 10:00

Percona XtraDB Cluster 5.6 has been GA for several months now and people are thinking more and more about moving from 5.5 to 5.6. Most people don’t want to upgrade all at once, but would prefer a rolling upgrade to avoid downtime and ensure 5.6 is behaving in a stable fashion before putting all of production on it. The official guide to a rolling upgrade can be found in the PXC 5.6 manual. This blog post will attempt to summarize the basic process.

However, there are a few caveats to trying to do a rolling 5.6 upgrade from 5.5:

  1. If you mix Galera 2 and Galera 3 nodes, you must set wsrep_provider_options=”socket.checksum=1″ on the Galera 3 nodes for backwards compatibility between Galera versions.
  2. You must set some 5.6 settings for 5.5 compatibility with respect to replication:
    1. You can’t enable GTID async replication on the 5.6 nodes
    2. You should use –log-bin-use-v1-row-events on the 5.6 nodes
    3. You should set binlog_checksum=NONE  on the 5.6 nodes
  3. You must not SST a 5.5 donor to a 5.6 joiner as the SST script does not handle mysql_upgrade
  4. You should set the 5.6 nodes read_only and not write to them!
The basic upgrade flow

The basic upgrade flow is:

  1. For some node(s): upgrade to 5.6, do all the above stuff, put them into a read_only pool
  2. Repeat step 1 as desired
  3. Once your 5.6 pool is sufficiently large, cut over writes to the 5.6 pool (turn off read_only, etc.) and upgrade the rest.

This is, in essence, exactly like upgrading a 5.5 master/slave cluster to 5.6 — you upgrade the slaves first, promote a slave and upgrade the master; we just have more masters to think about.

Once your upgrade is fully to 5.6, then you can go back through and remove all the 5.5 backwards compatibility

Why can’t I write to the 5.6 nodes?

The heaviest caveat is probably the fact that in a mixed 5.5 / 5.6 cluster, you are not supposed to write to the 5.6 nodes.  Why is that?  Well, the reason goes back to MySQL itself.  PXC/Galera uses standard RBR binlog events from MySQL for replication.   Replication between major MySQL versions is only ever officially supported:

  • across 1 major version (i.e., 5.5 to 5.6, though multiple version hops do often work)
  • from a lower version master to a higher version slave (i.e., 5.5 is your master and 5.6 is your slave, but not the other way around)

This compatibility requirement (which has existed for a very long time in MySQL) works great when you have a single Master replication topology, but true multi-master (multi-writer) has obviously never been considered.

Some alternativesDoes writing to the 5.6 nodes REALLY break things?

The restriction on 5.6 masters of 5.5 slaves is probably too strict in many cases.  Technically only older to newer replication is ever truly supported, but in practice you may be able to run a mixed cluster with writes to all nodes as long as you are careful.  This means (at least) that any modifications to column type formats in the newer version NOT be upgraded while the old version remains active in the cluster.  There might be other issues, I’m not sure, I cannot say I’ve tested every possible circumstance.

So, can I truly say I recommend this?  I cannot say that officially, but you may find it works fine.  As long as you acknowledge that something unforeseen may break your cluster and your migration plan, it may be reasonable.  If you decide to explore this option, please test this thoroughly and be willing to accept the consequences of it not working before trying it in production!

Using Async replication to upgrade

Another alternative is rather than trying to mix the clusters and keeping 5.6 nodes read_only, why not just setup the 5.6 cluster as an async slave of your 5.5 cluster and migrate your application to the new cluster when you are ready?  This is practically the same as maintaining a split 5.5/5.6 read_write/read_only cluster without so much risk and a smaller list of don’ts.  Cutover in this case would be effectively like promoting a 5.6 slave to master, except you would promote the 5.6 cluster.

One caveat with this approach might be dealing with replication throughput:  async may not be able to keep up replicating your 5.5 cluster writes to a separate 5.6 cluster.  Definitely check out wsrep_preordered to speed things up, it may help.  But realize some busy workloads just may not ever be able to use async into another cluster.

Just take the outage

A final alternative for this post is the idea of simply upgrading the entire cluster to 5.6 all at once during a maintenance window.  I grant that this defeats the point of a rolling upgrade, but it may offer a lot of simplicity in the longer run.


A rolling PXC / Galera upgrade across major MySQL versions is limited by the fact that there is no official support or reason for Oracle to support newer master to older slave.  In practice, it may work much of the time, but these situations should be considered carefully and the risk weighed against all other options.

The post Doing a rolling upgrade of Percona XtraDB Cluster from 5.5 to 5.6 appeared first on MySQL Performance Blog.

Categories: MySQL

Percona XtraBackup 2.2.2 beta release is now available

MySQL Performance Blog - Thu, 2014-05-08 09:58

Percona is glad to announce the release of Percona XtraBackup 2.2.2-beta1 on May 8th 2014. Downloads are available from our download site here. This BETA release, will be available in our Debian experimental and CentOS testing repositories.

This is a BETA quality release and it is not intended for production. If you want a high quality, Generally Available release, the current Stable version should be used (Percona XtraBackup 2.1.9 in the 2.1 series at the time of writing).

New Features

  • Percona XtraBackup has now been rebased on MySQL 5.6.17.
  • Percona XtraBackup package is now available for Ubuntu 14.04.

Bugs Fixed

  • Percona XtraBackup couldn’t be built with Bison 3.0. Bug fixed #1262439.
  • The xtrabackup binaries now recognize a new my.cnf option, open_files_limit. The effect is the same as for the server: it changes the maximum number of file descriptors available to the xtrabackup process. The actual limit depends on the platform and ulimit settings. Bug fixed #1183793.
  • If a remote InnoDB tablespace got CREATEd or ALTERed during the backup, an attempt to prepare such a backup later would lead to Percona XtraBackup crash. Bug fixed #1291299.
  • Code in both innobackupex and xtrabackup, that was supposed to make sure no child processes are left running in case innobackupex got killed or failed with an error, relied on the fact the SIGTERM and SIGINT signals were not blocked by the xtrabackup process. However, both SIGTERM and SIGINT might be blocked by the process that had invoked innobackupex, for example, by the Percona XtraDB Cluster server processes doing an SST, in which case they were also blocked by the xtrabackup process, since the signal mask is inherited by child processes. Fixed by replacing SIGTERM in innobackupex and SIGINT in xtrabackup auto-termination with SIGKILL. Bug fixed #1294782.

Release notes with all the bugfixes for Percona XtraBackup 2.2.2 are available in our online documentation. All of Percona‘s software is open source and free, all the details of the release can be found in the 2.2.2 milestone at Launchpad. Bugs can be reported on the launchpad bug tracker.

The post Percona XtraBackup 2.2.2 beta release is now available appeared first on MySQL Performance Blog.

Categories: MySQL
Syndicate content