MySQL

Smallest MySQL Server in the World (under $60) which can even make you toast while fixing MySQL bug #2

MySQL Performance Blog - Wed, 2015-11-25 15:36

Introduction
In my last blog post, Internet of Things, Messaging and MySQL, I showed how to start your own Internet of Things with the Particle Photon board. That implementation is great, but it requires constant internet (Wi-Fi) access, as the Particle Photon board does not have any local storage. If you do not have reliable network access (i.e. in some remote places) or need something really small to store your data, you can now use Intel Edison. I've even installed MySQL on Edison, which makes it the smallest (in size) MySQL server in the world! Other options include:

MySQL Configuration

Intel Edison is a tiny computer based on a 22 nm Silvermont dual-core Intel Atom CPU running at 500MHz, running Linux (a Yocto-based distribution). To program the Edison we will need a breakout board. Options include an Arduino-compatible breakout board (which includes an SD card slot) and a small Intel breakout board.

The installation and configuration is straightforward. I used the Get Started with Yocto Project on the Intel Edison Board guide to set up and configure the board. First we need to connect to the Edison via a serial connection and configure sshd and Wi-Fi; once that is done we can connect to the Edison using SSH.

The MySQL installation is relatively easy as the Linux generic binaries are compatible with Yocto Linux (so you do not have to compile anything). There are two challenges though:

  • By default Yocto Linux (as well as the official repository) does not include libraries needed for MySQL: libaio1, libcrypto, libssl
  • The internal storage is tiny, and the MySQL 5.7 binaries did not even fit into any partition. I had to remove some "tests" and other things I do not need. For a real installation one can use an SD card (an SD slot is available on some boards).

To install the libraries I used the unofficial Edison repositories, following the excellent guide: Edison package repo configuration. Setup is simple:

To configure your Edison to fetch packages from this repo, replace anything you have in /etc/opkg/base-feeds.conf with the following (other opkg config files don’t need any changes):

src/gz all http://repo.opkg.net/edison/repo/all
src/gz edison http://repo.opkg.net/edison/repo/edison
src/gz core2-32 http://repo.opkg.net/edison/repo/core2-32

Then we will need to install the libraries:

# opkg install libaio1_0.3 libcrypto1.0.0 libssl1.0.0

Finally we can download Percona Server 5.6 and place it somewhere (use basedir in my.cnf to point to the installation path):

# wget https://www.percona.com/downloads/Percona-Server-5.6/Percona-Server-5.6.25-73.1/binary/tarball/Percona-Server-5.6.25-rel73.1-Linux.i686.ssl100.tar.gz

Please note that the latest Percona Server 5.6 depends on the NUMA library, and there is no such library for Yocto (it does not make sense for Edison). So 5.6.25 is the latest Percona Server you can install here.
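For reference, a minimal my.cnf sketch for pointing the server at the unpacked tarball (all paths here are assumptions; adjust them to wherever you unpacked the binaries and mounted the SD card):

[mysqld]
# hypothetical paths - adjust to your layout
basedir = /media/sdcard/percona-server-5.6.25
datadir = /media/sdcard/mysql-data
socket  = /tmp/mysql.sock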

A simple (and rather useless) benchmark on the Intel Edison:

root@edison:/usr/local/mysql# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 74
model name      : Genuine Intel(R) CPU  4000  @  500MHz
...

mysql> SELECT BENCHMARK(10000000,ENCODE('hello','goodbye'));
+-----------------------------------------------+
| BENCHMARK(10000000,ENCODE('hello','goodbye')) |
+-----------------------------------------------+
|                                             0 |
+-----------------------------------------------+
1 row in set (18.77 sec)

Can MySQL make you toast?

The famous MySQL Bug #2, submitted 12 Sep 2002, states that "MySQL Connector/J doesn't make toast". With an Intel Edison and an Arduino-compatible breakout board it is now trivial to fix this bug: not only MySQL Connector/J but also the MySQL server itself can make you toast! This can be done via a UDF or, in MySQL 5.7, with Query Rewrite Plugins, so you can execute the MySQL query:

mysql> make toast;

For the actual implementation you can either "hack" an existing toaster to interface with the breakout board pins or use an Arduino-compatible Robotic Arm. OK, MySQL, make me toast!
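As a sketch of the UDF route (the function name and shared library here are hypothetical; you would have to compile a library that toggles the breakout board pins and drop it into plugin_dir, while CREATE FUNCTION ... SONAME is the standard UDF registration syntax):

mysql> CREATE FUNCTION make_toast RETURNS INTEGER SONAME 'make_toast.so';
mysql> SELECT make_toast();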

The post Smallest MySQL Server in the World (under $60) which can even make you toast while fixing MySQL bug #2 appeared first on MySQL Performance Blog.

Categories: MySQL

Amazon Aurora – Looking Deeper

MySQL Performance Blog - Mon, 2015-11-16 21:26

Recently my Percona colleague Yves Trudeau and industry colleague Marco Tusa
published their materials on Amazon Aurora. Indeed, Amazon Aurora is a hot topic these days, and we have a stream of customer inquiries regarding this technology. I decided to form my own opinion, and nothing is better than personal, hands-on experience, which I am going to share.

The materials I will refer to:

Presentation [1] gives a good general initial overview. However, there is one statement the presenter made that I'd like to highlight: "Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases."

This does not claim that Amazon Aurora is an open source database, but it is certainly meant to make Aurora look comparable to open source.

I would like to make clear that Amazon Aurora is not open source. Amazon Aurora is based on the Open Source MySQL Edition, but is not open source by itself; it is a proprietary, closed-source database.

By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes and upgrades; in this regard, you are locked in to Amazon RDS. It is not the same lock-in as with a commercial database like Oracle or MS-SQL, though: you should still be able to migrate your application to a community MySQL relatively easily.

Amazon uses a loophole in GPLv2 that allows it not to publish source code in a cloud model. How is Amazon Aurora different from Percona Server or MariaDB, both of which are required to publish their sources? It comes down to the distribution model. GPLv2 places its requirements on the traditional distribution model: if you download software or receive a hard copy of software binaries, you also have the right to request the corresponding source code. This is not the case with cloud computing: there you do not download anything, you just launch an instance. GPLv2 places no restrictions on this case, so Amazon is in compliance with GPLv2.

Bug fixes

Speaking of bug fixes, Amazon Aurora exposes itself as "version: 5.6.10, version_comment: MySQL Community Server (GPL)". MySQL 5.6.10 was released on 2013-02-05, two and a half years ago. It is not clear whether Aurora includes 2.5 years' worth of bug fixes and just did not update the version, or whether the binaries are really based on a 2.5-year-old code base.

For example, let’s take the MySQL bug http://bugs.mysql.com/bug.php?id=70577, which was fixed in 5.6.15. I do not see the bug fix present in Amazon Aurora 5.6.10.

Another bug, http://bugs.mysql.com/bug.php?id=68041, was fixed in MySQL 5.6.13 but is still present in Amazon Aurora.

What about InnoDB's code base? The bug http://bugs.mysql.com/bug.php?id=72548, with InnoDB fulltext search, was fixed in MySQL 5.6.20 (released more than a year ago, on 2014-07-31) and is still present in Amazon Aurora. This leaves me with the impression that the general Aurora codebase was not updated recently.

It seems Amazon does change innodb_version, though. Right now it is 1.2.10; a couple of weeks ago it was 1.2.8.

My question here is: does Amazon have the ability to keep up with MySQL bug fixes and regularly update their software? So far it does not seem so.

Amazon Aurora architecture

My understanding of Amazon Aurora in simplified form is the following:

That is, all Aurora instances share the same storage, which makes it very easy to start new "Reader" instances over the same data.

Communication between the Writer (only one Writer is allowed) and the Readers is done by transferring records similar to InnoDB redo log records. This really limits how many Writers you can have (only one): I do not believe it is possible to implement a proper transactional coordinator between two Writers based on redo records.

A similar mechanism is used to update the data stored on the shared storage: Aurora just applies redo log records to the data.

So, updating data this way, Aurora is able to:

  • Avoid data writes to storage
  • Avoid binary logs
  • Avoid InnoDB transactional logs
  • Disable doublewrites
  • Disable InnoDB checksums

Aurora makes claims about significant performance improvements, but we need to keep in mind that EACH WRITE goes directly to storage and has to be acknowledged by 4 out of 6 copies (synchronous writes). The Aurora Writer works in some kind of "write through" mode; as I understand it, this is needed to make Readers see changes immediately. I expect this also comes with a performance penalty, so whether the performance gain is bigger than the penalty will depend on the workload.

Now, I should give credit to the Amazon engineering team for a proper implementation of shipping and applying transactional redo logs. It must have required a lot of work to change the InnoDB engine, and as we see it took probably a year (from the announcement to the general availability) for Amazon to stabilize Aurora. Too bad Amazon keeps their changes closed, even when the main codebase is an open source database.

Work with transactional isolation

Distributed computing is especially complicated from a transactional processing point of view (see for example a story), so I also wanted to check how Amazon Aurora handles transactional isolation levels on different instances. Fortunately for Aurora, there is an easy way out: only read statements are allowed on Readers. But we can still check isolation in some cases.

It seems that the only TRANSACTION ISOLATION level supported is REPEATABLE-READ. When I try to change to SERIALIZABLE or READ-COMMITTED, Aurora accepts this without an error but silently ignores it; tx_isolation stays REPEATABLE-READ.

mysql> SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@GLOBAL.tx_isolation, @@tx_isolation;
+-----------------------+-----------------+
| @@GLOBAL.tx_isolation | @@tx_isolation  |
+-----------------------+-----------------+
| REPEATABLE-READ       | REPEATABLE-READ |
+-----------------------+-----------------+
1 row in set (0.00 sec)

mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@GLOBAL.tx_isolation, @@tx_isolation;
+-----------------------+-----------------+
| @@GLOBAL.tx_isolation | @@tx_isolation  |
+-----------------------+-----------------+
| REPEATABLE-READ       | REPEATABLE-READ |
+-----------------------+-----------------+
1 row in set (0.00 sec)

Actually, here I faced another worrisome behavior: silent changes in Aurora, without notification.

I am pretty sure that when I tried to use the SERIALIZABLE level a couple of weeks ago, it failed with the error "SERIALIZABLE is not supported". Now it is just silently ignored, so I assume Amazon continues to make changes. I guess this is one of the changes from innodb_version 1.2.8 to 1.2.10. Is there a full changelog we can see?

The lack of SERIALIZABLE level is not a big problem in my opinion. In the end, we know that Percona XtraDB Cluster does not support it either.

But not being able to use READ-COMMITTED might be an issue for some applications; you need to check that your application works properly with READ-COMMITTED silently set to REPEATABLE-READ.

I found another unusual behavior between the Reader and Writer when I tried to execute an ALTER TABLE statement on the Writer (keeping the data dictionary synchronized is another hard area for clusters).
Scenario:
READER:
execute a long SELECT col1 FROM tab1
WRITER:
while the SELECT is running, execute ALTER TABLE tab1 ADD COLUMN col2;
Effect: the SELECT on the READER fails immediately with the error: "ERROR 1866 (HY000): Query execution was interrupted on a read-only database because of a metadata change on the master"

So there again, I think Aurora does its best given the architectural limitations and the one-directional communication: it just chooses to kill read statements on the Readers.

Query cache

I should highlight the improvements to the query cache as a good enhancement. The query cache is enabled by default, and Amazon fixed the major issue with the MySQL query cache, namely that update queries may stall for a long time waiting on the invalidation of query cache entries. This problem does not exist in Amazon Aurora. Amazon also adjusted the query cache to work properly on the Writer-Reader pair: the query cache on a Reader gets invalidated when data is changed on the Writer.
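If you want to verify how the query cache is configured on an instance, the standard variables are still visible:

mysql> SHOW GLOBAL VARIABLES LIKE 'query_cache%';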

Config review

Let’s make a quick review of the MySQL configuration that Aurora proposes.

These are variables which are set by default and you can’t change:

| innodb_change_buffering        | none |
| innodb_checksum_algorithm      | none |
| innodb_checksums               | OFF  |
| innodb_doublewrite             | OFF  |
| innodb_flush_log_at_trx_commit | 1    |

The disabled doublewrite and checksums are not a surprise; I mentioned this above.
Also, innodb_flush_log_at_trx_commit is strictly set to 1; I think this is also related to how Aurora deals with InnoDB redo log records.

The disabled innodb_change_buffering is also interesting, and it can't be good for performance. I guess Amazon had to disable any buffering of updates so that changes are immediately written to the shared storage.

Diagnostic

Traditionally, RDS does not provide good access to system metrics like vmstat and iostat, which makes troubleshooting quite challenging. The MySQL slow log is also not available, so that leaves us with only PERFORMANCE_SCHEMA (which is OFF by default).

One good thing is that we can access SHOW GLOBAL STATUS, and this can also be used for monitoring software.
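For example, a monitoring script can simply poll the counters it cares about; the pattern below is just an illustration:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool%';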

Interestingly, even after a heavy write load these counters always stay at 0:

| Innodb_buffer_pool_pages_dirty   | 0 |
| Innodb_buffer_pool_bytes_dirty   | 0 |
| Innodb_buffer_pool_pages_flushed | 0 |
| Innodb_pages_written             | 0 |

I think this supports my guess about the architecture, that Aurora does not keep any changed pages in memory and just directly writes everything to storage.

Final thoughts

I will follow up this post with my benchmarks, as I expect the proposed architecture comes with serious performance penalties, although removing binary logs and eliminating doublewrites should show a positive improvement.

In general I think Amazon Aurora is a quite advanced proprietary version of MySQL. It is not revolutionary, however, and certainly not a "reimagined relational database" as Amazon presents it. This technology does not address the problem of scaling writes or sharding, and it does not handle cross-node transactions.

As for other shortcomings: there is no public bug database, no detailed user documentation, and Aurora's code is based on an old MySQL source base.

The post Amazon Aurora – Looking Deeper appeared first on MySQL Performance Blog.

Categories: MySQL

pt-online-schema-change and innodb_stats_persistent on MySQL 5.6

MySQL Performance Blog - Fri, 2015-11-13 16:46

Recently we released an updated version of Percona Toolkit with an important update for pt-online-schema-change if you use MySQL 5.6. A new parameter has been added: --analyze-before-swap. What does it do? Let me explain it in more detail.

A bug not a bug

A customer contacted us because pt-online-schema-change caused hundreds of queries to get stuck after the ALTER was done. All those queries were trying to read from the altered table, but for some reason they never finished. Of course, it caused downtime. The reason behind this is this "bug not a bug":

http://bugs.mysql.com/bug.php?id=78289

In summary, if you are running 5.6 with persistent stats enabled (which they are by default), the optimizer in some cases can choose a bad execution plan because it lacks the statistics to make a good decision. Even a simple query with a WHERE condition looking up a value on the PK can switch to a full table scan, because the optimizer has no idea there is a PK.
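One way to spot this is with EXPLAIN (the table name and value below are hypothetical): a healthy primary-key lookup should show type: const against the PRIMARY key, while the bad plan degrades to type: ALL, i.e. a full table scan.

mysql> EXPLAIN SELECT * FROM mytable WHERE id = 42\G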

There are two ways to force the index statistics to be recalculated:

  • Wait until the background thread recalculates the statistics of the new table. This could take longer than expected if the load on the server is really high.
  • Run ANALYZE TABLE.

So --analyze-before-swap does exactly that: it runs ANALYZE TABLE on the new table before the table swap is done. That means the ANALYZE on the new table (before the rename) does NOT affect queries against the table we are altering, but it could affect the triggers' operations for the new table.

ANALYZE TABLE is not the best solution either. Check out this blog post from Peter: https://www.percona.com/blog/2008/09/02/beware-of-running-analyze-in-production/

If pt-online-schema-change finds that the running version is 5.6 and that persistent stats are enabled, the option is enabled by default. In case you want to disable this feature, you need to add --no-analyze-before-swap.
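For example, a typical invocation with the feature explicitly disabled might look like this (the database, table and ALTER statement are placeholders):

pt-online-schema-change --alter "ADD COLUMN c1 INT" --no-analyze-before-swap --execute D=mydb,t=mytable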

The post pt-online-schema-change and innodb_stats_persistent on MySQL 5.6 appeared first on MySQL Performance Blog.

Categories: MySQL

Logical MySQL backup tool Mydumper 0.9.1 now available

MySQL Performance Blog - Thu, 2015-11-12 15:32

The new Mydumper 0.9.1 version, which includes many new features and bug fixes, is now available.  You can download the code from here.

A significant change included in this version is that Mydumper can now handle all schema objects! So there is no longer a dependency on mysqldump to ensure complex schemas are backed up alongside the data.

Let’s review some of the new features:

Full schema support for Mydumper/Myloader

Mydumper now takes care of backing up the schema, including Views and Merged tables. As a result, we now have these new associated options:

-d, --no-data Do not dump table data
-G, --triggers Dump triggers
-E, --events Dump events
-R, --routines Dump stored procedures and functions

These options are not enabled by default, to keep backward compatibility with existing mixed solutions that use mysqldump for DDL.

Reduced locking options

--trx-consistency-only      Transactional consistency only

You can think of this as --single-transaction for mysqldump, but still with the binlog position. Obviously this position only applies to transactional tables (TokuDB included). One of the advantages of using this option is that the global read lock is only held for the thread coordination, so it's released as soon as the transactions are started.
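A sketch of how this might look on the command line (the connection options and output path are placeholders):

mydumper --trx-consistency-only --user=backup --password=secret --outputdir=/backups/dump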

GTIDs and Multisource Slave 

GTIDs are now recorded in the metadata file. Also, Mydumper is now able to detect a multisource slave (MariaDB 10.1.x) and will record all the slaves' coordinates.

Myloader single database restore

Until now the only option was to copy the database files to a different directory and restore from it. However, we now have a new option available:

-s, --source-db                   Database to restore

It can also be used in combination with -B, --database to restore to a different database name.
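For example, restoring only db1 from a full dump into a differently named database might look like this (paths and names are placeholders):

myloader --directory=/backups/dump --source-db=db1 --database=db1_copy --user=root --password=secret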

Full list of Bug Fixes:

#1431410 innodb stats tables
#1440403 *-post and *-triggers compressed files corrupt
#1470891 functions may be needed by SP and views
#1390437 segmentation fault against Percona MySQL 5.6.15-63.0
#1446280 Segmentation fault on Debian Wheezy
#1399715 Typo in --tables-list option in manpage
#1428608 missing -K option in mydumper manpage
#1440437 myloader: wrong database name in message when -B used
#1457091 tokudb detection doesn’t work
#1481747 Unable to compile r179 WITH_BINLOG=ON (undeclared ‘bj’)
#1507574 Assertion when broken mrg tables
#841651 dump view definitions
#1485688 make compile error myloader.c:209

 

The post Logical MySQL backup tool Mydumper 0.9.1 now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Toolkit 2.2.16 is now available

MySQL Performance Blog - Mon, 2015-11-09 21:44

Percona is pleased to announce the availability of Percona Toolkit 2.2.16, released on November 9, 2015. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. It includes new features and bug fixes as well as continued preparation for MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New Features:

  • 1491261: When using MySQL 5.6 or later with the innodb_stats_persistent option enabled (which it is by default), pt-online-schema-change will now run with the --analyze-before-swap option. This ensures that queries continue to use a correct execution path instead of switching to a full table scan, which could cause downtime. If you do not want pt-online-schema-change to run ANALYZE on new tables before the swap, you can disable this behavior using the --no-analyze-before-swap option.
  • 1402051: pt-online-schema-change will now wait forever for slaves to be available and not be lagging. This ensures that the tool does not abort during faults and connection problems on slaves.
  • 1452895: pt-archiver now issues ‘keepalive’ queries during and after bulk insert/delete process that takes a long time. This keeps the connection alive even if the innodb_kill_idle_transaction variable is set to a low value.

Bug Fixes:

  • 1488685: The --filter option for pt-kill now works correctly.
  • 1494082: The pt-stalk tool no longer uses the -warn option when running find, because the option is not supported on FreeBSD.

Details of the release can be found in the release notes and the 2.2.16 milestone on Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.16 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.6.27-75.0 is now available

MySQL Performance Blog - Thu, 2015-11-05 20:45

Percona is glad to announce the release of Percona Server 5.6.27-75.0 on November 5, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.27, including all the bug fixes in it, Percona Server 5.6.27-75.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.27-75.0 milestone on Launchpad.

New Features:

  • Percona Server is now available for Ubuntu 15.10 (Wily).
  • TokuDB MTR tests have been integrated into Percona Server.
  • Linux thread ID is now available in the PROCESSLIST table.
  • Percona Server has now re-enabled savepoints in triggers and stored functions.
  • Variables innodb_buffer_pool_populate and numa_interleave have been mapped to the upstream implementation of the new innodb_numa_interleave option.

Bugs Fixed:

  • Fixed transactional inconsistency with rollback in TokuDB. Rolling back a large transaction with many inserts/updates/deletes could result in some of the changes being committed rather than rolled back. Bug fixed #1512455.
  • Variable tokudb_backup_exclude was not excluding files correctly. Bug fixed #1512457.
  • TokuDB could crash under load if transaction-isolation level READ-COMMITTED was used. Bug fixed #1501633.
  • TokuDB thread pool names were missing in the SHOW ENGINE tokudb STATUS which caused duplicate entries. Bug fixed #1512452.
  • Manipulating the innodb_track_redo_log_now variable dynamically would crash the server if it was started without innodb_track_changed_pages enabled. This variable is available on debug builds only. Bug fixed #1368530.
  • If the user had duplicate pid-file options in config files when running yum upgrade, the upgrade would stop with an error because the same pid file specified twice was treated as a duplicate. Bug fixed #1454917.
  • On some filesystems the server would not start if the XtraDB changed page tracking feature was enabled and the innodb_flush_method variable was set to O_DIRECT. Bugs fixed #1500720 and #1498891.
  • When User Statistics are enabled, executing any statement of the SHOW family with a non-empty result would erroneously bump USER_STATISTICS.ROWS_FETCHED column values. Bug fixed #1510953.
  • A write operation with innodb_fake_changes enabled could cause a server assertion if it followed the pessimistic B-tree update path internally. Bug fixed #1192898.
  • An online DDL operation could have caused server crash with fake changes enabled. Bug fixed #1226532.
  • Fixed the conflicting meta packages between 5.1, 5.5, and 5.6 release series in Debian and Ubuntu distributions. percona-server-server and percona-server-client meta packages now point to the latest 5.6 release. Bug fixed #1292517.
  • INNODB_CHANGED_PAGES table was unavailable with non-default innodb_data_home_dir setting if the variable had a trailing slash. Bug fixed #1364315.
  • Changing innodb_fake_changes variable value in the middle of a transaction would have an immediate effect, that is, making part of the transaction run with fake changes enabled and the rest with fake changes disabled, resulting in a broken transaction. Fixed by making any innodb_fake_changes value changes becoming effective at the start of the next transaction instead of the next statement. Bug fixed #1395579.
  • UPDATE statement could crash the server with Support for Fake Changes enabled. Bug fixed #1395706.
  • Startup would fail due to a small hard-coded timeout value in the init script for the pid file to appear. This has been fixed by creating a default file for Debian init script timeout parameters in /etc/default/mysql. Bug fixed #1434022.
  • CMakeLists.txt for tokudb-backup-plugin was missing Valgrind dependency. Bug fixed #1494283.
  • Percona Server would fail to install on CentOS 7 if mariadb-devel package was already installed. Bug fixed #1499721.
  • Fixed suboptimal Support for Fake Changes handling in online ALTER storage engine API. Bug fixed #1204422.
  • The upstream bug #76627 was not fixed for the ALL_O_DIRECT case. Bug fixed #1500741.
  • Fixed multiple TokuDB clang build issues. Bug fixed #1512449.

Other bugs fixed: #1204443, #1384632, #1475117, #1512301, #1452397, #1160960, #1495965, and #1497942.

Release notes for Percona Server 5.6.27-75.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

The post Percona Server 5.6.27-75.0 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.5.46-37.5 is now available

MySQL Performance Blog - Thu, 2015-11-05 20:44


Percona is glad to announce the release of Percona Server 5.5.46-37.5 on November 5, 2015. Based on MySQL 5.5.46, including all the bug fixes in it, Percona Server 5.5.46-37.5 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.46-37.5 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features:

  • Percona Server is now available for Ubuntu 15.10 (Wily).

Bugs Fixed:

  • Manipulating the innodb_track_redo_log_now variable dynamically would crash the server if it was started without innodb_track_changed_pages enabled. This variable is available on debug builds only. Bug fixed #1368530.
  • A potential crash in handling corrupted tables with innodb_corrupt_table_action warn or salvage values has been fixed. Bug fixed #1426610.
  • If the user had duplicate pid-file options in config files when running yum upgrade, the upgrade would stop with an error because the same pid file specified twice was treated as a duplicate. Bug fixed #1454917.
  • On some filesystems the server would not start if the XtraDB changed page tracking feature was enabled and the innodb_flush_method variable was set to O_DIRECT. Bugs fixed #1500720 and #1498891.
  • When User Statistics are enabled, executing any statement of the SHOW family with a non-empty result would erroneously bump USER_STATISTICS.ROWS_FETCHED column values. Bug fixed #1510953.
  • Fixed the conflicting meta packages between 5.1, 5.5, and 5.6 release series in Debian and Ubuntu distributions. percona-server-server and percona-server-client meta packages now point to the latest 5.6 release. Bug fixed #1292517.
  • INNODB_CHANGED_PAGES table was unavailable with non-default innodb_data_home_dir setting if the variable had a trailing slash. Bug fixed #1364315.
  • UPDATE statement could crash the server with Support for Fake Changes enabled. Bug fixed #1395706.
  • Changing innodb_fake_changes variable value in the middle of a transaction would have an immediate effect, that is, making part of the transaction run with fake changes enabled and the rest with fake changes disabled, resulting in a broken transaction. Fixed by making any innodb_fake_changes value changes becoming effective at the start of the next transaction instead of the next statement. Bug fixed #1395579.
  • Startup would fail due to a small hard-coded timeout value in the init script for the pid file to appear. This has been fixed by creating a default file for Debian init script timeout parameters in /etc/default/mysql. Bug fixed #1434022.
  • Percona Server would fail to install on CentOS 7 if mariadb-devel package was already installed. Bug fixed #1499721.
  • The upstream bug #76627 was not fixed for the ALL_O_DIRECT case. Bug fixed #1500741.

Other bugs fixed: #1512301, #1160960, and #1497942.

Release notes for Percona Server 5.5.46-37.5 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.46-37.5 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL 5.7: CHANGE REPLICATION FILTER online

MySQL Performance Blog - Thu, 2015-11-05 01:16

MySQL 5.7 has a lot of enhancements and new features. I summarized this list previously in this blog post.

Adding replication filters online is one of the MySQL 5.7 features described in this manual; I will describe and summarize a few examples in this blog post.

Filtering replication events is also known as Partial Replication. Partial replication can be done from the master or slave. Filtering events on the master server via binlog-do-db and binlog-ignore-db is not a good idea, as explained in this post. However, if you really need it, partial replication on the slave is a better option. Our CEO Peter Zaitsev wrote a detailed blog post some time back on filtered MySQL replication that you may find useful.

Partial replication works differently for statement-based and row-based replication. You can find details in the manual and on this manual page. It is useful to know how MySQL evaluates partial replication rules.

Prior to MySQL 5.7, adding or changing replication rules required bouncing the MySQL server. In MySQL 5.7, adding/changing replication filter rules becomes an online operation, without restarting the MySQL server, using the CHANGE REPLICATION FILTER command. Let me share a few examples:

Currently, the slave server runs without filtered replication; this can be verified from the slave status via the Replicate_* variables below, which are all empty, meaning no replication filter rules are set.

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 351
               Relay_Log_File: centos59-relay-bin.000003
                Relay_Log_Pos: 566
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
...

The master database server contains db1-db4. Let’s replicate only db1 and db2 out of the four databases. This can be done with the help of replicate-do-db option.

mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (db1, db2);
ERROR 3017 (HY000): This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD first

This error occurs because CHANGE REPLICATION FILTER is not supported on a running slave. We have to stop the slave SQL thread first and then re-run the command to set the replicate-do-db option. Checking the slave status confirms that the slave now only replicates db1 and db2 out of the four databases on the master.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (db1, db2);
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 505
               Relay_Log_File: centos59-relay-bin.000003
                Relay_Log_Pos: 720
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: db1,db2
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
...

In order to remove a filter, you set it to an empty value; i.e., replicate-do-db as below. The slave status verifies that there are no replication filters set against the replicate-do-db variable.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)

mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = ();
Query OK, 0 rows affected (0.00 sec)

mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000002
          Read_Master_Log_Pos: 1629
               Relay_Log_File: centos59-relay-bin.000003
                Relay_Log_Pos: 1844
        Relay_Master_Log_File: master-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
...

Moreover, multiple, different replication filters can be set in one command and should be separated with a comma as below:

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)

mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('db1.db1_new%'), REPLICATE_WILD_IGNORE_TABLE = ('db1.db1_old%');

mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 448
               Relay_Log_File: centos59-relay-bin.000006
                Relay_Log_Pos: 663
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: db1.db1_new%
  Replicate_Wild_Ignore_Table: db1.db1_old1%
...

The slave status verifies that a couple of replication filters are now set: binary log events for db1.db1_new% tables are replicated to the slave, while replication events for db1.db1_old% tables are ignored on the slave as per the Replicate_Wild_Ignore_Table filter. If the database or table name doesn't contain any special characters, it doesn't need to be quoted as a filter value. However, Replicate_Wild_Do_Table and Replicate_Wild_Ignore_Table are string expressions and may contain wildcards, so they must be quoted.

With the CHANGE REPLICATION FILTER command you can't set the same filtering rule multiple times, as opposed to my.cnf, where you can set multiple filters for the same rule by specifying it multiple times. With the CHANGE REPLICATION FILTER command, if you try to set multiple filters for the same rule, only the last rule will be activated and all the rules above it will be ignored, as illustrated in the following example:

mysql> SELECT * FROM db1.db1_old;
Empty set (0.00 sec)

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)

mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl1%'), REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl2%');

mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 980
               Relay_Log_File: centos59-relay-bin.000006
                Relay_Log_Pos: 1195
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: db2.db2_tbl2%
  Replicate_Wild_Ignore_Table:
...

As you can see, the db2.db2_tbl1 rule is ignored and only the last rule, for the db2.db2_tbl2 table, is activated.
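For reference, this is a sketch of the my.cnf style where repeating the same rule IS allowed (these lines only take effect after a server restart):

[mysqld]
replicate-wild-do-table = db2.db2_tbl1%
replicate-wild-do-table = db2.db2_tbl2%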
As I mentioned before, to unset filters of any given type you need to set that particular filter to an empty value. The example below will unset the Replicate_Wild_Do_Table filter.

mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ();

However, you may set multiple filters in one command by separating each rule with a comma as in the previous example with Replicate_Do_DB. Let’s set multiple rules for the Replicate_Wild_Do_Table option via the CHANGE REPLICATION FILTER command.

mysql> STOP SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.03 sec)

mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl1%','db2.db2_tbl2%');
Query OK, 0 rows affected (0.00 sec)

mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.130
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000008
          Read_Master_Log_Pos: 154
               Relay_Log_File: centos59-relay-bin.000013
                Relay_Log_Pos: 369
        Relay_Master_Log_File: master-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: db2.db2_tbl1%,db2.db2_tbl2%
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
...

Conclusion:
Partial replication is not a great solution in most cases. The replication options replicate-do-table, replicate-ignore-table, replicate-wild-do-table and replicate-wild-ignore-table all work differently. For the filtering to work as expected you need to use the default database, and the behavior differs between binlog formats. Filters other than replicate-wild-do-table and replicate-wild-ignore-table might not work as expected, and with wildcard filters stored procedures and stored functions may end up inconsistent.

The post MySQL 5.7: CHANGE REPLICATION FILTER online appeared first on MySQL Performance Blog.

Categories: MySQL

A first look at RDS Aurora

MySQL Performance Blog - Mon, 2015-11-02 20:23

Recently, I happened to have an onsite engagement and the goal of the engagement was to move a database service to RDS Aurora. Like probably most of you, I knew the service by name but I couldn’t say much about it, so, I Googled, I listened to talks and I read about it. Now that my onsite engagement is over, here’s my first impression of Aurora.

First, let's describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. In order to set up an Aurora instance, you go to the RDS console and either launch a new instance choosing Aurora as the type, or create a snapshot of an RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you add reader nodes to an existing cluster. An Aurora cluster minimally consists of a writer node, but you can add up to 15 reader nodes (only one writer though). It is at the storage level that things become interesting. Aurora doesn't rely on filesystem-type storage, at least not from a database standpoint; it has its own special storage service that is replicated locally and to two other AZs automatically, for a total of six copies. Furthermore, you pay only for what you use, and the storage grows/shrinks automatically in increments of 10 GB, which is pretty cool. You can have up to 64 TB in an Aurora cluster.

Now, all that is fine, but what are the benefits of using Aurora? I must say I barely used Aurora; one week is not field-proven experience. These are claims by Amazon but, as we will discuss, there are some good arguments in favor of them.

The first claim is that the write capacity is increased by up to 4x. So, even if only a single instance is used as the writer in Aurora, you get up to 400% of the write capacity of a normal MySQL instance. That's quite huge and amazing, but it basically means replication is asynchronous at the storage level, at least for the multi-AZ part, since the latency would otherwise be a performance killer. Locally, Aurora uses a quorum-based approach with the storage nodes. Given that the object store is a separate service with its own high-availability configuration, that is a reasonable trade-off. For comparison, clustering solutions with Galera like Percona XtraDB Cluster typically lower the write capacity since all nodes must synchronize on commit. Other claims are that the readers' performance is unaffected by the clustering and that the readers have almost no lag relative to the writer. Furthermore, as if that were not enough, readers can't diverge from the master. Finally, since there's no lag, any reader can replace the writer very quickly, so in terms of failover, all is right.

That seems almost too good to be true; how can it be possible? I happen to be interested in object stores, Ceph especially, and I was toying with the idea of using Ceph to store InnoDB pages. It appears that the Amazon team did a super great job at putting an object store under InnoDB and they went way further than what I was thinking. Here, I may be speculating a bit and I would be happy to be found wrong. The writer never writes dirty pages back to the store… it only writes fragments of InnoDB log to the object store as objects, one per transaction, and notifies the readers of the set of pages that have been updated by this fragment log object. Just have a look at the show global status of an Aurora instance and you’ll see what I mean… Said otherwise, it is like having an infinitely large set of InnoDB log files; you can’t reach the max checkpoint age. Also, if the object store supports atomic operations, there’s no need for the double-write buffer, a high source of contention in MySQL. Just those two aspects are enough, in my opinion, to explain the up to 4x performance claim for the write capacity, but also considering the amount of writes and the log files are a kind of binary diff, that’s usually much less stuff to write than whole pages.

Something is needed to remove the fragment log objects, since over time, the accumulation of these log objects and the need to apply them would impact performance, a phenomenon called log amplification. With Aurora, that seems to be handled at the storage level and the storage system is wise enough to know that a requested page is dirty and apply the log fragments before sending it back to the reader. The shared object store can also explain why the readers have almost no lag and why they can’t diverge. The only lag the readers can have is the notification time which has to be short if within the same AZ.

So, how does Aurora compare to a technology like Galera?

Pros:

  • Higher write capacity, writer is unaffected by the other nodes
  • Simpler logic, no need for certification
  • No need for an SST to provision a new node
  • Can’t diverge
  • Scale iops tremendously
  • Fast failover
  • No need for quorum (handled by the object store)
  • Simple to deploy

Cons:

  • Likely asynchronous at the storage level
  • Only one node is writable
  • Not open source

Aurora is a mind shift in terms of databases and a jewel in the hands of Amazon. OpenStack currently has no database service that can offer similar features. I wonder how hard it would be to produce an equivalent solution using well-known open source components like Ceph for the object store and corosync, ZooKeeper or ZeroMQ for the communication layer. And would there be a use case?

The post A first look at RDS Aurora appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL and Trojan.Chikdos.A

MySQL Performance Blog - Thu, 2015-10-29 16:23

Symantec published a blog post yesterday regarding MySQL and Trojan.Chikdos.A, as can be seen here.

The Symantec post gives detail on the behavior of the Trojan and its effects on the Windows system registry, yet gives little detail as to how the required first stage (namely a malicious UDF) is injected, citing:

“In the latest Chikdos campaign that we observed, the attackers likely used an automated scanner or possibly a worm to compromise MySQL servers and install the UDF.”

I’m going to give my thoughts on the pre-requisites to infection here.

  1. The MySQL server has poor network isolation (i.e., it is likely accessible from everywhere, e.g. the shodan.io examples)
  2. The MySQL user has poor (read: overly permissive) grants, e.g. "GRANT ALL"
  3. The MySQL @@GLOBAL.plugin_dir is writable (likely a poor file ACL and no mandatory access control such as SELinux / AppArmor)

This is pretty much the exact same method I discussed at PLUK13 and updated for discussion again at PLMCE14. There are YouTube videos embedded in each showing the attack aimed at a Linux deployment. The above three prerequisite points, however, should still hold true on Windows.
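If you want to check the second and third prerequisites on your own server, a couple of quick read-only statements will do it (then verify at the OS level that the reported plugin directory is not writable by the mysqld user):

mysql> SHOW GRANTS FOR CURRENT_USER();
mysql> SHOW GLOBAL VARIABLES LIKE 'plugin_dir';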

So what is the real takeaway here?

  1. Ensure you properly isolate your MySQL service on the network, allowing access only from known hosts that require it.
    1. There really is no reason at all to have the service accessible from everywhere on the Internet.
  2. Ensure your user grants follow the Principle of Least Privilege; i.e., only grant the access required for the user to do the job they are supposed to be doing.
  3. Never EVER chmod 777 / setenforce 0.

I can see that last point raising some arguments; please keep in mind we're talking about the security of your MySQL data here. Simply turning off security measures "to make it work" is a recipe for disaster.

The post MySQL and Trojan.Chikdos.A appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL 5.7 first impressions on group-replication

MySQL Performance Blog - Thu, 2015-10-29 13:55

During the last few weeks I've been testing and playing a bit with the new group-replication plugin available for MySQL 5.7. Before continuing, I'd like to clarify some aspects: the plugin is only available in labs and is not yet ready for production. The current version is 0.6. I used 5.7.9 GA running in a 3-node Vagrant cluster with CentOS 7.
As an additional note, I've tested the previous version of the plugin (0.5) against 5.7.8-rc, and there are some good changes, so I recommend starting with the GA version.

For my tests I followed the instructions from this post. It's not as straightforward as it looks; there were some issues that needed to be handled, but I finally managed to get a consistent 3-node cluster running:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2       |        3306 | ONLINE       |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1       |        3306 | ONLINE       |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3       |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

My impressions about installation

1- The previous version relied only on Corosync for group communication, meaning you needed to install and configure an extra piece of software. For those who aren't very familiar with it (like me), it can be a bit tricky and complex. As of version 0.6.0 for MySQL 5.7.9, a new communication engine called XCom was added. This is now the default communication framework and is included in the plugin, so no extra pieces of software are needed.

2- When you initialize the MySQL database (the mysqld --initialize function now replaces the mysql_install_db script) you need to disable binary logging in the configuration file, otherwise statements such as 'create database mysql' will be pushed to the binary logs and cause issues with nodes joining the cluster due to errors like:
2015-10-21T20:18:52.059231Z 8 [Warning] Slave: Can't create database 'mysql'; database exists Error_code: 1007
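A sketch of the sequence I mean, assuming log-bin is normally set in my.cnf (--skip-log-bin is the standard server option that disables binary logging for a run):

# initialize the datadir with binary logging disabled, then start mysqld normally
mysqld --initialize --user=mysql --skip-log-bin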

3- In group replication there isn't a concept like SST (State Snapshot Transfer), which in Galera basically drops and recreates the datadir if it finds data discrepancies. With group replication you may end up having different datasets and replication will continue working (more on this later in the post).

4- For incremental state transfer (a.k.a. IST in Galera), group replication relies on the binary logs present on any of the potential donors (at the moment the donor is selected randomly). So, if a node is disconnected, when it comes back online it requests binary logs from the donor using the same IO thread as regular replication. The problem here is that if the binary log has been purged on the donor, then the joiner can't be recovered and a full rebuild is needed. This is a similar approach to the gcache in Galera, except that when the gcache is not able to provide the transactions needed for IST, an SST is performed instead. Group replication can't do this (yet?).

These are some of the installation issues I faced. Now that we have the cluster running, what works? Well let’s try some samples.

Simple write tests
I tried running simple write operations like a few inserts, create tables and so on using sysbench like this:

sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run

And checked the status on the other nodes. It does what it is supposed to do; the data and records are found on the rest of the nodes, just like this:
Node1:

mysql> select * from sbtest1 order by 1 limit 4;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k     | c                                                                                                                       | pad                                                         |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 |
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
| 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 |
| 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
4 rows in set (0.08 sec)

Node2

mysql> select * from sbtest1 order by 1 limit 4;
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k     | c                                                                                                                       | pad                                                         |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 |
| 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 |
| 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 |
| 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 |
+----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
4 rows in set (0.08 sec)

Well, we expected this, so, yay, we are fine.

What about trying to write to two nodes at the same time? This should trigger things like conflict resolution during certification. In a nutshell, if we expect to use group replication to write to multiple nodes at the same time, we need a way to resolve conflicts in the data. These are most commonly PK/UK violations; in other words, two transactions trying to insert the same record/id. This is not recommended, because it is not an approach we can use to scale up writes (same as Galera), but it's still possible to do.

An easier way to test is to run sysbench in more than one member of a cluster and wait for a failure. As expected, it does what it is supposed to do:
Node1

[root@node1 data]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run |grep tps
[   1s] threads: 8, tps: 0.00, reads: 0.00, writes: 1416.75, response time: 25.65ms (95%), errors: 0.00, reconnects: 0.00
[   2s] threads: 8, tps: 0.00, reads: 0.00, writes: 1893.78, response time: 20.94ms (95%), errors: 0.00, reconnects: 0.00
[   3s] threads: 8, tps: 0.00, reads: 0.00, writes: 1421.19, response time: 28.44ms (95%), errors: 0.00, reconnects: 0.00
[   4s] threads: 8, tps: 0.00, reads: 0.00, writes: 1397.83, response time: 34.92ms (95%), errors: 0.00, reconnects: 0.00
[   5s] threads: 8, tps: 0.00, reads: 0.00, writes: 1734.31, response time: 22.75ms (95%), errors: 0.00, reconnects: 0.00

Node2

[root@node2 vagrant]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run
WARNING: Both max-requests and max-time are 0, running endless test
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored

Threads started!

ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12608`:
ALERT: Error 1180 Got error 149 during COMMIT
FATAL: failed to execute function `event': (null)
ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12468`:
ALERT: Error 1180 Got error 149 during COMMIT
FATAL: failed to execute function `event': (null)

Node2 eventually failed, but what happened? Let's check the error log to see what it reports:

2015-10-27T17:12:33.894759Z 3 [Note] InnoDB: Blocking transaction: ID: 223804 - Blocked transaction ID: 223820 - MySQL thread id 59, OS thread handle 139838350866176, query id 197913 localhost root query end UPDATE sbtest5 SET k=k+1 WHERE id=12510

At the commit stage there was a conflict with an already committed transaction in Node1, so it forced a failure and a rollback of the operation. So far, so good.

What about a node going down?
One of the tests I ran was to kill one of the nodes during operations, to see if it resumes replication properly when it comes back to life. For this we need to set some variables in the configuration file, as follows:

loose-group_replication_group_name="8a94f357-aab4-11df-86ab-c80aa9429562"
loose-group_replication_start_on_boot=1
loose-group_replication_recovery_user='rpl_user'
loose-group_replication_recovery_password='rpl_pass'

Note: Interestingly, the replication credentials are not saved into a table (as is done with slave_master_info in regular replication). I guess this is still on the to-do list, but it is something to keep in mind, since keeping plain-text credentials in the configuration file implies a security risk.
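If you want to check what the server actually persists for each channel, a quick sketch (assuming master_info_repository=TABLE; column names as in MySQL 5.7):

-- Inspect the stored connection metadata per replication channel:
SELECT Channel_name, Host, User_name
FROM mysql.slave_master_info;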

Back to our test. I ran the regular sysbench command on one of my nodes, then went to node2 and killed the mysqld daemon. After the regular crash recovery messages we can see:

[root@node2 data]# killall mysqld [root@node2 data]# tail -500 error.log 2015-10-27T17:15:26.460674Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active. 2015-10-27T17:15:26.460711Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active. 2015-10-27T17:15:26.461001Z 0 [Note] InnoDB: Waiting for purge to start 2015-10-27T17:15:26.514015Z 0 [Note] InnoDB: 5.7.9 started; log sequence number 77370253 2015-10-27T17:15:26.515417Z 0 [Note] Plugin 'FEDERATED' is disabled. 2015-10-27T17:15:26.525466Z 0 [Note] InnoDB: not started 2015-10-27T17:15:26.525914Z 0 [Note] InnoDB: Loading buffer pool(s) from /data/ib_buffer_pool [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configuring Xcom group: XCom Group ID=1827295128 Name=8a94f357-aab4-11df-86ab-c80aa9429562 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.2:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.3:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Total number of peers: 2 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Local Node: 192.168.70.3:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Bootstrap: false 2015-10-27T17:15:26.723392Z 0 [Note] InnoDB: Buffer pool(s) load completed at 151027 17:15:26 2015-10-27T17:15:27.135089Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key 2015-10-27T17:15:27.136449Z 0 [Note] Server hostname (bind-address): '*'; port: 3306 2015-10-27T17:15:27.145198Z 0 [Note] IPv6 is available. 2015-10-27T17:15:27.145247Z 0 [Note] - '::' resolves to '::'; 2015-10-27T17:15:27.145265Z 0 [Note] Server socket created on IP: '::'. 2015-10-27T17:15:27.171615Z 0 [Warning] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.171711Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.172447Z 0 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.173089Z 0 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.192881Z 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.205764Z 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=node2-relay-bin' to avoid this problem. 2015-10-27T17:15:27.676222Z 1 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:27.685374Z 3 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_applier.000002' position: 51793711 2015-10-27T17:15:27.685985Z 0 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 
2015-10-27T17:15:27.686009Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7' 2015-10-27T17:15:27.686017Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 2' [XCOM BINDING DEBUG] ::join() connecting to 192.168.70.3 10300 state 0 action xa_init connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 ... [XCOM BINDING DEBUG] ::join():: I am NOT the boot node. [XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.2:10300 connecting to 192.168.70.2 10300 connected to 192.168.70.2 10300 [XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node cli_err 0 state 3489 action xa_snapshot new state x_recover state 3505 action xa_complete new state x_run get_nodeno(get_site_def()) = 2 task_now() = 1445966128.920615 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12385 1} [XCOM BINDING DEBUG] ::join():: GCS_OK [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. 2015-10-27T17:15:28.926806Z 0 [Note] Event Scheduler: Loaded 0 events 2015-10-27T17:15:28.928485Z 0 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.7.9-log' socket: '/data/mysql.sock' port: 3306 MySQL Community Server (GPL) [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. .... [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data 2015-10-27T17:15:30.084101Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:7' [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 1382963399 2015-10-27T17:15:30.091414Z 5 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/86400' 2015-10-27T17:15:30.107078Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:30.117379Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor e5263489-7cb7-11e5-a8ee-0800275ff74d at node3 port: 3306.' 2015-10-27T17:15:30.118109Z 6 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 
2015-10-27T17:15:30.130001Z 7 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_recovery.000001' position: 4 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 2015-10-27T17:15:30.169817Z 6 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master 'rpl_user@node3:3306' - retry-time: 60 retries: 1, Error_code: 1130 2015-10-27T17:15:30.169856Z 6 [Note] Slave I/O thread for channel 'group_replication_recovery' killed while connecting to master 2015-10-27T17:15:30.169862Z 6 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4 2015-10-27T17:15:30.174955Z 5 [ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server. Check group replication recovery's connection credentials.' 2015-10-27T17:15:30.175573Z 5 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/86400' 2015-10-27T17:15:30.178016Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:30.189233Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor a0ef74a1-7cb3-11e5-845e-0800275ff74d at node1 port: 3306.' 
2015-10-27T17:15:30.190787Z 8 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 2015-10-27T17:15:30.610531Z 8 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_user@node1:3306',replication started in log 'FIRST' at position 4 get_nodeno(get_site_def()) = 2 task_now() = 1445966131.000425 n = 74 (n - old_n) / (task_now() - old_t) = 11.136156

During this process we can check the status in any node as follows:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2       |        3306 | RECOVERING   |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1       |        3306 | ONLINE       |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3       |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2       |        3306 | ONLINE       |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1       |        3306 | ONLINE       |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3       |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

Again, as expected, the node connected to the cluster, requested binary logs from the latest executed GTID position, and applied the remaining changes to get back online.

The final test I've done so far concerns data consistency. For example, what if I stop group replication on a node and make some data changes? When it gets back to replication, will it send those changes?

Let’s see a very simple example:
Node2:

mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.12 sec)

Node1:

mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> stop group_replication; Query OK, 0 rows affected (0.03 sec) mysql> delete from sbtest1 where id=15; Query OK, 1 row affected (0.02 sec) mysql> start group_replication; Query OK, 0 rows affected (0.02 sec) mysql> select * from sbtest1 where id=15; Empty set (0.00 sec)

And now Node2 again:

mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.00 sec)

Hmmm, not cool. What if I try to remove that row from Node2?

mysql> delete from sbtest1 where id=15;
Query OK, 1 row affected (0.01 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2       |        3306 | ONLINE       |
| group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1       |        3306 | ONLINE       |
| group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3       |        3306 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

Hmmm, strange: everything seems to be working correctly. Is it? Let's check node1 again:

mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.01 sec) mysql> stop group_replication; Query OK, 0 rows affected (4.01 sec) mysql> start group_replication; Query OK, 0 rows affected (2.41 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | OFFLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 1 row in set (0.00 sec) mysql> exit Bye [root@node1 data]# tail -100 error.log 2015-10-27T17:52:50.075274Z 15 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Delete_rows event on table test.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 346, Error_code: 1032 2015-10-27T17:52:50.075274Z 15 [Warning] Slave: Can't find record in 'sbtest1' Error_code: 1032 2015-10-27T17:52:50.075274Z 15 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0 2015-10-27T17:52:50.075294Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 2015-10-27T17:52:50.075308Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7' 2015-10-27T17:52:50.075312Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1' [XCOM BINDING DEBUG] ::join() connecting to 192.168.70.2 10300 state 3489 action xa_init connecting to 192.168.70.2 10300 .... XCOM BINDING DEBUG] ::join():: I am NOT the boot node. [XCOM BINDING DEBUG] ::join():: Skipping own address. 
[XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.3:10300 connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 [XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node cli_err 0 state 3489 action xa_snapshot new state x_recover state 3505 action xa_complete new state x_run [XCOM BINDING DEBUG] ::join():: GCS_OK [XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 0 [XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: node set: peer: 0 flag: 1 peer: 1 flag: 1 peer: 2 flag: 1 [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: Processing new view on Handler [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: My node_id is 2 [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: State Exchange started. get_nodeno(get_site_def()) = 2 task_now() = 1445968372.450627 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12585 1} [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 1 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 2 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12586 message_id.node= 0 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data 2015-10-27T17:52:52.455340Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:11' 2015-10-27T17:52:52.456474Z 16 [ERROR] Plugin group_replication reported: 'Can't evaluate the group replication applier execution status. Group replication recovery will shutdown to avoid data corruption.' 2015-10-27T17:52:52.456503Z 16 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.' [XCOM BINDING DEBUG] ::leave() [XCOM BINDING DEBUG] ::leave():: Skipping own address. 
[XCOM BINDING DEBUG] ::leave():: xcom_client_open_connection to 192.168.70.3:10300 connecting to 192.168.70.3 10300 [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806 connected to 192.168.70.3 10300 [XCOM BINDING DEBUG] ::leave():: Calling xcom_client_remove_node cli_err 0 handle_remove_node /export/home2/pb2/build/sb_0-16846472-1445524610.82/build/BUILD/mysql-server/plugin/group_replication/gcs/src/bindings/xcom/xcom/xcom_base.c:1987 nodes: 0x3d05fa8 nodes->node_list_len = 1 nodes->node_list_val: 0x3da7da0 node_address n.address: 0x3d238d0 192.168.70.2:10300 getstart group_id 84bca5ce state 3551 action xa_terminate new state x_start state 3489 action xa_exit Exiting xcom thread new state x_start [XCOM BINDING DEBUG] ::leave():: Installing Leave view [XCOM_BINDING_DEBUG] ::install_view():: No exchanged data [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806 [XCOM BINDING DEBUG] ::leave():: Exiting with error=GCS_OK

So it looks like a member with inconsistent data might erroneously be reported as ONLINE, but once group replication is restarted it will fail and won't be able to join the cluster. It seems there should be better error handling when a data inconsistency is found.
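Until that improves, one way to spot a diverged member before it errors out is to compare executed GTID sets across members. A sketch (the GTID sets are placeholders you would copy from each node):

-- Run on every member and compare:
SELECT @@GLOBAL.gtid_executed;

-- Transactions executed on node1 but missing on node2:
SELECT GTID_SUBTRACT('<gtid_executed_from_node1>',
                     '<gtid_executed_from_node2>') AS missing_on_node2;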

What about the operational perspective?
It looks very limited: just a few variables and status counters, plus some status tables in performance_schema, as follows:

mysql> show global variables like '%group_repli%';
+---------------------------------------------------+---------------------------------------+
| Variable_name                                     | Value                                 |
+---------------------------------------------------+---------------------------------------+
| group_replication_allow_local_lower_version_join  | OFF                                   |
| group_replication_auto_increment_increment        | 7                                     |
| group_replication_bootstrap_group                 | OFF                                   |
| group_replication_components_stop_timeout         | 31536000                              |
| group_replication_gcs_engine                      | xcom                                  |
| group_replication_group_name                      | 8a94f357-aab4-11df-86ab-c80aa9429562  |
| group_replication_local_address                   | 192.168.70.3:10300                    |
| group_replication_peer_addresses                  | 192.168.70.2:10300,192.168.70.3:10300 |
| group_replication_pipeline_type_var               | STANDARD                              |
| group_replication_recovery_complete_at            | TRANSACTIONS_CERTIFIED                |
| group_replication_recovery_password               |                                       |
| group_replication_recovery_reconnect_interval     | 60                                    |
| group_replication_recovery_retry_count            | 86400                                 |
| group_replication_recovery_ssl_ca                 |                                       |
| group_replication_recovery_ssl_capath             |                                       |
| group_replication_recovery_ssl_cert               |                                       |
| group_replication_recovery_ssl_cipher             |                                       |
| group_replication_recovery_ssl_crl                |                                       |
| group_replication_recovery_ssl_crlpath            |                                       |
| group_replication_recovery_ssl_key                |                                       |
| group_replication_recovery_ssl_verify_server_cert | OFF                                   |
| group_replication_recovery_use_ssl                | OFF                                   |
| group_replication_recovery_user                   | rpl_user                              |
| group_replication_start_on_boot                   | ON                                    |
+---------------------------------------------------+---------------------------------------+
24 rows in set (0.00 sec)

mysql> show global status like '%group_repli%';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Com_group_replication_start | 8     |
| Com_group_replication_stop  | 7     |
+-----------------------------+-------+
2 rows in set (0.01 sec)

mysql> show tables from performance_schema like '%group%';
+----------------------------------------+
| Tables_in_performance_schema (%group%) |
+----------------------------------------+
| replication_group_member_stats         |
| replication_group_members              |
+----------------------------------------+
2 rows in set (0.00 sec)

Most of the values above are self-descriptive. I still need to dig into this a bit more to find the purpose of some of them.
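For day-to-day monitoring, replication_group_member_stats looks like the most useful of those tables. A quick sketch (column names taken from later 5.7 builds, so this labs preview may differ slightly):

-- Per-member certification queue and conflict counters:
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_TRANSACTIONS_CHECKED,
       COUNT_CONFLICTS_DETECTED
FROM performance_schema.replication_group_member_stats;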

Conclusions:

So far the work done with group replication is very impressive. Of course there is still a long road to travel, and it doesn't seem fair to compare group replication against Galera yet, unless it is a general rather than a side-by-side comparison.

Even if I like the idea of reusing an established component, I don't like the need to install and configure Corosync, because it's another piece of software that could eventually fail. Fortunately this can be avoided with the newer version of the plugin, which can use the new XCom communication framework. I tested both versions, and XCom is far easier to set up and configure; however, the error log file can become very verbose, maybe too verbose in my opinion.
With regards to installation and configuration, it's pretty easy once you find the proper way to do it. There are only a few variables to configure for a working cluster, and most of the settings work just fine at their defaults (like group_replication_auto_increment_increment).

I would still like to see some automatic handling of data inconsistencies (like SST in Galera), but in my opinion this new feature can be a good approach to consider in the future when looking for high availability solutions. A lot of testing still needs to be done, and I'd also like to see some benchmarks. These are just my first impressions, and we should wait some time before this feature reaches GA. Paraphrasing that song: "it's a long way to the top if you wanna rock 'n' roll."


How Big Can Your Galera Transactions Be

MySQL Performance Blog - Mon, 2015-10-26 18:17

While we should be aiming for small and fast transactions with Galera, it is always possible that at some point you will need a single large transaction, but what is involved?

First, this is supposed to be controlled by two settings, wsrep_max_ws_rows and wsrep_max_ws_size. The first variable is not yet enforced and has no effect (see here and here), so don't bother tuning this knob just yet. In my opinion, I would rather have only one of the two: a limit by rows is hard to control as a DBA, since row sizes can differ greatly between workloads.

The second variable restricts the writeset size in bytes and gives better control over cluster performance. If your network and CPU can only process N bytes per second, this is a good variable to enforce. Additionally, the maximum allowed value for this setting is only 2GB, with a default of 1GB. You can actually set it higher than 2GB, but in my tests only 2GB was enforced.
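To see what a node is currently running with, a trivial check (variable names as in Galera/Percona XtraDB Cluster):

-- wsrep_max_ws_rows shows up here too, but remember it is not enforced yet:
SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_%';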

Let’s see what happens when we adjust this setting to large values. I have a sysbench table with 10M rows below and will update all rows in a single transaction.

-rw-rw---- 1 mysql mysql 8.5K Oct 20 03:25 t.frm
-rw-rw---- 1 mysql mysql 2.4G Oct 20 03:46 t.ibd

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> update t set k=k+1;
ERROR 1180 (HY000): Got error 5 during COMMIT

On the MySQL error log this translates to:

2015-10-20 04:27:14 10068 [Warning] WSREP: transaction size limit (2147483648) exceeded: 2147516416
2015-10-20 04:27:14 10068 [ERROR] WSREP: rbr write fail, data_len: 0, 2

Let’s see what happens if we set this to 4GB:

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*4;
Query OK, 0 rows affected, 1 warning (0.00 sec)

node1 mysql> show warnings;
+---------+------+-----------------------------------------------------------+
| Level   | Code | Message                                                   |
+---------+------+-----------------------------------------------------------+
| Warning | 1292 | Truncated incorrect wsrep_max_ws_size value: '4294967296' |
+---------+------+-----------------------------------------------------------+
1 row in set (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          4294901759 |
+---------------------+
1 row in set (0.00 sec)

We got a warning that the value exceeds what is allowed, and it was truncated: not to 2GB, but to something a bit lower than the 32-bit UNSIGNED MAX VALUE. So let's try the transaction again:

node1 mysql> update t set k=k+1;
ERROR 1180 (HY000): Got error 5 during COMMIT

We got the same error on the client, but in the MySQL error log it looks a bit different:

2015-10-20 04:54:14 10068 [ERROR] WSREP: Maximum writeset size exceeded by 1737995426: 90 (Message too long) at galera/src/write_set_ng.hpp:check_size():662
2015-10-20 04:54:14 10068 [ERROR] WSREP: transaction size exceeded

If you really must process a large number of rows, one way to reduce the writeset size is to set binlog_row_image to minimal. Let's test this again with a 2GB wsrep_max_ws_size and see how it goes:

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          2147483648 |
+---------------------+
1 row in set (0.00 sec)

node1 mysql> SET GLOBAL binlog_row_image=minimal;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@binlog_row_image;
+--------------------+
| @@binlog_row_image |
+--------------------+
| MINIMAL            |
+--------------------+
1 row in set (0.00 sec)

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13211964556 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> update t set k=k+1;
Query OK, 10000000 rows affected (11 min 18.33 sec)
Rows matched: 10000000 Changed: 10000000 Warnings: 0

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13402597135 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> select (13402597135-13211964556)/1024/1024 as ws_size;
+--------------+
| ws_size      |
+--------------+
| 181.80139446 |
+--------------+
1 row in set (0.01 sec)

In our tests, this reduced the writeset to less than a tenth of its original size: about 182MB for an UPDATE that previously blew through the 2GB limit. Thanks to my colleague Przemyslaw for pointing this out.

Now we know how big we can go in terms of size, and how to go further with a minimal row image, although this does not mean that you can or should allow it. Make sure to set a sane limit on writeset size depending on your workload and realistic performance expectations. Large transactions will not only cause unexpected performance issues with your cluster, but will also lead to usability issues in terms of increased deadlocks. Lastly, make sure to review the limitations that apply when switching to Galera-based clusters here, for an enjoyable experience :).
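If you really do need to touch millions of rows, the usual workaround is to split the work into many small transactions so that each writeset stays well under the limit. A minimal sketch (the table, column and batch size are illustrative, not taken from the test above):

-- Batch a huge UPDATE by primary key ranges; each iteration replicates
-- as its own small writeset instead of one multi-GB transaction.
DELIMITER //
CREATE PROCEDURE update_in_batches()
BEGIN
  DECLARE max_id BIGINT;
  DECLARE cur BIGINT DEFAULT 0;
  SELECT MAX(id) INTO max_id FROM t;
  WHILE cur < max_id DO
    UPDATE t SET k = k + 1 WHERE id > cur AND id <= cur + 10000;
    COMMIT;   -- certify and replicate this batch on its own
    SET cur = cur + 10000;
  END WHILE;
END//
DELIMITER ;

CALL update_in_batches();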


State of Percona Server 5.6, MySQL 5.6 and MySQL 5.7 RC

MySQL Performance Blog - Mon, 2015-10-26 14:39

This week Oracle will release MySQL 5.7 GA, so it's a perfect time for a quick review of the current state of Percona Server 5.6.26, MySQL 5.6.26 and MySQL 5.7.8 RC. We used two boxes from our benchmark lab for this:
– Box 1: 16 Cores+HT (32 virt cores)/fast PCIe ssd card/RAM: 192GB
– Box 2: 24 Cores+HT(48 virt cores)/fast PCIe ssd card/RAM: 128GB

Dataset: sysbench/uniform, 32 tables with 12M rows each, ~95GB
Tests: sysbench – point select, oltp read only, oltp read/write
Test sequence: start server, warmup, series of the tests (each lasts 5 minutes) from 1 to 4096 threads

Tests were run for two setups:
– CPU bound (in memory) – innodb_buffer_pool_size=100GB
– IO bound – innodb_buffer_pool_size=25GB

Observations:

* CPU bound
– It's clear that MySQL 5.7 RC, in both read-only scenarios (point selects and read-only transactions), outperforms MySQL 5.6/Percona Server 5.6 and scales very well up to 4k threads, especially on Box 2 with 48 cores. It shows great improvements over 5.6 in read-only scalability. In the read/write scenario there are still some problems with the 5.7 RC: it shows a stable result on the 16-core box, but degrades notably at high thread counts on Box 2 with 48 cores. Percona Server 5.6 is OK up to 1024/2048 threads on both types of boxes, and then tps drops as well. MySQL 5.6 in this test scales up to 512 threads only, and then tps decreases dramatically.

In general, in the CPU-bound scenario, 5.7 RC on Box 1 with 16 cores showed slightly worse results than 5.6. It looks like it is limited by something, and this may require additional analysis. We will recheck that after GA.

* IO bound
– Again, 5.7 RC shines in read-only scenarios. On Box 1 with 32 cores, Percona Server 5.6 competes with 5.7 RC, but on Box 2 with 48 cores the difference is quite notable at higher thread counts. The read/write workload in the IO-bound scenario is the most problematic case for 5.7: it shows an almost similar pattern to MySQL 5.6 on Box 1 and is slightly better on Box 2. We checked that case with Performance Schema for all three servers on each box, and according to that (see charts below), the most notable waits for 5.7 are caused by the doublewrite mutex, MySQL 5.6 is affected by contention on the buffer pool mutex, and for Percona Server 5.6 the log_sys mutex is the hottest one.

The charts with mutex info above are from the OLTP_RW test runs with 64 and 1024 threads for Percona Server 5.6.26, MySQL 5.7.8 and MySQL 5.6.26.
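For reference, the mutex data behind charts like these can be pulled with a Performance Schema query along the following lines (my reconstruction, since the exact query used is not shown; it requires the wait/synch instruments enabled, as in the settings below):

-- Rank InnoDB mutex waits by accumulated wait time (picoseconds to seconds):
SELECT EVENT_NAME,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1000000000000 AS wait_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/synch/mutex/innodb/%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;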
MySQL server settings:

innodb_log_file_size=10G
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=1
innodb_buffer_pool_instances=8
innodb_change_buffering=none
innodb_adaptive_hash_index=OFF
innodb_flush_method=O_DIRECT
innodb_flush_neighbors=0
innodb_read_io_threads=8
innodb_write_io_threads=8
innodb_lru_scan_depth=8192
innodb_io_capacity=15000
innodb_io_capacity_max=25000
loose-innodb-page-cleaners=4
table_open_cache_instances=64
table_open_cache=5000
loose-innodb-log_checksum-algorithm=crc32
loose-innodb-checksum-algorithm=strict_crc32
max_connections=50000
skip_name_resolve=ON
loose-performance_schema=ON
loose-performance-schema-instrument='wait/synch/%=ON',


Internet of Things, Messaging and MySQL

MySQL Performance Blog - Fri, 2015-10-23 20:56

Personal Projects with the Internet of Things

So you want to do a personal project with the Internet of Things (maybe home automation, metrics collection or something else)? In this blog post I will talk about my recent experience with this. I will give a talk on this topic at Oracle OpenWorld 2015 (Tuesday, Oct 27, 6:15 p.m., Moscone South, 274).

Here is what I did: I was looking for the best place to plant some trees in my backyard. I learned that I need to know how much sun I have in the backyard before planting (some trees need full sun, while others need partial sun or shade). So I needed to measure the sunlight in different spots.

The “old” way of doing it was to use an Arduino or a Raspberry Pi. (Of course this can be done by just manually checking the spot every hour and writing the measurements on a piece of paper, but that was out of the question.) Both solutions require additional components (i.e. a Wi-Fi module for the Arduino, or connecting the Raspberry Pi to an Arduino).

As the Internet of Things is getting more and more popular for personal projects, there are a number of new low-priced and easy-to-use hardware solutions. I decided to use a Particle Photon board.

Particle Photon

Major advantages:

  • Wi-Fi chip, which enables you to send readings to the Internet right from the board
  • A self-sufficient board, which can be programmed/flashed over the air (Web IDE)
  • $19 pricetag
  • Sensors for Arduino should work for Photon

100% Cloud

This is what my architecture looks like:

  1. Particle Photon + light sensor (photo resistor) + temperature sensor (DHT11) will take measurements every minute.
  2. Particle Photon is connected to my home Wi-Fi router and sends the information over the Internet to an MQTT server. Particle hosts the MQTT server and provides a library for it, a RESTful API for that server, as well as a web-based dashboard. You can also use any other MQTT server, assuming it has a library for the Photon.
  3. I also wanted to record all measurements, so I needed to store them. I started an EC2 micro instance with MySQL and NodeJS and used the Particle JavaScript API to “subscribe” to the events and save them to MySQL.

The whole system is 100% in the cloud; it is sending all events to the Particle’s server and I’m receiving those events from yet another server in AWS. Thus this system only needs: a) Particle Photon and b) a Wi-Fi router/access point. I can then pass the assembled Photon board to my friends and they can just connect it to another Wi-Fi router (this can be done with the phone app).

Fully Assembled Project

The system runs on a standard USB battery (the black “thing” below the board). I placed a quarter on the board to demonstrate the size.

I followed the instructions from the open home automation blog post to attach a DHT11 sensor for temperature and humidity measurements.

Particle Build provides the Web IDE; from there I can add those lines to publish my events:

...
Spark.publish("temperature", String(temperature));
Spark.publish("humidity", String(humidity));
Spark.publish("light", String(light));
...

Now I need to subscribe to the events and load the readings into MySQL. I used the Particle JavaScript API to do that; here is my simple code. (I did not insert a date but relied on MySQL's timestamp; using the event's date instead would probably be better.)

console.log("Starting...");

var spark = require('spark');
spark.login({accessToken: '<place your token here>'});

// MySQL Connection
var mysql = require('mysql');
var connection = mysql.createConnection({
  host     : 'localhost',
  user     : 'photon',
  password : 'photon',
  database : 'particle'
});

connection.connect(function(err) {
  if (err) {
    console.error('error connecting: ' + err.stack);
    return;
  }
  console.log('connected as id ' + connection.threadId);
});

spark.getEventStream(false, 'mine', function(data) {
  console.log("Event: " + data.name + ':' + data.data);
  var query = connection.query('INSERT INTO cloud_data (name, data) values (?, ?)',
    [data.name, data.data],
    function(err, result) {
      if (err) {
        console.log('Error in ' + query.sql + err + JSON.stringify(result));
      }
    });
  console.log(query.sql);
});
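The post does not show the target table, but a definition along these lines (my assumption) matches the INSERT above; the TIMESTAMP default supplies the date I mentioned:

CREATE DATABASE IF NOT EXISTS particle;

CREATE TABLE particle.cloud_data (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64)  NOT NULL,             -- event name, e.g. 'light'
  data VARCHAR(255) DEFAULT NULL,         -- event payload as a string
  ts   TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);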

That is it.  Here is the listing:

$ nodejs particle_mysql.js
Starting...
connected as id 73
INSERT INTO cloud_data (name, data) values ('spark/status', 'online')
Event: light:164
INSERT INTO cloud_data (name, data) values ('light', '164')
Event: light:165
INSERT INTO cloud_data (name, data) values ('light', '165')
Event: light:162

Conclusion

As the Internet of Things gets more and more popular, it is also becoming extremely easy to do a home project with it. The Particle Photon is a great option. Other options include:

  1. Intel Edison, $60, which is similar to Raspberry Pi and has more functions
  2. WiFi Module – ESP8266, $6. This is a much simpler module and requires an additional board to flash the code


Percona XtraBackup 2.3.2 GA is now available

MySQL Performance Blog - Thu, 2015-10-22 16:12

Percona is glad to announce the release of Percona XtraBackup 2.3.2 GA on October 22nd, 2015. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release is the first GA (Generally Available) stable release in the 2.3 series.

This release contains all the features and bug fixes in Percona XtraBackup 2.2.13, plus the following:

New Features:

Bugs Fixed:

  • xbcloud contained the password in the processlist, which would allow an unprivileged user privileged access to the Swift service, and most likely to the entire OpenStack deployment for which Keystone provides the identity service. Bug fixed #1447610.
  • Percona XtraBackup 2.3 didn't set the wait_timeout session variable to prevent the server from killing the connection while it copies data files. Bug fixed #1495367.
  • xbcloud would fail to create a container with error: curl_easy_perform() failed: Failed sending data to the peer. Bug fixed #1500508.
  • In some cases a streaming backup could be corrupted due to a broken pipe error, particularly if the error occurred while xtrabackup was copying a set of tiny files (*.frm or similar); xtrabackup would not notice it and would complete successfully. Bug fixed #1452387.
  • xtrabackup 2.3 now adds timestamps to the STDERR output. Bug fixed #1454692.
  • Stream decryption would fail if the encryption options were in my.cnf configuration file because they were ignored by innobackupex. Bug fixed #1190335.
  • Fixed broken out-of-source tree builds in 2.3 trunk. Bug fixed #1457016.
  • Percona XtraBackup now supports --datadir as a command line option. Bug fixed #1042887.

NOTE: As this is a major version upgrade, on Debian and Ubuntu to upgrade from 2.2.12 you’ll need to run one of the following commands: apt-get install percona-xtrabackup, apt-get dist-upgrade or aptitude safe-upgrade.

Release notes with all the bugfixes for Percona XtraBackup 2.3.2 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.


Percona XtraBackup 2.2.13 is now available

MySQL Performance Blog - Thu, 2015-10-22 16:12

Percona is glad to announce the release of Percona XtraBackup 2.2.13 on October 22nd, 2015. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

The package name has been changed from percona-xtrabackup to percona-xtrabackup-22 (percona-xtrabackup now points to the latest GA release, 2.3.2).

Bugs Fixed:

  • Improved the detection of whether a log block with a number different from what is expected was caused by the log block not having been flushed to the position xtrabackup is reading from, or by an incorrect last checkpoint LSN offset being used in the calculations. Bug fixed #1497912.
  • Fixed false positive error: The log was not applied to the intended LSN which was happening even when the redo log was applied correctly. Bug fixed #1505017.
  • xtrabackup_logfile was not compressed when the --compress option was used. Bug fixed #1242309.
  • innobackupex wrote an error message to STDOUT instead of STDERR, which broke xbstream and tar streams. Bug fixed #1503964.
  • Incremental backups did not work with MariaDB below version 10.1.6. Bug fixed #1505865.

Other bugs fixed: #1493015.

Release notes with all the bugfixes for Percona XtraBackup 2.2.13 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.


RocksDB 101

MySQL Performance Blog - Wed, 2015-10-21 14:30

After we announced that Percona offers support for RocksDB, we saw many people looking for more details about this storage engine. Here is a quick list of some of the most frequent questions we get.

Q: What is RocksDB?

A: Quoting the homepage of the project:

RocksDB is an embeddable persistent key-value store for fast storage. RocksDB can also be the foundation for a client-server database, but our current focus is on embedded workloads.

RocksDB builds on LevelDB to be scalable to run on servers with many CPU cores, to efficiently use fast storage, to support IO-bound, in-memory and write-once workloads, and to be flexible to allow for innovation.

Q: Where is it available?

A: You have 2 main options to get RocksDB:

In both cases you will then need to start MongoDB with --storageEngine=rocksdb.

Q: What are the main features of RocksDB?

A: Because of its design using LSM trees, RocksDB offers excellent write performance without sacrificing too much read performance. And as a modern storage engine, it compresses data.

So each time you are concerned with MongoDB write performance, RocksDB is a good candidate.

Also note that RocksDB has been developed with fast storage in mind.

Q: Why is RocksDB write optimized?

A: RocksDB uses LSM trees to store data, unlike most other storage engines which are using B-Trees.

In most cases, B-Trees offer a very good tradeoff between read performance and write performance; this is why they are so widely used in the database world. However, when the working set no longer fits in memory, writes become extremely slow because at least one I/O is needed for each write operation.

LSM trees are designed to amortize the cost of writes: data is written to log files that are sequentially written to disk and never modified. Then a background thread merges the log files (compaction). With this design a single I/O can flush to disk tens or hundreds of write operations.

The tradeoff is that reading a document is more complex and therefore slower than for a B-Tree; because we don’t know in advance in which log file the latest version of the data is stored, we may need to read multiple files to perform a single read. Tricks like bloom filters help alleviate this issue.

Q: How does RocksDB performance compare to other storage engines?

A: Mark Callaghan from Facebook published results for cached databases (data fits in memory) some time ago.

Vadim Tkachenko from Percona published additional results when data is larger than memory.

Q: Where can I find RocksDB support?

A: You can report issues here, go to this Facebook group to discuss RocksDB-related topics, or hire us.

Q: How can I run backups?

A: Storage-engine agnostic methods like cold backups or volume snapshots work with RocksDB.

RocksDB also has native support for hot backups with the following command:

db.adminCommand({setParameter:1, rocksdbBackup: "/path/to/backup/dir"})

See this post from Facebook/Parse engineering team for more details.

The LSM tree design makes incremental backups much easier than with many other technologies, and rocks-strata is probably a good place to start.

Conclusion

The storage engine ecosystem for MongoDB is advancing quickly, with at least 3 strong contenders: WiredTiger, RocksDB and PerconaFT. If you want to learn more about RocksDB, PerconaFT and Percona Server for MongoDB, please register for my free webinar on Wed Oct 28 at 11 a.m. Pacific Time.


Slow query graphs using Performance Schema and Graphite

MySQL Performance Blog - Tue, 2015-10-20 13:25

I love graphs. They just make things easier when it comes to finding patterns. I also love visibility: having the ability to know what is going on inside the database is priceless. How about having visibility of slow query execution times on a graph? Let's do it.

We've already described how to get query digests using Performance Schema. Since the MySQL server is already doing the heavy lifting for you with little-to-no overhead, this information is available practically at will. So let's make some graphs with that data.
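For reference, the raw data comes from a query along these lines (a simplified form of what the script below runs):

-- Top statements by accumulated execution time (picoseconds to seconds):
SELECT digest_text,
       count_star,
       sum_timer_wait / 1000000000000 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 20;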

To accomplish this I will use the well-known tool Graphite to store and render time-series data. For those who are not familiar with Graphite, it’s actually a 3-piece tool, consisting of:

  • The Carbon metric processing daemons
  • The Whisper time-series database library
  • The Graphite-Web, a Django-based web application that renders graphs and dashboards

Instructions for installing it are out of the scope of this post, but it is not hard to find instructions surfing the web and it’s not more complicated than using yum/apt to do it.

One of the greatest things about Graphite is that feeding data to it can be quite simple. The most basic way to do it is by using the “plaintext protocol” which is nothing more than sending the data to a specific port with the proper format. The format is:

<metric_path> <metric_value> <metric_timestamp>

You can use netcat to send the information. That’s it.

I've decided to use an AWS MySQL RDS instance running sysbench for testing purposes. For the metric path I've used the format mysql.rds.<query_formatted> <value> <timestamp>. Now, assuming that carbon runs on the localhost machine, and knowing that the default port number is 2003, to send a value I just need to do something like:

echo "mysql.rds.SELECT__ 0.00123 1445026691" | nc -w1 localhost 2003

And Graphite will take care of the rest!

The questions now are: Where do I get the data from? And how do I feed Graphite in a continuous way?

The first answer is: From the Performance Schema!
The second answer is: Bash.

You can write a small script that takes care of consulting the performance_schema.events_statements_summary_by_digest table, formatting the queries properly, and sending the data to Graphite. This is the one I used for my testing:

#!/bin/bash
# Pull the top 20 statements by total wait time from Performance Schema,
# turn each digest into a Graphite-friendly metric name, and send the
# values to carbon's plaintext port.
query="set session group_concat_max_len=2048;
select concat(digest_text,'dash', round(sum_timer_wait/1000000000000, 6),'\n')
from performance_schema.events_statements_summary_by_digest
order by sum_timer_wait desc limit 20;"

# Split the mysql output on newlines, one digest/value pair per line
IFS="
"
for i in $(mysql -Nr -e"$query"); do
    digest=$(echo ${i%%dash*})          # everything before the 'dash' marker
    digest=${digest%%,*}
    digest=$(echo $digest | tr -d '`')  # strip characters Graphite dislikes
    digest=$(echo $digest | tr " " "_")
    digest=$(echo $digest | tr -d "?")
    digest=$(echo $digest | tr "." "-")
    digest=$(echo $digest | tr "(" "_")
    digest=$(echo $digest | tr ")" "_")
    value=$(echo ${i##*dash})           # everything after the 'dash' marker
    echo "mysql.rds.$digest $value $(date +%s)" | nc -w 1 localhost 2003
done

Not too fancy, but it will do the trick. Calling the script in an infinite loop, once every second for several hours, the result is:

The above graph shows the delta between subsequent data points, normalized to 1 per second. The data points here are all the INSERTs, SELECTs, DELETEs and UPDATEs from the server, combined with wildcards.

In plain English: the graph shows how long the queries are taking. This will give you pretty good insight from the queries' perspective.
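One side note: the digest table accumulates values from server startup, which is why the graphs plot deltas. If you would rather have per-interval values at the source, you can reset the table between samples (my addition, not part of the original script):

-- Supported on this summary table; resets all aggregated statistics:
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;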

You can have more detailed graphs; for example, one with explicit queries that you want to track:

From the tree on the left you can see all the queries that the bash script has collected and sent to Graphite. This is somewhat dynamic, since it will add “new” queries that fulfill the criteria.

This is just a small example of what can be done, but both Performance Schema and Graphite are very versatile and you can find creative ways to use them. Readers familiar with tools like Anemometer, or with the great blog post from openark, might find this very similar; however, this is a more direct approach that, instead of using pt-query-digest with all its complexity, uses Performance Schema in a simpler, more customized way. Have fun!


Percona Live 2016 Call for Papers Open! Introducing: Community Voting

MySQL Performance Blog - Tue, 2015-10-20 10:56

The Call for Papers for the fifth annual Percona Live Data Performance Conference and Expo (formerly MySQL Conference and Expo) taking place April 18-21, 2016, in Santa Clara, California, is now officially open!

Ask yourself… “Do I have…”

  • Fresh ideas?
  • Enlightening case studies?
  • Insight on best practices?
  • In-depth technical knowledge?

If the answer to any of these is “YES,” then you need to submit your proposal for a chance to speak at Percona Live 2016. Speaking is a great way to broaden not only your company's visibility with an intelligent and engaged audience of software architects, senior engineers and developers, but also your own!

Tracks/Topics: 

This year, the conference will feature a variety of formal tracks, including New and Trending Topics, Big Data, Analytics, Security, Scalability and Performance, Architecture and Design, Operations and Management, Development, Case Studies, and Sponsored Talks. Speaker proposals are welcome on a variety of related technologies, including MySQL®, MongoDB®, Amazon Web Services (AWS), OpenStack, Redis, Docker and many more. Focus areas can include Devops, High Availability, Programming, Web Services, Optimization and other areas of interest.

Community Voting:

In an effort to involve the community in the selection of speaking sessions, Percona has implemented a community voting process. After speakers submit a proposal, they are provided a link to share with the community asking for votes. The more highly ranked proposals will continue on to the next phase of the voting process with the conference committee.

Action Items:

With the call for papers ending NOVEMBER 29th, the time to submit is now! Here’s what you need to submit:

1) Pull together your proposal information:

  • Talk title
  • Abstract
  • Speaker bio, headshot, video

2) Submit your proposal here before Nov 29th: SUBMIT

3) After submitting, you will be given a direct link and badge to advertise to the community. Tweet, Facebook, blog… get the word out about your submission and get your followers to vote!

That’s all it takes! Looking forward to your submissions!

Sponsoring / Super Saver Tickets!

*Interested in sponsoring? Take advantage of early sponsorship opportunities to reach the influential Percona Live audience. Sponsorship Information

*Want to attend at the cheapest rate possible? Super Saver tickets are on sale NOW! Register today.

The post Percona Live 2016 Call for Papers Open! Introducing: Community Voting appeared first on MySQL Performance Blog.


Percona Welcomes David Murphy as Practice Manager for MongoDB

MySQL Performance Blog - Mon, 2015-10-19 17:35

We are excited this week to welcome David Murphy to Percona as our Practice Manager for MongoDB®. A veteran of ObjectRocket, Electronic Arts, and Rackspace, David is a welcome addition to our team and will provide vision, direction and best practices for all things MongoDB. David also has an extensive background in MySQL and is a well-known Open Source advocate and contributor.

At ObjectRocket, David was the Lead DBA and MongoDB Master. He was the first DBA hired by the company; he built the DBA team, developed training plans for it, and provided support and guidance to other groups within the company on the use of new technologies. During this time, he helped build out systems for online MongoDB migrations and consulted with clients on database design and shard key analysis. As expected of any enterprise-grade operations team, he oversaw database testing and readiness, MongoDB/NoSQL/MySQL escalation support, and version and bug tracking. Additionally, he has contributed to the MongoDB source code and is known for advocating for operational stability and improvements, even when that did not align with MongoDB Inc.’s preferred messaging. Throughout, he gave talks on running MongoDB in the real world, released scripts to help others with operational tasks, and delivered tutorials on complex topics such as sharding and its aftereffects.

At Electronic Arts, David was the subject-matter expert for major sports titles like FIFA, which accounted for 27 percent of the company’s net revenue. From there he took on the additional responsibility of planning the company’s entry into the NoSQL data space as DB Architect – NoSQL/MySQL. Among many duties, he handled major sports titles, support-service architecture, and proof-of-concept planning for new technologies, including MongoDB, MySQL Cluster, and Clustrix, to help determine their feasibility and logistical fit within EA projects. As a trusted expert in the company, he was regularly sent, along with another architect, to “parachute” into major issues and steer them to a successful outcome.

At Rackspace, David was a Cloud Sites – Linux Systems Operations Administrator, where he maintained, deployed and managed hundreds of Linux, MySQL, PHP, and storage clusters. David also held administrator and engineering positions at Lead Geniuses, ByDomino, McAfee and The Planet.

A true master of the art and science of MongoDB, David will help Percona become a disruptive force in the MongoDB space by bringing an operational-centric view to everything we do. We look forward to some very exciting months and years ahead.

The post Percona Welcomes David Murphy as Practice Manager for MongoDB appeared first on MySQL Performance Blog.
