MySQL

Percona XtraDB Cluster 5.6.19-25.6 is now available

MySQL Performance Blog - Mon, 2014-07-21 19:24

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on July 21st 2014. Binaries are available from the downloads area or from our software repositories. We’re also happy to announce that Ubuntu 14.04 LTS users can now download, install, and upgrade Percona XtraDB Cluster 5.6 from Percona’s software repositories.

Based on Percona Server 5.6.19-67.0 including all the bug fixes in it, Galera Replicator 3.6, and on Codership wsrep API 25.6, Percona XtraDB Cluster 5.6.19-25.6 is now the current General Availability release. All of Percona’s software is open-source and free, and all the details of the release can be found in the 5.6.19-25.6 milestone at Launchpad.

New Features:

  • Percona XtraDB Cluster now supports storing the Primary Component state to disk by setting the pc.recovery variable to true. The Primary Component can then recover automatically when all nodes that were part of the last saved state reestablish communications with each other. This feature can be used for automatic recovery from full cluster crashes, such as a data center power outage, and for graceful full cluster restarts without the need to explicitly bootstrap a new Primary Component.
  • When a node joins the cluster, the state message exchange now provides gcache seqno limits. That information is used to choose a donor that can serve IST first; only if that is not possible is SST attempted. The wsrep_sst_donor setting is still honored, and donor selection is also segment aware.
  • The asynchronous replication slave thread was stopped when a node tried to apply the next replication event while in a non-primary state, but it would then remain stopped after the node successfully re-joined the cluster. A new variable, wsrep_restart_slave, has been implemented which controls whether the MySQL slave should be restarted automatically when the node re-joins the cluster.
  • Install message and install state message processing has been improved to make group formation more stable when many nodes are joining the cluster.
  • A new wsrep_evs_repl_latency status variable has been implemented which provides the group communication replication latency information.
  • Node consistency issues with foreign key grammar have been fixed. This fix introduces two new variables: wsrep_slave_FK_checks and wsrep_slave_UK_checks. These variables are set to TRUE and FALSE respectively by default. They control whether Foreign Key and Unique Key checking is done for applier threads.

Bugs Fixed:

  • Fixed a race condition in Foreign Key processing that could cause an assertion failure. Bug fixed #1342959.
  • The restart sequence in scripts/mysql.server did not capture and return the exit status of the start call, so if the server failed to start during a restart the script would still return 0 as if everything had worked. Bug fixed #1339894.
  • Updating a unique key value could cause the server to hang if a slave node had parallel slave threads enabled. Bug fixed #1280896.
  • Percona XtraDB Cluster has implemented threadpool scheduling fixes. Bug fixed #1333348.
  • garbd was returning an incorrect return code: when garbd was already started, the return code was 0. Bug fixed #1308103.
  • rsync SST would silently fail on the joiner when the rsync server port was already taken. Bug fixed #1099783.
  • When gmcast.listen_addr was configured to a certain address, the local connection point for outgoing connections was not bound to the listen address. If the OS had multiple interfaces with IP addresses in the same subnet, it could pick the wrong IP for the local connection point, and other nodes would see connections originating from an IP address the node was not listening on. Bug fixed #1240964.
  • An issue with re-setting the Galera provider (in wsrep_provider_options) has been fixed. Bug fixed #1260283.
  • The wsrep_provider_options variable couldn’t be set at runtime if no provider was loaded. Bug fixed #1260290.
  • Percona XtraDB Cluster couldn’t be built with Bison 3.0. Bug fixed #1262439.
  • MySQL wasn’t correctly handling the wsrep error raised when the maximum writeset size was exceeded. Bug fixed #1270920.
  • Fixed an issue that caused a node to hang or fail when SELECT or SHOW STATUS statements were run after FLUSH TABLES WITH READ LOCK on a node with wsrep_causal_reads set to 1, while DML was running on other nodes. Bug fixed #1271177.
  • The lowest group communication layer (evs) failed to handle the situation properly when a large number of nodes suddenly started recognizing each other. Bugs fixed #1271918 and #1249805.
  • Percona XtraBackup SST would fail if the progress option was used with a large number of files. Bug fixed #1294431.

NOTE: When performing an upgrade from an older 5.6 version on Debian/Ubuntu systems, in order to upgrade the Galera package correctly, you’ll need to pin the Percona repository and run: apt-get install percona-xtradb-cluster-56. This is required because older Galera deb packages have an incorrect version number. The correct wsrep_provider_version after upgrade should be 3.6(r3a949e6).

This release contains 50 fixed bugs. The complete list of fixed bugs can be found in our release notes.

Release notes for Percona XtraDB Cluster 5.6.19-25.6 are available in our online documentation along with the installation and upgrade instructions.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraDB Cluster Errata can be found in our documentation.

[UPDATE 2014-07-24]: Package Percona-XtraDB-Cluster-client-56-5.6.19-25.6.824.el6.x86_64.rpm has been updated to resolve the conflict with Percona-XtraDB-Cluster-devel package.

The post Percona XtraDB Cluster 5.6.19-25.6 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Monitoring Plugins 1.1.4 release

MySQL Performance Blog - Mon, 2014-07-21 15:42

Percona is glad to announce the release of Percona Monitoring Plugins 1.1.4.

Changelog:

* Added login-path support to Nagios plugins with MySQL client 5.6 (bug 1338549)
* Added a new threshold option for delayed slaves to pmp-check-mysql-replication-delay (bug 1318280)
* Added delayed slave support to pmp-check-mysql-replication-running (bug 1332082)
* Updated Nagios plugins and Cacti script to leverage lock-free SHOW SLAVE STATUS in Percona Server (bug 1297442)
* Fixed pmp-check-mysql-replication-delay integer-float issue with MariaDB and MySQL 5.6 (bugs 1245934, 1295795)
* ss_get_rds_stats.py was not installed with 755 permissions from the package (bug 1316943)
* Cacti MySQL template item “handler_savepoint_rollback” was GAUGE type instead of DERIVE (bug 1334173)
* Fixed Zabbix running-slave check issue on some Debian systems (bug 1310723)
* Fixed paths in percona-cacti-templates package (bug 1349564)

A new tarball is available from the downloads area, and RPM and DEB packages are available from our software repositories. The plugins are fully supported for customers with a Percona Support contract, and free installation services are provided as part of some contracts. In addition, installation and setup of these tools are included as part of Percona’s Remote DBA service. You can find links to the documentation, forums and more at the project homepage.

About Percona Monitoring Plugins
Percona Monitoring Plugins are high-quality components to add enterprise-grade MySQL monitoring and graphing capabilities to your existing in-house, on-premises monitoring solutions. The components are designed to integrate seamlessly with widely deployed solutions such as Nagios, Cacti and Zabbix and are delivered in the form of templates, plugins, and scripts which make it easy to monitor MySQL performance.

The post Percona Monitoring Plugins 1.1.4 release appeared first on MySQL Performance Blog.

Categories: MySQL

A schema change inconsistency with Galera Cluster for MySQL

MySQL Performance Blog - Mon, 2014-07-21 07:00

I recently worked on a case where the schema on one node of a Galera cluster had become desynchronized from the other nodes, even though the Total Order Isolation method was in effect for schema changes. Let’s see what happened.

Background

For those of you who are not familiar with how Galera can perform schema changes, here is a short recap:

  • Two methods are available depending on the value of the wsrep_OSU_method setting. Both have benefits and drawbacks, but that is not the main topic of this post.
  • With TOI (Total Order Isolation), a DDL statement is performed at the same point in the replication flow on all nodes, giving strong guarantees that the schema is always identical on all nodes.
  • With RSU (Rolling Schema Upgrade), a DDL statement is not replicated to the other nodes. Until the DDL statement has been executed on all nodes, the schema is not consistent everywhere (so you should be careful not to break replication).

You can look at the official documentation here.

If you read carefully the section on TOI, you will see that “[...] TOI transactions will never fail certification and are guaranteed to be executed.” But also that “The system replicates the TOI query before execution and there is no way to know whether it succeeds or fails. Thus, error checking on TOI queries is switched off.”

Confusing? Not really. It simply means that with TOI, a DDL statement will always pass certification. But if, for some reason, the DDL statement fails on one of the nodes, it will not be rolled back on the other nodes. This opens the door to schema inconsistencies between nodes.
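
One practical way to catch this kind of drift early is to compare the table definition reported by each node. Below is a minimal sketch of that idea (it is not part of the original case; the hostnames, credentials and table name are placeholders):

# Hypothetical drift check: diff SHOW CREATE TABLE output across cluster nodes.
# Hostnames, credentials and the table name are placeholders.
import mysql.connector

NODES = ["pxc1", "pxc2", "pxc3"]
TABLE = "repl_test.t"

def show_create(host):
    cnx = mysql.connector.connect(host=host, user="checker", password="secret")
    cur = cnx.cursor()
    cur.execute("SHOW CREATE TABLE {0}".format(TABLE))
    ddl = cur.fetchone()[1]  # second column holds the CREATE TABLE statement
    cnx.close()
    return ddl

definitions = dict((host, show_create(host)) for host in NODES)
reference = definitions[NODES[0]]
for host, ddl in definitions.items():
    if ddl != reference:
        print("schema mismatch on %s:\n%s" % (host, ddl))

Running something like this after a scheduled DDL window is a cheap safety net.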

A test case

Let’s create a table on a 3-node Percona XtraDB Cluster 5.6 cluster and insert a few rows:

pxc1> create table t (id int not null auto_increment primary key, c varchar(10));
pxc1> insert into t (c) values ('aaaa'),('bbbb');

Then on node 3, let’s introduce a schema change on t that can make other schema changes fail:

pxc3> set global wsrep_OSU_method=RSU;
pxc3> alter table t add d int;
pxc3> set global wsrep_OSU_method=TOI;

As the schema change was done on node 3 with RSU, it is not replicated to the other nodes.

Now let’s try another schema change on node 1:

pxc1> alter table t add d varchar(10);
Query OK, 0 rows affected (0,14 sec)
Records: 0  Duplicates: 0  Warnings: 0

Apparently everything went well and indeed on node 1 and 2, we have the correct schema:

pxc2> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c` varchar(10) DEFAULT NULL,
  `d` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1

But on node 3, the statement failed so the schema has not been changed:

pxc3> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `c` varchar(10) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=latin1

The error is visible in the error log of node 3:

2014-07-18 10:37:14 9649 [ERROR] Slave SQL: Error 'Duplicate column name 'd'' on query. Default database: 'repl_test'. Query: 'alter table t add d varchar(10)', Error_code: 1060
2014-07-18 10:37:14 9649 [Warning] WSREP: RBR event 1 Query apply warning: 1, 200
2014-07-18 10:37:14 9649 [Warning] WSREP: Ignoring error for TO isolated action: source: 577ffd51-0e52-11e4-a30e-4bde3a7ad3f2 version: 3 local: 0 state: APPLYING flags: 65 conn_id: 3 trx_id: -1 seqnos (l: 17, g: 200, s: 199, d: 199, ts: 7722177758966)

But of course it is easy to miss. And then a simple INSERT can trigger a shutdown on node3:

pxc2> insert into t (c,d) values ('cccc','dddd');
Query OK, 1 row affected (0,00 sec)

will trigger this on node 3:

2014-07-18 10:42:27 9649 [ERROR] Slave SQL: Column 2 of table 'repl_test.t' cannot be converted from type 'varchar(10)' to type 'int(11)', Error_code: 1677
2014-07-18 10:42:27 9649 [Warning] WSREP: RBR event 3 Write_rows apply warning: 3, 201
2014-07-18 10:42:27 9649 [Warning] WSREP: Failed to apply app buffer: seqno: 201, status: 1 at galera/src/trx_handle.cpp:apply():340
[...]
2014-07-18 10:42:27 9649 [Note] WSREP: Received NON-PRIMARY.
2014-07-18 10:42:27 9649 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 201)
2014-07-18 10:42:27 9649 [Note] WSREP: Received self-leave message.
2014-07-18 10:42:27 9649 [Note] WSREP: Flow-control interval: [0, 0]
2014-07-18 10:42:27 9649 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2014-07-18 10:42:27 9649 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 201)
2014-07-18 10:42:27 9649 [Note] WSREP: RECV thread exiting 0: Success
2014-07-18 10:42:27 9649 [Note] WSREP: recv_thread() joined.
2014-07-18 10:42:27 9649 [Note] WSREP: Closing replication queue.
2014-07-18 10:42:27 9649 [Note] WSREP: Closing slave action queue.
2014-07-18 10:42:27 9649 [Note] WSREP: bin/mysqld: Terminated.

Conclusion

As with regular MySQL, schema changes are challenging with Galera, and some subtleties can create a lot of trouble if you are not aware of them. So before running DDL statements, make sure you fully understand how the TOI and RSU methods work.

The post A schema change inconsistency with Galera Cluster for MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue

MySQL Performance Blog - Fri, 2014-07-18 14:38

The Percona Managed Services team recently faced a somewhat peculiar client issue. We’d receive pages about their MySQL service being unreachable, yet studying the logs showed nothing out of the ordinary: for the most part it appeared to be a normal shutdown, and there was nothing suspicious in anyone’s command history nor in any cron task.

This is one of those obscure and peculiar (read: unique) issues that triggered an old memory; I’d seen this behavior before and I had just the tool to catch the culprit in the act.

Systemtap made diagnosis of this issue possible, and I can’t emphasize enough what a powerful and often under-utilized toolset systemtap really is.

cat > signals.stp << EOF
probe signal.send {
  if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    printf("[%s] %s was sent to %s (pid:%d) by %s uid:%d\n",
      ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), uid())
}
EOF

sudo stap ./signals.stp > signals.log 2>signals.err

grep mysqld signals.log
[Wed Jun 11 19:03:23 2014] SIGKILL was sent to mysqld (pid:8707) by cfagent uid:0
[Fri Jun 13 21:37:27 2014] SIGKILL was sent to mysqld (pid:6583) by cfagent uid:0
[Sun Jun 15 05:05:34 2014] SIGKILL was sent to mysqld (pid:19818) by cfagent uid:0
[Wed Jul 9 07:03:47 2014] SIGKILL was sent to mysqld (pid:4802) by cfagent uid:0

Addendum: It had been so long since I had used this tooling that I could not remember the original source from which I derived the module above; some cursory searching to rectify this issue for this blog post found this original source by Eugene Teo of Red Hat made available under GPLv2.

From this we were able to show that cfagent was killing the mysqld process, presumably via a misconfigured job. This information was returned to the client, and the probe has continued to run in production for two months now at the client’s request, with no issues to speak of.

This is by no means the limit of what systemtap can achieve; you can also hook into functions, though you may need to install the debug packages first. To find out what functions are available, run for example:

sudo stap -L 'process("/usr/sbin/mysqld").function("*")' > /tmp/mysql_stapfunc
...
head /tmp/mysql_stapfunc
process("/usr/sbin/mysqld").function("TIME_from_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_set_hhmmss")
process("/usr/sbin/mysqld").function("TIME_set_yymmdd")
process("/usr/sbin/mysqld").function("TIME_to_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_to_ulonglong")
...

This is also true of the kernel, using sudo stap -L 'kernel.function("*")' > /tmp/kernel_stapfunc, however you must be booted into a debug kernel for this to work.

Systemtap is more than a worthy tool to have at your disposal with plenty of examples available.

Finally I invite you to join me July 23 at 10 a.m. Pacific time for my webinar, “What Every DBA Needs to Know About MySQL Security.” This detailed technical webinar provides insight into best security practices for either setting up a new MySQL environment or upgrading the security of an existing one. I hope to see you there!

The post Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue appeared first on MySQL Performance Blog.

Categories: MySQL

Q&A: Even More Deadly Mistakes of MySQL Development

MySQL Performance Blog - Thu, 2014-07-17 16:57

On Wednesday I gave a presentation on “How to Avoid Even More Common (but Deadly) MySQL Development Mistakes” for Percona MySQL Webinars.  If you missed it, you can still register to view the recording and my slides.

Thanks to everyone who attended, and especially to folks who asked the great questions.  I answered as many as we had time for  during the session, but here are all the questions with my complete answers:

Q: Disk bandwidth also not infinite

Indeed, you’re right!

We discussed in the webinar the impact on network bandwidth from using column wildcards in queries like SELECT *, but it’s also possible that using SELECT * can impact disk operations. Varchar, Blob, or Text columns can be stored on extra pages in the database, and if you include those columns in your query needlessly, it can cause the storage engine to do a lot of seeks and page reads unnecessarily.

For more details on string storage in InnoDB, see Peter Zaitsev’s blog on Blob Storage in Innodb.

Q: How many tables can be joined in a single query? What is the optimal number of joins?

MySQL has a limit of 63 table references in a given query. This limits how many JOIN operations you can do, and also limits the number of UNIONs. Actually, you can go over this limit if your JOIN or UNION doesn’t reference any tables, that is, if it creates a derived table of one row of expressions.

If you do join a lot of tables (or even self-join the same table many times), you’re likely to hit a practical scaling limit long before you reach 63 table references. The practical limit in your case depends on many factors, including the length of the tables, the data types, the type of join expressions in your queries, and your physical server’s capabilities. It’s not a fixed limit I can cite for you.

If you think you need dozens of table references in a single query, you should probably step back and reconsider your database design or your query design.

I often see this type of question (“what is the limit on the number of joins?”) when people try to use key/value tables, also called Entity-Attribute-Value, and they’re trying to pivot attributes from rows into columns, as if the table were stored in a conventional way with one column per attribute. This is a broken design for many reasons, and the scalability of many-way joins is just one problem with it.

Q: How many indexes can be created in a single table? Any limitation? What is the optimal number of indexes?

All MySQL storage engines support at least 16 indexes per table.

As far as the optimal number of indexes, I don’t pay attention to the number of indexes (as long as it remains lower than the max of 16). I try to make sure I have the right indexes for my queries. If you put an arbitrary cap of for example 8 or 10 indexes on a given table, then you might be running queries that lack a needed index, and the unnecessary extra cost of running that query is probably greater than the cost of maintaining the one extra index it needs.

That said, there are cases where you have such variation in query types that there’s no way to have optimal indexes to cover every possible case. Given that you can have multi-column indexes, and multi-column indexes with columns in different orders, there are n-factorial possible indexes on a table with n columns.

Q: There is a table with 3 columns: id(int), user_id(int), day(date). There is a high chance the same user_id will ‘exist’ for every day. I read data by “where user_id = some_id” (very high throughput) and delete all entries once a day by cron using “where sent_date = ’2014-01-01′ “. Having approx 6M rows per day, deletion is pretty painful. Will partitioning by column ‘day’ help me delete those bulks faster? If yes – how much faster? How much will it slow down SELECTs? – not all entries are deleted, but only entries for some specific old day, e.g. ‘ WHERE day = ’1 week ago’

Range partitioning by date would give you the opportunity to ALTER TABLE…DROP PARTITION, so you could remove all data for a given date very quickly, much faster than deleting millions of rows. The performance of DROP PARTITION is like that of DROP TABLE, because each partition is physically stored like a separate table.

Searching for “where user_id = ?” would not be able to take advantage of partition pruning, but it would still be able to use an index on user_id. And if you drop old partitions, the benefit of searching a smaller table could be a good tradeoff.
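
A rough sketch of that idea, driven from Python (the table layout, partition names and credentials are made up for illustration, and the ranges would have to match your own retention policy):

# Hypothetical daily-partitioned table; dropping a day is a metadata operation.
import mysql.connector

ddl = """
CREATE TABLE user_events (
  id INT NOT NULL,
  user_id INT NOT NULL,
  day DATE NOT NULL,
  KEY (user_id)
) PARTITION BY RANGE (TO_DAYS(day)) (
  PARTITION p20140101 VALUES LESS THAN (TO_DAYS('2014-01-02')),
  PARTITION p20140102 VALUES LESS THAN (TO_DAYS('2014-01-03')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
)
"""

cnx = mysql.connector.connect(host="localhost", user="app", password="secret", database="test")
cur = cnx.cursor()
cur.execute(ddl)
# Later, from the daily cron job, removing an expired day is nearly instant:
cur.execute("ALTER TABLE user_events DROP PARTITION p20140101")
cnx.close()

In practice you would pre-create partitions for upcoming days (or reorganize the catch-all partition) from the same scheduled job that drops the oldest one.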

Q: Regarding 20% selectivity as a threshold for the optimizer preferring a table-scan to an index lookup – is that a tunable?

No, it’s not tunable, it’s a fixed behavior of the query optimizer. If you search for a value and the optimizer estimates that > 20% of rows contain the value you search for, it will bypass the index and just do a table-scan.

This is for the same reason that the index of a book doesn’t contain very common words: the list of pages the word appears on would be too long, and flipping back and forth from the back of the book to each listed page would actually be more work than just reading the book.

Also keep in mind my figure of 20% is approximate. Your results may vary. This is not a magic threshold in the source code, it’s just a tendency I have observed.

Q: Regarding generating synthetic test data, it sounds like a pretty easy perl script to write.

Yes, it might be easy to do that for one given table. But every table is different, and you might have hundreds of tables in dozens of applications to generate test data for. You might also want to vary the distribution of data values from one test to another.

Writing a test-data generator for one particular case is easy, so you might reasonably do it as a one-off task. Writing a general-purpose test-data generator that you can use for many cases is more work.
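
For the simple single-table case, the generator really can be tiny; this sketch invents a hypothetical orders table and value distributions purely for illustration:

# Minimal synthetic data generator for one hypothetical table.
import random
import string
import datetime

def random_row():
    name = "".join(random.choice(string.ascii_lowercase) for _ in range(8))
    created = datetime.date(2014, 1, 1) + datetime.timedelta(days=random.randint(0, 180))
    amount = round(random.lognormvariate(3, 1), 2)  # skewed distribution, more realistic than uniform
    return (name, created.isoformat(), amount)

# Emit INSERT statements; piping them into the mysql client is one simple way to load them.
for _ in range(1000):
    print("INSERT INTO orders (customer, created, amount) VALUES ('%s', '%s', %s);" % random_row())

The general-purpose version is what takes the time: per-column type handling, configurable distributions, foreign key consistency, and so on.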

Q: Would love to have the set of URLs cited in the presentation without having to go back and mine them out of the presentation.

Open source message queues:

MySQL Performance Blog articles:

Open source test-data generator:

Load-testing tools for web applications:

Load-testing tools to replay query logs:

Further reading for implementing business rules:

Q: How to best use mysql query cache?

Any cache is best used if you read from it many times for each time you write to it. So we’d like to estimate the average ratio of query cache reads to writes, to estimate how much leverage it’s giving us.

mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';

Check the values for Qcache_hits (which counts cases when a query result was read from the query cache) against Qcache_inserts (cases when the desired query result was not in the cache and had to be run, with the result then stored in the cache). I like to see a ratio of 1000% or more (i.e. 10:1 hits to inserts).

If you have a poor ratio, for example less than 1:1 or less than 100%, then consider disabling the query cache, because it may be costing more to maintain it than the performance benefit it’s giving you.

Keep in mind that this is only a guideline, because the calculation I described is only an average. It could be that the queries served by the query cache are very expensive, so using the cached result is a great benefit even if it accounts for a small number of hits. The only way to be certain is to load-test your application under your load, and compare overall performance results with the query cache enabled or disabled, and at different sizes.
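
If you want to automate that arithmetic, a small sketch like this works (connection details are placeholders; Qcache_hits and Qcache_inserts are the counters discussed above):

# Rough query cache read/write ratio check, following the rule of thumb above.
import mysql.connector

cnx = mysql.connector.connect(host="localhost", user="monitor", password="secret")
cur = cnx.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'Qcache%'")
status = dict((name, int(value)) for name, value in cur.fetchall())
cnx.close()

hits = status.get("Qcache_hits", 0)
inserts = status.get("Qcache_inserts", 0)
if inserts == 0:
    print("the query cache is not being written to at all")
else:
    ratio = float(hits) / inserts
    print("hits/inserts = %.1f:1" % ratio)
    if ratio < 1:
        print("poor ratio; consider load-testing with the query cache disabled")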

Q: How to detect when too much indexes start to affect performance?

Some people are reluctant to create indexes because they have been warned that indexes require synchronous updates when you INSERT, UPDATE, or DELETE rows. Some people also make the generalization that indexes harm writes but benefit reads. Both of these beliefs are inaccurate.

Your DML operations aren’t really updating indexes in real time. InnoDB includes a feature called change buffering, which defers index updates. The change buffer is gradually merged into the index over time. That way, InnoDB can handle a big spike in traffic without it hurting throughput as much. You can monitor how much content in the change buffer remains to be merged:
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_ibuf_size';

It’s also not accurate that indexes hurt writes. UPDATE and DELETE statements usually have a WHERE clause, to apply the changes to particular rows. These conditions use indexes to reduce the examined rows, just like in SELECT statements. But in UPDATE and DELETE statements, it’s even more important to use indexes, because otherwise the statement has to lock a lot of rows to ensure it locks the rows you’re changing.

So I generally say, don’t avoid indexes based only on the number of indexes you have, just make sure your indexes are being employed by the queries you run, and drop indexes that aren’t used. Here are a couple of past blog posts that show how to do this:
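
As a rough illustration of that last point (a sketch assuming performance_schema is enabled and the server has seen representative traffic; it is not necessarily the approach from those posts), you can list indexes that have never been touched since server start:

# List secondary indexes with zero recorded I/O, per performance_schema.
# Treat the output as candidates to investigate, not an automatic drop list.
import mysql.connector

QUERY = """
SELECT object_schema, object_name, index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
   AND index_name <> 'PRIMARY'
   AND count_star = 0
   AND object_schema NOT IN ('mysql', 'performance_schema')
 ORDER BY object_schema, object_name
"""

cnx = mysql.connector.connect(host="localhost", user="monitor", password="secret")
cur = cnx.cursor()
cur.execute(QUERY)
for schema, table, index in cur.fetchall():
    print("unused so far: %s.%s (%s)" % (schema, table, index))
cnx.close()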

Thanks again for attending my webinar!  Here are some more tips:

  • Check out upcoming Percona Training classes in North America and Europe.
  • Join Percona and the MySQL community at our Percona Live conferences.
  • Watch more webinars from Percona in the future!

The post Q&A: Even More Deadly Mistakes of MySQL Development appeared first on MySQL Performance Blog.

Categories: MySQL

High Availability with mysqlnd_ms on Percona XtraDB Cluster

MySQL Performance Blog - Wed, 2014-07-16 14:11

This is the second part of my series on High Availability with mysqlnd_ms. In my first post, “Simple MySQL Master HA with mysqlnd_ms,” I showed a simple HA solution using asynchronous MySQL replication. This time we will see how to leverage an all-primary cluster where you can write to all nodes. In this post I used Percona XtraDB Cluster, but you should also be able to do the same with MySQL NDB Cluster or Tungsten Replicator.

To start with, here is the mysqlnd_ms configuration I used: mysqlnd_ms_mm.ini. All of these files are available from my GitHub repository. Below, I have three Percona XtraDB Cluster nodes, all defined as masters with no slaves. I’ve configured a roundrobin filter where all connections will happen on the first node, in this case 192.168.56.44. In case the first node fails, the second node will be used, and so forth until no more nodes are available. Another interesting configuration option here is the loop_before_master strategy, whereby if a connection or a statement to the current server fails, it will be retried silently on the remaining nodes before returning an error to the user; more on this below.

{
  "primary": {
    "master": {
      "master_1": {
        "host": "192.168.56.44",
        "port": "3306"
      },
      "master_2": {
        "host": "192.168.56.43",
        "port": "3306"
      },
      "master_3": {
        "host": "192.168.56.42",
        "port": "3306"
      }
    },
    "slave": { },
    "filters": {
      "roundrobin": [ ]
    },
    "failover": {
      "strategy": "loop_before_master",
      "remember_failed": true
    }
  }
}

Similar to my previous post, I also used a custom INI file for PHP, this time aptly named master-master.ini:

mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.multi_master = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_mm.ini

A new addition to this configuration is mysqlnd_ms.multi_master: when enabled, it allows you to use all nodes, or just one while treating the others as passive. The PHP script I used this time is called master-master.php; it is largely similar to master-slave-ng.php with a few differences:

  1. There is no need for the /tmp/PRIMARY_HAS_FAILED sentinel, as all nodes are writable.
  2. There is no need for the /*ms=master*/ SQL hint when validating a connection from the connect_mysql function, since all nodes act as masters.

So here is a quick test, first with the roundrobin filter. After 4 INSERTs, I shut down 192.168.56.44, which sends my connection to the next server in the configuration, 192.168.56.43. When I started 192.168.56.44 back up again, the script resumed connections there. Pretty cool, right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3564 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3565 from host 192.168.56.44 via TCP/IP and thread id 20
Last value 3566 from host 192.168.56.44 via TCP/IP and thread id 21
Last value 3567 from host 192.168.56.44 via TCP/IP and thread id 22
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23464 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host and thread id 0
Last value 3568 from host 192.168.56.43 via TCP/IP and thread id 1552
Last value 3569 from host 192.168.56.43 via TCP/IP and thread id 1553
[...]
Last value 3584 from host 192.168.56.43 via TCP/IP and thread id 1568
Last value 3585 from host 192.168.56.44 via TCP/IP and thread id 18

Here’s another test using the random filter, which allows you to write to all nodes; in my mysqlnd_ms_mm.ini above, I just changed roundrobin to random. As you can see, all three nodes were being used, of course at random. You can also see where I shut down 192.168.56.44 (around the connect_mysql errors) and where the server was used again near the bottom after I started it back up. Still pretty cool, right?

[revin@forge phpugph201407]$ php -c master-master.ini master-master.php
Last value 3590 from host 192.168.56.42 via TCP/IP and thread id 2060
Last value 3591 from host 192.168.56.43 via TCP/IP and thread id 1569
Last value 3592 from host 192.168.56.43 via TCP/IP and thread id 1570
Warning: connect_mysql(): MySQL server has gone away in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
Warning: connect_mysql(): Error while reading greeting packet. PID=23919 in /home/revin/git/demo-me/phpugph201407/master-master.php on line 63
ERRROR: 192.168.56.43 via TCP/IP [2006] MySQL server has gone away on line 30
Last value 0 from host and thread id 0
Last value 3593 from host 192.168.56.42 via TCP/IP and thread id 2061
Last value 3594 from host 192.168.56.42 via TCP/IP and thread id 2062
Last value 3595 from host 192.168.56.42 via TCP/IP and thread id 2063
Last value 3596 from host 192.168.56.42 via TCP/IP and thread id 2064
Last value 3597 from host 192.168.56.43 via TCP/IP and thread id 1576
Last value 3598 from host 192.168.56.43 via TCP/IP and thread id 1577
Last value 3599 from host 192.168.56.43 via TCP/IP and thread id 1578
Last value 3600 from host 192.168.56.43 via TCP/IP and thread id 1579
Last value 3601 from host 192.168.56.42 via TCP/IP and thread id 2065
Last value 3602 from host 192.168.56.43 via TCP/IP and thread id 1581
Last value 3603 from host 192.168.56.43 via TCP/IP and thread id 1582
Last value 3604 from host 192.168.56.42 via TCP/IP and thread id 2066
Last value 3605 from host 192.168.56.44 via TCP/IP and thread id 19
Last value 3606 from host 192.168.56.43 via TCP/IP and thread id 1583
Last value 3607 from host 192.168.56.44 via TCP/IP and thread id 21

So here are some issues I’ve observed during these tests:

  1. remember_failed during failover does not work as advertised. Supposedly, a failed node should not be used again for subsequent connection requests, but in my test this is not the case. See more in this bug. It means that if 2 out of 3 nodes have failed in this scenario, the overhead of testing both dead connections on every request would be too big. Perhaps some sort of in-memory shared TTL could be used to overcome this? I’m not sure.
  2. If you look closely around line 7 of my last output above, the error displayed is kind of misleading. In particular it says ERRROR: 192.168.56.43 via TCP/IP, but it was not 192.168.56.43 that failed, it was 192.168.56.44. This is because, under the hood, the next node is cycled to immediately after the failure, especially since we have loop_before_master configured. I do have a bug in the script that should capture the host_info properly, but this is something to keep in mind so you don’t keep scratching your head.

So we’ve seen these two forms of possibilities, and they definitely have use cases and advantages. On the other hand, because of the issues we have found so far (I’ve reported 4 bugs in the PHP bugs database during the course of these tests, including one crash), I recommend thorough testing before putting this into production.

The post High Availability with mysqlnd_ms on Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Categories: MySQL

TokuDB tips: MySQL backups

MySQL Performance Blog - Tue, 2014-07-15 10:00

In my recent post, “TokuDB gotchas: slow INFORMATION_SCHEMA TABLES,” I saw a couple of questions and tweets asking if we use TokuDB in production. Actually, I mentioned it in that post, and we also blogged about it in a couple of other recent posts:

So, yes, we are using Percona Server + TokuDB as a main storage engine in Percona Cloud Tools to store timeseries data.

And, yes, Percona Server + TokuDB is generally available, as of Percona Server 5.6.19-67.0 with TokuDB (GA).

Just having good performance is not enough to make it into production; there are also operational questions and one such question is about backups. I want to explain how we do backups for Percona Server + TokuDB in Percona Cloud Tools.

I should say up front that we DO NOT have support for TokuDB in Percona XtraBackup. TokuDB internals are significantly different from InnoDB/XtraDB, so adding this to Percona XtraBackup would be a major project, and we do not have any plans to work on it at the moment.

It does not mean that TokuDB users do not have options for backups. There is Tokutek Hot Backup, included in the Tokutek Enterprise Subscription, and there is the method we use in Percona Cloud Tools: LVM backups. We use the mylvmbackup script for this task and it works fairly well for us.

There are, however, some gotchas to be aware of. If you understand the mechanics of LVM backups, restoring from one is basically a managed crash-recovery process.

Now we need to go into a little more detail for TokuDB. To support transactions that involve both the TokuDB and InnoDB engines, TokuDB uses MySQL’s two-phase commit mechanism. When it is involved, the two-phase commit requires binary logs to be present for proper recovery.

But now we need to take a look at how we set up binary logs in Percona Cloud Tools. We use SSDs for the main data storage (this is where the LVM partition lives) and a hardware RAID1 over two hard drives for the binary logs. We chose this setup because we care about SSD lifetime: in write-intensive workloads, binary logs produce a lot of write operations, and by our calculations we would simply burn through the SSDs, so we have to store them on something less expensive.

So the problem is that when we take an LVM snapshot of the main storage, we do not have a consistent view of the binary logs (although it is possible to modify the backup scripts to copy the current binary log under a FLUSH TABLES WITH READ LOCK operation; this is probably what we will do next). But binary logs are needed for recovery, and without them we face this kind of error when restoring from a backup:

2014-DD-MM 02:15:16 16414 [Note] Found 1 prepared transaction(s) in TokuDB
2014-DD-MM 02:15:16 16414 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2014-DD-MM 02:15:16 16414 [ERROR] Aborting

The error message actually hints at a way out. Unfortunately, it seems we are the first ones to have ever tried this option, as tc-heuristic-recover is totally broken in current MySQL and not supposed to work… it would have been noticed if someone had really tried it before us (which gives me the impression that Oracle/MySQL never properly tested it, but that is a different story).

We will fix this in Percona Server soon.

So the way to handle recovery from an LVM backup without binary logs is to start mysqld with the --tc-heuristic-recover switch (unfortunately, I have not yet figured out whether the value should be COMMIT or ROLLBACK).

The proper way to use an LVM backup is to have the corresponding binary log file; as I said, this will require a modification to the mylvmbackup script.
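
Here is a rough sketch of what such a modification could look like, written as a standalone helper rather than as a patch to mylvmbackup itself; the volume names, paths and credentials are all hypothetical:

# Hypothetical helper: take the LVM snapshot and a copy of the current binary log
# under the same FLUSH TABLES WITH READ LOCK, so the backup is self-consistent.
# Volume group / LV names, binlog paths and credentials are placeholders.
import shutil
import subprocess
import mysql.connector

cnx = mysql.connector.connect(host="localhost", user="backup", password="secret")
cur = cnx.cursor()
try:
    cur.execute("FLUSH TABLES WITH READ LOCK")     # briefly freeze writes
    cur.execute("SHOW MASTER STATUS")
    binlog_file, binlog_pos = cur.fetchone()[0:2]  # current binlog coordinates
    # Snapshot the data volume while the server is quiesced.
    subprocess.check_call(["lvcreate", "--snapshot", "--size", "10G",
                           "--name", "mysql_snap", "/dev/vg_data/mysql"])
    # Copy the current binlog (it lives on the RAID1 volume, outside the snapshot).
    shutil.copy("/binlogs/" + binlog_file, "/backups/" + binlog_file)
    with open("/backups/binlog_position.txt", "w") as f:
        f.write("%s %s\n" % (binlog_file, binlog_pos))
finally:
    cur.execute("UNLOCK TABLES")                   # release the global read lock
    cnx.close()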

I should say this is not the only way we do backups in Percona Cloud Tools. In this project we use the Percona Backup Service provided by the Percona Managed Services team, and our team also uses mydumper to perform a logical backup of the data.
While it works acceptably to back up hundreds of gigabytes worth of data (it is just a sequential scan, which should be easy for TokuDB), the full recovery is painful and takes unacceptably long. So a mydumper backup (and recovery) will only be used if we ever need to perform a fine-grained recovery (i.e., only a small number of specific tables).

So I hope this tip is useful if you are looking for info about how to do backups for TokuDB.

The post TokuDB tips: MySQL backups appeared first on MySQL Performance Blog.

Categories: MySQL

Priority queue optimization for filesort is now visible in MariaDB 10.0

Sergey Petrunia's blog - Mon, 2014-07-14 17:11

TL;DR: Priority queue optimization for filesort with small LIMIT is now visible in MariaDB: there is a status variable and you can also see it in the slow query log (KB page link).

A longer variant:
One of the new optimizations in MySQL 5.6 is the ability to use a priority queue instead of sorting for ORDER BY … LIMIT queries. The optimization was ported into MariaDB 10.0 long ago, but we still get questions about if/when it will be ported. I guess the reason for this is that, besides the query speed, you can’t see this optimization: neither EXPLAIN, nor EXPLAIN FORMAT=JSON, nor PERFORMANCE_SCHEMA nor status variables give any indication whether filesort used a priority queue or the regular quicksort+merge algorithm.

MySQL 5.6 has only one way to check whether filesort used a priority queue. You need to enable optimizer_trace (set optimizer_trace=1), and then run the query (not EXPLAIN, but the query itself). Then you can look into the optimizer trace and find something like this:

...
"filesort_priority_queue_optimization": {
  "limit": 10,
  "rows_estimate": 198717,
  "row_size": 215,
  "memory_available": 262144,
  "chosen": true
},
...

MariaDB doesn’t support optimizer_trace at the moment. Even if it did, I think it would be wrong to require one to look into the optimizer trace to find out about the picked query plan.

The natural place to show the optimization would be EXPLAIN output. We could show something like “Using filesort (priority queue)”. This was my initial intent. After looking into the source code, this turned out to be difficult to do: the logic that makes the choice between using quicksort+merge and using a priority queue is buried deep inside the query execution code. (As if the mess caused by late optimizations of ORDER BY and UNIONs didn’t teach anybody in the MySQL team anything.)

As for query execution, there are two facilities where one could record execution-time details about the query plan. They are the status variables and the slow query log.

Status variables

We’ve added Sort_priority_queue_sorts status variable. Now, the list of sort-related status variables is:

MariaDB [test]> show status like 'Sort%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| Sort_merge_passes         | 0     |
| Sort_priority_queue_sorts | 1     |
| Sort_range                | 0     |
| Sort_rows                 | 11    |
| Sort_scan                 | 1     |
+---------------------------+-------+

(Sort_range + Sort_scan) gives the total number of sorts. Sort_priority_queue_sorts gives the number of sorts that were done using a priority queue.
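
If you want to watch this over time, the arithmetic is easy to script; here is a small sketch (connection details are placeholders) that samples the counters twice and reports the share of sorts that used the priority queue:

# Sample the Sort_* counters twice; Sort_range + Sort_scan is the total number of sorts.
import time
import mysql.connector

def sort_counters(cur):
    cur.execute("SHOW GLOBAL STATUS LIKE 'Sort%'")
    return dict((name, int(value)) for name, value in cur.fetchall())

cnx = mysql.connector.connect(host="localhost", user="monitor", password="secret")
cur = cnx.cursor()
before = sort_counters(cur)
time.sleep(60)  # observation window
after = sort_counters(cur)
cnx.close()

total = sum(after[k] - before[k] for k in ("Sort_range", "Sort_scan"))
pq = after.get("Sort_priority_queue_sorts", 0) - before.get("Sort_priority_queue_sorts", 0)
print("%d of %d sorts in the last minute used the priority queue" % (pq, total))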

Slow query log

Percona’s Extended statistics in the slow query log shows Filesort/Filesort_on_disk fields. We thought that adding information about priority queue use would be appropriate. Now, slow query log entries look like this:

# Time: 140714 18:30:39
# User@Host: root[root] @ localhost []
# Thread_id: 3  Schema: test  QC_hit: No
# Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
SET timestamp=1405348239;
select * from t1 where col1 between 10 and 20 order by col2 limit 100;

pt-query-digest is able to parse slow query logs with the new field.

What about PERFORMANCE_SCHEMA

What about PERFORMANCE_SCHEMA? After all, it is the most powerful tool for tracking query execution. It has “absorbed” some status variables into the events_statements_history table. For sorting, it has these columns:

| SORT_MERGE_PASSES | bigint(20) unsigned | NO | | NULL | |
| SORT_RANGE        | bigint(20) unsigned | NO | | NULL | |
| SORT_ROWS         | bigint(20) unsigned | NO | | NULL | |
| SORT_SCAN         | bigint(20) unsigned | NO | | NULL | |

Should we add a SORT_PRIORITY_QUEUE_SORTS column there? We didn’t add it into 10.0 right now because of compatibility concerns. Some tools may rely on the structure of PERFORMANCE_SCHEMA tables. Also, PERFORMANCE_SCHEMA table definitions are stored on disk, and one would have to run mysql_fix_privilege_tables after a minor upgrade, which is not good.

Categories: MySQL

Simple MySQL Master HA with mysqlnd_ms

MySQL Performance Blog - Mon, 2014-07-14 14:51

I had the pleasure of presenting to the PHP Users Group Philippines a few days ago about mysqlnd_ms. The mysqlnd plugin, MySQL Master Slave, is a transparent layer on top of the mysqlnd extension. It allows you to do read-write splitting and slave-read load balancing without needing to change anything in your application. But did you know you can also achieve a form of high availability with this plugin? I shared two forms in my presentation: the first uses asynchronous MySQL replication in either master-slave or master-master configuration, while the second is an all-primary cluster where you can write to all nodes.

This first part demonstrates how you can achieve a simple HA solution using the first form. First, all the sample code here can be found in my GitHub repository. The mysqlnd_ms plugin uses an additional external configuration file in JSON format. This configuration file defines your master and slave nodes, failover properties and any filters (connection selection methods) you want, which dictate how the plugin will provide you with a connection.

Let’s start with the mysqlnd_ms configuration I used, mysqlnd_ms_ms.ini:

{
  "primary": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33001"
      }
    },
    "slave": { }
  },
  "standby": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33002"
      }
    },
    "slave": { }
  }
}

Here, I have two applications defined, one called “primary” and another called “standby”; I have not defined any slaves for simplicity. The two MySQL instances running on ports 33001 and 33002 are in master-master configuration.

mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_ms.ini

This is the custom INI file I used for the tests, master-slave.ini. The first line simply enables the plugin for use. The second line, mysqlnd_ms.disable_rw_split, instructs the plugin to send all queries to the master, because I only have masters for this test.

As for the PHP script, the full copy can be found here; as it is a bit lengthy, I will just explain the logic of what it does.

  1. To start the test, it bootstraps the test table via DROP and then CREATE queries.
  2. It then enters a for loop where it will execute an INSERT followed by a SELECT to validate the newly inserted row and additional information like the current active server id and the connection id.
  3. For every iteration of the loop, a new mysqli object is created to simulate non-persistent connections to the database server.
  4. To create the new connection, a call to the connect_mysql function is made, which returns a mysqli object when successful. An important thing to remember here is that mysqlnd_ms uses lazy connections by default; this means that when the mysqli object is created, it is not really connected to the server yet. One has to issue a statement like 'SELECT 1' to start the connection manually, or call mysqli::real_connect. Not even mysqli::ping works without the former; I’ve opened this bug.
  5. After the mysqli object is returned, the INSERT statement will trigger mysqlnd_ms to actually establish the connection and then execute the statement. This is where the good part is: if the connection cannot be made, the query_write_mysql function will know and will re-request the connection from connect_mysql. This time, within the connect_mysql function, the connection to the primary will be retried at least 10 times if the type of error from the previous failure is connection-related, like error numbers 2002 and 2003. If the connection cannot be established after 10 retries, the application creates a sentinel file at /tmp/PRIMARY_HAS_FAILED and will retry the connection to the secondary (slave or passive master).

Here is an example run; my primary has a server id of 101 while my standby is 102:

[revin@forge phpugph201407]$ php -c master-slave.ini master-slave-ng.php
Last value 0001 from server id 101 thread id 7
Last value 0003 from server id 101 thread id 8
37: [2002] Connection refused
Connection to host 'primary' failed: [0] Connection refused, retrying (1 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (2 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (3 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (4 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (5 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (6 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (7 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (8 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (9 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (10 of 10) in 3 seconds
The primary host 'primary' has failed after 30 seconds, failing over to standby!
52: [2002] Connection refused
Last value 0004 from server id 102 thread id 635
Last value 0006 from server id 102 thread id 636
Last value 0008 from server id 102 thread id 637
[...]

This is not a perfect setup and there are a number of limitations; however, it tells us that if you have a simple HA requirement (say, you’re not running a very critical application but still do not want to be woken up at night and would rather deal with issues in the morning), this might just fit. So here are some more notes:

    • If you have a master-slave configuration, you just basically shot your primary (master) in the foot during the failover. You may need to rebuild its data in the morning.
    • If instead you have master-master, you might just be able to bring the primary master back online, get it caught up in replication, and then delete the /tmp/PRIMARY_HAS_FAILED file to switch your application back to it.
    • The use of the /tmp/PRIMARY_HAS_FAILED sentinel file is rudimentary; it’s not the only way. You should consider sending notifications to yourself when failover happens, because this method requires human intervention to put the primary master back in rotation.

The same effect can be achieved with a little more coding, but you can already take advantage of the plugin with less.

I’ve also tested the plugin on the second form where you can write to multiple masters using Percona XtraDB Cluster. I’ve found a few interesting issues there so stay tuned.

The post Simple MySQL Master HA with mysqlnd_ms appeared first on MySQL Performance Blog.

Categories: MySQL

Managing shards of MySQL databases with MySQL Fabric

MySQL Performance Blog - Fri, 2014-07-11 09:00

This is the fourth post in our MySQL Fabric series. In case you’re joining us now, we started with an introductory post, and then discussed High Availability (HA) using MySQL Fabric here (Part 1) and here (Part 2). Today we will talk about how MySQL Fabric can help you scale out MySQL databases with sharding.

Introduction

At the time of writing, MySQL Fabric includes support for range- and hash-based sharding. As with HA, the functionality is split between the client, through a MySQL Fabric-aware connector, and the server, through the mysqlfabric utility and the XML-RPC server we’ve talked about before.

In this post, we’ll go through the process of setting up a sharded table for use with MySQL Fabric, and then go through some usage examples, again using the Python connector.

In our next post, we’ll talk about shard management operations, and go into more detail about how we can combine the Sharding and HA features of MySQL Fabric.

The architecture

For our examples, we’ll be using a sharding branch from our vagrant-fabric repository. If you have been following previous posts and already have a local copy of the repo, you can get this one just by running the following command:

git checkout sharding

from the root of your copy. Bear in mind that the node names are the same in the Vagrantfile, so while in theory just running vagrant provision should be enough, you may have to run vagrant destroy and vagrant up again if you hit unexpected behavior.

The only difference between this branch and the original one is that you’ll have two mysqld instances per node: one on port 3306 and one on port 13306. This will let us achieve high availability for our shard groups. But don’t worry about that for now, it’s something we’ll discuss more in depth in our next post.

In today’s examples, we’ll be using the three group architecture described by this diagram:

The blue boxes represent shard-groups and the green box represent the global-group. The red arrows indicate the flow of replication and the violet arrows represent client connections.

Setting up sharding

The official documentation about sharding with MySQL Fabric can be found here. We’ll be using the same example employees database and shard the salaries table.

As we said, to keep things simple for the introduction, we’ll create all the groups but only add one instance to each one of them. In our next post, we’ll use two instances per group to evaluate how MySQL Fabric can make our shards highly available, and how it can rearrange replication topologies automatically after a failure.

To start, let’s create three groups:

[vagrant@store ~]$ mysqlfabric group create salaries-global
Procedure : { uuid = 390aa6c0-acda-40e2-ad52-8c0869613635, finished = True, success = True, return = True, activities = }
[vagrant@store ~]$ for i in 1 2; do mysqlfabric group create salaries-$i; done
Procedure : { uuid = 274742a2-5e84-49b8-8446-5a8fc55f1899, finished = True, success = True, return = True, activities = }
Procedure : { uuid = 408cfd6a-ff3a-493e-b39b-a3241d83fda6, finished = True, success = True, return = True, activities = }

 

The global group will be used to propagate schema changes and to store unpartitioned data. Think of configuration tables that don’t need to be sharded, for example.

The other two groups will host shards, that is, tables that will have the same structure across all the nodes, but not the same data (and that will be empty in the global group’s nodes).

Now, let’s add one instance to each group:

[vagrant@store ~]$ mysqlfabric group add salaries-global node1:3306
Procedure : { uuid = 0d0f657c-9304-4e3f-bf5b-a63a5e2e4390, finished = True, success = True, return = True, activities = }
[vagrant@store ~]$ mysqlfabric group add salaries-1 node2:3306
Procedure : { uuid = b0ee9a52-49a2-416e-bdfd-eda9a384f308, finished = True, success = True, return = True, activities = }
[vagrant@store ~]$ mysqlfabric group add salaries-2 node3:3306
Procedure : { uuid = ea5d8fc5-d4f9-48b1-b349-49520aa74e41, finished = True, success = True, return = True, activities = }

We also need to promote the groups. Even though each group has a single node, MySQL Fabric sets up that node as SECONDARY, which means it can’t take writes.

[vagrant@store ~]$ mysqlfabric group promote salaries-global
Procedure : { uuid = 5e764b97-281a-49f0-b486-25088a96d96b, finished = True, success = True, return = True, activities = }
[vagrant@store ~]$ for i in 1 2; do mysqlfabric group promote salaries-$i; done
Procedure : { uuid = 7814e96f-71d7-4865-a278-cb6ed32a2d11, finished = True, success = True, return = True, activities = }
Procedure : { uuid = cd30e9a9-b9ea-4b2d-a8ae-5e70f22363d6, finished = True, success = True, return = True, activities = }

Finally, we are ready to create a shard definition and associate ranges to groups:

[vagrant@store ~]$ mysqlfabric sharding create_definition RANGE salaries-global
Procedure : { uuid = fffcbb5f-24c6-47a2-9348-f1d810c8ef2f, finished = True, success = True, return = 1, activities = }
[vagrant@store ~]$ mysqlfabric sharding add_table 1 employees.salaries emp_no
Procedure : { uuid = 8d0a3c51-d543-49a6-b47a-36a4ab499ab4, finished = True, success = True, return = True, activities = }
[vagrant@store ~]$ mysqlfabric sharding add_shard 1 "salaries-1/1, salaries-2/25000" --state=ENABLED
Procedure : { uuid = 2585a5ea-a097-44a4-89fa-a948298d0595, finished = True, success = True, return = True, activities = }

The integer after each shard group is the lower bound for emp_no values found on that shard.
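
To make the mapping concrete: with RANGE sharding, the right group for a key is the one with the largest lower bound not exceeding it. The connector performs this lookup for you; the small illustration below just mimics it for the two shards created above:

# Illustration only: pick the shard group whose lower bound is the largest one
# not exceeding the sharding key. Fabric-aware connectors do this internally.
import bisect

# (lower_bound, group) pairs as registered with "mysqlfabric sharding add_shard"
shards = [(1, "salaries-1"), (25000, "salaries-2")]
bounds = [lb for lb, _ in shards]

def group_for(emp_no):
    idx = bisect.bisect_right(bounds, emp_no) - 1
    if idx < 0:
        raise ValueError("emp_no %d is below the lowest shard bound" % emp_no)
    return shards[idx][1]

print(group_for(100))    # -> salaries-1
print(group_for(30000))  # -> salaries-2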

After the last command, the shard groups should be replicating off the global one. We can verify that this is the case by checking node3 (the instance we added to salaries-2):

[vagrant@node3 ~]$ mysql -uroot -e 'show slave status\G'
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: node1
Master_User: fabric
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 151
Relay_Log_File: mysqld-relay-bin.000002
Relay_Log_Pos: 361
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 151
Relay_Log_Space: 566
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 870101
Master_UUID: e34ab4cd-00b9-11e4-8ced-0800274fb806
Master_Info_File: /var/lib/mysql/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 1

Looks good. Let’s go ahead and create the database schema. To avoid being too verbose, we’re only including the create statement for the salaries table in this example. Notice we run this on the PRIMARY node for the global group:

[vagrant@node1 ~]$ mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.6.19-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE IF NOT EXISTS employees;
Query OK, 1 row affected (0.01 sec)

mysql> USE employees;
Database changed

mysql> CREATE TABLE salaries (
    ->   emp_no INT NOT NULL,
    ->   salary INT NOT NULL,
    ->   from_date DATE NOT NULL,
    ->   to_date DATE NOT NULL,
    ->   KEY (emp_no));
Query OK, 0 rows affected (0.06 sec)

And again, check that it made it to the shard groups:

[vagrant@node2 ~]$ mysql -uroot -e 'show databases'
+--------------------+
| Database           |
+--------------------+
| information_schema |
| employees          |
| mysql              |
| performance_schema |
+--------------------+

Good. We’re now ready to use the Python connector and load some data into this table. We’ll be using the following script:

import mysql.connector
from mysql.connector import fabric
from mysql.connector import errors
import time
import random
import datetime

config = {
    'fabric': {
        'host': 'store',
        'port': 8080,
        'username': 'admin',
        'password': 'admin',
        'report_errors': True
    },
    'user': 'fabric',
    'password': 'f4bric',
    'database': 'employees',
    'autocommit': 'true'
}

from_min = datetime.datetime(1980,1,1,00,00,00)
to_max = datetime.datetime(2014,1,1,00,00,00)

fcnx = None
print "starting loop"
while 1:
    if fcnx == None:
        print "connecting"
        fcnx = mysql.connector.connect(**config)
        fcnx.reset_cache()
    try:
        print "will run query"
        emp_no = random.randint(1,50000)
        salary = random.randint(1,200000)
        from_date = from_min + datetime.timedelta(seconds=random.randint(0, int((to_max - from_min).total_seconds())))
        to_date = from_min + datetime.timedelta(seconds=random.randint(0, int((to_max - from_min).total_seconds())))
        fcnx.set_property(tables=["employees.salaries"], key=emp_no, mode=fabric.MODE_READWRITE)
        cur = fcnx.cursor()
        cur.execute("insert into employees.salaries (emp_no,salary,from_date,to_date) values (%s, %s, %s, %s)",
                    (emp_no,salary,from_date,to_date))
        print "inserted", emp_no, ", will now sleep 1 second"
        time.sleep(1)
    except (errors.DatabaseError, errors.InterfaceError):
        print "sleeping 1 second and reconnecting"
        time.sleep(1)
        del fcnx
        fcnx = None

This is similar to the script we used in our HA post. It inserts rows with random data in an endless loop. The sleep on every iteration is there just to make it easier to cancel the script, and to keep the row insert rate under control.

If you leave this running for a while, you should then be able to check the global server and individual shards, and confirm they have different data:

[vagrant@store ~]$ for i in 1 2 3; do mysql -ufabric -pf4bric -hnode$i -e "select count(emp_no),max(emp_no) from employees.salaries"; done
Warning: Using a password on the command line interface can be insecure.
+---------------+-------------+
| count(emp_no) | max(emp_no) |
+---------------+-------------+
|             0 |        NULL |
+---------------+-------------+
Warning: Using a password on the command line interface can be insecure.
+---------------+-------------+
| count(emp_no) | max(emp_no) |
+---------------+-------------+
|            36 |       24982 |
+---------------+-------------+
Warning: Using a password on the command line interface can be insecure.
+---------------+-------------+
| count(emp_no) | max(emp_no) |
+---------------+-------------+
|            43 |       49423 |
+---------------+-------------+

As you can see, the global group’s server has no data for this table, and each shard’s server has data within the defined boundaries.

Querying data is done similarly (though with a READ_ONLY connection), and we can also look up the group a row belongs to using the mysqlfabric utility directly:

[vagrant@store ~]$ mysqlfabric sharding lookup_servers employees.salaries 2045
Command : { success = True  return = [['ecab7dd2-00b9-11e4-8cee-0800274fb806', 'node2:3306', True]]  activities = }
[vagrant@store ~]$ mysqlfabric sharding lookup_servers employees.salaries 142045
Command : { success = True  return = [['f8a90096-00b9-11e4-8cee-0800274fb806', 'node3:3306', True]]  activities = }

Bear in mind that these lookups only use the Fabric store, which means they can tell you on which server a given row may be, but not whether the row actually exists. You need to query the given server for that. If you use the connector, both steps are done for you when you issue the query.

The following code snippets illustrate the point:

>>> fcnx = mysql.connector.connect(**config)
>>> emp_no = random.randint(1,50000)
>>> fcnx.set_property(tables=["employees.salaries"], key=emp_no, mode=fabric.MODE_READONLY)
>>>
>>> cur = fcnx.cursor()
>>> cur.execute("select count(*) as cnt from employees.salaries where emp_no = %s", (emp_no,))
>>>
>>> for row in cur:
...     print row
...
(0,)
>>> fcnx.set_property(tables=["employees.salaries"], key=20734, mode=fabric.MODE_READONLY)
>>> cur = fcnx.cursor()
>>> cur.execute("select count(*) as cnt from employees.salaries where emp_no = 20734")
>>> for row in cur:
...     print row
...
...
(1,)
>>>

In our examples, we connected directly to the PRIMARY node of the global group in order to execute DDL statements, but the same can be done by requesting a global connection from MySQL Fabric, like so:

fcnx.set_property(group="salaries-global", scope=fabric.SCOPE_GLOBAL, mode=fabric.MODE_READWRITE)
cur = fcnx.cursor()
cur.execute("USE employees")
cur.execute("CREATE TABLE test (id int not null primary key)")

We can see that the table gets replicated as expected:

[vagrant@node3 ~]$ mysql -uroot -e 'show tables from employees'
+---------------------+
| Tables_in_employees |
+---------------------+
| salaries            |
| test                |
+---------------------+

Note that we’re explicitly indicating we want to connect to the global group here. When establishing a MySQL Fabric connection, we need to specify either a group name or a key and table pair (as in the insert example).

Summary

Today we’ve presented the basics of how MySQL Fabric can help you scale out by sharding, but we’ve intentionally left a few things out of the picture to keep this example simple.

In our next post, we’ll see how we can combine MySQL Fabric’s HA and sharding features, what support we have for shard operations and how HASH sharding works in MySQL Fabric.

The post Managing shards of MySQL databases with MySQL Fabric appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Toolkit 2.2.9 is now available

MySQL Performance Blog - Thu, 2014-07-10 11:05

Percona is glad to announce the release of Percona Toolkit 2.2.9 on July 10, 2014 (downloads are available here and from the Percona Software Repositories). This release is the current GA (Generally Available) stable release in the 2.2 series.

Bugs Fixed:

  • Fixed bug 1335960: pt-query-digest could not parse the binlogs from MySQL 5.6 because the binlog format was changed.
  • Fixed bug 1315130: pt-online-schema-change did not find child tables as expected. It could incorrectly locate tables (tables which reference a table with the same name in a different schema) and could miss tables referencing the altered table if they were in a different schema.
  • Fixed bug 1335322: pt-stalk would fail when variable or threshold was a non-integer.
  • Fixed bug 1258135: pt-deadlock-logger was inserting older deadlocks into the deadlock table even if they were already there, thereby creating unnecessary noise. For example, if a deadlock happened 1 year ago and MySQL still kept it in memory, pt-deadlock-logger would INSERT it into the percona.deadlocks table every minute until the server was restarted. This was fixed by comparing with the last deadlock fingerprint before issuing the INSERT query.
  • Fixed bug 1329422: pt-online-schema-change with --alter-foreign-keys-method=none can break FK constraints in a way that is hard to recover from. This method of handling foreign key constraints is provided so that the database administrator can disable the tool’s built-in functionality if desired, but the tool now prints a warning and asks for confirmation when --alter-foreign-keys-method=none is used.

Percona Toolkit is free and open-source. Details of the release can be found in the release notes and the 2.2.9 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.9 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

TokuDB gotchas: slow INFORMATION_SCHEMA TABLES

MySQL Performance Blog - Wed, 2014-07-09 16:01

We are using Percona Server with the TokuDB engine extensively in Percona Cloud Tools and are getting real operational experience with this engine. So I want to share some findings we came across, in the hope they may help someone in their work with TokuDB.

So, one problem I faced is that SELECT * FROM INFORMATION_SCHEMA.TABLES is quite slow when I have thousands of tables in TokuDB. How slow? For example…

select * from information_schema.tables limit 1000;
...
1000 rows in set (18 min 31.93 sec)

This is very similar to what InnoDB faced a couple of years back. InnoDB solved it by adding the innodb_stats_on_metadata variable.
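For reference, on InnoDB that extra work can simply be switched off at runtime; a quick sketch:

-- Stop InnoDB from refreshing statistics when metadata (e.g. INFORMATION_SCHEMA.TABLES) is queried
SET GLOBAL innodb_stats_on_metadata = OFF;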

So what happens with TokuDB? There is an explanation from Rich Prohaska at Tokutek: “Tokudb has too much overhead for table opens. TokuDB does a calculation on the table when it is opened to see if it is empty. This calculation can be disabled when ‘tokudb_empty_scan=disabled‘. ”
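To try that suggestion, the variable can go into my.cnf or, assuming your TokuDB build allows changing it at runtime, be set on the fly; a sketch:

-- Assumes tokudb_empty_scan can be changed dynamically; otherwise set it in my.cnf and restart
SET GLOBAL tokudb_empty_scan = 'disabled';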

So let’s see what we have with tokudb_empty_scan=disabled

select * from information_schema.tables limit 1000;
...
1000 rows in set (3 min 4.59 sec)

An impressive improvement, but still somewhat slow. Tokutek promises a fix to improve it in the next TokuDB 7.2 release.

The post TokuDB gotchas: slow INFORMATION_SCHEMA TABLES appeared first on MySQL Performance Blog.

Categories: MySQL

Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator

MySQL Performance Blog - Wed, 2014-07-09 13:28

A while back, I made some changes to the plugin interface for pt-online-schema-change which allow custom replication checks to be written. As I was adding this functionality, I also added the --plugin option to pt-table-checksum. This was released in Percona Toolkit 2.2.8.

With these additions, I spent some time writing a plugin that allows Percona Toolkit tools to use Tungsten Replicator to check for slave lag. You can find the code at https://github.com/grypyrg/percona-toolkit-plugin-tungsten-replicator

Requirements

The plugin uses the Perl JSON::XS module (perl-JSON-XS rpm package, http://search.cpan.org/dist/JSON-XS/XS.pm). Make sure it’s available or the plugin will not work.

Preparation

We need to use --recursion-method=dsns, as the Percona Toolkit tools are not able to automatically find the Tungsten Replicator slaves that are connected to the master database. (I did add a blueprint on Launchpad to make this possible: https://blueprints.launchpad.net/percona-toolkit/+spec/plugin-custom-recursion-method)

The dsns recursion-method gets the list of slaves from a database table you specify:

CREATE TABLE `percona`.`dsns` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL,
  `dsn` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
);
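Registering a slave is then just a matter of inserting one row per slave into that table; for the single-slave setup used here, something like this would do:

INSERT INTO `percona`.`dsns` (parent_id, dsn) VALUES (NULL, 'h=node3');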

Here one slave node3 is replicating from the master:

node1 mysql> select * from percona.dsns;
+----+-----------+---------+
| id | parent_id | dsn     |
+----+-----------+---------+
|  2 |      NULL | h=node3 |
+----+-----------+---------+

Configuration

Currently, it is not possible to specify extra options for the plugin with Percona Toolkit, so some manual editing of the perl file is still necessary to configure it.

So before we can run a checksum, we need to configure the plugin:

## CONFIGURATION
# trepctl command to run
my $trepctl="/opt/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl";
# what tungsten replicator service to check
my $service="bravo";
# what user does tungsten replicator use to perform the writes?
# See Binlog Format for more information
my $tungstenusername = 'tungsten';

Running A Checksum

Here I did a checksum of a table with pt-table-checksum. During the checksum process, I brought the slave node offline and brought it back online again:

# pt-table-checksum -u checksum --no-check-binlog-format --recursion-method=dsn=D=percona,t=dsns --plugin=/vagrant/pt-plugin-tungsten_replicator.pl --databases app --check-interval=5 --max-lag=10
Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl.
PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Replica node3 is stopped.  Waiting.
Tungsten Replicator status of host node3 is OFFLINE:NORMAL, waiting
Replica lag is 125 seconds on node3.  Waiting.
Replica lag is 119 seconds on node3.  Waiting.
Checksumming app.large_table:  22% 00:12 remain
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-03T10:49:54      0      0  2097152       7       0 213.238 app.large_table

I recommend raising --check-interval above the default of 1 second, as running trepctl takes a while; leaving it at 1 second could slow down the checksum process quite a lot.

Making Schema Changes

The plugin also works with pt-online-schema-change:

# pt-online-schema-change -u schemachange --recursion-method=dsn=D=percona,t=dsns --plugin=/vagrant/pt-plugin-tungsten_replicator.pl --check-interval=5 --max-lag=10 --alter "add index (column1) " --execute D=app,t=large_table
Created plugin from /vagrant/pt-plugin-tungsten_replicator.pl.
Found 1 slaves: node3
Will check slave lag on: node3
PLUGIN get_slave_lag: Using Tungsten Replicator to check replication lag
Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `app`.`large_table`...
Creating new table...
Created new table app._large_table_new OK.
Waiting forever for new table `app`.`_large_table_new` to replicate to node3...
Altering new table...
Altered `app`.`_large_table_new` OK.
2014-07-03T13:02:33 Creating triggers...
2014-07-03T13:02:33 Created triggers OK.
2014-07-03T13:02:33 Copying approximately 8774670 rows...
Copying `app`.`large_table`:  26% 01:21 remain
Copying `app`.`large_table`:  50% 00:59 remain
Replica lag is 12 seconds on node3.  Waiting.
Replica lag is 12 seconds on node3.  Waiting.
Copying `app`.`large_table`:  53% 02:22 remain
Copying `app`.`large_table`:  82% 00:39 remain
2014-07-03T13:06:06 Copied rows OK.
2014-07-03T13:06:06 Swapping tables...
2014-07-03T13:06:06 Swapped original and new tables OK.
2014-07-03T13:06:06 Dropping old table...
2014-07-03T13:06:06 Dropped old table `app`.`_large_table_old` OK.
2014-07-03T13:06:06 Dropping triggers...
2014-07-03T13:06:06 Dropped triggers OK.
Successfully altered `app`.`large_table`.

As you can see, there was some slave lag during the schema changes.

Binlog Format & pt-online-schema-change

pt-online-schema-change uses triggers in order to do the schema changes. Tungsten Replicator has some limitations with different binary log formats and triggers (https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication).

In Tungsten Replicator, ROW based binlog events will be converted to SQL statements, which causes triggers to be executed on the slave as well; this does not happen with traditional replication.

Different settings:

  • STATEMENT based binary logging works by default
  • ROW based binary logging works, the plugin recreates the triggers and uses the technique documented at https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication
  • MIXED binary logging does not work, as there is currently no way to determine whether an event was written to the binary log in statement or row based format, so it’s not possible to know if triggers should be run or not. The tool will exit and an error will be returned:
    Error creating --plugin: The master it's binlog_format=MIXED, pt-online-schema change does not work well with Tungsten Replicator and binlog_format=MIXED.
Be Warned

The binlog_format setting can be overridden on a per-session basis; make sure that this does NOT happen when using pt-online-schema-change.

Summary

The documentation on the Continuent website already mentions how you can compare data with pt-table-checksum.

I believe this plugin is a good addition to it. The features in Percona Toolkit that monitor replication lag can now be used with Tungsten Replicator, and therefore give you control over how much replication lag is tolerated while using those tools.

The post Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator appeared first on MySQL Performance Blog.

Categories: MySQL

TIMESTAMP Columns, Amazon RDS 5.6, and You

MySQL Performance Blog - Tue, 2014-07-08 14:18

This comes from an issue that I worked on recently, wherein a customer reported that their application was working fine under stock MySQL 5.6 but producing erroneous results when they tried running it on Amazon RDS 5.6. They had a table which, on the working server, contained two TIMESTAMP columns, one which defaulted to CURRENT_TIMESTAMP and the other which defaulted to '0000-00-00 00:00:00', like so:

CREATE TABLE mysql56 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts1 TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ts2 TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00'
);

However, under Amazon RDS, the same table looked like this:

CREATE TABLE rds56 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts1 TIMESTAMP NULL DEFAULT NULL,
  ts2 TIMESTAMP NULL DEFAULT NULL
);

They mentioned that their schema contains TIMESTAMP column definitions without any modifiers for nullability or default values. In other words, they were doing something like this:

CREATE TABLE foo56 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ts1 TIMESTAMP,
  ts2 TIMESTAMP
);

It’s a known issue (or change, or difference, whatever we choose to call it) that MySQL 5.6 deprecates the implicit defaults for TIMESTAMP columns that are declared without any nullability or default-value specifiers; this is covered in the 5.6 documentation. However, the docs also mention that the default value for the setting controlling this behavior (explicit_defaults_for_timestamp) is OFF; i.e., if you create a table with TIMESTAMP columns without any defaults, MySQL will still fill them in for you, similarly to what I’ve described above.

As it turns out, the RDS default for this setting is ON, hence the “NULL DEFAULT NULL” modifiers when creating the table under RDS. We changed the parameter group, restarted the instance (note that this variable is NOT dynamic), and their schema-creation script created the tables in the proper way.

So, what have we learned here?
  • Migrating from standalone MySQL to Amazon RDS sometimes has hidden pitfalls that aren’t always readily apparent. Many times it will “just work” – but sometimes it doesn’t. Percona is, of course, happy to help review your configurations and assist with any Amazon RDS implementation plans you might have.
  • When in doubt, fully-specify your TIMESTAMP columns, as in the sketch after this list. If you want them NOT NULL, say so. If you want a default value or an ON UPDATE value, set it. Even the configuration variable explicit_defaults_for_timestamp is deprecated and slated for removal in a future version, so eventually it won’t be possible to get the old pre-5.6 behavior at all.
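A minimal sketch of what fully-specified TIMESTAMP columns look like (hypothetical table; adapt the defaults to your needs):

CREATE TABLE events (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP
);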

The post TIMESTAMP Columns, Amazon RDS 5.6, and You appeared first on MySQL Performance Blog.

Categories: MySQL

Looking out for max values in integer-based columns in MySQL

MySQL Performance Blog - Mon, 2014-07-07 10:00

Yay! My first blog post! As long as at least 1 person finds it useful, I’ve done my job. Recently, one of my long-term clients was noticing that while their INSERTs were succeeding, a particular column counter was not incrementing. A quick investigation determined the column was of type int(11) and they had reached the maximum value of 2147483647. We fixed this by using pt-online-schema-change to change the column to int(10) unsigned, thus allowing values up to 4294967295.
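For reference, the underlying change is equivalent to an ALTER like the one below (hypothetical table and column names; on a large, busy table you would run it through pt-online-schema-change as described above rather than directly):

ALTER TABLE app.counters MODIFY COLUMN hit_count INT(10) UNSIGNED NOT NULL;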

My client was now concerned about all his other integer-based columns and wanted me to check them all. So I wrote a quick-n-dirty script in Go to check all integer-based columns, comparing their current maximum value to the maximum allowed for that column type.

You can find the full source code in my git repo.

Here’s a quick overview; the code is pretty simple.

First we connect to MySQL and verify the connection:

db, err := sql.Open("mysql", fmt.Sprintf("%s:%s@tcp(%s:3306)/%s", mysqlUn, mysqlPw, hostToCheck, dbToCheck))
if err != nil {
    fmt.Printf("Error connecting to MySQL on '%s': %s\n", hostToCheck, err)
    db.Close()
    os.Exit(1)
}

// Check connection is alive.
err = db.Ping()
if err != nil {
    fmt.Printf("Unable to ping mysql at '%s': %s\n", hostToCheck, err)
    db.Close()
    os.Exit(1)
}

Next, we query the information_schema.columns table for the names of all integer-based columns and calculate what their maximum value can be (credit for the clever SQL goes to Peter Boros).

// Construct our base i_s query
var tableExtraSql string
if tableToCheck != "" {
    tableExtraSql = fmt.Sprintf("AND TABLE_NAME = '%s'", tableToCheck)
}
baseSql := fmt.Sprintf(`
    SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE,
           (CASE DATA_TYPE
              WHEN 'tinyint'   THEN 255
              WHEN 'smallint'  THEN 65535
              WHEN 'mediumint' THEN 16777215
              WHEN 'int'       THEN 4294967295
              WHEN 'bigint'    THEN 18446744073709551615
            END >> IF(LOCATE('unsigned', COLUMN_TYPE) > 0, 0, 1)) AS MAX_VALUE
      FROM information_schema.columns
     WHERE TABLE_SCHEMA = '%s' %s
       AND DATA_TYPE IN ('tinyint', 'int', 'mediumint', 'bigint')`, dbToCheck, tableExtraSql)
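If you just want to eyeball that list without running the program, essentially the same query can be pasted into the mysql client (a sketch; substitute your own schema name):

SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE,
       (CASE DATA_TYPE
          WHEN 'tinyint'   THEN 255
          WHEN 'smallint'  THEN 65535
          WHEN 'mediumint' THEN 16777215
          WHEN 'int'       THEN 4294967295
          WHEN 'bigint'    THEN 18446744073709551615
        END >> IF(LOCATE('unsigned', COLUMN_TYPE) > 0, 0, 1)) AS MAX_VALUE
  FROM information_schema.columns
 WHERE TABLE_SCHEMA = 'your_schema'
   AND DATA_TYPE IN ('tinyint', 'int', 'mediumint', 'bigint');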

Now that we have this list of columns to check, we simply loop over this result set, get the MAX() of each column and print a pretty report.

// Loop over rows received from i_s query above.
for columnsToCheck.Next() {
    err := columnsToCheck.Scan(&tableName, &columnName, &columnType, &maxValue)
    if err != nil {
        log.Fatal("Scanning Row Error: ", err)
    }

    // Check this column
    query := fmt.Sprintf("SELECT MAX(%s), ROUND((MAX(%s)/%d)*100, 2) AS ratio FROM %s.%s",
        columnName, columnName, maxValue, dbToCheck, tableName)
    err = db.QueryRow(query).Scan(&currentValue, &ratio)
    if err != nil {
        fmt.Printf("Couldn't get MAX(%s.%s): %s\n", tableName, columnName, err)
        fmt.Println("SQL: ", query)
        continue
    }

    // Print report
    if ratio.Valid && ratio.Float64 >= float64(reportPct) {
        fmt.Printf("'%s'.'%s' - Type: '%s' - ", tableName, columnName, columnType)
        fmt.Printf("ColumMax: '%d'", maxValue)
        fmt.Printf(" - CurVal: '%d'", currentValue.Int64)
        fmt.Printf(" - FillRatio: '%.2f'\n", ratio.Float64)
    }
}

There are more options to the app that allow you to silence some of the verbosity and to only print report lines where the value-to-max ratio is > a user-defined threshold. If you have frequently changing schemas, this should allow you to cron the app and only receive email reports when there is a potential problem. Otherwise, this tool could be useful to run once a month/quarter, just to verify things are in good standing.

Like I said before, hopefully this helps at least 1 person catch a potential problem sooner rather than later.

The post Looking out for max values in integer-based columns in MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

Failover with the MySQL Utilities: Part 2 – mysqlfailover

MySQL Performance Blog - Thu, 2014-07-03 07:00

In the previous post of this series we saw how you could use mysqlrpladmin to perform manual failover/switchover when GTID replication is enabled in MySQL 5.6. Now we will review mysqlfailover (version 1.4.3), another tool from the MySQL Utilities that can be used for automatic failover.

Summary
  • mysqlfailover can perform automatic failover if MySQL 5.6′s GTID-replication is enabled.
  • All slaves must use --master-info-repository=TABLE.
  • The monitoring node is a single point of failure: don’t forget to monitor it!
  • Detection of errant transactions works well, but you have to use the --pedantic option to make sure failover will never happen if there is an errant transaction.
  • There are a few limitations such as the inability to only fail over once, or excessive CPU utilization, but they are probably not showstoppers for most setups.
Setup

We will use the same setup as last time: one master and two slaves, all using GTID replication. We can see the topology using mysqlfailover with the health command:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root health
[...]
MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Tue Jul  1 10:01:22 2014

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000003  700

GTID Executed Set
a9a396c6-00f3-11e4-8e66-9cebe8067a3f:1-3

Replication Health Status
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13001  | MASTER  | UP     | ON         | OK      |
| localhost  | 13002  | SLAVE   | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+

Note that --master-info-repository=TABLE needs to be configured on all slaves or the tool will exit with an error message:

2014-07-01 10:18:55 AM CRITICAL Failover requires --master-info-repository=TABLE for all slaves.
ERROR: Failover requires --master-info-repository=TABLE for all slaves.
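If a slave is still using file-based repositories, it can be switched without rebuilding replication; a sketch (the variable can only be changed while the replication threads are stopped, and it should also be persisted in my.cnf):

STOP SLAVE;
SET GLOBAL master_info_repository = 'TABLE';
START SLAVE;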

Failover

You can use 2 commands to trigger automatic failover:

  • auto: the tool tries to find a candidate in the list of servers specified with --candidates, and if no good server is found in this list, it will look at the other slaves to see if one can be a good candidate. This is the default command.
  • elect: same as auto, but if no good candidate is found in the list of candidates, other slaves will not be checked and the tool will exit with an error.

Let’s start the tool with auto:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto

The monitoring console is visible and is refreshed every --interval seconds (default: 15). Its output is similar to what you get when using the health command.

Then let’s kill -9 the master to see what happens once the master is detected as down:

Failed to reconnect to the master after 3 attemps.

Failover starting in 'auto' mode...
# Candidate slave localhost:13002 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
# Discovering slaves for master at localhost:13002

Failover console will restart in 5 seconds.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Tue Jul  1 10:59:47 2014

Master Information
------------------
Binary Log File   Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000005  191

GTID Executed Set
a9a396c6-00f3-11e4-8e66-9cebe8067a3f:1-3

Replication Health Status
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13002  | MASTER  | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+

Looks good! The tool is then ready to fail over to another slave if the new master becomes unavailable.

You can also run custom scripts at several points of execution with the --exec-before, --exec-after, --exec-fail-check, --exec-post-failover options.

However it would be great to have a --failover-and-exit option to avoid flapping: the tool would detect master failure, promote one of the slaves, reconfigure replication and then exit (this is what MHA does for instance).

Tool registration

When the tool is started, it registers itself on the master by writing a few things in a dedicated table:

mysql> SELECT * FROM mysql.failover_console;
+-----------+-------+
| host      | port  |
+-----------+-------+
| localhost | 13001 |
+-----------+-------+

This is nice as it prevents you from starting several instances of mysqlfailover to monitor the same master. If we try, this is what we get:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto
[...]
Multiple instances of failover console found for master localhost:13001.
If this is an error, restart the console with --force.
Failover mode changed to 'FAIL' for this instance.
Console will start in 10 seconds..........starting Console.

With the fail command, mysqlfailover will monitor replication health and exit in the case of a master failure, without actually performing failover.

Running in the background

In all previous examples, mysqlfailover was running in the foreground. This is very good for demo, but in a production environment you are likely to prefer running it in the background. This can be done with the --daemon option:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto --daemon=start --log=/var/log/mysqlfailover.log

and it can be stopped with:

$ mysqlfailover --daemon=stop

Errant transactions

If we create an errant transaction on one of the slaves, it will be detected:

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Tue Jul  1 16:29:44 2014
[...]
WARNING: Errant transaction(s) found on slave(s).
Replication Health Status
[...]
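An errant transaction here is simply a write that was executed directly on a slave, so it exists in that slave's binary log but not on the master; something as small as this will create one (hypothetical statement):

-- Executed directly on one of the slaves, not through replication
CREATE DATABASE errant_test;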

However this does not prevent failover from occurring! You have to use --pedantic:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root --pedantic auto
[...]
# WARNING: Errant transaction(s) found on slave(s).
#  - For slave 'localhost@13003': db906eee-012d-11e4-8fe1-9cebe8067a3f:1
2014-07-01 16:44:49 PM CRITICAL Errant transaction(s) found on slave(s). Note: If you want to ignore this issue, please do not use the --pedantic option.
ERROR: Errant transaction(s) found on slave(s). Note: If you want to ignore this issue, please do not use the --pedantic option.

Limitations
  • Like for mysqlrpladmin, the slave election process is not very sophisticated and it cannot be tuned.
  • The server on which mysqlfailover is running is a single point of failure.
  • Excessive CPU utilization: once it is running, mysqlfailover hogs one core. This is quite surprising.
Conclusion

mysqlfailover is a good tool to automate failover in clusters using GTID replication. It is flexible and looks reliable. Its main drawback is that there is no easy way to make it highly available itself: if mysqlfailover crashes, you will have to manually restart it.

The post Failover with the MySQL Utilities: Part 2 – mysqlfailover appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.5.38-35.2 is now available

MySQL Performance Blog - Wed, 2014-07-02 14:36

Percona is glad to announce the release of Percona Server 5.5.38-35.2 on July 2, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.38, including all the bug fixes in it, Percona Server 5.5.38-35.2 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.5.38-35.2 milestone at Launchpad.

Bugs Fixed:

Release notes for Percona Server 5.5.38-35.2 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.38-35.2 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Using MySQL triggers and views in Amazon RDS

MySQL Performance Blog - Wed, 2014-07-02 10:00

I recently had an opportunity to migrate a customer from a physical server into Amazon’s RDS environment. In this particular case the customer’s platform makes extensive use of MySQL triggers and views. I came across two significant issues that prevented me from following Amazon’s documentation, which basically states “use mysqldump” but doesn’t call out a specific method of dealing with MySQL triggers and views.

Amazon Relational Database Service (Amazon RDS) is a great platform if you’re looking for complete hands-off management of your MySQL environment, but it comes at a cost in flexibility: you don’t have the SUPER privilege, and this brings up additional challenges.

  1. You need to ensure you set log_bin_trust_function_creators=1 ( by default this is off, 0).
  2. You need to clean up your mysqldump syntax.

#1 is easy: you simply make a configuration change within the Amazon RDS GUI on the node’s Parameter Group to set log_bin_trust_function_creators=1 and then restart your Amazon RDS node. The restart is required since without the SUPER privilege you lose access to changing DYNAMIC variables on the fly.
#2 is a little more complex. If you go with vanilla mysqldump (from, say, a 5.5 mysqldump binary) on a schema that has triggers and views, you will see error 1227, something like this:

ERROR 1227 (42000) at line 27311: Access denied; you need (at least one of) the SUPER privilege(s) for this operation

You’re seeing this message because MySQL in Amazon RDS doesn’t provide the SUPER privilege, and thus you cannot set up a trigger or view to run as a different user — only a user with SUPER can do that.

mysqldump will generate syntax for a trigger like this:

DELIMITER ;;
/*!50003 CREATE*/ /*!50017 DEFINER=`root`@`%`*/ /*!50003 TRIGGER `after_insert_lead` AFTER INSERT ON `leads` FOR EACH ROW
BEGIN
  UPDATE analytics.mapping SET id_lead = NEW.id_lead WHERE mc_email = NEW.email;
END */;;
DELIMITER ;

and for a view like this:

/*!50001 CREATE ALGORITHM=UNDEFINED */ /*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER */ /*!50001 VIEW `admin_user_view` AS SELECT ...

The problem is in the “DEFINER” lines.
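One way to get an inventory of those DEFINERs before touching the dump is to ask information_schema directly (a sketch; add a schema filter as needed):

SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER FROM information_schema.TRIGGERS;
SELECT TABLE_SCHEMA, TABLE_NAME, DEFINER FROM information_schema.VIEWS;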

Here’s one method that worked for me:

  1. Identify all the DEFINER lines in your schema. I found it helpful to take a --no-data dump and then weed through it to get a unique list of the DEFINER lines
  2. Create a sed line for each unique DEFINER line (see my example in a moment)
  3. Include this sed line in your dump/load script

Here’s what my sed matches looked like:

sed -e 's/\/\*!50017 DEFINER=`root`@`localhost`\*\///' \
    -e 's/\/\*!50017 DEFINER=`root`@`%`\*\///' \
    -e 's/\/\*!50017 DEFINER=`web`@`%`\*\///' \
    -e 's/\/\*!50017 DEFINER=`cron`@`%`\*\///' \
    -e 's/\/\*!50013 DEFINER=`cron`@`%` SQL SECURITY DEFINER \*\///' \
    -e 's/\/\*!50013 DEFINER=`root`@`localhost` SQL SECURITY DEFINER \*\///' \
    -e 's/\/\*!50013 DEFINER=`root`@`%` SQL SECURITY DEFINER \*\///' \
    -e 's/\/\*!50013 DEFINER=`web`@`%` SQL SECURITY DEFINER \*\///'

Note: the forward slashes and asterisks in those expressions need to be escaped, as shown above.

A big caveat: this is a brute-force way of getting your data into Amazon RDS. You lose the elegance and security of running your triggers and views as separate defined users within the database; they will all now run as the user you loaded them in as. If this is a show-stopper for you, contact Percona and I’d be happy to take on your case and develop a more comprehensive solution.

Now all that’s left is to integrate this into your dump flow.  Something like this should work:

mysqldump --host=source | sed -e ... lots of lines | mysql --host=destination

I hope this helps someone!

The post Using MySQL triggers and views in Amazon RDS appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.6.19-67.0 with TokuDB (GA) now available

MySQL Performance Blog - Tue, 2014-07-01 15:15

Percona is glad to announce the release of Percona Server 5.6.19-67.0 on July 1, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.19, including all the bug fixes in it, Percona Server 5.6.19-67.0 is the current GA release in the Percona Server 5.6 series. All of Percona’s software is open-source and free. Complete details of this release can be found in the 5.6.19-67.0 milestone on Launchpad.

New Features:

  • Percona has merged a contributed patch by Kostja Osipov implementing the Multiple user level locks per connection feature. This feature fixes the upstream bugs: #1118 and #67806.
  • TokuDB storage engine support is now considered general availability (GA) quality. The TokuDB storage engine from Tokutek improves scalability and the operational efficiency of MySQL with faster performance and increased compression. It is available as a separate package and can be installed along with the Percona Server by following the instructions in the release documentation.
  • Percona Server now supports the MTR --valgrind option for a server that is either statically or dynamically linked with jemalloc.

Bugs Fixed:

  • The libperconaserverclient18.1 package was missing the library files. Bug fixed #1329911.
  • Percona Server introduced a regression in 5.6.17-66.0 when support for TokuDB storage engine was initially introduced. This regression caused spurious “wrong table structure” errors for PERFORMANCE_SCHEMA tables. Bug fixed #1329772.
  • Race condition in group commit code could lead to a race condition in PFS instrumentation code resulting in a server crash. Bug fixed #1309026 (upstream #72681).

Other bugs fixed: #1326348 and #1167486.

NOTE: There was no Percona Server 5.6.18 release because there was no MySQL Community Server 5.6.18 release. That version number was used for a MySQL Enterprise Edition release to address the OpenSSL “Heartbleed” issue.

Release notes for Percona Server 5.6.19-67.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.19-67.0 with TokuDB (GA) now available appeared first on MySQL Performance Blog.

Categories: MySQL

New feature in MariaDB 10.1: ANALYZE statement

Sergey Petrunia's blog - Mon, 2014-06-30 16:09

Last week, yours truly pushed a new feature into the MariaDB 10.1 tree: the ANALYZE statement.

The idea of this feature is to make it easy to compare the query plan with the actual query execution. ANALYZE will run the statement and produce EXPLAIN-like output, where the optimizer’s estimates are followed by numbers that were observed when running the query. The output looks like this:
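A minimal sketch of such output, with a made-up table and made-up numbers, and with some of the usual EXPLAIN columns trimmed for width:

ANALYZE SELECT * FROM orders WHERE customer_id BETWEEN 1 AND 1000 AND amount > 100;
+------+-------------+--------+-------+------+--------+----------+------------+-------------+
| id   | select_type | table  | type  | rows | r_rows | filtered | r_filtered | Extra       |
+------+-------------+--------+-------+------+--------+----------+------------+-------------+
|    1 | SIMPLE      | orders | range | 1000 | 983.00 |   100.00 |      21.57 | Using where |
+------+-------------+--------+-------+------+--------+----------+------------+-------------+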

Here,

  • Next to rows there is an r_rows column, which shows how many records were actually read from the table.
  • Next to filtered there is an r_filtered column, which shows what fraction of the records remained after the part of the WHERE condition attached to the table was checked.

I think this should explain the feature. If you want more details, please refer to the KB article ANALYZE statement. It also discusses the meaning of the above EXPLAIN output.

Technical details and further plans

ANALYZE currently uses its own counters. Counting is done for all queries, including non-ANALYZE queries. This should be okay (not have visible overhead) as long as counting just increments integer variables in the query plan, without doing any atomic operations or making syscalls.

The upside of this approach is that it’s now trivial to make Explain in the slow query log also print ANALYZE output. When a query runs slowly, you will be able to know where exactly the optimizer was wrong.

The downside is that getting more data will not be as easy. So far, the most requested numbers beyond r_rows and r_filtered have been r_time (the amount of time spent reading the table) and r_io (the amount of IO that we did on the table). Counting the amount of time that was spent while reading each row will impose CPU overhead; it is a known problem. Counting IO is just incrementing a counter, but it will require interaction between the ANALYZE code and storage engine(s) code, which will add complexity.

There is the PERFORMANCE_SCHEMA feature, where others have already spent a lot of effort counting wait time and IO. It’s tempting to reuse it. The problem is, P_S collects the wrong data. P_S counters are global, while ANALYZE needs to count IO for each table reference separately. Consider a self-join: from the P_S point of view, it is reading from the same table; from the ANALYZE point of view, it is reads from two different table references. I’m currently not sure whether ANALYZE should/could rely on PERFORMANCE_SCHEMA.

A totally different angle is that tabular EXPLAIN output doesn’t allow showing much data (for example, how many rows were there before/after GROUP BY?). Here the solution is clear, I think: support EXPLAIN FORMAT=JSON and then add ANALYZE FORMAT=JSON, where we can provide lots of detail.

Categories: MySQL