Measuring Docker IO overhead

MySQL Performance Blog - Thu, 2016-02-11 20:38

This will be another post on using Percona Server via a Docker image. I want to follow up on my previous post regarding CPU/Network overhead in Docker “Measuring Percona Server Docker CPU/network overhead” by measuring  if there is any docker IO overhead on operations.

After running several tests, it appears (spoiler alert) that there is no Docker IO overhead. I still think it is useful to understand the different ways Docker can be used with data volumes, however. Docker’s philosophy is to provide ephemeral containers, but ephemeral does not work well for data – we do not want our data to disappear.

So, the first pattern is to create data inside a docker container. This is the default mode:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -p 3306:3306 -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

(I am using --net=host to avoid network overhead; check the previous post for more information.)

The second pattern is to use an external data volume, there we need to substitute the data volume with -v /data/flash/d1/:/var/lib/mysql. The full command is:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -p 3306:3306 -v /data/flash/d1/:/var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

Finally, there is third pattern: using data volume containers. For this example, I created a dummy container:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13-data-volume -v /var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

After stopping the ps13-data-volume container, we can start a real one using the data volume from ps13-data-volume  as:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps14 --volumes-from ps13-data-volume -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

I compared all these modes with Percona Server running on a bare metal box, and direct mounted in sysbench, for both read-intensive and write-intensive IO workloads. For the reference, sysbench command is:

./sysbench --test=tests/db/oltp.lua --oltp_tables_count=16 --oltp_table_size=10000000 --num-threads=16 --mysql-host= --mysql-user=root --oltp-read-only=off --max-time=1800 --max-requests=0 --report-interval=10 run

I’m not going to show the final numbers or charts, as the results are identical for all docker modes and for the bare metal case. So I can confidently say there is NO IO overhead for any docker data volume pattern described above.

As next experiment, I want to measure the Docker container overhead in a multi-host network environment.

Categories: MySQL

Percona Live featured talk with Sergej Jurecko: ActorDB — an alternative view of a distributed database

MySQL Performance Blog - Wed, 2016-02-10 21:28

Welcome to the first of several discussions with some of our upcoming Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the talks that will happen at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!

In this first installment, we’ll meet Sergej Jurecko, co-founder of Biokoda d.o.o. His talk will be ActorDB – an alternative view of a distributed database.  ActorDB is a database that was developed using a distributed model: it uses an SQL database that speaks the MySQL client/server protocol. I had a chance to speak with Sergej and get some insight into what his talk will cover:

Percona: Give me brief history of yourself: how did you get into database development, where do you work, what you love about it?

Sergej: I am a co-founder of a private company in Slovenia named Biokoda. Our clients range from small companies to telecoms who offer our solutions to their customers and government institutions. The requirements that our products try to solve always include high availability, ease-of-management, ease-of-scale and self-hosted.

A few years ago we were tasked with building a file sync app. This requires you to store a potentially very large file hierarchy for every user. When it came to choosing a database, our options were KV stores, traditional SQL databases and document stores (which were much less mature then they are now).

Designing a database that would be an ideal fit for our use case and requirements became a fun engineering challenge. Then writing it became a fun engineering challenge. Sure, it would have been safer and easier to stick with an existing mature SQL database, but I’m an eternal engineering optimist. Sometimes you have to take a crazy chance if you believe in it!

Percona: Your talk is going to be on “ActorDB – an alternative view of a distributed database.” What is it about distributed databases that causes concern for people? What are the pros and cons? And what affects could it have on application performance?

Sergej: The biggest concern, and rightfully so, is safety. There are many pitfalls developers of distributed databases can fall into. The most basic issues are: What is the consensus algorithm, is it implemented correctly and thoroughly tested? What is the storage engine, is it custom built? If yes how well is its reliability tested?

The advantage of getting it right is a way to store state without a single point of failure. It is a way to horizontally grow your database with your needs. If your database is a part of your products, these things become important selling points for your products.

When it comes to performance, distributed databases tend to lose out on a per-node basis. But because they can scale out to more nodes, they can achieve  higher performance by an order of magnitude.

What we tried to do is base ActorDB on as much solid, proven ground as possible. We avoided developing our own storage and SQL engine, and instead based it on existing proven technology. We even avoided developing our own client protocol and libraries.

Percona: Does scaling horizontally cause difficulties with expense justification? How would you characterize the ROI for horizontal versus vertically scaling?

Sergej: I’m not sure how much that is even a factor. It depends on what kind of customers you are speaking to and what their needs are. The kind of companies we are in contact with often use horrifically inefficient languages as a base for their products, because those languages make solving problems in them easier. The ROI is in faster development time.

One could make the same case with distributed databases. Excluding KV stores, you still have structured values, indexes and sometimes SQL. If horizontally distributed databases solve more problems for you than vertically distributed ones, then that is the ROI. They spare you from solving those difficult problems.

The industry has moved on from the one-size-fits-all mentality. Distributed databases are not a replacement for traditional monolithic ones. There are things possible in monolithic databases that are not possible in distributed ones, and vice versa. There is room for both, and now developers have a choice as to what best fits their needs.

I think the tech industry has more in common with the fashion industry than we like to admit. Technologies grow and die in popularity much like fashion. When new concepts like eventual consistency rise up, we sometimes get a bit too enthusiastic about them. Right now I think the traditional way of thinking is coming back a bit. It turns out a nice SQL interface to the database is important, and new and untested storage engines are pretty dangerous.

Percona: What do you see as an issue that we the community needs to be on top of with regard to distributed database development? What keeps you up at night with regard to ActorDB and the implementation of your solution?

Sergej: Well I’ve already mentioned the main issues: distributed consensus and storage engine. The key issue for us is in our Raft implementation. At the end of the day, a database must have solid performance – which means you can’t just grab an off the shelf Raft implementation and use it. It must be tightly integrated with the storage engine.

But what literally keeps me up at night is the unexplored potential that we see in the product. There are so many interesting avenues we have not developed yet.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Sergej: I’m looking forward to discovering something new and finding out what others are doing. But mostly getting more feedback, especially if negative! That is often the most useful kind of feedback.

You can read more of Sergej’s thoughts and about ActorDB at the Biokoda blog.

Want to find out more about Sergej and ActorDB? Register for Percona Live Data Performance Conference 2016, and come see his talk ActorDB – an alternative view of a distributed database. Use the code “FeaturedTalk”and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Categories: MySQL

Estimating potential for MySQL 5.7 parallel replication

MySQL Performance Blog - Wed, 2016-02-10 18:19

Unlike MySQL 5.6, where parallel replication can only be used when replicas have several schemas, MySQL 5.7 replicas can read binlog group commit information coming from the master to replicate transactions in parallel even when a single schema is used. Now the question is: how many replication threads should you use?

A simple benchmark

Let’s assume we have one master and three slaves, all running MySQL 5.7.

One slave is using regular single-threaded replication (the control slave in the graph below), one is using 20 parallel workers (MTS 20 workers) and the last one is using 100 parallel workers (MTS 100 workers).

As a reminder, here is the settings that need to be adjusted for 5.7 parallel replication:

slave_parallel_type = LOGICAL_CLOCK slave_parallel_workers = 20

GTID replication is also highly recommended, if you don’t want to face annoying issues.

Now let’s run a simple sysbench workload, inserting records in 25 separate tables in the same database with 100 concurrent threads:

sysbench --mysql-user=root --mysql-db=db1 --test=/usr/share/doc/sysbench/tests/db/insert.lua --max-requests=500000 --num-threads=100 --oltp-tables-count=25 run

Because we’re using 100 concurrent threads on the master, we can expect that some parallelization is possible. This means that if we see replication lag with the control slave, we’ll probably see less lag with the 20-worker slave and even less with the 100-worker slave.

This is not exactly what we get:

Parallel replication is indeed useful, but the 100-worker slave doesn’t provide any benefits compared to the 20-worker slave. Replication lag is even slightly worse.

What happened?

Some instrumentation with performance_schema

To have a better understanding of how efficiently the parallel replication threads are used, let’s enable some instrumentation on slaves (in other words, recording executed transactions):

UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME LIKE 'events_transactions%'; UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'transaction';

For better readability of the results, let’s create a new view (tracking how many transactions are executed by each replication thread):

CREATE VIEW mts_summary_trx AS select performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID AS THREAD_ID, performance_schema.events_transactions_summary_by_thread_by_event_name.COUNT_STAR AS COUNT_STAR from performance_schema.events_transactions_summary_by_thread_by_event_name where performance_schema.events_transactions_summary_by_thread_by_event_name.THREAD_ID in (select performance_schema.replication_applier_status_by_worker.THREAD_ID from performance_schema.replication_applier_status_by_worker);

Now, after running sysbench again, let’s do some math to see how often each replication thread was run:

SELECT SUM(count_star) FROM mts_summary_trx INTO @total; SELECT 100*(COUNT_STAR/@total) AS PCT_USAGE FROM mts_summary_trx; +-----------+ | PCT_USAGE | +-----------+ | 39.5845 | | 31.4046 | | 12.0119 | | 5.9081 | | 3.0375 | | 1.6527 | | 1.0550 | | 0.7576 | | 0.6089 | | 0.5208 | | 0.4642 | | 0.4157 | | 0.3832 | | 0.3682 | | 0.3408 | | 0.3247 | | 0.3076 | | 0.2925 | | 0.2866 | | 0.2749 | +-----------+

We can see that the workload has a limited potential for parallelism – therefore, it’s not worth configuring more than 3-4 replication threads.

The slight performance degradation with 100 replication threads is probably due to the overhead of the coordinator thread.


Estimating the optimal number of replication threads with MySQL 5.7 parallel replication is quite difficult if your just guessing. The performance_schema provides a simple way to understand how the workload is handled by the replication threads.

It also allows you to see if tuning binlog_group_commit_sync_delay provides more throughput on slaves without too much impact on the master’s performance.

Categories: MySQL

EXPLAIN FORMAT=JSON: buffer_result is not hidden!

MySQL Performance Blog - Tue, 2016-02-09 20:41

Time for another entry in the EXPLAIN FORMAT=JSON is cool! series. Today we’re going to look at how you can view the buffer result using JSON (instead of the regular EXPLAIN command.

Regular EXPLAIN does not identify if SQL_BUFFER_RESULT was used at all. To demonstrate, let’s run this query:

mysql> explain select * from salariesG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2557022 filtered: 100.00 Extra: NULL 1 row in set, 1 warning (0.01 sec) Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Now, let’s compare it to this query:

mysql> explain select sql_buffer_result * from salariesG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2557022 filtered: 100.00 Extra: Using temporary 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,`employees`.`salaries`.`from_date` AS `from_date`,`employees`.`salaries`.`to_date` AS `to_date` from `employees`.`salaries`

Notice there is no difference, except the expected "Using temporary" value in the "Extra" row of the second query. The field "Using temporary"  is expected here, because SQL_BUFFER_RESULT  directly instructs the MySQL server to put a result set into a temporary table to free locks. But what if the query uses the temporary table by itself? For example, for a grouping operation? In this case, the EXPLAIN result for the original query and the query that contains the SQL_BUFFER_RESULT  clause will be 100% identical.


mysql> explain select emp_no, salary/avg(salary) from salaries group by emp_no, salaryG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2557022 filtered: 100.00 Extra: Using temporary; Using filesort 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`


mysql> explain select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salaryG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2557022 filtered: 100.00 Extra: Using temporary; Using filesort 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

There is no difference! We not able to tell if we used a temporary table to resolve the query, or simply put the result set into the buffer. The EXPLAIN FORMAT=JSON  command can help in this case as well. Its output is clear, and shows all the details of the query optimization:

mysql> explain format=json select sql_buffer_result emp_no, salary/avg(salary) from salaries group by emp_no, salaryG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "3073970.40" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "2557022.00" }, "buffer_result": { "using_temporary_table": true, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2557022, "rows_produced_per_join": 2557022, "filtered": "100.00", "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" }, "used_columns": [ "emp_no", "salary", "from_date" ] } } } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select sql_buffer_result `employees`.`salaries`.`emp_no` AS `emp_no`,(`employees`.`salaries`.`salary` / avg(`employees`.`salaries`.`salary`)) AS `salary/avg(salary)` from `employees`.`salaries` group by `employees`.`salaries`.`emp_no`,`employees`.`salaries`.`salary`

Firstly, we can see how the grouping_operation was optimized:

"grouping_operation": { "using_temporary_table": true, "using_filesort": true,

And it does indeed use the temporary table.

Now we can follow the details for SQL_BUFFER_RESULT:

"buffer_result": { "using_temporary_table": true,

With this output, we can be absolutely certain that the temporary table was created for both the  SQL_BUFFER_RESULT and the grouping operation. This is especially helpful for support engineers who need the EXPLAIN  output to help their customers to tune queries, but are afraid to ask for the same query twice — once with the SQL_BUFFER_RESULT clause and once without.

Conclusion: EXPLAIN FORMAT=JSON  does not hide important details for query optimizations.

Categories: MySQL

Percona Server 5.7.10-2 second RC available

MySQL Performance Blog - Mon, 2016-02-08 17:37

Percona is glad to announce the second release candidate of Percona Server 5.7.10-2 on February 8, 2016. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.7.10, including all the bug fixes in it, Percona Server 5.7.10-2 is the current Release Candidate release in the Percona Server 5.7 series. All of Percona’s software is open-source and free, all the details of the release can be found in the 5.7.10-2 milestone at Launchpad.

New Features:

  • Complete list of changes between Percona Server 5.6 and 5.7 can be seen in Changed in Percona Server 5.7.
  • 5.7 binlog group commit algorithm is now supported in TokuDB as well.
  • New TokuDB index statistics reporting has been implemented to be compatible with the changes implemented in upstream 5.7. Following the InnoDB example, the default value for tokudb_cardinality_scale_percent has been changed from 50% to 100%. Implementing this also addresses a server crash deep in the optimizer code.

Known Issues:

  • In Percona Server 5.7 super_read_only feature has been replaced with the upstream implementation. There are currently two known issues compared to Percona Server 5.6 implementation:
    • Bug #78963, super_read_only aborts STOP SLAVE if variable relay_log_info_repository is set to TABLE which could lead to a server crash in Debug builds.
    • Bug #79328, super_read_only set as a server option has no effect.
  • InnoDB crash recovery might fail if innodb_flush_method is set to ALL_O_DIRECT. The workaround is to set this variable to a different value before starting up the crashed instance (bug #1529885).

Bugs Fixed:

  • Clustering secondary index could not be created on a partitioned TokuDB table. Bug fixed #1527730 (#720).
  • Percona TokuBackup was failing to compile with Percona Server 5.7. Bug fixed #123.
  • Granting privileges to a user authenticating with PAM Authentication Plugin could lead to a server crash. Bug fixed #1521474.
  • TokuDB status variables were missing from Percona Server 5.7.10-1. Bug fixed #1527364 (#923).
  • Attempting to rotate the audit log file would result in audit log file name foo.log.%u (literally) instead of a numeric suffix. Bug fixed #1528603.
  • Adding an index to an InnoDB temporary table while expand_fast_index_creation was enabled could lead to server assertion. Bug fixed #1529555.
  • TokuDB would not be upgraded on Debian/Ubuntu distributions while performing an upgrade from Percona Server 5.6 to Percona Server 5.7 even if explicitly requested. Bug fixed #1533580.
  • Server would assert when both TokuDB and InnoDB tables were used within one transaction on a replication slave which has binary log enabled and slave updates logging disabled. Bug fixed #1534249 (upstream bug #80053).
  • MeCab Full-Text Parser Plugin has not been included in the previous release. Bug fixed #1534617.
  • Fixed server assertion caused by Performance Schema memory key mix-up in SET STATEMENT ... FOR ... statements. Bug fixed #1534874.
  • Setting the innodb_sched_priority_purge (available only in debug builds) while purge threads were stopped would cause a server crash. Bug fixed #1368552.
  • Enabling TokuDB with ps_tokudb_admin script inside the Docker container would cause an error due to insufficient privileges even when running as root. In order for this script to be used inside docker containers this error has been changed to a warning that a check is impossible. Bug fixed #1520890.
  • Write-heavy workload with a small buffer pool could lead to a deadlock when free buffers are exhausted. Bug fixed #1521905.
  • InnoDB status will start printing negative values for spin rounds per wait, if the wait number, even though being accounted as a signed 64-bit integer, will not fit into a signed 32-bit integer. Bug fixed #1527160 (upstream #79703).
  • Percona Server 5.7 couldn’t be restarted after TokuDB has been installed with ps_tokudb_admin script. Bug fixed #1527535.
  • Fixed memory leak when utility_user is enabled. Bug fixed #1530918.
  • Page cleaner worker threads were not instrumented for Performance Schema. Bug fixed #1532747 (upstream bug #79894).
  • Busy server was preferring LRU flushing over flush list flushing too strongly which could lead to performance degradation. Bug fixed #1534114.
  • was missing from a binary tarball. Bug fixed #1537129.
  • When cmake/make/make_binary_distribution workflow was used to produce binary tarballs it would produce tarballs with mysql-... naming instead of percona-server-.... Bug fixed #1540385.
  • Added proper memory cleanup if for some reason a table is unable to be opened from a dead closed state. This prevents an assertion from happening the next time the table is attempted to be opened. Bug fixed #917.
  • Variable tokudb_support_xa has been modified to prevent setting it to anything but ON/ENABLED and to print a SQL warning anytime an attempt is made to change it, just like innodb_support_xa. Bug fixed #928.

Other bugs fixed: #1179451, #1534246, #1524763, #1525109 (upstream #79569), #1530102, #897, #898, #899, #900, #901, #902, #903, #905, #906, #907, #908, #909, #910, #911, #912, #913, #915, #919, and #904.

Release notes for Percona Server 5.7.10-2 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

Categories: MySQL

Percona XtraBackup 2.4.0-rc1 is now available

MySQL Performance Blog - Mon, 2016-02-08 17:35

Percona is glad to announce the first release candidate of Percona XtraBackup 2.4.0-rc1 on February 8th 2016. Downloads are available from our download site and from apt and yum repositories.

This is a Release Candidate quality release and it is not intended for production. If you want a high quality, Generally Available release, the current Stable version should be used (currently 2.3.3 in the 2.3 series at the time of writing).

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release contains all of the features and bug fixes in Percona XtraBackup 2.3.3, plus the following:

New Features:

  • Percona XtraBackup has implemented basic support for MySQL 5.7 and Percona Server 5.7.

Known Issues:

  • Backed-up table data could not be recovered if backup was taken while running OPTIMIZE TABLE (bug #1541763) or ALTER TABLE ... TABLESPACE (bug #1532878) on that table.
  • Compact Backups currently don’t work due to bug #1192834.

Release notes with all the bugfixes for Percona XtraBackup 2.4.0-rc1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Categories: MySQL

MySQL 5.7: Introduction for Operational DBAs

MySQL Performance Blog - Mon, 2016-02-08 15:18

Join us Tuesday, February 16, 2016 9:00am PST (UTC-8) for a webinar on the operational and monitoring features of MySQL 5.7 with Percona CEO Peter Zaitsev.

MySQL 5.7 is a great release, providing valuable features for both daily operations and ongoing development. In this, part two of our ongoing 5.7 webinar series, we will look into the new MySQL 5.7 features and enhancements that improve operations, with a specific focus on monitoring. These include:

  • An improved optimizer, including updates to EXPLAIN and enhanced JSON support
  • Performance and scalability improvements for the InnoDB storage engine, including temp tables, spatial types and full text search parsing
  • Security improvements, such as a password expiration policy
  • Performance and sys schema improvements, such as memory usage, metadata locks, monitoring capabilities and reduced footprint/overhead
  • Better online server management
  • Improved replication functions, including new statements for multi-source replication and better monitoring

MySQL 5.7 promises to be faster, more robust, and more secure. We look forward to seeing you at the webinar!

Peter Zaitsev co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Percona was named to the Inc. 5000 in 2013. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. As CEO of Percona, Peter enjoys mixing business leadership with hands on technical expertise. Peter is co-author of High Performance MySQL published by O’Reilly, one of the most popular books on MySQL performance. Peter blogs regularly on and speaks frequently at conferences. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

Categories: MySQL

Measuring Percona Server Docker CPU/network overhead

MySQL Performance Blog - Fri, 2016-02-05 18:55

Now that we have our Percona Server Docker images, I wanted to measure the performance overhead when we run the database in the container. Since Docker promises to use a lightweight container, in theory there should be very light overhead. We need to verify that claim, however. In this post I will show the numbers for CPU and network intensive workloads, and later I will take a look at IO.

For the CPU-bound load, I will use a sysbench OLTP read-only workload with data fitting into memory (so there is no IO performed, and the execution path only goes through the network and CPU).

My server is 24 cores (including hyper-threads), with Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz CPUs, RAM: 256GB, OS: Ubuntu 14.04. The Docker version is the latest on the day of publishing, which is 1.9.1.

First, I measured the throughput on a bare server, without containers – this will be the baseline. For reference, the command I used is the following:

/opt/sysbench/sysbench --test=/opt/tests/db/oltp.lua --oltp_tables_count=8 --oltp_table_size=10000000 --num-threads=16 --mysql-host= --mysql-user=root --oltp-read-only=on --max-time=1800 --max-requests=0 --report-interval=10 run

On the bare metal system, the throughput is 7100 transactions per second (tps).

In the next experiment, I started Percona Server in a Docker container and connect to it from the host:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -p 3306:3306 -v /data/flash/d1/:/var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

In this case, the container exposed port 3306 to the host, and we used that as an access point in sysbench.

The throughput in this scenario is 2200 tps!!! That is a significant overhead. I suspect it comes from the Docker gateway, which is added to the execution path when we connect through port forwarding.

So to avoid the Docker gateway, in the next run I used the host network by running the container with --net=host:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -v /data/flash/d1/:/var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

In this case the container ran directly in the host network stack, so this should exclude any Docker network overhead. In this case, the throughput is basically back to 7100 tps.

From these tests, I can make an important conclusion. There is NO measurable CPU overhead when running Percona Server in a Docker container. But the network path raises some questions.

So in the next experiment I ran both sysbench and MySQL in two different containers, connected over the Docker network bridge.

I created a sysbench container, which you can get from:

To run sysbench:

docker run --name sb -t percona/sysbench

Just for the reference, I created a Docker network:

docker network create sysbenchnet

and connected both containers to the same network:

docker network connect sysbenchnet ps13; docker network connect sysbenchnet sb;

In this configuration, the throughput I’ve observed is 6300 tps.

So there is still some network overhead, but not as significant as with the port gateway case.

For the last example, I again excluded the network path and ran the sysbench container inside the MySQL container network stack using the following command:

docker run --name sb --net container:ps13 -t percona/sysbench

The throughput in this configuration is back to 7100 tps. 

And the conclusion, again, is that there is no CPU overhead even if we run both client and server inside containers, but there is some network overhead – even when running on the same host. It will be interesting to measure the network overhead when the containers are on different physical hosts.

The following chart summarizes the results:

Next time I will try to measure IO overhead in Docker containers.

Categories: MySQL

MySQL password expiration features to help you comply with PCI-DSS

MySQL Performance Blog - Thu, 2016-02-04 15:12

PCI Compliance (section 8.2.4) requires users to change password every 90 days. Until MySQL 5.6.6 there wasn’t a built-in way to comply with this requirement.

Since MySQL version 5.6.6 there’s a password_expired feature which allows to set a user’s password as expired.
This has been added to the mysql.user table and its default value it’s “N.” You can change it to “Y” using the ALTER USER statement.

Here’s an quick example on how to set expiration date for a MySQL user account:

mysql> ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE;

Once this is set to “Y” the username will still be able to login to the MySQL server, but it will not be able to run any queries before setting the new password. You will instead get an ERROR 1820 message:

mysql> SHOW DATABASES; ERROR 1820 (HY000): You must SET PASSWORD before executing this statement

Keep in mind that this does not affect any current connections the account has open.

After setting a new password, all operations performed using the account will be allowed (according to the account privileges):

mysql> SET PASSWORD=PASSWORD('mechipoderranen'); Query OK, 0 rows affected (0.00 sec) mysql> SHOW DATABASES; +--------------------+ | Database | +--------------------+ | information_schema | | data | | logs | | mysql | | performance_schema | | test | +--------------------+ 6 rows in set (0.00 sec) mysql>

This allows administrators to perform password expiration by scheduling the ALTER USER via cron.

Since MySQL 5.7.4, this has been improved and there’s a new feature to set a policy for password expiration, that provides more control through a global variable, default_password_lifetime which allows to set a global automatic password expiration policy.

Example usage:

Setting a default value on our configuration file. This will set all account passwords to expire every 90 days, and will start counting from the day this variable was set effective on your MySQL server:

[mysqld] default_password_lifetime=90

Setting a global policy for the passwords to never expire. Note this is the default value (so it is not strictly necessary to declare in the configuration file):

[mysqld] default_password_lifetime=0

This variable can also be changed at runtime if the user has SUPER privileges granted:

mysql> SET GLOBAL default_password_lifetime = 90; Query OK, 0 rows affected (0.00 sec)

You can also set specific values for each user account using ALTER USER. This will override the global password expiration policy. Please note that ALTER USER only understands INTERVAL expressed in DAY:


Disable password expiration:

ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE NEVER;

Set to default value, which is the current value of default_password_lifetime:


Since MySQL 5.7.6, you can use the ALTER USER to change the user’s password:

mysql> ALTER USER USER() IDENTIFIED BY '637h1m27h36r33K'; Query OK, 0 rows affected (0.00 sec)

For more information on this variable, please refer to the documentation page:

Bonus post:

Another new feature in MySQL 5.7.8 related to user management is locking/unlocking user accounts when CREATE USER, or at a later time running the ALTER USER statement.

In this example, we will first create a username with the ACCOUNT LOCK:

mysql> CREATE USER 'furrywall'@'localhost' IDENTIFIED BY '71m32ch4n6317' ACCOUNT LOCK; Query OK, 0 rows affected (0.00 sec)

As you can see below, the newly created user gets an ERROR 3118 message while trying to login:

$ mysql -ufurrywall -p Enter password: ERROR 3118 (HY000): Access denied for user 'furrywall'@'localhost'. Account is locked.

We can unlock the account using the ALTER USER ... ACCOUNT UNLOCK; statement:

mysql>ALTER USER 'furrywall'@'localhost' ACCOUNT UNLOCK; Query OK, 0 rows affected (0.00 sec)

Now the user account is unlocked and accessible:

$ mysql -ufurrywall -p Enter password: Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 17 Server version: 5.7.8-rc MySQL Community Server (GPL) Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or 'h' for help. Type 'c' to clear the current input statement. mysql>

If necessary, you can lock it again:

mysql> ALTER USER 'furrywall'@'localhost' ACCOUNT LOCK; Query OK, 0 rows affected (0.00 sec)

Please check this following documentation for more details:

Categories: MySQL

New GIS Features in MySQL 5.7

MySQL Performance Blog - Wed, 2016-02-03 15:25

MySQL 5.7 has been released, and there are some exciting new features now available that I’m going to discuss in this blog — specifically around geographic information system (GIS).

I’ve used GIS features in MySQL for a long time. In my previous blog entries I’ve shown how to create geo-enabled applications with MySQL 5.6 and use MySQL 5.6 geo-spatial functions. In this blog post, I’ll look into what is new in MySQL 5.7 and how we can use those new features in practice for geo-enabled applications.

New in MySQL 5.7

MySQL 5.7 introduces the following major improvements and features for GIS:

  1. Spatial indexes for InnoDB. Finally it is here! This was a long overdue feature, which also prevented many companies from converting all tables to InnoDB.
  2. st_distance_sphere: native function to calculate a distance between two points on earth. Finally it is here as well! Like many others, I’ve created my stored procedure to calculate the distance between points on earth using haversine formula. The native function is ~20x faster than the stored procedure (in an artificial benchmark, see below). This is not surprising, as stored procedures are slow computationally – especially for trigonometrical functions.
  3. New functions: GeoHash and GeoJSON. With GeoJSON we can generate the results that are ready for visualizing on Google Maps.
  4. New GIS implementation based on Boost.Geometry library. This is great news, as originally GIS was implemented independently from scratch with a very limited set of features. Manyi Lu from MySQL server team provides more reasoning behind the choice of Boost.Geometry.

This is the great news. The bad news is that except for the st_distance_sphere, all other functions use planar geometry (no change since MySQL 5.6) and do not support Spatial Reference System Identifier (SRID). That means that if I want to calculate the distance of my favorite bike path in miles or kilometers, I’ll still have to use a stored function (see below for an example) or write an application code for that. Native function st_distance will ignore SRID for now and return a value which represents a distance on a planar – not very useful for our purposes (may be useful for order by / compare).

Distance on Sphere

MySQL 5.7 introduces the function st_distance_sphere, which uses a haversine formula to calculate distance. He is the example:

mysql> select st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)); +--------------------------------------------------------------------------------+ | st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)) | +--------------------------------------------------------------------------------+ | 3855600.7928957273 | +--------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

The distance is in meters by default (you can also change the radius of the earth to meters using the 3rd optional parameter, default: 6,370,986). Although our earth is represented as an oblate spheroid, all practical applications use the distance on a sphere. The difference between the haversine formula and more precise (and much slower) functions is negligible for our purposes.

The st_distance_sphere is much faster than using stored routines. Here is the artificial benchmark:

mysql> select benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947)); +-----------------------------------------------------------------------------------------+ | benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947)) | +-----------------------------------------------------------------------------------------+ | 0 | +-----------------------------------------------------------------------------------------+ 1 row in set (22.55 sec) mysql> select benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954))); +----------------------------------------------------------------------------------------------------+ | benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954))) | +----------------------------------------------------------------------------------------------------+ | 0 | +----------------------------------------------------------------------------------------------------+ 1 row in set (0.77 sec)

haversine_distance_sp is a stored routine implementation of the same algorithm.

InnoDB GIS example: find 10 restaurants near me 

In my previous blog post I’ve demonstrated how to use st_within function to find restaurants inside my zipcode (US postal code) and sort by distance. In MySQL 5.7 there will be 2 changes:

  1. We can use InnoDB table
  2. We can use st_distance_sphere function

For this example, I’ve converted Open Street Map data to MySQL and then created a new InnoDB table:

CREATE TABLE `points_new` ( `OGR_FID` int(11) NOT NULL AUTO_INCREMENT, `SHAPE` geometry NOT NULL, `osm_id` text, `name` text, `barrier` text, `highway` text, `ref` text, `address` text, `is_in` text, `place` text, `man_made` text, `other_tags` text, UNIQUE KEY `OGR_FID` (`OGR_FID`), SPATIAL KEY `SHAPE` (`SHAPE`) ) ENGINE=InnoDB AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1

SHAPE is declared as geometry (and stores points in this table). We also have SPATIAL KEY SHAPE in the InnoDB table.

The following query will find all cafe or restaurants in Durham, NC (zipcode: 27701):

SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist FROM points_new WHERE st_within(shape, (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') and name is not null ORDER BY dist asc LIMIT 10;

Table tl_2013_us_zcta510 stores the shapes of polygons for all US zipcodes. (It needs to be converted to MySQL.) In this example I’m using st_within to filter only the POIs I need, and st_distance_sphere to get the distance from my location (-78.9064543 35.9975194 are the coordinates of Percona’s office in Durham) to the restaurants.

Explain plan:

mysql> EXPLAIN -> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist -> FROM points_new -> WHERE st_within(shape, -> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') -> and name is not null -> ORDER BY dist asc LIMIT 10G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: points_new partitions: NULL type: range possible_keys: SHAPE key: SHAPE key_len: 34 ref: NULL rows: 21 filtered: 18.89 Extra: Using where; Using filesort *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: tl_2013_us_zcta510 partitions: NULL type: ref possible_keys: zcta5ce10 key: zcta5ce10 key_len: 8 ref: const rows: 1 filtered: 100.00 Extra: NULL 2 rows in set, 1 warning (0.00 sec)

That looks pretty good: MySQL is using and index on the SHAPE field (even with the subquery, btw).


mysql> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist, st_astext(shape) -> FROM points_new -> WHERE st_within(shape, -> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') -> and name is not null -> ORDER BY dist asc LIMIT 10; +------------+----------------------------+--------+--------------------------------------+ | osm_id | name | dist | st_astext(shape) | +------------+----------------------------+--------+--------------------------------------+ | 880747417 | Pop's | 127.16 | POINT(-78.9071795 35.998501) | | 1520441350 | toast | 240.55 | POINT(-78.9039761 35.9967069) | | 2012463902 | Pizzeria Toro | 256.44 | POINT(-78.9036457 35.997125) | | 398941519 | Parker & Otis | 273.39 | POINT(-78.9088833 35.998997) | | 881029843 | Torero's | 279.96 | POINT(-78.90829140000001 35.9995516) | | 299540833 | Fishmonger's | 300.01 | POINT(-78.90850250000001 35.9996487) | | 1801595418 | Lilly's Pizza | 319.83 | POINT(-78.9094462 35.9990732) | | 1598401100 | Dame's Chicken and Waffles | 323.82 | POINT(-78.9031929 35.9962871) | | 685493947 | El Rodeo | 379.18 | POINT(-78.909865 35.999523) | | 685504784 | Piazza Italia | 389.06 | POINT(-78.9096472 35.9998794) | +------------+----------------------------+--------+--------------------------------------+ 10 rows in set (0.13 sec)

0.13 seconds response time on AWS t2.medium box sounds reasonable to me. The same query on the MyISAM table shows ~same response time: 0.14 seconds.

GeoJSON feature and Google Maps

Another nice feature of MySQL 5.7 GIS is GeoJSON function: you can convert your result set to GeoJSON, which can be used with other applications (for example Google Maps API).

Let’s say I want to visualize the above result set on Google Map. As the API requires a specific format, I can use concat / group_concat to apply the format inside the SQL:

SELECT CONCAT('{ "type": "FeatureCollection", "features": [ ', GROUP_CONCAT('{ "type": "Feature", "geometry": ', ST_AsGeoJSON(shape), ', "properties": {} }'), '] }') as j FROM points_new WHERE st_within(shape, (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') and name is not null

I will get all the restaurants and cafes in zipcode 27701. Here I’m using ST_AsGeoJSON(shape) to convert to GeoJSON, and concat/group_concat to “nest” the whole result into the format suitable for Google Maps.


mysql> set group_concat_max_len = 1000000; Query OK, 0 rows affected (0.00 sec) mysql> SELECT CONCAT('{ '> "type": "FeatureCollection", '> "features": [ '> ', -> GROUP_CONCAT('{ '> "type": "Feature", '> "geometry": ', ST_AsGeoJSON(shape), ', '> "properties": {} '> }'), -> '] '> }') as j -> FROM points_new -> WHERE st_within(shape, -> (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') ) -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%') -> and name is not null *************************** 1. row *************************** j: { "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.890852, 35.9903403]}, "properties": {} },{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.8980807, 35.9933562]}, "properties": {} },{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.89972490000001, 35.995879]}, "properties": {} } ... ,{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.9103211, 35.9998494]}, "properties": {} },{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.9158326, 35.9967114]}, "properties": {} }] } 1 row in set (0.14 sec)

I did not include the full result set for the lack of space; I also had to change the group concat max length, otherwise MySQL will cut the result of the group_concat function.

Now I can visualize it:











Example: Find the longest bike path

MySQL 5.7 (as well as the older versions) supports st_length function to calculate a length of a linestring. However, even in MySQL 5.7 st_length can’t calculate the distance on earth. To find the distance of a linestring I’ve created a very simple stored procedure:

DELIMITER // CREATE DEFINER=CURRENT_USER() FUNCTION `ls_distance_sphere`(ls GEOMETRY) RETURNS DECIMAL(20,8) DETERMINISTIC BEGIN DECLARE i, n INT DEFAULT 0; DECLARE len DECIMAL(20,8) DEFAULT 0; SET i = 1; SET n = ST_NumPoints(ls); WHILE i < n DO SET len = len + st_distance_sphere(st_pointN(ls, i), st_pointN(ls, i+1)); SET i = i + 2; END WHILE; RETURN len; END // DELIMITER ;

As the Open Street Map data has the information about roads in North America, we can use this function to calculate the length (in meters) for every road it stores:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null limit 10; +---------------------------------------+---------------------------+ | name | ls_distance_sphere(shape) | +---------------------------------------+---------------------------+ | Highbury Park Drive Bypass | 0.97386664 | | Ygnacio Canal Trail | 0.86093199 | | South Marion Parkway | 1.06723424 | | New River Greenway | 1.65705401 | | Northern Diversion Trail | 2.08269808 | | Gary L. Haller Trail;Mill Creek Trail | 2.09988209 | | Loop 1 | 2.05297129 | | Bay Farm Island Bicycle Bridge | 2.51141623 | | Burrard Street | 1.84810259 | | West 8th Avenue | 1.76338236 | +---------------------------------------+---------------------------+ 10 rows in set (0.00 sec)

Index the polygon/area distance using MySQL 5.7 virtual fields

To really answer the question “what is the longest bikepath (cyclepath) in North America?” we will have to order by stored function result. This will cause a full table scan and a filestort, which will be extremely slow for 30 millions of rows. The standard way to fix this is to materialize this road distance: add an additional field to the table and store the distance there.

In MySQL 5.7 we can actually use Generated (Virtual) Columns feature:

CREATE TABLE `lines_new` ( `OGR_FID` int(11) NOT NULL AUTO_INCREMENT, `SHAPE` geometry NOT NULL, `osm_id` int(11) DEFAULT NULL, `name` varchar(255) DEFAULT NULL, `highway` varchar(60) DEFAULT NULL, `waterway` text, `aerialway` text, `barrier` text, `man_made` text, `other_tags` text, `linestring_length` decimal(15,8) GENERATED ALWAYS AS (st_length(shape)) VIRTUAL, PRIMARY KEY (`OGR_FID`), SPATIAL KEY `SHAPE` (`SHAPE`), KEY `linestring_length` (`linestring_length`), KEY `highway_len` (`highway`,`linestring_length`) ) ENGINE=InnoDB AUTO_INCREMENT=27077492 DEFAULT CHARSET=latin1

Unfortunately, MySQL 5.7 does not support non-native functions (stored procedures or UDF) in generated columns, so I have to use st_length in this example. Ordering by value of st_length may be OK though:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10; +-----------------------------+---------------------------+ | name | ls_distance_sphere(shape) | +-----------------------------+---------------------------+ | Confederation Trail | 55086.92572725 | | Cowboy Trail | 43432.06768706 | | Down East Sunrise Trail | 42347.39791330 | | Confederation Trail | 29844.91038542 | | Confederation Trail | 26141.04655981 | | Longleaf Trace | 29527.66063726 | | Cardinal Greenway | 30613.24487294 | | Lincoln Prairie Grass Trail | 19648.26787218 | | Ghost Town Trail | 25610.52158647 | | Confederation Trail | 27086.54829531 | +-----------------------------+---------------------------+ 10 rows in set (0.02 sec)

The query is very fast as it uses an index on both highway and linestring:

mysql> explain select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: lines_new partitions: NULL type: ref possible_keys: highway_len key: highway_len key_len: 63 ref: const rows: 119392 filtered: 90.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)


MySQL 5.7 contains a great set of features to work with geospatial data. Finally, spatial indexes are supported in InnoDB; st_distance_sphere as a native function is very useful. Unfortunately, other spatial functions only work with planar coordinates and do not support SRID. I hope this will be fixed in the new releases.

Categories: MySQL

Percona Live Crash Courses: for MySQL and MongoDB!

MySQL Performance Blog - Tue, 2016-02-02 19:45

Percona Live Crash Courses for MySQL and MongoDB

The database community constantly tells us how hard it is to find someone with MySQL and MongoDB DBA skills who can help with the day-to-day management of their databases. This is especially difficult when companies don’t have a full-time requirement for a DBA. Developers, system administrators and IT staff spend too much time trying to solve basic database problems that keep them from doing their day job. Eventually the little problems or performance inefficiencies that start to pile up  lead to big problems.  

In answer to this growing need, Percona Live is once again hosting Crash Courses for developers, systems administrators, and other technical resources. This year, we’ve compacted the training into a single day, and are offering two options: MySQL 101 and MongoDB 101!

Don’t let the name fool you: these courses are led by Percona MySQL experts who will show you the fundamentals of MySQL or MongoDB tools and techniques.  

And it’s not just for DBAs: developers are encouraged to attend to hone their database skills. Developers who create code that can scale to match the demands of the online community are both a resource and and an investment.

Below are a list of the topics covered for each course:

MySQL 101 Topics

MongoDB 101 Topics

  • Schema Review 101: How and What You Should Be Looking at…
  • Choosing a MySQL High Availability Solution Today
  • MySQL Performance Troubleshooting Best Practices
  • Comparing Synchronous Replication Solutions in the Cloud
  • Cost Optimizations Through MySQL Performance Optimizations
  • SQL with MySQL or NoSQL with MongoDB?
  • MongoDB for MySQL DBA’s
  • MongoDB Storage Engine Comparison
  • MongoDB 3.2: New Features Overview


Attendees will return ready to quickly and correctly take care of the day-to-day and week-to-week management of your MySQL or MongoDB environment.

The schedule and non-conference cost for the 101 courses are:

  • MySQL 101: Tuesday April 19th ($400)
  • MongoDB 101: Wednesday April 20th ($400)
  • Both MySQL and MongoDB 101 sessions ($700)

(Tickets to the 101 sessions do not grant access to the main Percona Live breakout sessions. Full Percona Live conferences passes will grant admission to the 101 sessions. 101 Crash Course attendees will have full access to Percona Live keynote speakers the exhibit hall and receptions.)

As a special promo, the first 101 people to purchase the 101 talks receive a $299.00 discount off the ticket price! Each session only costs $101! Get both sessions for a mere $202! Register now, and use the following codes for your first 101 discount:

  • Single101= $299 off of either the MySQL or MongoDB tickets
  • Double101= $498 off of the combined MySQL/MongoDB ticket

Sign up now for special track pricing. Click here to register.

Birds of a Feather

Birds of a Feather (BOF) sessions enable attendees with interests in the same project or topic to enjoy some quality face time. BOFs can be organized for individual projects or broader topics (e.g., best practices, open data, standards). Any attendee or conference speaker can propose and moderate an engaging BOF. Percona will post the selected topics and moderators online and provide a meeting space and time. The BOF sessions will be held Tuesday, April 19, 2016 at 6:00 p.m. The deadline for BOF submissions is February 7.

Lightning Talks

Lightning Talks provide an opportunity for attendees to propose, explain, exhort, or rant on any MySQL, NoSQL or Data in the Cloud-related topic for five minutes. Topics might include a new idea, successful project, cautionary story, quick tip, or demonstration. All submissions will be reviewed, and the top 10 will be selected to present during one of the scheduled breakout sessions during the week. Lighthearted, fun or otherwise entertaining submissions are highly welcome. The deadline for submitting a Lightning Talk topic is February 7, 2016.

Categories: MySQL

Experimental Percona Docker images for Percona Server

MySQL Performance Blog - Tue, 2016-02-02 17:02

Docker is incredibly popular tool for deploying software, so we decided to provide a Percona Docker image for both Percona Server MySQL and Percona Server for MongoDB.

We want to create an easy way to try our products.

There are actually some images available from, but these images are provided by Docker itself, not from Percona.

In our images, we provide all the varieties of storage engines available in Percona Server (MySQL/MongoDB).

Our images are available from

The simplest way to get going is to run the following:

docker run --name ps -e MYSQL_ROOT_PASSWORD=secret -d percona/percona-server:latest

for Percona Server/MySQL, and:

docker run --name psmdb -d percona/percona-server-mongodb:latest

for Percona Server/MongoDB.

It is very easy to try the different storage engines that comes with Percona Server for MongoDB. For example, to use RocksDB, run:

docker run --name psmdbrocks -d percona/percona-server-mongodb:latest --storageEngine=RocksDB

or PerconaFT:

docker run --name psmdbperconaft -d percona/percona-server-mongodb:latest --storageEngine=PerconaFT

We are looking for any feedback  you’d like to provide: if this is useful, and what improvements we could make.

Categories: MySQL

InnoDB and TokuDB on AWS

MySQL Performance Blog - Mon, 2016-02-01 16:38

In a recent post, Vadim compared the performance of Amazon Aurora and Percona Server on AWS. This time, I am comparing write throughput for InnoDB and TokuDB, using the same workload (sysbench oltp/update/update_non_index) and a similar set-up (r3.xlarge instance, with general purpose ssd, io2000 and io3000 volumes) to his experiments.

All the runs used 16 threads for sysbench, and the following MySQL configuration files for InnoDB and TokuDB respectively:

[mysqld] table-open-cache-instances=32 table_open_cache=8000 innodb-flush-method = O_DIRECT innodb-log-files-in-group = 2 innodb-log-file-size = 16G innodb-flush-log-at-trx-commit = 1 innodb_log_compressed_pages =0 innodb-file-per-table = 1 innodb-buffer-pool-size = 20G innodb_write_io_threads = 8 innodb_read_io_threads = 32 innodb_open_files = 1024 innodb_old_blocks_pct =10 innodb_old_blocks_time =2000 innodb_checksum_algorithm = crc32 innodb_file_format =Barracuda innodb_io_capacity=1500 innodb_io_capacity_max=2000 metadata_locks_hash_instances=256 innodb_max_dirty_pages_pct=90 innodb_flush_neighbors=1 innodb_buffer_pool_instances=8 innodb_lru_scan_depth=4096 innodb_sync_spin_loops=30 innodb-purge-threads=16

[mysqld] tokudb_read_block_size=16K tokudb_fanout=128 table-open-cache-instances=32 table_open_cache=8000 metadata_locks_hash_instances=256 [mysqld_safe] thp-setting=never

You can see the full set of graphs here, and the complete results here.

Let me start illustrating the results with this summary graph for the io2000 volume, showing how write throughput varies over time, per engine and workload (for all graphs, size is in 1k rows, so 1000 is actually 1M):

We can see a few things already:

  • InnoDB has better throughput for smaller table sizes.
  • The reverse is true as size becomes big enough (after 10M rows here).
  • TokuDB’s advantage is not noticeable on the oltp workload, though it is for InnoDB.

Let’s dig in a bit more and look at the extreme ends in terms of table size, starting with 1M rows:

and ending in 50M:

In the first case, we can see that not only does InnoDB show better write throughput, it also shows less variance. In the second case, we can confirm that the difference does not seem significant for oltp, but it is for the other workloads.

This should come as no surprise, as one of the big differences between TokuDB’s Fractal trees and InnoDB’s B-tree implementation is the addition of message buffers to nodes, to handle writes (the other big difference, for me, is node size). For write-intensive workloads, TokuDB needs to do a lot  less tree traversing than InnoDB (in fact, this is done only to validate uniqueness constraints when required, otherwise writes are just injected into the message buffer and the buffer is flushed to lower levels of the tree asynchronously. I refer you to this post for more details).

For oltp, InnoDB is at advantage at smaller table sizes, as it does not need to scan message buffers all across the search path when reading (nothing is free in life, and this is the cost for TokuDB’s advantage for writes). I suspect this advantage is lost for high enough table sizes because at that point, either engine will be I/O bound anyway.

My focus here was write throughput, but as a small example see how this is reflected on response time if we pick the 50M table size and drop oltp from the mix:

At this point, you may be wondering why I focused on the io2000 results (and if you’re not, bear with me please!). The reason is the results for io3000 and the general purpose ssd showed characteristics that I attribute to latency on the volumes. You can see what I mean by looking at the io3000 graph:

I say “I attribute” because, unfortunately, I do not have any metrics other than sysbench’s output to go with (an error I will amend on future benchmarks!). I have seen the same pattern while working on production systems on AWS, and in those cases I was able to correlate it with increases in stime and/or qtime on diskstats. The fact that this is seen on the lower and higher capacity volumes for the same workload, but not the io2000 one, increases my confidence in this assumption.


I would not consider TokuDB a general purpose replacement for InnoDB, by which I mean I would never blindly suggest someone to migrate from one to the other, as the performance characteristics are different enough to make this risky without a proper assessment.

That said, I believe TokuDB has great advantages for the right scenarios, and this test highlights some of its strengths:

  • It has a significant advantage over InnoDB on slower devices and bigger data sets.
  • For big enough data sets, this is even the case on fast devices and write intensive workloads, as the B-tree becomes I/O bound much faster

Other advantages of TokuDB over InnoDB, not directly evidenced from these results, are:

  • Better compression (helped by the much larger block size).
  • Better SSD lifetime, due to less and more sequential writes (sequential writes have, in theory at least, no write amplification compared to random ones, so even though the sequential/random difference should not matter for SSDs for performance, it does for lifetime).
Categories: MySQL

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

MySQL Performance Blog - Fri, 2016-01-29 19:09

Ready for another post in the EXPLAIN FORMAT=JSON is Cool series! Great! This post will discuss how to see all the information that is contained in optimized queries with UNION using the union_result and query_specifications commands.


When optimizing complicated queries with UNION, it is easy to get lost in the regular EXPLAIN  output trying to identify which part of the output belongs to each part of the UNION.

Let’s consider the following example:

mysql> explain -> select emp_no, last_name, 'low_salary' from employees -> where emp_no in (select emp_no from salaries -> where salary < (select avg(salary) from salaries)) -> union -> select emp_no, last_name, 'high salary' from employees -> where emp_no in (select emp_no from salaries -> where salary >= (select avg(salary) from salaries))G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 299778 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.employees.emp_no rows: 9 filtered: 33.33 Extra: Using where; FirstMatch(employees) *************************** 3. row *************************** id: 3 select_type: SUBQUERY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2557022 filtered: 100.00 Extra: NULL *************************** 4. row *************************** id: 4 select_type: UNION table: employees partitions: NULL type: ALL possible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 299778 filtered: 100.00 Extra: NULL *************************** 5. row *************************** id: 4 select_type: UNION table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.employees.emp_no rows: 9 filtered: 33.33 Extra: Using where; FirstMatch(employees) *************************** 6. row *************************** id: 6 select_type: SUBQUERY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2557022 filtered: 100.00 Extra: NULL *************************** 7. row *************************** id: NULL select_type: UNION RESULT table: <union1,4> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: NULL Extra: Using temporary 7 rows in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

While we can guess that subquery 3 belongs to the first query of the union, and subquery 6 belongs to the second (which has number 4 for some reason), we have to be very careful (especially in our case) when queries use the same tables in both parts of the UNION.

The main issue with the regular EXPLAIN for UNION  is that it has to re-present the hierarchical structure as a table. The same issue occurs when you want to store objects created in programming language, such as Java, in the database.

EXPLAIN FORMAT=JSON, on the other hand, has hierarchical structure and more clearly displays how UNION was optimized:

mysql> explain format=json select emp_no, last_name, 'low_salary' from employees where emp_no in (select emp_no from salaries where salary < (select avg(salary) from salaries)) union select emp_no, last_name, 'high salary' from employees where emp_no in (select emp_no from salaries where salary >= (select avg(salary) from salaries))G *************************** 1. row *************************** EXPLAIN: { "query_block": { "union_result": { "using_temporary_table": true, "table_name": "<union1,4>", "access_type": "ALL", "query_specifications": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 1, "cost_info": { "query_cost": "921684.48" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 299778, "rows_produced_per_join": 299778, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59955.60", "prefix_cost": "60884.60", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "last_name" ] } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.employees.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 299778, "filtered": "33.33", "first_match": "employees", "cost_info": { "read_cost": "302445.97", "eval_cost": "59955.60", "prefix_cost": "921684.48", "data_read_per_join": "4M" }, "used_columns": [ "emp_no", "salary" ], "attached_condition": "(`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))", "attached_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "516948.40" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2557022, "rows_produced_per_join": 2557022, "filtered": "100.00", "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" }, "used_columns": [ "salary" ] } } } ] } } ] } }, { "dependent": false, "cacheable": true, "query_block": { "select_id": 4, "cost_info": { "query_cost": "921684.48" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 299778, "rows_produced_per_join": 299778, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59955.60", "prefix_cost": "60884.60", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "last_name" ] } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.employees.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 299778, "filtered": "33.33", "first_match": "employees", "cost_info": { "read_cost": "302445.97", "eval_cost": "59955.60", "prefix_cost": "921684.48", "data_read_per_join": "4M" }, "used_columns": [ "emp_no", "salary" ], "attached_condition": "(`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))", "attached_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 6, "cost_info": { "query_cost": "516948.40" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2557022, "rows_produced_per_join": 2557022, "filtered": "100.00", "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" }, "used_columns": [ "salary" ] } } } ] } } ] } } ] } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

First it puts member union_result in the query_block  at the very top level:

EXPLAIN: { "query_block": { "union_result": {

The union_result object contains information about how the result set of the UNION was processed:

"using_temporary_table": true, "table_name": "<union1,4>", "access_type": "ALL",

And also contains the query_specifications array which also contains all the details about queries in the UNION:

"query_specifications": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 1, <skipped> { "dependent": false, "cacheable": true, "query_block": { "select_id": 4,

This representation is much more clear, and also contains all the details which the regular EXPLAIN misses for regular queries.

Conclusion: EXPLAIN FORMAT=JSON not only contains additional optimization information for each query in the UNION, but also has a hierarchical structure that is more suitable for the hierarchical nature of the UNION.

Categories: MySQL

Percona XtraDB Cluster 5.6.28-25.14 is now available

MySQL Performance Blog - Fri, 2016-01-29 13:34

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on January 29, 2016. Binaries are available from the downloads area or from our software repositories.

Percona XtraDB Cluster 5.6.28-25.14 is now the current release, based on the following:

All of Percona software is open-source and free, and all the details of the release can be found in the 5.6.28-25.14 milestone at Launchpad.

For more information about relevant Codership releases, see this announcement.

Bugs Fixed:

  • 1494399: Fixed issue caused by replication of events on certain system tables (for example, mysql.slave_master_info, mysql.slave_relay_log_info). Replication in the Galera eco-system is now avoided when bin-logging is disabled for said tables.
    NOTE: As part of this fix, when bin-logging is enabled, replication in the Galera eco-system will happen only if BINLOG_FORMAT is set to either ROW or STATEMENT. The recommended format is ROW, while STATEMENT is required only for the pt-table-checksum tool to operate correctly. If BINLOG_FORMAT is set to MIXED, replication of events in the Galera eco-system tables will not happen even with bin-logging enabled for those tables.
  • 1522385: Fixed GTID holes caused by skipped replication. A slave might ignore an event replicated from master, if the same event has already been executed on the slave. Such events are now propagated in the form of special GTID events to maintain consistency.
  • 1532857: The installer now creates a /var/lib/galera/ directory (assigned to user nobody), which can be used by garbd in the event it is started from a directory that garbd cannot write to.

Known Issues:

  • 1531842: Two instances of garbd cannot be started from the same working directory. This happens because each instance creates a state file (gvwstate.dat) in the current working directory by default. Although garbd is configured to use the base_dir variable, it was not registered due to a bug. Until garbd is fixed, you should start each instance from a separate working directory.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Categories: MySQL

Vote Percona Server in Members Choice Awards

MySQL Performance Blog - Thu, 2016-01-28 21:13

Percona is calling on you! Vote Percona for Database of the Year in Members Choice Awards 2015. Help our Percona Server get recognized as one of the best database options for data performance. Percona Server is a free, fully compatible, enhanced, open source drop-in replacement for MySQL® that provides superior performance, scalability and instrumentation., or LQ for short, is a community-driven, self-help web site for Linux users. Each year, holds an annual competition to recognize the year’s best-in-breed technologies. The winners of each category are determined by the online Linux community!

You can vote now for your favorite products of 2015 (Percona, of course!). This is your chance to be heard!

Voting ends on February 10th, 2016. You must be a registered member of with at least one post on their forums to vote.

Categories: MySQL

Setup a MongoDB replica/sharding set in seconds

MySQL Performance Blog - Thu, 2016-01-28 19:09

In the MySQL world, we’re used to playing in the MySQL Sandbox. It allows us to deploy a testing replication environment in seconds, without a great deal of effort or navigating multiple virtual machines. It is a tool that we couldn’t live without in Support.

In this post I am going to walk through the different ways we have to deploy a MongoDB replica/sharding set test in a similar way. It is important to mention that this is not intended for production, but to be used for troubleshooting, learning or just playing around with replication.

Replica Set regression test’s diagnostic commands

MongoDB includes a .js that allows us to deploy a replication set from the MongoDB’s shell. Just run the following:

# mongo --nodb > var rstest = new ReplSetTest( { name: 'replicaSetTest', nodes: 3 } ) > rstest.startSet() ReplSetTest Starting Set ReplSetTest n is : 0 ReplSetTest n: 0 ports: [ 31000, 31001, 31002 ] 31000 number { "useHostName" : true, "oplogSize" : 40, "keyFile" : undefined, "port" : 31000, "noprealloc" : "", "smallfiles" : "", "rest" : "", "replSet" : "replicaSetTest", "dbpath" : "$set-$node", "restart" : undefined, "pathOpts" : { "node" : 0, "set" : "replicaSetTest" } } ReplSetTest Starting.... [...]

At some point our mongod daemons will be running, each with its own data directory and port:

2133 pts/0 Sl+ 0:01 mongod --oplogSize 40 --port 31000 --noprealloc --smallfiles --rest --replSet replicaSetTest --dbpath /data/db/replicaSetTest-0 --setParameter enableTestCommands=1 2174 pts/0 Sl+ 0:01 mongod --oplogSize 40 --port 31001 --noprealloc --smallfiles --rest --replSet replicaSetTest --dbpath /data/db/replicaSetTest-1 --setParameter enableTestCommands=1 2213 pts/0 Sl+ 0:01 mongod --oplogSize 40 --port 31002 --noprealloc --smallfiles --rest --replSet replicaSetTest --dbpath /data/db/replicaSetTest-2 --setParameter enableTestCommands=1

Perfect. Now we need to initialize the replicaset:

> rstest.initiate() { "replSetInitiate" : { "_id" : "replicaSetTest", "members" : [ { "_id" : 0, "host" : "debian:31000" }, { "_id" : 1, "host" : "debian:31001" }, { "_id" : 2, "host" : "debian:31002" } ] } } m31000| 2016-01-24T10:42:36.639+0100 I REPL [ReplicationExecutor] Member debian:31001 is now in state SECONDARY m31000| 2016-01-24T10:42:36.639+0100 I REPL [ReplicationExecutor] Member debian:31002 is now in state SECONDARY [...]

and it is done!

> rstest.status() { "set" : "replicaSetTest", "date" : ISODate("2016-01-24T09:43:41.261Z"), "myState" : 1, "members" : [ { "_id" : 0, "name" : "debian:31000", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 329, "optime" : Timestamp(1453628552, 1), "optimeDate" : ISODate("2016-01-24T09:42:32Z"), "electionTime" : Timestamp(1453628554, 1), "electionDate" : ISODate("2016-01-24T09:42:34Z"), "configVersion" : 1, "self" : true }, { "_id" : 1, "name" : "debian:31001", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 68, "optime" : Timestamp(1453628552, 1), "optimeDate" : ISODate("2016-01-24T09:42:32Z"), "lastHeartbeat" : ISODate("2016-01-24T09:43:40.671Z"), "lastHeartbeatRecv" : ISODate("2016-01-24T09:43:40.677Z"), "pingMs" : 0, "configVersion" : 1 }, { "_id" : 2, "name" : "debian:31002", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 68, "optime" : Timestamp(1453628552, 1), "optimeDate" : ISODate("2016-01-24T09:42:32Z"), "lastHeartbeat" : ISODate("2016-01-24T09:43:40.672Z"), "lastHeartbeatRecv" : ISODate("2016-01-24T09:43:40.690Z"), "pingMs" : 0, "configVersion" : 1 } ], "ok" : 1 }

There are many more commands you can run, just type rstest. and then press Tab twice to get the list. Follow this link if you need more info:

What about sharding? Pretty similar:

> var shtest = new ShardingTest({ shards: 2, mongos: 1 })

This is the documentation link if you need more info:

It is important to mention that if you close the mongo shell where you run the commands, then all the spawned mongod will also shut down.


mtools is a collection of tools and scripts that make MongoDB’s DBA lives much easier. It includes mlaunch, which can be used to start replicate sets and sharded systems for testing.

The mlaunch tool requires pymongo, so you need to install it:

# pip install pymongo

You can also use pip to install mtools:

# pip install mtools

Then, we can just start our replica set. In this case, with two nodes and one arbiter:

# mlaunch --replicaset --nodes 2 --arbiter --name "replicaSetTest" --port 3000 launching: mongod on port 3000 launching: mongod on port 3001 launching: mongod on port 3002 replica set 'replicaSetTest' initialized. # ps -x | grep mongod 10246 ? Sl 0:03 mongod --replSet replicaSetTest --dbpath /root/data/replicaSetTest/rs1/db --logpath /root/data/replicaSetTest/rs1/mongod.log --port 3000 --logappend --fork 10257 ? Sl 0:03 mongod --replSet replicaSetTest --dbpath /root/data/replicaSetTest/rs2/db --logpath /root/data/replicaSetTest/rs2/mongod.log --port 3001 --logappend --fork 10274 ? Sl 0:03 mongod --replSet replicaSetTest --dbpath /root/data/replicaSetTest/arb/db --logpath /root/data/replicaSetTest/arb/mongod.log --port 3002 --logappend --fork

Done. You can also deploy a shared cluster, or a sharded replica set. More information in the following link:

Ognom Toolkit

“It is a set of utilities, functions and tests with the goal of making the life of MongoDB/TokuMX administrators easier.”

This toolkit has been created by Fernando Ipar and Sveta Smirnova, and includes a set of scripts that allow us to deploy a testing environment for both sharding and replication configurations. The main difference is that you can specify what storage engine will be the default, something you cannot do with other to methods.

We have the tools we need under “lab” directory. Most of the names are pretty self-explanatory:

~/ognom-toolkit/lab# ls start_multi_dc_simulation start_sharded_test stop_all_mongo stop_sharded_test start_replica_set start_single stop_replica_set stop_single

So, let’s say we want a replication cluster with four nodes that will use PerconaFT storage engine. We have to do the following:

Set a variable with the storage engine we want to use:

# export MONGODB_ENGINE=PerconaFT

Specify where is our mongod binary:

# export MONGOD=/usr/bin/mongod

Start our 4 nodes replica set:

# ./start_replica_set Starting 4 mongod instances 2016-01-25T12:36:04.812+0100 I STORAGE Compression: snappy 2016-01-25T12:36:04.812+0100 I STORAGE MaxWriteMBPerSec: 1024 2016-01-25T12:36:04.813+0100 I STORAGE Crash safe counters: 0 about to fork child process, waiting until server is ready for connections. forked process: 1086 child process started successfully, parent exiting [...] MongoDB shell version: 3.0.8 connecting to: { "set" : "rsTest", "date" : ISODate("2016-01-25T11:36:09.039Z"), "myState" : 1, "members" : [ { "_id" : 0, "name" : "debian:27001", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 5, "optime" : Timestamp(1453721767, 5), "optimeDate" : ISODate("2016-01-25T11:36:07Z"), "electionTime" : Timestamp(1453721767, 2), "electionDate" : ISODate("2016-01-25T11:36:07Z"), "configVersion" : 4, "self" : true }, { "_id" : 1, "name" : "debian:27002", "health" : 1, "state" : 5, "stateStr" : "STARTUP2", "uptime" : 1, "optime" : Timestamp(0, 0), "optimeDate" : ISODate("1970-01-01T00:00:00Z"), "lastHeartbeat" : ISODate("2016-01-25T11:36:07.991Z"), "lastHeartbeatRecv" : ISODate("2016-01-25T11:36:08.093Z"), "pingMs" : 0, "configVersion" : 2 }, { "_id" : 2, "name" : "debian:27003", "health" : 1, "state" : 0, "stateStr" : "STARTUP", "uptime" : 1, "optime" : Timestamp(0, 0), "optimeDate" : ISODate("1970-01-01T00:00:00Z"), "lastHeartbeat" : ISODate("2016-01-25T11:36:07.991Z"), "lastHeartbeatRecv" : ISODate("2016-01-25T11:36:08.110Z"), "pingMs" : 2, "configVersion" : -2 }, { "_id" : 3, "name" : "debian:27004", "health" : 1, "state" : 0, "stateStr" : "STARTUP", "uptime" : 1, "optime" : Timestamp(0, 0), "optimeDate" : ISODate("1970-01-01T00:00:00Z"), "lastHeartbeat" : ISODate("2016-01-25T11:36:08.010Z"), "lastHeartbeatRecv" : ISODate("2016-01-25T11:36:08.060Z"), "pingMs" : 18, "configVersion" : -2 } ], "ok" : 1 }

Now, just start using it:

rsTest:PRIMARY> db.names.insert({ "a" : "Miguel"}) rsTest:PRIMARY> db.names.stats() { "ns" : "mydb.names", "count" : 1, "size" : 36, "avgObjSize" : 36, "storageSize" : 16384, "capped" : false, "PerconaFT" : { [...]


When dealing with bugs, troubleshooting or testing some application that needs a complex MongoDB infrastructure, these processes can save us lot of time. No need of set up multiple virtual machines, deal with networking and human mistakes. Just say “I want a sharded cluster, do it for me.”

Categories: MySQL

MongoDB revs you up: What storage engine is right for you? (Part 4)

MySQL Performance Blog - Wed, 2016-01-27 20:13
Differentiating Between MongoDB Storage Engines: PerconaFT

In this series of posts, we discussed what a storage engine is, and how you can determine the characteristics of one versus the other:

“A database storage engine is the underlying software that a DBMS uses to create, read, update and delete data from a database. The storage engine should be thought of as a “bolt on” to the database (server daemon), which controls the database’s interaction with memory and storage subsystems.”

Generally speaking, it’s important to understand what type of work environment the database is going to interact with, and to select a storage engine that is tailored to that environment.

The first post looked at MMAPv1, the original default engine for MongoDB (through release 3.0). The second post examined WiredTiger, the new default MongoDB engine. The third post reviewed RocksDB, an engine developed for the Facebook environment.

This post will cover PerconaFT. PerconaFT was developed out of Percona’s acquisition of Tokutek, from their TokuDB product.


Find it in: Percona Builds

PerconaFT is the newest version of the Fractal Tree storage engine that was designed and implemented by Tokutek, which was acquired by Percona in April of 2015. Designed at MIT, SUNY Stony Brook and Rutgers, the Fractal Tree is a data structure that aimed to remove disk bottlenecks from databases that were using the B-tree with datasets that were several times larger that cache.

PerconaFT is arguably the most “mature” storage engine for MongoDB, with support for document level concurrency and compression. The Fractal Tree was first commercially implemented in June of 2013 in TokuMX, a fork of MongoDB, with an advanced feature set.

As described previously, the Fractal Tree (which is available for MongoDB in the PerconaFT storage engine) is a write-optimized data structure utilizing many log-like “queues” called message buffers, but has an arrangement like that of a read-optimized data structure. With the combination of these properties, PerconaFT can provide high performance for applications with high insert rates, while providing very efficient lookups for update/query-based applications. This will theoretically provide very predictable and consistent performance as the database grows. Furthermore, PerconaFT typically provides, comparatively, the deepest compression rates of any of the engines we’ve discussed in this series.

An ideal fit for the PerconaFT storage engine is a system with varied workloads, where predictable vertical scaling is required in addition to the horizontal scaling provide MongoDB. Furthermore, the ability of PerconaFT to maintain performance while compressing – along with support for multiple compression algorithms (snappy, quicklz, zlib and lzma) – make it one of the best options for users looking to optimize their data footprint.


Most people don’t know that they have a choice when it comes to storage engines, and that the choice should be based on what the database workload will look like. Percona’s Vadim Tkachenko performed an excellent benchmark test comparing the performances of PerconaFT and WiredTiger to help specifically differentiate between these engines.

Part 1: Intro and the MMAPv1 storage engine.

Part 2: WiredTiger storage engine.

Part 3: RocksDB storage engine.

Categories: MySQL

Percona CEO Peter Zaitsev discusses working remotely with Fortune Magazine

MySQL Performance Blog - Wed, 2016-01-27 19:33

As a company that believes in and supports the open source community, embracing innovation and change is par for the course at Percona. We wouldn’t be the company we are today without fostering a culture that rewards creative thinking and rapid evolution.

Part of this culture is making sure that Percona is a place where people love to work, and can transmit their passion for technology into tangible rewards – both personally and financially. One of the interesting facts about Percona’s culture is that almost 95 percent of its employees are working remotely. Engineers, support, marketing, even executive staff – most of these people interact daily via electronic medium rather than in person. Percona’s staff is worldwide across 29 countries and 19 U.S. states. How does that work? How do you make sure that the staff is happy, committed, and engaged enough to stay on? How do you attract prospective employees with this unusual model?

It turns out that not only does it work, but it works very well. It can be challenging to manage the needs of such a geographically diverse group, but the rewards (and the results) outweigh the effort.

The secret is, of course, good communication, an environment of respect and personal empowerment.

Percona’s CEO Peter Zaitsev recently provided some of his thoughts to Fortune magazine about how our business model helps to not only to foster incredible dedication and innovation, but create a work environment that encourages passion, commitment and teamwork.

Read about his ideas on Percona’s work model here.

Oh, and by the way, Percona is currently hiring! Perhaps a career here might fit in with your plans . . .

Categories: MySQL

Finding MySQL Table Size on Disk

MySQL Performance Blog - Tue, 2016-01-26 22:16

So you want to know how much space a given MySQL table takes on disk. Looks trivial, right? Shouldn’t this information be readily available in the INFORMATION_SCHEMA.TABLES? Not so fast!

This simple question actually is quite complicated in MySQL. MySQL supports many storage engines (some of which don’t store data on disk at all) and these storage engines often each store data in different layouts. For example, there are three “basic” layouts that the InnoDB storage engine supports for MySQL 5.7, with multiple variations for row_formats and two types of available compression.

So let’s simplify the situation: instead of a general question, let’s ask how to find the table size on disk for an InnoDB table stored in its own tablespace (as the parameter innodb_file_per_table=1 provides).

Before we get to the answer, let me show you the table size graph that I get by running sysbench prepare (basically populating tables with multi-value inserts):

Click graphic to enlarge

This graphs shows the table size defined by data_length plus index_length captured from INFORMATION_SCHEMA.TABLES. You would expect gradual table growth as data is inserted into it, rather than a flat table size followed by jumps (sometimes by 10GB or more).

The graph does not match how data is changing on disk, where it is growing gradually (as expected):

-rw-r----- 1 mysql mysql 220293234688 Jan 25 17:03 sbtest1.ibd -rw-r----- 1 mysql mysql 220310011904 Jan 25 17:03 sbtest1.ibd -rw-r----- 1 mysql mysql 222499438592 Jan 25 17:07 sbtest1.ibd

As we see from this experiment, MySQL does not really maintain live data_length and index_length values,  but rather refreshes them periodically – and rather irregularly. The later part of the graph is especially surprising, where we see a couple of data refreshes becoming more regular. This is different from the first part of the graph which seems to be in line with statistics being updated when when 10 percent of the rows are changed.  (manual)

What makes it especially confusing is that there are other values such as table_rows, data_free or update_time  that are updated in the real time (even though I can’t imagine why table size related values would be any more difficult to maintain in real time!).

Is there way to get real time data_length and index_length updates as we query information_schema? There is, but it is costly.

To get information_schema to provide accurate information in MySQL 5.7, you need to do two things: disable innodb_stats_persistent and enable innodb_stats_on_metadata – both of which come with significant side effects.

Disabling persistent statistics means InnoDB has to refresh the statistics each time the server starts, which is expensive and can produce volatile query plans between restarts. Enabling innodb_stats_on_metadata makes access to information_schema slower, much slower, as I wrote few years ago.

Is there a better way? It turns out there is. You can look into the tablespaces information table using INNODB_SYS_TABLESPACES to see the actual file size. Unlike index_length and data_length, INNODB_SYS_TABLESPACES is updated in real time with no special configuration required:

mysql> select * from INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES where name='sbinnodb/sbtest1' G *************************** 1. row ***************************         SPACE: 42          NAME: sbinnodb/sbtest1          FLAG: 33   FILE_FORMAT: Barracuda    ROW_FORMAT: Dynamic     PAGE_SIZE: 16384 ZIP_PAGE_SIZE: 0    SPACE_TYPE: Single FS_BLOCK_SIZE: 4096     FILE_SIZE: 245937209344 ALLOCATED_SIZE: 245937266688 1 row in set (0.00 sec)

The great thing about using this table is that it also handles new “Innodb Page Compression” properly showing the difference between file_size  (which is the logical file size on disk) and allocated_size (which is space allocated for this file and can be significantly smaller):

mysql> select * from INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES where name='sbinnodb/testcomp' G *************************** 1. row ***************************         SPACE: 48          NAME: sbinnodb/testcomp          FLAG: 33   FILE_FORMAT: Barracuda    ROW_FORMAT: Dynamic     PAGE_SIZE: 16384 ZIP_PAGE_SIZE: 0    SPACE_TYPE: Single FS_BLOCK_SIZE: 4096     FILE_SIZE: 285212672 ALLOCATED_SIZE: 113004544 1 row in set (0.00 sec)

Finally, let’s look into how different InnoDB compression variants impact the information provided in information_schema.   

If you use the old Innodb compression (Innodb Table Compression) you will see the compressed data size shown in data_length and index_length as result. For example, avg_row_length will be much lower than you would expect.

If you use the new InnoDB compression in MySQL 5.7 (Innodb Page Compression) you will see the values corresponding to file size, not allocated size as shown in information_schema.

Answering the trivial question “How much space does this table take on disk?” is really not a simple request in MySQL – look at the obvious place and you’re likely to get the wrong value. Look at INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES to get the actual file size value for InnoDB tables.  

Categories: MySQL
Syndicate content