MySQL

Query Language Type Overview

MySQL Performance Blog - Thu, 2016-12-29 22:46

This blog provides a query language type overview.

The idea for this blog originated from some customers asking me questions. When working in a particular field, you often use a dedicated vocabulary that makes sense to your peers. It often includes phrases and abbreviations because it's efficient. It's no different in the database world. Much of this language might make sense to DBAs, but it might sound like "voodoo" to people not used to it. The overview below covers the basic types of query languages inside SQL. I hope it clarifies what they mean, how they're used and how you should interpret them.

DDL (Data Definition Language)

A database schema describes how your information is organized. It contains the data structures: tables, views and anything else that gives structure to your data. It defines how you want to store and visualize the information.

It’s like a skeleton, defining how data is organized. Any action that creates/updates/changes this skeleton is DDL.

Do you remember spreadsheets? A table definition describes something like:

  • Account number (sorted ascending)
  • Account name (unique, indexed)
  • Account owner
  • Creation date (date, indexed)
  • Amount (number, linked with transactions)


Whenever you want to create a table like this, you must use a DDL query. For example:

CREATE TABLE Accounts (
  Account_number Bigint(16),
  Account_name varchar(255),
  Account_owner varchar(255),
  Creation_date date,
  Amount Bigint(16),
  PRIMARY KEY (Account_number),
  UNIQUE(Account_name),
  FOREIGN KEY (Amount) REFERENCES transactions(Balancevalue)
);

CREATE, ALTER, DROP, etc.: all of these types of structure modification queries are DDL queries!

Defining the structure of your tables is important: it determines how you can access the information stored in the database, and how you can visualize it.

Why should you care that much?

DDL queries define the structure on which you develop your application. Your structure will also define how the database server searches for information in a table, and how it is linked to other tables (using foreign keys, for example).

You must design your MySQL schema before adding information to it (unlike NoSQL solutions such as MongoDB). MySQL might be more rigid in this manner, but it often makes sense to design the pattern for how you want to store your information and query it properly.

Due to the rigidity of an RDBMS, changing the data structure (or table schema) requires the system to rebuild the actual table in most cases. This is potentially problematic for performance or table availability (locking). Since MySQL 5.6, however, many of these changes can run as "hot" (online) operations, requiring no downtime for active operations. Additionally, tools like pt-online-schema-change (pt-osc) or other open source solutions can be used to migrate the data structure to a new format without requiring downtime.

An example:

ALTER TABLE accounts ADD COLUMN wienietwegisisgezien varchar(20)
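If you want the server to refuse a blocking rebuild rather than silently falling back to one, MySQL 5.6 and later let you state your expectations in the DDL itself. A minimal sketch (the notes column is made up for illustration):

-- Ask for an in-place, non-locking change; if the requested algorithm or
-- lock level isn't possible, the statement fails with an error instead of
-- rebuilding the table with locking.
ALTER TABLE accounts
  ADD COLUMN notes varchar(255),
  ALGORITHM=INPLACE, LOCK=NONE;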

DML (Data Manipulation Language)

Data manipulation is what it sounds like: working with information inside a structure. Inserting information and deleting information (adding rows, deleting rows) are examples of data manipulation.

An example:

INSERT INTO resto_visitor VALUES (5,'Julian','highway 5',12);
UPDATE resto_visitor SET name='Evelyn', age=17 WHERE id=103;

Sure, but why should I use it?

Having a database environment makes no sense unless you insert and fetch information out of it. Remember that databases are plentiful in the world: whenever you click on a link on your favorite blog website, it probably means you are fetching information out of a database (and that data was at one time inserted or modified).

Interacting with a database requires that you write DML queries.
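Reading and deleting rows are DML as well. Here are a couple of simple examples against the same resto_visitor table used above (column names inferred from the earlier statements):

SELECT name, age FROM resto_visitor WHERE age >= 18;
DELETE FROM resto_visitor WHERE id=103;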

DCL (Data Control Language)

Data control language is anything that is used for administrating access to the database content. For example, GRANT queries:

GRANT ALL PRIVILEGES ON database.table TO 'jeffbridges'@'ourserver';

Well that’s all fine, but why another subset “language” in SQL?

As a user of database environments, at some point you’ll get access permission from someone performing a DCL query. Data control language is used to define authorization rules for accessing the data structures (tables, views, variables, etc.) inside MySQL.
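The other side of GRANT is also DCL. For example, to inspect and then revoke the privileges given above:

SHOW GRANTS FOR 'jeffbridges'@'ourserver';
REVOKE ALL PRIVILEGES ON database.table FROM 'jeffbridges'@'ourserver';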

TCL (Transaction Control Language) Queries

Transaction control language queries are used to control transactional processing in a database. What do we mean by transactional processes? Transactional processes are typically bundled DML queries. For example:

BEGIN
FETCH INFORMATION OF TABLE B
INSERT DATA INTO A
REMOVE STALE DATA FROM B
COMMIT or ROLLBACK

This gives you the ability to commit or roll back a complete set of actions as one unit. Only storage engines offering transaction support (like InnoDB) can work with TCL.

Yet another term, but why?

Ever wanted to combine several operations and perform them as one transaction? In some circumstances, for example, it makes sense to make sure you perform an insert first and then perform an update. If you don't use transactions, the insert might fail while the associated update still goes through, leaving an invalid entry. Transactions make sure that either the complete transaction (a group of DML queries) takes place, or it's completely rolled back (this is also referred to as atomicity).
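As a minimal sketch of that idea, using the Accounts table from the DDL example (the account numbers are made up), the two updates either both commit or both roll back:

START TRANSACTION;
UPDATE Accounts SET Amount = Amount - 100 WHERE Account_number = 1001;
UPDATE Accounts SET Amount = Amount + 100 WHERE Account_number = 1002;
COMMIT;
-- or, if anything went wrong in between:
-- ROLLBACK;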

Conclusion

Hopefully this blog post helps you understand some of the “insider” database speech. Post comments below.

Categories: MySQL

Percona Live Featured Tutorial with Øystein Grøvlen — How to Analyze and Tune MySQL Queries for Better Performance

MySQL Performance Blog - Thu, 2016-12-29 16:52

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Øystein Grøvlen, Senior Principal Software Engineer at Oracle. His tutorial is on How to Analyze and Tune MySQL Queries for Better Performance. SQL query performance plays a big role in application performance. If some queries execute slowly, these queries or the database schema may need tuning. I had a chance to speak with Øystein and learn a bit more about MySQL query tuning:

Percona: How did you get into database technology? What do you love about it?

Øystein: I got into database technology during my Ph.D. studies. I got in touch with a research group in Trondheim, Norway, that did research on highly available distributed database systems. I ended up writing a thesis on query processing in such database systems.

What I love most about my job on the MySQL Optimizer Team is that it involves a lot of problem-solving. Why is a query so slow? What can we do to improve it? I have always been very interested in sports results and statistics. Working with query execution times gives me much of the same feeling. Searching for information is another interest of mine, and that is really what query execution is about.

Percona: What impacts database performance the most?

Øystein: From my point of view – mainly concerned with the performance of read-only queries – the most important performance metric is how much data needs to be accessed in order to answer a query. For update-intensive workloads, it is often about concurrency issues. For SELECT statements, the main thing is to not access more data than necessary.

Users should make sure to design their database schema so that the database system can efficiently access the needed data. This includes creating the right indexes. As MySQL developers, we need to develop the right algorithms to support efficient retrieval. We also need to provide a query optimizer that can pick the best query execution plan.

Of course, there are other performance aspects that are important. Especially if your data cannot fit in your database buffer pool. In that case, the order in which you access the data becomes more important. The best query plan when your data is disk-bound is not necessarily the same as when all data is in memory.

Percona: Your tutorial is called “How to Analyze and Tune MySQL Queries for Better Performance.” What are the most recent MySQL updates that help with tuning queries?

Øystein: I think the biggest improvements came in MySQL 5.6, with increased observability through the Performance Schema and new variants of EXPLAIN: structured EXPLAIN (JSON format) and Visual EXPLAIN in MySQL Workbench. We also added Optimizer Trace, which gives insight into how the optimizer arrived at a certain query plan. All of this made it easier to identify queries that need tuning, understand how a query is executed, and see what might be done to improve it.
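For reference, this is roughly how those 5.6-era tools are driven; the orders table and the query below are placeholders, not something from Øystein's examples:

EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42;

SET optimizer_trace='enabled=on';
SELECT * FROM orders WHERE customer_id = 42;
SELECT trace FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace='enabled=off';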

In MySQL 5.7, we added a new syntax for optimizer hints, and provided a lot of new hints that can be used to influence the optimizer to change a non-optimal query plan. We also provided a query rewrite plugin that makes it possible to tune queries even when it is not possible to change the application.

MySQL 5.7 also came with improvements to EXPLAIN. It is now possible to get the query plan for a running query, and Structured EXPLAIN shows both estimated total query cost and the cost per table. A more experimental feature allows you to provide your own cost constants to the optimizer.  This way, you can configure the optimizer to better suit your particular system.
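A quick sketch of the 5.7 additions mentioned above; the connection ID, table and column names are placeholders:

-- Query plan of a statement already running in connection 123:
EXPLAIN FORMAT=JSON FOR CONNECTION 123;

-- Optimizer hint embedded in the statement, here capping execution time at 1000 ms:
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM orders WHERE customer_id = 42;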

For MySQL 8.0 we are continuing to improve tunability by adding more optimizer hints.  At the same time, we are working hard on features that will reduce the need for tuning. Histograms and awareness of whether data is in memory or on disk make the optimizer able to pick better query plans.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Øystein: While the query optimizer in most cases will come up with a good query plan, there are some cases where it won’t generate the most optimal query plan. This tutorial will show how you can identify which queries need tuning, how you can further investigate the issues and what types of tuning options you have for different types of queries. By attending this tutorial, you will learn how to improve the performance of applications through query tuning.

Percona: What are you most looking forward to at Percona Live?

Øystein: I am looking forward to interacting with MySQL users, discussing the query performance issues they might have, and learning how I can help with their issues.

You can find out more about Øystein Grøvlen and his work with databases at his blog, or follow him on Twitter: @ogrovlen. Want to find out more about Øystein and MySQL query optimization? Register for Percona Live Data Performance Conference 2017, and see his tutorial How to Analyze and Tune MySQL Queries for Better Performance. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Categories: MySQL

Quickly Troubleshoot Metadata Locks in MySQL 5.7

MySQL Performance Blog - Wed, 2016-12-28 19:52

In a previous article, Ovais demonstrated how a DDL can render a table blocked from new queries. In another article, Valerii introduced performance_schema.metadata_locks, which is available in MySQL 5.7 and exposes metadata lock details. Given this information, here’s a quick way to troubleshoot metadata locks by creating a stored procedure that can:

  • Find out which thread(s) have the metadata lock
  • Determine which thread has been waiting for it the longest
  • Find other threads waiting for the metadata lock
Setting up instrumentation

First, you need to enable instrumentation for metadata locks:

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
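To confirm the instrument is now enabled (a quick sanity check, not part of the original instructions):

SELECT NAME, ENABLED, TIMED FROM performance_schema.setup_instruments WHERE NAME = 'wait/lock/metadata/sql/mdl';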

Second, you need to add this stored procedure:

USE test;
DROP PROCEDURE IF EXISTS procShowMetadataLockSummary;
delimiter //
CREATE PROCEDURE procShowMetadataLockSummary()
BEGIN
  DECLARE table_schema VARCHAR(64);
  DECLARE table_name VARCHAR(64);
  DECLARE id bigint;
  DECLARE time bigint;
  DECLARE info longtext;
  DECLARE curMdlCount INT DEFAULT 0;
  DECLARE curMdlCtr INT DEFAULT 0;
  DECLARE curMdl CURSOR FOR SELECT * FROM tmp_blocked_metadata;
  DROP TEMPORARY TABLE IF EXISTS tmp_blocked_metadata;
  CREATE TEMPORARY TABLE IF NOT EXISTS tmp_blocked_metadata (
    table_schema varchar(64),
    table_name varchar(64),
    id bigint,
    time bigint,
    info longtext,
    PRIMARY KEY(table_schema, table_name)
  );
  REPLACE tmp_blocked_metadata(table_schema,table_name,id,time,info)
    SELECT mdl.OBJECT_SCHEMA, mdl.OBJECT_NAME, t.PROCESSLIST_ID, t.PROCESSLIST_TIME, t.PROCESSLIST_INFO
    FROM performance_schema.metadata_locks mdl
    JOIN performance_schema.threads t ON mdl.OWNER_THREAD_ID = t.THREAD_ID
    WHERE mdl.LOCK_STATUS='PENDING' and mdl.LOCK_TYPE='EXCLUSIVE'
    ORDER BY mdl.OBJECT_SCHEMA,mdl.OBJECT_NAME,t.PROCESSLIST_TIME ASC;
  OPEN curMdl;
  SET curMdlCount = (SELECT FOUND_ROWS());
  WHILE (curMdlCtr < curMdlCount) DO
    FETCH curMdl INTO table_schema, table_name, id, time, info;
    SELECT CONCAT_WS(' ','PID',t.PROCESSLIST_ID,'has metadata lock on', CONCAT(mdl.OBJECT_SCHEMA,'.',mdl.OBJECT_NAME), 'with current state', CONCAT_WS('','[',t.PROCESSLIST_STATE,']'), 'for', t.PROCESSLIST_TIME, 'seconds and is currently running', CONCAT_WS('',"[",t.PROCESSLIST_INFO,"]")) AS 'Process(es) that have the metadata lock'
      FROM performance_schema.metadata_locks mdl
      JOIN performance_schema.threads t ON t.THREAD_ID = mdl.OWNER_THREAD_ID
      WHERE mdl.LOCK_STATUS='GRANTED' AND mdl.OBJECT_SCHEMA = table_schema and mdl.OBJECT_NAME = table_name
        AND mdl.OWNER_THREAD_ID NOT IN (
          SELECT mdl2.OWNER_THREAD_ID
          FROM performance_schema.metadata_locks mdl2
          WHERE mdl2.LOCK_STATUS='PENDING' AND mdl.OBJECT_SCHEMA = mdl2.OBJECT_SCHEMA and mdl.OBJECT_NAME = mdl2.OBJECT_NAME);
    SELECT CONCAT_WS(' ','PID', id, 'has been waiting for metadata lock on',CONCAT(table_schema,'.', table_name),'for', time, 'seconds to execute', CONCAT_WS('','[',info,']')) AS 'Oldest process waiting for metadata lock';
    SET curMdlCtr = curMdlCtr + 1;
    SELECT CONCAT_WS(' ','PID', t.PROCESSLIST_ID, 'has been waiting for metadata lock on',CONCAT(table_schema,'.', table_name),'for', t.PROCESSLIST_TIME, 'seconds to execute', CONCAT_WS('','[',t.PROCESSLIST_INFO,']')) AS 'Other queries waiting for metadata lock'
      FROM performance_schema.metadata_locks mdl
      JOIN performance_schema.threads t ON t.THREAD_ID = mdl.OWNER_THREAD_ID
      WHERE mdl.LOCK_STATUS='PENDING' AND mdl.OBJECT_SCHEMA = table_schema and mdl.OBJECT_NAME = table_name
        AND mdl.OWNER_THREAD_ID AND t.PROCESSLIST_ID <> id;
  END WHILE;
  CLOSE curMdl;
END//
delimiter ;

Testing

Now, let’s call the procedure to see if there are threads waiting for metadata locks:

mysql> CALL test.procShowMetadataLockSummary();
+----------------------------------------------------------------------------------------------------------------+
| Process(es) that have the metadata lock                                                                         |
+----------------------------------------------------------------------------------------------------------------+
| PID 10 has metadata lock on sbtest.sbtest with current state [] since 274 seconds and is currently running []   |
| PID 403 has metadata lock on sbtest.sbtest with current state [] since 291 seconds and is currently running []  |
+----------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

+------------------------------------------------------------------------------------------------------------------------+
| Oldest process waiting for metadata lock                                                                                |
+------------------------------------------------------------------------------------------------------------------------+
| PID 1264 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [truncate table sbtest.sbtest]  |
+------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

+---------------------------------------------------------------------------------------------------------------------------+
| Other queries waiting for metadata lock                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------+
| PID 1269 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?]  |
| PID 1270 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?]  |
| PID 1271 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?]  |
| PID 1272 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?]  |
| PID 1273 has been waiting for metadata lock on sbtest.sbtest for 264 seconds to execute [SELECT c from sbtest where id=?]  |
+---------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

So, as you can see above, you have several choices. You could (a) do nothing and wait for threads 10 and 403 to complete and then thread 1264 can get the lock.

If you can't wait, you can (b) kill the threads that have the metadata lock so that the TRUNCATE TABLE in thread 1264 can get the lock. However, before you decide to kill threads 10 and 403, check SHOW ENGINE INNODB STATUS to see whether the undo log entries for those threads are high. If they are, rolling back these transactions might take a long time.
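One way to gauge that is to look at information_schema.innodb_trx for the sessions holding the lock; the thread IDs below are the ones from the example output above:

SELECT trx_mysql_thread_id, trx_started, trx_rows_modified
FROM information_schema.innodb_trx
WHERE trx_mysql_thread_id IN (10, 403);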

Lastly, you can instead (c) kill the DDL in thread 1264 to free up the other queries. You should then reschedule the DDL to run during off-peak hours.

Happy metadata lock hunting!

Categories: MySQL

Using Percona XtraBackup on a MySQL Instance with a Large Number of Tables

MySQL Performance Blog - Wed, 2016-12-28 16:51

In this blog post, we’ll find out how to use Percona XtraBackup on a MySQL instance with a large number of tables.

As of Percona XtraBackup 2.4.5, you are required to have enough open file descriptors to open every single InnoDB tablespace in the instance you're trying to back up. So if you're running innodb_file_per_table=1 and have a large number of tables, you're very likely to see Percona XtraBackup fail with the following error message:

InnoDB: Operating system error number 24 in a file operation.
InnoDB: Error number 24 means 'Too many open files'
InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
InnoDB: File ./sbtest/sbtest132841.ibd: 'open' returned OS error 124. Cannot continue operation
InnoDB: Cannot continue operation.

If you run into this issue, here is what you need to do:

  1. Find out how many files you need:

root@ts140i:~# find /var/lib/mysql/ -name "*.ibd" | wc -l
1000005

I would add at least another 1000 to this number for the system tablespace and other miscellaneous open file needs. You might want to go even higher to accommodate a growing number of tables.
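If you prefer to estimate this from inside MySQL rather than from the filesystem, a rough equivalent (it ignores partitioned tables, which use one file per partition) is:

SELECT COUNT(*) FROM information_schema.tables WHERE engine = 'InnoDB';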

  2. Check the maximum number of files you can keep open in the system. If this number is too small, Percona XtraBackup might monopolize the open files in the system, causing other processes to fail when they try to open files. This can even cause MySQL Server to crash.

root@ts140i:/mnt/data/backup# cat /proc/sys/fs/file-max
3262006

If you need to, here is how to  increase the number:

sysctl -w fs.file-max=5000000
echo "fs.file-max=5000000" >> /etc/sysctl.conf

  3. Increase the limit on the number of files the Percona XtraBackup process can open:

The best way to do this is to use the --open-files-limit option. For example, you can specify the following in your my.cnf:

[xtrabackup]
open-files-limit=2000000

Alternatively, you can pass it as a command-line option, or run ulimit -n 2000000 before running the backup command.

You need to be sure your user account has permission to set the open files limit this high. If you are doing backups under the "root" user, it shouldn't be a problem. Otherwise, you might need to adjust the limits in /etc/security/limits.conf:

mysql hard nofile 2000000
mysql soft nofile 2000000

Specifying a "soft" limit in this file eliminates the need to run ulimit before Percona XtraBackup, or to specify it in the configuration.

  4. There is one more possible limit to overcome. Even running as the root user, you might get the following error message:

root@ts140i:/mnt/data/backup# ulimit -n 2000000
-su: ulimit: open files: cannot modify limit: Operation not permitted

If this happens, you might need to increase the kernel limit on the number of open files a single process can have:

pz@ts140i:~$ cat /proc/sys/fs/nr_open
1048576

The limit I have on this system is slightly above 1 million. You can increase it using the following:

sysctl -w fs.nr_open=2000000
echo "fs.nr_open=2000000" >> /etc/sysctl.conf

With these configuration adjustments, you should be able to use Percona XtraBackup to back up MySQL instances containing millions of tables without problems.

What if you can't allow Percona XtraBackup to open that many files? Then there is the --close-files option, which normally removes the need to increase the limit on the number of open files. Using this option, however, might cause backup corruption if you're doing DDL operations during the backup.

Where does this strange limitation, requiring you to keep all tablespaces open, come from? It comes from this issue. In some cases, DDL operations such as RENAME TABLE might cause the wrong file to be copied, and the change cannot be caught up by replaying the InnoDB redo logs. Keeping the file open clearly identifies which file corresponds to a given tablespace at the start of the backup process, so it gets handled correctly.

This problem is not unique to Percona XtraBackup. If anything, Percona XtraBackup goes the extra mile to ensure database backups are safe. For comparison, MySQL Enterprise Backup 4.0 simply states:

“Do not run the DDL operations ALTER TABLE, TRUNCATE TABLE, OPTIMIZE TABLE, REPAIR TABLE, RESTORE TABLE or CREATE INDEX while a backup operation is going on. The resulting backup might become corrupted.”

Categories: MySQL

Webinar Thursday December 29: JSON in MySQL 5.7

MySQL Performance Blog - Tue, 2016-12-27 22:44

Please join Percona’s Consultant David Ducos on Thursday, December 29, 2016 at 10 am PST/ 1:00 pm EST (UTC-8) as he presents JSON in MySQL 5.7.

Since it was implemented in MySQL 5.7, we can use JSON as a data type. In this webinar, we will review some of the useful functions that have been added to work with JSON.

We will examine and analyze how JSON works internally, and take into account some of the costs related to employing this new technology. 
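As a small taste of what the webinar covers, here is a hedged sketch of the 5.7 JSON data type and two of its functions; the table and data are made up for illustration:

CREATE TABLE events (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  doc JSON
);

INSERT INTO events (doc) VALUES ('{"type": "click", "user": {"id": 42, "name": "Eva"}}');

SELECT doc->'$.type' AS event_type,
       JSON_EXTRACT(doc, '$.user.name') AS user_name
FROM events;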

At the end of the webinar, you will know the answers to the following questions: 

  • What is JSON?
  • Why don’t we keep using VARCHAR?
  • How does it work? 
  • What are the costs?
  • What limitations should we take into account?
  • What are the benefits of using MySQL JSON support?

Register for the webinar here.

David Ducos, Percona Consultant

David studied Computer Science at the National University of La Plata, and has worked as a Database Consultant since 2008. He worked for three years in a worldwide platform of free classifieds, until starting work for Percona in November 2014 as part of the Consulting team.

Categories: MySQL

Don’t Let a Leap Second Leap on Your Database!

MySQL Performance Blog - Tue, 2016-12-27 21:00

This blog discusses how to prepare your database for the new leap second coming in the new year.

At the end of this year, on December 31, 2016, a new leap second gets added. Many of us remember the huge problems this caused back in 2012. Some of our customers asked how they should prepare for this year’s event to avoid any unexpected problems.

It’s a little late, but I thought discussing the issue might still be useful.

The first thing is to make sure your systems avoid the issue of abnormally high CPU usage. This was a problem in 2012 due to a Linux kernel bug. After the leap second was added, CPU utilization skyrocketed on many systems, taking down many popular sites. This issue was addressed back in 2012, and similar global problems did not occur in 2015 thanks to those fixes. So it is important to make sure you have an up-to-date Linux kernel version.

It's worth knowing that in case of any unpredicted system misbehavior caused by the leap second, the quick remedy for the CPU spike was restarting services or rebooting servers (in the worst case).

(Please do not reboot the server without being absolutely sure that your serious problems started exactly when the leap second was added.)


The second thing is to add proper support for the upcoming event. Leap second additions are announced only some time before they are implemented, as it isn't known far in advance when the next one will occur.

Therefore, you should upgrade your OS tzdata package to prepare your system for the upcoming leap second. Here is how to check whether your OS is already "leap second aware":

zdump -v right/America/Los_Angeles | grep Sat.Dec.31.*2016

A non-updated system returns an empty output. On an updated OS, you should receive something like this:

right/America/Los_Angeles  Sat Dec 31 23:59:60 2016 UTC = Sat Dec 31 15:59:60 2016 PST isdst=0 gmtoff=-28800
right/America/Los_Angeles  Sun Jan  1 00:00:00 2017 UTC = Sat Dec 31 16:00:00 2016 PST isdst=0 gmtoff=-28800

If your systems use the NTP service though, the above is not necessary (as stated in https://access.redhat.com/solutions/2441291). Still, you should make sure that the NTP services you use are also up-to-date.

With regard to leap second support in MySQL, there is nothing to do, regardless of the version. MySQL doesn't allow a value of 60 in the seconds part of the timestamp datatype, so you should expect rows stamped with 59 seconds instead of 60 when the additional second is added, as described here: https://dev.mysql.com/doc/refman/5.7/en/time-zone-leap-seconds.html

Similarly, MongoDB expects no serious problems either.

Let’s “smear” the second

Many big Internet properties, however, introduced a technique to adapt to the leap second change more gracefully and smoothly, called Leap Smear or Slew. Instead of introducing the additional leap second immediately, the clock slows down a bit, allowing it to gradually get in sync with the new time. This way there is no issue with extra abnormal second notation, etc.

This solution is used by Google, Amazon, Microsoft, and others. You can find a comprehensive document about Google’s use here: https://developers.google.com/time/smear

You can easily introduce this technique with the ntpd -x or Chronyd slew options, which are nicely explained in this document: https://developers.redhat.com/blog/2015/06/01/five-different-ways-handle-leap-seconds-ntp/

Summary

Make sure your kernel is up-to-date and your NTP service is properly configured, and consider using the slew/smear technique to make the change easier. After the kernel patches in 2012, no major problems happened in 2015. We expect none this year either (especially if you take time to properly prepare).

Categories: MySQL

Percona Server for MongoDB 3.4 Beta is now available

MySQL Performance Blog - Fri, 2016-12-23 14:43

Percona is pleased to announce the release of Percona Server for MongoDB 3.4.0-1.0beta on December 23, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

NOTE: Beta packages are available from the testing repository.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as adding features like external authentication, audit logging, and profiling rate limiting. Percona Server for MongoDB requires no changes to MongoDB applications or code.

This beta release is based on MongoDB 3.4.0 and includes the following additional changes:

  • Red Hat Enterprise Linux 5 and derivatives (including CentOS 5) are no longer supported.
  • MongoRocks is now based on RocksDB 4.11.
  • PerconaFT and TokuBackup were removed.
    As alternatives, we recommend using MongoRocks for write-heavy workloads and Hot Backup for physical data backups on a running server.

Percona Server for MongoDB 3.4.0-1.0beta release notes are available in the official documentation.

 

Categories: MySQL

Percona Blog Poll: What Programming Languages are You Using for Backend Development?

MySQL Performance Blog - Wed, 2016-12-21 18:53

Take Percona’s blog poll on what programming languages you’re using for backend development.

While customers and users focus on and interact with applications and websites, these are really just the tip of the iceberg for the whole end-to-end system that allows applications to run. The backend is what makes a website or application work. The backend has three parts to it: server, application, and database. A backend operation can be a web application communicating with the server to make a change in a database stored on a server. Technologies like PHP, Ruby, Python, and others are the ones backend programmers use to make this communication work smoothly, so that, for example, a customer can purchase a ticket with ease.

Backend programmers might not get a lot of credit, but they are the ones that design, maintain and repair the machinery that powers a system.

Please take a few seconds and answer the following poll on backend programming languages. Which are you using? Help the community learn what languages help solve critical database issues. Please select from one to six languages as they apply to your environment.

If you’re using other languages, or have specific issues, feel free to comment below. We’ll post a follow-up blog with the results!

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
Categories: MySQL

Percona Poll Results: What Database Technologies Are You Using?

MySQL Performance Blog - Wed, 2016-12-21 18:46

This blog shows the results from Percona’s poll on what database technologies our readers use in their environment.

We design different databases for different scenarios. Using one database technology for every situation doesn’t make sense, and can lead to non-optimal solutions for common issues. Big data and IoT applications, high availability, secure backups, security, cloud vs. on-premises deployment: each have a set of requirements that might need a special technology. Relational, document-based, key-value, graphical, column family – there are many options for many problems. More and more, database environments combine more than one solution to address the various needs of an enterprise or application (known as polyglot persistence).

The following are the results of our poll on database technologies:

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

We’ve concluded our database technology poll that looks at the technologies our readers are running in 2016. Thank you to the more than 1500 people who responded! Let’s look at what the poll results tell us, and how they compare to the similar poll we did in 2013.

Since the wording of the two poll questions is slightly different, the results won’t be directly comparable.  

First, let’s set the record straight: this poll does not try to be an unbiased, open source database technology poll. We understand our audience likely has many more MySQL and MongoDB users than other technologies. So we should look at the poll results as “how MySQL and MongoDB users look at open source database technology.”

It's interesting to examine which technologies we chose to include in our 2016 poll, compared to the 2013 poll. The most drastic change can be seen in the full-text search technologies: we decided not to include Lucene and Sphinx this time. ElasticSearch, which wasn't included back in 2013, is now the leading full-text search technology. This corresponds to what we see among our customers.

The change between Redis versus Memcached is also interesting. Back in 2013, Memcached was the clear supporting technology winner. In 2016, Redis is well ahead.

We didn’t ask about PostgreSQL back in 2013 (few people probably ran PostgreSQL alongside MySQL then). Today our poll demonstrates its very strong showing.

We are also excited to see MongoDB’s strong ranking in the poll, which we interpret both as a result of the huge popularity of this technology and as recognition of our success as MongoDB support and services provider. We’ve been in the MongoDB solutions business for less than two years, and already seem to have a significant audience among MongoDB users.

In looking at other technologies mentioned, it is interesting to see that Couchbase and Riak were mentioned by fewer people than in 2013, while Cassandra came in about the same. I don’t necessarily see it as diminishing popularity for these technologies, but as potentially separate communities forming that don’t extensively cross-pollinate.

Kafka also deserves special recognition: with the initial release in January 2011, it gets a mention back in our 2013 poll. Our current poll shows it at 7%. This is a much larger number than might be expected, as Kafka is typically used in complicated, large-scale applications.

Thank you for participating!

Categories: MySQL

Installing Percona Monitoring and Management on Google Container Engine (Kubernetes)

MySQL Performance Blog - Wed, 2016-12-21 18:19

This blog discusses installing Percona Monitoring and Management on Google Container Engine.

I am working with a client that is on Google Cloud Services (GCS) and wants to use Percona Monitoring and Management (PMM). They liked the idea of using Google Container Engine (GKE) to manage the docker container that pmm-server uses.

The regular install instructions are here: https://www.percona.com/doc/percona-monitoring-and-management/install.html

Since Google Container Engine runs on Kubernetes, we had to do some interesting changes to the server install instructions.

First, you will want to get the gcloud shell. This is done by clicking the gcloud shell button at the top right of your screen when logged into your GCS project.

Once you are in the shell, you just need to run some commands to get up and running.

Let’s set our availability zone and region:

manjot_singh@googleproject:~$ gcloud config set compute/zone asia-east1-c
Updated property [compute/zone].

Then let’s set up our auth:

manjot_singh@googleproject:~$ gcloud auth application-default login
...
These credentials will be used by any library that requests Application Default Credentials.

Now we are ready to go.

Normally, we create a persistent container called pmm-data to hold the data the server collects and survive container deletions and upgrades. For GCS, we will create persistent disks, and use the minimum (Google) recommended size for each.

manjot_singh@googleproject:~$ gcloud compute disks create --size=200GB --zone=asia-east1-c pmm-prom-data-pv
Created [https://www.googleapis.com/compute/v1/projects/googleproject/zones/asia-east1-c/disks/pmm-prom-data-pv].
NAME              ZONE          SIZE_GB  TYPE         STATUS
pmm-prom-data-pv  asia-east1-c  200      pd-standard  READY
manjot_singh@googleproject:~$ gcloud compute disks create --size=200GB --zone=asia-east1-c pmm-consul-data-pv
Created [https://www.googleapis.com/compute/v1/projects/googleproject/zones/asia-east1-c/disks/pmm-consul-data-pv].
NAME                ZONE          SIZE_GB  TYPE         STATUS
pmm-consul-data-pv  asia-east1-c  200      pd-standard  READY
manjot_singh@googleproject:~$ gcloud compute disks create --size=200GB --zone=asia-east1-c pmm-mysql-data-pv
Created [https://www.googleapis.com/compute/v1/projects/googleproject/zones/asia-east1-c/disks/pmm-mysql-data-pv].
NAME               ZONE          SIZE_GB  TYPE         STATUS
pmm-mysql-data-pv  asia-east1-c  200      pd-standard  READY
manjot_singh@googleproject:~$ gcloud compute disks create --size=200GB --zone=asia-east1-c pmm-grafana-data-pv
Created [https://www.googleapis.com/compute/v1/projects/googleproject/zones/asia-east1-c/disks/pmm-grafana-data-pv].
NAME                 ZONE          SIZE_GB  TYPE         STATUS
pmm-grafana-data-pv  asia-east1-c  200      pd-standard  READY

Ignoring messages about disk formatting, we are ready to create our Kubernetes cluster:

manjot_singh@googleproject:~$ gcloud container clusters create pmm-server --num-nodes 1 --machine-type n1-standard-2
Creating cluster pmm-server...done.
Created [https://container.googleapis.com/v1/projects/googleproject/zones/asia-east1-c/clusters/pmm-server].
kubeconfig entry generated for pmm-server.
NAME        ZONE          MASTER_VERSION  MASTER_IP       MACHINE_TYPE   NODE_VERSION  NUM_NODES  STATUS
pmm-server  asia-east1-c  1.4.6           999.911.999.91  n1-standard-2  1.4.6         1          RUNNING

You should now see something like:

manjot_singh@googleproject:~$ gcloud compute instances list
NAME                                       ZONE          MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
gke-pmm-server-default-pool-73b3f656-20t0  asia-east1-c  n1-standard-2               10.14.10.14  911.119.999.11  RUNNING

Now that our container manager is up, we need to create two configs for the "pod" we are creating to run our container. One will be used only to initialize the server and move the container's data directories to the persistent disks; the second will be the actual running server.

manjot_singh@googleproject:~$ vi pmm-server-init.json {   "apiVersion": "v1",   "kind": "Pod",   "metadata": {       "name": "pmm-server",       "labels": {           "name": "pmm-server"       }   },   "spec": {     "containers": [{         "name": "pmm-server",         "image": "percona/pmm-server:1.0.6",         "env": [{                 "name":"SERVER_USER",                 "value":"http_user"             },{                 "name":"SERVER_PASSWORD",                 "value":"http_password"             },{                 "name":"ORCHESTRATOR_USER",                 "value":"orchestrator"             },{                 "name":"ORCHESTRATOR_PASSWORD",                 "value":"orch_pass"             }         ],         "ports": [{             "containerPort": 80             }         ],         "volumeMounts": [{           "mountPath": "/opt/prometheus/d",           "name": "pmm-prom-data"         },{           "mountPath": "/opt/c",           "name": "pmm-consul-data"         },{           "mountPath": "/var/lib/m",           "name": "pmm-mysql-data"         },{           "mountPath": "/var/lib/g",           "name": "pmm-grafana-data"         }]       }     ],     "restartPolicy": "Always",     "volumes": [{       "name":"pmm-prom-data",       "gcePersistentDisk": {           "pdName": "pmm-prom-data-pv",           "fsType": "ext4"       }     },{       "name":"pmm-consul-data",       "gcePersistentDisk": {           "pdName": "pmm-consul-data-pv",           "fsType": "ext4"       }     },{       "name":"pmm-mysql-data",       "gcePersistentDisk": {           "pdName": "pmm-mysql-data-pv",           "fsType": "ext4"       }     },{       "name":"pmm-grafana-data",       "gcePersistentDisk": {           "pdName": "pmm-grafana-data-pv",           "fsType": "ext4"       }     }]   } }

manjot_singh@googleproject:~$ vi pmm-server.json { "apiVersion": "v1", "kind": "Pod", "metadata": { "name": "pmm-server", "labels": { "name": "pmm-server" } }, "spec": { "containers": [{ "name": "pmm-server", "image": "percona/pmm-server:1.0.6", "env": [{ "name":"SERVER_USER", "value":"http_user" },{ "name":"SERVER_PASSWORD", "value":"http_password" },{ "name":"ORCHESTRATOR_USER", "value":"orchestrator" },{ "name":"ORCHESTRATOR_PASSWORD", "value":"orch_pass" } ], "ports": [{ "containerPort": 80 } ], "volumeMounts": [{ "mountPath": "/opt/prometheus/data", "name": "pmm-prom-data" },{ "mountPath": "/opt/consul-data", "name": "pmm-consul-data" },{ "mountPath": "/var/lib/mysql", "name": "pmm-mysql-data" },{ "mountPath": "/var/lib/grafana", "name": "pmm-grafana-data" }] } ], "restartPolicy": "Always", "volumes": [{ "name":"pmm-prom-data", "gcePersistentDisk": { "pdName": "pmm-prom-data-pv", "fsType": "ext4" } },{ "name":"pmm-consul-data", "gcePersistentDisk": { "pdName": "pmm-consul-data-pv", "fsType": "ext4" } },{ "name":"pmm-mysql-data", "gcePersistentDisk": { "pdName": "pmm-mysql-data-pv", "fsType": "ext4" } },{ "name":"pmm-grafana-data", "gcePersistentDisk": { "pdName": "pmm-grafana-data-pv", "fsType": "ext4" } }] } }

Then create it:

manjot_singh@googleproject:~$ kubectl create -f pmm-server-init.json
pod "pmm-server" created

Now we need to move data to persistent disks:

manjot_singh@googleproject:~$ kubectl exec -it pmm-server bash
root@pmm-server:/opt# supervisorctl stop grafana
grafana: stopped
root@pmm-server:/opt# supervisorctl stop prometheus
prometheus: stopped
root@pmm-server:/opt# supervisorctl stop consul
consul: stopped
root@pmm-server:/opt# supervisorctl stop mysql
mysql: stopped
root@pmm-server:/opt# mv consul-data/* c/
root@pmm-server:/opt# chown pmm.pmm c
root@pmm-server:/opt# cd prometheus/
root@pmm-server:/opt/prometheus# mv data/* d/
root@pmm-server:/opt/prometheus# chown pmm.pmm d
root@pmm-server:/var/lib# cd /var/lib
root@pmm-server:/var/lib# mv mysql/* m/
root@pmm-server:/var/lib# chown mysql.mysql m
root@pmm-server:/var/lib# mv grafana/* g/
root@pmm-server:/var/lib# chown grafana.grafana g
root@pmm-server:/var/lib# exit
manjot_singh@googleproject:~$ kubectl delete pods pmm-server
pod "pmm-server" deleted

Now recreate the pmm-server container with the actual configuration:

manjot_singh@googleproject:~$ kubectl create -f pmm-server.json
pod "pmm-server" created

It’s up!

Now let’s get access to it by exposing it to the internet:

manjot_singh@googleproject:~$ kubectl expose deployment pmm-server --type=LoadBalancer
service "pmm-server" exposed

You can get more information on this by running:

manjot_singh@googleproject:~$ kubectl describe services pmm-server
Name:                   pmm-server
Namespace:              default
Labels:                 run=pmm-server
Selector:               run=pmm-server
Type:                   LoadBalancer
IP:                     10.3.10.3
Port:                   <unset> 80/TCP
NodePort:               <unset> 31757/TCP
Endpoints:              10.0.0.8:80
Session Affinity:       None
Events:
  FirstSeen  LastSeen  Count  From                   SubobjectPath  Type    Reason                Message
  ---------  --------  -----  ----                   -------------  ------  ------                -------
  22s        22s       1      {service-controller }                 Normal  CreatingLoadBalancer  Creating load balancer

To find the public IP of your PMM server, look under “EXTERNAL-IP”

manjot_singh@googleproject:~$ kubectl get services
NAME         CLUSTER-IP   EXTERNAL-IP      PORT(S)   AGE
kubernetes   10.3.10.3    <none>           443/TCP   7m
pmm-server   10.3.10.99   999.911.991.91   80/TCP    1m

That’s it, just visit the external IP in your browser and you should see the PMM landing page!

One of the things we didn't resolve was being able to access the pmm-server container from within the VPC. The client had to go through the open internet and hit PMM via the public IP. I hope to work on this some more and resolve it in the future.

I have also talked to our team about making mounts for persistent disks easier so that we can use less mounts and make the configuration and setup easier.

 

 

Categories: MySQL

Securing MongoDB Instances

MySQL Performance Blog - Mon, 2016-12-19 17:53

In this blog post we’ll look at how to go about securing MongoDB instances.

Authentication is one of the most important features of a database, and MongoDB supports it in different ways. Although it allows you to work without any authentication, the best practice is to enable authentication and give users only the permissions they need.

Instances without authentication allow all users to perform all the operations they feel like, and that is not safe at all.

Native Authentication

The MongoDB community version features more than one authentication method: SCRAM-SHA-1, MONGODB-CR, and x.509 Certificate Authentication. The current default method is SCRAM-SHA-1. The version prior to 3.0 used to have MONGODB-CR as its default method.

Percona Server for MongoDB also offers LDAP authentication free of charge; in MongoDB, this feature is only available in the Enterprise version.

SCRAM-SHA-1 and MONGODB-CR check whether the user/password exists against a specific database, and use challenge-response authentication to verify the user's authenticity.

The x.509 authentication is based on certificates. It does not run challenge-response algorithms; instead, it validates a certificate to prove the client's authenticity. It depends on a certificate authority, and each client must have a valid certificate.

LDAP Authentication

The LDAP authentication method uses an external LDAP server to authenticate the user, via saslauthd on Linux. LDAP is commonly used to manage users in a network. There are advantages and disadvantages to using LDAP. One advantage is that it centralizes users. However, it depends on network connectivity to check user credentials; saslauthd tries to help with caching, but it has limitations. Please see further details here.

There are two different internal authentication methods for replica sets and sharded clusters, where instances need to prove that they are expected members of the deployment. The first method uses a shared keyfile for all instances, and the second uses a different x.509 certificate for each instance. It is important to know that x.509 forces proper SSL coverage for replication traffic, while a keyfile does not, but we will cover this topic in a different blog post.

Authorization and Roles

Once authenticated, users must be allowed to perform commands against the instance/replica-set/sharding. There are a few built-in roles that are able to cover almost all the user cases, and creating a user defined role is possible.
The current built-in roles are:

read, readWrite, dbAdmin, dbOwner, userAdmin, clusterAdmin, clusterManager, clusterMonitor, hostManager, backup, restore, readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase, root, and many more…

There is also the __system role, which is solely used for internal purposes.

Custom user and role by example

This shows how to both enable MongoDB authentication and create a user-defined role, where the user will only be able to read a specific collection. We are using tarballs for testing only. To perform a production installation please follow our docs.

  1. Download Percona Server MongoDB: >wget https://www.percona.com/downloads/percona-server-mongodb-3.2/percona-server-mongodb-3.2.10-3.0/binary/tarball/percona-server-mongodb-3.2.10-3.0-trusty-x86_64.tar.gz >tar -xvzf percona-server-mongodb-3.2.10-3.0-trusty-x86_64.tar.gz >mv percona-server-mongodb-3.2.10-3.0/ perc-mongodb >cd perc-mongodb/ >mkdir bin/data
  2. Start the service with authentication: cd bin ./mongod --dbpath data --smallfiles --logpath data/mongod.log --fork --auth
  3. Create root/admin user:

    We are able to create the first user without authentication. The next users must be created by an authenticated user.

    > use admin > db.createUser({user : 'administrator', pwd : '123', roles : ['root'] }) > Successfully added user: { "user" : "administrator", "roles" : [ "root" ] }
  4. Login with the just created credentials: >mongo --authenticationDatabase admin -u administrator -p
  5. Create database and collection: > use percona > db.simple_collection.insert({ random_number : Math.random()}) > db.secure_collection.insert({ name : 'John', annual_wage : NumberLong(130000.00), target_bonus : NumberLong(15000.00)})
  6. create a user that can read all the collections in the percona database: db.createUser( { user: "app_read", pwd: "123456", roles: [ { role: "read", db: "percona" }]}) // testing ./mongo --authenticationDatabase admin -u app_read -p MongoDB shell version: 3.2.10-3.0 Enter password: connecting to: test > use percona switched to db percona > show collections foo secure_collection > db.employee_dependents.find() { "_id" : ObjectId("583c5afe38c4be98b24e86e6"), "emp_id" : DBRef(‘employees’,’583c5bea38c4be98b24e86e8’} > db.employees.find() { "_id" : ObjectId("583c5bea38c4be98b24e86e8"), "name" : "John", "annual_wage" : NumberLong(130000), "target_bonus" : NumberLong(15000) }
  7. Now we see that this user can read not only the simple_collection but also the secure_collection. We don’t want users to read the secure_collection, so we are going to create a user-defined role. > db.createRole( { role : 'readOnly_nonPrivilegedCollections', roles : [], privileges: [ { resource: { db: "percona", collection: "foo" }, actions: [ "find"] } ] });
  8. Assign created role to the user: db.createUser( { user: "app_non_secure", pwd: "123456", roles: [ { role: "readOnly_nonPrivilegedCollections", db: 'admin' }] })
  9. Test access: ./mongo --authenticationDatabase admin -u app_non_secure -p > db.foo.find() { "_id" : ObjectId("583c5afe38c4be98b24e86e6"), "random_number" : 0.2878080930921183 } > db.secure_collection.find() Error: error: { "ok" : 0, "errmsg" : "not authorized on percona to execute command { find: "secure_collection", filter: {} }", "code" : 13 }

Please feel free to ping us on Twitter @percona with any questions and suggestions for securing MongoDB instances.

Categories: MySQL

MongoDB PIT Backups In Depth

MySQL Performance Blog - Fri, 2016-12-16 18:54

In this blog is an in-depth discussion of MongoDB PIT backups.

Note: INTIMIDATION FREE ZONE!! This post is meant to give the reader most of the knowledge that is needed to understand the material. This includes basic MongoDB knowledge and other core concepts. I have tried to include links for further research where I can. Please use them where you need background and ask questions. We’re here to help!

Intro

In this two-part series, we’re going to fill in some of the gaps you might have to help you get awesome Point-in-Time (PIT) backups in MongoDB. These procedures will work for both MongoDB and Percona Server for MongoDB. This is meant to be a prequel to David Murphy’s MongoDB PIT Backup blog post. In that blog, David shows you how to take and to restore a dump up until a problem operation happened. If you haven’t read that post yet, hold off until you read this one. This foundation will help you better understand the how and why of the necessary steps. Let’s move onto some core concepts in MongoDB – but first, let me tell you what to expect in each part.

Blog 1 (this one): Core concepts – replica set backups, problem statement and solution

Blog 2: Getting Shardy – why backup consistency is tough and how to solve it

Core Concepts

Replica Set (process name: mongod) – MongoDB uses replica sets to distribute data for DR purposes. Replica sets are made up of primaries, secondaries and/or arbiters. These are much like master/slave pairs in MySQL, but with one big caveat. There’s an election protocol included that also handles failover! That’s right, high availability (HA) too! So, in general, there is a “rule of three” when thinking about the minimum number of servers to put in your replica sets. This is necessary to avoid a split-brain scenario.

Oplog (collection name: local.oplog.rs) – The oplog is the log that records changes to the data on the MongoDB primary (secondaries also have an oplog). Much like MySQL, the non-primary members (secondaries) pull operations from the oplog and apply them to their own collections. Secondaries can pull from any member of the replica set that is ahead of them. The operations in the oplog are idempotent, meaning they always result in the same change to the database no matter how many times they’re performed.

Sharding (process name: mongos) – MongoDB also has built in horizontal scalability. This is implemented in a “shared nothing” architecture. A sharded cluster is made up of several replica sets. Each replica set contains a unique range of data. The data is distributed amongst replica sets based on a sharding key. There is also a sharding router (mongos) that runs as a routing umbrella over the cluster. In a sharded setup the application solely interfaces with the sharding router (never the replica sets themselves). This is the main function for scaling reads or writes in MongoDB. Scaling both takes very thoughtful design, but may not be possible.

Mongodump (utility name: mongodump) – MongoDB has a built-in database dump utility that can interface with mongod or mongos. Mongodump can also use the oplog of the server that it is run on to create a consistent point-in-time backup by using a "roll forward" strategy.

Mongorestore (utility name: mongorestore) – MongoDB has a built-in database restore utility. Mongorestore is a rather simple utility that will replay binary dumps created by mongodump. When used with --oplogReplay to restore a dump made with mongodump's --oplog switch, it can make for a very functional backup facility.

Tip: make sure that user permissions are properly defined when using --oplogReplay: besides restore, the anyAction and anyResource privileges need to be granted.

OK, So What?

We’re going to first need to understand how backups work in a simple environment (a single replica set). Things are going to get much more interesting when we look at sharded clusters in the next post.

Backing Up

In a single replica set, things are pretty simple. There is no sharding router to deal with. You can get an entire data set by interacting with one server. The only problem that you need to deal with is the changes that are being made to the database while your mongodump is in process. If the concept of missed operations is a little fuzzy to you, just consider this simple use case:

We’re going to run a mongodump, and we have a collection with four documents:

We start mongodump on this collection. We're also running our application at the same time, because we can't take down production. Mongodump scans from first to last in the collection (like a range scan based on ID). In this case, mongodump has backed up all documents from id:1 through id:4.

At this same moment in time, our application inserts id:3 into the collection.

Is the document with id:3 going to be included in the mongodump? The answer is: most likely not. The problem is that you would expect it to be in the completed dump. However, if you need to restore this backup, you’re going to lose id:3. Now, this is perfectly normal in Disaster Recovery scenarios. Knowing that this is happening is the key part. Your backups will have the consistency of swiss cheese if you don’t have a way to track changes being made while the backup is running. Unknown data loss is one of the worst situations one can be in. What we need is a process to capture changes while the backup is running.

Here's where the inclusion of the --oplog flag is very important. The --oplog flag will allow mongodump to capture changes that are being made to the database while the backup is running. Since the oplog is idempotent, there is no chance that replaying it will change the data beyond its intended state during a restore. This gives the mongodump a consistent snapshot as of when the dump completes, like most "clone" type operations.

Restoring

When running mongorestore, you can use the --oplogReplay option. Using the oplog, mongorestore recovers to the point in time when the dump completed. Back to the use case: we may not capture id:3 on the first pass, but as long as we've captured the oplog up until the backup completes, we'll have id:3 available. When replaying the oplog during mongorestore, we basically re-run the insert operation, completing the dataset. The oplog BSON-timestamps all entries, so we know for sure up to what point in time we've captured.

TIP: If you need to convert the timestamp to something human-readable, here’s something helpful

The Wrap Up

Now we have a firm understanding of the problem. Once we understand the problem, we can easily design a solution to ensure our backups have the integrity that our environment demands. In the next post, we’re going to step up the complexity by examining backups in conjunction with the most complex feature MongoDB has: sharding. Until then, post your feedback and questions in the comments section below.

Categories: MySQL

Percona Live 2017 Sneak Peek Schedule Up Now! See the Available Sessions!

MySQL Performance Blog - Fri, 2016-12-16 15:05

We are excited to announce that the sneak peek schedule for the Percona Live 2017 Open Source Database Conference is up! The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

The Percona Live Open Source Database Conference 2017 is the premier event for the rich and diverse MySQL, MongoDB and open source database ecosystems. This conference provides an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience.

Below are some of our top picks for MySQL, MongoDB and open source database sessions:

Tutorials

MySQL 101 Tracks

MongoDB 101 Tracks

Breakout Talks

Register for the Percona Live Open Source Database Conference here.

Early Bird Discounts

Just a reminder to everyone out there: our Early Bird discount rate for the Percona Live Open Source Database Conference 2017 is only available ‘til January 8, 2017, 11:30 pm PST! This rate gets you all the excellent and amazing opportunities that Percona Live offers, at a very reasonable price!

Sponsor Percona Live

Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, MongoDB and open source database event. Sponsors become a part of a dynamic and growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Categories: MySQL

Percona XtraDB Cluster 5.7.16-27.19 is now available

MySQL Performance Blog - Thu, 2016-12-15 19:21

Percona announces the release of Percona XtraDB Cluster 5.7.16-27.19 on December 15, 2016. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.7.16-27.19 is now the current release, based on the following:

All Percona software is open-source and free.

Deprecated

  • The following encryption modes are now deprecated:
    • encrypt=1
    • encrypt=2
    • encrypt=3

The default is encrypt=0 with encryption disabled. The recommended mode now is the new encrypt=4, which uses SSL files generated by MySQL.

For more information, see Encrypting PXC Traffic.

New Features

  • Added encrypt=4 mode for SST encryption that uses SSL files generated by MySQL. Modes 1, 2, and 3 are now deprecated.
  • ProxySQL-assisted maintenance mode that enables you to take a node down without adjusting ProxySQL manually (a brief usage sketch follows the list of modes below). The mode is controlled using the pxc_maint_mode variable, which can be set to one of the following values:
    • DISABLED: This is the default state that tells ProxySQL to route traffic to the node as usual.
    • SHUTDOWN: This state is set automatically when you initiate node shutdown.
    • MAINTENANCE: You can change to this state if you need to perform maintenance on a node without shutting it down.

For more information, see Assisted Maintenance Mode.
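As a rough sketch of how this might be scripted around a maintenance window (using the pymysql driver purely for illustration; it is not part of Percona XtraDB Cluster, and the credentials are placeholders):

import pymysql

# Connect to the node that is about to undergo maintenance.
conn = pymysql.connect(host="127.0.0.1", user="root", password="secret")
try:
    with conn.cursor() as cur:
        # Ask ProxySQL (via the node's state) to stop routing traffic here.
        cur.execute("SET GLOBAL pxc_maint_mode = 'MAINTENANCE'")
        # ... perform the maintenance work while traffic drains ...
        # Return the node to normal operation.
        cur.execute("SET GLOBAL pxc_maint_mode = 'DISABLED'")
finally:
    conn.close()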

  • Simplified SSL configuration for Galera/SST traffic with the pxc-encrypt-cluster-traffic option, which auto-configures SSL encryption.

For more information, see SSL Automatic Configuration.

  • Added the wsrep_flow_control_interval status variable that displays the lower and upper limits of the flow control system used for the Galera receive queue.

Fixed Bugs

  • Optimized IST donor selection logic to avoid SST. Child processes are now cleaned up and node state is resumed if SST fails.
  • Added init.ok to the list of files that do not get removed during SST.
  • Fixed error with ASIO library not acknowledging an EPOLLIN event when building Galera.
  • Fixed stalling of DML workload on slave node caused by FLUSH TABLE executed on the master.
    For more information, see 1629296.
  • Fixed super_read_only to not apply to Galera replication applier.
    For more information, see 1634295.
  • Redirected netcat output to stdout to avoid it in the log.
    For more information, see 1625968.
  • Enabled replication of ALTER USER statements.
    For more information, see 1376269.
  • Changed the wsrep_max_ws_rows variable to ignore non-replicated write-sets generated by DML action on temporary tables (explicit or implicit).
    For more information, see 1638138.
  • Fixed SST to fail with an error if SSL is not supported by socat, instead of switching to unencrypted mode.
  • Fixed SST with SSL to auto-generate a 2048-bit dhparams file for versions of socat before 1.7.3. These older versions use a 512-bit dhparams file by default, which gets rejected by newer clients with a dh key too small error.
  • PXC-731: Changed the wsrep_cluster_name variable to read-only, because changing it dynamically leads to high overhead.
    For more information, see 1620439.
  • PXC-732: Improved error message when any of the SSL files required for SST are missing.
  • PXC-735: Fixed SST to fail with an error when netcat is used (transferfmt=nc) with SSL encryption (encrypt set to 2, 3 or 4), instead of silently switching to unencrypted mode.
  • Fixed a faulty switch case that caused the cluster to stall when the repl.commit_order variable was set to 2 (LOCAL_OOOC mode, which should allow out-of-order committing for local transactions).

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Categories: MySQL

Percona Live Featured Tutorial with Giuseppe Maxia — MySQL Document Store: SQL and NoSQL United

MySQL Performance Blog - Thu, 2016-12-15 18:38

Welcome to a new series of blogs: Percona Live featured tutorial speakers! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we'll meet Giuseppe Maxia, Quality Assurance Architect at VMware. His tutorial is on MySQL Document Store: SQL and NoSQL United. MySQL 5.7 introduced the document store, which allows asynchronous operations and native handling of schema-less documents. Since the changes are huge, both developers and DBAs are uncertain about what is possible and how to do it. I had a chance to speak with Giuseppe and learn a bit more about the MySQL document store feature:

Percona: How did you get into database technology? What do you love about it?

Giuseppe: I am fascinated by the ability to organize and dynamically retrieve data. I got my first experiences with databases several decades ago. Relational databases were not as dominant then as they are now. In an early job, I actually wrote a relational database interface in C, with the purpose of getting more flexibility than what I could get from commercial products. This happened several years before the creation of MySQL. This experience with the internals of a DBMS gave me some insight on how raw data, when appropriately processed, becomes useful information. All the jobs that I had since then related to database usage or development. With each job I learned something, and most of what I accumulated over the years is still relevant today when I use top-notch databases.

What I love today about databases is the same thing that made me start working with them: I see them as powerful tools to help people make order out of the chaos of their data.

Percona: Your tutorial is “MySQL Document Store: SQL and NoSQL United.” What exactly is MySQL document store, and why is it such an exciting new feature?

Giuseppe: The “Document Store” is a feature introduced as a plugin in MySQL 5.7. It is different from most anything that MySQL has done before, for two reasons:

  1. It is a feature added to a server that is already GA – not directly as a change in the server code, but as an addition that users need to enable. Document store is the first of several additions that will come using the same paradigm. It allows the MySQL team to add functionalities without waiting for the natural cycle of development, which usually takes a few years.
  2. It allows users to treat some of the data stored in MySQL as schema-less documents, i.e. data that does not have to be restricted by the stiff paradigm of rows and columns that are the foundation of relational databases. In a nutshell, by using this plugin we can write collections of heterogeneous documents instead of tables and relations. Moreover, we can handle the data using non-SQL languages, such as JavaScript and Python, with a syntax that is more natural to developers that are not very familiar with relational theory.

Why is this such an exciting feature? I think it’s an attempt by Oracle to lure no-SQL users into the MySQL arena. By offering the ability to combine structured and unstructured data into the same entity with a proven record of safety and stability, Oracle may have created the perfect match between relational educated DBAs and developers who usually think in terms of hierarchical or nested data structures.

Percona: How can the document store make DBAs' lives easier? How can it make them more complicated?

Giuseppe: This depends on the organizational needs that the DBA has to address. There is a simplification, if the organization needs to deal with both structured and unstructured data. Instead of installing and maintaining two databases (e.g., MySQL and MongoDB) they can use just one.

What can go wrong? The plugin isn’t GA software (“Using MySQL as a document store is currently a preproduction feature”) and therefore DBAs should be ready to apply patches and take extra steps to keep the data safe, should a defect arise.

Percona: What benefits does document store hold for a business’ database environment?

Giuseppe: As mentioned before, it could be a simplification of overall operations. It exposes data as collections containing unstructured documents. This matches closely the kind of information that we deal with in many modern environments. Consider, for instance, current operations with cloud computing appliances: we mostly encode the data sent and received in such an environment as JSON or XML (which in turn can be easily converted into JSON.) Storing the documents retrieved from such operations directly as they are produced is a great advantage. A further benefit is the ability to index the data without converting it into structured tables, and retrieving information quickly and dynamically.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Giuseppe: The document store comes with a gargantuan amount of documentation. Kudos to the MySQL team for providing such detail on a new feature. However, the sheer size of the data might intimidate casual users who want to take advantage of the new feature. They might also fail to grasp the starting points. This tutorial’s main purpose is explaining the document store in simple terms, how to get started, and the common pitfalls.

Everyone who wants to deal with unstructured documents without maintaining two DBMS should attend. Developers will probably have more interest than DBAs, but there is food for everyone’s taste with the live demos.

On the practical side, the tutorial will show how data can get created in MySQL and consumed in MongoDB, and the other way around.

Percona: What are you most looking forward to at Percona Live?

Giuseppe: The MySQL world has been boiling over with new or enhanced features lately. I look forward to seeing the latest news about MySQL and related technologies. Percona Live is the place where MySQL professionals meet and exchange ideas. In addition to exposing myself to new things, though, I also enjoy seeing my friends in the MySQL world, and meeting new ones.

Want to find out more about Giuseppe and MySQL document store? Register for Percona Live Data Performance Conference 2017, and see his talk MySQL Document Store: SQL and NoSQL United. Use the code FeaturedTalk and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Categories: MySQL

Row Store and Column Store Databases

MySQL Performance Blog - Thu, 2016-12-15 00:35

In this blog post, we’ll discuss the differences between row store and column store databases.

Clients often ask us if they should or could be using columnar databases. For some applications, a columnar database is a great choice; for others, you should stick with the tried and true row-based option.

At a basic level, row stores are great for transaction processing. Column stores are great for highly analytical query models. Row stores have the ability to write data very quickly, whereas a column store is awesome at aggregating large volumes of data for a subset of columns.

One of the benefits of a columnar database is its crazy fast query speeds. In some cases, queries that took minutes or hours are completed in seconds. This makes columnar databases a good choice in a query-heavy environment. But you must make sure that the queries you run are really suited to a columnar database.

Data Storage

Let’s think about a basic database, like a stockbroker’s transaction records. In a row store, each client would have a record with their basic information – name, address, phone number, etc. – in a single table. It’s likely that each record would have a unique identifier. In our case, it would probably be an account_number.

There is another table that stores stock transactions. Again, each transaction is uniquely identified by something like a transaction_id. Each transaction is associated with one account_number, but each account_number is associated with multiple transactions. This provides us with a one-to-many relationship, and is a classic example of a transactional database.

We store all these tables on a disk and, when we run a query, the system might access lots of data before it determines what information is relevant to the specific query. If we want to know the account_number, first_name, last_name, stock, and purchase_price for a given time period, the system needs to access all of the information for the two tables, including fields that may not be relevant to the query. It then performs a join to relate the two tables’ data, and then it can return the information. This can be inefficient at scale, and this is just one example of a query that would probably run faster on a columnar database.

With a columnar database, each field from each table is stored in its own file or set of files. In our example database, all account_number data is stored in one file, all transaction_id data is stored in another file, and so on. This provides some efficiencies when running queries against wide tables, since it is unlikely that a query needs to return all of the fields in a single table. In the query example above, we'd only need to access the files that contained data from the requested fields. You can ignore all other fields that exist in the table. This ability to minimize I/O is one of the key reasons columnar databases can perform much faster.

Normalization Versus Denormalization

Additionally, many columnar databases prefer a denormalized data structure. In the example above, we have two separate tables: one for account information and one for transaction information. In many columnar databases, a single table could represent this information. With this denormalized design, when a query like the one presented is run, no joins would need to be processed in the columnar database, so the query will likely run much faster.

The reason for normalizing data is that it allows data to be written to the database in a highly efficient manner. In our row store example, we need to record just the relevant transaction details whenever an existing customer makes a transaction. The account information does not need to be written along with the transaction data. Instead, we reference the account_number to gain access to all of the fields in the accounts table.

The place where a columnar database really shines is when we want to run a query that would, for example, determine the average price for a specific stock over a range of time. In the case of the columnar database, we only need a few fields – symbol, price, and transaction_date – in order to complete the query. With a row store, we would gather additional data that was not needed for the query but was still part of the table structure.
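As a toy illustration of that difference (plain Python with made-up data, not any particular storage engine), notice how the column-oriented layout lets the average-price query read only the symbol and price values, while the row-oriented layout drags every field of every row along:

# The same three transactions, stored row-wise and column-wise.
row_store = [
    {"transaction_id": 1, "account_number": 42, "symbol": "ABC",
     "price": 10.0, "transaction_date": "2016-01-04"},
    {"transaction_id": 2, "account_number": 42, "symbol": "XYZ",
     "price": 20.0, "transaction_date": "2016-02-11"},
    {"transaction_id": 3, "account_number": 7, "symbol": "ABC",
     "price": 14.0, "transaction_date": "2016-03-19"},
]
column_store = {
    "transaction_id": [1, 2, 3],
    "account_number": [42, 42, 7],
    "symbol": ["ABC", "XYZ", "ABC"],
    "price": [10.0, 20.0, 14.0],
    "transaction_date": ["2016-01-04", "2016-02-11", "2016-03-19"],
}

# Row store: every whole row is read just to use two of its fields.
abc_prices_rows = [r["price"] for r in row_store if r["symbol"] == "ABC"]

# Column store: only the symbol and price columns are read.
abc_prices_cols = [p for s, p in zip(column_store["symbol"], column_store["price"])
                   if s == "ABC"]

print(sum(abc_prices_rows) / len(abc_prices_rows))  # 12.0
print(sum(abc_prices_cols) / len(abc_prices_cols))  # 12.0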

Normalization of data also makes updates to some information much more efficient in a row store. If you change an account holder's address, you simply update the one record in the accounts table. The updated information is available to all transactions completed by that account owner. In the columnar database, since we might store the account information with the transactions of that user, many records might need updating in order to update the available address information.

Conclusion

So, which one is right for you? As with so many things, it depends. You can still perform data analysis with a row-based database, but the queries may run slower than they would on a column store. You can record transactions in a column-based model, but the writes may take longer to complete. In an ideal world, you would have both options available to you, and this is what many companies are doing.

In most cases, the initial write is to a row-based system. We know them, we love them, we’ve worked with them forever. They’re kind of like that odd relative who has some real quirks. We’ve learned the best ways to deal with them.

Then, we write the data (or the relevant parts of the data) to a column based database to allow for fast analytic queries.

Both databases incur write transactions, and both also likely incur read transactions. Because a column-based database keeps each column's data in a separate file, it is less than ideal for a "SELECT * FROM…" query, since the request must access numerous files to be processed. Similarly, any query that selects a single row or only a small subset of rows will probably perform better in a row store. The column store is awesome for performing aggregation over large volumes of data, or when you have queries that only need a few fields from a wide table.

It can be tough to decide between the two if you only have one database. But it is more the norm that companies support multiple database platforms for multiple uses. Also, your needs might change over time. The sports car you had when you were single is less than optimal for your current family of five. But, if you could, wouldn’t you want both the sports car and the minivan? This is why we often see both database models in use within a single company.

Categories: MySQL

Percona XtraDB Cluster 5.6.34-26.19 is now available

MySQL Performance Blog - Wed, 2016-12-14 18:53

Percona announces the release of Percona XtraDB Cluster 5.6.34-26.19 on December 14, 2016. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.34-26.19 is now the current release, based on the following:

All Percona software is open-source and free. Details of this release can be found in the 5.6.34-26.19 milestone on Launchpad.

Deprecated

  • The following encryption modes are now deprecated:
    • encrypt=1
    • encrypt=2
    • encrypt=3

The default is encrypt=0 with encryption disabled. The recommended mode now is the new encrypt=4, which uses SSL files generated by MySQL.

For more information, see Encrypting PXC Traffic.

New Features

  • Added encrypt=4 mode for SST encryption that uses SSL files generated by MySQL. Modes 1, 2, and 3 are now deprecated.

Fixed Bugs

  • Optimized IST donor selection logic to avoid SST. Child processes are now cleaned up and node state is resumed if SST fails.
  • Added init.ok to the list of files that do not get removed during SST.
  • Fixed error with ASIO library not acknowledging an EPOLLIN event when building Galera.
  • Fixed stalling of DML workload on slave node caused by FLUSH TABLE executed on the master.
    For more information, see 1629296.
  • Fixed super_read_only to not apply to Galera replication applier.
    For more information, see 1634295.
  • Redirected netcat output to stdout to avoid it in the log.
    For more information, see 1625968.
  • Enabled replication of ALTER USER statements.
    For more information, see 1376269.
  • Changed the wsrep_max_ws_rows variable to ignore non-replicated write-sets generated by DML action on temporary tables (explicit or implicit).
    For more information, see 1638138.
  • Fixed SST to fail with an error if SSL is not supported by socat, instead of switching to unencrypted mode.
  • Fixed SST with SSL to auto-generate a 2048-bit dhparams file for versions of socat before 1.7.3. These older versions use a 512-bit dhparams file by default, which gets rejected by newer clients with a dh key too small error.
  • PXC-731: Changed the wsrep_cluster_name variable to read-only, because changing it dynamically leads to high overhead.
    For more information, see 1620439.
  • PXC-732: Improved error message when any of the SSL files required for SST are missing.
  • PXC-735: Fixed SST to fail with an error when netcat is used (transferfmt=nc) with SSL encryption (encrypt set to 2, 3 or 4), instead of silently switching to unencrypted mode.
  • Fixed faulty switch case that caused the cluster to stall when the repl.commit_order variable was set to 2 (LOCAL_OOOC mode that should allow out-of-order committing for local transactions).

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Categories: MySQL

Webinar Wednesday 12/14: MongoDB System Tuning Best Practices

MySQL Performance Blog - Tue, 2016-12-13 17:57

Please join Percona Senior Technical Operations Architect Tim Vaillancourt on Wednesday December 14, at 10:00 am PST/ 1:00pm EST (UTC-8) as he presents MongoDB System Tuning Best Practices.

People give much love to optimizing document design, provisioning, and even selecting an engine in MongoDB. They give little attention to tuning Linux to handle databases efficiently. In this session we will talk about which schedulers you should use, which network settings, which memory and cache settings, which file systems, whether you should use NUMA and Huge Pages, and more.

This will be a data-packed webinar for the advanced user, but still accessible to the budding systems admin type who wants to learn more about system internals.

Register for this webinar here.

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB, with a goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming – combined with experience tuning systems from the hard disk all the way up to the end-user – Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim lives in Amsterdam, NL and enjoys traveling, coding and music. Prior to Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Prior to moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

Categories: MySQL

MongoDB 3.4: Facet Aggregation Features and SERVER-27395 Mongod Crash

MySQL Performance Blog - Tue, 2016-12-13 17:41

This blog discusses MongoDB 3.4 GA facet aggregation features and the SERVER-27395 mongod crash bug.

As you may have heard, in late November MongoDB 3.4 GA was released. One feature that stuck out for me, a Lucene enthusiast, was the addition of powerful grouping and faceted search features in MongoDB 3.4.

Faceted Search

For those unfamiliar with the term faceted search, this is a way of grouping data using one or many different grouping criteria over a large result. It’s a tough idea to define specifically, but the aim of a faceted search is generally to show the most relevant information possible to the user and allow them to further filter what is usually a very large result of a given search criteria.

The most common day-to-day example of a faceted search is performing a search for a product on an e-commerce website such as eBay, Amazon, etc. As e-commerce sites commonly have the challenge of supplying a massive range of items to users that often provide limited search criteria, it is rare to see an online store today that does not have many “filters” in the right-side of their website to further narrow down a given product search.

Here is an example of me searching the term “mongodb” on a popular auction site:

While this may seem like a specific search to some, at large volume this search term might not immediately show something relevant to some users. What if the user only wants a “used” copy of a MongoDB book from a specific year? What if the user was looking for a MongoDB sticker and not a book at all? This is why you’ll often see filters alongside search results (which we can call “facets”) showing item groupings such as different store departments, different item conditions (such as used/new), publication years, price ranges, review ratings, etc.

In some traditional databases, to get this kind of result we might need to issue many different expensive “GROUP BY” queries that could be painful for a database to process. Each of these queries would independently scan data, even if all queries are summarizing the same “result set.” This is very inefficient. A faceted search offers powerful groupings using a single operation on result data.

When I made my search for "mongodb", under a faceted search model the page of items (in this case MongoDB books) and all the different groupings (departments, condition, price, etc.) are produced as a single grouping operation in one "pass" of the data. The result from a faceted search contains the items matching the search criteria AND the grouping results of the matched items as a single response.

Traditionally faceted searches were mostly limited to Lucene-based search engines such as Apache Solr, Elasticsearch and various closed-source solutions. With the release of MongoDB 3.4, this has changed!

The new Aggregation Pipeline features named $bucket and $bucketAuto provide functionality for processing groupings of result data in a single aggregation stage, and $facet allows the processing of many aggregation pipelines on the same result for even more complex cases.

New Facetting Features

MongoDB 3.4 introduces these new Aggregation Pipeline operators, allowing some advanced grouping and faceted-search-like features:

  1. $facet – Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.
  2. $bucket – Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries.
  3. $bucketAuto – Similar to $bucket, however bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.

As a very basic example, let’s consider this collection of store items:

> db.items.find()
{ "_id" : ObjectId("58502ade9a49537a011226fb"), "name" : "scotch", "price_usd" : 90, "department" : "food and drinks" }
{ "_id" : ObjectId("58502ade9a49537a011226fc"), "name" : "wallet", "price_usd" : 95, "department" : "clothing" }
{ "_id" : ObjectId("58502ade9a49537a011226fd"), "name" : "watch", "price_usd" : 900, "department" : "clothing" }
{ "_id" : ObjectId("58502ade9a49537a011226fe"), "name" : "flashlight", "price_usd" : 9, "department" : "hardware" }

From this example data, I’d like to gather a count of items in buckets by price (field ‘price_usd’):

  1. $0.99 to $9.99
  2. $9.99 to $99.99
  3. $99.99 to $999.99

For each price-bucket, I would also like a list of unique “department” names for the matches. Here is how I would do this with $bucket (and the result):

> db.items.aggregate([
...   { $bucket: {
...       groupBy: "$price_usd",
...       boundaries: [ 0.99, 9.99, 99.99, 999.99 ],
...       output: {
...         count: { $sum: 1 },
...         departments: { $addToSet: "$department" }
...       }
...   } }
... ])
{ "_id" : 0.99, "count" : 1, "departments" : [ "hardware" ] }
{ "_id" : 9.99, "count" : 2, "departments" : [ "clothing", "food and drinks" ] }
{ "_id" : 99.99, "count" : 1, "departments" : [ "clothing" ] }

If you want to do something more complex, you have the flexibility of either making the $bucket stage itself more complex or chaining multiple sub-pipelines together over the same input with $facet!
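As a rough sketch of what that could look like from an application, here is the same items collection queried through pymongo (the driver, connection string, and the $sortByCount grouping are my own additions, not part of the examples above); one sub-pipeline buckets by price while another counts items per department, all in a single pass over the input:

from pymongo import MongoClient

items = MongoClient("mongodb://127.0.0.1:27017").test.items

pipeline = [
    {"$facet": {
        # Sub-pipeline 1: the price buckets from the example above.
        "by_price": [
            {"$bucket": {
                "groupBy": "$price_usd",
                "boundaries": [0.99, 9.99, 99.99, 999.99],
                "output": {"count": {"$sum": 1}},
            }}
        ],
        # Sub-pipeline 2: item counts per department, over the same input.
        "by_department": [
            {"$sortByCount": "$department"}
        ],
    }}
]

for facets in items.aggregate(pipeline):
    print(facets["by_price"])
    print(facets["by_department"])

The $facet stage returns a single document whose fields hold the results of each sub-pipeline, which mirrors the "items plus groupings in one response" behavior described earlier.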

Mongod Crash: SERVER-27395

As I mentioned in my explanation of faceted search, it is a very complex/advanced feature that – due to the implementation challenges – is bound to have some bugs and inefficiencies.

During the evaluation of these new features, I noticed a very serious issue: I was able to crash the entire MongoDB 3.4.0 database instance using the $bucketAuto feature in combination with an $addToSet accumulator in the output definition. This is very serious!

This is the example output from my issue reproduction script, which is responsible for sending the $bucketAuto query to the mongo instance and then checking whether it crashed:

$ bash -x ./run.sh
+ js='db.tweets.aggregate([ { $bucketAuto: { groupBy: "$user.location", buckets: 1, output: { count: { $sum: 1 }, location: { $addToSet: "$user.location" } } } } ])'
+ echo '### Running crashing $bucketAuto .aggregate() query'
### Running crashing $bucketAuto .aggregate() query
+ /opt/mongodb-linux-x86_64-3.4.0/bin/mongo --port=27017 '--eval=db.tweets.aggregate([ { $bucketAuto: { groupBy: "$user.location", buckets: 1, output: { count: { $sum: 1 }, location: { $addToSet: "$user.location" } } } } ])' test
MongoDB shell version v3.4.0
connecting to: mongodb://127.0.0.1:27017/test
MongoDB server version: 3.4.0
2016-12-13T12:59:10.066+0100 E QUERY [main] Error: error doing query: failed: network error while attempting to run command 'aggregate' on host '127.0.0.1:27017' :
DB.prototype.runCommand@src/mongo/shell/db.js:132:1
DB.prototype.runReadCommand@src/mongo/shell/db.js:109:16
DBCollection.prototype._dbReadCommand@src/mongo/shell/collection.js:183:12
DBCollection.prototype.aggregate/doAgg<@src/mongo/shell/collection.js:1298:30
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1301:15
@(shell eval):1:1
+ sleep 1
++ tail -1 mongod.log
+ '[' '----- END BACKTRACE -----' = '----- END BACKTRACE -----' ']'
+ echo '### Crashed mongod 3.4.0!'
### Crashed mongod 3.4.0!

As you can see above, a full server crash occurred in my test when using $bucketAuto with $addToSet accumulators. The “network error” is caused by the MongoDB shell losing connection to the now-crashed server.

The mongod log file reports the following lines before the crash (and backtrace):

2016-12-13T12:59:10.048+0100 F - [conn2] Invalid operation at address: 0x7f1d43ba990a
2016-12-13T12:59:10.061+0100 F - [conn2] Got signal: 8 (Floating point exception).
0x7f1d443e0f91 0x7f1d443e0089 0x7f1d443e06f6 0x7f1d42153100 0x7f1d43ba990a 0x7f1d43ba91df 0x7f1d43bc8d2e 0x7f1d43bcae3a 0x7f1d43bce255 0x7f1d43ca4492 0x7f1d43a3b0a5 0x7f1d43a3b29c 0x7f1d43a3b893 0x7f1d43d3c31a 0x7f1d43d3cc3b 0x7f1d4398447b 0x7f1d439859a9 0x7f1d438feb2b 0x7f1d438ffd70 0x7f1d43f12afd 0x7f1d43b1c54d 0x7f1d4371082d 0x7f1d4371116d 0x7f1d4435ec22 0x7f1d4214bdc5 0x7f1d41e78ced

This has been reported as the ticket SERVER-27395, and exists in MongoDB 3.4.0. Please see the ticket for more details, updates and a full issue reproduction: https://jira.mongodb.org/browse/SERVER-27395. If this issue is important to you, please vote for this issue at the ticket URL.

This highlights the importance of testing new features with your exact application usage pattern, especially during a major version release such as MongoDB 3.4.0. With all the new exciting ways one can aggregate data in MongoDB 3.4.0, and the infinite ways to stitch those features together in a pipeline, there are bound to be some cases where the code needs improvement.

Nonetheless, I am very excited to see the addition of these powerful new features and I look forward to them maturing.

Links

  1. https://docs.mongodb.com/manual/reference/operator/aggregation/facet/
  2. https://docs.mongodb.com/manual/reference/operator/aggregation/bucket/
  3. https://docs.mongodb.com/manual/reference/operator/aggregation/bucketAuto/
  4. https://docs.mongodb.com/manual/release-notes/3.4/#aggregation
  5. https://docs.mongodb.com/v3.4/core/aggregation-pipeline/
  6. https://en.wikipedia.org/wiki/Faceted_search
  7. https://jira.mongodb.org/browse/SERVER-27395
Categories: MySQL

Database Solutions Engineer FAQs

MySQL Performance Blog - Mon, 2016-12-12 23:32

In this blog series, I will discuss common questions I receive as a database Solutions Engineer at Percona. In this role, I speak with a wide array of MySQL and MongoDB users responsible for both extremely large and complex environments to smaller single-server environments. Typically we are contacted when the customer is about to embark on an architecture migration or redesign, or they have performance issues in their production environment. The purpose of this blog is to put together a list of common questions I field while speaking with active MySQL and MongoDB users.

We are considering a migration to AWS. What solution is right for us: EC2, RDS, or Aurora?

We get this question a lot. Moving to AWS is a hot trend. Fellow Solution Engineer Rick Golba wrote a blog post dedicated to the specifics of each of these Amazon offerings, as well as the freedom you give up moving down the tiers. This is the primary concern when considering these cloud-based solutions. With Aurora, you give up a large amount of control of your database environment. With an EC2 deployment, you can keep most of it. However, there are other considerations to make.

Datasize

The largest benefit to choosing one of these Amazon offerings is reducing the cost associated with managing a physical database environment. This does not eliminate the necessary task of right-sizing your environment. Doing so can make a huge difference in the yearly costs associated with acquiring a large Amazon instance. This can also open up options when it comes to choosing between EC2, RDS, and Aurora, as there are certain limitations and restrictions with regard to table size and total data size. Here is a quick reference:

  • Amazon RDS – 6 TB*
  • Aurora – 64 TB**
  • EC2 – Depends***

* Max table size from Amazon’s documentation.

** Max size of Aurora cluster volume.

*** There are too many options to list one.

There are several strategies when it comes to right-sizing your environment. The first and easiest is to archive old, unused data. Percona Toolkit offers a tool called pt-archiver that can assist with this process. It allows you to archive unused MySQL rows into other tables or into a file; the documentation for pt-archiver is here, and a minimal example follows below. Another strategy used by large organizations is to employ different databases for different tasks. The advantage of this strategy is that you can use the right database for a specific use case. The disadvantage is the overhead of having experts to manage each of these varying database types and instances, which requires a significant amount of engineering effort and is not suitable for smaller deployments.
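Here is a minimal sketch of such an archiving run, wrapped in Python only for illustration (the database, table, retention window, and file path are hypothetical; the same command can of course be run directly from a shell):

import subprocess

# Archive transactions older than two years to a flat file, deleting them
# from the source table in small, committed batches.
subprocess.run(
    [
        "pt-archiver",
        "--source", "h=localhost,D=shop,t=transactions",
        "--where", "transaction_date < NOW() - INTERVAL 2 YEAR",
        "--file", "/backups/transactions-archive.txt",
        "--limit", "1000",
        "--commit-each",
    ],
    check=True,
)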

Some people might ask, “Why right-size my environment?” Most of the time, all of that data is not needed in a production database. There is likely data that is never touched taking a significant amount of space. When you lower your datasize, more Amazon options become possible. In addition to this, the operational tasks associated with managing your database environment become easier. If you’ve managed to turn a bloated table into a more manageable one, you might see increased performance as well. This reduces costs when it comes to a cloud migration.

Compatibility

Amazon is compatible with most MySQL deployments, but there are some exceptions. Amazon Aurora is currently compatible with MySQL 5.6. If you are interested in MySQL 5.7 features such as storing data with the JSON datatype, then Aurora might not be the right option. For a full list of MySQL 5.7 features, see the MySQL documentation. Amazon RDS and EC2 are both compatible with MySQL 5.7. One limitation of RDS is that it is not compatible with MongoDB. Amazon does offer its own cloud-hosted NoSQL solution called DynamoDB, but migration is not as seamless as it is with Amazon’s MySQL offerings. The best option for migrating to the cloud with MongoDB is an EC2 instance.

Summary

Percona has assisted with Amazon optimizations and migrations for many customers through our consulting services. Our architects have in-depth knowledge of high-performing MySQL deployments in the cloud and can assist with both your design and implementation/migration. One example of this success is Wattpad. Through performance optimizations recommended by Percona, Wattpad was able to reduce the size of their Amazon instance and save money over the course of the year.

Can we replace our enterprise monitoring solution with Percona Monitoring and Management (PMM)?

As with most answers in the database world, the short answer is “it depends.” Percona Monitoring and Management (PMM) offers a robust array of monitoring features for your database environment and is perfectly capable of replacing certain features of enterprise-grade MySQL and MongoDB monitoring platforms. Here is a short list of what PMM brings to the table:

  • Free and Open Source. Our CEO Peter Zaitsev is dedicated to keeping this true. PMM uses existing open-source elements and integrates some of Percona’s own plugins to form a complete, robust monitoring solution.
  • MongoDB Integration. If you have both MySQL and MongoDB deployments, you can see metrics and query analysis for both in one place.
  • Remotely Monitor MySQL in the Cloud. PMM is compatible with RDS.
  • Visual Query Analysis. Quickly identify problem queries.
  • Find and Fix Expensive Queries. Analyze expensive queries without needing scripts or command line tools.
  • InnoDB Monitoring. Get in-depth stats on InnoDB metrics.
  • Disk Monitoring. Be aware of system level metrics in addition to MySQL and MongoDB metrics.
  • Cluster Monitor. The recent addition of Orchestrator to PMM added this functionality.
  • Replication Dashboard. Orchestrator can also show the status of replication in an intuitive GUI.

If the list above satisfies your monitoring needs, then you should definitely be using PMM. Our development team is actively working to enhance this product and appreciates input from the community using this solution. The PMM forums are a great place to ask questions or offer feedback/suggestions.

Is moving to a synchronous replication a solution for us?

At first glance, a synchronous replication solution seems to solve all of the limitations that come with a standard MySQL deployment. It brings with it loads of great features, like high availability, multi-master nodes that are each capable of handling writes, and read scaling. However, there are several things to consider when answering this question.

Will a simpler solution meet your needs?

One of Percona’s Technical Account Managers, Michael Patrick, wrote a fantastic blog concerning choosing an HA solution. Typically the reason for moving to a clustered solution is for high-availability. If you’ve been bit by downtime due to a failed master and a slow transition to a slave, moving to a cluster could be a knee-jerk reaction. However, solutions like MHA or MySQL Orchestrator might ease these pains sufficiently while adding little complexity to the environment.

Is your application and database design compatible with a clustered solution?

You must make some application-based considerations when moving to a clustered solution. One consideration is the storage engine limitations that come with clustered solutions: Percona XtraDB Cluster and MariaDB Cluster both require InnoDB, while MySQL Cluster requires the NDB storage engine. Committing to a clustered solution makes other storage engine options unavailable.

Another application consideration is how clustered solutions handle synchronous write-set replication. If your application has write hot spots, deadlocks will occur when simultaneous write transactions hit the same rows. There are ways of dealing with this, such as re-engineering the database structure to remove the hot spot or allowing the application layer to retry these transactions. If neither of these is an option, a clustered solution might not fit your environment.

Is your database spread across multiple geographic regions?

You can deploy cluster solutions across WAN environments, but they come with latency issues. If your application can endure the longer flight times caused by a cluster being spread across multiple geographic regions, this will not be a problem. If the delay is not tolerable, however, a WAN cluster might not be the right solution. There are multiple strategies for alleviating this pain point when deploying a cluster across WAN environments; a webinar given by Percona XtraDB Cluster's Lead Software Engineer, Krunal Bauskar, covers this topic. One example is asynchronous replication between geographic regions, with a cluster in each. The benefit is that the cluster in each geographic region eliminates the WAN latency delay; the downside is the addition of many more nodes (likely three for each data center) and a more complicated environment.

Closing Thoughts

I plan to continue this blog series with more frequently asked questions that I receive when talking to MySQL and MongoDB users. If you would like to speak with an account representative (or me!) to see how Percona can help you meet your database performance needs, feel free to reach out.

Categories: MySQL