MySQL

Group Replication: Shipped Too Early

MySQL Performance Blog - Thu, 2017-02-16 00:02

This blog post is my overview of Group Replication technology.

With Oracle clearly entering the “open source high availability solutions” arena with the release of their brand new Group Replication solution, I believe it is time to review the quality of the first GA (production ready) release.

TL;DR: Having examined the technology, it is my conclusion that Oracle seems to have released the GA version of Group Replication too early. While the product is definitely “working prototype” quality, the release seems rushed and unfinished. I found a significant number of issues, and I would personally not recommend it for production use.

It is obvious that Oracle is trying hard to ship technology to compete with Percona XtraDB Cluster, which is probably why they rushed to claim Group Replication GA quality.

If you’d like to follow along and test Group Replication yourself, you can simplify the initial setup by using this Docker image. Below, I review some of the issues you might face.

For the record, I tested the version based on MySQL 5.7.17 release.

No automatic provisioning

First off, you’ll find there is NO way to automatically set up a new node.

If you need to set up a new node or recover an existing node from a fatal failure, you’ll need to manually provision the slave.

Of course, you can clone a slave using Percona XtraBackup or LVM by employing some self-developed scripts. But given the high availability nature of the product, one would expect Group Replication to automatically re-provision any failed node.

Bug: stale reads on nodes

Please see this bug:

One line summary: while secondary nodes are “catching up” to whatever happened on the first node (it takes time to apply changes on secondary nodes), reads on a secondary node can return stale data (as shown in the bug report).

This behavior brings us back to the traditional asynchronous replication slave behavior (i.e., Group Replication’s predecessor).

It also contradicts the Group Replication documentation, which states: “There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time.” (See https://dev.mysql.com/doc/refman/5.7/en/group-replication.html.)

I might also mention here that Percona XtraDB Cluster prevents stale reads (see https://www.percona.com/doc/percona-xtradb-cluster/5.7/wsrep-system-index.html#wsrep_sync_wait).
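For comparison, below is a minimal sketch of how causality checks are enabled on a Percona XtraDB Cluster 5.7 node; the table name is hypothetical and only illustrates the effect:

-- Make reads in this session wait until the node has applied all
-- preceding cluster transactions (value 1 covers READ statements):
SET SESSION wsrep_sync_wait = 1;
SELECT COUNT(*) FROM mydb.orders;  -- hypothetical table; this read now reflects all prior cluster writes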

Bug: nodes become unusable after a big transaction, refusing to execute further transactions

There are two related bugs:

One line summary: after running a big transaction, any secondary nodes become unusable and refuse to perform any further transactions.

Obscure error messages

It is not uncommon to see cryptic error messages while testing Group Replication. For example:

mysql> commit; ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.

This is fairly useless and provides little help until I check the mysqld error log. The log provides a little bit more information:

2017-02-09T02:05:36.996776Z 18 [ERROR] Plugin group_replication reported: '[GCS] Gcs_packet's payload is too big. Only the packets smaller than 2113929216 bytes can be compressed.'

Discussion:

The items highlighted above might not seem too bad at first, and you could assume that your workload won’t be affected. However, stale reads and node dysfunctions basically prevent me from running a more comprehensive evaluation.

My recommendation:

If you care about your data, then I recommend not using Group Replication in production. Currently, it looks like it might cause plenty of headaches, and it is easy to get inconsistent results.

For the moment, Group Replication appears to be an advanced – but broken – form of traditional MySQL asynchronous replication.

I understand Oracle’s dilemma. Usually people are hesitant to test a product that is not GA. So in order to get feedback from users, Oracle needs to push the product to GA. Oracle must absolutely solve the issues above during future QA cycles.

Categories: MySQL

Docker Images for Percona Server for MySQL Group Replication

MySQL Performance Blog - Wed, 2017-02-15 16:27

In this blog post, we’ll point to a new Docker image for Percona Server for MySQL Group Replication.

Our most recent release of Percona Server for MySQL (Percona Server for MySQL 5.7.17) comes with Group Replication plugins. Unfortunately, since this technology is very new, it requires some fairly complicated steps to set up and get running. To help with that process, I’ve prepared Docker images that simplify the setup procedure.

You can find the image here: https://hub.docker.com/r/perconalab/pgr-57/.

To start the first node (bootstrap the group):

docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 perconalab/pgr-57

To add nodes into the group after:

docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 -e CLUSTER_JOIN=CONTAINER_ID_FROM_THE_FIRST_STEP perconalab/pgr-57

You can also get a full script that starts “N” number of nodes, here: https://github.com/Percona-Lab/percona-docker/blob/master/pgr-57/start_node.sh

 

Categories: MySQL

Percona Live Open Source Database Conference 2017 Crash Courses: MySQL and MongoDB!

MySQL Performance Blog - Tue, 2017-02-14 20:23

The Percona Live Open Source Database Conference 2017 will once again host crash courses on MySQL and MongoDB. Read below to get an outstanding discount on either the MySQL or MongoDB crash course (or both).

The database community constantly tells us how hard it is to find someone with MySQL and MongoDB DBA skills who can help with the day-to-day management of their databases. This is especially difficult when companies don’t have a full-time requirement for a DBA. Developers, system administrators and IT staff spend too much time trying to solve basic database problems that keep them from doing their other job duties. Eventually, the little problems or performance inefficiencies that start to pile up lead to big problems.

In answer to this growing need, Percona Live is once again hosting crash courses for developers, systems administrators and other technical resources. A crash course is a one-day training session on either MySQL 101 or MongoDB 101.

Don’t let the name fool you: these courses are led by Percona database experts who will show you the fundamentals of MySQL or MongoDB tools and techniques.

And it’s not just for DBAs: developers are encouraged to attend to hone their database skills.

Below is a list of the topics covered in each course this year:

MySQL 101 Topics MongoDB 101 Topics

Attendees will return ready to quickly and correctly take care of the day-to-day and week-to-week management of their MySQL or MongoDB environment.

The schedule and cost for the 101 courses (without a full-conference pass) are:

  • MySQL 101: Tuesday, April 25 ($400)
  • MongoDB 101: Wednesday, April 26 ($400)
  • Both MySQL and MongoDB 101 sessions ($700)

(Tickets to the 101 sessions do not grant access to the main Percona Live breakout sessions. Full Percona Live conference passes grant admission to the 101 sessions. 101 Crash Course attendees will have full access to Percona Live keynote speakers, the exhibit hall and receptions.)

As a special promo, the first 101 people to purchase the single 101 talks receive a $299.00 discount off the ticket price! Each session only costs $101! Get both sessions for a mere $202 and save $498.00! Register now using the following codes for your discount:

  • 101: $299 off of either the MySQL or MongoDB tickets
  • 202: $498 off of the combined MySQL/MongoDB ticket

Click here to register.

Register for Percona Live 2017 now! Advanced Registration lasts until March 5, 2017. Percona Live is a very popular conference: this year’s Percona Live Europe sold out, and we’re looking to do the same for Percona Live 2017. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.

Percona Live 2017 sponsorship opportunities are available now. Click here to find out how to sponsor.

Categories: MySQL

Percona Server for MongoDB 3.4 Product Bundle Release is a Valentine to the Open Source Community

MySQL Performance Blog - Tue, 2017-02-14 15:46

Percona today announced a Percona Server for MongoDB 3.4 solution bundle of updated products. This release enables any organization to create a robust, secure database environment that can be adapted to changing business requirements.

Percona Server for MongoDB 3.4, Percona Monitoring and Management 1.1, and Percona Toolkit 3.0 offer more features and benefits, with enhancements for both MongoDB® and MySQL® database environments. When these Percona products are used together, organizations gain all the cost and agility benefits provided by free, proven open source software that delivers all the latest MongoDB Community Edition 3.4 features, additional Enterprise features, and a greater choice of storage engines. Along with improved insight into the database environment, the solution provides enhanced control options for optimizing a wider range of database workloads with greater reliability and security.

The solution will be generally available the week of Feb. 20.

New Features and Benefits Summary

Percona Server for MongoDB 3.4

  • All the features of MongoDB Community Edition 3.4, which provides an open source, fully compatible, drop-in replacement
  • Integrated, pluggable authentication with LDAP to provide a centralized enterprise authentication service
  • Open-source auditing for visibility into user and process actions in the database, with the ability to redact sensitive information (such as usernames and IP addresses) from log files
  • Hot backups for the WiredTiger engine protect against data loss in the case of a crash or disaster, without impacting performance
  • Two storage engine options not supported by MongoDB Community Edition 3.4:

    • MongoRocks, the RocksDB-powered storage engine, designed for demanding, high-volume data workloads such as in IoT applications, on-premises or in the cloud
    • Percona Memory Engine is ideal for in-memory computing and other applications demanding very low latency workloads

Percona Monitoring and Management 1.1

  • Support for MongoDB and Percona Server for MongoDB
  • Graphical dashboard information for WiredTiger, MongoRocks and Percona Memory Engine

Percona Toolkit 3.0

  • Two new tools for MongoDB:
    • pt-mongodb-summary (the equivalent of pt-mysql-summary) provides a quick, at-a-glance overview of a MongoDB and Percona Server for MongoDB instance.
    • pt-mongodb-query-digest (the equivalent of pt-query-digest for MySQL) offers a query review for troubleshooting.

For more information, see Percona’s press release.

Categories: MySQL

ClickHouse: New Open Source Columnar Database

MySQL Performance Blog - Mon, 2017-02-13 23:24

For this blog post, I’ve decided to try ClickHouse: an open source column-oriented database management system developed by Yandex (it currently powers Yandex.Metrica, the world’s second-largest web analytics platform).

In my previous set of posts, I tested Apache Spark for big data analysis and used Wikipedia page statistics as a data source. I’ve used the same data as in the Apache Spark blog post: Wikipedia Page Counts. This allows me to compare ClickHouse’s performance to Spark’s.

I’ve spent some time testing ClickHouse for relatively large volumes of data (1.2TB uncompressed). Here is a list of ClickHouse advantages and disadvantages that I saw:

ClickHouse advantages

  • Parallel processing for single query (utilizing multiple cores)
  • Distributed processing on multiple servers
  • Very fast scans (see benchmarks below) that can be used for real-time queries
  • Column storage is great for working with “wide” / “denormalized” tables (many columns)
  • Good compression
  • SQL support (with limitations)
  • Good set of functions, including support for approximated calculations
  • Different storage engines (disk storage format)
  • Great for structured log/event data as well as time series data (the MergeTree engine requires a date field)
  • Index support (primary key only, not all storage engines)
  • Nice command line interface with user-friendly progress bar and formatting

Here is a full list of ClickHouse features

ClickHouse disadvantages

  • No real delete/update support, and no transactions (same as Spark and most of the big data systems)
  • No secondary keys (same as Spark and most of the big data systems)
  • Own protocol (no MySQL protocol support)
  • Limited SQL support, and the joins implementation is different. If you are migrating from MySQL or Spark, you will probably have to re-write all queries with joins.
  • No window functions

Full list of ClickHouse limitations

Group by: in-memory vs. on-disk

Running out of memory is one of the potential problems you may encounter when working with large datasets in ClickHouse:

SELECT min(toMonth(date)), max(toMonth(date)), path, count(*), sum(hits), sum(hits) / count(*) AS hit_ratio FROM wikistat WHERE (project = 'en') GROUP BY path ORDER BY hit_ratio DESC LIMIT 10 ↖ Progress: 1.83 billion rows, 85.31 GB (68.80 million rows/s., 3.21 GB/s.) ██████████▋ 6%Received exception from server: Code: 241. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 1048576 bytes), maximum: 9.31 GiB: (while reading column hits):

By default, ClickHouse limits the amount of memory for group by (it uses a hash table for group by). This is easily fixed – if you have free memory, increase this parameter:

SET max_memory_usage = 128000000000; #128G

If you don’t have that much memory available, ClickHouse can “spill” data to disk by setting this:

set max_bytes_before_external_group_by=20000000000; #20G
set max_memory_usage=40000000000; #40G

According to the documentation, if you need to use max_bytes_before_external_group_by it is recommended to set max_memory_usage to be ~2x of the size of max_bytes_before_external_group_by.

(The reason for this is that the aggregation is performed in two phases: (1) reading and building intermediate data, and (2) merging the intermediate data. The spill to disk can only happen during the first phase. If there is no spill, ClickHouse may need the same amount of RAM for stages 1 and 2.)

Benchmarks: ClickHouse vs. Spark

Both ClickHouse and Spark can be distributed. However, for the purpose of this test I’ve run a single node for both ClickHouse and Spark. The results are quite impressive.

Benchmark summary

 Size / compression           Spark v. 2.0.2                ClickHouse
 Data storage format          Parquet, compressed: snappy   Internal storage, compressed
 Size (uncompressed: 1.2TB)   395G                          212G

 

 Test                          Spark v. 2.0.2            ClickHouse   Diff
 Query 1: count (warm)         7.37 sec (no disk IO)     6.61 sec     ~same
 Query 2: simple group (warm)  792.55 sec (no disk IO)   37.45 sec    21x better
 Query 3: complex group by     2522.9 sec                398.55 sec   6.3x better

 

ClickHouse vs. MySQL

I wanted to see how ClickHouse compared to MySQL. Obviously, we can’t compare some workloads. For example:

  • Storing terabytes of data and querying (“crunching” would be a better word here) data without an index. It would take weeks (or even months) to load data and build the indexes. That is a much more suitable workload for ClickHouse or Spark.
  • Real-time updates / OLTP. ClickHouse does not support real-time updates / deletes.

Usually, big data systems do not provide us with real-time queries. Systems based on map/reduce (i.e., Hive on top of HDFS) are just too slow for real-time queries, as it takes a long time to initialize the map/reduce job and send the code to all nodes.

Potentially, you can use ClickHouse for real-time queries. It does not support secondary indexes, however. This means it will probably scan lots of rows, but it can do it very quickly.

To do this test, I’m using the data from the Percona Monitoring and Management system. The table I’m using has 150 columns, so it is good for column storage. The size in MySQL is ~250G:

mysql> show table status like 'query_class_metrics'\G
*************************** 1. row ***************************
           Name: query_class_metrics
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 364184844
 Avg_row_length: 599
    Data_length: 218191888384
Max_data_length: 0
   Index_length: 18590056448
      Data_free: 6291456
 Auto_increment: 416994305

Scanning the whole table is significantly faster in ClickHouse. Retrieving just ten rows by key is faster in MySQL (especially from memory).

But what if we only need to scan a limited number of rows and do a group by? In this case, ClickHouse may be faster. Here is an example (a real query used to create sparklines):

MySQL

SELECT (1480888800 - UNIX_TIMESTAMP(start_ts)) / 11520 as point, FROM_UNIXTIME(1480888800 - (SELECT point) * 11520) AS ts, COALESCE(SUM(query_count), 0) / 11520 AS query_count_per_sec, COALESCE(SUM(Query_time_sum), 0) / 11520 AS query_time_sum_per_sec, COALESCE(SUM(Lock_time_sum), 0) / 11520 AS lock_time_sum_per_sec, COALESCE(SUM(Rows_sent_sum), 0) / 11520 AS rows_sent_sum_per_sec, COALESCE(SUM(Rows_examined_sum), 0) / 11520 AS rows_examined_sum_per_sec FROM query_class_metrics WHERE query_class_id = 7 AND instance_id = 1259 AND (start_ts >= '2014-11-27 00:00:00' AND start_ts < '2014-12-05 00:00:00') GROUP BY point; ... 61 rows in set (0.10 sec) # Query_time: 0.101203 Lock_time: 0.000407 Rows_sent: 61 Rows_examined: 11639 Rows_affected: 0 explain SELECT ... *************************** 1. row *************************** id: 1 select_type: PRIMARY table: query_class_metrics partitions: NULL type: range possible_keys: agent_class_ts,agent_ts key: agent_class_ts key_len: 12 ref: NULL rows: 21686 filtered: 100.00 Extra: Using index condition; Using temporary; Using filesort *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: NULL partitions: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: NULL Extra: No tables used 2 rows in set, 2 warnings (0.00 sec)

It is relatively fast.

ClickHouse (some functions are different, so we will have to rewrite the query):

SELECT intDiv(1480888800 - toRelativeSecondNum(start_ts), 11520) AS point, toDateTime(1480888800 - (point * 11520)) AS ts, SUM(query_count) / 11520 AS query_count_per_sec, SUM(Query_time_sum) / 11520 AS query_time_sum_per_sec, SUM(Lock_time_sum) / 11520 AS lock_time_sum_per_sec, SUM(Rows_sent_sum) / 11520 AS rows_sent_sum_per_sec, SUM(Rows_examined_sum) / 11520 AS rows_examined_sum_per_sec, SUM(Rows_affected_sum) / 11520 AS rows_affected_sum_per_sec FROM query_class_metrics WHERE (query_class_id = 7) AND (instance_id = 1259) AND ((start_ts >= '2014-11-27 00:00:00') AND (start_ts < '2014-12-05 00:00:00')) GROUP BY point; 61 rows in set. Elapsed: 0.017 sec. Processed 270.34 thousand rows, 14.06 MB (15.73 million rows/s., 817.98 MB/s.)

As we can see, even though ClickHouse scans more rows (270K vs. 11K – over 20x more) it is faster to execute the ClickHouse query (0.10 seconds in MySQL compared to 0.01 second in ClickHouse). The column store format helps a lot here, as MySQL has to read all 150 columns (stored inside InnoDB pages) and ClickHouse only needs to read seven columns.

Wikipedia trending article of the month

Inspired by the article about finding trending topics using Google Books n-grams data, I decided to implement the same algorithm on top of the Wikipedia page visit statistics data. My goal here is to find the “article trending this month,” which has significantly more visits this month compared to the previous month. As I was implementing the algorithm, I came across another ClickHouse limitation: join syntax is limited. In ClickHouse, you can only do join with the “using” keyword. This means that the fields you’re joining need to have the same name. If the field name is different, we have to use a subquery.

Below is an example.

First, create a temporary table to aggregate the visits per month per page:

CREATE TABLE wikistat_by_month ENGINE = Memory AS SELECT path, mon, sum(hits) / total_hits AS ratio FROM ( SELECT path, hits, toMonth(date) AS mon FROM wikistat WHERE (project = 'en') AND (lower(path) NOT LIKE '%special%') AND (lower(path) NOT LIKE '%page%') AND (lower(path) NOT LIKE '%test%') AND (lower(path) NOT LIKE '%wiki%') AND (lower(path) NOT LIKE '%index.html%') ) AS a ANY INNER JOIN ( SELECT toMonth(date) AS mon, sum(hits) AS total_hits FROM wikistat WHERE (project = 'en') AND (lower(path) NOT LIKE '%special%') AND (lower(path) NOT LIKE '%page%') AND (lower(path) NOT LIKE '%test%') AND (lower(path) NOT LIKE '%wiki%') AND (lower(path) NOT LIKE '%index.html%') GROUP BY toMonth(date) ) AS b USING (mon) GROUP BY path, mon, total_hits ORDER BY ratio DESC Ok. 0 rows in set. Elapsed: 543.607 sec. Processed 53.77 billion rows, 2.57 TB (98.91 million rows/s., 4.73 GB/s.)

Second, calculate the actual list:

SELECT path, mon + 1, a_ratio AS ratio, a_ratio / b_ratio AS increase FROM ( SELECT path, mon, ratio AS a_ratio FROM wikistat_by_month WHERE ratio > 0.0001 ) AS a ALL INNER JOIN ( SELECT path, CAST((mon - 1) AS UInt8) AS mon, ratio AS b_ratio FROM wikistat_by_month WHERE ratio > 0.0001 ) AS b USING (path, mon) WHERE (mon > 0) AND (increase > 2) ORDER BY mon ASC, increase DESC LIMIT 100 ┌─path───────────────────────────────────────────────┬─plus(mon, 1)─┬──────────────────ratio─┬───────────increase─┐ │ Heath_Ledger │ 2 │ 0.0008467223172121601 │ 6.853825241458039 │ │ Cloverfield │ 2 │ 0.0009372609760313347 │ 3.758937474560766 │ │ The_Dark_Knight_(film) │ 2 │ 0.0003508532447770276 │ 2.8858100355450484 │ │ Scientology │ 2 │ 0.0003300109101992719 │ 2.52497180013816 │ │ Barack_Obama │ 3 │ 0.0005786473399980557 │ 2.323409928527576 │ │ Canine_reproduction │ 3 │ 0.0004836300843539438 │ 2.0058985801174662 │ │ Iron_Man │ 6 │ 0.00036261003907049 │ 3.5301196568303888 │ │ Iron_Man_(film) │ 6 │ 0.00035634745198422497 │ 3.3815325090507193 │ │ Grand_Theft_Auto_IV │ 6 │ 0.0004036713142943461 │ 3.2112732008504885 │ │ Indiana_Jones_and_the_Kingdom_of_the_Crystal_Skull │ 6 │ 0.0002856570195547951 │ 2.683443198030021 │ │ Tha_Carter_III │ 7 │ 0.00033954377342889735 │ 2.820114216429247 │ │ EBay │ 7 │ 0.0006575000133427979 │ 2.5483158977946787 │ │ Bebo │ 7 │ 0.0003958340022793501 │ 2.3260912792668162 │ │ Facebook │ 7 │ 0.001683658379576915 │ 2.16460972864883 │ │ Yahoo!_Mail │ 7 │ 0.0002190640575012259 │ 2.1075879062784737 │ │ MySpace │ 7 │ 0.001395608643577507 │ 2.103263660621813 │ │ Gmail │ 7 │ 0.0005449834079575953 │ 2.0675919337716757 │ │ Hotmail │ 7 │ 0.0009126863121737026 │ 2.052471735190232 │ │ Google │ 7 │ 0.000601645849087389 │ 2.0155448612416644 │ │ Barack_Obama │ 7 │ 0.00027336526076130943 │ 2.0031305241832302 │ │ Facebook │ 8 │ 0.0007778115183044431 │ 2.543477658022576 │ │ MySpace │ 8 │ 0.000663544314346641 │ 2.534512981232934 │ │ Two-Face │ 8 │ 0.00026975137404447024 │ 2.4171743959768803 │ │ YouTube │ 8 │ 0.001482456447101451 │ 2.3884527929836152 │ │ Hotmail │ 8 │ 0.00044467667764940547 │ 2.2265750216262954 │ │ The_Dark_Knight_(film) │ 8 │ 0.0010482536106662156 │ 2.190078096294301 │ │ Google │ 8 │ 0.0002985028319919154 │ 2.0028812075734637 │ │ Joe_Biden │ 9 │ 0.00045067411455437264 │ 2.692262662620829 │ │ The_Dark_Knight_(film) │ 9 │ 0.00047863754833213585 │ 2.420864550676665 │ │ Sarah_Palin │ 10 │ 0.0012459220318907518 │ 2.607063205782761 │ │ Barack_Obama │ 12 │ 0.0034487235202817087 │ 15.615409029600414 │ │ George_W._Bush │ 12 │ 0.00042708730873936023 │ 3.6303098900144937 │ │ Fallout_3 │ 12 │ 0.0003568429236849597 │ 2.6193094036745155 │ └────────────────────────────────────────────────────┴──────────────┴────────────────────────┴────────────────────┘ 34 rows in set. Elapsed: 1.062 sec. Processed 1.22 billion rows, 49.03 GB (1.14 billion rows/s., 46.16 GB/s.)

The response times are really good, considering the amount of data ClickHouse needed to scan (the first query scanned 2.57 TB of data).

Conclusion

The ClickHouse column-oriented database looks promising for data analytics, as well as for storing and processing structured event data and time series data. ClickHouse can be ~10x faster than Spark for some workloads.

Appendix: Benchmark details

Hardware

  • CPU: 24xIntel(R) Xeon(R) CPU L5639 @ 2.13GHz (physical = 2, cores = 12, virtual = 24, hyperthreading = yes)
  • Disk: 2 consumer-grade SSDs in software RAID 0 (mdraid)

Query 1

select count(*) from wikistat

ClickHouse:

:) select count(*) from wikistat; SELECT count(*) FROM wikistat ┌─────count()─┐ │ 26935251789 │ └─────────────┘ 1 rows in set. Elapsed: 6.610 sec. Processed 26.88 billion rows, 53.77 GB (4.07 billion rows/s., 8.13 GB/s.)

Spark:

spark-sql> select count(*) from wikistat; 26935251789 Time taken: 7.369 seconds, Fetched 1 row(s)

Query 2

select count(*), month(dt) as mon from wikistat where year(dt)=2008 and month(dt) between 1 and 10 group by month(dt) order by month(dt);

ClickHouse:

:) select count(*), toMonth(date) as mon from wikistat where toYear(date)=2008 and toMonth(date) between 1 and 10 group by mon; SELECT count(*), toMonth(date) AS mon FROM wikistat WHERE (toYear(date) = 2008) AND ((toMonth(date) >= 1) AND (toMonth(date) <= 10)) GROUP BY mon ┌────count()─┬─mon─┐ │ 2100162604 │ 1 │ │ 1969757069 │ 2 │ │ 2081371530 │ 3 │ │ 2156878512 │ 4 │ │ 2476890621 │ 5 │ │ 2526662896 │ 6 │ │ 2489723244 │ 7 │ │ 2480356358 │ 8 │ │ 2522746544 │ 9 │ │ 2614372352 │ 10 │ └────────────┴─────┘ 10 rows in set. Elapsed: 37.450 sec. Processed 23.37 billion rows, 46.74 GB (623.97 million rows/s., 1.25 GB/s.)

Spark:

spark-sql> select count(*), month(dt) as mon from wikistat where year(dt)=2008 and month(dt) between 1 and 10 group by month(dt) order by month(dt); 2100162604 1 1969757069 2 2081371530 3 2156878512 4 2476890621 5 2526662896 6 2489723244 7 2480356358 8 2522746544 9 2614372352 10 Time taken: 792.552 seconds, Fetched 10 row(s)

Query 3

SELECT path, count(*), sum(hits) AS sum_hits, round(sum(hits) / count(*), 2) AS hit_ratio FROM wikistat WHERE project = 'en' GROUP BY path ORDER BY sum_hits DESC LIMIT 100;

ClickHouse:

:) SELECT :-] path, :-] count(*), :-] sum(hits) AS sum_hits, :-] round(sum(hits) / count(*), 2) AS hit_ratio :-] FROM wikistat :-] WHERE (project = 'en') :-] GROUP BY path :-] ORDER BY sum_hits DESC :-] LIMIT 100; SELECT path, count(*), sum(hits) AS sum_hits, round(sum(hits) / count(*), 2) AS hit_ratio FROM wikistat WHERE project = 'en' GROUP BY path ORDER BY sum_hits DESC LIMIT 100 ┌─path────────────────────────────────────────────────┬─count()─┬───sum_hits─┬─hit_ratio─┐ │ Special:Search │ 44795 │ 4544605711 │ 101453.41 │ │ Main_Page │ 31930 │ 2115896977 │ 66266.74 │ │ Special:Random │ 30159 │ 533830534 │ 17700.54 │ │ Wiki │ 10237 │ 40488416 │ 3955.11 │ │ Special:Watchlist │ 38206 │ 37200069 │ 973.67 │ │ YouTube │ 9960 │ 34349804 │ 3448.78 │ │ Special:Randompage │ 8085 │ 28959624 │ 3581.9 │ │ Special:AutoLogin │ 34413 │ 24436845 │ 710.11 │ │ Facebook │ 7153 │ 18263353 │ 2553.24 │ │ Wikipedia │ 23732 │ 17848385 │ 752.08 │ │ Barack_Obama │ 13832 │ 16965775 │ 1226.56 │ │ index.html │ 6658 │ 16921583 │ 2541.54 │ … 100 rows in set. Elapsed: 398.550 sec. Processed 26.88 billion rows, 1.24 TB (67.45 million rows/s., 3.10 GB/s.)

Spark:

spark-sql> SELECT > path, > count(*), > sum(hits) AS sum_hits, > round(sum(hits) / count(*), 2) AS hit_ratio > FROM wikistat > WHERE (project = 'en') > GROUP BY path > ORDER BY sum_hits DESC > LIMIT 100; ... Time taken: 2522.903 seconds, Fetched 100 row(s)

 

Categories: MySQL

Percona Blog Poll: What Database Engine Are You Using to Store Time Series Data?

MySQL Performance Blog - Fri, 2017-02-10 15:19

Take Percona’s blog poll on what database engine you are using to store time series data.

Time series data is some of the most actionable data available when it comes to analyzing trends and making predictions. Simply put, time series data is data that is indexed not just by value, but by time as well – allowing you to view value changes over time as they occur. Obvious uses include the stock market, web traffic, user behavior, etc.

With the increasing number of smart devices in the Internet of Things (IoT), being able to track data over time is more and more important. With time series data, you can measure and make predictions on things like energy consumption, pH values, water consumption, data from environment-aware machines like smart cars, etc. The sensors used in IoT devices and systems generate huge amounts of time-series data.

How is all of this data collected, segmented and stored? We’d like to hear from you: what database engine are you using to store time series data? Please take a few seconds and answer the following poll. Which are you using? Help the community learn what database engines help solve critical database issues. Please select from one to three database engines as they apply to your environment. Feel free to add comments below if your engine isn’t listed.

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
Categories: MySQL

Using NVMe Command Line Tools to Check NVMe Flash Health

MySQL Performance Blog - Thu, 2017-02-09 19:50

In this blog post, I’ll look at the types of NVMe flash health information you can get from using the NVMe command line tools.

Checking SATA-based drive health is easy. Whether it’s an SSD or older spinning drive, you can use the smartctl command to get a wealth of information about the device’s performance and health. As an example:

root@blinky:/var/lib/mysql# smartctl -A /dev/sda smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-62-generic] (local build) Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org === START OF READ SMART DATA SECTION === SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       41 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2 171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0 172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0 173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1 174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0 184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0 194 Temperature_Celsius     0x0022   065   059   000    Old_age   Always       -       35 (Min/Max 21/41) 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0 197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0 199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0 202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0 206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0 246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       145599393 247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4550280 248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       582524 180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       1260 210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

While smartctl might not know all vendor-specific smart values, typically you can Google the drive model along with “smart attributes” and find documents like this to get more details.

If you move to newer generation NVMe-based flash storage, smartctl won’t work anymore – at least it doesn’t work for the packages available for Ubuntu 16.04 (what I’m running). It looks like support for NVMe in Smartmontools is coming, and it would be great to get a single tool that supports both  SATA and NVMe flash storage.

In the meantime, you can use the nvme tool available from the nvme-cli package. It provides some basic information for NVMe devices.

To get information about the NVMe devices installed:

root@alex:~# nvme list Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev ---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- -------- /dev/nvme0n1     S3EVNCAHB01861F      Samsung SSD 960 PRO 1TB                  1.2      1         689.63  GB /   1.02  TB    512   B +  0 B   1B6QCXP7

To get SMART information:

root@alex:~# nvme smart-log /dev/nvme0 Smart Log for NVME device:nvme0 namespace-id:ffffffff critical_warning                    : 0 temperature                         : 34 C available_spare                     : 100% available_spare_threshold           : 10% percentage_used                     : 0% data_units_read                     : 3,465,389 data_units_written                  : 9,014,689 host_read_commands                  : 89,719,366 host_write_commands                 : 134,671,295 controller_busy_time                : 310 power_cycles                        : 11 power_on_hours                      : 21 unsafe_shutdowns                    : 8 media_errors                        : 0 num_err_log_entries                 : 1 Warning Temperature Time            : 0 Critical Composite Temperature Time : 0 Temperature Sensor 1                : 34 C Temperature Sensor 2                : 47 C Temperature Sensor 3                : 0 C Temperature Sensor 4                : 0 C Temperature Sensor 5                : 0 C Temperature Sensor 6                : 0 C

To get additional SMART information (not all devices support it):

root@ts140i:/home/pz/workloads/1m# nvme smart-log-add /dev/nvme0 Additional Smart Log for NVME device:nvme0 namespace-id:ffffffff key                               normalized raw program_fail_count              : 100%       0 erase_fail_count                : 100%       0 wear_leveling                   :  62%       min: 1114, max: 1161, avg: 1134 end_to_end_error_detection_count: 100%       0 crc_error_count                 : 100%       0 timed_workload_media_wear       : 100%       37.941% timed_workload_host_reads       : 100%       51% timed_workload_timer            : 100%       446008 min thermal_throttle_status         : 100%       0%, cnt: 0 retry_buffer_overflow_count     : 100%       0 pll_lock_loss_count             : 100%       0 nand_bytes_written              : 100%       sectors: 16185227 host_bytes_written              : 100%       sectors: 6405605

Some of this information is self-explanatory, and some of it isn’t. After looking at the NVMe specification document, here is my read on some of the data:

Available Spare. Contains a normalized percentage (0 to 100%) of the remaining spare capacity that is available.

Available Spare Threshold. When the Available Spare capacity falls below the threshold indicated in this field, an asynchronous event completion can occur. The value is indicated as a normalized percentage (0 to 100%).

(Note: I’m not quite sure what the practical meaning of “asynchronous event completion” is, but it looks like something to avoid!)

Percentage Used. Contains a vendor specific estimate of the percentage of the NVM subsystem life used, based on actual usage and the manufacturer’s prediction of NVM life.

(Note: the number can be more than 100% if you’re using storage for longer than its planned life.)

Data Units Read/Data Units Written. This is the number of 512-byte data units that are read/written, but it is measured in an unusual way: each reported unit corresponds to 1,000 of the 512-byte units, so you can multiply this value by 512,000 to get the value in bytes. It does not include meta-data accesses.

Host Read/Write Commands. The number of commands of the appropriate type issued. Using this value, together with the data units above, you can compute the average IO size for “physical” reads and writes.
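As a worked example, using the values from the smart-log output above, you can convert the raw counters into bytes and average IO sizes. Here is a quick back-of-the-envelope check done with plain arithmetic in the mysql client (any calculator works just as well):

SELECT 3465389 * 512000 / POW(1024, 4) AS tib_read,       -- data_units_read    -> ~1.6 TiB read
       9014689 * 512000 / POW(1024, 4) AS tib_written,    -- data_units_written -> ~4.2 TiB written
       3465389 * 512000 / 89719366  AS avg_read_bytes,    -- ~19.8 KB average "physical" read
       9014689 * 512000 / 134671295 AS avg_write_bytes;   -- ~34.3 KB average "physical" write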

Controller Busy Time. Time in minutes that the controller was busy servicing commands. This can be used to gauge long-term storage load trends.

Unsafe Shutdowns. The number of times a power loss happened without a shutdown notification being sent. Depending on the NVMe device you’re using, an unsafe shutdown might corrupt user data.

Warning Temperature Time/Critical Temperature Time. The time in minutes a device operated above a warning or critical temperature. It should be zeroes.

Wear_Leveling. This shows how much of the rated cell life was used, as well as the min/max/avg write count for different cells. In this case, it looks like the cells are rated for 1800 writes, and about 1100 were used on average.

Timed Workload Media Wear. The media wear by the current “workload.” This device allows you to measure some statistics from the time you reset them (called the “workload”) in addition to showing the device lifetime values.

Timed Workload Host Reads. The percentage of IO operations that were reads (since the workload timer was reset).

Thermal Throttle Status. This shows if the device is throttled due to overheating, and when there were throttling events in the past.

Nand Bytes Written. The bytes written to NAND cells. For this device, the measured unit seems to be in 32MB values. It might be different for other devices.

Host Bytes Written. The bytes written to the NVMe storage from the system. This unit is also in 32MB values. The scale of these values is not very important, as they are most helpful for computing the write amplification of your workload: the ratio of bytes written to NAND to bytes written by the host. For this example, the Write Amplification Factor (WAF) is 16185227 / 6405605 = 2.53.

As you can see, the NVMe command line tools provide a lot of good information for understanding the health and performance of NVMe devices. You don’t need to use vendor specific tools (like isdct).

Categories: MySQL

Percona Server 5.6.35-80.0 is Now Available

MySQL Performance Blog - Wed, 2017-02-08 18:09

Percona announces the release of Percona Server 5.6.35-80.0 on February 8, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.35, and including all the bug fixes in it, Percona Server 5.6.35-80.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.35-80.0 milestone on Launchpad.

New Features:
  • Kill Idle Transactions feature has been re-implemented by setting a connection socket read timeout value instead of periodically scanning the internal InnoDB transaction list. This makes the feature applicable to any transactional storage engine, such as TokuDB and, in the future, MyRocks. This re-implementation also addresses some existing bugs, including server crashes: #1166744, #1179136, #907719, and #1369373.
Bugs Fixed:
  • Logical row counts for TokuDB tables could get inaccurate over time. Bug fixed #1651844 (#732).
  • Repeated execution of SET STATEMENT ... FOR SELECT FROM view could lead to a server crash. Bug fixed #1392375.
  • CREATE TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1539504 (upstream #83003).
  • If temporary tables from CREATE TABLE ... AS SELECT contained compressed attributes it could lead to a server crash. Bug fixed #1633957.
  • Using the per-query variable statement with subquery temporary tables could cause a memory leak. Bug fixed #1635927.
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • A server could crash if a bitmap write I/O error happens in the background log tracking thread while a FLUSH CHANGED_PAGE_BITMAPS is executing concurrently. Bug fixed #1651656.
  • TokuDB was using the wrong function to calculate free space in data files. Bug fixed #1656022 (#1033).
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • InnoDB index dives did not detect some of the concurrent tree changes, which could return bogus estimates. Bug fixed #1625151 (upstream #84366).
  • INFORMATION_SCHEMA.INNODB_CHANGED_PAGES queries would needlessly read potentially incomplete bitmap data past the needed LSN range. Bug fixed #1625466.
  • Percona Server cmake compiler would always attempt to build RocksDB even if -DWITHOUT_ROCKSDB=1 argument was specified. Bug fixed #1638455.
  • Adding COMPRESSED attributes to InnoDB special tables fields (like mysql.innodb_index_stats and mysql.innodb_table_stats) could lead to server crashes. Bug fixed #1640810.
  • Lack of free pages in the buffer pool is not diagnosed with innodb_empty_free_list_algorithm set to backoff (which is the default). Bug fixed #1657026.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for /var/log directory. Bug fixed #1660265. Thanks to Dawid Golunski (https://legalhackers.com).
  • Renaming a TokuDB table to a non-existent database with tokudb_dir_per_db enabled would lead to a server crash. Bug fixed #1030.
  • Read Free Replication optimization could not be used for TokuDB partition tables. Bug fixed #1012.

Other bugs fixed: #1486747 (upstream #76872), #1633988, #1638198 (upstream #82823), #1638897, #1646384, #1647530, #1647741, #1651121, #1156772, #1644569, #1644583, #1648389, #1648737, #1650247, #1650256, #1650324, #1650450, #1655587, and #1647723.

Release notes for Percona Server 5.6.35-80.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Categories: MySQL

MySQL super_read_only Bugs

MySQL Performance Blog - Wed, 2017-02-08 15:23

In this blog post, we describe an issue with MySQL 5.7’s super_read_only feature when used alongside GTID in chained slave instances.

Background

MySQL 5.7.5 and onward introduced the gtid_executed table in the mysql database to store every GTID. This allows slave instances to use the GTID feature regardless of whether the binlog option is set or not. Here is an example of the rows in the gtid_executed table:

mysql> SELECT * FROM mysql.gtid_executed; +--------------------------------------+----------------+--------------+ | source_uuid | interval_start | interval_end | +--------------------------------------+----------------+--------------+ | 00005730-1111-1111-1111-111111111111 | 1 | 1 | | 00005730-1111-1111-1111-111111111111 | 2 | 2 | | 00005730-1111-1111-1111-111111111111 | 3 | 3 | | 00005730-1111-1111-1111-111111111111 | 4 | 4 | | 00005730-1111-1111-1111-111111111111 | 5 | 5 | | 00005730-1111-1111-1111-111111111111 | 6 | 6 | | 00005730-1111-1111-1111-111111111111 | 7 | 7 | | 00005730-1111-1111-1111-111111111111 | 8 | 8 | | 00005730-1111-1111-1111-111111111111 | 9 | 9 | | 00005730-1111-1111-1111-111111111111 | 10 | 10 | ...

To save space, this table needs to be compressed periodically by replacing GTIDs rows with a single row that represents that interval of identifiers. For example, the above GTIDs can be represented with the following row:

mysql> SELECT * FROM mysql.gtid_executed; +--------------------------------------+----------------+--------------+ | source_uuid | interval_start | interval_end | +--------------------------------------+----------------+--------------+ | 00005730-1111-1111-1111-111111111111 | 1 | 10 | ...

On the other hand, we have the super_read_only feature: if this option is set to ON, MySQL won’t allow any updates – even from users that have SUPER privileges. It was first implemented in WebscaleSQL and later ported to Percona Server 5.6. MySQL mainstream code implemented a similar feature in version 5.7.8.
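To illustrate the behavior, here is a minimal sketch (the table name is hypothetical):

SET GLOBAL super_read_only = ON;
UPDATE test.t1 SET c1 = c1 + 1;  -- hypothetical table; fails with ERROR 1290 (HY000),
                                 -- the same --super-read-only error shown later in this post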

The Issue [1]

MySQL’s super_read_only feature won’t allow the compression of the mysql.gtid_executed table. If a high number of transactions run on the master instance, it causes the gtid_executed table to grow to a considerable size. Let’s see an example.

I’m going to use the MySQL Sandbox to quickly setup a Master/Slave configuration, and sysbench to simulate a high number of transactions on master instance.

First, set up replication using GTID:

make_replication_sandbox --sandbox_base_port=5730 /opt/mysql/5.7.17 --how_many_nodes=1 --gtid

Next, set up the variables for a chained slave instance:

echo "super_read_only=ON" >> node1/my.sandbox.cnf echo "log_slave_updates=ON" >> node1/my.sandbox.cnf node1/restart

Now, generate a high number of transactions:

sysbench --test=oltp.lua --mysql-socket=/tmp/mysql_sandbox5730.sock --report-interval=1 --oltp-tables-count=100000 --oltp-table-size=100 --max-time=1800 --oltp-read-only=off --max-requests=0 --num-threads=8 --rand-type=uniform --db-driver=mysql --mysql-user=msandbox --mysql-password=msandbox --mysql-db=test prepare

After running sysbench for a while, we check that the number of rows in the gtid_executed table is increasing quickly:

slave1 [localhost] {msandbox} ((none)) > select count(*) from mysql.gtid_executed ; +----------+ | count(*) | +----------+ | 300038 | +----------+ 1 row in set (0.00 sec)

By reviewing SHOW ENGINE INNODB STATUS, we can find a compression thread running and trying to compress the gtid_executed table.

---TRANSACTION 4192571, ACTIVE 0 sec fetching rows mysql tables in use 1, locked 1 9 lock struct(s), heap size 1136, 1533 row lock(s), undo log entries 1525 MySQL thread id 4, OS thread handle 139671027824384, query id 0 Compressing gtid_executed table

This thread runs and takes ages to complete (or may never complete). It has been reported as #84332.

The Issue [2]

What happens if you have to stop MySQL while the thread compressing the gtid_executed table is running? In this special case, if you run the flush-logs command before or at the same time as mysqladmin shutdown, MySQL will actually stop accepting connections (all new connections hang waiting for the server) and will start to wait for the thread compressing the gtid_executed table to complete its work. Below is an example.

First, execute the flush logs command and obtain ERROR 1290:

$ mysql -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox -e "flush logs ;" ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

We try to shut down the instance, but it hangs:

$ mysqladmin -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox shutdown ^CWarning; Aborted waiting on pid file: 'mysql_sandbox5731.pid' after 175 seconds

This bug has been reported and verified as #84597.

The Workaround

If you already have an established connection to your database with SUPER privileges, you can disable the super_read_only feature dynamically. Once that is done, the pending thread compressing the gtid_executed table completes its work and the shutdown finishes successfully. Below is an example.

We check rows in the gtid_executed table:

$ mysql -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox -e "select count(*) from mysql.gtid_executed ;" +----------+ | count(*) | +----------+ | 300038 | +----------+

We disable the super_read_only feature on an already established connection:

mysql> set global super_read_only=OFF ;

We check the rows in the gtid_executed table again, verifying that the compression thread ran successfully:

$ mysql -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox -e "select count(*) from mysql.gtid_executed ;" +----------+ | count(*) | +----------+ | 1 | +----------+

Now we can shutdown the instance without issues:

$ mysqladmin -h 127.0.0.1 -P 5731 -u msandbox -pmsandbox shutdown

You can disable the super_read_only feature before you shut down the instance so that the gtid_executed table gets compressed. If you run into the bug above and don’t have any established connections to your database, the only way to shut down the server is by issuing a kill -9 on the mysqld process.

Summary

As shown in this blog post, some of the mechanics of MySQL 5.7’s super_read_only command are not working as expected. This can prevent some administrative operations, like shutdown, from happening.

If you are using the super_read_only feature on MySQL 5.7.17 or older, including Percona Server 5.7.16 or older (which ports the mainstream implementation – unlike Percona Server 5.6, which ported WebscaleSQL’s super_read_only implementation), don’t use FLUSH LOGS.

Categories: MySQL

Percona Monitoring and Management 1.1.0 Beta is Now Available

MySQL Performance Blog - Tue, 2017-02-07 22:31

Percona announces the release of Percona Monitoring and Management 1.1.0 Beta on February 7, 2017. This is the first beta in the PMM 1.1 series with a focus on providing alternative deployment options for PMM Server:

The instructions for installing Percona Monitoring and Management 1.1.0 Beta are available in the documentation. Detailed release notes are available here.

New in PMM Server:

  • Grafana 4.1.1
  • Prometheus 1.5.0
  • Consul 0.7.3
  • Updated the MongoDB ReplSet dashboard to show the storage engine used by the instance
  • PMM-551: Fixed QAN changing query format when a time-based filter was applied to the digest

New in PMM Client:

  • PMM-530: Fixed pmm-admin to support special characters in passwords
  • Added displaying of original error message in pmm-admin config output

Known Issues:

  • Several of the MongoDB RocksDB metrics do not display correctly. This issue will be resolved in the production release.

A live demo of PMM is available at pmmdemo.percona.com.

We welcome your feedback and questions on our PMM forum.

About Percona Monitoring and Management
Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

Categories: MySQL

Overview of Different MySQL Replication Solutions

MySQL Performance Blog - Tue, 2017-02-07 15:04

In this blog post, I will review some of the MySQL replication concepts that are part of the MySQL environment (and Percona Server for MySQL specifically). I will also try to clarify some of the misconceptions people have about replication.

Since I’ve been working on the Solution Engineering team, I’ve noticed that – although information is plentiful – replication is often misunderstood or incompletely understood.

So What is Replication?

Replication guarantees that information gets copied and purposely populated into another environment, instead of being stored in only one location (it is based on the transactions of the source environment).

The idea is to use secondary servers on your infrastructure for either reads or other administrative solutions. The below diagram shows an example of a MySQL replication environment.

 

Fine, But What Choices Do I Have in MySQL?

You actually have several different choices:

Standard asynchronous replication

Asynchronous replication means that the transaction is completed on the local environment completely, and is not influenced by the replication slaves themselves.

After completing its changes, the master populates the binary log with either the data modification or the actual statement (the difference between row-based replication and statement-based replication – more on this later). The master’s dump thread then reads the binary log and sends it to the slave IO thread. The slave places it in its own preprocessing queue (called a relay log) using its IO thread.

The slave executes each change on the slave’s database using the SQL thread.
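For reference, here is a minimal sketch of pointing a slave at its master with GTID auto-positioning; the host and credentials are placeholders:

-- On the slave (MySQL/Percona Server 5.7 with GTID enabled):
CHANGE MASTER TO
  MASTER_HOST = '192.168.0.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'replpass',
  MASTER_AUTO_POSITION = 1;  -- use GTIDs instead of binlog file/position coordinates
START SLAVE;
SHOW SLAVE STATUS\G          -- check Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master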

 

Semi-synchronous replication

Semi-synchronous replication means that the slave and the master communicate with each other to guarantee the correct transfer of the transaction. The master only populates the binlog and continues its session if one of the slaves provides confirmation that the transaction was properly placed in its relay log.

Semi-synchronous replication guarantees that a transaction is correctly copied, but it does not guarantee that the commit on the slave actually takes place.

Important to note is that semi-sync replication makes sure that the master waits to continue processing transactions in a specific session until at least one of the slaves has ACKed the reception of the transaction (or reaches a timeout). This differs from asynchronous replication, as semi-sync allows for additional data integrity.

Keep in mind that semi-synchronous replication impacts performance because it needs to wait for the round trip of the actual ACK from the slave.
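As an illustration, here is a minimal sketch of enabling semi-synchronous replication with the standard plugins that ship with MySQL/Percona Server 5.7:

-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000;  -- ms to wait for an ACK before falling back to async
-- On each slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;  -- restart the IO thread so it registers as a semi-sync slave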

Group Replication

This is a new concept introduced in MySQL Community Edition 5.7, and it was GA’ed in MySQL 5.7.17. It’s a rather new plugin built for virtually synchronous replication.

Whenever a transaction is executed on a node, the plugin tries to get consensus with the other nodes before returning it completed back to the client. Although the solution is a completely different concept compared to standard MySQL replication, it is based on the generation and handling of log events using the binlog.

Below is an example architecture for Group Replication.

If Group Replication interests you, read the following blog posts:

There will be a tutorial at the Percona Live Open Source Database Conference in Santa Clara in April, 2017.

Percona XtraDB Cluster / Galera Cluster

Another solution that allows you to replicate information to other nodes is Percona XtraDB Cluster. This solution focuses on delivering consistency, and also uses a certification process to guarantee that transactions avoid conflicts and are performed correctly.

In this case, we are talking about a clustered solution. Each environment is subject to the same data, and there is communication in-between nodes to guarantee consistency.

Percona XtraDB Cluster has multiple components:

  • Percona Server for MySQL
  • Percona XtraBackup for performing snapshots of the running cluster (if recovering or adding a node).
  • wsrep patches / Galera Library

This solution is virtually synchronous, which is comparable to Group Replication. However, it also has the capability to use multi-master replication. Solutions like Percona XtraDB Cluster are components that improve the availability of your database infrastructure.

A tutorial on Percona XtraDB Cluster will be given at the Percona Live Open Source Database Conference in Santa Clara in April 2017.

Row-Based Replication Vs. Statement-Based Replication

With statement-based replication, the SQL query itself is written to the binary log. For example, the exact same INSERT/UPDATE/DELETE statements are executed by the slave.

There are many advantages and disadvantages to this system:

  • Auditing the database is much easier as the actual statements are logged in the binary log
  • Less data is transferred over the wire
  • Non-deterministic queries can create actual havoc in the slave environment
  • There might be a performance disadvantage, with some queries using statement-based replication (INSERT based on SELECT)
  • Statement-based replication is slower due to SQL optimizing and execution

Row-based replication is the default choice since MySQL 5.7.7, and it has many advantages. The row changes are logged in the binary log, and it does not require context information. This removes the impact of non-deterministic queries.

Some additional advantages are:

  • Performance improvements with high concurrency queries containing few row changes
  • Significant data-consistency improvement

And, of course, some disadvantages:

  • Network traffic can be significantly larger if you have queries that modify a large number of rows
  • It’s more difficult to audit the changes on the database
  • Row-based replication can be slower than statement-based replication in some cases
Some Misconceptions About Replication

Replication is a cluster.

Standard asynchronous replication is not a synchronous cluster. Keep in mind that standard and semi-synchronous replication do not guarantee that the environments are serving the same dataset. This is different when using Percona XtraDB Cluster, where every server actually needs to process each change. If not, the impacted node is removed from the cluster. Asynchronous replication does not have this fail safe. It still accepts reads while in an inconsistent state.

Replication sounds perfect, I can use this as a manual failover solution.

Theoretically, the environments should be comparable. However, there are many parameters influencing the efficiency and consistency of the data transfer. As long as you use asynchronous replication, there is no guarantee that a transaction committed on the master has already been applied on the slave. You can circumvent this by enhancing the durability of the configuration, but this comes at a performance cost. You can verify the consistency of your master and slaves using the pt-table-checksum tool.
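For example (a sketch only; pt-table-checksum writes its results to the percona.checksums table by default), the chunks that differ between the master and a slave can be listed on each slave with:

SELECT db, tbl, chunk, this_cnt, master_cnt, this_crc, master_crc
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc);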

I have replication, so I actually don’t need backups.

Replication is a great solution for having an accessible copy of the dataset (e.g., offloading reporting, read queries and backup generation). It is not a backup solution, however. Having an offsite backup provides you with the certainty that you can rebuild your environment in the case of major disasters, user error or other reasons (remember the Bobby Tables comic). Some people use delayed slaves. However, even delayed slaves are not a replacement for proper disaster recovery procedures.

I have replication, so the environment will now load balance the transactions.

Although you’ve potentially improved the availability of your environment by having a secondary instance running with the same dataset, you still might need to point the read queries towards the slaves and the write queries to the master. You can use proxy tools, or define this functionality in your own application.
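As a hedged example of the proxy approach, a tool such as ProxySQL can split reads and writes by query pattern. In this sketch the hostgroup numbers are arbitrary, with writers assumed to be in hostgroup 10 and readers in hostgroup 20:

-- On the ProxySQL admin interface
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT .* FOR UPDATE', 10, 1),
       (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;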

Replication will slow down my master significantly.

Replication has only minor performance impacts on your master. Peter Zaitsev has an interesting post on this here, which discusses the potential impact of slaves on the master. Keep in mind that writing to the binary log can potentially impact performance, especially if you have a lot of small transactions that are then dumped and received by multiple slaves.

There are, of course, many other parameters that might impact the performance of the actual master and slave setup.

Categories: MySQL

Percona Server for MongoDB 3.4.1-1.1 Release Candidate is Now Available

MySQL Performance Blog - Mon, 2017-02-06 19:41

Percona announces the release of Percona Server for MongoDB 3.4.1-1.1rc on February 6, 2017. It is the first release candidate in the 3.4 series. Download the latest version from the Percona web site or the Percona Software Repositories.

NOTE: Release candidate packages are available from the testing repository.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release candidate is based on MongoDB 3.4.1, and includes the following additional changes:

Percona Server for MongoDB 3.4.1-1.1rc release notes are available in the official documentation.

Categories: MySQL

Percona Toolkit 3.0.0 Release Candidate is Now Available

MySQL Performance Blog - Mon, 2017-02-06 19:37

Percona announces the availability of Percona Toolkit 3.0.0rc-2 with new MongoDB tools on February 6, 2017. This is a release candidate.

Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL and MongoDB server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This is the first release candidate in the 3.0 series. It includes new features and bug fixes. Downloads are available from the Percona Software Testing Repositories.

New features:

Bug fixes:

  • 1402776: Updated MySQLProtocolParser to fix error when parsing tcpdump capture with pt-query-digest
  • 1632522: Fixed failure of pt-online-schema-change when altering a table with a self-referencing foreign key (Thanks, Amiel Marqeta)
  • 1654668: Fixed failure of pt-summary on Red Hat and derivatives (Thanks, Marcelo Altmann)

You can find release details in the release notes. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

Categories: MySQL

Percona Server for MySQL 5.7.17-11 is now available

MySQL Performance Blog - Fri, 2017-02-03 20:32

Percona announces the GA release of Percona Server for MySQL 5.7.17-11 on February 3, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.17, including all the bug fixes in it, Percona Server for MySQL 5.7.17-11 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software. Find release details in the 5.7.17-11 milestone at Launchpad.

New Features:
  • Percona Server for MySQL has implemented support for per-column VARCHAR/BLOB compression for the XtraDB storage engine. This also features compression dictionary support, to improve the compression ratio for relatively short individual rows, such as JSON data (see the sketch after this feature list).
  • Kill Idle Transactions feature has been re-implemented by setting a connection socket read timeout value instead of periodically scanning the internal InnoDB transaction list. This makes the feature applicable to any transactional storage engine, such as TokuDB, and, in future, MyRocks. This re-implementation also addresses some existing bugs, including server crashes: #1166744, #1179136, #907719, and #1369373.
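As a hedged sketch of the column compression dictionary feature (the dictionary contents, table and dictionary names below are made up for illustration; check the Percona Server documentation for the exact syntax supported by your build):

-- Seed a dictionary with strings that repeat frequently in the column data
SET @dictionary_data = 'name' 'status' 'created_at' 'updated_at';
CREATE COMPRESSION_DICTIONARY json_keys (@dictionary_data);

-- Reference the dictionary from a compressed column
CREATE TABLE events (
  id BIGINT NOT NULL AUTO_INCREMENT,
  doc BLOB COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY json_keys,
  PRIMARY KEY (id)
) ENGINE=InnoDB;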
Bugs Fixed:
  • Logical row counts for TokuDB tables could get inaccurate over time. Bug fixed #1651844 (#732).
  • Repeated execution of SET STATEMENT ... FOR SELECT FROM view could lead to a server crash. Bug fixed #1392375.
  • CREATE TEMPORARY TABLE would create a transaction in binary log on a read-only server. Bug fixed #1539504 (upstream #83003).
  • Using per-query variable statement with subquery temporary tables could cause a memory leak. Bug fixed #1635927.
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • A server could crash if a bitmap write I/O error happens in the background log tracking thread while a FLUSH CHANGED_PAGE_BITMAPS is executing concurrently. Bug fixed #1651656.
  • TokuDB was using wrong function to calculate free space in data files. Bug fixed #1656022 (#1033).
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • Audit Log Plugin when set to JSON format was not escaping characters properly. Bug fixed #1548745.
  • InnoDB index dives did not detect some of the concurrent tree changes, which could return bogus estimates. Bug fixed #1625151 (upstream #84366).
  • INFORMATION_SCHEMA.INNODB_CHANGED_PAGES queries would needlessly read potentially incomplete bitmap data past the needed LSN range. Bug fixed #1625466.
  • Percona Server cmake compiler would always attempt to build RocksDB even if -DWITHOUT_ROCKSDB=1 argument was specified. Bug fixed #1638455.
  • Lack of free pages in the buffer pool was not diagnosed with innodb_empty_free_list_algorithm set to backoff (which is the default). Bug fixed #1657026.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for /var/log directory. Bug fixed #1660265. Thanks to Dawid Golunski (https://legalhackers.com).
  • Renaming a TokuDB table to a non-existent database with tokudb_dir_per_db enabled would lead to a server crash. Bug fixed #1030.
  • Read Free Replication optimization could not be used for TokuDB partition tables. Bug fixed #1012.

Other bugs fixed: #1486747, #1617715, #1633988, #1638198 (upstream #82823), #1642230, #1646384, #1640810, #1647530, #1651121, #1658843, #1156772, #1644583, #1648389, #1648737, #1650256, and #1647723.

The release notes for Percona Server for MySQL 5.7.17-11 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Categories: MySQL

Percona Live Featured Tutorial with Derek Downey, David Turner and René Cannaò — ProxySQL Tutorial

MySQL Performance Blog - Fri, 2017-02-03 17:22

Welcome to another post in the series of Percona Live featured tutorial speakers blogs! In these blogs, we’ll highlight some of the tutorial speakers that will be at this year’s Percona Live conference. We’ll also discuss how these tutorials can help you improve your database environment. Make sure to read to the end to get a special Percona Live 2017 registration bonus!

In this Percona Live featured tutorial, we’ll meet Derek Downey (OSDB Practice Advocate, Pythian), David Turner (Storage SRE, Uber) and René Cannaò (MySQL SRE, Dropbox / ProxySQL). Their session is ProxySQL Tutorial. There is a stigma attached to database proxies when it comes to MySQL. This tutorial hopes to blow away that stigma by showing you what can be done with a proxy designed from the ground up to perform. I had a chance to speak with Derek, David and René and learn a bit more about ProxySQL:

Percona: How did you get into database technology? What do you love about it?

Derek Downey

Derek: I took a relational database course in college based on Oracle. Set theory and the relational model made a lot of sense to me. After a few years as a web developer at a small company, I transitioned to a hybrid SysAdmin/DBA role and got my first taste of the potential of “the cloud” (and some of the drawbacks).

I really came to understand that data is the lifeblood of any organization, and making sure it is always available through any disaster – from human error to hurricanes – is a unique and exciting challenge.

You should never notice the DBA if they’re doing their job right. There’s not much praise for a DBA on a job well done. But it’s a vital position to keep a company running. And that suits me just fine.

David: I started working for the Advanced Projects Group at the University of Missouri, now known as MOREnet. They were largely responsible for connecting all of the libraries and schools in the state to the Internet. I was initially helping them with their Internet presence as a webmaster. Later they needed help with their databases. I got very excited about working with Oracle at the time, and decided to join that team.

My relationship with MySQL started primarily because the cost of sharding Oracle was so high. Additionally, MySQL’s replication allowed us to use slaves. Oracle’s Dataguard/standby options wouldn’t allow reads from the slaves at that time. Lastly, MySQL was sort of “wild west” fun, since it lacked so many other features that Oracle had long ago. You had to get creative. It has been humbling to see how much innovation has come from the community and how far MySQL has come. And this is only the beginning!

René Cannaò

René: My career followed the classic path of a system administrator that ends up becoming a DBA. I used to work for a few companies as webmaster,  and finally as SysAdmin for a web hosting company. I always saw a similar pattern: “the bottleneck is in the database.” Nobody ever knew why the database was the bottleneck. I volunteered to improve the performance of this “unknown system.” Learning was a fun experience, and the result was extremely rewarding. I love understanding how databases operate and their internals. This is the only way to be able to get the maximum performance: “scientia potentia est”!

Percona: Your tutorial is called “ProxySQL Tutorial.” What exactly is ProxySQL, and what does it do?

Derek: I’ll leave it to René, the creator of ProxySQL, to give more detail on exactly what it is. But for a DBA trying to ensure their data is always available, it is a new and exciting tool in our toolbox.

René: ProxySQL is the MySQL data gateway. It’s the Stargate that can inspect, control, transform, manage, and route all traffic between clients and database servers. It builds reliable and fault-tolerant networks. It is a software bridge that empowers MySQL DBAs, built by DBAs for DBAs, allowing them to control all MySQL traffic where previously such traffic could not be controlled either on the client side (normally the developers’ realm) or server side (where there are not enough tools).

David Turner

David: Architecturally, ProxySQL is a separate process between the client and the database. Because traffic passes through it, ProxySQL can become many things (three of which got my attention). It can be a multiplexer, a filter, and a replicator.

Multiplexers reduce many signals down to a few. Web servers often open many static connections to MySQL. Since MySQL can only support a limited number of connections before performance suffers, ProxySQL’s ability to transparently manage tens of thousands of connections while only opening a few to the database is a great feature.

Administrators can update ProxySQL to filter and even rewrite queries based on patterns they decide on. As someone that has worked in operations and seen how long it can take to disable misbehaving applications, this is a very compelling feature. With ProxySQL in place, I can completely block a query from the database in no time.
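As an illustration of that filtering capability (a sketch against the ProxySQL admin interface; the rule_id and query pattern are made up), a misbehaving query can be blocked with a single rule:

INSERT INTO mysql_query_rules (rule_id, active, match_digest, error_msg, apply)
VALUES (10, 1, '^SELECT .* FROM orders WHERE status', 'Query blocked by the DBA', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;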

ProxySQL’s replication or mirroring capability means that all of the queries sent to one database can now be sent to N databases. As someone that has to roll out new versions of MySQL, test index changes, and benchmark hardware this is also a compelling feature.

Percona: What are the advantages of using ProxySQL in a database environment?

René: ProxySQL is the bridge between the clients and the servers. It creates two layers, and controls all the communication between the two. Sitting in the middle, ProxySQL provides a lot of advantages normally impossible to achieve in a standard database environment, such as throttling or blocking queries, rewriting queries, implementing sharding, read/write splitting, caching, duplicating traffic, handling backend failures, failovers, integration with HA solutions, generating real-time statistics, etc. All this, without any application change, and completely transparent to the application.

Derek: For me, ProxySQL decouples the application from the data layer. This provides more control over the backend database environment to the DBA in regards to queries, maintenance and failovers, without impact to the application.

David: In addition to the roles noted above, ProxySQL can be part of a failover solution, routing queries to a new master when the current master fails. Other advantages are splitting queries over multiple databases to distribute the load, provide additional metrics, etc.

Percona: What do you want attendees to take away from your tutorial session? Why should they attend?

Derek: This tutorial highlights what ProxySQL is trying to achieve, and discusses how to add ProxySQL to common architectures. Attendees will get hands-on experience with the technology, and learn how to install and configure ProxySQL to achieve query rewriting, seamless failover, and query mirroring.

David: Not only hands-on experience with ProxySQL, but an understanding of how much they can leverage with it. The more I use ProxySQL, the more advantages I see to it. For example, I did not realize that by clustering ProxySQL processes I can distribute the query matching and rewrites over many hosts, as well as use them as a caching layer.

René: ProxySQL is built upon very innovative technologies. Certain architectural concepts like hostgroups, chaining of query rules, granular routing and sharding, query retries and the very powerful Admin interface are concepts not always intuitive for DBAs with experience using other proxies. This tutorial helps attendees understand these concepts, and gives them hands-on experience configuring ProxySQL in various scenarios.

Percona: What are you most looking forward to at Percona Live?

David: First, the people. Next, everything everyone is working on. We’re really lucky to work in such an innovative and collaborative industry. As databases evolve, we are on the ground floor of their evolution. What an exciting place to be.

Derek: I am mostly looking forward to reconnecting with my peers in the MySQL community, both those I've formerly worked with or previously met at Percona Live, as well as meeting new open source database professionals and hearing how they are providing solutions for their companies.

René: I am looking forward to attending sessions regarding new features in MySQL 8 and other new technologies. But moreover, I am excited to interact with MySQL users and get more input on how to improve ProxySQL so that it can become an indispensable tool in any MySQL environment.

Register for Percona Live Data Performance Conference 2017, and see Derek, David and René present their ProxySQL Tutorial. Use the code FeaturedTutorial and receive $30 off the current registration price!

Percona Live Data Performance Conference 2017 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 24-27, 2017 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Categories: MySQL

PMM Alerting with Grafana: Working with Templated Dashboards

MySQL Performance Blog - Thu, 2017-02-02 23:54

In this blog post, we will look into more intricate details of PMM alerting. More specifically, we’ll look at how to set up alerting based on templated dashboards.

Percona Monitoring and Management (PMM) 1.0.7 includes Grafana 4.0, which comes with the Alerting feature. Barrett Chambers shared how to enable alerting in general. This blog post looks at the specifics of setting up alerting based on templated dashboards, which Grafana 4.0 does not support out-of-the-box.

This means if I try to set up an alert on the number of MySQL threads running, I get the error “Template variables are not supported in alert queries.”

What is the solution?

Until Grafana provides a better option, you need to do alerting based on graphs (which don’t use templating). This is how to do it.

Click on “Create New” in the Dashboards list to create a basic dashboard for your alerts:

Click on “Add Panel” and select “Graph”:

Click on the panel title of the new panel to open its menu, and then click on “Panel JSON”.

This shows you the JSON of the panel, which will look something like this:

Now go back to the other browser window, showing the dashboard with the graph you want to alert on, and open its Panel JSON. In our case, we go to “MySQL Overview” and show the JSON for the “MySQL Active Threads” panel.

Copy the JSON from the “MySQL Active Threads” panel and paste it into the new panel in the dashboard created for alerting.

Once we have done the copy/paste, click on the green Update button, and we’ll see the broken panel:

It’s broken because we’re using templating variables in the dashboard expressions, and none of them are set up in this dashboard, so the expressions won’t work. We must replace the template variables in the formulas with the actual hosts, instances, mount points, etc., that we want to alert on:

We need to change $host to the name of the host we want to alert on, and the $interval should align with the data capture interval (here we’ll set it to 5 seconds):

If correctly set up, you should see the graph showing the data.

Finally, we can edit the graph. Click on “Alert” and then “Create Alert”.

To create an alert, specify how often to evaluate it (“Evaluate every”), as well as the alert conditions. This sets up the evaluation interval for the alert rule. Obviously, the more often the alert evaluates the condition, the more quickly you get alerted if something goes wrong.

In our case, we want to get an alert if the number of running threads is sustained at a high level. To do this, we check that the minimum number of threads over the last minute is above 30:

Note that our query has two parameters: “A” is the number of threads connected, and “B” is the number of threads running. We’re choosing to Alert on “B”. 

The beautiful thing Grafana does is show the alert threshold clearly on the graph, and allows you to edit the alert just by moving this alert line with a mouse:

You may want to click on the floppy disk icon at the top to save the dashboard (giving it whatever identifying name you want).

At this point, you should see the alert working. A little heart sign appears by the graph title, colored green (indicating it is not active) or red (indicating it is active). Additionally, you will see the red and green vertical lines in the alert history. These show when this alert gets triggered and when the system went back to normal.

You probably want to set up notifications as well as see alerts on the graphs. 

To set up notifications, go to the Grafana Configuration menu and configure Alerting. Grafana supports Email, Slack, PagerDuty and generic Webhook notification options (with more on the way, I’m sure).

The same way you added the “Graph” panel to set up an alert, you can add the “Alert List” panel to see all the alerts you have set up (and their status):

Summary

As you can see, it is possible to set up alerts in PMM using the new Grafana 4.0 alerting feature. It is not very convenient or easy to do yet. This is the first release of alerting support for Grafana and PMM; as such, I’m sure it will become much easier and more convenient over time.

Categories: MySQL

Vote Percona in LinuxQuestions.org Members Choice Awards 2016

MySQL Performance Blog - Thu, 2017-02-02 20:40

Percona is calling on you! Vote for Percona for Database of the Year in LinuxQuestions.org Members Choice Awards 2016. Help Percona get recognized as one of the best database options for data performance. Percona provides free, fully compatible, enhanced, open source drop-in replacement database software with superior performance, scalability and instrumentation.

LinuxQuestions.org, or LQ for short, is a community-driven, self-help website for Linux users. Each year, LinuxQuestions.org holds an annual competition to recognize the year’s best-in-breed technologies. The online Linux community determines the winners of each category!

You can vote now for your favorite database of 2016 (Percona, of course!). This is your chance to be heard!

Voting ends on February 7, 2017. You must be a registered member of LinuxQuestions.org with at least one post on their forums to vote.

Categories: MySQL

WAN Synchronous Clusters: Dealing with Latency Using Concurrency

MySQL Performance Blog - Wed, 2017-02-01 21:58

In this blog, we’ll discuss how to use concurrency to help with WAN latency when using synchronous clusters.

WAN Latency Problem

Our customers often ask us for help or advice with WAN clustering problems. Historically, the usual solution for MySQL WAN deployments is having the primary site in one data center, and stand-by backup site in another data center (replicating from the primary asynchronously). These days, however, there is a huge desire to employ available synchronous replication solutions for MySQL. These solutions include things like Galera (i.e., Percona XtraDB Cluster) or the recently released MySQL Group Replication. This trend is attributable to the fact that these solutions are less problematic and provide more automatic fail over and fail back procedures. But it’s also because businesses want to write in both data centers simultaneously.

Unfortunately, WAN link reliability and latency makes the synchronous replication solution a big challenge. In many cases, these challenges force geographically separate data centers to still replicate asynchronously.

From a requirements point of view, the Galera founders’ official documentation has WAN-related recommendations and some dedicated options (like segments), as described in Jay’s blog post. WAN deployments are absolutely possible, and even an advertised option, in Galera. The MySQL Group Replication team, however, seems to discourage such use cases, as we can read:

Group Replication is designed to be deployed in a cluster environment where server instances are very close to each other, and is impacted by both network latency as well as network bandwidth.

Remedy

While perhaps obvious to some, I would like to point out a simple dependency that might be a viable solution in some deployments that face significant network latency. That solution is concurrency! When you face the problem of limited write throughput due to transaction commit latency, you can employ more writer threads. By using separate connections to MySQL, overall you can commit more transactions at the same time.

Let me demonstrate with example results based on a very simple test case. I tested both Percona XtraDB Cluster (with Galera replication) and MySQL Group Replication. In each case, I configured a minimal cluster of three nodes, running as Docker containers on the same host, where latency is around 0.05ms on average. To simulate a WAN network, I then introduced an artificial network latency of 50ms and 100ms on one of the node’s network interfaces. I later repeated the same tests using VirtualBox VM instances, running on a completely different server. The results were very similar. The command to simulate additional network latency is:

# tc qdisc add dev eth0 root netem delay 50ms

The added delay can be seen when pinging the other nodes in the cluster:

# ping -c 2 172.17.0.2
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=50.0 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=50.1 ms

The test is very simple: execute 500 small insert transactions, each inserting just a single row (but that is less relevant now).

For testing, I used a simple mysqlslap command:

mysqlslap --password=*** --host=$IP --user=root --delimiter=";" --number-of-queries=500 --create-schema=test --concurrency=$i --query="insert into t1 set a='fooBa'"

and simple single table:

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` char(5) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

Interestingly, without the increased latency, the same test takes much longer against the Group Replication cluster, even though Group Replication runs by default with group_replication_single_primary_mode enabled and group_replication_enforce_update_everywhere_checks disabled. Theoretically, this should be a lighter operation from a “data consistency checks” point of view. With WAN-type latencies, Percona XtraDB Cluster also seems to be slightly faster in this particular test. Here are the test results for the three different network latencies:

XtraDB Cluster latency/seconds

Threads    100ms     50ms      0.05ms
1          51.671    25.583    0.268
4          13.936    8.359     0.187
8          7.84      4.18      0.146
16         4.641     2.353     0.13
32         2.33      1.16      0.122
64         1.808     0.925     0.098

GR latency/seconds

Threads    100ms     50ms      0.05ms
1          55.513    29.339    5.059
4          14.889    7.916     2.184
8          7.673     4.195     1.294
16         4.52      2.507     0.767
32         3.417     1.479     0.473
64         2.099     0.809     0.267

I used the same InnoDB settings for both clusters, with each node under a separate Docker container or VirtualBox VM. Similar test results could differ a lot in real production systems, where more CPU cores provide better multi-concurrency conditions.

My goal wasn’t to benchmark Galera versus Group Replication, but rather to show that the same concurrency to write throughput dependency applies to both technologies. I might be missing some tuning on the Group Replication side, so I don’t claim any verified winner here.

Just to provide some more details, I was using Percona XtraDB Cluster 5.7.16 and MySQL with Group Replication 5.7.17.

One important note: when you expect higher concurrency to provide better throughput, you must make sure the concurrency is not limited by server settings. For example, you must look at innodb_thread_concurrency  (I used 0, so unlimited), slave_parallel_workers for GR and wsrep_slave_threads for Galera (among others related to IO operations, etc.).
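For illustration, those knobs can be adjusted roughly like this (other than innodb_thread_concurrency=0, the values are just examples rather than recommendations, and the GR/Galera applier settings only take effect once the applier threads restart):

SET GLOBAL innodb_thread_concurrency = 0;   -- 0 = unlimited

-- Group Replication / async replication applier parallelism (example value)
SET GLOBAL slave_parallel_workers = 8;

-- Galera / Percona XtraDB Cluster applier threads (example value)
SET GLOBAL wsrep_slave_threads = 8;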

Apart from “concurrency tuning,” which could involve application changes if not architectural re-design, there are of course more possible optimizations for WAN environments. For example:

https://www.percona.com/blog/2016/03/14/percona-xtradb-cluster-in-a-high-latency-network-environment/ (to deal with latency)

and:

https://www.percona.com/blog/2013/12/19/automatic-replication-relaying-galera-3/

for saving/minimizing network utilization using binlog_row_image=minimal and other variables.

But these are out of the scope of this post. I hope this simple post helps you deal with the speed of light better!  

Categories: MySQL

Percona Server for MySQL 5.5.54-38.6 is now available

MySQL Performance Blog - Wed, 2017-02-01 20:46

Percona announces the release of Percona Server for MySQL 5.5.54-38.6 on February 1, 2017. Based on MySQL 5.5.54, including all the bug fixes in it, Percona Server for MySQL 5.5.54-38.6 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.54-38.6 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • Fixed new compilation warnings with GCC 6. Bugs fixed #1641612 and #1644183.
  • CONCURRENT_CONNECTIONS column in the USER_STATISTICS table was showing incorrect values. Bug fixed #728082.
  • Audit Log Plugin when set to JSON format was not escaping characters properly. Bug fixed #1548745.
  • mysqld_safe now limits the use of rm and chown to avoid privilege escalation. chown can now be used only for /var/log directory. Bug fixed #1660265.

Other bugs fixed: #1638897, #1644174, #1644547, and #1644558.

[UPDATE 2017-02-02]: New packages have been pushed to the repositories with an incremented package version to address bug #1661123.

Find the release notes for Percona Server for MySQL 5.5.54-38.6 in our online documentation. Report bugs on the launchpad bug tracker.

Categories: MySQL

Docker Security Vulnerability CVE-2016-9962

MySQL Performance Blog - Tue, 2017-01-31 17:39

Docker 1.12.6 was released to address CVE-2016-9962. CVE-2016-9962 is a serious vulnerability with RunC.

Quoting the coreos page (linked above):

“RunC allowed additional container processes via runc exec to be ptraced by the pid 1 of the container. This allows the main processes of the container, if running as root, to gain access to file-descriptors of these new processes during the initialization and can lead to container escapes or modification of runC state before the process is fully placed inside the container.”

In short, if processes run as root inside a container, they could potentially break out of the container and gain access to the host.

My recommendation at this time is to apply the same basic security tenets for containers as you would (I hope) for VM and bare-metal installs. In other words, ensure you are adhering to a path of least privilege as a best practice and not running as root for convenience’s sake.

We made changes to PMM prior to version 1.0.4 to reduce the number of processes within the container that run as root. As such, only the processes that require it run as root; all other processes run as a lower-privileged user.

Check here for documentation on PMM, and use the JIRA project to raise bugs (JIRA requires registration).

To comment on running a database within Docker, I’ve reviewed the following images:

  • percona-server image: I have verified it does not run as root, and runs as a mysql user (for 5.7.16 at least)
  • percona-server-mongodb: I have worked with our teams internally and can confirm that the latest image no longer runs as root (you will need to pull the latest image, however, to see this change via docker pull)

Please comment below with any questions.

Categories: MySQL