MySQL

Diagnosing SST errors with Percona XtraDB Cluster for MySQL

MySQL Performance Blog - Tue, 2014-12-30 08:00

State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues.

Port for SST is not open

The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail.

You will see in the error log of the donor that SST is started:

[...]
141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor.
141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6)
141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6''
[...]

But then nothing happens, and some time later you will see a bunch of errors:

[...]
2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out
WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 1 (20141223 16:09:52.057)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064)
WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068)
141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'
[...]

On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted:

[...]
141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6)
141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0
141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off
141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe)
141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.
[...]

The solution is of course to make sure that the ports are open on both sides.
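
A quick way to verify this before retrying the SST is to probe the joiner’s port from the donor and, if needed, open it in the firewall. A minimal sketch, assuming iptables and the addresses from the logs above (adapt it to your firewall tooling, and keep in mind ports 4567 and 4568 must also be reachable for group communication and IST):

# On the donor: check that the joiner's SST port is reachable
nc -z -w5 192.168.234.101 4444 && echo "port 4444 reachable"

# On the joiner: allow inbound SST traffic if iptables is what is blocking it
iptables -I INPUT -p tcp --dport 4444 -j ACCEPT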

SST is not correctly configured

Sometimes you will see an error like this on the donor:

141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0''
[...]
WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973)

And if you look at innobackup.backup.log:

141223 21:03:26 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES).
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995
        main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530
innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979

What happened?

The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file:

[mysqld]
wsrep_sst_auth=sstuser:s3cret

And you also need to create the corresponding MySQL user:

mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret';

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set.
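
A quick sanity check on the donor might look like the following (a minimal sketch; the credentials must match what wsrep_sst_auth contains, and the wsrep_sst_auth value may be displayed masked):

mysql> SHOW GRANTS FOR 'sstuser'@'localhost';
mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_sst_auth';

You can also verify the credentials the same way innobackupex will use them:

$ mysql -usstuser -ps3cret --socket=/var/lib/mysql/mysql.sock -e "SELECT 1"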

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101

Here the issue is that you are trying to connect a node running Galera 2.x to a node running Galera 3.x. This can happen if you try to mix a PXC 5.5 node and a PXC 5.6 node.

The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version.

But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file:

[mysqld]
wsrep_provider_options="socket.checksum=1"

Conclusion

SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.


How well does your table fit in the InnoDB buffer pool in MySQL 5.6+?

MySQL Performance Blog - Mon, 2014-12-29 14:31

Some time ago, Peter Zaitsev posted a blog titled “How well does your table fits in innodb buffer pool?” He used some special INFORMATION_SCHEMA tables developed for Percona Server 5.1 to report how much of each InnoDB table and index resides in your buffer pool.

As Peter pointed out, you can use this view into the buffer pool to watch a buffer pool warm up with pages as you run queries. You can also use it for capacity planning. If you expect some tables need to be fully loaded in the buffer pool to be used efficiently, but the buffer pool isn’t large enough to hold them, then it’s time to increase the size of the buffer pool.
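
For reference, checking the current size is a one-liner, while changing it online is only possible on newer servers (online resizing of innodb_buffer_pool_size arrived in MySQL 5.7.5; on 5.5/5.6 plan for a my.cnf change and a restart). A minimal sketch:

mysql> SELECT @@innodb_buffer_pool_size/1024/1024/1024 AS buffer_pool_gb;
mysql> -- MySQL 5.7.5+ only; on older versions set this in my.cnf and restart
mysql> SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;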

The problem, however, was that system tables change from version to version. Specifically, the INNODB_BUFFER_POOL_PAGES_INDEX table no longer exists in Percona Server 5.6, and the INNODB_INDEX_STATS table changed some column names in Percona Server 5.5.8, and the table no longer exists in Percona Server 5.6. So many of the comments on Peter’s blog rightly pointed out that the example query didn’t work on subsequent versions of Percona Server. And MySQL Community Edition at the time didn’t have the feature at all. They asked for an update to the blog post.

So here’s an updated, simplified query to report the content of your buffer pool, tested on the most recent versions.

Percona Server 5.1 and 5.5:

USE information_schema;
SET @page_size = @@innodb_page_size;
SET @bp_pages = @@innodb_buffer_pool_size/@page_size;

SELECT P.TABLE_NAME, P.PAGE_TYPE,
  CASE WHEN P.INDEX_NAME IS NULL THEN NULL
       WHEN P.TABLE_NAME LIKE '`SYS_%' THEN P.INDEX_NAME
       WHEN P.INDEX_NAME <> 'PRIMARY' THEN 'SECONDARY'
       ELSE 'PRIMARY' END AS INDEX_TYPE,
  COUNT(DISTINCT P.PAGE_NUMBER) AS PAGES,
  ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER)/@bp_pages,2) AS PCT_OF_BUFFER_POOL,
  CASE WHEN P.TABLE_NAME IS NULL THEN NULL
       WHEN P.TABLE_NAME LIKE 'SYS_%' THEN NULL
       ELSE ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER) /
            CASE P.INDEX_NAME WHEN 'PRIMARY' THEN TS.DATA_LENGTH/@page_size
                              ELSE TS.INDEX_LENGTH/@page_size END, 2)
       END AS PCT_OF_INDEX
FROM INNODB_BUFFER_PAGE AS P
JOIN INNODB_SYS_TABLES AS T ON P.SPACE = T.SPACE
JOIN TABLES AS TS ON (T.SCHEMA, T.NAME) = (TS.TABLE_SCHEMA, TS.TABLE_NAME)
WHERE TS.TABLE_SCHEMA <> 'mysql'
GROUP BY TABLE_NAME, PAGE_TYPE, INDEX_TYPE;

MySQL 5.6 and 5.7 (this also works on Percona Server 5.6):

USE information_schema;
SET @page_size = @@innodb_page_size;
SET @bp_pages = @@innodb_buffer_pool_size/@page_size;

SELECT P.TABLE_NAME, P.PAGE_TYPE,
  CASE WHEN P.INDEX_NAME IS NULL THEN NULL
       WHEN P.TABLE_NAME LIKE '`SYS_%' THEN P.INDEX_NAME
       WHEN P.INDEX_NAME <> 'PRIMARY' THEN 'SECONDARY'
       ELSE 'PRIMARY' END AS INDEX_TYPE,
  COUNT(DISTINCT P.PAGE_NUMBER) AS PAGES,
  ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER)/@bp_pages,2) AS PCT_OF_BUFFER_POOL,
  CASE WHEN P.TABLE_NAME IS NULL THEN NULL
       WHEN P.TABLE_NAME LIKE 'SYS_%' THEN NULL
       ELSE ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER) /
            CASE P.INDEX_NAME WHEN 'PRIMARY' THEN TS.DATA_LENGTH/@page_size
                              ELSE TS.INDEX_LENGTH/@page_size END, 2)
       END AS PCT_OF_INDEX
FROM INNODB_BUFFER_PAGE AS P
JOIN INNODB_SYS_TABLES AS T ON P.SPACE = T.SPACE
JOIN TABLES AS TS ON T.NAME = CONCAT(TS.TABLE_SCHEMA, '/', TS.TABLE_NAME)
WHERE TS.TABLE_SCHEMA <> 'mysql'
GROUP BY TABLE_NAME, PAGE_TYPE, INDEX_TYPE;

In both cases, the output looks something like the following (if I have read from a single table called test.foo):

+--------------+-------------------+------------+-------+--------------------+--------------+
| TABLE_NAME   | PAGE_TYPE         | INDEX_TYPE | PAGES | PCT_OF_BUFFER_POOL | PCT_OF_INDEX |
+--------------+-------------------+------------+-------+--------------------+--------------+
| NULL         | FILE_SPACE_HEADER | NULL       |     1 |               0.00 |         NULL |
| NULL         | IBUF_BITMAP       | NULL       |     1 |               0.00 |         NULL |
| NULL         | INODE             | NULL       |     1 |               0.00 |         NULL |
| `test`.`foo` | INDEX             | PRIMARY    |  2176 |               3.32 |        98.37 |
| `test`.`foo` | INDEX             | SECONDARY  |  2893 |               4.41 |        88.47 |
+--------------+-------------------+------------+-------+--------------------+--------------+

Unfortunately, the INFORMATION_SCHEMA tables report total size of secondary indexes for a table, but not the size of each index individually. Therefore this query shows the percent of index only for the primary index (which is also the clustered index, i.e. the table itself), and then all other secondary indexes grouped together.

PERFORMANCE_SCHEMA solution

The PERFORMANCE_SCHEMA also includes some information about the contents of the buffer pool. The MySQL SYS Schema makes it easy to query this. But this view doesn’t calculate the percentage of each table in the buffer pool, nor the percentage of the buffer pool occupied by each table.

mysql> SELECT * FROM sys.innodb_buffer_stats_by_table;
+---------------+----------------------+------------+------------+-------+--------------+-----------+-------------+
| object_schema | object_name          | allocated  | data       | pages | pages_hashed | pages_old | rows_cached |
+---------------+----------------------+------------+------------+-------+--------------+-----------+-------------+
| test          | foo                  | 149.64 MiB | 106.19 MiB |  9577 |         9577 |      9577 |     1050490 |
| InnoDB System | SYS_TABLES           | 160.00 KiB | 91.24 KiB  |    10 |           10 |        10 |         594 |
| InnoDB System | SYS_INDEXES          | 128.00 KiB | 93.59 KiB  |     8 |            8 |         8 |        1345 |
| InnoDB System | SYS_COLUMNS          | 80.00 KiB  | 47.13 KiB  |     5 |            5 |         5 |         761 |
| InnoDB System | SYS_DATAFILES        | 48.00 KiB  | 16.40 KiB  |     3 |            3 |         3 |         246 |
| InnoDB System | SYS_FIELDS           | 48.00 KiB  | 16.02 KiB  |     3 |            3 |         3 |         377 |
| InnoDB System | SYS_FOREIGN          | 48.00 KiB  | 0 bytes    |     3 |            3 |         3 |           0 |
| InnoDB System | SYS_TABLESPACES      | 48.00 KiB  | 15.83 KiB  |     3 |            3 |         3 |         242 |
| InnoDB System | SYS_FOREIGN_COLS     | 16.00 KiB  | 0 bytes    |     1 |            1 |         1 |           0 |
. . .
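
If you do want those percentages, the unformatted x$ counterpart of the view can be combined with the buffer pool size. A rough sketch, assuming your sys schema version ships x$innodb_buffer_stats_by_table (where allocated is in bytes rather than a pretty-printed string):

SELECT object_schema, object_name,
       ROUND(100 * allocated / @@innodb_buffer_pool_size, 2) AS pct_of_buffer_pool
FROM sys.x$innodb_buffer_stats_by_table
ORDER BY allocated DESC
LIMIT 10;

Getting the percentage of each table that is cached would additionally require the table and index sizes from the TABLES view, as in the queries above.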


InnoDB crash recovery speed in MySQL 5.6

MySQL Performance Blog - Wed, 2014-12-24 14:43

It has been a while since I have looked at InnoDB crash recovery. A lot has changed in the last few years – we have serious crash recovery performance improvements in MySQL 5.5 and MySQL 5.6, we have solid state drives rising as the typical high-performance IO subsystem, and we also have the ability to set much larger log files and often have a much larger InnoDB Buffer Pool to work with.

First let me revisit the challenge we have with InnoDB configuration. For write-intensive workloads it is extremely important to size innodb_log_file_size for good performance, however the larger the log file size, the longer you might have to wait for InnoDB to complete crash recovery, which impacts your recovery strategy.

How much can innodb_log_file_size impact performance? Massively! Doing intensive writes to a database that fits well in memory, I’d say there’s a 10x difference between a combined log file size of 32GB vs 64MB.

Before we look at some performance numbers let’s talk about what InnoDB Crash Recovery time depends on:

Combined InnoDB Log File Size – innodb_log_file_size*innodb_log_files_in_group is what really matters. It does not really matter which of those two you change. I prefer to keep innodb_log_files_in_group at its default and only work with innodb_log_file_size. The larger the size you have allocated, the longer recovery will take.

innodb_checkpoint_age – The combined size of the InnoDB log files defines how many changes not yet reflected in the tablespace we may have, while innodb_checkpoint_age shows how many changes we actually have at the current moment, and that is the actual driving factor of recovery time (a quick way to watch it is sketched below, after these factors). If you have very large log files allocated but for your workload innodb_checkpoint_age stays low, chances are recovery will be quick. Be careful however – intensive writes can cause innodb_checkpoint_age to go much higher than the average for your workload, causing recovery from crashes at that time to be much longer.

InnoDB Buffer Pool Size – This is another very important factor. During recovery, InnoDB has to redo changes to the unflushed/dirty pages from the buffer pool, which is obviously limited by the buffer pool size. This also means innodb_max_dirty_pages_pct can be used to impact recovery speed, the number of dirty pages being the true driving factor. With a small buffer pool, or a workload that only produces a limited number of dirty pages, innodb_checkpoint_age may not go high even if you have allowed for large log space.

Data Structure matters a lot for recovery speed. Generally, shorter rows being updated will mean longer recovery time for the same log file size. This should make sense, as shorter row changes mean less log space is produced for the same amount of page changes. If you do a lot of blob writes, InnoDB crash recovery can be short even with relatively large log files.

Access Pattern is another key factor – the more “random” the access is, the more distinct pages you will have touched during the same innodb_checkpoint_age and the longer recovery can take.

Hardware – Better hardware means recovery goes faster, as is obvious. More specifically you will be looking for storage performance at low concurrency (both reads and writes are important) as well as fast CPU cores – crash recovery at this point is not able to use multiple cores effectively.
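
To see where your workload actually sits on the checkpoint age and dirty page factors, you can watch a couple of counters. A minimal sketch (Innodb_checkpoint_age is a Percona Server status counter; on a stock MySQL build you can derive the same number from SHOW ENGINE INNODB STATUS as “Log sequence number” minus “Last checkpoint at”):

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_checkpoint_age';          -- Percona Server builds
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'; -- dirty pages right now
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct';
mysql> SHOW ENGINE INNODB STATUS\G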

Let’s now look at the test….
I am running Sysbench on an 11GB table, designed to fit in the 12GB buffer pool. Here is the exact command:

sysbench --tx-rate=4000 --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=uniform --oltp-table-size=40000000 --mysql-user=root --mysql-password=password --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

The box is a rather low-end i3-4010U (4 threads) CPU with a Samsung EVO 840GB SSD, so numbers are expected to be higher on real server hardware.

In my first test I’m injecting 4,000 updates/sec, which is about half of what the box can do at a sustained load. I do this to illustrate a more common load scenario, as we rarely run systems at their saturation point in the real world. The uniform distribution should mean a worst-case scenario for in-memory workloads, though I think recovery speed would be slower if you use random writes to a database much larger than the amount of memory.

At this workload I’m getting an innodb_checkpoint_age of 15GB even though the total log file size is 32GB. Crashing the system results in about 40 minutes of recovery time, so the log was processed at about 6.25MB/sec.

Here are some interesting graphs:

As you can see, recovery essentially runs close to a single core. It can also be CPU bound at times (and will be more so with faster storage) – at certain points of recovery, when logs are being scanned, it can be completely CPU bound (see how IO wait essentially goes to zero at some times).

Over time as recovery progresses more and more blocks become cached, so they do not have to be read from the disk for log records to be applied, meaning the workload becomes more and more write bound.

This is an unweighted IO utilization graph where 1000 corresponds to 100% of time where at least one IO request was outstanding. As you can see from this and the previous graph, InnoDB does not keep the IO subsystem well saturated all the time during crash recovery.

In addition to the first sysbench crash test I did two more – one running a system completely saturated with updates. This made innodb_checkpoint_age go as high as 23.2GB, and crash recovery took 1h 20min, showing some 4.8MB/sec of log processing speed. The thing to consider in this case is that MySQL was not able to keep up with purging the history, so it was growing quickly, meaning crash recovery had to cover a lot of undo space modifications.

Finally, I also did a run with a more skewed pareto distribution, which resulted in a 9.8GB innodb_checkpoint_age, a 33-minute crash recovery time and 4.94MB/sec of log processing speed.

As I explained above there are a lot of moving parts so your numbers are likely to be quite different, yet I hope this can provide some reasonable baseline you can use for calculation.

Note also that waiting for the server to recover from the crash is only one way to deal with recovery. Even if you size log files to be very small, you will likely need to deal with Operating System boot and buffer pool warmup, which will take a few minutes. It is often much better to use a different primary method of crash recovery, such as failover to a MySQL Replication Slave or using Percona XtraDB Cluster. If you use these methods you can often use quite a large combined InnoDB log file size to optimize for performance.

Final Thoughts: Even though InnoDB Crash Recovery has improved in MySQL 5.5 and MySQL 5.6 there is still room to improve it even more. As we see from the resource usage graphs during recovery there is an opportunity to both use multiple CPU cores more effectively as well as drive IO subsystem with higher concurrency and in more sustained fashion.


File carving methods for the MySQL DBA

MySQL Performance Blog - Tue, 2014-12-23 16:31

This is a long overdue blog post from London’s 44con Cyber Security conference back in September. A lot of old memories were brought to the front as it were; the one I’m going to cover in this blog post is: file carving.

So what is file carving? Despite the terminology it’s not going to be a full roast dinner; unless you have an appetite for data, which, as you’re here, I’m assuming you have.

The TL;DR of “what is file carving” is taking a target blob of data (often a multi GB / TB file) and reducing it into targeted pieces of data. This could be, for instance, grabbing all the jpeg images in a packet capture / mysqldump, or pulling a single table/schema out of a huge mysqldump taken with --all-databases (if you’re not using mydumper you really should, it avoids issues like this!) – aka “sorting the wheat from the chaff”.

Let’s take an example: at the time of writing this post I am looking to extract a single schema out of one such mysqldump --all-databases file of around 2GB (2GB of course isn’t large, however it’s large enough to give a practical example; the methods for larger files are of course the same). So where to start?

You’ll need the following tools installed:

  1. xxd (you can substitute xxd for od, hexer or any other hex editing / viewing tool you are comfortable with, just make sure it can handle very large files)
  2. grep

Let’s carve out the mysql schema

dbusby@kali:~$ xxd yourdumpfile.sql | grep 'mysql' -B5 | grep 'ASE' -A2 -B2
00003c0: 6e74 2044 6174 6162 6173 653a 2060 6d79  nt Database: `my
00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520  sql`.--..CREATE 
00003e0: 4441 5441 4241 5345 202f 2a21 3332 3331  DATABASE /*!3231
00003f0: 3220 4946 204e 4f54 2045 5849 5354 532a  2 IF NOT EXISTS*
0000400: 2f20 606d 7973 716c 6020 2f2a 2134 3031  / `mysql` /*!40

Wonderful, so we have a hex representation of the sql dumpfile. Why on earth do we want the hex? We need to define our offsets. In short, our offsets are the positions of the start and end of the chunk we intend to carve from the file.

From the above, our start offset is 00003d9, at the start of CREATE DATABASE. For those unfamiliar with hexdump outputs I recommend looking at the tool hexer, a vi-like tool: press v to enter visual selection mode, select a few characters, and you’ll note something as follows: “visual selection:  0x000003d9 – …”.

You can of course work out the range visually from the above: 00003d0 is the start of the line, each alphanumeric pair is a single byte, and the byte offset notation is hexadecimal (0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f).

00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520  sql`.--..CREATE 
So 00003d0 == s, 00003d1 == q, 00003d2 == l, and so on. We can easily verify this using xxd:
dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | head -n3
00003d9: 4352 4541 5445 2044 4154 4142 4153 4520 CREATE DATABASE
00003e9: 2f2a 2133 3233 3132 2049 4620 4e4f 5420 /*!32312 IF NOT
00003f9: 4558 4953 5453 2a2f 2060 6d79 7371 6c60 EXISTS*/ mysql

Right, so now we need the end offset. As above, we establish a search pattern: since the schema data we're carving is in the midst of a larger file, we can look for the start of the dump of the next schema.


dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | grep '--' -A5 | grep C -A2 -B2 | less
...
0083b19: 2043 7572 7265 6e74 2044 6174 6162 6173 Current Databas
0083b29: 653a 2060 7065 7263 6f6e 6160 0a2d 2d0a e: nextschema`.--.
...

I’ve piped into less here as there were many matches to the grep patterns.

From the above we can see a potential offset of 0x83b19, however we want to “backtrack” a few bytes to before the -- comment start.


dbusby@kali:~$ xxd -s 0x83b14 yourdumpfile.sql | head -n1
0083b14: 2d2d 0a2d 2d20 4375 7272 656e 7420 4461 --.-- Current Da

Excellent, we have our offsets: starting at 0x3d9 and ending at 0x83b14. We now need to convert base16 (hexadecimal) into base10; fortunately we can do this using the bc utility very easily, however we will need to fully expand and upper-case our offsets.


dbusby@kali:~$ echo 'ibase=16;00003D9' | bc
985
dbusby@kali:~$ echo 'ibase=16;0083B14' | bc
539412
dbusby@kali:~$ echo '539412-985' | bc
538427
dbusby@kali:~$ dd if=yourdumpfile.sql of=mysql.sql skip=985 bs=1 count=538427
538427+0 records in
538427+0 records out
538427 bytes (538 kB) copied, 1.08998 s, 494 kB/s

Let’s discuss this a little. What we have done here is convert our start offset to a base10 count of bytes to offset by when using dd (skip=985); we then convert the end offset to its base10 byte position, and subtracting the start offset’s base10 value from it gives us the size of the chunk we are carving.

We now put this into a dd command line, and voila! We have a mysql.sql file which contains only the mysql schema’s portion of the dump.
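
As a side note, bash can do the base conversion and the subtraction itself if bc is not handy; a small sketch with the same offsets:

dbusby@kali:~$ echo $((0x83B14 - 0x3D9))
538427
dbusby@kali:~$ dd if=yourdumpfile.sql of=mysql.sql bs=1 skip=$((0x3D9)) count=$((0x83B14 - 0x3D9))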

I hope this post helps somewhat to demystify file carving; the above techniques can be applied to any form of file carving need and are not limited to MySQL dump files.


Testing backup locks during Xtrabackup SST on Percona XtraDB Cluster

MySQL Performance Blog - Mon, 2014-12-22 13:00
Background on Backup Locks

I was very excited to see Backup locks support in release notes for the latest Percona XtraDB Cluster 5.6.21 release. For those who are not aware, backup locks offer an alternative to FLUSH TABLES WITH READ LOCK (FTWRL) in Xtrabackup. While Xtrabackup can hot-copy Innodb, everything else in MySQL must be locked (usually briefly) to get a consistent snapshot that lines up with Innodb. This includes all other storage engines, but also things like table schemas (even on Innodb) and async replication binary logs. You can skip this lock, but it isn’t generally considered a ‘safe’ backup in every case.

Until recently, Xtrabackup (like most other backup tools) used FTWRL to accomplish this. This worked great, but had the unfortunate side-effect of locking every single table, even the Innodb ones.  This functionally meant that even a hot-backup tool for Innodb had to take a (usually short) global lock to get a consistent backup with MySQL overall.

Backup locks change that by introducing a new locking command on Percona Server called ‘LOCK TABLES FOR BACKUP’.  This works by locking writes to non-transactional tables, as well as locking DDL on all tables (including Innodb).  If Xtrabackup (of a recent vintage) detects that it’s backing up a Percona Server (also of recent vintage), it will automatically use LOCK TABLES FOR BACKUP instead of FLUSH TABLES WITH READ LOCK.
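
If you want to see the difference for yourself, a quick manual test on a recent Percona Server is enough. A minimal sketch (test.t is a hypothetical InnoDB table; the lock is released with a plain UNLOCK TABLES):

-- session 1
mysql> LOCK TABLES FOR BACKUP;

-- session 2: InnoDB DML keeps working while the backup lock is held...
mysql> INSERT INTO test.t (val) VALUES ('still writable');
-- ...while DDL and writes to non-transactional tables block until session 1 runs:

-- session 1
mysql> UNLOCK TABLES;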

The TL;DR of this is that you can keep on modifying your Innodb data through the entire backup, since we don’t need to use FTWRL any longer.

This feature was introduced in Percona Server 5.6.16-64.0 and Percona XtraBackup 2.2.  I do not believe you will find it in any other MySQL variant, though I could be corrected.

What this means for Percona XtraDB Cluster (PXC)

The most common (and logical) SST method for Percona XtraDB Cluster is using Xtrabackup. This latest release of PXC includes support for backup locks, meaning that Xtrabackup donor nodes will no longer need to get a global lock. Practically for PXC users, this means that your Donor nodes can stay in rotation without causing client interruptions due to FTWRL.

Seeing it in action

To test this out, I spun up a 3-node cluster on AWS and fired up a sysbench run on the first node. I forced an SST on the node. Here is a snippet of the innobackup.backup.log (generated by all Xtrabackup donors in Percona XtraDB Cluster):

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy and Percona LLC and/or its affiliates 2009-2013. All Rights Reserved. This software is published under the GNU GENERAL PUBLIC LICENSE Version 2, June 1991. Get the latest version of Percona XtraBackup, documentation, and help resources: http://www.percona.com/xb/p 141218 19:22:01 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtraback up;mysql_socket=/var/lib/mysql/mysql.sock' as 'sst' (using password: YES). 141218 19:22:01 innobackupex: Connected to MySQL server 141218 19:22:01 innobackupex: Starting the backup operation IMPORTANT: Please check that the backup run completes successfully. At the end of a successful backup run innobackupex prints "completed OK!". innobackupex: Using server version 5.6.21-70.1-56 innobackupex: Created backup directory /tmp/tmp.Rm0qA740U3 141218 19:22:01 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/etc/my.cnf" --defaults-group="mysqld" --backup --suspe nd-at-end --target-dir=/tmp/tmp.dM03LgPHFY --innodb_data_file_path="ibdata1:12M:autoextend" --tmpdir=/tmp/tmp.dM03LgPHFY --extra-lsndir='/tmp/tmp.dM 03LgPHFY' --stream=xbstream innobackupex: Waiting for ibbackup (pid=21892) to suspend innobackupex: Suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_suspended_2' xtrabackup version 2.2.7 based on MySQL server 5.6.21 Linux (x86_64) (revision id: ) xtrabackup: uses posix_fadvise(). xtrabackup: cd to /var/lib/mysql xtrabackup: open files limit requested 0, set to 5000 xtrabackup: using the following InnoDB configuration: xtrabackup: innodb_data_home_dir = ./ xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend xtrabackup: innodb_log_group_home_dir = ./ xtrabackup: innodb_log_files_in_group = 2 xtrabackup: innodb_log_file_size = 1073741824 xtrabackup: using O_DIRECT >> log scanned up to (10525811040) xtrabackup: Generating a list of tablespaces [01] Streaming ./ibdata1 >> log scanned up to (10529368594) >> log scanned up to (10532685942) >> log scanned up to (10536422820) >> log scanned up to (10539562039) >> log scanned up to (10543077110) [01] ...done [01] Streaming ./mysql/innodb_table_stats.ibd [01] ...done [01] Streaming ./mysql/innodb_index_stats.ibd [01] ...done [01] Streaming ./mysql/slave_relay_log_info.ibd [01] ...done [01] Streaming ./mysql/slave_master_info.ibd [01] ...done [01] Streaming ./mysql/slave_worker_info.ibd [01] ...done [01] Streaming ./sbtest/sbtest1.ibd >> log scanned up to (10546490256) >> log scanned up to (10550321726) >> log scanned up to (10553628936) >> log scanned up to (10555422053) [01] ...done ... [01] Streaming ./sbtest/sbtest17.ibd >> log scanned up to (10831343724) >> log scanned up to (10834063832) >> log scanned up to (10837100278) >> log scanned up to (10840243171) [01] ...done xtrabackup: Creating suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_suspended_2' with pid '21892' >> log scanned up to (10843312323) 141218 19:24:06 innobackupex: Continuing after ibbackup has suspended 141218 19:24:06 innobackupex: Executing LOCK TABLES FOR BACKUP... 
141218 19:24:06 innobackupex: Backup tables lock acquired 141218 19:24:06 innobackupex: Starting to backup non-InnoDB tables and files innobackupex: in subdirectories of '/var/lib/mysql/' innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (74 files) >> log scanned up to (10846683627) >> log scanned up to (10847773504) innobackupex: Backing up files '/var/lib/mysql//sbtest/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (21 files) innobackupex: Backing up file '/var/lib/mysql//test/db.opt' innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (53 files) >> log scanned up to (10852976291) 141218 19:24:09 innobackupex: Finished backing up non-InnoDB tables and files 141218 19:24:09 innobackupex: Executing LOCK BINLOG FOR BACKUP... 141218 19:24:09 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS... 141218 19:24:09 innobackupex: Waiting for log copying to finish >> log scanned up to (10856996124) xtrabackup: The latest check point (for incremental): '9936050111' xtrabackup: Stopping log copying thread. .>> log scanned up to (10856996124) xtrabackup: Creating suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_log_copied' with pid '21892' 141218 19:24:10 innobackupex: Executing UNLOCK BINLOG 141218 19:24:10 innobackupex: Executing UNLOCK TABLES 141218 19:24:10 innobackupex: All tables unlocked 141218 19:24:10 innobackupex: Waiting for ibbackup (pid=21892) to finish xtrabackup: Transaction log of lsn (9420426891) to (10856996124) was copied. innobackupex: Backup created in directory '/tmp/tmp.Rm0qA740U3' 141218 19:24:30 innobackupex: Connection to database server closed 141218 19:24:30 innobackupex: completed OK!

We can see the LOCK TABLES FOR BACKUP issued at 19:24:06 and unlocked at 19:24:10. Let’s see Galera apply stats from this node during that time:

mycluster / ip-10-228-128-220 (idx: 0) / Galera 3.8(rf6147dd) Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms 19:23:55 P 5 3 Dono 698µs 0 72 0 5418 0.0 3.5M 0 0 187k 94 3k 3 2 0 19:23:56 P 5 3 Dono 701µs 0 58 0 5411 0.0 3.5M 0 0 188k 229 3k 3 2 0 19:23:57 P 5 3 Dono 701µs 0 2 0 5721 0.0 3.7M 0 0 188k 120 3k 3 2 0 19:23:58 P 5 3 Dono 689µs 0 5 0 5643 0.0 3.6M 0 0 188k 63 3k 3 2 0 19:23:59 P 5 3 Dono 679µs 0 55 0 5428 0.0 3.5M 0 0 188k 115 3k 3 2 0 19:24:01 P 5 3 Dono 681µs 0 1 0 4623 0.0 3.0M 0 0 188k 104 3k 3 2 0 19:24:02 P 5 3 Dono 690µs 0 0 0 4301 0.0 2.7M 0 0 188k 141 3k 3 2 0 19:24:03 P 5 3 Dono 688µs 0 2 0 4907 0.0 3.1M 0 0 188k 227 3k 3 2 0 19:24:04 P 5 3 Dono 692µs 0 44 0 4894 0.0 3.1M 0 0 188k 116 3k 3 2 0 19:24:05 P 5 3 Dono 706µs 0 0 0 5337 0.0 3.4M 0 0 188k 63 3k 3 2 0

Initially the node is keeping up ok with replication. The Down Queue (wsrep_local_recv_queue) is sticking around 0. We’re applying 4-5k transactions per second (Ops Dn). When the backup lock kicks in, we do see an increase in the queue size, but note that transactions are still applying on this node:

19:24:06 P 5 3 Dono 696µs 0 170 0 5671 0.0 3.6M 0 0 187k 130 3k 3 2 0 19:24:07 P 5 3 Dono 695µs 0 2626 0 3175 0.0 2.0M 0 0 185k 2193 3k 3 2 0 19:24:08 P 5 3 Dono 692µs 0 1248 0 6782 0.0 4.3M 0 0 186k 1800 3k 3 2 0 19:24:09 P 5 3 Dono 693µs 0 611 0 6111 0.0 3.9M 0 0 187k 651 3k 3 2 0 19:24:10 P 5 3 Dono 708µs 0 93 0 5316 0.0 3.4M 0 0 187k 139 3k 3 2 0

So this node isn’t locked against InnoDB write transactions; it’s just suffering a bit of IO load while the backup finishes copying its files and such. After this, the backup finishes up and the node goes back to a Synced state pretty quickly:

19:24:11 P 5 3 Dono 720µs 0 1 0 4486 0.0 2.9M 0 0 188k 78 3k 3 2 0 19:24:12 P 5 3 Dono 715µs 0 0 0 3982 0.0 2.5M 0 0 188k 278 3k 3 2 0 19:24:13 P 5 3 Dono 1.2ms 0 0 0 4337 0.0 2.8M 0 0 188k 143 3k 3 2 0 19:24:14 P 5 3 Dono 1.2ms 0 1 0 4901 0.0 3.1M 0 0 188k 130 3k 3 2 0 19:24:16 P 5 3 Dono 1.1ms 0 0 0 5289 0.0 3.4M 0 0 188k 76 3k 3 2 0 19:24:17 P 5 3 Dono 1.1ms 0 42 0 4998 0.0 3.2M 0 0 188k 319 3k 3 2 0 19:24:18 P 5 3 Dono 1.1ms 0 15 0 3290 0.0 2.1M 0 0 188k 75 3k 3 2 0 19:24:19 P 5 3 Dono 1.1ms 0 0 0 4124 0.0 2.6M 0 0 188k 276 3k 3 2 0 19:24:20 P 5 3 Dono 1.1ms 0 4 0 1635 0.0 1.0M 0 0 188k 70 3k 3 2 0 19:24:21 P 5 3 Dono 1.1ms 0 0 0 5026 0.0 3.2M 0 0 188k 158 3k 3 2 0 19:24:22 P 5 3 Dono 1.1ms 0 20 0 4100 0.0 2.6M 0 0 188k 129 3k 3 2 0 19:24:23 P 5 3 Dono 1.1ms 0 0 0 5412 0.0 3.5M 0 0 188k 159 3k 3 2 0 19:24:24 P 5 3 Dono 1.1ms 0 315 0 4567 0.0 2.9M 0 0 187k 170 3k 3 2 0 19:24:25 P 5 3 Dono 1.0ms 0 24 0 5535 0.0 3.5M 0 0 188k 131 3k 3 2 0 19:24:26 P 5 3 Dono 1.0ms 0 0 0 5427 0.0 3.5M 0 0 188k 71 3k 3 2 0 19:24:27 P 5 3 Dono 1.0ms 0 1 0 5221 0.0 3.3M 0 0 188k 256 3k 3 2 0 19:24:28 P 5 3 Dono 1.0ms 0 0 0 5317 0.0 3.4M 0 0 188k 159 3k 3 2 0 19:24:29 P 5 3 Dono 1.0ms 0 1 0 5491 0.0 3.5M 0 0 188k 163 3k 3 2 0 19:24:30 P 5 3 Sync 1.0ms 0 0 0 5540 0.0 3.5M 0 0 188k 296 3k 3 2 0 19:24:31 P 5 3 Sync 992µs 0 106 0 5594 0.0 3.6M 0 0 187k 130 3k 3 2 0 19:24:33 P 5 3 Sync 984µs 0 19 0 5723 0.0 3.7M 0 0 188k 275 3k 3 2 0 19:24:34 P 5 3 Sync 976µs 0 0 0 5508 0.0 3.5M 0 0 188k 182 3k 3 2 0

Compared to Percona XtraDB Cluster 5.5

The Backup Locking is only a feature of Percona XtraDB Cluster 5.6, so if we repeat the experiment on 5.5, we can see a more severe lock:

141218 20:31:19 innobackupex: Executing FLUSH TABLES WITH READ LOCK...
141218 20:31:19 innobackupex: All tables locked and flushed to disk
141218 20:31:19 innobackupex: Starting to backup non-InnoDB tables and files
innobackupex: in subdirectories of '/var/lib/mysql/'
innobackupex: Backing up files '/var/lib/mysql//sbtest/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (21 files)
innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (72 files)
>> log scanned up to (6633554484)
innobackupex: Backing up file '/var/lib/mysql//test/db.opt'
innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (18 files)
141218 20:31:21 innobackupex: Finished backing up non-InnoDB tables and files
141218 20:31:21 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
141218 20:31:21 innobackupex: Waiting for log copying to finish
xtrabackup: The latest check point (for incremental): '5420681649'
xtrabackup: Stopping log copying thread.
.>> log scanned up to (6633560488)
xtrabackup: Creating suspend file '/tmp/tmp.Cq5JRZEFki/xtrabackup_log_copied' with pid '23130'
141218 20:31:22 innobackupex: All tables unlocked

Our lock lasts from 20:31:19 until 20:31:21, so it’s fairly short. Note that with larger databases with more schemas and tables, this can be quite a bit longer. Let’s see the effect on the apply rate for this node:

mycluster / ip-10-229-68-156 (idx: 0) / Galera 2.11(r318911d) Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms 20:31:13 P 5 3 Dono N/A 0 73 0 3493 0.0 1.8M 0 0 1.8m 832 746 2 2 0.0 20:31:14 P 5 3 Dono N/A 0 29 0 3578 0.0 1.9M 0 0 1.8m 850 749 3 2 0.0 20:31:15 P 5 3 Dono N/A 0 0 0 3513 0.0 1.8M 0 0 1.8m 735 743 2 2 0.0 20:31:16 P 5 3 Dono N/A 0 0 0 3651 0.0 1.9M 0 0 1.8m 827 748 2 2 0.0 20:31:17 P 5 3 Dono N/A 0 27 0 3642 0.0 1.9M 0 0 1.8m 840 762 2 2 0.0 20:31:18 P 5 3 Dono N/A 0 0 0 3840 0.0 2.0M 0 0 1.8m 563 776 2 2 0.0 20:31:19 P 5 3 Dono N/A 0 0 0 4368 0.0 2.3M 0 0 1.8m 823 745 2 1 0.0 20:31:20 P 5 3 Dono N/A 0 3952 0 339 0.0 0.2M 0 0 1.8m 678 751 1 1 0.0 20:31:21 P 5 3 Dono N/A 0 7883 0 0 0.0 0.0 0 0 1.8m 678 751 0 0 0.0 20:31:22 P 5 3 Dono N/A 0 4917 0 5947 0.0 3.1M 0 0 1.8m 6034 3k 7 6 0.0 20:31:24 P 5 3 Dono N/A 0 10 0 8238 0.0 4.3M 0 0 1.8m 991 1k 7 6 0.0 20:31:25 P 5 3 Dono N/A 0 0 0 3016 0.0 1.6M 0 0 1.8m 914 754 2 1 0.0 20:31:26 P 5 3 Dono N/A 0 0 0 3253 0.0 1.7M 0 0 1.8m 613 766 1 1 0.0 20:31:27 P 5 3 Dono N/A 0 1 0 3600 0.0 1.9M 0 0 1.8m 583 777 2 1 0.0 20:31:28 P 5 3 Dono N/A 0 0 0 3640 0.0 1.9M 0 0 1.8m 664 750 2 2 0.0

The drop here is more severe and the apply rate hits 0 (and stays there for the duration of the FTWRL).

Implications

Obviously Xtrabackup running on a PXC node will cause some load on the node itself, so there still may be good reasons to keep a Donor node out of rotation from your application.  However, this is less of an issue than it was in the past, where writes would definitely stall on a Donor node and present potentially intermittent stalls to the application.

How you allow applications to start using a Donor node automatically (or not) depends on how you have your HA set up between the application and the cluster.  If you use HAproxy or similar with clustercheck, you can either modify the script itself or change a command line argument. The node is in the Donor/Desynced state below:

[root@ip-10-229-64-35 ~]# /usr/bin/clustercheck clustercheckuser clustercheckpassword!
HTTP/1.1 503 Service Unavailable
Content-Type: text/plain
Connection: close
Content-Length: 44

Percona XtraDB Cluster Node is not synced.
[root@ip-10-229-64-35 ~]# /usr/bin/clustercheck clustercheckuser clustercheckpassword! 1
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.

For those doing their own custom health checking, you basically just need to pass nodes that have a wsrep_local_state_comment of either ‘Synced’ or ‘Donor/Desynced’.
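
A minimal sketch of such a check (this is not the clustercheck script itself; adjust credentials and wrap the exit code in whatever your load balancer expects):

#!/bin/bash
# Accept nodes that are Synced or Donor/Desynced, reject everything else.
STATE=$(mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | awk '{print $2}')
if [ "$STATE" = "Synced" ] || [ "$STATE" = "Donor/Desynced" ]; then
    exit 0   # healthy: keep the node in rotation
else
    exit 1   # unhealthy: pull the node from rotation
fi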


Store UUID in an optimized way

MySQL Performance Blog - Fri, 2014-12-19 14:00

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID,” wrote: “There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I’ve rearranged the timestamp part of UUID (Universal Unique Identifier) and did some benchmarks.

Many people store UUIDs as char(36) and use them as the row identity value (PRIMARY KEY) because a UUID is unique across every table, every database and every server, and allows easy merging of records from different databases. But here comes the problem: using it as a PRIMARY KEY causes the issues described below.

Problems with UUID
  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in PRIMARY KEY order and all the secondary keys also contain the PRIMARY KEY. So having UUID as the PRIMARY KEY makes the indexes bigger, and they may not fit into memory
  • Inserts are random and the data is scattered.

Despite the problems with UUID, people still prefer it because it is UNIQUE across every table and can be generated anywhere. In this blog, I will explain how to store UUID in an efficient way by re-arranging the timestamp part of the UUID.

Structure of UUID

MySQL uses UUID version 1 which is a 128-bit number represented by a utf8 string of five hexadecimal numbers

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on your operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.

The timestamp is mapped as follows:
When the timestamp has the (60 bit) hexadecimal value 1d8eebc58e0a7d7, the following parts of the UUID are set: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits of the timestamp (in 11d8) indicates the UUID version; for time-based UUIDs this is 1.

The fourth and fifth parts would be mostly constant if the UUID is generated from a single server. The first three numbers are based on the timestamp, so they will be monotonically increasing. Let’s rearrange the total sequence, making the UUID closer to sequential. This makes inserts and recent-data lookups faster. Dashes (‘-‘) add nothing, so let’s remove them:
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66
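
To make the mapping concrete, here is the same example UUID pushed through the rearrangement by hand (the SUBSTR positions are the same ones used by the ordered_uuid() function shown in the conclusions below):

SET @uuid = '58e0a7d7-eebc-11d8-9669-0800200c9a66';
SELECT CONCAT(SUBSTR(@uuid, 15, 4), SUBSTR(@uuid, 10, 4), SUBSTR(@uuid, 1, 8),
              SUBSTR(@uuid, 20, 4), SUBSTR(@uuid, 25)) AS rearranged;
-- => 11d8eebc58e0a7d796690800200c9a66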

Benchmarking

I created three tables:

  • events_uuid – UUID binary(16) PRIMARY KEY
  • events_int – Additional BIGINT auto increment column and made it as primary key and index on UUID column
  • events_uuid_ordered – Rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures which insert 25K random rows at a time into the respective tables. There are three more stored procedures which call the random-insert procedures in a loop and also record the time taken to insert 25K rows, and the data and index size after each loop. In total I inserted 25M records.
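
The exact procedures are not reproduced here, but the idea is simple. A minimal sketch of the kind of loop used for the ordered-UUID table (hypothetical procedure name, and only the id/unit_id columns are populated; the real procedures filled the other columns with random values too):

DELIMITER //
CREATE PROCEDURE insert_uuid_ordered_batch(IN rows_to_insert INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < rows_to_insert DO
    INSERT INTO events_uuid_ordered (id, unit_id)
    VALUES (ordered_uuid(uuid()), ordered_uuid(uuid()));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL insert_uuid_ordered_batch(25000);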

    • Data Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Data Size in MB

      The data size for UUID table is more than other two tables.
    • Index Size
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Index Size in MB
    • Total Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Total Size in MB
    • Time taken
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Time Taken in seconds

For the table with UUID as PRIMARY KEY, you can notice that as the table grows big, the time taken to insert rows is increasing almost linearly. Whereas for other tables, the time taken is almost constant.

The size of the UUID table is almost 50% bigger than the Ordered UUID table and 30% bigger than the table with BIGINT as PRIMARY KEY. Comparing the Ordered UUID table and the BIGINT table, the time taken to insert rows and the sizes are almost the same. But they may vary slightly based on the index structure.

root@localhost:~# ls -lhtr /media/data/test/ | grep ibd
-rw-rw---- 1 mysql mysql  13G Jul 24 15:53 events_uuid_ordered.ibd
-rw-rw---- 1 mysql mysql  20G Jul 25 02:27 events_uuid.ibd
-rw-rw---- 1 mysql mysql  15G Jul 25 07:59 events_int.ibd

Table Structure

#1 events_int
CREATE TABLE `events_int` (
  `count` bigint(20) NOT NULL AUTO_INCREMENT,
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
  `event` int(11) DEFAULT NULL,
  `ref_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `campaign_id` binary(16) COLLATE utf8_unicode_ci DEFAULT '',
  `unique_id` binary(16) COLLATE utf8_unicode_ci DEFAULT NULL,
  `user_agent` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `city` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `country` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `demand_partner_id` binary(16) DEFAULT NULL,
  `publisher_id` binary(16) DEFAULT NULL,
  `site_id` binary(16) DEFAULT NULL,
  `page_id` binary(16) DEFAULT NULL,
  `action_at` datetime DEFAULT NULL,
  `impression` smallint(6) DEFAULT NULL,
  `click` smallint(6) DEFAULT NULL,
  `sold_impression` smallint(6) DEFAULT NULL,
  `price` decimal(15,7) DEFAULT '0.0000000',
  `actioned_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `unique_ads` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `notification_url` text COLLATE utf8_unicode_ci,
  PRIMARY KEY (`count`),
  KEY `id` (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#2 events_uuid
CREATE TABLE `events_uuid` (
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
~
~
  PRIMARY KEY (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#3 events_uuid_ordered
CREATE TABLE `events_uuid_ordered` (
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
~
~
  PRIMARY KEY (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Conclusions

 

    • Create function to rearrange UUID fields and use it

DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `ordered_uuid`(uuid BINARY(36))
RETURNS binary(16)
DETERMINISTIC
RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25)));
//
DELIMITER ;

Inserts

INSERT INTO events_uuid_ordered VALUES (ordered_uuid(uuid()),'1','M',....);

Selects

SELECT HEX(uuid),is_active,... FROM events_uuid_ordered ;

    • Define UUID as binary(16) as binary does not have any character set

 



Making HAProxy 1.5 replication lag aware in MySQL

MySQL Performance Blog - Thu, 2014-12-18 15:48

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:

$ less agent.php
<?php
// Connection settings and lag thresholds (reconstructed to match the rules
// described above; adjust the credentials to match the GRANT below).
$port       = $argv[1];
$mysql_port = $argv[2];
$mysql      = "/usr/bin/mysql";
$user       = "haproxy";
$password   = "haproxy_pwd";
$query      = "SHOW SLAVE STATUS";

function set_weight($lag) {
    if ($lag == 'NULL' || $lag == '') {
        return "down";           // replication is not working
    } elseif ($lag < 10) {
        return "up 100%";
    } elseif ($lag >= 10 && $lag < 60) {
        return "up 50%";
    } else
        return "up 5%";
}

set_time_limit(0);
$socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
if (!$socket) {
    echo "$errstr ($errno) \n";
} else {
    while ($conn = stream_socket_accept($socket, 9999999999999)) {
        $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee \"$query\" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
        exec("$cmd", $lag);
        $weight = set_weight($lag[0]);
        unset($lag);
        fputs($conn, $weight);
        fclose($conn);
    }
    fclose($socket);
}
?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that 2 replicas are available (192.168.10.2 and 192.168.10.3) and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
    bind *:3307
    mode tcp
    option tcplog
    log global
    default_backend read_only-back

backend read_only-back
    mode tcp
    balance leastconn
    server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
    server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
Seconds_Behind_Master: 25
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
Seconds_Behind_Master: 0
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,50
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,150

Slave2 down

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
Seconds_Behind_Master: NULL
# echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,DOWN (agent),100
read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding server as down. It is then recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket or add the master as a backup server.

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.
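
For instance, the command the agent shells out to could be swapped for a pt-heartbeat check. A rough sketch, assuming pt-heartbeat is writing to percona.heartbeat and Percona Toolkit is installed on the replicas (adjust --database/--table to your setup; the haproxy MySQL user would then need SELECT on that table rather than REPLICATION CLIENT):

$cmd = "pt-heartbeat --check --host=127.0.0.1 --user=$user --password=$password --port=$mysql_port --database=percona --table=heartbeat";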


InnoDB’s multi-versioning handling can be Achilles’ heel

MySQL Performance Blog - Wed, 2014-12-17 16:05

I believe the InnoDB storage engine architecture is great for a lot of online workloads; however, there are no silver bullets in technology and all design choices have their trade-offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine which means many versions of the single row can exist at the same time. In fact there can be a huge amount of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of SELECT query which is currently running.

In most cases this is not a big deal – if you have many short transactions happening you will have only a few row versions to deal with. If you just use the system for reporting queries but do not modify data aggressively at the same time you also will not have many row versions. However, if you mix heavy updates with slow reporting queries going at the same time you can get into a lot of trouble.

Consider for example an application with a hot row (something like an actively updated counter) which gets 1,000 updates per second, together with some heavy batch job that takes 1,000 seconds to run. In such a case we will have 1M row versions to deal with.

Let’s now talk about how those old-row versions are stored in InnoDB – they are stored in the undo space as an essentially linked list where each row version points to the previous row version together with transaction visibility information that helps to decide which version will be visible by this query. Such design favors short new queries that will typically need to see one of the newer rows, so they do not have to go too far in this linked list. This might not be the case with reporting queries that might need to read rather old row version which correspond to the time when the query was started or logical backups that use consistent reads (think mysqldump or mydumper) which often would need to access such very old row versions.

So going through the linked list of versions is expensive, but how expensive can it get? In this case a lot depends upon whether the UNDO space fits in memory, in which case the list will be traversed efficiently – or it does not, in which case you might be looking at massive disk IO. Keep in mind the undo space is not clustered by PRIMARY KEY, as normal data in InnoDB tables is, so if you’re updating multiple rows at the same time (the typical case) you will be looking at the row-version chain stored in many pages, often as little as one row version per page, requiring either massive IO or a large number of UNDO space pages to be present in the InnoDB Buffer pool.
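
One way to see whether you are accumulating such long version chains is to watch the purge lag, i.e. the history list length. A minimal sketch (the first statement works on any build; the second assumes the counter name used by INNODB_METRICS in 5.6+):

mysql> SHOW ENGINE INNODB STATUS\G   -- look for "History list length" in the TRANSACTIONS section
mysql> SELECT * FROM information_schema.INNODB_METRICS WHERE NAME = 'trx_rseg_history_len'\G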

Where it can get even worse is Index Scan. This is because Indexes are structured in InnoDB to include all row versions corresponding to the key value, current and past. This means for example the index for KEY=5 will contain pointers to all rows that either have value 5 now or had value 5 some time in the past and have not been purged yet. Now where it can really bite is the following – InnoDB needs to know which of the values stored for the key are visible by the current transaction – and that might mean going through all long-version chains for each of the keys.

This is all theory, so lets see how we can simulate such workloads and see how bad things really can get in practice.

I have created a 1 billion row “sysbench” table which takes some 270GB of space, and I will use a small buffer pool – 6GB. I will run sysbench with 64 threads and a pareto distribution (hot rows) while running a full table scan query concurrently: select avg(k) from sbtest1. Here is the exact sysbench run done after prepare.

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user root --mysql-password=password --test /usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the explain for the “reporting” query, which you would think to be a rather efficient index scan query. With just 4 bytes per value, 1 billion values would be just 4GB (really more because of InnoDB overhead) – not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 953860873
        Extra: Using index
1 row in set (0.00 sec)

2 days have passed and the “reporting” query is still running… furthermore the foreground workload started to look absolutely bizarre:

[207850s] threads: 64, tps: 0.20, reads: 7.40, writes: 0.80, response time: 222481.28ms (95%), errors: 0.70, reconnects: 0.00 [207860s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207870s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207880s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207890s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207900s] threads: 64, tps: 2.70, reads: 47.60, writes: 11.60, response time: 268815.49ms (95%), errors: 0.00, reconnects: 0.00 [207910s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207920s] threads: 64, tps: 2.30, reads: 31.60, writes: 9.50, response time: 294954.28ms (95%), errors: 0.00, reconnects: 0.00 [207930s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207940s] threads: 64, tps: 2.90, reads: 42.00, writes: 12.20, response time: 309332.04ms (95%), errors: 0.00, reconnects: 0.00 [207950s] threads: 64, tps: 0.20, reads: 4.60, writes: 1.00, response time: 318922.41ms (95%), errors: 0.40, reconnects: 0.00 [207960s] threads: 64, tps: 0.20, reads: 1.90, writes: 0.50, response time: 335170.09ms (95%), errors: 0.00, reconnects: 0.00 [207970s] threads: 64, tps: 0.60, reads: 13.20, writes: 2.60, response time: 292842.88ms (95%), errors: 0.60, reconnects: 0.00 [207980s] threads: 64, tps: 2.60, reads: 37.60, writes: 10.20, response time: 351613.43ms (95%), errors: 0.00, reconnects: 0.00 [207990s] threads: 64, tps: 5.60, reads: 78.70, writes: 22.10, response time: 186407.21ms (95%), errors: 0.00, reconnects: 0.00 [208000s] threads: 64, tps: 8.10, reads: 120.20, writes: 32.60, response time: 99179.05ms (95%), errors: 0.00, reconnects: 0.00 [208010s] threads: 64, tps: 19.50, reads: 280.50, writes: 78.90, response time: 27559.69ms (95%), errors: 0.00, reconnects: 0.00 [208020s] threads: 64, tps: 50.70, reads: 691.28, writes: 200.70, response time: 5214.43ms (95%), errors: 0.00, reconnects: 0.00 [208030s] threads: 64, tps: 77.40, reads: 1099.72, writes: 311.31, response time: 2600.66ms (95%), errors: 0.00, reconnects: 0.00 [208040s] threads: 64, tps: 328.20, reads: 4595.40, writes: 1313.40, response time: 911.36ms (95%), errors: 0.00, reconnects: 0.00 [208050s] threads: 64, tps: 538.20, reads: 7531.90, writes: 2152.10, response time: 484.46ms (95%), errors: 0.00, reconnects: 0.00 [208060s] threads: 64, tps: 350.70, reads: 4913.45, writes: 1404.09, response time: 619.42ms (95%), errors: 0.00, reconnects: 0.00 [208070s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208080s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208090s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208100s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208110s] threads: 64, tps: 1.60, reads: 24.20, writes: 6.80, response time: 42385.40ms (95%), errors: 0.10, reconnects: 0.00 [208120s] threads: 64, tps: 0.80, reads: 28.20, writes: 3.40, response time: 51381.54ms (95%), errors: 2.80, reconnects: 0.00 [208130s] threads: 64, tps: 0.00, 
reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208140s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208150s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208160s] threads: 64, tps: 0.60, reads: 14.20, writes: 2.40, response time: 93248.04ms (95%), errors: 0.80, reconnects: 0.00

As you can see we have long stretches of time when no queries are completed at all… followed by some spikes of higher performance. This is how it looks on the graph:

Corresponding CPU usage:

This shows that we are not only observing something we would expect given InnoDB’s design, but there also seems to be some severe starvation happening in this case, which we can confirm:

Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790809552640 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file row0sel.cc line 4125 Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790804735744 has waited at row0sel.cc line 3506 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file row0sel.cc line 4125 Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790810756864 has waited at row0sel.cc line 4125 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file row0sel.cc line 4125 Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790811158272 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive

Waiting for a given buffer pool block to become available for more than 3 minutes is a big issue – this lock should never be held for more than a few microseconds.

SHOW PROCESSLIST confirms that even the most basic selects by primary key can get stalled for a long time:

| 5499 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5500 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5501 | root | localhost | sbtest | Query | 185 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5502 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5503 | root | localhost | sbtest | Query | 14 | statistics | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 ORDER BY c | 0 | 0 | | 5504 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5505 | root | localhost | sbtest | Query | 14 | updating | UPDATE sbtest1 SET k=k+1 WHERE id=1 | 0 | 0 | | 5506 | root | localhost | sbtest | Query | 236 | updating | DELETE FROM sbtest1 WHERE id=1 | 0 | 0 | | 5507 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5508 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 | 0 | 0 | | 5509 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5510 | root | localhost | sbtest | Query | 14 | updating | UPDATE sbtest1 SET c='80873149502-45132831358-41942500598-17481512835-07042367094-39557981480-593123 | 0 | 0 | | 5511 | root | localhost | sbtest | Query | 236 | updating | UPDATE sbtest1 SET k=k+1 WHERE id=18 | 0 | 1 | | 5512 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=7 | 0 | 0 | | 6009 | root | localhost | sbtest | Query | 195527 | Sending data | select avg(k) from sbtest1 | 0 | 0 |

How do I know it is an undo-space-related issue in this case? Because the undo log ends up taking the majority of the buffer pool:

mysql> select page_type,count(*) from INNODB_BUFFER_PAGE group by page_type; +-------------------+----------+ | page_type | count(*) | +-------------------+----------+ | EXTENT_DESCRIPTOR | 1 | | FILE_SPACE_HEADER | 1 | | IBUF_BITMAP | 559 | | IBUF_INDEX | 855 | | INDEX | 2186 | | INODE | 1 | | SYSTEM | 128 | | UNDO_LOG | 382969 | | UNKNOWN | 6508 | +-------------------+----------+ 9 rows in set (1.04 sec)

And it does so in a very crazy way – when there is almost no work being done, the UNDO_LOG content of the buffer pool grows very rapidly, while when we’re getting some work done the INDEX type pages take up a lot more space. To me this suggests that as the index scan progresses it touches some hot rows and some not-so-hot ones, which contain fewer row versions and so do not put much pressure on the undo space.
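If you want to watch that shift happen yourself, a simple sketch is to sample the same INFORMATION_SCHEMA query in a loop (note that querying INNODB_BUFFER_PAGE is itself not free on a large buffer pool, so keep the interval generous):

# Track how the buffer pool composition changes over time
while true; do
    mysql -N information_schema -e \
      "SELECT NOW(), page_type, COUNT(*) FROM INNODB_BUFFER_PAGE GROUP BY page_type;"
    sleep 30
done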

Take away: Now you might argue that this workload and situation are rather artificial and rather narrow. They may well be. My main point here is this: if you’re looking at just part of your workload, such as your main short application queries, and not taking reporting or backups into account “because their performance is not important,” you might be in for a big surprise. Those background activities might be taking much longer than you would expect, and in addition they might have a much more severe impact on your main application workload, as in the case above.

P.S.: I’ve done more experiments to validate how bad the problem really is, and I can repeat it rather easily even without putting the system into overdrive. Even if I run sysbench injecting just 25% of the transactions the system is capable of handling at peak, the “select avg(k) from sbtest1” query on the 1 billion row table never completes, as it looks like new entries are injected into the index faster than InnoDB can examine which of them are visible.

The post InnoDB’s multi-versioning handling can be Achilles’ heel appeared first on MySQL Performance Blog.

Categories: MySQL

OpenStack Live tutorials & sessions to bring OpenStack users up to speed

MySQL Performance Blog - Tue, 2014-12-16 17:28

I attended the OpenStack Paris summit last month (Percona had a booth there). It was my first opportunity to meet face-to-face with this thriving community of developers and users. I’m proud that Percona is part of this open source family and look forward to reconnecting with many of the developers and users I met in Paris – as well as meeting new faces – at OpenStack Live in Silicon Valley April 13-14.

OpenStack summits, generally held twice a year, are the place where (for the most part) developers meet and design “in the open,” as the OpenStack organization says. OpenStack Live 2015, held in parallel with the annual Percona Live MySQL Conference and Expo, will be a unique opportunity for users and enthusiasts to learn from leading OpenStack experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

OpenStack Live will also provide some serious classroom-style learning. Percona announced the OpenStack Live tutorial sessions a couple of days ago. Most sessions are three hours long and, because they really are “hands-on,” require that you bring your laptop – and a power cord (not to be confused with a “power chord,” though those are also welcome).

Let’s take a closer look at the OpenStack Live tutorial sessions.

“Barbican: Securing Your Secrets.” Join Rackspace gurus Douglas Mendizábal, Chelsea Winfree and Steve Heyman on a tour through the magical world of Barbican (yes, they are dedicated members of the Barbican project).

Don’t be intimidated if you don’t have any previous experience with Barbican (and if you’ve never heard of it, all the more reason to attend!). A basic understanding of security components (such as keys and certificates) and a basic understanding of ReST is helpful, but not required.

By the end of the class you will know:
1)   Importance of secret storage
2)   How to store & retrieve secrets with Barbican
3)   How to submit an order with Barbican
4)   How to create a container
5)   Use cases for Barbican / Examples
6)   The future of Barbican –Ordering SSL Certs

“Deploying, Configuring and Operating OpenStack Trove.” As the title suggests, these three hours focus squarely on Trove. The tutorial – led by Tesora founder & CTO Amrith Kumar, along with Doug Shelley, the company’s vice president of product development – will begin with a quick overview of OpenStack and the various services.

If you attend this tutorial you’ll actually deploy your own OpenStack environment – and create and manage a Nova (compute) instance using a command line and a graphical user interface (Horizon). And the fun continues! You’ll then install and configure Trove, and create and manage a single MySQL instance. Finally, pupils will create and operate a simple replicated MySQL instance pair and ensure that data is being properly replicated from master to slave.

“Essential DevStack.” DevStack is an opinionated script to quickly create an OpenStack development environment. It can also be used to demonstrate starting/running OpenStack services and to provide examples of using them from a command line. The power of DevStack lies in a few small tricks that, once understood, can hugely improve contribution effectiveness, quality and the time required. This three-hour tutorial will be led by Red Hat senior software engineer Swapnil Kulkarni.

“OpenStack Networking Introduction,” with PLUMgrid’s Valentina Alaria and Brendan Howes. Buckle your seat belts! Designed for IT professionals looking to expand their OpenStack “networking” (no, not the LinkedIn sort of networking) knowledge, OpenStack Networking Fundamentals will be a comprehensive and fast-paced course.

This half-day training provides an overview of OpenStack, its components and then dives deep into OpenStack Networking – the features and plugin model and its role in building an OpenStack Cloud. The training is concluded with a hands-on lab to bring all the concepts together.

OpenStack Networking (Neutron) Introduction [1 hour]
– Goal of Neutron
– Architecture of Neutron
– Plugin Architecture
– Use cases for Neutron
– What’s new in Juno & what’s planned for Kilo

OpenStack Networking (Neutron) Advanced [1 hour]
– Interaction with other OpenStack components (Compute & Storage)
– Designing Neutron for HA
– Installing Neutron
– Troubleshooting Neutron

Hands-on Lab [1 hour]
– Creation of tenant networks
– Configuration of external connectivity
– Advanced Features Configuration

Percona’s director of conferences, Kortney Runyan, offered a sneak peek at the OpenStack sessions last week. Attendees of the Percona Live MySQL Conference and Expo 2015 (April 13-16, 2015) with full-pass access are also welcome to attend OpenStack Live sessions. The two conferences are running in parallel, which is very exciting since there will be crossover opportunities between them.

I hope to see you next April! And be sure to take advantage of Early Bird pricing for OpenStack Live (register here).
And if you are an organizer of an OpenStack (or MySQL) Meetup and need some financial help, Percona is happy to chip in as a sponsor. Just let me know and I’ll work with you to set that up! You can drop me a note in the comments and I’ll contact you offline.

The post OpenStack Live tutorials & sessions to bring OpenStack users up to speed appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL Tutorials: A time to learn at Percona Live 2015

MySQL Performance Blog - Mon, 2014-12-15 08:00

The many hours of intensive tutorials, led by some of the top minds in MySQL, have always been a major draw each year to the Percona Live MySQL Conference and Expo. And 2015’s event will be no exception.

Percona Live 2015 runs April 13-16 in Santa Clara, Calif. and the first day is dedicated to the classroom – so bring your laptops for the combined 45 hours of learning. MySQL tutorials are included with the full-conference pass but a “tutorial-only pass” is also available. Super-saver registration discounts have been extended until Dec. 19. Here’s a look at this year’s tutorials lineup. (There will be a couple more announced in January).

And that’s just on Monday! There will be much more over the four days of the Percona Live MySQL Conference and Expo 2015. I posted a sneak peek of the full Percona Live (initial) roster a couple of weeks ago. And remember, super-saver registration discounts have been extended until Dec. 19, so register now – and don’t forget your laptop (and power cord)!

The post MySQL Tutorials: A time to learn at Percona Live 2015 appeared first on MySQL Performance Blog.

Categories: MySQL

Percona XtraBackup 2.2.7 is now available

MySQL Performance Blog - Wed, 2014-12-10 15:03

Percona is glad to announce the release of Percona XtraBackup 2.2.7 on December 10, 2014. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

Bugs Fixed:

  • A non-default value for the innodb_log_block_size variable would cause an assertion when preparing the backup. Bug fixed #1391216.
  • When Percona XtraBackup ran FLUSH ENGINE LOGS during the backup process on a GTID master, the command was recorded in the slave’s binary log as well, which led to an inconsistency between master and slave. Fixed by adding the NO_WRITE_TO_BINLOG clause to FLUSH ENGINE LOGS to avoid interfering with the binary log and its coordinates. Bug fixed #1394632.
  • Exporting tables was inefficient when backup contained a large (and unrelated) change buffer. Bug fixed #1366065 (Davi Arnaut).
  • innobackupex was printing the GTID even if GTID mode was disabled, which could cause confusion since it wasn’t incrementing. Now it prints the GTID only when GTID mode is enabled; when GTID mode is disabled it prints only the filename and position. innobackupex still prints the GTID, filename and position if a MariaDB server is being backed up. Bug fixed #1391041.

Other bugs fixed: bug #1386157.

Release notes with all the bugfixes for Percona XtraBackup 2.2.7 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.

The post Percona XtraBackup 2.2.7 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Recover MySQL root password without restarting MySQL (no downtime!)

MySQL Performance Blog - Wed, 2014-12-10 08:00

Disclaimer: Do this at your own risk! It doesn’t apply if you’re using pluggable authentication, and it certainly won’t be usable if/when the MySQL system tables are stored in InnoDB.

Recover your root password with care!

What is the situation?

The situation is the classic “need to recover the MySQL root password” but you cannot restart MySQL (because it is the master production server, or for any other reason), which rules out the –skip-grant-tables solution.

 What can I do?

There is a workaround, which is the following:

  •  Launch another instance of mysqld, a small one (without innodb).
  •  Copy your user.[frm|MYD|MYI] files from the original datadir to the datadir of the new instance.
  • Modify them and then copy them back to the original location.

That simple? No, but close. Here is the step by step:

Step by step recovery
  1. Create a new datadir and run mysql_install_db for the new datadir. This one will be removed at the end. Don’t forget to change ownership to mysql user and group:
    [root@machina dbdata]# mkdir datadir [root@machina dbdata]# chown -R mysql:mysql datadir/ [root@machina dbdata]# mysql_install_db --datadir=/dbdata/datadir/ --user=mysql Installing MySQL system tables...OK Filling help tables...OK
  2. Launch the new instance. Be careful with the datadir path, the socket file and the port number. Also, disable InnoDB, you won’t need it, just add –skip-innodb AND –default-storage-engine=myisam:
    [root@machina datadir]# /usr/sbin/mysqld --basedir=/usr --datadir=/dbdata/datadir --plugin-dir=/usr/lib/mysql/plugin --skip-innodb --default-storage-engine=myisam --socket=/var/run/mysqld/mysql2.sock --port=3307 --user=mysql --log-error=/dblogs/log/error2.log --pid-file=/dbdata/data/mysql.pid &
  3. Copy the user.* files from the original mysql instance (the ones that you need to modify) to the new instance’s datadir and login to this instance of mysql:
    [root@machina ~]# cp /dbdata/data/mysql/user.* /dbdata/datadir/mysql/cp: overwrite `/dbdata/datadir/mysql/user.frm'? y cp: overwrite `/dbdata/datadir/mysql/user.MYD'? y cp: overwrite `/dbdata/datadir/mysql/user.MYI'? y [root@machina datadir]# mysql --socket=/var/run/mysqld/mysql2.sock -p Enter password: Welcome to the MySQL monitor.  Commands end with ; or g.
  4. Execute a “flush tables” command, so the user table will be “reopened” and you can see the data and verify:
    mysql2> flush tables; mysql2> select user, host, password from user where user like 'root'; +------+--------------------------------------+------------------------------------------+ | user | host                                 | password                                 | +------+--------------------------------------+------------------------------------------+ | root | localhost                            | 696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | 127.0.0.1                            | 696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | %                                    | 696D727429CC43695423FA5F2F0155D92A0AAC08 | +------+--------------------------------------+------------------------------------------+ 3 rows in set (0.00 sec)
  5. Now, update the password field with the desired value (a sketch of how to generate such a hash follows the step-by-step list):
    mysql2> update mysql.user set password='*696D727429CC43695423FA5F2F0155D92A0AAC08' where user like 'root'; Query OK, 3 rows affected (0.00 sec) Rows matched: 3  Changed: 3  Warnings: 0
  6. Verify again:
    mysql2> select user, host, password from user where user like 'root'; +------+--------------------------------------+-------------------------------------------+ | user | host                                 | password                                  | +------+--------------------------------------+-------------------------------------------+ | root | localhost                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | 127.0.0.1                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | %                                    | *696D727429CC43695423FA5F2F0155D92A0AAC08 | +------+--------------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec)
  7. Flush privileges and verify that the new password is correct, by logging in again:
    mysql2> flush privileges; Query OK, 0 rows affected (0.00 sec)
  8. Now that we have made the changes, we can move back the user.* files to the original location, being extremely careful with owner and privileges:
    [root@machina ~]# cd /dbdata/datadir/mysql/ [root@machina mysql]# cp user.* /dbdata/data/mysql/; chown mysql:mysql /dbdata/data/mysql/user.*; chmod 660 /dbdata/data/mysql/user.* cp: overwrite `/dbdata/data/mysql/user.frm'? y cp: overwrite `/dbdata/data/mysql/user.MYD'? y cp: overwrite `/dbdata/data/mysql/user.MYI'? y
  9. At this moment, you can shut down the new mysql instance since it is no longer needed. Be very, very careful so you don’t end up shutting down your original mysqld!:
    [root@machina datadir]# mysqladmin --socket=/var/run/mysqld/mysql2.sock -p shutdown Enter password: 141120 06:59:14 mysqld_safe mysqld from pid file /dbdata/data/mysql.pid ended
  10. Now, the last step is to execute a “FLUSH PRIVILEGES” in the original mysqld. Since we cannot yet access it, we need to send a SIGHUP signal to mysqld. MySQL responds to this signal by reloading the grant tables and flushing tables, logs, the thread cache, and the host cache, so choose wisely the moment of the day when you want to send the SIGHUP since performance might be degraded (look at “flush tables”). The way to send SIGHUP is to execute the “kill” command with the -1 flag:
    [root@machina datadir]# kill -1 $(/sbin/pidof mysqld)
  11. Finally, login into MySQL as root!:
    [root@machina datadir]# mysql -p Enter password: Welcome to the MySQL monitor.  Commands end with ; or g. Your MySQL connection id is 101208 mysql1> select user, host, password from mysql.user where user like 'root'; +------+--------------------------------------+-------------------------------------------+ | user | host                                 | password                                  | +------+--------------------------------------+-------------------------------------------+ | root | localhost                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | 127.0.0.1                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | %                                    | *696D727429CC43695423FA5F2F0155D92A0AAC08 | +------+--------------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec)
    Can you see your schemas? Of course you can! Your databases are okay!
    mysql1> show databases; +--------------------+ | Database           | +--------------------+ | information_schema | | mysql              | | percona            | | testing            | +--------------------+ 4 rows in set (0.03 sec)

We’ve successfully recovered the MySQL root password without the need to restart MySQL and thus avoid downtime.
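One detail the procedure takes for granted is where the hash used in step 5 comes from. For the classic mysql_native_password scheme it can be generated with the PASSWORD() function; here is a small sketch run against the temporary instance (the plaintext password is just a placeholder, and the full output including the leading ‘*’ is what goes into the UPDATE statement):

# Generate the 41-character hash for the UPDATE in step 5 (placeholder password)
mysql --socket=/var/run/mysqld/mysql2.sock -p -e "SELECT PASSWORD('My_New_Passw0rd');"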

I hope you never face this situation, but in case you do, there’s a workaround to recover your access! Is there another way to perform this?

Share it with the world!

The post Recover MySQL root password without restarting MySQL (no downtime!) appeared first on MySQL Performance Blog.

Categories: MySQL

OpenStack Live 2015: Sneak peek of the April conference

MySQL Performance Blog - Tue, 2014-12-09 17:00

On behalf of the OpenStack Live Conference Committee, I am excited to announce the sneak peek schedule for the inaugural OpenStack Live 2015 Conference! This new annual conference, running in parallel with the already established Percona Live MySQL Conference and Expo, will feature one day of tutorials followed by a full day of breakout sessions April 13-14, in Santa Clara, Calif.

Though the entire conference schedule won’t be finalized until early January, this initial list of talks is sure to spark interest! So without further ado, here is the OpenStack Live 2015 SNEAK PEEK SCHEDULE!

Deploying an OpenStack Cloud at Scale at Time Warner Cable
-Matthew Fischer, Principal Software Engineer for OpenStack DevOps at Time Warner Cable, and Clayton O’Neill, Principal Software Engineer for OpenStack Cloud at Time Warner Cable

An Introduction to Database as a Service with an Emphasis on OpenStack Using Trove
-Amrith Kumar, Founder and CTO of Tesora, Inc., and Tushar Katarki, Director of Product Management at Percona

Lightweight OpenStack Benchmarking Service with Rally and Docker
-Swapnil Kulkarni, Senior Software Engineer at Red Hat

MySQL and OpenStack Deep Dive
-Peter Boros, Principal Architect at Percona

This is just a small taste of what will be presented at OpenStack Live 2015 conference this spring. Take advantage of this unique opportunity to hear from leading experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

As a special bonus, OpenStack Live attendees will also have access to the Percona Live MySQL Conference & Expo keynotes, receptions, exhibition hall, and Birds of a Feather sessions on April 13 and 14, allowing them to dive deeper into MySQL topics such as high availability, security, performance optimization, and much more.

Registration for OpenStack Live 2015 is now open… register now with Early Bird pricing! Hope to see you there!

The post OpenStack Live 2015: Sneak peek of the April conference appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL 5.6 Transportable Tablespaces best practices

MySQL Performance Blog - Tue, 2014-12-09 08:00

In MySQL 5.6 Oracle introduced the Transportable Tablespaces feature (copying tablespaces to another server), and Percona Server adopted it for partial backups, which means you can now take individual database or table backups and your destination server can be a vanilla MySQL server. Moreover, as of Percona Server 5.6, innodb_import_table_from_xtrabackup is obsolete since Percona Server also implements Oracle MySQL’s transportable tablespaces feature, which as I mentioned gives you the ability to copy tablespaces (table.ibd files) between servers. Let me demonstrate this through an example where I take a partial backup of selected tables instead of an entire MySQL server and restore it on a running MySQL server at the destination, without taking it offline.

MySQL 5.6 Transportable Tablespaces with Percona XtraBackup
Percona XtraBackup is open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. You can backup up your databases without sacrificing read/write ability and, on top of that, Percona XtraBackup supports partial backup schemas that correspond to backup-only specific databases or tables instead of taking backups of the entire database server.

For partial backups, the source server from which you are taking the backup must have the innodb_file_per_table option enabled, and the importing server should have innodb_file_per_table and innodb_expand_import enabled – or innodb_import_table_from_xtrabackup (only supported in Percona Server), depending on the Percona Server version, for restoring the database tables. This is all valid up to Percona Server version 5.5, and you can find further details about partial backups here. Percona CTO Vadim Tkachenko wrote a nice post about how to copy InnoDB tables between servers on Percona Server prior to version 5.6.
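A quick way to confirm that a server meets the innodb_file_per_table prerequisite before you start is simply to check the variable (a sketch; run it on both the source and the importing server):

mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';"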

I am going to use Percona Server 5.6 as it supports the Transportable Tablespaces feature. There are two tables in the database irfan, named “test” and “dummy”. I am going to take a backup of only the test table as a partial backup instead of taking a backup of the entire database server.

mysql> show tables; +----------------+ | Tables_in_irfan| +----------------+ | dummy | | test | +----------------+ mysql> show create table test; +-------+--------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+--------------------------------------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `id` int(11) DEFAULT NULL, `name` char(15) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +-------+--------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT COUNT(*) FROM test; +----------+ | COUNT(*) | +----------+ | 1000 | +----------+ 1 row in set (0.00 sec)

I am going to use the latest version of Percona XtraBackup, which supports multiple ways to take partial backups: the –include option, the –tables-file option and the –databases option. I am going to use the –tables-file option to take a backup of a specific database table.

irfan@source$ xtrabackup --version xtrabackup version 2.2.5 based on MySQL server 5.6.21 Linux (x86_64) (revision id: ) irfan@source$ mysql --skip-column-names -e "SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema IN ('irfan') AND TABLE_NAME = 'test';" > /root/tables_to_backup.txt irfan@source$ cat tables_to_backup.txt irfan.test

Now, as shown below, take the backup of the irfan.test table. Note how the –tables-file option is passed so that only the database tables listed in the tables_to_backup.txt file are backed up.

irfan@source$ innobackupex --no-timestamp --tables-file=/root/tables_to_backup.txt /root/partial_backup/ > /root/xtrabackup.log 2>&1 irfan@source$ cat /root/xtrabackup.log . . >> log scanned up to (2453801809) >> log scanned up to (2453801809) [01] ...done [01] Copying ./irfan/test.ibd to /root/partial_backup/irfan/test.ibd [01] ...done xtrabackup: Creating suspend file '/root/partial_backup/xtrabackup_suspended_2' with pid '14442' . . 141101 12:37:27 innobackupex: Starting to backup non-InnoDB tables and files innobackupex: in subdirectories of '/var/lib/mysql' innobackupex: Backing up file '/var/lib/mysql/irfan/test.frm' 141101 12:37:27 innobackupex: Finished backing up non-InnoDB tables and files 141101 12:37:27 innobackupex: Executing LOCK BINLOG FOR BACKUP... 141101 12:37:27 innobackupex: Executing FLUSH ENGINE LOGS... 141101 12:37:27 innobackupex: Waiting for log copying to finish . . xtrabackup: Creating suspend file '/root/partial_backup/xtrabackup_log_copied' with pid '14442' xtrabackup: Transaction log of lsn (2453801809) to (2453801809) was copied. 141101 12:37:28 innobackupex: Executing UNLOCK BINLOG 141101 12:37:28 innobackupex: Executing UNLOCK TABLES 141101 12:37:28 innobackupex: All tables unlocked . . 141101 12:37:28 innobackupex: completed OK!

A successful backup will show you “completed OK” at the end. If you scripted the backup for automation, you can also check whether it succeeded or failed by checking the exit status of the backup script.
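For instance, a minimal wrapper relying on innobackupex’s exit code might look like the sketch below (paths mirror the example above; adjust them to your environment):

#!/bin/sh
innobackupex --no-timestamp --tables-file=/root/tables_to_backup.txt \
    /root/partial_backup/ > /root/xtrabackup.log 2>&1
if [ $? -ne 0 ]; then
    echo "Partial backup FAILED, check /root/xtrabackup.log" >&2
    exit 1
fi
echo "Partial backup completed OK"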

For the next step, we need to prepare the backup, because there might be uncommitted transactions that need to be rolled back or transactions in the log that need to be replayed against the backup. You need to pass the –export option specifically when preparing the backup in order to create the table.exp (used prior to MySQL/Percona Server 5.6) and table.cfg files. You can read more about it in the documentation. You can prepare the backup as below.

irfan@source$ innobackupex --apply-log --export /root/partial_backup/ > /root/xtrabackup-prepare.log 2>&1 irfan@source$ cat /root/xtrabackup-prepare.log . . xtrabackup version 2.2.5 based on MySQL server 5.6.21 Linux (x86_64) (revision id: ) xtrabackup: auto-enabling --innodb-file-per-table due to the --export option xtrabackup: cd to /root/partial_backup xtrabackup: This target seems to be already prepared. . . xtrabackup: export option is specified. xtrabackup: export metadata of table 'irfan/test' to file `./irfan/test.exp` (1 indexes) xtrabackup: name=GEN_CLUST_INDEX, id.low=5043, page=3 . . xtrabackup: starting shutdown with innodb_fast_shutdown = 0 InnoDB: FTS optimize thread exiting. InnoDB: Starting shutdown... InnoDB: Shutdown completed; log sequence number 2453817454 141102 11:25:13 innobackupex: completed OK!

Again, a successfully prepared backup should show you “completed OK” at the end. Once the backup is prepared it is usable and ready to restore. For that, first create the same table structure on the destination as on the source.

mysql [localhost] {msandbox} (irfan) > CREATE TABLE `test` ( -> `id` int(11) DEFAULT NULL, -> `name` char(15) DEFAULT NULL -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0.12 sec) mysql [localhost] {msandbox} (irfan) > show tables; +----------------+ | Tables_in_irfan | +----------------+ | test | +----------------+ 1 row in set (0.00 sec) irfan@destination$ ls -la data/irfan/ total 116 drwx------ 2 root root 4096 Nov 2 16:29 . drwx------ 6 root root 4096 Nov 2 16:23 .. -rw-rw---- 1 root root 8586 Nov 2 16:29 test.frm -rw-rw---- 1 root root 98304 Nov 2 16:29 test.ibd

Now discard the existing tablespace, copy over the test.ibd and test.cfg files which Percona XtraBackup produced after preparing the backup, and import the tablespace on the destination as illustrated below. As a side note, you may need to change the ownership of the test.cfg and test.ibd files to mysql in order to make them accessible to the MySQL server.

mysql [localhost] {msandbox} (irfan) > ALTER TABLE test DISCARD TABLESPACE; Query OK, 0 rows affected (0.10 sec) irfan@source$ cd /root/partial_backup/irfan/ irfan@source$ scp test.cfg test.ibd root@destination:/root/sandboxes/msb_PS-5_6_21/data/irfan/ mysql [localhost] {msandbox} (irfan) > ALTER TABLE test IMPORT TABLESPACE; Query OK, 0 rows affected (0.10 sec) irfan@destination$ tail -30 data/msandbox.err 2014-11-02 16:32:53 2037 [Note] InnoDB: Importing tablespace for table 'irfan/test' that was exported from host 'Hostname unknown' 2014-11-02 16:32:53 2037 [Note] InnoDB: Phase I - Update all pages 2014-11-02 16:32:53 2037 [Note] InnoDB: Sync to disk 2014-11-02 16:32:53 2037 [Note] InnoDB: Sync to disk - done! 2014-11-02 16:32:53 2037 [Note] InnoDB: Phase III - Flush changes to disk 2014-11-02 16:32:53 2037 [Note] InnoDB: Phase IV - Flush complete mysql [localhost] {msandbox} (irfan) > SELECt COUNT(*) FROM test; +----------+ | COUNT(*) | +----------+ | 1000 | +----------+

NOTE: The .cfg file contains the InnoDB dictionary dump in a special (binary) format for the corresponding table. The .cfg file is not required to import a tablespace into MySQL 5.6 or Percona Server 5.6. A tablespace can be imported successfully even if it is from another server, but InnoDB will do schema validation if the corresponding .cfg file is present in the same directory.

Now, as you can see from the error log, the test table was imported successfully into the irfan database and the changes to the InnoDB tablespace completed correctly. My colleague Miguel Angel Nieto wrote a related post on this titled, “How to recover a single InnoDB table from a Full Backup.”

There is another method to copy a table from a running MySQL instance to another running MySQL server, which is described in the MySQL manual. For completeness, let me quickly describe the procedure.

MySQL 5.6 Transportable Tablespaces with FLUSH TABLES FOR EXPORT
On the source server, I have a table named “dummy” which I will copy to the destination server.

mysql [localhost] {msandbox} (irfan) > show tables; +----------------+ | Tables_in_test | +----------------+ | dummy | | test | +----------------+ mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM dummy; +----------+ | COUNT(*) | +----------+ | 100 | +----------+

First, create the same table structure on the destination server.

mysql [localhost] {msandbox} (irfan) > CREATE TABLE `dummy` ( `id` int(11) DEFAULT NULL, `dummy` varchar(10) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0.07 sec)

On the destination server, discard the existing tablespace before importing the tablespace from the source server.

mysql [localhost] {msandbox} (irfan) > ALTER TABLE dummy DISCARD TABLESPACE; Query OK, 0 rows affected (0.00 sec)

Run FLUSH TABLES FOR EXPORT on the source server to ensure all table changes are flushed to disk. It will block write transactions to the named table while allowing only read-only operations. This can be a problem on high-write workload systems, as a table can be blocked for a long period of time if it is huge in size. While making backups with Percona XtraBackup you can manage this in a non-blocking way, and it will also save the binary log coordinates, which can be useful in replication and disaster recovery scenarios.

FLUSH TABLES FOR EXPORT is only applicable to Oracle MySQL 5.6/Percona Server 5.6 and will produce the table metadata file (.cfg) alongside the tablespace file (.ibd). Make sure you copy both the table.cfg and table.ibd files before releasing the lock. It’s worth mentioning that you shouldn’t log out from the mysql session before copying the table.cfg and table.ibd files, otherwise the lock will be released. After copying the table metadata file (.cfg) and tablespace file (.ibd), release the lock on the source server.

# Don't terminate session after acquiring lock otherwise lock will be released. mysql [localhost] {msandbox} (irfan) > FLUSH TABLES dummy FOR EXPORT; Query OK, 0 rows affected (0.00 sec) # open another terminal Need another session to copy table files. irfan@source$ scp dummy.cfg dummy.ibd root@destination-server:/var/lib/mysql/irfan/ # Go back to first session. mysql [localhost] {msandbox} (irfan) > UNLOCK TABLES;

On destination server import the tablespace and verify from mysql error log as below.

mysql [localhost] {msandbox} (test) > ALTER TABLE dummy IMPORT TABLESPACE; Query OK, 0 rows affected (0.04 sec) $ tail -f data/msandbox.err 2014-11-04 13:12:13 2061 [Note] InnoDB: Phase I - Update all pages 2014-11-04 13:12:13 2061 [Note] InnoDB: Sync to disk 2014-11-04 13:12:13 2061 [Note] InnoDB: Sync to disk - done! 2014-11-04 13:12:13 2061 [Note] InnoDB: Phase III - Flush changes to disk 2014-11-04 13:12:13 2061 [Note] InnoDB: Phase IV - Flush complete mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM dummy; +----------+ | COUNT(*) | +----------+ | 100 | +----------+

Take into account that there are some LIMITATIONS when copying tablespaces between servers, which are briefly outlined in the MySQL manual.

Conclusion:
Transportable Tablespaces is a nice feature introduced in MySQL 5.6, and I described its use cases in this post. Prior to MySQL 5.6, you could back up/restore specific database tables via Percona XtraBackup’s partial backup feature (the destination server had to be Percona Server in that case). Comments are welcome!

The post MySQL 5.6 Transportable Tablespaces best practices appeared first on MySQL Performance Blog.

Categories: MySQL

What happens when your application cannot open yet another connection to MySQL

MySQL Performance Blog - Mon, 2014-12-08 15:57

Have you ever experienced a situation where one moment you can connect to the MySQL database and the next moment  you cannot, only to be able to connect again a second later? As you may know one cannot open infinite connections with MySQL. There’s a practical limit and more often than not it is imposed by the underlying operating system. If you’re getting:

ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.10' (99)

…there’s a good chance you’re hitting such a limit. What might be misleading in the information above is who (which side) is preventing the connection from being established.

Understanding the problem at hand

Whenever a client uses the network to connect to a service running on a given port of a server this connection is established through the creation of a socket:

A socket is a tuple of 4 terms: the source and destination IPs and ports.

The destination port is obviously the one where the service is running on the server. For instance usually port 22 for SSH and port 3306 for MySQL. The source port is an arbitrary local network port on the client side, which should show in tools like netstat and ss:

Note: I’ve used netstat and ss alternately in this post, but if you read the man page for netstat these days you’ll see a note that says: “This program is obsolete. Replacement for netstat is ss”. I was advised to give preference to ss over netstat on a busy server: besides being faster and providing a richer set of information, ss puts less stress on the server as it talks directly to the kernel, while netstat works by scanning /proc.

$ ss -na dport = :3306 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.0.100:48681 192.168.0.10:3306

The example above shows that server 192.168.0.100 has a TCP connection established with MySQL (3306) running on server 192.168.0.10 through local port 48681. The range of local ports that can be used for TCP and UDP traffic is found in /proc/sys/net/ipv4/ip_local_port_range:

$ sysctl net.ipv4.ip_local_port_range net.ipv4.ip_local_port_range = 32768 61000

Those values from above are the default ones found in many Linux distributions: 32768 denotes the first local port that can be used and 61000 indicates the last one, for a total of 28233 available ports. It may look like a lot but it truly depends on the nature of the local applications connecting to services in other servers through the network.

When it comes to connecting to database servers, many applications choose to open a new connection for a single request only, closing it right after the request is processed. Even though the connection is closed by the client (application), the local port it was using is not immediately released by the OS to be reused by another connection: it will sit in a TIME_WAIT state for (usually) 60 seconds – this value cannot be easily changed as it is hard coded in the kernel:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT                   * state, about 60 seconds    */

And contrary to what you may have heard or read, tuning /proc/sys/net/ipv4/tcp_fin_timeout is of no use here, as it governs a different type of timeout and has no impact on releasing connections hanging in TIME_WAIT state. To better understand the role played by this state I suggest you read Vincent Bernat’s post, from which I reproduce:

There are two purposes for the TIME-WAIT state:

The most known one is to prevent delayed segments from one connection being accepted by a later connection relying on the same quadruplet (source address, source port, destination address, destination port) (…)

The other purpose is to ensure the remote end has closed the connection. When the last ACK is lost, the remote end stays in the LAST-ACK state. Without the TIME-WAIT state, a connection could be reopened while the remote end still thinks the previous connection is valid. (…)

The problem with this situation is that if you keep accumulating connections in TIME_WAIT state you’ll quickly saturate the available local ports. And if all ports are taken, then any attempt at a new connection will result in an error similar to the one at the top of this post.
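A quick sketch to see how close you are getting to that ceiling for a given destination (here MySQL on port 3306; the tail skips the header line ss prints):

# Count local ports currently stuck in TIME_WAIT for connections to port 3306
ss -nt state time-wait '( dport = :3306 )' | tail -n +2 | wc -l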

Reproducing the problem

It’s easy to verify this scenario: it suffices to decrease the local port range on the computer that will be starting the connections (usually the application server) to, for example, only half a dozen ports:

$ echo 61001 61006 > /proc/sys/net/ipv4/ip_local_port_range

Then we proceed with opening 6 connections from the application server (192.168.0.100) to the database server (192.168.0.10):

$ ss -na dport = :3306 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 0 192.168.0.100:61005 192.168.0.10:3306 ESTAB 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTAB 0 0 192.168.0.100:61001 192.168.0.10:3306 ESTAB 0 0 192.168.0.100:61004 192.168.0.10:3306 ESTAB 0 0 192.168.0.100:61006 192.168.0.10:3306 ESTAB 0 0 192.168.0.100:61002 192.168.0.10:3306

Now, when we try to open a seventh connection to the database we’ll hit that error stated in the beginning of this post. What the error actually means is:

$ perror 99 OS error code 99: Cannot assign requested address

In fact, it should be complemented with “… because there are no available local network ports left.”

And even if we close all 6 MySQL connections right away they’ll all move from ESTABLISHED to TIME_WAIT state and we’ll still need to wait for them to expire until we can open a new connection:

$ ss -na dport = :3306 State Recv-Q Send-Q Local Address:Port Peer Address:Port TIME-WAIT 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME-WAIT 0 0 192.168.0.100:61003 192.168.0.10:3306 TIME-WAIT 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME-WAIT 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME-WAIT 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME-WAIT 0 0 192.168.0.100:61002 192.168.0.10:3306

That’s where scalability problems happen if your application server keeps opening more connections than it can have old ones released in time. For example, considering the default port range from 32768 to 61000 and a TIME_WAIT of 60 seconds, in theory we can only open and close around 470 new network connections each second ((61000 – 32768 + 1)/60 = 470.55) before we saturate the available local network ports; that’s not much …
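You can compute the equivalent figure for your own settings straight from the kernel; here is a one-liner sketch, using the hard-coded 60-second TIME_WAIT mentioned earlier:

# Theoretical rate of short-lived outgoing connections to a single destination
awk '{ printf "%.2f connections/sec\n", ($2 - $1 + 1) / 60 }' /proc/sys/net/ipv4/ip_local_port_range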

Possible Solutions

This is by no means an exhaustive list but here’s a few possible approaches to consider. If you have something to add about those or happen to known about any other please let me know by leaving a comment below.

Increasing port range

If the server initiating the connections is operating with the default port range, you can start by increasing it somewhat. The first 1023 ports are said to be privileged, meaning only root can start an application listening on one of these initial ports. At the other extreme, the highest port you can have assigned is 65535 (2^16-1). In practice, then, you can increase the port range to the maximum of 1024-65535, which would provide 64512 ports, in theory allowing around 1075 brief connections to be opened and closed per second:

$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"

To make this change permanent and survive a server reboot you need to add the following line to /etc/sysctl.conf:

net.ipv4.ip_local_port_range=1024 65535

You should however pay attention when stretching these values to the limits.

Adding extra IP addresses and listening to multiple ports

Something that wasn’t clear to me at first is that the port range limitation is applied per quadruplet (<source address>:<source port>, <destination address>:<destination port>). As such, if you have the port range set from 61001 to 61006 on the client, you should be able to open no more than 6 MySQL connections from the same <address>:<port> pair to the same <address>:<port> pair, as well as 6 SSH connections, 6 NC connections, etc:

tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 ESTABLISHED tcp 0 0 192.168.0.100:61005 192.168.0.10:9999 ESTABLISHED tcp 0 0 192.168.0.100:61004 192.168.0.10:22 ESTABLISHED tcp 0 0 192.168.0.100:61003 192.168.0.10:22 ESTABLISHED tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 ESTABLISHED tcp 0 0 192.168.0.100:61005 192.168.0.10:22 ESTABLISHED tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED tcp 0 0 192.168.0.100:61004 192.168.0.10:9999 ESTABLISHED tcp 0 0 192.168.0.100:61006 192.168.0.10:22 ESTABLISHED tcp 0 0 192.168.0.100:61002 192.168.0.10:9999 ESTABLISHED tcp 0 0 192.168.0.100:61001 192.168.0.10:9999 ESTABLISHED tcp 0 0 192.168.0.100:61003 192.168.0.10:9999 ESTABLISHED tcp 0 0 192.168.0.100:61006 192.168.0.10:9999 ESTABLISHED tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 ESTABLISHED tcp 0 0 192.168.0.100:61001 192.168.0.10:22 ESTABLISHED tcp 0 0 192.168.0.100:61002 192.168.0.11:3306 ESTABLISHED <-- here! tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 ESTABLISHED tcp 0 0 192.168.0.100:61002 192.168.0.10:22 ESTABLISHED tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED

Note in the list above there’s more than one connection established from the same source <address>:<port> pair, though bound to different destination <address>:<port> pairs. There’s even a 7th connection to MySQL, though to one running on a different server (192.168.0.11:3306).

So, another way of increasing the number of concurrent connections is by adding an additional IP address to either the outgoing side (client) or the incoming side (server) and making part of the connections happen through it. Likewise, even though you cannot make MySQL listen to multiple ports simultaneously, you can configure your firewall to accept connections on other ports and redirect them to port 3306 (or any other you’re using). In the example below I configure iptables to redirect all connections destined for port 80 to port 3306:

$ iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 3306

Modifying the connection behavior of the application(s)

Expanding the number of possible quadruplets will certainly help increase the number of possible concurrent connections, but it won’t scale indefinitely. In the long term you may need to review your application’s connection behavior and configure/re-engineer it in such a way that it avoids opening and closing many connections over the network too often – if that’s the problem after all. You may resort to some sort of connection pooling instead, but be sure to evaluate it well first.

Tweaking TCP parameter settings

Even though you cannot easily decrease the timeout for the TIME_WAIT state, there are at least 3 different TCP parameters you can use to “bypass” this limitation. You should explore these options with caution though, as these settings could affect the reliability of TCP connections.

tcp_tw_reuse

tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6) Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. It should not be changed without advice/request of technical experts.

It is possible to force the kernel to reuse a connection hanging in TIME_WAIT state by setting /proc/sys/net/ipv4/tcp_tw_reuse to 1. What happens in practice is that you’ll keep seeing the closed connections hanging in TIME_WAIT until either they expire or a new connection is requested. In the latter case, the connection will be “revived.”
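Enabling it is a one-liner (a sketch only – weigh the man page warning above, and add the setting to /etc/sysctl.conf if you want it to survive a reboot):

# Allow reuse of TIME_WAIT sockets for new outgoing connections
sysctl -w net.ipv4.tcp_tw_reuse=1
# equivalent to:
echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse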

Here’s an example from the previous scenario where I had limited the port range to only 6 ports. I enabled tcp_tw_reuse and opened 6 connections with MySQL, closing five of them soon afterwards:

$ netstat -tn|grep 3306 tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 TIME_WAIT

The TIME_WAIT was still in countdown mode (both ss and netstat have option -o/–timers to display those) when I opened a new connection:

$ netstat -ton|grep 3306 tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT timewait (35.07/0/0) tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT timewait (34.52/0/0) tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED keepalive (3586.59/0/0) tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT timewait (33.91/0/0) tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT timewait (35.65/0/0) tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED keepalive (7196.66/0/0)

Note the new connection was established in lieu of the one hanging in TIME_WAIT state for longer (using local port 61003).

tcp_tw_recycle

tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4) Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).

When you enable /proc/sys/net/ipv4/tcp_tw_recycle closed connections will not show under TIME_WAIT anymore – they disappear from netstat altogether. But as soon as you open a new connection (within the 60 seconds mark) it will recycle one of those. But everyone writing about this alternative seems to advise against its use. The bottom line is: it’s preferable to reuse a connection than to recycle it.

tcp_max_tw_buckets

tcp_max_tw_buckets (integer; default: see below; since Linux 2.4) The maximum number of sockets in TIME_WAIT state allowed in the system. This limit exists only to prevent simple denial- of-service attacks. The default value of NR_FILE*2 is adjusted depending on the memory in the system. If this number is exceeded, the socket is closed and a warning is printed.

This parameter rules how many connections can remain in TIME_WAIT state concurrently: the kernel will
simply kill connections hanging in that state beyond that number. For example, continuing with the previous scenario where I had configured the client server with a port range composed of only 6 ports: if I set /proc/sys/net/ipv4/tcp_max_tw_buckets to 5, then open 6 concurrent connections with MySQL and immediately close all 6, I’ll find only 5 of them hanging in TIME_WAIT state – as with tcp_tw_recycle, one of them will simply disappear from netstat. This situation allows me to immediately open a new connection without needing to wait a minute. However, I won’t be able to open a second one until one of the other 5 connections in TIME_WAIT expires and frees the local port it was using. The secret here, then, is to find a compromise between the number of available network ports and the number of connections we allow to remain in TIME_WAIT state. The default value of this setting is 65536, which means that by default the system allows all possible connections to go through the TIME_WAIT state when closed.
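Changing it works the same way as the other settings (a sketch only; the right number depends on how many local ports you can afford to keep parked in TIME_WAIT):

# Cap the number of sockets allowed to sit in TIME_WAIT system-wide
sysctl -w net.ipv4.tcp_max_tw_buckets=30000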

If it’s a “LAMP” server, use the local socket

Finally, it’s important to mention that if your application is hosted on the same server as the database, you may (and probably should) open connections using MySQL’s socket file directly, without going over the network, and thus avoid this port range/TIME_WAIT problem altogether. Interestingly, you can accomplish this in one of two ways: specifying the socket with the --socket option or using --host=localhost. As MySQL’s manual mentions:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file.
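As a quick illustration (the user name and socket path below are placeholders; the socket location varies by distribution):

# Both of these go over the Unix socket file, never touching a TCP port:
mysql --socket=/var/lib/mysql/mysql.sock -u appuser -p
mysql --host=localhost -u appuser -p
# Forcing TCP, by contrast, consumes a local port and is subject to TIME_WAIT:
mysql --host=127.0.0.1 --port=3306 -u appuser -p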

Clarification

I prefer to overstate this problem in favor of avoiding any confusion: the error described in this post, and all the situations surrounding it, is to be found on the computer that initiates the connections (usually the application server), and not on the server hosting the services. Whenever a client closes the connection, while it shows as hanging in the TIME_WAIT state on the client, there won’t be any remaining trace of the connection in netstat on the server side. However, if for some reason it is the server that closes the connection first, it gets the TIME_WAIT state while the client will consider the corresponding quadruplet free and hence may reuse it for a new connection.

The post What happens when your application cannot open yet another connection to MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

Streamlined Percona XtraDB Cluster (or anything) testing with Consul and Vagrant

MySQL Performance Blog - Fri, 2014-12-05 12:00
Introducing Consul

I’m always interested in what Mitchell Hashimoto and HashiCorp are up to; I typically find their projects valuable.  If you’ve heard of Vagrant, you know their work.

I recently became interested in a newer project of theirs called ‘Consul’.  Consul is a bit hard to describe.  It is (in part):

  • A highly consistent metadata store (a bit like ZooKeeper)
  • A monitoring system (a lightweight Nagios)
  • A service discovery system, both DNS- and HTTP-based (think of something like HAProxy, but instead of TCP load balancing, it provides DNS lookups that return only healthy services)
What this has to do with Percona XtraDB Cluster

I’ve had some more complex testing for Percona XtraDB Cluster (PXC) to do on my plate for quite a while, and I started to explore Consul as a tool to help with this.  I already have Vagrant setups for PXC, but ensuring all the nodes are healthy, kicking off tests, gathering results, etc. were still difficult.

So, my loose goals for Consul are:

  • A single dashboard to ensure my testing environment is healthy
  • Ability to adapt to any size environment — 3 node clusters up to 20+
  • Coordinate starting and stopping load tests running on any number of test clients
  • Have the ability to collect distributed test results

I’ve succeeded on some of these fronts with a Vagrant environment I’ve been working on. This spins up:

  • A Consul cluster (default is a single node)
  • Test server(s)
  • A PXC cluster

Additionally, it integrates the Test servers and PXC nodes with Consul such that:

  • The servers set up a Consul agent in client mode, pointed at the Consul cluster
  • Additionally, they set up a local DNS forwarder that sends all DNS requests for the ‘.consul’ domain to the local agent, to be serviced by the Consul cluster (see the sketch after this list)
  • The servers register services with Consul that run local health checks
  • The test server(s) set up a ‘watch’ in Consul that waits for a Consul ‘event’ to kick off sysbench.
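The DNS forwarding piece in the second bullet can be done with dnsmasq. A minimal sketch (the file path is my choice, and it assumes the local Consul agent answers DNS on its default port 8600) looks like this:

# Forward any *.consul lookup to the local Consul agent's DNS interface
cat > /etc/dnsmasq.d/10-consul <<'EOF'
server=/consul/127.0.0.1#8600
EOF
service dnsmasq restart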
Seeing it in action

Once I run my ‘vagrant up’, I get a Consul UI I can connect to on my localhost at port 8501:

Consul’s Node Overview

 

I can see all 5 of my nodes.  I can check the services and see that test1 is failing one health check because sysbench isn’t running yet:

Consul reporting sysbench is not running.

This is expected, because I haven’t started testing yet.  I can see that my PXC cluster is healthy:

 

Health checks are using clustercheck from the PXC package

 

Involving Percona Cloud Tools in the system

So far, so good.  This Vagrant configuration (if I provide a PERCONA_AGENT_API_KEY in my environment) also registers my test servers with Percona Cloud Tools, so I can see data being reported there for my nodes:

Percona Cloud Tool’s Dashboard for a single node

So now I am ready to begin my test.  To do so, I simply need to issue a consul event from any of the nodes:

jayj@~/Src/pxc_consul [507]$ vagrant ssh consul1
Last login: Wed Nov 26 14:32:38 2014 from 10.0.2.2
[root@consul1 ~]# consul event -name='sysbench_update_index'
Event ID: 7c8aab42-fd2e-de6c-cb0c-1de31c02ce95

My pre-configured watchers on my test node know what to do with that event and launch sysbench.  Consul shows that sysbench is indeed running:
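The watch itself is just a small piece of Consul agent configuration plus a handler script. A rough sketch (the config file name and handler path are mine, not from the actual Vagrant setup) would be:

# Register a watch that fires a script whenever the named event arrives
cat > /etc/consul.d/watch_sysbench.json <<'EOF'
{
  "watches": [
    {
      "type": "event",
      "name": "sysbench_update_index",
      "handler": "/usr/local/bin/start_sysbench.sh"
    }
  ]
}
EOF
consul reload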

 

And I can indeed see traffic start to come in on Percona Cloud Tools:

I have testing traffic limited for my example, but that’s easily tunable via the Vagrantfile.  To show something a little more impressive, here’s a 5-node cluster hitting around 2,500 tps of total throughput:

So to summarize thus far:
  • I can spin up any size cluster I want and verify it is healthy with Consul’s UI
  • I can spin up any number of test servers and kick off sysbench on all of them simultaneously
Another big trick of Consul’s

So far so good, but let me point out a few things that may not be obvious.  If you check the Vagrantfile, I use a Consul hostname in a few places.  First, on the test servers:

# sysbench setup
'tables' => 1,
'rows' => 1000000,
'threads' => 4 * pxc_nodes,
'tx_rate' => 10,
'mysql_host' => 'pxc.service.consul'

then again on the PXC server configuration:

# PXC setup
"percona_server_version" => pxc_version,
'innodb_buffer_pool_size' => '1G',
'innodb_log_file_size' => '1G',
'innodb_flush_log_at_trx_commit' => '0',
'pxc_bootstrap_node' => (i == 1 ? true : false),
'wsrep_cluster_address' => 'gcomm://pxc.service.consul',
'wsrep_provider_options' => 'gcache.size=2G; gcs.fc_limit=1024',

Notice ‘pxc.service.consul’.  This hostname is provided by Consul and resolves to the IPs of all servers that currently have, and are passing, the ‘pxc’ service health check:

[root@test1 ~]# host pxc.service.consul
pxc.service.consul has address 172.28.128.7
pxc.service.consul has address 172.28.128.6
pxc.service.consul has address 172.28.128.5

So I am using this to my advantage in two ways:

  1. My PXC cluster bootstraps the first node automatically, but all the other nodes use this hostname for their wsrep_cluster_address.  This means: no specific hostnames or IPs in the my.cnf file, and this hostname will always be up to date with which nodes are active in the cluster, which is precisely the list that should be in wsrep_cluster_address at any given moment.
  2. My test servers connect to this hostname, so they always know where to connect, and they will round-robin (if I have enough sysbench threads and PXC nodes) across different nodes based on the response of the DNS lookup, which returns 3 of the active nodes in a different order each time (see the lookup examples after this list).
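You can poke at this behavior directly. Querying the agent’s DNS interface (assuming its default port 8600) shows the healthy nodes, and with the dnsmasq forwarder in place plain lookups work too:

dig @127.0.0.1 -p 8600 pxc.service.consul A     # A records for nodes passing the 'pxc' check
dig @127.0.0.1 -p 8600 pxc.service.consul SRV   # SRV records also carry the port
host pxc.service.consul                         # via the local dnsmasq forwarder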
(Some of) The Issues

This is still a work in progress and there are many improvements that could be made:

  • I’m relying on PCT to collect my data, but it’d be nice to utilize Consul’s central key/value store to store results of the independent sysbench runs.
  • Consul’s leader election could be used to help the cluster determine which node should bootstrap on first startup. I am assuming node1 should bootstrap.
  • A variety of bugs in various software still makes this a bit clunky sometimes to manage.  Here is a sample:
    • Consul events sometimes don’t fire in the current release (though it looks to be fixed soon)
    • PXC joining nodes sometimes get stuck, putting speed bumps into the automated deploy.
    • Automated installs of percona-agent (which sends data to Percona Cloud Tools) are straightforward, except when different cluster nodes clobber each other’s credentials.

So, in summary, I am happy with how easily Consul integrates and I’m already finding it useful for a product in its 0.4.1 release.

The post Streamlined Percona XtraDB Cluster (or anything) testing with Consul and Vagrant appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL and OpenStack deep dive: Dec. 10 webinar

MySQL Performance Blog - Thu, 2014-12-04 21:59

Fact: MySQL is the most commonly used database in OpenStack deployments. Of course that includes a number of MySQL variants – standard MySQL by Oracle, MariaDB, Percona Server, MySQL Galera, Percona XtraDB Cluster, etc.

However, there are many misconceptions and myths around the pros and cons of these MySQL flavors. Join me and my friend Jay Pipes of Mirantis next Wednesday (Dec. 10) at 10 a.m. Pacific and we’ll dispel some of these myths and provide a clearer picture of the strengths and weaknesses of each of these flavors.

This free Percona webinar, titled “MySQL and OpenStack Deep Dive,” will also illuminate the pitfalls to avoid when migrating between MySQL flavors – and what architectural information to take into account when planning your OpenStack MySQL database deployments.

We’ll also discuss replication topologies and techniques, and explain how the Galera Cluster variants differ from standard MySQL replication.

Finally, in the latter part of the session, we’ll take a deep dive into MySQL database performance analysis, diving into the results of a Rally run showing a typical Nova workload. In addition, we’ll use Percona Toolkit’s famed pt-query-digest tool to determine if a synchronously replicated database cluster like the free Percona XtraDB Cluster is a good fit for certain OpenStack projects.

The webinar is free but I encourage you to register now to reserve your spot. See you Dec. 10! In the meantime, learn more about the new annual OpenStack Live Conference and Expo which debuts April 13-14 in the heart of Silicon Valley. If you register now you’ll save with Early Bird pricing. However, one lucky webinar attendee will win a full pass! So be sure to register for next week’s webinar now for your chance to win! (Click here!) The winner will be announced at the end of the webinar.

The post MySQL and OpenStack deep dive: Dec. 10 webinar appeared first on MySQL Performance Blog.

Categories: MySQL

Sneak peek at the Percona Live MySQL Conference & Expo 2015

MySQL Performance Blog - Thu, 2014-12-04 13:55

You know you’ll be there so why not save some $$ by registering now for the Percona Live MySQL Conference & Expo 2015 (April 13-16 in Santa Clara, Calif.). Super Saver registration discounts are available through Dec. 14 at 11:30 p.m. PST. (That’s just 10 days away!)

What to expect this year? The Percona Live 2015 conference committee is putting together another fantastic event for the global MySQL community’s next meeting in April. The full conference agenda will be announced in January, but the initial roster includes:

  • Sunny Bains, Senior Engineering Manager at Oracle; “InnoDB 5.7- What’s New”
  • Yoshinori Matsunobu, Database Engineer at Facebook; “Fast Master Failover Without Data Loss”
  • Jeremy Cole, Senior Systems Engineer at Google, Inc.; “Exploring Your Data With InnoDB Explorer”
  • Tom Krouper, Staff Database Administrator at Twitter; “Upgrading to MySQL 5.6 @ Scale”
  • Jenni Snyder, Database Administrator at Yelp; “Schema changes multiple times a day? OK!”
  • Ike Walker, Database Architect at Flite; “Assembling the Perfect MySQL Toolbox”
  • Jean-François Gagné, Senior System Engineer/Architect at Booking.com; “Binlog Servers at Booking.com”
  • Jeremy Glick, Lead DBA at MyDBAteam, and Andrew Moore, MySQL DBA at Percona; “Using MySQL Audit Plugins and Elasticsearch ELK”
  • Tomáš Komenda, Team Lead and Database Specialist, and Lukáš Putna, Senior Developer and Database Specialist at Seznam.cz; “MySQL and HBase Ecosystem for Real-time Big Data Overviews”
  • Alexander Rubin, Principal Consultant at Percona; “Advanced MySQL Query Tuning”

And while the call for papers deadline has expired, there are still sponsorship opportunities available for the world’s largest annual MySQL event. Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Current sponsors include:

  • Diamond Plus: VMware
  • Gold: Codership, Pythian
  • Silver: Box, SpringbokSQL, Yelp
  • Exhibit Only: FoundationDB, Severalnines, Tokutek, VividCortex
  • Other Sponsors: MailChimp
  • Media Sponsor: Database Trends & Applications, Datanami, InfoQ, Linux Journal, O’Reilly Media

Percona Live 2015 will feature a variety of formal tracks and sessions related to High Availability, DevOps, Programming, Performance Optimization, Replication and Backup, MySQL in the Cloud, MySQL and NoSQL, MySQL Case Studies, Security, and What’s New in MySQL.

As usual the conference will be held in the heart of Silicon Valley at the Hyatt Regency Santa Clara and Santa Clara Convention Center. But this year Percona has also unveiled OpenStack Live 2015, a new conference that will run in parallel with Percona Live MySQL Conference & Expo 2015 on April 13 and 14.

And don’t forget, Super Saver registration discounts are available through Dec. 14 at 11:30 p.m. PST. I hope to see you in Santa Clara!

The post Sneak peek at the Percona Live MySQL Conference & Expo 2015 appeared first on MySQL Performance Blog.

Categories: MySQL

Auto-bootstrapping an all-down cluster with Percona XtraDB Cluster

MySQL Performance Blog - Wed, 2014-12-03 12:01

One new feature in recent Percona XtraDB Cluster (PXC) releases is the ability for an existing cluster to auto-bootstrap after an all-nodes-down event.  Suppose you lose power on all nodes simultaneously, or something else similar happens to your cluster. Traditionally this meant manually re-bootstrapping the cluster, but not any more.

How it works

Given the above all-down situation, if all nodes are able to restart and see each other such that they all agree what the state was and that all nodes have returned, then the nodes will make a decision that it is safe for them to recover PRIMARY state as a whole.

This requires:

  • All nodes went down hard (that is, a kill -9, kernel panic, server power failure, or similar event)
  • All nodes from the last PRIMARY component are restarted and are able to see each other again.
Demonstration

Suppose I have a 3 node cluster in a stable state. I then kill all nodes simultaneously (simulating a power failure or similar event):

[root@node1 ~]# killall -9 mysqld
[root@node2 ~]# killall -9 mysqld
[root@node3 ~]# killall -9 mysqld

I can see that each node maintained a state file in its datadir called ‘gvwstate.dat’. This contains the last known view of the cluster:

[root@node1 ~]# cat /var/lib/mysql/gvwstate.dat
my_uuid: 78caedfe-75a5-11e4-ac69-fb694ee06530
#vwbeg
view_id: 3 78caedfe-75a5-11e4-ac69-fb694ee06530 9
bootstrap: 0
member: 78caedfe-75a5-11e4-ac69-fb694ee06530 0
member: 87da2387-75a5-11e4-900f-ba49ecdce584 0
member: 8a25acd8-75a5-11e4-9e22-97412a1263ac 0
#vwend

This file will not exist on a node that was shut down cleanly, only if the mysqld process was terminated uncleanly. The file should exist, and be identical, on all the nodes for the auto-recovery to work.
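A quick way to sanity-check that condition across a small cluster (the hostnames are hypothetical, and the my_uuid line is excluded because it is node-specific) might be:

# The view/member section of gvwstate.dat should hash identically on every node
for h in node1 node2 node3; do
  ssh "$h" "grep -v '^my_uuid' /var/lib/mysql/gvwstate.dat | md5sum"
done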

I can now restart all 3 nodes more or less at the same time. Note that none of these nodes are bootstrapping and all of the nodes have the wsrep_cluster_address set to a proper list of the nodes in the cluster:

[root@node1 ~]# service mysql start
[root@node2 ~]# service mysql start
[root@node3 ~]# service mysql start

I can indeed see that they all start successfully and enter the primary state:

[root@node1 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
[root@node2 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
[root@node3 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+

Checking the logs, I can see this indication that the feature is working:

2014-11-26 19:59:36 1809 [Note] WSREP: promote to primary component
2014-11-26 19:59:36 1809 [Note] WSREP: view(view_id(PRIM,78caedfe,13) memb {
        78caedfe,0
        87da2387,0
        8a25acd8,0
} joined {
} left {
} partitioned {
})
2014-11-26 19:59:36 1809 [Note] WSREP: save pc into disk
2014-11-26 19:59:36 1809 [Note] WSREP: clear restored view

Changing this behavior

This feature is enabled by default, but you can toggle it off with the pc.recovery setting in wsrep_provider_options.
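For example (a hedged sketch; merge this with whatever provider options you already set), you could disable it in my.cnf and confirm the running value afterwards:

# In my.cnf, under [mysqld]:
#   wsrep_provider_options="pc.recovery=false"
# Then, after a restart, verify what the provider actually picked up:
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | grep -o 'pc\.recovery = [a-z]*'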

This feature helps cover an edge case where manual bootstrapping was necessary in the past to recover properly. It was added in Percona XtraDB Cluster version 5.6.19, but was broken due to this bug. It was fixed in PXC 5.6.21.

The post Auto-bootstrapping an all-down cluster with Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Categories: MySQL

Tips from the trenches for over-extended MySQL DBAs

MySQL Performance Blog - Tue, 2014-12-02 08:00

This post is a follow-up to my November 19 webinar, “Tips from the Trenches: A Guide to Preventing Downtime for the Over-Extended DBA,” during which I described some of the most common reasons DBAs experience avoidable downtime. The session was aimed at the “over-stretched DBA,” meaning the MySQL DBA who is short on time, or an engineer from another discipline without deep knowledge of the MySQL system. The over-stretched DBA may be prone to fundamental mistakes that cause downtime: poor response times, operations that block important data, or administrative mishaps stemming from a lack of best-practice monitoring and alerting. (You can download my slides and view the recorded webinar here.)

Monitor the things
One of the aides to keeping the system up and running is ensuring that your finger is on the pulse of the environment. Here on the Percona Managed Services team, we leverage Percona Monitoring Plugins (open source plugins for Nagios, Cacti and Zabbix) to ensure we have visibility of our client’s operations. Having a handle on basics such as disk space, memory usage and MySQL operational metrics ensures that we avoid trivial downtime that would affect the client’s uptime or worse, their bottom line.

Road Blocks
One of the most common reasons that an application is unable to serve data to its end user is that access to a table is being blocked due to another ongoing operation. This can be blamed on a variety of sources: backups, schema changes, poor configuration and long running transactions can all lend themselves to costly blocking. Understanding the impact of actions on a MySQL server can be the difference between a happy end user and a frustrated one.

During the webinar I made reference to some resources and techniques that can assist the over extended DBA avoid downtime and here are some highlights….

Monitoring and Alerting
It’s important that you have some indications that something is reaching its capacity. It might be the disk, connections to MySQL or auto_increment limit on a highly used table. There is quite the landscape to cover but here are a handful of helpful tools:
* Percona Monitoring Plugins
* Monyog
* New Relic

Query Tuning
Poorly performing SQL can be indicative that the configuration is incorrect, that there’s a missing index or that your development team needs a quick lesson on MySQL anti-patterns. Arm yourself with proof that the SQL statements are substandard using these resources and work with the source to make things more efficient:
* Percona Cloud Tools
* pt-query-digest, explain, indexes

High Availability
If you need to ensure that your application survives hiccups such as hardware failure or network impairment, a well deployed HA solution will give you the peace of mind that you can quickly mitigate bumps in the road.
* MHA
Percona XtraDB Cluster, Galera
* Percona Replication Manager
* LinuxHA/Corosync/DRBD

Backups
A wise man once said, “A backup today saves you tomorrow.” Covering all bases can be the difference between recovering from a catastrophic failure and job hunting. Mixing logical, physical and incremental backups, while adding in some offsite copies, provides a safety net in the event that a small mistake like a dropped table is made or, worse, all working copies of data and backups are lost in a SAN failure. It happens, so be prepared. (A few illustrative commands follow the tool list below.)
* Percona XtraBackup
* mydumper
* mysqldump
* mysqlbinlog (5.6)
* mylvmbackup
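To make the list above a little more concrete, here are hedged, illustrative invocations (credentials, hosts and paths are placeholders, not from the webinar):

# Physical (hot) backup with Percona XtraBackup
innobackupex --user=backup --password=secret /backups/physical/
# Logical backup, consistent for InnoDB, including stored routines
mysqldump --single-transaction --routines --all-databases | gzip > /backups/logical/all.sql.gz
# Continuous binary log streaming from a remote server (MySQL 5.6)
mysqlbinlog --read-from-remote-server --host=db1 --raw --stop-never mysql-bin.000001 &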

We had some great questions from the attendees and regrettably were unable to answer them all, so here are some of them with my response.

Q: I use MySQL on Amazon RDS. Isn’t much of the operations automated or do these tips still apply?
A: It’s not completely automated. There are still challenges to address and configuration opportunities, but understanding the limitations of RDS is key. For example, the location and size of the tmpdir is something you are unable to customise on RDS. You would typically review this config in a production environment if your workload required it. Any costly queries that perform operations requiring tmp area to sort (think OLAP) might not be a good fit on RDS due to this limitation. Getting to know the limitations around hosted or DBaaS services is time well spent to avoid explaining what keeps taking the application down in peak hours.

Q: What other parts of Percona Toolkit do you recommend for MySQL operations?
A: Percona Toolkit is a well-evolved suite of tools that all MySQL DBAs should familiarize themselves with. In particular I will fit many tools into my weekly workflow:

Operations

  • pt-online-schema-change
  • pt-table-checksum
  • pt-table-sync

Troubleshooting

  • pt-stalk
  • pt-pmp
  • pt-config-diff

Knowledge Gathering

  • pt-summary
  • pt-mysql-summary
  • pt-duplicate-key-checker

The key with Percona Toolkit is that many common tasks or problems that could cause you to reinvent the wheel are covered, mature and production ready. As with any tool, you should always read the label or in this case the documentation so you’re well aware what the tools can do, the risks and the features that you can make use of.

Q: HA – are there any solutions that you would stay away from?
A: Choosing any particular HA solution is going to be another R&D exercise. You will need to understand the tradeoffs and configuration options and compare between products. Some might have a higher TCO or lack functionality. Once the chosen solution is implemented, it’s pertinent that the engineers understand the technology well enough to troubleshoot it, or to use its functionality when failover needs to be instigated. I like HA solutions that are fast to fail over to; some entail starting MySQL from cold.

Q: You mentioned having tested backups. How do you perform this?
A: Percona’s method is to use a dedicated host with access to the backup files. Then, with a combination of mysqlsandbox and pt-table-checksum, we can discover whether we trust the files we capture for disaster recovery. Many people underestimate the importance of this task.
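A very rough sketch of that verification idea (paths, schema and table names are placeholders) is to prepare the physical backup, start a throwaway instance on it, and spot-check the data before trusting the backup:

innobackupex --apply-log /backups/2014-12-01/
mysqld --no-defaults --datadir=/backups/2014-12-01 \
       --socket=/tmp/restore.sock --skip-networking --user=mysql &
mysql --socket=/tmp/restore.sock -e "CHECK TABLE mydb.mytable"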

Q: Percona Cloud Tools – how much does it cost?
A: Right now it’s a free service. Visit cloud.percona.com for more information, but in a nutshell Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL uses.

Q: Is there API access to Percona Cloud Tools for application integration?
A: There is currently not a public API available. It is on the roadmap, though. We’d be interested to hear more about your use case so please sign up for the service and try it out. After signing in, all pages include a Feedback link to share your thoughts and ideas such as how you’d like to use a public API.

Q: Can you use MHA with Percona XtraDB Cluster?
A: MHA is not something that can be used with Percona XtraDB Cluster (PXC). It’s common to partner PXC with HAProxy for making sure your writes are going to the appropriate node.

Q: Can MHA make automatic failover? If MHA has automatic failover, what do you recommend? Configure it for automatic failover?
A: MHA can make an automatic failover. Personally I prefer managed failover. When working with automated failover it’s important that failback is manual to avoid “flapping.” “Splitbrain” is an ailment that you don’t want to suffer from as well and auto failover removes the human judgment from the decision to relocate operations from a failed node onto a standby node. If you are going to vote for an automatic failover it is advised to test all potential failure scenarios and to employ a STONITH method to really ensure that the unresponsive node is not serving read/write traffic.

Q: What is the best way to detect database blocking from DML statements? Is there a tool that will show blocking after the fact so you don’t have to catch it real-time?
A: Once again, Percona has a tool called pt-deadlock-logger that can detect and log deadlocks. Detecting blocking can be achieved using “SHOW ENGINE INNODB STATUS” or by querying the information_schema.innodb_locks table. Some engineering might be required for this to be logged after the fact, but those resources exist for use.
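For live blocking (as opposed to deadlocks) on MySQL 5.5/5.6, the InnoDB information_schema tables can show who is waiting on whom; something along these lines:

mysql -e "
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id\G"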

Q: Since you mentioned tinkering with ELK I was wondering if you had any tips on good Kibana dashboards to build to monitor MySQL databases/clusters?
A: ELK is something that I’m looking to publish some information on soon so watch this space!

Thanks again everyone for the great questions! And as a reminder, you can download my slides and view the recorded webinar here.

The post Tips from the trenches for over-extended MySQL DBAs appeared first on MySQL Performance Blog.

Categories: MySQL