- Total page hits/unique visits snippet
5 years 21 weeks ago - Busy IRL, but happier than ever
5 years 22 weeks ago - Drupal page titles like breadcrumbs
5 years 31 weeks ago - Theming the Akismet spam counter
5 years 32 weeks ago - Akismet module v1.1.2 for Drupal 4.7
5 years 32 weeks ago
MySQL
STOP: DELETE IGNORE on Tables with Foreign Keys Can Break Replication
DELETE IGNORE suppresses errors and downgrades them as warnings, if you are not aware how IGNORE behaves on tables with FOREIGN KEYs, you could be in for a surprise.
Let’s take a table with data as example, column c1 on table t2 references column c1 on table t1 – both columns have identical set of rows for simplicity.
CREATE TABLE `t1` ( `t1_c1` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`t1_c1`) ) ENGINE=InnoDB; CREATE TABLE `t2` ( `t2_c1` int(10) unsigned NOT NULL, PRIMARY KEY (`t2_c1`), CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`t2_c1`) REFERENCES `t1` (`t1_c1`) ON UPDATE CASCADE ) ENGINE=InnoDB; [revin@forge rsandbox_5_5_17]$ for int in {1..2000}; do ./master/use test -e "insert into t1 values($int)"; done [revin@forge rsandbox_5_5_17]$ ./master/use test -e "insert into t2 select * from t1" master [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 2000 | +----------+ 1 row in set (0.00 sec) master [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t2; +----------+ | COUNT(*) | +----------+ | 2000 | +----------+ 1 row in set (0.00 sec)An expected behavior for DELETE IGNORE is that if the statement fails to delete all rows, none should be deleted at all, after all this is InnoDB right? Wrong, take a look at bug 44987. As stated on the bug, only 5.0 exhibits the above mentioned behavior on 5.1 and 5.5, MySQL will stop deleting rows if it encounters constraint errors.
To demonstrate on 5.5.17:
I delete some rows from t2 so some rows on t1 does not have a constraint.
master [localhost] {msandbox} (test) > DELETE FROM t2 WHERE t2_c1 BETWEEN 201 AND 400; Query OK, 200 rows affected (0.00 sec)Now I try to DELETE IGNORE rows 301 to 500 on t1, note rows 301 to 400 does not have any existing constraints from t2 as we deleted them from above.
master [localhost] {msandbox} (test) > DELETE IGNORE FROM t1 WHERE t1_c1 BETWEEN 301 AND 500; Query OK, 100 rows affected, 1 warning (0.00 sec) master [localhost] {msandbox} (test) > SHOW WARNINGS \G *************************** 1. row *************************** Level: Error Code: 1451 Message: Cannot delete or update a parent row: a foreign key constraint fails (`test`.`t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`t2_c1`) REFERENCES `t1` (`t1_c1`) ON UPDATE CASCADE) 1 row in set (0.00 sec)As expected a warning is generated because rows 201 to 300 on t1 still has referencing foreign keys from t2. However, 100 has been deleted! Let’s see.
master [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 1900 | +----------+ 1 row in set (0.00 sec)Now let’s check the slave.
[revin@forge rsandbox_5_5_17]$ ./node1/use test ... slave1 [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 2000 | +----------+ 1 row in set (0.00 sec)Uh oh, now the slave is out of sync, because the statement failed to delete all intended rows it was not written to the binary log and consequently not reaching the slave.
So how can you workaround this? Simple, 1) do not use IGNORE, be critical about your data 2) use ROW* based replication. When using the latter, MySQL will log separate statements for each row that is deleted – so if the first 100 rows was successfully deleted then those 100 events are logged and eventually replicated.
[revin@forge rsandbox_5_5_170]$ ./master/use test ... master [localhost] {msandbox} (test) > SELECT @@binlog_format; +-----------------+ | @@binlog_format | +-----------------+ | ROW | +-----------------+ 1 row in set (0.00 sec) master [localhost] {msandbox} (test) > DELETE FROM t2 WHERE t2_c1 BETWEEN 201 AND 400; Query OK, 200 rows affected (0.00 sec) master [localhost] {msandbox} (test) > DELETE IGNORE FROM t1 WHERE t1_c1 BETWEEN 301 AND 500; Query OK, 100 rows affected, 1 warning (0.01 sec) master [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 1900 | +----------+ 1 row in set (0.00 sec) [revin@forge rsandbox_5_5_170]$ ./node1/use test ... slave1 [localhost] {msandbox} (test) > SELECT COUNT(*) FROM t1; +----------+ | COUNT(*) | +----------+ | 1900 | +----------+ 1 row in set (0.00 sec)* MIXED mode will not work since the query in this example will be considered STATEMENT (http://dev.mysql.com/doc/refman/5.1/en/binary-log-mixed.html) thus failure to execute the query successfully means it will not get logged.
More details about SchoonerSQL performance, please!
Schooner has a blog post showing that one node of their product beats 9 nodes of Clustrix’s in throughput. But this reduces everything to a single number, and that’s not everything that matters. If you’ve looked at Vadim’s white paper about Clustrix’s (paid-for) performance evaluation with Percona, you see there is a lot of detail about how consistent the throughput and response time are.
I’d love to see that level of details in any product comparison. A single number often isn’t enough to judge how good the performance is — fast is not the only thing that matters.
I have absolutely no doubts that a single node of Schooner’s product can run like a deer. It isn’t doing any cross-node communication, after all, so it had better be faster than something that blends multiple nodes together into a virtual “single database server.” And I think if the full story were told, it would be a great knock-down drag-out fight. Give us more details, Schooner!
Further Reading:
Verifying backup integrity with CHECK TABLES
An attendee to Espen’s recent webinar asked how to check tables for corruption. This kind of ties into my recent post on InnoDB’s handling of corrupted pages, because the best way to check for corruption is with CHECK TABLES, but if a page is corrupt, InnoDB will crash the server to prevent access to the corrupt data. As mentioned in that post, this can only be changed by changing InnoDB.
So how are you supposed to check for corruption that might be introduced by bad hardware, a bug, or so forth?
It’s a great question. The answer I would give for most cases is “check your backups for corruption instead of your live server.” You need to do this anyway — a backup that isn’t checked is a ticking time bomb. You need to verify (at least periodically) that your backups are recoverable.
The usual procedure goes like this: copy your backup somewhere, start a server instance on it, and run CHECK TABLES. You can use the mysqlcheck program to do this conveniently.
You could also use innochecksum, which doesn’t require starting the server. But it only verifies that each page’s checksum matches the page’s data, it doesn’t do all the other checks that are built into InnoDB (making sure that the LSNs are sane, for example).
How often? As often as possible. Some people refresh their dev/staging environment every day with last night’s backup, which is a great way to make failures obvious, as long as you verify that it truly does happen (e.g. what if it fails and you keep running with yesterday’s without knowing it?). If you can’t do it daily, then weekly is perfectly acceptable to most people. I’m not saying a specific interval should/ought to be your goal, I’m just remarking on what a lot of people seem to feel good about.
Central Virginia MySQL Meetup has reached cruising altitude
The first Central Virginia MySQL Meetup was a nice little howdy-do, and as a test flight, I think it showed that the bird can get off the ground quite nicely. So, with the generous help of our meeting host Meddius, we’re going to do it regularly on the 3rd Wednesday of every month. The next event is already scheduled — I will be talking about high availability options for MySQL.
I’m interested in having outside speakers. Anyone who’d like to come and present something MySQL-relevant, please feel free to email me, or contact me via the Meetup page with the “suggest a Meetup” feature. If you’re traveling from outside the area, the airport is CHO, and it’s about 30 minutes away. Amtrak is also very convenient. I’m happy to chauffeur you, and can help you find lodging too.
I will not try to steer overly much, but I hope that this turns into a group where people introduce themselves, mention job openings and other news of interest, and so on.
There are also a couple of related meetups nearby that I want to promote: NOVA MySQL at AOL’s headquarters led by Mike DelNegro, DevOps DC at CustomInk’s offices led by Nathen Harvey, and one I haven’t been to yet but hope to attend soon, Shenandoah Ruby Users Group in Harrisonburg near Rosetta Stone’s headquarters, led by John Paul Ashenfelter.
Further Reading:
Speaking at MySQL Meetup in Raleigh,NC
I’ll be presenting at MySQL Meetup in Raleigh,NC February 21,2012. The talk with be about Optimizing MySQL Configuration which I believe is a great topic for my first talk at this meetup group as it covers something every MySQL user has to deal with, also being something both beginner and advanced MySQL Users can learn a lot from. I’ll also bring High Performance MySQL 2nd edition for a drawing as well as exclusive discount codes for MySQL Conference and Expo 2012 in Santa Clara. See you there.
Statistics counters for Multi Range Read
MariaDB 5.3 has now three statistics counters for Multi Range Read optimization:
MariaDB [test]> show status like 'Handler_mrr%'; +-------------------------------+-------+ | Variable_name | Value | +-------------------------------+-------+ | Handler_mrr_extra_key_sorts | 0 | | Handler_mrr_extra_rowid_sorts | 0 | | Handler_mrr_init | 0 | +-------------------------------+-------+ 3 rows in set (0.08 sec)I’ve just added the first two. The reason for having them is as follows: the point of MRR is to provide speedup over regular execution by doing reads in disk order. In order to make reads in disk order, MRR needs buffer space where it accumulates and sorts read requests. If there are too many read requests to fit into the buffer, MRR will make multiple accumulate-sort-read passes.
Doing multiple passes allows MRR to operate when having limited buffer space, but the speedup will be not as great as with one big disk-ordered read sweep. The purpose of Handler_mrr_extra_key_sorts and Handler_mrr_extra_rowid_sorts is to count the additional accumulate-sort-read passes, so you’re able to tell if you will benefit from increasing your @@mrr_buffer_size and @@join_buffer_size settings.
There are two counters, _extra_key_sorts and _extra_rowid_sorts, because MariaDB has two places where it will do sorting:
- sort rowids before reading table records
- sort key values before making a bunch of index lookups
MRR code will try to distribute buffer space between them in an optimal way. The decision is a guess based on the available statistics, and can be wrong. Having both counters will allow us to check how the guess will work in practice.
p.s. if you could not make any sense of anything above, try reading Multi Range Read page in our knowlegebase. We have just put there a hopefully-readable explanation of what MRR is.
MySQL Configuration Wizard Updated
We’ve released an updated version of the MySQL Configuration Wizard we announced at the end of last year. If you don’t remember that announcement, here’s the short version: this is a tool to help you generate my.cnf files based on your server’s hardware and other characteristics.
We’ve gotten really good feedback on this tool, including this nice mention on Stack Exchange:
Percona just built a tool to do just that called the Configuration Wizard. I tested it out once just to see what it would return and the results were pretty darn close to what we were using on our servers, whose cnf’s were put together by highly trained mysql certified dba’s.
So what’s changed in the new version of the Configuration Wizard? Quite a few things. We’ve rolled out the first iteration of the account and profile features. Now you get a homepage with your configuration files, so you can manage them and return to them anytime you like.
From this page (click on the image for a fullsize view) you can do things like sharing configuration files and emailing them to yourself. The new release also adds features like downloading the configuration files so you don’t have to copy-paste them.
If you share a configuration file, then the URL can be loaded by anyone, even if they’re not logged in. It’s kind of like sending someone a link to a pastebin or something like that. Screenshot:
Another new feature is something I’ve wanted for a long time: the ability to generate a more strict, safer configuration file. There’s a new page in the Wizard that lets you set a lot of sanity/safety options to prevent common problems MySQL users run into because of too-permissive MySQL behaviors. These are the kinds of things that Drizzle fixes — and should be fixed by default in MySQL — but never will be because they might break applications that rely on the default behaviors. If you’re building an application from the ground up, now you can prevent bad things from getting a nose under the tent. Here’s a screenshot:
In addition to these things, we have added a number of other features you might not notice, which I won’t spend much time on — they’re things like an integrated feedback form at the left of the page and so on.
What’s next? Well, next I think we’re going to turn our attention to adding new tools, rather than improving this one. I have a list of tools that people have requested or suggested: a SQL formatter, a visual EXPLAIN tool, a configuration advisor, a query analysis tool, a way to register a server’s essential characteristics and then get advice when there’s a new release that might be beneficial for you, and so on. I have selected the next priorities, but I don’t want to spoil the surprise or promise something if it turns out to be harder than I think it will be. What ideas do you have? Let me know by leaving your feedback in the comments.
We hope this suite of free browser-based tools helps you become a more productive MySQL user and administrator!
How to recover a single InnoDB table from a Full Backup
Sometimes we need to restore only some tables from a full backup maybe because your data loss affect a small number of your tables. In this particular scenario is faster to recover single tables than a full backup. This is easy with MyISAM but if your tables are InnoDB the process is a little bit different story.
With Oracle’s stock MySQL you cannot move your ibd files freely from one server to another or from one database to another. The reason is that the table definition is stored in the InnoDB shared tablespace (ibdata) and the transaction IDs and log sequence numbers that are stored in the tablespace files also differ between servers. Therefore our example will be very straightforward: we’ll delete some rows from a table in order to recover the table later.
Most of these limitations are solved on Percona Server . More info about this in the conclusion section of this post. This post will be focus on how to recover a single tablespace using stock MySQL server.
First, you must meet certain prerequisites to be able to restore a ibd tablespace:
- The ibd file must be from a consistent backup with all insert buffer entries merged and have no uncommitted transactions in order to not be dependent of the shared tablespace ibdata. That is, shutting down with innodb_fast_shutdown=0. We’ll use XtraBackup to avoid the server shutdown.
- You must not drop, truncate or alter the schema of the table after the backup has been taken.
- The variable innodb_file_per_table must be enabled.
Then, our first step is to get a consistent backup.
First we need to copy all the data to an output directory:
The –export option is the magic trick that will help us to get a consistent backup with complete independent ibd files without shutting down the service. In the second step the use of –export option runs a recovery process on the backup with innodb_fast_shutdown=0 and therefore merging all the insert buffers.
# innobackupex --defaults-file=/etc/my.cnf --export /tmp/
Then apply the logs to get a consistent backup:
# innobackupex --defaults-file=/etc/my.cnf --apply-log --export /tmp/2012-01-22_14-13-20/
Now we’re going to delete some data from one table. In this case we’re going to delete the salary information from the user 10008:
mysql> SELECT * FROM salaries WHERE emp_no=10008;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10008 | 46671 | 1998-03-11 | 1999-03-11 |
| 10008 | 48584 | 1999-03-11 | 2000-03-10 |
| 10008 | 52668 | 2000-03-10 | 2000-07-31 |
+--------+--------+------------+------------+
mysql> DELETE FROM salaries WHERE emp_no=10008;
The next step is where we are going to save a lot of time and some headaches Instead of recovering all the InnoDB data we are going to recover only the “salaries” table:
- Discard the tablespace of the salaries table:
mysql> set FOREIGN_KEY_CHECKS=0;
mysql> ALTER TABLE salaries DISCARD TABLESPACE;
- Copy the salaries.ibd files from the backup to the database data directory:
# cp /tmp/2012-01-22_14-13-20/employees/salaries.ibd /var/lib/mysql/data/employees/
- Import the new tablespace:
mysql> set FOREIGN_KEY_CHECKS=0;
mysql> ALTER TABLE salaries IMPORT TABLESPACE;
mysql> set FOREIGN_KEY_CHECKS=1;
mysql> SELECT * FROM salaries WHERE emp_no=10008;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10008 | 46671 | 1998-03-11 | 1999-03-11 |
| 10008 | 48584 | 1999-03-11 | 2000-03-10 |
| 10008 | 52668 | 2000-03-10 | 2000-07-31 |
+--------+--------+------------+------------+
The salary history from the user is back again!
Conclusion:
As we learned , you can also recover single InnoDB table as with MyISAM but knowing in advance that there are some prerequisites to comply.
Percona Server relaxes a lot of limitations and is able to import tables from different Server instance, when table was altered or truncated in the meanwhile. Though this only works if table was
“exported” with Xtrabackup as this exports essential information from main tablespace which is not stored in .ibd file. innodb_import_table_from_xtrabackup=1 should be enabled for such advanced import process to work. You can read more about this feature in Percona Server Documentation
In the next blog post I’ll explain how to do recovery using Percona Data Recovery toolkit.
Schedule for MySQL Conference 2012 is Published
I am pleased to announce the schedule for Percona Live: MySQL Conference And Expo 2012 is now published. This is truly great selection of talks with something for MySQL Developers, DBAs, Managers, people just starting to use MySQL as well as looking for advanced topics. We have talks about running MySQL on extremely large scale in a Web as well as running MySQL In the Enterprise Environments. Some speakers have spent over decade pushing MySQL to its limits, others have in depth experience working on MySQL Code.
We have many talks which are covering Oracle MySQL, and forks such as MariaDB, Drizzle and Percona Server are well covered too. You will also have a chance to learn about commercial MySQL alternatives such as Clustrix and SchoonerSQL from our sponsors.
At the same time this is the conference for MySQL Community. We’re talking about other database systems only as it comes to migration to MySQL and about NoSQL technologies such as Memcached,Redis,Sphinx which are commonly used to supplement MySQL.
The space was very tight this year and competition was very tough. We had over 300 proposals for approximately 60 slots. As results committee had to make a lot of very tough choices and many great talks could not be accommodated.
We have a great Conference Committee this year who has done a great job getting the schedule together. I can’t thank them enough !
See you in April in Santa Clara for MySQL Conference and lets make this event an amazing success !
linux.conf.au 2012 roundup
I spent last week at linux.conf.au in Ballarat, Victoria (that’s the Victoria in Australia, not wherever else there may be one) which is only a pleasant two hour drive from my home town of Melbourne (Australia, not Florida). I sent an email internally to our experts detailing bits of the conference that may interest them – and I thought that it may also interest our wider readers who are interested in all levels of the software stack.
For those that don’t know: linux.conf.au is one of (if not the) most awesome technical conference in the free software space. It consistently attracts a wide variety of very knowledgable speakers and a large number of attendees.
Every year it is put together by a (different) set of volunteers, and this means it also tours around the country (and sometimes even New Zealand). This year it was in Ballarat – a regional city a couple of hours drive out of Melbourne. One of the great things about LCA is that you are not always at the same hotel, in the same city stuck with the same two restaurants.
This year had a bit of an increased focus on privacy, security and basic freedoms and human rights. This is no doubt a reaction to the increased attacks on freedom of speech and the internet that have been going on in recent months.
That being said, there were a huge number of great talks on a variety of topics – everything from filesystem performance to open hardware, to repurposing existing hardware to upcoming challenges for the kernel to howto be a better sysadmin. In fact… for those who weren’t there and spend any of their life helping people admin machines – go and watch those talks.
linux.conf.au (for me) is one of the cannot-miss events in the year. It’s an opportunity to learn things that directly apply to my work, may apply in the future and most certainly will never apply but are rather cool anyway.
All the video from the conference are already up! This is an amazing effort from the (volunteer) AV team. I’ve included links to a selection of talks below that I especially think are worth watching:
Watch no matter what:
- Keynote – Karen Sandler
This keynote was amazing. Go watch it. The organisers did a truly excellent job at selecting keynotes this year. - Keynote – Jacob Appelbaum
This is best described as a tour of internet freedom, the attacks on it and a tour of the modern surveillance state. - UEFI and Linux: The future is here and it’s awful
You will be depressed at some point in this talk – the news is not great for the future of even being able to easily boot free software on machines. - Paul Fenwick’s Keynote
A good quick introduction to hacking your brain. I’m sure many of you (like me) are interested in ways to hack our brains and our bodies to better serve us. This talk is merely an introduction. I also suggest you check out Anki if you want to improve your ability to remember things. - Torturing OpenSSL
This was certainly one of the most amazing talks I saw. A whole new interesting way of attacking SSL. Vary CPU voltage, extract private SSL keys! Wheee! - The Kernel Report – Jonathan Corbet
You can skip this only if you read every single mail on LKML, run your own analysis on the kernel source tree and publish an (at least) weekly publication on Linux.
This is one of the few (err… only) talks that is repeatedly accepted into linux.conf.au. Why? Because Jon manages to compress a whole year of activity in the Linux world int oa single session that is incredibly informative. - I Can’t Believe This Is Butter! A tour of BTRFS
This is going to be the default filesystem in a number of places over the next few years, time to start learning! While it’s unlikely to be suitable for database workloads anywhere in the near future, I suspect we’ll see BTRFS as the root filesystem and XFS as the filesystem for the database server in the not too distant future. - Mistakes Were Made
This session explores a number of rather indispensable things for those in operations – but also leaks over into development. Learning from our mistakes can only make us better at doing our jobs. - Hack Everything: repurposing everyday devices
This was great fun – reuse existing hardware to do awesome things!
Talks that could be quite interesting for you, depending on your interests:
- XFS: Recent and Future Adventures
This talk covered a lot of the recent improvements in XFS. Especially interesting for those with metadata heavy workloads. - Bloat: How and Why UNIX Grew Up (and Out) – Rusty Russell,Matt Evans
This is a pretty neat examination of where bloat comes from taking V6 UNIX utilities as the baseline. It would be interesting to have a similar study on the database servers we all love and use. - Migrating to PHP 5.4
Probably quite useful if you’re dealing with PHP (although I wasn’t in this session). - Finding vulnerabilities in PHP code (via static code analysis)
This one is for our web developers. It’s only a 20minute talk, so you should really all go and watch it. Various static analysis tools are part of the normal toolkit for C and C++ development these days, it’s great to see people working on them for other languages.
Talks I shall be watching the videos of as I was in another talk at the time:
- Smashing a square peg into a round hole
- Codec2: Open Source Speech Codec
- Desktop Home Hacks
- This Old Code, or Renovating Dusty Old Open Source For Fun and Profit
This is especially interesting as I both know Greg and have talked to him about this work *and* because of my experience with the Drizzle project – we certainly have encountered more than our fair share of dusty old code during our time working on Drizzle. - Mentoring: we’re Doing It Wrong
My Talk:
- Multi-tenancy, multi-master, sharding, scaling and analytics with Drizzle
An update as to what we’ve been working on getting going in Drizzle.
All the videos are going up at:
Preventing MySQL Emergencies Webinar
On the 25th of January at 10 am PST, I will present a webinar on preventing MySQL emergencies titled “Preventing Downtime in Production MySQL Servers”. The material I will present is based on in-depth research done by Percona across many production servers. We analyzed more than 150 emergency cases and categorized our findings to help you learn ways to avoid production downtimes. Join us to learn more about why emergencies happen (it may be different than what you think) and what you can do to avoid them.
Sign up for the webinar now by visiting our webinar page [http://www.percona.com/webinars/2012-01-25-preventing-downtime-in-production-mysql-servers/].
What Are Full, Incremental, and Differential Backups?
Sometimes you might hear people talk about full backups, and differential backups versus incremental backups. What is the difference?
A full backup is pretty self-explanatory. It makes a copy of all of your MySQL data.
A differential backup, on the other hand, simply records the differences since your last full backup. The advantage of taking a differential backup is usually the space savings. Most databases have a lot of data that does not change from one backup to the next. Not copying this data into your backups can result in significantly smaller backups. In addition, depending on the backup tool used, a differential backup can be less labor-intensive for the server. If a differential backup does not have to scan all of the data to determine what has changed, the differential backup process can be significantly more efficient.
An incremental backup is a modification of a differential backup. An incremental backup records the changes since the last backup of any type, be it a differential or full backup. The advantages of incremental backups are similar to those of differential backups.
Here is an example that might explain this more clearly. Suppose that you take a full backup on Sunday. On Monday, you make a backup of all the changes since Sunday. This is a differential backup.
On Tuesday is when you begin to see the differences between the backup strategies. If you back up the changes since Sunday, then you have made a differential backup. If you back up the changes since Monday, it is an incremental backup.
Why would you choose an incremental versus a differential backup? That is a little bit out of scope for this blog post, because there are a lot of subtleties to consider. However, perhaps the biggest difference is in the way that you would restore a backup. Suppose that you need to restore your database on Friday. If you have taken differential backups all week long, you only need to restore Sunday, and then apply the changes that have happened since Sunday. If you have taken incremental backups, you must restore Sunday’s backup, and then apply changes repeatedly until you reach Friday. This can be more labor intensive, error-prone, and risky. It can also take longer.
Percona XtraBackup is capable of taking incremental backups, and when you specify the base backup as the previous incremental or full backup, you can easily make differential backups.
Slides from Percona Live, Washington, DC are available
If you’ve missed Percona Live in Washington,DC and even if you did not you should be happy to know we have now published slides from majority of talks at this conference. Enjoy! This conference had great talks, whenever you’re Beginner or Advanced when it comes to MySQL. Also remember not to miss MySQL Conference in Santa Clara,CA April 10-12 2012.
Percona XtraDB Cluster Feature 2: Multi-Master replication
This is about the second great feature – Multi-Master replication, what you get with Percona XtraDB Cluster.
It is recommended you get familiar with general architecture of the cluster, described on the previous post.
By Multi-Master I mean the ability to write to any node in your cluster and do not worry that eventually you get out-of-sync situation, as it regularly happens with regular MySQL replication if you imprudently write to the wrong server.
This is long-waited feature, I’ve seen growing demand for it for last two years or even more.
Percona XtraDB Cluster provides it, and let’s see how it works.
With our Cluster you can write to any node, and the Cluster guarantees consistency of writes. That is the write is either committed on all nodes or not committed at all.
Let me draw some diagram there. For the simplicity I will use a two-nodes example, but the same logic is applied when you have N nodes.
As you see all queries are executed locally on the node, and only on COMMIT there is special handling.
When you issue COMMIT, your transaction has to pass certification on all nodes. If it does not pass, you
will receive “ERROR” as response on your query. After that transaction is applied on the local node.
That is response time of COMMIT consist of several parts:
- Network roundtrip
- Certification time
- Local applying
Please note that applying the transaction on remote nodes does not affect the response time of COMMIT,
as it happens in the background after the response on certification.
The two important consequences of this architecture.
- First: we can have several appliers working in parallel. This gives us true parallel replication. Slave can have many parallel threads, and you can tune it by variable wsrep_slave_threads
- Second: There might be a small period of time when the slave is out-of-sync from master. This happens because the master may apply event faster than a slave. And if you do read from the slave, you may read data, that has not changes yet. You can see that from diagram. However you can change this behavior by using variable wsrep_causal_reads=ON. In this case the read on the slave will wait until event is applied (this however will increase the response time of the read. This gap between slave and master is the reason why this replication named “virtually synchronous replication”, not real “synchronous replication”
The described behavior of COMMIT also has the second serious implication.
If you run write transactions to two different nodes, the cluster will use an optimistic locking model.
That means a transaction will not check on possible locking conflicts during individual queries, but rather on the COMMIT stage. And you may get ERROR response on COMMIT. I am highlighting this, as this is one of incompatibilities with regular InnoDB, that you may experience. In InnoDB usually DEADLOCK and LOCK TIMEOUT errors happen in response on particular query, but not on COMMIT. Well, if you follow a good practice, you still check errors code after “COMMIT” query, but I saw many applications that do not do that.
So, if you plan to use Multi-Master capabilities of XtraDB Cluster, and run write transactions on several nodes, you may need to make sure you handle response on “COMMIT” query.
mbgc='f5f5f5';ww='320';mbc='cecece';bbc='3F79D5';bmobc='3b71c6';bbgc='4889F0';bmoc='3F79D5';bfc='FFFFFF';bmofc='ffffff';tlc='cecece';tc='6a6a6a';nc='6a6a6a';bc='6a6a6a';l='y';fs='16';fsb='13';bw='100';ff='4';pc='4889F0';b='s'; pid='109242749016593233313';
How Does Semisynchronous MySQL Replication Work?
With the recent release of Percona XtraDB Cluster, I am increasingly being asked about MySQL’s semi-synchronous replication. I find that there are often a number of misconceptions about how semi-synchronous replication really works. I think it is very important to understand what guarantees you actually get with semi-synchronous replication, and what you don’t get.
The first thing to understand is that despite the name, semi-synchronous replication is still asynchronous. Semi-synchronous is actually a pretty bad name, because there is no strong coupling between a commit on the master and a commit on the replicas. To understand why, let’s look at what truly synchronous replication means. In truly synchronous replication, when you commit a transaction, the commit does not complete until all replicas have also committed successfully. In MySQL’s semi-synchronous replication, the commit completes before the transaction is even sent to any of the replicas. Therefore, by definition the transaction cannot have committed on any of the replicas. If there’s any problem after the commit happens on the master, it’s possible that the replicas won’t get the transaction, and even after they do, there’s no guarantee they can apply and commit it successfully themselves (duplicate key error, anyone?). If any of these problems happens, it’s too late–the commit is already permanent on the master, and can’t be rolled back.
What should semi-synchronous replication be called instead? I believe that it should be called delayed-acknowledgment commits, because this is what actually happens. When a transaction commits on the master, the commit proceeds as normal, and the transaction is sent to the replicas as normal, but the client connection to the master is not told that the commit has completed until after at least one replica has acknowledged receiving the transaction.
Another way to look at the same thing is that semi-synchronous replication actually forces the client to be synchronized, not the replicas. The client is forced to wait until the transaction has been sent to one of the replicas, but the commit on the master is not forced to wait at all, nor are replicas forced to do anything. The commit has already happened on the master, so the cat’s out of the bag and there’s no way to force replicas to do anything. As a result, the effect is that the client’s activity is throttled so that it cannot outpace the replica’s ability to fetch updates from the master. Have you seen the bumper sticker that says “don’t drive faster than your Guardian Angel can fly?” That is the effect of this throttling.
Semi-synchronous replication also does not guarantee that your replicas will not become delayed. The client connection is forced to wait until at least one of the replicas has retrieved the transaction, but not until the transaction has actually been applied to the replica. As you probably know, it is perfectly possible to send a very long transaction to the replica in a matter of milliseconds. The replica will take a long time to apply this transaction to its own data, and during that time, it will be delayed relative to the master. However, other transactions can continue committing and sending their changes to the replica, because the process of retrieving changes from the master and applying them run in separate threads on the replica.
Finally, semi-synchronous replication does not provide strong guarantees against data loss. What do I mean by a strong guarantee against data loss? I consider the safety of my data strongly guaranteed when at least one other server must have a copy of the data before it can be committed on the master. However, that is not what happens in semi-synchronous replication. And if there is an error in semi-synchronous replication, such as a crash at the wrong moment, or a timeout, then even the throttling is abandoned, and everything defaults back to the traditional mode of replication.
What does semi-synchronous replication guarantee me then? If there are no errors or timeouts, then the guarantee is essentially that only one transaction per client is likely to be lost if the master crashes.
I do not mean to sound negative, or to send the message that semi-synchronous replication is not useful. It is useful, but if you misunderstand it, you could be relying on a strong guarantee that is not actually provided.
If you want to learn more about this, then I encourage you to read the relevant section of the MySQL manual. But read carefully, for example, the following sentences:
When a commit returns successfully, it is known that the data exists in at least two places (on the master and at least one slave). If the master commits but a crash occurs while the master is waiting for acknowledgment from a slave, it is possible that the transaction may not have reached any slave.
Finally, I would be interested to hear how many people are actually running semi-synchronous replication in production. I have a feeling that very few people are, even though a lot of people seem to have heard about it. What are your experiences?
Backing up binary log files with mysqlbinlog
Backing up binary logs are essential part of creating good backup infrastructure as it gives you the possibility for point in time recovery. After restoring a database from backup you have the option to recover changes that happend after taking a backup. The problem with this approach was that you had to do periodic filesystem level backups of the binary log files which could still lead to data loss depending on the interval you back them up.
Recently in MySQL 5.6, mysqlbinlog got a new feature addition that supports connecting to remote MySQL instances and dumping binary log data to local disks ( http://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-backup.html ). This can be used as a foundation of our live binary log backups.
The wrapper script below will connect to the remote server specified in the config and ensure mysqlbinlog utility is up and running. By default if you do not supply the binary log file, mysqlbinlog deletes and overwrites them all that is undesired behaviour in our case, so we have to supply the name of the last binary log. This last file will be still overwritten hence we make a backup first.
#!/bin/sh source $1 cd $BACKUPDIR echo "Backup dir: $BACKUPDIR " while : do LASTFILE=`ls -1 $BACKUPDIR|grep -v orig|tail -n 1` TIMESTAMP=`date +%s` FILESIZE=$(stat -c%s "$LASTFILE") if [ $FILESIZE -gt 0 ]; then echo "Backing up last binlog" mv $LASTFILE $LASTFILE.orig$TIMESTAMP fi touch $LASTFILE echo "Starting live binlog backup" $MBL --raw --read-from-remote-server --stop-never --host $MYSQLHOST --port $MYSQLPORT -u $MYSQLUSER -p$MYSQLPASS $LASTFILE echo "mysqlbinlog exited with $? trying to reconnect in $RESPAWN seconds." sleep $RESPAWN doneConfiguration file:
MBL=/opt/mysql5.6/usr/bin/mysqlbinlog MYSQLHOST=10.10.10.10 MYSQLPORT=3306 MYSQLUSER=replication_user MYSQLPASS=replication_pass BACKUPDIR=/media/binlogs/server2/ # time to wait before reconnecting after failure RESPAWN=10Starting in the background with logging to /var/log/livebinlog/server2.log:
nohup /media/binlogs/livebinlog.sh /media/binlogs/livebackup.server2.conf 2>&1 > /var/log/livebinlog/server2.log &As a great addition, older logfiles that have been rotated can be checked against the MySQL server’s version if they are the same or not. For this purpose you can use rsync in “dry-run” mode.
Please note MySQL 5.6 is not yet released as GA but you can use mysqlbinlog to backup MySQL 5.1 and 5.5 databases.
Benchmarks of new innodb_flush_neighbor_pages
In our recent release of Percona Server 5.5.19 we introduced new value for innodb_flush_neighbor_pages=cont.
This way we are trying to deal with the problem of InnoDB flushing.
Actually there is also the second fix to what we think is bug in InnoDB, where it blocks queries while it is not needed (I will refer to it as “sync fix”). In this post I however will focus on innodb_flush_neighbor_pages.
By default InnoDB flushes so named neighbor pages, which really are not neighbors.
Say we want to flush page P. InnoDB is looking in an area of 128 pages around page P, and flushes all the pages in that area that are dirty. To illustrate, say we have an area of memory like this: ...D...D...D....P....D....D...D....D where each dot is a page that does not need flushing, each “D” is a dirty page that InnoDB will flush, and P is our page.
So, as the result of how it works, instead of performing 1 random write, InnoDB will perform 8 random writes.
This is quite far from original intention to flush as many pages as possible in singe sequential write.
So we added new innodb_flush_neighbor_pages=cont method, with it, only really sequential write will be performed
That is case ...D...D...D..DDDPD....D....D...D....D only following pages will be flushed:
...D...D...D..FFFFF....D....D...D....D (marked as “F”)
Beside “cont”, in Percona Server 5.5.19 innodb_flush_neighbor_pages also accepts values “area” (default) and “none” (recommended for SSD).
What kind of effect does it have ? Let’s run some benchmarks.
We repeated the same benchmark I ran in Disaster MySQL 5.5 flushing, but now we used two servers: Cisco UCS C250 and HP ProLiant DL380 G6
First results from HP ProLiant.
Response time graph (axe y has logarithmic scale):
As you see with “cont” we are able to get stable line. And even with default innodb_flush_neighbor_pages, Percona Server has smaller dips than MySQL.
So this is to show effect of “sync fix”, let’s compare Percona Server 5.5.18 (without fix) and 5.5.19 (with fix).
You see that the fix helps to have queries running in cases when before it was “hard” stop, and no
transaction processed.
The previous result may give you impression that “cont” guarantees stable line, but unfortunately this is not always the case.
There are results ( throughput and response time) from Cisco UCS 250 server:
You see, on this server we have longer and deeper periods when MySQL stuck in flushing, and in such cases, the
innodb_flush_neighbor_pages=cont only helps to relief the problem, not completely solving it.
Which, I believe, is still better than complete stop for significant amount of time.
The raw results, scripts and different CPU/IO metrics are available from our Benchmarks Launchpad
mbgc='f5f5f5';ww='320';mbc='cecece';bbc='3F79D5';bmobc='3b71c6';bbgc='4889F0';bmoc='3F79D5';bfc='FFFFFF';bmofc='ffffff';tlc='cecece';tc='6a6a6a';nc='6a6a6a';bc='6a6a6a';l='y';fs='16';fsb='13';bw='100';ff='4';pc='4889F0';b='s'; pid='109242749016593233313';
MySQL and Friends devroom at FOSDEM 2012
The votes have been counted and now there is a schedule for MySQL and Friends devroom at FOSDEM 2012.
I’m giving two talks:
Percona XtraDB Cluster Feature 1: High Availability
There and in coming posts I am going to cover main features of Percona XtraDB Cluster. The first feature is High Availability.
But before jumping to HA, let’s review general architecture of the Percona XtraDB Cluster.
1. The Cluster consists of Nodes. Recommended configuration is to have at least 3 nodes, but you can make it running with 2 nodes too.
2. Each Node is regular MySQL / Percona Server setup. The point is that you can convert your existing MySQL / Percona Server into Node and roll Cluster using it as base. Or otherwise – you can detach Node from Cluster and use it as just a regular server.
3. Each Node contains the full copy of data. That defines XtraDB Cluster behavior in many ways. And obviously there are benefits and drawbacks.
Benefits of such approach:
- When you execute a query, it is executed locally on the node. All data is available locally, no need for remote access.
- No central management. You can loose any node at any point of time, and the cluster will continue to function
- Good solution for scaling a read workload. You can put read queries to any of the nodes
Drawbacks:
- Overhead of joining new node. The new node has to copy full dataset from one of existing node. If it is 100GB, it copies 100GB.
- This can’t be used as an effective write scaling solution. There might be some improvements in write throughput when you run write traffic to 2 nodes vs all traffic to 1 node, but you can’t expect a lot. All writes still have to go on all nodes.
- You have several duplicates of data. For 3 nodes – 3 duplicates
This basically defines how Percona XtraDB Cluster can be used for High Availability.
Basic setup: you run 3-nodes setup.
The Percona XtraDB Cluster will continue to function when you take any of nodes down.
At any point of time you can shutdown any Node to perform maintenance or make configuration changes.
Or Node may crash or become network unavailable. The Cluster will continue to work, you can continue to run queries on working nodes.
The biggest question there, what will happen when the Node joins the cluster back, and there were changes to data while the node
was down.
Let’s focus on this with details.
There is two ways that Node may use when it joins the cluster: State Snapshot Transfer (SST) and Incremental State Transfer (IST).
- SST is the full copy if data from one node to another. SST is used when new node joins the cluster, it has to transfer data from existing node.
There is three methods of SST available in Percona XtraDB Cluster: mysqldump, rsync and xtrabackup ( Percona XtraBackup with support of XtraDB Cluster will be released soon, so far you need to use our source code repository).
The downside of mysqldump and rsync is that your cluster becomes READ-ONLY for time that takes to copy data from one node to another (SST applies FLUSH TABLES WITH READ LOCK command).
Xtrabackup SST does not require READ LOCK for full time, only for syncing .frm files (the same as with regular backup). - Even with that, SST may be intrusive, that’s why there is IST mechanism. If down your node for short period of time, and then start it, the node is able to fetch only changes made during period it was down.
This is done using caching mechanism on nodes. Each node contains a cache, ring-buffer, (the size is configurable) of last N changes, and the node is able to transfer part of this cache. Obviously IST can be done only if amount of changes needed to transfer is less than N. If it exceeds N, then the joining node has to perform SST.
You can monitor current state of Node by using
SHOW STATUS LIKE 'wsrep_local_state_comment', when it is ‘Synced (6)’, the node is ready to handle traffic.
mbgc='f5f5f5';ww='320';mbc='cecece';bbc='3F79D5';bmobc='3b71c6';bbgc='4889F0';bmoc='3F79D5';bfc='FFFFFF';bmofc='ffffff';tlc='cecece';tc='6a6a6a';nc='6a6a6a';bc='6a6a6a';l='y';fs='16';fsb='13';bw='100';ff='4';pc='4889F0';b='s'; pid='109242749016593233313';
Free webinar Wednesday: verifying replication integrity
Join me Wednesday for a free webinar on using the new Percona Toolkit v2.0.3 to verify replication integrity. If you’re not familiar with this topic, it’s one of those must-do things that no one officially tells you is necessary with MySQL. The new tools in Percona Toolkit 2.0.3 make it much easier and less complicated — and safer — than before. Well worth an hour of your time.
Further Reading:


Recent comments
5 years 2 weeks ago
5 years 5 weeks ago
5 years 5 weeks ago
5 years 6 weeks ago
5 years 7 weeks ago
5 years 7 weeks ago
5 years 10 weeks ago
5 years 10 weeks ago
5 years 12 weeks ago
5 years 13 weeks ago