MySQL

Percona at AWS re:Invent and events around the world

MySQL Performance Blog - Mon, 2014-11-10 14:56

November is a busy month of events for Percona. Conferences in Moscow, London, Paris, Las Vegas, Buenos Aires, and Montevideo mean our experts are racking up the frequent flyer miles meeting users of Percona software from around the world, sharing our insights on MySQL performance and operations, and learning how others are growing and scaling their applications.

Percona started the month in Moscow attending and speaking at Highload++. This event from October 31 to November 1 welcomed thousands of experts from the biggest and brightest companies in Europe and around the world with a keen interest in developing large scale web applications.

From Moscow we traveled to London to host the annual Percona Live London event from November 3-4. If you missed the keynotes, tutorials, sessions, and general MySQL camaraderie, we’re meeting again April 13-16, 2015, in Santa Clara, CA, at the Percona Live MySQL Conference & Expo, the premier event for the rich and diverse MySQL ecosystem. Based on growing demand, we’ve expanded the conference with OpenStack Live, a unique event for learning key considerations for cloud strategies, how to improve overall performance, and operational best practices for managing OpenStack and maximizing the MySQL core. The Call for Papers for both Percona Live and OpenStack Live closes soon, so submit your speaker proposals today!

The next stop was Paris to attend, exhibit, and speak at the OpenStack Summit. This event, which ran from November 3-7, drew over 4,000 developers, users, and administrators of OpenStack cloud software seeking to learn about the technology and plan future development. Percona is pleased to play a part in the OpenStack community by participating in the Trove project for DBaaS as well as supporting those that need help with the MySQL core of the technology. In their keynote presentation, Time Warner Cable even featured Percona as a key partner helping them to radically change their business using OpenStack.

This week we’re in Las Vegas for AWS re:Invent at The Venetian running from November 11-14. AWS re:Invent will welcome thousands of developers and operations experts interested in maximizing the power of cloud infrastructure. We want to meet organizations that are creating applications on the Amazon Web Services platform using MySQL and Percona open source software.

Visit us in the Venetian Expo Hall at booth 553 by the Developer Lounge and tell us about your application. We’ll be located next to Heroku and the team from Atlassian. Stop by and enter for a chance to win a Kindle Fire HDX 8.9!

After Las Vegas, Percona MySQL experts will be speaking at two more events, both in South America.

At the MySQL, NoSQL and Cloud 2014 Latin America Conference, Percona Support Engineer Marcos Albe, Percona Consultants Francisco Bordenave and Daniel Guzman, and Senior Remote DBA Max Bubenick will speak on topics ranging from MySQL performance and TokuDB to best operational practices for database backup and data recovery. At TechMeetup v2014, Percona Consultant Fernando Ipar will lead workshops focused on Big Data Analytics and using the Go programming language. Percona is sponsoring both of these events, so visit us at our booths.

We’re looking forward to meeting many users of Percona software as well as some of our customers. If you’re at these events, keep an eye out for our team in the Percona shirts. Remember to visit us at AWS re:Invent in Las Vegas this week at booth 553 and enter our drawing to win a Kindle Fire HDX 8.9!

Expect more posts soon about all of our presentations at these events so stay tuned.

The post Percona at AWS re:Invent and events around the world appeared first on MySQL Performance Blog.

Categories: MySQL

OpenStack MySQL HA Solution = Percona XtraDB Cluster: Webinar

MySQL Performance Blog - Thu, 2014-11-06 15:48

MySQL is the most popular and widely used open source database on the planet. OpenStack is fast evolving as the de facto open source cloud platform. OpenStack services such as Nova (Compute), Cinder (Storage), and Neutron (Networking), to name just a few, all use MySQL as the default datastore.

OpenStack is a cloud platform and therefore needs to be highly available. Understanding how to accomplish MySQL high availability is a key piece to making OpenStack highly available. This webinar will discuss MySQL high availability options for OpenStack with primary focus on high availability with Percona XtraDB Cluster/Galera. Tradeoffs with respect to other options as well as architectural considerations will be discussed.

This webinar is titled “Percona XtraDB Cluster as a MySQL HA Solution for Openstack” and will be next Wednesday, November 12, at 1 p.m. EST / 10 a.m. PST. Register free right here!

 

The post OpenStack MySQL HA Solution = Percona XtraDB Cluster: Webinar appeared first on MySQL Performance Blog.

Categories: MySQL

Data inconsistencies on MySQL replicas: Beyond pt-table-checksum

MySQL Performance Blog - Tue, 2014-11-04 08:00

Percona Toolkit’s pt-table-checksum is a great tool to find data inconsistencies between a MySQL master and its replicas. However it is sometimes not enough to know that there are inconsistencies and let pt-table-sync fix the issue: you may want to know which exact rows are different to identify the statements that created the inconsistency. This post shows one way to achieve that goal.

The issue

Let’s assume you have 2 servers running MySQL 5.5: db1 the master and db2 the replica. You want to upgrade to MySQL 5.6 using an in-place upgrade and to play safe, you will upgrade db2 (the slave) first. If all goes well you will promote it and upgrade db1.

A good thing to do after upgrading db2 is to check for potential data inconsistencies with pt-table-checksum. Once checksumming is done, you can run the following query on db2 to see if there is any data drift:

mysql> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
       FROM percona.checksums
       WHERE (master_cnt <> this_cnt
              OR master_crc <> this_crc
              OR ISNULL(master_crc) <> ISNULL(this_crc))
       GROUP BY db, tbl;
+------+-------+------------+--------+
| db   | tbl   | total_rows | chunks |
+------+-------+------------+--------+
| mydb | items |    3745563 |     17 |
+------+-------+------------+--------+

This indicates that inconsistencies can be found in mydb.items in 17 chunks. Now the question is: which rows are different on db1 and db2?

The solution

The previous query shows that we will find inconsistencies in 17 of the chunks pt-table-checksum used. But what is a chunk?

mysql> SELECT * FROM percona.checksums WHERE this_crc != master_crc AND tbl='items'\G
*************************** 1. row ***************************
            db: mydb
           tbl: items
         chunk: 28
    chunk_time: 0.123122
   chunk_index: PRIMARY
lower_boundary: 7487511
upper_boundary: 7563474
      this_crc: 2c11da8d
      this_cnt: 75964
    master_crc: 66a1c22c
    master_cnt: 75964
            ts: 2014-10-22 01:21:26
[...]

So the first chunk with inconsistencies is chunk #28, which is the set of rows where the primary key is >= 7487511 and <= 7563474.

Let’s export all these rows on db1 and db2 instance ::

# db1
mysql> SELECT * INTO outfile '/tmp/items_db1.txt' FROM mydb.items WHERE id BETWEEN 7487511 AND 7563474;
# db2
mysql> SELECT * INTO outfile '/tmp/items_db2.txt' FROM mydb.items WHERE id BETWEEN 7487511 AND 7563474;

Then let’s use diff to isolate non-matching rows

# Using diff to compare rows
# diff items_db1.txt items_db2.txt
75872,75874c75872,75874
< 7563382  2127002  3  0  2014-10-22 02:51:33
< 7563383  2127002  4  0  2014-10-22 02:51:33
< 7563384  2127002  5  0  2014-10-22 02:51:33
---
> 7563382  2127002  3  0  2014-10-22 02:51:34
> 7563383  2127002  4  0  2014-10-22 02:51:34
> 7563384  2127002  5  0  2014-10-22 02:51:34
[...]

We can see that some datetime fields are off by 1 second on the 5.6 instance.

In this case, the binlogs showed queries like:

INSERT INTO items ([...],posted_at) VALUES ([...],'2014-10-22 02:51:33.835249');

MySQL 5.5 rounds '2014-10-22 02:51:33.835249' to '2014-10-22 02:51:33' (ignoring the fractional part), while MySQL 5.6 rounds it to '2014-10-22 02:51:34'.
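A quick way to see the behavior difference is to insert such a literal into a plain DATETIME column on each version; a minimal sketch (the table is hypothetical):

mysql> CREATE TABLE dt_test (posted_at DATETIME);
mysql> INSERT INTO dt_test VALUES ('2014-10-22 02:51:33.835249');
mysql> SELECT posted_at FROM dt_test;
-- MySQL 5.5 returns 2014-10-22 02:51:33 (fractional part ignored)
-- MySQL 5.6 returns 2014-10-22 02:51:34 (fractional part rounded)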

Now it’s easy to fix the application so that it works both with MySQL 5.5 and 5.6 and then continue testing MySQL 5.6.

Conclusion

The method shown above is an easy way to find the exact records that are inconsistent between the MySQL master and a replica. It is not useful if you only want to resync the slave (in this case, just run pt-table-sync) but it can be a first step in understanding how inconsistencies are created.

The post Data inconsistencies on MySQL replicas: Beyond pt-table-checksum appeared first on MySQL Performance Blog.

Categories: MySQL

Percona XtraBackup 2.2.6 is now available

MySQL Performance Blog - Mon, 2014-11-03 15:16

Percona is glad to announce the release of Percona XtraBackup 2.2.6 on November 3, 2014. Downloads are available from our download site here and Percona Software Repositories. We’re also happy to announce that Ubuntu 14.10 and CentOS 7 users can download, install, and upgrade Percona XtraBackup 2.2.6 from Percona’s software repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

  • Percona XtraBackup now reads server options from SHOW VARIABLES rather than my.cnf configuration file.
  • Percona XtraBackup now has more verbose output during initial table scan (it will now print a diagnostic message before performing a tablespace scan, which may take a long time on systems with large numbers of tablespaces) and before starting a backup/apply-log/copy-back operation in innobackupex (it will now print a diagnostic message with a timestamp to make it easier for users to get duration of the operation.)

Bugs Fixed:

  • innobackupex didn’t take the default datadir, which caused backups to fail if the datadir wasn’t specified in the my.cnf configuration file. Bug fixed #936934.
  • innobackupex will now fail with an error when innobackupex –slave-info is used on a multi-threaded non-GTID slave, because Exec_Master_Log_Pos cannot be trusted for a multi-threaded slave. Bug fixed #1372679.
  • InnoDB log scanning failure (bug #60788) would cause backups to fail. Fixed by porting the fix from MySQL 5.7. Bug fixed #1375383.
  • Options --apply-log and --decompress weren’t marked as mutually exclusive, i.e. if both were specified, only --decompress would work, which could lead to a MySQL instance being started with an unprepared backup (a sketch of the safe workflow follows this list). Fixed by making mutually exclusive categories of options: 1. --decompress, --decrypt; 2. --copy-back; 3. --move-back; 4. --apply-log. Bug fixed #1376874.
  • innobackupex wasn’t creating directories specified in innodb_data_home_dir and innodb_log_group_home_dir when --copy-back option was used. Bug fixed #1382347.
  • Percona XtraBackup now supports all option modifiers supported by upstream MySQL: skip, disable, enable, maximum, loose. Bug fixed #664128.
  • Percona XtraBackup would fail to perform a full backup on Percona Server 5.5 if innodb_log_file_size variable wasn’t set in the [mysqld] section of my.cnf. Bug fixed #1334062.
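To avoid starting an instance from an unprepared backup, run the decompress and prepare phases as separate steps before restoring; a minimal sketch, assuming a hypothetical backup directory:

# decompress the backup in place, then prepare it, then restore it
innobackupex --decompress /data/backups/2014-11-03_full
innobackupex --apply-log /data/backups/2014-11-03_full
innobackupex --copy-back /data/backups/2014-11-03_full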

Other bugs fixed: #1379905, #1386013, #1072695, #1375241, #1182841, and #1343722.

Release notes with all the bugfixes for Percona XtraBackup 2.2.6 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.

The post Percona XtraBackup 2.2.6 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.6.21-70.0 is now available

MySQL Performance Blog - Mon, 2014-11-03 15:14

Percona is glad to announce the release of Percona Server 5.6.21-70.0 on October 30, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories. We’re also happy to announce that Ubuntu 14.10 and CentOS 7 users can download, install, and upgrade Percona Server 5.6 from Percona’s software repositories.

Based on MySQL 5.6.21, including all the bug fixes in it, Percona Server 5.6.21-70.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free. Complete details of this release can be found in the 5.6.21-70.0 milestone on Launchpad.

New Features:

Bugs Fixed:

  • Values of IP and DB fields in the Audit Log Plugin were incorrect. Bug fixed #1379023.
  • Specifying the --malloc-lib during the server start would produce two LD_PRELOAD entries, if a system-wide jemalloc library was installed. Bug fixed #1382069.
  • In multi-threaded slave replication setup, an incomplete log event group (the one which doesn’t end with COMMIT/ROLLBACK/XID) in a relay log could have caused a replication stall. An incomplete log event group might occur as a result of one of the following events: 1) slave crash; 2) STOP SLAVE or FLUSH LOGS command issued at a specific moment; 3) server shutdown at a specific moment. Bug fixed #1331586 (upstream #73066).
  • Purging bitmaps exactly up to the last tracked LSN would abort XtraDB changed page tracking. Bug fixed #1382336.
  • mysql_install_db script would silently ignore any mysqld startup failures. Bug fixed #1382782 (upstream #74440).

Other bugs fixed: #1369950, #1335590, #1067103, and #1282599.

Release notes for Percona Server 5.6.21-70.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.21-70.0 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Backup and restore of MySQL to OpenStack Swift

MySQL Performance Blog - Sat, 2014-11-01 13:00

MySQL database usage is popular in OpenStack. Core OpenStack services for Compute (Nova), Storage (Cinder), Networking (Neutron), Image (Glance) and Identity (Keystone) all use a MySQL database.

MySQL, the world’s most popular database, also runs inside OpenStack virtual machines and serves as the database backend for OpenStack cloud-based applications. The MySQL instances can be configured to run in virtual machines manually (by simply installing MySQL inside a VM and running it) or can be created in an on-demand fashion by OpenStack Database-as-a-Service (Trove).

In either case, the MySQL data is mission-critical. OpenStack cloud administrators and cloud guests/tenants need the ability to back up and restore their MySQL databases. mysqldump is the traditional way of doing MySQL backups and restores. However, as the MySQL community knows from long experience, mysqldump has limitations, especially in backup and restore speed as databases grow larger.

Percona XtraBackup overcomes these limitations and enables faster, non-blocking MySQL backups and restores on larger datasets. It also offers other important abilities such as streaming and encryption. It works with MySQL as well as variants such as Percona Server and MariaDB. And all this is open source and free. Percona XtraBackup is popular and widely used in the MySQL community. More recently it has seen adoption within the OpenStack context: it is at the heart of backup/restore for Rackspace Cloud DBaaS, and the OpenStack Database-as-a-Service (Trove) implementation uses Percona XtraBackup as a pluggable backend for database backup and restore.

Until now, however, Percona XtraBackup did not have the ability to do streaming backups and restores of MySQL databases to cloud storage targets such as OpenStack Swift or Amazon Web Services S3. The xbcloud tool is a new contribution from Percona that changes this. At a basic level, the tool enables MySQL backups to OpenStack Swift.

OpenStack Swift is the Object Storage service of OpenStack. It offers a highly available data storage platform that scales horizontally and provides disaster recovery of data across multiple datacenters and geographies. It is the default and popular choice for storing backups of infrastructure and tenant data in OpenStack clouds.

How does the xbcloud tool work? xbcloud uploads and downloads all or part of an xbstream archive to/from OpenStack Swift. So what is xbstream? xbstream is a streaming format available in Percona XtraBackup that overcomes some limitations of traditional archive formats such as tar and cpio, which do not allow streaming of dynamically generated files, for example dynamically compressed files.

Archive uploading employs Swift’s multipart upload for Large Objects. Along with the data, xbcloud uploads an xbstream archive index file that contains the list of files, their parts, and the offsets of those parts within the xbstream archive. This index makes it possible to download only part of the archive (one or several tables from a backup) on demand.

Details of usage and examples follow.

Backup:

innobackupex --stream=xbstream /tmp | xbcloud [options] put <name>

Example:

innobackupex --stream=xbstream --extra-lsndir=/tmp /tmp | xbcloud put --storage=Swift --swift-container=test --swift-user=test:tester --swift-url=http://192.168.8.80:8080/ --swift-key=testing --parallel=10 full_backup

Restore:

xbcloud [options] get <name> [<list-of-files>] | xbstream -x

Example:

xbcloud get --storage=Swift --swift-container=test --swift-user=test:tester --swift-url=http://192.168.8.80:8080/ --swift-key=testing full_backup | xbstream -xv -C /tmp/downloaded_full
innobackupex --apply-log /tmp/downloaded_full
innobackupex --copy-back /tmp/downloaded_full

Interested? Try it out. Percona XtraBackup and xbcloud are open source and free. You can find download instructions, usage details and release notes for PXB 2.3 Alpha (which includes xbcloud) here.

So what is next? We at Percona are excited about the possibilities with this new xbcloud tool and Percona XtraBackup in general. Our most immediate next steps are to act on community feedback, address bugs and issues, and move towards GA soon.

Beyond that we are looking to enhance xbcloud to support other cloud platforms such as AWS S3 and Google Cloud Engine.

We would love to hear from you on this. You can either leave feedback directly as comments on this post or file enhancement requests and bugs here.

The post Backup and restore of MySQL to OpenStack Swift appeared first on MySQL Performance Blog.

Categories: MySQL

Get a handle on your HA at Percona Live London 2014

MySQL Performance Blog - Fri, 2014-10-31 05:00

From left: Liz van Dijk, Frédéric Descamps and Kenny Gryp

If you’re following this blog, it’s quite likely you’re already aware of the Percona Live London 2014 conference coming up in just a few days. Just in case, though (you know, if you’re still looking for an excuse to sign up), I wanted to put a spotlight on the tutorial to be delivered by my esteemed colleagues Frédéric Descamps (@lefred) and Kenny Gryp (@gryp), and myself.

For the past two years at Percona, we’ve been spending a substantial amount of time working with customers taking their first steps into creating highly available MySQL environments built on Galera. Percona XtraDB Cluster allows you to get it up and running very fast, but as any weathered “HA” DBA will tell you, building the cluster is only the beginning. (Percona XtraDB Cluster is an open source (free) high-availability and high-scalability solution for MySQL clustering.)

Any cluster technology is likely to introduce a great amount of complexity to your environment, and in our tutorial we want to show you not only how to get started, but also how to avoid many of the operational pitfalls we’ve encountered. Our tutorial, Percona XtraDB Cluster in a nutshell, will be taking place on Monday 3 November and is a full-day (6 hours) session, with an intense hands-on approach.

We’ll be covering a great deal of practical topics, such as:

  • Things to keep in mind when migrating an existing environment over to PXC
  • How to manage and maintain the cluster, keeping it in good shape
  • Load balancing requests across the cluster
  • Considerations for deploying PXC in the cloud

Planning on attending? Be sure to come prepared! Given the hands-on approach of the tutorial, make sure you bring your laptop with enough disk space (~20GB) and processing power to run at least 4 small VirtualBox VMs.

We look forward to seeing you there!

The post Get a handle on your HA at Percona Live London 2014 appeared first on MySQL Performance Blog.

Categories: MySQL

Facebook MySQL database engineers ready for Percona Live London 2014

MySQL Performance Blog - Thu, 2014-10-30 05:00

With 1.28 billion active users, Facebook MySQL database engineers are active and extremely valuable contributors to the global MySQL community. So naturally they are also active participants of Percona Live MySQL conferences! And next week’s Percona Live London 2014 (Nov. 3-4) is no exception. (Register now and use the promotional code “Facebook” to save £30!)

I spoke with Facebook database engineers Yoshinori “Yoshi” Matsunobu and Shlomo Priymak about their upcoming sessions along with what’s new at Facebook since our last conversation back in April.

Tom: Yoshi, last year Facebook deployed MySQL 5.6 on all production environments – what have you and your team learned since doing that? And do you have a few best practices you could share? I realize you’ll be going into detail during your session in London (MySQL 5.6 and WebScaleSQL at Facebook), but maybe a few words on a couple of the bigger ones?

Yoshi: MySQL 5.6 has excellent replication enhancements to use in large-scale deployments. For example, crash safe slave makes it possible to recover without rebuilding a slave instance on server crash. This can greatly minimize slave downtime, especially if your database size is large. There are many other new features such as GTID, multi-threaded slave, streaming mysqlbinlog and we actively use them in production.

For InnoDB, Online DDL is a good example to ease operations. Many MySQL users are doing schema changes by switching masters. This can minimize downtime but requires operational efforts. Online DDL made things much easier.

Tom: Facebook is an active and extremely valuable part of the overall MySQL community and ecosystem – what are some of the key features and improvements you’ve contributed in the past year since moving to MySQL 5.6?

Yoshi: For InnoDB, I think online defragmentation and faster full table scan are the most valuable contributions from Facebook in 5.6. I have received very positive feedback about faster InnoDB full table scan (Logical ReadAhead). My colleague Rongrong will speak about something interesting regarding online defragmentation at Percona Live London. For Replication, we have done many optimizations to make GTID and MTS work without pain. Semi-Synchronous mysqlbinlog and backported Loss-Less semisync from MySQL 5.7 are very useful when you use Semi-Synchronous replication.

Tom: Shlomo, your session, “MySQL Automation at Facebook Scale,” will be of great interest to DBAs at large and growing organizations considering that Facebook has one of the world’s largest MySQL database clusters. What are the two or three most significant things that you’ve learned as a database engineer operating a cluster of this size? And has anything surprised you along the way (so far)?

Shlomo: This is a great question! We like to speak of “10x” at Facebook when thinking of scaling. For example, what would you do differently if the number of servers you had was 10x more than what it is? This type of mental exercise is surprisingly useful when working with systems at scale. If you, or any of the readers, try to extrapolate this about systems you manage, there will be things you’ll be imagining about how a system like this would be – and you won’t be too far from our reality in many aspects.

You’d imagine that we automate much of the single units of work, like master/slave failover, upgrades and schema changes. You’d suspect we have automated fault detection, self managing systems, good alarming and self remediation. You’d presume that if you’re used to running a command on 100 machines, you’ll now be running it on 1000. At least that’s what I thought to myself, so these are not the things that surprised me. There are a few fundamental shifts in one’s thinking when you get to these sizes, which I didn’t foresee.

The first one is that there is absolutely no such thing as “one-off.” If there is a server somewhere that hits a problem every three years, and you have 1000 servers, this will be happening daily! Take it to 10,000 servers, and you can see absolutely nothing is a “one-off”. We can’t write things off as “worst case, I’ll get an SMS.” Whatever it is, we have to chase it down and fix it. Not just that – to deploy a fix at scale can require writing fairly large amounts of code, a fix that could be deployed manually by a DBA in smaller environments.

The second one is adapting to constraints which are very pragmatic and tangible. If you’re on AWS, you’re pretty much isolated from things like worrying where your servers are physically located, when they go over their lifetime, and if the firmware on the switch in the rack needs to be upgraded. If you’re a small shop and have a few racks up in a co-lo, hardware maintenance is just not as frequent, but it becomes more painful as you grow.

At Facebook, we run our own datacenters! We need to work around interesting challenges, such as running datacenters that have highly variable compositions of server hardware. Since we have so many servers, something is always going on. Racks of servers need to be moved. Whole clusters need to be rebuilt or refreshed, to be made better, faster, stronger.  New datacenters are constructed, others decommissioned.

Tom: And this is where automation comes into the picture, right?

Shlomo: We have had to build a lot of automation to make these operations seamless, and we work closely with the Site Ops teams on the ground to coordinate these logistically complicated processes.

Another thing my team does in this space is planning capacity and hardware purchases. Since we build our own servers, the turnaround time between ordering and getting machines is quite long, so proper planning is paramount. Buy too much, and you’ve wasted millions of dollars. Buy too few servers, and there won’t be space for user growth and upcoming projects. The sheer scale makes these decisions more complicated and involved.

These things have actually made my job much more interesting, and I think I’d find it hard to adjust to a smaller environment.

Tom: Last April Facebook announced a move to the newly created WebScaleSQL. Yoshi, do you have an update on where WebScaleSQL is today? And I know it’s early, but has there been any impact on Facebook yet?

Yoshi: WebscaleSQL is a collaboration among engineers from several companies that face similar challenges in running MySQL at scale. Collaboration is nothing new to the MySQL community. The intent is to make this collaboration more efficient.

We are based on the latest upstream (currently MySQL-5.6.21), and added many features. We added patches to improve InnoDB performance around compression LRU flushing, locking, NUMA Support, and doublewrite. We statically link Semi-Sync based on lessons learned at Facebook environments (plugin-lock caused hot mutex contentions). We have many upcoming features such as async clients.

We will continue to track the upstream branch that is the latest, production-ready release (currently MySQL 5.6). We are continuing to push the generally useful changes we have from all of the participants.  If you think you have something to contribute, get in touch!

Tom: I remember being surprised earlier this year when you told me there was usually just one MySQL Operations team member on call at any given time thanks to “robots.” How many robots did your team build and what do they do? Oh, and should rank-and-file DBAs around the world be worried about losing their day jobs to these robots?

Shlomo: Instead of becoming obsolete as some fear, our team is shifting its focus from smaller to larger problems, as we rise higher in the levels of abstraction. Our team has progressed with the requirements of the role. From being a team of DBAs that automate some of their work, we have become more like Production Engineers. We design, write and maintain MySQL/Facebook-specific automation that does our work for us.

While we build these software “robots” to do our work, we also have to maintain them. The job of the oncall is to fix these robots when they malfunction, and that can sometimes be difficult due to the size of our codebase.

In regards to employment concerns, I’d say our work has become more interesting, and the amount has increased. It definitely did not decrease, so if Facebook is indicative of other companies, jobs are not at risk just yet. Speaking of jobs – if what we’re doing sounds interesting, we’re hiring!

Oh, and as for details about these “robots” – that’s the topic of my talk next week in London. Come and hear me speak if you want to know more!

Tom: Yoshi, you also will host a session titled “Fast Master Failover without Data Loss.” I don’t want to give too much away, but how did you get failover to work at scale – across vast datacenters?

Yoshi: Master failure is a norm at Facebook, because of the large amount of servers. Without automation, it is not realistic for a limited number of people to manage. We have a very interesting infrastructure to automate failure handling at Facebook scale. To automate stuff, reliability is important. Unreliable automation makes engineers spend lots of time fixing things manually, and that increases downtime. It is also important to define what to automate and what we shouldn’t automate. Define failure scenarios and write good test cases and continuously integrate. There are multiple failure scenarios like the ones below and you’ll hear about each in detail at my session:

– mysqld crash
– mysqld stalls
– kernel panic and reboot
– error spikes caused by H/W failure
– error spikes caused by bad application logic
– rack switch down
– multiple rack switches down
– datacenter down

Tom: What other sessions, keynotes or events are you looking forward to at Percona Live London 2014? And are you guys planning on attending the MySQL Community Dinner?

Yoshi: “MySQL 5.7: Performance and Scalability Benchmark” (led by Oracle MySQL performance architect Dimitri Kravtchuk). And yes, we’re looking forward to meeting with people at the MySQL Community Dinner!

Tom:  Thanks again Yoshi and Shlomo for taking the time to speak with me and I look forward to seeing you both in London next week!

And readers, I invite you to register now for Percona Live London using the promotional code “Facebook” to save £30. I also hope to see you at the MySQL Community Dinner next Monday (Nov. 3). Space is limited so be sure to reserve your spot now and join us aboard our private double-decker bus to the restaurant.

I’d also like to thank the Percona Live London 2014 Conference Committee for putting together a terrific event this year! The conference committee includes:

  • Dailymotion’s Cédric Peintre, conference chairman
  • Percona’s David Busby
  • MariaDB’s Colin Charles
  • ebay Classifieds Group’s Luis Motta Campos
  • Booking.com’s Nicolai Plum
  • Oracle’s Morgan Tocker
  • Spil Games’ Art van Scheppingen

The post Facebook MySQL database engineers ready for Percona Live London 2014 appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL and Openstack deep dive talk at OpenStack Paris Summit (and more!)

MySQL Performance Blog - Wed, 2014-10-29 12:59

I will present a benchmarking talk next week (Nov. 4) at the OpenStack Paris Summit with Jay Pipes from Mirantis. In order to be able to talk about benchmarking, we had to be able to set up and tear down OpenStack environments really quickly. For the benchmarks, we are using a deployment on AWS (ironically) where the instances aren’t actually started and the tenant network is not reachable but all the backend operations still happen.

The first performance bottleneck we hit wasn’t at the MySQL level. We used Rally to benchmark the environment, starting 1,000 fake instances with it as a first pass.

The bottleneck we saw was neutron-server saturating a single CPU core: by default, neutron does everything in a single process. After configuring the api workers and the rpc workers, performance became significantly better.

api_workers = 64
rpc_workers = 32
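These options typically live in the [DEFAULT] section of neutron.conf on the node running neutron-server; a sketch under that assumption (size the worker counts to the available cores and restart the service afterwards):

# /etc/neutron/neutron.conf
[DEFAULT]
api_workers = 64
rpc_workers = 32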

Before adding the options:

u'runner': {u'concurrency': 24, u'times': 1000, u'type': u'constant'}}
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| action           | min (sec) | avg (sec) | max (sec) | 90 percentile | 95 percentile | success | count |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| nova.boot_server | 4.125     | 9.336     | 15.547    | 11.795        | 12.362        | 100.0%  | 1000  |
| total            | 4.126     | 9.336     | 15.547    | 11.795        | 12.362        | 100.0%  | 1000  |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
Whole scenario time without context preparation: 391.359671831

After adding the options:

u'runner': {u'concurrency': 24, u'times': 1000, u'type': u'constant'}}
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| action           | min (sec) | avg (sec) | max (sec) | 90 percentile | 95 percentile | success | count |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
| nova.boot_server | 2.821     | 6.958     | 36.826    | 8.165         | 10.49         | 100.0%  | 1000  |
| total            | 2.821     | 6.958     | 36.826    | 8.165         | 10.49         | 100.0%  | 1000  |
+------------------+-----------+-----------+-----------+---------------+---------------+---------+-------+
Whole scenario time without context preparation: 292.163493156

Stop by our talk at the OpenStack Paris Summit for more details!

In addition to our talk, Percona has two additional speakers at the OpenStack Paris Summit. George Lorch, Percona software engineer, will speak with Vipul Sabhaya of the HP Cloud Platform Services team on “Percona Server Features for OpenStack and Trove Ops.” Tushar Katarki, Percona director of product management, will present a vBrownBag Tech Talk entitled “MySQL High Availability Options for OpenStack.” Percona is exhibiting at the OpenStack Paris Summit conference, as well – stop by booth E20 and say hello!

At Percona, we’re pleased to see the adoption of our open source software by the OpenStack community and we are working actively to develop more solutions for OpenStack users. We also provide Consulting assistance to organizations that are adopting OpenStack internally or are creating commercial services on top of OpenStack.

We are also pleased to introduce the first annual OpenStack Live, a conference focused on OpenStack and Trove, which is April 13 & 14, 2015 in Santa Clara, California. The call for speaking proposals is now open for submissions which will be reviewed by our OpenStack Live Conference Committee (including me!).

The post MySQL and Openstack deep dive talk at OpenStack Paris Summit (and more!) appeared first on MySQL Performance Blog.

Categories: MySQL

How to deal with MySQL deadlocks

MySQL Performance Blog - Tue, 2014-10-28 07:00

A deadlock in MySQL happens when two or more transactions mutually hold and request locks, creating a cycle of dependencies. In a transactional system, deadlocks are a fact of life and not completely avoidable. InnoDB automatically detects transaction deadlocks, immediately rolls back one of the transactions and returns an error. It uses a metric to pick the easiest transaction to roll back. Though an occasional deadlock is not something to worry about, frequent occurrences call for attention.

Before MySQL 5.6, only the latest deadlock could be reviewed using the SHOW ENGINE INNODB STATUS command. But with Percona Toolkit’s pt-deadlock-logger you can have deadlock information retrieved from SHOW ENGINE INNODB STATUS at a given interval and saved to a file or table for later diagnosis. For more information on using pt-deadlock-logger, see this post. With MySQL 5.6, you can enable the new variable innodb_print_all_deadlocks to have all deadlocks in InnoDB recorded in the mysqld error log.
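As a hedged sketch of both approaches (the destination schema/table for pt-deadlock-logger and the interval are arbitrary choices):

mysql> SET GLOBAL innodb_print_all_deadlocks = ON;  -- MySQL 5.6: record every deadlock in the error log
$ pt-deadlock-logger --dest D=percona,t=deadlocks --create-dest-table --interval 30 h=localhost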

Above all diagnosis, it is always an important practice to have the application catch the deadlock error (MySQL error no. 1213) and handle it by retrying the transaction.

How to diagnose a MySQL deadlock

A MySQL deadlock could involve more than two transactions, but the LATEST DETECTED DEADLOCK section only shows the last two. Also, it only shows the last statement executed in each of the two transactions, and the locks from the two transactions that created the cycle. What is missing are the earlier statements that may have actually acquired the locks. I will show some tips on how to collect the missing statements.

Let’s look at two examples to see what information is given. Example 1:

1 141013 6:06:22
2 *** (1) TRANSACTION:
3 TRANSACTION 876726B90, ACTIVE 7 sec setting auto-inc lock
4 mysql tables in use 1, locked 1
5 LOCK WAIT 9 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 4
6 MySQL thread id 155118366, OS thread handle 0x7f59e638a700, query id 87987781416 localhost msandbox update
7 INSERT INTO t1 (col1, col2, col3, col4) values (10, 20, 30, 'hello')
8 *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
9 TABLE LOCK table `mydb`.`t1` trx id 876726B90 lock mode AUTO-INC waiting
10 *** (2) TRANSACTION:
11 TRANSACTION 876725B2D, ACTIVE 9 sec inserting
12 mysql tables in use 1, locked 1
13 876 lock struct(s), heap size 80312, 1022 row lock(s), undo log entries 1002
14 MySQL thread id 155097580, OS thread handle 0x7f585be79700, query id 87987761732 localhost msandbox update
15 INSERT INTO t1 (col1, col2, col3, col4) values (7, 86, 62, "a lot of things"), (7, 76, 62, "many more")
16 *** (2) HOLDS THE LOCK(S):
17 TABLE LOCK table `mydb`.`t1` trx id 876725B2D lock mode AUTO-INC
18 *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
19 RECORD LOCKS space id 44917 page no 529635 n bits 112 index `PRIMARY` of table `mydb`.`t2` trx id 876725B2D lock mode S locks rec but not gap waiting
20 *** WE ROLL BACK TRANSACTION (1)

Line 1 gives the time when the deadlock happened. If your application code catches and logs deadlock errors, which it should, then you can match this timestamp with the timestamps of deadlock errors in the application log. You would then have the transaction that got rolled back; from there, retrieve all statements from that transaction.

Line 3 & 11, take note of the transaction number and ACTIVE time. If you log SHOW ENGINE INNODB STATUS output periodically (which is a good practice), then you can search previous outputs for the transaction number to hopefully see more statements from the same transaction. The ACTIVE seconds give a hint on whether the transaction is a single-statement or multi-statement one.

Line 4 & 12, the tables in use and locked are only with respect to the current statement. So having 1 table in use does not necessarily mean that the transaction involves 1 table only.

Line 5 & 13, these lines are worth attention as they tell how many changes the transaction had made (the “undo log entries”) and how many row locks it held (the “row lock(s)”). This info hints at the complexity of the transaction.

Line 6 & 14, take note of the thread id, connecting host and connecting user. If you use different MySQL users for different application functions, which is another good practice, then you can tell which application area the transaction comes from based on the connecting host and user.

Line 9, for the first transaction, it only shows the lock it was waiting for, in this case the AUTO-INC lock on table t1. Other possible values are S for shared lock and X for exclusive with or without gap locks.

Line 16 & 17, for the second transaction, it shows the lock(s) it held, in this case the AUTO-INC lock which was what TRANSACTION (1) was waiting for.

Line 18 & 19 show which lock TRANSACTION (2) was waiting for. In this case, it was a shared (non-gap) record lock on another table’s primary key. There are only a few sources of a shared record lock in InnoDB:
1) use of SELECT … LOCK IN SHARE MODE
2) on foreign key referenced record(s)
3) with INSERT INTO… SELECT, shared locks on source table
The current statement of trx(2) is a simple insert to table t1, so 1 and 3 are eliminated. By checking SHOW CREATE TABLE t1, you could confirm that the S lock was due to a foreign key constraint to the parent table t2.
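For instance, a hypothetical schema fragment consistent with this scenario could look like the following (the constraint name and the referenced column are assumptions for illustration):

mysql> SHOW CREATE TABLE t1\G
[...]
  CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`col2`) REFERENCES `t2` (`id`)
[...]

The foreign key explains the shared lock: before the INSERT into t1 can succeed, InnoDB takes an S lock on the referenced row in t2 to make sure the parent row cannot be deleted while the child row is being inserted.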

Example 2: With MySQL community version, each record lock has the record content printed:

1 2014-10-11 10:41:12 7f6f912d7700
2 *** (1) TRANSACTION:
3 TRANSACTION 2164000, ACTIVE 27 sec starting index read
4 mysql tables in use 1, locked 1
5 LOCK WAIT 3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
6 MySQL thread id 9, OS thread handle 0x7f6f91296700, query id 87 localhost root updating
7 update t1 set name = 'b' where id = 3
8 *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
9 RECORD LOCKS space id 1704 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 2164000 lock_mode X locks rec but not gap waiting
10 Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
11 0: len 4; hex 80000003; asc ;;
12 1: len 6; hex 000000210521; asc ! !;;
13 2: len 7; hex 180000122117cb; asc ! ;;
14 3: len 4; hex 80000008; asc ;;
15 4: len 1; hex 63; asc c;;
16
17 *** (2) TRANSACTION:
18 TRANSACTION 2164001, ACTIVE 18 sec starting index read
19 mysql tables in use 1, locked 1
20 3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
21 MySQL thread id 10, OS thread handle 0x7f6f912d7700, query id 88 localhost root updating
22 update t1 set name = 'c' where id = 2
23 *** (2) HOLDS THE LOCK(S):
24 RECORD LOCKS space id 1704 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 2164001 lock_mode X locks rec but not gap
25 Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
26 0: len 4; hex 80000003; asc ;;
27 1: len 6; hex 000000210521; asc ! !;;
28 2: len 7; hex 180000122117cb; asc ! ;;
29 3: len 4; hex 80000008; asc ;;
30 4: len 1; hex 63; asc c;;
31
32 *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
33 RECORD LOCKS space id 1704 page no 3 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 2164001 lock_mode X locks rec but not gap waiting
34 Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
35 0: len 4; hex 80000002; asc ;;
36 1: len 6; hex 000000210520; asc ! ;;
37 2: len 7; hex 17000001c510f5; asc ;;
38 3: len 4; hex 80000009; asc ;;
39 4: len 1; hex 62; asc b;;

Line 9 & 10: The ‘space id’ is the tablespace id, and ‘page no’ gives which page the record lock is on inside the tablespace. The ‘n bits’ is not the page offset, but the number of bits in the lock bitmap. The page offset is the ‘heap no’ on line 10.

Line 11~15: It shows the record data in hex numbers. Field 0 is the clustered index (primary key); ignoring the highest bit, the value is 3. Field 1 is the transaction id of the transaction which last modified this record; the decimal value is 2164001, which is TRANSACTION (2). Field 2 is the rollback pointer. Field 3 onward is the rest of the row data: field 3 is an integer column with value 8, and field 4 is a string column containing the character ‘c’. By reading the data, we know exactly which row is locked and what its current values are.

What else can we learn from the analysis? Since most MySQL deadlocks happen between two transactions, we can start the analysis based on that assumption. In Example 1, trx (2) was waiting on a shared lock, so trx (1) either held a shared or exclusive lock on that primary key record of table t2. Let’s say col2 is the foreign key column; by checking the current statement of trx (1), we know it did not require the same record lock, so it must be some previous statement in trx (1) that required S or X lock(s) on t2’s PK record(s). Trx (1) made only 4 row changes in 7 seconds. So we have learned a few characteristics of trx (1): it does a lot of processing but makes few changes; the changes involve tables t1 and t2, with a single record insertion into t2. This information combined with other data can help developers locate the transaction.

Where else can we find previous statements of the transactions? Besides the application log and previous SHOW ENGINE INNODB STATUS output, you may also leverage the binlog, slow log and/or general query log. With the binlog, if binlog_format=statement, each binlog event has the thread_id. Only committed transactions are logged into the binlog, so we can only look for trx (2) there. In the case of Example 1, we know when the deadlock happened, and we know trx (2) started 9 seconds earlier. We can run mysqlbinlog on the right binlog file and look for statements with thread_id = 155097580. It is always good to then cross-reference the statements with the application code to confirm.

$ mysqlbinlog -vvv --start-datetime="2014-10-13 6:06:12" --stop-datetime="2014-10-13 6:06:22" mysql-bin.000010 > binlog_1013_0606.out

With Percona Server 5.5 and above, you can set log_slow_verbosity to include the InnoDB transaction id in the slow log. Then, if you also have long_query_time = 0, you will be able to catch all statements, including those rolled back, in the slow log file. With the general query log, the thread id is included and can be used to look for related statements.
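A hedged my.cnf sketch of that Percona Server setup (the log file path is arbitrary, and long_query_time = 0 makes the slow log grow quickly, so use it only while chasing the problem):

[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 0
log_slow_verbosity  = full   # extended fields, including the InnoDB transaction id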

How to avoid a MySQL deadlock

There are things we could do to eliminate a deadlock after we understand it.

– Make changes to the application. In some cases, you can greatly reduce the frequency of deadlocks by splitting a long transaction into smaller ones, so locks are released sooner. In other cases, the deadlock arises because two transactions touch the same sets of data, either in one or more tables, in different orders. Change them to access the data in the same order, in other words, serialize the access (a tiny example follows this list). That way you get a lock wait instead of a deadlock when the transactions happen concurrently.

– Make changes to the table schema, such as removing foreign key constraint to detach two tables, or adding indexes to minimize the rows scanned and locked.

– In the case of gap locking, you may change the transaction isolation level to READ COMMITTED for the session or transaction to avoid it. But then the binlog format for the session or transaction would have to be ROW or MIXED.
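As a tiny illustration of the ordering point from the first item above (the table and values are hypothetical): if every transaction that touches the same pair of rows does so in the same order, the second transaction simply waits for the first instead of deadlocking.

-- run by both sessions: rows are always updated in ascending id order
BEGIN;
UPDATE t1 SET col3 = col3 + 1 WHERE id = 1;
UPDATE t1 SET col3 = col3 + 1 WHERE id = 2;
COMMIT;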

The post How to deal with MySQL deadlocks appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL & Friends Devroom FOSDEM 2015

MySQL Performance Blog - Mon, 2014-10-27 13:50

You can already feel the cold of February coming slowly… you can also smell waffles and fries, and see a large number of beards walking around with laptops… you are right, FOSDEM is coming! And as every year, the MySQL Community will also be present! For the 4th year in a row, I’ll be organizing the MySQL & Friends Devroom.

FOSDEM 2015 edition will be held January 31 and February 1 here in Brussels. The MySQL & Friends Devroom is back on Sunday from 9 a.m. What is FOSDEM? It stands for the “Free and Open Source Software Developers’ European Meeting.” It’s a free event that offers open-source communities a place to meet, share ideas and collaborate.

As every year, the “Call for Papers” has been announced on the MySQL mailing list, and you can still read it here. CfP is open until December 7th!

This year the committee responsible for talk selection is composed of:

* Dimitri Kravtchuk, representing Oracle
* Daniël van Eeden for the Community
* Roland Bouman for the Community
* Cédric Peintre for the Community
* Liz van Dijk, representing Percona
* Serge Frezefond, representing MariaDB
* René Cannaò, representing Blackbird IT

Thanks to all who have accepted this role; I wish them well as they work hard to put together the best schedule possible.

Don’t forget to submit your sessions (submit here, don’t forget to select MySQL track) in time and see you soon in Brussels to discover amazing stuff related to MySQL and have some beers with Friends!

The post MySQL & Friends Devroom FOSDEM 2015 appeared first on MySQL Performance Blog.

Categories: MySQL

MariaDB 10.1: Better query optimization for ORDER BY … LIMIT

Sergey Petrunia's blog - Fri, 2014-10-10 23:47

For some reason, we’ve been getting a lot of issues with ORDER BY optimization recently. The fixes have passed Elena Stepanova’s scrutiny and I’ve pushed them to MariaDB 10.1. Now, MariaDB’s ORDER BY ... LIMIT optimizer:

  • Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402)
  • Always uses “range” access (and not a full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657)
  • Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (MDEV-6384, MDEV-465, MySQL Bug#36817)
  • Takes full advantage of InnoDB’s Extended Keys feature when checking if filesort() can be skipped (MDEV-6796)

so, if you are struggling with poor optimization of ORDER BY ... LIMIT queries, help might be underway.

I have to admit that optimizations for ORDER BY... LIMIT are still far from perfect (both in MySQL and in MariaDB). The code is old and ugly and, having spent some time debugging it, I can easily believe that it still does a poor job in some cases. The good part is that now we have some fresh knowledge of the ORDER BY … LIMIT optimization code, as well as RQG scripts for testing ORDER BY .. LIMIT queries. If you have any issues with ORDER BY … LIMIT optimization, we would appreciate seeing bugs filed for them.

Categories: MySQL

A discovery - Index Condition Pushdown can cause a slowdown after all

Sergey Petrunia's blog - Fri, 2014-10-10 21:56

MariaDB 5.5 and then MySQL 5.6 got the Index Condition Pushdown (ICP) optimization (initially coded by yours truly). The idea of ICP is simple: after reading the index record, check the part of the WHERE condition that can be computed using index columns, and only then read the table record. That way, we avoid reading table rows that don’t satisfy the index condition.

It seems apparent that ICP can never make things slower. The WHERE clause has to be checked anyway, and not reading certain records can only make things faster.

That was what I thought, too, until recently Joffrey Michaie observed the contrary “in the wild”: we’ve got a real-world case where using Index Condition Pushdown was slower than not using it: MDEV-6713. The slowdown was about 20%, both on MariaDB and MySQL.

From what I could investigate so far, the slowdown is caused by these three factors working together:

  • a VARCHAR(255) column in the index. MySQL’s in-memory data representation for VARCHARs is not space efficient. If a column is defined as VARCHAR(255), any value will occupy the entire 255 bytes.
  • InnoDB’s row prefetch cache. When InnoDB detects that one is reading a lot of rows from an index, it pre-fetches index records and stores them in an internal cache. The cache uses the inefficient in-memory data representation.
  • Design of MySQL’s Item classes. One can’t evaluate an expression over a table record that is in InnoDB prefetch cache. Expression evaluation functions expect to find column values in the table’s “primary location”, internally known as table->record[0]. In order for ICP to check the condition, index columns have to be copied to table->record[0], first.

I hope we will be able to investigate this problem and post more about it soon. For now, the news is that ICP can cause a slowdown, when the index has big VARCHAR columns.
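If you suspect this effect in your own workload, a simple way to check (a suggestion, not something from the original investigation) is to toggle the optimizer switch for your session and compare timings:

SET optimizer_switch='index_condition_pushdown=off';
-- re-run the query and note the time, then switch it back on
SET optimizer_switch='index_condition_pushdown=on';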

Categories: MySQL

MySQL compression: Compressed and Uncompressed data size

MySQL Performance Blog - Fri, 2014-10-10 14:34

MySQL has information_schema.tables, which contains information such as “data_length” or “avg_row_length.” Documentation on this table, however, is quite poor and assumes those fields are self-explanatory; they are not when it comes to tables that employ compression. And this is where inconsistency is born. Let’s take a look at the same table, containing some highly compressible data, stored in different storage engines that support MySQL compression:

TokuDB:

mysql> select * from information_schema.tables where table_schema='test'\G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: TokuDB
        VERSION: 10
     ROW_FORMAT: tokudb_zlib
     TABLE_ROWS: 40960
 AVG_ROW_LENGTH: 10003
    DATA_LENGTH: 409722880
MAX_DATA_LENGTH: 9223372036854775807
   INDEX_LENGTH: 0
      DATA_FREE: 421888
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2014-10-10 07:59:05
    UPDATE_TIME: 2014-10-10 08:01:20
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.00 sec)

Archive:

mysql> select * from information_schema.tables where table_schema='test'\G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: ARCHIVE
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 40960
 AVG_ROW_LENGTH: 12
    DATA_LENGTH: 501651
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 0
      DATA_FREE: 0
 AUTO_INCREMENT: NULL
    CREATE_TIME: NULL
    UPDATE_TIME: 2014-10-10 08:08:24
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS:
  TABLE_COMMENT:
1 row in set (0.01 sec)

InnoDB:

mysql> select * from information_schema.tables where table_schema='test'\G
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: test
     TABLE_NAME: comp
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 40660
 AVG_ROW_LENGTH: 4168
    DATA_LENGTH: 169480192
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 0
      DATA_FREE: 1572864
 AUTO_INCREMENT: NULL
    CREATE_TIME: 2014-10-10 08:33:22
    UPDATE_TIME: NULL
     CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
       CHECKSUM: NULL
 CREATE_OPTIONS: row_format=COMPRESSED KEY_BLOCK_SIZE=4
  TABLE_COMMENT:
1 row in set (0.00 sec)

From this we can see that Archive and InnoDB show the COMPRESSED values for DATA_LENGTH and AVG_ROW_LENGTH, while TokuDB (as of TokuDB 7.5.1, shipped with Percona Server 5.6.21-69) shows the uncompressed ones.

The problem here is not only the lack of consistency but also that we need to know BOTH numbers. We often need to know the uncompressed value to understand how much data there really is in the table, but also how much space it takes on disk. Comparing the two also helps us understand the compression ratio (or, on the contrary, the expansion due to storage overhead and indexes).

Looking at the Information_Schema tables available, I can’t find any way to determine how much uncompressed data is stored in an InnoDB (or Archive) table. The simple trick I can use is running a query along these lines: SELECT SUM(LENGTH(col1)+LENGTH(col2)…) FROM T. This would slightly overestimate the length by converting numbers and dates to strings, but it is good enough for most purposes.
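For the test table above, such a query might look like this (a sketch only; the column list of test.comp is an assumption, and columns that may contain NULLs would need IFNULL() wrapping):

mysql> SELECT SUM(LENGTH(id) + LENGTH(col1) + LENGTH(col2)) AS uncompressed_bytes FROM test.comp;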

TokuDB, though, while providing uncompressed information in the Information_Schema TABLES table, also allows you to get information about the real data storage on disk from its own information schema tables:

mysql> select * from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp' limit 10\G
*************************** 1. row ***************************
        dictionary_name: ./test/comp-main
     internal_file_name: ./_test_sql_147e_292e_main_2c20c08_1_1b_B_0.tokudb
bt_num_blocks_allocated: 125
   bt_num_blocks_in_use: 125
      bt_size_allocated: 1880088
         bt_size_in_use: 1502232
           table_schema: test
             table_name: comp
  table_dictionary_name: main
*************************** 2. row ***************************
        dictionary_name: ./test/comp-status
     internal_file_name: ./_test_sql_147e_292e_status_2c20bdd_1_1b.tokudb
bt_num_blocks_allocated: 4
   bt_num_blocks_in_use: 4
      bt_size_allocated: 8704
         bt_size_in_use: 512
           table_schema: test
             table_name: comp
  table_dictionary_name: status
2 rows in set (0.01 sec)

This shows us that this table really consists of 2 files, each “bt_size_allocated” bytes in length. These numbers are close to what you will see on disk, but not exactly: I see these files taking 1886208 and 16384 bytes, respectively. I wish there were an exact length available to query so we would not have to wonder how large the difference is and whether it can get large enough in some cases to matter.

If you’re just looking for information about how much space has been allocated and how much is currently used for a given TokuDB table, you can use a query like this:

mysql> select sum(bt_size_allocated) total, sum(bt_size_in_use) used, sum(bt_size_allocated)-sum(bt_size_in_use) free from information_schema.TokuDB_fractal_tree_info where table_schema='test' and table_name='comp';
+---------+---------+--------+
| total   | used    | free   |
+---------+---------+--------+
| 1888792 | 1502744 | 386048 |
+---------+---------+--------+
1 row in set (0.01 sec)

To sum it up – there is room for improvement in how MySQL reports compressed and uncompressed data length – both in terms of consistency between storage engines and in the information available. It is great to see that TokuDB found a way to report both compressed and uncompressed data sizes, yet I would really love to see the actual size on disk that a given table is taking. And it would be great if there were a consistent way to query it from inside MySQL without having to go to the file-system level and deal with the different ways storage engines place data on the file system. This becomes especially important with the coming of full tablespace support in MySQL 5.7, which will make it harder to find all the files belonging to a table on the filesystem.

The post MySQL compression: Compressed and Uncompressed data size appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL Replication: ‘Got fatal error 1236′ causes and cures

MySQL Performance Blog - Wed, 2014-10-08 07:00

MySQL replication is a core process for maintaining multiple copies of data – and replication is a very important aspect of database administration. In order to synchronize data between master and slaves you need to make sure that data transfers smoothly, and to do so you need to act promptly on replication errors so that data synchronization continues. Here on the Percona Support team, we often help customers with broken-replication issues. In this post I’ll highlight the most critical replication error, code 1236, along with its causes and cures. MySQL replication error “Got fatal error 1236” can be triggered by multiple causes and I will try to cover all of them.

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: ‘log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event ‘binlog.000201′ at 5480571

This is a typical error on the slave(s). It reflects a problem with the max_allowed_packet size. max_allowed_packet limits the size of a single SQL statement sent to the MySQL server as a binary log event from master to slave. This error usually occurs when the master and slave have different max_allowed_packet sizes (i.e. the master’s max_allowed_packet is greater than the slave’s). When the MySQL master server tries to send a bigger packet than the slave server allows, the slave fails to accept it and hence the error. In order to alleviate this issue, please make sure to have the same value for max_allowed_packet on both slave and master. You can read more about max_allowed_packet here.

This error usually occurs when updating a huge number of rows on the master and the resulting event doesn’t fit into the slave’s max_allowed_packet because it is set lower than the master’s. This usually happens with “LOAD DATA INFILE” or “INSERT .. SELECT” queries. In my experience, it can also be caused by application logic that generates a huge INSERT with junk data. Also take into account that MySQL 5.6.6 and later introduced a new variable, slave_max_allowed_packet, which controls the maximum packet size for the replication threads. It overrides the max_allowed_packet variable on the slave and its default value is 1GB. In this post, “max_allowed_packet and binary log corruption in MySQL,” my colleague Miguel Angel Nieto explains this error in detail.
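As a quick sanity check you can compare the setting on both servers and, if needed, align them – a minimal sketch (the 64MB value is only an example, pick a size that fits your workload):

mysql> SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';
mysql> SET GLOBAL max_allowed_packet = 67108864; -- 64MB; also set it in my.cnf so it survives a restart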

Got fatal error 1236 from master when reading data from binary log: ‘Could not find first log file name in binary log index file’

This error occurs when the binary log the slave requires for replication no longer exists on the master database server. In one common scenario, the slave server is stopped for some reason for a few hours or days, and when you resume replication it fails with the above error.

When you investigate you will find that the master server no longer has the binary logs which the slave server needs to pull in order to synchronize data. Possible reasons for this include: the master server expired binary logs via the system variable expire_logs_days, someone manually deleted binary logs from the master via the PURGE BINARY LOGS command or via ‘rm -f’, or perhaps you have a cronjob which archives older binary logs to reclaim disk space, etc. So, make sure the required binary logs always exist on the master server; you can adjust your procedures to keep the binary logs the slave server requires by monitoring the “Relay_Master_Log_File” field from SHOW SLAVE STATUS output. Moreover, if you have set expire_logs_days in my.cnf, old binlogs expire automatically and are removed. This means that when MySQL opens a new binlog file, it checks the older binlogs and purges any that are older than the value of expire_logs_days (in days). Percona Server added a feature to expire logs based on the total number of files used instead of the age of the binlog files, so in that configuration a spike of traffic could cause binlogs to disappear sooner than you expect. For more information check Restricting the number of binlog files.
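A minimal sketch of that check – compare what the slave still needs with what the master still has:

mysql-slave>  SHOW SLAVE STATUS\G   -- check Master_Log_File / Relay_Master_Log_File
mysql-master> SHOW BINARY LOGS;     -- the files the slave needs must still be listed here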

In order to resolve this problem, the only clean solution I can think of is to re-create the slave server from a master server backup or from another slave in the replication topology.

– Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000525′ at 175770780, the last event read from ‘/data/mysql/repl/mysql-bin.000525′ at 175770780, the last byte read from ‘/data/mysql/repl/mysql-bin.000525′ at 175771648.’

Usually, this is caused by sync_binlog <> 1 on the master server, which means binary log events may not be synchronized to disk. There might be a committed SQL statement or row change (depending on your replication format) on the master that did not make it to the slave because the event is truncated. The solution would be to move the slave thread to the next available binary log and initialize the slave thread with the first available position in the binary log, as below:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000526', MASTER_LOG_POS=4;

– [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: ‘Client requested master to start replication from impossible position; the first event ‘mysql-bin.010711′ at 55212580, the last event read from ‘/var/lib/mysql/log/mysql-bin.000711′ at 4, the last byte read from ‘/var/lib/mysql/log/mysql-bin.010711′ at 4.’, Error_code: 1236

Here the master server most likely crashed or rebooted, and hence binary log events were not synchronized to disk. This usually happens when sync_binlog != 1 on the master. You can investigate by inspecting the binary log contents as below:

$ mysqlbinlog --base64-output=decode-rows --verbose --verbose --start-position=55212580 mysql-bin.010711

You will find that this is the last position in the binary log, i.e. the end of the binary log file. This issue can usually be fixed by moving the slave to the next binary log. In this case it would be:

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000712', MASTER_LOG_POS=4;

This will resume replication.

To avoid corrupted binlogs on the master, enabling sync_binlog=1 on the master helps in most cases. sync_binlog=1 will synchronize the binary log to disk after every commit; it makes MySQL perform an fsync on the binary log in addition to the fsync done by InnoDB. As a reminder, it has some cost impact as it synchronizes the write to the binary log on disk after every commit. On the other hand, the sync_binlog=1 overhead can be very minimal or negligible if the disk subsystem is SSD along with a battery-backed cache (BBU). You can read more about this here in the manual.

sync_binlog is a dynamic option that you can enable on the fly. Here’s how:

mysql-master> SET GLOBAL sync_binlog=1;

To make the change persistent across reboot, you can add this parameter in my.cnf.
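For example, under the [mysqld] section of my.cnf:

[mysqld]
sync_binlog = 1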

As a side note, along with replication fixes, it is always a good idea to make sure your replica is consistent with the master and to validate data between master and slaves. Fortunately, Percona Toolkit has tools for this purpose: pt-table-checksum & pt-table-sync. Before checking for replication consistency, be sure to check the replication environment and then, later, to sync any differences.
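A minimal sketch of that workflow – the host name and credentials below are placeholders:

# run on the master; checksums are written to the percona.checksums table and replicate to the slaves
$ pt-table-checksum --replicate=percona.checksums h=master_host,u=checksum_user,p=secret
# print the statements that would bring the slaves back in sync with the master
$ pt-table-sync --print --replicate=percona.checksums h=master_host,u=checksum_user,p=secret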

The post MySQL Replication: ‘Got fatal error 1236′ causes and cures appeared first on MySQL Performance Blog.

Categories: MySQL

MySQL ring replication: Why it is a bad option

MySQL Performance Blog - Tue, 2014-10-07 16:15

I’ve recently worked with customers using replication rings with 4+ servers; several servers accepting writes. The idea behind this design is always the same: by having multiple servers, you get high availability and by having multiple writer nodes, you get write scalability. Alas, this is simply not true. Here is why.

High Availability

Having several servers is a necessary condition for high availability, but it’s far from sufficient. Suppose we have a ring of four servers replicating A → B → C → D → A. What happens if, for instance, C suddenly disappears?

  • The replication ring is broken, so updates from A and B will never reach D. D will then quickly become so out-of-date that it’s not going to be usable. But wait! A will no longer receive the updates from B, so A will quickly become unusable as well. Same thing for B. So unless you are very quick to configure a smaller ring with the remaining nodes, the whole chain will soon stop working.
  • If an event from C is still being executed on one of the other servers, it will go into an infinite loop, simply because C is the only server able to prevent an event originally coming from C from cycling through the ring.

Conclusion: each time a server goes down, the whole system goes down. In other words, availability is poorer than with a single server.

Write Scalability

You might think that if you are able to run 1000 writes/s on a single server, writing on 4 servers in parallel will allow you to run 4000 writes/s over the whole cluster. However, reality is quite different.

Don’t forget that ALL writes will be executed on ALL servers. So we have 2 separate scenarios:

  • Scenario #1: 1000 writes/s is the point where you’re hitting a bottleneck (for instance disk saturation). Then you’ll never be able to handle the extra load coming from replication. What is going to happen is simply that the servers will become slow because of overload and they’ll never be able to go beyond the 1000 writes/s mark.
  • Scenario #2: a single server could handle 5000 writes/s. Then writing on all servers will indeed allow you to claim that your cluster can absorb 4000 writes/s. But you would achieve the same result by running 4000 writes/s on a single server. This has nothing to do with write scalability.

Conclusion: As all writes are run on all servers, writing on multiple nodes doesn’t magically create extra write capacity. You’re still bound by the capacity of a single server.

Other concerns

Another concern when allowing multiple writers is write conflicts. MySQL doesn’t have any mechanism to detect or solve write conflicts.

So lots of “funny” things can happen when writes are conflicting:

  • Duplicate key errors that will cause replication to halt. And no, setting auto_increment_increment and auto_increment_offset cannot resolve all possible situations when duplicate key errors can happen.
  • An even funnier situation is when conflicting writes do not generate a replication error, but instead create hidden data inconsistencies. Say you have value=100 in a field, A does value=value+2 and B does value=value*2. You can end up with one server having value=202 and another server having value=204 (see the sketch below). Which one is the right value? Impossible to know…
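A minimal sketch of how this plays out, using a hypothetical single-row table t (table and column names are illustrative; statement-based replication assumed):

-- value is 100 on both servers to begin with
-- On server A:
UPDATE t SET value = value + 2 WHERE id = 1;   -- A now has 102
-- Concurrently on server B:
UPDATE t SET value = value * 2 WHERE id = 1;   -- B now has 200
-- Once each statement has replicated around the ring:
--   A applies B's statement: (100 + 2) * 2 = 204
--   B applies A's statement: (100 * 2) + 2 = 202
-- No error is raised, yet the two servers now silently hold different data.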

If you’re interested in learning more on the risk of writing on multiple nodes while using regular MySQL replication, you can check out this webinar.

Conclusion

A ring is one of the worst MySQL replication topologies as it dramatically increases the complexity of all operations on the ring while providing no benefit.

If you need an HA solution, it is not an easy choice as there are many of them and all have tradeoffs, but a ring is definitely not the right option. This post can help you find the right candidate(s).

If you need write scalability, the options are limited, but again, MySQL ring replication is not a good fit. The main question to answer is how many writes do you want to be able to run? For instance, if you want 10x write scalability but your current workload is 100 writes/s, that’s easy: just make sure you have a decent schema, decent indexes and decent hardware. If you want 10x write scalability but you’re already running 5000 writes/s, it’s probably time to explore sharding.

The post MySQL ring replication: Why it is a bad option appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.6.21-69.0 is now available

MySQL Performance Blog - Tue, 2014-10-07 16:04

Percona is glad to announce the release of Percona Server 5.6.21-69.0 on October 7, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.21, including all the bug fixes in it, Percona Server 5.6.21-69.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free. Complete details of this release can be found in the 5.6.21-69.0 milestone on Launchpad.

New Features:

Bugs Fixed:

  • Backup Locks did not guarantee consistent SHOW SLAVE STATUS information with binary log disabled. Bug fixed #1358836.
  • Audit Log Plugin would rotate the audit log in middle of an audit message. Bug fixed #1363370.
  • When the binary log is enabled on a replication slave, SHOW SLAVE STATUS performed under an active BINLOG lock could lead to a deadlock. Bug fixed #1372806.
  • Fixed a memory leak in Metrics for scalability measurement. Bug fixed #1334570.
  • Fixed a memory leak if secure-file-priv option was used with no argument. Bug fixed #1334719.
  • LOCK TABLES FOR BACKUP is now incompatible with LOCK TABLES, FLUSH TABLES WITH READ LOCK, and FLUSH TABLES FOR EXPORT in the same connection. Bug fixed #1360064.

Other bugs fixed: #1361568.

NOTE: Automatic upgrade of Percona Server with TokuDB on Debian/Ubuntu distributions will cause an error. In order to upgrade you’ll need to force the upgrade with “apt-get install -f”, or remove the percona-server-tokudb-5.6 package before upgrading and install it again after the server package upgrade is done.
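A sketch of the remove-and-reinstall approach (package name as given above; adjust for your environment):

$ sudo apt-get remove percona-server-tokudb-5.6
$ sudo apt-get upgrade                            # upgrade the server packages
$ sudo apt-get install percona-server-tokudb-5.6
# alternatively, if the upgrade already failed:
$ sudo apt-get install -f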

Release notes for Percona Server 5.6.21-69.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.21-69.0 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.5.40-36.1 is now available

MySQL Performance Blog - Tue, 2014-10-07 15:20


Percona is glad to announce the release of Percona Server 5.5.40-36.1 on October 7, 2014. Based on MySQL 5.5.40, including all the bug fixes in it, Percona Server 5.5.40-36.1 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.40-36.1 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

Release notes for Percona Server 5.5.40-36.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

(Please also note that Percona Server 5.6 series is the latest General Availability series and current GA release is 5.6.21-69.0.)

The post Percona Server 5.5.40-36.1 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona XtraBackup 2.2.5 now available (free MySQL hot backup software)

MySQL Performance Blog - Mon, 2014-10-06 12:50

Percona is glad to announce the release of Percona XtraBackup 2.2.5 on October 2, 2014. Downloads are available from our download site here and Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

  • Percona XtraBackup has been rebased on MySQL 5.6.21.

Bugs Fixed:

  • The fix for bug #1079700 introduced a problem for users with huge numbers of InnoDB tablespaces, and the workaround of raising the open files limits didn’t work in all cases due to a limitation in the Linux kernel. A new innobackupex --close-files option has been implemented to close the file handles once they are no longer accessed. NOTE: Using this option may result in a broken backup if DDL is performed on InnoDB tables during the backup procedure. Bug fixed #1222062.
  • Fix for bug #1206309 introduced a regression in Percona XtraBackup 2.2.0 which caused Percona XtraBackup to fail to copy redo logs in random cases. Bug fixed #1365835.
  • innobackupex --galera-info didn’t copy the last binlog file when it was taking a backup from server where backup locks are supported. Bug fixed #1368577.
  • xtrabackup binary would accept arguments that were not options, which could lead to unexpected results. Bug fixed #1367377.
  • If innobackupex is run against MySQL 5.1 with built-in InnoDB, it will now suggest using Percona XtraBackup 2.0 or upgrading to InnoDB plugin, rather than just failing with the generic unsupported server version message. Bug fixed #1335101.
  • Using the (deprecated) log parameter in mysqld section would cause backups to fail. Bug fixed #1347698.
  • Percona XtraBackup now uses MySQL code to get the stack trace in case Percona XtraBackup crashes with a segmentation fault or an assertion failure. Bug fixed #766305.
  • Attempt to use any of the following options without the --incremental option now fails with an error message rather than creates a full backup: --incremental-lsn, --incremental-basedir, --incremental-history-name, --incremental-history-uuid. Bug fixed #1213778.

Other bugs fixed: #1367613, #1368574, #1370462, #1371441, #1373429, #1373984, and #1265070.

Release notes with all the bugfixes for Percona XtraBackup 2.2.5 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.

The post Percona XtraBackup 2.2.5 now available (free MySQL hot backup software) appeared first on MySQL Performance Blog.

Categories: MySQL

HAProxy: Give me some logs on CentOS 6.5!

MySQL Performance Blog - Fri, 2014-10-03 13:59

HAProxy is frequently used as a load-balancer in front of a Galera cluster. While diagnosing an issue with HAProxy configuration, I realized that logging doesn’t work out of the box on CentOS 6.5. Here is a simple recipe to fix the issue.

If you look at the top of /etc/haproxy/haproxy.cfg, you will see something like:

global
    log 127.0.0.1 local2
    [...]

This means that HAProxy will send its messages to rsyslog on 127.0.0.1. But by default, rsyslog doesn’t listen on any address, hence the issue.

Let’s edit /etc/rsyslog.conf and uncomment these lines:

$ModLoad imudp
$UDPServerRun 514

This will make rsyslog listen on UDP port 514 for all IP addresses. Optionally you can limit to 127.0.0.1 by adding:

$UDPServerAddress 127.0.0.1

Now create a /etc/rsyslog.d/haproxy.conf file containing:

local2.* /var/log/haproxy.log

You can of course be more specific and create separate log files according to the level of messages:

local2.=info     /var/log/haproxy-info.log
local2.notice    /var/log/haproxy-allbutinfo.log

Then restart rsyslog and see that log files are created:

# service rsyslog restart
Shutting down system logger:                               [  OK  ]
Starting system logger:                                    [  OK  ]
# ls -l /var/log | grep haproxy
-rw-------. 1 root root 131  3 oct. 10:43 haproxy-allbutinfo.log
-rw-------. 1 root root 106  3 oct. 10:42 haproxy-info.log

Now you can start your debugging session!

The post HAProxy: Give me some logs on CentOS 6.5! appeared first on MySQL Performance Blog.

Categories: MySQL