MySQL

How to use MySQL Global Transaction IDs (GTIDs) in production

MySQL Performance Blog - Wed, 2014-08-20 08:00

Reconfiguring replication has always been a challenge with MySQL. Each time the replication topology has to be changed, the process is tedious and error-prone because finding the correct binlog position is not straightforward at all. Global Transaction IDs (GTIDs), introduced in MySQL 5.6, aim to solve this annoying issue.

The idea is quite simple: each transaction is associated with a unique identifier shared by all servers in a given replication topology. Reconfiguring replication becomes easy, as the correct binlog position can be calculated automatically by the server.
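For illustration, repointing a slave at a new master with GTIDs looks roughly like this (hostname and credentials are placeholders); MASTER_AUTO_POSITION=1 is what tells the server to work out the binlog coordinates on its own:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST='new-master.example.com', MASTER_USER='repl', MASTER_PASSWORD='repl_password', MASTER_AUTO_POSITION=1;
mysql> START SLAVE;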

Awesome? Yes it is! However, GTIDs also change a lot about how we perform operations on replication. For instance, skipping transactions is a bit more difficult, and you can get bitten by errant transactions, a concept that did not exist before.

This is why I will be presenting a webinar on Aug. 27 at 10 a.m. PDT: Using MySQL Global Transaction IDs in Production.

You will learn what you need to operate a replication cluster using GTIDs: how to monitor replication status and recover from replication errors, which tools can help you and which you should avoid, and the main issues that can occur with GTIDs.

This webinar is free, but register today to reserve your seat. A recording will be available afterwards. See you next week!

The post How to use MySQL Global Transaction IDs (GTIDs) in production appeared first on MySQL Performance Blog.

Categories: MySQL

5 great new features from Percona Cloud Tools for MySQL

MySQL Performance Blog - Tue, 2014-08-19 13:00

It’s been three months since we announced anything for Percona Cloud Tools, not because we’ve been idle but because we’ve been so busy the time flew by!  Here’s the TL;DR to pique your interest:

  • EXPLAIN queries in real-time through the web app
  • Query Analytics for Performance Schema
  • Dashboards: customizable, shared groups of charts
  • Install and upgrade the agent with 1 command line
  • Unified UI: same time range, same host wherever you go

Percona Cloud Tools for MySQL is a hosted service providing access to query performance insights for all MySQL users. After a brief setup, unlock new information about your database and how to improve your applications. There’s a lot more, but let’s just look at these five new features…


EXPLAIN queries in real-time through the web app

Like many people, to get a query’s EXPLAIN plan you probably copy the query, ssh to the server, log in to MySQL, then paste the query after typing “EXPLAIN”.  With Percona Cloud Tools’ new real-time EXPLAIN feature you can simply click a button.  It’s a real timesaver.
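For contrast, the manual routine that real-time EXPLAIN replaces looks something like this (host, credentials, and query are hypothetical):

$ ssh db1.example.com
$ mysql -u dba -p
mysql> EXPLAIN SELECT id, email FROM customers WHERE last_login > '2014-08-01'\G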

The EXPLAIN plan is a vital part of diagnosing the query.  Now with Percona Cloud Tools you have a lot of powerful information in one place: the query, its metrics, its EXPLAIN plan, and more–and more to come, too!


Query Analytics for Performance Schema

The MySQL slow log is a wealth of indispensable data about queries that you cannot get anywhere else.  That’s why it’s the default for Percona Cloud Tools Query Analytics.  Like most things, however, it has tradeoffs: for one, it can be time-consuming to parse, especially on very busy servers.  Or, in the case of Amazon RDS, the slow log may simply not be available.  That’s ok now because with MySQL 5.6 or newer (including Percona Server 5.6 or newer) you can parse queries from the Performance Schema.  It’s not as data-rich as the slow log, but it has the basics and it’s a great alternative (and sometimes the only alternative) to the slow log.
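For a taste of the data available there, here is a sketch of a digest query you can run yourself on MySQL 5.6, similar in spirit to what Query Analytics aggregates (timer columns are in picoseconds, hence the division):

mysql> SELECT DIGEST_TEXT, COUNT_STAR, ROUND(SUM_TIMER_WAIT/1000000000000, 2) AS total_time_sec
       FROM performance_schema.events_statements_summary_by_digest
       ORDER BY SUM_TIMER_WAIT DESC LIMIT 5;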


Dashboards: customizable, shared groups of charts

Metrics Monitor has a default dashboard (a collection of charts) for MySQL.  The default dashboard is a great start because it’s created by us (Vadim, actually) so you know it’s relevant and meaningful for MySQL.  However, it presents only a fraction of the data that percona-agent collects, so we need more dashboards to organize and present other data.  Now you can create new dashboards which are accessible to everyone in the organization.  For example, Peter was recently benchmarking TokuDB, so he created a TokuDB-specific dashboard.


Install and upgrade the agent with 1 command line

As of percona-agent 1.0.6, you can install, upgrade, and uninstall the agent with a single command line, run as root, like:

# curl -s https://cloud.percona.com/install | bash /dev/stdin -api-key <API KEY>

For many environments this is all you need for a first-time install of a new agent.  The install will auto-detect MySQL, configure, and run all services by default.  You can tweak things later in the web app.  This also means you can install percona-agent in an automated environment.


Unified UI: same time range, same host wherever you go

Like most projects, Percona Cloud Tools has evolved over time.  Consequently, certain parts of the web app were different from other parts.  These differences had workarounds, but now the system is unified. Pick a MySQL instance, pick a time range, then view whatever part of the app you want and these selections will stay the same.  This is, of course, a natural expectation because it allows you to easily examine a specific system at a specific time range from different perspectives.

There’s a lot more, but we don’t want to take up too much of your time!

Percona Cloud Tools is still in free beta, but not for much longer, so be sure to sign up today!

The post 5 great new features from Percona Cloud Tools for MySQL appeared first on MySQL Performance Blog.

Categories: MySQL

Measuring failover time for ScaleArc load balancer

MySQL Performance Blog - Tue, 2014-08-19 12:00

ScaleArc hired Percona to benchmark failover times for the ScaleArc database traffic management software in different scenarios. We tested failover times for various clustered setups, where ScaleArc itself was the load balancer for the cluster. These tests complement other performance tests on the ScaleArc software – sysbench testing for latency and testing for WordPress acceleration.

We tested failover times for Percona XtraDB Cluster (PXC) and MHA (any traditional MySQL replication-based solution works pretty much the same way).

In each case, we tested failover with a rate limited sysbench benchmark. In this mode, sysbench generates roughly 10 transactions each second, even if not all 10 were completed in the previous second. In that case, the newly generated transactions are queued.

The sysbench command we used for testing is the following.

# while true ; do sysbench --test=sysbench/tests/db/oltp.lua --mysql-host=172.31.10.231 --mysql-user=root --mysql-password=admin --mysql-db=sbtest --oltp-table-size=10000 --oltp-tables-count=4 --num-threads=4 --max-time=0 --max-requests=0 --report-interval=1 --tx-rate=10 run ; sleep 1; done

The command is run in a loop, because typically at the time of failover, the application receives some kind of error while the virtual IP is moved, or while the current node is declared dead. Well-behaving applications are reconnecting and retrying in this case. Sysbench is not a well-behaving application from this perspective – after failover it has to be restarted.

This is good for testing the duration of errors during a failover procedure – but not the number of errors. In a simple failover scenario (ScaleArc is just used as a regular load balancer), the number of errors won’t be any higher than usual with ScaleArc’s queueing mechanism.

ScaleArc+MHA

In this test, we used MHA as a replication manager. The test result would be similar regardless of how its asynchronous replication is managed – only the ScaleArc-level checks would be different. In the case of MHA, we tested graceful and non-graceful failover. In the graceful case, we stopped the manager and performed a manual master switchover, after which we informed the ScaleArc software of the failover via an API call.

We ran two tests:

  • A manual switchover with the manager stopped, switching over manually, and informing the load balancer about the change.
  • An automatic failover where the master was killed, and we let MHA and the ScaleArc software discover it.
ScaleArc+MHA manual switchover

The manual switchover was performed with the following command.

# time ( masterha_master_switch --conf=/etc/cluster_5.cnf --master_state=alive --new_master_host=10.11.0.107 --orig_master_is_new_slave --interactive=0 && curl -k -X PUT https://172.31.10.30/api/cluster/5/manual_failover -d '{"apikey": "0c8aa9384ddf2a5756299a9e7650742a87bbb550"}' )
{"success":true,"message":"4214 Failover status updated successfully.","timestamp":1404465709,"data":{"apikey":"0c8aa9384ddf2a5756299a9e7650742a87bbb550"}}
real 0m8.698s
user 0m0.302s
sys 0m0.220s

The curl command calls ScaleArc’s API to inform the ScaleArc software about the master switchover.

During this time, sysbench output was the following.

[ 21s] threads: 4, tps: 13.00, reads/s: 139.00, writes/s: 36.00, response time: 304.07ms (95%)
[ 21s] queue length: 0, concurrency: 1
[ 22s] threads: 4, tps: 1.00, reads/s: 57.00, writes/s: 20.00, response time: 570.13ms (95%)
[ 22s] queue length: 8, concurrency: 4
[ 23s] threads: 4, tps: 19.00, reads/s: 237.99, writes/s: 68.00, response time: 976.61ms (95%)
[ 23s] queue length: 0, concurrency: 2
[ 24s] threads: 4, tps: 9.00, reads/s: 140.00, writes/s: 40.00, response time: 477.55ms (95%)
[ 24s] queue length: 0, concurrency: 3
[ 25s] threads: 4, tps: 10.00, reads/s: 105.01, writes/s: 28.00, response time: 586.23ms (95%)
[ 25s] queue length: 0, concurrency: 1

Only a slight hiccup is visible at 22 seconds. In this second, only 1 transaction was done, and 8 others were queued. These results show a sub-second failover time. The reason no errors were received is that ScaleArc itself queued the transactions during the failover process. If the transaction in question were done from an interactive client, the queuing itself would be visible as increased response time – for example a START TRANSACTION or an INSERT command is taking longer than usual, but no errors result. This is as good as it gets for graceful failover. ScaleArc knows about the failover (and in the case of a switchover initiated by a DBA, notifying the ScaleArc software can be part of the failover process). The queueing mechanism is quite configurable. Administrators can set up the timeout for the queue – we set it to 60 seconds, so if the failover doesn’t complete in that timeframe transactions start to fail.

ScaleArc+MHA non-graceful failover

In the case of the non-graceful failover, MHA and the ScaleArc software have to figure out that the node died.

[ 14s] threads: 4, tps: 11.00, reads/s: 154.00, writes/s: 44.00, response time: 1210.04ms (95%)
[ 14s] queue length: 4, concurrency: 4
( sysbench restarted )
[ 1s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 1s] queue length: 13, concurrency: 4
[ 2s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 2s] queue length: 23, concurrency: 4
[ 3s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 3s] queue length: 38, concurrency: 4
[ 4s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 4s] queue length: 46, concurrency: 4
[ 5s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 5s] queue length: 59, concurrency: 4
[ 6s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 6s] queue length: 69, concurrency: 4
[ 7s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 7s] queue length: 82, concurrency: 4
[ 8s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 8s] queue length: 92, concurrency: 4
[ 9s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 9s] queue length: 99, concurrency: 4
[ 10s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 10s] queue length: 108, concurrency: 4
[ 11s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 11s] queue length: 116, concurrency: 4
[ 12s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 12s] queue length: 126, concurrency: 4
[ 13s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 13s] queue length: 134, concurrency: 4
[ 14s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 14s] queue length: 144, concurrency: 4
[ 15s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 15s] queue length: 153, concurrency: 4
[ 16s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%)
[ 16s] queue length: 167, concurrency: 4
[ 17s] threads: 4, tps: 5.00, reads/s: 123.00, writes/s: 32.00, response time: 16807.85ms (95%)
[ 17s] queue length: 170, concurrency: 4
[ 18s] threads: 4, tps: 18.00, reads/s: 255.00, writes/s: 76.00, response time: 16888.55ms (95%)
[ 18s] queue length: 161, concurrency: 4

The failover time in this case was 18 seconds. We looked at the results and found that a check the ScaleArc software does (which involves opening an SSH connection to all nodes in MHA) takes 5 seconds, and ScaleArc declares the node dead after 3 consecutive failed checks (this parameter is configurable). Hence the high failover time. A much lower one can be achieved with more frequent checks and a different check method – for example checking the read-only flag, or making MHA store its state in the databases.
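As a sketch, a read-only-flag check can be a one-liner over a regular MySQL connection, much cheaper than an SSH round trip (credentials are placeholders; the host is the candidate master from the earlier example):

# mysql -h 10.11.0.107 -u monitor -pmonitorpass -N -e 'SELECT @@global.read_only'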

ScaleArc+Percona XtraDB Cluster tests

Percona XtraDB Cluster is special when it comes to high availability testing because of the many ways one can use it. Many people write only to one node to avoid rollback on conflicts, but we have also seen lots of folks using all the nodes for writes.

Also, graceful failover can be interpreted differently.

  • Failover can be graceful from both Percona XtraDB Cluster’s and from the ScaleArc software’s perspective. First the traffic is switched to another node, or removed from the node in question, and then MySQL is stopped.
  • Failover can be graceful from Percona XtraDB Cluster’s perspective, but not from the ScaleArc software’s perspective. In this case the database is simply stopped with service mysql stop, and the load balancer figures out what happened. I will refer to this approach as semi-graceful from now on.
  • Failover can be completely non-graceful if a node is killed, where neither Percona XtraDB Cluster, nor ScaleArc knows about its departure.

We ran all the tests both using one node at a time and using all three nodes. What makes these test sets even more complex is that when only one node is used at a time, some tests (the semi-graceful and the non-graceful ones) give different results depending on whether the removed node is the one in use or an unused one. This involves a lot of tests, so for the sake of brevity I omit the actual sysbench output here – it looks like either the graceful or the non-graceful MHA case – and only present the results in a tabular format. In the case of active/active setups, to remove the nodes gracefully we first have to lower the maximum number of connections on that node to 0.

Failover type    1 node (active)          1 node (passive)         all nodes
Graceful         sub-second (no errors)   no effect at all         sub-second (no errors)
Semi-graceful    4 seconds (errors)       no effect at all         3 seconds
Non-graceful     4 seconds (errors)       6 seconds (no errors)    7 seconds (errors)

In the previous table, active means that the failed node did receive sysbench transactions and passive means that it didn’t.

All the graceful cases are similar to MHA’s graceful case.

If only one node is used and a passive node is removed from the cluster by stopping the database gracefully with the

# service mysql stop

command, it doesn’t have an effect on the cluster. For the other graceful failover cases, switching over in ScaleArc enables queuing, similar to MHA’s case. In the semi-graceful case, if the passive node (which has no traffic) departs, it has no effect. Otherwise, the application gets errors (because of the unexpected mysql stop), and the failover time is around 3-4 seconds, both when only 1 node is active and when all 3 are. This makes sense, because ScaleArc was configured to do checks (using clustercheck) every second and to declare a node dead after three consecutive failed checks. Once the ScaleArc software determined that it should fail over, and did so, the case is practically the same as the passive node’s removal from that point (because ScaleArc removes traffic, in practice making that node passive).

The non-graceful case is tied to the suspect timeout. In this case, XtraDB Cluster doesn’t know that the node departed the cluster, and the originating nodes keep trying to send write sets to it. Those write sets will never be certified because the node is gone, so the writes stall. The exception is the case when the active node failed. After ScaleArc figures out that the node died (three consecutive checks at one-second intervals) a new node is chosen, but because only the failed node was doing transactions, no write sets are waiting for certification in the remaining two nodes’ queues, so there is no need to wait for the suspect timeout here.
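The suspect timeout itself is a Galera provider setting. A minimal my.cnf sketch – PT5S happens to be the default value, shown here purely for illustration:

[mysqld]
wsrep_provider_options="evs.suspect_timeout=PT5S"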

Conclusion

ScaleArc does have reasonably good failover time, especially in the case when it doesn’t have to interrupt transactions. Its promise of zero-downtime maintenance is fulfilled both in the case of MHA and XtraDB Cluster.

The post Measuring failover time for ScaleArc load balancer appeared first on MySQL Performance Blog.

Categories: MySQL

Getting my hands dirty on an OpenStack lab

MySQL Performance Blog - Mon, 2014-08-18 10:00

As you all may know, OpenStack is currently one of the coolest open source projects, so I was thrilled when I was asked to manage the deployment of an OpenStack lab for internal Percona use. Starting from basically zero, I created tasks in our Jira and assigned them to a pool of volunteer consultants. As usual in a service company, billing is the priority, so I ended up losing the 2 senior guys; but fortunately most of my time was with a customer that wasn’t very demanding, so I could easily multitask with the project and fill the gap. So, here it goes…

Hardware

To deploy the OpenStack lab we were given 8 similar servers in our Durham, N.C. offices. The specs are:

  • CPU: 12 physical cores (24 with HT)
  • Disks: one 4TB SATA drive and one 480GB SSD drive
  • NICs: 2x GbE
  • OS: CentOS 6

The hardware is recent and decent, a good start.

Deployment choices

Given the hardware we had, I picked the first server to be the controller and jumphost, the second to be the network node (a bit overkill), and the remaining 6 to become the compute nodes. I also wanted to use Ceph and RBD with 2 types of volumes: the default using SATA, and an SSD type using the SSD drives. The servers only have a single GbE interface to use with Ceph, which is not ideal but sufficient.

So, we basically followed the OpenStack documentation for CentOS and had our share of learning and fun. Overall, it went relatively well with only a few hiccups with Neutron and Ceph.

Neutron

Neutron is probably the most difficult part to understand. You need to be somewhat familiar with all the networking tools and protocols to find your way around. Neutron relies on network namespaces, virtual network switches, GRE tunnels, iptables, dnsmasq, etc. For my part, I discovered network namespaces and virtual network switches.

The task of providing isolated networks to different tenants, each with its own set of IPs and firewall rules, on the same infrastructure is a challenging one. I greatly enjoyed reading Networking in too much detail; from there, things just started to make sense.

The first issue we encountered was that the iproute package on CentOS is old and does not support network namespaces; it just needs to be replaced by a newer version. It took me some time to understand how things are connected: each tenant has its own GRE tunnel ID and VLAN. I can only recommend you read the above document and look at your setup. Don’t forget, you have one set of iptables rules per network namespace… The network node mainly deals with NAT (SNAT and DNAT), while the security group rules are set on the compute nodes. The “ip netns exec …” and “ovs-ofctl dump-flows …” commands have been my best friends for debugging.
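To make those debugging hints concrete, here is the kind of command sequence I mean (the router UUID is a placeholder; br-tun is the usual name of the OVS tunnel bridge):

# ip netns list
# ip netns exec qrouter-<router-uuid> ip addr
# ip netns exec qrouter-<router-uuid> iptables -t nat -vnL
# ovs-ofctl dump-flows br-tun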

Once I got things working, I realized the network performance was, to say the least, pretty bad. I switched GRO off on the NICs but it made very little difference. Finally, I found that the MTU of the VMs was too large; an easy fix was adding a configuration file for dnsmasq with “dhcp-option-force=26,1400”. With an MTU of 1400, small enough that a packet plus the GRE header still fits within the NIC MTU, packets were no longer split and performance went back to normal.
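For reference, a sketch of that dnsmasq fix, assuming the stock Neutron file locations on CentOS – create a config file for dnsmasq, point the DHCP agent at it with the dnsmasq_config_file option in /etc/neutron/dhcp_agent.ini, and restart the agent:

# echo "dhcp-option-force=26,1400" > /etc/neutron/dnsmasq-neutron.conf
# service neutron-dhcp-agent restart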

More on Ceph

The integration of Ceph turned out to be more challenging than I first thought. First, let’s say the documentation is not as polished as the OpenStack one; there are some rough edges but nothing unmanageable. The latest version at the time had no RPMs for CentOS, but that was easy to work around if you know rpmbuild. The same goes for the CentOS RPM for qemu-kvm, and nova required a patch to support RBD devices. I succeeded in deploying Ceph over the SATA drives, configured Glance and Cinder to use it, and that was all good. The patch for nova allows launching instances on clones of an image. Normally the image has to be copied to the local drive, an operation that can take some time if the image is large; with a clone, barely a few seconds after you start a VM it is actually starting. Very impressive. I haven’t tested it, but I suppose the same can be accomplished with btrfs or ZFS shared over iSCSI.

Since the servers all have an SSD drive, I also tackled the task of setting up an SSD volume type. That was a bit tricky: you need to set up a rule so that a given storage pool uses the SSD drives. The SSD drives must all be placed in their own branch of the ceph osd tree; then you need to decompile the current rule set (the crush map), modify it by adding a rule that uses the SSD drives, recompile it, and define a storage pool that uses the SSD rule. Having done this, I modified the Cinder configuration for the “volume-ssd” type, and finally I could mount an SSD-backed, replicated volume to a VM. Quite cool.
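A sketch of that command sequence (pool name, placement group count, and rule id are examples; crush_ruleset was the property name in the Ceph releases of that time):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
(edit crushmap.txt: add a branch containing the SSD OSDs and a rule that takes from it)
# crushtool -c crushmap.txt -o crushmap-new.bin
# ceph osd setcrushmap -i crushmap-new.bin
# ceph osd pool create volumes-ssd 128 128
# ceph osd pool set volumes-ssd crush_ruleset 1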

The only drawback I found with Ceph is snapshot creation. Ideally, Ceph should handle the snapshot and it should be kept there as-is. The way it actually works is less interesting: a snapshot is created, but it is then copied to /tmp of the compute node, uncompressed…, and then copied back to a destination volume. I don’t know why it is done that way; maybe it is a workaround for limitations of other Cinder backends. The compute nodes have only 30GB available for /tmp, and with Ceph you must use raw images, so that’s quite limiting. I have already started to look at the code; maybe this could be my first contribution to the OpenStack project.

Overall impressions

My first impression is that OpenStack is a big project with many moving parts. Installing OpenStack is not a beginner’s project; my experience saved me quite a few times. Logging, once you activate verbose and even debug in the configuration files, is abundant, and if you take your time you can figure out what is going on and why something is failing.

Regarding the distribution, I am not sure CentOS was the right fit; I think we had a rougher ride than we would have had using Ubuntu. My feeling is that the packages, not necessarily the OpenStack ones, were not as up to date as they should have been compared to Ubuntu.

The Ceph RBD backend is certainly a high point of my experience; I had always wanted to touch Ceph, and this project has been a very nice opportunity. The RBD backend works very well, and the ability to launch instances almost instantaneously is just awesome. Another great plus is the storage redundancy (2 replicas), like EBS, and the storage savings of working only on a clone of the original image. Also, the integration of the SSD-backed pool adds a lot of flexibility. Only the instance snapshot issue still needs to be worked on.

So, that’s the current status of our OpenStack lab. I hope to be able to add a few more servers that are currently idle, allowing me to replace the overpowered network node and recycle it as a compute node. Another thing I would really like to do is mix hypervisor types; I’d like to be able to use a lightweight container technology like LXC or Docker. Also, although this is just a lab for the consultants, I’d like to see how to improve its availability, which is currently quite low. So, more fun to come!

The post Getting my hands dirty on an OpenStack lab appeared first on MySQL Performance Blog.

Categories: MySQL

Make a difference! See the world! Speak at Percona Live London; Santa Clara!

MySQL Performance Blog - Fri, 2014-08-15 07:00

Twice each year members of the global open-source community converge on London in November and Santa Clara in April to network with, and learn from, some of the world’s most experienced and accomplished system architects, developers and DBAs.

And now it’s time for YOU to give back to this diverse and growing MySQL community. The Call for Speakers for Percona Live London (Nov. 3-4) closes Aug. 17 and the deadline for submitting talks for the ever-growing Percona Live MySQL Conference and Expo (April 13-16, 2015) expires Nov. 9.

If you like putting things off until the last minute, then it’s time to get to work! Aug. 17 is just two days away and November will be here before you know it (think of how fast summer has gone by already).

Close your eyes and visualize the beauty of London in November (well, usually)…. And who doesn’t enjoy spring in Silicon Valley? Think BIG: Why not submit papers for BOTH conferences? Be a MySQL jetsetter! If your proposal(s) are approved by the Conference Committee, you’ll receive a complimentary full-conference pass. (You’ll have to pay for London’s fish ‘n’ chips and Santa Clara’s famed fish tacos, though. Sorry!)

I’ve said it before (yes, just up above, but it’s important): Your participation in Percona Live London and the Percona Live MySQL Conference and Expo is an opportunity to make a difference… to give back to the MySQL community. Don’t worry about being fancy. Some of the most popular talks have been on simple topics. Many people come to the conferences to learn – and we need content for MySQL newcomers as well as the veterans.

Struggling to come up with an idea? Share what you know; share what you’ve learned… the good, the bad and the ugly. These stories are gold and you owe it to the MySQL community to share them – because doing so could save the day of those who hear your tale.

It’s GO time….

  • Click here to submit your speaker proposal (yes, more than one submission is OK) for Percona Live London (November 3-4 at the Millennium Gloucester Conference Center).
  • Click here to submit your speaker proposal (again, the more the merrier) for the Percona Live MySQL Conference & Expo 2015 (April 13-16, 2015 at The Hyatt Regency Santa Clara and Santa Clara Convention Center).

Questions? Need some advice or help with your submission? Let me know in the comments section and we can connect via email. I look forward to hearing from you! I also have five limited-edition Percona t-shirts to send to the first five people who submit a talk and let me know about it here in the comments section (and yes, I will check to confirm!).

One more (OK, more than one) thing:

  • Phone your colleagues, friends and family alerting them that Early Bird pricing for Percona Live London ends Aug. 31!
  • Super Saver registration discounts for the Percona Live MySQL Conference & Expo 2015 are available through Nov. 16 (at 11:30 p.m. PST)!
  • Sponsorship opportunities (this is HUGE!) for next April’s Percona Live MySQL Conference & Expo 2015 are available! Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who typically attend the event. Current (very cool) sponsors include:
    • Diamond Plus: Continuent
    • Gold: Pythian
    • Silver: Box and Yelp
    • Exhibit Only: Blackbird
    • Media Sponsor: Database Trends & Applications , Datanami, Linux Journal, and O’Reilly Media

Pssst! One more thing (yes, yet another!): These are the people you must impress when submitting a speaker proposal…. (the more you know!)

Meet the Conference Committees!

Percona Live London and the Percona Live MySQL Conference & Expo 2015 each feature a dedicated Conference Committee.

Percona Live London’s Conference Committee:

  • Cédric Peintre, DBA at Dailymotion
  • David Busby, Remote DBA EMEA Regional Lead/RDBA Security Lead at Percona
  • Colin Charles, Chief Evangelist, MariaDB
  • Luis Motta Campos, DBA at eBay Classifieds Group
  • Nicolai Plum, Senior Systems Architect at Booking.com
  • Morgan Tocker, MySQL Community Manager at Oracle
  • Art van Scheppingen, Head of database engineering at Spil Games

The Percona Live MySQL Conference & Expo 2015 Conference Committee:

  • Tamar Bercovici, Senior Engineering Manager at Box
  • Colin Charles, Chief Evangelist for MariaDB
  • Sean Chighizola, Senior Director of IT Operations at Big Fish Games
  • Jeremy Cole, Sr. Systems Engineer at Google
  • Harrison Fisk, MySQL Performance Engineer at Facebook
  • Patrick Galbraith, Principal Engineer at HP
  • Jay Janssen, Consulting Lead at Percona
  • Shlomi Noach, Senior Software Engineer at Outbrain (Conference Chairman)
  • Chris Schneider, Database Architect at Groupon
  • John Scott, Lead Database Administrator at Wellcentive
  • Gwen Shapira, Solutions Architect at Cloudera
  • Shivinder Singh, Database Architect and Engineer at Verizon Wireless
  • Calvin Sun, Senior Engineering Manager at Twitter
  • Morgan Tocker, MySQL Community Manager at Oracle
  • Peter Zaitsev, Co-founder and CEO of Percona

The post Make a difference! See the world! Speak at Percona Live London; Santa Clara! appeared first on MySQL Performance Blog.

Categories: MySQL

Benchmarking IBM eXFlash™ DIMM with sysbench fileio

MySQL Performance Blog - Tue, 2014-08-12 12:00

Diablo Technologies engaged Percona to benchmark IBM eXFlash™ DIMMs in various aspects. An eXFlash™ DIMM itself is quite an interesting piece of technology. In a nutshell, it’s flash storage, which you can put in the memory DIMM slots. Enabled by Diablo’s Memory Channel Storage™ technology, this practically means low latency and some unique performance characteristics.

These devices are special because their full performance potential is unlocked by using multiple devices to leverage the parallelism of the memory subsystem. In a modern CPU there is more than one memory controller (4 is typical nowadays), and spreading eXFlash™ DIMMs across them provides maximum performance. Quite a few details about the device are omitted from this post; they can be found in this redbook.

Diablo Technologies also provided us a 785 GB FusionIO ioDrive 2 card (MLC). We will compare its performance to that of 4 and 8 eXFlash DIMMs. The card itself is a good choice for comparison because of its popularity and because, as you will see, it provides a good baseline.

Environment

CPU: 2x Intel Xeon E5-2690 (hyper threading enabled)
FusionIO driver version: 3.2.6 build 1212
Operating system: CentOS 6.5
Kernel version: 2.6.32-431.el6.x86_64

Sysbench command used:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=${test_mode} --max-time=${rtime} --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=${mode} --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=1 run

The variables above mean the following:

  • size: the devices always had 50% free space.
  • test_mode: rndwr, rndrw, or rndrd for the different types of workloads (write, mixed, and read, respectively).
  • rtime: 4 hours for asynchronous tests, 30 minutes for synchronous ones.
  • mode: sync or async, depending on which test we are talking about.

From the tdimm devices, we created RAID0 arrays with linux md (this is the recommended way to use them).
In case of 4 devices:

mdadm --create /dev/md0 --level=0 --chunk=4 --raid-devices=4 /dev/td{a,c,e,g}

In case of 8 devices:

mdadm --create /dev/md0 --level=0 --chunk=4 --raid-devices=8 /dev/td{a,b,c,d,e,f,g,h}

The filesystem used for the tests was ext4; it was created on the whole device (md0 or fioa, no partitions), with a block size of 4k.
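A sketch of the filesystem creation on the RAID0 array (the same command applies to the FusionIO device node):

# mkfs.ext4 -b 4096 /dev/md0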

We tested these with read-only, write-only and mixed workloads with sysbench fileio. The read/write ratio with the mixed test was 1.5. Apart from the workload itself, we varied the IO mode as well (we did tests with synchronous IO as well as asynchronous IO). We did the following tests.

  • Asynchronous read workload (16 threads, 16k block size, 4 hour long)
  • Asynchronous write workload (16 threads, 16k block size, 4 hour long)
  • Asynchronous mixed workload (16 threads, 16k block size, 4 hour long)
  • Synchronous read workload (1..1024 threads, 16k block size, 0.5 hour long)
  • Synchronous write workload (1..1024 threads, 16k block size, 0.5 hour long)
  • Synchronous mixed workload (1..1024 threads, 16k block size, 0.5 hour long)

The asynchronous read tests lasted 4 hours, because we tested the consistency of the device’s performance with those over time.

Asynchronous IO

Reads

The throughput is fairly consistent, and in the case of 8 eXFlash DIMMs the peak goes above 3 GB/s.

At the beginning of the testing we expected that latency would drop significantly in the case of eXFlash DIMMs. We based this assumption on the fact that the memory is even “closer” to the processor than the PCI-E bus. With 8 eXFlash DIMMs, it’s visible that we are able to utilize the memory channels completely, which gives a boost in throughput as well as lower latency.

Writes

The sustained asynchronous write throughput is around 1GB/sec in the case of eXFlash DIMM_8. Each run shows a spike in throughput at the beginning: that’s the effect of caching. In the eXFlash DIMM itself, data can arrive on the bus much faster than it can be written to the flash devices.

For write latency, the PCI-E based device is somewhat better, and its performance is also more consistent compared to the case where we were using 4 eXFlash DIMMs. When using 8, similarly to the isolated read tests, throughput increases and latency drops.

Mixed

In the mixed workload (which is quite relevant to some databases), the eXFlash DIMM shines. The throughput in case of 8 DIMMs can go as high as 1.1 GB/s for reads, while it’s doing roughly 750 MB/s writes.

In the case of the mixed workload, the latency of the eXFlash DIMMs is lower and far more consistent (the more DIMMs we use, the more consistent it is).

Synchronous IO

Reads

With synchronous IO, we reached peak throughput at 32 threads for the PCI-E device, but needed 64 and 128 threads for the eXFlash DIMM_4 and eXFlash DIMM_8 configurations respectively (in both cases, the throughput at 32 threads was higher than the PCI-E device’s).

The read latency degrades much more gradually in the case of the eXFlash DIMMs, which makes them very suitable for workloads like linkbench, where buffer pool misses are more frequent than writes (many large web applications have this kind of read-mostly, database-doesn’t-fit-in-memory profile). In order to see the latency differences at a lower number of threads, the graph has to be zoomed in a bit. The following graph is the same (synchronous read latency), but the y axis has a maximum value of 10 ms.

The maximum value is hit at 256 threads in the case of the PCI-E device, and performance is already severely degraded at 128 threads (which is the optimal degree of parallelism for the eXFlash DIMM_8 configuration). Massively parallel synchronous reads are a workload where the eXFlash DIMMs are exceptionally good.

Writes

Because of caching mechanisms, we don’t need many writer threads to utilize the devices. This makes sense: writes are easy to cache. After a write, the device can tell the operating system that the write is completed when in reality it completes later. Reads are different: when the application requests a block, the device can’t hand the data to the application without actually reading it first. Because of mechanisms like this, consistency is very important in write tests. The PCI-E device delivers the most consistent, but the lowest, throughput. The reason for this is NUMA’s effect on MCS performance. Because sysbench was allowed to use any CPU core, it may have gotten lucky by hitting a DIMM in the memory bank of the CPU it was running on, or unlucky, with QPI involved in getting the data. Neither case is rarer than the other, hence the wider bars on the graphs.

The latency is better in the case of the eXFlash DIMMs, but not by as much as with isolated reads. It only seems slightly lower in the case of eXFlash DIMM_4, so let’s do a similar zoom here as we did for reads.

The graph shows that at lower concurrency the eXFlash DIMM_4 and FusionIO measurements are roughly the same, but as the workload becomes more parallel, the eXFlash DIMM degrades more gradually. This more gradual degradation is also present in the eXFlash DIMM_8 case, but it starts later.

Mixed

For mixed throughput, the PCI-E device similarly shows more consistent but lower performance. Using 8 DIMMs gives a significant performance boost: the 1.1 GB/s reads and 750 MB/s writes seen with asynchronous IO are doable with 256 threads and above.

Similarly to the previous cases, response times are slightly better with the eXFlash DIMMs. Because latency is quite high at 1024 threads in every case, a zoom like the previous ones will show more details.

At low concurrency, latency is better on the PCI-E device, at least the maximum latency, and we can see some fluctuation in the case of the eXFlash DIMMs. The reason for that is NUMA: with a single thread, it matters a lot which CPU owns the memory bank holding the storage the benchmark is accessing. As concurrency gets higher, response time degrades on the PCI-E flash device, but with the eXFlash DIMMs the maximum remains the same and the response time becomes more consistent. At 32 threads, the mean latencies of the devices are very close, with the PCI-E device slightly better. From there, the eXFlash DIMMs degrade more gracefully.

Conclusion

Overall, I am impressed, especially with the read latency at massive parallelism. So far this is the fastest MLC device we have tested. I would be curious how 16 or 32 DIMMs would perform in a machine with enough sockets and memory channels for that kind of configuration. Just as PCI-E based flash moved storage one step closer to the CPU, MCS moves it one step closer still.

The post Benchmarking IBM eXFlash™ DIMM with sysbench fileio appeared first on MySQL Performance Blog.

Categories: MySQL

Release Candidate Packages for Red Hat Enterprise Linux 7 and CentOS 7 now available

MySQL Performance Blog - Fri, 2014-08-08 13:56

The Percona team is pleased to announce the Release Candidate for Percona Software packages for Red Hat Enterprise Linux 7 and CentOS 7.

With more than 1 million downloads and thousands of production deployments at small, mid-size and large enterprises, Percona software is an industry-leading distribution that enhances the performance, scale, management and diagnosis of MySQL deployments.

These new packages bring the benefits of Percona software to developers and administrators who are using Red Hat Enterprise Linux 7 and CentOS 7.

The new packages are available from our testing repository. The packages included are:

In addition, we have included systemd service scripts for Percona Server and Percona XtraDB Cluster.

Please try out the Percona Software for Red Hat Enterprise Linux 7 and CentOS 7 by downloading and installing the packages from our testing repository as follows:

yum install http://repo.percona.com/testing/centos/7/os/noarch/percona-testing-0.0-1.noarch.rpm

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

NOTE: These packages aren’t Generally Available (GA) and thus are not recommended for production deployments.

The post Release Candidate Packages for Red Hat Enterprise Linux 7 and CentOS 7 now available appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Toolkit 2.2.10 is now available

MySQL Performance Blog - Fri, 2014-08-08 13:07

Percona is glad to announce the release of Percona Toolkit 2.2.10 on August 8, 2014 (downloads are available here and from the Percona Software Repositories). This release is the current GA (Generally Available) stable release in the 2.2 series.

Bugs Fixed:

  • Fixed bug 1287253: pt-table-checksum would exit with an error if it encountered a deadlock while checksumming. This was fixed by retrying the command in case of a deadlock error.
  • Fixed bug 1311654: When used with Percona XtraDB Cluster, pt-table-checksum could show incorrect results if the --resume option was used. This was fixed by adding a new --replicate-check-retries command line parameter. If you are having resume problems you can now set --replicate-check-retries N, where N is the number of times to retry a discrepant checksum (default = 1, no retries). Setting a value of 3 is enough to completely eliminate spurious differences (see the usage sketch after this list).
  • Fixed bug 1299387: pt-query-digest didn’t work correctly with the changed slow log format in which the Thread_id field was renamed to Id. This was fixed by implementing support for the new format.
  • Fixed bug 1340728: In some cases where the index was of type “hash”, pt-online-schema-change would refuse to run because MySQL reported it would not use an index for the select. This check should have been skippable with the --nocheck-plan option, but it wasn’t. The --nocheck-plan option now ignores the chosen index correctly.
  • Fixed bug 1253872: When running pt-table-checksum or pt-online-schema-change on a server that is otherwise unused, setting a 20% max load would fail because the tools rounded the value down. This has been fixed by rounding the value up.
  • Fixed bug 1340364: Due to an incompatibility between dash and bash syntax, some shell tools showed an error when queried for their version.
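As a usage sketch, combining --resume with the new retry option from the second fix might look like this (DSN values are placeholders):

# pt-table-checksum --resume --replicate-check-retries 3 h=pxc-node1,u=checksum,p=secret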

Percona Toolkit is free and open-source. Details of the release can be found in the release notes and the 2.2.10 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.10 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

New in Percona Replication Manager: Slave resync, Async stop

MySQL Performance Blog - Tue, 2014-08-05 14:39

Percona Replication Manager (PRM) continues to evolve and improve. I want to introduce two new features: Slave resync and Async stop.

Slave resync

This behavior is for regular non-GTID replication. When a master crashes and is no longer available, how do we make sure all the slaves are in a consistent state? It is easy to find the most up-to-date slave and promote it to be the new master based on the master execution position – the PRM agent already does that – but how do we apply the missing transactions to the other slaves?

In order to solve that problem, I modified a tool originally written by Yelp that outputs the MD5 sums of the payloads (XID boundaries) and the commit positions of a binlog file. It produces output like:

root@yves-desktop:/home/yves/src/PRM/percona-pacemaker-agents/tools/prm_binlog_parser# ./prm_binlog_parser.static.x86_64 /opt/prm/binlog.000382 | tail -n 5
53844618,E5A35A971445FC8E77D730932869F2
53846198,E37AA6704BE572727B342ABE6EFA935
53847779,B17816AC37424BB87B3CD55226D5EB17
53848966,A7CFF846D281DB363FE4192C38DD7743
53850351,A9C5B7FC24A1BA3B97C332CC362633CE

The first field is the commit position and the second field is the MD5 sum of the payload. The only type of transaction that is not supported is “LOAD DATA INTO”. Of course, since it relies on XID values, this only works with InnoDB. It also requires “log_slave_updates” to be enabled. The sources and some static binaries can be found in the tools directory on the PRM GitHub; the link is at the bottom.

So, if the agent detects a master crash and the prm_binlog_parser tool is available (the prm_binlog_parser_path primitive parameter), upon promotion the new master will look at its binary logs and publish the last 3000 transactions to the Pacemaker CIB. The 3000-transaction limit comes from bash: the command line must stay below 64k. With some work this limit could be raised, but I believe, perhaps wrongly, that it covers most cases. The published CIB attribute also contains the corresponding binlog file.

The next step happens on the slaves, when they get the post-promotion notification. They look for the MD5 sum of their last transaction in the relay log, again using the prm_binlog_parser tool, find the matching file and position in the last 3000 transactions the new master published to the CIB and reconfigure replication using the corrected file and position.

The result is a much more resilient solution that helps keep the slaves in a consistent state. My next goal regarding this feature will be to look at the crashed master and attempt to salvage any transactions from the old master’s binlog in the case where MySQL crashed but Pacemaker is still up and running.

Async_stop

The async_stop feature, enabled by the “async_stop” primitive parameter, allows for faster failover. Without this feature, when stopping mysqld, PRM waits until it is confirmed stopped before completing a failover. When there are many InnoDB dirty pages, we all know that stopping MySQL can take many minutes. Jervin Real, a colleague at Percona, suggested that we fake to Pacemaker that MySQL has stopped in order to proceed faster with the failover. After adding some safeguards, it proved to be a very good idea. I’ll spare you the implementation details, but now, if the setting is enabled, the failover completes as soon as mysqld is effectively stopping. If Pacemaker happens to want to start a still-stopping MySQL instance, a fairly common situation, an error will be generated.

PRM agents, tools and documentation can be found here: https://github.com/percona/percona-pacemaker-agents

The post New in Percona Replication Manager: Slave resync, Async stop appeared first on MySQL Performance Blog.

Categories: MySQL

Q&A: Putting MySQL Fabric to use

MySQL Performance Blog - Mon, 2014-08-04 18:56

Martin Arrieta and I gave an online presentation last week on “Putting MySQL Fabric To Use.” If you missed it, you can find a recording and the slides here, and the vagrant environment we used plus a transcript of the commands we ran here (be sure to check out the ‘sharding’ branch, as that’s what we used during the webinar).

Thank you all for attending and asking interesting questions. We were unable to answer all of them in the scheduled time, so here are our replies to all the questions.

What is GTID? And how does it relate to MySQL Fabric?
GTID stands for Global Transaction Identifier, and MySQL Fabric requires MySQL replication to be configured to use GTIDs in order to work. This means it won’t work with MySQL versions prior to 5.6. You can find more information in the manual, and in the upcoming Percona webinar, “Using MySQL Global Transaction IDs in Production,” which our colleague Stephane Combaudon will present on August 27.

During any issue, does MySQL Fabric internally do a switch over or failover?
MySQL Fabric will only detect failures and initiate failovers for a group that has been activated. Otherwise, any changes must be made manually through the mysqlfabric command.
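For illustration, activating and deactivating automatic failure handling for a group are single mysqlfabric commands (using the group name “ha” as in the example later in this post):

mysqlfabric group activate ha
mysqlfabric group deactivate ha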

For an alter table, how do you avoid replication lag?
We think using pt-online-schema-change for the alter table is a good way to avoid or minimize this problem.

Are there benchmarks available for MySQL Fabric?
We’re not aware of any, but once we wrap up our current series of blog posts on the topic, we plan to do some benchmarks ourselves. Our interest lies in the overhead the connectors can place vs a standard connector accessing a single MySQL Server, but if you have other ideas, let us know in the comments and we’ll consider adding them to our list.

Can MySQL Fabric be used to handle Big Data?
We think the HA features are usable regardless of data set size, with the caveat that MySQL Fabric does not (currently) handle provisioning the data of the managed instances. It just configures and points replication as required, but you’re responsible for making sure all servers have the same set of data.

We feel that the sharding features, however, may not be a good fit for working with very large data sets yet, especially because of the use of mysqldump/mysql for the shard split/move operations. However, since sharding usually goes hand in hand with big data (a data set too large is one of the reasons to shard, after all) we’re sure this will improve in the future. Or, someone with some Python skills can adjust the scripts to use a more efficient mechanism to move data from one server to another, like Percona XtraBackup.

Does sharding require many changes to application coding (database code on shards etc) at the MySQL Fabric level? Should we sit with the developers/architects to help them understand tables, their usage, and to plan sharding?
Sharding with MySQL Fabric is not transparent (and we’d say sharding in general is typically not transparent), so some application coding and DBA/developer interaction will be needed. But you should be working closely with your developers and architects (or DBAs and sysadmins, depending on which side of the counter you stand on) anyway.

Would you say that MySQL Fabric is a good HA replacement removing the need for VIPs?
We don’t see them as mutually exclusive. Simplifying, the HA features of MySQL Fabric are just another way to manage replication between a set of MySQL servers. It doesn’t force any way to make writable and readable nodes available to clients. With some coding, you can use MySQL Fabric together with VIPs, or dynamic DNS entries, or a load balancer like HA Proxy.

Does global group mean fabric node?
If you did not attend the webinar, we’d recommend you take a look at slide 7 to get the required context for this question.

Global group means a group that has the schema for all tables, all the data for global tables, and no data for sharded tables. Since it’s a group, it can be one or more nodes, but at any given time, there will only be one PRIMARY node in the group, and the PRIMARY nodes on all shard groups will be replication slaves of this node.

Is the config file at the application side or on the MySQL Fabric node?
There is some configuration at both ends. The MySQL Fabric node (the ‘store’ server in our Vagrant setup) has some configuration, but applications need their own too. The separation is rather clean though. Typically, the application only needs to know how to reach the MySQL Fabric server (host, port, credentials), while the MySQL Fabric node needs to know how to connect to the nodes it manages, how long to wait before starting a failover, etc. The configuration on MySQL Fabric’s side is done by the fabric.cfg file, while on the application side, it depends on the connector used.

When setting a faulty to secondary, does it also automatically come back in replication as a slave?
Yes, though remember you’ll first have to set it to SPARE.

When we set primary and secondary and it sets the replication, it never copies data and builds replication? It only starts replication from that side and we will lose all the data which is not available on secondary
Yes to the first question. As to the second, we reiterate the fact that MySQL Fabric does not handle provisioning data for instances. When you’re using it to manage nodes, it (currently) assumes the nodes all have a copy of the data. It is your responsibility to make sure that is the case, though this may change in the future. To give you a specific example, suppose you run the following commands:


mysqlfabric group create ha
mysqlfabric group add ha node1
mysqlfabric group add ha node2

Then, on node1's MySQL CLI:

create database test;

and finally run:

mysqlfabric group promote ha --slave_id=<uuid_for_node2>

You’ll end up with node2 being PRIMARY, node1 being SECONDARY, without the ‘test’ database on the PRIMARY node.

Are there any limitations around adding shards? Is it possible to re-configure the sharding keys to handle the additional shard(s)?
We don’t know about any limitations, and yes, it is possible to add more shards and configure the mapping. We did not have enough time to do this at the webinar, but the transcript on the repo includes examples for splitting and moving shards. What you want in this case is to add a new group (can be a single node) and then split a shard (optionally providing a pivot value), keeping one part on the original group and the rest on the new one. Fernando encountered one problem when doing this on a busy server, though the bug is not yet verified so that may have been a user error on his side, but we’d recommend you do similar tests yourself before attempting this in production.

Is the addition of Connectors at the MySQL Fabric level or even at application side? Is an application code change a huge effort?
It’s at the application side. It’s difficult to estimate the effort for the code changes, but if your database access is reasonably modularized/abstracted from your application logic, it shouldn’t be huge. In fact, for HA, if you don’t want to split reads and writes, you can migrate an application to a MySQL Fabric-aware connector with just a couple of lines changed (changing the connection string and then always requesting a MODE_READWRITE connection).

What are the minimum server requirements to set up MySQL Fabric, and does it use multithreading if I have more than one core on each node? Can we get the transcripts with examples?
We don’t know of official minimum requirements, but it is a Python script which in our experience has been lightweight. Remember it does *not* act as a middleman/proxy between your app and the servers; it is only involved when a client establishes a new connection or when a server’s status changes. As for multithreading, we know it’s concurrent (i.e. multiple clients can establish connections at the same time) but we don’t know if it’s parallel.

The transcript with examples can be found here.

How does MySQL Fabric handle latency issues? Does latency increase with mysql traffic?
MySQL Fabric uses regular MySQL connections so latency will affect it the same as any other app using MySQL, and yes, it will increase with traffic (but not due to Fabric).

What if MySQL Fabric is hung? Will connections going to the primary stop? And how can we deduce if there is an issue at the fabric side?
If MySQL Fabric is hung, the following scenarios will be problematic:

  • New connections come in. Clients will try to connect to Fabric before trying to connect any hosts (they won’t know what hosts to connect to otherwise!) so if MySQL Fabric is down, no new connections can be established.
  • A failure happens in the managed servers. MySQL Fabric is responsible for monitoring for failures and taking the appropriate actions (including promoting another server, repointing replicas, etc), so if something fails while MySQL Fabric is down, there will be nobody to take action.

Are there any scenarios where a data loss can happen when promoting/demoting nodes (or nodes failure happens) in a production environment?
Given the use of enforce-gtid-consistency and the fact that MySQL Fabric won’t promote a node to PRIMARY until it has caught up with any pending changes, we feel this is unlikely, but we’re planning a blog post specifically focused on evaluating the potential of data loss during failover.

How to configure MySQL Fabric to use synchronous / semi-synchronous mechanisms?
We have not attempted this, but from this manual page, we think only asynchronous replication is currently supported. We would expect semi-synchronous support to be on the roadmap, though. If this is something you’d like to see, another section on the same page has some suggestions on how to make it happen.
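For reference, outside of anything Fabric manages, semi-synchronous replication on stock MySQL 5.6 is enabled roughly like this (a sketch only; as far as we know Fabric would be unaware of these settings, so this is not a supported Fabric configuration):

-- On the master:
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On each slave:
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
mysql> STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;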

Thanks again for attending the webinar, and feel free to post any further questions in the comments section.

The post Q&A: Putting MySQL Fabric to use appeared first on MySQL Performance Blog.

Categories: MySQL

Advanced MySQL Query Tuning (Aug. 6) and MySQL 5.6 Performance Schema (Aug. 13) webinars

MySQL Performance Blog - Mon, 2014-08-04 13:56

I will be presenting two webinars in August:

This Wednesday’s webinar on advanced MySQL query tuning will focus on tuning the “usual suspects”: queries with “Group By”, “Order By” and subqueries. Those query types usually perform badly in MySQL and add extra load, as MySQL may need to create temporary table(s) or perform a filesort. New this year: I will talk more about the new MySQL 5.6 features that can increase MySQL query performance for subqueries and “order by” queries.

Next week’s performance schema webinar will focus on the practical use of Performance Schema in MySQL 5.6. I will show real-world examples of MySQL monitoring and troubleshooting using the MySQL 5.6 Performance Schema. Performance Schema can give you a lot of valuable information about MySQL internal operations. For example, in a multi-tenant shared MySQL installation, Performance Schema can give you extensive information about your MySQL load: breakdowns by user, query, table and schema (see more examples in my previous blog post: Using MySQL 5.6 Performance Schema in multi-tenant environments).


The post Advanced MySQL Query Tuning (Aug. 6) and MySQL 5.6 Performance Schema (Aug. 13) webinars appeared first on MySQL Performance Blog.

Categories: MySQL

Paris OpenStack Summit Voting – Percona Submits 16 MySQL Talks

MySQL Performance Blog - Fri, 2014-08-01 05:00

MySQL plays a critical role in OpenStack. It serves as the host database supporting most components such as Nova, Glance, and Keystone and is the most mature guest database in Trove. Many OpenStack operators use Percona open source software including the MySQL drop-in compatible Percona Server and Galera-based Percona XtraDB Cluster as well as tools such as Percona XtraBackup and Percona Toolkit. We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona submitted 16 presentations for the Paris OpenStack Summit.

Paris OpenStack Summit presentations are chosen by OpenStack member voting. Please vote for our talks by clicking the titles below that interest you. You must be an OpenStack Foundation member to vote. If you aren’t a member, sign up here – it’s free and only takes a minute. The deadline to vote is Wednesday, August 6, 2014!

Paris OpenStack Summit MySQL Talks Submitted by Percona

OpenStack Operations

MySQL Database Operations in the OpenStack World
Speaker: Stéphane Combaudon

MySQL High Availability Options for Openstack
Speaker: Stéphane Combaudon

Host and Guest Database Backup and Recovery for OpenStack Ops
Speakers: George Lorch, David Busby

Benchmarking the Different Cinder Storage Backends
Speaker: Peter Boros

MySQL and OpenStack Deep Dive
Speakers: Peter Boros, Jay Pipes (Mirantis)

Trove Performance Tuning for MySQL
Speaker: Alexander Rubin

Schema Management: Versioning and Automation with Puppet and MySQL Utilities
Speaker: Frederic Descamps

Deploying Databases for OpenStack
Speakers: Matt Griffin, Jay Pipes (Mirantis), Amrith Kumar (Tesora), Vinay Joosery (Severalnines)

Related Open Source Software Projects

Introduction to Percona XtraDB Cluster
Speaker: Kenny Gryp

Percona Server Features for OpenStack and Trove Ops
Speakers: George Lorch, Vipul Sabhaya (HP Cloud)

Products, Tools & Services

ClusterControl: Efficient and reliable MySQL Management, Monitoring, and Troubleshooting for OpenStack HA
Speakers: Peter Boros, Vinay Joosery (Severalnines)

Advanced MySQL Performance Monitoring for OpenStack Ops
Speaker: Daniel Nichter

Targeting Apps for OpenStack Clouds

Oars in the Cloud: Virtualization-aware Galera instances
Speaker: Raghavendra Prabhu

ACIDic Clusters: Review of contemporary ACID-compliant databases with synchronous replication
Speaker: Raghavendra Prabhu

Cloud Security

Security: It’s more than just your database you should worry about
Speaker: David Busby

Planning Your OpenStack Project

Infrastructure at Scale
Speaker: Michael Coburn

The Paris OpenStack Summit will offer developers, operators, and service providers valuable insights into OpenStack. The Design Summit sessions will be filled with lively discussions driving OpenStack development, including sessions defining the future of Trove, the DBaaS (database as a service) component near and dear to Percona’s heart. There will also be many valuable presentations in the main Paris OpenStack Summit conference about operating OpenStack, utilizing the latest features, complementary software and services, and real-world case studies.

Thank you for your support. We’re looking forward to seeing many Percona software users at the Paris OpenStack Summit in November.

The post Paris OpenStack Summit Voting – Percona Submits 16 MySQL Talks appeared first on MySQL Performance Blog.

Categories: MySQL

MariaDB: Selective binary logs events

MySQL Performance Blog - Thu, 2014-07-31 21:29

In the first post in a series on MariaDB features we find interesting, we begin with selectively skipping replication of binlog events. This feature is available on MariaDB 5.5 and 10.

By default when using MySQL’s standard replication, all events are logged in the binary log and those binary log events are replicated to all slaves (it’s possible to filter out some schemas). But with this feature, it’s also possible to prevent some events from being replicated to the slave(s) even though they are written in the binary log. Having those events in the binary logs is always useful for point-in-time recovery.

Indeed, usually when we do not want an event to be replicated, we set sql_log_bin = 0 and the event is bypassed: neither written into the binlog nor replicated to the slave(s).

So with this new feature, it’s possible to just set a session variable to tag events that will be written into the binary log but bypassed on demand on some slaves.

And it’s really easy to use: on the master you do:

set skip_replication=1;

and on the slave(s) having replicate_events_marked_for_skip='FILTER_ON_MASTER' or 'FILTER_ON_SLAVE' the events skipped on the master won’t be replicated.

The valid values for replicate_events_marked_for_skip are:

  • REPLICATE (default) : skipped events are replicated on the slave
  • FILTER_ON_SLAVE : events so marked will be skipped on the slave and not replicated
  • FILTER_ON_MASTER : the filtering will be done on the master so the slave won’t even receive it and then save network bandwidth
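For example, configuring the archiving slave to filter those events could look like this (a minimal sketch – we believe MariaDB requires the slave to be stopped before this variable can be changed, so check the documentation for your version):

mysql> STOP SLAVE;
mysql> SET GLOBAL replicate_events_marked_for_skip = 'FILTER_ON_MASTER';
mysql> START SLAVE;

With FILTER_ON_MASTER, the master does not even send the tagged events to this slave, saving network bandwidth as described above.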

That’s a cool feature, but when can it be useful?

Use case:

For archiving this can be very interesting. Indeed, most of the time when people archive data, they use something like pt-archiver, which deletes the data and copies the removed rows to an archive server.

Thanks to this feature, instead of having an archiving server where we copy the deleted data, it’s possible to have a slave where we won’t delete the data. This is much faster (smarter?) and gives us an archiving server that is always up to date. Of course in this case sql_log_bin = 0 would have worked too (if we ignore point-in-time recovery).
But with a Galera Cluster? Yes that’s where this feature is really cool, if we would have used sql_log_bin = 0 on a Galera Cluster node, all other nodes would have ignored the delete and the result would be inconsistency between the nodes.

So if you use an asynchronous slave as an archiving server of a Galera Cluster, this feature is really mandatory.

As illustrated below, you can have a MariaDB Galera Cluster node joining a Percona XtraDB Cluster that will be used to delete historical data using pt-archiver:

pt-archiver is started with --set-vars "skip_replication=1"
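A complete invocation might look something like this (a sketch only – the host, schema, table and retention window are hypothetical):

$ pt-archiver --source h=node1,D=mydb,t=history \
    --purge --where "created < NOW() - INTERVAL 90 DAY" \
    --limit 1000 --commit-each \
    --set-vars "skip_replication=1"

This way the deletes happen on every node of the cluster, while the asynchronous archiving slave configured with replicate_events_marked_for_skip skips them and keeps the historical rows.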

The post MariaDB: Selective binary logs events appeared first on MySQL Performance Blog.

Categories: MySQL

Percona Server 5.1.73-14.12 is now available

MySQL Performance Blog - Thu, 2014-07-31 13:52

Percona Server version 5.1.73-14.12

Percona is glad to announce the release of Percona Server 5.1.73-14.12 on July 31st, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.1.73, including all the bug fixes in it, Percona Server 5.1.73-14.12 is now the current stable release in the 5.1 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.1.73-14.12 milestone at Launchpad.

NOTE: Packages for Debian 7.0 (wheezy), Ubuntu 13.10 (saucy) and 14.04 (trusty) are not available for this release due to conflict with newer packages available in those releases.

Bugs Fixed:

  • Percona Server couldn’t be built with Bison 3.0. Bug fixed #1262439, upstream #71250.
  • Ignoring Query Cache Comments feature could cause server crash. Bug fixed #705688.
  • Database administrator password could be seen in plain text when debconf-get-selections was executed. Bug fixed #1018291.
  • If XtraDB variable innodb_dict_size was set, the server could attempt to remove a used index from the in-memory InnoDB data dictionary, resulting in a server crash. Bugs fixed #1250018 and #758788.
  • Ported a fix from MySQL 5.5 for upstream bug #71315 that could cause a server crash if a malformed GROUP_CONCAT function call was followed by another GROUP_CONCAT call. Bug fixed #1266980.
  • MTR tests from binary tarball didn’t work out of the box. Bug fixed #1158036.
  • InnoDB did not handle the cases of asynchronous and synchronous I/O requests completing partially or being interrupted. Bug fixed #1262500 (upstream #54430).
  • Percona Server version was reported incorrectly in Debian/Ubuntu packages. Bug fixed #1319670.
  • Percona Server source files were referencing Maatkit instead of Percona Toolkit. Bug fixed #1174779.
  • The XtraDB version number in univ.i was incorrect. Bug fixed #1277383.

Other bug fixes: #1272732, #1167486, and #1314568.

Release notes for Percona Server 5.1.73-14.12 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.1.73-14.12 is now available appeared first on MySQL Performance Blog.

Categories: MySQL

Examining the TokuDB MySQL storage engine file structure

MySQL Performance Blog - Wed, 2014-07-30 15:07

As we know different storage engines in MySQL have different file structures. Every table in MySQL 5.6 must have a .frm file in the database directory matching the table name. But where the rest of the data resides depends on the storage engine.

For MyISAM we have .MYI and .MYD files in the database directory (unless special settings are in place); for InnoDB we might have data stored in the single shared tablespace (typically ibdata1 in the data directory) or as file per table (or better said, file per partition), producing a single file with the .ibd extension for each table/partition. TokuDB as of the current version (7.1.7) has its own innovative approach to storing table contents.

I have created the table in the database test having the following structure:

CREATE TABLE `mytable` (
  `id` int(10) unsigned NOT NULL,
  `c` varchar(15) NOT NULL,
  `d` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `c` (`c`),
  KEY `d` (`d`,`c`),
  KEY `d_2` (`d`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

No files appear in the “test” database directory besides mytable.frm; however, a few files are created in the data directory root:

-rwxrwx--x 1 mysql mysql 40960 Jul 29 21:01 _test_mytable_key_c_22f19b0_3_19_B_0.tokudb
-rwxrwx--x 1 mysql mysql 16896 Jul 29 21:02 _test_mytable_key_d_2_22f223b_3_19_B_0.tokudb
-rwxrwx--x 1 mysql mysql 16896 Jul 29 21:01 _test_mytable_key_d_22f1c9a_3_19_B_0.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:01 _test_mytable_main_22f1818_2_19.tokudb
-rwxrwx--x 1 mysql mysql 65536 Jul 29 21:02 _test_mytable_status_22f1818_1_19.tokudb

As you can see, the table is represented by a series of files: the “status” file, the “main” file which contains the clustered fractal tree index (primary key), plus each secondary index stored in its own file. Note how the files are named: they include the database name, the table name and the key name (the name you give to the key, not the columns involved). This is followed by something like “22f1818_1_19”, which I assume is a kind of internal TokuDB object identifier.

Note also that (at least on my system) the files are created with the executable bit set. I see no reason for this, so it is probably just a minor bug.

Another minor bug (or intended design limitation?) is that TokuDB might lose the actual table name in its file names when you alter the table. For example, as I altered the table to drop one of the keys and add another one named “superkey”, I see the “mytable” name replaced with “sql_390c_247”, which looks very much like the name of the temporary table that was used to rebuild the table:

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_c_22f6f7d_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_d_22f6f7d_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:15 _test_sql_390c_247_key_superkey_22f6f7d_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:14 _test_sql_390c_247_main_22f6f7d_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:14 _test_sql_390c_247_status_22f6f7d_1_19.tokudb

I like the approach of storing different indexes in different files, as this makes it much easier to drop an index and potentially allows placing indexes on different storage if that is desired for some reason. However, putting all tables in the data directory root is a bad idea: with a substantial number of tables, especially with a few indexes each producing this many files, it becomes inconvenient to work with the directory (which often contains other files too – log files, binary logs, etc.), plus it might push the file system to its limits, or to its performance limits, when dealing with huge numbers of files in a single directory.

Many also like having files in per-database directories, as in basic configurations it allows the use of simple Unix tools such as du to see how much space a given database physically takes.
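For example (assuming the default /var/lib/mysql data directory), with file-per-table InnoDB you can measure a database by its directory, while with TokuDB you can approximate it by globbing on the database-name prefix of the files:

$ du -sh /var/lib/mysql/test                      # per-database size (file-per-table engines)
$ du -ch /var/lib/mysql/_test_*.tokudb | tail -1  # approximate TokuDB footprint of the "test" database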

As with InnoDB and MyISAM, TokuDB creates a separate set of files for each partition, placing them in the same directory. With the same table partitioned by HASH on the primary key into 4 partitions, I observe:

-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_c_22f9f35_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_d_22f9f35_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_key_superkey_22f9f35_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_main_22f9f35_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p0_status_22f9f35_1_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_c_22f9f8e_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_d_22f9f8e_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_key_superkey_22f9f8e_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_main_22f9f8e_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p1_status_22f9f8e_1_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_c_22f9fe1_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_d_22f9fe1_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_key_superkey_22f9fe1_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_main_22f9fe1_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p2_status_22f9fe1_1_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_c_22f9ffb_3_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_d_22f9ffb_5_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_key_superkey_22f9ffb_4_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_main_22f9ffb_2_19.tokudb
-rwxrwx--x 1 mysql mysql 32768 Jul 29 21:22 _test_sql_390c_25b_P_p3_status_22f9ffb_1_19.tokudb

As you can see, “_P_p0” through “_P_p3” suffixes are added to each of the files.

What other files exist beyond those which come from TokuDB tables?

There are a few system files:

-rwxrwx--x 1 mysql mysql   32768 Jul 29 21:16 tokudb.directory
-rwxrwx--x 1 mysql mysql   16384 Jul 17 19:09 tokudb.environment
-rwxrwx--x 1 mysql mysql 1048576 Jul 29 21:22 tokudb.rollback

As their names suggest, tokudb.directory contains metadata about the tables and indexes in the system, tokudb.rollback contains data needed for transaction rollbacks, and tokudb.environment contains some kind of information about the environment. I did not dig into this – I just see that the environment file is not changed after I start the instance, unlike the other two, which change when the table structure is modified (tokudb.directory) or when the database is modified (tokudb.rollback).

Next you will see

-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_data
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_environment
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_logs
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_recovery
-rw------- 1 mysql mysql 0 Jul 17 19:09 __tokudb_lock_dont_delete_me_temp

I see TokuDB really tries to lock the database instance, preventing concurrent access to the directory, with multiple files. I think InnoDB does it in a cleaner way, placing a lock on the system tablespace, which does not require extra files… though the TokuDB team might have a specific reason to do it this way. Maybe Oracle holds a software patent on preventing concurrent database operation by locking a file?

Finally there is the transaction log file:

-rwx------ 1 mysql mysql 37499084 Jul 29 21:34 log000000002593.tokulog25

TokuDB transaction log files are not pre-allocated like InnoDB’s; they look more similar to MySQL binary logs – they have sequentially increasing file numbers that increment as new files are created, and the file itself grows as new data is written to the log. As I understand it, log rotation happens during checkpoints, so you would typically see only one log file.

There is a lot more for me to learn when it comes to the TokuDB file layout and the purpose of individual files, yet I hope this provides you with a good basic overview of the TokuDB MySQL storage engine.

Further Reading: Rich Prohaska from the TokuDB team was kind enough to share some more good reading with me, for those extra curious: TokuDB Files and File Descriptors, Separate Database Directories Design

The post Examining the TokuDB MySQL storage engine file structure appeared first on MySQL Performance Blog.

Categories: MySQL

Prevent MySQL downtime: Set max_user_connections

MySQL Performance Blog - Tue, 2014-07-29 12:05

One of the common causes of downtime with MySQL is running out of connections. Have you ever seen this error? “ERROR 1040 (00000): Too many connections.” If you’ve been working with MySQL long enough you surely have. This is quite a nasty error as it might cause complete downtime, transient errors with successful transactions mixed with failing ones, or only some processes ceasing to run properly, causing various kinds of effects if not monitored properly.

There are a number of causes for running out of connections, the most common being a Web/App server creating unexpectedly large numbers of connections due to a misconfiguration, or some script/application leaking connections or creating too many connections in error.

The solution I see some people employ is to just increase max_connections to some very high number so MySQL “never” runs out of connections. This however can cause resource utilization problems – if a large number of connections become truly active, it may use a lot of memory and cause the MySQL server to swap or be killed by the OOM killer, or cause very poor performance due to high contention.

There is a better solution: use different user accounts for different scripts and applications and implement resource limiting for them. Specifically set max_user_connections:

mysql> GRANT USAGE ON *.* TO 'batchjob1'@'localhost'
    -> WITH MAX_USER_CONNECTIONS 10;

This approach (available since MySQL 5.0) has multiple benefits:

Security – different user accounts with only required permissions make your system safer from development errors and more secure from intruders
Preventing Running out of Connections – if there is a bug or misconfiguration, the application/script will of course run out of connections, but it will be the only part of the system affected; all other applications will be able to use the database normally.
Overload Protection – Limiting the number of connections limits how many queries can run concurrently. Too much concurrency is often a cause of downtime, and limiting it can reduce the impact of unexpected heavy queries run concurrently by the application.

In addition to configuring max_user_connections for given accounts, you can set it globally in my.cnf as “max_user_connections=20”. This is too coarse, though, in my opinion – you will most likely need a different number for different applications/scripts. Where the global max_user_connections is most helpful is in multi-tenant environments with many equivalent users sharing the system.
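You can read the limit back from the grant tables to verify it took effect (a minimal sketch using the hypothetical account from the example above); once the limit is reached, additional connection attempts for that account should fail with an error along the lines of the one shown in the comment:

mysql> SELECT user, host, max_user_connections FROM mysql.user WHERE user = 'batchjob1';
-- the 11th concurrent connection then fails with something like:
-- ERROR 1226 (42000): User 'batchjob1' has exceeded the 'max_user_connections' resource (current value: 10)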

The post Prevent MySQL downtime: Set max_user_connections appeared first on MySQL Performance Blog.

Categories: MySQL

What I learned while migrating a customer MySQL installation to Amazon RDS

MySQL Performance Blog - Mon, 2014-07-28 11:00

Hi, I recently had the experience of assisting with a migration of a customer MySQL installation to Amazon RDS (Relational Database Service). Amazon RDS is a great platform for hosting your MySQL installation and offers the following list of pros and cons:

  • You can scale your CPU, IOPS, and storage space separately by using Amazon RDS. Otherwise you need to take downtime and upgrade physical components of a rack-mounted server.
  • Backups, software version patching, failure detection, and (some) recovery is automated with Amazon RDS.
  • You lose shell access to your DB instance
  • You lose SUPER privilege for regular users. Many SUPER-type statements and commands are provided for as a Stored Procedure.
  • It is easy to set up multiple read replicas (slaves in READ_ONLY=1 mode).
  • You can set up a secondary synchronous instance for failover in the event your primary instance fails.

While this article is written to be Amazon RDS-specific it also has implications for any sort of migration.

  1. The only way to interface with RDS is through mysql client, which means loading data must be done using SQL. This means you need to use mysqldump or mydumper, which can be a large endeavour should your dataset be north of 500GB — this is a lot of single threaded activity!  Think about not only how long dumping and loading will take, but also factor in how much time it will take for replication to catch up on the hours/days/weeks your dumping and loading procedure took.  You might need to allocate more disk space and Provisioned IOPS to your RDS node in order to improve disk throughput, along with a change to innodb_flush_log_at_trx_commit, and sync_binlog.
  2. RDS is set to UTC (system_time_zone=UTC) and this cannot be changed as in Parameter Groups you will see that default_time_zone is set as Modifiable=false. This can bite you if you are planning to use RDS as a slave for a short while before failing the application over to Amazon RDS.  If you have configured binlog_format=STATEMENT on your master and you have TIMESTAMP columns, this will lead to differences in RDS data set for absolute values ’2014-07-24 10:15:00′ vs NOW(). It is also a concern for the Developer who may not be explicitly declaring their MySQL connections to set an appropriate time zone. Often the best piece of advice can be to leave all database data in UTC no matter where the server is physically located, and deal with localization at the presentation layer.
  3. Amazon RDS by default has max_allowed_packet=1MB. This is pretty low as most other configs are 16MB so if you’re using extended-insert (by default, you are), the size of each insert statement will be close to 16MB and thus can lead to errors related to “packet too big” on Amazon RDS side, thus failing out an import.
  4. Amazon RDS does not support the SUPER privilege for regular users. For example, this becomes quite a challenge as many tools (Percona Toolkit) are authored to assume you have SUPER-level access on all nodes — simple tasks become vastly more complicated as you need to think of clever workarounds (I’m looking at you pt-table-sync!).
  5. Triggers and views thus cannot be applied using the default mysqldump syntax, which includes SQL DEFINER entries — these lines are there so that a user with SUPER can “grant” another user the ability to execute the trigger/view. Your load will fail if you forget this.
  6. Consider running your load with --force on the mysql client, and log stderr/stdout to disk so you can review errors later. It is painful to spend 4 days loading a 500GB database only to have it fail partway through because you forgot about the SQL DEFINER issue.
  7. Consider splitting the mysqldump into two phases: --no-data so you dump the schema only, and then --no-create-info so you get just the rows. This way you can isolate faults and solve them along the way (see the sketch after this list).
  8. Skipping replication events is SLOW. You don’t have the ability to do sql_slave_skip_counter (since this requires SUPER); instead you need to use an Amazon RDS function, mysql.rds_skip_repl_error. Sadly this Stored Procedure takes no argument and thus only skips one event at a time. It takes about 2-3 seconds for each execution, so if you have a lot of events to skip, that’s a problem. Having to skip ANYTHING is an indication that something went wrong in the process, so if you find yourself in the unenviable position of skipping events, know that pt-table-checksum should be able to give you an idea how widespread the data divergence issue is.
  9. pt-table-sync doesn’t work against Amazon RDS as it is written to expect SUPER because it wants to do binlog_format=STATEMENT in session, but that’s not allowed. Kenny Gryp hacked me a version to just skip this check, and Kenny also reported it for inclusion in a future release of Percona Toolkit, but in the meantime you need to work around the lack of SUPER privilege.
  10. pt-table-sync is SLOW against RDS. As pt-table-sync doesn’t log a lot of detail about where time is spent, I haven’t completely isolated the source of the latency, but I suspect this is more about network round trip than anything else.
  11. innodb_log_file_size is hardcoded to 128MB in Amazon RDS, you can’t change this.  innodb_log_files_in_group is not even showing up in Parameter Groups view but SHOW GLOBAL VARIABLES reports as 2. So you’re cookin’ on 256MB, if your writes are heavy this may become a bottleneck with little workaround available in MySQL.
  12. CHANGE MASTER isn’t available in RDS. You define RDS as a slave by calling a stored procedure where you pass the appropriate options such as CALL mysql.rds_set_external_master.
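To make items 6 and 7 concrete, the dump-and-load could be sketched as follows (host, database and file names are hypothetical, and the sed expression is a common but imperfect way to strip DEFINER clauses – review its output before loading):

# Phase 1: schema only (tables, routines, triggers)
$ mysqldump --no-data --routines --triggers mydb > schema.sql
# Strip SQL DEFINER clauses that would fail without SUPER
$ sed -i 's/DEFINER=[^ ]*//' schema.sql
# Phase 2: rows only
$ mysqldump --no-create-info mydb > data.sql
# Load with --force, logging stdout/stderr for later review
$ mysql --force -h myinstance.rds.amazonaws.com -u admin -p mydb < schema.sql > schema.out 2> schema.err
$ mysql --force -h myinstance.rds.amazonaws.com -u admin -p mydb < data.sql > data.out 2> data.err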

For those of you wondering about the SUPER privilege, I was fortunate that Bill Karwin from Percona’s Support team took the time to review my post and suggested I dig into this deeper. It turns out that Amazon didn’t hack MySQL to remove the SUPER privilege, but instead runs the Stored Procedures with a security_type of DEFINER:

mysql> select db,name,type,language,security_type,definer from proc where name = 'rds_external_master'\G
*************************** 1. row ***************************
           db: mysql
         name: rds_external_master
         type: PROCEDURE
     language: SQL
security_type: DEFINER
      definer: rdsadmin@localhost
1 row in set (0.08 sec)

mysql> show grants for 'rdsadmin'@'localhost';
+------------------------------------------------------------------------------------------------------+
| Grants for rdsadmin@localhost                                                                         |
+------------------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'rdsadmin'@'localhost' IDENTIFIED BY PASSWORD 'XXX' WITH GRANT OPTION  |
+------------------------------------------------------------------------------------------------------+
1 row in set (0.07 sec)

So for those of you working with Amazon RDS, I hope this list saves you some time and helps out with your migration! If you get stuck you can always contact Percona Consulting for assistance.

The post What I learned while migrating a customer MySQL installation to Amazon RDS appeared first on MySQL Performance Blog.

Categories: MySQL

Monitoring MySQL flow control in Percona XtraDB Cluster 5.6

MySQL Performance Blog - Fri, 2014-07-25 14:41

Monitoring flow control in a Galera cluster is very important. If you do not, you will not understand why writes may sometimes be stalled. Percona XtraDB Cluster 5.6 provides 2 status variables for such monitoring: wsrep_flow_control_paused and wsrep_flow_control_paused_ns. Which one should you use?

What is flow control?

Flow control does not exist with regular MySQL replication, but only with Galera replication. It is simply the mechanism nodes are using when they are not able to keep up with the write load: to keep replication synchronous, the node that is starting to lag instructs the other nodes that writes should be paused for some time so it does not get too far behind.

If you are not familiar with this notion, you should read this blogpost.

Triggering flow control and graphing it

For this test, we’ll use a 3-node Percona XtraDB Cluster 5.6 cluster. On node 3, we will adjust gcs.fc_limit so that flow control is triggered very quickly and then we will lock the node:

pxc3> set global wsrep_provider_options="gcs.fc_limit=1";
pxc3> flush tables with read lock;

Now we will use sysbench to insert rows on node 1:

$ sysbench --test=oltp --oltp-table-size=50000 --mysql-user=root --mysql-socket=/tmp/pxc1.sock prepare

Because of flow control, writes will be stalled and sysbench will hang. So after some time, we will release the lock on node 3:

pxc3> unlock tables;

During the whole process, wsrep_flow_control_paused and wsrep_flow_control_paused_ns are recorded every second with mysqladmin ext -i1. We can then build a graph of the evolution of both variables:

While we can clearly see when flow control was triggered on both graphs, it is much easier to tell when flow control stopped with wsrep_flow_control_paused_ns. It would be even more obvious if we had had several timeframes when flow control was in effect.

Conclusion

Monitoring a server is obviously necessary if you want to be able to catch issues, but you need to look at the right metrics. So don’t be scared if you see that wsrep_flow_control_paused is not 0: it simply means that flow control has been triggered at some point since the server started. If you want to know what is happening right now, prefer wsrep_flow_control_paused_ns; a quick way to turn it into a live pause ratio is sketched below.
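For example, sampling wsrep_flow_control_paused_ns twice and dividing the delta by the elapsed wall-clock time gives the fraction of that interval spent paused (a quick sketch reusing the node 1 socket from the sysbench example; the 10-second interval is arbitrary):

$ T1=$(mysql -S /tmp/pxc1.sock -NBe "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused_ns'" | awk '{print $2}')
$ sleep 10
$ T2=$(mysql -S /tmp/pxc1.sock -NBe "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused_ns'" | awk '{print $2}')
# fraction of the last 10 seconds during which writes were paused by flow control
$ echo "scale=4; ($T2 - $T1) / (10 * 1000000000)" | bc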

The post Monitoring MySQL flow control in Percona XtraDB Cluster 5.6 appeared first on MySQL Performance Blog.

Categories: MySQL

Putting MySQL Fabric to Use: July 30 webinar

MySQL Performance Blog - Thu, 2014-07-24 22:13

Martin and I have recently been blogging together about MySQL Fabric (in case you’ve missed this, you can find the first post of the series here), and on July 30th, we’re going to be presenting a webinar on this topic titled “Putting MySQL Fabric to Use.”

The focus of the webinar is to help you get started quickly on this technology, so we’ll include very few slides (mostly just a diagram or two) and then jump straight into shared screen mode, with lots of live console and source code examples.

In order to make the best use of time, we won’t show you how to install and configure MySQL Fabric. However, we can point you to a few resources to help you get ready and even follow our examples as we go:

  • The official manual is an obvious starting point
  • Our second post in the series includes configuration instructions
  • This git repo contains the test environment we’ll use to run our demos. Specifically, we’ll use the sharding branch, so if you intend to follow our examples as we go, we recommend checking that one out.

If you’re interested, you can register for this webinar here, and if there’s something specific you’d like to see (we had a request for PHP examples in the comments to our last post) feel free to post that as a comment. We can’t promise we’ll be able to cover all requests during the webinar, but we’ll incorporate examples to the repo as time allows.

Hope to see you then!

The post Putting MySQL Fabric to Use: July 30 webinar appeared first on MySQL Performance Blog.

Categories: MySQL

DBaaS, OpenStack and Trove 101: Introduction to the basics

MySQL Performance Blog - Thu, 2014-07-24 07:00

We’ll be publishing a series of posts on OpenStack and Trove over the next few weeks, diving into their usage and purpose. For readers who are already familiar with these technologies, there should be no doubt as to why we are incredibly excited about them, but for those who aren’t, consider this a small introduction to the basics and concepts.

What is Database as a Service (DBaaS)?
In a nutshell, DBaaS – as it is frequently referred to – is a loose moniker for the concept of providing a managed cloud-based database environment accessible by users, applications or developers. Its aim is to provide a full-fledged database environment, while minimizing the administrative turmoil and pains of managing the surrounding infrastructure.

Real life example: Imagine you are working on a new application that has to be accessible from multiple regions. Building and maintaining a large multiregion setup can be very expensive. Furthermore, it introduces additional complexity and strain on your system engineers once timezones start to come into play. The challenge of having to manage machines in multiple datacenters won’t simplify your release cycle, nor increase your engineers’ happiness.

Let’s take a look at some of the questions DBaaS could answer in a situation like this:

- How do I need to size my machines, and where should I locate them?
Small environments require less computing power and can be a good starting point, although this also means they may not be as well-prepared for future growth. Buying larger-scale and more expensive hardware and hosting can be very expensive and a big stumbling block for a brand new development project. Hosting machines in multiple DCs can also introduce administrative difficulties, like having different SLAs and potential issues setting up WAN or VPN communications. DBaaS introduces an abstraction layer, so these considerations aren’t yours, but those of the company offering it, while you get to reap all the rewards.

- Who will manage my environment from an operational standpoint?
Staffing considerations and taking on the knowledge required to properly maintain a production database are often either temporarily swept under the rug or, when the situation turns out badly, a cause of the untimely demise of quite a few young projects. Rather than think about how long ago you should have applied that security patch, wouldn’t it be nice to just focus on managing the data itself, and be otherwise confident that the layers beyond it are managed responsibly?

- Have a sudden need to scale out?
Once you’re up and running, enjoying the success of a growing user base, your environment will need to scale accordingly. Rather than thinking long and hard about the many options available, as well as the logistics attached to those changes, your DBaaS provider can handle this transparently.

Popular public options: Here are a few names of public services you may have come across already that fall under the DBaaS moniker:

- Amazon RDS
- Rackspace cloud databases
- Microsoft SQLAzure
- Heroku
- Clustrix DBaaS

What differentiates these services from a standard remote database is the abstraction layer that fully automates their backend, while still offering an environment that is familiar to what your development team is used to (be it MySQL, MongoDB, Microsoft SQLServer, or otherwise). A big tradeoff to using these services is that you are effectively trusting an external company with all of your data, which might make your legal team a bit nervous.

Private cloud options?
What if you could offer your team the best of both worlds? Or even provide a similar type of service to your own customers? Over the years, a lot of platforms have been popping up to allow effective management and automation of virtual environments such as these, allowing you to effectively “roll your own” DBaaS. To get there, there are two important layers to consider:

  • Infrastructure Management, also referred to as Infrastructure-as-a-Service (IaaS), focusing on the logistics of spinning up virtual machines and keeping their required software packages running.
  • Database Management, also referred to above as DBaaS, transparently coordinating multiple database instances to work together and present themselves as a single, coherent data repository.

Examples of IaaS products:
- OpenStack
- OpenQRM

Example of DBaaS:
- Trove

Main Advantages of DBaaS
For reference, the main reasons why you might want to consider using an existing DBaaS are as follows:

- Reduced Database management costs

DBaaS reduces the amount of maintenance you need to perform on isolated DB instances. You offload the system administration of hardware, OS and database either to a dedicated service provider or, in the case where you are rolling your own, allow your database team to more efficiently manage and scale the platform (public vs. private DBaaS).

- Simplifies certain security aspects

If you are opting to use a DBaaS platform, the responsibility of worrying about this or that patch being applied falls to your service provider, and you can generally assume that they’ll keep your platform secure from the software perspective.

- Centralized management

One system to rule them all. A guarantee of no nasty surprises concerning that one ancient server that should have been replaced years ago, but you never got around to it. As a user of DBaaS, all you need to worry about is how you interface with the database itself.

- Easy provisioning

Scaling of the environment happens transparently, with minimal additional management.

- Choice of backends

Typically, DBaaS providers offer you a choice of a multitude of database flavors, so you can mix and match according to your needs.

Main Disadvantages
- Reduced visibility of the backend

Releasing control of the backend requires a good amount of trust in your DBaaS provider. There is limited or no visibility into how backups are run and maintained, which configuration modifications are applied, or even when and which updates will be implemented. Just as you offload your responsibilities, you in turn need to rely on an SLA contract.

- Potentially harder to recover from catastrophic failures

Similarly to the above, unless your service providers have maintained thorough backups on your behalf, the lack of direct access to the host machines means that it could be much harder to recover from database failure.

- Reduced performance for specific applications

There’s a good chance that you are working on a shared environment. This means the amount of workload-specific performance tuning options is limited.

- Privacy and Security concerns

Although it is much easier to maintain and patch your environment, having a centralized system also means you’re more prone to potential attacks targeting your dataset. Whichever provider you go with, make sure you are intimately aware of the measures they take to protect you, and of what is expected from your side to help keep it safe.

Conclusion: While DBaaS is an interesting concept that introduces a completely new way of approaching an application’s database infrastructure, and can bring enterprises easily scalable and financially flexible platforms, it should not be considered a silver bullet. Some big tradeoffs need to be considered carefully from the business perspective, and any move there should be accompanied by careful planning and investigation of options.

Embracing the immense flexibility these platforms offer, though, opens up a lot of interesting perspectives too. More and more companies are looking at ways to roll their own “as-a-Service”, provisioning completely automated hosted platforms for customers on-demand, and abstracting their management layers to allow them to be serviced by smaller, highly focused technical teams.

Stay tuned: Over the next few weeks we’ll be publishing a series of posts focusing on the combination of two technologies that allow for this type of flexibility: OpenStack and Trove.

The post DBaaS, OpenStack and Trove 101: Introduction to the basics appeared first on MySQL Performance Blog.

Categories: MySQL