MySQL

Orchestrator: MySQL Replication Topology Manager

MySQL Performance Blog - Tue, 2016-03-08 22:42

This blog post discusses Orchestrator: MySQL Replication Topology Manager.

What is Orchestrator?

Orchestrator is a replication topology manager for MySQL.

It has many great features:

  • The topology and status of the replication tree are automatically detected and monitored
  • Either a GUI, CLI or API can be used to check the status and perform operations
  • Supports automatic failover of the master, and the replication tree can be fixed when servers in the tree fail – either manually or automatically
  • It is not dependent on any specific version or flavor of MySQL (MySQL, Percona Server, MariaDB or even MaxScale binlog servers)
  • Orchestrator supports many different types of topologies, from a single master -> slave setup to complex multi-layered replication trees consisting of hundreds of servers
  • Orchestrator can make topology changes, and does so based on the actual state of the topology at that moment; it does not require you to maintain a configuration file describing what the topology should look like
  • The GUI is not only there to report the status – one of the cooler things you can do is change replication just by doing a drag and drop in the web interface (of course you can do this and much more through the CLI and API as well)

Here’s a gif that demonstrates this (click on an image to see a larger version):

Orchestrator’s manual is quite extensive and detailed, so the goal of this blogpost is not to go through every installation and configuration step. It will just give a global overview on how Orchestrator works, while mentioning some important and interesting settings.

How Does It Work?

Orchestrator is a Go application (binaries, including rpm and deb packages, are available for download).

It requires its own MySQL database as a backend to store all information related to the Orchestrator-managed database cluster topologies.

There should be at least one Orchestrator daemon, but it is recommended to run many Orchestrator daemons on different servers at the same time – they will all use the same backend database but only one Orchestrator is going to be “active” at any given moment in time. (You can check who is active under the Status  menu on the web interface, or in the database in the active_node  table.)

Using MySQL As Database Backend, Isn’t That A SPOF?

If the Orchestrator MySQL database is gone, it doesn’t mean the monitored MySQL clusters stop working. Orchestrator just won’t be able to control the replication topologies anymore. This is similar to how MHA works: everything will work but you can not perform a failover until MHA is back up again.

At this moment, it’s required to have a MySQL backend and there is no clear/tested support for having this in high availability (HA) as well. This might change in the future.

Database Server Installation Requirements

Orchestrator only needs a MySQL user with limited privileges (SUPER, PROCESS, REPLICATION SLAVE, RELOAD) to connect to the database servers. With those permissions, it is able to check the replication status of each node and perform replication changes if necessary. It supports several replication mechanisms: binlog file positions, MySQL and MariaDB GTID, Pseudo-GTID, and binlog servers.

There is no need to install any extra software on the database servers.
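
As a rough sketch (not copied from the Orchestrator manual – the account name, host and password are placeholders), creating such a user on a monitored database server could look like this:

-- Limited-privilege account for Orchestrator to monitor and reconfigure replication
-- (hypothetical names; create it on each database server Orchestrator should manage).
CREATE USER 'orchestrator'@'orch.example.com' IDENTIFIED BY 'choose_a_password';
GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD
  ON *.* TO 'orchestrator'@'orch.example.com';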

Automatic Master Failure Recovery

One example of what Orchestrator can do is promote a slave if a master is down. It will choose the most up to date slave to be promoted.

Let’s see what it looks like:

In this test we lost rep1 (the master), and Orchestrator promoted rep4 to be the new master, re-pointing the other slaves to replicate from it.

With the default settings, if rep1 comes back, rep4 is going to resume replicating from rep1. This behavior can be changed with the setting ApplyMySQLPromotionAfterMasterFailover: True in the configuration.

Command Line Interface

Orchestrator has a nice command line interface too. Here are some examples:

Print the topology:

> orchestrator -c topology -i rep1:3306 cli
rep1:3306     [OK,5.6.27-75.0-log,ROW,>>]
+ rep2:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep3:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep4:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep5:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]

Move a slave:

orchestrator -c relocate -i rep2:3306 -d rep4:3306

Print the topology again:

> orchestrator -c topology -i rep1:3306 cli
rep1:3306       [OK,5.6.27-75.0-log,ROW,>>]
+ rep3:3306     [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep4:3306     [OK,5.6.27-75.0-log,ROW,>>,GTID]
  + rep2:3306   [OK,5.6.27-75.0-log,ROW,>>,GTID]
+ rep5:3306     [OK,5.6.27-75.0-log,ROW,>>,GTID]

As we can see, rep2 is now replicating from rep4.

Long Queries

One nice addition to the GUI is how it displays slow queries on all servers inside the replication tree. You can even kill bad queries from within the GUI.

Orchestrator Configuration Settings

Orchestrator’s daemon configuration can be found in /etc/orchestrator.conf.json. There are many configuration options, some of which we elaborate here:

  • SlaveLagQuery – Custom queries can be defined to check slave lag.
  • AgentAutoDiscover – If set to True, Orchestrator will auto-discover the topology.
  • HTTPAuthPassword and HTTPAuthUser – Prevent just anybody from being able to access the web GUI and change your topology.
  • RecoveryPeriodBlockSeconds – Avoids flapping.
  • RecoverMasterClusterFilters – Defines which clusters should auto failover/recover.
  • PreFailoverProcesses – Orchestrator will execute this command before the failover.
  • PostFailoverProcesses – Orchestrator will execute this command after the failover.
  • ApplyMySQLPromotionAfterMasterFailover – Detaches the promoted slave after failover.
  • DataCenterPattern – If there are multiple data centers, you can mark them using a pattern (they will get different colors in the GUI); a sample configuration fragment is shown below.
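
For illustration only, a fragment of /etc/orchestrator.conf.json using a few of the settings above might look like the following. The values are placeholders to show the shape of the file, not recommendations, and the real file contains many more settings:

{
  "HTTPAuthUser": "admin",
  "HTTPAuthPassword": "change_me",
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoverMasterClusterFilters": ["*"],
  "PostFailoverProcesses": ["echo 'orchestrator recovered a master' >> /tmp/orchestrator-recovery.log"],
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "DataCenterPattern": "(dc[0-9]+)"
}
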
Limitations

While being a very feature-rich application, there are still some missing features and limitations of which we should be aware.

One of the key missing features is that there is no easy way to promote a slave to be the new master. This could be useful in scenarios where the master server has to be upgraded, there is a planned failover, etc. (this is a known feature request).

Some known limitations:
  • Slaves cannot be manually promoted to be a master
  • Does not support multi-source replication
  • Does not support all types of parallel replication
  • At this moment, combining this with Percona XtraDB Cluster (Galera) is not supported

Is Orchestrator Your High Availability Solution?

To integrate Orchestrator into your HA architecture or include it in your failover processes, you still need to manage many aspects yourself, all of which can be done by using the different hooks available in Orchestrator:

  • Updating application connectivity:
    • VIP handling
    • Updating DNS
    • Updating proxy server (MaxScale, HAProxy, ProxySQL, …) connections
  • Automatically setting slaves to read only to avoid writes happening on non-masters and causing data inconsistencies
  • Fencing (STONITH) of the dead master, to avoid split-brain in case a crashed master comes back online (and applications still try to connect to it)
  • If semi-synchronous replication needs to be used to avoid data loss in case of master failure, this has to be manually added to the hooks as well

The work that needs to be done is comparable to having a setup with MHA or MySQLFailover.

This post also doesn’t completely describe the decision process Orchestrator uses to determine whether a server is down. The way we understand it right now, one active Orchestrator node makes that decision. It does check the replication state of the broken node’s slaves to determine whether Orchestrator itself isn’t simply the one losing connectivity (in which case it should just do nothing with the production servers). This is already a big improvement compared to MySQLFailover, MHA or even MaxScale’s failover scripts, but it still might cause problems in some cases (more information can be found on Shlomi Noach’s blog).

Summary

The amount of flexibility and power and fun that this tool gives you with a very simple installation process is yet to be matched. Shlomi Noach did a great job developing this at Outbrain, Booking.com and now at GitHub.

If you are looking for a MySQL topology manager, Orchestrator is definitely worth looking at.

Categories: MySQL

JSON document fast lookup with MySQL 5.7

MySQL Performance Blog - Mon, 2016-03-07 23:43

In this blog post, we’ll discuss JSON document fast lookup with MySQL 5.7.

Recently I attended Morgan Tocker’s talk on MySQL 5.7 and JSON at FOSDEM, and I found it awesome.

I learned some great information from the talk. Let me share one tip here: a very useful trick if you plan to store JSON documents in your MySQL database and want to retrieve them by some attribute’s value. So let’s look at how to do JSON document fast lookup with MySQL 5.7!

In this short example, I show you how we can speed up this type of search using JSON functions and virtual columns.

This is our test table:

       Table: test_features
Create Table: CREATE TABLE `test_features` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `feature` json NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=206561 DEFAULT CHARSET=latin1

mysql> show table status like 'test_features'\G
*************************** 1. row ***************************
           Name: test_features
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 171828
 Avg_row_length: 1340
    Data_length: 230326272
Max_data_length: 0
   Index_length: 0
      Data_free: 3145728
 Auto_increment: 206561
    Create_time: 2016-03-01 15:22:34
    Update_time: 2016-03-01 15:23:20
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:

We can see the data length is almost 230M:

+--------------------+--------+-------+-------+-------+------------+---------+
| TABLE              | ENGINE | ROWS  | DATA  | IDX   | TOTAL SIZE | IDXFRAC |
+--------------------+--------+-------+-------+-------+------------+---------+
| json.test_features | InnoDB | 0.17M | 0.21G | 0.00G | 0.21G      |    0.00 |
+--------------------+--------+-------+-------+-------+------------+---------+

-rw-r----- 1 mysql mysql 228M Mar  1 15:23 /var/lib/mysql/json/test_features.ibd

As an example, here is one record (the data comes from https://github.com/zemirco/sf-city-lots-json):

{ "type": "Feature", "geometry": { "type": "Polygon", "coordinates": [ [ [ -122.41983177253881, 37.80720512387136, 0 ], ... [ -122.41983177253881, 37.80720512387136, 0 ] ] ] }, "properties": { "TO_ST": "600", "BLKLOT": "0010001", "STREET": "BEACH", "FROM_ST": "600", "LOT_NUM": "001", "ST_TYPE": "ST", "ODD_EVEN": "E", "BLOCK_NUM": "0010", "MAPBLKLOT": "0010001" } }

Now let’s try to find all records where the street is “BEACH”. “STREET” is one of the attributes inside the properties object of the JSON document.

mysql> SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.21 sec)

mysql> explain SELECT count(*) FROM test_features WHERE feature->"$.properties.STREET" = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 171828
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

As you can see, we perform a full table scan to achieve this.

With MySQL 5.7, we can use virtual generated columns. Let’s create one for the streets:

mysql> ALTER TABLE test_features ADD COLUMN street VARCHAR(30) GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL;

I use json_unquote() to avoid storing the JSON string quotes in the column, and later in the index.

You can verify the size of the table on disk, and you will see that it doesn’t increase (as it’s a virtual column).

Even though we can now use the “street” column in the search, that alone won’t help. We still need to add an index on it:

mysql> ALTER TABLE test_features ADD KEY `street` (`street`);

And now we can see that the size is larger, because we have added the size of the index:

-rw-r----- 1 mysql mysql 232M Mar 1 15:48 /var/lib/mysql/json/test_features.ibd

Now we can try to run the query like this:

mysql> SELECT count(*) FROM test_features WHERE street = 'BEACH';
+----------+
| count(*) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)

Let’s have a look at the Query Execution Plan:

mysql> explain SELECT count(*) FROM test_features WHERE street = 'BEACH'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_features
   partitions: NULL
         type: ref
possible_keys: street
          key: street
      key_len: 33
          ref: const
         rows: 208
     filtered: 100.00
        Extra: Using index

And finally we can verify this in the statistics available in the sys schema:

mysql> select * from sys.schema_index_statistics where table_name='test_features'\G
*************************** 1. row ***************************
  table_schema: json
    table_name: test_features
    index_name: street
 rows_selected: 208
select_latency: 72.59 us
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
*************************** 2. row ***************************
  table_schema: json
    table_name: test_features
    index_name: PRIMARY
 rows_selected: 0
select_latency: 0 ps
 rows_inserted: 0
insert_latency: 0 ps
  rows_updated: 0
update_latency: 0 ps
  rows_deleted: 0
delete_latency: 0 ps
2 rows in set (0.00 sec)

As you can see, this is very fast. If you already know how you want to retrieve data out of your JSON document, it’s very easy to add such indexes in MySQL.
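
If you prefer, the virtual column and its index can also be added in a single statement. This is just the two ALTER statements above combined, shown here as a convenience; if your 5.7 build complains, simply run them separately as shown earlier:

-- Add the virtual column and the secondary index on it in one ALTER.
ALTER TABLE test_features
  ADD COLUMN street VARCHAR(30)
    GENERATED ALWAYS AS (json_unquote(json_extract(`feature`,'$.properties.STREET'))) VIRTUAL,
  ADD KEY `street` (`street`);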

Categories: MySQL

Percona Toolkit 2.2.17 is now available

MySQL Performance Blog - Mon, 2016-03-07 16:31

Percona is pleased to announce the availability of Percona Toolkit 2.2.17.  Released March 7, 2016. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. It includes multiple bug fixes for pt-table-checksum with better support for Percona XtraDB Cluster, various other fixes, as well as MySQL 5.7 general compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New Features:

  • Percona Toolkit 2.2.17 implements general compatibility with MySQL 5.7 in its tools, documentation and test suite.

Bugs Fixed:

  • Bug 1523685: Fixed an invalid recursion method in pt-online-schema-change, where a comma was interpreted as the separator between two DSN methods.
  • Bugs 1480719 and 1536305: The current version of Perl on supported distributions has implemented stricter checks for arguments provided to sprintf. This could cause warnings when pt-query-digest and pt-table-checksum were being run.
  • Bug 1498128: pt-online-schema-change would fail with an error if the table being altered has foreign key constraints where some start with an underscore and some don’t.
  • Bug 1336734: pt-online-schema-change has implemented a new --null-to-non-null flag, which can be used to convert NULL columns to NOT NULL.
  • Bug 1362942: pt-slave-restart would fail to run on MariaDB 10.0.13 due to a different implementation of GTID.
  • Bug 1389041: pt-table-checksum had a high likelihood of skipping a table when its row count was close to the chunk-size * chunk-size-limit value. To address this issue, a new --slave-skip-tolerance option has been implemented.
  • Bug 1506748: pt-online-schema-change could not set the SQL_MODE by using the --set-vars option, preventing some use case schema changes that require it.
  • Bug 1523730: pt-show-grants didn’t sort the column-level privileges.
  • Bug 1526105: pt-online-schema-change would fail after being used with the --no-drop-old-table option ten times. The issue arose because tables whose names had already been extended accumulated: the code retries only ten times to append an underscore, each time finding an old table with that number of underscores appended.
  • Bug 1529411: pt-mysql-summary was displaying incorrect information about Fast Server Restarts for Percona Server 5.6.
  • pt-stalk shell collect module was confusing the new MySQL variable binlog_error_action with the log_error variable.

Details of the release can be found in the release notes and the 2.2.17 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

Categories: MySQL

Percona Server 5.6.29-76.2 is now available

MySQL Performance Blog - Mon, 2016-03-07 15:40

Percona is glad to announce the release of Percona Server 5.6.29-76.2 on March 7, 2016. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.29, including all the bug fixes in it, Percona Server 5.6.29-76.2 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.29-76.2 milestone on Launchpad.

Bugs Fixed:

  • With Expanded Fast Index Creation enabled, DDL queries involving InnoDB temporary tables would cause later queries on the same tables to produce warnings that their indexes were not found in the index translation table. Bug fixed #1233431.
  • Package upgrade on Ubuntu would run mysql_install_db even though the data directory already existed. Bug fixed #1457614.
  • Package upgrade on Ubuntu and Debian could reset the GTID number sequence when the post-install script restarted the service. Bug fixed #1507812.
  • Starting MySQL with systemctl would fail with timeout if the socket was specified with a custom path. Bug fixed #1534825.
  • Write-heavy workload with a small buffer pool could lead to a deadlock when free buffers are exhausted. Bug fixed #1521905.
  • libjemalloc.so.1 was missing from binary tarball. Bug fixed #1537129.
  • mysqldumpslow script has been removed because it was not compatible with Percona Server extended slow query log format. Please use pt-query-digest from Percona Toolkit instead. Bug fixed #856910.
  • When cmake/make/make_binary_distribution workflow was used to produce binary tarballs it would produce tarballs with mysql-... naming instead of percona-server-.... Bug fixed #1540385.
  • Cardinality of partitioned TokuDB tables became inaccurate after the changes introduced by TokuDB Background ANALYZE TABLE feature in Percona Server 5.6.27-76.0. Bug fixed #925.
  • Running TRUNCATE TABLE while the TokuDB Background ANALYZE TABLE feature is enabled could lead to a server crash once the analyze job tries to access the truncated table. Bug fixed #938.
  • Added proper memory cleanup if, for some reason, a TokuDB table cannot be opened from a dead closed state. This prevents an assertion the next time an attempt is made to open the table. Bug fixed #917.

Other bugs fixed: #898, #1521120 and #1534246.

Release notes for Percona Server 5.6.29-76.2 are available in the online documentation. Please report any bugs on the Launchpad bug tracker.

Categories: MySQL

My proposals for Percona Live: Window Functions and ANALYZE for statements

Sergey Petrunia's blog - Mon, 2015-11-30 17:05

I’ve made two session proposals for the Percona Live conference:

If you feel these talks are worth it, please vote!

Categories: MySQL

Amber Alert: Worse Than Nothing?

Xaprb, home of innotop - Wed, 2014-02-12 00:00

In the last few years, there’s been a lot of discussion about alerts in the circles I move in. There’s general agreement that a lot of tools don’t provide good alerting mechanisms, including problems such as unclear alerts, alerts that can’t be acted upon, and alerts that lack context.

Yesterday and today at the Strata conference, my phone and lots of phones around me started blaring klaxon sounds. When I looked at my phone, I saw something like this (the screenshot is from a later update, but otherwise similar):

I’ve seen alerts like this before, but they were alerts about severe weather events, such as tornado watches. This one, frankly, looked like someone hacked into the Verizon network and sent out spam alarms. Seriously — what the hell, a license plate? What?

Besides, it says AMBER, which is a cautionary color. It’s not a red alert, after all. It can’t be anything serious, right?

The second time it happened I looked at the details:

This is even less informative. It’s an amber alert (not an urgent color like red). But it’s a significant threat to my life or property? I’m supposed to respond to it immediately? Oh wait, my response is to “monitor” and “attend to information sources.” Almost everything on this whole screen is conflicting. What a cluster-fudge of useless non-information!

Later I looked up some information online and found that an amber alert is a child abduction alert. This one turned out to be a false alarm.

All of this raises an obvious question: why on earth would someone think that making a bunch of people’s cellphones quack with a cryptic message would convey useful information? For something as critical as a child abduction, they should get to the point and state it directly. Judging by reactions around me, and people I spoke to, almost nobody knows what an amber alert is. I certainly didn’t. When I tweeted about it, only one person in my network seemed to be aware of it.

How can anyone take something like this seriously? All this does is make people like me find the preferences for alerts and disable them.

In my opinion, this is an example of complete failure in alert design. I don’t think I can overstate how badly done this is. I want to say only a politician could have dreamed up something so stupid…

But then I remember: oh, yeah. Pingdom alerts (we’ll email you that your site is down, but we won’t tell you an HTTP status code or anything else remotely useful.) Nagios alerts (we’ll tell you DISK CRITICAL and follow that with (44% inode=97%) – anyone know what that means?). And so on.

Perhaps the amber alert system was designed by a system administrator, not a politician.

Categories: MySQL

Bloom Filters Made Easy

Xaprb, home of innotop - Tue, 2014-02-11 00:00

I mentioned Bloom Filters in my talk today at Strata. Afterwards, someone told me it was the first time he’d heard of Bloom Filters, so I thought I’d write a little explanation of what they are, what they do, and how they work.

But then I found that Jason Davies already wrote a great article about it. Play with his live demo. I was able to get a false positive through luck in a few keystrokes: add alice, bob, and carol to the filter, then test the filter for candiceaklda.

Why would you use a Bloom filter instead of, say…

  • Searching the data for the value? Searching the data directly is too slow, especially if there’s a lot of data.
  • An index? Indexes are more efficient than searching the whole dataset, but still too costly. Indexes are designed to minimize the number of times some data needs to be fetched into memory, but in high-performance applications, especially over huge datasets, that’s still bad. It typically represents random-access to disk, which is catastrophically slow and doesn’t scale.
Categories: MySQL

MySQL, SQL, NoSQL, Open Source And Beyond: a Google Tech Talk

Xaprb, home of innotop - Wed, 2014-02-05 00:00

I’ve been invited to give a Tech Talk at Google next Thursday, February 13th, from 11:00 to 12:30 Pacific time. Unfortunately the talk won’t be streamed live, nor is it open to the general public, but it will be recorded and hosted on YouTube afterwards. I’ve also been told that a small number of individuals might be allowed to attend from outside Google. If you would like me to try to get a guest pass for you, please tweet that to @xaprb.

The topic is, roughly, databases. Officially,

MySQL, SQL, NoSQL, and Open Source in 2014 and Beyond

Predictions are hard to get right, especially when they involve the future. Rather than predict the future, I’ll explain how I view the relational and NoSQL database worlds today, especially the MySQL product and community, but including open-source and proprietary data management technologies about which I know enough to get in trouble. I’ll explain how my self-image as a practitioner and contributor has changed, especially as I’ve moved from consulting (where I tell people what they should do) into being a company founder (where I sometimes wish someone would tell me what to do). As for the future, I’ll express my preferences for specific outcomes, and try to be careful what I wish for.

I am excited and a bit nervous. A Google Tech Talk! Wow! Thanks for inviting me, Google!

Categories: MySQL

A simple rule for sane timestamps in MySQL

Xaprb, home of innotop - Thu, 2014-01-30 00:00

Do you store date or time values in MySQL?

Would you like to know how to avoid many possible types of pain, most of which you cannot even begin to imagine until you experience them in really fun ways?

Then this blog post is for you. Here is a complete set of rules for how you can avoid aforementioned pain:

  1. All date and time columns shall be INT UNSIGNED NOT NULL, and shall store a Unix timestamp in UTC.

Enjoy all the spare time you’ll have to do actually useful things as a result.
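
For illustration, here is a minimal sketch of what that rule looks like in practice (the table and column names are made up):

CREATE TABLE events (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  created_at INT UNSIGNED NOT NULL,   -- Unix timestamp, always UTC
  payload    VARCHAR(255) NOT NULL
);

-- Write the current time as a Unix timestamp:
INSERT INTO events (created_at, payload) VALUES (UNIX_TIMESTAMP(), 'hello');

-- Convert to a readable datetime (in the session time zone) only when displaying:
SELECT id, FROM_UNIXTIME(created_at) AS created_at_readable, payload FROM events;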

Categories: MySQL

Generating Realistic Time Series Data

Xaprb, home of innotop - Fri, 2014-01-24 00:00

I am interested in compiling a list of techniques to generate fake time-series data that looks and behaves realistically. The goal is to make a mock API for developers to work against, without needing bulky sets of real data, which are annoying to deal with, especially as things change and new types of data are needed.

To achieve this, I think several specific things need to be addressed:

  1. What common classes or categories of time-series data are there? For example,
    • cyclical (ex: traffic to a web server day-over-day)
    • apparently random (ex: stock ticker)
    • generally increasing (ex: stock ticker for an index)
    • exponentially decaying (ex: unix load average)
    • usually zero, with occasional nonzero values (ex: rainfall in a specific location)
  2. What parameters describe the data’s behavior? Examples might include an exponential decay, periodicity, distribution of values, distribution of intervals between peaks, etc.
  3. What techniques can be used to deterministically generate data that approximates a given category of time-series data, so that one can generate mock sources of data without storing real examples? For a simplistic example, you could seed a random number generator for determinism, and use something like y_n = rand() * 10 + 100 for data that fluctuates randomly between 100 and 110.

To make the mock API, I imagine we could catalog a set of metrics we want to be able to generate, with the following properties for each:

  • name
  • type
  • dimensions
  • parameters
  • random seed or other initializer

This reduces the problem from what we currently do (keeping entire data sets, which need to be replaced as our data gathering techniques evolve) into just a dictionary of metrics and their definitions.

Then the mock API would accept requests for a set of metrics, the time range desired, and the resolution desired. The metrics would be computed and returned.

To make this work correctly, the metrics need to be generated deterministically. That is, if I ask for metrics from 5am to 6am on a particular day, I should always get the same values for the metrics. And if I ask for a different time range, I’d get different values. What this means, in my opinion, is that there needs to be a closed-form function that produces the metric’s output for a given timestamp. (I think one-second resolution of data is fine enough for most purposes.)
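
Just to make the closed-form idea concrete, here is the flavor of thing I mean, expressed as a MySQL query since this is a MySQL-flavored feed (the metric name, the constants, and the choice of CRC32 as a cheap deterministic hash are all arbitrary placeholders):

-- The value is a pure function of (metric name, timestamp): the same inputs
-- always produce the same output, so any time range can be regenerated on demand.
SELECT
  ts,
  100 + (CRC32(CONCAT('web.requests.count:', ts)) % 1000) / 100 AS value
FROM (SELECT 1457395200 AS ts
      UNION ALL SELECT 1457395201
      UNION ALL SELECT 1457395202) AS samples;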

Does anyone have suggestions for how to do this?

The result will be open-sourced, so everyone who’s interested in such a programmatically generated dataset can benefit from it.

Categories: MySQL

Speaking at Percona Live

Xaprb, home of innotop - Thu, 2014-01-23 00:00

I’m excited to be speaking at the Percona Live MySQL Conference again this year. I’ll present two sessions: Developing MySQL Applications with Go and Knowing the Unknowable: Per-Query Metrics. The first is a walk-through of everything I’ve learned over the last 18 months writing large-scale MySQL-backed applications with Google’s Go language. The second is about using statistical techniques to find out things you can’t even measure, such as how much CPU a query really causes MySQL to use. There are great reasons that this is both desirable to know, and impossible to do directly in the server itself.

I’m also looking forward to the conference overall. Take a few minutes and browse the selection of talks. As usual, it’s a fantastic program; the speakers are really the top experts from the MySQL world. The conference committee and Percona have done a great job again this year! See you in Santa Clara.

Categories: MySQL

On Crossfit and Safety

Xaprb, home of innotop - Mon, 2014-01-20 00:00

I’ve been a happy CrossFiter for a few years now. I met my co-founder and many friends in CrossFit Charlottesville, completely changed my level of fitness and many key indicators of health such as my hemoglobin A1C and vitamin D levels, am stronger than I’ve ever been, feel great, and now my wife does CrossFit too. It’s fantastic. It’s community, fun, health, fitness. It’s the antidote to the boring gyms I forced myself to go to for years and hated every minute.

But there is a fringe element in CrossFit, which unfortunately looks mainstream to some who don’t really have enough context to judge. From the outside, CrossFit can look almost cult-like. It’s easy to get an impression of people doing dangerous things with little caution or training. To hear people talk about it, everyone in CrossFit works out insanely until they vomit, pushing themselves until their muscles break down and vital organs go into failure modes.

That’s not what I’ve experienced. I’ve never seen anyone vomit, or even come close to it as far as I know. I think that part of this dichotomy comes from certain people trying to promote CrossFit as a really badass thing to do, so they not only focus on extreme stories, they even exaggerate stories to sound more extreme.

Last week there was a tragic accident: Denver CrossFit coach Kevin Ogar injured himself badly. This has raised the issue of CrossFit safety again.

To be clear, I think there is something about CrossFit that deserves to be looked at. It’s just not the mainstream elements, that’s all. The things I see about CrossFit, which I choose not to participate in personally, are:

  1. The hierarchy and structure above the local gyms. If you look at local gyms and local events, things look good. Everyone’s friends and nobody does stupid things. But when you get into competitions, people are automatically elevated into the realms of the extreme. This reaches its peak at the top levels of the competitions. Why? Because there’s something to gain besides just fitness. When someone has motivations (fame, endorsements and sponsorship, financial rewards) beyond just being healthy, bad things are going to happen. There’s talk now about cheating and performance-enhancing drugs and all kinds of “professional sports issues.” Those are clear signs that it’s not about fitness and health.
  2. Some inconsistencies in the underlying philosophy from the founders of CrossFit. I’m not sure how much this gets discussed, but a few of the core concepts (which I agree with, by the way) are that varied, functional movements are good. The problem is, the workout movements aren’t all functional. A few of them are rather hardcore and very technical movements chosen from various mixtures of disciplines.
  3. Untempered enthusiasm about, and ignorant promotion of, things such as the so-called Paleo Diet. I’m biased about this by being married to an archaeologist, but it isn’t the diet that is the issue. It’s the fanaticism that some people have about it, which can be off-putting to newcomers.

I’m perfectly fine when people disagree with me on these topics. Lots of people are really enthusiastic about lots of things. I choose to take what I like about CrossFit and leave the rest. I would point out, however, that the opinions of those who don’t really know CrossFit first-hand tend to be colored by the extremism that’s on display.

Now, there is one issue I think that’s really important to talk about, and that’s the safety of the movements. This comes back to point #2 in my list above. I’d especially like to pick out one movement that is done in a lot of CrossFit workouts.

The Snatch

If you’re not familiar with the snatch, it’s an Olympic weightlifting movement where the barbell is pulled from the floor as high as possible in one movement. The athlete then jumps under the barbell, catching it in a deep squat with arms overhead, and stands up to complete the movement with the bar high overhead. Here’s an elite Olympic lifter just after catching the bar at the bottom of the squat.

The snatch is extremely technical. It requires factors such as balance, timing, strength, and flexibility to come together flawlessly. Many of these factors are not just necessary in moderate quantities. For example, the flexibility required is beyond what most people are capable of without a lot of training. If you don’t have the mobility to pull off the snatch correctly, your form is compromised and it’s dangerous.

The snatch is how Kevin Ogar got hurt. Keep in mind this guy is a CrossFit coach himself. He’s not a novice.

I challenge anyone to defend the snatch as a functional movement. Tell me one time in your life when you needed to execute a snatch, and be serious about it. I can see the clean-and-jerk’s utility. But not the snatch. It’s ridiculous.

The snatch is also inherently very dangerous. You’re throwing a heavy weight over your head and getting under it, fast. You’re catching it in an extremely compromised position. And if you drop it, which is hard not to do, where’s it going to go? It’s going to fall on you. Here’s another Olympic athlete catching hundreds of pounds with his neck when a snatch went a little bit wrong. A split second later this picture looked much worse, but I don’t want to gross you out.

The next issue is that the snatch features prominently in many CrossFit workouts, especially competition workouts. This is not a small problem. Think about it: in competition, when these extreme athletes have raised the bar to such an extent that weeding out the best of the best requires multi-day performances few mortals could ever achieve, we’re throwing high-rep, heavy-weight snatches into the mix. What’s astonishing isn’t that Kevin Ogar got seriously injured. What’s amazing is that we don’t have people severing their spines on the snatch all the time.

What on earth is wrong with these people? What do they expect?

You might think this is an issue that’s only present in the competitions. But that’s not true. I generally refuse to do snatches in workouts at the gym. I will substitute them for other movements. Why? Take a look at one sample snatch workout:

AMRAP (as many rounds as possible) in 12 Minutes of:

  1. Snatch x 10
  2. Double Under x 50
  3. Box Jump x 10
  4. Sprint

That’s 12 minutes of highly challenging movements (to put it in perspective, most non-CrossFitters, and even many CrossFitters, would not be able to do the double-unders or box-jumps). You’re coming off a sprint and you’re going to throw off 10 snatches in a row, and you’re going to do it with perfect form? Unlikely. This is just asking for injury.

Or we could look at the “named WODs” that are benchmarks for CrossFitters everywhere. There’s Amanda, for example: 9, 7, and 5 reps of muscle-ups and snatches, as fast as possible. Or Isabel: 30 reps of 135-pound snatches, as fast as possible. To get a sense for how insane that actually is, take a look at Olympic weightlifting competitor Kendrick Farriss doing Isabel. The man is a beast and he struggles. And his form breaks down. I’m over-using italics. I’m sorry, I’ll cool down.

My point is that I think this extremely technical, very dangerous movement should have limited or no place in CrossFit workouts. I think it does very little but put people into a situation where they’re at very high risk of getting injured. I do not think it makes people more fit more effectively than alternative movements. I think one can get the same or better benefits from much safer movements.

Doing the snatch is an expert stunt. I personally think that I’ll never be good at snatches unless I do them twice a week, minimum. And one of the tenets of CrossFit is that there should be a large variety of constantly varied movements. This automatically rules out doing any one movement very often. In my own CrossFit workouts, going to the gym 2 or 3 times a week, I typically go weeks at a time without being trained on snatches in pre-workout skill work. That is nowhere near enough to develop real skill at it. (This is why I do my skill-work snatches with little more than an empty bar.)

There are other movements in CrossFit that I think are riskier than they need to be, but snatches are the quintessential example.

I know many people who are experts in these topics will disagree with me very strongly, and I don’t mind that. This is just my opinion.

Bad Coaches, Bad Vibes

There’s one more problem that contributes, I think, to needless risk in CrossFit gyms. This is the combination of inadequate coaching and a focus on “goal completion” to the exclusion of safety and absolutely perfect form, especially during workouts where you’re trying to finish a set amount of movements as fast as possible, or do as much as possible in a fixed time.

There’s no getting around the fact that CrossFit coaches aren’t all giving the same level of attention to their athletes, nor do all of them have the qualifications they need.

Anecdotally, I’ll tell the story of traveling in California, where I visited a gym and did one of my favorite workouts, Diane. In Diane, you deadlift 225 pounds 21 reps, do 21 handstand pushups, repeat both movements with 15 reps each, and finish with 9 reps each.

Deadlifting consists of grasping the bar on the ground and standing up straight, then lowering it again. It is not a dynamic or unstable movement. You do not move through any out-of-control ranges of motion. If you drop the bar you won’t drop it on yourself, it’ll just fall to the ground. Nevertheless, if done wrong, it can injure you badly, just like anything else.

The gym owner / coach didn’t coach. There’s no other way to say it. He set up a bar and said “ok, everyone look at me.” He then deadlifted and said some things that sounded really important about how to deadlift safely. Then he left us on our own. A relative newcomer was next to me. His form and technique were bad, and the coach didn’t say anything. He was standing at the end of the room, ostensibly watching, but he either wasn’t really looking, or he was lazy, or he didn’t know enough to see that the guy was doing the movement unsafely.

The newcomer turned to me and asked me what weight I thought he should use. I recommended that he scale the weights way down, but it wasn’t my gym and I wasn’t the coach. He lifted too heavy. I don’t think he hurt himself, but he was rounding his back horribly and I’m sure he found it hard to move for a week afterward. The coach just watched from the end of the gym, all the way through the workout. All he did was start and stop the music. What a jerk.

There’s an element of responsibility to put on the athletes. You need to know whether you’re doing things safely or not. If you don’t know, you should ask your coach. For me, rule #1 is to find out how much I don’t know, and not to attempt something unless I know how much I know about it. This athlete should have taken the matter into his own hands and asked for more active coaching.

But that doesn’t excuse the coach either.

The gym I go to — that nonsense does not happen. And I’ve been to a few gyms over the years and found them to be good. I’m glad I learned in a safe environment, but not all gyms and coaches are that good.

Precedent and Direction-Setting, and Lack of Reporting

What worries me the most is that the type of tragedy that happened to Kevin Ogar is going to happen close to home and impact my friends or my gym. The problem is complex to untangle, but in brief,

  1. Once annually there’s a series of quasi-competitions called the CrossFit Open. These are scored workouts over a span of weeks. They are set by the national CrossFit organization, not the local gyms. The scores are used to filter who is the first rank of competitors to go to regional competitions, and then eventually on to the annual CrossFit Games.
  2. The CrossFit Open workouts will certainly include snatches.
  3. If local gyms don’t program snatches regularly, their members won’t be prepared at all for the Open.
  4. Local gyms don’t have to participate in the Open, and don’t have to encourage their members to, but that’s easier said than done due to the community aspects of CrossFit.

The result, in my opinion, is that there’s systemic pressure for gyms and members to do things that carry a higher risk-to-reward ratio than many members would prefer. Anecdotally, many members I’ve spoken to share my concerns about the snatch. They love CrossFit, but they don’t like the pressure to do this awkward and frightening movement.

Finally, it’s very difficult to understand how serious the problem really is. Is there a high risk of injury from a snatch, or does it just seem that way because of high-profile incidents? Are we right to be afraid of the snatch, or is it just a movement that makes you feel really vulnerable? The problem here is that there’s no culture of reporting incidents in CrossFit.

I can point to another sport where that culture does exist: caving. The National Speleological Society publishes accident reports, and conscientious cavers share a culture that every incident, even trivial ones, must be reported. As a result, you can browse the NSS accident reports (summarized here) and see some things clearly (you have to be a member to access the full reports, which are often excruciatingly detailed). One of the most obvious conclusions you’ll draw right away is that cave diving (scuba diving in underwater caves) is incredibly dangerous and kills a lot of people, despite it being a small portion of the overall caving sport’s popularity. If you weren’t a caver and you didn’t know about cave diving, would you think this was the case? I’m not sure I would. After reading cave diving accident reports, I remember being shocked at how many people are found dead underwater for no apparent reason, with air left in their tanks. The accident reports help cavers assess the risks of what they do.

Nothing similar exists for CrossFit, and I wish it did.

Negative Press About CrossFit

On the topic of what gets attention and exposure, I’ve seen a bunch of attention-seeking blog posts from people who “told the dirty truth” about how CrossFit injured them and there’s a culture of silencing dissenters and so on. I’m sure some of that happens, but the stuff I’ve read has been from people who have an axe to grind. And frankly, most of those people were indisputably idiots. They were blaming their problems and injuries on CrossFit when the real problem was between their ears. I won’t link to them, because they don’t deserve the attention.

Don’t believe most of what you read online about CrossFit. Many of the people telling their personal stories about their experiences in CrossFit are drama queens blowing things completely out of proportion. There’s a lot of legitimate objective criticism too, most of it from neutral third-parties who have serious credentials in physical fitness coaching, but this doesn’t get as much attention. And there’s a lot of great writing about what’s good about CrossFit, much of it from the good-hearted, honest, knowledgeable coaches and gym owners who soldier on despite the ongoing soap operas and media hype wars. They’re bringing fitness and health — and fun — to people who otherwise don’t get enough of it.

Summary

Toss corporate sponsors, personal politics, competition, the lure of great gains from winning, and a bunch of testosterone together and you’re going to get some people hurt. Mix it in with snatches and it’s a miracle if nobody gets seriously injured.

If you participate in CrossFit, which I highly recommend, take responsibility for your own safety. If there is a rah-rah attitude of pushing too hard at all costs in your gym, or if your coaches aren’t actually experts at what they do (the CrossFit weekend-long certification seminars don’t count), or if it’s not right for any other reason, go elsewhere.

Stay healthy and have fun, and do constantly varied, functional movements at high intensity in the company of your peers – and do it safely.


Categories: MySQL

How to Tune A Guitar (Or Any Instrument)

Xaprb, home of innotop - Sat, 2014-01-18 00:00

Do you know how to tune a guitar? I mean, do you really know how to tune a guitar?

I’ve met very few people who do. Most people pick some notes, crank the tuners, play some chords, and endlessly fidget back and forth until they either get something that doesn’t sound awful to their ears, or they give up. I can’t recall ever seeing a professional musician look like a tuning pro on stage, either. This really ought to be embarrassing to someone who makes music for a career.

There’s a secret to tuning an instrument. Very few people seem to know it. It’s surprisingly simple, it isn’t at all what you might expect, and it makes it easy and quick to tune an instrument accurately without guesswork. However, even though it’s simple and logical, it is difficult and subtle at first, and requires training your ear. This is a neurological, physical, and mental process that takes some time and practice. It does not require “perfect pitch,” however.

In this blog post I’ll explain how it works. There’s a surprising amount of depth to it, which appeals to the nerd in me. If you’re looking for “the short version,” you won’t find it here, because I find the math, physics, and theory of tuning to be fascinating, and I want to share that and not just the quick how-to.

If you practice and train yourself to hear in the correct way, with a little time you’ll be able to tune a guitar by just striking the open strings, without using harmonics or frets. You’ll be able to do this quickly, and the result will be a guitar that sounds truly active, alive, energetic, amazing — much better results than you’ll get with a digital tuner. As a bonus, you’ll impress all of your friends.

My Personal History With Tuning

When I was a child my mother hired a piano tuner who practiced the “lost art” of tuning entirely by ear. His name was Lee Flory. He was quite a character; he’d tuned for famous concert pianists all over the world, toured with many of them, and had endless stories to tell about his involvement with all sorts of musicians in many genres, including bluegrass and country/western greats. My mother loved the way the piano sounded when he tuned it. It sang. It was alive. It was joyous.

For whatever reason, Lee took an interest in me, and not only tolerated but encouraged my fascination with tuning. I didn’t think about it at the time, but I’m pretty sure he scheduled his visits differently to our house. I think he allowed extra time so that he could spend an hour or more explaining everything to me, playing notes, coaching me to hear subtleties.

And thus my love affair with the math, physics, and practice of tuning began.

Beats

The first great secret is that tuning isn’t about listening to the pitch of notes. While tuning, you don’t try to judge whether a note is too high or too low. You listen to something called beats instead.

Beats are fluctuations in volume created by two notes that are almost the same frequency.

When notes are not quite the same frequency, they’ll reinforce each other when the peaks occur together, and cancel each other out when the peaks are misaligned. Here’s a diagram of two sine waves of slightly different frequencies, and the sum of the two (in red).

Your ear will not hear two distinct notes if they’re close together. It’ll hear the sum.

Notice how the summed wave (the red wave) fluctuates in magnitude. To the human ear, this sounds like a note going “wow, wow, wow, wow.” The frequency of this fluctuation is the difference between the frequencies of the notes.
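
For example (the numbers here are just an illustration): if one string vibrates at 440 Hz and the other at 443 Hz, the combined sound swells and fades 443 - 440 = 3 times per second, so you hear three beats per second. Tune one string until the beating slows down and finally stops, and the two strings are at the same pitch.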

This is the foundation of all tuning by ear that isn’t based on guesswork.

Before you go on, tune two strings close together on your guitar or other instrument, and listen until you can hear it. Or, just fret one string so it plays the same note as an open string, and strike them together. Bend the string you’ve fretted, a little less, a little more. Listen until you hear the beats.

The Math of Pitch

Musical notes have mathematical relationships to one another. The exact relationships depend on the tuning. There are many tunings, but in this article I’ll focus on the tuning used for nearly all music in modern Western cultures: the 12-tone equal temperament tuning.

In this tuning, the octave is the fundamental interval of pitch. Notes double in frequency as they rise an octave, and the ratio of frequencies between each adjacent pair of notes is constant. Since there are twelve half-steps in an octave, the frequency increase from one note to the next is the twelfth root of 2, or about 1.059463094359293.
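
Written as a formula, where n is the number of half-steps above (positive) or below (negative) A440:

    f(n) = 440 Hz × 2^(n/12)

    f(-12) = 440 × 2^(-12/12) = 220.00 Hz   (A220, one octave down)
    f(-9)  = 440 × 2^(-9/12)  ≈ 261.63 Hz   (middle C)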

Staying with Western music, where we define the A above middle C to have the frequency of 440Hz, the scale from A220 to A440 is as follows:

Note      Frequency
=======   =========
A220       220.0000
A-sharp    233.0819
B          246.9417
C          261.6256
C-sharp    277.1826
D          293.6648
D-sharp    311.1270
E          329.6276
F          349.2282
F-sharp    369.9944
G          391.9954
G-sharp    415.3047
A440       440.0000

We’ll refer back to this later.

The Math Of Intervals

If you’ve ever sung in harmony or played a chord, you’ve used intervals. Intervals are named for the relative distance between two notes: a minor third, a fifth, and so on. These are a little confusing, because they sound like fractions. They’re not. A fifth doesn’t mean that one note is five times the frequency of another. A fifth means that if you start on the first note and count upwards five notes on a major scale, you’ll reach the second note in the interval. Here’s the C scale, with the intervals between the lowest C and the given note listed at the right:

Note   Name   Interval from C
====   ====   ===============
C      Do     Unison
D      Re     Major 2nd
E      Mi     Major 3rd
F      Fa     4th (sometimes called Perfect 4th)
G      So     5th (a.k.a. Perfect 5th)
A      La     Major 6th
B      Ti     Major 7th
C      Do     Octave (8th)

On the guitar, adjacent strings form intervals of fourths, except for the interval between the G and B strings, which is a major third.

Some intervals sound “good,” “pure,” or “harmonious.” A major chord, for example, is composed of the root (first note), major third, fifth, and octave. The chord sounds good because the intervals between the notes sound good. There’s a variety of intervals at play: between the third and fifth is a minor third, between the fifth and octave is a fourth, and so on.

It turns out that the intervals that sound the most pure and harmonious are the ones whose frequencies have the simplest relationships. In order of increasing complexity, we have:

  • Unison: two notes of the same frequency.
  • Octave: the higher note is double the frequency.
  • Fifth: the higher note is 3/2s the frequency.
  • Fourth: the higher note is 4/3rds the frequency.
  • Third: the higher note is 5/4ths the frequency.
  • Further intervals (minor thirds, sixths, etc) have various relationships, but the pattern of N/(N-1) doesn’t hold beyond the third.

These relationships are important for tuning, but beyond here it gets significantly more complex. This is where things are most interesting!

Overtones and Intervals

As a guitar player, you no doubt know about “harmonics,” also called overtones. You produce a harmonic by touching a string gently at a specific place (above the 5th, 7th, or 12th fret, for example) and plucking the string. The note that results sounds pure, and is higher pitched than the open string.

Strings vibrate at a base frequency, but these harmonics (they’re actually partials, but I’ll cover that later) are always present. In fact, much of the sound energy of a stringed instrument is in overtones, not in the fundamental frequency. When you “play a harmonic” you’re really just damping out most of the frequencies and putting more energy into simpler multiples of the fundamental frequency.

Overtones are basically multiples of the fundamental frequency. The octave, for example, is twice the frequency of the open string. Touching the string at the 12th fret is touching it at its halfway point. This essentially divides the string into two strings of half the length. The frequency of the note is inversely dependent on the string’s length, so half the length makes a note that’s twice the frequency. The seventh fret is at 1/3rd the length of the string, so the note is three times the frequency; the 5th fret is ¼th the length, so you hear a note two octaves higher, and so on.

The overtones give the instrument its characteristic sound. How many of them there are, their frequencies, their volumes, and their attack and decay determines how the instrument sounds. There are usually many overtones, all mixing together into what you usually think of as a single note.

Tuning depends on overtones, because you can tune an interval by listening to the beats in its overtones.

Take a fifth, for example. Recall from before that the second note in the fifth is 3/2 the frequency of the first. Let’s use A220 as an example; a fifth up from A220 is E330. E330 times two is E660, and A220 times three is E660 also. So by listening to the first overtone of the E, and the second overtone of the A, you can “hear a fifth.”

You’re not really hearing the fifth, of course; you’re really hearing the beats in the overtones of the two notes.

Practice Hearing Intervals

Practice hearing the overtones in intervals. Pick up your guitar and de-tune the lowest E string down to a D. Practice hearing its overtones. Pluck a harmonic at the 12th fret of that string and strike your open D string; listen to the beats between the notes. Now play both strings open, with no harmonics, at the same time. Listen again to the overtones, and practice hearing the beats between them. De-tune slightly if you need to, to make the “wow, wow, wow, wow” effect easier to notice.

Take a break; don’t overdo it. Your ear will probably fatigue quickly and you’ll be unable to hear the overtones, especially as you experiment more with complex intervals. In the beginning, you should not be surprised if you can focus on these overtones for only a few minutes before it gets hard to pick them out and things sound jumbled together. Rest for a few hours. I would not suggest doing this more than a couple of times a day initially.

The fatigue is real, by the way. As I mentioned previously, being able to hear beats and ignore the richness of the sound to pick out weak overtones is a complex physical, mental, and neurological skill — and there are probably other factors too. I’d be interested in seeing brain scans of an accomplished tuner at work. Lee Flory was not young, and he told me that his audiologist said his hearing had not decayed with age. This surprised the doctor, because he spent his life listening to loud sounds. Lee attributed this to daily training of his hearing, and told me that the ear is like any other part of the body: it can be exercised. According to Lee, if he took even a single day’s break from tuning, his ear lost some of its acuity.

Back to the topic: When you’re ready, pluck a harmonic on the lowest D string (formerly the E string) at the 7th fret, and the A string at the 12th fret, and listen to the beats between them. Again, practice hearing the same overtones (ignoring the base notes) when you strike both open strings at the same time.

When you’ve heard this, you can move on to a 4th. You can strike the harmonic at the 5th fret of the A string and the 7th fret of the D string, for example, and listen to the beats; then practice hearing the same frequencies by just strumming those two open strings together.

As you do all of these exercises, try your best to ignore the pitch (highness or lowness) of the notes, and listen only to the fluctuations in volume. In reality you’ll be conscious of both pitch and beats, but this practice will help develop your tuning ear.

Imperfect Intervals and Counting Beats

You may have noticed that intervals in the equal-tempered 12-tone tuning don’t have exactly the simple relationships I listed before. If you look at the table of frequencies above, for example, you’ll see that in steps of the 12th root of 2, E has a frequency of 329.6276Hz, not 330Hz.

Oh no! Was it all a lie? Without these relationships, does tuning fall apart?

Not really. In the equal-tempered tuning, in fact, there is only one perfect interval: the octave. All other intervals are imperfect, or “tempered.”

  • The 5th is a little “narrow” – the higher note in the interval is slightly flat
  • The 4th is a little “wide” – the higher note is sharp
  • The major 3rd is even wider than the 4th

Other intervals are wide or narrow, just depending on where their frequencies fall on the equal-tempered tuning. (In practice, you will rarely or never tune intervals other than octaves, 5ths, 4ths, and 3rds.)

As the pitch of the interval rises, so does the frequency of the beats. The 4th between A110 and the D above it will beat half as fast as the 4th an octave higher.

What this means is that not only do you need to hear beats, but you need to count them. Counting is done in beats per second. It sounds insanely hard at first (how the heck can you count 7.75 beats a second!?) but it will come with practice.

You will need to know how many beats wide or narrow a given interval will be. You can calculate it easily enough, and I’ll show examples later.
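Here’s one way to run those numbers, as a quick Python sketch (the note names and helper functions are mine, just for illustration). Equal-tempered frequencies are built from A440 in steps of the 12th root of 2, and the beat rate of an interval is the difference between its nearly-coinciding overtones: for a 5th compare 3x the lower note with 2x the upper, for a 4th compare 4x with 3x, and for a major 3rd compare 5x with 4x.

    # Beats per second of some equal-tempered intervals.
    A4 = 440.0

    def freq(semitones_from_a440):
        return A4 * 2 ** (semitones_from_a440 / 12.0)

    def beats(lower, upper, lower_overtone, upper_overtone):
        return abs(lower * lower_overtone - upper * upper_overtone)

    A2, Cs3, D3, E3 = freq(-24), freq(-20), freq(-19), freq(-17)
    A3, D4 = freq(-12), freq(-7)

    print("5th   A110-E165 :", beats(A2, E3, 3, 2))   # ~0.37 beats/sec, narrow
    print("4th   A110-D147 :", beats(A2, D3, 4, 3))   # ~0.50 beats/sec, wide
    print("4th   A220-D294 :", beats(A3, D4, 4, 3))   # ~0.99 -- twice as fast
    print("Maj3  A110-C#139:", beats(A2, Cs3, 5, 4))  # ~4.37 beats/sec, wide

The last line also hints at why thirds sound so much livelier than fourths and fifths: they beat several times per second even when perfectly tempered.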

After a while of tuning a given instrument, you’ll just memorize how many beats to count for specific intervals, because as you’ll see, there’s a system for tuning any instrument. You generally don’t need to have every arbitrary interval memorized. You will use only a handful of intervals and you’ll learn their beats.

Tuning The Guitar

With all that theory behind us, we can move on to a tuning system for the guitar.

Let’s list the strings, their frequencies, and some of their overtones.

String  Freq    Overtone_2  Overtone_3  Overtone_4  Overtone_5
======  ======  ==========  ==========  ==========  ==========
E       82.41   164.81      247.22      329.63      412.03
A       110.00  220.00      330.00      440.00      550.00
D       146.83  293.66      440.50      587.33      734.16
G       196.00  392.00      587.99      783.99      979.99
B       246.94  493.88      740.82      987.77      1234.71
E       329.63  659.26      988.88      1318.51     1648.14
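If you’d like to reproduce that table (or build one for another instrument), the computation is nothing more than multiplying each open string’s fundamental by 2, 3, 4, and 5. Here’s a short Python sketch; the fundamentals are the equal-tempered values carried to a few more decimal places so the rounding matches the table:

    # Each overtone column is n times the open string's fundamental (Hz).
    strings = [("E", 82.4069), ("A", 110.0000), ("D", 146.8324),
               ("G", 195.9977), ("B", 246.9417), ("E", 329.6276)]

    print(f"{'String':6} {'Freq':>8}" + "".join(f" {'Overtone_' + str(n):>11}" for n in range(2, 6)))
    for name, f in strings:
        cells = "".join(f" {n * f:11.2f}" for n in range(2, 6))
        print(f"{name:6} {f:8.2f}{cells}")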

Because the open strings of the guitar form 4ths and one 3rd, you can tune the guitar’s strings open, without any frets, using just those intervals. There’s also a double octave from the lowest E to the highest E, but you don’t strictly need to use that except as a check after you’re done.

For convenience, here’s the same table with only the overtones we’ll use.

String  Freq    Overtone_2  Overtone_3  Overtone_4  Overtone_5
======  ======  ==========  ==========  ==========  ==========
E       82.41               247.22      329.63
A       110.00              330.00      440.00
D       146.83              440.50      587.33      734.16
G       196.00              587.99                  979.99
B       246.94              740.82      987.77
E       329.63              988.88

Tuning the A String

The first thing to do is tune one of the strings to a reference pitch. After that, you’ll tune all of the other strings relative to this first one. On the guitar, the most convenient reference pitch is A440, because the open A string is two octaves below at 110Hz.

You’ll need a good-quality A440 tuning fork. I prefer a Wittner for guitar tuning; it’s a good-quality German brand that is compact, so it fits in your guitar case’s pocket, and has a small notch behind the ball at the end of the stem, so it’s easy to hold in your teeth if you prefer that.

Strike the tuning fork lightly with your fingernail, or tap it gently against your knee. Don’t bang it against anything hard or squeeze the tines, or you might damage it and change its pitch. You can hold the tuning fork against the guitar’s soundboard, or let it rest lightly between your teeth so the sound travels through your skull to your ears, and strike the open A string. Tune the A string until the beats disappear completely. Now put away the tuning fork and continue. You won’t adjust the A string after this.

If you don’t have a tuning fork, you can use any other reference pitch, such as the A on a piano, or a digitally produced A440.

Tuning the Low E String

Strike the open low E and A strings together, and tune the E string. Listen to the beating of the overtones at the frequency of the E two octaves higher. If you have trouble hearing it, silence all the strings, then pluck a harmonic on the E string at the 5th fret. Keep that tone in your memory and then sound the two strings together. It’s important to play both strings open and simultaneously, so that you don’t get confused by pitches. Remember, you’re trying to ignore pitch completely, and get your ear to isolate the sound of the overtone, ignoring everything but its beating.

When correctly tuned, the A string’s overtone will be at 330Hz and the E string’s will be at 329.63Hz, so the interval is 1/3rd of a beat per second wide. That is, you can tune the E string until the beats disappear, and then flatten the low E string very slightly until you hear one beat every three seconds. The result will be a very slow “wwwoooooowww, wwwwoooooowww” beating.

Tuning the D String

Now that the low E and A strings are tuned, strike the open A and D strings together. You’re listening for beats in the high A440 overtone. The A string’s overtone will be at 440Hz, and the D string’s will be at 440.50Hz, so the interval should be ½ beat wide. Tune the D string until the beats disappear, then sharpen the D string slightly until you hear one beat every 2 seconds.

Tuning the G String

Continue by striking the open D and G strings, and listen for the high D overtone’s beating. Again, if you have trouble “finding the note” with your ear, silence everything and strike the D string’s harmonic at the 5th fret. You’re listening for a high D overtone, two octaves higher than the open D string. The overtones will be at 587.33Hz and 587.99Hz, so the interval needs to be 2/3rds of a beat wide. Counting two beats every three seconds is a little harder than the other intervals we’ve used thus far, but it will come with practice. In the beginning, feel free to just give it your best wild guess. As we’ll discuss a little later, striving for perfection is futile anyway.

Tuning the B String

Strike the open G and B strings. The interval between them is a major 3rd, so this one is trickier to hear. A major 3rd’s frequency ratio is approximately 5/4ths, so you’re listening for the 5th overtone of the G string and the 4th overtone of the B string. Because these are higher overtones, they’re not as loud as the ones you’ve been using thus far, and it’s harder to hear.

To isolate the note you need to hear, mute all the strings and then pluck a harmonic on the B string at the 5th fret. The overtone is a B two octaves higher. Search around on the G string near the 4th fret and you’ll find the same note.

The overtones are 979.99Hz and 987.77Hz, so the interval is seven and three-quarters beats wide. This will be tough to count at first, so just aim for something about 8 beats and call it good enough. With time you’ll be able to actually count this, but it will be very helpful at first to use some rules of thumb. For example, you can compare the rhythm of the beating to the syllables in the word “mississippi” spoken twice per second, which is probably about as fast as you can say it back-to-back without pause.

Tune the B string until the beats disappear, then sharpen it 8 beats, more or less.

Tuning the High E String

You’re almost done! Strike the open B and E strings, and listen for the same overtone you just used to tune the G and B strings: a high B. The frequencies are 987.77Hz and 988.88Hz, so the interval is 1.1 beats wide. Sharpen the E string until the high B note beats a little more than once a second.

Testing The Results

Run a couple of quick checks to see whether you got things right. First, check your high E against your low E. They are two octaves apart, so listen to the beating of the high E string. It should be very slow or nonexistent. If there’s a little beating, don’t worry about it. You’ll get better with time, and it’ll never be perfect anyway, for reasons we’ll discuss later.

You can also check the low E against the open B string, and listen for beating at the B note, which is the 3rd overtone of the E string. The B should be very slightly narrow (flat) — theoretically, you should hear about ¼th of a beat.

Also theoretically, you could tune the high B and E strings against the low open E using the same overtones. However, due to imperfections in strings and the slowness of the beating, this is usually much harder to do, and you’re likely to end up with high strings that don’t sound good together. A general rule of thumb is that it’s easier to hear out-of-tune-ness in notes that are a) closer in pitch and b) higher pitched, so you should generally “tune locally” rather than “tuning at a distance.” If you don’t get the high strings tuned well together, you’ll get really ugly-sounding intervals such as the following:

  • the 5th between your open G string and the D on the 3rd fret of the B string
  • the 5th between the A on the second fret of the G string and the open high E string
  • the octave between your open G string and the G on the 3rd fret of the high E string
  • the octave between your open D string and the D on the 3rd fret of the B string
  • the 5th between the E on the second fret of the D string and the open B string

If those intervals are messed up, things will sound badly discordant. Remember that the 5ths should be slightly narrow, not perfect. But the octaves should be perfect, or very nearly so.

Play a few quick chords to test the results, too. An E major, G major, and B minor are favorites of mine. They have combinations of open and fretted notes that help make it obvious if anything’s a little skewed.

You’re Done!

With time, you’ll be able to run through this tuning system very quickly, and you’ll end up with a guitar that sounds joyously alive in all keys, no matter what chord you play. No more fussing with “this chord sounds good, but that one is awful!” No more trial and error. No more guessing which string is out of tune when something sounds bad. No more game of “tuning whack-a-mole.”

To summarize:

  • Tune the A string with a tuning fork.
  • Tune the low E string 1/3 of a beat wide relative to the A.
  • Tune the D string ½ of a beat wide relative to the A.
  • Tune the G string 2/3 of a beat wide relative to the D.
  • Tune the B string 7 ¾ beats wide relative to the G.
  • Tune the high E string just over 1 beat wide relative to the B.
  • Cross-check the low and high E strings, and play a few chords.

This can be done in a few seconds per string.
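Here’s the whole system reduced to a few lines of Python, as a cross-check of the beat rates above (the octave labels like E2 and E4 are mine, just to tell the two E strings apart):

    # For each tuning step, compare the overtones named in the text and print
    # the target beat rate. All of these intervals are tuned wide.
    open_strings = {"E2": 82.4069, "A2": 110.0, "D3": 146.8324,
                    "G3": 195.9977, "B3": 246.9417, "E4": 329.6276}

    steps = [("E2", "A2", 4, 3),   # 4th:       tune low E against A  -> ~1/3 beat
             ("A2", "D3", 4, 3),   # 4th:       tune D against A      -> ~1/2 beat
             ("D3", "G3", 4, 3),   # 4th:       tune G against D      -> ~2/3 beat
             ("G3", "B3", 5, 4),   # major 3rd: tune B against G      -> ~7 3/4 beats
             ("B3", "E4", 4, 3)]   # 4th:       tune high E against B -> ~1.1 beats

    for lo, hi, m, n in steps:
        f_lo, f_hi = open_strings[lo], open_strings[hi]
        print(f"{lo}-{hi}: {m}x{f_lo:.2f} vs {n}x{f_hi:.2f} -> "
              f"{abs(m * f_lo - n * f_hi):.2f} beats/sec wide")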

If you compare your results to what you’ll get from a digital tuner, you’ll find that with practice, your ear is much better. It’s very hard to tune within a Hz or so with a digital tuner, in part because the indicators are hard to read. What you’ll get with a digital tuner is that most strings end up pretty close to their correct frequencies. That is still a lot better than the ad-hoc tuning by trial-and-error you might have been accustomed to doing, because that method results in some intervals being tuned to sound good but others badly discordant. The usual scenario I see is that someone’s B string is in good shape, but the G and the E are out of tune. The guitar player then tunes the B string relative to the out-of-tune E and G, and then everything sounds awful. This is because the guitarist had no frame of reference for understanding which strings were out of tune in which directions.

But when you tune by listening to beats, and get good at it, you’ll be able to tune strings to within a fraction of a cycle per second of what they should be. Your results will absolutely be better than a digital tuner’s.

I don’t mean to dismiss digital tuners. They’re very useful when you’re in a noisy place, or when you’re tuning things like electric guitars, which have distortion that buries overtones in noise. But if you learn to tune by hearing beats, you’ll be the better for it, and you’ll never regret it, I promise. By the way, if you have an Android smartphone, I’ve had pretty good results with the gStrings app.

Advanced Magic

If you do the math on higher overtones, you’ll notice a few other interesting intervals between open strings. As your ear sharpens, you’ll be able to hear these, and use them to cross-check various combinations of strings. This can be useful because as you get better at hearing overtones and beats, you’ll probably start to become a bit of a perfectionist, and you won’t be happy unless particular intervals (such as the 5ths and octaves mentioned just above) sound good. Here they are:

  • Open A String to Open B String. The 9th overtone of the open A string is a high B note at 990Hz, and the 4th overtone of the open B is a high B at 987.77Hz. If you can hear this high note, you should hear it beating just over twice per second. The interval between the A and B strings is a major 9th (an octave plus a whole step), which should be slightly narrow. Thus, if you tune the B until the beating disappears, you should then flatten it two beats.
  • Open D String to Open E String. This is also a major 9th. You’re listening for a very high E note, at 1321.5Hz on the D string and 1318.5Hz on the E string, which is 3 beats narrow.
  • Open D String to Open B String. The 5th overtone of the D string is close to the 3rd overtone of the B string. This interval is about 6 and 2/3 beats wide. This is a bit hard to hear at first, but you’re listening for a high F-sharp.

Systems for Tuning Arbitrary Instruments

The guitar is a fairly simple instrument to tune, because it has only 6 strings, and 4ths are an easy interval to tune. The inclusion of a major 3rd makes it a little harder, but not much.

It is more complicated, and requires more practice, to tune instruments with more strings. The most general approach is to choose an octave, and to tune all the notes within it. Then you extend the tuning up and down the range as needed. For example, to tune the piano you first tune all the notes within a C-to-C octave (piano tuners typically use a large middle-C tuning fork).

Once you have your first octave tuned, the rest is simple. Each note is tuned to the octave below it or above it. But getting that first octave is a bit tricky.

There are two very common systems of tuning: fourths and fifths, and thirds and fifths. As you may know, the cycle of fifths will cycle you through every note in the 12-note scale. You can cycle through the notes in various ways, however.

The system of thirds and fifths proceeds from middle C up a fifth to G, down a third to E-flat, up a fifth to B-flat, and so on. The system of fourths and fifths goes from C up a fifth to G, down a fourth to D, and so on.
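As a rough sketch of the fourths-and-fifths idea (the note spellings here are just illustrative, not any particular tuner’s sequence), moving by fifths visits every one of the twelve pitch classes exactly once; going up a fifth or down a fourth lands on the same pitch class, which is how a tuner stays within one octave:

    # Moving by fifths (7 semitones, mod 12) cycles through all twelve notes.
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    pitch_class = 0                   # start on C
    sequence = [names[pitch_class]]
    for _ in range(11):
        pitch_class = (pitch_class + 7) % 12   # up a 5th, or equivalently down a 4th
        sequence.append(names[pitch_class])

    print(" -> ".join(sequence))
    # C -> G -> D -> A -> E -> B -> F# -> C# -> G# -> D# -> A# -> F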

All you need to do is calculate the beats in the various intervals and be able to count them. The piano tuners I’ve known prefer thirds and fifths because if there are imperfections in the thirds, especially if they’re not as wide as they should be, it sounds truly awful. Lively-sounding thirds are important; fourths and fifths are nearly perfect, and should sound quite pure, but a third is a complex interval with a lot of different things going on. Fourths and fifths also beat slowly enough that it’s easy to misjudge and get an error that accumulates as you go through the 12 notes. Checking the tuning with thirds helps avoid this.

Tuning a Hammered Dulcimer

I’ve built several many-stringed instruments, including a couple of hammered dulcimers. My first was a home woodworking project with some two-by-four lumber, based on plans from a book by Phillip Mason I found at the library and decided to pick up on a whim. For a homebuilt instrument, it sounded great, and building an instrument like this is something I highly recommend.

Later I designed and built a second one, pictured below. Pardon the dust!

Tuning this dulcimer takes a while. I start with an octave on the bass course. Dulcimers can have many different tunings; this one follows the typical tuning of traditional dulcimers, which is essentially a set of changing keys that cycle backwards by fifths as you climb the scale. Starting at G, for example, you have a C major scale up to the next G, centered around middle C. But the next B is B-flat instead of B-natural, so there’s an F major scale overlapping with the top of the C major, and so on:

G A B C D E F G A B-flat C D...

It’s easy to tune this instrument in fourths and fifths because of the way its scales are laid out. If I do that, however, I find that I have ugly-sounding thirds more often than not. So I’ll tune by combinations of fifths, fourths, and thirds:

G A B C D E F G A B-flat C D...
^-------------^                  (up an octave)
      ^-------^                  (down a fifth)
      ^---^                      (up a third)
  ^-------^                      (down a fifth)

And so on. In addition to using thirds where I can (G-B, C-E), I’ll check my fifths and fourths against each other. If you do the math, you’ll notice that the fourth from G up to C is wide by exactly as many beats as the fifth from that C up to the next G is narrow. (This is a general rule of fourths and fifths. Another rule is that the fourth at the top of the octave beats twice as fast as the fifth at the bottom; so G-D beats half as fast as D-G.)
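Those two parenthetical rules are easy to verify numerically. Here’s a quick Python check using equal-tempered frequencies around middle C (the particular octave is just for illustration):

    # A 4th compares 4x the lower note with 3x the upper; a 5th compares 3x with 2x.
    def freq(semitones_from_a440):
        return 440.0 * 2 ** (semitones_from_a440 / 12.0)

    G3, C4, D4, G4 = freq(-14), freq(-9), freq(-7), freq(-2)

    print(abs(4 * G3 - 3 * C4))   # 4th G3-C4: ~0.89 beats/sec, wide
    print(abs(3 * C4 - 2 * G4))   # 5th C4-G4: ~0.89 beats/sec, narrow -- same size
    print(abs(3 * G3 - 2 * D4))   # 5th G3-D4: ~0.66 beats/sec, narrow
    print(abs(4 * D4 - 3 * G4))   # 4th D4-G4: ~1.32 beats/sec, wide -- twice as fast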

When I’m done with this reference octave, I’ll extend it up the entire bass course, adjusting for B-flat by tuning it relative to F, and checking any new thirds that I encounter as I climb the scale. And then I’ll extend that over to the right-hand side of the treble course. I do not use the left-hand (high) side of the treble course to tune, because its notes are inaccurate depending on the placement of the bridge.

With a little math (spreadsheets are nice), and some practice, you can find a quick way to tune almost any instrument, along with cross-checks to help prevent skew as you go.

Tuning a Harp

Another instrument I built (this time with my grandfather) is a simplified replica of the Scottish wire-strung Queen Mary harp. This historical instrument might have been designed for gold and silver strings, according to Ann Heymann’s research. In any case, it is quite difficult to tune with bronze or brass strings. It is “low-headed” and would need a much higher head to work well with bronze or brass.

Tuning this harp is quite similar to the hammered dulcimer, although it is in a single key, so there’s no need to adjust to key changes as you climb the scale. A simple reference octave is all you need, and then it’s just a matter of extending it. I have never tuned a concert harp, but I imagine it’s more involved.

Tangent: I first discovered the wire-strung harp in 1988, when I heard Patrick Ball’s first volume of Turlough O’Carolan’s music. If you have not listened to these recordings, do yourself a favor and at least preview them on Amazon. All these years later, I still listen to Patrick Ball’s music often. His newest recording, The Wood of Morois, is just stunning. I corresponded with Patrick while planning to build my harp, and he put me in touch with master harpmaker Jay Witcher, and his own role model, Ann Heymann, who was responsible for reinventing the lost techniques of playing wire-strung harps. Her recordings are a little hard to find in music stores, but are worth it. You can buy them from her websites http://www.clairseach.com/, http://www.annheymann.com/, and http://www.harpofgold.net/. If you’re interested in learning to play wire-strung harp, her book is one of the main written sources. There are a variety of magazines covering the harp renaissance in the latter part of the 20th century, and they contain much valuable additional material.

Beyond Tuning Theory: The Real World

Although simple math can compute the theoretically correct frequencies of notes and their overtones, and thus the beats of various intervals, in practice a number of factors make things more complicated and interesting. In fact, the math up until now has been of the “frictionless plane” variety. For those who are interested, I’ll dig deeper into these nuances.

The nuances and deviations from perfect theory are the main reasons why a) it’s impossible to tune anything perfectly and b) an instrument that’s tuned skillfully by ear sounds glorious, whereas an instrument tuned digitally can sound lifeless.

Harmonics, Overtones, and Partials

I was careful to use the term “overtone” most of the time previously. In theory, a string vibrates at its fundamental frequency, and then it has harmonic overtones at twice that frequency, three times, and so on.

However, that’s not what happens in practice, because the theory only applies to an ideal string with no stiffness. The stiffness of a real string causes its overtones to vibrate at slightly higher frequencies than you’d expect, so these overtones aren’t true harmonics. This is called inharmonicity, and inharmonic overtones are called partials to distinguish them from the purely harmonic overtones of an instrument like a flute, which doesn’t exhibit the same effect.
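One common textbook model of a stiff string raises the nth partial from n times the fundamental to n times the fundamental times sqrt(1 + B·n²), where B is a small inharmonicity coefficient that depends on the string. Here’s a Python sketch with a made-up B, purely to show the shape of the effect (real strings each have their own measured value):

    import math

    f1 = 246.94   # open B string fundamental, in Hz
    B = 0.0004    # inharmonicity coefficient; illustrative only, not a measurement

    for n in range(1, 7):
        ideal = n * f1
        stretched = n * f1 * math.sqrt(1 + B * n * n)
        print(f"partial {n}: ideal {ideal:8.2f} Hz, stretched {stretched:8.2f} Hz "
              f"(+{stretched - ideal:.2f})")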

You might think that this inharmonicity is a bad thing, but it’s not. Common examples of tones with a great deal of inharmonicity are bells (which often have so much inharmonicity that you can hear that the pitches of their partials are too high) and various types of chimes. I keep a little “zenergy” chime near my morning meditation table because its bright tones focus my attention. I haven’t analyzed its spectrum, but because it is made with thick bars of aluminum, I’m willing to bet that it has partials that are wildly inharmonic. Yet it sounds pure and clear.

Much of the richness and liveliness of a string’s sound is precisely because of the “stretched” overtones. Many people compare Patrick Ball’s brass-strung wire harp to the sound of bells, and say it’s “pure.” It may sound pure, but pure-sounding is not simple-sounding. Its tones are complex and highly inharmonic, which is why it sounds like a bell.

In fact, if you digitally alter a piano’s overtones to correct the stretching, you get something that sounds like an organ, not a piano. This is one of the reasons that pianos tuned with digital tuners often sound like something scraped from the bottom of a pond.

Some digital tuners claim to compensate for inharmonicity, but in reality each instrument and its strings are unique and will be inharmonic in different ways.

Some practical consequences when tuning by listening to beats:

  • Don’t listen to higher partials while tuning. When tuning an octave, for example, you should ignore the beating of partials 2 octaves up. This is actually quite difficult to do and requires a well-developed ear. The reason is that higher partials will beat even when the octave is perfect, and they beat more rapidly and more obviously than the octave. Tuning a perfect octave requires the ability to hear very subtle, very gradual beats while blocking out distractions. This is also why I said not to worry if your low E string and high E string beat slightly. When tuned as well as possible, there will probably be a little bit of beating.
  • You might need to ignore higher partials in other intervals as well.
  • You might need to adjust your tuning for stretching caused by inharmonicity. In practice, for example, most guitars need to be tuned to slightly faster beats than you’d expect from pure theory.
  • Cross-checking your results with more complex intervals (especially thirds) can help balance the stretching better, and make a more pleasing-sounding tuning.
  • You might find that when using the “advanced tricks” I mentioned for the guitar, the open intervals such as minor 7ths will beat at different rates than you’d predict mathematically. However, once you are comfortable tuning your guitar so it sounds good, you’ll learn how fast those intervals should beat and it’ll be a great cross-reference for you.
Sympathetic and False Beats

It’s often very helpful to mute strings while you’re tuning other strings. The reason is that the strings you’re tuning will set up sympathetic vibrations in other strings that have similar overtones, and this can distract you.

When tuning the guitar, this generally isn’t much of a problem. However, be careful that when you tune the low E and A strings you don’t get distracted by vibrations from the high E string.

When tuning other instruments such as a hammered dulcimer or harp, small felt or rubber wedges (with wire handles if possible) are invaluable. If you don’t have these, you can use small loops of cloth.

In addition to distraction from sympathetic vibrations, strings can beat alone, when no other note is sounding. This is called a false beat. It’s usually caused by a flaw in the string itself, such as an imperfection in the wire or a spot of rust. This is a more difficult problem, because you can’t just make it go away. Instead, you will often have to nudge the tuning around a little here, a little there, to make it sound the best you can overall, given that there will be spurious beats no matter what. False beats will challenge your ear greatly, too.

In a guitar, false beats might signal that it’s time for a new set of strings. In a piano or other instrument, strings can be expensive to replace, and new strings take a while to settle in, so it’s often better to just leave it alone.

Imperfect Frets, Strings, Bridges and Nuts

I’ve never played a guitar with perfect frets. The reality is that every note you fret will be slightly out of tune, and one goal of tuning is to avoid any combination of bad intervals that sounds particularly horrible.

This is why it’s helpful to play at least a few chords after tuning. If you tune a particular instrument often you’ll learn the slight adjustments needed to make things sound as good as possible. On my main guitar, for example, the B string needs to be slightly sharp so that a D sounds better.

It’s not only the frets, but the nut (the zeroth fret) and the bridge (under the right hand) that matter. Sometimes the neck needs to be adjusted as well. A competent guitar repairman should be able to adjust the action if needed.

Finally, the weight and manufacture of the strings makes a difference. My main guitar and its frets and bridge sound better and more accurate with medium-weight Martin bronze-wound strings than other strings I’ve tried. As your ear improves, you’ll notice subtleties like this.

New Strings

New strings (or wires) will take some time to stretch and settle in so they stay in tune. You can shorten this time by playing vigorously and stretching the strings, bending them gently. Be careful, however, not to be rough with the strings. If you kink them or strain them past their elastic limit, you’ll end up with strings that have false beats, exaggerated inharmonicity, or different densities along some lengths of the string, which will make it seem like your frets are wrong in strange ways.

The Instrument Flexes and Changes

If an instrument is especially out of tune, the first strings you tune will become slightly out of tune as you change the tension on the rest of the instrument. The best remedy I can offer for this is to do a quick approximate tuning without caring much about accuracy. Follow this up with a second, more careful tuning.

This was especially a problem with my first hammered dulcimer, and is very noticeable with my harp, which flexes and changes a lot as it is tuned. My second hammered dulcimer has a ¾ inch birch plywood back and internal reinforcements, so it’s very stable. On the downside, it’s heavy!

Temperature and humidity play a large role, too. All of the materials in an instrument respond in different ways to changes in temperature and humidity. If you have a piano, you’re well advised to keep it in a climate-controlled room. If you’re a serious pianist you already know much more than I do about this topic.

Friction and Torque in Tuning Pins and Bridges

For guitarists, it’s important to make sure that your nut (the zeroth fret) doesn’t pinch the string and cause it to move in jerks and starts, or to have extra tension built up between the nut and the tuning peg itself. If this happens, you can rub a pencil in the groove where the string rides. The graphite in the pencil is a natural lubricant that can help avoid this problem.

Of course, you should also make sure that your tuning pegs and their machinery are smooth and well lubricated. If there’s excessive slop due to wear-and-tear or cheap machinery, that will be an endless source of frustration for you.

On instruments such as pianos, hammered dulcimers, and harps, it’s important to know how to “set” the tuning pin. While tuning the string upwards, you’ll create torque on the pin, twisting it in the hole. The wood fibers holding it in place will also be braced in a position that can “flip” downwards. If you just leave the pin like this, it will soon wiggle itself back to its normal state, and even beyond that due to the tension the wire places on the pin. As a result, you need to practice tuning the note slightly higher than needed, and then de-tuning it, knocking it down to the desired pitch with a light jerk and leaving it in a state of equilibrium.

This technique is also useful in guitars and other stringed instruments, but each type of tuning machine has its own particularities. The main point to remember is that if you don’t leave things in a state of equilibrium and stability, they’ll find one soon enough, de-tuning the instrument in the process.

References and Further Reading

I tried to find the book from which I studied tuning as a child, but I can’t find it anymore. I thought it was an old Dover edition. The Dover book on tuning that I can find is not the one I remember.

You can find a little bit of information at various places online. One site with interesting information is Historical Tuning of Keyboard Instruments by Robert Chuckrow. I looked around on Wikipedia but didn’t find much of use. Please suggest further resources in the comments.

In this post I discussed the equally-tempered tuning, but there are many others. The study of them and their math, and the cultures and musical histories related to them, is fascinating. Next time you hear bagpipes, or a non-Western instrument, pay attention to the tuning. Is it tempered? Are there perfect intervals other than the octave? Which ones?

Listening to windchimes is another interesting exercise. Are the chimes harmonic or do they have inharmonicity? What scales and tunings do they use? What are the effects? Woodstock chimes use many unique scales and tunings. Many of their chimes combine notes in complex ways that result in no beating between some or all of the tones. Music of the Spheres also makes stunning chimes in a variety of scales and tunings.

As I mentioned, spreadsheets can be very helpful in computing the relationships between various notes and their overtones. I’ve made a small online spreadsheet that contains some of the computations I used to produce this blog post.

Let me know if you can suggest any other references or related books, music, or links.

Enjoy your beautifully tuned guitar or other instrument, and most of all, enjoy the process of learning to tune and listen! I hope it enriches your appreciation and pleasure in listening to music.

Categories: MySQL

A review of Bose, Sony, and Sennheiser noise-cancelling headphones

Xaprb, home of innotop - Thu, 2014-01-16 00:00

I’ve used active noise-cancelling headphones for over ten years now, and have owned several pairs of Bose, one of Sony, and most recently a pair of Sennheiser headphones. The Sennheisers are my favorites. I thought I’d write down why I’ve gone through so many sets of cans and what I like and dislike about them.

Bose QuietComfort 15 Acoustic Noise Cancelling Headphones

I’m sure you’re familiar with Bose QuietComfort headphones. They’re the iconic “best-in-class” noise-cancelling headphones, the ones you see everywhere. Yet, after owning several pairs (beginning with Quiet Comfort II in 2003), I decided I’m not happy with them and won’t buy them anymore. Why not?

  • They’re not very good quality. I’ve worn out two pairs and opted to sell the third pair that Bose sent me as a replacement. Various problems occurred, including torn speakers that buzzed and grated. I just got tired of sending them back to Bose for servicing.
  • They’re more expensive than I think they’re worth, especially given the cheap components used.
  • They don’t sound bad – but to my ears they still have the classic Bose fairy-dust processing, which sounds rich and pleasant at first but then fatigues me.
  • They produce a sensation of suction on the eardrums that becomes uncomfortable over long periods of time.
  • They can’t be used in non-cancelling mode. In other words, if the battery is dead, they’re unusable.
  • On a purely personal note, I think Bose crosses the line into greed and jealousy. I know this in part because I used to work at Crutchfield, and saw quite a bit of interaction with Bose. As an individual – well, try selling a pair of these on eBay, and you’ll see what I mean. I had to jump through all kinds of hoops after my first listing was cancelled for using a stock photo that eBay themselves suggested and provided in the listing wizard. Here is the information the take-down notice directed me to.

On the plus side, the fit is very comfortable physically, they cancel noise very well, and they’re smaller than some other noise-cancelling headphones. Also on the plus side, every time I’ve sent a pair in for servicing, Bose has just charged me $100 and sent me a new pair.

Sony MDR-NC200D

When I sent my last pair of Bose in for servicing, they replaced them with a factory-sealed pair of new ones in the box, and I decided to sell them on eBay and buy a set of Sony MDR-NC200D headphones, which were about $100 less than new Bose headphones at the time. I read online reviews and thought it was worth a try.

First, the good points. The Sonys are more compact even than the Bose, although as I recall they’re a little heavier. And the noise cancellation works quite well. The passive noise blocking (muffling) is in itself quite good. You can just put them on without even turning on the switch, and block a lot of ambient noise. The sound quality is also quite good, although there is a slight hiss when noise cancellation is enabled. Active cancellation is good, but not as good as the Bose.

However, it wasn’t long before I realized I couldn’t keep them. The Sonys sit on the ear, and don’t enclose the ear and sit against the skull as the Bose do. They’re on-the-ear, not over-the-ear. Although this doesn’t feel bad at first, in about 20 minutes it starts to hurt. After half an hour it’s genuinely painful. This may not be your experience, but my ears just start to hurt after being pressed against my head for a little while.

I had to sell the Sonys on eBay too. My last stop was the Sennheisers.

Sennheiser PXC 450 NoiseGard Active Noise-Canceling Headphones

The Sennheiser PXC 450 headphones are midway in price between the Bose and the Sony: a little less expensive than the Bose. I’ve had them a week or so and I’m very happy with them so far.

This is not the first pair of Sennheisers I’ve owned. I’ve had a pair of open-air higher-end Sennheisers for over a decade. I absolutely love them, so you can consider me a Sennheiser snob to some extent.

I’m pleased to report that the PXC 450s are Sennheisers through and through. They have amazing sound, and the big cups fit comfortably around my ears. They are a little heavier than my other Sennheisers, but still a pleasure to wear.

The nice thing is that not only does noise cancellation work very well (on par with Bose’s, I’d say), but there is no sensation of being underwater with pressure or suction on the eardrums. Turn on the noise cancellation switch and the noise just vanishes, but there’s no strange feeling as a result. Also, these headphones can work in passive mode, with noise cancellation off, and don’t need a battery to work.

On the downside, if you want to travel with them, they’re a little bigger than the Bose. However I’ve travelled with the Bose headphones several times and honestly I find even them too large to be convenient. I don’t use noise-cancelling headphones for travel, as a result.

Another slight downside is that the earcups aren’t completely “empty” inside. There are some caged-over protrusions with the machinery inside. Depending on the shape of your ears, these might brush your ears if you move your head. I find that if I don’t place the headphones in the right spot on my head, they do touch my ears every now and then.

Summary

After owning several pairs of top-rated noise-cancelling headphones, I think the Sennheisers are the clear winners in price, quality, comfort, and sound. Your mileage may vary.

Categories: MySQL

Xaprb now uses Hugo

Xaprb, home of innotop - Wed, 2014-01-15 00:00

I’ve switched this blog from Wordpress to Hugo. If you see any broken links or other problems, let me know. I’ll re-enable comments and other features in the coming days.

Why not Wordpress? I’ve used Wordpress since very early days, but I’ve had my fill of security problems, the need to worry about whether a database is up and available, backups, plugin compatibility problems, upgrades, and performance issues. In fact, while converting the content from Wordpress to Markdown, I found a half-dozen pages that had been hacked by some link-farm since around 2007. This wasn’t the first such problem I’d had; it was merely the only one I hadn’t detected and fixed. And I’ve been really diligent with Wordpress security; I have done things like changing my admin username and customizing my .htaccess file to block common attack vectors, in addition to the usual “lockdown” measures that one takes with Wordpress.

In contrast to Wordpress or other CMSes that use a database, static content is secure, fast, and worry-free. I’m particularly happy that my content is all in Markdown format now. Even if I make another change in the future, the content is now mostly well-structured and easy to transform as desired. (There are some pages and articles that didn’t convert so well, but I will clean them up later.)

Why Hugo? There are lots of static site generators. Good ones include Octopress and Jekyll, and I’ve used those. However, they come with some of their own annoyances: dependencies, the need to install Ruby and so on, and particularly bothersome for this blog, performance issues. Octopress ran my CPU fan at top speed for about 8 minutes to render this blog.

Hugo is written in Go, so it has zero dependencies (a single binary) and is fast. It renders this blog in a couple of seconds. That’s fast enough to run it in server mode, hugo server -w, and I can just alt-tab back and forth between my writing and my browser to preview my changes. By the time I’ve tabbed over, the changes are ready to view.

Hugo isn’t perfect. For example, it lacks a couple of features that are present in Octopress or Jekyll. But it’s more than good enough for my needs, and I intend to contribute some improvements to it if I get time. I believe it has the potential to be a leading static site/blog generator going forward. It’s already close to a complete replacement for something like Jekyll.

Categories: MySQL

Immutability, MVCC, and garbage collection

Xaprb, home of innotop - Sat, 2013-12-28 20:33

Not too long ago I attended a talk about a database called Datomic. My overall impressions of Datomic were pretty negative, but this blog post isn’t about that. This is about one of the things the speaker referenced a lot: immutability and its benefits. I hope to illustrate, if only sketchily, why a lot of sophisticated databases are actually leaps and bounds beyond the simplistic design of such immutable databases. This is in direct contradiction to what proponents of Datomic-like systems would have you believe; they’d tell you that their immutable database implementations are advanced. Reality is not so clear-cut.

Datomic and Immutability

The Datomic-in-a-nutshell is that it (apparently) uses an append-only B-tree to record data, and never updates any data after it’s written. I say “apparently” because the speaker didn’t know what an append-only B-tree was, but his detailed description matched AOBTs perfectly.

Why is this a big deal? Immutable data confers a lot of nice benefits. Here’s an incomplete summary:

  • It’s more cacheable.
  • It’s easier to reason about.
  • It’s less likely to get corrupted from bugs and other problems.
  • You can rewind history and view the state at any point in the past, by using an “old” root for the tree.
  • Backups are simple: just copy the file, no need to take the database offline. In fact, you can do continuous backups.
  • Replication is simple and fast.
  • Crash recovery is simple and fast.
  • It’s easier to build a reliable system on unreliable components with immutability.

In general, immutability results in a lot of nice, elegant properties that just feel wonderful. But this is supposed to be the short version.

Prior Art

Datomic is not revolutionary in this sense. I have seen at least two other databases architected similarly. Their creators waxed eloquent about many of the same benefits. In fact, in 2009 and 2010, you could have listened to talks from the architects of RethinkDB, and if you just searched and replaced “RethinkDB” with “Datomic” you could have practically interchanged the talks. The same is true of CouchDB. Just to list a few links to RethinkDB’s history: 1, 2, 3.

That last one links to Accountants Don’t Use Erasers, a blog post that brought append-only storage into the minds of many people at the time.

Beyond databases, don’t forget about filesystems, such as ZFS for example. Many of the same design techniques are employed here.

Back to RethinkDB. Strangely, around 2011 or so, nobody was talking about its append-only design anymore. What happened?

Append-Only Blues

Immutability, it turns out, has costs. High costs. Wait a bit, and I’ll explain how those costs are paid by lots of databases that don’t build so heavily around immutability, too.

Even in 2010, Slava Akhmechet’s tone was changing. He’d begin his talks singing append-only immutability to the heavens, and then admit that implementation details were starting to get really hard. It turns out that there are a few key problems with append-only, immutable data structures.

The first is that space usage grows forever. Logically, people insert facts, and then update the database with new facts. Physically, if what you’re doing is just recording newer facts that obsolete old ones, then you end up with outdated rows. It may feel nice to be able to access those old facts, but the reality is most people don’t want that, and don’t want to pay the cost (infinitely growing storage) for it.

The second is fragmentation. If entities are made of related facts, and some facts are updated but others aren’t, then as the database grows and new facts are recorded, an entity ends up being scattered widely over a lot of storage. This gets slow, even on SSDs with fast random access.

The last is that a data structure or algorithm that’s elegant and pure, but has one or more worst cases, will fall apart rather violently in real-world usage. That’s because real-world usage is much more diverse than you’d suspect. A database that has a “tiny worst-case scenario” will end up hitting that worst-case behavior for something rather more than a tiny fraction of its users; probably a significant majority. An easy example in a different domain is sort algorithms. Nobody implements straightforward best-performance-most-of-the-time sort algorithms because if they do, things go to hell in a handbasket rather quickly. Databases end up with similar hard cases to handle.

There are more problems, many of them much harder to talk about and understand (dealing with concurrency, for example), but these are the biggest, most obvious ones I’ve seen.

As a result, you can see RethinkDB quickly putting append-only, immutable design behind them. They stopped talking and writing about it. Their whitepaper, “Rethinking Database Storage”, is gone from their website (rethinkdb.com/papers/whitepaper.pdf) but you can get it from the wayback machine.

Reality sunk in and they had to move on from elegant theories to the bitterness of solving real-world problems. Whenever you hear about a new database, remember this: this shit is really, really, really hard. It typically takes many years for a database or storage engine to become production-ready in the real world.

This blog post isn’t about RethinkDB, though. I’m just using their evolution over time as an example of what happens when theory meets reality.

The CouchDB Problem

Around the same time as RethinkDB, a new NoSQL database called CouchDB was built on many of the same premises. In fact, I even blogged a quick overview of it as it started to become commercialized: A gentle introduction to CouchDB for relational practitioners.

CouchDB had so many benefits from using immutability. MVCC (multi-version concurrency control), instant backup and recovery, crash-only design. But the big thing everyone complained about was… compaction. CouchDB became a little bit legendary for compaction.

You see, CouchDB’s files would grow forever (duh!) and you’d fill up your disks if you didn’t do something about it. What could you do about it? CouchDB’s answer was that you would periodically save a complete new database, without old versions of documents that had been obsoleted. It’s a rewrite-the-whole-database process. The most obvious problem with this was that you had to reserve twice as much disk space as you needed for your database, because you needed enough space to write a new copy. If your disk got too full, compaction would fail because there wasn’t space for two copies.

And if you were writing into your database too fast, compaction would never catch up with the writes. And there were a host of other problems that could potentially happen.

Datomic has all of these problems too, up to and including stop-the-world blocking of writes (which in my book is complete unavailability of the database).

ACID MVCC Relational Databases

It turns out that there is a class of database systems that has long been aware of the problems with all three of the databases I’ve mentioned so far. Oracle, SQL Server, MySQL (InnoDB), and PostgreSQL all have arrived at designs that share some properties in common. These characteristics go a long way towards satisfying the needs of general-purpose database storage and retrieval in very wide ranges of use cases, with excellent performance under mixed workloads and relatively few and rare worst-case behaviors. (That last point is debatable, depending on your workload.)

The properties are ACID transactions with multi-version concurrency control (MVCC). The relational aspect is ancillary. You could build these properties in a variety of non-SQL, non-relational databases. It just so happens that the databases that have been around longer than most, and are more mature and sophisticated, are mostly relational. That’s why these design choices and characteristics show up in relational databases — no other reason as far as I know.

Multi-version concurrency control lets database users see a consistent state of the database at a point in time, even as the database accepts changes from other users concurrently.

How is this done? By keeping old versions of rows. These databases operate roughly as follows: when a row is updated, an old version is kept if there’s any transaction that still needs to see it. When the old versions aren’t needed any more, they’re purged. Implementation details and terminology vary. I can speak most directly about InnoDB, which never updates a row in the primary key (which is the table itself). Instead, a new row is written, and the database is made to recognize this as the “current” state of the world. Old row versions are kept in a history list; access to this is slower than access to the primary key. Thus, the current state of the database is optimized to be the fastest to access.
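To make that concrete, here’s a deliberately tiny Python sketch of the idea. It is a toy, not InnoDB’s actual data structures: each row keeps a chain of versions stamped with the transaction that wrote them, a reader sees the newest version no newer than its snapshot, and a purge step drops versions that no active snapshot can still see.

    class Row:
        def __init__(self):
            self.versions = []            # (txn_id, value), oldest first

        def write(self, txn_id, value):
            self.versions.append((txn_id, value))

        def read(self, snapshot_txn_id):
            # Newest version written at or before the reader's snapshot.
            visible = [v for t, v in self.versions if t <= snapshot_txn_id]
            return visible[-1] if visible else None

        def purge(self, oldest_active_snapshot):
            # Keep the newest version the oldest snapshot can see, plus anything newer.
            keep_from = 0
            for i, (t, _) in enumerate(self.versions):
                if t <= oldest_active_snapshot:
                    keep_from = i
            self.versions = self.versions[keep_from:]

    row = Row()
    row.write(1, "v1"); row.write(5, "v2"); row.write(9, "v3")
    print(row.read(6))    # a snapshot taken at transaction 6 sees "v2"
    row.purge(oldest_active_snapshot=7)
    print(row.versions)   # "v1" is purged; "v2" and "v3" remain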

Now, about ACID transactions. Managing the write-ahead log and flushing dirty pages to disk is one of the most complex and hardest things an ACID database does, in my opinion. The process of managing the log and dirty pages in memory is called checkpointing.

Write-ahead logging and ACID, caching, MVCC, and old-version-purge are often intertwined to some extent, for implementation reasons. This is a very complex topic and entire books (huge books!) have been written about it.

What’s happening in such a database is a combination of short-term immutability, read and write optimizations to save and/or coalesce redundant work, and continuous “compaction” and reuse of disk space to stabilize disk usage and avoid infinite growth. Doing these things a little bit at a time allows the database to gradually take care of business without needing to stop the world. Unfortunately, this is incredibly hard, and I am unaware of any such database that is completely immune to “furious flushing,” “garbage collection pause,” “compaction stall,” “runaway purge,” “VACUUM blocking,” “checkpoint stall,” or whatever it tends to be called in your database of choice. There is usually a combination of some kind of workload that can push things over the edge. The most obvious case is if you try to change the database faster than the hardware can physically keep up. Because a lot of this work is done in the background so that it’s non-blocking and can be optimized in various ways, most databases will allow you to overwork the background processes if you push hard enough.

Show me a database and I’ll show you someone complaining about these problems. I’ll start out: MySQL’s adaptive flushing has been beaten to death by Percona and Oracle engineers. Riak on LevelDB: “On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, levelDB sometimes sees one stall every 2 hours for 10 to 30 seconds.” PostgreSQL’s VACUUM can stall out. I can go on. Every one of those problems is being improved somehow, but also can be triggered if circumstances are right. It’s hard (impossible?) to avoid completely.

Evolution of Append-Only

Do you see how the simplistic, one-thing-at-a-time architecture of append-only systems, with periodic rewrites of the whole database, almost inevitably becomes continuous, concurrent performing of the same tasks? Immutability can’t live forever. It’s better to do things continuously in the background than to accrue a bunch of debt and then pay it back in one giant blocking operation.

That’s how a really capable database usually operates. These mature, sophisticated, advanced databases represent what a successful implementation usually evolves into over time. The result is that Oracle (for example) can sustain combinations of workloads, such as very high-frequency small reads and writes together with days-long read-heavy and write-heavy batch processing, simultaneously, and provide good performance for both! Try that in a database that can only do one thing at a time.

So, keep that in mind if you start to feel like immutability is the elegant “hallelujah” solution that’s been overlooked by everyone other than some visionary with a new product. It hasn’t been overlooked. It’s in the literature, and it’s in the practice and industry. It’s been refined for decades. It’s well worth looking at the problems the more mature databases have solved. New databases are overwhelmingly likely to run into some of them, and perhaps end up implementing the same solutions as well.

Note that I am not a relational curmudgeon claiming that it’s all been done before. I have a lot of respect for the genuinely new advancements in the field, and there is a hell of a lot of it, even in databases whose faults I just attacked. I’m also not a SQL/relational purist. However, I will admit to getting a little curmudgeonly when someone claims that the database he’s praising is super-advanced, and then in the next breath says he doesn’t know what an append-only B-tree is. That’s kind of akin to someone claiming their fancy new sort algorithm is advanced, but not being aware of quicksort!

What do you think? Also, if I’ve gone too far, missed something important, gotten anything wrong, or otherwise need some education myself, please let me know so I can a) learn and b) correct my error.

Categories: MySQL

Immutability, MVCC, and garbage collection

Xaprb, home of innotop - Sat, 2013-12-28 00:00

Not too long ago I attended a talk about a database called Datomic. My overall impressions of Datomic were pretty negative, but this blog post isn’t about that. This is about one of the things the speaker referenced a lot: immutability and its benefits. I hope to illustrate, if only sketchily, why a lot of sophisticated databases are actually leaps and bounds beyond the simplistic design of such immutable databases. This is in direct contradiction to what proponents of Datomic-like systems would have you believe; they’d tell you that their immutable database implementations are advanced. Reality is not so clear-cut.

Datomic and Immutability

The Datomic-in-a-nutshell is that it (apparently) uses an append-only B-tree to record data, and never updates any data after it’s written. I say “apparently” because the speaker didn’t know what an append-only B-tree was, but his detailed description matched AOBTs perfectly. Why is this a big deal? Immutable data confers a lot of nice benefits. Here’s an incomplete summary:

  • It’s more cacheable.
  • It’s easier to reason about.
  • It’s less likely to get corrupted from bugs and other problems.
  • You can rewind history and view the state at any point in the past, by using an “old” root for the tree.
  • Backups are simple: just copy the file, no need to take the database offline. In fact, you can do continuous backups.
  • Replication is simple and fast.
  • Crash recovery is simple and fast.
  • It’s easier to build a reliable system on unreliable components with immutability. In general, immutability results in a lot of nice, elegant properties that just feel wonderful. But this is supposed to be the short version.
Prior Art

Datomic is not revolutionary in this sense. I have seen at least two other databases architected similarly. Their creators waxed eloquently about many of the same benefits. In fact, in 2009 and 2010, you could have listened to talks from the architects of RethinkDB, and if you just searched and replaced “RethinkDB” with “Datomic” you could have practically interchanged the talks. The same is true of CouchDB. Just to list a few links to RethinkDB’s history: 1, 2, 3.

That last one links to Accountants Don’t Use Erasers, a blog post that brought append-only storage into the minds of many people at the time.

Beyond databases, don’t forget about filesystems, such as ZFS for example. Many of the same design techniques are employed here.

Back to RethinkDB. Strangely, around 2011 or so, nobody was talking about its append-only design anymore. What happened?

Append-Only Blues

Immutability, it turns out, has costs. High costs. Wait a bit, and I’ll explain how those costs are paid by lots of databases that don’t build so heavily around immutability, too.

Even in 2010, Slava Akhmechet’s tone was changing. He’d begin his talks singing append-only immutability to the heavens, and then admit that implementation details were starting to get really hard. It turns out that there are a few key problems with append-only, immutable data structures.

The first is that space usage grows forever. Logically, people insert facts, and then update the database with new facts. Physically, if what you’re doing is just recording newer facts that obsolete old ones, then you end up with outdated rows. It may feel nice to be able to access those old facts, but the reality is most people don’t want that, and don’t want to pay the cost (infinitely growing storage) for it.

The second is fragmentation. If entities are made of related facts, and some facts are updated but others aren’t, then as the database grows and new facts are recorded, an entity ends up being scattered widely over a lot of storage. This gets slow, even on SSDs with fast random access.

The last is that a data structure or algorithm that's elegant and pure, but has one or more bad worst cases, will fall apart rather violently in real-world usage, because real-world usage is much more diverse than you'd suspect. A database with a "tiny worst-case scenario" will end up hitting that worst-case behavior for rather more than a tiny fraction of its users; probably a significant majority. An easy example from another domain is sorting: nobody ships a textbook best-performance-most-of-the-time sort algorithm, because its rare worst case shows up often enough in practice to hurt, so production libraries use hybrid algorithms instead. Databases end up with similar hard cases to handle.

There are more problems, many of them much harder to talk about and understand (dealing with concurrency, for example), but these are the biggest, most obvious ones I’ve seen.

As a result, you can see RethinkDB quickly putting the append-only, immutable design behind them. They stopped talking and writing about it. Their whitepaper, "Rethinking Database Storage", is gone from their website (rethinkdb.com/papers/whitepaper.pdf), but you can still get it from the Wayback Machine.

Reality sank in, and they had to move on from elegant theory to the bitterness of solving real-world problems. Whenever you hear about a new database, remember this: this shit is really, really, really hard. It typically takes many years for a database or storage engine to become production-ready in the real world.

This blog post isn’t about RethinkDB, though. I’m just using their evolution over time as an example of what happens when theory meets reality.

The CouchDB Problem

Around the same time as RethinkDB, a new NoSQL database called CouchDB was built on many of the same premises. In fact, I even blogged a quick overview of it as it started to become commercialized: A gentle introduction to CouchDB for relational practitioners.

CouchDB got many of the same benefits from immutability: MVCC (multi-version concurrency control), instant backup and recovery, crash-only design. But the big thing everyone complained about was… compaction. CouchDB became a little bit legendary for compaction.

You see, CouchDB’s files would grow forever (duh!) and you’d fill up your disks if you didn’t do something about it. What could you do about it? CouchDB’s answer was that you would periodically save a complete new database, without old versions of documents that had been obsoleted. It’s a rewrite-the-whole-database process. The most obvious problem with this was that you had to reserve twice as much disk space as you needed for your database, because you needed enough space to write a new copy. If your disk got too full, compaction would fail because there wasn’t space for two copies.

And if you were writing into your database too fast, compaction would never catch up with the writes. There were a host of other potential problems besides.
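
To see why the extra disk space is needed and why compaction can fall behind, here's a hypothetical sketch of the rewrite-everything approach; it is not CouchDB's actual code, and the file layout and function names are made up for illustration. It copies only the latest revision of each document into a brand-new file, which has to coexist with the old one until the swap.

    import json, os

    def compact(db_path, tmp_path):
        """Naive whole-database compaction: keep only the newest revision of each
        document.  The new file coexists with the old one until the final swap,
        which is why you need roughly twice the disk space of the live data."""
        latest = {}
        with open(db_path) as f:                  # append-only log of document revisions
            for line in f:
                doc = json.loads(line)
                latest[doc["id"]] = doc           # later revisions overwrite earlier ones
        with open(tmp_path, "w") as f:
            for doc in latest.values():
                f.write(json.dumps(doc) + "\n")
        os.replace(tmp_path, db_path)             # atomic swap once the copy is complete

    # This sketch ignores writes that arrive while the copy is in progress; a real
    # implementation has to re-apply them afterward, and if they arrive faster than
    # the copy proceeds, compaction never catches up (the failure mode described above).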

Datomic has all of these problems too, up to and including stop-the-world blocking of writes (which in my book is complete unavailability of the database).

ACID MVCC Relational Databases

It turns out that there is a class of database systems that has long been aware of the problems with all three of the databases I've mentioned so far. Oracle, SQL Server, MySQL (InnoDB), and PostgreSQL have all arrived at designs that share some properties. These characteristics go a long way toward satisfying the needs of general-purpose database storage and retrieval across a very wide range of use cases, with excellent performance under mixed workloads and relatively few and rare worst-case behaviors. (That last point is debatable, depending on your workload.)

The properties are ACID transactions with multi-version concurrency control (MVCC). The relational aspect is ancillary. You could build these properties in a variety of non-SQL, non-relational databases. It just so happens that the databases that have been around longer than most, and are more mature and sophisticated, are mostly relational. That’s why these design choices and characteristics show up in relational databases – no other reason as far as I know.

Multi-version concurrency control lets database users see a consistent state of the database at a point in time, even as the database accepts changes from other users concurrently.

How is this done? By keeping old versions of rows. These databases operate roughly as follows: when a row is updated, the old version is kept as long as any transaction still needs to see it. When old versions aren't needed anymore, they're purged. Implementation details and terminology vary. I can speak most directly about InnoDB: the primary key (which is the table itself) holds only the current version of each row, and when a row is changed the previous version is recorded in the undo logs (InnoDB's "history list"). Reconstructing an old version from there is slower than reading the primary key directly. Thus, the current state of the database is optimized to be the fastest to access.
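
Here's a toy sketch of that version-chain idea in Python, illustrative only; it is not InnoDB's format or any particular engine's code. The newest version of each key is the fast path, older versions hang off a history chain, and purge drops whatever no active reader can still see.

    # Toy multi-version store: a reader sees the newest version committed at or
    # before its snapshot; purge() removes versions no active reader still needs.

    class Version:
        def __init__(self, value, commit_ts, older=None):
            self.value, self.commit_ts, self.older = value, commit_ts, older

    class MVCCStore:
        def __init__(self):
            self.current = {}   # key -> newest Version (head of its history chain)
            self.clock = 0

        def write(self, key, value):
            self.clock += 1
            self.current[key] = Version(value, self.clock, older=self.current.get(key))

        def read(self, key, snapshot_ts):
            v = self.current.get(key)
            while v is not None and v.commit_ts > snapshot_ts:
                v = v.older                      # walk back through history: the slow path
            return v.value if v else None

        def purge(self, oldest_active_snapshot):
            """Cut each chain just past the newest version the oldest reader can see."""
            for head in self.current.values():
                v = head
                while v is not None:
                    if v.commit_ts <= oldest_active_snapshot:
                        v.older = None           # nothing older can ever be needed again
                        break
                    v = v.older

    db = MVCCStore()
    db.write("row", "v1")                        # committed at ts 1
    snapshot = db.clock                          # a reader's consistent view starts here
    db.write("row", "v2")                        # committed at ts 2
    print(db.read("row", snapshot))              # "v1": the old version is still visible
    db.purge(oldest_active_snapshot=db.clock)    # reader gone: the old version is dropped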

Now, about ACID transactions. Managing the write-ahead log and flushing dirty pages to disk is, in my opinion, one of the hardest things an ACID database does. The process of managing the log and the dirty pages in memory is called checkpointing.

Write-ahead logging and ACID, caching, MVCC, and old-version-purge are often intertwined to some extent, for implementation reasons. This is a very complex topic and entire books (huge books!) have been written about it.
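
As a rough illustration of the write-ahead rule and checkpointing (a sketch under simplifying assumptions, not any engine's actual recovery code): a change is durable once it's in the log, cached pages are dirtied in memory, and a checkpoint flushes dirty pages so the log can be truncated. Real engines do this incrementally, with "fuzzy" checkpoints, precisely so they never have to stop the world.

    import os

    class TinyWAL:
        """Sketch of write-ahead logging plus checkpointing: log first, then modify
        the cached page; crash recovery would simply replay the log."""

        def __init__(self, log_path):
            self.log_path = log_path
            self.pages = {}          # in-memory page cache: page_id -> value
            self.dirty = set()       # pages changed since the last checkpoint

        def write(self, page_id, value):
            with open(self.log_path, "a") as log:
                log.write(f"{page_id}={value}\n")    # 1. append the change to the log...
                log.flush()
                os.fsync(log.fileno())               # 2. ...and make it durable
            self.pages[page_id] = value              # 3. only then dirty the cached page
            self.dirty.add(page_id)

        def checkpoint(self, flush_page):
            for page_id in sorted(self.dirty):       # 4. flush dirty pages to the data files
                flush_page(page_id, self.pages[page_id])
            self.dirty.clear()
            open(self.log_path, "w").close()         # 5. now the log can be truncated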

What’s happening in such a database is a combination of short-term immutability, read and write optimizations to save and/or coalesce redundant work, and continuous “compaction” and reuse of disk space to stabilize disk usage and avoid infinite growth. Doing these things a little bit at a time allows the database to gradually take care of business without needing to stop the world. Unfortunately, this is incredibly hard, and I am unaware of any such database that is completely immune to “furious flushing,” “garbage collection pause,” “compaction stall,” “runaway purge,” “VACUUM blocking,” “checkpoint stall,” or whatever it tends to be called in your database of choice. There is usually a combination of some kind of workload that can push things over the edge. The most obvious case is if you try to change the database faster than the hardware can physically keep up. Because a lot of this work is done in the background so that it’s non-blocking and can be optimized in various ways, most databases will allow you to overwork the background processes if you push hard enough.

Show me a database and I’ll show you someone complaining about these problems. I’ll start out: MySQL’s adaptive flushing has been beaten to death by Percona and Oracle engineers. Riak on LevelDB: “On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, levelDB sometimes sees one stall every 2 hours for 10 to 30 seconds.” PostgreSQL’s VACUUM can stall out. I can go on. Every one of those problems is being improved somehow, but also can be triggered if circumstances are right. It’s hard (impossible?) to avoid completely.

Evolution of Append-Only

Do you see how the simplistic, one-thing-at-a-time architecture of append-only systems, with periodic rewrites of the whole database, almost inevitably evolves into doing those same tasks continuously and concurrently? Immutability can't live forever. It's better to do the work continuously in the background than to accrue a bunch of debt and then pay it back in one giant blocking operation.

That’s how a really capable database usually operates. These mature, sophisticated, advanced databases represent what a successful implementation usually evolves into over time. The result is that Oracle (for example) can sustain combinations of workloads such as very high-frequency small operations reads and writes, together with days-long read-heavy and write-heavy batch processing, simultaneously, and providing good performance for both! Try that in a database that can only do one thing at a time.

So, keep that in mind if you start to feel like immutability is the elegant "hallelujah" solution that's been overlooked by everyone other than some visionary with a new product. It hasn't been overlooked. It's in the literature, it's in industry practice, and it's been refined for decades. It's well worth looking at the problems the more mature databases have solved. New databases are overwhelmingly likely to run into some of those problems, and may well end up implementing the same solutions.

Note that I am not a relational curmudgeon claiming that it's all been done before. I have a lot of respect for the genuinely new advancements in the field, and there have been a hell of a lot of them, even in databases whose faults I just attacked. I'm also not a SQL/relational purist. However, I will admit to getting a little curmudgeonly when someone claims that the database he's praising is super-advanced, and then in the next breath says he doesn't know what an append-only B-tree is. That's kind of akin to someone claiming their fancy new sort algorithm is advanced, but not being aware of quicksort!

What do you think? Also, if I’ve gone too far, missed something important, gotten anything wrong, or otherwise need some education myself, please let me know so I can a) learn and b) correct my error.

Categories: MySQL

Early-access books: a double-edged sword

Xaprb, home of innotop - Thu, 2013-12-26 21:46

Many technical publishers offer some kind of “early access” to unfinished versions of books. Manning has MEAP, for example, and there’s even LeanPub which is centered on this idea. I’m not a fan of buying these, in most circumstances. Why not?

  • Many authors never finish their books. A prominent example: Nathan Marz’s book on Big Data was supposed to be published in 2012; the date has been pushed back to March 2014 now. At least a few of my friends have told me their feelings about paying for this book and “never” getting it. I’m not blaming Marz, and I don’t want this to be about authors. I’m just saying many books are never finished (and as an author, I know why!), and readers get irritated about this.
  • When the book is unfinished, it’s often of much less value. The whole is greater than the sum of the parts.
  • When the book is finished, you have to re-read it, which is a lot of wasted work, and figuring out what’s changed from versions you’ve already read is a big exercise too.

To some extent, editions create a similar problem[1]. I think that successive editions of books are less likely to be bought and really read, unless there’s a clear signal that both the subject and the book have changed greatly. Unfortunately, most technical books are outdated before they’re even in print. Editions are a necessary evil to keep up with the changes in industry and practice.

I know that O’Reilly has tried to figure out how to address this, too, and I sent an email to my editor along the lines of this blog post.

I know this is a very one-sided opinion. I had a lengthy email exchange with LeanPub, for example. I know they, and a lot of others including likely readers of this blog, see things very differently than I do.

Still, I don’t think anyone has a great solution to the combination of problems created by static books written about a changing world. But early-access to unfinished books has always seemed to me like compounding the problems, not resolving them.

[1] Rant: The classic counter-example for editions is math and calculus textbooks, which can charitably be described as a boondoggle. Calculus hasn’t changed much for generations, either in theory or practice. Yet new editions of two leading textbooks are churned out every couple of years. They offer slightly prettier graphics or newer instructions for a newer edition of the TI-something calculator — cosmetic differences. But mostly, they offer new homework sets, so students can’t buy and use the older editions, nor can they resell them for more than a small fraction of the purchase price. Oh, and because the homework is always changing, bugs in the homework problems are ever-present. It’s a complete ripoff. Fortunately, technical writers generally behave better than this. OK, rant over.

Categories: MySQL

Napkin math: How much waste does Celestial Seasonings save?

Xaprb, home of innotop - Sun, 2013-12-22 19:32

I was idly reading the Celestial Seasonings box today while I made tea. Here’s the end flap:

It seemed hard to believe that they really save 3.5 million pounds of waste just by not including that extra packaging, so I decided to do some back-of-the-napkin math.

How much paper is in each package of non-Celestial-Seasonings tea? The little bag is about 2 inches by 2 inches, it’s two-sided, and there’s a tag, staple, and string. Call it 10 square inches.

How heavy is the paper? It feels about the same weight as normal copy paper. Amazon.com lists a box of 5000 sheets of standard letter-sized paper at a shipping weight of 50 pounds (including the cardboard box, but we’ll ignore that). Pretend that each sheet (8.5 * 11 inches = 93.5 square inches) is about 100 square inches. That’s .0001 pounds per square inch.

How much tea does Celestial Seasonings sell every year? Wikipedia says their sales in the US are over $100M, and they are a subsidiary of Hain Celestial, which has a lot of other large brands. Hain’s sales last year were just under $500M. $100M is a good enough ballpark number. Each box of 20 tea bags sells at about $3.20 on their website, and I think it’s cheaper at my grocery store. Call it $3.00 per box, so we’ll estimate the volume of tea bags on the high side (to make up for the low-side estimate caused by pretending there’s 100 square inches per sheet of paper). That means they sell about 33.3M boxes, or 667M bags, of tea each year.

If they put bags, tags, and strings on all of them, I estimated 10 square inches of paper per bag, so at .0001 pound per square inch that’s .001 pound of extra paper and stuff per bag. That means they’d use about 667 thousand pounds of paper to bag up all that tea.
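
For what it's worth, here's the same arithmetic as a quick script, using exactly the rough estimates above:

    # Napkin math: all inputs are the rough estimates from the text above.
    sq_in_per_bag    = 10                  # bag (two-sided) plus tag, staple, and string
    lb_per_sq_in     = 50 / 5000 / 100     # 50 lb per 5000 sheets, ~100 sq in per sheet
    annual_sales_usd = 100e6               # ballpark US sales
    price_per_box    = 3.00
    bags_per_box     = 20

    boxes_per_year = annual_sales_usd / price_per_box         # ~33.3 million boxes
    bags_per_year  = boxes_per_year * bags_per_box             # ~667 million bags
    paper_saved_lb = bags_per_year * sq_in_per_bag * lb_per_sq_in

    print(f"{paper_saved_lb:,.0f} lb of paper per year")       # ~666,667 lb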

That’s quite a difference from the 3.5 million pounds of waste they claim they save. Did I do the math wrong or assume something wrong?

Categories: MySQL