Generating Realistic Time Series Data

Xaprb, home of innotop - Fri, 2014-01-24 00:00

I am interested in compiling a list of techniques to generate fake time-series data that looks and behaves realistically. The goal is to make a mock API for developers to work against, without needing bulky sets of real data, which are annoying to deal with, especially as things change and new types of data are needed.

To achieve this, I think several specific things need to be addressed:

  1. What common classes or categories of time-series data are there? For example,
    • cyclical (ex: traffic to a web server day-over-day)
    • apparently random (ex: stock ticker)
    • generally increasing (ex: stock ticker for an index)
    • exponentially decaying (ex: unix load average)
    • usually zero, with occasional nonzero values (ex: rainfall in a specific location)
  2. What parameters describe the data’s behavior? Examples might include an exponential decay, periodicity, distribution of values, distribution of intervals between peaks, etc.
  3. What techniques can be used to deterministically generate data that approximates a given category of time-series data, so that one can generate mock sources of data without storing real examples? For a simplistic example, you could seed a random number generator for determinism, and use something like y_n = rand() * 10 + 90 for data that fluctuates randomly between 90 and 100 (see the sketch below).
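
For instance, here is a minimal Python sketch of that simplistic example (the function name and parameters are mine, purely illustrative):

    import random

    def mock_metric(seed, n_points, base=90.0, spread=10.0):
        """Deterministically generate values that fluctuate between base and base + spread."""
        rng = random.Random(seed)  # seeding makes the sequence reproducible
        return [rng.random() * spread + base for _ in range(n_points)]

    # The same seed always yields the same series.
    print(mock_metric(42, 5))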

To make the mock API, I imagine we could catalog a set of metrics we want to be able to generate, with the following properties for each:

  • name
  • type
  • dimensions
  • parameters
  • random seed or other initializer

This reduces the problem from what we currently do (keeping entire data sets, which need to be replaced as our data gathering techniques evolve) into just a dictionary of metrics and their definitions.

Then the mock API would accept requests for a set of metrics, the time range desired, and the resolution desired. The metrics would be computed and returned.

To make this work correctly, the metrics need to be generated deterministically. That is, if I ask for metrics from 5am to 6am on a particular day, I should always get the same values for the metrics. And if I ask for a different time range, I’d get different values. What this means, in my opinion, is that there needs to be a closed-form function that produces the metric’s output for a given timestamp. (I think one-second resolution of data is fine enough for most purposes.)
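
One way to get that determinism (a sketch of my own, not a finished design) is to make each value a closed-form function of the timestamp itself, deriving the "random" component from a hash of the metric's seed and the timestamp, so any requested range always reproduces the same values:

    import hashlib
    import math

    def noise(seed, ts):
        """Deterministic pseudo-random value in [0, 1) derived only from (seed, timestamp)."""
        digest = hashlib.sha256(f"{seed}:{ts}".encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def cyclical_metric(seed, ts, base=100.0, amplitude=20.0, period=86400, jitter=5.0):
        """Closed-form value for one timestamp: a daily cycle plus seeded noise."""
        cycle = amplitude * math.sin(2 * math.pi * (ts % period) / period)
        return base + cycle + jitter * noise(seed, ts)

    def series(seed, start_ts, end_ts, resolution=1):
        """Values for a requested time range at the requested resolution (in seconds)."""
        return [cyclical_metric(seed, t) for t in range(start_ts, end_ts, resolution)]

    # Asking for the same range twice returns identical values.
    first = series("web_traffic", 1390000000, 1390000060)
    second = series("web_traffic", 1390000000, 1390000060)
    assert first == second

Different metric types from the catalog (decaying, spiky, trending) would swap in different closed-form shapes while reusing the same seeded-noise idea.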

Does anyone have suggestions for how to do this?

The result will be open-sourced, so everyone who’s interested in such a programmatically generated dataset can benefit from it.

Speaking at Percona Live

Xaprb, home of innotop - Thu, 2014-01-23 00:00

I’m excited to be speaking at the Percona Live MySQL Conference again this year. I’ll present two sessions: Developing MySQL Applications with Go and Knowing the Unknowable: Per-Query Metrics. The first is a walk-through of everything I’ve learned over the last 18 months writing large-scale MySQL-backed applications with Google’s Go language. The second is about using statistical techniques to find out things you can’t even measure, such as how much CPU a query really causes MySQL to use. There are great reasons that this is both desirable to know, and impossible to do directly in the server itself.

I’m also looking forward to the conference overall. Take a few minutes and browse the selection of talks. As usual, it’s a fantastic program; the speakers are really the top experts from the MySQL world. The conference committee and Percona have done a great job again this year! See you in Santa Clara.

On Crossfit and Safety

Xaprb, home of innotop - Mon, 2014-01-20 00:00

I’ve been a happy CrossFitter for a few years now. I met my co-founder and many friends in CrossFit Charlottesville, completely changed my level of fitness and many key indicators of health such as my hemoglobin A1C and vitamin D levels, am stronger than I’ve ever been, feel great, and now my wife does CrossFit too. It’s fantastic. It’s community, fun, health, fitness. It’s the antidote to the boring gyms I forced myself to go to for years and hated every minute.

But there is a fringe element in CrossFit, which unfortunately looks mainstream to some who don’t really have enough context to judge. From the outside, CrossFit can look almost cult-like. It’s easy to get an impression of people doing dangerous things with little caution or training. To hear people talk about it, everyone in CrossFit works out insanely until they vomit, pushing themselves until their muscles break down and vital organs go into failure modes.

That’s not what I’ve experienced. I’ve never seen anyone vomit, or even come close to it as far as I know. I think that part of this dichotomy comes from certain people trying to promote CrossFit as a really badass thing to do, so they not only focus on extreme stories, they even exaggerate stories to sound more extreme.

Last week there was a tragic accident: Denver CrossFit coach Kevin Ogar injured himself badly. This has raised the issue of CrossFit safety again.

To be clear, I think there is something about CrossFit that deserves to be looked at. It’s just not the mainstream elements, that’s all. The things I see about CrossFit, which I choose not to participate in personally, are:

  1. The hierarchy and structure above the local gyms. If you look at local gyms and local events, things look good. Everyone’s friends and nobody does stupid things. But when you get into competitions, people are automatically elevated into the realms of the extreme. This reaches its peak at the top levels of the competitions. Why? Because there’s something to gain besides just fitness. When someone has motivations (fame, endorsements and sponsorship, financial rewards) beyond just being healthy, bad things are going to happen. There’s talk now about cheating and performance-enhancing drugs and all kinds of “professional sports issues.” Those are clear signs that it’s not about fitness and health.
  2. Some inconsistencies in the underlying philosophy from the founders of CrossFit. I’m not sure how much this gets discussed, but a few of the core concepts (which I agree with, by the way) are that varied, functional movements are good. The problem is, the workout movements aren’t all functional. A few of them are rather hardcore and very technical movements chosen from various mixtures of disciplines.
  3. Untempered enthusiasm about, and ignorant promotion of, things such as the so-called Paleo Diet. I’m biased about this by being married to an archaeologist, but it isn’t the diet that is the issue. It’s the fanaticism that some people have about it, which can be off-putting to newcomers.

I’m perfectly fine when people disagree with me on these topics. Lots of people are really enthusiastic about lots of things. I choose to take what I like about CrossFit and leave the rest. I would point out, however, that the opinions of those who don’t really know CrossFit first-hand tend to be colored by the extremism that’s on display.

Now, there is one issue I think that’s really important to talk about, and that’s the safety of the movements. This comes back to point #2 in my list above. I’d especially like to pick out one movement that is done in a lot of CrossFit workouts.

The Snatch

If you’re not familiar with the snatch, it’s an Olympic weightlifting movement where the barbell is pulled from the floor as high as possible in one movement. The athlete then jumps under the barbell, catching it in a deep squat with arms overhead, and stands up to complete the movement with the bar high overhead. Here’s an elite Olympic lifter just after catching the bar at the bottom of the squat.

The snatch is extremely technical. It requires factors such as balance, timing, strength, and flexibility to come together flawlessly. Many of these factors are not just necessary in moderate quantities. For example, the flexibility required is beyond what most people are capable of without a lot of training. If you don’t have the mobility to pull off the snatch correctly, your form is compromised and it’s dangerous.

The snatch is how Kevin Ogar got hurt. Keep in mind this guy is a CrossFit coach himself. He’s not a novice.

I challenge anyone to defend the snatch as a functional movement. Tell me one time in your life when you needed to execute a snatch, and be serious about it. I can see the clean-and-jerk’s utility. But not the snatch. It’s ridiculous.

The snatch is also inherently very dangerous. You’re throwing a heavy weight over your head and getting under it, fast. You’re catching it in an extremely compromised position. And if you drop it, which is hard not to do, where’s it going to go? It’s going to fall on you. Here’s another Olympic athlete catching hundreds of pounds with his neck when a snatch went a little bit wrong. A split second later this picture looked much worse, but I don’t want to gross you out.

The next issue is that the snatch features prominently in many CrossFit workouts, especially competition workouts. This is not a small problem. Think about it: in competition, when these extreme athletes have raised the bar to such an extent that weeding out the best of the best requires multi-day performances few mortals could ever achieve, we’re throwing high-rep, heavy-weight snatches into the mix. What’s astonishing isn’t that Kevin Ogar got seriously injured. What’s amazing is that we don’t have people severing their spines on the snatch all the time.

What on earth is wrong with these people? What do they expect?

You might think this is an issue that’s only present in the competitions. But that’s not true. I generally refuse to do snatches in workouts at the gym. I will substitute them for other movements. Why? Take a look at one sample snatch workout:

AMRAP (as many rounds as possible) in 12 Minutes of:

  1. Snatch x 10
  2. Double Under x 50
  3. Box Jump x 10
  4. Sprint

That’s 12 minutes of highly challenging movements (to put it in perspective, most non-CrossFitters, and even many CrossFitters, would not be able to do the double-unders or box-jumps). You’re coming off a sprint and you’re going to throw off 10 snatches in a row, and you’re going to do it with perfect form? Unlikely. This is just asking for injury.

Or we could look at the “named WODs” that are benchmarks for CrossFitters everywhere. There’s Amanda, for example: 9, 7, and 5 reps of muscle-ups and snatches, as fast as possible. Or Isabel: 30 reps of 135-pound snatches, as fast as possible. To get a sense for how insane that actually is, take a look at Olympic weightlifting competitor Kendrick Farris doing Isabel. The man is a beast and he struggles. And his form breaks down. I’m over-using italics. I’m sorry, I’ll cool down.

My point is that I think this extremely technical, very dangerous movement should have limited or no place in CrossFit workouts. I think it does very little but put people into a situation where they’re at very high risk of getting injured. I do not think it makes people more fit more effectively than alternative movements. I think one can get the same or better benefits from much safer movements.

Doing the snatch is an expert stunt. I personally think that I’ll never be good at snatches unless I do them twice a week, minimum. And one of the tenets of CrossFit is that there should be a large variety of constantly varied movements. This automatically rules out doing any one movement very often. In my own CrossFit workouts, going to the gym 2 or 3 times a week, I typically go weeks at a time without being trained on snatches in pre-workout skill work. That is nowhere near enough to develop real skill at it. (This is why I do my skill-work snatches with little more than an empty bar.)

There are other movements in CrossFit that I think are riskier than they need to be, but snatches are the quintessential example.

I know many people who are experts in these topics will disagree with me very strongly, and I don’t mind that. This is just my opinion.

Bad Coaches, Bad Vibes

There’s one more problem that contributes, I think, to needless risk in CrossFit gyms. This is the combination of inadequate coaching and a focus on “goal completion” to the exclusion of safety and absolutely perfect form, especially during workouts where you’re trying to finish a set amount of movements as fast as possible, or do as much as possible in a fixed time.

There’s no getting around the fact that CrossFit coaches aren’t all giving the same level of attention to their athletes, nor do all of them have the qualifications they need.

Anecdotally, I’ll tell the story of traveling in California, where I visited a gym and did one of my favorite workouts, Diane. In Diane, you deadlift 225 pounds for 21 reps, do 21 handstand pushups, repeat both movements with 15 reps each, and finish with 9 reps each.

Deadlifting consists of grasping the bar on the ground and standing up straight, then lowering it again. It is not a dynamic or unstable movement. You do not move through any out-of-control ranges of motion. If you drop the bar you won’t drop it on yourself, it’ll just fall to the ground. Nevertheless, if done wrong, it can injure you badly, just like anything else.

The gym owner / coach didn’t coach. There’s no other way to say it. He set up a bar and said “ok, everyone look at me.” He then deadlifted and said some things that sounded really important about how to deadlift safely. Then he left us on our own. A relative newcomer was next to me. His form and technique were bad, and the coach didn’t say anything. He was standing at the end of the room, ostensibly watching, but he either wasn’t really looking, or he was lazy, or he didn’t know enough to see that the guy was doing the movement unsafely.

The newcomer turned to me and asked me what weight I thought he should use. I recommended that he scale the weights way down, but it wasn’t my gym and I wasn’t the coach. He lifted too heavy. I don’t think he hurt himself, but he was rounding his back horribly and I’m sure he found it hard to move for a week afterward. The coach just watched from the end of the gym, all the way through the workout. All he did was start and stop the music. What a jerk.

There’s an element of responsibility to put on the athletes. You need to know whether you’re doing things safely or not. If you don’t know, you should ask your coach. For me, rule #1 is to find out how much I don’t know, and not to attempt something unless I know how much I know about it. This athlete should have taken the matter into his own hands and asked for more active coaching.

But that doesn’t excuse the coach either.

The gym I go to — that nonsense does not happen. And I’ve been to a few gyms over the years and found them to be good. I’m glad I learned in a safe environment, but not all gyms and coaches are that good.

Precedent and Direction-Setting, and Lack of Reporting

What worries me the most is that the type of tragedy that happened to Kevin Ogar is going to happen close to home and impact my friends or my gym. The problem is complex to untangle, but in brief,

  1. Once annually there’s a series of quasi-competitions called the CrossFit Open. These are scored workouts over a span of weeks. They are set by the national CrossFit organization, not the local gyms. The scores determine the first rank of competitors who advance to regional competitions, and eventually to the annual CrossFit Games.
  2. The CrossFit Open workouts will certainly include snatches.
  3. If local gyms don’t program snatches regularly, their members won’t be prepared at all for the Open.
  4. Local gyms don’t have to participate in the Open, and don’t have to encourage their members to, but that’s easier said than done due to the community aspects of CrossFit.

The result, in my opinion, is that there’s systemic pressure for gyms and members to do things that carry a higher risk-to-reward ratio than many members would prefer. Anecdotally, many members I’ve spoken to share my concerns about the snatch. They love CrossFit, but they don’t like the pressure to do this awkward and frightening movement.

Finally, it’s very difficult to understand how serious the problem really is. Is there a high risk of injury from a snatch, or does it just seem that way because of high-profile incidents? Are we right to be afraid of the snatch, or is it just a movement that makes you feel really vulnerable? The problem here is that there’s no culture of reporting incidents in CrossFit.

I can point to another sport where that culture does exist: caving. The National Speleological Society publishes accident reports, and conscientious cavers share a culture in which every incident, even a trivial one, must be reported. As a result, you can browse the NSS accident reports (summarized here) and see some things clearly (you have to be a member to access the full reports, which are often excruciatingly detailed). One of the most obvious conclusions you’ll draw right away is that cave diving (scuba diving in underwater caves) is incredibly dangerous and kills a lot of people, despite being a small fraction of overall caving activity. If you weren’t a caver and you didn’t know about cave diving, would you think this was the case? I’m not sure I would. After reading cave diving accident reports, I remember being shocked at how many people are found dead underwater for no apparent reason, with air left in their tanks. The accident reports help cavers assess the risks of what they do.

Nothing similar exists for CrossFit, and I wish it did.

Negative Press About CrossFit

On the topic of what gets attention and exposure, I’ve seen a bunch of attention-seeking blog posts from people who “told the dirty truth” about how CrossFit injured them and there’s a culture of silencing dissenters and so on. I’m sure some of that happens, but the stuff I’ve read has been from people who have an axe to grind. And frankly, most of those people were indisputably idiots. They were blaming their problems and injuries on CrossFit when the real problem was between their ears. I won’t link to them, because they don’t deserve the attention.

Don’t believe most of what you read online about CrossFit. Many of the people telling their personal stories about their experiences in CrossFit are drama queens blowing things completely out of proportion. There’s a lot of legitimate objective criticism too, most of it from neutral third-parties who have serious credentials in physical fitness coaching, but this doesn’t get as much attention. And there’s a lot of great writing about what’s good about CrossFit, much of it from the good-hearted, honest, knowledgeable coaches and gym owners who soldier on despite the ongoing soap operas and media hype wars. They’re bringing fitness and health — and fun — to people who otherwise don’t get enough of it.

Summary

Toss corporate sponsors, personal politics, competition, the lure of great gains from winning, and a bunch of testosterone together and you’re going to get some people hurt. Mix it in with snatches and it’s a miracle if nobody gets seriously injured.

If you participate in CrossFit, which I highly recommend, take responsibility for your own safety. If there is a rah-rah attitude of pushing too hard at all costs in your gym, or if your coaches aren’t actually experts at what they do (the CrossFit weekend-long certification seminars don’t count), or if it’s not right for any other reason, go elsewhere.

Stay healthy and have fun, and do constantly varied, functional movements at high intensity in the company of your peers – and do it safely.

How to Tune A Guitar (Or Any Instrument)

Xaprb, home of innotop - Sat, 2014-01-18 00:00

Do you know how to tune a guitar? I mean, do you really know how to tune a guitar?

I’ve met very few people who do. Most people pick some notes, crank the tuners, play some chords, and endlessly fidget back and forth until they either get something that doesn’t sound awful to their ears, or they give up. I can’t recall ever seeing a professional musician look like a tuning pro on stage, either. This really ought to be embarrassing to someone who makes music for a career.

There’s a secret to tuning an instrument. Very few people seem to know it. It’s surprisingly simple, it isn’t at all what you might expect, and it makes it easy and quick to tune an instrument accurately without guesswork. However, even though it’s simple and logical, it is difficult and subtle at first, and requires training your ear. This is a neurological, physical, and mental process that takes some time and practice. It does not require “perfect pitch,” however.

In this blog post I’ll explain how it works. There’s a surprising amount of depth to it, which appeals to the nerd in me. If you’re looking for “the short version,” you won’t find it here, because I find the math, physics, and theory of tuning to be fascinating, and I want to share that and not just the quick how-to.

If you practice and train yourself to hear in the correct way, with a little time you’ll be able to tune a guitar by just striking the open strings, without using harmonics or frets. You’ll be able to do this quickly, and the result will be a guitar that sounds truly active, alive, energetic, amazing — much better results than you’ll get with a digital tuner. As a bonus, you’ll impress all of your friends.

My Personal History With Tuning

When I was a child my mother hired a piano tuner who practiced the “lost art” of tuning entirely by ear. His name was Lee Flory. He was quite a character; he’d tuned for famous concert pianists all over the world, toured with many of them, and had endless stories to tell about his involvement with all sorts of musicians in many genres, including bluegrass and country/western greats. My mother loved the way the piano sounded when he tuned it. It sang. It was alive. It was joyous.

For whatever reason, Lee took an interest in me, and not only tolerated but encouraged my fascination with tuning. I didn’t think about it at the time, but I’m pretty sure he scheduled his visits differently to our house. I think he allowed extra time so that he could spend an hour or more explaining everything to me, playing notes, coaching me to hear subtleties.

And thus my love affair with the math, physics, and practice of tuning began.

Beats

The first great secret is that tuning isn’t about listening to the pitch of notes. While tuning, you don’t try to judge whether a note is too high or too low. You listen to something called beats instead.

Beats are fluctuations in volume created by two notes that are almost the same frequency.

When notes are not quite the same frequency, they’ll reinforce each other when the peaks occur together, and cancel each other out when the peaks are misaligned. Here’s a diagram of two sine waves of slightly different frequencies, and the sum of the two (in red).

Your ear will not hear two distinct notes if they’re close together. It’ll hear the sum.

Notice how the summed wave (the red wave) fluctuates in magnitude. To the human ear, this sounds like a note going “wow, wow, wow, wow.” The frequency of this fluctuation is the difference between the frequencies of the notes.

This is the foundation of all tuning by ear that isn’t based on guesswork.
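
To see the arithmetic behind that claim, here is a tiny Python check (mine, not from the original post): the summed wave’s volume envelope swells and fades at exactly the difference of the two frequencies.

    import math

    f1, f2 = 440.0, 443.0        # two notes a few Hz apart
    beat_rate = abs(f1 - f2)     # the volume fluctuates 3 times per second

    def volume(t):
        # sin(2*pi*f1*t) + sin(2*pi*f2*t) = 2 * cos(pi*(f1-f2)*t) * sin(pi*(f1+f2)*t),
        # so the slowly varying loudness of the summed wave is 2 * |cos(pi*(f1-f2)*t)|.
        return 2 * abs(math.cos(math.pi * (f1 - f2) * t))

    print("beat rate:", beat_rate, "Hz")
    for t in (0.0, 1 / 6, 1 / 3):    # a peak, a null, and the next peak of a 3 Hz beat
        print(f"t={t:.3f}s  volume={volume(t):.2f}")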

Before you go on, tune two strings close together on your guitar or other instrument, and listen until you can hear it. Or, just fret one string so it plays the same note as an open string, and strike them together. Bend the string you’ve fretted, a little less, a little more. Listen until you hear the beats.

The Math of Pitch

Musical notes have mathematical relationships to one another. The exact relationships depend on the tuning. There are many tunings, but in this article I’ll focus on the tuning used for nearly all music in modern Western cultures: the 12-tone equal temperament tuning.

In this tuning, the octave is the fundamental interval of pitch. Notes double in frequency as they rise an octave, and the ratio of frequencies between each adjacent pair of notes is constant. Since there are twelve half-steps in an octave, the frequency increase from one note to the next is the twelfth root of 2, or about 1.059463094359293.

Staying with Western music, where we define the A above middle C to have the frequency of 440Hz, the scale from A220 to A440 is as follows:

Note      Frequency
=======   =========
A220       220.0000
A-sharp    233.0819
B          246.9417
C          261.6256
C-sharp    277.1826
D          293.6648
D-sharp    311.1270
E          329.6276
F          349.2282
F-sharp    369.9944
G          391.9954
G-sharp    415.3047
A440       440.0000

We’ll refer back to this later.
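
If you’d like to reproduce that table yourself, a throwaway Python snippet (mine) does it: each half-step multiplies the frequency by the twelfth root of 2.

    # Regenerate the A220-to-A440 table above.
    names = ["A220", "A-sharp", "B", "C", "C-sharp", "D", "D-sharp",
             "E", "F", "F-sharp", "G", "G-sharp", "A440"]
    for half_steps, name in enumerate(names):
        freq = 220.0 * 2 ** (half_steps / 12)
        print(f"{name:<8} {freq:9.4f}")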

The Math Of Intervals

If you’ve ever sung in harmony or played a chord, you’ve used intervals. Intervals are named for the relative distance between two notes: a minor third, a fifth, and so on. These are a little confusing, because they sound like fractions. They’re not. A fifth doesn’t mean that one note is five times the frequency of another. A fifth means that if you start on the first note and count upwards five notes on a major scale, you’ll reach the second note in the interval. Here’s the C scale, with the intervals between the lowest C and the given note listed at the right:

Note   Name   Interval from C
====   ====   ===============
C      Do     Unison
D      Re     Major 2nd
E      Mi     Major 3rd
F      Fa     4th (sometimes called Perfect 4th)
G      So     5th (a.k.a. Perfect 5th)
A      La     Major 6th
B      Ti     Major 7th
C      Do     Octave (8th)

On the guitar, adjacent strings form intervals of fourths, except for the interval between the G and B strings, which is a major third.

Some intervals sound “good,” “pure,” or “harmonious.” A major chord, for example, is composed of the root (first note), major third, fifth, and octave. The chord sounds good because the intervals between the notes sound good. There’s a variety of intervals at play: between the third and fifth is a minor third, between the fifth and octave is a fourth, and so on.

It turns out that the intervals that sound the most pure and harmonious are the ones whose frequencies have the simplest relationships. In order of increasing complexity, we have:

  • Unison: two notes of the same frequency.
  • Octave: the higher note is double the frequency.
  • Fifth: the higher note is 3/2 the frequency.
  • Fourth: the higher note is 4/3rds the frequency.
  • Third: the higher note is 5/4ths the frequency.
  • Further intervals (minor thirds, sixths, etc) have various relationships, but the pattern of N/(N-1) doesn’t hold beyond the third.

These relationships are important for tuning, but beyond here it gets significantly more complex. This is where things are most interesting!

Overtones and Intervals

As a guitar player, you no doubt know about “harmonics,” also called overtones. You produce a harmonic by touching a string gently at a specific place (above the 5th, 7th, or 12th fret, for example) and plucking the string. The note that results sounds pure, and is higher pitched than the open string.

Strings vibrate at a base frequency, but these harmonics (they’re actually partials, but I’ll cover that later) are always present. In fact, much of the sound energy of a stringed instrument is in overtones, not in the fundamental frequency. When you “play a harmonic” you’re really just damping out most of the frequencies and putting more energy into simpler multiples of the fundamental frequency.

Overtones are basically multiples of the fundamental frequency. The octave, for example, is twice the frequency of the open string. Touching the string at the 12th fret is touching it at its halfway point. This essentially divides the string into two strings of half the length. The frequency of the note is inversely proportional to the string’s length, so half the length makes a note that’s twice the frequency. The seventh fret is at 1/3rd the length of the string, so the note is three times the frequency; the 5th fret is ¼th the length, so you hear a note two octaves higher, and so on.

The overtones give the instrument its characteristic sound. How many of them there are, their frequencies, their volumes, and their attack and decay determines how the instrument sounds. There are usually many overtones, all mixing together into what you usually think of as a single note.

Tuning depends on overtones, because you can tune an interval by listening to the beats in its overtones.

Take a fifth, for example. Recall from before that the second note in the fifth is 3/2 the frequency of the first. Let’s use A220 as an example; a fifth up from A220 is E330. E330 times two is E660, and A220 times three is E660 also. So by listening to the first overtone of the E, and the second overtone of the A, you can “hear a fifth.”

You’re not really hearing the fifth, of course; you’re really hearing the beats in the overtones of the two notes.

Practice Hearing Intervals

Practice hearing the overtones in intervals. Pick up your guitar and de-tune the lowest E string down to a D. Practice hearing its overtones. Pluck a harmonic at the 12th fret and strike your open D string; listen to the beats between the notes. Now play both strings open, with no harmonics, at the same time. Listen again to the overtones, and practice hearing the beats between them. De-tune slightly if you need to, to make the “wow, wow, wow, wow” effect easier to notice.

Take a break; don’t overdo it. Your ear will probably fatigue quickly and you’ll be unable to hear the overtones, especially as you experiment more with complex intervals. In the beginning, you should not be surprised if you can focus on these overtones for only a few minutes before it gets hard to pick them out and things sound jumbled together. Rest for a few hours. I would not suggest doing this more than a couple of times a day initially.

The fatigue is real, by the way. As I mentioned previously, being able to hear beats and ignore the richness of the sound to pick out weak overtones is a complex physical, mental, and neurological skill — and there are probably other factors too. I’d be interested in seeing brain scans of an accomplished tuner at work. Lee Flory was not young, and he told me that his audiologist said his hearing had not decayed with age. This surprised the doctor, because he spent his life listening to loud sounds. Lee attributed this to daily training of his hearing, and told me that the ear is like any other part of the body: it can be exercised. According to Lee, if he took even a single day’s break from tuning, his ear lost some of its acuity.

Back to the topic: When you’re ready, pluck a harmonic on the lowest D string (formerly the E string) at the 7th fret, and the A string at the 12th fret, and listen to the beats between them. Again, practice hearing the same overtones (ignoring the base notes) when you strike both open strings at the same time.

When you’ve heard this, you can move on to a 4th. You can strike the harmonic at the 5th fret of the A string and the 7th fret of the D string, for example, and listen to the beats; then practice hearing the same frequencies by just strumming those two open strings together.

As you do all of these exercises, try your best to ignore pitch (highness or lowness) of the notes, and listen only to the fluctuations in volume. In reality you’ll be conscious of both pitch and beats, but this practice will help develop your tuning ear.

Imperfect Intervals and Counting Beats

You may have noticed that intervals in the equal-tempered 12-tone tuning don’t have exactly the simple relationships I listed before. If you look at the table of frequencies above, for example, you’ll see that in steps of the 12th root of 2, E has a frequency of 329.6276Hz, not 330Hz.

Oh no! Was it all a lie? Without these relationships, does tuning fall apart?

Not really. In the equal-tempered tuning, in fact, there is only one perfect interval: the octave. All other intervals are imperfect, or “tempered.”

  • The 5th is a little “narrow” – the higher note in the interval is slightly flat
  • The 4th is a little “wide” – the higher note is sharp
  • The major 3rd is even wider than the 4th

Other intervals are wide or narrow, just depending on where their frequencies fall on the equal-tempered tuning. (In practice, you will rarely or never tune intervals other than octaves, 5ths, 4ths, and 3rds.)

As the pitch of the interval rises, so does the frequency of the beats. The 4th between A110 and the D above it will beat half as fast as the 4th an octave higher.

What this means is that not only do you need to hear beats, but you need to count them. Counting is done in beats per second. It sounds insanely hard at first (how the heck can you count 7.75 beats a second!?) but it will come with practice.

You will need to know how many beats wide or narrow a given interval will be. You can calculate it easily enough, and I’ll show examples later.
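
If you want to compute it yourself now, here is the basic idea in Python (my own sketch): the beat rate of a tempered interval is just the difference between the two overtones that nearly coincide.

    def equal_tempered(base_freq, half_steps):
        """Frequency of a note a given number of half-steps above base_freq."""
        return base_freq * 2 ** (half_steps / 12)

    def beat_rate(lower_freq, upper_freq, lower_overtone, upper_overtone):
        """Beats per second between the nearly coinciding overtones of two notes."""
        return abs(lower_freq * lower_overtone - upper_freq * upper_overtone)

    A110 = 110.0
    D = equal_tempered(A110, 5)            # a 4th above A110
    print(beat_rate(A110, D, 4, 3))        # about 0.5 beats/sec: the 4th is slightly wide

    A220 = 220.0
    D_above = equal_tempered(A220, 5)      # the same 4th an octave higher
    print(beat_rate(A220, D_above, 4, 3))  # about 1.0 beats/sec: twice as fast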

After a while of tuning a given instrument, you’ll just memorize how many beats to count for specific intervals, because as you’ll see, there’s a system for tuning any instrument. You generally don’t need to have every arbitrary interval memorized. You will use only a handful of intervals and you’ll learn their beats.

Tuning The Guitar

With all that theory behind us, we can move on to a tuning system for the guitar.

Let’s list the strings, their frequencies, and some of their overtones.

String   Freq     Overtone_2   Overtone_3   Overtone_4   Overtone_5
======   ======   ==========   ==========   ==========   ==========
E         82.41   164.81       247.22        329.63       412.03
A        110.00   220.00       330.00        440.00       550.00
D        146.83   293.66       440.50        587.33       734.16
G        196.00   392.00       587.99        783.99       979.99
B        246.94   493.88       740.82        987.77      1234.71
E        329.63   659.26       988.88       1318.51      1648.14

Because the open strings of the guitar form 4ths and one 3rd, you can tune the guitar’s strings open, without any frets, using just those intervals. There’s also a double octave from the lowest E to the highest E, but you don’t strictly need to use that except as a check after you’re done.

For convenience, here’s the same table with only the overtones we’ll use.

String   Freq     Overtone_2   Overtone_3   Overtone_4   Overtone_5
======   ======   ==========   ==========   ==========   ==========
E         82.41                247.22        329.63
A        110.00                330.00        440.00
D        146.83                440.50        587.33       734.16
G        196.00                587.99                     979.99
B        246.94                740.82        987.77
E        329.63                988.88

Tuning the A String

The first thing to do is tune one of the strings to a reference pitch. After that, you’ll tune all of the other strings relative to this first one. On the guitar, the most convenient reference pitch is A440, because the open A string is two octaves below at 110Hz.

You’ll need a good-quality A440 tuning fork. I prefer a Wittner for guitar tuning; it’s a good-quality German brand that is compact, so it fits in your guitar case’s pocket, and has a small notch behind the ball at the end of the stem, so it’s easy to hold in your teeth if you prefer that.

Strike the tuning fork lightly with your fingernail, or tap it gently against your knee. Don’t bang it against anything hard or squeeze the tines, or you might damage it and change its pitch. You can hold the tuning fork against the guitar’s soundboard, or let it rest lightly between your teeth so the sound travels through your skull to your ears, and strike the open A string. Tune the A string until the beats disappear completely. Now put away the tuning fork and continue. You won’t adjust the A string after this.

If you don’t have a tuning fork, you can use any other reference pitch, such as the A on a piano, or a digitally produced A440.

Tuning the Low E String

Strike the open low E and A strings together, and tune the E string. Listen to the beating of the overtones at the frequency of the E two octaves higher. If you have trouble hearing it, silence all the strings, then pluck a harmonic on the E string at the 5th fret. Keep that tone in your memory and then sound the two strings together. It’s important to play the notes together, open, simultaneously so that you don’t get confused by pitches. Remember, you’re trying to ignore pitch completely, and get your ear to isolate the sound of the overtone, ignoring everything but its beating.

When correctly tuned, the A string’s overtone will be at 330Hz and the E string’s will be at 329.63Hz, so the interval is 1/3rd of a beat per second wide. That is, you can tune the E string until the beats disappear, and then flatten the low E string very slightly until you hear one beat every three seconds. The result will be a very slow “wwwoooooowww, wwwwoooooowww” beating.

Tuning the D String

Now that the low E and A strings are tuned, strike the open A and D strings together. You’re listening for beats in the high A440 overtone. The A string’s overtone will be at 440Hz, and the D string’s will be at 440.50Hz, so the interval should be ½ beat wide. Tune the D string until the beats disappear, then sharpen the D string slightly until you hear one beat every 2 seconds.

Tuning the G String

Continue by striking the open D and G strings, and listen for the high D overtone’s beating. Again, if you have trouble “finding the note” with your ear, silence everything and strike the D string’s harmonic at the 5th fret. You’re listening for a high D overtone, two octaves higher than the open D string. The overtones will be at 587.33Hz and 587.99Hz, so the interval needs to be 2/3rds of a beat wide. Counting two beats every three seconds is a little harder than the other intervals we’ve used thus far, but it will come with practice. In the beginning, feel free to just give it your best wild guess. As we’ll discuss a little later, striving for perfection is futile anyway.

Tuning the B String

Strike the open G and B strings. The interval between them is a major 3rd, so this one is trickier to hear. A major 3rd’s frequency ratio is approximately 5/4ths, so you’re listening for the 5th overtone of the G string and the 4th overtone of the B string. Because these are higher overtones, they’re not as loud as the ones you’ve been using thus far, and it’s harder to hear.

To isolate the note you need to hear, mute all the strings and then pluck a harmonic on the B string at the 5th fret. The overtone is a B two octaves higher. Search around on the G string near the 4th fret and you’ll find the same note.

The overtones are 979.99Hz and 987.77Hz, so the interval is seven and three-quarters beats wide. This will be tough to count at first, so just aim for something about 8 beats and call it good enough. With time you’ll be able to actually count this, but it will be very helpful at first to use some rules of thumb. For example, you can compare the rhythm of the beating to the syllables in the word “mississippi” spoken twice per second, which is probably about as fast as you can say it back-to-back without pause.

Tune the B string until the beats disappear, then sharpen it 8 beats, more or less.

Tuning the High E String

You’re almost done! Strike the open B and E strings, and listen for the same overtone you just used to tune the G and B strings: a high B. The frequencies are 987.77Hz and 988.88Hz, so the interval is 1.1 beats wide. Sharpen the E string until the high B note beats a little more than once a second.

Testing The Results

Run a couple of quick checks to see whether you got things right. First, check your high E against your low E. They are two octaves apart, so listen to the beating of the high E string. It should be very slow or nonexistent. If there’s a little beating, don’t worry about it. You’ll get better with time, and it’ll never be perfect anyway, for reasons we’ll discuss later.

You can also check the low E against the open B string, and listen for beating at the B note, which is the 3rd overtone of the E string. The B should be very slightly narrow (flat) — theoretically, you should hear about ¼th of a beat.

Also theoretically, you could tune the high B and E strings against the low open E using the same overtones. However, due to imperfections in strings and the slowness of the beating, this is usually much harder to do. As a result, you’ll end up with high strings that don’t sound good together. A general rule of thumb is that it’s easier to hear out-of-tune-ness in notes that are a) closer in pitch and b) higher pitched, so you should generally “tune locally” rather than “tuning at a distance.” If you don’t get the high strings tuned well together, you’ll get really ugly-sounding intervals such as the following:

  • the 5th between your open G string and the D on the 3rd fret of the B string
  • the 5th between the A on the second fret of the G string and the open high E string
  • the octave between your open G string and the G on the 3rd fret of the high E string
  • the octave between your open D string and the D on the 3rd fret of the B string
  • the 5th between the E on the second fret of the D string and the open B string

If those intervals are messed up, things will sound badly discordant. Remember that the 5ths should be slightly narrow, not perfect. But the octaves should be perfect, or very nearly so.

Play a few quick chords to test the results, too. An E major, G major, and B minor are favorites of mine. They have combinations of open and fretted notes that help make it obvious if anything’s a little skewed.

You’re Done!

With time, you’ll be able to run through this tuning system very quickly, and you’ll end up with a guitar that sounds joyously alive in all keys, no matter what chord you play. No more fussing with “this chord sounds good, but that one is awful!” No more trial and error. No more guessing which string is out of tune when something sounds bad. No more game of “tuning whack-a-mole.”

To summarize:

  • Tune the A string with a tuning fork.
  • Tune the low E string 1/3 of a beat wide relative to the A.
  • Tune the D string ½ of a beat wide relative to the A.
  • Tune the G string 2/3 of a beat wide relative to the D.
  • Tune the B string 7 ¾ beats wide relative to the G.
  • Tune the high E string just over 1 beat wide relative to the B.
  • Cross-check the low and high E strings, and play a few chords.

This can be done in a few seconds per string.
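
If you’d like to sanity-check those numbers, a short Python sketch (mine, using the open-string frequencies from the table above) recomputes each beat rate directly:

    strings = {"E2": 82.41, "A": 110.00, "D": 146.83, "G": 196.00, "B": 246.94, "E4": 329.63}

    # (lower string, upper string, overtone of the lower, overtone of the upper)
    checks = [
        ("E2", "A",  4, 3),   # low E vs A: 4th, about 1/3 beat wide
        ("A",  "D",  4, 3),   # A vs D: 4th, about 1/2 beat wide
        ("D",  "G",  4, 3),   # D vs G: 4th, about 2/3 beat wide
        ("G",  "B",  5, 4),   # G vs B: major 3rd, about 7 3/4 beats wide
        ("B",  "E4", 4, 3),   # B vs high E: 4th, about 1.1 beats wide
    ]
    for lower, upper, n, m in checks:
        beats = abs(strings[lower] * n - strings[upper] * m)
        print(f"{lower}-{upper}: {beats:.2f} beats/sec")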

If you compare your results to what you’ll get from a digital tuner, you’ll find that with practice, your ear is much better. It’s very hard to tune within a Hz or so with a digital tuner, in part because the indicators are hard to read. What you’ll get with a digital tuner is that most strings end up pretty close to their correct frequencies. This is a lot better than the ad-hoc tuning by trial-and-error you might have been accustomed to doing, because that method results in some intervals being tuned to sound good but others badly discordant. The usual scenario I see is that someone’s B string is in good shape, but the G and the E are out of tune. The guitar player then tunes the B string relative to the out-of-tune E and G, and then everything sounds awful. This is because the guitarist had no frame of reference for understanding which strings were out of tune in which directions.

But when you tune by listening to beats, and get good at it, you’ll be able to tune strings to within a fraction of a cycle per second of what they should be. Your results will absolutely be better than a digital tuner.

I don’t mean to dismiss digital tuners. They’re very useful when you’re in a noisy place, or when you’re tuning things like electric guitars, which have distortion that buries overtones in noise. But if you learn to tune by hearing beats, you’ll be the better for it, and you’ll never regret it, I promise. By the way, if you have an Android smartphone, I’ve had pretty good results with the gStrings app.

Advanced Magic

If you do the math on higher overtones, you’ll notice a few other interesting intervals between open strings. As your ear sharpens, you’ll be able to hear these, and use them to cross-check various combinations of strings. This can be useful because as you get better at hearing overtones and beats, you’ll probably start to become a bit of a perfectionist, and you won’t be happy unless particular intervals (such as the 5ths and octaves mentioned just above) sound good. Here they are:

  • Open A String to Open B String. The 9th overtone of the open A string is a high B note at 990Hz, and the 4th overtone of the open B is a high B at 987.77Hz. If you can hear this high note, you should hear it beating just over twice per second. The interval between the A and B strings is a major 9th (an octave plus a whole step), which should be slightly narrow. Thus, if you tune the B until the beating disappears, you should then flatten it two beats.
  • Open D String to Open E String. This is also a major 9th interval. You’re listening for a very high E note, at 1321.5Hz on the D string, and 1318.5Hz on the E string, which is 3 beats narrow.
  • Open D String to Open B String. The 5th overtone of the D string is similar to the 3rd overtone of the B string. This interval is about 6 and 2/3 beats wide. This is a bit hard to hear at first, but you’re listening for a high F-sharp.

Systems for Tuning Arbitrary Instruments

The guitar is a fairly simple instrument to tune, because it has only 6 strings, and 4ths are an easy interval to tune. The inclusion of a major 3rd makes it a little harder, but not much.

It is more complicated, and requires more practice, to tune instruments with more strings. The most general approach is to choose an octave, and to tune all the notes within it. Then you extend the tuning up and down the range as needed. For example, to tune the piano you first tune all the notes within a C-to-C octave (piano tuners typically use a large middle-C tuning fork).

Once you have your first octave tuned, the rest is simple. Each note is tuned to the octave below it or above it. But getting that first octave is a bit tricky.

There are two very common systems of tuning: fourths and fifths, and thirds and fifths. As you may know, the cycle of fifths will cycle you through every note in the 12-note scale. You can cycle through the notes in various ways, however.

The system of thirds and fifths proceeds from middle C up a fifth to G, down a third to E-flat, up a fifth to B-flat, and so on. The system of fourths and fifths goes from C up a fifth to G, down a fourth to D, and so on.

All you need to do is calculate the beats in the various intervals and be able to count them. The piano tuners I’ve known prefer thirds and fifths because if there are imperfections in the thirds, especially if they’re not as wide as they should be, it sounds truly awful. Lively-sounding thirds are important; fourths and fifths are nearly perfect, and should sound quite pure, but a third is a complex interval with a lot of different things going on. Fourths and fifths also beat slowly enough that it’s easy to misjudge and get an error that accumulates as you go through the 12 notes. Checking the tuning with thirds helps avoid this.
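
As a concrete illustration (my numbers, computed from equal-tempered frequencies rather than taken from any tuner’s chart), here are beat rates for a few intervals near middle C, the kind you would memorize for a thirds-and-fifths sequence:

    def freq(half_steps_from_c4, c4=261.63):
        """Equal-tempered frequency relative to middle C."""
        return c4 * 2 ** (half_steps_from_c4 / 12)

    C4, Eflat4, E4, G4 = freq(0), freq(3), freq(4), freq(7)

    print(f"C4-G4 fifth:         {abs(3 * C4 - 2 * G4):.2f} beats/sec (narrow)")
    print(f"C4-E4 major 3rd:     {abs(5 * C4 - 4 * E4):.2f} beats/sec (wide)")
    print(f"Eflat4-G4 major 3rd: {abs(5 * Eflat4 - 4 * G4):.2f} beats/sec (wide)")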

Tuning a Hammered Dulcimer

I’ve built several many-stringed instruments, including a couple of hammered dulcimers. My first was a home woodworking project with some two-by-four lumber, based on plans from a book by Phillip Mason I found at the library and decided to pick up on a whim. For a homebuilt instrument, it sounded great, and building an instrument like this is something I highly recommend.

Later I designed and built a second one, pictured below. Pardon the dust!

Tuning this dulcimer takes a while. I start with an octave on the bass course. Dulcimers can have many different tunings; this one follows the typical tuning of traditional dulcimers, which is essentially a set of changing keys that cycle backwards by fifths as you climb the scale. Starting at G, for example, you have a C major scale up to the next G, centered around middle C. But the next B is B-flat instead of B-natural, so there’s an F major scale overlapping with the top of the C major, and so on:

G A B C D E F G A B-flat C D...

It’s easy to tune this instrument in fourths and fifths because of the way its scales are laid out. If I do that, however, I find that I have ugly-sounding thirds more often than not. So I’ll tune by combinations of fifths, fourths, and thirds:

G A B C D E F G A B-flat C D...
^-------------^                 (up an octave)
      ^-------^                 (down a fifth)
      ^---^                     (up a third)
  ^-------^                     (down a fifth)

And so on. In addition to using thirds where I can (G-B, C-E), I’ll check my fifths and fourths against each other. If you do the math, you’ll notice that the fourth from G to C is exactly as wide as the fifth from C to G again is narrow. (This is a general rule of fourths and fifths. Another rule is that the fourth at the top of the octave beats twice as fast as the fifth at the bottom; so G-D beats half as fast as D-G.)

When I’m done with this reference octave, I’ll extend it up the entire bass course, adjusting for B-flat by tuning it relative to F, and checking any new thirds that I encounter as I climb the scale. And then I’ll extend that over to the right-hand side of the treble course. I do not use the left-hand (high) side of the treble course to tune, because its notes are inaccurate depending on the placement of the bridge.

With a little math (spreadsheets are nice), and some practice, you can find a quick way to tune almost any instrument, along with cross-checks to help prevent skew as you go.

Tuning a Harp

Another instrument I built (this time with my grandfather) is a simplified replica of the Scottish wire-strung Queen Mary harp. This historical instrument might have been designed for gold and silver strings, according to Ann Heymann’s research. In any case, it is quite difficult to tune with bronze or brass strings. It is “low-headed” and would need a much higher head to work well with bronze or brass.

Tuning this harp is quite similar to the hammered dulcimer, although it is in a single key, so there’s no need to adjust to key changes as you climb the scale. A simple reference octave is all you need, and then it’s just a matter of extending it. I have never tuned a concert harp, but I imagine it’s more involved.

Tangent: I first discovered the wire-strung harp in 1988, when I heard Patrick Ball’s first volume of Turlough O’Carolan’s music. If you have not listened to these recordings, do yourself a favor and at least preview them on Amazon. All these years later, I still listen to Patrick Ball’s music often. His newest recording, The Wood of Morois, is just stunning. I corresponded with Patrick while planning to build my harp, and he put me in touch with master harpmaker Jay Witcher, and his own role model, Ann Heymann, who was responsible for reinventing the lost techniques of playing wire-strung harps. Her recordings are a little hard to find in music stores, but are worth it. You can buy them from her websites http://www.clairseach.com/, http://www.annheymann.com/, and http://www.harpofgold.net/. If you’re interested in learning to play wire-strung harp, her book is one of the main written sources. There are a variety of magazines covering the harp renaissance in the latter part of the 20th century, and they contain much valuable additional material.

Beyond Tuning Theory: The Real World

Although simple math can compute the theoretically correct frequencies of notes and their overtones, and thus the beats of various intervals, in practice a number of factors make things more complicated and interesting. In fact, the math up until now has been of the “frictionless plane” variety. For those who are interested, I’ll dig deeper into these nuances.

The nuances and deviations from perfect theory are the main reasons why a) it’s impossible to tune anything perfectly and b) an instrument that’s tuned skillfully by ear sounds glorious, whereas an instrument tuned digitally can sound lifeless.

Harmonics, Overtones, and Partials

I was careful to use the term “overtone” most of the time previously. In theory, a string vibrates at its fundamental frequency, and then it has harmonic overtones at twice that frequency, three times, and so on.

However, that’s not what happens in practice, because theory only applies to strings that have no stiffness. The stiffness of the string causes its overtones to vibrate at slightly higher frequencies than you’d expect. For this reason, these overtones aren’t true harmonics. This is called inharmonicity, and inharmonic overtones are called partials to distinguish them from the purely harmonic overtones of an instrument like a flute, which doesn’t exhibit the same effect.

You might think that this inharmonicity is a bad thing, but it’s not. Common tones with a great deal of inharmonicity are bells (which often have so much inharmonicity that you can hear the pitches of their partials are too high) and various types of chimes. I keep a little “zenergy” chime near my morning meditation table because its bright tones focus my attention. I haven’t analyzed its spectrum, but because it is made with thick bars of aluminum, I’m willing to bet that it has partials that are wildly inharmonic. Yet it sounds pure and clear.

Much of the richness and liveliness of a string’s sound is precisely because of the “stretched” overtones. Many people compare Patrick Ball’s brass-strung wire harp to the sound of bells, and say it’s “pure.” It may sound pure, but pure-sounding is not simple-sounding. Its tones are complex and highly inharmonic, which is why it sounds like a bell.

In fact, if you digitally alter a piano’s overtones to correct the stretching, you get something that sounds like an organ, not a piano. This is one of the reasons that pianos tuned with digital tuners often sound like something scraped from the bottom of a pond.

Some digital tuners claim to compensate for inharmonicity, but in reality each instrument and its strings are unique and will be inharmonic in different ways.

Some practical consequences when tuning by listening to beats:

  • Don’t listen to higher partials while tuning. When tuning an octave, for example, you should ignore the beating of partials 2 octaves up. This is actually quite difficult to do and requires a well-developed ear. The reason is that higher partials will beat even when the octave is perfect, and they beat more rapidly and more obviously than the octave. Tuning a perfect octave requires the ability to hear very subtle, very gradual beats while blocking out distractions. This is also why I said not to worry if your low E string and high E string beat slightly. When tuned as well as possible, there will probably be a little bit of beating.
  • You might need to ignore higher partials in other intervals as well.
  • You might need to adjust your tuning for stretching caused by inharmonicity. In practice, for example, most guitars need to be tuned to slightly faster beats than you’d expect from pure theory.
  • Cross-checking your results with more complex intervals (especially thirds) can help balance the stretching better, and make a more pleasing-sounding tuning.
  • You might find that when using the “advanced tricks” I mentioned for the guitar, the open intervals such as minor 7ths will beat at different rates than you’d predict mathematically. However, once you are comfortable tuning your guitar so it sounds good, you’ll learn how fast those intervals should beat and it’ll be a great cross-reference for you.

Sympathetic and False Beats

It’s often very helpful to mute strings while you’re tuning other strings. The reason is that the strings you’re tuning will set up sympathetic vibrations in other strings that have similar overtones, and this can distract you.

When tuning the guitar, this generally isn’t much of a problem. However, be careful that when you tune the low E and A strings you don’t get distracted by vibrations from the high E string.

When tuning other instruments such as a hammered dulcimer or harp, small felt or rubber wedges (with wire handles if possible) are invaluable. If you don’t have these, you can use small loops of cloth.

In addition to distraction from sympathetic vibrations, strings can beat alone, when no other note is sounding. This is called a false beat. It’s usually caused by a flaw in the string itself, such as an imperfection in the wire or a spot of rust. This is a more difficult problem, because you can’t just make it go away. Instead, you will often have to nudge the tuning around a little here, a little there, to make it sound the best you can overall, given that there will be spurious beats no matter what. False beats will challenge your ear greatly, too.

In a guitar, false beats might signal that it’s time for a new set of strings. In a piano or other instrument, strings can be expensive to replace, and new strings take a while to settle in, so it’s often better to just leave it alone.

Imperfect Frets, Strings, Bridges and Nuts

I’ve never played a guitar with perfect frets. The reality is that every note you fret will be slightly out of tune, and one goal of tuning is to avoid any particular combination of bad intervals that sounds particularly horrible.

This is why it’s helpful to play at least a few chords after tuning. If you tune a particular instrument often you’ll learn the slight adjustments needed to make things sound as good as possible. On my main guitar, for example, the B string needs to be slightly sharp so that a D sounds better.

It’s not only the frets, but the nut (the zeroth fret) and the bridge (under the right hand) that matter. Sometimes the neck needs to be adjusted as well. A competent guitar repairman should be able to adjust the action if needed.

Finally, the weight and manufacture of the strings makes a difference. My main guitar and its frets and bridge sound better and more accurate with medium-weight Martin bronze-wound strings than other strings I’ve tried. As your ear improves, you’ll notice subtleties like this.

New Strings

New strings (or wires) will take some time to stretch and settle in so they stay in tune. You can shorten this time by playing vigorously and stretching the strings, bending them gently. Be careful, however, not to be rough with the strings. If you kink them or strain them past their elastic point, you’ll end up with strings that have false beats, exaggerated inharmonicity, or different densities along some lengths of the string, which will make it seem like your frets are wrong in strange ways.

The Instrument Flexes and Changes

If an instrument is especially out of tune, the first strings you tune will become slightly out of tune as you change the tension on the rest of the instrument. The best remedy I can offer for this is to do a quick approximate tuning without caring much about accuracy. Follow this up with a second, more careful tuning.

This was especially a problem with my first hammered dulcimer, and is very noticeable with my harp, which flexes and changes a lot as it is tuned. My second hammered dulcimer has a ¾ inch birch plywood back and internal reinforcements, so it’s very stable. On the downside, it’s heavy!

Temperature and humidity play a large role, too. All of the materials in an instrument respond in different ways to changes in temperature and humidity. If you have a piano, you’re well advised to keep it in a climate-controlled room. If you’re a serious pianist you already know much more than I do about this topic.

Friction and Torque in Tuning Pins and Bridges

For guitarists, it’s important to make sure that your nut (the zeroth fret) doesn’t pinch the string and cause it to move in jerks and starts, or to have extra tension built up between the nut and the tuning peg itself. If this happens, you can rub a pencil in the groove where the string rides. The graphite in the pencil is a natural lubricant that can help avoid this problem.

Of course, you should also make sure that your tuning pegs and their machinery are smooth and well lubricated. If there’s excessive slop due to wear-and-tear or cheap machinery, that will be an endless source of frustration for you.

On instruments such as pianos, hammered dulcimers, and harps, it’s important to know how to “set” the tuning pin. While tuning the string upwards, you’ll create torque on the pin, twisting it in the hole. The wood fibers holding it in place will also be braced in a position that can “flip” downwards. If you just leave the pin like this, it will soon wiggle itself back to its normal state, and even beyond that due to the tension the wire places on the pin. As a result, you need to practice tuning the note slightly higher than needed, and then de-tuning it, knocking it down to the desired pitch with a light jerk and leaving it in a state of equilibrium.

This technique is also useful in guitars and other stringed instruments, but each type of tuning machine has its own particularities. The main point to remember is that if you don’t leave things in a state of equilibrium and stability, they’ll find one soon enough, de-tuning the instrument in the process.

References and Further Reading

I tried to find the book from which I studied tuning as a child, but I can’t anymore. I thought it was an old Dover edition. The Dover book on tuning that I can find is not the one I remember.

You can find a little bit of information at various places online. One site with interesting information is Historical Tuning of Keyboard Instruments by Robert Chuckrow. I looked around on Wikipedia but didn’t find much of use. Please suggest further resources in the comments.

In this post I discussed the equally-tempered tuning, but there are many others. The study of them and their math, and the cultures and musical histories related to them, is fascinating. Next time you hear bagpipes, or a non-Western instrument, pay attention to the tuning. Is it tempered? Are there perfect intervals other than the octave? Which ones?

Listening to windchimes is another interesting exercise. Are the chimes harmonic or do they have inharmonicity? What scales and tunings do they use? What are the effects? Woodstock chimes use many unique scales and tunings. Many of their chimes combine notes in complex ways that result in no beating between some or all of the tones. Music of the Spheres also makes stunning chimes in a variety of scales and tunings.

As I mentioned, spreadsheets can be very helpful in computing the relationships between various notes and their overtones. I’ve made a small online spreadsheet that contains some of the computations I used to produce this blog post.
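
If you prefer code to spreadsheets, the same computations are only a few lines. Here is a minimal sketch in Go: it derives equal-tempered frequencies from A440 and prints the theoretical beat rates between nearly-coincident partials for a few open-string guitar intervals. The particular strings and partial pairs are just illustrative choices, not a tuning prescription.

```go
package main

import (
	"fmt"
	"math"
)

// freq returns the equal-tempered frequency of a note that lies `semis`
// semitones away from A440 (negative means below A440).
func freq(semis float64) float64 {
	return 440.0 * math.Pow(2, semis/12.0)
}

// beat is the beat rate between partial m of f1 and partial n of f2.
func beat(f1 float64, m int, f2 float64, n int) float64 {
	return math.Abs(f1*float64(m) - f2*float64(n))
}

func main() {
	e2 := freq(-29) // low E string, ~82.41 Hz
	a2 := freq(-24) // A string, 110.00 Hz
	g3 := freq(-14) // G string, ~196.00 Hz
	b3 := freq(-10) // B string, ~246.94 Hz
	e4 := freq(-5)  // high E string, ~329.63 Hz

	// Perfect fourth E2-A2: 4th partial of E2 vs 3rd partial of A2 (slow beat).
	fmt.Printf("E2-A2 fourth:        %.2f beats/sec\n", beat(e2, 4, a2, 3))
	// Major third G3-B3: 5th partial of G3 vs 4th partial of B3 (fast beat).
	fmt.Printf("G3-B3 major third:   %.2f beats/sec\n", beat(g3, 5, b3, 4))
	// Double octave E2-E4: beatless in pure equal-temperament theory, so any
	// beating you hear between the two E strings comes from inharmonicity.
	fmt.Printf("E2-E4 double octave: %.2f beats/sec\n", beat(e2, 4, e4, 1))
}
```

Notice how the equal-tempered major third beats several times per second while the double octave is theoretically beatless; any beating you actually hear between the two E strings is inharmonicity at work.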

Let me know if you can suggest any other references or related books, music, or links.

Enjoy your beautifully tuned guitar or other instrument, and most of all, enjoy the process of learning to tune and listen! I hope it enriches your appreciation and pleasure in listening to music.

Categories: MySQL

A review of Bose, Sony, and Sennheiser noise-cancelling headphones

Xaprb, home of innotop - Thu, 2014-01-16 00:00

I’ve used active noise-cancelling headphones for over ten years now, and have owned several pairs of Bose, one of Sony, and most recently a pair of Sennheiser headphones. The Sennheisers are my favorites. I thought I’d write down why I’ve gone through so many sets of cans and what I like and dislike about them.

Bose QuietComfort 15 Acoustic Noise Cancelling Headphones

I’m sure you’re familiar with Bose QuietComfort headphones. They’re the iconic “best-in-class” noise-cancelling headphones, the ones you see everywhere. Yet, after owning several pairs (beginning with Quiet Comfort II in 2003), I decided I’m not happy with them and won’t buy them anymore. Why not?

  • They’re not very good quality. I’ve worn out two pairs and opted to sell the third pair that Bose sent me as a replacement. Various problems occurred, including torn speakers that buzzed and grated. I just got tired of sending them back to Bose for servicing.
  • They’re more expensive than I think they’re worth, especially given the cheap components used.
  • They don’t sound bad – but to my ears they still have the classic Bose fairy-dust processing, which sounds rich and pleasant at first but then fatigues me.
  • They produce a sensation of suction on the eardrums that becomes uncomfortable over long periods of time.
  • They can’t be used in non-cancelling mode. In other words, if the battery is dead, they’re unusable.
  • On a purely personal note, I think Bose crosses the line into greed and jealousy. I know this in part because I used to work at Crutchfield, and saw quite a bit of interaction with Bose. As an individual – well, try selling a pair of these on eBay, and you’ll see what I mean. I had to jump through all kinds of hoops after my first listing was cancelled for using a stock photo that eBay themselves suggested and provided in the listing wizard. Here is the information the take-down notice directed me to.

On the plus side, the fit is very comfortable physically, they cancel noise very well, and they’re smaller than some other noise-cancelling headphones. Also on the plus side, every time I’ve sent a pair in for servicing, Bose has just charged me $100 and sent me a new pair.

Sony MDR-NC200D

When I sent my last pair of Bose in for servicing, they replaced them with a factory-sealed pair of new ones in the box, and I decided to sell them on eBay and buy a set of Sony MDR-NC200D headphones, which cost about $100 less than new Bose headphones at the time. I read online reviews and thought it was worth a try.

First, the good points. The Sonys are more compact even than the Bose, although as I recall they’re a little heavier. And the noise cancellation works quite well. The passive noise blocking (muffling) is in itself quite good. You can just put them on without even turning on the switch, and block a lot of ambient noise. The sound quality is also quite good, although there is a slight hiss when noise cancellation is enabled. Active cancellation is good, but not as good as the Bose.

However, it wasn’t long before I realized I couldn’t keep them. The Sonys sit on the ear, and don’t enclose the ear and sit against the skull as the Bose do. They’re on-the-ear, not over-the-ear. Although this doesn’t feel bad at first, in about 20 minutes it starts to hurt. After half an hour it’s genuinely painful. This may not be your experience, but my ears just start to hurt after being pressed against my head for a little while.

I had to sell the Sonys on eBay too. My last stop was the Sennheisers.

Sennheiser PXC 450 NoiseGard Active Noise-Canceling Headphones

The Sennheiser PXC 450 headphones are midway in price between the Bose and the Sony: a little less expensive than the Bose. I’ve had them a week or so and I’m very happy with them so far.

This is not the first pair of Sennheisers I’ve owned. I’ve had a pair of open-air higher-end Sennheisers for over a decade. I absolutely love them, so you can consider me a Sennheiser snob to some extent.

I’m pleased to report that the PXC 450s are Sennheisers through and through. They have amazing sound, and the big cups fit comfortably around my ears. They are a little heavier than my other Sennheisers, but still a pleasure to wear.

The nice thing is that not only does noise cancellation work very well (on par with Bose’s, I’d say), but there is no sensation of being underwater with pressure or suction on the eardrums. Turn on the noise cancellation switch and the noise just vanishes, but there’s no strange feeling as a result. Also, these headphones can work in passive mode, with noise cancellation off, and don’t need a battery to work.

On the downside, if you want to travel with them, they’re a little bigger than the Bose. However, I’ve travelled with the Bose headphones several times, and honestly I find even them too large to be convenient. As a result, I don’t use noise-cancelling headphones for travel.

Another slight downside is that the earcups aren’t completely “empty” inside. There are some caged-over protrusions with the machinery inside. Depending on the shape of your ears, these might brush your ears if you move your head. I find that if I don’t place the headphones in the right spot on my head, they do touch my ears every now and then.

Summary

After owning several pairs of top-rated noise-cancelling headphones, I think the Sennheisers are the clear winners in price, quality, comfort, and sound. Your mileage may vary.

Categories: MySQL

Xaprb now uses Hugo

Xaprb, home of innotop - Wed, 2014-01-15 00:00

I’ve switched this blog from Wordpress to Hugo. If you see any broken links or other problems, let me know. I’ll re-enable comments and other features in the coming days.

Why not Wordpress? I’ve used Wordpress since very early days, but I’ve had my fill of security problems, the need to worry about whether a database is up and available, backups, plugin compatibility problems, upgrades, and performance issues. In fact, while converting the content from Wordpress to Markdown, I found a half-dozen pages that had been hacked by some link-farm since around 2007. This wasn’t the first such problem I’d had; it was merely the only one I hadn’t detected and fixed. And I’ve been really diligent with Wordpress security; I have done things like changing my admin username and customizing my .htaccess file to block common attack vectors, in addition to the usual “lockdown” measures that one takes with Wordpress.

In contrast to Wordpress or other CMSes that use a database, static content is secure, fast, and worry-free. I’m particularly happy that my content is all in Markdown format now. Even if I make another change in the future, the content is now mostly well-structured and easy to transform as desired. (There are some pages and articles that didn’t convert so well, but I will clean them up later.)

Why Hugo? There are lots of static site generators. Good ones include Octopress and Jekyll, and I’ve used those. However, they come with some of their own annoyances: dependencies, the need to install Ruby and so on, and particularly bothersome for this blog, performance issues. Octopress ran my CPU fan at top speed for about 8 minutes to render this blog.

Hugo is written in Go, so it has zero dependencies (a single binary) and is fast. It renders this blog in a couple of seconds. That’s fast enough to run it in server mode, hugo server -w, and I can just alt-tab back and forth between my writing and my browser to preview my changes. By the time I’ve tabbed over, the changes are ready to view.

Hugo isn’t perfect. For example, it lacks a couple of features that are present in Octopress or Jekyll. But it’s more than good enough for my needs, and I intend to contribute some improvements to it if I get time. I believe it has the potential to be a leading static site/blog generator going forward. It’s already close to a complete replacement for something like Jekyll.

Categories: MySQL

Immutability, MVCC, and garbage collection

Xaprb, home of innotop - Sat, 2013-12-28 00:00

Not too long ago I attended a talk about a database called Datomic. My overall impressions of Datomic were pretty negative, but this blog post isn’t about that. This is about one of the things the speaker referenced a lot: immutability and its benefits. I hope to illustrate, if only sketchily, why a lot of sophisticated databases are actually leaps and bounds beyond the simplistic design of such immutable databases. This is in direct contradiction to what proponents of Datomic-like systems would have you believe; they’d tell you that their immutable database implementations are advanced. Reality is not so clear-cut.

Datomic and Immutability

The Datomic-in-a-nutshell is that it (apparently) uses an append-only B-tree to record data, and never updates any data after it’s written. I say “apparently” because the speaker didn’t know what an append-only B-tree was, but his detailed description matched AOBTs perfectly. Why is this a big deal? Immutable data confers a lot of nice benefits. Here’s an incomplete summary:

  • It’s more cacheable.
  • It’s easier to reason about.
  • It’s less likely to get corrupted from bugs and other problems.
  • You can rewind history and view the state at any point in the past, by using an “old” root for the tree.
  • Backups are simple: just copy the file, no need to take the database offline. In fact, you can do continuous backups.
  • Replication is simple and fast.
  • Crash recovery is simple and fast.
  • It’s easier to build a reliable system on unreliable components with immutability.

In general, immutability results in a lot of nice, elegant properties that just feel wonderful. But this is supposed to be the short version.
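
To make the append-only idea concrete, here is a deliberately tiny sketch in Go. It is not Datomic’s (or anyone’s) real implementation; it just shows the core property: every update appends a new fact, old transaction IDs remain readable as snapshots, and the log never shrinks on its own.

```go
package main

import "fmt"

// fact is one immutable version of an entity's attribute.
type fact struct {
	entity, attr, value string
	tx                  int // monotonically increasing transaction ID
}

// store is an append-only log of facts; nothing is ever updated in place.
type store struct {
	log []fact
	tx  int
}

// assert appends a new fact and returns the transaction ID that wrote it.
func (s *store) assert(entity, attr, value string) int {
	s.tx++
	s.log = append(s.log, fact{entity, attr, value, s.tx})
	return s.tx
}

// asOf returns the value of entity/attr as it stood at transaction asOfTx,
// by scanning for the newest matching fact that is not newer than asOfTx.
func (s *store) asOf(entity, attr string, asOfTx int) (string, bool) {
	for i := len(s.log) - 1; i >= 0; i-- {
		f := s.log[i]
		if f.entity == entity && f.attr == attr && f.tx <= asOfTx {
			return f.value, true
		}
	}
	return "", false
}

func main() {
	s := &store{}
	t1 := s.assert("user:1", "email", "old@example.com")
	t2 := s.assert("user:1", "email", "new@example.com")

	v1, _ := s.asOf("user:1", "email", t1)
	v2, _ := s.asOf("user:1", "email", t2)
	fmt.Println(v1, v2)                   // old@example.com new@example.com
	fmt.Println("log size:", len(s.log))  // grows forever without compaction
}
```

That last line is the seed of the trouble discussed below: nothing in this design ever reclaims space.
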
Prior Art

Datomic is not revolutionary in this sense. I have seen at least two other databases architected similarly. Their creators waxed eloquent about many of the same benefits. In fact, in 2009 and 2010, you could have listened to talks from the architects of RethinkDB, and if you just searched and replaced “RethinkDB” with “Datomic” you could have practically interchanged the talks. The same is true of CouchDB. Just to list a few links to RethinkDB’s history: 1, 2, 3.

That last one links to Accountants Don’t Use Erasers, a blog post that brought append-only storage into the minds of many people at the time.

Beyond databases, don’t forget about filesystems, such as ZFS for example. Many of the same design techniques are employed here.

Back to RethinkDB. Strangely, around 2011 or so, nobody was talking about its append-only design anymore. What happened?

Append-Only Blues

Immutability, it turns out, has costs. High costs. Wait a bit, and I’ll explain how those costs are paid by lots of databases that don’t build so heavily around immutability, too.

Even in 2010, Slava Akhmechet’s tone was changing. He’d begin his talks singing append-only immutability to the heavens, and then admit that implementation details were starting to get really hard. It turns out that there are a few key problems with append-only, immutable data structures.

The first is that space usage grows forever. Logically, people insert facts, and then update the database with new facts. Physically, if what you’re doing is just recording newer facts that obsolete old ones, then you end up with outdated rows. It may feel nice to be able to access those old facts, but the reality is most people don’t want that, and don’t want to pay the cost (infinitely growing storage) for it.

The second is fragmentation. If entities are made of related facts, and some facts are updated but others aren’t, then as the database grows and new facts are recorded, an entity ends up being scattered widely over a lot of storage. This gets slow, even on SSDs with fast random access.

The last is that a data structure or algorithm that’s elegant and pure, but has one or more worst cases, will fall apart rather violently in real-world usage. That’s because real-world usage is much more diverse than you’d suspect. A database that has a “tiny worst-case scenario” will end up hitting that worst-case behavior for something rather more than a tiny fraction of its users; probably a significant majority. An easy example in a different domain is sort algorithms. Nobody implements straightforward best-performance-most-of-the-time sort algorithms because if they do, things go to hell in a handbasket rather quickly. Databases end up with similar hard cases to handle.

There are more problems, many of them much harder to talk about and understand (dealing with concurrency, for example), but these are the biggest, most obvious ones I’ve seen.

As a result, you can see RethinkDB quickly putting append-only, immutable design behind them. They stopped talking and writing about it. Their whitepaper, “Rethinking Database Storage”, is gone from their website (rethinkdb.com/papers/whitepaper.pdf) but you can get it from the wayback machine.

Reality sunk in and they had to move on from elegant theories to the bitterness of solving real-world problems. Whenever you hear about a new database, remember this: this shit is really, really, really hard. It typically takes many years for a database or storage engine to become production-ready in the real world.

This blog post isn’t about RethinkDB, though. I’m just using their evolution over time as an example of what happens when theory meets reality.

The CouchDB Problem

Around the same time as RethinkDB, a new NoSQL database called CouchDB was built on many of the same premises. In fact, I even blogged a quick overview of it as it started to become commercialized: A gentle introduction to CouchDB for relational practitioners.

CouchDB had so many benefits from using immutability. MVCC (multi-version concurrency control), instant backup and recovery, crash-only design. But the big thing everyone complained about was… compaction. CouchDB became a little bit legendary for compaction.

You see, CouchDB’s files would grow forever (duh!) and you’d fill up your disks if you didn’t do something about it. What could you do about it? CouchDB’s answer was that you would periodically save a complete new database, without old versions of documents that had been obsoleted. It’s a rewrite-the-whole-database process. The most obvious problem with this was that you had to reserve twice as much disk space as you needed for your database, because you needed enough space to write a new copy. If your disk got too full, compaction would fail because there wasn’t space for two copies.

And if you were writing into your database too fast, compaction would never catch up with the writes. And there were a host of other problems that could potentially happen.

Datomic has all of these problems too, up to and including stop-the-world blocking of writes (which in my book is complete unavailability of the database).

ACID MVCC Relational Databases

It turns out that there is a class of database systems that has long been aware of the problems with all three of the databases I’ve mentioned so far. Oracle, SQL Server, MySQL (InnoDB), and PostgreSQL all have arrived at designs that share some properties in common. These characteristics go a long ways towards satisfying the needs of general-purpose database storage and retrieval in very wide ranges of use cases, with excellent performance under mixed workloads and relatively few and rare worst-case behaviors. (That last point is debatable, depending on your workload.)

The properties are ACID transactions with multi-version concurrency control (MVCC). The relational aspect is ancillary. You could build these properties in a variety of non-SQL, non-relational databases. It just so happens that the databases that have been around longer than most, and are more mature and sophisticated, are mostly relational. That’s why these design choices and characteristics show up in relational databases – no other reason as far as I know.

Multi-version concurrency control lets database users see a consistent state of the database at a point in time, even as the database accepts changes from other users concurrently.

How is this done? By keeping old versions of rows. These databases operate roughly as follows: when a row is updated, an old version is kept if there’s any transaction that still needs to see it. When the old versions aren’t needed any more, they’re purged. Implementation details and terminology vary. I can speak most directly about InnoDB, which never updates a row in the primary key (which is the table itself). Instead, a new row is written, and the database is made to recognize this as the “current” state of the world. Old row versions are kept in a history list; access to this is slower than access to the primary key. Thus, the current state of the database is optimized to be the fastest to access.
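
Here is a deliberately simplified sketch in Go of the visibility-and-purge idea. It is not InnoDB’s actual algorithm or data structures (real implementations track committed versus active transactions, keep undo records, and so on); it only shows that a reader sees the newest version at or before its snapshot, and that purge trims versions no active snapshot can still see.

```go
package main

import "fmt"

// version is one MVCC version of a row, tagged with the ID of the
// transaction that created it.
type version struct {
	value     string
	createdBy int
}

// row keeps its versions newest-first, like a history chain.
type row struct {
	versions []version
}

// visible returns the newest version whose creating transaction is part of
// the reader's snapshot (simplified here to "created at or before snapshotTx").
func (r *row) visible(snapshotTx int) (string, bool) {
	for _, v := range r.versions {
		if v.createdBy <= snapshotTx {
			return v.value, true
		}
	}
	return "", false
}

// purge drops versions that no active snapshot can see anymore: everything
// older than the newest version still visible to the oldest reader.
func (r *row) purge(oldestSnapshotTx int) {
	for i, v := range r.versions {
		if v.createdBy <= oldestSnapshotTx {
			r.versions = r.versions[:i+1]
			return
		}
	}
}

func main() {
	r := &row{versions: []version{
		{"v3", 30}, {"v2", 20}, {"v1", 10}, // newest first
	}}
	fmt.Println(r.visible(25)) // a snapshot taken at tx 25 sees "v2"
	fmt.Println(r.visible(35)) // a later snapshot sees "v3"

	r.purge(25)                  // oldest active snapshot is at tx 25
	fmt.Println(len(r.versions)) // v1 is gone; v3 and v2 remain
}
```

The point is that old versions hang around only as long as some reader might still need them, and a background purge continuously trims the history, instead of a periodic rewrite of everything.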

Now, about ACID transactions. Managing the write-ahead log and flushing dirty pages to disk is one of the most complex and hardest things an ACID database does, in my opinion. The process of managing the log and dirty pages in memory is called checkpointing.

Write-ahead logging and ACID, caching, MVCC, and old-version-purge are often intertwined to some extent, for implementation reasons. This is a very complex topic and entire books (huge books!) have been written about it.

What’s happening in such a database is a combination of short-term immutability, read and write optimizations to save and/or coalesce redundant work, and continuous “compaction” and reuse of disk space to stabilize disk usage and avoid infinite growth. Doing these things a little bit at a time allows the database to gradually take care of business without needing to stop the world. Unfortunately, this is incredibly hard, and I am unaware of any such database that is completely immune to “furious flushing,” “garbage collection pause,” “compaction stall,” “runaway purge,” “VACUUM blocking,” “checkpoint stall,” or whatever it tends to be called in your database of choice. There is usually a combination of some kind of workload that can push things over the edge. The most obvious case is if you try to change the database faster than the hardware can physically keep up. Because a lot of this work is done in the background so that it’s non-blocking and can be optimized in various ways, most databases will allow you to overwork the background processes if you push hard enough.

Show me a database and I’ll show you someone complaining about these problems. I’ll start out: MySQL’s adaptive flushing has been beaten to death by Percona and Oracle engineers. Riak on LevelDB: “On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, levelDB sometimes sees one stall every 2 hours for 10 to 30 seconds.” PostgreSQL’s VACUUM can stall out. I can go on. Every one of those problems is being improved somehow, but also can be triggered if circumstances are right. It’s hard (impossible?) to avoid completely.

Evolution of Append-Only

Do you see how the simplistic, one-thing-at-a-time architecture of append-only systems, with periodic rewrites of the whole database, almost inevitably becomes continuous, concurrent performing of the same tasks? Immutability can’t live forever. It’s better to do things continuously in the background than to accrue a bunch of debt and then pay it back in one giant blocking operation.

That’s how a really capable database usually operates. These mature, sophisticated, advanced databases represent what a successful implementation usually evolves into over time. The result is that Oracle (for example) can sustain combinations of workloads such as very high-frequency small reads and writes running alongside days-long read-heavy and write-heavy batch processing, simultaneously, while providing good performance for both! Try that in a database that can only do one thing at a time.

So, keep that in mind if you start to feel like immutability is the elegant “hallelujah” solution that’s been overlooked by everyone other than some visionary with a new product. It hasn’t been overlooked. It’s in the literature, and it’s in the practice and industry. It’s been refined for decades. It’s well worth looking at the problems the more mature databases have solved. New databases are overwhelmingly likely to run into some of them, and perhaps end up implementing the same solutions as well.

Note that I am not a relational curmudgeon claiming that it’s all been done before. I have a lot of respect for the genuinely new advancements in the field, and there is a hell of a lot of it, even in databases whose faults I just attacked. I’m also not a SQL/relational purist. However, I will admit to getting a little curmudgeonly when someone claims that the database he’s praising is super-advanced, and then in the next breath says he doesn’t know what an append-only B-tree is. That’s kind of akin to someone claiming their fancy new sort algorithm is advanced, but not being aware of quicksort!

What do you think? Also, if I’ve gone too far, missed something important, gotten anything wrong, or otherwise need some education myself, please let me know so I can a) learn and b) correct my error.

Categories: MySQL

Early-access books: a double-edged sword

Xaprb, home of innotop - Thu, 2013-12-26 21:46

Many technical publishers offer some kind of “early access” to unfinished versions of books. Manning has MEAP, for example, and there’s even LeanPub which is centered on this idea. I’m not a fan of buying these, in most circumstances. Why not?

  • Many authors never finish their books. A prominent example: Nathan Marz’s book on Big Data was supposed to be published in 2012; the date has been pushed back to March 2014 now. At least a few of my friends have told me their feelings about paying for this book and “never” getting it. I’m not blaming Marz, and I don’t want this to be about authors. I’m just saying many books are never finished (and as an author, I know why!), and readers get irritated about this.
  • When the book is unfinished, it’s often of much less value. The whole is greater than the sum of the parts.
  • When the book is finished, you have to re-read it, which is a lot of wasted work, and figuring out what’s changed from versions you’ve already read is a big exercise too.

To some extent, editions create a similar problem[1]. I think that successive editions of books are less likely to be bought and really read, unless there’s a clear signal that both the subject and the book have changed greatly. Unfortunately, most technical books are outdated before they’re even in print. Editions are a necessary evil to keep up with the changes in industry and practice.

I know that O’Reilly has tried to figure out how to address this, too, and I sent an email to my editor along the lines of this blog post.

I know this is a very one-sided opinion. I had a lengthy email exchange with LeanPub, for example. I know they, and a lot of others including likely readers of this blog, see things very differently than I do.

Still, I don’t think anyone has a great solution to the combination of problems created by static books written about a changing world. But early-access to unfinished books has always seemed to me like compounding the problems, not resolving them.

[1] Rant: The classic counter-example for editions is math and calculus textbooks, which can charitably be described as a boondoggle. Calculus hasn’t changed much for generations, either in theory or practice. Yet new editions of two leading textbooks are churned out every couple of years. They offer slightly prettier graphics or newer instructions for a newer edition of the TI-something calculator — cosmetic differences. But mostly, they offer new homework sets, so students can’t buy and use the older editions, nor can they resell them for more than a small fraction of the purchase price. Oh, and because the homework is always changing, bugs in the homework problems are ever-present. It’s a complete ripoff. Fortunately, technical writers generally behave better than this. OK, rant over.

Categories: MySQL

Napkin math: How much waste does Celestial Seasonings save?

Xaprb, home of innotop - Sun, 2013-12-22 19:32

I was idly reading the Celestial Seasonings box today while I made tea. Here’s the end flap:

It seemed hard to believe that they really save 3.5 million pounds of waste just by not including that extra packaging, so I decided to do some back-of-the-napkin math.

How much paper is in each package of non-Celestial-Seasonings tea? The little bag is about 2 inches by 2 inches, it’s two-sided, and there’s a tag, staple, and string. Call it 10 square inches.

How heavy is the paper? It feels about the same weight as normal copy paper. Amazon.com lists a box of 5000 sheets of standard letter-sized paper at a shipping weight of 50 pounds (including the cardboard box, but we’ll ignore that). Pretend that each sheet (8.5 * 11 inches = 93.5 square inches) is about 100 square inches. That’s .0001 pounds per square inch.

How much tea does Celestial Seasonings sell every year? Wikipedia says their sales in the US are over $100M, and they are a subsidiary of Hain Celestial, which has a lot of other large brands. Hain’s sales last year were just under $500M. $100M is a good enough ballpark number. Each box of 20 tea bags sells at about $3.20 on their website, and I think it’s cheaper at my grocery store. Call it $3.00 per box, so we’ll estimate the volume of tea bags on the high side (to make up for the low-side estimate caused by pretending there’s 100 square inches per sheet of paper). That means they sell about 33.3M boxes, or 667M bags, of tea each year.

If they put bags, tags, and strings on all of them, I estimated 10 square inches of paper per bag, so at .0001 pound per square inch that’s .001 pound of extra paper and stuff per bag. That means they’d use about 667 thousand pounds of paper to bag up all that tea.
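
If you want to check or tweak the napkin math, here it is as a tiny Go program; every input is just one of the estimates from the paragraphs above.

```go
package main

import "fmt"

func main() {
	// Estimates from the text above.
	salesDollars := 100e6         // annual US sales, dollars
	pricePerBox := 3.00           // dollars per box of 20 bags
	bagsPerBox := 20.0
	paperPerBag := 10.0           // square inches of bag, tag, and string
	poundsPerSquareInch := 0.0001 // ~50 lb per 5000 sheets of ~100 sq in each

	boxes := salesDollars / pricePerBox
	bags := boxes * bagsPerBox
	wastePounds := bags * paperPerBag * poundsPerSquareInch

	fmt.Printf("%.0fM boxes, %.0fM bags, ~%.0f thousand pounds of paper\n",
		boxes/1e6, bags/1e6, wastePounds/1e3)
}
```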

That’s quite a difference from the 3.5 million pounds of waste they claim they save. Did I do the math wrong or assume something wrong?

Categories: MySQL

Secure your accounts and devices

Xaprb, home of innotop - Wed, 2013-12-18 20:17

This is a public service announcement. Many people I know are not taking important steps necessary to secure their online accounts and devices (computers, cellphones) against malicious people and software. It’s a matter of time before something seriously harmful happens to them.

This blog post will urge you to use higher security than popular advice you’ll hear. It really, really, really is necessary to use strong measures to secure your digital life. The technology being used to attack you is very advanced, operates at a large scale, and you probably stand to lose much more than you realize.

You’re also likely not as good at being secure as you think you are. If you’re like most people, you don’t take some important precautions, and you overestimate the strength and effectiveness of security measures you do use.

Password Security

The simplest and most effective way to dramatically boost your online security is use a password storage program, or password safe. You need to stop making passwords you can remember and make long, random passwords on websites. The only practical way to do this is to use a password safe.

Why? Because if you can remember the password, it’s trivially hackable. For example, passwords like 10qp29wo38ei47ru can be broken instantly. Anything you can feasibly remember is just too weak.
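
To put rough numbers on “too weak”: a truly random password is worth about length × log2(alphabet size) bits of entropy, while memorable patterns like the one above are drawn from a tiny space of keyboard walks and have far less real entropy than their length suggests. Here is a back-of-the-envelope sketch in Go; the guessing rate is an assumed figure for offline attacks against a stolen password hash, not a measurement.

```go
package main

import (
	"fmt"
	"math"
)

// entropyBits is the strength of a *truly random* password: `length` picks
// from an alphabet of `alphabet` characters. Keyboard walks and words with
// substitutions have far less real entropy than this formula suggests,
// because attackers search those small spaces first.
func entropyBits(length, alphabet float64) float64 {
	return length * math.Log2(alphabet)
}

func main() {
	const guessesPerSec = 1e10 // assumed offline cracking rate

	cases := []struct {
		name             string
		length, alphabet float64
	}{
		{"8 random chars, lowercase only", 8, 26},
		{"12 random chars, mixed case + digits", 12, 62},
		{"20 random chars, full printable set", 20, 94},
	}
	for _, c := range cases {
		bits := entropyBits(c.length, c.alphabet)
		seconds := math.Pow(2, bits) / guessesPerSec
		fmt.Printf("%-40s %6.1f bits, ~%.1e seconds to exhaust\n",
			c.name, bits, seconds)
	}
}
```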

And, any rule you set for yourself that requires self-discipline will be violated, because you’re lazy. You need to make security easier so that you automatically do things more securely. A password safe is the best way to do that, by far. A good rule of thumb for most people is that you should not try to know your own passwords, except the password to your password safe. (People with the need to be hyper-secure will take extraordinary measures, but those aren’t practical or necessary for most of us.)

I use 1Password. Others I know of are LastPass and KeePass Password Safe. I personally wouldn’t use any others, because lesser-known ones are more likely to be malware.

It’s easy to share a password safe’s data across devices, and make a backup of it, by using a service such as Dropbox. The password safe’s files are encrypted, so the contents will not be at risk even if the file syncing service is compromised for some reason. (Use a strong password to encrypt your password safe!)

It’s important to note that online passwords are different from the password you use to log into your personal computer. Online passwords are much more exposed to brute-force, large-scale hacking attacks. By contrast, your laptop probably isn’t going to be subjected to a brute-force password cracking attack, because attackers usually need physical access to the computer to do that. This is not a reason to use a weak password for your computer; I’m just trying to illustrate how important it is to use really long, random passwords for websites and other online services, because they are frequent targets of brute-force attacks.

Here are some other important rules for password security.

  • Never use the same password in more than one service or login. If you do, someone who compromises it will be able to compromise other services you use.
  • Set your password generation program (likely part of your password safe) to make long, random passwords with numbers, special characters, and mixed case. I leave mine set to 20 characters by default. If a website won’t accept such a long password I’ll shorten it. For popular websites such as LinkedIn, Facebook, etc., I use much longer passwords, 50 characters or more. They are such valuable attack targets that I’m paranoid. (There’s a small sketch of what generating such a password looks like just after this list.)
  • Don’t use your web browser’s features for storing passwords and credit cards. Browsers themselves, and their password storage, are the target of many attacks.
  • Never write passwords down on paper, except once. The only paper copy of my passwords is the master password to my computer, password safe, and GPG key. These are in my bank’s safe deposit box, because if something happens to me I don’t want my family to be completely screwed. (I could write another blog post on the need for a will, power of attorney, advance medical directive, etc.)
  • Never treat any account online, no matter how trivial, as “not important enough for a secure password.”

The last item in that list deserves a little story. Ten years ago I didn’t use a password safe, and I treated most websites casually. “Oh, this is just a discussion forum, I don’t care about it.” I used an easy-to-type password for such sites. I used the same one everywhere, and it was a common five-letter English word (not my name, if you’re guessing). Suddenly one day I realized that someone could guess this password easily, log in, change the password and in many cases the email address, and lock me out of my own account. They could then proceed to impersonate me, do illegal and harmful things in my name, etc. Worse, they could go find other places that I had accounts (easy to find — just search Google for my name or username!) and do the same things in many places. I scrambled to find and fix this problem. At the end of it, I realized I had created more than 300 accounts that could have been compromised. Needless to say, I was very, very lucky. My reputation, employment, credit rating, and even my status as a free citizen could have been taken away from me. Don’t let this happen to you!

Use Two-Factor Auth

Two-factor authentication (aka 2-step login) is a much stronger mechanism for account security than a password alone. It uses a “second factor” (something you physically possess) in addition to the common “first factor” (something you know — a password) to verify that you are the person authorized to access the account.

Typically, the login process with two-factor authentication looks like this:

  • You enter your username and password.
  • The service sends a text message to your phone. The message contains a 6-digit number.
  • You must enter the number to finish logging in.

With two-factor auth in place, it is very difficult for malicious hackers to access your account, even if they know your password. Two-factor auth is way more secure than other tactics such as long passwords, but it doesn’t mean you shouldn’t also use a password safe and unique, random, non-memorized passwords.
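
Incidentally, those 6-digit codes aren’t magic. Authenticator apps (an alternative delivery method to the text messages described above) derive them from a shared secret and the current 30-second time window using the TOTP algorithm (RFC 6238). Here is a condensed sketch in Go, assuming you already have the raw shared-secret bytes; the secret shown is just the RFC’s published test value.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"time"
)

// totp returns the 6-digit code for the given shared secret and time,
// using 30-second steps (RFC 6238 with the usual defaults).
func totp(secret []byte, t time.Time) string {
	counter := uint64(t.Unix()) / 30

	msg := make([]byte, 8)
	binary.BigEndian.PutUint64(msg, counter)

	mac := hmac.New(sha1.New, secret)
	mac.Write(msg)
	sum := mac.Sum(nil)

	// Dynamic truncation (RFC 4226): take 4 bytes at an offset chosen by
	// the low nibble of the last byte, clear the sign bit, then mod 10^6.
	offset := sum[len(sum)-1] & 0x0f
	code := binary.BigEndian.Uint32(sum[offset:offset+4]) & 0x7fffffff
	return fmt.Sprintf("%06d", code%1000000)
}

func main() {
	secret := []byte("12345678901234567890") // RFC test secret; real ones come from the service
	fmt.Println(totp(secret, time.Now()))
}
```

Because the code depends only on the secret and the clock, the service can verify it without sending anything to your phone; losing the secret, though, is like losing a password, which is one reason recovery codes matter.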

Two-factor auth has ways to handle other common scenarios: devices that can’t display the prompt for the 6-digit code, a lost cellphone, or being away from your own computer without your cellphone. These edge cases are easy to handle. For example, you can get recovery codes for the times you lose or don’t have your cellphone. You should store these (where else?) in your password safe.

There seems to be a perception that lots of people find two-factor auth inconvenient. I disagree. I’ve never found it inconvenient, and I use two-factor auth a lot. And I’ve never met these people, whoever they are, who think two-factor auth is such a high burden. The worst thing that happens to me is that I sometimes have to get out of my chair and get my phone from another room to log in.

Unfortunately, most websites don’t support two-factor authentication. Fortunately, many of the most popular and valuable services do, including Facebook, Google, Paypal, Dropbox, LinkedIn, Twitter, and most of the other services that you probably use which are most likely to get compromised. Here is a list of services with two-factor auth, with instructions on how to set it up for each one.

Please enable two-factor authentication if it is supported! I can’t tell you how many of my friends and family have had their Gmail, Facebook, Twitter, and other services compromised. Please don’t let this happen to you! It could do serious harm to you — worse than a stolen credit card.

Secure Your Devices

Sooner or later someone is going to get access to one of your devices — tablet, phone, laptop, thumb drive. I’ve never had a phone or laptop lost or stolen myself, but it’s a matter of time. I’ve known a lot of people in this situation. One of my old bosses, for example, forgot a laptop in the seat pocket of an airplane, and someone took it and didn’t return it.

And how many times have you heard about some government worker leaving a laptop at the coffee shop and suddenly millions of people’s Social Security numbers are stolen?

Think about your phone. If someone stole my phone and it weren’t protected, they’d have access to a bunch of my accounts, contact lists, email, and a lot of other stuff I really, really do not want them messing with. If you’re in the majority of people who leave your phone completely unsecured, think about the consequences for a few minutes. Someone getting access to all the data and accounts on your phone could probably ruin your life for a long time if they wanted to.

All of this is easily preventable. Given that one or more of your devices will someday certainly end up in the hands of someone who may have bad intentions, I think it’s only prudent to take some basic measures:

  • Set the device to require a password, lock code, or pattern to be used to unlock it after it goes to sleep, when it’s idle for a bit, or when you first power it on. If someone steals your device, and can access it without entering your password, you’re well and truly screwed.
  • Use full-device encryption. If someone steals your device, for heaven’s sake don’t let them have access to your data. For Mac users, use FileVault under System Preferences / Security & Privacy. Encrypt the whole drive, not just the home directory. On Windows, use TrueCrypt, and on Linux, you probably already know what you’re doing.
  • On Android tablets and phones, you can encrypt the entire device. You have to set up a screen lock code first.
  • If you use a thumb drive or external hard drive to transfer files between devices, encrypt it; a small file-level sketch follows this list.
  • Encrypt your backup hard drives. Backups are one of the most common ways that data is stolen. (You have backups, right? I could write another entire blog post on backups. Three things are inevitable: death, taxes, and loss of data that you really care about.)
  • Use a service such as Prey Project to let you have at least some basic control over your device if it’s lost or stolen. If you’re using an Android device, set up Android Device Manager so you can track and control your device remotely. I don’t know if there’s anything similar for Apple devices.
  • Keep records of your devices’ make, model, serial number, and so on. Prey Project makes this easy.
  • On your phone or tablet, customize the lockscreen with a message such as “user@email.com – reward if found” and on your laptops, stick a small label inside the lid with your name and phone number. You never know if a nice person will return something to you. I know I would do it for you.
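
Full-device encryption tools like the ones above are the right answer for laptops, phones, and whole drives. For the thumb-drive case, the same principle can be illustrated at the file level; the following is a minimal sketch assuming the third-party cryptography package, with made-up filenames, and with the understanding that the key belongs in your password safe, not next to the file.

    from cryptography.fernet import Fernet  # third-party: pip install cryptography

    # Generate a key once and keep it in your password safe -- never on the
    # same thumb drive as the encrypted file.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    # Encrypt the file before copying it to the external drive.
    with open("tax-return.pdf", "rb") as f:
        ciphertext = fernet.encrypt(f.read())
    with open("tax-return.pdf.enc", "wb") as f:
        f.write(ciphertext)

    # Later, on the other machine, decrypt it with the same key.
    with open("tax-return.pdf.enc", "rb") as f:
        plaintext = fernet.decrypt(f.read())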
Things that don’t help

Finally, here are some techniques that aren’t as useful as you might have been told.

  • Changing passwords doesn’t significantly enhance security unless you change from an insecure password to a strong one. Changing passwords is most useful, in my opinion, when a service has already been compromised or potentially compromised. It’s possible that on any given day an attacker has obtained a service’s list of hashed passwords, hasn’t yet been discovered, and hasn’t yet cracked them, and that you’ll foil the attack by changing your password in the meanwhile. But that chance is so vanishingly small that it’s not meaningful.
  • (OK, this ended up being a list of 1 thing. Tell me what else should go here.)
Summary

Here is a summary of the most valuable steps you can take to protect yourself:

  • Get a password safe, and use it for all of your accounts. Protect it with a long password. Make this the one password you memorize.
  • Use long (as long as possible), randomly generated passwords for all online accounts and services, and never reuse a password.
  • Use two-factor authentication for all services that support it.
  • Encrypt your hard drives, phones and tablets, and backups, and use a password or code to lock all computers, phones, tablets, etc when you turn them off, leave them idle, or put them to sleep.
  • Install something like Prey Project on your portable devices, and label them so nice people can return them to you.
  • Write down the location and access instructions (including passwords) for your password safe, computer, backup hard drives, etc and put it in a safe deposit box.

Friends try not to let friends get hacked and ruined. Don’t stop at upgrading your own security. Please tell your friends and family to do it, too!

Do you have any other suggestions? Please use the comments below to add your thoughts.

Categories: MySQL

How is the MariaDB Knowledge Base licensed?

Xaprb, home of innotop - Mon, 2013-12-16 22:37

I clicked around for a few moments but didn’t immediately see a license mentioned for the MariaDB Knowledge Base. As far as I know, the MySQL documentation is not licensed in a way that would allow copying or derivative works, but at least some of the MariaDB Knowledge Base seems to be pretty similar to the corresponding MySQL documentation. See for example LOAD DATA LOCAL INFILE: MariaDB, MySQL.

Oracle’s MySQL documentation has a licensing notice that states:

You may create a printed copy of this documentation solely for your own personal use. Conversion to other formats is allowed as long as the actual content is not altered or edited in any way. You shall not publish or distribute this documentation in any form or on any media, except if you distribute the documentation in a manner similar to how Oracle disseminates it (that is, electronically for download on a Web site with the software) or on a CD-ROM or similar medium, provided however that the documentation is disseminated together with the software on the same medium. Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Oracle. Oracle and/or its affiliates reserve any and all rights to this documentation not expressly granted above.

Can someone clarify the situation?

Categories: MySQL

Props to the MySQL Community Team

Xaprb, home of innotop - Sat, 2013-12-07 21:02

Enough negativity sometimes gets slung around that it’s easy to forget how much good is going on. I want to give a public thumbs-up to the great job the MySQL community team, especially Morgan Tocker, is doing. I don’t remember ever having so much good interaction with this team, not even in the “good old days”:

  • Advance notice of things they’re thinking about doing (deprecating, changing, adding, etc)
  • Heads-up via private emails about news and upcoming things of interest (new features, upcoming announcements that aren’t public yet, etc)
  • Solicitation of opinion on proposals that are being floated internally (do you use this feature, would it hurt you if we removed this option, do you care about this legacy behavior we’re thinking about sanitizing)

I don’t know who or what has made this change happen, but it’s really welcome. I know Oracle is a giant company with all sorts of legal and regulatory hoops to jump through, for things that seem like they ought to be obviously the right thing to do in an open-source community. I had thought we were not going to get this kind of interaction from them, but happily I was wrong.

(At the same time, I still wish for more public bug reports and test cases; I believe those things are really in everyone’s best interests, both short- and long-term.)

Categories: MySQL

S**t sales engineers say

Xaprb, home of innotop - Sat, 2013-12-07 20:51

Here’s a trip down memory lane. I was just cleaning out some stuff and I found some notes I took from a hilarious MySQL seminar a few years back. I won’t say when or where, to protect the guilty.[1]

I found it so absurd that I had to write down what I was witnessing. Enough time has passed that we can probably all laugh about this now. Times and people have changed.

The seminar was a sales pitch in disguise, of course. The speakers were singing Powerpoint Karaoke to slides real tech people had written. Every now and then, when they advanced a slide, they must have had a panicked moment. “I don’t remember this slide at all!” they must have been thinking. So they’d mumble something really funny and trying-too-hard-to-be-casual about “oh, yeah, [insert topic here] but you all already know this, I won’t bore you with the details [advance slide hastily].” It’s strange how transparent that is to the audience.

Here are some of the things the sales “engineers” said during this seminar, in response to audience questions:

  • Q. How does auto-increment work in replication? A: On slaves, you have to ALTER TABLE to remove auto-increment because only one table in a cluster can be auto-increment. When you switch replication to a different master you have to ALTER TABLE on all servers in the whole cluster to add/remove auto-increment. (This lie was told early in the day. Each successive person who took a turn presenting built upon it instead of correcting it. I’m not sure whether this was admirable teamwork or cowardly face-saving.)
  • Q. Does InnoDB’s log grow forever? A: Yes. You have to back up, delete, and restore your database if you want to shrink it.
  • Q. What size sort buffer should I have? A: 128MB is the suggested starting point. You want this sucker to be BIG.

There was more, but that’s enough for a chuckle. Note to sales engineers everywhere: beware the guy in the front row scribbling notes and grinning.

What are your best memories of worst sales engineer moments?

1. For the avoidance of doubt, it was NOT any of the trainers, support staff, consultants, or otherwise anyone prominently visible to the community. Nor was it anyone else whose name I’ve mentioned before. I doubt any readers of this blog, except for former MySQL AB employees (pre-Sun), would have ever heard of these people. I had to think hard to remember who those names belonged to.

Categories: MySQL
