MySQL

A review of Bose, Sony, and Sennheiser noise-cancelling headphones

Xaprb, home of innotop - Thu, 2014-01-16 00:00

I’ve used active noise-cancelling headphones for over ten years now, and have owned several pairs of Bose, one of Sony, and most recently a pair of Sennheiser headphones. The Sennheisers are my favorites. I thought I’d write down why I’ve gone through so many sets of cans and what I like and dislike about them.

Bose QuietComfort 15 Acoustic Noise Cancelling Headphones

I’m sure you’re familiar with Bose QuietComfort headphones. They’re the iconic “best-in-class” noise-cancelling headphones, the ones you see everywhere. Yet, after owning several pairs (beginning with Quiet Comfort II in 2003), I decided I’m not happy with them and won’t buy them anymore. Why not?

  • They’re not very good quality. I’ve worn out two pairs and opted to sell the third pair that Bose sent me as a replacement. Various problems occurred, including torn speakers that buzzed and grated. I just got tired of sending them back to Bose for servicing.
  • They’re more expensive than I think they’re worth, especially given the cheap components used.
  • They don’t sound bad – but to my ears they still have the classic Bose fairy-dust processing, which sounds rich and pleasant at first but then fatigues me.
  • They produce a sensation of suction on the eardrums that becomes uncomfortable over long periods of time.
  • They can’t be used in non-cancelling mode. In other words, if the battery is dead, they’re unusable.
  • On a purely personal note, I think Bose crosses the line into greed and jealousy. I know this in part because I used to work at Crutchfield, and saw quite a bit of interaction with Bose. As an individual – well, try selling a pair of these on eBay, and you’ll see what I mean. I had to jump through all kinds of hoops after my first listing was cancelled for using a stock photo that eBay itself suggested and provided in the listing wizard. Here is the information the take-down notice directed me to.

On the plus side, the fit is very comfortable physically, they cancel noise very well, and they’re smaller than some other noise-cancelling headphones. Also on the plus side, every time I’ve sent a pair in for servicing, Bose has just charged me $100 and sent me a new pair.

Sony MDR-NC200D

When I sent my last pair of Bose in for servicing, they replaced them with a factory-sealed pair of new ones in the box, and I decided to sell them on eBay and buy a set of Sony MDR-NC200D headphones, which cost about $100 less than new Bose headphones at the time. I read online reviews and thought it was worth a try.

First, the good points. The Sonys are even more compact than the Bose, although as I recall they’re a little heavier. The noise cancellation works quite well. The passive noise blocking (muffling) is good in itself: you can put them on without even turning on the switch and block a lot of ambient noise. The sound quality is also quite good, although there is a slight hiss when noise cancellation is enabled. Active cancellation is good, but not as good as the Bose’s.

However, it wasn’t long before I realized I couldn’t keep them. The Sonys sit on the ear, and don’t enclose the ear and sit against the skull as the Bose do. They’re on-the-ear, not over-the-ear. Although this doesn’t feel bad at first, in about 20 minutes it starts to hurt. After half an hour it’s genuinely painful. This may not be your experience, but my ears just start to hurt after being pressed against my head for a little while.

I had to sell the Sonys on eBay too. My last stop was the Sennheisers.

Sennheiser PXC 450 NoiseGard Active Noise-Canceling Headphones

The Sennheiser PXC 450 headphones are midway in price between the Bose and the Sony: a little less expensive than the Bose. I’ve had them a week or so and I’m very happy with them so far.

This is not the first pair of Sennheisers I’ve owned. I’ve had a pair of open-air higher-end Sennheisers for over a decade. I absolutely love them, so you can consider me a Sennheiser snob to some extent.

I’m pleased to report that the PXC 450s are Sennheisers through and through. They have amazing sound, and the big cups fit comfortably around my ears. They are a little heavier than my other Sennheisers, but still a pleasure to wear.

The nice thing is that not only does noise cancellation work very well (on par with Bose’s, I’d say), but there is no sensation of being underwater with pressure or suction on the eardrums. Turn on the noise cancellation switch and the noise just vanishes, but there’s no strange feeling as a result. Also, these headphones can work in passive mode, with noise cancellation off, and don’t need a battery to work.

On the downside, if you want to travel with them, they’re a little bigger than the Bose. However, I’ve travelled with the Bose headphones several times, and honestly I find even those too large to be convenient. As a result, I don’t use noise-cancelling headphones for travel.

Another slight downside is that the earcups aren’t completely “empty” inside. There are some caged-over protrusions with the machinery inside. Depending on the shape of your ears, these might brush your ears if you move your head. I find that if I don’t place the headphones in the right spot on my head, they do touch my ears every now and then.

Summary

After owning several pairs of top-rated noise-cancelling headphones, I think the Sennheisers are the clear winners in price, quality, comfort, and sound. Your mileage may vary.

Categories: MySQL

Xaprb now uses Hugo

Xaprb, home of innotop - Wed, 2014-01-15 00:00

I’ve switched this blog from Wordpress to Hugo. If you see any broken links or other problems, let me know. I’ll re-enable comments and other features in the coming days.

Why not Wordpress? I’ve used Wordpress since very early days, but I’ve had my fill of security problems, the need to worry about whether a database is up and available, backups, plugin compatibility problems, upgrades, and performance issues. In fact, while converting the content from Wordpress to Markdown, I found a half-dozen pages that had been hacked by some link-farm since around 2007. This wasn’t the first such problem I’d had; it was merely the only one I hadn’t detected and fixed. And I’ve been really diligent with Wordpress security; I have done things like changing my admin username and customizing my .htaccess file to block common attack vectors, in addition to the usual “lockdown” measures that one takes with Wordpress.

In contrast to Wordpress or other CMSes that use a database, static content is secure, fast, and worry-free. I’m particularly happy that my content is all in Markdown format now. Even if I make another change in the future, the content is now mostly well-structured and easy to transform as desired. (There are some pages and articles that didn’t convert so well, but I will clean them up later.)

Why Hugo? There are lots of static site generators. Good ones include Octopress and Jekyll, and I’ve used those. However, they come with annoyances of their own: dependencies, the need to install Ruby, and so on; and, particularly bothersome for this blog, performance issues. Octopress ran my CPU fan at top speed for about 8 minutes to render this blog.

Hugo is written in Go, so it has zero dependencies (a single binary) and is fast. It renders this blog in a couple of seconds. That’s fast enough to run it in server mode, hugo server -w, and I can just alt-tab back and forth between my writing and my browser to preview my changes. By the time I’ve tabbed over, the changes are ready to view.

Hugo isn’t perfect. For example, it lacks a couple of features that are present in Octopress or Jekyll. But it’s more than good enough for my needs, and I intend to contribute some improvements to it if I get time. I believe it has the potential to be a leading static site/blog generator going forward. It’s already close to a complete replacement for something like Jekyll.

Categories: MySQL

Immutability, MVCC, and garbage collection

Xaprb, home of innotop - Sat, 2013-12-28 20:33

Not too long ago I attended a talk about a database called Datomic. My overall impressions of Datomic were pretty negative, but this blog post isn’t about that. This is about one of the things the speaker referenced a lot: immutability and its benefits. I hope to illustrate, if only sketchily, why a lot of sophisticated databases are actually leaps and bounds beyond the simplistic design of such immutable databases. This is in direct contradiction to what proponents of Datomic-like systems would have you believe; they’d tell you that their immutable database implementations are advanced. Reality is not so clear-cut.

Datomic and Immutability

The Datomic-in-a-nutshell is that it (apparently) uses an append-only B-tree to record data, and never updates any data after it’s written. I say “apparently” because the speaker didn’t know what an append-only B-tree was, but his detailed description matched AOBTs perfectly.
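
To make the append-only idea concrete, here is a minimal toy sketch (my own illustration, not Datomic’s actual implementation) of a store where every write adds a new fact, nothing is ever overwritten, and old states stay readable:

    class AppendOnlyStore:
        """Toy append-only store: writes add facts; old facts are never touched."""
        def __init__(self):
            self.log = []  # (key, value) facts, in write order

        def put(self, key, value):
            self.log.append((key, value))  # never update in place

        def get(self, key):
            # The newest fact for a key wins; older versions stay readable.
            for k, v in reversed(self.log):
                if k == key:
                    return v
            return None

        def get_as_of(self, key, position):
            # "Rewind history": read the state as of an earlier point in the log.
            for k, v in reversed(self.log[:position]):
                if k == key:
                    return v
            return None

    store = AppendOnlyStore()
    store.put("balance", 100)
    store.put("balance", 90)                 # a new fact; the old one remains
    print(store.get("balance"))              # 90
    print(store.get_as_of("balance", 1))     # 100 -- history is preserved
    print(len(store.log))                    # 2 -- and storage only ever grows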

Why is this a big deal? Immutable data confers a lot of nice benefits. Here’s an incomplete summary:

  • It’s more cacheable.
  • It’s easier to reason about.
  • It’s less likely to get corrupted from bugs and other problems.
  • You can rewind history and view the state at any point in the past, by using an “old” root for the tree.
  • Backups are simple: just copy the file, no need to take the database offline. In fact, you can do continuous backups.
  • Replication is simple and fast.
  • Crash recovery is simple and fast.
  • It’s easier to build a reliable system on unreliable components with immutability.

In general, immutability results in a lot of nice, elegant properties that just feel wonderful. But this is supposed to be the short version.

Prior Art

Datomic is not revolutionary in this sense. I have seen at least two other databases architected similarly. Their creators waxed eloquent about many of the same benefits. In fact, in 2009 and 2010, you could have listened to talks from the architects of RethinkDB, and if you just searched and replaced “RethinkDB” with “Datomic” you could have practically interchanged the talks. The same is true of CouchDB. Just to list a few links to RethinkDB’s history: 1, 2, 3.

That last one links to Accountants Don’t Use Erasers, a blog post that brought append-only storage into the minds of many people at the time.

Beyond databases, don’t forget about filesystems, such as ZFS for example. Many of the same design techniques are employed here.

Back to RethinkDB. Strangely, around 2011 or so, nobody was talking about its append-only design anymore. What happened?

Append-Only Blues

Immutability, it turns out, has costs. High costs. Wait a bit, and I’ll explain how those costs are paid by lots of databases that don’t build so heavily around immutability, too.

Even in 2010, Slava Akhmechet’s tone was changing. He’d begin his talks singing append-only immutability to the heavens, and then admit that implementation details were starting to get really hard. It turns out that there are a few key problems with append-only, immutable data structures.

The first is that space usage grows forever. Logically, people insert facts, and then update the database with new facts. Physically, if what you’re doing is just recording newer facts that obsolete old ones, then you end up with outdated rows. It may feel nice to be able to access those old facts, but the reality is most people don’t want that, and don’t want to pay the cost (infinitely growing storage) for it.

The second is fragmentation. If entities are made of related facts, and some facts are updated but others aren’t, then as the database grows and new facts are recorded, an entity ends up being scattered widely over a lot of storage. This gets slow, even on SSDs with fast random access.

The last is that a data structure or algorithm that’s elegant and pure, but has one or more worst cases, will fall apart rather violently in real-world usage. That’s because real-world usage is much more diverse than you’d suspect. A database that has a “tiny worst-case scenario” will end up hitting that worst-case behavior for something rather more than a tiny fraction of its users; probably a significant majority. An easy example from a different domain is sorting: nobody ships a textbook sort algorithm that’s fastest most of the time but has pathological worst cases, because if they do, things go to hell in a handbasket rather quickly. Databases end up with similar hard cases to handle.

There are more problems, many of them much harder to talk about and understand (dealing with concurrency, for example), but these are the biggest, most obvious ones I’ve seen.

As a result, you can see RethinkDB quickly putting append-only, immutable design behind them. They stopped talking and writing about it. Their whitepaper, “Rethinking Database Storage”, is gone from their website (rethinkdb.com/papers/whitepaper.pdf) but you can get it from the wayback machine.

Reality sank in and they had to move on from elegant theories to the bitterness of solving real-world problems. Whenever you hear about a new database, remember this: this shit is really, really, really hard. It typically takes many years for a database or storage engine to become production-ready in the real world.

This blog post isn’t about RethinkDB, though. I’m just using their evolution over time as an example of what happens when theory meets reality.

The CouchDB Problem

Around the same time as RethinkDB, a new NoSQL database called CouchDB was built on many of the same premises. In fact, I even blogged a quick overview of it as it started to become commercialized: A gentle introduction to CouchDB for relational practitioners.

CouchDB had so many benefits from using immutability. MVCC (multi-version concurrency control), instant backup and recovery, crash-only design. But the big thing everyone complained about was… compaction. CouchDB became a little bit legendary for compaction.

You see, CouchDB’s files would grow forever (duh!) and you’d fill up your disks if you didn’t do something about it. What could you do about it? CouchDB’s answer was that you would periodically save a complete new database, without old versions of documents that had been obsoleted. It’s a rewrite-the-whole-database process. The most obvious problem with this was that you had to reserve twice as much disk space as you needed for your database, because you needed enough space to write a new copy. If your disk got too full, compaction would fail because there wasn’t space for two copies.
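
Here is a minimal sketch of that kind of whole-database compaction (a toy illustration, not CouchDB’s actual code): copy only the newest version of each document into a new file, which is exactly why you temporarily need roughly double the disk space, and why writes arriving during the copy are a problem.

    import json
    import os

    def compact(db_path):
        """Rewrite the database file, keeping only the newest version of each document."""
        latest = {}
        with open(db_path) as f:               # the file is an append-only log of document versions
            for line in f:
                doc = json.loads(line)
                latest[doc["id"]] = doc        # later versions replace earlier ones in memory

        tmp_path = db_path + ".compact"        # the second copy: needs ~2x the disk space
        with open(tmp_path, "w") as out:
            for doc in latest.values():
                out.write(json.dumps(doc) + "\n")

        os.replace(tmp_path, db_path)          # swap in the compacted copy
        # Note: any writes that arrive while this runs are not captured here.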

And if you were writing into your database too fast, compaction would never catch up with the writes. And there were a host of other problems that could potentially happen.

Datomic has all of these problems too, up to and including stop-the-world blocking of writes (which in my book is complete unavailability of the database).

ACID MVCC Relational Databases

It turns out that there is a class of database systems that has long been aware of the problems with all three of the databases I’ve mentioned so far. Oracle, SQL Server, MySQL (InnoDB), and PostgreSQL all have arrived at designs that share some properties in common. These characteristics go a long way towards satisfying the needs of general-purpose database storage and retrieval in very wide ranges of use cases, with excellent performance under mixed workloads and relatively few and rare worst-case behaviors. (That last point is debatable, depending on your workload.)

The properties are ACID transactions with multi-version concurrency control (MVCC). The relational aspect is ancillary. You could build these properties in a variety of non-SQL, non-relational databases. It just so happens that the databases that have been around longer than most, and are more mature and sophisticated, are mostly relational. That’s why these design choices and characteristics show up in relational databases — no other reason as far as I know.

Multi-version concurrency control lets database users see a consistent state of the database at a point in time, even as the database accepts changes from other users concurrently.

How is this done? By keeping old versions of rows. These databases operate roughly as follows: when a row is updated, an old version is kept if there’s any transaction that still needs to see it. When the old versions aren’t needed any more, they’re purged. Implementation details and terminology vary. I can speak most directly about InnoDB, which never updates a row in the primary key (which is the table itself). Instead, a new row is written, and the database is made to recognize this as the “current” state of the world. Old row versions are kept in a history list; access to this is slower than access to the primary key. Thus, the current state of the database is optimized to be the fastest to access.
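
As a rough illustration (a toy model, not InnoDB’s actual data structures), here is how keeping old row versions lets a reader see a consistent snapshot while writers keep changing rows, and how versions can be purged once no snapshot needs them:

    class MVCCTable:
        """Toy MVCC: each row keeps a chain of versions tagged with the write that created them."""
        def __init__(self):
            self.versions = {}   # key -> list of (write_id, value), oldest first
            self.clock = 0

        def snapshot(self):
            return self.clock    # a reader sees only versions created at or before this point

        def write(self, key, value):
            self.clock += 1
            self.versions.setdefault(key, []).append((self.clock, value))

        def read(self, key, snap):
            # Newest version that already existed when the snapshot was taken.
            for wid, value in reversed(self.versions.get(key, [])):
                if wid <= snap:
                    return value
            return None

        def purge(self, oldest_open_snapshot):
            # A version can go once a newer version is visible to every open snapshot.
            for key, chain in self.versions.items():
                self.versions[key] = [
                    (wid, val) for i, (wid, val) in enumerate(chain)
                    if not any(w2 <= oldest_open_snapshot for w2, _ in chain[i + 1:])
                ]

    t = MVCCTable()
    t.write("row1", "v1")
    snap = t.snapshot()           # a reader takes a snapshot here
    t.write("row1", "v2")         # a concurrent update does not disturb the reader
    print(t.read("row1", snap))   # v1 -- the snapshot still sees the old version
    t.purge(t.snapshot())         # no open snapshot needs v1 any more, so it is purged
    print(t.versions["row1"])     # [(2, 'v2')]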

Now, about ACID transactions. Managing the write-ahead log and flushing dirty pages to disk is one of the most complex and hardest things an ACID database does, in my opinion. The process of managing the log and dirty pages in memory is called checkpointing.
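
Here is a minimal sketch of the idea (a toy model, not any particular engine’s implementation): changes are appended to the log before data pages are rewritten, dirty pages accumulate in memory, and a checkpoint flushes them so the log up to that point can be discarded.

    class ToyEngine:
        """Toy write-ahead logging: log first, flush dirty pages later, checkpoint to trim the log."""
        def __init__(self):
            self.log = []       # durable log of changes (would be fsync'd to disk)
            self.pages = {}     # the data files on "disk"
            self.buffer = {}    # in-memory page cache
            self.dirty = set()  # pages changed in memory but not yet flushed

        def write(self, page, value):
            self.log.append((page, value))   # durable once logged; flushing the page can wait
            self.buffer[page] = value
            self.dirty.add(page)

        def checkpoint(self):
            # Flush dirty pages; log entries before this point are no longer needed for recovery.
            for page in self.dirty:
                self.pages[page] = self.buffer[page]
            self.dirty.clear()
            self.log.clear()

        def recover(self):
            # After a crash, replay the log over the data files to restore logged changes.
            for page, value in self.log:
                self.pages[page] = value

    engine = ToyEngine()
    engine.write("p1", "hello")
    engine.write("p2", "world")
    engine.recover()             # simulate a crash before any checkpoint
    print(engine.pages)          # {'p1': 'hello', 'p2': 'world'}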

Write-ahead logging and ACID, caching, MVCC, and old-version-purge are often intertwined to some extent, for implementation reasons. This is a very complex topic and entire books (huge books!) have been written about it.

What’s happening in such a database is a combination of short-term immutability, read and write optimizations to save and/or coalesce redundant work, and continuous “compaction” and reuse of disk space to stabilize disk usage and avoid infinite growth. Doing these things a little bit at a time allows the database to gradually take care of business without needing to stop the world. Unfortunately, this is incredibly hard, and I am unaware of any such database that is completely immune to “furious flushing,” “garbage collection pause,” “compaction stall,” “runaway purge,” “VACUUM blocking,” “checkpoint stall,” or whatever it tends to be called in your database of choice. There is usually some combination of workload that can push things over the edge. The most obvious case is if you try to change the database faster than the hardware can physically keep up. Because a lot of this work is done in the background so that it’s non-blocking and can be optimized in various ways, most databases will allow you to overwork the background processes if you push hard enough.

Show me a database and I’ll show you someone complaining about these problems. I’ll start out: MySQL’s adaptive flushing has been beaten to death by Percona and Oracle engineers. Riak on LevelDB: “On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, levelDB sometimes sees one stall every 2 hours for 10 to 30 seconds.” PostgreSQL’s VACUUM can stall out. I can go on. Every one of those problems is being improved somehow, but also can be triggered if circumstances are right. It’s hard (impossible?) to avoid completely.

Evolution of Append-Only

Do you see how the simplistic, one-thing-at-a-time architecture of append-only systems, with periodic rewrites of the whole database, almost inevitably becomes continuous, concurrent performing of the same tasks? Immutability can’t live forever. It’s better to do things continuously in the background than to accrue a bunch of debt and then pay it back in one giant blocking operation.

That’s how a really capable database usually operates. These mature, sophisticated, advanced databases represent what a successful implementation usually evolves into over time. The result is that Oracle (for example) can sustain combinations of workloads such as very high-frequency small reads and writes, together with days-long read-heavy and write-heavy batch processing, simultaneously, providing good performance for both! Try that in a database that can only do one thing at a time.

So, keep that in mind if you start to feel like immutability is the elegant “hallelujah” solution that’s been overlooked by everyone other than some visionary with a new product. It hasn’t been overlooked. It’s in the literature, and it’s in practice and industry. It’s been refined for decades. It’s well worth looking at the problems the more mature databases have solved. New databases are overwhelmingly likely to run into some of them, and perhaps end up implementing the same solutions as well.

Note that I am not a relational curmudgeon claiming that it’s all been done before. I have a lot of respect for the genuinely new advancements in the field, and there is a hell of a lot of it, even in databases whose faults I just attacked. I’m also not a SQL/relational purist. However, I will admit to getting a little curmudgeonly when someone claims that the database he’s praising is super-advanced, and then in the next breath says he doesn’t know what an append-only B-tree is. That’s kind of akin to someone claiming their fancy new sort algorithm is advanced, but not being aware of quicksort!

What do you think? Also, if I’ve gone too far, missed something important, gotten anything wrong, or otherwise need some education myself, please let me know so I can a) learn and b) correct my error.

Categories: MySQL

Early-access books: a double-edged sword

Xaprb, home of innotop - Thu, 2013-12-26 21:46

Many technical publishers offer some kind of “early access” to unfinished versions of books. Manning has MEAP, for example, and there’s even LeanPub which is centered on this idea. I’m not a fan of buying these, in most circumstances. Why not?

  • Many authors never finish their books. A prominent example: Nathan Marz’s book on Big Data was supposed to be published in 2012; the date has been pushed back to March 2014 now. At least a few of my friends have told me their feelings about paying for this book and “never” getting it. I’m not blaming Marz, and I don’t want this to be about authors. I’m just saying many books are never finished (and as an author, I know why!), and readers get irritated about this.
  • When the book is unfinished, it’s often of much less value. The whole is greater than the sum of the parts.
  • When the book is finished, you have to re-read it, which is a lot of wasted work, and figuring out what’s changed from versions you’ve already read is a big exercise too.

To some extent, editions create a similar problem[1]. I think that successive editions of books are less likely to be bought and really read, unless there’s a clear signal that both the subject and the book have changed greatly. Unfortunately, most technical books are outdated before they’re even in print. Editions are a necessary evil to keep up with the changes in industry and practice.

I know that O’Reilly has tried to figure out how to address this, too, and I sent an email to my editor along the lines of this blog post.

I know this is a very one-sided opinion. I had a lengthy email exchange with LeanPub, for example. I know they, and a lot of others including likely readers of this blog, see things very differently than I do.

Still, I don’t think anyone has a great solution to the combination of problems created by static books written about a changing world. But early-access to unfinished books has always seemed to me like compounding the problems, not resolving them.

[1] Rant: The classic counter-example for editions is math and calculus textbooks, which can charitably be described as a boondoggle. Calculus hasn’t changed much for generations, either in theory or practice. Yet new editions of two leading textbooks are churned out every couple of years. They offer slightly prettier graphics or newer instructions for a newer edition of the TI-something calculator — cosmetic differences. But mostly, they offer new homework sets, so students can’t buy and use the older editions, nor can they resell them for more than a small fraction of the purchase price. Oh, and because the homework is always changing, bugs in the homework problems are ever-present. It’s a complete ripoff. Fortunately, technical writers generally behave better than this. OK, rant over.

Categories: MySQL

Napkin math: How much waste does Celestial Seasonings save?

Xaprb, home of innotop - Sun, 2013-12-22 19:32

I was idly reading the Celestial Seasonings box today while I made tea. Here’s the end flap:

It seemed hard to believe that they really save 3.5 million pounds of waste just by not including that extra packaging, so I decided to do some back-of-the-napkin math.

How much paper is in each package of non-Celestial-Seasonings tea? The little bag is about 2 inches by 2 inches, it’s two-sided, and there’s a tag, staple, and string. Call it 10 square inches.

How heavy is the paper? It feels about the same weight as normal copy paper. Amazon.com lists a box of 5000 sheets of standard letter-sized paper at a shipping weight of 50 pounds (including the cardboard box, but we’ll ignore that). Pretend that each sheet (8.5 * 11 inches = 93.5 square inches) is about 100 square inches. That’s .0001 pounds per square inch.

How much tea does Celestial Seasonings sell every year? Wikipedia says their sales in the US are over $100M, and they are a subsidiary of Hain Celestial, which has a lot of other large brands. Hain’s sales last year were just under $500M. $100M is a good enough ballpark number. Each box of 20 tea bags sells at about $3.20 on their website, and I think it’s cheaper at my grocery store. Call it $3.00 per box, so we’ll estimate the volume of tea bags on the high side (to make up for the low-side estimate caused by pretending there’s 100 square inches per sheet of paper). That means they sell about 33.3M boxes, or 667M bags, of tea each year.

If they put bags, tags, and strings on all of them, I estimated 10 square inches of paper per bag, so at .0001 pound per square inch that’s .001 pound of extra paper and stuff per bag. That means they’d use about 667 thousand pounds of paper to bag up all that tea.
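
Here is the same estimate as a short script, so you can check the arithmetic (the numbers are the rough assumptions above, not real data):

    paper_per_bag_sq_in = 10          # bag (both sides) plus tag, string, and staple
    lb_per_sq_in = 50 / 5000 / 100    # 50 lb per 5000 sheets, ~100 sq in per sheet
    annual_sales_usd = 100e6          # rough US sales
    price_per_box = 3.00              # estimated price per box
    bags_per_box = 20

    boxes = annual_sales_usd / price_per_box            # ~33.3 million boxes
    bags = boxes * bags_per_box                         # ~667 million bags
    paper_lb = bags * paper_per_bag_sq_in * lb_per_sq_in
    print(f"{paper_lb:,.0f} pounds of paper per year")  # ~666,667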

That’s quite a difference from the 3.5 million pounds of waste they claim they save. Did I do the math wrong or assume something wrong?

Categories: MySQL

Secure your accounts and devices

Xaprb, home of innotop - Wed, 2013-12-18 20:17

This is a public service announcement. Many people I know are not taking important steps necessary to secure their online accounts and devices (computers, cellphones) against malicious people and software. It’s a matter of time before something seriously harmful happens to them.

This blog post will urge you to use higher security than popular advice you’ll hear. It really, really, really is necessary to use strong measures to secure your digital life. The technology being used to attack you is very advanced, operates at a large scale, and you probably stand to lose much more than you realize.

You’re also likely not as good at being secure as you think you are. If you’re like most people, you don’t take some important precautions, and you overestimate the strength and effectiveness of security measures you do use.

Password Security

The simplest and most effective way to dramatically boost your online security is to use a password storage program, or password safe. You need to stop making up passwords you can remember and start using long, random passwords on websites. The only practical way to do this is to use a password safe.

Why? Because if you can remember the password, it’s trivially hackable. For example, passwords like 10qp29wo38ei47ru can be broken instantly. Anything you can feasibly remember is just too weak.

And, any rule you set for yourself that requires self-discipline will be violated, because you’re lazy. You need to make security easier so that you automatically do things more securely. A password safe is the best way to do that, by far. A good rule of thumb for most people is that you should not try to know your own passwords, except the password to your password safe. (People with the need to be hyper-secure will take extraordinary measures, but those aren’t practical or necessary for most of us.)
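
For illustration, here is roughly what a password safe’s generator does; a short sketch using Python’s standard library (the length and character set are just example defaults):

    import secrets
    import string

    def random_password(length: int = 20) -> str:
        """Generate a long, random password with mixed case, digits, and symbols."""
        alphabet = string.ascii_letters + string.digits + string.punctuation
        return "".join(secrets.choice(alphabet) for _ in range(length))

    print(random_password())      # e.g. q0}R#v9T]x... -- not something you could remember
    print(random_password(50))    # even longer for high-value accounts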

I use 1Password. Others I know of are LastPass and KeePass Password Safe. I personally wouldn’t use any others, because lesser-known ones are more likely to be malware.

It’s easy to share a password safe’s data across devices, and make a backup of it, by using a service such as Dropbox. The password safe’s files are encrypted, so the contents will not be at risk even if the file syncing service is compromised for some reason. (Use a strong password to encrypt your password safe!)

It’s important to note that online passwords are different from the password you use to log into your personal computer. Online passwords are much more exposed to brute-force, large-scale hacking attacks. By contrast, your laptop probably isn’t going to be subjected to a brute-force password cracking attack, because attackers usually need physical access to the computer to do that. This is not a reason to use a weak password for your computer; I’m just trying to illustrate how important it is to use really long, random passwords for websites and other online services, because they are frequent targets of brute-force attacks.

Here are some other important rules for password security.

  • Never use the same password in more than one service or login. If you do, someone who compromises it will be able to compromise other services you use.
  • Set your password generation program (likely part of your password safe) to make long, random passwords with numbers, special characters, and mixed case. I leave mine set to 20 characters by default. If a website won’t accept such a long password I’ll shorten it. For popular websites such as LinkedIn, Facebook, etc I use much longer passwords, 50 characters or more. They are such valuable attack targets that I’m paranoid.
  • Don’t use your web browser’s features for storing passwords and credit cards. Browsers themselves, and their password storage, are the target of many attacks.
  • Never write passwords down on paper, except once. The only paper copy of my passwords is the master password to my computer, password safe, and GPG key. These are in my bank’s safe deposit box, because if something happens to me I don’t want my family to be completely screwed. (I could write another blog post on the need for a will, power of attorney, advance medical directive, etc.)
  • Never treat any account online, no matter how trivial, as “not important enough for a secure password.”

That last item deserves a little story. Ten years ago I didn’t use a password safe, and I treated most websites casually. “Oh, this is just a discussion forum, I don’t care about it.” I used an easy-to-type password for such sites. I used the same one everywhere, and it was a common five-letter English word (not my name, if you’re guessing). Suddenly one day I realized that someone could guess this password easily, log in, change the password and in many cases the email address, and lock me out of my own account. They could then proceed to impersonate me, do illegal and harmful things in my name, etc. Worse, they could go find other places that I had accounts (easy to find — just search Google for my name or username!) and do the same things in many places. I scrambled to find and fix this problem. At the end of it, I realized I had created more than 300 accounts that could have been compromised. Needless to say, I was very, very lucky. My reputation, employment, credit rating, and even my status as a free citizen could have been taken away from me. Don’t let this happen to you!

Use Two-Factor Auth

Two-factor authentication (aka 2-step login) is a much stronger mechanism for account security than a password alone. It uses a “second factor” (something you physically possess) in addition to the common “first factor” (something you know — a password) to verify that you are the person authorized to access the account.

Typically, the login process with two-factor authentication looks like this:

  • You enter your username and password.
  • The service sends a text message to your phone. The message contains a 6-digit number.
  • You must enter the number to finish logging in.

With two-factor auth in place, it is very difficult for malicious hackers to access your account, even if they know your password. Two-factor auth is way more secure than other tactics such as long passwords, but it doesn’t mean you shouldn’t also use a password safe and unique, random, non-memorized passwords.
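
Text messages are only one way to deliver the second factor; authenticator apps instead compute a time-based one-time password (TOTP) from a secret shared with the service. Here is a minimal sketch of how such a 6-digit code is derived (a simplified illustration of RFC 6238, not any particular service’s implementation):

    import base64
    import hashlib
    import hmac
    import struct
    import time

    def totp(secret_base32: str, interval: int = 30, digits: int = 6) -> str:
        """Derive the current time-based one-time code from a shared secret."""
        key = base64.b32decode(secret_base32, casefold=True)
        counter = int(time.time()) // interval          # the code changes every 30 seconds
        digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                      # dynamic truncation per RFC 4226
        code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % 10 ** digits
        return str(code).zfill(digits)

    # Both the service and your phone hold the secret, so both can compute,
    # and therefore verify, the same short-lived code.
    print(totp("JBSWY3DPEHPK3PXP"))                     # example secret, base32-encoded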

Two-factor auth has a bunch of special ways to handle other common scenarios, such as devices that can’t display the dialog to ask for the 6-digit code, or what if you lose your cellphone, or what if you’re away from your own computer and don’t have your cellphone. Nonetheless, these edge cases are easy to handle. For example, you can get recovery codes for when you lose or don’t have your cellphone. You should store these — where else? — in your password safe.

There seems to be a perception that two-factor auth is inconvenient for lots of people. I disagree. I’ve never found it inconvenient, and I use two-factor auth a lot. And I’ve never met these people, whoever they are, who think two-factor auth is such a high burden. The worst thing that happens to me is that I sometimes have to get out of my chair and get my phone from another room to log in.

Unfortunately, most websites don’t support two-factor authentication. Fortunately, many of the most popular and valuable services do, including Facebook, Google, Paypal, Dropbox, LinkedIn, Twitter, and most of the other services that you probably use which are most likely to get compromised. Here is a list of services with two-factor auth, with instructions on how to set it up for each one.

Please enable two-factor authentication if it is supported! I can’t tell you how many of my friends and family have had their Gmail, Facebook, Twitter, and other services compromised. Please don’t let this happen to you! It could do serious harm to you — worse than a stolen credit card.

Secure Your Devices

Sooner or later someone is going to get access to one of your devices — tablet, phone, laptop, thumb drive. I’ve never had a phone or laptop lost or stolen myself, but it’s a matter of time. I’ve known a lot of people in this situation. One of my old bosses, for example, forgot a laptop in the seat pocket of an airplane, and someone took it and didn’t return it.

And how many times have you heard about some government worker leaving a laptop at the coffee shop and suddenly millions of people’s Social Security numbers are stolen?

Think about your phone. If someone stole my phone and it weren’t protected, they’d have access to a bunch of my accounts, contact lists, email, and a lot of other stuff I really, really do not want them messing with. If you’re in the majority of people who leave your phone completely unsecured, think about the consequences for a few minutes. Someone getting access to all the data and accounts on your phone could probably ruin your life for a long time if they wanted to.

All of this is easily preventable. Given that one or more of your devices will someday certainly end up in the hands of someone who may have bad intentions, I think it’s only prudent to take some basic measures:

  • Set the device to require a password, lock code, or pattern to be used to unlock it after it goes to sleep, when it’s idle for a bit, or when you first power it on. If someone steals your device, and can access it without entering your password, you’re well and truly screwed.
  • Use full-device encryption. If someone steals your device, for heaven’s sake don’t let them have access to your data. For Mac users, use FileVault under Preferences / Security and Privacy. Encrypt the whole drive, not just the home directory. On Windows, use TrueCrypt, and on Linux, you probably already know what you’re doing.
  • On Android tablets and phones, you can encrypt the entire device. You have to set up a screen lock code first.
  • If you use a thumb drive or external hard drive to transfer files between devices, encrypt it.
  • Encrypt your backup hard drives. Backups are one of the most common ways that data is stolen. (You have backups, right? I could write another entire blog post on backups. Three things are inevitable: death, taxes, and loss of data that you really care about.)
  • Use a service such as Prey Project to let you have at least some basic control over your device if it’s lost or stolen. If you’re using an Android device, set up Android Device Manager so you can track and control your device remotely. I don’t know if there’s anything similar for Apple devices.
  • Keep records of your devices’ make, model, serial number, and so on. Prey Project makes this easy.
  • On your phone or tablet, customize the lockscreen with a message such as “user@email.com – reward if found” and on your laptops, stick a small label inside the lid with your name and phone number. You never know if a nice person will return something to you. I know I would do it for you.

Things that don’t help

Finally, here are some techniques that aren’t as useful as you might have been told.

  • Changing passwords doesn’t significantly enhance security unless you change from an insecure password to a strong one. Changing passwords is most useful, in my opinion, when a service has already been compromised or potentially compromised. It’s possible on any given day that an attacker has gotten a list of encrypted passwords for a service, hasn’t yet been discovered, and hasn’t yet decrypted them, and that you’ll foil the attack by changing your password in the meanwhile, but this is such a vanishingly small chance that it’s not meaningful.
  • (OK, this ended up being a list of 1 thing. Tell me what else should go here.)

Summary

Here is a summary of the most valuable steps you can take to protect yourself:

  • Get a password safe, and use it for all of your accounts. Protect it with a long password. Make this the one password you memorize.
  • Use long (as long as possible), randomly generated passwords for all online accounts and services, and never reuse a password.
  • Use two-factor authentication for all services that support it.
  • Encrypt your hard drives, phones and tablets, and backups, and use a password or code to lock all computers, phones, tablets, etc when you turn them off, leave them idle, or put them to sleep.
  • Install something like Prey Project on your portable devices, and label them so nice people can return them to you.
  • Write down the location and access instructions (including passwords) for your password safe, computer, backup hard drives, etc and put it in a safe deposit box.

Friends try not to let friends get hacked and ruined. Don’t stop at upgrading your own security. Please tell your friends and family to do it, too!

Do you have any other suggestions? Please use the comments below to add your thoughts.

Categories: MySQL

Secure your accounts and devices

Xaprb, home of innotop - Wed, 2013-12-18 00:00

This is a public service announcement. Many people I know are not taking important steps necessary to secure their online accounts and devices (computers, cellphones) against malicious people and software. It’s a matter of time before something seriously harmful happens to them.

This blog post will urge you to use stronger security than the popular advice you’ll hear elsewhere suggests. It really, really, really is necessary to use strong measures to secure your digital life. The technology being used to attack you is very advanced, operates at a large scale, and you probably stand to lose much more than you realize.

You’re also likely not as good at being secure as you think you are. If you’re like most people, you don’t take some important precautions, and you overestimate the strength and effectiveness of security measures you do use.

Password Security

The simplest and most effective way to dramatically boost your online security is to use a password storage program, or password safe. You need to stop making up passwords you can remember and start using long, random passwords for websites. The only practical way to do this is to use a password safe.

Why? Because if you can remember the password, it’s trivially hackable. Even a password like 10qp29wo38ei47ru looks random, but it follows a simple keyboard pattern, and password-cracking tools are built to try patterns like that. Anything you can feasibly remember is just too weak.

And, any rule you set for yourself that requires self-discipline will be violated, because you’re lazy. You need to make security easier so that you automatically do things more securely. A password safe is the best way to do that, by far. A good rule of thumb for most people is that you should not try to know your own passwords, except the password to your password safe. (People with the need to be hyper-secure will take extraordinary measures, but those aren’t practical or necessary for most of us.)
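A password safe’s built-in generator is the right tool for this, but to make “long and random” concrete, here is a minimal sketch in Python of what such a generator does (illustrative only, not a suggestion to roll your own tooling):

import string
from random import SystemRandom

def generate_password(length=20):
    # SystemRandom draws from the operating system's cryptographic RNG (os.urandom).
    rng = SystemRandom()
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return ''.join(rng.choice(alphabet) for _ in range(length))

print(generate_password())     # 20 characters for a typical account
print(generate_password(50))   # longer for high-value accounts

The point is simply that every character is chosen independently and uniformly, so no human habit or memorable structure sneaks in.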

I use 1Password. Others I know of are LastPass and KeePass Password Safe. I personally wouldn’t use any others, because lesser-known ones are more likely to be malware.

It’s easy to share a password safe’s data across devices, and make a backup of it, by using a service such as Dropbox. The password safe’s files are encrypted, so the contents will not be at risk even if the file syncing service is compromised for some reason. (Use a strong password to encrypt your password safe!)

It’s important to note that online passwords are different from the password you use to log into your personal computer. Online passwords are much more exposed to brute-force, large-scale hacking attacks. By contrast, your laptop probably isn’t going to be subjected to a brute-force password cracking attack, because attackers usually need physical access to the computer to do that. This is not a reason to use a weak password for your computer; I’m just trying to illustrate how important it is to use really long, random passwords for websites and other online services, because they are frequent targets of brute-force attacks.
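To put rough numbers on that, here is a back-of-the-envelope sketch (it assumes every character is chosen uniformly at random, which is true of generated passwords and never true of human-chosen ones):

import math

def entropy_bits(alphabet_size, length):
    # Bits of entropy for a string drawn uniformly at random from the given alphabet.
    return length * math.log2(alphabet_size)

print(round(entropy_bits(26, 8)))    # 8 lowercase letters: about 38 bits
print(round(entropy_bits(94, 20)))   # 20 random printable ASCII characters: about 131 bits
print(round(entropy_bits(94, 50)))   # 50 random printable ASCII characters: about 328 bits

Every additional 10 bits makes a brute-force search roughly a thousand times more expensive, which is why length beats cleverness.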

Here are some other important rules for password security.

  • Never use the same password in more than one service or login. If you do, someone who compromises it will be able to compromise other services you use.
  • Set your password generation program (likely part of your password safe) to make long, random passwords with numbers, special characters, and mixed case. I leave mine set to 20 characters by default. If a website won’t accept such a long password I’ll shorten it. For popular websites such as LinkedIn, Facebook, etc I use much longer passwords, 50 characters or more. They are such valuable attack targets that I’m paranoid.
  • Don’t use your web browser’s features for storing passwords and credit cards. Browsers themselves, and their password storage, are the target of many attacks.
  • Never write passwords down on paper, with one exception. The only paper copy of my passwords is the master password to my computer, password safe, and GPG key. These are in my bank’s safe deposit box, because if something happens to me I don’t want my family to be completely screwed. (I could write another blog post on the need for a will, power of attorney, advance medical directive, etc.)
  • Never treat any account online, no matter how trivial, as “not important enough for a secure password.”

That last item deserves a little story. Ten years ago I didn’t use a password safe, and I treated most websites casually. “Oh, this is just a discussion forum, I don’t care about it.” I used an easy-to-type password for such sites. I used the same one everywhere, and it was a common five-letter English word (not my name, if you’re guessing). Suddenly one day I realized that someone could guess this password easily, log in, change the password and in many cases the email address, and lock me out of my own account. They could then proceed to impersonate me, do illegal and harmful things in my name, etc. Worse, they could go find other places that I had accounts (easy to find – just search Google for my name or username!) and do the same things in many places. I scrambled to find and fix this problem. At the end of it, I realized I had created more than 300 accounts that could have been compromised. Needless to say, I was very, very lucky. My reputation, employment, credit rating, and even my status as a free citizen could have been taken away from me. Don’t let this happen to you!
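A practical footnote to the never-reuse rule: you can check whether a password has already appeared in public breach dumps. The sketch below uses the Pwned Passwords range API from haveibeenpwned.com (an external service, so treat the URL and its availability as an assumption); only the first five characters of the password’s SHA-1 hash ever leave your machine:

import hashlib
import urllib.request

def breach_count(password):
    # k-anonymity lookup: only the first 5 hex characters of the hash are sent.
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    with urllib.request.urlopen("https://api.pwnedpasswords.com/range/" + prefix) as response:
        body = response.read().decode("utf-8")
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0

print(breach_count("monkey"))  # common passwords return very large counts

If a password shows up here, assume attackers already have it in their dictionaries.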
Use Two-Factor Auth

Two-factor authentication (aka 2-step login) is a much stronger mechanism for account security than a password alone. It uses a “second factor” (something you physically possess) in addition to the common “first factor” (something you know – a password) to verify that you are the person authorized to access the account.

Typically, the login process with two-factor authentication looks like this:

  • You enter your username and password.
  • The service sends a text message to your phone. The message contains a 6-digit number.
  • You must enter the number to finish logging in.

With two-factor auth in place, it is very difficult for malicious hackers to access your account, even if they know your password. Two-factor auth is far more effective than long passwords alone, but you should still use a password safe and unique, random, non-memorized passwords.

Two-factor auth has a bunch of special ways to handle other common scenarios, such as devices that can’t display the dialog to ask for the 6-digit code, or what if you lose your cellphone, or what if you’re away from your own computer and don’t have your cellphone. Nonetheless, these edge cases are easy to handle. For example, you can get recovery codes for when you lose or don’t have your cellphone. You should store these – where else? – in your password safe.
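Many services also let you use an authenticator app instead of text messages. Under the hood those apps typically implement TOTP (RFC 6238): the service and your phone share a secret when you set things up, and both sides derive the same 6-digit code from that secret plus the current time. Here is a minimal sketch, standard library only, with a made-up example secret:

import base64
import hashlib
import hmac
import struct
import time

def totp(secret_base32, digits=6, period=30):
    # Derive the current code from the shared secret and the current 30-second window.
    key = base64.b32decode(secret_base32, casefold=True)
    counter = struct.pack(">Q", int(time.time()) // period)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % (10 ** digits)).zfill(digits)

print(totp("JBSWY3DPEHPK3PXP"))  # prints the current 6-digit code for this example secret

The service performs the same computation on its side, which is why the codes work even when your phone has no network connection.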

There seems to be a perception that two-factor auth is inconvenient. I disagree. I’ve never found it inconvenient, and I use two-factor auth a lot. And I’ve never met these people, whoever they are, who think two-factor auth is such a high burden. The worst thing that happens to me is that I sometimes have to get out of my chair and get my phone from another room to log in.

Unfortunately, most websites don’t support two-factor authentication. Fortunately, many of the most popular and valuable services do, including Facebook, Google, Paypal, Dropbox, LinkedIn, Twitter, and most of the other services that you probably use which are most likely to get compromised. Here is a list of services with two-factor auth, with instructions on how to set it up for each one.

Please enable two-factor authentication if it is supported! I can’t tell you how many of my friends and family have had their Gmail, Facebook, Twitter, and other services compromised. Please don’t let this happen to you! It could do serious harm to you – worse than a stolen credit card.

Secure Your Devices

Sooner or later someone is going to get access to one of your devices – tablet, phone, laptop, thumb drive. I’ve never had a phone or laptop lost or stolen myself, but it’s a matter of time. I’ve known a lot of people in this situation. One of my old bosses, for example, forgot a laptop in the seat pocket of an airplane, and someone took it and didn’t return it.

And how many times have you heard about some government worker leaving a laptop at the coffee shop and suddenly millions of people’s Social Security numbers are stolen?

Think about your phone. If someone stole my phone and it weren’t protected, they’d have access to a bunch of my accounts, contact lists, email, and a lot of other stuff I really, really do not want them messing with. If you’re in the majority of people who leave your phone completely unsecured, think about the consequences for a few minutes. Someone getting access to all the data and accounts on your phone could probably ruin your life for a long time if they wanted to.

All of this is easily preventable. Given that one or more of your devices will someday certainly end up in the hands of someone who may have bad intentions, I think it’s only prudent to take some basic measures:

  • Set the device to require a password, lock code, or pattern to be used to unlock it after it goes to sleep, when it’s idle for a bit, or when you first power it on. If someone steals your device, and can access it without entering your password, you’re well and truly screwed.
  • Use full-device encryption. If someone steals your device, for heaven’s sake don’t let them have access to your data. For Mac users, use FileVault under System Preferences / Security & Privacy. Encrypt the whole drive, not just the home directory. On Windows, use BitLocker if your edition includes it, or TrueCrypt; on Linux, you probably already know what you’re doing.
  • On Android tablets and phones, you can encrypt the entire device. You have to set up a screen lock code first.
  • If you use a thumb drive or external hard drive to transfer files between devices, encrypt it (see the sketch after this list).
  • Encrypt your backup hard drives. Backups are one of the most common ways that data is stolen. (You have backups, right? I could write another entire blog post on backups. Three things are inevitable: death, taxes, and loss of data that you really care about.)
  • Use a service such as Prey Project to let you have at least some basic control over your device if it’s lost or stolen. Android phones now have the Android Device Manager and Google Location History, but you have to enable these.
  • Keep records of your devices’ make, model, serial number, and so on. Prey Project makes this easy.
  • On your phone or tablet, customize the lockscreen with a message such as “user@email.com – reward if found” and on your laptops, stick a small label inside the lid with your name and phone number. You never know if a nice person will return something to you. I know I would do it for you.
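Full-device encryption is an operating-system feature rather than something you script yourself, but the thumb-drive bullet above deserves a small illustration: encrypting individual files before they ever touch a portable drive. This sketch uses the third-party cryptography package (an assumption on my part; any reputable encryption tool does the job), and the file names are hypothetical:

from cryptography.fernet import Fernet

# Generate the key once and keep it somewhere safe (your password safe is a good spot),
# never on the same drive as the encrypted file.
key = Fernet.generate_key()
fernet = Fernet(key)

with open("tax-records.pdf", "rb") as f:       # hypothetical file to protect
    ciphertext = fernet.encrypt(f.read())

with open("tax-records.pdf.enc", "wb") as f:   # this is what goes on the thumb drive
    f.write(ciphertext)

with open("tax-records.pdf.enc", "rb") as f:   # later, on the other machine
    plaintext = fernet.decrypt(f.read())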
Things that don’t help

Finally, here are some techniques that aren’t as useful as you might have been told.

  • Changing passwords doesn’t significantly enhance security unless you change from an insecure password to a strong one. Changing passwords is most useful, in my opinion, when a service has already been compromised or potentially compromised. It’s possible that on any given day an attacker has obtained a service’s list of hashed passwords, hasn’t yet been discovered, and hasn’t yet cracked them, and that you’ll foil the attack by changing your password in the meanwhile, but that chance is so vanishingly small that it isn’t meaningful.
  • (OK, this ended up being a list of 1 thing. Tell me what else should go here.)
Summary

Here is a summary of the most valuable steps you can take to protect yourself:

  • Get a password safe, and use it for all of your accounts. Protect it with a long password. Make this the one password you memorize.
  • Use long (as long as possible), randomly generated passwords for all online accounts and services, and never reuse a password.
  • Use two-factor authentication for all services that support it.
  • Encrypt your hard drives, phones and tablets, and backups, and use a password or code to lock all computers, phones, tablets, etc when you turn them off, leave them idle, or put them to sleep.
  • Install something like Prey Project on your portable devices, and label them so nice people can return them to you.
  • Write down the location and access instructions (including passwords) for your password safe, computer, backup hard drives, etc and put it in a safe deposit box.

Friends try not to let friends get hacked and ruined. Don’t stop at upgrading your own security. Please tell your friends and family to do it, too!

Do you have any other suggestions? Please use the comments below to add your thoughts.

Categories: MySQL

How is the MariaDB Knowledge Base licensed?

Xaprb, home of innotop - Mon, 2013-12-16 22:37

I clicked around for a few moments but didn’t immediately see a license mentioned for the MariaDB Knowledge Base. As far as I know, the MySQL documentation is not licensed in a way that would allow copying or derivative works, but at least some of the MariaDB Knowledge Base seems to be pretty similar to the corresponding MySQL documentation. See for example LOAD DATA LOCAL INFILE: MariaDB, MySQL.

Oracle’s MySQL documentation has a licensing notice that states:

You may create a printed copy of this documentation solely for your own personal use. Conversion to other formats is allowed as long as the actual content is not altered or edited in any way. You shall not publish or distribute this documentation in any form or on any media, except if you distribute the documentation in a manner similar to how Oracle disseminates it (that is, electronically for download on a Web site with the software) or on a CD-ROM or similar medium, provided however that the documentation is disseminated together with the software on the same medium. Any other use, such as any dissemination of printed copies or use of this documentation, in whole or in part, in another publication, requires the prior written consent from an authorized representative of Oracle. Oracle and/or its affiliates reserve any and all rights to this documentation not expressly granted above.

Can someone clarify the situation?

Categories: MySQL

Props to the MySQL Community Team

Xaprb, home of innotop - Sat, 2013-12-07 21:02

Enough negativity sometimes gets slung around that it’s easy to forget how much good is going on. I want to give a public thumbs-up to the great job the MySQL community team, especially Morgan Tocker, is doing. I don’t remember ever having so much good interaction with this team, not even in the “good old days”:

  • Advance notice of things they’re thinking about doing (deprecating, changing, adding, etc)
  • Heads-up via private emails about news and upcoming things of interest (new features, upcoming announcements that aren’t public yet, etc)
  • Solicitation of opinion on proposals that are being floated internally (do you use this feature, would it hurt you if we removed this option, do you care about this legacy behavior we’re thinking about sanitizing)

I don’t know who or what has made this change happen, but it’s really welcome. I know Oracle is a giant company with all sorts of legal and regulatory hoops to jump through, for things that seem like they ought to be obviously the right thing to do in an open-source community. I had thought we were not going to get this kind of interaction from them, but happily I was wrong.

(At the same time, I still wish for more public bug reports and test cases; I believe those things are really in everyone’s best interests, both short- and long-term.)

Categories: MySQL

S**t sales engineers say

Xaprb, home of innotop - Sat, 2013-12-07 20:51

Here’s a trip down memory lane. I was just cleaning out some stuff and I found some notes I took from a hilarious MySQL seminar a few years back. I won’t say when or where, to protect the guilty.[1]

I found it so absurd that I had to write down what I was witnessing. Enough time has passed that we can probably all laugh about this now. Times and people have changed.

The seminar was a sales pitch in disguise, of course. The speakers were singing Powerpoint Karaoke to slides real tech people had written. Every now and then, when they advanced a slide, they must have had a panicked moment. “I don’t remember this slide at all!” they must have been thinking. So they’d mumble something really funny and trying-too-hard-to-be-casual about “oh, yeah, [insert topic here] but you all already know this, I won’t bore you with the details [advance slide hastily].” It’s strange how transparent that is to the audience.

Here are some of the things the sales “engineers” said during this seminar, in response to audience questions:

  • Q. How does auto-increment work in replication? A: On slaves, you have to ALTER TABLE to remove auto-increment because only one table in a cluster can be auto-increment. When you switch replication to a different master you have to ALTER TABLE on all servers in the whole cluster to add/remove auto-increment. (This lie was told early in the day. Each successive person who took a turn presenting built upon it instead of correcting it. I’m not sure whether this was admirable teamwork or cowardly face-saving.)
  • Q. Does InnoDB’s log grow forever? A: Yes. You have to back up, delete, and restore your database if you want to shrink it.
  • Q. What size sort buffer should I have? A: 128MB is the suggested starting point. You want this sucker to be BIG.

There was more, but that’s enough for a chuckle. Note to sales engineers everywhere: beware the guy in the front row scribbling notes and grinning.

What are your best memories of worst sales engineer moments?

1. For the avoidance of doubt, it was NOT any of the trainers, support staff, consultants, or otherwise anyone prominently visible to the community. Nor was it anyone else whose name I’ve mentioned before. I doubt any readers of this blog, except for former MySQL AB employees (pre-Sun), would have ever heard of these people. I had to think hard to remember who those names belonged to.

Categories: MySQL

EXPLAIN UPDATE in MySQL 5.6

Xaprb, home of innotop - Tue, 2013-11-26 21:35

I just tried out EXPLAIN UPDATE in MySQL 5.6 and found unexpected results. This query has no usable index:

EXPLAIN UPDATE ... WHERE col1 = 9 AND col2 = 'something'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 55
          ref: NULL
         rows: 51
        Extra: Using where

The EXPLAIN output makes it seem like a perfectly fine query, but it’s a full table scan. If I do the old trick of rewriting it to a SELECT I see that:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: foo
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 51
        Extra: Using where

Should I file this as a bug? It seems like one to me.

Categories: MySQL

Freeing some Velocity videos

Xaprb, home of innotop - Sat, 2013-11-09 17:51

Following my previous post on Velocity videos, I had some private email conversations with good folks at O’Reilly, and a really nice in-person exchange with a top-level person as well. I was surprised to hear them encourage me to publish my videos online freely!

I still believe that nothing substitutes for the experience of attending an O’Reilly conference in-person, but I’ll also be the first to admit that my talks are usually more conceptual and academic than practical, and designed to start a conversation rather than to tell you the Truth According To Baron. Thus, I think they’re worth sharing more widely.

O’Reilly alleviated my concerns about “killing the golden goose,” but I like one person’s take on the cost of O’Reilly’s conferences. “You think education is expensive? Try ignorance.”

I’ll post some of my past talks soon for your enjoyment.

Categories: MySQL
