Monday, November 23, 2009

On RAID types, type classes, and reducing risk of catastrophic failure

RAID levels like RAID 5 and RAID 6 are intended to improve data protection* while sacrificing less capacity than mirroring schemes like RAID 1.

RAID 5 allows you to put N+1 drives into a system and get N drives' worth of capacity out, and you can lose any one drive without losing your data. RAID 6 allows you to put N+2 drives into a system and get N drives' worth of capacity out, and you can lose any two drives without losing your data.

The problem with RAID is that when you're down to N functioning drives, if you lose one more drive, you're in for a massive world of trouble; good luck getting your data back. If you can even mount your filesystem, you're going to be missing roughly 1/N of just about every file, and likely not in a contiguous chunk.

So when you lose a drive out of a RAID system, you put a replacement in, fast, and rebuild the array. Rebuilding the array populates the new drive with the data that was on the drive that went missing.** Once the rebuild is finished and you're back up to N+1 drives (for RAID 5) or N+2 drives (for RAID 6), everything should be back to normal; you just survived a drive failure (or two) without losing data.
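
As an aside, the single-parity case is simple enough to show in a few lines of toy Python (my own illustration, not any particular RAID implementation): the parity block in each stripe is the XOR of the data blocks, so any one missing block is just the XOR of everything that survived.

    # Toy illustration of single-parity (RAID-5-style) rebuild.
    # Blocks are equal-length byte strings; parity = XOR of all data blocks.

    def xor_blocks(blocks):
        """XOR a list of equal-length byte strings together."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    # Four "drives" worth of data plus one parity block (N+1 = 5 drives).
    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = xor_blocks(data)

    # Pretend drive 2 died: rebuild its block from the survivors plus parity.
    survivors = [blk for i, blk in enumerate(data) if i != 2]
    rebuilt = xor_blocks(survivors + [parity])

    assert rebuilt == data[2]  # the lost block is fully recovered

That's also why a rebuild has to read every surviving drive: regenerating each lost block touches all of them, which is the extra load I'm about to complain about.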

The problem is that this rebuild process is hell on the drives; it involves a lot of reading of data from the remaining drives, on top of their existing live load, to regenerate the data for the newly added drive. It's not unheard of to have an additional drive failure during the rebuild period.

Part of the problem is that most of the drives in a fresh RAID setup will have been bought new at the same time, which means that by the time one or two of the original drives have failed, the rest may not be far behind, which drives up the likelihood of a failure during the rebuild.

So what if one were to induce a drive to fail earlier? I don't mean a scheduled or simulated failure; I mean a physical failure at an unknown time. Say, for example, that when setting up a new RAID, you put the drives through a burn-in period in which you intentionally induce a high level of wear, such as by performing a nasty mix of random writes and random reads, inducing spindowns and spinups, and so on.
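
A crude burn-in workload is easy to script. Here's a hypothetical sketch (the device path, block size, and iteration count are all placeholders, and it will happily destroy whatever is on the target, so point it only at a scratch device); spindown/spinup cycling would need a separate tool such as hdparm and isn't shown.

    # Hypothetical burn-in sketch: random reads and writes across a scratch
    # device (or a large scratch file). The path, block size, and iteration
    # count are illustrative, not recommendations. Needs root for a raw device.
    import os
    import random

    DEVICE = "/dev/sdX"          # scratch device or file; WILL BE OVERWRITTEN
    BLOCK_SIZE = 4096            # one 4 KiB block per I/O
    ITERATIONS = 1_000_000       # how long to hammer the drive

    fd = os.open(DEVICE, os.O_RDWR)
    size = os.lseek(fd, 0, os.SEEK_END)   # total size of the device/file
    blocks = size // BLOCK_SIZE

    for _ in range(ITERATIONS):
        offset = random.randrange(blocks) * BLOCK_SIZE
        os.lseek(fd, offset, os.SEEK_SET)
        if random.random() < 0.5:
            os.read(fd, BLOCK_SIZE)                   # random read
        else:
            os.write(fd, os.urandom(BLOCK_SIZE))      # random write

    os.fsync(fd)
    os.close(fd)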

Burn-in periods are already used in some places; they help weed out drives that are prone to early failure. However, if you give each of the drives in your array a different length of burn-in time, then you've reduced each drive's likely lifetime by a different amount, ideally by widely spaced amounts. That, in turn, means that when the drive with the longest burn-in period is the first in the array to fail, the next failure is less likely to land during the rebuild. Given enough of a difference in reduction of expected lifetime, one may even be able to secure something of a safety margin.
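
The intuition is easy to poke at with a toy Monte Carlo model. The sketch below is entirely my own invention; the Weibull lifetime parameters, the rebuild window, and the amount of stagger are made-up numbers, and whether staggering actually buys you anything depends on those numbers, which is rather the point.

    # Toy Monte Carlo: does staggering expected drive lifetimes reduce the
    # chance that a second drive dies within the rebuild window after the
    # first failure? All parameters are invented for illustration.
    import random

    DRIVES = 6            # e.g. a RAID 6 array of N+2 = 6 drives
    REBUILD_HOURS = 24    # assumed rebuild window after the first failure
    TRIALS = 100_000
    SHAPE = 1.5           # Weibull shape > 1: wear-out-dominated failures
    BASE_SCALE = 30_000   # characteristic life in hours, pure invention

    def second_failure_in_window(scales):
        """One trial: draw a lifetime per drive, check whether the second
        failure falls within REBUILD_HOURS of the first."""
        lifetimes = sorted(random.weibullvariate(s, SHAPE) for s in scales)
        return lifetimes[1] - lifetimes[0] <= REBUILD_HOURS

    def risk(scales):
        hits = sum(second_failure_in_window(scales) for _ in range(TRIALS))
        return hits / TRIALS

    # Identical drives vs. drives whose burn-in knocked a different chunk
    # off each expected lifetime (the staggering argued for above).
    uniform = [BASE_SCALE] * DRIVES
    staggered = [BASE_SCALE - i * 2_000 for i in range(DRIVES)]

    print("identical lifetimes :", risk(uniform))
    print("staggered lifetimes :", risk(staggered))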

The sacrifice, of course, is that you're intentionally reducing the lifetime of your component drives, which means you put out more money in equipment replacement, and you rebuild your array more frequently.

The question is: is that additional equipment-replacement cost and rebuild frequency sufficiently offset by the reduction in the likelihood of a drive failure reducing you to fewer than N working drives?
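
Framed as a back-of-envelope comparison, with every number a placeholder you'd have to replace with your own estimates:

    # Back-of-envelope framing of the question above. Every number here is a
    # placeholder; the point is only the shape of the comparison.
    drive_cost = 100.0              # cost of one replacement drive
    expected_life_years = 5.0       # expected lifetime without burn-in
    life_reduction = 0.15           # fraction of life sacrificed to staggered burn-in
    data_loss_cost = 50_000.0       # what losing the array would actually cost you
    p_loss_per_year_plain = 0.010   # assumed annual chance of dropping below N drives
    p_loss_per_year_burned = 0.004  # assumed annual chance with staggered burn-in
    drives = 6

    # Extra replacement spend per drive per year caused by the shorter lifetime.
    extra_replacement = drive_cost * (1 / (expected_life_years * (1 - life_reduction))
                                      - 1 / expected_life_years)

    # Expected annual loss avoided by the reduced chance of catastrophic failure.
    loss_avoided = data_loss_cost * (p_loss_per_year_plain - p_loss_per_year_burned)

    print(f"extra replacement cost/year: {extra_replacement * drives:8.2f}")
    print(f"expected loss avoided/year : {loss_avoided:8.2f}")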

Some other thoughts:

RAID 0 is simple striping. You put N drives in, you get N drives' worth of capacity out, you get faster read times, and if you lose a drive, you've essentially lost all your data.

RAID 5 is similar to RAID 0 in that it uses striping, but an entire drive's worth of error-correction (parity) data is spread across all your disks so that if you lose a drive, you can retain your data. That means you get N drives' worth of capacity for a system with N+1 drives.

RAID 6 is like RAID 5, but it uses a second drive's worth of error-correction data. You get N drives' worth of capacity for an array with N+2 drives, and you can lose 2 drives and still retain your data.

In all three of these cases, if you drop below N drives, you're pretty much hosed.

A second recap, more terse:
  • RAID 0: N drives, N drives capacity. Any drive loss means failure of the array.

  • RAID 5: N+1 drives, N drives capacity. Losing more than 1 drive means failure of the array.

  • RAID 6: N+2 drives, N drives capacity. Losing more than 2 drives means failure of the array.


Hopefully, you can see the abstraction I'm trying to point out.

Let's call RAID 0, 5, and 6 members of the same class of RAID array types, and note that for any*** array in this class with N+x drives, the array can withstand the loss of x drives before total failure.

In RAID 0, x is 0. In RAID 5, x is 1. In RAID 6, x is 2. It seems obvious that configurations in this class of RAID types where x is greater than 2 are both possible and practicable.
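
For concreteness, here's the whole class reduced to arithmetic (and, for what it's worth, general erasure codes such as Reed-Solomon are the standard way to realize x greater than 2):

    # The whole class in one function: an array of n + x drives storing n
    # drives' worth of data tolerates the loss of any x drives. RAID 0 is
    # x = 0, RAID 5 is x = 1, RAID 6 is x = 2; Reed-Solomon-style erasure
    # coding covers x > 2.

    def array_properties(n: int, x: int, drive_tb: float = 1.0):
        """Capacity and fault tolerance for an n-data, x-parity array."""
        total = n + x
        return {
            "total_drives": total,
            "usable_capacity_tb": n * drive_tb,
            "storage_efficiency": n / total,
            "drive_losses_survivable": x,
        }

    for n, x in [(4, 0), (4, 1), (4, 2), (10, 3), (10, 4)]:
        print(n, "+", x, "=>", array_properties(n, x))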

I would assume there's going to be a sacrifice in throughput performance as x increases, due to the calculation (on writes) and verification (on reads) of the error-correction data. Just the same, the potential to increase x leads to the potential to increase N while reducing the additional risk that each increment of N brings.

That means an increase in the (acceptably un-)safe volume size using component drives below the maximum available capacity, meaning component drives which aren't going to be the most expensive on the market. Likewise, as the data density of component drives reaches an inevitable**** cap due to the laws of physics, one can select drive models with more weight given to reliability.

* Yes! I know! RAID is not a backup solution. Now that that kneejerk reaction is out of the way...
** The information required to generate that data already exists on the other drives, assuming you haven't dropped below N active drives. Likewise, if one of those other drives were to die, the information on this drive can be used, in concert with the other remaining drives, to regenerate that drive's data.
*** OK, there's still a matter of the array being in a consistent state.
**** I'm not saying we're there yet. I'm not saying we'll be there within the next fifty years. I'm just saying it has to happen eventually.
