Musings About RAID

I have been reading about RAID. I am no expert and, after hours of reading, I remain no expert. While all the bare metal servers where I work use RAID, my experience at that level does not give me the kind of data that people working in large data centers have. Still, I think I can offer some backyard observations.

Hard drives fail.

The “R” in the acronym RAID means redundant. RAID serves only one purpose: avoiding downtime and providing continuity after a hard drive failure. Understanding that sole purpose makes the topic much simpler.

Arrays don’t fail — hard drives fail. A single disk failure means only that a disk failed. The array keeps running and there is continuity. The array must be rebuilt but the array did not fail.

Don’t be distracted by arguments about wasted space or disk space efficiency. The purpose of RAID is redundancy and continuity — there is no waste.

Rebuild times depend mostly on the capacity of the replaced drive and on how busy the array is during the rebuild, not on the number of disks. A RAID 1 rebuild is a straight copy from the surviving mirror, while a parity rebuild must read every remaining disk and recompute the missing data. Using many smaller capacity drives means shorter rebuild times, and larger hard drives mean longer rebuild times regardless of the RAID configuration and number of drives.
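The point about drive size can be put into rough numbers. The sketch below is a back-of-the-envelope lower bound only: it assumes the replacement drive is written once from end to end at a steady rate. The function name and the 150 MB/s figure are assumptions chosen for illustration, and a real rebuild on a busy array will take longer.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower-bound rebuild time: the whole replacement drive is written once."""
    drive_mb = drive_tb * 1_000_000  # decimal TB to MB, the way drive vendors count
    return drive_mb / rebuild_mb_per_s / 3600  # seconds to hours


# Assumed sustained rebuild rate of 150 MB/s (illustrative, not measured).
for size_tb in (1, 4, 16):
    hours = rebuild_hours(size_tb, 150)
    print(f"{size_tb:>2} TB drive: at least {hours:.1f} hours")
```

The estimate scales linearly with drive capacity, which is why the same array layout takes far longer to rebuild with large drives than with small ones.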

Don’t fret about rebuild times. A rebuild means the RAID strategy was successful at providing continuity.

RAID is not a backup solution.

A failed rebuild means a loss of continuity. Backups are the only remedy for restoring data after a failed rebuild.

Before starting a rebuild, ensure backups are in place.

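A hot spare starts the rebuild automatically, with no chance to check backups first. Hot spares are therefore acceptable only if a backup schedule is in place with dependable backups.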

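RAID does not validate data content. Parity only checks that the blocks in a stripe are consistent with each other; it cannot tell which block is wrong. With a failed disk there is nothing left to cross-check, so the rebuild reconstructs the missing block from whatever the surviving disks return, good or bad.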
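A toy sketch of XOR parity, the scheme used by RAID 5, may make this limit concrete. The four-byte "blocks" and the helper name xor_blocks are made up for illustration; real arrays work on much larger stripes and this is not how any particular controller is implemented.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks and the parity block computed from them.
d0, d1, d2 = b"RAID", b"is a", b"tool"
parity = xor_blocks(d0, d1, d2)

# Disk 1 fails: its block is rebuilt from the survivors plus parity.
rebuilt_ok = xor_blocks(d0, d2, parity)

# All disks present, but disk 0 silently returns one wrong byte.
bad_d0 = b"RAIN"
stripe_is_consistent = xor_blocks(bad_d0, d1, d2) == parity
# The mismatch shows that something is wrong, but not which block.

# Same silent corruption, but disk 1 has also failed: the rebuild
# completes without complaint and produces a wrong block.
rebuilt_bad = xor_blocks(bad_d0, d2, parity)

print(rebuilt_ok, stripe_is_consistent, rebuilt_bad)
```

In the last case the array has no way to notice the problem, because the parity was consumed by the reconstruction itself. Only a check outside the array, such as a checksum or a backup comparison, would catch it.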

SMART (Self-Monitoring, Analysis and Reporting Technology) helps detect hardware failures but does not detect data corruption.

Modern hard disk firmware has built-in read-write validation in the form of error-correcting codes stored with each sector. This is where data integrity checking occurs. These checks work quite well, with very low error rates; otherwise hard drives would not be popular or dependable.

Each RAID strategy has advantages and disadvantages. Each strategy is filled with compromises and trade-offs. There is no one-size-fits-all solution.

RAID does not protect against human error or malicious data destruction.

Conclusion?

Don’t get sidetracked by lengthy debates and discussions about RAID pitfalls. Continuity is the name of the game.

Posted: Category: Usability Tagged: General
