
12.12.2008

Causes of Hard Drive Failures

An article in the Proceedings of the 5th USENIX Conference on File and
Storage Technologies this month offers perhaps the most in-depth study
of hard drive failures to date. Google uses hundreds of thousands of
hard drives to store its data, and a sample of one hundred thousand of
Google's drives was studied for five years to determine common causes of
failure. Since this very interesting article is a little dense to read
in its entirety, I thought you would enjoy reading some highlights.

Going against conventional thought, the study found that higher
temperature and/or activity levels had little or no correlation with
failure rate. It also found that the drives that spun up and spun down
most often had the highest failure rates. This means it's best to
uncheck the "Put the hard disk(s) to sleep when possible" box in Energy
Saver, at least in terms of hard drive health.

Some SMART (Self-Monitoring, Analysis and Reporting Technology)
parameters are excellent indicators of impending mechanical failure.
Even so, a good chunk of the failed drives gave no SMART warning at all,
even though SMART-monitored parameters were to blame for the failure.
For this reason, SMART is most useful as a statistical predictor of
failure for a population of drives rather than for individual devices.
With that in mind, if your drive reports SMART errors you should at the
very least perform a full backup immediately.

Annualized failure rates were about 3% for drives in their first three
months, 1.8% for drives around six months old, and 1.7% for drives at
the one-year mark. From there, failure rates jump to approximately 8% in
the second year and 9% in the third year, fall to 6% in the fourth year,
and climb back to 7% in the fifth year.

The whole article can be read at:
<http://labs.google.com/papers/disk_failures.pdf>

~ Matt Klein, Small Dog Electronics
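
To get a feel for what the yearly failure rates quoted above add up to,
here is a rough Python sketch. It is not from the study. It assumes each
figure can be treated as the chance that a drive which is still working
fails during that year, and it uses 1.7% as the first-year rate. The
names YEARLY_FAILURE_RATES and cumulative_failure are made up for this
example.

```python
# Yearly failure rates as quoted in the article (years one through five).
# Assumption: each rate applies to drives that survived the previous years.
YEARLY_FAILURE_RATES = [0.017, 0.08, 0.09, 0.06, 0.07]


def cumulative_failure(rates):
    """Return the chance that a drive has failed by the end of each year."""
    surviving = 1.0  # fraction of drives still working
    result = []
    for rate in rates:
        surviving *= 1.0 - rate          # drives that make it through this year
        result.append(1.0 - surviving)   # everything else has failed by now
    return result


if __name__ == "__main__":
    chances = cumulative_failure(YEARLY_FAILURE_RATES)
    for year, chance in enumerate(chances, start=1):
        print(f"By the end of year {year}: {chance:.1%} failed")
```

Under these assumptions roughly 28% of drives would have failed by the
end of the fifth year, which is a good argument for regular backups
whatever SMART happens to report.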


Andrew Webb
-------------------------------------------
blogging just like everyone else at




