
harddisk failures

 
SOLVED
deheugden
Frequent Advisor


I have a customer with a DL380 G7. The server is less than a year old. I installed the OS on a mirror. Since the server was purchased, one of the disks has failed on two separate occasions. According to HP support, this was due to multiple write errors.

I have never seen an HP server have disks fail twice within a year.

Does anyone have a suggestion as to what could be causing this?

Maybe I should change the mirror into a RAID 5 (if that is possible without data loss; I don't want to reinstall the OS again).

1 REPLY
Matti_Kurkela
Honored Contributor
Solution

Re: harddisk failures

You should remember that a huge flood in Thailand destroyed a significant part of world-wide hard disk manufacturing capacity last year.

(Wikipedia says Thailand accounts for 25% of the total world-wide hard disk manufacturing capacity.)

 

I expect that the other hard disk factories around the world have been running at absolute maximum capacity since then. That might have caused some product quality issues here and there. The same would be true in the factories that are restored/rebuilt in a big hurry to get into full production mode.

 

RAID 5 is not necessarily more fault-tolerant than a mirror - in fact, it is likely to be less so.

If you have a total of 6 disks, with mirroring you can have up to 3 of them fail without losing data, if each failing disk happens to be in a different mirror pair. But in a 6-disk RAID 5, your data will be lost when any 2 disks fail.
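To make the comparison concrete, here is a small Python sketch that enumerates every combination of failed disks in a 6-disk set and counts how many of them each layout survives. The pairing of disks into mirrors (0,1), (2,3), (4,5) and all function names are assumptions made for illustration only; the sketch also ignores rebuild time and treats every combination as equally likely.

```python
from itertools import combinations

DISKS = range(6)
# Assumed layout for illustration: three two-disk mirror pairs.
MIRROR_PAIRS = [(0, 1), (2, 3), (4, 5)]


def mirror_survives(failed):
    """Mirroring survives as long as no pair has lost both of its disks."""
    return all(not (a in failed and b in failed) for a, b in MIRROR_PAIRS)


def raid5_survives(failed):
    """RAID 5 tolerates at most one failed disk in the set."""
    return len(failed) <= 1


def survival_fraction(survives, k):
    """Fraction of all k-disk failure combinations that leave data intact."""
    combos = [set(c) for c in combinations(DISKS, k)]
    return sum(survives(c) for c in combos) / len(combos)


if __name__ == "__main__":
    for k in range(1, 5):
        print(
            f"{k} failed: mirror {survival_fraction(mirror_survives, k):.2f}, "
            f"RAID 5 {survival_fraction(raid5_survives, k):.2f}"
        )
```

Under these assumptions, the mirrored layout survives 12 of the 15 possible two-disk failures and 8 of the 20 possible three-disk failures, while RAID 5 survives none of them.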

 

For a given number of disks, RAID 5 provides more storage space than mirroring.
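The space trade-off is simple arithmetic: mirroring keeps half of the raw capacity, while RAID 5 gives up one disk's worth for parity. A minimal sketch, assuming identical disks; the 300 GB disk size and the function names are hypothetical and chosen only for the example:

```python
def mirrored_capacity(disks, size_gb):
    """Usable space when disks are arranged as two-disk mirror pairs."""
    if disks < 2 or disks % 2:
        raise ValueError("mirroring needs an even number of disks (at least 2)")
    return disks // 2 * size_gb


def raid5_capacity(disks, size_gb):
    """Usable space in RAID 5: one disk's worth of space holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * size_gb


if __name__ == "__main__":
    # Hypothetical example: six 300 GB disks.
    print("mirrored:", mirrored_capacity(6, 300), "GB")
    print("RAID 5:  ", raid5_capacity(6, 300), "GB")
```

With six such disks, mirroring yields 900 GB of usable space and RAID 5 yields 1500 GB.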

 

The lessons you should take away from this:

1.) RAID is not a backup. A backup is something that is stored on separate storage media that are not in use all the time, so it will have a smaller risk of damage. Ideally, at least some of your backups should be in a separate physical location, so that they will not be lost along with the original system if there is a fire or other disaster.

2.) You should monitor your RAID sets, so that you will receive some form of alert when one of your disks fails. If the server is critical, you might even want to have a spare disk or two at hand, so that you can replace a disk as soon as it fails. If there is a free disk slot, consider installing a hot spare.

3.) Hard disks are mechanical devices, and thus subject to wear and tear. So it is certain that they will eventually fail: this is a fact of life for all system administrators.

 

 

MK