Disk Enclosures

Spontaneous Drive Rebuilds!

Chai Milburn
Occasional Visitor

Hello everyone. If this question has already been asked and answered, please point me to the poster and date. This is my first time on this forum, and I apologize for not searching the archives before posting, but I do not have the time.
I am curious whether anyone has experienced random drive failures on their Storage/12 RAID. The odd part is that the drive is actually fine and rebuilds successfully. I happened to arrive in the morning and catch the server rebuilding the drive. The event viewer shows write errors for about an hour, after which a rebuild starts. So far this has occurred 3 times within a 2-month period. We have already replaced the backplane, thinking the connections were bad, but the event happened again. All firmware has been upgraded on both the controllers and the hard drives. It does not recur on the same drive, either. Has anyone experienced this type of hardware error?
We replaced the "bad" disks before this, and the same thing happened. I'm not sure where to look now. Any ideas?
Thank you
Chai Milburn

Netserver LH3R
Dual 500 MHz
512 MB RAM
NT4 SP6a
BIOS 7/29/99

HP NetServer Rack Storage/12
12 disk: 36GB/disk

NetRAID 4M Controller 3.3
BIOS 2.4-1 build 4607
Ultra 3, 160 MB/sec
Public opinion is a weak tyrant compared with our own private opinion. What a man thinks of himself,
8 REPLIES
Marino Meloni_1
Honored Contributor

Re: Spontaneous Drive Rebuilds!

hi
First, check the disk event log for timeout errors or media errors.
If it is not a media error, check that the HDD has the latest firmware, and check the drive vendor.
The 4M is a high-end NetRAID controller and performs some tasks on its own, but it has some rules to follow: it needs the latest BIOS, and also the latest drivers and the latest version of the FAST utility.
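
For the event log check, a rough Python sketch of how timeout errors could be separated from media errors and tallied per drive. The log layout here ("channel id message") and the sample lines are hypothetical, made up for illustration; the real NT event viewer or NetRAID log text will look different and the matching strings would need adjusting.

```python
from collections import Counter

# Hypothetical log lines in a made-up "channel id message" layout.
# The real NT event viewer / NetRAID log text will look different.
SAMPLE_EVENTS = [
    "ch0 id3 write error: command timeout",
    "ch0 id3 write error: command timeout",
    "ch1 id5 write error: medium error",
    "ch0 id7 rebuild started",
]


def classify(message):
    """Return 'timeout', 'media', or 'other' for one log message."""
    text = message.lower()
    if "timeout" in text or "time out" in text:
        return "timeout"
    if "medium error" in text or "media error" in text:
        return "media"
    return "other"


def summarize(lines):
    """Count (drive, error kind) pairs across all log lines."""
    counts = Counter()
    for line in lines:
        # Assumed layout: first token is the channel, second the SCSI ID.
        channel, target, message = line.split(maxsplit=2)
        counts[(f"{channel} {target}", classify(message))] += 1
    return counts


if __name__ == "__main__":
    for (drive, kind), n in sorted(summarize(SAMPLE_EVENTS).items()):
        print(f"{drive}: {kind} x{n}")
```

If the tally shows timeouts spread over many drives rather than media errors on one drive, that would point away from a single bad disk.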
bye

marino
Chai Milburn
Occasional Visitor

Re: Spontaneous Drive Rebuilds!

Hmm, we have upgraded both the controller and the hard drives to the current firmware. The newest hard drive firmware is only required if you use Seagate hard drives, and after verifying, our drives are manufactured by IBM.

Currently our firmware is:

IBM HDDs: D94N

FAST: 2.4-1 # 4605
Host Driver: 2.4-0 #4592

4M Controller: 2.4-1 #4607

OK, this is odd. Is this correct? I am using FAST and looking at Controller View.
The FAST utility shows build #4605, while the host driver shows version 2.4-0, build #4592. Is this right? Should all of the versions match? Could you look at your Controller View and see whether the versions are different?
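
To make the differences explicit, here is a small Python sketch that parses the three version strings listed above and reports whether the release numbers and build numbers agree. The "major.minor-patch #build" reading of the strings is an assumption, and the helper names are made up for illustration. It only shows where the components differ; it does not say whether they are required to match.

```python
import re

# Version strings as quoted in this post.
VERSIONS = {
    "FAST utility": "2.4-1 # 4605",
    "Host driver": "2.4-0 #4592",
    "4M controller": "2.4-1 #4607",
}

# Assumed layout: major.minor-patch, then '#' and a build number.
PATTERN = re.compile(r"(\d+)\.(\d+)-(\d+)\s*#\s*(\d+)")


def parse(version):
    """Split a version string into (major, minor, patch, build) integers."""
    match = PATTERN.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def compare(versions):
    """Return parsed versions plus flags for matching releases and builds."""
    parsed = {name: parse(text) for name, text in versions.items()}
    releases = {value[:3] for value in parsed.values()}
    builds = {value[3] for value in parsed.values()}
    return parsed, len(releases) == 1, len(builds) == 1


if __name__ == "__main__":
    parsed, same_release, same_build = compare(VERSIONS)
    for name, value in parsed.items():
        print(f"{name}: release {value[:3]}, build {value[3]}")
    print("releases match:", same_release)
    print("builds match:", same_build)
```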

Chai
Vincent Farrugia
Honored Contributor

Re: Spontaneous Drive Rebuilds!

Hello,

Your problem may lie in one of your controllers. Its NVRAM may be faulty and losing the logical-to-physical mapping information. If so, the box may think that one disk is faulty.

I would try working with only one controller. If the symptoms remain, switch to the other controller. If they still remain, then I don't know what else to do. This is quite a strange problem, actually.

If the problem seems to be solved, get a new controller and install it in the empty slot.

HTH,
Vince
Tape Drives RULE!!!
Marco Hogeveen
Honored Contributor

Re: Spontaneous Drive Rebuilds!

The versions on the 4M are OK.
Look at http://netserver.hp.com/support/hot_news/lpn10661.asp for details.
The NetRAID 4M is capable of running Ultra-3 SCSI (160MB/sec). Is your RS12 also capable of Ultra-3 (is it a D5989C type)?
Are the SCSI cables in the RS12 flat cables or twisted-pair?
You should use twisted-pair cables when connecting to an Ultra-3 controller.
The part number is D6025-63007.
Also make sure that there are no errors on the hard disks.

Marco.
Chai Milburn
Occasional Visitor

Re: Spontaneous Drive Rebuilds!

Thank you all for your responses! I will see about trying a new controller card. This error happens on both of our NetRAID racks. I doubt that the controllers in both systems are bad, but we could have picked from a bad batch of 4M controllers. The only other possibility is that our cables are bad. Based on Marco's response, we should be using the D6025-63007 cable. We are currently using D6020A cables. I checked online, though, and saw that these cables are for NetServer/NetRAID configurations...

Our NetRAID is D5989C, so it does handle Ultra 3.

The cables are Twisted Pair D6020A.

I found the error was mentioned in the newest Configuration & Upgrade guide. Apparently version 3 of the firmware/software fixes this.

Very weird situation.
Marco Hogeveen
Honored Contributor

Re: Spontaneous Drive Rebuilds!

Chai,

The D6025-63007 cables are the INTERNAL SCSI cables in the RS12 (from the midplane to the hot-swap backplanes); the D6020A is the external cable (from the NetRAID to the RS12).
Please check the internal cables to make sure they're twisted-pair.
- Remove both power supplies first
- Unscrew the thumbscrew on top of the RS12
- Remove the top cover
- The twisted-pair cables should be black/white or orange/white. If they're ordinary flat cables, they are yellow.

Marco.
Marino Meloni_1
Honored Contributor

Re: Spontaneous Drive Rebuilds!

hi Chai

To check whether the problem is related to the U3 communication, you can slow the SCSI controller's external channel down to U2 and see whether the problem appears again. If it does not, then it is probably related to a cabling problem.

To check whether the problem is related to the drives, you can run the IBM suspect drive utility, which can identify potentially defective disks.

All the software, drivers, and firmware are indeed up to date, as they told you.

You can also check here; you will find a note from the end of November about some problems with the 4M: http://netserver.hp.com/support/hot_news/hot_news.asp

bye

marino
William Kelley
Occasional Visitor

Re: Spontaneous Drive Rebuilds!

I had a similar issue with IBM drives failing but then verifying OK after the failure. HP support should be able to provide you with an IBM suspect drive utility to test all the IBM drives in your array, so you can then replace any bad drives.