ProLiant Servers - Netservers

CA1067401
Occasional Advisor

LC2000r Raid problem

I have an HP LC2000r server with RAID 5 (NetRAID-1Si) running on three 18.2 GB HP drives. One of my drives has failed and shows zero capacity in Express Tools. The server is still running with the RAID alarm sounding, so I called HP and ordered a replacement drive. The old drive had a replacement part number of P1166-63001. I was told on the phone that this exact drive was unavailable and that a drive with part number P1166-63003 could be used in its place. The drive has arrived; I pulled the old drive and plugged in the new one. Since this RAID is hot-swappable, I was under the impression I could simply pull the old drive and insert the new one. It has been about four hours with no change to the alarm status or to the red error light on the new drive. Does it take a long time for the RAID to rebuild itself? Is this the right drive? Do I have to do anything manually (in Express Tools) before I can plug the drive in? Thanks for any response.
Mark Young_2
Trusted Contributor

Re: LC2000r Raid problem

Hi,

Yes, the drive is a good replacement. Occasionally, even with hot-swap servers, you will have to reboot the server for the drive to be rescanned properly. If this doesn't do the trick, go to Objects->Physical in Express Tools, press Space to highlight the FAILED drive, and then hit Enter. You should get an option to choose REBUILD. Try it.

If you still can't get a rebuild to start, post back here or call tech support again if time is an issue.

One other thing: when you try the manual rebuild from within NetRAID Express Tools, pay close attention. It is possible that the rebuild starts but fails right away. If that is the case, then you have a whole other issue.

Mark
CA1067401
Occasional Advisor

Re: LC2000r Raid problem

Thanks for the quick response. I will try what you suggested once I get a chance to bring down the server, which should be in the next 24 hours. I will post the results either way.

George
CA1067401
Occasional Advisor

Re: LC2000r Raid problem

Raid problem update:

So I tried rebooting the server, and that did not work. Then I tried a manual rebuild in Express Tools. It appears to start successfully, but after the progress indicator gets to 3% I get an ERROR on the indicator bar with no further description. I think this is what you (Mark) referred to in your post. So what is the "whole other issue"?

George
Mark Young_2
Trusted Contributor

Re: LC2000r Raid problem

Hi,

Have you ever done consistency checks on your array? If not, how long had it been up and running before this problem?

A consistency check should be done at least once a month on a RAID array in order to maintain the integrity of the parity data. If you don't do the checks, over time the array loses integrity and degrades. Eventually you reach the point where a rebuild will fail. If your rebuild is failing at 3% every time, that is a very good indication that this is what happened to you.
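To make the parity reasoning above concrete, here is a minimal Python sketch of generic single-parity (RAID 5 style) logic. It is an illustration under simplifying assumptions, not how the NetRAID-1Si firmware actually works: it models one stripe on a three-drive array (two data blocks plus one parity block), ignores the way real RAID 5 rotates parity across drives, and the helper names and block contents are hypothetical. It shows that a rebuild recomputes the missing member by XOR-ing the surviving blocks, so if parity has drifted out of sync with the data, the reconstructed block is wrong; a consistency check (recompute parity and compare) is what catches that drift before a drive fails.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the single-parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data: list[bytes], parity: bytes) -> bool:
    """Per-stripe consistency check: recompute parity from data and compare."""
    return xor_blocks(data) == parity


def rebuild_member(surviving: list[bytes]) -> bytes:
    """Reconstruct a failed member as the XOR of all surviving blocks."""
    return xor_blocks(surviving)


# Hypothetical stripe on a three-drive array: two data blocks plus parity.
d0 = b"\x10\x20\x30\x40"
d1 = b"\x01\x02\x03\x04"
parity = xor_blocks([d0, d1])

# Healthy case: the drive holding d1 fails and is rebuilt from d0 + parity.
print(stripe_is_consistent([d0, d1], parity))  # prints True
print(rebuild_member([d0, parity]) == d1)      # prints True

# Drifted case (assumed scenario): d0 was updated but the matching parity
# write was lost, so parity no longer agrees with the data.
d0_new = b"\x11\x20\x30\x40"
print(stripe_is_consistent([d0_new, d1], parity))  # prints False
print(rebuild_member([d0_new, parity]) == d1)      # prints False
```

In the drifted case the rebuild still "succeeds" arithmetically but produces the wrong contents for d1, which is why regular checks, run while all drives are still healthy, matter more than anything you can do after a failure.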

You have a couple of options, but they are limited. Your main option is to back up all of your data, clear the configuration on the array, and recreate it with an initialize. Then you can restore your data (after an OS install, if needed). From that point on you will want to maintain a schedule of regular consistency checks. Your second option is to try a consistency check now. Running one on an array that has never been checked is not recommended, as it can cause data corruption. However, since your other option is to reconfigure your array anyway, it doesn't hurt to try. One of two things could happen: either the check completes and you are able to rebuild the drive, or it further corrupts the data. The worst that can happen at this point is that you will have to go to option 1. It is definitely worth a shot.

I've seen some rare cases where the problem was caused by other things. With that in mind, you may want to put the drive in a different slot and try the rebuild again. You could also make sure that the firmware on the HDDs and the NetRAID controller, as well as your NetRAID driver and system BIOS, are all current. Again, these things might be worth trying, but usually my first solution is the only one that ends up working. I wish you luck.

Mark

CA1067401
Occasional Advisor

Re: LC2000r Raid problem

Thank you. I will give these suggestions a try. We do weekly system backups and nightly data backups, so recovering from backup should be an option.

FYI: This server is running NetWare 5.1 and for the most part functions as a file server, but it does hold the master replica for NDS. I will let you know how it all turns out. The server has been running for three years and I have NEVER done a consistency check, as I was unaware they were required. I had better go check my other RAID servers.

George
CA1067401
Occasional Advisor

Re: LC2000r Raid problem

RAID Update:

So I went ahead, cleared the configuration on the array, and recreated it. All went well. Thanks for the help.

George
Mark Young_2
Trusted Contributor

Re: LC2000r Raid problem

Any time, George,

Another warning, though: when you are checking your other servers, keep in mind that if you have never done a check on an array, it isn't worth trying now. I would wait until you have problems before running one, as it can cause corruption. Better safe than sorry. Just be sure that you start doing them regularly on any new arrays you create.

Cheers,

Mark
CA1067401
Occasional Advisor

Re: LC2000r Raid problem

We have good backups, so I was thinking I could prepare for a complete server rebuild in case disaster occurs. I would rather have this problem solved network-wide and begin monthly checks system-wide. It will only be an issue for 2 more servers out of 8.

George
Mark Young_2
Trusted Contributor

Re: LC2000r Raid problem

George,

I would have to commend you, then. If more network managers took these kinds of proactive steps, they would have a much easier time in the long run. The old adage "An ounce of prevention is worth a pound of cure" really applies when you are dealing with servers.

Kudos to you,

Mark