ProLiant Servers (ML,DL,SL)

RA4000 RAID Controller

Steve Gill_1
Occasional Visitor

RA4000 RAID Controller

Here's a bit of background first:

We have 2 x DL580 servers in an active/active fibre cluster running on Win2k. We use an external StorageWorks 4100 drive enclosure with an RA4000 RAID controller in it. There are 7 x 36GB disks in the external enclosure with 5 logical drives. 2 of the logical drives are 100GB each (one for each node).

Now to the problem: every time I fire up the Compaq Array Configuration Utility I get a warning like this:

Background parity initialization is currently queued or in progress on Compaq RA4000 Controller in RAID Array G125DBX18026 on the following logical drives:

Logical Drive 4
Logical Drive 5

This is a normal operation that is necessary to initialize logical drives that have a fault tolerance with parity. If background parity initialization is queued, it will start when I/O is performed on the drives. When background parity initialization completes, the performance of the logical drives will improve.
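
For readers unfamiliar with what "parity" means in this message, here is a minimal sketch, assuming a RAID 5 style XOR parity across one stripe. The thread does not say which fault-tolerance level these logical drives use or how the RA4000 firmware computes parity, so the function names and sample data below are hypothetical and purely illustrative.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized data blocks (illustrative only)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover one lost block by XOR-ing the surviving blocks with the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

# Hypothetical stripe of three data blocks, two bytes each.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(stripe)

# Simulate losing the middle block and rebuilding it from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Under this assumption, parity initialization amounts to making every stripe's parity block consistent with its data blocks, which is why the controller has to work through the whole logical drive.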

Fairly standard stuff if you have just built the server or you have just had a disk failure. But the problem is the server has been live for 12 months now and I haven't had any disk failures! So the question is why am I getting this message?

Disk performance at the moment is severely impacted on the two 100GB logical drives, so the controller is trying to do something with the disks. Why would it be taking so long?

I have an identical setup at another site and this has no such problems.

If anyone has any ideas it would be much appreciated...I am about to get the RAID controller replaced to see if this rectifies the problem.

Thanks in advance

Steve.
9 REPLIES
Janine Bertolo
Honored Contributor

Re: RA4000 RAID Controller

Hi Steve;

If the logical drives have not yet been written to by the Operating System, you will continue to get the background parity messages.

Is that a possibility?

Janine

To get results you've never had before, try something you've never tried before.
Steve Gill_1
Occasional Visitor

Re: RA4000 RAID Controller

Unfortunately no, the system has been running Exchange 2000 for about 12 months...the drive array in question has over 50GB of data on each logical drive.

This problem only occurred after the unit was powered off for the first time in 12 months, so I am leaning towards a hardware fault. I am going to try upgrading the firmware too before replacing the controller entirely.

Any other thoughts at all?
Jake Richie
Advisor

Re: RA4000 RAID Controller

Hi Steve,
I was looking at your info, and if you have been up for 12 months without a reboot, I would assume that your firmware is now quite dated.
The issue you are showing points exactly to a firmware issue, provided that no drives have been changed or swapped (I am assuming there are no red LEDs on your drives). Use the array firmware update at www.compaq.com/support/files/server/us/download/14694.html to create Disk 4 and update the firmware on the controller.

Also, I would suggest that you run the Array Diagnostics from within Windows. It is on the SmartStart CD-ROM (5.4 - 4.8) at \cpqadu\cpqadu.exe.

Also, once the controller firmware is done, run the firmware update on the drives if they haven't been swapped in the last year: www.compaq.com/support/files/server/us/download/13674.html

Hope this helps
The early bird may get the worm, but the second mouse gets the cheese.
Dharmesh Mistry
Occasional Visitor

Re: RA4000 RAID Controller

Hi Steve,

We have the same issue on 3 different servers, affecting 5 logical drives between them.

Can you tell me if the firmware upgrade solved your problem?

Also, did parity start building after the firmware upgrade completed, and how long did the parity build take to complete for you?

Appreciate your feedback.
The Best
Steve Gill_1
Occasional Visitor

Re: RA4000 RAID Controller

Hi,

We did not do the firmware upgrade. It turns out that this is a known fault on a Compaq clustered server.

The MS cluster service tries to lock access to the RAID controller on each node in the cluster and that error on the RAID controller is the end result.

We ended up having to turn off both servers (very important: they must be turned off) and leave the external RAID and disk enclosure on for a day so that it could sort itself out. Not the ideal solution, but as yet Compaq have no answers for us and say that this behaviour 'is to be expected'.

We are pushing them ourselves and via our system provider to give us a fix to this problem. If I find out more I will let you know.

Thanks for your reply

Steve.
Dharmesh Mistry
Occasional Visitor

Re: RA4000 RAID Controller

Hi Steve,

Thanks for getting back.
It looks like HP/Compaq have a fix but do not document it with their updates; I have spoken to them and been told that the update does fix it.

You may still have the problem, but it is reported to have been fixed by HP/Compaq now.

So we're going with HP/Compaq's recommendations, as we cannot afford to have the downtime or the problem in the future.

Thanks,
Dharmesh
The Best
Steve Gill_1
Occasional Visitor

Re: RA4000 RAID Controller

Hi Dharmesh,

Just out of interest, what was the fix they gave you?

I have had no joy getting a permanent fix to the problem.

Thanks in advance

Steve.
Dharmesh Mistry
Occasional Visitor

Re: RA4000 RAID Controller

Hi Steve,

We've tested this in our lab and have replicated the problem.

The firmware upgrade to 2.62, the latest available for the RA4100 controllers as of 12.12.2002, did not fix the problem.
However, all the different diagnostics software reports no disk failures, and our lab tests show we still have redundancy.

We are trying various things with Compaq now. Will let you know how it goes.

Can you confirm how long the servers were left off before the array completed its parity rebuild for you?

Thanks,
Dharmesh
The Best
Steve Gill_1
Occasional Visitor

Re: RA4000 RAID Controller

Hi Dharmesh,

We left each node in the cluster off for around 9 hours. The servers were physically powered off during this time and just the external disk cabinet was on.

After this the error disappeared.

If you find out more please let me know.

Thanks in advance

Steve.