
Still problems with "degraded" drives

 
Ayman Altounji
Valued Contributor

I posted about this problem a while back, have been working with Cpq support but still no resolution. 10 of the 12 36.4GB drives in our ML530 are Quantum Atlas & working fine. However the remaining 2 drives are from a different vendor and are reported as "degraded" even though they seem to work fine. Today, upon advice from Cpq support, I upgraded the system BIOS to 4.06 and the 5300 f/w to v1.62, also cleared nvram and re-ran the system config. utility. This did not change anything -- the same 2 drives are still being reported by the 5300 as "degraded".

Anyone??? I'm going to follow up with Cpq on Monday, but this is starting to bug me!

-dave stambaugh
16 REPLIES
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

We are having the same problem. Novell 5.1, ML530. We already changed the Smart controller (f/w 1.62), the drive cage and 2 disks (12 total), and still have problems with "degraded" drives.

Eddy
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Hi,

I have the same problem with a NetWare 5.1 ML530 with 2 drive cages, a Smart Array 53xx and 12 Compaq hot-plug 15,000 RPM drives... Eight disks are reported as degraded...

I have already changed the disks, cables and Smart Array without any change: the new drives become degraded too...

I have flashed the ML BIOS and the Smart Array BIOS with the latest firmware without any result, except that another problem occurs: the Smart Array flash process fails (!!!) as described elsewhere on this forum (Flash unsuccessful; BH Compaq Smart 5300...)... Same problem with a new 53xx controller...

I have upgraded the CPQRAID.NLM to the latest February version (4.94B), only to see my server abend about every 10 minutes because of a fatal exception in the survey module...

So now what to do?

I have a lot of work to do on this server. I have to install an Oracle database, but I have stopped everything since the beginning of the problem (22 January). I can't wait any longer!!!!!!!!!!!!!!

Is it safe to continue working despite all these "degraded disks"?

Thanks for any advice

Philippe
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

In my last post I forgot to mention that I have replaced the "degraded" drives but the problem does not change. However, none of the replacement drives were Quantum Atlas models, the ones that seem to be working. I requested Quantum replacements from Cpq support but they are unable to specify a make/model, only a generic Cpq part number which could be from any one of a number of drive vendors.

I'm glad to hear someone mention that SURVEY.NLM abends on them too! I think what's happening is if a rebuild is taking place (like after swapping a drive out), and SURVEY.NLM goes out to take a snapshot of the system health while the rebuild is in progress, it abends the server. If no rebuild is in progress, no abend occurs. That's what it looks like to me anyway.

I plan to call Cpq again today and get them to escalate this higher up the support chain.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

One more thing: This problem with "degraded" drives is on a production server. Except for the drive status being reported as degraded, the array and everything else seem to be working fine. So I continue to keep my fingers crossed that this is not going to turn into some kind of meltdown on me... Right now I'm a little PO'd that our new IT management forced us to buy Compaq instead of Dell (which I've been using for several years with no similar headaches), because "Dell is crap". Bwa ha ha! We shall see.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Hi Dave,

As far as I have tested, survey.nlm crashes my server even when no rebuild is in progress. It started as soon as I upgraded the CPQRAID.NLM...

If you receive any information from Compaq, please share it...

Philippe
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

I'm on the phone with Cpq support right now. First, they say the abend problem with SURVEY.NLM is a known issue, probably unrelated to the "degraded drive" problem with the 53xx controller. They are saying that the entire NSSD support pack needs to be applied, not just the new .HAM driver. I did only the .HAM driver, so I will reapply the complete NSSD thing tonight and see what happens.

They are escalating the "degraded drive" problem and have promised to get back to me within 24 hours. At first they wanted to send me a replacement 53xx, but when I pointed out that at least 1 other user is reporting a problem identical to mine and swapping the controller didn't fix it, they backed off from that and agreed to escalate with engineering.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Hi,

Just a little comment about the survey abend: it happens even after the latest NSSD has been applied.

Sorry, I made a mistake (probably got turned around in all these Smart Array annoyances): where I wrote cpqraid.nlm, it should of course read cpqraid.ham. The latest version can be found in the SoftPaq Sp16272.exe along with CPQSHD.CDM... but unfortunately it doesn't help, unless you are in the mood to see your server abending...

Philippe

PS. Of course, I have unloaded survey and I survive without it...
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Well, SP14310 is the latest comprehensive NSSD package. The CPQRAID.HAM driver in it is v1.00, 7/20/00. Also CPQSHD.CDM v1.35, 6/29/00.

SP16272 contains a newer version of CPQRAID.HAM, v1.05 12/14/00, and CPQSHD.CDM v1.36 10/26/00. It appears to be these 2 drivers that are not compatible with SURVEY.NLM. Bah.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

I phoned Compaq about the problem posted by EDBU above. They told me that there is a problem with the threshold trigger switches on the hard drives themselves???? The solution they gave me for possibly solving the problem was to upgrade the firmware of the Smart controller and the hard drives; if this didn't solve the problem, they told me to replace all the disks (12 in total). The server at our customer is also a production machine, so it is not so simple to replace all 12 drives knowing that the problem may not be solved. What I think is that Compaq doesn't have a solution for this problem.
I hope we get a quick response from Compaq on this problem, because our customer is getting very furious!!!

Jove