Disk Arrays

Still problems with "degraded" drives

Ayman Altounji
Valued Contributor

Still problems with "degraded" drives

I posted about this problem a while back, have been working with Cpq support but still no resolution. 10 of the 12 36.4GB drives in our ML530 are Quantum Atlas & working fine. However the remaining 2 drives are from a different vendor and are reported as "degraded" even though they seem to work fine. Today, upon advice from Cpq support, I upgraded the system BIOS to 4.06 and the 5300 f/w to v1.62, also cleared nvram and re-ran the system config. utility. This did not change anything -- the same 2 drives are still being reported by the 5300 as "degraded".

Anyone??? I'm going to followup with Cpq on Monday, but this is starting to bug me!

-dave stambaugh
16 REPLIES
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

We are having the same problem. Novell 5.1, ML530. We already changed the Smart controller (f/w 1.62), drive cage and 2 disks (12 total), but still have problems with "degraded" drives. Eddy
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Hi,

I have the same problem with a NetWare 5.1 ML 530 with 2 drive cages, a Smart-53xx and 12 Compaq hot-plug 15000 RPM drives... Eight disks are reported as degraded...

I have already changed the disks, cables and Smart Array without any change: the new drives become degraded too...

I have flashed the ML BIOS and the Smart Array BIOS with the latest firmware without any result, except that another problem now occurs: the Smart Array flash process fails (!!!) as described elsewhere on this forum (Flash unsuccessful; BH Compaq Smart 5300...)... Same problem with a new 53xx controller...

I have upgraded CPQRAID.NLM to the latest February version (4.94B), only to see my server abend about every 10 minutes because of a fatal exception in the survey module...

So now what to do?

I have a lot of work to do on this server. I have to install an Oracle database, but I have put everything on hold since the problem began (22 January). I can't wait any longer!!!

Is it safe to continue working despite all these "degraded disks"?

Thanks for any advice

Philippe
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

In my last post I forgot to mention that I have replaced the "degraded" drives but the problem does not change. However, none of the replacement drives were Quantum Atlas models, the ones that seem to be working. I requested Quantum replacements from Cpq support but they are unable to specify a make/model, only a generic Cpq part number which could be from any one of a number of drive vendors.

I'm glad to hear someone mention that SURVEY.NLM abends on them too! I think what's happening is if a rebuild is taking place (like after swapping a drive out), and SURVEY.NLM goes out to take a snapshot of the system health while the rebuild is in progress, it abends the server. If no rebuild is in progress, no abend occurs. That's what it looks like to me anyway.

I plan to call Cpq again today and get them to escalate this higher up the support chain.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

One more thing: This problem with "degraded" drives is on a production server. Except for the drive status being reported as degraded, the array and everything else seem to be working fine. So I continue to keep my fingers crossed that this is not going to turn into some kind of meltdown on me... Right now I'm a little PO'd that our new IT management forced us to buy Compaq instead of Dell (which I've been using for several years with no similar headaches), because "Dell is crap". Bwa ha ha! We shall see.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Hi Dave,

As far as I have tested, survey.nlm crashes my server even when no rebuild is in progress. It started as soon as I upgraded CPQRAID.NLM...

If you receive any information from Compaq, please share it...

Philippe
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

I'm on the phone with Cpq support right now. First, they say the abend problem with SURVEY.NLM is a known issue, probably unrelated to the "degraded drive" problem with the 53xx controller. They are saying that the entire NSSD support pack needs to be applied, not just the new .HAM driver. I applied only the .HAM driver, so I will reapply the complete NSSD package tonight and see what happens.

They are escalating the "degraded drive" problem and have promised to get back to me within 24 hours. At first they wanted to send me a replacement 53xx, but when I pointed out that at least 1 other user is reporting a problem identical to mine and that swapping the controller didn't fix it, they backed off from that and agreed to escalate with engineering.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Hi,

Just a little comment about the survey abend: it happens even after the latest NSSD has been applied.

Sorry, I made a mistake (probably from being turned topsy-turvy by all these Smart Array annoyances): where I wrote cpqraid.nlm, please read cpqraid.ham. The latest version can be found in Softpaq Sp16272.exe along with CPQSHD.CDM... but unfortunately it doesn't help, unless you are in the mood to watch your server abend...

Philippe

PS. Of course, I have unloaded survey and I survive without it...
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Well, SP14310 is the latest comprehensive NSSD package. The CPQRAID.HAM driver in it is v1.00, 7/20/00. Also CPQSHD.CDM v1.35, 6/29/00.

SP16272 contains a newer version of CPQRAID.HAM, v1.05 12/14/00, and CPQSHD.CDM v1.36 10/26/00. It appears to be these 2 drivers that are not compatible with SURVEY.NLM. Bah.
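
To make the difference between the two packages easier to see, here is a minimal Python sketch that tabulates the versions quoted above and lists which drivers changed. The `SOFTPAQS` table and the helper names are purely illustrative; the numbers are copied from this post, not read from the Softpaqs themselves.

```python
# Driver versions as quoted in the post above (illustrative table,
# not extracted from the actual Softpaq files).
SOFTPAQS = {
    "SP14310": {"CPQRAID.HAM": ("1.00", "7/20/00"), "CPQSHD.CDM": ("1.35", "6/29/00")},
    "SP16272": {"CPQRAID.HAM": ("1.05", "12/14/00"), "CPQSHD.CDM": ("1.36", "10/26/00")},
}


def version_key(version):
    """Turn a dotted version string such as '1.05' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def changed_drivers(old, new):
    """Return {driver: (old_version, new_version)} for drivers that are newer in `new`."""
    changes = {}
    for name, (new_ver, _date) in SOFTPAQS[new].items():
        old_ver = SOFTPAQS[old][name][0]
        if version_key(new_ver) > version_key(old_ver):
            changes[name] = (old_ver, new_ver)
    return changes


if __name__ == "__main__":
    for driver, (old_ver, new_ver) in changed_drivers("SP14310", "SP16272").items():
        print(f"{driver}: v{old_ver} -> v{new_ver}")
```

Both drivers differ between the two packages, so back-revving one at a time is the only way to tell which of them (if either) is involved in the abend.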

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

I phoned Compaq about the problem that was posted by EDBU above. They told me that there is a problem with the threshold trigger switches on the hard drives themselves???? The solution they gave me was to upgrade the firmware of the Smart controller and the hard drives; if this didn't solve the problem, they told me to replace all the disks (12 in total). The server at our customer is also a production machine, so it is not so simple to replace all 12 drives knowing that the problem will not be solved. What I think is that Compaq doesn't have a solution for this problem.
I hope that we get a quick response from Compaq on this problem, because our customer is getting very furious!!!

Jove
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Changing the disks definitely doesn't solve anything!!! In my case, the disks in the ML 530 had been working well with the SA 431 for months... As I needed more space, I decided to break up this 'winning team' and add a second disk cage with this new "wonderful on paper" SA 53xx... I thought our servers deserved it... 2 days later, the disks (the old ones as well as the new ones) were all flagged as degraded... Changing the disks will put your disks in green for only a few hours... Just long enough for a walk...

I have a dream: that some day, somewhere, Compaq will admit there is a problem... and stop talking about third-party, other manufacturers' hardware faults...

Kind regards

Philippe
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

On the SURVEY.NLM abend issue, last night I back-rev'd CPQRAID.HAM to the previous version (went from v1.05 back to v1.00). This did not solve the abend problem. I'm now thinking it might be related to an updated SERVER.EXE that was installed as part of Novell's OS5FT2A update. I will probably back-rev SERVER.EXE tonight as well and see what happens.

On the degraded drives, it appears that the problem is purely "cosmetic", more of a nuisance issue. The "degraded" drives are functioning perfectly. Not saying this isn't a serious issue that needs to be fixed, but I can live with it while Compaq works on a fix for it. Cpq gave me 3 options: Replace all the drives with Quantums, one at a time (not acceptable); Replace the 53xx controller with a 4200; or wait for the fix. I guess I will wait.

-dave
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Try upgrading the firmware on the drives ....
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

A fix for the degraded drives has been found. Please download and apply SP16277 from <>.

Have a nice day,

Mike
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

Installed the new softpaq last weekend, problem does seem to be fixed. SURVEY.NLM does still abend the server though.
Ayman Altounji
Valued Contributor

Re: Still problems with "degraded" drives

About the SURVEY.NLM issue:

Most of the time it is caused by a corrupt Survey.idi file or one of the other Survey files. Typically, this condition can be fixed by deleting SYS:\SYSTEM\SURVEY.* and then reinstalling the Survey utility using SINSTALL.NLM. Some rare instances require deleting SYS:\SYSTEM\SURVEY.* plus the SYS:\SYSTEM\COMPAQ\ and SYS:\SYSTEM\CPQMGMT\ directories, reinstalling the Insight Manager Agents and then reinstalling the Survey utility.

If the above steps do not work, then one can attempt to determine whether some other software module is causing the problem. This can be done by not loading 3rd party software such as Tape Backup, Anti-Virus etc. upon startup of the server.

As another troubleshooting step, it is possible to start SURVEY in debug mode to determine where in the SURVEY process the abend is occurring. To start SURVEY in debug mode on a NetWare server, use the "-x" option.

Example: load survey -x

The "-x" option will cause SURVEY to output to the console screen what it is currently doing. When the server abends, get into the debugger by pressing and holding the following 4 keys simultaneously: left "SHIFT", right "SHIFT", "ALT" and "ESC". Once in the debugger, use the "v" command to switch to the SURVEY Debug screen and look at the SURVEY debug output. This may help determine where the abend is occurring within SURVEY.
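
For anyone tracking this on more than one server, the escalation order described above can be written down as a small checklist. This is only a sketch in Python: the step wording is taken from this post, while `SURVEY_STEPS` and `next_step` are made-up names for illustration, and nothing here touches a real server.

```python
# Troubleshooting steps for the SURVEY.NLM abend, in the order given above.
# The list and helper are illustrative only; they run no NetWare commands.
SURVEY_STEPS = [
    r"Delete SYS:\SYSTEM\SURVEY.* and reinstall Survey with SINSTALL.NLM",
    r"Also delete SYS:\SYSTEM\COMPAQ\ and SYS:\SYSTEM\CPQMGMT\, "
    r"reinstall the Insight Manager Agents, then reinstall Survey",
    "Start the server without 3rd party software (tape backup, anti-virus, etc.)",
    'Run "load survey -x" and read the SURVEY debug screen after the abend',
]


def next_step(completed):
    """Return the first step not yet tried, or None once every step has been tried."""
    if completed < 0:
        raise ValueError("completed must be zero or greater")
    if completed >= len(SURVEY_STEPS):
        return None
    return SURVEY_STEPS[completed]


if __name__ == "__main__":
    for number, step in enumerate(SURVEY_STEPS, start=1):
        print(f"{number}. {step}")
```

Calling `next_step(2)`, for example, returns the third entry (booting without 3rd party software), which is the point at which the file-level fixes have been ruled out.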