Netservers


 
SOLVED
AnthonyB
Occasional Advisor

LC2000 NT4 netraid server hang

My previously stable system has started hanging frequently. It works fine for a while after reboot.

It's NT4 with a NetRAID 1M and 6x36GB disks as RAID 5.
In the NT4 Event Viewer I am seeing some errors.
System Log = The device \Device\ScsiPort2 did not respond within the specified timeout period
Application Log = NetRAID.log Adapter0, Channel0, target3, extended error event (with lots of code including Sense Data, errCode=0x70, valbit=1, SegMent=0x00)

Not sure whether this relates to my NetRaid controller card or the drive.
Any ideas what I should look for or is it a case of asking HP to swap the card out and see if it fixes?
7 REPLIES
kris rombauts
Honored Contributor
Solution

Re: LC2000 NT4 netraid server hang

Hello Anthony,

Most probably your hard disk at SCSI id=3 is sick, and you are hitting this hang because of a known bug in the Netraid 1M firmware, described here:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00043477

As you see we have a fix for this in newer F/W.


- If you have the Netraid Assistant utility installed, then please do the following:

- open Netraid Assistant
- check for any bad disk (e.g. a disk in the failed state)
- close the utility again
- copy the file called raid.log, normally located in c:\netraid\client\raid.log or a similar directory
- attach this raid.log file to this post in the forum


If the Netraid assistant utility is not installed, get it here :

http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?locale=en_US&pnameOID=19399&taskId=135&prodTypeId=15351&prodSeriesId=50440&lang=en&cc=us&swEnvOID=24#1152

Or you could copy the Windows system and application .evt files and attach them here.


So most likely the hard disk at id=3 is bad or going bad, and to verify whether you're hit by the known issue, I need the above info.
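For anyone wanting to read the sense data in those NetRAID.log entries by hand, here is a minimal sketch of decoding the header of SCSI fixed-format sense data. It assumes the logged errCode, valbit and SegMent fields correspond to the standard response code, VALID bit and segment byte, which the thread does not confirm. The sample buffer is hypothetical and was not taken from this server.

```python
# Sketch: decode the leading fields of SCSI fixed-format sense data.
# Assumption: the log's errCode/valbit/SegMent map onto bytes 0 and 1.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

def decode_fixed_sense(sense: bytes) -> dict:
    """Return the main fields of a fixed-format sense buffer."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of sense data")
    return {
        "valid": bool(sense[0] & 0x80),         # VALID bit (valbit)
        "response_code": sense[0] & 0x7F,       # 0x70 = current error
        "segment": sense[1],                    # segment number
        "sense_key": sense[2] & 0x0F,           # e.g. 0x3 = MEDIUM ERROR
        "information": int.from_bytes(sense[3:7], "big"),  # often an LBA
        "asc": sense[12],                       # additional sense code
        "ascq": sense[13],                      # additional sense qualifier
    }

# Hypothetical buffer: VALID set, response code 0x70, segment 0,
# sense key 0x3, ASC/ASCQ 0x11/0x00 (unrecovered read error).
sample = bytes([0xF0, 0x00, 0x03, 0x00, 0x12, 0x34, 0x56, 0x0A,
                0x00, 0x00, 0x00, 0x00, 0x11, 0x00, 0x00, 0x00,
                0x00, 0x00])

fields = decode_fixed_sense(sample)
print(SENSE_KEYS.get(fields["sense_key"], "other"), hex(fields["information"]))
```

A sense key of MEDIUM ERROR on the same target over and over would point at the disk rather than the controller card.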


This other issue can also cause a hang (look at the F/W revs of the 36 GB disks and see if you have some of those):


http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=lpn12570

HTH

Kris
AnthonyB
Occasional Advisor

Re: LC2000 NT4 netraid server hang

Thanks Kris, I think you may be spot on there.

We recently installed NetRaid assistant as it wasn't installed before, and it's since then that we have had the problem.
It looks like NetRaid assistant is automatically doing a consistency check on a Sunday, and it's Sunday/Monday when we have been getting most of the problems.

I will post the log file as you requested shortly. Thanks for your help so far.
kris rombauts
Honored Contributor

Re: LC2000 NT4 netraid server hang

Hi Anthony,

If you saw the hangs occurring after the Netraid Assistant was installed, then we can be sure we found the reason here, so that's a good thing.

Yes, by default after installing the utility, the Netraid service does a weekly consistency check every Sunday.
If this schedule does not fit or conflicts with a backup or such, you can reschedule it; have a look at the readme file on how to do that.

I guess you installed Netraid Assistant just in time: if another disk besides id=3 failed now, an array rebuild (after you replaced that bad disk) would very likely fail, since there can only be one missing item in a full RAID 5 stripe. In the current case you might have one or more bad spots on the disk at id=3 and, for example, one on some other disk. So do a full backup ASAP, upgrade the 1M F/W, and then run the consistency check again (manually from Netraid Assistant). It should not hang anymore, BUT you might still have to replace the disk at id=3. That depends on whether the bad spot can be corrected and remapped during the consistency check, in which case there is no issue. If it cannot be corrected, id=3 still needs to be replaced.
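To illustrate why a RAID 5 stripe can only tolerate one missing item, here is a small sketch using XOR parity. It is a simplification and not how the NetRAID firmware is implemented: the block contents are made up, and parity is kept in a fixed position rather than rotated across disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 6-disk array: five data blocks plus one parity block.
# The byte values are arbitrary illustration data.
data = [bytes([value] * 4) for value in (0x11, 0x22, 0x33, 0x44, 0x55)]
parity = xor_blocks(data)
stripe = data + [parity]

# One block missing (index 3): XOR of the five survivors rebuilds it.
survivors = [blk for i, blk in enumerate(stripe) if i != 3]
rebuilt = xor_blocks(survivors)
print(rebuilt == stripe[3])

# Two blocks missing (indexes 1 and 3): the survivors only yield the XOR
# of the two lost blocks, which is not enough to recover either of them.
survivors2 = [blk for i, blk in enumerate(stripe) if i not in (1, 3)]
combined = xor_blocks(survivors2)
print(combined == xor_blocks([stripe[1], stripe[3]]))
```

This is why a bad spot on a second disk matters: during a rebuild, any stripe that hits it has two missing items and cannot be reconstructed.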


Kris
AnthonyB
Occasional Advisor

Re: LC2000 NT4 netraid server hang

Taken a slight turn for the worse.
NetRaid assistant shows all drives are OK currently. System working right now.

I have disabled the next run of consistency check but will probably schedule it to run in about 2 years time so it doesn't do it again for the moment.

After a few system hangs the other day I have only just realised that a chunk of data is missing. I have backups of the 5GB that appears to be missing, but the puzzling thing is that although the files aren't there, the disk space is about the same.
A reboot prompts to run AUTOCHK on the D: drive where the data is (was), but I bypassed that for now. The RAID 5 array has partitions C, D and E.

Any advice on this? I guess I should restart and let autochk run, but I am a bit concerned it will scrap more data in the process (maybe it will magically reappear?)
kris rombauts
Honored Contributor

Re: LC2000 NT4 netraid server hang

Anthony,

If I understand correctly, the Netraid 1M F/W is still at the older revision at this time, right?

Disks showing OK in Netraid Assistant is not enough to consider them 'good'; their individual properties probably show some errors (at least for disk id=3).
If a disk becomes 'bad' as seen in the Netraid Assistant GUI then things become even more critical since you have no protection anymore for any future disk failures or small media defects.

The F/W upgrade normally takes a total of 20 minutes of downtime at most. The benefit is that the next consistency check will not hang when there is a bad block on a disk; instead it will remap the block if that is still possible and also log it.
This way you'll have a better view of which disk is sick (hopefully not several), and if a disk has several bad spots then it needs to be replaced ASAP.
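As a rough way to see which disk the logged errors point at, the events can be tallied per adapter/channel/target. The sketch below assumes log lines shaped like the "Adapter0, Channel0, target3, extended error event" entry quoted at the top of the thread; the real raid.log layout may differ, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, modelled on the event text quoted in the first post.
PATTERN = re.compile(
    r"adapter\s*(\d+),\s*channel\s*(\d+),\s*target\s*(\d+)", re.IGNORECASE
)

def tally_targets(lines):
    """Count logged events per (adapter, channel, target) tuple."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts

# Hypothetical sample lines, not taken from the poster's server.
sample_lines = [
    "Adapter0, Channel0, target3, extended error event",
    "Adapter0, Channel0, target1, extended error event",
    "Adapter0, Channel0, target3, extended error event",
    "service started",
]

for location, count in tally_targets(sample_lines).most_common():
    print(location, count)
```

If more than one target shows up regularly, that is the "multiple sick disks" case to worry about.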


Remember that the hang is due to a bug in the Netraid F/W but we have a fix for it and it is well documented, but the real problem here on your server is the disk id=3 (and hopefully no other disk).

I would strongly recommend making sure a consistency check can run successfully first, and then deciding which disk to replace. It will cost you much more time and money if the situation gets worse and data loss occurs: with only one RAID 5 array, that means reinstalling the server from scratch.

Please be aware that once a disk is bad/failed you cannot run a consistency check anymore until a successful rebuild has been done.

Hope I convinced you.

Kris
AnthonyB
Occasional Advisor

Re: LC2000 NT4 netraid server hang

Thanks Kris,
I will schedule a F/W update. My worry, not having done this before, is the RAID array information getting lost on reboot after the F/W update.

As you are experienced on these types of issues - would you allow Autochk to run at start up time on the d: drive as it stands right now?
I am not sure whether I should skip this until the FW update has been done?

Thanks
kris rombauts
Honored Contributor

Re: LC2000 NT4 netraid server hang

Anthony,

not sure about the best sequence of actions here now that apparently the file system at the OS level shows a problem already.

I would:

- update the F/W
- skip autochk
- run the consistency check
- check the data at file system level
- reboot and let autochk run (if there are many small files this can take a long time)


The F/W update does not delete the array config. For extra safety you can always save the current array config details and print them out, so that you know the stripe size and logical drive size if for some reason the array needs to be re-created (it should not, but in case Murphy comes along, you're still safe).

From Netraid Assistant, choose Configuration, Display and print that info out.
There are other ways too, but with this one you have enough to manually re-create the array at any time.

good luck

Kris