DL350 G4 ASR Lockup Detected

 
Susan Hebel
Advisor

I have a DL350 G4 that ran pretty well until the motherboard's PPM died about two weeks ago. The motherboard was replaced, but the server has not been the same since. I feel like I have been chasing a ghost. On the first reboot after the new motherboard was put in, there were Array Accelerator caching errors, then an error that the battery on my Smart Array 641 had died. I have run array controllers without their batteries before.

This server runs NetWare 6.5 SP6 with GroupWise 6.5 SP6, GWAVA 4.5, Arcserve 11.1, and FAxware, plus some NetWare management. After the battery error, the server would not come up with all its components running. It would just reboot with no abend and no ASR. I did some repair on the NSS volumes, but that didn't help, and there were two more reboots that night. For one day there was nothing and the server stayed up. Then the next day it happened again, with ASR Lockup Detected and an NMI in CIOS.nlm. Then three days of nothing.
On the 18th, in the middle of the afternoon, the server came down with ASR Lockup Detected. No NMI error this time.
On the 21st I updated the system BIOS and turned off hyperthreading. The server then rebooted twice overnight with ASR detected and no abend errors.
On the 22nd I upgraded the array BIOS and installed HP Support Pack 8.20. I applied everything except the NIC driver, because we have had a lot of problems with it in the past. That evening the server had an ASR with an NMI error that gave only an address.
On the 23rd the server ended up with a mail issue that had very little to do with the other problems.
On the 24th there were two more ASR Lockup Detected events, with address errors. I thought that the Arcserve database might be corrupted, so I tried the backup with an empty database. There were no problems during any Arcserve operations, but there were afterwards.
Today at 6:52 am, when all the server was doing was receiving mail, it had another ASR with an NMI.

This is getting very frustrating. At this point I am assuming that either the motherboard is still bad or the Smart Array is bad. The memory passed its diagnostics when I put the new motherboard in. The other thing is: when I put the processor in, was I supposed to replace the thermal tape that sits between the processor and the heat sink? If so, it was never sent in the box. Could that be causing some of these problems?

Any ideas would be helpful.

Thanks,
Susan