
Strange Hardware Problems

 
Krishna Prasad
Trusted Contributor

Hello all,

Just wanted to see if anyone else has seen anything like what we have had here lately.

We have had the L and N processors for close to two years, and they have been very reliable over that time. However, in the last 4 weeks we have lost 3 CPUs, a Core I/O, and an internal OS drive that has actually caused the systems to crash. This has occurred on 4 different hosts: 3 L's and 1 N. I have a hard time believing that all of a sudden we are having all these hardware problems. I think they may be symptoms of a bigger problem.

The only thing I can think of at this time is APA. The reason I wonder about APA is that 4 weeks ago (when the problems started) we switched the IP addresses on our hosts to use APA. They were previously on Token-Ring. The hardware is the HP 4-port PCI card, A5506A. We are running APA software B.11.00.08 and have the APA cumulative patch installed. I searched for all other APA patches and we seem to be O.K. on patches.

Has anyone else had any problems with APA?
Has anyone else experienced a rash of hardware problems on L's and N's?
Positive Results requires Positive Thinking
Patrick Wallek
Honored Contributor

Re: Strange Hardware Problems

I seriously doubt that the APA software caused your problems. I honestly don't see how that software could affect a CPU, a Core I/O card, or the OS disk.

The only other thing I can think of, other than bad luck, would be power problems, i.e. a power spike or something similar.

Good luck.
Volker Borowski
Honored Contributor

Re: Strange Hardware Problems

Hi,

If several boxes go bad at the same time, I would certainly look for things they have in common.

I do not know about APA, but I would surely check the following:

- environmental warnings in the syslog, or known problems with the air conditioning in this timeframe.
- whether the boxes are connected to the same UPS, and whether any peaks or errors were noted there.
- if the boxes are only related by delivery date, whether the filters in front of the fans are clean (this would be a cooling problem, but not related to AC faults).
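For the syslog check, a minimal sketch in Python, assuming you have copied each host's syslog (on HP-UX normally /var/adm/syslog/syslog.log) somewhere you can read it. The keyword list and the function name `find_suspect_lines` are only illustrative guesses, not the exact messages these systems log, so adjust them to what you actually see.

```python
import re

# Hypothetical keyword list: a crude filter, tune it to your own messages.
SUSPECT_PATTERN = re.compile(
    r"overtemp|temperature|fan|power|voltage", re.IGNORECASE
)

def find_suspect_lines(lines, pattern=SUSPECT_PATTERN):
    """Return the log lines that match the pattern, in their original order."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

Calling it on an open file, for example `find_suspect_lines(open("syslog.log"))`, gives the matching lines. Comparing the timestamps of the hits across hosts should show whether the warnings cluster around the same dates.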

Something like this...

Volker
Craig Rants
Honored Contributor

Re: Strange Hardware Problems

I agree with Patrick. Do you have the servers on a UPS? What is the output voltage of the UPS? When was the last time you had the batteries in the UPS checked?

That may be a good place to start.

Good Luck,
C
"In theory, there is no difference between theory and practice. But, in practice, there is. " Jan L.A. van de Snepscheut
Uday_S_Ankolekar
Honored Contributor

Re: Strange Hardware Problems

Hi,

UPS, PDS are the ones to start with. Secondly, also check whether any new unconfigured networking objects (e.g. a terminal server) have lately been hooked up to the network. I have faced this problem: it crashed both servers simultaneously in a ServiceGuard clustered environment!
Good luck,
-USA.
Good Luck..
A. Clay Stephenson
Acclaimed Contributor

Re: Strange Hardware Problems

Hi Ron:

I've used APA and have never had, and would not expect, problems like this. You need to get an electrician out immediately. You actually need to have the power analyzed, looking for spikes, noise, high-frequency trash, and transients. The other thing that can cause this is badly built ground circuits. If these boxes are supplied by a common UPS, have that checked out as well. If all your electrician brings out is a VOM, send him home. He needs real equipment.
If it ain't broke, I can fix that.
Krishna Prasad
Trusted Contributor

Re: Strange Hardware Problems

Thanks for the response so far.

Just an update: we have the entire datacenter on a UPS and a backup generator. If the UPS had a problem, it would be more widespread.

The reason I was wondering about APA was because it was the last major change. Just a gut feeling, nothing more......

The syslog shows hardware problems, and so does the GSP. But it still seems strange to have more failures in a month than I had in two years.
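As a rough sanity check on that, here is a small Python sketch that treats failures as independent random events (a Poisson model) and asks how likely 5 or more would be in 4 weeks by chance alone. The baseline of 2 failures per year is purely an assumed number for illustration, since we never measured one; the count of 5 is the 3 CPUs, the Core I/O, and the OS drive.

```python
import math

def poisson_tail(k, lam):
    """Probability of seeing k or more events when lam are expected."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

ASSUMED_FAILURES_PER_YEAR = 2.0  # hypothetical baseline, not a measured value
WEEKS_OBSERVED = 4
FAILURES_SEEN = 5  # 3 CPUs + 1 Core I/O + 1 OS drive

expected = ASSUMED_FAILURES_PER_YEAR * WEEKS_OBSERVED / 52
chance = poisson_tail(FAILURES_SEEN, expected)
print(f"expected failures in {WEEKS_OBSERVED} weeks: {expected:.3f}")
print(f"P(at least {FAILURES_SEEN} by chance): {chance:.2e}")
```

With any baseline in that range the chance comes out very small, which fits the idea of a common cause rather than a run of bad luck. It says nothing about what that cause is.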

Positive Results requires Positive Thinking
Jim Turner
HPE Pro

Re: Strange Hardware Problems

Hi Ron,

I concur with the others. Have your facilities folks put a line monitor (preferably with strip-out) on the power that feeds these boxes. You've almost certainly got power problems. Secondly (as also cited by the previous respondents), check for environmental factors like temperature, humidity, static electricity, and gamma radiation (laugh if you want, but it's there).

Happy hunting,
Jim