Server goes down unexpectedly - HP-UX 11.11

 
Stijn_3
Occasional Advisor

I have this strange thing happening to an old HP9000 server. At random, it loses network connectivity. Just before that, I notice that it is under a heavy load (over 5.0 in top). I can still log on to the console interactively to give it a reboot. This solves the problem until it happens again. I really don't know where to go and look. EMS sends me a Single bit error (SBE) event related to memory, but as far as I know, this can be ignored. I see nothing weird in the syslog, and the boot process is normal, except for a warn 7704 message (related to the memory). Should I go ahead and replace the memory module anyway? Does anyone have any clue where to look?
Victor Fridyev
Honored Contributor

Re: Server goes down unexpectedly - HP-UX 11.11

Hi,

Which HP9000 model do you have ?
I had the same problem on an R380 and on an rp24xx. The fix for the R380 was to replace the system disk. The rp24xx needed a GSP firmware upgrade.

HTH
Entities are not to be multiplied beyond necessity - RTFM
vofsky
Frequent Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

Hi Stijn,
According to what you said, you have an old HP machine, so it must be running HP-UX 10.00. You will have to modify some kernel parameter values to tune performance. You could set netmemmax to -1, which will improve network performance greatly. You could also set ninode to 15000, but never go above 4000 on multiprocessor systems.
Do you use NFS? If yes, you should apply the latest patch from the HP website.

Stijn_3
Occasional Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

It is a HP9000 D390 which originally had HP-UX 10.20 on it. I inherited it running HP-UX 11.11.
A few months ago, I replaced the old 10 Mbit NIC with a 100 Mbit full-duplex one, and it worked fine for a while, so I doubt the network is the cause. I am really getting desperate...
Steven E. Protter
Exalted Contributor

Re: Server goes down unexpectedly - HP-UX 11.11

Shalom Stijn,

You have done a number of problematic things.

1) You've replaced a supported OS with one that is not supported, limiting your ability to work directly with HP. It would be better to check under 11.11 if the old binary/code works.

2) You've replaced a NIC on the fly without stating the model and without being sure that it's fully supported by the 10.20 OS.

3) You have too quickly eliminated possible causes such as the NIC swap.

The initial thought concerning memory is good. There is a utility under 10.20 called xstm. It works with X and has equivalents cstm and mstm.

I recommend using this utility to test (exercise) the memory. If there are problems, the exercise test may reveal them.

Set the first variable in the savecrash configuration file, /etc/rc.config.d/savecrash, so that crash dumps are saved.

You might still be able to convince HP to analyze any crash dumps you collect.
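As a rough illustration of the setting described above: the fragment below assumes the stock layout of /etc/rc.config.d/savecrash on HP-UX 11.x, where the first variable is SAVECRASH. The dump directory shown is the usual default and is also an assumption. Check the comments in the file on your own system before editing it.

```shell
# /etc/rc.config.d/savecrash (fragment; variable names assumed from the stock 11.x file)

# 1 = save a crash dump at boot after a panic or TC, 0 = do not save
SAVECRASH=1

# Directory where dumps are written (assumed default; needs enough free space)
SAVECRASH_DIR=/var/adm/crash
```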

Patch the system as best you can with the 10.20 patches available in the patch database. Patching can often correct system responses that we consider inappropriate to hardware conditions.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Stijn_3
Occasional Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

Steven,

The machine was upgraded to HP-UX 11.11 before I "inherited" it. The NIC card is fully supported by the D390 and by HP-UX 11.11 (it is a HP J3515A HSC 10/100Base-TX D-Class 1 port that came from another D390).
I will run ioscan -fn when the system hangs again to see whether the NIC has 'disappeared' from the system.

Regards
vofsky
Frequent Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

Hi Stijn,
Maybe the issue is in a network device, such as the network switch.
Kent Ostby
Honored Contributor

Re: Server goes down unexpectedly - HP-UX 11.11

Stijn --

The WARN 7704 can indicate one of two things:

1) Memory is not configured correctly

2) RAM card is logging errors.

Given the EMS event, I'd say it's #2. Is this what's causing your drop? I couldn't say. But it is something to take care of.

"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Kent Ostby
Honored Contributor

Re: Server goes down unexpectedly - HP-UX 11.11

Also, just to clarify, a D390 IS SUPPORTED on 11.11. See this document for details:

http://docs.hp.com/en/5991-2814/ch04s10.html#babcjdai
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Stijn_3
Occasional Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

Kent,

Thanks for the reply. I ran some diagnostic scripts to check the memory, and it turns out that the memory errors did not cause the system to go down. I am keeping a closer eye on it. Maybe it is the NIC's driver? Or maybe I should apply the latest HP-UX patch bundles?
Kent Ostby
Honored Contributor

Re: Server goes down unexpectedly - HP-UX 11.11

Stijn --

If you have a support contract on the box, then I would suggest doing a "TC" to get a dump and having the Support Center take a look at it.

Latest patching won't hurt ... you can use the patch tool on itrc to get a custom patch bundle for that box.
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Alan Buynak
Occasional Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

Before you reboot, are you able to view the routing table?
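One way to preserve this kind of information is to capture it from the console before rebooting. The sketch below is only an illustration: capture_state is a hypothetical helper name, and the output file and command list are up to you. It runs each command in turn and keeps going if one fails.

```shell
# capture_state OUTFILE CMD...
# Hypothetical helper: writes the output of each command to OUTFILE so the
# state of the hung system can be inspected after the reboot.
capture_state() {
    outfile=$1
    shift
    {
        echo "=== capture started: $(date) ==="
        for cmd in "$@"; do
            echo "--- $cmd ---"
            # Run each command through the shell; continue even if it fails.
            sh -c "$cmd" 2>&1 || echo "(command failed or not available)"
        done
    } > "$outfile"
}
```

From the console during a hang, something like `capture_state /var/tmp/hang.log "uptime" "netstat -rn" "ioscan -fn"` would record the load, the routing table, and whether the NIC is still visible.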
DCE
Honored Contributor

Re: Server goes down unexpectedly - HP-UX 11.11


Stijn,

One potential cause that has been mentioned in this forum several times is a duplicate IP address. Next time the system hangs, ping its address from another system on the network. If there is a response, see if you can trace it to its source.
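A complementary check, sketched here under assumptions, is to look for an IP address that shows up with more than one MAC address over time. The helper name find_duplicate_ips is hypothetical, and it expects input already reduced to one "IP MAC" pair per line. Real arp output from another host on the same subnet would need reformatting into that shape first.

```shell
# find_duplicate_ips: reads "IP MAC" pairs on stdin (assumed format) and
# prints every IP address that was seen with more than one distinct MAC.
find_duplicate_ips() {
    awk '
        NF >= 2 {
            key = $1 SUBSEP $2
            # Count each distinct IP/MAC combination only once.
            if (!(key in seen)) {
                seen[key] = 1
                count[$1]++
            }
        }
        END {
            for (ip in count)
                if (count[ip] > 1) print ip
        }
    ' | sort
}
```

Collecting pairs before and during an outage and feeding them through this filter would flag the server's address if a second machine answers for it.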

HTH
DCE
Stijn_3
Occasional Advisor

Re: Server goes down unexpectedly - HP-UX 11.11

Hello,

Thank you all for your feedback. I am now pretty sure it must be a network problem. When the system 'disappears from the network', I unplug the network cable and plug it back in, and the network is restored. So I changed the duplex setting to 100 Mbit half duplex, both on the NIC and on the switch port. This morning, the same problem occurred again. I have now tried another network cable and another switch port; both ends are again manually set to 100 Mbit half duplex. I am afraid this will not solve the problem. Next I will try leaving the switch port set to auto, but this is not good for network throughput.