GS140 server crash - no error or warning messages

 
Rakesh Jha_1
Advisor

We have one GS140 running Tru64 V5.1B-3 (build 2650). Patch info is given below:
Patches installed on the system came from following software kits:
------------------------------------------------------------------

- T64V51BB26AS0005-20050502 OSF540
- T64V51BB26AS0005-20050502 TCR540
- TCRKIT1000547-V51BB26-E-20060420 TCR540

This box is a single-node cluster. Within two weeks this server has crashed 3-4 times. There is no error in /var/adm/messages, no info in uerf, and no crash data. The server just goes down to the halt prompt.
Has anyone faced this problem, or does anyone have any idea why the server crashes without leaving any crash data?

7 REPLIES
Kapil Jha
Honored Contributor

Re: GS140 server crash - no error or warning messages

Have you checked your power status?
Did you analyze the binary.errlog file, or ask HP to analyze it?
Have you done a complete system health check
(system temperature, etc.)?
PK5 seems to be OK.
Any more observations in the messages file or the EVM logs?
BR,
Kapil
I am in this small bowl, I wanna see the real world......

Re: GS140 server crash - no error or warning messages

Rakesh,

Any chance your host started out life as an AlphaServer 8400?

Your box uses three phase power and if your power supplies are original, they're a little long in the tooth. You might start by checking out the power quality coming in and the PSUs as you can have problems there which could crash the box with little or no warning to the operating system.

Have you been "on-console" when it actually crashed? If not, it might be hard to tell whether some event brought the box down to the SRM prompt or whether it actually suffered a complete power loss and then rebooted only as far as the SRM prompt. Check your SRM config to see what the autoboot setting is.

If you don't mind sharing your binary error log, I don't mind analyzing it.

Nothing in the patches you've installed strikes me as a problem.

Is the box connected to a SAN or other external storage via anything non-fiber?

Jack
Rakesh Jha_1
Advisor

Re: GS140 server crash - no error or warning messages

Jack,

You are right, this box was an AS8400 and was later upgraded to a GS140. Today it halted again, three times, after working OK for 4-5 days. I asked the services team to check the three-phase power coming from the UPS, and they did not find anything wrong with the power supply.
Please note that this UPS also feeds three-phase power to the storage, which is OK.

If I need to send the binary error log, how can I send it to you? It is around 256MB.

Thanks,
Rakesh

Re: GS140 server crash - no error or warning messages

Rakesh:

You can just tar/gzip it up and post the error log to an FTP server I can get to on the Internet or if necessary I can set you up an account on one of my servers you can upload to. I've seen them MUCH larger than yours before so don't worry about the size. I suppose you could email it to me as I own my domain's server and have no mailbox size limit for myself, but that's a last resort.
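
For anyone following along, the tar/gzip step above can be sketched in Python, assuming the log has been copied to a machine where Python is available (it is not part of a stock Tru64 install). The function name `pack_errlog` and the paths in the usage comment are hypothetical; substitute wherever your binary error log actually lives.

```python
import tarfile
from pathlib import Path


def pack_errlog(src, dest):
    """Write a gzip-compressed tar archive at dest containing only src."""
    src = Path(src)
    # "w:gz" creates the tar archive and gzips it in one pass.
    with tarfile.open(dest, "w:gz") as tar:
        # arcname stores the bare file name instead of the full source path.
        tar.add(src, arcname=src.name)
    return Path(dest)


# Usage (hypothetical paths, adjust to your system):
# pack_errlog("binary.errlog", "binary.errlog.tar.gz")
```

Binary error logs tend to compress well, so the archive is usually far smaller than the original file.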

It doesn't surprise me that your power is fine going into, and out of, the UPS; I'm more interested in what the PSUs in your GS140 are doing with it, and even that may be a fool's errand. The binary error log is our best bet of finding something short of actually being on the console when it tanks.

Jack
Rakesh Jha_1
Advisor

Re: GS140 server crash - no error or warning messages

Jack,

Thanks a lot. I tried compressing it, but it is still 4MB, which exceeds the limit.

Shall I send the file by email or by FTP?
Please provide me with the details.
Rakesh

Re: GS140 server crash - no error or warning messages

If it's only 4MB, by all means go ahead and email it to me. If you have an FTP server you can post it to, just email me the login information and path to it and I'll take it from there. If you don't have an FTP server you can use, I'll make you an account on one of mine and you can upload it. I'll need to send you login information privately since this is a public forum.

Jack
Rakesh Jha_1
Advisor

Re: GS140 server crash - no error or warning messages

Jack,

Can you please send me your email address so that I can send you the binary errlog? Today the box crashed again with no trace.

The support vendor replaced one power supply today and reseated the module.

'show power' at the SRM prompt seems to be normal. Before the PS replacement, the DC current values were 10.88A and 9.86A for A and B. Now they are 10.44A and 10.35A.
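
To put the readings side by side, here is a small sketch of the arithmetic. The helper name `imbalance` is made up for illustration, and no pass/fail threshold is implied; it only shows how far apart A and B were before and after the replacement.

```python
def imbalance(a_amps, b_amps):
    """Return (absolute difference in amps, difference as a % of the mean)."""
    diff = abs(a_amps - b_amps)
    mean = (a_amps + b_amps) / 2
    return diff, 100 * diff / mean


before = imbalance(10.88, 9.86)   # readings before the PS replacement
after = imbalance(10.44, 10.35)   # readings after the PS replacement

print(f"before: {before[0]:.2f} A apart ({before[1]:.1f}% of mean)")
print(f"after:  {after[0]:.2f} A apart ({after[1]:.1f}% of mean)")
# before: 1.02 A apart (9.8% of mean)
# after:  0.09 A apart (0.9% of mean)
```

The combined current is nearly unchanged (20.74A before, 20.79A after), so the same load now appears to be split more evenly between A and B.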

Thanks.