Operating System - HP-UX

"Timed out node" message followed by "SCSI resetting"...

 
a8965
Occasional Advisor

Cluster of two HP rp5470 servers running HP-UX 11.00, connected to two DS2300 units.
ServiceGuard release A.11.14

The active server goes down without giving any information. It is no longer possible to connect to it; the only solution is to reset it through the GSP.

On the standby server, I find the following messages in syslog.log:

Feb 24 20:17:12 cdrc2 cmcld: (cdrc1) Started package pkg_cdrc on node cdrc1.
Feb 25 11:47:59 cdrc2 cmcld: Timed out node cdrc1. It may have failed.
Feb 25 11:47:59 cdrc2 cmcld: Attempting to adjust cluster membership
Feb 25 11:48:06 cdrc2 vmunix: SCSI: Reset requested from above -- lbolt: 15278134, bus: 6
Feb 25 11:48:06 cdrc2 cmcld: Obtaining Cluster Lock
Feb 25 11:48:07 cdrc2 vmunix: SCSI: Resetting SCSI -- lbolt: 15278234, bus: 6
Feb 25 11:48:07 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15278234, bus: 6
Feb 25 11:48:12 cdrc2 EMS [1367]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_12_0_0_4_0.0.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 89587734 -r /storage/events/disks/default/0_12_0_0_4_0.0.0 -n 89587717 -a
Feb 25 11:48:15 cdrc2 cmcld: Turning off safety time protection since the cluster
Feb 25 11:48:15 cdrc2 cmcld: may now consist of a single node. If ServiceGuard
Feb 25 11:48:15 cdrc2 cmcld: fails, this node will not automatically halt
Feb 25 11:49:05 cdrc2 cmcld: 1 nodes have formed a new cluster, sequence #4
Feb 25 11:49:05 cdrc2 cmcld: The new active cluster membership is: cdrc2(id=2)

Feb 25 12:29:08 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15524308, bus: 6
Feb 25 12:29:08 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15524308, bus: 6
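
To relate the SCSI resets to each other, the lbolt counters in the vmunix lines can be compared with the syslog timestamps. The sketch below does that for three of the lines quoted above. The rate of 100 lbolt ticks per second is an assumption, suggested only by the 100-tick gap between the 11:48:06 and 11:48:07 entries; the function names are illustrative.

```python
import re

# Three of the vmunix lines quoted above, copied verbatim.
LOG = """\
Feb 25 11:48:06 cdrc2 vmunix: SCSI: Reset requested from above -- lbolt: 15278134, bus: 6
Feb 25 11:48:07 cdrc2 vmunix: SCSI: Resetting SCSI -- lbolt: 15278234, bus: 6
Feb 25 12:29:08 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15524308, bus: 6
"""

LINE = re.compile(
    r"(\d\d):(\d\d):(\d\d) \S+ vmunix: SCSI: (.+?) -- lbolt: (\d+), bus: (\d+)"
)


def parse(log):
    """Return (seconds_since_midnight, message, lbolt, bus) for each SCSI line."""
    events = []
    for line in log.splitlines():
        m = LINE.search(line)
        if m:
            h, mi, s = int(m[1]), int(m[2]), int(m[3])
            events.append((h * 3600 + mi * 60 + s, m[4], int(m[5]), int(m[6])))
    return events


def elapsed(events, hz=100):
    """Pair wall-clock seconds and lbolt-derived seconds since the first event.

    hz=100 (ticks per second) is an assumption, not a documented value.
    All events are assumed to fall on the same day.
    """
    t0, _, l0, _ = events[0]
    return [(t - t0, (lbolt - l0) / hz) for t, _, lbolt, _ in events]


if __name__ == "__main__":
    for wall, from_ticks in elapsed(parse(LOG)):
        print(f"wall clock +{wall}s, lbolt +{from_ticks:.2f}s")
```

Under that assumption the two clocks agree to within the one-second resolution of syslog, which places the 12:29:08 reset about 41 minutes after the resets seen during the failover, all on bus 6.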

I've checked all the disks and disk connections (ioscan, stm...): everything seems to be OK.

I found that a similar problem is supposed to be fixed by PHSS_26056 (point 28). That patch was already installed on the system, but the problem persisted. I then installed PHSS_30028 (the latest MC/SG patch), but the problem is still present.

This problem has occurred ever since the cluster was created.
I'd like to know if there is a solution to this...
4 REPLIES
melvyn burnard
Honored Contributor

Re: "Timed out node" message followed by "SCSI resetting"...

Sounds like you need to have the active node checked out. Did it TOC, i.e. was there a panic? Check /etc/shutdownlog.
If there was, did it save a crash dump?
If so, I strongly recommend you log a call with your local HP Response Centre and have the TOC dump analysed.
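
As a rough illustration of the /etc/shutdownlog check, the sketch below lists the entries that mention a panic. It assumes that panic reboots are recorded with the word "panic" somewhere in the entry; the exact wording of the file is not shown in this thread, and the function names are hypothetical.

```python
def panic_lines(lines):
    """Keep only the entries that mention a panic (case-insensitive).

    Assumption: a panic reboot leaves the word "panic" in its log entry.
    """
    return [line.rstrip("\n") for line in lines if "panic" in line.lower()]


def panic_entries(path="/etc/shutdownlog"):
    """Read the shutdown log at `path` and return its panic entries."""
    with open(path, errors="replace") as fh:
        return panic_lines(fh)


if __name__ == "__main__":
    entries = panic_entries()
    if entries:
        print("\n".join(entries))
    else:
        print("no panic entries found")
```

An empty result does not rule out a failure: it only shows that the system did not get as far as recording one.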
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
a8965
Occasional Advisor

Re: "Timed out node" message followed by "SCSI resetting"...

What I'm sure of is that:
* there is no crash dump
* there is no message in /etc/shutdownlog
* syslog.log stopped abruptly (no reboot message)
* there is no SCSI error message in syslog.log
* the server didn't reboot (as it should after a system panic).

Connection was possible only through the LAN console, with nothing on the screen.
So "GSP> rs" was the only solution...
melvyn burnard
Honored Contributor

Re: "Timed out node" message followed by "SCSI resetting"...

I suspect that you have had a serious hardware failure, one that causes the system to die without taking a dump.
Get a hardware call placed and have the system checked out.
a8965
Occasional Advisor

Re: "Timed out node" message followed by "SCSI resetting"...

I've checked the MC/SG configuration again and found that only one heartbeat (HB) route was defined, and it went through the network...
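
A check like this can be sketched in a few lines. The example below counts the HEARTBEAT_IP entries under each NODE_NAME in a cluster ASCII configuration file and reports nodes with fewer than two. It assumes the usual keyword-and-value layout of that file; the sample text is hypothetical (node names are taken from this thread, interfaces and addresses are invented), and the function names are illustrative.

```python
def heartbeat_ips(config_text):
    """Map each NODE_NAME to the HEARTBEAT_IP addresses listed under it."""
    nodes = {}
    current = None
    for raw in config_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(tokens) < 2:
            continue
        key, value = tokens[0].upper(), tokens[1]
        if key == "NODE_NAME":
            current = value
            nodes[current] = []
        elif key == "HEARTBEAT_IP" and current is not None:
            nodes[current].append(value)
    return nodes


def single_heartbeat_nodes(config_text):
    """Return the nodes that have fewer than two heartbeat addresses."""
    found = heartbeat_ips(config_text)
    return sorted(name for name, ips in found.items() if len(ips) < 2)


# Hypothetical sample, NOT the poster's configuration.
SAMPLE = """\
NODE_NAME cdrc1
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 192.0.2.1
NODE_NAME cdrc2
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 192.0.2.2
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 198.51.100.2
"""

if __name__ == "__main__":
    print(single_heartbeat_nodes(SAMPLE))
```

With the sample text, only cdrc1 is reported, because it has a single heartbeat address.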

I created a direct one. My active server then ran for 24 hours without any problem. I thought it was solved...

But the problem came back: the active server was in an unstable state again, with the same messages as before in the standby server's syslog.log.

So let's go for an HP call...
Thanks.