"Timed out node" message followed by "SCSI resetting"...

a8965 · ‎02-24-2004

Cluster of 2 HP servers rp5470 in HPUX 11.00, connected to 2 DS2300.
ServiceGuard release A.11.14

The active server goes down without giving any information. It is no longer possible to connect to it; the only solution is to reboot it through the GSP.

On the standby server, I find the following messages in syslog.log:

Feb 24 20:17:12 cdrc2 cmcld: (cdrc1) Started package pkg_cdrc on node cdrc1.
Feb 25 11:47:59 cdrc2 cmcld: Timed out node cdrc1. It may have failed.
Feb 25 11:47:59 cdrc2 cmcld: Attempting to adjust cluster membership
Feb 25 11:48:06 cdrc2 vmunix: SCSI: Reset requested from above -- lbolt: 15278134, bus: 6
Feb 25 11:48:06 cdrc2 cmcld: Obtaining Cluster Lock
Feb 25 11:48:07 cdrc2 vmunix: SCSI: Resetting SCSI -- lbolt: 15278234, bus: 6
Feb 25 11:48:07 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15278234, bus: 6
Feb 25 11:48:12 cdrc2 EMS [1367]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_12_0_0_4_0.0.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 89587734 -r /storage/events/disks/default/0_12_0_0_4_0.0.0 -n 89587717 -a
Feb 25 11:48:15 cdrc2 cmcld: Turning off safety time protection since the cluster
Feb 25 11:48:15 cdrc2 cmcld: may now consist of a single node. If ServiceGuard
Feb 25 11:48:15 cdrc2 cmcld: fails, this node will not automatically halt
Feb 25 11:49:05 cdrc2 cmcld: 1 nodes have formed a new cluster, sequence #4
Feb 25 11:49:05 cdrc2 cmcld: The new active cluster membership is: cdrc2(id=2)

Feb 25 12:29:08 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15524308, bus: 6
Feb 25 12:29:08 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15524308, bus: 6
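
The timing of these messages can be cross-checked against the lbolt counters in the SCSI lines. The sketch below is only an illustration: it assumes lbolt counts clock ticks at 100 per second (an assumption, though the log above is consistent with it), and it assumes the year 2004 purely so the timestamps can be parsed. The names `LOG`, `parse` and `lbolt_vs_wallclock` are hypothetical helpers, not part of any HP tool.

```python
import re
from datetime import datetime

# A few lines copied from the syslog.log excerpt above.
LOG = """\
Feb 25 11:47:59 cdrc2 cmcld: Timed out node cdrc1. It may have failed.
Feb 25 11:48:06 cdrc2 vmunix: SCSI: Reset requested from above -- lbolt: 15278134, bus: 6
Feb 25 11:48:07 cdrc2 vmunix: SCSI: Resetting SCSI -- lbolt: 15278234, bus: 6
Feb 25 12:29:08 cdrc2 vmunix: SCSI: Reset detected -- lbolt: 15524308, bus: 6
"""

TICKS_PER_SECOND = 100  # assumption: lbolt advances 100 ticks per second
ASSUMED_YEAR = "2004"   # assumption: syslog lines carry no year

LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ (\S+?):\s*(.*)$")
LBOLT_RE = re.compile(r"lbolt: (\d+)")


def parse(log):
    """Return (timestamp, daemon, lbolt or None, message) for each log line."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(ASSUMED_YEAR + " " + m.group(1), "%Y %b %d %H:%M:%S")
        lb = LBOLT_RE.search(m.group(3))
        events.append((stamp, m.group(2), int(lb.group(1)) if lb else None, m.group(3)))
    return events


def lbolt_vs_wallclock(events):
    """For consecutive SCSI lines, pair wall-clock seconds with lbolt seconds."""
    scsi = [(t, lb) for t, _, lb, _ in events if lb is not None]
    rows = []
    for (t0, lb0), (t1, lb1) in zip(scsi, scsi[1:]):
        rows.append(((t1 - t0).total_seconds(), (lb1 - lb0) / TICKS_PER_SECOND))
    return rows


if __name__ == "__main__":
    events = parse(LOG)
    # Delay between the node timeout and the first SCSI reset request.
    print("timeout -> first reset:", (events[1][0] - events[0][0]).total_seconds(), "s")
    for wall, ticks in lbolt_vs_wallclock(events):
        print(f"wall clock {wall:.0f} s, lbolt {ticks:.2f} s")
```

If the two columns agree, the SCSI resets on bus 6 line up with the cluster reformation on the standby node (the reset is requested about 7 seconds after cdrc1 is timed out), which suggests they are a consequence of the takeover rather than an independent disk fault. That reading is an interpretation, not something the log states.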

I've checked all the disks and disk connections (ioscan, stm...): everything seems to be OK.

I found that a similar problem should be corrected by PHSS_26056 (point 28). This patch was already installed on the system, but the problem is still present. I then installed PHSS_30028 (the latest MC/SG patch), but the problem remains.

This problem has occurred since the cluster was created.
I'd like to know if I can find a solution to this...

melvyn burnard · ‎02-24-2004

Sounds like you need to have the active node checked out. Did it TOC, i.e. was there a panic? Check /etc/shutdownlog.
If there was, did it save a crash dump?
If so, I strongly recommend you log a call with your local HP Response Centre and have the TOC dump analysed.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

a8965 · ‎02-24-2004

What I'm sure of is that:
* there is no crash dump
* there is no message in /etc/shutdownlog
* syslog.log stopped abruptly (no reboot message)
* there is no SCSI error message in syslog.log
* the server didn't reboot (as it should after a system panic).

Connection was possible only through the LAN console, with nothing on the screen.
So "GSP> rs" was the only solution...

melvyn burnard · ‎02-24-2004

I suspect that you have had a serious hardware failure that causes the system to die without doing a dump.
Get a hardware call placed and have the system checked out.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

a8965 · ‎02-26-2004

I've checked the MC/SG config again and found that there was only one heartbeat (HB) route defined through the network...

I created a direct one. My active server ran for 24 hours without any problem. I thought it was solved...
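
A check like this can be scripted against the cluster ASCII configuration file. The sketch below is a rough illustration only: `SAMPLE_CONFIG` is invented sample content (not this cluster's real file, and the addresses are documentation placeholders), and the helper names are hypothetical. It relies only on the `NODE_NAME`, `NETWORK_INTERFACE` and `HEARTBEAT_IP` keywords and counts heartbeat addresses per node.

```python
# Hypothetical sample of a cluster ASCII file; real content will differ.
SAMPLE_CONFIG = """\
CLUSTER_NAME        example_cluster
NODE_NAME           cdrc1
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP    192.0.2.1
  NETWORK_INTERFACE lan1
    STATIONARY_IP   198.51.100.1
NODE_NAME           cdrc2
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP    192.0.2.2
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP    198.51.100.2
"""


def heartbeat_ips_per_node(config_text):
    """Map each NODE_NAME to the list of HEARTBEAT_IP values declared under it."""
    result = {}
    node = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        parts = line.split()
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1]
        if key == "NODE_NAME":
            node = value
            result.setdefault(node, [])
        elif key == "HEARTBEAT_IP" and node is not None:
            result[node].append(value)
    return result


def nodes_with_single_heartbeat(config_text):
    """Return nodes that declare fewer than two heartbeat addresses."""
    per_node = heartbeat_ips_per_node(config_text)
    return sorted(n for n, ips in per_node.items() if len(ips) < 2)


if __name__ == "__main__":
    print(heartbeat_ips_per_node(SAMPLE_CONFIG))
    print("single heartbeat path:", nodes_with_single_heartbeat(SAMPLE_CONFIG))
```

In the invented sample, cdrc1 has only one heartbeat address and would be flagged, which mirrors the situation described above before the direct link was added.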

But I ran into the problem again: the active server is in an unstable state, and the same messages as before appear in syslog.log on the standby server.

So let's go for an HP call...
Thanks.
