Operating System - OpenVMS

Itanium & Alpha Server crashed

Harry M
Occasional Contributor

Itanium & Alpha Server crashed

Dear Friends,
We have four Alpha servers running OpenVMS 7.3-2 and one Itanium (I64) server running 8.3-1H1. The Alpha servers are clustered using Memory Channel and LAN; the Itanium is a cluster member over the LAN alone. All were working fine. Recently the Itanium crashed and was removed from the existing cluster, and at the same time one Alpha node also crashed. I suspect some network issue occurred at that time: the LAN was disturbed, the Itanium node crashed, and the Alpha node crashed in parallel. Please share your knowledge regarding this issue. If you have any queries, please reply.

with regards
Harichander
5 REPLIES
Harry M
Occasional Contributor

Re: Itanium & Alpha Server crashed

gf
Harry M
Occasional Contributor

Re: Itanium & Alpha Server crashed

reply.......
Rob Leadbeater
Honored Contributor

Re: Itanium & Alpha Server crashed

Hi,

Crashed how exactly? Any error messages to go on?

Please help us to help you by supplying much more information.

Cheers,

Rob
Robert Gezelter
Honored Contributor

Re: Itanium & Alpha Server crashed

Harichander,

Without more detailed information, it is virtually impossible to identify the cause of the crash.

As a first question, were the systems properly configured to write crash dumps to disk in the event of a failure? If so, were the dumps from these crashes saved?

- Bob Gezelter, http://www.rlgsc.com
Hoff
Honored Contributor

Re: Itanium & Alpha Server crashed

What might you mean by "LAN was disturbed"? You are apparently aware of, and reporting, something (what?) that happened to the LAN and that then apparently triggered the OpenVMS I64 Integrity server and an OpenVMS Alpha node to crash. Yes, an unstable network can cause cluster member crashes, usually when a timeout fires or when connectivity is restored. You'll typically see CLUEXIT crashes here.

An unstable network makes for an unstable cluster.

If you have HP support, call support now.

If you don't have HP support, consider calling in help, particularly if this is a production environment, as your periodic pings to this topic might imply.

If you wish to pursue this here in ITRC, then at minimum first acquire the crashdumps, and post the CLUE CRASH output from those crashdumps here as an attached text file.
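
For the CLUE CRASH step, a minimal sketch of the SDA session follows. It assumes the dump is still in the default system dump file, SYS$SYSTEM:SYSDUMP.DMP, and that you run it on the node that crashed; the output file name CLUE_CRASH.TXT is only a placeholder. Adjust the file specifications if your dumps were copied or written elsewhere.

```
$! Sketch only. Assumes the default dump file location.
$! CLUE_CRASH.TXT is a placeholder name for the text file to attach.
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> SET OUTPUT CLUE_CRASH.TXT
SDA> CLUE CRASH
SDA> SET OUTPUT SYS$OUTPUT
SDA> EXIT
```

Repeat this for the I64 node and for the Alpha node, so that there is one text file per crash.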

If you're not collecting crashdumps for each node, you're effectively not operating in what most would consider a production configuration. Configuring the systems to acquire and save crashdumps is often central to resolving failures and quickly resuming production operations. Fix that now, and reboot the nodes at your next opportunity.
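
As a rough way to see where each node stands, the commands below display the dump-related system parameters and check for a dump file. This is a sketch that assumes the default dump file name and location; your site may use a different dump file or a dump off the system disk.

```
$! Sketch only. Shows the current dump-related parameter values.
$ MCR SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW DUMPBUG
SYSGEN> SHOW DUMPSTYLE
SYSGEN> SHOW SAVEDUMP
SYSGEN> EXIT
$! Assumes the default dump file; check that it exists and how large it is.
$ DIRECTORY/SIZE=ALL SYS$SYSTEM:SYSDUMP.DMP
```

DUMPBUG should be 1 for a dump to be written on a bugcheck. Any parameter changes would normally go into SYS$SYSTEM:MODPARAMS.DAT followed by an AUTOGEN run, rather than being set directly in SYSGEN.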

From what is known so far, this could be buggy software, buggy hardware, a buggy network, excessive radiation or magnetic fields triggering memory errors or other similar instabilities, cluster configuration issues, or problems with the local power supply. Who really knows?

Stephen Hoffman
HoffmanLabs LLC