Operating System - Tru64 Unix
01-11-2010 02:08 AM
One of the cluster nodes down
Hello,
We have a two-node Tru64 UNIX cluster.
Today we found one of the cluster nodes sitting at the boot prompt. We booted it by entering "b", and the node is back online.
When we checked the binary error log, it showed that the system had already panicked on CPU 1 on Jan 9 09:43:38 2010. See the log entry below.
We would like to know whether a serious problem has occurred, and how we can analyze it further.
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 39869.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sat Jan 9 09:43:38 2010
OCCURRED ON SYSTEM bwgc559
SYSTEM ID x000B0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000001
MESSAGE panic (cpu 1): _ics_unable_to_make_progress:
_heartbeat checking blocked
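When comparing panics across nodes or over time, it helps to pull the key fields out of reports like the one above. This is a minimal Python sketch, assuming the report is already available as text in exactly this layout; `parse_event`, `KNOWN_FIELDS`, and `SAMPLE` are hypothetical names (not part of any Tru64 tool), and the label list is taken only from this one excerpt, so other event types may carry labels it does not know.

```python
# Field labels copied from the single event report quoted above.
# Assumption: other event types may use labels not listed here.
KNOWN_FIELDS = [
    "EVENT CLASS",
    "OS EVENT TYPE",
    "SEQUENCE NUMBER",
    "OPERATING SYSTEM",
    "OCCURRED/LOGGED ON",
    "OCCURRED ON SYSTEM",
    "SYSTEM ID",
    "SYSTYPE",
    "PROCESSOR COUNT",
    "PROCESSOR WHO LOGGED",
    "MESSAGE",
]


def parse_event(text: str) -> dict:
    """Split one text event report into a {label: value} dict.

    A line that does not start with a known label (for example the
    wrapped second line of MESSAGE) is appended to the previous field.
    """
    # Try longer labels first so no label can shadow a longer one.
    labels = sorted(KNOWN_FIELDS, key=len, reverse=True)
    fields = {}
    last = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("-----"):
            continue  # skip blank lines and the section banner
        for label in labels:
            if line.startswith(label + " "):
                fields[label] = line[len(label):].strip()
                last = label
                break
        else:
            # No label matched: treat the line as a continuation.
            if last is not None:
                fields[last] += " " + line
    return fields


SAMPLE = """\
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 39869.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sat Jan 9 09:43:38 2010
OCCURRED ON SYSTEM bwgc559
SYSTEM ID x000B0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000001
MESSAGE panic (cpu 1): _ics_unable_to_make_progress:
_heartbeat checking blocked
"""

event = parse_event(SAMPLE)
print(event["OCCURRED/LOGGED ON"])  # Sat Jan 9 09:43:38 2010
print(event["MESSAGE"])
```

Joining continuation lines onto the previous field keeps the full panic string in one value, which makes it easy to search for the same message in later reports.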
Additionally, the system is now very slow. The top output shows this:
load averages: 7.85, 7.44, 7.42 11:06:58
88 processes: 3 running, 33 waiting, 28 sleeping, 24 idle
CPU states: 0.0% user, 0.0% nice, 99.3% system, 0.5% idle
Memory: Real: 2681M/4007M act/tot Virtual: 1479M use/tot Free: 1208M
PID USERNAME PRI NICE SIZE RES STATE TIME CPU COMMAND
524288 root 0 0 4559M 76M run 45:30 192.60% kernel idle
528506 root 42 0 0K 0K run 0:33 1.20% icssvr_daemon_
We can see that the system is very busy.
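To put numbers on "very busy", the header lines of the top output can be summarized per CPU. This is a minimal Python sketch; `summarize_top_header` and the 90% `KERNEL_BOUND_PCT` threshold are hypothetical, and the regular expressions assume top prints exactly the header format shown above. With two processors (PROCESSOR COUNT in the event report), a 1-minute load average of 7.85 is roughly 3.9 per CPU, and 99.3% system time means almost no cycles are reaching user processes.

```python
import re

# Hypothetical threshold: treat the node as "kernel-bound" when at least
# this share of CPU time is system time. Pick your own value.
KERNEL_BOUND_PCT = 90.0


def summarize_top_header(load_line: str, cpu_line: str, ncpus: int) -> dict:
    """Pull load averages and CPU-state percentages out of top's header."""
    m = re.search(r"load averages:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", load_line)
    if m is None:
        raise ValueError("no load averages found in: " + load_line)
    loads = [float(x) for x in m.groups()]  # 1-, 5- and 15-minute averages

    # Collect "<pct>% <state>" pairs, e.g. {"user": 0.0, "system": 99.3, ...}.
    states = {name: float(pct) for pct, name in re.findall(r"([\d.]+)% (\w+)", cpu_line)}

    system_pct = states.get("system", 0.0)
    return {
        "load_per_cpu": [load / ncpus for load in loads],
        "system_pct": system_pct,
        "idle_pct": states.get("idle", 0.0),
        "mostly_kernel": system_pct >= KERNEL_BOUND_PCT,
    }


summary = summarize_top_header(
    "load averages: 7.85, 7.44, 7.42 11:06:58",
    "CPU states: 0.0% user, 0.0% nice, 99.3% system, 0.5% idle",
    ncpus=2,  # PROCESSOR COUNT from the event report
)
print([f"{x:.2f}" for x in summary["load_per_cpu"]])
print(summary["mostly_kernel"])  # True
```

Sampling these two lines periodically and logging the summary would show whether the kernel-time spike is persistent or tied to specific events.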
Can someone please help?
01-11-2010 11:11 AM
Re: One of the cluster nodes down
This is a somewhat generic panic message. It means that the system couldn't communicate across the cluster interconnect for a specified period (longer than cluster_rebuild_delay, which is 240 seconds by default), so it panicked to take itself out of the cluster. There are a few problems that are known to cause this, with fixes in the latest patch kit for V5.1B.
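The timeout rule described above can be modeled as a simple watchdog. This is a toy Python sketch, not TruCluster's actual ICS code: `InterconnectWatchdog` and its methods are hypothetical, and the 240-second default is the `cluster_rebuild_delay` value quoted in the reply. Because the reply says "longer than", the comparison is strict.

```python
from dataclasses import dataclass

# Default quoted in the reply; the value actually configured on a given
# cluster may differ.
CLUSTER_REBUILD_DELAY_S = 240


@dataclass
class InterconnectWatchdog:
    """Toy model: a node that cannot communicate across the interconnect
    for longer than the timeout removes itself from the cluster."""

    timeout_s: float = CLUSTER_REBUILD_DELAY_S
    last_progress_s: float = 0.0

    def record_progress(self, now_s: float) -> None:
        # Called whenever a message successfully crosses the interconnect.
        self.last_progress_s = now_s

    def check(self, now_s: float) -> str:
        # Strictly longer than the timeout -> panic to leave the cluster.
        if now_s - self.last_progress_s > self.timeout_s:
            return "panic: unable to make progress"
        return "ok"


wd = InterconnectWatchdog()
wd.record_progress(now_s=100.0)
print(wd.check(now_s=300.0))  # ok (200 s since last progress)
print(wd.check(now_s=341.0))  # panic: unable to make progress (241 s > 240 s)
```

The point of the model is that the panic is a deliberate self-removal once the silence exceeds the limit; the underlying cause of the silence is what the crash dump has to reveal.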
Determining the specific cause of a particular crash requires analyzing the crash dump. If you have a support contract with HP, you could log a case to have this done. If you can do crash analysis yourself, you could at least determine whether the cause is something already fixed in a newer patch kit than the one you are running. If neither is possible, all I can suggest is to install the latest patch kit and hope for the best.
Martin
I work for HPE
A quick resolution to technical issues for your HPE products is just a click away HPE Support Center
See Self Help Post for more details
09-01-2010 11:26 PM
Re: One of the cluster nodes down
As mentioned above.
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
© Copyright 2024 Hewlett Packard Enterprise Development LP