09-08-2010 08:46 PM
Node eviction due to lost heartbeat interconnect
We have installed an Oracle 11g RAC database on HP-UX 11i v3. Last week we had a system crash. We analysed the logs and found that Oracle CRS initiated the crash. On further analysis of the Oracle Clusterware logs, we found that the node eviction was caused by loss of the cluster interconnect, i.e. a heartbeat fatal eviction. The action suggested by Oracle is to check the availability of the (heartbeat) networks and to check the OS log files for reported errors related to the interconnect.
However, we did not find any such errors on the OS side (HP-UX 11i v3). Kindly tell us which logs should be checked for heartbeat link failure on the OS side.
09-08-2010 09:02 PM
Re: Node eviction due to lost heartbeat interconnect
What does the log in the /var/adm/syslog directory say at the time of the crash?
BR,
Kapil+
09-08-2010 09:11 PM
Re: Node eviction due to lost heartbeat interconnect
Aug 21 08:03:00 bap02 ntpdate[8599]: the NTP socket is in use, exiting.
And the syslog contains messages that were captured after the reboot.
09-08-2010 09:45 PM
Re: Node eviction due to lost heartbeat interconnect
09-08-2010 09:54 PM
Re: Node eviction due to lost heartbeat interconnect
Please paste the output of
netfmt -f /var/adm/nettl.LOG000
This will give the details of link down and link up events.
Manoj K
09-09-2010 12:47 AM
Re: Node eviction due to lost heartbeat interconnect
Here is the log....
***********************************STREAMS/UX*******************************@#%
Timestamp : Sat Aug 21 IST 2010 09:40:09.276563
Process ID : 4822 Subsystem : STREAMS
User ID ( UID ) : 500 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
Location : 00123
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 09:40:09 165915 1 T.. 2224 24 tl_wput:T_OPTMGMT_REQ:out of state, state=10
***********************************STREAMS/UX*******************************@#%
Timestamp : Sat Aug 21 IST 2010 09:40:09.277966
Process ID : 4822 Subsystem : STREAMS
User ID ( UID ) : 500 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
Location : 00123
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 09:40:09 165915 1 T.. 2224 24 tl_wput:T_OPTMGMT_REQ:out of state, state=10
***********************************STREAMS/UX*******************************@#%
Timestamp : Sat Aug 21 IST 2010 09:40:09.279800
Process ID : 4822 Subsystem : STREAMS
User ID ( UID ) : 500 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
Location : 00123
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 09:40:09 165915 1 T.. 2224 24 tl_wput:T_OPTMGMT_REQ:out of state, state=10
***********************************STREAMS/UX*******************************@#%
Timestamp : Sat Aug 21 IST 2010 09:40:09.280474
Process ID : 4822 Subsystem : STREAMS
User ID ( UID ) : 500 Log Class : ERROR
Device ID : 0 Path ID : 0
Connection ID : 0 Log Instance : 0
Location : 00123
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 09:40:09 165915 1 T.. 2224 24 tl_wput:T_OPTMGMT_REQ:out of state, state=10
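When the formatted nettl log is long, a small script can summarise the records before they are read by hand. The following is a minimal Python sketch, assuming the netfmt output has been saved as text and that every record carries the `Timestamp`, `Subsystem` and `Log Class` fields in the layout pasted above. The function name `summarise_netfmt` and the shortened `sample` text are hypothetical and are not part of any HP-UX tool.

```python
import re
from collections import Counter

# Field patterns follow the record layout pasted above. That netfmt always
# prints records this way is an assumption; adjust if your output differs.
TIMESTAMP_RE = re.compile(r"Timestamp\s*:\s*(.+)")
SUBSYSTEM_RE = re.compile(r"Subsystem\s*:\s*(\S+)")
LOG_CLASS_RE = re.compile(r"Log Class\s*:\s*(\S+)")


def summarise_netfmt(text):
    """Count records per (subsystem, log class) and collect their timestamps."""
    counts = Counter()
    timestamps = []
    subsystem = None
    for line in text.splitlines():
        m = TIMESTAMP_RE.search(line)
        if m:
            timestamps.append(m.group(1).strip())
            continue
        m = SUBSYSTEM_RE.search(line)
        if m:
            # Remember the subsystem until the matching Log Class line arrives.
            subsystem = m.group(1)
            continue
        m = LOG_CLASS_RE.search(line)
        if m and subsystem is not None:
            counts[(subsystem, m.group(1))] += 1
            subsystem = None
    return counts, timestamps


# Two records copied from the excerpt above, shortened for illustration.
sample = """\
Timestamp : Sat Aug 21 IST 2010 09:40:09.276563
Process ID : 4822 Subsystem : STREAMS
User ID ( UID ) : 500 Log Class : ERROR
Timestamp : Sat Aug 21 IST 2010 09:40:09.277966
Process ID : 4822 Subsystem : STREAMS
User ID ( UID ) : 500 Log Class : ERROR
"""

counts, timestamps = summarise_netfmt(sample)
for (subsystem, log_class), n in counts.items():
    print(subsystem, log_class, n)
```

Run over the full excerpt above, a summary like this would list only STREAMS records of class ERROR, all stamped 09:40:09, which makes it quick to see whether any other subsystem logged anything in the window of interest.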
09-09-2010 01:23 AM
Re: Node eviction due to lost heartbeat interconnect
1) At what time does the clusterware log show the interconnect as lost?
2) Is ServiceGuard configured on the servers?
3) Which user has User ID (UID) 500?
4) Please attach the clusterware log.
5) Please paste the netstat -in output.
6) Which LAN is used for public and which for private?
Manoj K
09-09-2010 01:59 AM
Re: Node eviction due to lost heartbeat interconnect
1) At what time does the clusterware log show the interconnect as lost?
2010-08-21 08:23:06
2) Is ServiceGuard configured on the servers?
No
3) Which user has User ID (UID) 500?
Oracle
4) Attach the clusterware log. Attached:
2010-08-21 08:23:06.294
[cssd(3474)]CRS-1612:node bap01 (0) at 50% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:07.294
[cssd(3474)]CRS-1612:node bap01 (0) at 50% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:14.294
[cssd(3474)]CRS-1611:node bap01 (0) at 75% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:18.294
[cssd(3474)]CRS-1610:node bap01 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:19.294
[cssd(3474)]CRS-1610:node bap01 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:20.294
[cssd(3474)]CRS-1610:node bap01 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 09:24:25.928
[cssd(3518)]CRS-1605:CSSD voting file is online: /dev/oracle/asmvot1. Details in /home/oracle/product/CRS/log/bap02/cssd/ocssd.log.
[cssd(3518)]CRS-1601:CSSD Reconfiguration complete. Active nodes are bap01 bap02 .
2010-08-21 09:24:26.862
[evmd(3298)]CRS-1401:EVMD started on node bap02.
2010-08-21 09:24:26.956
[crsd(3309)]CRS-1012:The OCR service started on node bap02.
2010-08-21 09:24:29.419
[crsd(3309)]CRS-1201:CRSD started on node bap02.
201
5) Paste the netstat -in output.
Name        Mtu    Network    Address     Ipkts      Ierrs  Opkts       Oerrs  Coll
lo0         32808  127.0.0.0  127.0.0.1   113371872  0      113372555   0      0
lan901      1500   10.7.3.0   10.7.3.201  164089015  0      122297967   0      0
lan900      1500   10.7.1.0   10.7.1.201  580696289  0      1176359227  0      0
lan900:801  1500   10.7.1.0   10.7.1.206  256740362  0      22666       0      0
6) Which LAN is used for public and which for private?
All are private only.
Thanks in advance.
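To line the clusterware warnings up against OS logs, it helps to reduce the CRS alert log to a timeline of heartbeat messages. The following is a minimal Python sketch, assuming the log keeps the two-line layout shown above (a timestamp line followed by the message line) and that only CRS-1610, CRS-1611 and CRS-1612 are of interest. The function name `heartbeat_timeline` and the shortened `sample` text are hypothetical.

```python
import re
from datetime import datetime

# A timestamp line as it appears in the excerpt above.
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")
# Heartbeat warnings: message code, node name and percentage.
HB_RE = re.compile(r"CRS-(161[0-2]):node (\S+) \(\d+\) at (\d+)% heartbeat fatal")


def heartbeat_timeline(text):
    """Return a list of (timestamp, node, percent) for each heartbeat warning."""
    events = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        m = HB_RE.search(line)
        if m and current_ts is not None:
            events.append((current_ts, m.group(2), int(m.group(3))))
    return events


# Lines copied from the excerpt above, shortened for illustration.
sample = """\
2010-08-21 08:23:06.294
[cssd(3474)]CRS-1612:node bap01 (0) at 50% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:14.294
[cssd(3474)]CRS-1611:node bap01 (0) at 75% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 08:23:20.294
[cssd(3474)]CRS-1610:node bap01 (0) at 90% heartbeat fatal, eviction in 0.000 seconds
2010-08-21 09:24:26.862
[evmd(3298)]CRS-1401:EVMD started on node bap02.
"""

events = heartbeat_timeline(sample)
span = events[-1][0] - events[0][0]
for ts, node, percent in events:
    print(ts.isoformat(sep=" "), node, f"{percent}%")
print("warning window (seconds):", span.total_seconds())
```

The first and last timestamps of the warnings give the window to search in syslog and in the formatted nettl log on both nodes. In the excerpt above the warnings name bap01, while the log path points to bap02, so the same window is worth checking on each side.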
09-09-2010 02:36 AM
Re: Node eviction due to lost heartbeat interconnect
>>>6)Which lan is using for public and private
>>>All are private only.
This is not clear.
Run the command "oifcfg getif" and provide the output.
Was there any time difference between the RAC nodes?
Manoj K
09-09-2010 03:14 AM
Re: Node eviction due to lost heartbeat interconnect
Sorry, please ignore that.
Here is the output of oifcfg
lan901 10.7.3.0 global cluster_interconnect
lan900 10.7.1.0 global public
Thanks in advance
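The two outputs posted in this thread can be combined to see which interface carries the interconnect and whether its counters show errors. The following is a minimal Python sketch, assuming `oifcfg getif` prints four whitespace-separated fields per line and `netstat -in` prints the nine columns shown earlier in the thread. The function names `parse_oifcfg` and `parse_netstat_in` are hypothetical.

```python
def parse_oifcfg(text):
    """Map interface name to (subnet, scope, role) from `oifcfg getif` output."""
    interfaces = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4:
            name, subnet, scope, role = parts
            interfaces[name] = (subnet, scope, role)
    return interfaces


def parse_netstat_in(text):
    """Map interface name to its address and error counters from `netstat -in`."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect: Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
        if len(parts) == 9 and parts[0] != "Name":
            stats[parts[0]] = {
                "address": parts[3],
                "ierrs": int(parts[5]),
                "oerrs": int(parts[7]),
                "coll": int(parts[8]),
            }
    return stats


# Output copied from the posts in this thread.
oifcfg_output = """\
lan901 10.7.3.0 global cluster_interconnect
lan900 10.7.1.0 global public
"""
netstat_output = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lo0 32808 127.0.0.0 127.0.0.1 113371872 0 113372555 0 0
lan901 1500 10.7.3.0 10.7.3.201 164089015 0 122297967 0 0
lan900 1500 10.7.1.0 10.7.1.201 580696289 0 1176359227 0 0
lan900:801 1500 10.7.1.0 10.7.1.206 256740362 0 22666 0 0
"""

interfaces = parse_oifcfg(oifcfg_output)
stats = parse_netstat_in(netstat_output)
interconnect = [n for n, (_, _, role) in interfaces.items()
                if role == "cluster_interconnect"]
for name in interconnect:
    print(name, stats.get(name))
```

Zero error counters at the time the command was run do not rule out a link drop at the time of the eviction, since the counters are cumulative and the node has rebooted since then. The check only narrows down which interface and which logs to look at.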