Server Reboot

 
SOLVED
kunjuttan
Super Advisor

Server Reboot

Hi,

I have a 2-node active-passive cluster. Last Sunday my primary node rebooted. How can I find the reason for the reboot?

Server Model - Superdome2
OS - HP-UX 11.31
Torsten.
Acclaimed Contributor

Re: Server Reboot

I would first take a look at the shutdownlog, the OLDsyslog and the cluster log.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
kunjuttan
Super Advisor

Re: Server Reboot

Thanks for the update.

But may I know where these logs are located and how to find the relevant details in them?
Dennis Handly
Acclaimed Contributor

Re: Server Reboot

>may I know the location

/var/adm/syslog/
/var/adm/shutdownlog
Torsten.
Acclaimed Contributor

Re: Server Reboot

and /etc/cmcluster

Hope this helps!
Regards
Torsten.

kunjuttan
Super Advisor

Re: Server Reboot

I have a doubt. The syslog is showing something like "heartbeat connection lost". Would the server reboot because of this?
Torsten.
Acclaimed Contributor

Re: Server Reboot

This could be a reason.

Hope this helps!
Regards
Torsten.


Re: Server Reboot

>> In syslog its showing like heartbeat connection lost.

Why not actually post the message(s) from syslog rather than something "like" what it says - then we can give you a better answer...

HTH

Duncan

I am an HPE Employee
kunjuttan
Super Advisor

Re: Server Reboot

Hi,

Please find the attached syslog output. I would also like to know: in the normal case, if a heartbeat LAN fails, the package moves to the other node. But what happens to the node the package moved away from? Will that node be rebooted?
g3jza
Esteemed Contributor

Re: Server Reboot

Jun 5 14:53:52 bilprdci cmnetd[3812]: 172.16.8.165 failed.

Jun 5 14:53:52 bilprdci cmnetd[3812]: lan900 is down at the IP layer.

Jun 5 14:53:52 bilprdci cmnetd[3812]: lan900 failed.

Jun 5 14:53:52 bilprdci cmnetd[3812]: Subnet 172.16.8.0 down

Jun 5 14:54:43 bilprdci vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xac1008c8 .See ndd -h ip_ire_gw_probe for more info

Jun 5 14:56:46 bilprdci cmnetd[3812]: 172.16.8.165 recovered.

Jun 5 14:56:46 bilprdci cmnetd[3812]: Subnet 172.16.8.0 up

Jun 5 14:56:46 bilprdci cmnetd[3812]: lan900 is up at the IP layer.

Jun 5 14:56:46 bilprdci cmnetd[3812]: lan900 recovered.

Jun 5 15:02:26 bilprdci cmnetd[3812]: 172.16.8.165 failed.

Jun 5 15:02:26 bilprdci cmnetd[3812]: lan900 is down at the IP layer.

Jun 5 15:02:26 bilprdci cmnetd[3812]: lan900 failed.

Jun 5 15:02:26 bilprdci cmnetd[3812]: Subnet 172.16.8.0 down

Jun 5 15:03:43 bilprdci vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xac1008c8 .See ndd -h ip_ire_gw_probe for more info

Jun 5 15:06:43 bilprdci vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xac1008c8 .See ndd -h ip_ire_gw_probe for more info

Jun 5 15:09:43 bilprdci vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xac1008c8 .See ndd -h ip_ire_gw_probe for more info

Jun 5 15:12:52 bilprdci cmnetd[3812]: 172.16.8.165 recovered.

Jun 5 15:12:52 bilprdci cmnetd[3812]: Subnet 172.16.8.0 up

Jun 5 15:12:52 bilprdci cmnetd[3812]: lan900 is up at the IP layer.

Jun 5 15:12:43 bilprdci vmunix: Dead gateway detection can't ping the last remaining default gateway at 0xac1008c8 .See ndd -h ip_ire_gw_probe for more info

Jun 5 15:12:52 bilprdci cmnetd[3812]: lan900 recovered.

Jun 5 15:18:38 bilprdci cmnetd[3812]: Link level address on network interface lan901 has been changed from 0xf4ce46f488fa to 0xf4ce46f48808.

Jun 5 15:18:38 bilprdci cmnetd[3812]: lan901 is down at the data link layer.

Jun 5 15:18:38 bilprdci cmnetd[3812]: lan901 failed.

Jun 5 15:18:38 bilprdci cmnetd[3812]: Subnet 10.10.12.0 down

Jun 5 15:18:39 bilprdci cmcld[3803]: Member bilprddb seems unhealthy, not receiving heartbeats from it.

.......

This looks like a network problem: both of your aggregates went down and the node lost the heartbeat with the other nodes, which is a possible cause of the reboot. You should check with your network team what exactly was going on.

How many nodes are in the cluster? The logs say that the cluster later re-formed with only one node and that the other 2 were excluded. Is this a 3-node cluster, then?
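To make the timeline in the syslog excerpt easier to read, here is a minimal Python sketch that pairs the cmnetd "failed" and "recovered" messages for each interface and decodes the hex gateway address from the vmunix message. It assumes the syslog lines look exactly like the ones quoted above. The function names (lan_outages, decode_gateway) and the SAMPLE_LINES list are made up for illustration and are not part of Serviceguard or HP-UX. The year is a placeholder because syslog does not record one.

```python
import ipaddress
import re
from datetime import datetime

# Assumed syslog layout: "Mon D HH:MM:SS host proc[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<proc>\w+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)
# Only the plain "lanNNN failed." / "lanNNN recovered." messages are paired.
EVENT_RE = re.compile(r"^(?P<lan>lan\d+) (?P<state>failed|recovered)\.$")
GATEWAY_RE = re.compile(r"default gateway at 0x(?P<hex>[0-9a-fA-F]{8})")


def parse_ts(text, year=2000):
    """Parse a syslog timestamp; the year is an arbitrary placeholder."""
    normalized = " ".join(text.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def lan_outages(lines):
    """Return (outages, still_down) built from cmnetd failed/recovered pairs.

    outages is a list of (lan, start, end, seconds); still_down maps each
    interface that never recovered in the input to its failure time.
    """
    down_since = {}
    outages = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["proc"] != "cmnetd":
            continue
        ev = EVENT_RE.match(m["msg"])
        if not ev:
            continue
        ts = parse_ts(m["ts"])
        lan = ev["lan"]
        if ev["state"] == "failed":
            down_since.setdefault(lan, ts)
        elif lan in down_since:
            start = down_since.pop(lan)
            outages.append((lan, start, ts, int((ts - start).total_seconds())))
    return outages, down_since


def decode_gateway(line):
    """Turn the hex gateway in a dead-gateway message into dotted decimal."""
    m = GATEWAY_RE.search(line)
    if not m:
        return None
    return str(ipaddress.ip_address(int(m["hex"], 16)))


# A few lines copied from the excerpt above.
SAMPLE_LINES = [
    "Jun 5 14:53:52 bilprdci cmnetd[3812]: lan900 failed.",
    "Jun 5 14:54:43 bilprdci vmunix: Dead gateway detection can't ping the "
    "last remaining default gateway at 0xac1008c8 .See ndd -h "
    "ip_ire_gw_probe for more info",
    "Jun 5 14:56:46 bilprdci cmnetd[3812]: lan900 recovered.",
    "Jun 5 15:02:26 bilprdci cmnetd[3812]: lan900 failed.",
    "Jun 5 15:12:52 bilprdci cmnetd[3812]: lan900 recovered.",
    "Jun 5 15:18:38 bilprdci cmnetd[3812]: lan901 failed.",
]

if __name__ == "__main__":
    outages, still_down = lan_outages(SAMPLE_LINES)
    for lan, start, end, seconds in outages:
        print(f"{lan}: down {start:%H:%M:%S} to {end:%H:%M:%S} ({seconds} s)")
    for lan, start in still_down.items():
        print(f"{lan}: down since {start:%H:%M:%S}, no recovery in excerpt")
    print("gateway:", decode_gateway(SAMPLE_LINES[1]))
```

On the sample lines this reports two lan900 outages (174 s and 626 s), lan901 still down at the end of the excerpt, and a gateway of 172.16.8.200, which sits in the 172.16.8.0 subnet that cmnetd reported as down. To run it against a full log, read the lines from a copy of the syslog file instead of SAMPLE_LINES.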