05-16-2003 10:04 AM
Network Failure caused reboot?
The network guys were hot-inserting a card into the switch and caused the switch to reboot this morning. This temporarily interrupted communication between the primary and takeover nodes. The first thing I notice is that the heartbeat may be running through the network rather than over the crossover cable connecting the two systems. But more importantly, the takeover node rebooted when this happened. Below are the entries from the syslog file. I can find no other information on why it rebooted. Any ideas?
May 16 11:37:18 cadb02a cmcld: Communication to node cadb01a has been interrupted
May 16 11:37:18 cadb02a cmcld: Node cadb01a may have died
May 16 11:37:18 cadb02a cmcld: Attempting to form a new cluster
May 16 11:37:29 cadb02a cmcld: Obtaining Cluster Lock
May 16 11:37:29 cadb02a vmunix: SCSI: Reset requested from above -- lbolt: 53062089, bus: 4
May 16 11:37:30 cadb02a cmcld: Cluster lock was denied. Lock was obtained by another node.
May 16 11:37:30 cadb02a vmunix: SCSI: Resetting SCSI -- lbolt: 53062189, bus: 4
May 16 11:37:30 cadb02a vmunix: SCSI: Reset detected -- lbolt: 53062189, bus: 4
May 16 11:37:34 cadb02a vmunix: NFS server cadb03a not responding still trying
May 16 11:37:30 cadb02a cmcld: Attempting to form a new cluster
May 16 11:37:41 cadb02a cmcld: Cluster lock has been denied
May 16 11:37:41 cadb02a cmcld: Attempting to form a new cluster
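For readers who want to follow the sequence, the cmcld lines can be pulled out of the excerpt and timed. This is only a rough sketch: the function names `parse_syslog` and `cmcld_timeline` are made up for illustration, and the year 2003 is an assumption taken from the post date, since syslog lines do not carry a year.

```python
import re
from datetime import datetime

# Matches "Mon DD HH:MM:SS host source: message" as seen in the excerpt above.
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<source>\w+): (?P<message>.*)$"
)

def parse_syslog(lines, year=2003):
    """Return (timestamp, source, message) tuples; the year is an assumption."""
    events = []
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a syslog entry
        stamp = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((stamp, match["source"], match["message"]))
    return events

def cmcld_timeline(events):
    """Keep only the entries written by the cluster daemon."""
    return [(stamp, message) for stamp, source, message in events if source == "cmcld"]

# Four lines copied from the syslog excerpt in the post.
excerpt = [
    "May 16 11:37:18 cadb02a cmcld: Communication to node cadb01a has been interrupted",
    "May 16 11:37:29 cadb02a cmcld: Obtaining Cluster Lock",
    "May 16 11:37:29 cadb02a vmunix: SCSI: Reset requested from above -- lbolt: 53062089, bus: 4",
    "May 16 11:37:30 cadb02a cmcld: Cluster lock was denied. Lock was obtained by another node.",
]

events = parse_syslog(excerpt)
timeline = cmcld_timeline(events)
elapsed = (timeline[-1][0] - timeline[0][0]).total_seconds()

for stamp, message in timeline:
    print(stamp.time(), message)
print(f"{elapsed:.0f} s from lost communication to lock denial")
```

On these four lines the gap between the first "interrupted" message and the lock denial comes out to 12 seconds.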
05-16-2003 10:06 AM
Re: Network Failure caused reboot?
05-16-2003 10:13 AM
Re: Network Failure caused reboot?
05-16-2003 10:29 AM
Re: Network Failure caused reboot?
I don't know a darned thing about ServiceGuard.
But..
May 16 11:37:30 cadb02a vmunix: SCSI: Resetting SCSI -- lbolt: 53062189, bus: 4
May 16 11:37:30 cadb02a vmunix: SCSI: Reset detected -- lbolt: 53062189, bus: 4
Looks like a common hardware problem.
I've seen this kind of stuff triggered by power failures on our switch on one of our older D class systems.
We ended up figuring out that the NIC was bad and needed to be replaced.
Perhaps it's time to do a normal hardware investigation on that second card.
Also, overall, it seems your network configuration isn't all that strong.
We actually plan to bring in ServiceGuard after our training budget is unfrozen. We are going to put a second switch in our HP-9000 rack so that we have a redundant connection between our machines regardless of whether the core switch is up or down.
Just some things to think about.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
05-16-2003 10:38 AM
Re: Network Failure caused reboot?
Perfectly normal behavior for an MC/SG cluster. The nodes lost communication with each other, so the cluster reformed. Since the nodes couldn't communicate, they both tried to grab the cluster lock disk. The first one succeeded and reformed the cluster. The node that lost did a TOC (Transfer of Control, a forced halt) so that all its resources and packages would be sure to be free for the other node as needed.
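The lock race described above can be pictured with a toy model. This is only a sketch of the tie-breaker idea, not how ServiceGuard implements it: the `ClusterLock` class and `arbitrate` function are hypothetical, and the order of the list simply stands in for which node reaches the lock disk first.

```python
class ClusterLock:
    """Toy stand-in for the lock disk: the first claimant wins, later ones are denied."""

    def __init__(self):
        self.owner = None

    def try_claim(self, node):
        if self.owner is None:
            self.owner = node
        return self.owner == node


def arbitrate(nodes_in_arrival_order):
    """Map each node to its fate once the nodes can no longer see each other."""
    lock = ClusterLock()
    outcome = {}
    for node in nodes_in_arrival_order:
        if lock.try_claim(node):
            outcome[node] = "reforms cluster"
        else:
            # The loser halts so it cannot touch shared resources.
            outcome[node] = "halts (TOC)"
    return outcome


# Node names from the syslog excerpt; cadb01a is assumed to reach the disk first,
# which matches cadb02a logging "Cluster lock was denied".
outcome = arbitrate(["cadb01a", "cadb02a"])
for node, fate in outcome.items():
    print(node, "->", fate)
```

The point of the model is that exactly one node can ever win, so two halves of a split cluster never both run the packages.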
I'd suggest a couple of things. First, slap your network guys for crashing the switch. Next, make sure your heartbeat is configured properly so that the nodes can still see each other in case of a total LAN failure. Are you running a separate LAN just for the heartbeat? I like to do that, just using the built-in LAN cards plugged into a cheap hub. That way they have a fighting chance of seeing each other despite any nonsense that might be happening on the main LAN.
JP
05-16-2003 11:33 AM
Re: Network Failure caused reboot?
But now I see that the array that serves the takeover node is rebuilding. I'm confused. Could the SCSI lbolt errors have been telling me there was a disk going bad in the array, or did I get the SCSI lbolt errors because the node was trying to take over the PROD arrays and could not get a lock on them? Is there any way to tell, based on the SCSI lbolt message, what controller and disk had the problem?
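As a rough aid for reading those messages, the fields they carry can be pulled apart like this. It is a sketch under assumptions: `parse_scsi_event` is a made-up helper, and the tick rate of 100 per second for lbolt is an assumption, though it is consistent with the one-second gap between the two syslog timestamps. As quoted, the messages carry only an lbolt value and a bus number, so the text by itself names a bus but not a particular disk.

```python
import re

# Fields visible in the quoted vmunix messages: event text, lbolt, bus.
SCSI_EVENT = re.compile(
    r"SCSI: (?P<event>.+?) -- lbolt: (?P<lbolt>\d+), bus: (?P<bus>\d+)"
)

# Assumption: lbolt counts clock ticks at 100 per second.
TICKS_PER_SECOND = 100

def parse_scsi_event(line):
    """Return the event text, lbolt and bus from one vmunix SCSI line, or None."""
    match = SCSI_EVENT.search(line)
    if match is None:
        return None
    return {
        "event": match["event"],
        "lbolt": int(match["lbolt"]),
        "bus": int(match["bus"]),
    }

# Two lines copied from the syslog excerpt in the original post.
first = parse_scsi_event(
    "May 16 11:37:29 cadb02a vmunix: SCSI: Reset requested from above -- lbolt: 53062089, bus: 4"
)
second = parse_scsi_event(
    "May 16 11:37:30 cadb02a vmunix: SCSI: Resetting SCSI -- lbolt: 53062189, bus: 4"
)

ticks = second["lbolt"] - first["lbolt"]
seconds = ticks / TICKS_PER_SECOND
print(f"bus {first['bus']}: {ticks} ticks between events, about {seconds} s")
```

Both events are on bus 4 and sit 100 ticks apart, so the reset was requested and carried out within about a second of the node asking for the cluster lock.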
05-16-2003 11:50 AM
Re: Network Failure caused reboot?
http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0xf84063f96280d711abdc0090277a778c,00.html
------------------------------
Now, having said all this - it almost certainly ain't disks. Your cluster completely lost network connectivity and tried to reform. As soon as one box locked the disk, the only safe play for the remaining node was a TOC. It done good. Fix your network. When done correctly, you should be able to yank wires and not break a sweat. After you get your network robust, you need to ask yourself "Now what would happen if I yanked this here SCSI cable (or disk, or power cord)?" This is all MC/SG 101 stuff.
05-19-2003 06:11 AM