one node reboot
06-04-2003 05:42 PM
I have a two-node cluster:
2x N4000 / HP-UX 11.00
B3935DA A.11.12 MC/ServiceGuard
Yesterday at about 19:45 one of the nodes rebooted, and all the packages running on it moved to the other node.
From the OLDsyslog.log I can see that there seems to have been some problem with Samba.
What is this error and what needs to be done?
OLDsyslog.log attached.
06-04-2003 06:31 PM
Re: one node reboot
Man, that's UGLY.
I see that as a cascade failure.
The first messages indicate timeouts hinting at network trouble.
Then the first set of errors shows that Samba couldn't open its DB file, which looks for all the world like a connection problem. That's reinforced by the inability to create network sockets. Then you seem to exhaust file locks, and the game's over.
That's a classic "reboot or it ain't gonna recover" scenario, hence the system panicked.
I'd start by asking for network logs and system logs from the *other* end of those connections, because I see no errors for the local NIC. By that I mean this system could well have been the "victim" of severe trouble elsewhere, of the sort where the NIC-to-switch link never dropped.
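As a hedged aside (sar ships with HP-UX; the 5-second interval and count below are arbitrary), pressure on the kernel process, inode and file tables can be watched for with:

  # Report process/inode/file table usage and overflows every 5 s
  sar -v 5 5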
My $0.02,
Jeff
06-04-2003 06:44 PM
Re: one node reboot
Enclosed is the syslog.log of the second node.
06-04-2003 06:57 PM
Re: one node reboot
To *where* were these Samba connections? I'd bet that system, or a network device in its subnet, lunched.
I strongly advise you to also look at the ServiceGuard package logs on both systems for further clues. They are usually located in /etc/cmcluster/pkg_name.
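A minimal sketch of where to look, assuming the default package directory layout (pkg_name is a placeholder, and the control-script log name can differ per configuration):

  # List the package directory on each node
  ls -l /etc/cmcluster/pkg_name/
  # The package control script usually logs next to itself, e.g.:
  more /etc/cmcluster/pkg_name/control.sh.log
  # Cross-check against cluster daemon messages in syslog
  grep cmcld /var/adm/syslog/syslog.log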
Rgds,
Jeff
06-05-2003 12:57 AM
Re: one node reboot
Jun 4 19:30:24 ijmsia02 cmcld: Timed out node ijmsia01. It may have failed.
Jun 4 19:30:24 ijmsia02 cmcld: Attempting to form a new cluster
Jun 4 19:30:37 ijmsia02 nmbd[2331]: [2003/06/04 19:30:37, 0] nmbd/nmbd_become_lmb.c:(404)
Jun 4 19:30:37 ijmsia02 nmbd[2331]: *****
Jun 4 19:30:37 ijmsia02 nmbd[2331]:
Jun 4 19:30:37 ijmsia02 nmbd[2331]: Samba name server IJMSIAFS01 is now a local master browser for workgroup SGP.HP.COM on subnet 15.85.28.36
Jun 4 19:30:45 ijmsia02 cmcld: Obtaining Cluster Lock
Jun 4 19:30:46 ijmsia02 cmcld: Turning off safety time protection since the cluster
This is telling us that the second node was unable to communicate with the first via any of its heartbeat networks. It therefore didn't know the state of the first node, and a race for the cluster lock occurred. The second node won this race, so the first node was TOC'd.
As the others have indicated, you seem to have some kind of network issue. This may be in the network itself, or on either node. My advice would be to ensure you are bang up to date with all network-related patches on both nodes, and see if the problem persists.
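As a hedged illustration (command names from MC/ServiceGuard 11.x; your_cluster is a placeholder), the cluster state and the configured heartbeat timing can be checked like this:

  # Show cluster, node and package status on either node
  cmviewcl -v
  # Regenerate an ASCII view of the existing cluster configuration
  cmquerycl -c your_cluster -C /tmp/cluster.ascii
  # HEARTBEAT_INTERVAL and NODE_TIMEOUT are given in microseconds
  egrep 'HEARTBEAT_INTERVAL|NODE_TIMEOUT' /tmp/cluster.ascii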
HTH
Duncan
I am an HPE Employee
06-05-2003 01:29 AM
Re: one node reboot
A short-term workaround may be to increase the heartbeat timeout and heartbeat interval in the cluster configuration.
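A minimal sketch of that change, assuming MC/ServiceGuard 11.x conventions (values are in microseconds; the figures are illustrative only, and depending on the ServiceGuard version the cluster may need to be halted before the new timing takes effect):

  # Edit the cluster ASCII file produced by cmquerycl, raising e.g.:
  #   HEARTBEAT_INTERVAL  2000000    (2 s; default 1000000)
  #   NODE_TIMEOUT        6000000    (6 s; default 2000000)
  # then verify and apply it:
  cmcheckconf -C /tmp/cluster.ascii
  cmapplyconf -C /tmp/cluster.ascii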
Rgds Jarle
07-23-2003 07:27 AM
Re: one node reboot
It's not exactly clear to me whether you got the answers you needed, but as I'm just crawling out from the smoking ruins of exactly the same experience (down to the tdb nagging about failing locks), I'm more than happy to share!
Do follow the CIFS/9000 (HP's name for Samba) installation guide, and pay *SPECIAL* attention to the new requirements for kernel parameters in the newer versions! (You can find the guides that correspond to your version of CIFS/9000/Samba at http://www.docs.hp.com/hpux/netcom/index.html#CIFS/9000.) The rule of thumb seems to be something like "10 times as many 'nflocks' as users and 23 times as many 'nfiles' as users"; a worked example follows below.
I'm sorry to say that our CIFS/9000 server failed miserably even though it was well inside these boundaries and had more than 35 file locks per user at the time of the crash. It is, however, catering to software developers, which could possibly translate into "LOTS of open files at any one point in time", so maybe the figures above (the factor-of-10 part) need to be adjusted according to the load type?! (The jury's still out on that one! :-)
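As a hedged sketch of applying that rule of thumb on HP-UX 11.00 (the user count of 200 is purely illustrative, and the kernel rebuild steps below follow the common 11.00 sequence; changes only take effect after a reboot):

  # Inspect the current values
  kmtune -q nflocks
  kmtune -q nfiles
  # For, say, 200 concurrent CIFS users the rule of thumb gives:
  #   nflocks = 10 x 200 = 2000
  #   nfiles  = 23 x 200 = 4600
  kmtune -s nflocks=2000
  kmtune -s nfiles=4600
  # Rebuild the kernel, schedule it for the next boot, and reboot
  mk_kernel -s /stand/system
  kmupdate
  shutdown -ry 0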
Br.
Claus