08-22-2005 12:45 AM
strange service guard problem
We have two rp4440 servers running HP-UX 11.11 with MC/ServiceGuard version 11.14. After four months of normal operation, something strange happened.
On the first server, syslog.log shows the following:
...... skip ........
Aug 21 17:46:28 omc1scs1 cmcld: HB connection to 172.168.0.2 not responding, closing
Aug 21 17:46:28 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 21 22:14:44 omc1scs1 named[611]: zone oss.nmuni.com/IN: refresh: unexpected rcode (REFUSED) from master 172.168.0.5#8054
Aug 21 23:30:30 omc1scs1 cmcld: HB connection to 172.168.0.2 is responding
Aug 21 23:30:30 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 21 23:52:30 omc1scs1 cmcld: HB connection to 172.168.0.2 not responding, closing
Aug 21 23:52:30 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 22 01:32:30 omc1scs1 cmcld: HB connection to 172.168.0.2 is responding
Aug 22 01:32:30 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 22 01:54:30 omc1scs1 cmcld: HB connection to 172.168.0.2 not responding, closing
Aug 22 01:54:30 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 22 02:12:31 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 22 02:34:31 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 22 02:55:31 omc1scs1 named[611]: zone oss.nmuni.com/IN: refresh: unexpected rcode (REFUSED) from master 172.168.0.5#8054
Aug 22 03:34:31 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 22 03:56:31 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 22 07:36:33 omc1scs1 named[611]: zone oss.nmuni.com/IN: refresh: unexpected rcode (REFUSED) from master 172.168.0.5#8054
Aug 22 09:06:18 omc1scs1 rlogind[22425]: Login failure (exit(1) from login(1))
Aug 22 10:09:38 omc1scs1 su: + ta root-omc
Aug 22 10:22:33 omc1scs1 cmcld: HB connection to 172.168.0.2 is responding
Aug 22 10:22:33 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
On the second server, syslog.log shows the following:
....... skip .......
Aug 21 17:48:28 omc1dbsr cmcld: accept returned: No buffer space available
Aug 21 17:47:18 omc1dbsr named[611]: zone oss.nmuni.com/IN: refresh: failure trying master 172.168.0.5#8054: timed out
Aug 21 17:48:28 omc1dbsr above message repeats 3 times
Aug 21 17:48:28 omc1dbsr cmcld: Retrying accept due to a transient problem: No buffer space available.
Aug 21 17:48:28 omc1dbsr cmcld: accept returned: Resource temporarily unavailable
Aug 21 17:48:28 omc1dbsr cmcld: Retrying accept due to a transient problem: Resource temporarily unavailable.
Aug 21 17:48:28 omc1dbsr cmcld: accept failed due to a kernel problem: Resource temporarily unavailable.
Aug 21 17:48:31 omc1dbsr cmcld: accept returned: No buffer space available
Aug 21 17:48:28 omc1dbsr cmcld: Retrying accept due to a transient problem: Resource temporarily unavailable.
Aug 21 17:48:31 omc1dbsr cmcld: Retrying accept due to a transient problem: No buffer space available.
Aug 21 17:48:31 omc1dbsr cmcld: Retrying accept due to a transient problem: Resource temporarily unavailable.
Aug 21 17:48:31 omc1dbsr cmcld: accept failed due to a kernel problem: Resource temporarily unavailable.
Aug 21 17:48:31 omc1dbsr cmcld: Retrying accept due to a transient problem: Resource temporarily unavailable.
Aug 21 17:48:37 omc1dbsr cmcld: accept returned: No buffer space available
Aug 21 17:48:31 omc1dbsr cmcld: accept returned: Resource temporarily unavailable
Aug 21 17:48:37 omc1dbsr above message repeats 5 times
So the problem (Resource temporarily unavailable) recurs periodically: the connection works for 20-40 minutes and then fails again. What do these log messages mean for my system, and how can I check it? The package did not switch over, so no errors are logged under /etc/cmcluster.
Please help!
wapper
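To put numbers on the "works for a while, then fails" pattern, the cmcld state changes in the first server's syslog can be paired up and timed. This is only a rough sketch: the regular expression is written against the log lines quoted above, and the function name `up_intervals` and the `year` parameter are made up for illustration (syslog lines carry no year). For the GS lines quoted here, each "responding" period comes out at 22 minutes.

```python
import re
from datetime import datetime

# Matches cmcld connection state lines in the form seen in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+cmcld:\s+"
    r"(?P<kind>HB|GS) connection to (?P<peer>\S+) "
    r"(?P<state>is responding|not responding)"
)

def up_intervals(lines, kind="GS", year=2005):
    """Return (start, end, minutes) for each period the link was responding."""
    intervals = []
    up_since = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("kind") != kind:
            continue
        # The year is an assumption; syslog timestamps do not include one.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if m.group("state") == "is responding":
            up_since = ts
        elif up_since is not None:
            minutes = (ts - up_since).total_seconds() / 60
            intervals.append((up_since, ts, minutes))
            up_since = None
    return intervals

# A few GS lines copied from the syslog excerpt above.
SAMPLE = """\
Aug 21 23:30:30 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 21 23:52:30 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 22 01:32:30 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 22 01:54:30 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
Aug 22 02:12:31 omc1scs1 cmcld: GS connection to 172.168.0.2 is responding
Aug 22 02:34:31 omc1scs1 cmcld: GS connection to 172.168.0.2 not responding, closing
""".splitlines()

if __name__ == "__main__":
    for start, end, minutes in up_intervals(SAMPLE):
        print(f"{start:%b %d %H:%M:%S} -> {end:%H:%M:%S}: up {minutes:.0f} min")
```

A fixed interval like this may point at a timer or timeout somewhere rather than random network loss, but that is a guess, not a diagnosis.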
08-22-2005 12:51 AM
Re: strange service guard problem
How much memory is in these systems?
Can you post the output of kmtune?
Rgds...Geoff
08-22-2005 01:04 AM
Re: strange service guard problem
1. Name server: you are running named and it has some problems.
Check whether named is running properly, and whether its configuration was changed recently.
2. The heartbeat link and the SG link have problems. Name resolution for the cluster hosts should be done through /etc/hosts, which is fast, local and very easy to manage.
This problem seems to depend on problem 1.
Make use of the /etc/hosts file for the SG cluster nodes. Check what network problems were logged: netfmt -f /var/adm/nettl.LOGxx
3. No buffer space available.
Many errors can produce this message. Possible checks: swapinfo -mat, glance -m (memory utilization)
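For point 2, a quick way to confirm that every cluster node name has a local /etc/hosts entry is to parse the file and list the names that are missing. The sketch below is only an illustration: `missing_hosts` is a made-up helper, and the sample hosts text is hypothetical (the node names come from the syslog excerpt; the pairing of 172.168.0.2 with omc1dbsr is an assumption based on it).

```python
def missing_hosts(hosts_text, required_names):
    """Return the names in required_names that have no entry in hosts_text."""
    known = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        known.update(fields[1:])  # first field is the address, the rest are names
    return [name for name in required_names if name not in known]

# Hypothetical hosts content, for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1    localhost loopback
172.168.0.2  omc1dbsr   # address assumed from the syslog excerpt
"""

if __name__ == "__main__":
    print(missing_hosts(SAMPLE_HOSTS, ["omc1scs1", "omc1dbsr"]))
```

On a real node the first argument would be the contents of /etc/hosts, for example `open("/etc/hosts").read()`. This only checks that entries exist; it does not check the lookup order the system actually uses.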
08-22-2005 01:47 AM
Re: strange service guard problem
1. Both servers have 8 GB of memory.
2. The BIND version is 9.2.
3. named does have a problem, that's true! We use namesurf as the primary DNS server, and it runs in one of the SG packages. BIND is installed on each server as a secondary DNS, so all configuration files under /var/named are generated by named when it starts. But that only happens on the first server (omc1scs1). On the second server (omc1dbsr) the transfer fails because nothing is listening on port 8054 (the listening port of the package in which namesurf runs), so there are file transfer errors in syslog.log on omc1dbsr. I can't access the servers right now because I am not physically there and it is night in China ;-), but I will check the other things later. Unfortunately I have no idea why the second server can't contact that port: /etc/named.conf is the same on both servers, and nslookup works fine on omc1dbsr even though the file transfer failed...
thanks!
wapper
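One way to test the "can't contact that port" part from each node is a plain TCP connect to the master address and port shown in the named messages (172.168.0.5, port 8054). This is a minimal sketch, not a DNS test: `port_open` is a made-up helper, and it only shows whether something accepts TCP connections there, not whether a zone transfer would be allowed.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

if __name__ == "__main__":
    # Address and port taken from the named lines in the syslog excerpt.
    print(port_open("172.168.0.5", 8054))
```

Running it on both omc1scs1 and omc1dbsr would show whether the two nodes really see that port differently.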
08-22-2005 02:23 AM
Re: strange service guard problem
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=653340
That thread describes very similar symptoms. There is no really obvious answer, but several patches are recommended.
We have a very similar setup to yours and it has been fine so far, though our MC is 11.16.
08-22-2005 02:56 AM
Re: strange service guard problem
Patch mentioned    Latest version
PHSS_30028         PHSS_32260
PHNE_29473         PHNE_33395

I hope this helps.