
likid0
Honored Contributor

Service Guard Errors in Syslog

Hi,

It's an HP-UX 11.23 system with Serviceguard 11.17.

After having problems with a Fibre Channel card, we replaced it and rebooted the server. Since then we have been getting these Serviceguard errors in syslog:


Jan 5 17:41:37 j cmcld[26634]: Sending file $SGRUN/frdump.cmcld.8 (512096 bytes) to file assistant daemon.
Jan 5 17:41:37 j cmcld[26634]: Unable to set socket buffer size to 524288 bytes (No buffer space available), continuing anyway.
Jan 5 17:41:37 j cmfileassistd[12825]: Updated file /var/adm/cmcluster/frdump.cmcld.8 (length = 512096).
Jan 5 17:41:38 cmcld[26634]: Sending file $SGRUN/frdump.cmcld.9 (61719 bytes) to file assistant daemon.

It looks like it's creating a dump. In dmesg you can see:

Serviceguard Aborting!
Cause: Unable to attach to network interface
(File: netsen/hpux/os_ns_statistics.c, Line: 234)
NOTICE: Succesfully get server pgid : 2519

Serviceguard Aborting!
Cause: Unable to attach to network interface
(File: netsen/hpux/os_ns_statistics.c, Line: 234)
Serviceguard Aborting!
Cause: Unable to attach to network interface
(File: netsen/hpux/os_ns_statistics.c, Line: 234)
Serviceguard Aborting!
Cause: Unable to attach to network interface
(File: netsen/hpux/os_ns_statistics.c, Line: 234)
Serviceguard Aborting!

All my NICs look OK and are working fine:

ioscan -fnkC lan
Class  I  H/W Path        Driver  S/W State  H/W Type   Description
===========================================================================
lan    0  1/0/1/1/0/6/0   iether  CLAIMED    INTERFACE  HP AB290-60001 PCI/PCI-X 1000Base-T 2-port U320 SCSI/2-port 1000B-T Combo Adapter
lan    1  1/0/1/1/0/6/1   iether  CLAIMED    INTERFACE  HP AB290-60001 PCI/PCI-X 1000Base-T 2-port U320 SCSI/2-port 1000B-T Combo Adapter
lan    2  1/0/12/1/0/6/0  igelan  CLAIMED    INTERFACE  HP A9784-60002 PCI/PCI-X 1000Base-T FC/GigE Combo Adapter
lan    3  1/0/14/1/0/6/0  igelan  CLAIMED    INTERFACE  HP A9784-60002 PCI/PCI-X 1000Base-T FC/GigE Combo Adapter

Any ideas? Or is there more info you need?

Thanks for your help!

Windows?, no thanks
likid0
Honored Contributor

Re: Service Guard Errors in Syslog

I found in the logs that when the cluster started, it didn't like the new MAC address of the card we replaced. This explains the errors in dmesg:


Dec 28 03:38:39 jjj cmcld[8235]: Heartbeat Subnet: 10.132.4.0
Dec 28 03:38:39 jjj cmcld[8235]: Heartbeat Subnet: 10.10.10.0
Dec 28 03:38:39 jjj cmcld[8235]: The maximum # of concurrent local connections to the daemon that will be supported is 4050.
Dec 28 03:38:39 jjj cmcld[8235]: DLPI ack error for primitive 11, errno 8, unix errno 0
Dec 28 03:38:39 jjj cmcld[8235]: Unable to get DLPI attach ack from ppa 3, 16: Device busy
Dec 28 03:38:39 jjj cmcld[8235]: Aborting! Unable to attach to network interface
Dec 28 03:38:41 jjj cmlogd: Unable to initialize with Serviceguard cluster daemon (cmcld): Software caused connection abort
Dec 28 03:38:41 jjj cmsrvassistd[8238]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Dec 28 03:39:14 jjj cmclconfd[8233]: The Serviceguard daemon, /usr/lbin/cmcld[8235], died upon receiving signal number 6.
Dec 28 16:38:22 jjj cmcld[26634]: Logging level changed to level 0.
Dec 28 16:38:22 jjj cmcld[26634]: Daemon Initialization - Maximum number of packages supported for this incarnation is 150.
Dec 28 16:38:22 jjj cmcld[26634]: Global Cluster Information:
Dec 28 16:38:22 jjj cmcld[26634]: Heartbeat Interval is 2.00 seconds.
Dec 28 16:38:22 jjj cmcld[26634]: Logging level changed to level 0.
Dec 28 16:38:22 jjj cmcld[26634]: Node Timeout is 10.00 seconds.
Dec 28 16:38:22 jjj cmcld[26634]: Network Polling Interval is 2.00 seconds.
Dec 28 16:38:22 jjj cmcld[26634]: IO Timeout Extension is 0.00 seconds.
Dec 28 16:38:22 jjj cmcld[26634]: Auto Start Timeout is 600.00 seconds.
Dec 28 16:38:22 jjj cmcld[26634]: Failover Optimization is disabled.
Dec 28 16:38:22 jjj cmcld[26634]: Information Specific to node jjj:
Dec 28 16:38:22 jjj cmcld[26634]: Cluster lock disk: /dev/dsk/c26t0d7.
Dec 28 16:38:22 jjj cmcld[26634]: lan0 0x001560de0932 10.132.4.126 bridged net:1
Dec 28 16:38:22 jjj cmcld[26634]: lan3 0x0016353e32ef standby bridged net:1
Dec 28 16:38:22 jjj cmcld[26634]: lan2 0x0016353e420b 10.10.10.3 bridged net:2
Dec 28 16:38:22 jjj cmcld[26634]: Heartbeat Subnet: 10.132.4.0
Dec 28 16:38:22 jjj cmcld[26634]: Heartbeat Subnet: 10.10.10.0
Dec 28 16:38:22 jjj cmcld[26634]: The maximum # of concurrent local connections to the daemon that will be supported is 4050.
Dec 28 16:38:22 jjj cmcld[26634]: Link level address on network interface lan3 has been updated from 0x0016353e32ef to 0x0016353ec421.
Dec 28 16:38:22 jjj cmcld[26634]: Proceeding with the new configuration.
Dec 28 16:38:22 jjj cmcld[26634]: rcomm health: Initializing timeout to 142500000 microseconds
Dec 28 16:38:23 jjj cmcld[26634]: Total allocated: 35106776 bytes, used: 2039936 bytes, unused 33066832 bytes
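
To pull the two relevant events out of a long syslog (the failed DLPI attach on the first start and the MAC address update on the second), something like the following could help. It is only a rough sketch: the message patterns are copied from the cmcld lines above, the helper names `unwrap` and `scan` are made up for illustration, and it assumes that the PPA number in the attach error corresponds to the lanN instance number.

```python
import re

# Patterns copied from the cmcld messages quoted in this thread.
MAC_UPDATE = re.compile(
    r"Link level address on network interface (?P<nic>\w+) "
    r"has been updated from (?P<old>0x[0-9a-fA-F]+) to (?P<new>0x[0-9a-fA-F]+)"
)
ATTACH_FAIL = re.compile(r"Unable to get DLPI attach ack from ppa (?P<ppa>\d+)")

# Assumption: every real record starts with a 'Mon DD HH:MM:SS ' timestamp.
STAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")


def unwrap(lines):
    """Rejoin records that were hard-wrapped, treating non-timestamp lines as continuations.

    Limitation: if the wrap fell on a space, that space is lost when rejoining.
    """
    records = []
    for line in lines:
        text = line.rstrip("\n")
        if STAMP.match(text) or not records:
            records.append(text)
        else:
            records[-1] += text
    return records


def scan(lines):
    """Return (mac_changes, failed_ppas) found in the given syslog lines."""
    mac_changes, failed_ppas = [], []
    for record in unwrap(lines):
        m = MAC_UPDATE.search(record)
        if m:
            mac_changes.append((m["nic"], m["old"], m["new"]))
        m = ATTACH_FAIL.search(record)
        if m:
            failed_ppas.append(int(m["ppa"]))
    return mac_changes, failed_ppas


# Sample taken from the log above, including the mid-word wrapping.
sample = [
    "Dec 28 03:38:39 jjj cmcld[8235]: Unable to get DLPI attach ack from ppa 3, 16",
    ": Device busy",
    "Dec 28 16:38:22 jjj cmcld[26634]: Link level address on network interface lan",
    "3 has been updated from 0x0016353e32ef to 0x0016353ec421.",
]
mac_changes, failed_ppas = scan(sample)
print(mac_changes)   # MAC updates as (interface, old, new)
print(failed_ppas)   # PPAs that cmcld could not attach to
```

On the sample, both events point at the same interface (ppa 3 and lan3), which matches the card that was replaced.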

What I don't get is the buffer error:
Unable to set socket buffer size to 524288 bytes (No buffer space available), continuing anyway.
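
For reference, the wording of that message matches a common best-effort pattern: ask the OS for a bigger socket buffer, and if the request is refused, log it and carry on with the default. "No buffer space available" is the standard text for the ENOBUFS error, and 524288 bytes is 512 KiB, just above the 512096-byte dump being sent. The sketch below only illustrates that pattern in Python; it is not Serviceguard's code, and the function name `request_send_buffer` is hypothetical.

```python
import errno
import socket


def request_send_buffer(sock, size):
    """Ask the OS for a send buffer of `size` bytes; report failure instead of raising.

    Returns True if the request was accepted, False if the OS refused it.
    """
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
        return True
    except OSError as exc:
        # errno.ENOBUFS is the error whose text is "No buffer space available".
        reason = errno.errorcode.get(exc.errno, "unknown error")
        print(f"Unable to set socket buffer size to {size} bytes ({reason}), continuing anyway.")
        return False


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
accepted = request_send_buffer(sock, 524288)  # 512 KiB, the size from the log
# What the OS actually granted may differ from what was requested.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
sock.close()
print(accepted, effective)
```

Whether the request is refused depends on the operating system and its limits, so on many systems this will simply succeed or be silently capped.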

Thanks
Windows?, no thanks
Roberto Arias
Valued Contributor

Re: Service Guard Errors in Syslog

Hi Orange:

I think one of your kernel parameters is set incorrectly (judging by the "No buffer space available" message).
Please check the kernel parameter values and raise them if necessary.
Regards
The man is your friend
likid0
Honored Contributor

Re: Service Guard Errors in Syslog

I also had messages saying I had lost the cluster lock. Once I redid the zoning of the disk and got the cluster lock disk up again, all the errors went away. The buffer message came from cmfileassistd needing more space to send the dumps.

Thanks for the help
Windows?, no thanks