Re: Unexpected Server Shutdown..
06-08-2005 03:34 PM
Jun 8 08:47:31 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 9 seconds
Jun 8 08:47:31 batch cmcld[1414]: Communication to node online has been interrupted
Jun 8 08:47:31 batch cmcld[1414]: Node online may have died
Jun 8 08:47:31 batch cmcld[1414]: Attempting to form a new cluster
Jun 8 08:47:34 batch cmcld[1414]: 2 nodes have formed a new cluster, sequence #222
Jun 8 08:47:34 batch cmcld[1414]: The new active cluster membership is: online(id=1), batch(id=2)
Jun 8 08:47:43 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 4 seconds
Jun 8 08:47:43 batch cmcld[1414]: Communication to node online has been interrupted
Jun 8 08:47:43 batch cmcld[1414]: Node online may have died
Jun 8 08:47:43 batch cmcld[1414]: Attempting to form a new cluster
Jun 8 08:47:46 batch cmcld[1414]: 2 nodes have formed a new cluster, sequence #223
Jun 8 08:47:46 batch cmcld[1414]: The new active cluster membership is: online(id=1), batch(id=2)
Jun 8 08:49:06 batch vmunix:
Jun 8 08:49:06 batch vmunix: SCSI: Request Timeout -- lbolt: 234378731, dev: 1f001100
Jun 8 08:49:06 batch vmunix: lbp->state: 4020
Jun 8 08:49:06 batch vmunix: lbp->offset: 2f8
Jun 8 08:49:06 batch vmunix: lbp->uPhysScript: 480000
Jun 8 08:49:06 batch vmunix: From most recent interrupt:
Jun 8 08:49:06 batch vmunix: ISTAT: 2a, SIST0: 90, SIST1: 00, DSTAT: 80, DSPS: 01e80000
Jun 8 08:49:06 batch vmunix: NCR chip register access history (most recent last): 134862816 accesses
Jun 8 08:49:06 batch vmunix: 50, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 1694, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 914, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 896, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 0, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 6901, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 764, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 885, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 890, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 896, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 0, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 0, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 7145, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 2700, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 58, ISTAT<-20
Jun 8 08:49:06 batch vmunix: 0, ISTAT<-20
Jun 8 08:49:06 batch vmunix: lsp: 285a4e00
Jun 8 08:49:06 batch vmunix: bp->b_dev: 1f001100
Jun 8 08:49:06 batch vmunix: scb->io_id: d3c725
Jun 8 08:49:06 batch vmunix: scb->cdb: 28 00 00 72 52 90 00 00 10 00
Jun 8 08:49:06 batch vmunix: lbolt_at_timeout: 234378531, lbolt_at_start: 234375531
Jun 8 08:49:06 batch vmunix: lsp->state: 205
Jun 8 08:49:06 batch vmunix: lbp->owner: 26238280
Jun 8 08:49:06 batch vmunix: bp->b_dev: 1f001500
Jun 8 08:49:06 batch vmunix: scb->io_id: d3c726
Jun 8 08:49:06 batch vmunix: scb->cdb: 28 00 00 72 52 90 00 00 10 00
Jun 8 08:49:06 batch vmunix: lbolt_at_timeout: 234375531, lbolt_at_start: 234375531
Jun 8 08:49:06 batch vmunix: lsp->state: 10d
Jun 8 08:49:06 batch vmunix:
Jun 8 08:49:06 batch vmunix: SCSI: Request Timeout -- lbolt: 234378731, dev: 1f001500
Jun 8 08:49:06 batch vmunix: lbp->state: 4020
Jun 8 08:49:06 batch vmunix: lbp->offset: ffffffff
Jun 8 08:49:06 batch vmunix: lbp->uPhysScript: 480000
Jun 8 08:49:06 batch vmunix: From most recent interrupt:
Jun 8 08:49:06 batch vmunix: ISTAT: 22, SIST0: 04, SIST1: 00, DSTAT: 80, DSPS: 00000006
Jun 8 08:49:06 batch vmunix: NCR chip register access history (most recent last): 10 accesses
Jun 8 08:49:06 batch vmunix: 6702, ISTAT: 22
Jun 8 08:49:06 batch vmunix: 4, SIST0: 04
Jun 8 08:49:06 batch vmunix: 2, SIST1: 00
Jun 8 08:49:06 batch vmunix: 2, DSTAT: 80
Jun 8 08:49:06 batch vmunix: 1, DSPS: 00000006
Jun 8 08:49:06 batch vmunix: 1, SCRATCHA: ff000840
Jun 8 08:49:06 batch vmunix: 3, DSP: 00480058
Jun 8 08:49:06 batch vmunix: 2, SCRATCHA1<-00
Jun 8 08:49:06 batch vmunix: 2, CTEST3<-04
Jun 8 08:49:06 batch vmunix: 0, STEST3<-82
Jun 8 08:49:06 batch vmunix: lsp: 26238280
Jun 8 08:49:06 batch vmunix: bp->b_dev: 1f001500
Jun 8 08:49:06 batch vmunix: scb->io_id: d3c726
Jun 8 08:49:06 batch vmunix: scb->cdb: 28 00 00 72 52 90 00 00 10 00
Jun 8 08:49:06 batch vmunix: lbolt_at_timeout: 234375531, lbolt_at_start: 234375531
Jun 8 08:49:06 batch vmunix: lsp->state: 10d
Jun 8 08:49:06 batch vmunix: lbp->owner: 26238280
Jun 8 08:49:06 batch vmunix: scratch_lsp: 0
Jun 8 08:49:06 batch vmunix: Pre-DSP script dump [16b97030]:
Jun 8 08:49:06 batch vmunix: 78344000 0000000a 78350800 00000000
Jun 8 08:49:06 batch vmunix: 0e000004 004803c0 80000000 00000000
Jun 8 08:49:06 batch vmunix: Script dump [16b97050]:
Jun 8 08:49:06 batch vmunix: 9f0b0000 00000006 0a000000 004803c8
Jun 8 08:49:06 batch vmunix: 721a0000 00000000 c0000004 0048035c
Jun 8 08:49:06 batch vmunix:
Jun 8 08:49:06 batch vmunix: SCSI: Abort Tag -- lbolt: 234378731, dev: 1f001500, io_id: d3c726
Jun 8 08:49:06 batch vmunix: LVM: vg[2]: pvnum=13 (dev_t=0x1f001500) is POWERFAILED
Jun 8 08:49:06 batch vmunix:
Jun 8 08:49:06 batch vmunix: SCSI: Abort Tag -- lbolt: 234378731, dev: 1f001100, io_id: d3c725
Jun 8 08:49:06 batch vmunix: LVM: vg[2]: pvnum=9 (dev_t=0x1f001100) is POWERFAILED
Jun 8 08:49:06 batch vmunix: DIAGNOSTIC SYSTEM WARNING:
Jun 8 08:49:06 batch vmunix: The diagnostic logging facility has started receiving excessive
Jun 8 08:49:06 batch vmunix: errors from the I/O subsystem. I/O error entries will be lost
Jun 8 08:49:06 batch vmunix: until the cause of the excessive I/O logging is corrected.
Jun 8 08:49:06 batch vmunix: If the DEMLOG daemon is not active, use the DIAGSYSTEM command
Jun 8 08:49:06 batch vmunix: in SYSDIAG to start it.
Jun 8 08:49:06 batch vmunix: If the DEMLOG daemon is active, use the LOGTOOL utility in SYSDIAG
Jun 8 08:49:06 batch vmunix: to determine which I/O subsystem is logging excessive errors.
Jun 8 08:49:06 batch vmunix: LVM: PV 9 has been returned to vg[2].
Jun 8 08:49:06 batch vmunix: LVM: PV 13 has been returned to vg[2].
Jun 8 08:49:25 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 4 seconds
Jun 8 08:49:25 batch cmcld[1414]: Communication to node online has been interrupted
Jun 8 08:49:25 batch cmcld[1414]: Node online may have died
Jun 8 08:49:25 batch cmcld[1414]: Attempting to form a new cluster
Jun 8 08:49:28 batch vmunix: DIAGNOSTIC SYSTEM WARNING:
Jun 8 08:49:28 batch vmunix: The diagnostic logging facility is no longer receiving excessive
Jun 8 08:49:28 batch vmunix: errors from the I/O subsystem. 29 I/O error entries were lost.
Jun 8 08:49:28 batch cmcld[1414]: 2 nodes have formed a new cluster, sequence #224
Jun 8 08:49:28 batch cmcld[1414]: The new active cluster membership is: online(id=1), batch(id=2)
06-08-2005 03:40 PM
Solution
Jun 8 08:49:25 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 4 seconds
Jun 8 08:49:25 batch cmcld[1414]: Communication to node online has been interrupted
Jun 8 08:49:25 batch cmcld[1414]: Node online may have died
The cmcld daemon could not run for 4 seconds, which is a fairly long time. Because it could not run, it most likely could not communicate with the other node in the cluster. In a 2-node cluster, one machine will then TOC so that a valid cluster stays up. Which machine TOCs depends on which one takes the lock disk first (if you're using a lock disk).
Now also of concern is that you have a LOT of disk errors. This may or may not be related. It appears that at least 2 PVs were temporarily inaccessible. I would also investigate these errors.
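To work out which disks are logging the timeouts, the dev values and the CDB from the vmunix lines can be decoded by hand. Below is a minimal Python sketch of that arithmetic. It assumes the legacy HP-UX minor-number layout 0xIITL00 (card instance, target, LUN) for the sdisk driver and an lbolt tick rate of 100 per second; neither is stated in this thread, so verify both on your system (for example against ll /dev/dsk). The function names are made up for illustration.

```python
# Sketch only: assumes legacy HP-UX dev_t layout (8-bit major, minor 0xIITL00)
# and lbolt ticking at 100 Hz. Both are assumptions, not taken from the thread.

def device_name(dev_hex):
    """Turn a dev value such as '1f001100' into a cXtYdZ style name."""
    dev = int(dev_hex, 16)
    minor = dev & 0xFFFFFF            # low 24 bits hold the minor number
    instance = (minor >> 16) & 0xFF   # II: interface card instance
    target = (minor >> 12) & 0xF      # T: SCSI target
    lun = (minor >> 8) & 0xF          # L: LUN
    return "c%dt%dd%d" % (instance, target, lun)

def decode_read10(cdb_hex):
    """Return (logical block address, block count) of a SCSI READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: start block
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks

def ticks_to_seconds(ticks, hz=100):
    """Convert an lbolt difference to seconds (hz=100 is an assumption)."""
    return ticks / hz

if __name__ == "__main__":
    for dev in ("1f001100", "1f001500"):
        print(dev, "->", device_name(dev))
    print(decode_read10("28 00 00 72 52 90 00 00 10 00"))
    print(ticks_to_seconds(234378531 - 234375531))
```

Under those assumptions the two devices come out as c0t1d1 and c0t1d5, the failing request is a 16-block read at block 7492240, and the request had been outstanding for 30 seconds when it timed out.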
06-08-2005 04:16 PM
Re: Unexpected Server Shutdown..
I believe the disk I/O errors are the source of most of the problems.
Solve the disk I/O problems, restore any lost data, and try again.
SEP
06-08-2005 04:25 PM
Re: Unexpected Server Shutdown..
Re-formation of the cluster occurs under the following conditions (not a complete list):
1. An SPU or network failure was detected on an active node.
2. An inactive node wants to join the cluster. The cluster manager daemon has been started on that node.
3. A node has been added to or deleted from the cluster configuration.
4. The system administrator halted a node.
5. A node halts because of a package failure.
6. A node halts because of a service failure.
7. Heavy network traffic prohibited the heartbeat signal from being received by the cluster.
8. The heartbeat network failed, and another network is not configured to carry heartbeat.
Check the uptime on both servers to see whether one of them actually rebooted or whether this is only a heartbeat issue. From the log, the daemon becomes unable to run, but the cluster then re-forms with the same two nodes, online and batch.
Another area to look into is the cluster configuration file, specifically the values of the timeout parameters.
The package failover scripts should also be verified.
Use the commands
cmviewconf -v
cmviewcl -v
and look into /etc/cmcluster/package/packagename.log
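To tell a re-formation with unchanged membership apart from one where a node really dropped out, the cmcld messages in syslog can be summarized. The following is only a rough Python sketch: the regular expressions are written against the message wording quoted in this thread, the function names are hypothetical, and the sample mixes lines from both nodes purely for illustration.

```python
import re

# Patterns follow the cmcld messages quoted in this thread; other
# Serviceguard versions may word them differently (assumption).
FORMED = re.compile(r"(\d+) nodes have formed a new cluster, sequence #(\d+)")
MEMBERS = re.compile(r"The new active cluster membership is: (.+)$")
NODE = re.compile(r"([\w.-]+)\(id=\d+\)")

def reformations(lines):
    """Collect one record per re-formation: sequence number and members."""
    events = []
    for line in lines:
        formed = FORMED.search(line)
        if formed:
            events.append({"sequence": int(formed.group(2)),
                           "count": int(formed.group(1)),
                           "members": []})
            continue
        members = MEMBERS.search(line)
        if members and events:
            events[-1]["members"] = NODE.findall(members.group(1))
    return events

def dropped_nodes(events):
    """List (sequence, nodes) for re-formations that lost at least one node."""
    result = []
    for previous, current in zip(events, events[1:]):
        lost = sorted(set(previous["members"]) - set(current["members"]))
        if lost:
            result.append((current["sequence"], lost))
    return result

SAMPLE = [
    "Jun 8 08:47:34 batch cmcld[1414]: 2 nodes have formed a new cluster, sequence #222",
    "Jun 8 08:47:34 batch cmcld[1414]: The new active cluster membership is: online(id=1), batch(id=2)",
    "Jun 8 08:47:46 batch cmcld[1414]: 2 nodes have formed a new cluster, sequence #223",
    "Jun 8 08:47:46 batch cmcld[1414]: The new active cluster membership is: online(id=1), batch(id=2)",
    "Jun 9 05:16:51 online cmcld[192]: 1 nodes have formed a new cluster, sequence #226",
    "Jun 9 05:16:51 online cmcld[192]: The new active cluster membership is: online(id=1)",
]

if __name__ == "__main__":
    print(dropped_nodes(reformations(SAMPLE)))
```

On the sample lines, sequences #222 and #223 keep both nodes, and only #226 is reported as having lost batch.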
thanks
Devesh
06-08-2005 05:46 PM
Re: Unexpected Server Shutdown..
Jun 9 05:16:23 online cmcld[192]: Timed out node batch. It may have failed.
Jun 9 05:16:23 online cmcld[192]: Attempting to adjust cluster membership
Jun 9 05:16:26 online cmcld[192]: Obtaining Cluster Lock
Jun 9 05:16:27 online vmunix:
Jun 9 05:16:27 online vmunix: SCSI: Reset requested from above -- lbolt: 742672142, bus: 0
Jun 9 05:16:27 online cmcld[192]: Turning off safety time protection since the cluster
Jun 9 05:16:27 online cmcld[192]: now consists of a single node. If ServiceGuard
Jun 9 05:16:27 online cmcld[192]: fails, this node will not automatically halt
Jun 9 05:16:39 online vmunix:
Jun 9 05:16:39 online vmunix: SCSI: Resetting SCSI -- lbolt: 742672242, bus: 0
Jun 9 05:16:39 online vmunix:
Jun 9 05:16:39 online vmunix: SCSI: Reset detected -- lbolt: 742672242, bus: 0
Jun 9 05:16:39 online cmcld[192]: Did not receive vote from node batch in time
Jun 9 05:16:51 online cmcld[192]: 1 nodes have formed a new cluster, sequence #226
Jun 9 05:16:51 online cmcld[192]: The new active cluster membership is: online(id=1)
I remember HP saying that we should try adjusting the TOC timeout of the server (8 sec is recommended). They already ran a diagnostic check and came to this conclusion, without detecting any hardware error. I'm just wondering what causes this error; I think it may be the LAN connection or the network card (NIC). Below is the config of our cluster:
Cluster information:
cluster name: BSCS_RTX_CLUSTER
version: 0
flags: 12 (single cluster lock)
heartbeat interval: 1.00 (seconds)
node timeout: 2.00 (seconds)
heartbeat connection timeout: 4.00 (seconds)
auto start timeout: 600.00 (seconds)
network polling interval: 2.00 (seconds)
first lock vg name: /dev/vgbscs
second lock vg name: (not configured).
.....
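As a rough check of the configured node timeout against the stalls logged on batch, the "unable to run" durations can be compared with a candidate timeout value. This is only a Python sketch under the assumption that a cmcld stall longer than the node timeout is enough to start a re-formation; the function names are hypothetical, and 8.0 is the value mentioned as HP's recommendation above.

```python
import re

# Pattern follows the cmcld warning quoted earlier in this thread.
STALL = re.compile(r"cmcld process was unable to run for the last (\d+) seconds")

def stall_seconds(lines):
    """Extract the stall durations (in seconds) reported by cmcld."""
    durations = []
    for line in lines:
        match = STALL.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations

def exceeding(durations, node_timeout):
    """Return the stalls that are longer than the given node timeout."""
    return [d for d in durations if d > node_timeout]

SAMPLE = [
    "Jun 8 08:47:31 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 9 seconds",
    "Jun 8 08:47:43 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 4 seconds",
    "Jun 8 08:49:25 batch cmcld[1414]: Warning: cmcld process was unable to run for the last 4 seconds",
]

if __name__ == "__main__":
    found = stall_seconds(SAMPLE)
    for timeout in (2.0, 8.0):   # current setting, then the suggested one
        print(timeout, exceeding(found, timeout))
```

With the current 2.00 second node timeout all three logged stalls are longer than the timeout; with 8 seconds only the 9 second stall would be. A longer timeout would therefore hide the shorter stalls but would not explain why cmcld could not run in the first place.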
If this configuration change is needed, do you know where we can configure the setting? Is it just a config file, or is an application needed? And is there any impact on our system, DB, and application if we miscalculate the configuration?
Thanks & regards
06-08-2005 06:16 PM
Re: Unexpected Server Shutdown..
These values can be found in the file /etc/cmcluster/clustername.
To make changes, first shut down the cluster with
cmhaltcl -f
After making the changes, check the configuration with
cmcheckconf -k -v -C /etc/cmcluster/clustername
Verify that there are no errors, then apply the configuration:
cmapplyconf -k -v -C /etc/cmcluster/clustername
Bring the cluster back up after that with
cmruncl
Also check the status of the cluster with cmviewcl and cmviewcl -v.