Operating System - HP-UX

One server would not come up when other is down in a cluster

 
Anish Nayyar
Advisor

One server would not come up when other is down in a cluster

I have two machines, say A and B, configured as a Serviceguard HA cluster. NFS mounts are in place and Oracle RAC is installed on both. When server B is shut down and server A is rebooted, NFS never comes back up on server A. Please tell me how to get NFS back on server A while B is down.
14 REPLIES

Re: One server would not come up when other is down in a cluster

Hmmm, your post is a bit like going to a mechanic and saying 'I drove my car and crashed it - now it won't start - can you tell me how to fix it?'

Point being: not enough information...

Is the cluster actually up on server A?

What does 'cmviewcl -v' show you on serverA?

HTH

Duncan

I am an HPE Employee
Siju Vadakkan
Trusted Contributor

Re: One server would not come up when other is down in a cluster

Provide the following details:

1. cmviewcl -v
2. cfscluster status
3. more /etc/exports
4. syslog
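On HP-UX the four items above can be gathered in one pass; a minimal sketch, assuming the default syslog location /var/adm/syslog/syslog.log (adjust the path if your system logs elsewhere):

```shell
# Collect the requested diagnostics into one file for posting
OUT=/tmp/cluster_diag.txt
cmviewcl -v        > "$OUT" 2>&1
cfscluster status >> "$OUT" 2>&1
cat /etc/exports  >> "$OUT" 2>&1
tail -200 /var/adm/syslog/syslog.log >> "$OUT" 2>&1
```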
Anish Nayyar
Advisor

Re: One server would not come up when other is down in a cluster

1. cmviewcl -v

The command hangs

2. cfscluster status
Since cmviewcl hangs with no output, I could not run this either.


3. bash-2.05b# more /etc/exports
/software_mount -root=lcsnew3:nfsPkg:lcsnew1
/data_mount -root=lcsnew3:nfsPkg:lcsnew1
/opt/gmlc/logs -root=lcsnew3,root=lcsnew3

4. This is the tail of syslog:

Mar 17 11:52:24 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:51:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 11:52:25 lcsnew3 above message repeats 11 times
Mar 17 11:52:24 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 11:52:25 lcsnew3 above message repeats 4 times
Mar 17 11:52:25 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 11:52:25 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:52:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 13:55:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 11:56:28 lcsnew3 above message repeats 11 times
Mar 17 11:56:28 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 11:56:28 lcsnew3 above message repeats 24 times
Mar 17 11:56:28 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:56:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 14:00:00 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 12:00:30 lcsnew3 above message repeats 11 times
Mar 17 12:00:30 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 12:00:30 lcsnew3 above message repeats 26 times
Mar 17 12:00:30 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 12:00:37 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 12:00:40 lcsnew3 above message repeats 13 times
Mar 17 12:00:41 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 14:00:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 14:04:00 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 12:04:31 lcsnew3 above message repeats 11 times
Mar 17 12:04:31 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 12:04:31 lcsnew3 above message repeats 9 times
Mar 17 12:04:35 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 14:04:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 14:05:00 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Mar 17 12:05:00 lcsnew3 above message repeats 2 times
Mar 17 12:05:00 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 12:05:00 lcsnew3 above message repeats 4 times
Mar 17 14:05:59 lcsnew3 syslog: Oracle Cluster Ready Services waiting for HP-UX Service Guard to start.
Siju Vadakkan
Trusted Contributor

Re: One server would not come up when other is down in a cluster

Provide the output of:

# ps -ef | grep -i cmcld

If it is not running, execute:

# cmrunnode -n <nodename>

since you said only one node is up.

Then post cmviewcl -v and the syslog again.
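Siju's check can be sketched as a small shell snippet; daemon_running is a hypothetical helper and <nodename> is a placeholder for your own node name:

```shell
# Return 0 if a process whose listing matches $1 is running.
# (grep -v grep drops the grep process itself from the candidates)
daemon_running() {
    ps -ef | grep -i "$1" | grep -v grep > /dev/null
}

if daemon_running cmcld; then
    echo "cmcld is up - the cluster daemon is running on this node"
else
    echo "cmcld is not running - try: cmrunnode -n <nodename>"
fi
```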
Fabio Ettore
Honored Contributor

Re: One server would not come up when other is down in a cluster

Hi Anish,

that is normal and expected Serviceguard behaviour. When an SG node comes up, it expects cluster services to be active on at least one node already; if they are not, the node waits for AUTO_START_TIMEOUT (defined in the cluster ASCII configuration file) and then fails to start cluster services.
You have to run

cmruncl -v -f -n <nodename>

to start the SG cluster.
Once the cluster is running, other nodes will join the already active cluster when they reboot.

One more detail: this behaviour occurs because the SG rc script calls cmrunnode. From the cmrunnode man page:

DESCRIPTION
cmrunnode causes a node to start its cluster daemon to join the
existing cluster. This command verifies the network configuration
before causing the node to start its cluster daemon.

So cmrunnode needs an existing cluster to join; it cannot form one by itself.
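The recovery sequence described above, sketched as HP-UX Serviceguard commands (<nodename> stands for the surviving node; this only runs on a Serviceguard installation):

```shell
# 1. Force-start the cluster on the one surviving node:
cmruncl -v -f -n <nodename>

# 2. Confirm the cluster and packages are up:
cmviewcl -v

# 3. When the other node later reboots, its rc script runs cmrunnode,
#    finds this active cluster, and joins it automatically.
```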

I hope that helps you, let me know if something is not clear.

Best regards,
Fabio
WISH? IMPROVEMENT!
Anish Nayyar
Advisor

Re: One server would not come up when other is down in a cluster

Hi Fabio,
The command brought the cluster up, but the nfsPkg package is still down, and a volume group is also down.
Fabio Ettore
Honored Contributor

Re: One server would not come up when other is down in a cluster

Hi Anish,

so the response answered the original question of this thread; I'm glad for that.
If you need more help, you should elaborate a little on "the nfsPkg package is still down, and a volume group is also down."

Some questions to start with:

- Could you describe the configuration in more detail?
- How many NFS filesystems do you expect the NFS package to start?
- Do all of them currently fail to start, or just one?
- Is the volume group that is down related to the NFS package, or does it belong to another package and is therefore a separate issue?
- I suppose there are errors in syslog.log and the package log files when the NFS package comes up - which errors?

Thanks for providing more info about your problem.

Best regards,
Fabio
WISH? IMPROVEMENT!
Anish Nayyar
Advisor

Re: One server would not come up when other is down in a cluster

Hi Fabio,

I am pasting the output of some commands; I hope it gives you the information you need.

bash-2.05b# vgdisplay
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 8
Open LV 8
Max PV 16
Cur PV 1
Act PV 1
Max PE per PV 4356
VGDA 2
PE Size (Mbytes) 32
Total PE 4346
Alloc PE 2026
Free PE 2320
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

VG Name /dev/vg_rac
VG Write Access read/write
VG Status available, shared, server
Max LV 255
Cur LV 28
Open LV 28
Max PV 16
Cur PV 2
Act PV 2
Max PE per PV 18728
VGDA 4
PE Size (Mbytes) 4
Total PE 18749
Alloc PE 18156
Free PE 593
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vg01".
bash-2.05b#


bash-2.05b#
bash-2.05b# cmviewcl

CLUSTER STATUS
gmlcCluster up

NODE STATUS STATE
lcsnew3 up running

PACKAGE STATUS STATE AUTO_RUN NODE
vg_activate_pkg up running enabled lcsnew3

NODE STATUS STATE
lcsnew1 down unknown

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
nfsPkg down halted disabled unowned
vg_activate_pkg_remote down halted enabled unowned
bash-2.05b#

Tail of syslog:

Mar 17 13:01:13 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:01:13 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:01:13 lcsnew3 above message repeats 2 times
Mar 17 13:01:13 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:05:31 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:05:31 lcsnew3 above message repeats 23 times
Mar 17 13:05:35 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:06:00 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:06:00 lcsnew3 above message repeats 8 times
Mar 17 13:06:00 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:09:38 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:09:38 lcsnew3 above message repeats 3 times
Mar 17 13:09:41 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:10:00 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:10:00 lcsnew3 above message repeats 8 times
Mar 17 13:10:04 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:12:20 lcsnew3 CM-CMD[25886]: cmrunpkg nfsPkg
Mar 17 13:12:33 lcsnew3 CM-CMD[25886]: Request from root on node lcsnew3 to start package nfsPkg
Mar 17 13:10:08 lcsnew3 automountd[1135]: server nfsPkg not responding
Mar 17 13:12:33 lcsnew3 above message repeats 7 times
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Request from root on node lcsnew3 to start package nfsPkg
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Request from node lcsnew3 to start package nfsPkg on node lcsnew3.
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Executing '/etc/cmcluster/nfsPkg/nfsPkg.cntl start' for package nfsPkg, as service PKG*6914.
Mar 17 13:12:33 lcsnew3 LVM[26239]: vgchange -a e vg01
Mar 17 13:12:33 lcsnew3 LVM[26298]: vgchange -a n vg01
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Service PKG*6914 terminated due to an exit(1).
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Package nfsPkg run script exited with NO_RESTART.
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Examine the file /etc/cmcluster/nfsPkg/nfsPkg.cntl.log for more details.
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Switching disabled on package nfsPkg.
Mar 17 13:12:33 lcsnew3 cmcld[27150]: Unable to start package nfsPkg. Node lcsnew3 is not able to run it.
Mar 17 13:12:33 lcsnew3 CM-CMD[25886]: Request from root on node lcsnew3 to start package nfsPkg failed

Fabio Ettore
Honored Contributor

Re: One server would not come up when other is down in a cluster

Hi,

what you posted answers part of what I asked. Now we know vg01 belongs to the NFS package, so the first point is settled: vg01 is not active, and that is normal and expected when the NFS package cannot start. So the first and most important question is:

why can't the NFS package start?

We can't tell yet, but syslog gives a good direction for the investigation:

Mar 17 13:12:33 lcsnew3 cmcld[27150]: Examine the file /etc/cmcluster/nfsPkg/nfsPkg.cntl.log for more details.

What is in the package log file?
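A quick way to act on that syslog hint (the path comes straight from the cmcld message):

```shell
# Show the most recent package start attempt in the control log
tail -50 /etc/cmcluster/nfsPkg/nfsPkg.cntl.log

# Mount/fsck/vgchange errors appear between the "Starting package"
# and "Package start failed" banner lines.
```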

Best regards,
Fabio
WISH? IMPROVEMENT!
Anish Nayyar
Advisor

Re: One server would not come up when other is down in a cluster

hi,
vg01 can be brought up with "vgchange -a e /dev/vg01", but when I try to run the package with "cmrunpkg nfsPkg", it deactivates vg01 again. The logs in the cntl.log file are:

########### Node "lcsnew3": Package start failed at Mon Mar 17 12:56:50 MST 2008 ###########

########### Node "lcsnew3": Starting package at Mon Mar 17 13:12:33 MST 2008 ###########
Mar 17 13:12:33 - Node "lcsnew3": Activating volume group vg01 with exclusive option.
Activated volume group in Exclusive Mode.
Volume group "vg01" has been successfully changed.
Mar 17 13:12:33 - Node "lcsnew3": Checking filesystems:
/dev/vg01/softwarelvol
/dev/vg01/datalvol
/dev/vg01/rsoftwarelvol:file system is clean - log replay is not required
/dev/vg01/rdatalvol:file system is clean - log replay is not required
Mar 17 13:12:33 - Node "lcsnew3": Mounting /dev/vg01/softwarelvol at /software_mount
vxfs mount: /dev/vg01/softwarelvol is already mounted, /software_mount is busy,
allowable number of mount points exceeded
ERROR: Function check_and_mount
ERROR: Failed to mount /dev/vg01/softwarelvol
Mar 17 13:12:33 - Node "lcsnew3": Deactivating volume group vg01
Deactivated volume group in Exclusive Mode.
Volume group "vg01" has been successfully changed.

########### Node "lcsnew3": Package start failed at Mon Mar 17 13:12:33 MST 2008 ###########

########### Node "lcsnew3": Starting package at Mon Mar 17 14:12:58 MST 2008 ###########
Mar 17 14:12:58 - Node "lcsnew3": Activating volume group vg01 with exclusive option.
Volume group "vg01" has been successfully changed.
Mar 17 14:12:58 - Node "lcsnew3": Checking filesystems:
/dev/vg01/softwarelvol
/dev/vg01/datalvol
/dev/vg01/rsoftwarelvol:file system is clean - log replay is not required
/dev/vg01/rdatalvol:file system is clean - log replay is not required
Mar 17 14:12:58 - Node "lcsnew3": Mounting /dev/vg01/softwarelvol at /software_mount
Mar 17 14:12:59 - Node "lcsnew3": Mounting /dev/vg01/datalvol at /data_mount
vxfs mount: /dev/vg01/datalvol is already mounted, /data_mount is busy,
allowable number of mount points exceeded
ERROR: Function check_and_mount
ERROR: Failed to mount /dev/vg01/datalvol
Mar 17 14:12:59 - Node "lcsnew3": Unmounting filesystem on /dev/vg01/softwarelvol
Mar 17 14:12:59 - Node "lcsnew3": Deactivating volume group vg01
Deactivated volume group in Exclusive Mode.
Volume group "vg01" has been successfully changed.

########### Node "lcsnew3": Package start failed at Mon Mar 17 14:12:59 MST 2008 ###########
bash-2.05b#
Siju Vadakkan
Trusted Contributor

Re: One server would not come up when other is down in a cluster

Sorry for the previous update - I meant to suggest the cmruncl command :)

Siju
Siju Vadakkan
Trusted Contributor

Re: One server would not come up when other is down in a cluster

It seems the mount point "/data_mount" is busy, meaning it is already listed in mnttab.

# umount -f /data_mount

Also do a forceful unmount of all the mount points belonging to vg01, then:

# rm /etc/mnttab
# mount -a    # to recreate /etc/mnttab
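Siju's cleanup, sketched for the two vg01 mount points named earlier in the thread. Note this is destructive: fuser -k kills any process using the mount, so run it only when you are sure nothing legitimate is using them:

```shell
# Force-unmount the vg01 filesystems so the package can mount them
for mp in /software_mount /data_mount; do
    fuser -ku "$mp"    # list and kill any processes using the mount
    umount -f "$mp"    # forceful unmount
done

# Rebuild the mount table (HP-UX keeps /etc/mnttab as a plain file)
rm /etc/mnttab
mount -a
```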




Fabio Ettore
Honored Contributor

Re: One server would not come up when other is down in a cluster

Hi,

so you have one more piece of information:

/dev/vg01/softwarelvol
/dev/vg01/datalvol
/dev/vg01/rsoftwarelvol:file system is clean - log replay is not required
/dev/vg01/rdatalvol:file system is clean - log replay is not required

These are the filesystems the package mounts at startup, but for some reason a couple of them are already mounted.

What to do now:

- umount all filesystems of vg01 (to find open files and the processes running on those filesystems, use fuser and lsof - lsof can be downloaded from <>);

- check with bdf that those filesystems are unmounted;

- start the package.
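The three steps above, sketched against the two vg01 filesystems (HP-UX commands; fuser -c reports the users of a mounted file system, bdf is the HP-UX disk-free report):

```shell
fuser -cu /software_mount /data_mount   # see who still has files open
umount /software_mount /data_mount      # unmount them (add -f if busy)
bdf | grep vg01                         # should print nothing now
cmrunpkg nfsPkg                         # then retry the package start
```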

Best regards,
Fabio
WISH? IMPROVEMENT!
Anish Nayyar
Advisor

Re: One server would not come up when other is down in a cluster

The server came up, but the reboot took a long time: it hung at "starting NFS server subsystem" for a while before moving on. Can I reduce this time?