Operating System - HP-UX

Walker_3
Frequent Advisor

package failover failed

Hi all,

I am new to cluster package configuration and am facing a package failover problem. It is an Oracle package that runs on the primary node but fails to run on the secondary node. I am attaching all my configuration files. I think something is wrong with the script in the cntl file. My requirement is that Oracle keeps running after the package fails over. At the moment the package and Oracle run only on the primary node and cannot fail over. Please help.

Rgds,
Walker
Steven E. Protter
Exalted Contributor

Re: package failover failed

Shalom Walker,

What do the log files say?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ninad_1
Honored Contributor

Re: package failover failed

It seems you have multiple problems. The LAN is continuously fluctuating: lan1 keeps failing and recovering. Please check the connectivity of the lan1 interface and resolve this problem first.
It also seems you had a bad disk, c4t9d0; probably the disk was replaced, you ran vgcfgrestore, and you resynchronized the mirrors with vgsync. But until you solve the network problem, it will keep causing the package to fail while starting. Please fix the LAN failure first and then try starting the package.
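For reference, the disk-replacement sequence mentioned above usually consists of restoring the LVM configuration onto the new disk, activating the volume group again so LVM reattaches the disk, and resynchronizing the mirrors. The sketch below only prints those commands (a dry run); the function name and the volume group /dev/vg01 are hypothetical, and using activation mode e (exclusive) is an assumption based on how Serviceguard packages normally activate their volume groups. Check the steps against the HP-UX LVM manual pages before running anything.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the usual HP-UX LVM steps for bringing
# a replaced mirror disk back into its volume group.
# Arguments: volume group, raw device of the replaced disk, activation mode.
print_pv_recovery_steps() {
    vg="$1"               # e.g. /dev/vg01 (hypothetical name)
    rawpv="$2"            # e.g. /dev/rdsk/c4t9d0
    mode="${3:-y}"        # y = normal activation, e = exclusive (cluster) activation
    # 1. Write the saved LVM configuration back onto the new disk.
    echo "vgcfgrestore -n $vg $rawpv"
    # 2. Activate the volume group again so LVM reattaches the restored disk.
    echo "vgchange -a $mode $vg"
    # 3. Resynchronize the stale mirror copies.
    echo "vgsync $vg"
}

# Example: VG activated exclusively by a Serviceguard package (assumption).
print_pv_recovery_steps /dev/vg01 /dev/rdsk/c4t9d0 e
```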

Regards,
Ninad
Albert_31
Trusted Contributor

Re: package failover failed

Hello Walker,

Do you still need help resolving this issue? If so, please let me know the following:

a) The primary and secondary servers
b) Whether it is a new or an existing setup
c) The package logs from both servers; their location is

/etc/cmcluster//.log

Reply to albert_pereira@yahoo.com as well.

regards

albert
Chauhan Amit
Respected Contributor

Re: package failover failed

Hi Walker,

I checked the logs:

a) Package Logs -

vgchange: Warning: couldn't query physical volume "/dev/dsk/c4t9d0":
The specified path does not correspond to physical volume attached to
this volume group

b) Syslog -

May 20 18:07:31 sbnrev01 cmcld: lan1 failed
May 20 18:09:13 sbnrev01 cmcld: lan1 recovered
May 20 18:10:15 sbnrev01 cmcld: lan1 failed
May 20 18:12:56 sbnrev01 vmunix: SCSI: isrEscape Controller at 0/4/1/0.
May 20 18:12:56 sbnrev01 vmunix:
May 20 18:11:58 sbnrev01 cmcld: lan1 recovered
May 20 18:12:56 sbnrev01 vmunix: SCSI: First party detected bus hang (HTH) -- lbolt: 45529, dev: cb049002

May 20 18:12:57 sbnrev01 vmunix: SCSI: Resetting SCSI -- lbolt: 45629, bus: 4 path: 0/4/1/0
May 20 18:12:57 sbnrev01 vmunix: SCSI: Reset detected -- lbolt: 45629, bus: 4 path: 0/4/1/0
May 20 18:12:57 sbnrev01 EMS [1932]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/0_4_1_0.9.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 126615579 -r /storage/events/disks/default/0_4_1_0.9.0 -n 126615553 -a
May 20 18:12:57 sbnrev01 vmunix:
May 20 18:13:00 sbnrev01 cmcld: lan1 failed
May 20 18:13:00 sbnrev01 vmunix: LVM: vg[1]: pvnum=3 (dev_t=0x1f049000) is POWERFAILED
May 20 18:13:00 sbnrev01 EMS [1932]: ------ EMS Event Notification ------ Value: "MAJORWARNING (3)" for Resource: "/storage/events/disks/default/0_4_1_0.1.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 126615569 -r /storage/events/disks/default/0_4_1_0.1.0 -n 126615554 -a

Two issues are visible in the logs above:

1) SCSI: First party detected bus hang, caused by device c4t9d0, which needs to be checked.

2) The "lan1 failed" message appears multiple times.
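To see how often the interface is flapping, the cmcld messages in syslog can be counted. A minimal sketch, assuming the syslog line layout shown in the excerpt above (month, day, time, host, then "cmcld: lan1 failed" or "cmcld: lan1 recovered"); the function name count_lan_events is made up for this example.

```shell
#!/bin/sh
# Sketch: count cmcld "failed" and "recovered" messages for one interface.
# Reads syslog-format lines on stdin; the interface name defaults to lan1.
# Expected fields: $5 = "cmcld:", $6 = interface, $7 = failed|recovered.
count_lan_events() {
    iface="${1:-lan1}"
    awk -v ifc="$iface" '
        $5 == "cmcld:" && $6 == ifc && $7 == "failed"    { failed++ }
        $5 == "cmcld:" && $6 == ifc && $7 == "recovered" { recovered++ }
        END { printf "%s failed=%d recovered=%d\n", ifc, failed, recovered }
    '
}
```

For example: count_lan_events lan1 < /var/adm/syslog/syslog.log. A high count over a short period points to a cabling, switch port, or NIC problem rather than a package configuration problem.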

The following logs need to be checked before drawing any conclusion:

a) /var/opt/resmon/log/event.log, for hardware errors

b) netfmt -f /var/adm/nettl.LOG000 > /var/net.log, for network-related errors
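Because event.log and syslog.log can be long, it may help to print only the lines that mention the suspect disk. This is a minimal line-based sketch (entries spanning several lines are matched only on the line that contains the search string); the function name disk_events is hypothetical, and the search strings in the usage note (c4t9d0, the controller path 0/4/1/0, and POWERFAILED) are taken from the log excerpt above.

```shell
#!/bin/sh
# Sketch: print every line on stdin that contains any of the given strings.
# Strings are matched literally with index(); they must not contain spaces or
# backslashes, because they are passed to awk as one space-separated -v value.
disk_events() {
    awk -v pats="$*" '
        BEGIN { n = split(pats, p, " ") }
        {
            for (i = 1; i <= n; i++)
                if (index($0, p[i]) > 0) { print; next }
        }'
}
```

For example: disk_events c4t9d0 0/4/1/0 POWERFAILED < /var/adm/syslog/syslog.log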


-Amit

For Albert:
Happy Foruming :)
If you are not part of the solution, then you are part of the problem
Sosel Somaskanthan
Occasional Contributor

Re: package failover failed

Walker,
The Oracle start and stop commands have been commented out in your package control (cntl) file, as shown below:

# START OF CUSTOMER DEFINED FUNCTIONS

# This function is a place holder for customer define functions.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.

function customer_defined_run_cmds
{
# ADD customer defined run commands.
#/etc/cmcluster/sbnoradb/toolkit.sh start
test_return 51
}

# This function is a place holder for customer define functions.
# You should define all actions you want to happen here, before the service is
# halted.

function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
#/etc/cmcluster/sbnoradb/toolkit.sh stop
test_return 52
}

Try uncommenting these two lines, then halt the package, restart it, and check whether it stops Oracle on the primary node.
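With the two toolkit lines uncommented, the customer-defined section would look like the excerpt below. The toolkit path is the one already in your file; test_return is the helper defined earlier in the standard control-file template (it is not part of this excerpt), which, as far as I know, reports an error when the preceding command returns a non-zero status. The edited cntl file must also be present on the secondary node, since that node runs its own copy of the script when the package fails over.

```shell
# Customer-defined section of the package control (cntl) file with the Oracle
# toolkit calls re-enabled. test_return comes from the standard template and
# is not defined in this excerpt.

function customer_defined_run_cmds
{
# ADD customer defined run commands.
# Start Oracle through the toolkit when the package starts on this node.
/etc/cmcluster/sbnoradb/toolkit.sh start
test_return 51
}

function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
# Stop Oracle through the toolkit when the package halts on this node.
/etc/cmcluster/sbnoradb/toolkit.sh stop
test_return 52
}
```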