Unable to start second database/node

Saul O. Reyes · ‎07-06-2001

The cluster went down. After the reboot I attempted to start the cluster with the cmruncl, cmrunpkg, etc. commands. All commands executed successfully, but whichever database/node I start second will not mount.

John Poff · ‎07-06-2001

Hello Saul,

What error messages are you getting when you try to start the second package?

JP

Saul O. Reyes · ‎07-06-2001

After I run cmrunpkg for pkgarch2 and pkg35, all indications are that it succeeded. When I attempt to log in to the database via sqlplus, I get "ORA-01033 initialization or shutdown in progress." If I then manually perform a "startup shared", I get an error that the database cannot be mounted.

From the syslog.log I see "cmcld: WARNING: cluster lock on disk /dev/dsk/c4t0d1 is missing!" and "cmcld: Service PKG*17409 terminated due to an exit(0)"

John Poff · ‎07-06-2001

Saul,

Try running 'ioscan -fnC disk' and see if your machine can still see the /dev/dsk/c4t0d1. Maybe that disk has lost its way. When you say your cluster went down, what happened? Did you have a system crash or a power outage? Maybe if you are using a shared disk array, something might have happened to your disk storage also? It looks like your cluster is configured to use that disk as a cluster lock disk, and the node can't get access to that disk any more.

JP

Saul O. Reyes · ‎07-06-2001

Node 1 had a fan go bad, which shut that node down. I would not expect Node 2 to go down, but it did. An HP tech came by and fixed the fan. This problem appeared after Node 1 came back up.

John Poff · ‎07-06-2001

Saul,

Try checking the log files for each package. Look in /etc/cmcluster and see if you have pkgarch.ctl.log and pkg35.ctl.log, or something similar. Those log files should give you some clue about what the package is choking on when it tries to start.
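If the control logs are long, something like the following can pull out just the Oracle error codes. It is only a sketch: `ora_codes` is a made-up helper name, and it reports at most one code per log line (the last one on that line).

```shell
# ora_codes FILE: print each distinct ORA-nnnnn code found in FILE,
# one per line, sorted. Hypothetical helper; uses only sed and sort.
ora_codes() {
    sed -n 's/.*\(ORA-[0-9][0-9]*\).*/\1/p' "$1" | sort -u
}
```

You would call it as `ora_codes <path to the package control log>`, then look each code up with `oerr ora <number>`.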

What did your ioscan report? Is the /dev/dsk/c4t0d1 disk in a shared array?

JP

Saul O. Reyes · ‎07-06-2001

Both nodes see /dev/dsk/c4t0d1. The only difference is that Node 1 sees it as "disk 6" and Node 2 sees it as "disk 5" in the output of "ioscan -fnC disk"

John Poff · ‎07-06-2001

The numbers after "disk" are the instance numbers. They can be different on each box; that should be fine. Did they both show up as "CLAIMED" under the S/W State column?
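
If the listing is long, a small filter can pick out anything that is not CLAIMED. This is only a sketch: `unclaimed_disks` is a made-up helper name, and it assumes the usual `ioscan -f` column order (Class, I, H/W Path, Driver, S/W State, ...).

```shell
# unclaimed_disks: read `ioscan -fnC disk` output on stdin and print the
# hardware path of every disk whose S/W State (5th column) is not CLAIMED.
# Header, separator and device-file lines are skipped because their first
# field is not "disk".
unclaimed_disks() {
    awk '$1 == "disk" && $5 != "CLAIMED" { print $3 }'
}
```

Run it as `ioscan -fnC disk | unclaimed_disks` on each node; no output means every disk is claimed.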

JP

Saul O. Reyes · ‎07-06-2001

./pkg35/control.sh.log reveals the same as I stated earlier. Database is unable to mount. The exact error is "ORA-01183 cannot mount database in SHARED mode"

./pkgarch2/control.sh.log says everything succeeded.

Saul O. Reyes · ‎07-06-2001

They both show as CLAIMED

John Poff · ‎07-06-2001

Does 'bdf' show all your filesystems mounted up correctly? Does 'vgdisplay' on your volume group(s) show the same number of "Cur LV" and "Open LV", and "Cur PV" and "Act PV"?
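
The vgdisplay comparison can be scripted roughly like this. Treat it as a sketch: `vg_mismatch` is a hypothetical name, and it assumes vgdisplay prints "Cur LV" before "Open LV" and "Cur PV" before "Act PV" for each volume group.

```shell
# vg_mismatch: read `vgdisplay` output on stdin and print one line for every
# volume group where Cur LV != Open LV or Cur PV != Act PV.
vg_mismatch() {
    awk '
        /^VG Name/ { vg = $3 }
        /^Cur LV/  { cur_lv = $3 }
        /^Open LV/ { if ($3 != cur_lv) print vg ": Cur LV " cur_lv ", Open LV " $3 }
        /^Cur PV/  { cur_pv = $3 }
        /^Act PV/  { if ($3 != cur_pv) print vg ": Cur PV " cur_pv ", Act PV " $3 }
    '
}
```

Use it as `vgdisplay | vg_mismatch`; an empty result means the counts agree.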

I'm not an Oracle guru, so I can't help much with the error message. Maybe one of the other local wizards can jump in and help with that part.

JP

Saul O. Reyes · ‎07-06-2001

Numbers are the same for the vgdisplay outputs

Saul O. Reyes · ‎07-06-2001

John,

Given the pkgs I listed above, is the following syntax correct for cmcheckconf:

cmcheckconf -v -k -C cmclconf.ascii -P pkg35.ascii

Pedro Sousa · ‎07-06-2001

Hi Saul.

With oerr you can get some details on that error - ORA-01183:
01183 "cannot mount database in SHARED mode"
// *Cause: Some other instance has the database mounted exclusive.
// *Action: Shutdown other instance then mount shared.

So maybe you should shut down the other Oracle instance and then start both in shared mode.

Before you do this, though, you should understand what caused the problem.

Which ServiceGuard product are you using? MC/SG for OPS?

I think you'll need to replace the lock disk. But first, check whether you can read every block from that disk by running: "dd if=/dev/dsk/c4t0d1 of=/dev/null bs=64"
After some time, if you get a message like this:
96614+0 records in
96614+0 records out
everything seems fine. Otherwise, call HW support.
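
If you would rather have the result as an exit status than read the record counts, a wrapper along these lines should do. It is a sketch only: `read_test` is a made-up name, and the 64k block size is my assumption, chosen just to make the pass faster.

```shell
# read_test DEVICE: read DEVICE from start to end and discard the data.
# Returns dd's exit status: 0 if the whole device could be read,
# non-zero if dd hit an error. dd's record counts still go to stderr.
read_test() {
    dd if="$1" of=/dev/null bs=64k
}
```

For example: `read_test /dev/dsk/c4t0d1 && echo "lock disk readable" || echo "read errors, call HW support"`.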

Your cmcheckconf syntax should be like:
cmcheckconf -v -C /etc/cmcluster/cmclconf.ascii -P /etc/cmcluster/pkg35/pkg35.conf

Do this on the primary node of the cluster, and specify every package with its own "-P" option.
You should never use the "-k" option because it skips some of the disk probing.
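
With several packages, the command line can be assembled like this. A sketch under assumptions: `cmcheckconf_cmd` is a hypothetical helper, and it only prints the command so you can review it before running it.

```shell
# cmcheckconf_cmd CLUSTER_FILE PKG_FILE...: print a cmcheckconf command line
# with one -P option per package configuration file. Nothing is executed.
cmcheckconf_cmd() {
    cluster_file=$1
    shift
    cmd="cmcheckconf -v -C $cluster_file"
    for pkg in "$@"; do
        cmd="$cmd -P $pkg"
    done
    printf '%s\n' "$cmd"
}
```

For your two packages that would be `cmcheckconf_cmd /etc/cmcluster/cmclconf.ascii /etc/cmcluster/pkg35/pkg35.conf /etc/cmcluster/pkgarch2/pkgarch2.conf`, where the pkgarch2 path is my guess at your layout.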

Good luck.

Saul O. Reyes · ‎07-06-2001

I ran the following

cmquerycl -v -C verify.ascii -n h01ops01 -n h01ops02

Attached is the output. Any comments on the "Warnings"?

John Poff · ‎07-06-2001

Saul,

Thanks for posting your output. Here is my take on it:

1. You have the FIRST_CLUSTER_LOCK_PV parameter in your cluster file, but it needs an entry for a disk. You'll need to make it look something like this:

FIRST_CLUSTER_LOCK_PV /dev/dsk/c4t0d1

You've gotta have a cluster lock disk for a two node cluster.

2. Your MAX_CONFIGURED_PACKAGES is set to zero. It needs to be set at least to the number of packages you have. I have mine set for 10, and I have 8 packages.

3. The LAN interfaces. It is complaining about the standby network interfaces because the way it is configured now, there are no LAN interfaces for it to switch over to in case a LAN card fails. Probably you'll need to do away with the entries for the STATIONARY_IP and put those LAN entries under the first two entries. I'd try something like this:

NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.1.3.33
NETWORK_INTERFACE lan9
NETWORK_INTERFACE lan5
HEARTBEAT_IP 172.31.4.4
NETWORK_INTERFACE lan10

The STATIONARY_IP designates a LAN interface to just be used for data. Since your heartbeat traffic is minimal and you have it configured across two interfaces, I'd try something like that for each node.

4. While you are at it, you'll probably want to bump up your NODE_TIMEOUT value to 6 seconds from 2. HP recommends it and the newer versions of MC/SG are supposed to ship with that as the default.
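
After editing, a quick check of the cluster ASCII file for points 1, 2 and 4 could look like this. It is a rough sketch: `check_cluster_ascii` is a made-up name, and it assumes NODE_TIMEOUT is written in microseconds in the file (so 6 seconds is 6000000).

```shell
# check_cluster_ascii FILE: print a warning for each of these conditions:
#   - FIRST_CLUSTER_LOCK_PV present but with no disk after it
#   - MAX_CONFIGURED_PACKAGES set to 0
#   - NODE_TIMEOUT below 6000000 (assumed to be microseconds)
# Commented-out lines are ignored because their first field is "#".
check_cluster_ascii() {
    awk '
        $1 == "FIRST_CLUSTER_LOCK_PV" && NF < 2    { print "FIRST_CLUSTER_LOCK_PV has no disk" }
        $1 == "MAX_CONFIGURED_PACKAGES" && $2 == 0 { print "MAX_CONFIGURED_PACKAGES is 0" }
        $1 == "NODE_TIMEOUT" && $2 < 6000000       { print "NODE_TIMEOUT is below 6 seconds" }
    ' "$1"
}
```

No output from `check_cluster_ascii verify.ascii` means none of the three conditions was found; it does not replace cmcheckconf.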

That's my opinion, which is worth the pixels it's displayed on. :)

JP

melvyn burnard · ‎07-06-2001

For information, the following two entries in your syslog mean:

> From the syslog.log I see "cmcld: WARNING: cluster lock on disk /dev/dsk/c4t0d1 is missing!"

This means the system CANNOT see the physical disk you defined as a cluster lock disc OR the cluster lock information is missing. You need to fix this either by replacing the disc or ensuring it is online, or by reapplying the cluster binary with the VG activated.

and "cmcld: Service PKG*17409 terminated due to an exit(0)"
This is an informational message. It means that the service or package script that was run to start the package has finished and terminated without error, hence the return code shown is 0.

I also note from your cmquerycl output that this appears to be a ServiceGuard OPS Edition cluster, so you are using Oracle Parallel Server, correct?

Even more important is this section:
# Warning: No volume groups were found on all nodes.
# A cluster lock volume group is required for clusters of only two nodes.
FIRST_CLUSTER_LOCK_VG

This tells me that something appears to be wrong with your lvm configuration, in that you should have a Volume Group that is shared between the two nodes, but the cmquerycl could not find one. I would check both nodes for the VG info by looking at the lvmtab files, and also the vg directories.
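
One way to compare the two lvmtab files is to save the output of `strings /etc/lvmtab` from each node and look for VG names that appear in both. The helper names below (`vg_names`, `shared_vgs`) are made up for this sketch, and a common name is only a candidate: vg00, for instance, normally exists on both nodes without being shared.

```shell
# vg_names: read `strings /etc/lvmtab` output on stdin and keep only the
# volume group names, i.e. lines of the form /dev/<name>. Device files
# such as /dev/dsk/c4t0d1 contain a second slash and are dropped.
vg_names() {
    grep '^/dev/[^/][^/]*$'
}

# shared_vgs FILE1 FILE2: print VG names that appear in both saved listings.
# Each list is de-duplicated first, so a name printed by `uniq -d`
# must have come from both files.
shared_vgs() {
    { vg_names < "$1" | sort -u; vg_names < "$2" | sort -u; } | sort | uniq -d
}
```

For example, save `strings /etc/lvmtab > /tmp/lvmtab.$(hostname)` on each node, copy one file across, and run `shared_vgs` on the pair.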

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
