
Cluster Not forming


Hi,
I have a 2-node cluster running Serviceguard (SG) A.11.09 on HP-UX 11i, and the cluster is not forming.

These error messages are being logged in syslog; I have attached them.

Thanks in advance,
Karthik



Sridhar Bhaskarla
Honored Contributor

Re: Cluster Not forming

Hi Karthik,

Are you sure the heartbeat interfaces are up and reachable from and to both servers? You may need to ping each interface, as well as the other configured subnets, to confirm that they work.

I believe these RESET messages come from the nodes' attempts to acquire the cluster lock.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Brian M Rawlings
Honored Contributor

Re: Cluster Not forming

Hi. More or less constant SCSI RESET errors begin whenever the cluster attempts to start. I suspect a fairly significant problem with the physical SCSI cables, adapters, or array controllers.

Or, possibly, there is no termination on the HBAs. That would be correct if you are using in-line terminated (ILT) SCSI cables, but this can be tricky: the terminated ends of the cables must be on the HBA side of the bus.

Conversely, if you are using ILT cables and the termination was NOT removed from the HBAs, the resulting double termination could cause several types of SCSI errors.

I believe the entire cluster (all nodes and the SCSI array) needs to be checked to find the SCSI problem before you will be able to bring up the cluster, or do anything at all with the shared storage.

Regards, --bmr
We must indeed all hang together, or, most assuredly, we shall all hang separately. (Benjamin Franklin)
Brian M Rawlings
Honored Contributor

Re: Cluster Not forming

Forgot to mention: it would help if you could run 'ioscan -fn' on each node in the cluster and post the output here. Additional information on the hardware involved (server types, array type and details, and the HBAs and cables used) would also let us help you more.

Regards, --bmr
Rajeev  Shukla
Honored Contributor

Re: Cluster Not forming

Hi Karthik,
The syslog shows a continuous SCSI reset message, and a disk is being reported. I guess this is the cluster lock disk, which is not accessible because of cable problems. Can you first make sure all these errors stop by resolving the SCSI reset problem? Also verify whether the disk being reported is the cluster lock disk.
monasingh_1
Trusted Contributor

Re: Cluster Not forming

Well, in syslog you have entries like b_dev: with values such as bc049000 and cbxxxxx.

The leading bc and cb are hex for major numbers 188 and 203 of your devices. If you run lsdev, it will show that these are the SCSI disks and the SCSI controller.
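The hex-to-decimal step above can be sketched in Python. This assumes the usual HP-UX 32-bit dev value layout, with the major number in the top 8 bits (the first two hex digits); the lookup table only holds the two major numbers named in this reply, so confirm them with lsdev on your own system. The function name is hypothetical.

```python
# Sketch: pull the major number out of an HP-UX b_dev value seen in syslog.
# Assumption: 32-bit dev value with the major number in the top 8 bits.

# Major numbers named in this thread; verify against `lsdev` on your host.
KNOWN_MAJORS = {188: "SCSI disk", 203: "SCSI controller"}


def major_of(dev_hex: str) -> int:
    """Return the major number encoded in a hex dev value such as 'bc049000'."""
    dev = int(dev_hex, 16)
    return (dev >> 24) & 0xFF


if __name__ == "__main__":
    major = major_of("bc049000")
    print(major, KNOWN_MAJORS.get(major, "unknown"))  # 188 SCSI disk
```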

You need to have your SCSI controller checked; it may have failed. Or, as others mentioned earlier, check the cables and terminators.
Michael Steele_2
Honored Contributor

Re: Cluster Not forming

Run ioscan and logtool to evaluate the hardware and to identify bus 4. Refer to >4< in the ioscan example below:

# ioscan -kfnC ext_bus

Class    I    H/W Path          Driver    S/W State  H/W Type   Description
===============================================================================
ext_bus  0    0/0/1/0           c720      CLAIMED    INTERFACE  SCSI C895 Fa
ext_bus  1    0/0/2/0           c720      CLAIMED    INTERFACE  SCSI C875 Ul
ext_bus  >4<  0/0/2/1           c720      CLAIMED    INTERFACE  SCSI C875 Ul
ext_bus  16   0/2/0/0.2.23.0.0  fcparray  CLAIMED    INTERFACE  FCP Array In
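If a node has many interfaces, picking out the H/W path for a given ext_bus instance can be scripted. A minimal Python sketch, assuming the ioscan output has been saved as text and that each data row starts with whitespace-separated Class, I and H/W Path fields, as in the sample output; the function name is hypothetical.

```python
# Sketch: map an ext_bus instance number to its H/W path from saved
# `ioscan -kfnC ext_bus` text. Assumes columns Class, I, H/W Path come first.
from __future__ import annotations

SAMPLE = """\
Class I H/W Path Driver S/W State H/W Type Description
===============================================================================
ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C895 Fa
ext_bus 1 0/0/2/0 c720 CLAIMED INTERFACE SCSI C875 Ul
ext_bus 4 0/0/2/1 c720 CLAIMED INTERFACE SCSI C875 Ul
ext_bus 16 0/2/0/0.2.23.0.0 fcparray CLAIMED INTERFACE FCP Array In
"""


def hw_path_for_instance(ioscan_text: str, instance: int) -> str | None:
    """Return the H/W path of the ext_bus row whose I column equals instance."""
    for line in ioscan_text.splitlines():
        fields = line.split()
        # Skip the header, the ===== separator, and blank lines.
        if len(fields) < 3 or fields[0] != "ext_bus":
            continue
        # Tolerate a hand-marked instance such as ">4<".
        if fields[1].strip("<>") == str(instance):
            return fields[2]
    return None


if __name__ == "__main__":
    print(hw_path_for_instance(SAMPLE, 4))  # 0/0/2/1
```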

NEXT: Note the HW address and cross-reference it with any errors collected in logtool.

STM > TOOLS > UTILITY > RUN > LOGTOOL > FILE > VIEW > RAW SUMMARY.

Note the First and Last dates shown and calculate the time frame between them, for example 2 days and 16 hours. Is the Last date a recent timestamp? Has it been only hours, or days? Now look for your HW address and check the integer in parentheses. Is it in the tens or the hundreds? If it is in the hundreds, open a hardware call.
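The time-frame arithmetic and the tens-versus-hundreds rule of thumb can be sketched in Python. The First and Last dates and the count below are hypothetical values typed in by hand; this does not parse logtool output, and the threshold of 100 simply encodes "in the hundreds" from the advice above.

```python
# Sketch: compute the span between logtool's First and Last dates and apply
# the "hundreds means open a hardware call" rule of thumb from this thread.
from __future__ import annotations

from datetime import datetime, timedelta

HW_CALL_THRESHOLD = 100  # "in the hundreds"


def summarize(first: datetime, last: datetime, count: int) -> tuple[timedelta, bool]:
    """Return (time frame, whether to open a hardware call)."""
    return last - first, count >= HW_CALL_THRESHOLD


if __name__ == "__main__":
    # Hypothetical values read off a RAW SUMMARY screen.
    span, open_call = summarize(datetime(2004, 3, 1, 8, 0),
                                datetime(2004, 3, 4, 0, 0), count=250)
    hours = span.seconds // 3600
    print(f"{span.days} days and {hours} hours; open HW call: {open_call}")
    # 2 days and 16 hours; open HW call: True
```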

Also check dmesg and the /var/adm/message logs for similar lbolt error messages; they should match those listed in syslog. In your messages, no "...dev..." field appears with the lbolt, which to me points to the controller itself.
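To sort lbolt messages into those that name a device and those that don't, a short Python filter can help. It assumes lines shaped like the Abort Tag example in this post (lbolt: <n>, dev: <hex>, ...); other SCSI messages may be formatted differently, and the function name and the second sample line are hypothetical.

```python
# Sketch: split lbolt messages into those with a dev: field (points at a
# specific device) and those without (per this thread, suggests the controller).
import re

LBOLT = re.compile(r"lbolt:\s*(\d+)")
DEV = re.compile(r"dev:\s*([0-9a-fA-F]+)")


def classify_lbolt(lines):
    """Return (with_dev, without_dev) lists of (lbolt, dev-or-None) tuples."""
    with_dev, without_dev = [], []
    for line in lines:
        m = LBOLT.search(line)
        if not m:
            continue  # not an lbolt message
        d = DEV.search(line)
        entry = (int(m.group(1)), d.group(1) if d else None)
        (with_dev if d else without_dev).append(entry)
    return with_dev, without_dev


if __name__ == "__main__":
    log = [
        "SCSI: Abort Tag -- lbolt: 91143746, dev: 1f02a000, io_id: 26b2e26",
        "SCSI: Reset detected -- lbolt: 91143800",  # hypothetical line
    ]
    print(classify_lbolt(log))
    # ([(91143746, '1f02a000')], [(91143800, None)])
```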

Here is what an lbolt with a bad device looks like:

SCSI: Abort Tag -- lbolt: 91143746, dev: 1f02a000, io_id: 26b2e26

Note the "...dev: 1f02a000...": in particular, the "...2a..." maps to disk c2t10d0 (the 2 is the controller instance, and 'a' is hex for 10, the target).
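The same decoding can be written out in Python. This sketch assumes the HP-UX legacy disk minor-number layout 0xIITL00 (8-bit card instance, 4-bit target, 4-bit LUN, 8 option bits) under an 8-bit major number; treat that layout as an assumption and check it against the major/minor numbers that ls -l /dev/dsk shows on your own system. The function name is hypothetical.

```python
# Sketch: decode an HP-UX dev value from a SCSI error into cXtYdZ form.
# Assumes dev = (major << 24) | minor, with minor laid out as 0xIITL00.
from __future__ import annotations


def decode_dev(dev_hex: str) -> tuple[int, str]:
    """Return (major number, cXtYdZ device name) for a hex dev value."""
    dev = int(dev_hex, 16)
    major = (dev >> 24) & 0xFF
    minor = dev & 0xFFFFFF
    instance = (minor >> 16) & 0xFF  # II: card (controller) instance
    target = (minor >> 12) & 0xF     # T: SCSI target ID
    lun = (minor >> 8) & 0xF         # L: logical unit number
    return major, f"c{instance}t{target}d{lun}"


if __name__ == "__main__":
    print(decode_dev("1f02a000"))  # (31, 'c2t10d0')
```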
Support Fatherhood - Stop Family Law