
Autoraid controller problem

Masaki Birchmier
Frequent Advisor


I am having problems getting both controllers on the 12H to work properly. The secondary controller is reported as being in an "unknown component state" by arraydsp -a.
SAM hangs when creating a volume group on the AutoRAID LUN.
It may be a hardware problem, because I am getting this error message in dmesg:
"SCSI: Resetting SCSI - lbolt:525988, bus: 3"
I've tried the following:
upgrading firmware to 62,
replacing controller card (have 1 extra),
replacing SCSI cables & terminator,
replacing the 12H chassis,
removing old autoraid patches & installing the latest versions
(PHCO_23262, PHCO_23149)
running it as a single controller

About the only thing I can think of now is the SCSI card itself.

Any ideas?

Thanks, Masaki
6 REPLIES
Insu Kim
Honored Contributor

Re: Autoraid controller problem

Did you check the SCSI addresses of both controllers?
Are you sure there is no SCSI ID conflict between the HBA (host bus adapter) and the AutoRAID controllers?
What about the HBAs, if you have more than one?
I think swapping HBAs would be a good way to make sure they are OK.

Based on the error message "SCSI: Resetting SCSI - lbolt:525988, bus: 3", there must be something wrong on bus 3.
I'm not sure, but I suspect there is a device with target ID 3 on the system.
Can you post the output of "ioscan -knf" and "dmesg" so I can get a clue?
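
If dmesg shows many of these lines, a small script can tally which bus is being reset and roughly when. The following is only a sketch, not something from the thread: it assumes the message format is exactly as quoted above, and it assumes lbolt counts 10 ms clock ticks since boot (HZ = 100). The function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the reset line quoted in this thread:
#   SCSI: Resetting SCSI - lbolt:525988, bus: 3
RESET_RE = re.compile(r"SCSI: Resetting SCSI - lbolt:\s*(\d+), bus:\s*(\d+)")

HZ = 100  # assumption: lbolt is a count of 10 ms ticks since boot


def parse_resets(dmesg_text):
    """Return a list of (seconds_since_boot, bus) for each reset line."""
    resets = []
    for match in RESET_RE.finditer(dmesg_text):
        lbolt, bus = int(match.group(1)), int(match.group(2))
        resets.append((lbolt / HZ, bus))
    return resets


def resets_per_bus(dmesg_text):
    """Count reset messages per bus number."""
    return Counter(bus for _, bus in parse_resets(dmesg_text))


if __name__ == "__main__":
    sample = "SCSI: Resetting SCSI - lbolt:525988, bus: 3"
    for seconds, bus in parse_resets(sample):
        print(f"bus {bus} reset about {seconds:.0f} s after boot")
    print(dict(resets_per_bus(sample)))
```

Feeding it the saved output of dmesg would show whether every reset is on bus 3 or whether other buses are affected as well.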

You said that you applied the latest patches, so I recommend that you use logprint -a "serial number".
That command has been enhanced and now provides more troubleshooting information than before.

Hope this helps,
Never say "no" first.
Bill McNAMARA_1
Honored Contributor

Re: Autoraid controller problem

try
ioscan -fnkCext_bus
When you find the bus with instance number 3 (second field), run
ioscan -fnkH8/12
if 8/12 is the associated hardware path.
Note the AutoRAID state.
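
The lookup above (find the ext_bus row whose second field is 3, then read off its hardware path) could be scripted roughly as follows. This is a sketch under assumptions: it expects the usual ioscan -f column order (class, instance, H/W path, ...), and the sample row is made up for illustration using the 8/12 path from the example above. It is not real output from this system.

```python
def find_ext_bus_path(ioscan_text, instance):
    """Return the H/W path of the ext_bus with the given instance, or None."""
    for line in ioscan_text.splitlines():
        fields = line.split()
        # Data rows start with the class name, then the instance number,
        # then the hardware path. Header and device-file lines are skipped.
        if len(fields) >= 3 and fields[0] == "ext_bus" and fields[1].isdigit():
            if int(fields[1]) == instance:
                return fields[2]
    return None


# Made-up sample in the shape of `ioscan -fnkCext_bus` output (hypothetical row).
SAMPLE_IOSCAN = """\
Class     I  H/W Path  Driver  S/W State  H/W Type   Description
ext_bus   3  8/12      c720    CLAIMED    INTERFACE  (made-up sample row)
"""

if __name__ == "__main__":
    path = find_ext_bus_path(SAMPLE_IOSCAN, 3)
    print(path)  # the path to pass to ioscan -fnkH<path>
```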

If the LEDs on the AutoRAID controllers are both green and there is no warning on the AutoRAID front panel, there is probably not a problem with the AutoRAID.

If ioscan shows all hardware in a S/W state of CLAIMED, there are device files for all LUNs, and diskinfo on the rdsk device file for each LUN looks OK, then there is probably not a problem with the cabling, although the message makes a cabling problem sound the most likely.

Try
arraydsp -R

Wait a while (around 10 minutes) and run
top
in the meantime. Watch for the ioscans and the ARMServer restarting: there will be around 3 ioscans, and then the ARMServer will be left running.
arraydsp
should return something then.

Later,
Bill
It works for me (tm)
Masaki Birchmier
Frequent Advisor

Re: Autoraid controller problem

Thanks for your input, In-Su and Bill.
I've attached a file that shows the output from some of the commands you recommended.

The autoraid controllers are on SCSI id 0 and 3.
The controller on id 3 doesn't show up on ioscan.

One other note: the "lbolt" message goes away if I disconnect the AutoRAID from the N-box and put on a terminator.
That kind of implies a bad cable/terminator/controller, but I've replaced all of those. I guess it's possible that my replacements are bad too. I wish I had another differential SCSI device to test with.


Insu Kim
Honored Contributor

Re: Autoraid controller problem

I looked at the "lbolt" messages that you posted.
I assume you have a single HBA that the AutoRAID is attached to.
To be sure, the "lbolt" message came from device c3t2d0, and I have no idea what that is based on the output you posted.

Check that device's status to see why it is producing so many errors, or remove it from the bus and run ioscan -f to determine whether it caused bus starvation and eventually made one of the AutoRAID controllers disappear.
(The device files for the AutoRAID would be c3t3d0 and c3t0d0; c3t2d0 is the unknown device that causes the "lbolt" error.)
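
For anyone reading along, the device file names encode the address: in cXtYdZ, X is the instance of the interface card (the ext_bus instance), Y is the SCSI target ID, and Z is the LUN. A minimal sketch that splits the names mentioned in this thread follows; the class and function names are made up for illustration.

```python
import re
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    bus_instance: int  # the X in cXtYdZ
    target: int        # the Y in cXtYdZ (SCSI ID)
    lun: int           # the Z in cXtYdZ


DEVICE_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)$")


def parse_device_name(name):
    """Split a name such as c3t2d0 (or /dev/rdsk/c3t2d0) into its parts."""
    base = name.rsplit("/", 1)[-1]  # drop any leading directory
    match = DEVICE_RE.match(base)
    if match is None:
        raise ValueError(f"not a cXtYdZ device name: {name!r}")
    return ScsiAddress(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    for dev in ("c3t0d0", "c3t3d0", "c3t2d0"):
        addr = parse_device_name(dev)
        print(f"{dev}: bus {addr.bus_instance}, target {addr.target}, LUN {addr.lun}")
```

Read this way, c3t0d0 and c3t3d0 match the controllers at SCSI IDs 0 and 3 on bus 3, while c3t2d0 points at target 2 on the same bus.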

One more thing: bent pins in a SCSI cable can cause this symptom.
You mentioned that you replaced an HBA.
Did you check the terminator on the HBA?
It should always be firmly seated, assuming you are not using an inline terminator cable.


Hope this helps,
Never say "no" first.
Bill McNAMARA_1
Honored Contributor

Re: Autoraid controller problem

I put the output of the log through an analyser. There is a panic, Abnormal termination code (hex) = 19 1, caused by a Host SCSI Bus Timeout, which ends up trying to make you believe there is a Microprocessor Hardware Fault and an Uncorrectable ECC Error During Initialization. So the problem is not directly related to the AutoRAID, apart from the panic itself (when unexpected events occur, it should still behave well and not panic).

I'm not great at interpreting the SCSI mc's, but I am sure that data will help identify what is causing the SCSI timeout that results in the bus reset.
You may want to start with cable replacement. Make sure the connector on the back of the AutoRAID isn't damaged in a way that would break the new cable on insertion. If a known good cable doesn't work, change the adapter.

Later,
Bill
It works for me (tm)
Masaki Birchmier
Frequent Advisor

Re: Autoraid controller problem

Problem solved!
Replaced the SCSI card and it was good to go!
Thanks to all that helped me troubleshoot the problem.