SCSI HBA Probs

Steve Burt_1
Advisor

Hi There,
Need some help with the following problem...

The facts so far
08:0b.0 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)
08:0b.1 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01)

I am suddenly getting the following errors:

Oct 8 04:03:20 hostname kernel: st5: Error with sense data: Info fld=0x0, Current st5: sense key Aborted Command
Oct 8 04:05:21 hostname kernel: st1: Error with sense data: Info fld=0x0, Current st1: sense key Aborted Command
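
To see whether these errors cluster on one drive or are spread across several, the syslog lines can be tallied per device. The following is only a rough sketch: the regular expression is an assumption based on the two lines quoted above, and the function name `tally_sense_errors` is made up for illustration.

```python
import re
from collections import Counter

# Pattern assumed from the two kernel messages quoted above, e.g.
# "... kernel: st5: Error with sense data: Info fld=0x0, Current st5: sense key Aborted Command"
SENSE_RE = re.compile(r"kernel: (st\d+): Error with sense data: .*sense key (.+)$")


def tally_sense_errors(lines):
    """Count (device, sense key) pairs found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = SENSE_RE.search(line.rstrip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "Oct 8 04:03:20 hostname kernel: st5: Error with sense data: Info fld=0x0, Current st5: sense key Aborted Command",
    "Oct 8 04:05:21 hostname kernel: st1: Error with sense data: Info fld=0x0, Current st1: sense key Aborted Command",
]

for (device, key), count in sorted(tally_sense_errors(sample).items()):
    print(device, key, count)
```

In practice the input would be the lines of the system log file rather than the two-line sample.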

When using robtest, we load all the drives on our TL700 and then try to unload each drive one by one. Much to our frustration, the tapes do not eject.

If I stop NetBackup and run mt -f /dev/st1 eject, it just hangs and the tape does not eject.

The only version information I have for the card is the driver banner (this is the driver revision, not the card firmware): scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.36

As you can see from dmesg, the drives are loaded:
st: Version 20040403, fixed bufsize 32768, s/g segs 256
Attached scsi tape st0 at scsi0, channel 0, id 0, lun 0
st0: try direct i/o: yes (alignment 512 B), max page reachable by HBA 2097151
Attached scsi tape st1 at scsi0, channel 0, id 1, lun 0
st1: try direct i/o: yes (alignment 512 B), max page reachable by HBA 2097151
Attached scsi tape st2 at scsi1, channel 0, id 0, lun 0
st2: try direct i/o: yes (alignment 512 B), max page reachable by HBA 2097151
Attached scsi tape st3 at scsi1, channel 0, id 1, lun 0
st3: try direct i/o: yes (alignment 512 B), max page reachable by HBA 2097151
Attached scsi tape st4 at scsi3, channel 0, id 0, lun 0
st4: try direct i/o: yes (alignment 512 B), max page reachable by HBA 2097151
Attached scsi tape st5 at scsi3, channel 0, id 1, lun 0
st5: try direct i/o: yes (alignment 512 B), max page reachable by HBA 2097151
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 1
Attached scsi generic sg1 at scsi0, channel 0, id 1, lun 0, type 1
Attached scsi generic sg2 at scsi1, channel 0, id 0, lun 0, type 1
Attached scsi generic sg3 at scsi1, channel 0, id 1, lun 0, type 1
Attached scsi generic sg4 at scsi2, channel 0, id 0, lun 0, type 8
Attached scsi generic sg5 at scsi3, channel 0, id 0, lun 0, type 1
Attached scsi generic sg6 at scsi3, channel 0, id 1, lun 0, type 1
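
When matching an error on a given st device back to an HBA channel, it helps to group the tape devices by SCSI host. This is a minimal sketch that assumes the "Attached scsi tape" message format shown above; `tapes_by_host` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern assumed from the "Attached scsi tape" lines in the dmesg output above.
TAPE_RE = re.compile(
    r"Attached scsi tape (st\d+) at scsi(\d+), channel (\d+), id (\d+), lun (\d+)"
)


def tapes_by_host(dmesg_lines):
    """Map SCSI host number -> list of (device, channel, target id, lun)."""
    hosts = defaultdict(list)
    for line in dmesg_lines:
        match = TAPE_RE.search(line)
        if match:
            name = match.group(1)
            host, channel, target, lun = (int(g) for g in match.groups()[1:])
            hosts[host].append((name, channel, target, lun))
    return dict(hosts)


dmesg = """\
Attached scsi tape st0 at scsi0, channel 0, id 0, lun 0
Attached scsi tape st1 at scsi0, channel 0, id 1, lun 0
Attached scsi tape st2 at scsi1, channel 0, id 0, lun 0
Attached scsi tape st3 at scsi1, channel 0, id 1, lun 0
Attached scsi tape st4 at scsi3, channel 0, id 0, lun 0
Attached scsi tape st5 at scsi3, channel 0, id 1, lun 0
""".splitlines()

for host, tapes in sorted(tapes_by_host(dmesg).items()):
    print(f"scsi{host}:", ", ".join(name for name, *_ in tapes))
```

Going by the messages quoted in this thread, st1 sits on scsi0, st3 on scsi1, and st5 on scsi3, so the Aborted Command errors are spread over three hosts rather than confined to one. The type 8 device on scsi2 is a medium changer, which is not a tape drive and so does not appear in this grouping.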

I tried the latest kernel, and it uses the same driver for the Adaptec card.

Obviously our backup engineer is getting rather worried ;-)

Can someone explain what's going wrong?

My next plan of action is to run the DL585 diagnostics and/or replace each HBA and test.

--SEB
2 REPLIES
Steven E. Protter
Exalted Contributor

Re: SCSI HBA Probs

Shalom,

Your plan is good. Boot the system off a diag cd/dvd and run a full series of tests.

Replace bad hardware.

Also check fiber cables for crushing and other issues and the port switches and fabric infrastructure.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steve Burt_1
Advisor

Re: SCSI HBA Probs

This is rather interesting...

This happened after an mt -f /dev/st0 eject
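
One detail in the log below can be decoded directly: the CDB line shows the command the driver was trying to abort on scsi1:0:1:0 (st3, going by the dmesg output earlier in the thread). Opcode 0x4d is LOG SENSE, so the stuck command looks like a log page query rather than the eject itself. The sketch below assumes the standard 10-byte LOG SENSE layout; `decode_log_sense` is a hypothetical helper name.

```python
def decode_log_sense(cdb):
    """Decode the main fields of a 10-byte SCSI LOG SENSE CDB (opcode 0x4D)."""
    if len(cdb) != 10 or cdb[0] != 0x4D:
        raise ValueError("not a 10-byte LOG SENSE CDB")
    return {
        "opcode": cdb[0],
        "page_control": cdb[2] >> 6,        # top two bits of byte 2
        "page_code": cdb[2] & 0x3F,         # low six bits of byte 2
        "allocation_length": (cdb[7] << 8) | cdb[8],  # bytes 7-8, big-endian
    }


# CDB bytes copied from the kernel log line in this post.
raw = "0x4d 0x0 0x43 0x0 0x0 0x0 0x0 0x0 0xff 0x0"
cdb = [int(token, 16) for token in raw.split()]

fields = decode_log_sense(cdb)
print(fields)
```

For this CDB the page control is 1 (cumulative values), the page code is 0x03 (the read error counter page), and the allocation length is 255 bytes.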

Oct 6 18:48:41 rdg-backup2 kernel: aic7xxx_abort returns 0x2002
Oct 6 18:48:41 rdg-backup2 kernel: st3: Error with sense data: Info fld=0x0, Current st3: sense key Aborted Command
Oct 6 18:49:05 rdg-backup2 kernel: scsi1:0:1:0: Attempting to queue an ABORT message
Oct 6 18:49:05 rdg-backup2 kernel: CDB: 0x4d 0x0 0x43 0x0 0x0 0x0 0x0 0x0 0xff 0x0
Oct 6 18:49:05 rdg-backup2 kernel: scsi1: At time of recovery, card was not paused
Oct 6 18:49:05 rdg-backup2 kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
Oct 6 18:49:05 rdg-backup2 kernel: scsi1: Dumping Card State in Command phase, at SEQADDR 0x172
Oct 6 18:49:05 rdg-backup2 kernel: Card was paused
Oct 6 18:49:05 rdg-backup2 kernel: ACCUM = 0x80, SINDEX = 0xa0, DINDEX = 0xe4, ARG_2 = 0x0
Oct 6 18:49:05 rdg-backup2 kernel: HCNT = 0x0 SCBPTR = 0x1
Oct 6 18:49:05 rdg-backup2 kernel: SCSIPHASE[0x0] SCSISIGI[0x84] ERROR[0x0] SCSIBUSL[0xc0]
Oct 6 18:49:05 rdg-backup2 kernel: LASTPHASE[0x80] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0xc2]
Oct 6 18:49:05 rdg-backup2 kernel: SEQCTL[0x10] SEQ_FLAGS[0x0] SSTAT0[0x7] SSTAT1[0x0]
Oct 6 18:49:05 rdg-backup2 kernel: SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac]
Oct 6 18:49:05 rdg-backup2 kernel: SXFRCTL0[0x88] DFCNTRL[0x4] DFSTATUS[0x89]
Oct 6 18:49:05 rdg-backup2 kernel: STACK: 0x34 0xe8 0x16a 0x17f
Oct 6 18:49:05 rdg-backup2 kernel: SCB count = 4
Oct 6 18:49:05 rdg-backup2 kernel: Kernel NEXTQSCB = 2
Oct 6 18:49:05 rdg-backup2 kernel: Card NEXTQSCB = 2
Oct 6 18:49:05 kernel: QINFIFO entries:
Oct 6 18:49:05 kernel: Waiting Queue entries:
Oct 6 18:49:05 kernel: Disconnected Queue entries:
Oct 6 18:49:05 kernel: QOUTFIFO entries:
Oct 6 18:49:05 kernel: Sequencer Free SCB List: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Oct 6 18:49:05 kernel: Sequencer SCB Info:
Oct 6 18:49:05 kernel: 0 SCB_CONTROL[0xc0] SCB_SCSIID[0x7] SCB_LUN[0x80] SCB_TAG[0xff]
Oct 6 18:49:05 kernel: 1 SCB_CONTROL[0x40] SCB_SCSIID[0x17] SCB_LUN[0x80] SCB_TAG[0x1]
Oct 6 18:49:05 kernel: 2 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]
Oct 6 18:49:05 kernel: 3 SCB_CONTROL[0x0] SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff]