
DAT 320 SAS, 3U Rack Chassis and SC44Ge - timeouts

 
Rick Gilligan
New Member


I've been struggling to get our new DAT320 SAS drive to work reliably since we received it in December.

Mostly the errors are SCSI timeout issues.

We've replaced the drive mechanism, the external cable, and the SAS HBA. We've also switched to the other SAS Expander Board in the 3U chassis and swapped in another internal cable.

We have not replaced the power supply in the 3U chassis, nor have we tried any possibly newer Rev. SAS Expander Boards (if they exist).

After replacing nearly everything in the SAS chain, the problem occurs much less frequently, perhaps after five nearly full tapes have been written. It used to occur after about 20% of each tape was written.

OS is RHEL 5.4:

# uname -r
2.6.18-164.el5xen

Processor is:

# uname -p
x86_64

Drivers are the latest HP recommended:

# cat /proc/mpt/version
mptlinux-4.00.13.07
Fusion MPT base driver
Fusion MPT SAS host driver
Fusion MPT SPI host driver
Fusion MPT ioctl driver

Tape drive is an HP AJ830A (DAT 320 SAS) - with the latest (public) firmware:

# cat /proc/scsi/scsi
Attached devices:
Host: scsi4 Channel: 00 Id: 02 Lun: 00
Vendor: HP Model: DAT320 Rev: VS98
Type: Sequential-Access ANSI SCSI revision: 03


in an HP AG576A (3U StorageWorks SAS Rack Mount)
with SAS Expander Boards 403721-002 or 012769-001 Rev 0D.

cable is an HP 406591-002 Rev. B (AE468A SFF-8470 to SFF-8088 4 Meter)

HBA is an HP SC44Ge (3Gb, PCI-e, 4 channel) with the latest (public) firmware:

# cat /proc/scsi/mptsas/4
ioc0: LSISAS1068E B3, FwRev=01172b00h, Ports=1, MaxQ=163
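
For anyone comparing firmware levels across the HBAs in this thread, the FwRev word is easier to read as dotted fields. This is only a sketch, assuming the common LSI MPT convention of one version field per byte, shown in decimal. The helper name `decode_fw_rev` is made up for illustration.

```python
def decode_fw_rev(fw_rev: int) -> str:
    """Split a 32-bit FwRev word into four one-byte fields (assumed layout)."""
    fields = [(fw_rev >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(f"{field:02d}" for field in fields)


# FwRev values quoted in this thread (01172b00h and 011a0300h).
print(decode_fw_rev(0x01172B00))
print(decode_fw_rev(0x011A0300))
```

Under that assumption the two words read as 01.23.43.00 and 01.26.03.00, so they can be compared against the version listed in a firmware release note.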

ltt doesn't always show problems when writing and reading a whole tape (320GB), but that doesn't surprise me: it supplies data at full tape speed, so it's not a "real world" test in that regard.

Using tar, I either get SCSI timeouts (typically after writing somewhere between 16 and 65 GB of data) or occasionally "IO Device Missing Delay Retry".

My DAT 160 drive on U320 works flawlessly on the same fileset in the adjacent PCI-e slot.

Here are some of the log entries:

Timeout:

mptscsih: ioc0: attempting task abort! (sc=ffff880053396540)
st 4:0:0:0:
command: Write(6): 0a 00 00 80 00 00
mptbase: ioc0: LogInfo (0x31140000): Originator={PL}, Code={IO Executed}, SubCode (0x0000)
mptscsih: ioc0: task abort: SUCCESS (sc=ffff880053396540)
st 4:0:0:0: timing out command, waited 900s
st0: Error 6080000 (sugg. bt 0x0, driver bt 0x6, host bt 0x8).

Missing delay retry:

mptbase: ioc0: LogInfo (0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
mptbase: ioc0: LogInfo (0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
st0: Error 20000 (sugg. bt 0x0, driver bt 0x0, host bt 0x2).
mptbase: ioc0: LogInfo (0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
st0: Error 20000 (sugg. bt 0x0, driver bt 0x0, host bt 0x2).
st0: Error on write filemark.
mptbase: ioc0: LogInfo (0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
st0: Error 80000 (sugg. bt 0x0, driver bt 0x0, host bt 0x8).
target4:0:1: removing ssp device: fw_channel 0, fw_id 5, phy 4, sas_addr 0x500110a0012eff8c
target4:0:1: delete device: fw_channel 0, fw_id 5, phy 4, sas_addr 0x500110a0012eff8c
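
The number after "st0: Error" is the kernel's 32-bit SCSI result word, and the values in parentheses are fields pulled out of it. Here is a rough sketch of the decoding. It assumes the 2.6-era layout (status in bits 0-7, message in 8-15, host byte in 16-23, driver byte in the low nibble of bits 24-31) and the constant names from the kernel's scsi.h. `decode_st_error` is a hypothetical helper, not part of any tool.

```python
# Host and driver byte names as defined in 2.6-era include/scsi/scsi.h.
HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR", 0x0A: "DID_PASSTHROUGH", 0x0B: "DID_SOFT_ERROR",
}
DRIVER_BYTE = {
    0x0: "DRIVER_OK", 0x1: "DRIVER_BUSY", 0x2: "DRIVER_SOFT",
    0x3: "DRIVER_MEDIA", 0x4: "DRIVER_ERROR", 0x5: "DRIVER_INVALID",
    0x6: "DRIVER_TIMEOUT", 0x7: "DRIVER_HARD", 0x8: "DRIVER_SENSE",
}


def decode_st_error(result: int) -> dict:
    """Break a SCSI result word (as printed by st) into named fields."""
    host = (result >> 16) & 0xFF
    driver = (result >> 24) & 0x0F  # high nibble is the "suggestion" field
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": HOST_BYTE.get(host, hex(host)),
        "driver": DRIVER_BYTE.get(driver, hex(driver)),
    }


# Error values taken from the log excerpts in this thread.
for code in (0x6080000, 0x20000, 0x80000, 0x10000):
    print(hex(code), decode_st_error(code))
```

Read that way, 6080000 is a driver-level timeout with a reset reported by the host, 20000 is a busy bus, 80000 is a reset, and 10000 (from the reply further down) is a lost connection to the device.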

Here's the driver loading and hardware detection:

Fusion MPT SPI Host driver 3.04.07rh
sr0: scsi-1 drive
Jan 11 20:46:34 pe-phc kernel: ioc0: LSISAS1068E B3: Capabilities={Initiator}
scsi4 : ioc0: LSISAS1068E B3, FwRev=01172b00h, Ports=1, MaxQ=163, IRQ=16
Vendor: HP Model: DAT320 Rev: VS98
Type: Sequential-Access ANSI SCSI revision: 03
scsi 4:0:0:0: Attached scsi generic sg5 type 1
megaraid: probe new device 0x1000:0x0408:0x1028:0x0002: bus 14:slot 14:func 0
GSI 21 sharing vector 0x49 and IRQ 21
ACPI: PCI Interrupt 0000:0e:0e.0[A] -> GSI 18 (level, low) -> IRQ 21
ACPI: PCI Interrupt 0000:0b:08.0[A] -> GSI 17 (level, low) -> IRQ 17

Any ideas for how to proceed?

At this point, I suspect either the SAS Expander Board or a firmware issue in the DAT 320 drive.

1 REPLY
pat98usb
New Member

Re: DAT 320 SAS, 3U Rack Chassis and SC44Ge - timeouts

I've got a similar issue with SAS, and I don't know whether the problem is hardware or software related.

# uname -r
2.6.18-164.11.1.el5

# uname -p
x86_64

# cat /proc/mpt/version
mptlinux-4.20.00.02
Fusion MPT base driver
Fusion MPT SAS host driver
Fusion MPT ioctl driver

# lsmod | grep mpt
mptctl 116744 0
mptsas 90768 0
mptscsih 78336 1 mptsas
mptbase 124868 3 mptctl,mptsas,mptscsih
scsi_transport_sas 66753 1 mptsas
scsi_mod 196697 14 mptctl,mptsas,mptscsih,scsi_transport_spi,scsi_transport_fc,scsi_transport_sas,scsi_dh,st,sr_mod,sg,usb_storage,libata,aacraid,sd_mod

# cat /proc/scsi/scsi
Host: scsi16 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: Ultrium 3-SCSI Rev: C26D
Type: Sequential-Access ANSI SCSI revision: 05

# cat /proc/scsi/mptsas/16
ioc0: LSISAS1068E B3, FwRev=011a0300h, Ports=1, MaxQ=336

Using tar I get:
tar: /dev/st0: Cannot write: Input/output error
tar: Error is not recoverable: exiting now

and dmesg shows

mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
end_device-16:0: mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 1, phy 4,sas_addr 0x50060b0000800394
phy-16:4: mptsas: ioc0: delete phy 4, phy-obj (0xffff810118facc00)
port-16:0: mptsas: ioc0: delete port 0, sas_addr (0x50060b0000800394)
scsi 16:0:0:0: rejecting I/O to dead device
st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).
st0: Error on write filemark.
scsi 16:0:0:0: rejecting I/O to dead device
st0: Error 10000 (sugg. bt 0x0, driver bt 0x0, host bt 0x1).

Some more dmesg info for the hba and tape

mptbase: ioc0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total memory = 8179956 kB
mptbase: ioc0: Initiating bringup
ioc0: LSISAS1068E B3: Capabilities={Initiator}
PCI: Setting latency timer of device 0000:03:00.0 to 64
scsi16 : ioc0: LSISAS1068E B3, FwRev=011a0300h, Ports=1, MaxQ=336, IRQ=98
mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 1, phy 4, sas_addr 0x50060b0000800394
target16:0:0: mptsas: ioc0: add device: fw_channel 0, fw_id 1, phy 4, sas_addr 0x50060b0000800394
Vendor: HP Model: Ultrium 3-SCSI Rev: C26D
Type: Sequential-Access ANSI SCSI revision: 05
scsi 16:0:0:0: mptscsih: ioc0: qdepth=64, tagged=1, simple=1, ordered=0, scsi_level=6, cmd_que=1
st 16:0:0:0: Attached scsi tape st0
st0: try direct i/o: no (alignment 512 B)
st 16:0:0:0: Attached scsi generic sg7 type 1
Fusion MPT misc device (ioctl) driver 4.20.00.02
mptctl: Registered with Fusion MPT base driver
mptctl: /dev/mptctl @ (major,minor=10,220)
st0: Block limits 1 - 16777215 bytes.

# mt -f /dev/st0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x44 (no translation).
Soft error count since last status=0
General status bits on (41010000):
BOT ONLINE IM_REP_EN
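
The "General status bits" word can be checked by hand against the GMT_* masks in the kernel's mtio.h. Below is a small sketch of that check. The mask values are the ones I know from mtio.h, and `decode_gmt` is a hypothetical helper name.

```python
# GMT_* status masks from linux/mtio.h, highest bit first.
GMT_FLAGS = {
    0x80000000: "EOF",
    0x40000000: "BOT",
    0x20000000: "EOT",
    0x10000000: "SM",
    0x08000000: "EOD",
    0x04000000: "WR_PROT",
    0x01000000: "ONLINE",
    0x00800000: "D_6250",
    0x00400000: "D_1600",
    0x00200000: "D_800",
    0x00040000: "DR_OPEN",
    0x00010000: "IM_REP_EN",
    0x00008000: "CLN",
}


def decode_gmt(status: int) -> list:
    """Return the names of all GMT flags set in an mt status word."""
    return [name for mask, name in GMT_FLAGS.items() if status & mask]


# Status word reported by "mt -f /dev/st0 status" above.
print(decode_gmt(0x41010000))
```

For 41010000 this gives BOT, ONLINE and IM_REP_EN, matching what mt printed. Neither the write-protect bit nor the cleaning-request bit is set.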

# tapeinfo -f /dev/sg7
Product Type: Tape Drive
Vendor ID: 'HP '
Product ID: 'Ultrium 3-SCSI '
Revision: 'C26D'
Attached Changer API: No
SerialNumber: '0805K00133'
TapeAlert[50]: Undefined.
MinBlock: 1
MaxBlock: 16777215
SCSI ID: 0
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x44
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x1
DeCompType: 0x1
BOP: yes
Block Position: 0
Partition 0 Remaining Kbytes: 400308
Partition 0 Size in Kbytes: 400308
ActivePartition: 0
EarlyWarningSize: 0
NumPartitions: 0
MaxPartitions: 0

# mtx -f /dev/sg7 status
mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 24
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: Field in Error = 05
mtx: Request Sense: BPV=yes
mtx: Request Sense: Error in CDB=yes
mtx: Request Sense: SKSV=yes
mtx: Request Sense: Field Pointer = 00 02
Mode sense (0x1A) for Page 0x1D failed
mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 20
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
READ ELEMENT STATUS Command Failed
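
The mtx failures above are probably a side issue. The sense codes look like what a standalone drive with no media changer returns, and tapeinfo already reports "Attached Changer API: No". Here is a minimal lookup sketch, assuming mtx prints the ASC and ASCQ in hex and using the standard SPC meanings for these two code pairs. `describe_sense` is a hypothetical helper.

```python
# Only the two ASC/ASCQ pairs seen in this thread; meanings per SPC.
ASC_ASCQ = {
    (0x20, 0x00): "Invalid command operation code",
    (0x24, 0x00): "Invalid field in CDB",
}


def describe_sense(asc_hex: str, ascq_hex: str) -> str:
    """Translate the hex ASC/ASCQ strings printed by mtx into text."""
    key = (int(asc_hex, 16), int(ascq_hex, 16))
    return ASC_ASCQ.get(key, "not in this small table")


print(describe_sense("24", "00"))  # after the Mode Sense for page 0x1D
print(describe_sense("20", "00"))  # after READ ELEMENT STATUS
```

Both results fit a device that does not implement changer commands, so I would not read them as evidence about the write errors.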

# loaderinfo -f /dev/sg7
Product Type: Tape Drive
Vendor ID: 'HP '
Product ID: 'Ultrium 3-SCSI '
Revision: 'C26D'
Attached Changer: No
Bar Code Reader: No
mtx: Request Sense: Long Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=70 (Current)
mtx: Request Sense: Sense Key=Illegal Request
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 24
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: Field in Error = 05
mtx: Request Sense: BPV=yes
mtx: Request Sense: Error in CDB=yes
mtx: Request Sense: SKSV=yes
mtx: Request Sense: Field Pointer = 00 02
EAAP: No
Transport Geometry Descriptor Page: No
Device Configuration Page: No

# mt -v
mt-st v. 0.9b
kernel: st: Version 20070203, fixed bufsize 4194304, s/g segs 256

# tar --version
tar (GNU tar) 1.15.1

# modinfo st
filename: /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/scsi/st.ko
alias: char-major-9-*
license: GPL
description: SCSI tape (st) driver
author: Kai Makisara
srcversion: ED808DF94AF3058969A009D
depends: scsi_mod
vermagic: 2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
parm: buffer_kbs:Default driver buffer size for fixed block mode (KB; 32) (int)
parm: max_sg_segs:Maximum number of scatter/gather segments to use (256) (int)
parm: try_direct_io:Try direct I/O between user buffer and tape drive (1) (int)
parm: try_rdio:Try direct read i/o when possible (int)
parm: try_wdio:Try direct write i/o when possible (int)

# cat /etc/modprobe.conf
alias scsi_hostadapter aacraid
alias scsi_hostadapter1 sata_nv
alias scsi_hostadapter2 usb-storage
options st buffer_kbs=4096
options st try_direct_io=0
alias scsi_hostadapter mptspi
alias scsi_hostadapter1 mptfc
alias scsi_hostadapter2 mptsas

I've tried every possible block size, with and without compression. Sometimes st crashes and the mptbase errors go on for 30 to 40 minutes:

mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
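
When these messages repeat for half an hour, it helps to tally them instead of reading them line by line. The sketch below splits each LogInfo word into the fields the driver prints. It assumes the usual Fusion-MPT SAS layout (bus type in the top nibble, originator in the next nibble, code in bits 16-23, subcode in the low 16 bits). The function names are hypothetical.

```python
import re
from collections import Counter

ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}
# Matches both "LogInfo(0x...)" and "LogInfo (0x...)" as seen in the logs.
LOGINFO_RE = re.compile(r"LogInfo\s*\((0x[0-9a-fA-F]{8})\)")


def decode_loginfo(value: int) -> dict:
    """Split a 32-bit LogInfo word into its fields (assumed layout)."""
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,  # 0x3 is SAS
        "originator": ORIGINATORS.get(originator, hex(originator)),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


def tally_loginfo(log_text: str) -> Counter:
    """Count LogInfo events by (originator, code, subcode)."""
    counts = Counter()
    for match in LOGINFO_RE.finditer(log_text):
        info = decode_loginfo(int(match.group(1), 16))
        counts[(info["originator"], info["code"], info["subcode"])] += 1
    return counts


# Lines copied from the excerpts in this thread.
sample = """\
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501)
mptbase: ioc0: LogInfo (0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
"""
for (originator, code, subcode), count in tally_loginfo(sample).items():
    print(f"{originator} code=0x{code:02x} subcode=0x{subcode:04x}: {count}")
```

Feeding it the saved output of dmesg would show whether the PL (protocol layer) events or the IOP "Invalid Page" events dominate during a failure.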

# cat /proc/mpt/ioc0/info
ioc0:
ProductID = 0x2204 (LSISAS1068E B3)
FWVersion = 0x011a0300
MsgVersion = 0x0105
FirstWhoInit = 0x00
EventState = 0x00
CurrentHostMfaHighAddr = 0x00000000
CurrentSenseBufferHighAddr = 0x00000001
MaxChainDepth = 0x22 frames
MinBlockSize = 0x20 bytes
RequestFrames @ 0xffff810002402800 (Dma @ 0x0000000002402800)
{CurReqSz=128} x {CurReqDepth=336} = 43008 bytes ^= 0xb000
{MaxReqSz=128} {MaxReqDepth=336}
Frames @ 0xffff810002400000 (Dma @ 0x0000000002400000)
{CurRepSz=80} x {CurRepDepth=128} = 10240 bytes ^= 0x2880
{MaxRepSz=0} {MaxRepDepth=511}
MaxDevices = 255
MaxBuses = 4
PortNumber = 1 (of 1)

# cat /proc/scsi/sg/debug
dev_max(currently)=32 max_active_device=9 (origin 1)
def_reserved_size=32768
# cat /proc/scsi/sg/def_reserved_size
32768

I think something is wrong between mptlinux, the SCSI driver, and mt. It could be an I/O or block size problem, but it could also be the SAS cable, the SAS HBA, the expander, or the power supply. The tape drive has already been replaced.

Or maybe we should move to the newer RHEL 5.5 to use the 448, 920, and 960 SAS tape drives?