Operating System - Linux
1751907 Members
4674 Online
108783 Solutions
New Discussion юеВ

LTO-4 drive disappears after using tar and I/O error

 
Curtis Rempel
New Member

LTO-4 drive disappears after using tar and I/O error

I've got a very similar issue to the thread at http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1439572 with an EH919A - HP LTO4 ULTRIUM 1760 SAS Internal Drive attached to an HP SC44Ge on openSUSE 11.2.

The drive shows up as follows:

server3:~ # hwinfo --tape
30: SCSI 500.0: 10601 Tape
[Created at scsi.1452]
UDI: /org/freedesktop/Hal/devices/pci_1000_58_scsi_host_scsi_device_lun0_scsi_generic
Unique ID: Er1e.jH5QXkQpQf6
Parent ID: B35A.C24hfcjfg26
SysFS ID: /class/scsi_tape/st0
SysFS BusID: 5:0:0:0
SysFS Device Link: /devices/pci0000:00/0000:00:0a.0/0000:02:00.0/host5/port-5:0/end_device-5:0/target5:0:0/5:0:0:0
Hardware Class: unknown
Model: "HP Ultrium 4-SCSI"
Vendor: "HP"
Device: "Ultrium 4-SCSI"
Revision: "U52D"
Driver: "mptsas", "st"
Driver Modules: "mptsas"
Device File: /dev/st0 (/dev/sg3)
Device Files: /dev/st0, /dev/char/9:0, /dev/tape/by-id/scsi-3500110a0012cee22
Device Number: char 9:0 (char 21:3)
Config Status: cfg=no, avail=yes, need=no, active=unknown
Attached to: #26 (SCSI storage controller)

and mt reports:

server3:~ # mt -f /dev/st0 status
drive type = Generic SCSI-2 tape
drive status = 1174405120
sense key error = 0
residue count = 0
file number = -1
block number = -1
Tape block size 0 bytes. Density code 0x46 (unknown).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
server3:~ #

The device files present are:

server3:~ # ls -l /dev/*st0*
crw-rw---- 1 root tape 9, 128 Oct 30 12:05 /dev/nst0
crw-rw---- 1 root tape 9, 224 Oct 30 12:05 /dev/nst0a
crw-rw---- 1 root tape 9, 160 Oct 30 12:05 /dev/nst0l
crw-rw---- 1 root tape 9, 192 Oct 30 12:05 /dev/nst0m
crw-rw---- 1 root tape 9, 0 Oct 30 12:05 /dev/st0
crw-rw---- 1 root tape 9, 96 Oct 30 12:05 /dev/st0a
crw-rw---- 1 root tape 9, 32 Oct 30 12:05 /dev/st0l
crw-rw---- 1 root tape 9, 64 Oct 30 12:05 /dev/st0m

/proc/scsi/scsi reports:

server3:/proc/scsi # cat scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: TSSTcorp Model: DVD+-RW TS-H653G Rev: DW10
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 32 Lun: 00
Vendor: DP Model: BACKPLANE Rev: 1.07
Type: Enclosure ANSI SCSI revision: 05
Host: scsi4 Channel: 02 Id: 00 Lun: 00
Vendor: DELL Model: PERC 6/i Adapter Rev: 1.22
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: Ultrium 4-SCSI Rev: U52D
Type: Sequential-Access ANSI SCSI revision: 05

Small backups done using dd or tar complete without any problem. So far I've tested up to about 500MB. I'm not exactly sure yet, but somewhere in the small GB range, tar reports an I/O error on st0 and then the device files disappear - i.e. /dev/st0 is gone. The only way to get them back it seems is to not just reboot, but to power off, and wait a few minutes.

Here's what happens on a 5 GB test backup:

server3:~ # mt -f /dev/st0 status
drive type = Generic SCSI-2 tape
drive status = 1174405120
sense key error = 0
residue count = 0
file number = 0
block number = 0
Tape block size 0 bytes. Density code 0x46 (unknown).
Soft error count since last status=0
General status bits on (41010000):
BOT ONLINE IM_REP_EN
server3:~ # tar -cf /dev/st0 /home/bob/
tar: Removing leading `/' from member names
tar: /dev/st0: Cannot write: Input/output error
tar: Error is not recoverable: exiting now
server3:~ # echo $?
2
server3:~ # mt -f /dev/st0 status
mt: /dev/st0: rmtopen failed: No such device or address

and in fact /dev/st0 is now missing. I shut down udev (after power cycling the box to get the drive to show up again) to see what would happen - the device files now persist as expected, however, the error is the same and the drive is inaccessible.

After the I/O error is reported, the following (snippet) is seen in dmesg:

[191698.629375] st0: Block limits 1 - 16777215 bytes.
[218431.969364] mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
[218432.450842] mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
[218432.450875] st0: Error 20000 (driver bt 0x0, host bt 0x2).
[218434.696385] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
[218434.696414] st0: Error 80000 (driver bt 0x0, host bt 0x8).
[218434.696420] st0: Error on write filemark.
[218434.696441] st0: Error 10000 (driver bt 0x0, host bt 0x1).
[218434.696855] end_device-5:0: mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x500110a0012cee20
[218434.696860] phy-5:3: mptsas: ioc0: delete phy 3, phy-obj (0xffff88022f1d9000)
[218434.696870] port-5:0: mptsas: ioc0: delete port 0, sas_addr (0x500110a0012cee20)
[218434.697167] scsi target5:0:0: mptsas: ioc0: delete device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x500110a0012cee20
[218492.093127] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x500110a0012cee20
[218512.709936] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218512.709943] scsi 5:0:1:0: CDB: Inquiry: 12 00 00 00 24 00
[218517.033156] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218517.033308] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218546.973836] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218546.973843] scsi 5:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[218551.464821] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218551.464998] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218551.465008] mptscsih: ioc0: attempting target reset! (sc=ffff88022f25c5c0)
[218551.465014] scsi 5:0:1:0: CDB: Inquiry: 12 00 00 00 24 00
[218552.463027] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88022f25c5c0)
[218582.403452] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218582.403459] scsi 5:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[218586.894622] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218586.894814] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218586.894824] mptscsih: ioc0: attempting bus reset! (sc=ffff88022f25c5c0)
[218586.894830] scsi 5:0:1:0: CDB: Inquiry: 12 00 00 00 24 00
[218587.892803] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff88022f25c5c0)
[218627.814237] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218627.814245] scsi 5:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[218632.304679] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218632.304865] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218632.304886] mptscsih: ioc0: attempting host reset! (sc=ffff88022f25c5c0)
[218632.304893] mptbase: ioc0: Initiating recovery
[218686.364004] INFO: task mpt/0:147 blocked for more than 120 seconds.
[218686.364014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[218686.364021] mpt/0 D 0000000000000002 0 147 2 0x00000000
[218686.364031] ffff88022c7ef5d0 0000000000000046 ffff88022c7ef550 0000000000013a00
[218686.364041] 0000000000013a00 ffff88022cbd6c78 0000000000013a00 0000000000013a00
[218686.364053] 0000000000013a00 0000000000013a00 ffff88022cbd6c78 0000000000013a00
[218686.364059] Call Trace:
[218686.364072] [] schedule_timeout+0x1c5/0x220
[218686.364079] [] wait_for_common+0xca/0x1a0
[218686.364086] [] wait_for_completion+0x2b/0x50
[218686.364094] [] blk_execute_rq+0x87/0xe0
[218686.364102] [] scsi_execute+0x121/0x1a0
[218686.364109] [] scsi_execute_req+0xc7/0x210
[218686.364115] [] scsi_probe_lun+0x11b/0x450
[218686.364121] [] scsi_probe_and_add_lun+0x197/0x5b0
[218686.364128] [] __scsi_scan_target+0xe6/0x210
[218686.364134] [] scsi_scan_target+0xdf/0xf0
[218686.364148] [] sas_rphy_add+0x12a/0x180 [scsi_transport_sas]
[218686.364159] [] mptsas_add_end_device+0x109/0x140 [mptsas]
[218686.364168] [] mptsas_hotplug_work+0x288/0x4c0 [mptsas]
[218686.364178] [] mptsas_send_sas_event+0xc8/0x100 [mptsas]
[218686.364188] [] mptsas_firmware_event_work+0x188/0x210 [mptsas]
[218686.364198] [] run_workqueue+0x9a/0x1e0


Presumably, this is due to the device file having disappeared.

Does anybody have any thoughts or suggestions on how to troubleshoot this further? One potential avenue I might explore is using the HP supplied driver for SLES as opposed to the out of the box driver in openSUSE.

Many thanks in advance!
3 REPLIES 3
ericbosba
New Member

Re: LTO-4 drive disappears after using tar and I/O error

My issue seems to be the same as you. I'm using a StorageWorks G2 Tape Autoloader. Files under 500Ko are written, but larger report an I/O error.
dmesg give :

st0: Block limits 1 - 16777215 bytes.
st0: Sense Key : Aborted Command [current]
st0: Add. Sense: Data phase CRC error detected
st0: Sense Key : Aborted Command [current]
st0: Add. Sense: Data phase CRC error detected

And messages log :

kernel: INFO: task mt:6510 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: mt D 0000d350 0 6510 6411 0x00000080
kernel: f4569d98 00000086 652080d3 0000d350 00006791 00000000 f2c34ec4 00000000
kernel: c0a81354 c0a85e60 f2c34ec4 c0a85e60 c0a85e60 f5cda000 0dd4af05 c3e84354
kernel: 00000001 00000000 0000d350 f2c34c20 f6979af0 00000000 00000000 00000000
kernel: Call Trace:
kernel: [] schedule_timeout+0x27/0xb2
kernel: [] ? mptspi_qcmd+0x78/0x7d [mptspi]
kernel: [] ? kobject_put+0x37/0x3c
kernel: [] ? put_device+0x14/0x16
kernel: [] ? scsi_request_fn+0x3be/0x46a
kernel: [] wait_for_common+0xc3/0x10d
kernel: [] ? default_wake_function+0x0/0x12
kernel: [] wait_for_completion+0x17/0x19
kernel: [] st_do_scsi.clone.0+0x2a8/0x2cd [st]
kernel: [] st_int_ioctl+0x5c9/0x9dd [st]
kernel: [] st_ioctl+0xa69/0xd7b [st]
kernel: [] ? file_has_perm+0x89/0xa3
kernel: [] ? st_ioctl+0x0/0xd7b [st]
kernel: [] vfs_ioctl+0x1d/0x76
kernel: [] do_vfs_ioctl+0x493/0x4d1
kernel: [] ? selinux_file_ioctl+0x43/0x46
kernel: [] sys_ioctl+0x46/0x66
kernel: [] sysenter_do_call+0x12/0x28

I'm using fedora 12.
Do you have an idea ?
Sorry for my bad english.
Thanks
Curtis Rempel
New Member

Re: LTO-4 drive disappears after using tar and I/O error

While there is a similar error, it is a different hardware and software configuration. Therefore, just as I did, I would ask that you start a separate thread so as not to confuse the two issues.

I will continue post on this thread as I have updates.

Thank you
Curtis Rempel
New Member

Re: LTO-4 drive disappears after using tar and I/O error

After much testing, the drive was declared to be suspect and sent in for replacement. The replacement unit works just fine and so it was a hardware issue after all.