
Re: Tar command issue with Ultrium LTO-4 1760 on SAS bus

 
SOLVED
Curtis Rempel
New Member

Just wondering if you've got any resolution to this matter yet. I've got a very similar issue with an EH919A - HP LTO4 ULTRIUM 1760 SAS Internal Drive attached to an HP SC44Ge on openSUSE 11.2.

The drive shows up as follows:

server3:~ # hwinfo --tape
30: SCSI 500.0: 10601 Tape
[Created at scsi.1452]
UDI: /org/freedesktop/Hal/devices/pci_1000_58_scsi_host_scsi_device_lun0_scsi_generic
Unique ID: Er1e.jH5QXkQpQf6
Parent ID: B35A.C24hfcjfg26
SysFS ID: /class/scsi_tape/st0
SysFS BusID: 5:0:0:0
SysFS Device Link: /devices/pci0000:00/0000:00:0a.0/0000:02:00.0/host5/port-5:0/end_device-5:0/target5:0:0/5:0:0:0
Hardware Class: unknown
Model: "HP Ultrium 4-SCSI"
Vendor: "HP"
Device: "Ultrium 4-SCSI"
Revision: "U52D"
Driver: "mptsas", "st"
Driver Modules: "mptsas"
Device File: /dev/st0 (/dev/sg3)
Device Files: /dev/st0, /dev/char/9:0, /dev/tape/by-id/scsi-3500110a0012cee22
Device Number: char 9:0 (char 21:3)
Config Status: cfg=no, avail=yes, need=no, active=unknown
Attached to: #26 (SCSI storage controller)

and mt reports:

server3:~ # mt -f /dev/st0 status
drive type = Generic SCSI-2 tape
drive status = 1174405120
sense key error = 0
residue count = 0
file number = -1
block number = -1
Tape block size 0 bytes. Density code 0x46 (unknown).
Soft error count since last status=0
General status bits on (1010000):
ONLINE IM_REP_EN
server3:~ #
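
The raw numbers in this mt output can be decoded by hand. A minimal sketch in Python, assuming the bit layout defined in the Linux `<linux/mtio.h>` header (density code in the top byte of the drive status word, block size in the low 24 bits) and a subset of its `GMT_*` flag values; the function names are illustrative only:

```python
# Masks as defined in <linux/mtio.h> (assumed unchanged on this system).
MT_ST_BLKSIZE_MASK = 0x00FFFFFF
MT_ST_DENSITY_MASK = 0xFF000000
MT_ST_DENSITY_SHIFT = 24

# Subset of the GMT_* general status flags, highest bit first.
GMT_FLAGS = {
    0x80000000: "EOF",
    0x40000000: "BOT",
    0x20000000: "EOT",
    0x08000000: "EOD",
    0x04000000: "WR_PROT",
    0x01000000: "ONLINE",
    0x00040000: "DR_OPEN",
    0x00010000: "IM_REP_EN",
}


def decode_drive_status(dsreg):
    """Split mt's 'drive status' word into (density_code, block_size)."""
    density = (dsreg & MT_ST_DENSITY_MASK) >> MT_ST_DENSITY_SHIFT
    block_size = dsreg & MT_ST_BLKSIZE_MASK
    return density, block_size


def decode_general_status(gstat):
    """Return the names of the known flags set in the general status word."""
    return [name for bit, name in GMT_FLAGS.items() if gstat & bit]


density, block_size = decode_drive_status(1174405120)
print(hex(density), block_size)
print(decode_general_status(0x1010000))
```

Under those assumptions, 1174405120 is 0x46000000: density code 0x46 with block size 0 (variable-block mode), which agrees with the "Tape block size 0 bytes. Density code 0x46" line. 0x46 is the density code LTO-4 drives normally report, so the "(unknown)" label probably only means this mt build's density table does not list it.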

The device files present are:

server3:~ # ls -l /dev/*st0*
crw-rw---- 1 root tape 9, 128 Oct 30 12:05 /dev/nst0
crw-rw---- 1 root tape 9, 224 Oct 30 12:05 /dev/nst0a
crw-rw---- 1 root tape 9, 160 Oct 30 12:05 /dev/nst0l
crw-rw---- 1 root tape 9, 192 Oct 30 12:05 /dev/nst0m
crw-rw---- 1 root tape 9, 0 Oct 30 12:05 /dev/st0
crw-rw---- 1 root tape 9, 96 Oct 30 12:05 /dev/st0a
crw-rw---- 1 root tape 9, 32 Oct 30 12:05 /dev/st0l
crw-rw---- 1 root tape 9, 64 Oct 30 12:05 /dev/st0m
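
The minor numbers in this listing follow the st driver's usual scheme. A small sketch, assuming the driver's default of two mode bits (bit 7 selects no-rewind, bits 5-6 select the mode, the low five bits are the drive number); `decode_st_minor` is a hypothetical helper, and it ignores the extended minor range used for more than 32 drives:

```python
MODE_SUFFIXES = ["", "l", "m", "a"]  # mode 0..3 as used in the device names


def decode_st_minor(minor):
    """Map an st minor number (0-255) to the conventional device name."""
    no_rewind = bool(minor & 0x80)   # bit 7: non-rewinding device
    mode = (minor >> 5) & 0x3        # bits 5-6: driver mode
    unit = minor & 0x1F              # bits 0-4: drive number
    prefix = "nst" if no_rewind else "st"
    return f"{prefix}{unit}{MODE_SUFFIXES[mode]}"


for minor in (0, 32, 64, 96, 128, 160, 192, 224):
    print(minor, decode_st_minor(minor))
```

All eight nodes listed above map back to drive 0, so they are the same physical drive opened with different rewind and mode settings, which is why they all vanish together when the drive drops off the bus.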

/proc/scsi/scsi reports:

server3:/proc/scsi # cat scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: TSSTcorp Model: DVD+-RW TS-H653G Rev: DW10
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi4 Channel: 00 Id: 32 Lun: 00
Vendor: DP Model: BACKPLANE Rev: 1.07
Type: Enclosure ANSI SCSI revision: 05
Host: scsi4 Channel: 02 Id: 00 Lun: 00
Vendor: DELL Model: PERC 6/i Adapter Rev: 1.22
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi5 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: Ultrium 4-SCSI Rev: U52D
Type: Sequential-Access ANSI SCSI revision: 05

Small backups done using dd or tar complete without any problem; so far I've tested up to about 500 MB. I haven't pinned down the exact threshold yet, but somewhere in the low-GB range tar reports an I/O error on st0 and then the device files disappear, i.e. /dev/st0 is gone. The only way to get them back, it seems, is not just to reboot but to power off and wait a few minutes.

After the I/O error is reported, the following (snippet) is seen in dmesg:

[191698.629375] st0: Block limits 1 - 16777215 bytes.
[218431.969364] mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
[218432.450842] mptbase: ioc0: LogInfo(0x31170000): Originator={PL}, Code={IO Device Missing Delay Retry}, SubCode(0x0000)
[218432.450875] st0: Error 20000 (driver bt 0x0, host bt 0x2).
[218434.696385] mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
[218434.696414] st0: Error 80000 (driver bt 0x0, host bt 0x8).
[218434.696420] st0: Error on write filemark.
[218434.696441] st0: Error 10000 (driver bt 0x0, host bt 0x1).
[218434.696855] end_device-5:0: mptsas: ioc0: removing ssp device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x500110a0012cee20
[218434.696860] phy-5:3: mptsas: ioc0: delete phy 3, phy-obj (0xffff88022f1d9000)
[218434.696870] port-5:0: mptsas: ioc0: delete port 0, sas_addr (0x500110a0012cee20)
[218434.697167] scsi target5:0:0: mptsas: ioc0: delete device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x500110a0012cee20
[218492.093127] mptsas: ioc0: attaching ssp device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x500110a0012cee20
[218512.709936] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218512.709943] scsi 5:0:1:0: CDB: Inquiry: 12 00 00 00 24 00
[218517.033156] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218517.033308] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218546.973836] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218546.973843] scsi 5:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[218551.464821] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218551.464998] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218551.465008] mptscsih: ioc0: attempting target reset! (sc=ffff88022f25c5c0)
[218551.465014] scsi 5:0:1:0: CDB: Inquiry: 12 00 00 00 24 00
[218552.463027] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88022f25c5c0)
[218582.403452] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218582.403459] scsi 5:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[218586.894622] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218586.894814] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218586.894824] mptscsih: ioc0: attempting bus reset! (sc=ffff88022f25c5c0)
[218586.894830] scsi 5:0:1:0: CDB: Inquiry: 12 00 00 00 24 00
[218587.892803] mptscsih: ioc0: bus reset: SUCCESS (sc=ffff88022f25c5c0)
[218627.814237] mptscsih: ioc0: attempting task abort! (sc=ffff88022f25c5c0)
[218627.814245] scsi 5:0:1:0: CDB: Test Unit Ready: 00 00 00 00 00 00
[218632.304679] mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
[218632.304865] mptscsih: ioc0: task abort: SUCCESS (sc=ffff88022f25c5c0)
[218632.304886] mptscsih: ioc0: attempting host reset! (sc=ffff88022f25c5c0)
[218632.304893] mptbase: ioc0: Initiating recovery
[218686.364004] INFO: task mpt/0:147 blocked for more than 120 seconds.
[218686.364014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[218686.364021] mpt/0 D 0000000000000002 0 147 2 0x00000000
[218686.364031] ffff88022c7ef5d0 0000000000000046 ffff88022c7ef550 0000000000013a00
[218686.364041] 0000000000013a00 ffff88022cbd6c78 0000000000013a00 0000000000013a00
[218686.364053] 0000000000013a00 0000000000013a00 ffff88022cbd6c78 0000000000013a00
[218686.364059] Call Trace:
[218686.364072] [] schedule_timeout+0x1c5/0x220
[218686.364079] [] wait_for_common+0xca/0x1a0
[218686.364086] [] wait_for_completion+0x2b/0x50
[218686.364094] [] blk_execute_rq+0x87/0xe0
[218686.364102] [] scsi_execute+0x121/0x1a0
[218686.364109] [] scsi_execute_req+0xc7/0x210
[218686.364115] [] scsi_probe_lun+0x11b/0x450
[218686.364121] [] scsi_probe_and_add_lun+0x197/0x5b0
[218686.364128] [] __scsi_scan_target+0xe6/0x210
[218686.364134] [] scsi_scan_target+0xdf/0xf0
[218686.364148] [] sas_rphy_add+0x12a/0x180 [scsi_transport_sas]
[218686.364159] [] mptsas_add_end_device+0x109/0x140 [mptsas]
[218686.364168] [] mptsas_hotplug_work+0x288/0x4c0 [mptsas]
[218686.364178] [] mptsas_send_sas_event+0xc8/0x100 [mptsas]
[218686.364188] [] mptsas_firmware_event_work+0x188/0x210 [mptsas]
[218686.364198] [] run_workqueue+0x9a/0x1e0


Presumably, this is due to the device file having disappeared.
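
The "st0: Error ..." and "LogInfo(...)" values in the dmesg snippet can be split into fields to see what the driver is reporting. A rough sketch, assuming the classic Linux SCSI result layout (status, message, host and driver bytes from low to high) and the 4/4/8/16-bit LogInfo split used by the mpt driver; the `DID_*` names are listed from memory of the kernel's SCSI headers, so treat them as an assumption to check against your kernel source:

```python
# Host byte codes (assumed from the kernel SCSI headers).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def host_byte_name(result):
    """Name the host byte of a result word, or show it in hex if unknown."""
    host = decode_scsi_result(result)["host"]
    return HOST_BYTE_NAMES.get(host, hex(host))


def split_mpt_loginfo(loginfo):
    """Split an mpt LogInfo word into bus type, originator, code, subcode."""
    return {
        "bus_type": (loginfo >> 28) & 0xF,
        "originator": (loginfo >> 24) & 0xF,
        "code": (loginfo >> 16) & 0xFF,
        "subcode": loginfo & 0xFFFF,
    }


# The st driver prints the result word in hex, so "Error 20000" is 0x20000.
for result in (0x20000, 0x80000, 0x10000):
    print(hex(result), host_byte_name(result))
print(split_mpt_loginfo(0x31170000))
```

If those names are right, the three st0 errors progress from a busy bus, to a reset, to no connection at all, which fits the controller log showing the SAS end device being removed during the write rather than the tape itself returning a media error.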

Does anybody have any thoughts or suggestions on how to troubleshoot this further?

Many thanks in advance!
JACC
Advisor

Curtis, up to this point HP has not resolved the issue. In my case the tape drive is attached to a P212 SAS card. HP's initial thought is that this is a known issue: larger blocking factors are not supported on the native P212 card. The issue was fixed and verified on Red Hat, but there is no record of it having been verified on SUSE.

In order to support a blocking factor larger than 248, the P212 card requires that a 256 MB cache module be attached as well. HP is now in the process of verifying the issue in their support lab. The issue has not been resolved yet, but we are getting closer.
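
For a sense of scale, tar's blocking factor counts 512-byte blocks per record, so the limit of 248 translates directly into a maximum write size. A small illustrative calculation (the helper name is made up for this example):

```python
TAR_BLOCK_BYTES = 512  # tar's fixed block size


def record_bytes(blocking_factor):
    """Size in bytes of one tar record for a given blocking factor."""
    return blocking_factor * TAR_BLOCK_BYTES


for factor in (20, 248, 256):
    size = record_bytes(factor)
    print(f"blocking factor {factor}: {size} bytes ({size / 1024:g} KiB)")
```

A blocking factor of 248 is 126976 bytes (124 KiB) per write, and GNU tar's default of 20 is 10240 bytes, so the failures described here only appear once each write to the drive grows past 124 KiB.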

As for what else could be done, I would suggest running the ADU/LTT on the card and opening a support call with HP. HP will require you to provide the ADU/LTT output to resolve the issue.

Lastly, I would greatly appreciate not confusing the current issue with yours. It seems it would be appropriate that you start a new thread describing your issue in detail and reference my thread if you feel it would help. Your issue is on a different OS with a different card.

I will post the resolution in detail as soon as the issue has been resolved.

Thanks

JACC
Curtis Rempel
New Member

Thanks for your reply and the update.

Indeed, I should have started a new thread despite the similarities between the two problems; my bad, and my apologies. I will do just that.

In the meantime, I will continue to monitor this thread. Thank you for any updates you are able to provide, as they may help resolve the problem I'm experiencing.

Cheers!
JACC
Advisor

From HP Support:

"the lab has been able to duplicate this issue- which is good cause now they can figure out why & get a fix. No ETA for it but will keep you in the loop"

Will update when issue comes to final conclusion.

JACC
JACC
Advisor

It is hard to believe, but 6 months and 10 days later the issue is now resolved. The following configuration will allow for a "gtar" tape blocking factor larger than "248".

Driver Version required: 3.6.26-5 cciss

Hardware Requirement (will not work without cache module):

HP Smart Array P212 Controller AN975A Part# 462594-001

256 MB Cache Module Part # 462974-001

Battery Kit (for Battery Backed Write Cache) Part # 462976-001


Thanks to all for your patience !