Operating System - HP-UX

Carlos Zoller
Frequent Advisor

Random SCSI errors on ULTRIUM I Tape Unit


Hello, I have an Ultrium-1 (200 GB) tape unit installed on an HP server (model 9000/800/L3000-7x), and it is randomly issuing SCSI errors to syslog.log.

To explain better: last week, when I issued some commands to check the status, these were the outputs:

mmscdb02:/ #ioscan -fnC tape
Class I H/W Path Driver S/W State H/W Type Description
=====================================================================
tape 0 0/0/1/0.2.0 stape NO_HW DEVICE HP Ultrium 1-SCSI
/dev/rmt/0m /dev/rmt/c0t2d0BESTn
/dev/rmt/0mb /dev/rmt/c0t2d0BESTnb
/dev/rmt/0mn /dev/rmt/c0t2d0DDS
/dev/rmt/0mnb /dev/rmt/c0t2d0DDSb
/dev/rmt/c0t2d0BEST /dev/rmt/c0t2d0DDSn
/dev/rmt/c0t2d0BESTb /dev/rmt/c0t2d0DDSnb


mmscdb02:/ #mt status
No tape loaded

Then, a couple of days ago, the situation suddenly changed. Here are some outputs from today:

mmscdb02:/ #ioscan -fnC tape
Class I H/W Path Driver S/W State H/W Type Description
=====================================================================
tape 0 0/0/1/0.2.0 stape CLAIMED DEVICE HP Ultrium 1-SCSI
/dev/rmt/0m /dev/rmt/c0t2d0BEST /dev/rmt/c0t2d0DDS
/dev/rmt/0mb /dev/rmt/c0t2d0BESTb /dev/rmt/c0t2d0DDSb
/dev/rmt/0mn /dev/rmt/c0t2d0BESTn /dev/rmt/c0t2d0DDSn
/dev/rmt/0mnb /dev/rmt/c0t2d0BESTnb /dev/rmt/c0t2d0DDSnb
mmscdb02:/ #date
Fri May 18 10:40:14 SAT 2007

mmscdb02:/ #mt status
Drive: HP Ultrium 1-SCSI
Format:
Status: [0]
File: 0
Block: 0

===============================================================================================

I am the support engineer for the solution running on this platform. I connect to the machine remotely, so I can't check whether the tape unit is turned on, whether there is a tape inside, whether the SCSI cables are connected properly, etc. The customer also refuses to restart the machine, even though it is part of a cluster and no service interruption would occur. The customer also sent me an ioscan output from very early this morning, and it seems the hardware went into NO_HW state:

mmscdb02:/ #ioscan -fnC tape
Class I H/W Path Driver S/W State H/W Type Description
=====================================================================
tape 0 0/0/1/0.2.0 stape NO_HW DEVICE HP Ultrium 1-SCSI
/dev/rmt/0m /dev/rmt/c0t2d0BESTn
/dev/rmt/0mb /dev/rmt/c0t2d0BESTnb
/dev/rmt/0mn /dev/rmt/c0t2d0DDS
/dev/rmt/0mnb /dev/rmt/c0t2d0DDSb
/dev/rmt/c0t2d0BEST /dev/rmt/c0t2d0DDSn
/dev/rmt/c0t2d0BESTb /dev/rmt/c0t2d0DDSnb

mmscdb02:/ # date
Fri May 18 04:17:05 SAT 2007
mmscdb02:/ #

The only fact I have about this is that there was a DDS tape drive before this one, and the two were exchanged ONLINE, without rebooting the system.

What else can I check? Could it be a driver problem? How can I be sure? I know I will have to involve HP support soon, but I only want to involve them once I'm sure that this is a hardware problem and not something else.

Any help will be highly appreciated.

Regards,
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Make sure that it is not a shared tape drive. One situation that comes to mind is a single tape drive being shared by a few servers. The site admins followed a very undisciplined backup schedule, i.e., depending on the server that needed to be backed up, they would simply move the SCSI tape cable on the back end from one server to another. Running ioscan would change the status of the tape drive on both the plugged and the unplugged server. That changed when they ran into serious problems and went ahead and bought one tape drive per server and adhered to a regimented backup schedule.

~hope it helps
A. Clay Stephenson
Acclaimed Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

This line says it all:

tape 0 0/0/1/0.2.0 stape NO_HW DEVICE HP Ultrium 1-SCSI

This isn't a driver problem; it's a hardware problem, but it isn't clear where the problem actually lies. All you know is that the device failed to respond properly to a SCSI INQUIRY command.
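Clay's reading matches the syslog excerpt Carlos posts further down the thread, where every stuck request shows "scb->cdb: 12 00 00 00 80 00". A minimal POSIX sh sketch for decoding such a CDB, assuming opcode 0x12 is INQUIRY with its allocation length in bytes 3-4 (byte 3 is 00 here, so older and newer SPC revisions agree). The function name decode_cdb is hypothetical and the opcode table deliberately covers only a few commands:

```sh
#!/bin/sh
# decode_cdb (hypothetical helper): name the command in a 6-byte SCSI CDB
# given as six hex bytes, as printed on the "scb->cdb:" syslog lines.
decode_cdb() {
    case $1 in
        00) name="TEST UNIT READY" ;;
        03) name="REQUEST SENSE" ;;
        12) name="INQUIRY" ;;
        *)  name="opcode 0x$1" ;;
    esac
    if [ "$1" = 12 ]; then
        # INQUIRY: allocation length in bytes 3-4 (byte 3 was reserved in
        # older SPC revisions and is 00 in this log anyway).
        alloc=$(( (0x$4 << 8) | 0x$5 ))
        echo "$name, allocation length $alloc bytes"
    else
        echo "$name"
    fi
}

# The CDB carried by every failing request in the syslog excerpt:
decode_cdb 12 00 00 00 80 00   # prints: INQUIRY, allocation length 128 bytes
```

In other words, the kernel was only asking the drive to identify itself, and the drive (or the bus to it) never answered.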

You now need to do the standard SCSI stuff:
1) Is the bus terminated in EXACTLY two places -- at the physical ends of the bus?
2) Is at least one device on the bus supplying termination power?
3) Does the total length of the bus exceed the maximum?
4) Are all the connections tight, with no broken/bent pins?

Surprisingly, an unterminated bus will often work almost perfectly; that's the worst kind of problem. I would next try replacing the terminators.

5) Finally, you are now down to either a bad tape drive or a bad HBA.
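Check 3) is plain arithmetic once the cable runs are measured. A minimal POSIX sh sketch, assuming the commonly cited SCSI Parallel Interface limits (3 m for Fast SE; 3 m for Ultra SE with up to four devices and 1.5 m beyond that; 12 m for multi-drop LVD and 25 m point-to-point; 25 m for HVD). Those figures, the bus-type keywords and the function names are assumptions to verify against the HBA and drive documentation:

```sh
#!/bin/sh
# max_bus_cm (hypothetical): commonly cited SPI cable-length limits in cm.
#   $1 = bus type: se_fast | se_ultra | lvd | hvd
#   $2 = number of devices on the bus, counting the HBA
# Note: a single SE device on an LVD bus drops the whole bus to SE mode,
# so the SE limits then apply.
max_bus_cm() {
    case $1 in
        se_fast)  echo 300 ;;
        se_ultra) if [ "$2" -le 4 ]; then echo 300; else echo 150; fi ;;
        lvd)      if [ "$2" -le 2 ]; then echo 2500; else echo 1200; fi ;;
        hvd)      echo 2500 ;;
        *)        echo 0 ;;   # unknown type: everything reports as too long
    esac
}

# check_bus_length TYPE NDEV SEG_CM... (hypothetical): add up every cable
# segment, including internal runs inside enclosures, and compare.
check_bus_length() {
    bustype=$1; ndev=$2; shift 2
    total=0
    for seg in "$@"; do
        total=$((total + seg))
    done
    max=$(max_bus_cm "$bustype" "$ndev")
    if [ "$total" -le "$max" ]; then
        echo "OK: ${total} cm <= ${max} cm"
    else
        echo "TOO LONG: ${total} cm > ${max} cm"
    fi
}

# Hypothetical example: Ultra SE bus, HBA + 2 devices, two cable segments.
check_bus_length se_ultra 3 150 90   # prints: OK: 240 cm <= 300 cm
```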

It's time to get some boots on the ground and do hands-on diagnosis.



Steven E. Protter
Exalted Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Shalom,

Tape drives work best with dedicated SCSI cards.

It's either the drive, the cabling or the card. It needs to be carefully checked.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Thank you all for your answers, especially Clay for being so detailed.

Now I am thinking of performing some stress tests: I will ask for a blank tape to be put in, and I'll start writing to the tape unit, reading the tape, rewinding it, etc., at least two or three times, to see if the problem disappears. What do you think?
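The write/read/rewind plan above can be scripted with nothing but dd and cmp. A minimal POSIX sh sketch, assuming a scratch tape (it is overwritten) and the rewind device /dev/rmt/0m from the ioscan output, so that each close of the device rewinds the tape. The names verify_pass and stress, the 64 KB block size and the pattern size are hypothetical choices:

```sh
#!/bin/sh
# One write / read back / compare pass. DEV must be a rewind device so the
# read starts again at the beginning of the tape. Data on the tape is lost.
verify_pass() {
    dev=$1; pattern=$2; readback=$3
    dd if="$pattern" of="$dev" bs=64k 2>/dev/null || return 1   # write
    dd if="$dev" of="$readback" bs=64k 2>/dev/null || return 1  # read back
    cmp -s "$pattern" "$readback"                                # compare
}

# stress DEV PASSES: repeat verify_pass and stop at the first failure.
stress() {
    dev=$1; passes=$2
    tmp=${TMPDIR:-/tmp}/tapetest.$$
    # Deterministic test pattern of a few hundred KB; scale it up for a real
    # test so each pass keeps the drive busy for a while.
    i=0
    while [ "$i" -lt 5000 ]; do
        echo "pattern line $i abcdefghijklmnopqrstuvwxyz0123456789"
        i=$((i + 1))
    done > "$tmp.pat"
    n=1
    rc=0
    while [ "$n" -le "$passes" ]; do
        if verify_pass "$dev" "$tmp.pat" "$tmp.rd"; then
            echo "pass $n: OK"
        else
            echo "pass $n: FAILED"
            rc=1
            break
        fi
        n=$((n + 1))
    done
    rm -f "$tmp.pat" "$tmp.rd"
    return $rc
}

# Usage on the server, three passes:
#   stress /dev/rmt/0m 3
```

A clean run only shows the fault did not occur during the test; with an intermittent bus problem, that is not proof the hardware is fine.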

What I am afraid of is that I call HP and no errors happen. How can we find where the problem is if it doesn't happen while testing? Somehow I need to be able to reproduce the problem, if it exists, of course! What is breaking my mind is why this is happening so randomly, without any pattern.
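Because the failures come and go, logging the drive's S/W state over time may reveal the pattern that seems to be missing. A sketch in POSIX sh, assuming the whitespace-separated column order of the ioscan -fnC tape output shown earlier (field 3 = H/W path, field 5 = S/W state). The names tape_states and watch_tape and the log path are hypothetical, and only the parsing half can be tried away from the server:

```sh
#!/bin/sh
# tape_states (hypothetical): read "ioscan -fnC tape" output on stdin and
# print "<H/W path> <S/W state>" for every tape line (fields 3 and 5).
tape_states() {
    awk '$1 == "tape" { print $3, $5 }'
}

# watch_tape (hypothetical): poll and log each state change with a timestamp,
# so NO_HW/CLAIMED flips can be lined up with backups, cron jobs or syslog.
# Caution: a full ioscan probes the bus (the syslog excerpt in this thread was
# produced while the customer ran an ioscan), so keep the interval long.
watch_tape() {
    interval=${1:-900}
    last=""
    while :; do
        now=$(ioscan -fnC tape | tape_states)
        if [ "$now" != "$last" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $now"
            last=$now
        fi
        sleep "$interval"
    done
}

# Usage on the server, e.g.:  watch_tape 900 >> /var/tmp/tape_state.log &
```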

Any more opinions?

Thanks.
Andrew Young_2
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Hi.

I would check your termination and cabling. I suspect either a terminator (although standalone LTO Ultrium drives are usually self-terminating) or, more likely, a cable problem; check that the cables are properly seated at both ends. We have had more than enough problems with the older SCSI connectors over the years.

It could also be a driver issue. I would change the SCSI address of the drive (if I read it correctly, it's currently address 2), power-cycle the drive and run ioscan -fnC tape. If the ioscan doesn't show any device files for it, run insf -eC tape to install the correct device files.

Regards

AY
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Could you post the errors that you are seeing in the syslog file here? That might help in narrowing down or isolating the cause of the problem.
spex
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Hello,

Unfortunately, intermittent problems are the most difficult ones to diagnose and resolve. In my experience, SCSI cables and terminators rarely go bad. This essentially leaves the drive, HBA, and its power source as possible culprits. If you have access to duplicate hardware, you could try swapping components out, one by one, running tests on the drive, swapping them back, and repeating. Of course, this method requires the problem to be reproducible, such that you know it will occur within a fixed period of time or after a certain number of trials.

If you don't have physical access to the site, you could have an HP CE do the same thing. Simply explain to the engineer on the phone that the problem is intermittent, and there's no guarantee it will show itself while he's on the line, but that it needs to be corrected. Based on the symptoms, they should dispatch someone.

That's what you are paying for in your service agreement.

PCS
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

This is the last set of messages that occurred this morning when the customer issued an ioscan:

May 18 04:12:10 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906729446, bus: 0
May 18 04:12:10 mmscdb02 vmunix: lbp->state: 3060
May 18 04:12:10 mmscdb02 vmunix: lbp->offset: 40
May 18 04:12:10 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:10 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:10 mmscdb02 vmunix: ISTAT: 29, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 0000000a
May 18 04:12:10 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:10 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:10 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:10 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:10 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:10 mmscdb02 vmunix: lbolt_at_timeout: 906729346, lbolt_at_start: 906728846
May 18 04:12:10 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:10 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:10 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:10 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:10 mmscdb02 vmunix: 980dff00 0000000a 78351000 00000000
May 18 04:12:10 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:10 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:10 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:11 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906729546, bus: 0
May 18 04:12:11 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906729546, bus: 0
May 18 04:12:22 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906730646, bus: 0
May 18 04:12:22 mmscdb02 vmunix: lbp->state: 1060
May 18 04:12:22 mmscdb02 vmunix: lbp->offset: f8
May 18 04:12:22 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:22 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:22 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:12:22 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:22 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:22 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:22 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:22 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:22 mmscdb02 vmunix: lbolt_at_timeout: 906730546, lbolt_at_start: 906730046
May 18 04:12:22 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:22 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:22 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:22 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:22 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:12:22 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:22 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:22 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:23 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906730746, bus: 0
May 18 04:12:23 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906730746, bus: 0
May 18 04:12:34 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906731846, bus: 0
May 18 04:12:34 mmscdb02 vmunix: lbp->state: 1060
May 18 04:12:34 mmscdb02 vmunix: lbp->offset: f8
May 18 04:12:34 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:34 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:34 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:12:34 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:34 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:34 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:34 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:34 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:34 mmscdb02 vmunix: lbolt_at_timeout: 906731746, lbolt_at_start: 906731246
May 18 04:12:34 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:34 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:34 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:34 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:34 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:12:34 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:34 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:34 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:35 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906731946, bus: 0
May 18 04:12:35 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906731946, bus: 0
May 18 04:12:46 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906733046, bus: 0
May 18 04:12:46 mmscdb02 vmunix: lbp->state: 1060
May 18 04:12:46 mmscdb02 vmunix: lbp->offset: f8
May 18 04:12:46 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:46 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:46 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:12:46 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:46 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:46 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:46 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:46 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:46 mmscdb02 vmunix: lbolt_at_timeout: 906732946, lbolt_at_start: 906732446
May 18 04:12:46 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:46 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:46 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:46 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:46 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:12:46 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:46 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:46 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:47 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906733146, bus: 0
May 18 04:12:47 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906733146, bus: 0
May 18 04:12:52 mmscdb02 vmunix: SCSI: Unhandled interrupt -- lbolt: 906733648, dev: cb002002
May 18 04:12:52 mmscdb02 vmunix: lbp->state: 2060
May 18 04:12:52 mmscdb02 vmunix: lbp->offset: ffffffff
May 18 04:12:52 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:52 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:52 mmscdb02 vmunix: ISTAT: 0a, SIST0: c1, SIST1: 00, DSTAT: 80, DSPS: 00330200
May 18 04:12:52 mmscdb02 vmunix: lsp: 000000004f9a5900
May 18 04:12:52 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:52 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:52 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:52 mmscdb02 vmunix: lbolt_at_timeout: 906734146, lbolt_at_start: 906733646
May 18 04:12:52 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:52 mmscdb02 vmunix: lbp->owner: 0000000000000000
May 18 04:12:52 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:52 mmscdb02 vmunix: Script dump [0000000044a01000]:
May 18 04:12:52 mmscdb02 vmunix: 09000080 00330200 e25c0004 81fbe7f8
May 18 04:12:52 mmscdb02 vmunix: 80080000 81fbe090 80080000 81fbe090
May 18 04:12:53 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906733748, bus: 0
May 18 04:12:53 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906733748, bus: 0
May 18 04:13:05 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906734946, bus: 0
May 18 04:13:05 mmscdb02 vmunix: lbp->state: 1060
May 18 04:13:05 mmscdb02 vmunix: lbp->offset: f8
May 18 04:13:05 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:13:05 mmscdb02 vmunix: From most recent interrupt:
May 18 04:13:05 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:13:05 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:13:05 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:13:05 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:13:05 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:13:05 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:13:05 mmscdb02 vmunix: lbolt_at_timeout: 906734846, lbolt_at_start: 906734346
May 18 04:13:05 mmscdb02 vmunix: lsp->state: 5
May 18 04:13:05 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:13:05 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:13:05 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:13:05 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:13:05 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:13:05 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:13:05 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:13:06 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906735046, bus: 0
May 18 04:13:06 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906735046, bus: 0
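In these messages, lbolt appears to be the kernel's clock-tick counter used as a timestamp (the traditional meaning of lbolt in Unix kernels), not a kind of error. Converting the values to seconds shows the rhythm of the failure. This sketch assumes 100 ticks per second, which is what the log itself implies: the hang reports at 04:12:10 and 04:12:22 are 1200 ticks apart. lbolt_secs is a hypothetical helper:

```sh
#!/bin/sh
# HZ = clock ticks per second, inferred from the syslog timestamps above;
# treat it as an assumption on any other system.
HZ=100

# lbolt_secs START END (hypothetical): elapsed seconds between two lbolt values.
lbolt_secs() {
    d=$(( $2 - $1 ))
    printf '%d.%02d\n' $(( d / HZ )) $(( d % HZ ))
}

lbolt_secs 906728846 906729346   # lbolt_at_start -> lbolt_at_timeout: 5.00
lbolt_secs 906729446 906730646   # first hang report -> second: 12.00
lbolt_secs 906729446 906729546   # hang detected -> bus reset: 1.00
```

Read this way, each INQUIRY timed out after about 5 s, and the driver reset bus 0 roughly 1 s after every hang report, about every 12 s.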
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

You have lbolt errors, which indicate that the drive is most likely failing. The line below gives the major/minor number of the failing device, i.e.:

May 18 04:12:34 mmscdb02 vmunix: bp->b_dev: cb002002

major number -> cb (hex) = 203 (dec), which points to the SCSI controller, and the minor number 002002 maps to c0t2d0s2. So look for the SCSI controller that is attached to the c0t2d0s2 device; replacing the HBA should fix it.
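The split can be done mechanically. A minimal POSIX sh sketch; the function name decode_dev is hypothetical, and the minor layout it uses (0xIITLxx: instance, target, LUN, low byte) is inferred from the sctl device files Carlos lists later in the thread (c12t3d1 is 0x0c3100, c0t7d0 is 0x007000), not taken from HP documentation. It needs a shell whose $(( )) accepts 0x constants, such as bash or dash:

```sh
#!/bin/sh
# decode_dev HEX (hypothetical): split a dev number printed in syslog
# (e.g. "bp->b_dev: cb002002") into major/minor, then read the minor as
# instance / target / LUN / low byte. The low byte is left undecoded.
decode_dev() {
    dev=$(( 0x$1 ))
    major=$(( (dev >> 24) & 0xff ))
    minor=$(( dev & 0xffffff ))
    inst=$(( (minor >> 16) & 0xff ))
    tgt=$(( (minor >> 12) & 0xf ))
    lun=$(( (minor >> 8) & 0xf ))
    low=$(( minor & 0xff ))
    printf 'major=%d minor=0x%06x -> c%dt%dd%d (low byte 0x%02x)\n' \
        "$major" "$minor" "$inst" "$tgt" "$lun" "$low"
}

decode_dev cb002002   # major=203 minor=0x002002 -> c0t2d0 (low byte 0x02)
```

Instance 0, target 2, LUN 0 matches the tape's H/W path 0/0/1/0.2.0 under ext_bus 0 in the ioscan output, so the logged device is the tape position itself.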

~hope it helps
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit


I have already agreed to perform some tests with the customer; we'll do that on Monday. In the meantime, I tried to decode the device that appears in the "lbolt" error. I did the following:

mmscdb02:/var/adm/syslog #dmesg | grep lbolt | grep dev

SCSI: Unhandled interrupt -- lbolt: 906733648, dev: cb002002

mmscdb02:/var/adm/syslog #lsdev 203
Character Block Driver Class
203 -1 sctl ctl

mmscdb02:/var/adm/syslog #ll -R /dev | grep 203 | grep 002002
mmscdb02:/var/adm/syslog #ll -R /dev | grep 203
brw-r----- 1 bin sys 31 0x120300 Aug 8 2006 c18t0d3
crw-r----- 1 bin sys 188 0x120300 Aug 8 2006 c18t0d3
crw-r----- 1 bin sys 203 0x007000 May 12 2006 c0t7d0
crw-r----- 1 bin sys 203 0x0c3000 Aug 8 2006 c12t3d0
crw-r----- 1 bin sys 203 0x0c3100 Aug 8 2006 c12t3d1
crw-r----- 1 bin sys 203 0x0c3200 Aug 8 2006 c12t3d2
crw-r----- 1 bin sys 203 0x0c3300 Aug 8 2006 c12t3d3
crw-r----- 1 bin sys 203 0x0c3400 Aug 8 2006 c12t3d4
crw-r----- 1 bin sys 203 0x0c3500 Aug 8 2006 c12t3d5
crw-r----- 1 bin sys 203 0x0c3600 Aug 8 2006 c12t3d6
crw-r----- 1 bin sys 203 0x0c3700 Aug 8 2006 c12t3d7
crw-r----- 1 bin sys 203 0x0d3000 Aug 8 2006 c13t3d0
crw-r----- 1 bin sys 203 0x0d3100 Aug 8 2006 c13t3d1
crw-r----- 1 bin sys 203 0x0d3200 Aug 8 2006 c13t3d2
crw-r----- 1 bin sys 203 0x0d3300 Aug 8 2006 c13t3d3
crw-r----- 1 bin sys 203 0x0d3400 Aug 8 2006 c13t3d4
crw-r----- 1 bin sys 203 0x0d3500 Aug 8 2006 c13t3d5
crw-r----- 1 bin sys 203 0x0d3600 Aug 8 2006 c13t3d6
crw-r----- 1 bin sys 203 0x0d3700 Aug 8 2006 c13t3d7
crw-r----- 1 bin sys 203 0x0e3000 Aug 8 2006 c14t3d0
crw-r----- 1 bin sys 203 0x0e3100 Aug 8 2006 c14t3d1
crw-r----- 1 bin sys 203 0x0e3200 Aug 8 2006 c14t3d2
crw-r----- 1 bin sys 203 0x0e3300 Aug 8 2006 c14t3d3
crw-r----- 1 bin sys 203 0x0e3400 Aug 8 2006 c14t3d4
crw-r----- 1 bin sys 203 0x0e3500 Aug 8 2006 c14t3d5
crw-r----- 1 bin sys 203 0x0e3600 Aug 8 2006 c14t3d6
crw-r----- 1 bin sys 203 0x0e3700 Aug 8 2006 c14t3d7
crw-r----- 1 bin sys 203 0x0f3000 Aug 8 2006 c15t3d0
crw-r----- 1 bin sys 203 0x0f3100 Aug 8 2006 c15t3d1
crw-r----- 1 bin sys 203 0x0f3200 Aug 8 2006 c15t3d2
crw-r----- 1 bin sys 203 0x0f3300 Aug 8 2006 c15t3d3
crw-r----- 1 bin sys 203 0x0f3400 Aug 8 2006 c15t3d4
crw-r----- 1 bin sys 203 0x0f3500 Aug 8 2006 c15t3d5
crw-r----- 1 bin sys 203 0x0f3600 Aug 8 2006 c15t3d6
crw-r----- 1 bin sys 203 0x0f3700 Aug 8 2006 c15t3d7
crw-r----- 1 bin sys 203 0x017000 May 12 2006 c1t7d0
crw-r----- 1 bin sys 203 0x027000 May 12 2006 c2t7d0
crw-r----- 1 bin sys 203 0x037000 May 12 2006 c3t7d0

As you can see, the device doesn't appear. Or am I doing something wrong?

Regards,
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

IMHO, since your server is PA-RISC, the IPF device file structure is not relevant, so search only for the c0t2d0 string within the /dev directory tree, i.e.:

# ll -tr /dev | grep "c0t2d0"

...and the dmesg output shows that the device (whichever it is) interrupted the kernel, but the interrupt was either not serviced or ignored. What is the date of that error in dmesg?
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Could you post the output of the following command?

# ioscan -fnH 0/0

~thanks
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Sandman, look at the list I sent before: there's no c0t2d0, nor any 002002. I have grepped for both; look:

mmscdb02:/var/adm/syslog #ll -tr /dev | grep c0t2d0
mmscdb02:/var/adm/syslog #ll -tr /dev | grep 002002
mmscdb02:/var/adm/syslog #

And that line from dmesg is from this morning; here is the full line:

May 18 04:12:52 mmscdb02 vmunix: SCSI: Unhandled interrupt -- lbolt: 906733648, dev: cb002002

Comments?
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

mmscdb02:/ #ioscan -fnH 0/0
Class I H/W Path Driver S/W State H/W Type Description
=====================================================================
ba 0 0/0 lba CLAIMED BUS_NEXUS Local PCI Bus Adapter (782)
lan 0 0/0/0/0 btlan CLAIMED INTERFACE HP PCI 10/100Base-TX Core
/dev/diag/lan0 /dev/ether0 /dev/lan0
ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide LVD
target 33 0/0/1/0.2 tgt CLAIMED DEVICE
tape 0 0/0/1/0.2.0 stape CLAIMED DEVICE HP Ultrium 1-SCSI
/dev/rmt/0m /dev/rmt/c0t2d0BEST /dev/rmt/c0t2d0DDS
/dev/rmt/0mb /dev/rmt/c0t2d0BESTb /dev/rmt/c0t2d0DDSb
/dev/rmt/0mn /dev/rmt/c0t2d0BESTn /dev/rmt/c0t2d0DDSn
/dev/rmt/0mnb /dev/rmt/c0t2d0BESTnb /dev/rmt/c0t2d0DDSnb
target 0 0/0/1/0.7 tgt CLAIMED DEVICE
ctl 0 0/0/1/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c0t7d0
ext_bus 1 0/0/1/1 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide Single-Ended
target 1 0/0/1/1.0 tgt CLAIMED DEVICE
disk 0 0/0/1/1.0.0 sdisk CLAIMED DEVICE HP 36.4GST336753LC
/dev/dsk/c1t0d0 /dev/rdsk/c1t0d0
target 2 0/0/1/1.2 tgt CLAIMED DEVICE
disk 1 0/0/1/1.2.0 sdisk CLAIMED DEVICE HP 36.4GST336753LC
/dev/dsk/c1t2d0 /dev/rdsk/c1t2d0
target 3 0/0/1/1.7 tgt CLAIMED DEVICE
ctl 1 0/0/1/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c1t7d0
ext_bus 2 0/0/2/0 c720 CLAIMED INTERFACE SCSI C87x Ultra Wide Single-Ended
target 4 0/0/2/0.0 tgt CLAIMED DEVICE
disk 2 0/0/2/0.0.0 sdisk CLAIMED DEVICE HP 36.4GST336753LC
/dev/dsk/c2t0d0 /dev/rdsk/c2t0d0
target 5 0/0/2/0.2 tgt CLAIMED DEVICE
disk 3 0/0/2/0.2.0 sdisk CLAIMED DEVICE HP 36.4GST336753LC
/dev/dsk/c2t2d0 /dev/rdsk/c2t2d0
target 6 0/0/2/0.7 tgt CLAIMED DEVICE
ctl 2 0/0/2/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c2t7d0
ext_bus 3 0/0/2/1 c720 CLAIMED INTERFACE SCSI C87x Fast Wide Single-Ended
target 7 0/0/2/1.2 tgt CLAIMED DEVICE
disk 4 0/0/2/1.2.0 sdisk CLAIMED DEVICE HP DVD-ROM 305
/dev/dsk/c3t2d0 /dev/rdsk/c3t2d0
target 8 0/0/2/1.7 tgt CLAIMED DEVICE
ctl 3 0/0/2/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c3t7d0
tty 1 0/0/4/0 func0 CLAIMED INTERFACE PCI BaseSystem (103c128d)
tty 0 0/0/4/1 asio0 CLAIMED INTERFACE PCI Serial (103c1048)
/dev/GSPdiag1 /dev/mux0 /dev/tty0p2 /dev/tty0p4
/dev/diag/mux0 /dev/tty0p0 /dev/tty0p3
mmscdb02:/ #
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

My bad...the listing should have had the recursive switch "-R" instead of "-r".

# ll -R /dev/ | grep c0t2d0

and the likely culprit is:

>>ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide LVD<<
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Here you go.

mmscdb02:/ #ll -R /dev/ | grep c0t2d0
crw-rw-rw- 2 bin bin 205 0x002000 Feb 5 12:32 c0t2d0BEST
crw-rw-rw- 2 bin bin 205 0x002080 Jan 2 15:34 c0t2d0BESTb
crw-rw-rw- 2 bin bin 205 0x002040 Jan 10 11:38 c0t2d0BESTn
crw-rw-rw- 2 bin bin 205 0x0020c0 Jan 2 15:34 c0t2d0BESTnb
crw-rw-rw- 1 bin bin 205 0x002001 Jan 2 15:34 c0t2d0DDS
crw-rw-rw- 1 bin bin 205 0x002081 Jan 2 15:34 c0t2d0DDSb
crw-rw-rw- 1 bin bin 205 0x002041 Jan 2 15:34 c0t2d0DDSn
crw-rw-rw- 1 bin bin 205 0x0020c1 Jan 2 15:34 c0t2d0DDSnb
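This listing also explains the empty grep for 203 and 002002 earlier: the tape nodes are major 205 (stape), while the b_dev in syslog carried major 203, which lsdev reports as sctl. Both minors share the 0x0020.. instance/target/LUN part; in the stape nodes the low byte selects the variant. A small POSIX sh sketch that rebuilds the names from the minors; stape_name is hypothetical and its bit meanings are inferred only from these eight entries:

```sh
#!/bin/sh
# stape_name HEX (hypothetical): rebuild the /dev/rmt name from an stape minor.
# Bits read off the listing above: 0x01 = DDS density (otherwise BEST),
# 0x40 = no-rewind "n", 0x80 = Berkeley-style "b". Other bits are ignored.
stape_name() {
    minor=$(( 0x$1 ))
    inst=$(( (minor >> 16) & 0xff ))
    tgt=$(( (minor >> 12) & 0xf ))
    lun=$(( (minor >> 8) & 0xf ))
    opt=$(( minor & 0xff ))
    if [ $(( opt & 0x01 )) -ne 0 ]; then dens=DDS; else dens=BEST; fi
    n=""; b=""
    if [ $(( opt & 0x40 )) -ne 0 ]; then n=n; fi
    if [ $(( opt & 0x80 )) -ne 0 ]; then b=b; fi
    echo "c${inst}t${tgt}d${lun}${dens}${n}${b}"
}

stape_name 0020c1   # c0t2d0DDSnb
```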
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Sorry, I didn't understand this part:

and the likely culprit is:

>>ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide LVD<<

What do you mean?

Thanks. Carlos,
Sandman!
Honored Contributor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

'Tis the HBA of your tape drive.
Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Hello all,

After a while, the problem seems to be solved. I am posting the story here so that others can benefit.

The Ultrium unit was taken out since it was not under our warranty, and the original DDS4 drive was put back. It worked OK for a couple of days, and then the NO_HW status showed up again.

I opened an HP ticket, and an engineer went to the site, and did the following:

Replaced the DDS4 drive and the Tape Array 5300. It was OK for a few minutes (in CLAIMED status), and then it went back to NO_HW.

Finally, the cable and terminator were replaced. So far it has been OK for the last three days: backups have been working without any failure, and there are no error messages in syslog.log. So it seems the problem was the cable and terminator.

I am now closing this case, thank you all for your valuable help.

Carlos,

Carlos Zoller
Frequent Advisor

Re: Ramdom SCSI errors on ULTRIUM I Tape Unit

Thread closed.