05-18-2007 02:03 AM
Random SCSI errors on ULTRIUM I Tape Unit
Hello, I have an Ultrium-1 (200 GB) tape unit installed on an HP server (model 9000/800/L3000-7x), and it is randomly logging SCSI errors to syslog.log.
To explain better: last week, when I ran some commands to check the status, these were the outputs:
mmscdb02:/ #ioscan -fnC tape
Class    I   H/W Path     Driver  S/W State  H/W Type   Description
=====================================================================
tape     0   0/0/1/0.2.0  stape   NO_HW      DEVICE     HP Ultrium 1-SCSI
                          /dev/rmt/0m            /dev/rmt/c0t2d0BESTn
                          /dev/rmt/0mb           /dev/rmt/c0t2d0BESTnb
                          /dev/rmt/0mn           /dev/rmt/c0t2d0DDS
                          /dev/rmt/0mnb          /dev/rmt/c0t2d0DDSb
                          /dev/rmt/c0t2d0BEST    /dev/rmt/c0t2d0DDSn
                          /dev/rmt/c0t2d0BESTb   /dev/rmt/c0t2d0DDSnb
mmscdb02:/ #mt status
No tape loaded
Then, a couple of days ago, the situation suddenly changed. Here are some outputs from today:
mmscdb02:/ #ioscan -fnC tape
Class    I   H/W Path     Driver  S/W State  H/W Type   Description
=====================================================================
tape     0   0/0/1/0.2.0  stape   CLAIMED    DEVICE     HP Ultrium 1-SCSI
                          /dev/rmt/0m     /dev/rmt/c0t2d0BEST     /dev/rmt/c0t2d0DDS
                          /dev/rmt/0mb    /dev/rmt/c0t2d0BESTb    /dev/rmt/c0t2d0DDSb
                          /dev/rmt/0mn    /dev/rmt/c0t2d0BESTn    /dev/rmt/c0t2d0DDSn
                          /dev/rmt/0mnb   /dev/rmt/c0t2d0BESTnb   /dev/rmt/c0t2d0DDSnb
mmscdb02:/ #date
Fri May 18 10:40:14 SAT 2007
mmscdb02:/ #mt status
Drive: HP Ultrium 1-SCSI
Format:
Status: [0]
File: 0
Block: 0
===============================================================================================
I am the support engineer for the solution running on this platform. I connect to the machine remotely, so I can't check whether the tape unit is powered on, has a tape loaded, has its SCSI cables connected properly, etc. The customer is refusing to restart the machine, even though it is part of a cluster and no service interruption would occur. The customer also sent me ioscan output from very early this morning, and it seems the hardware went back into the NO_HW state:
mmscdb02:/ #ioscan -fnC tape
Class    I   H/W Path     Driver  S/W State  H/W Type   Description
=====================================================================
tape     0   0/0/1/0.2.0  stape   NO_HW      DEVICE     HP Ultrium 1-SCSI
                          /dev/rmt/0m            /dev/rmt/c0t2d0BESTn
                          /dev/rmt/0mb           /dev/rmt/c0t2d0BESTnb
                          /dev/rmt/0mn           /dev/rmt/c0t2d0DDS
                          /dev/rmt/0mnb          /dev/rmt/c0t2d0DDSb
                          /dev/rmt/c0t2d0BEST    /dev/rmt/c0t2d0DDSn
                          /dev/rmt/c0t2d0BESTb   /dev/rmt/c0t2d0DDSnb
mmscdb02:/ # date
Fri May 18 04:17:05 SAT 2007
mmscdb02:/ #
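Since the drive flips between CLAIMED and NO_HW, it may help to record its software state at intervals and line the changes up with syslog.log. The following is only a sketch: it assumes the state sits in the fifth whitespace-separated column of the `stape` line, as in the outputs quoted above, and the function name `tape_state` and the log path in the comment are made up for illustration.

```shell
#!/bin/sh
# tape_state: read `ioscan -fnC tape` output on stdin and print
# "<H/W path> <S/W state>" for each line whose driver column is stape.
# Column positions are assumed from the outputs quoted in this thread.
tape_state() {
    awk '$1 == "tape" && $4 == "stape" { print $3, $5 }'
}

# Example with a captured line instead of a live ioscan call:
echo "tape 0 0/0/1/0.2.0 stape NO_HW DEVICE HP Ultrium 1-SCSI" | tape_state

# On the server this could be run periodically (hypothetical log path):
#   echo "$(date) $(ioscan -fnC tape | tape_state)" >> /var/tmp/tape_state.log
```

A log like this would at least show when each transition happened, which is useful evidence for a support call even if the fault does not show up during testing.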
The only fact I have about this is that there was a DDS tape drive before this one, and the two were exchanged ONLINE, without rebooting the system.
What else can I check? Could it be a driver problem? How can I be sure? I know I will have to involve HP support soon, but I want to involve them only once I'm sure that this is a hardware problem and not something else.
Any help will be highly appreciated.
Regards,
05-18-2007 02:24 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
~hope it helps
05-18-2007 02:33 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
tape 0 0/0/1/0.2.0 stape NO_HW DEVICE HP Ultrium 1-SCSI
This isn't a driver problem; it's a hardware problem but it isn't clear where the problem actually lies. All you can know is that the device failed to properly respond to a SCSI INQUIRY command.
You now need to do the standard SCSI checks:
1) Is the bus terminated in EXACTLY two places -- at the physical ends of the bus?
2) Is at least one device on the bus supplying termination power?
3) Does the total length of the bus exceed the maximum?
4) Are all the connections tight with no broken/bent pins?
Surprisingly an unterminated bus will often work almost perfectly --- the worst kind of problem. I would next try to replace the terminators.
5) Finally, you are now down to either a bad tape drive or a bad HBA.
It's time to get some boots on the ground and do hands-on diagnosis.
05-18-2007 02:53 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
Tape drives work best with dedicated SCSI cards.
Either it's the drive, the cabling, or the card. It needs to be carefully checked.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
05-18-2007 03:09 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
I am now thinking of performing some stress tests: I will ask for a blank tape to be put in, and then I'll write to the tape unit, read the tape back, rewind it, etc., at least two or three times to see if the problem disappears. What do you think?
What I am afraid of is that I call HP and no errors happen. How can we see where the problem is if it doesn't occur while testing? Somehow I need to be able to reproduce the problem, if it exists, of course! What is puzzling me is why this happens so randomly, without any pattern.
Any more opinions?
Thanks.
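The write/read/rewind cycle described above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested procedure for this drive: `tape_cycle` is a made-up name, the device is assumed to rewind on close (a device file such as /dev/rmt/0m rather than one of the "n" no-rewind variants), and every pass overwrites the tape, so it should only ever be pointed at a scratch tape.

```shell
#!/bin/sh
# tape_cycle DEVICE PASSES [BLOCKS]
# Writes BLOCKS 64 KiB blocks of a fixed pattern to DEVICE, reads them back,
# and compares checksums, PASSES times. Returns non-zero on the first failure.
# Assumes DEVICE rewinds on close, so no explicit rewind is issued between
# the write and the read. WARNING: overwrites whatever is on DEVICE.
tape_cycle() {
    dev=$1
    passes=$2
    blocks=${3:-16}
    src="${TMPDIR:-/tmp}/tape_cycle.$$"

    # Build the reference data: NUL bytes turned into a printable pattern.
    dd if=/dev/zero bs=65536 count="$blocks" 2>/dev/null | tr '\000' 'U' > "$src"
    want=$(cksum < "$src")

    pass=1
    while [ "$pass" -le "$passes" ]; do
        if ! dd if="$src" of="$dev" bs=65536 2>/dev/null; then
            echo "pass $pass: write failed at $(date)"
            rm -f "$src"
            return 1
        fi
        got=$(dd if="$dev" bs=65536 2>/dev/null | cksum)
        if [ "$got" != "$want" ]; then
            echo "pass $pass: read-back mismatch at $(date)"
            rm -f "$src"
            return 1
        fi
        echo "pass $pass: ok"
        pass=$((pass + 1))
    done
    rm -f "$src"
}

# Possible invocation on the server (three passes, default block count):
#   tape_cycle /dev/rmt/0m 3
```

The constant pattern is highly compressible, so this exercises the bus and drive only lightly; a real stress run would use more data and less uniform content. Because the function only uses dd and cksum, it can be dry-run against an ordinary file before touching the drive.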
05-18-2007 03:15 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
I would check your termination and cabling. I suspect either a terminator problem (although the standalone LTO Ultrium drives are usually self-terminating) or, more likely, a cable problem. Check that the cables are properly seated at both ends. We have had more than enough problems with the older SCSI connectors over the years.
It could also be a driver issue. I would change the SCSI address of the drive (if I read it correctly, it's currently address 2), power-cycle the drive, and run ioscan -fnC tape. If the ioscan doesn't list any devices for it, run insf -eC tape to install the correct device files.
Regards
AY
05-18-2007 03:31 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
Unfortunately, intermittent problems are the most difficult ones to diagnose and resolve. In my experience, SCSI cables and terminators rarely go bad. This essentially leaves the drive, HBA, and its power source as possible culprits. If you have access to duplicate hardware, you could try swapping components out, one by one, running tests on the drive, swapping them back, and repeating. Of course, this method requires the problem to be reproducible, such that you know it will occur within a fixed period of time or after a certain number of trials.
If you don't have physical access to the site, you could have an HP CE do the same thing. Simply explain to the engineer on the phone that the problem is intermittent, and there's no guarantee it will show itself while he's on the line, but that it needs to be corrected. Based on the symptoms, they should dispatch someone.
That's what you are paying for in your service agreement.
PCS
05-18-2007 03:38 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
May 18 04:12:10 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906729446, bus: 0
May 18 04:12:10 mmscdb02 vmunix: lbp->state: 3060
May 18 04:12:10 mmscdb02 vmunix: lbp->offset: 40
May 18 04:12:10 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:10 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:10 mmscdb02 vmunix: ISTAT: 29, SIST0: 00, SIST1: 00, DSTAT: 84, DSPS: 0000000a
May 18 04:12:10 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:10 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:10 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:10 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:10 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:10 mmscdb02 vmunix: lbolt_at_timeout: 906729346, lbolt_at_start: 906728846
May 18 04:12:10 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:10 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:10 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:10 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:10 mmscdb02 vmunix: 980dff00 0000000a 78351000 00000000
May 18 04:12:10 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:10 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:10 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:11 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906729546, bus: 0
May 18 04:12:11 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906729546, bus: 0
May 18 04:12:22 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906730646, bus: 0
May 18 04:12:22 mmscdb02 vmunix: lbp->state: 1060
May 18 04:12:22 mmscdb02 vmunix: lbp->offset: f8
May 18 04:12:22 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:22 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:22 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:12:22 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:22 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:22 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:22 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:22 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:22 mmscdb02 vmunix: lbolt_at_timeout: 906730546, lbolt_at_start: 906730046
May 18 04:12:22 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:22 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:22 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:22 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:22 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:12:22 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:22 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:22 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:23 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906730746, bus: 0
May 18 04:12:23 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906730746, bus: 0
May 18 04:12:34 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906731846, bus: 0
May 18 04:12:34 mmscdb02 vmunix: lbp->state: 1060
May 18 04:12:34 mmscdb02 vmunix: lbp->offset: f8
May 18 04:12:34 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:34 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:34 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:12:34 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:34 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:34 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:34 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:34 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:34 mmscdb02 vmunix: lbolt_at_timeout: 906731746, lbolt_at_start: 906731246
May 18 04:12:34 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:34 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:34 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:34 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:34 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:12:34 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:34 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:34 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:35 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906731946, bus: 0
May 18 04:12:35 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906731946, bus: 0
May 18 04:12:46 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906733046, bus: 0
May 18 04:12:46 mmscdb02 vmunix: lbp->state: 1060
May 18 04:12:46 mmscdb02 vmunix: lbp->offset: f8
May 18 04:12:46 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:46 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:46 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:12:46 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:12:46 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:12:46 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:46 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:46 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:46 mmscdb02 vmunix: lbolt_at_timeout: 906732946, lbolt_at_start: 906732446
May 18 04:12:46 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:46 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:46 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:12:46 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:12:46 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:12:46 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:12:46 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:12:46 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:12:47 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906733146, bus: 0
May 18 04:12:47 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906733146, bus: 0
May 18 04:12:52 mmscdb02 vmunix: SCSI: Unhandled interrupt -- lbolt: 906733648, dev: cb002002
May 18 04:12:52 mmscdb02 vmunix: lbp->state: 2060
May 18 04:12:52 mmscdb02 vmunix: lbp->offset: ffffffff
May 18 04:12:52 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:12:52 mmscdb02 vmunix: From most recent interrupt:
May 18 04:12:52 mmscdb02 vmunix: ISTAT: 0a, SIST0: c1, SIST1: 00, DSTAT: 80, DSPS: 00330200
May 18 04:12:52 mmscdb02 vmunix: lsp: 000000004f9a5900
May 18 04:12:52 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:12:52 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:12:52 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:12:52 mmscdb02 vmunix: lbolt_at_timeout: 906734146, lbolt_at_start: 906733646
May 18 04:12:52 mmscdb02 vmunix: lsp->state: 5
May 18 04:12:52 mmscdb02 vmunix: lbp->owner: 0000000000000000
May 18 04:12:52 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:12:52 mmscdb02 vmunix: Script dump [0000000044a01000]:
May 18 04:12:52 mmscdb02 vmunix: 09000080 00330200 e25c0004 81fbe7f8
May 18 04:12:52 mmscdb02 vmunix: 80080000 81fbe090 80080000 81fbe090
May 18 04:12:53 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906733748, bus: 0
May 18 04:12:53 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906733748, bus: 0
May 18 04:13:05 mmscdb02 vmunix: SCSI: First party detected bus hang -- lbolt: 906734946, bus: 0
May 18 04:13:05 mmscdb02 vmunix: lbp->state: 1060
May 18 04:13:05 mmscdb02 vmunix: lbp->offset: f8
May 18 04:13:05 mmscdb02 vmunix: lbp->uPhysScript: 81fbe000
May 18 04:13:05 mmscdb02 vmunix: From most recent interrupt:
May 18 04:13:05 mmscdb02 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: 00000000
May 18 04:13:05 mmscdb02 vmunix: lsp: 0000000000000000
May 18 04:13:05 mmscdb02 vmunix: lbp->owner: 000000004f9a5900
May 18 04:13:05 mmscdb02 vmunix: bp->b_dev: cb002002
May 18 04:13:05 mmscdb02 vmunix: scb->io_id: 17ae9
May 18 04:13:05 mmscdb02 vmunix: scb->cdb: 12 00 00 00 80 00
May 18 04:13:05 mmscdb02 vmunix: lbolt_at_timeout: 906734846, lbolt_at_start: 906734346
May 18 04:13:05 mmscdb02 vmunix: lsp->state: 5
May 18 04:13:05 mmscdb02 vmunix: scratch_lsp: 000000004f9a5900
May 18 04:13:05 mmscdb02 vmunix: Pre-DSP script dump [ffffffff81fbe020]:
May 18 04:13:05 mmscdb02 vmunix: 00000000 00000000 41020000 81fbe290
May 18 04:13:05 mmscdb02 vmunix: 78344000 0000000a 78351000 00000000
May 18 04:13:05 mmscdb02 vmunix: Script dump [ffffffff81fbe040]:
May 18 04:13:05 mmscdb02 vmunix: 0e000005 81fbe540 e0100004 81fbe7f8
May 18 04:13:05 mmscdb02 vmunix: 870b0000 81fbe2d8 98080000 00000005
May 18 04:13:06 mmscdb02 vmunix: SCSI: Resetting SCSI -- lbolt: 906735046, bus: 0
May 18 04:13:06 mmscdb02 vmunix: SCSI: Reset detected -- lbolt: 906735046, bus: 0
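Two fields in the log above can be read with simple arithmetic. Every entry carries the same `scb->cdb`, whose first byte 0x12 is the SCSI INQUIRY opcode, and the lbolt values can be turned into seconds. The sketch below assumes 100 lbolt ticks per second, which agrees with the log itself (consecutive bus hangs are 1200 ticks and 12 seconds apart); the function names are invented for illustration.

```shell
#!/bin/sh
# cdb_summary B0 B1 B2 B3 B4 B5: describe a 6-byte SCSI CDB given as hex bytes.
# Only INQUIRY (opcode 0x12) is decoded. Its allocation length is taken from
# bytes 3 and 4; older SCSI-2 devices use byte 4 only, with byte 3 reserved.
cdb_summary() {
    if [ "$1" = "12" ]; then
        echo "INQUIRY, allocation length $(( (0x$4 << 8) | 0x$5 )) bytes"
    else
        echo "opcode 0x$1 (not decoded by this sketch)"
    fi
}

# ticks_to_seconds START END [HZ]: elapsed seconds between two lbolt values.
# HZ defaults to 100, an assumption that matches the timestamps in the log.
ticks_to_seconds() {
    echo $(( ($2 - $1) / ${3:-100} ))
}

cdb_summary 12 00 00 00 80 00          # the CDB from the log
ticks_to_seconds 906728846 906729346   # lbolt_at_start -> lbolt_at_timeout
```

Read this way, each entry is an INQUIRY that timed out about five seconds after it was started, after which the bus was reset.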
05-18-2007 03:46 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
May 18 04:12:34 mmscdb02 vmunix: bp->b_dev: cb002002
Major number cb (hex) = 203 (dec), which points to the SCSI controller, and the minor number 002002 maps to c0t2d0s2. So look for the SCSI controller attached to the c0t2d0s2 device; replacing the HBA should fix it.
~hope it helps
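The split of `cb002002` into major and minor parts can be sketched in shell. The field layout used here is an assumption inferred from device listings elsewhere in this thread (for example, c12t3d1 has minor 0x0c3100): two hex digits of card instance, one of target, one of LUN, and two of option bits. The function name `decode_dev` is made up.

```shell
#!/bin/sh
# decode_dev HEXDEV: split an 8-hex-digit device number into its fields.
# Assumed layout: 2 digits major, 2 digits card instance, 1 digit target,
# 1 digit LUN, 2 digits option bits.
decode_dev() {
    dev=$1
    major=$(printf '%s' "$dev" | cut -c1-2)
    card=$(printf '%s' "$dev" | cut -c3-4)
    target=$(printf '%s' "$dev" | cut -c5)
    lun=$(printf '%s' "$dev" | cut -c6)
    opts=$(printf '%s' "$dev" | cut -c7-8)
    echo "major=$((0x$major)) c$((0x$card))t$((0x$target))d$((0x$lun)) options=0x$opts"
}

decode_dev cb002002
```

Under that assumed layout the address part comes out as c0t2d0, and the trailing 02 would be option bits rather than part of the address.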
05-18-2007 07:23 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
I have already agreed with the customer to perform some tests; we'll do that on Monday. In the meantime I tried to decode the device that appears in the "lbolt" error. I did the following:
mmscdb02:/var/adm/syslog #dmesg | grep lbolt | grep dev
SCSI: Unhandled interrupt -- lbolt: 906733648, dev: cb002002
mmscdb02:/var/adm/syslog #lsdev 203
Character Block Driver Class
203 -1 sctl ctl
mmscdb02:/var/adm/syslog #ll -R /dev | grep 203 | grep 002002
mmscdb02:/var/adm/syslog #ll -R /dev | grep 203
brw-r----- 1 bin sys 31 0x120300 Aug 8 2006 c18t0d3
crw-r----- 1 bin sys 188 0x120300 Aug 8 2006 c18t0d3
crw-r----- 1 bin sys 203 0x007000 May 12 2006 c0t7d0
crw-r----- 1 bin sys 203 0x0c3000 Aug 8 2006 c12t3d0
crw-r----- 1 bin sys 203 0x0c3100 Aug 8 2006 c12t3d1
crw-r----- 1 bin sys 203 0x0c3200 Aug 8 2006 c12t3d2
crw-r----- 1 bin sys 203 0x0c3300 Aug 8 2006 c12t3d3
crw-r----- 1 bin sys 203 0x0c3400 Aug 8 2006 c12t3d4
crw-r----- 1 bin sys 203 0x0c3500 Aug 8 2006 c12t3d5
crw-r----- 1 bin sys 203 0x0c3600 Aug 8 2006 c12t3d6
crw-r----- 1 bin sys 203 0x0c3700 Aug 8 2006 c12t3d7
crw-r----- 1 bin sys 203 0x0d3000 Aug 8 2006 c13t3d0
crw-r----- 1 bin sys 203 0x0d3100 Aug 8 2006 c13t3d1
crw-r----- 1 bin sys 203 0x0d3200 Aug 8 2006 c13t3d2
crw-r----- 1 bin sys 203 0x0d3300 Aug 8 2006 c13t3d3
crw-r----- 1 bin sys 203 0x0d3400 Aug 8 2006 c13t3d4
crw-r----- 1 bin sys 203 0x0d3500 Aug 8 2006 c13t3d5
crw-r----- 1 bin sys 203 0x0d3600 Aug 8 2006 c13t3d6
crw-r----- 1 bin sys 203 0x0d3700 Aug 8 2006 c13t3d7
crw-r----- 1 bin sys 203 0x0e3000 Aug 8 2006 c14t3d0
crw-r----- 1 bin sys 203 0x0e3100 Aug 8 2006 c14t3d1
crw-r----- 1 bin sys 203 0x0e3200 Aug 8 2006 c14t3d2
crw-r----- 1 bin sys 203 0x0e3300 Aug 8 2006 c14t3d3
crw-r----- 1 bin sys 203 0x0e3400 Aug 8 2006 c14t3d4
crw-r----- 1 bin sys 203 0x0e3500 Aug 8 2006 c14t3d5
crw-r----- 1 bin sys 203 0x0e3600 Aug 8 2006 c14t3d6
crw-r----- 1 bin sys 203 0x0e3700 Aug 8 2006 c14t3d7
crw-r----- 1 bin sys 203 0x0f3000 Aug 8 2006 c15t3d0
crw-r----- 1 bin sys 203 0x0f3100 Aug 8 2006 c15t3d1
crw-r----- 1 bin sys 203 0x0f3200 Aug 8 2006 c15t3d2
crw-r----- 1 bin sys 203 0x0f3300 Aug 8 2006 c15t3d3
crw-r----- 1 bin sys 203 0x0f3400 Aug 8 2006 c15t3d4
crw-r----- 1 bin sys 203 0x0f3500 Aug 8 2006 c15t3d5
crw-r----- 1 bin sys 203 0x0f3600 Aug 8 2006 c15t3d6
crw-r----- 1 bin sys 203 0x0f3700 Aug 8 2006 c15t3d7
crw-r----- 1 bin sys 203 0x017000 May 12 2006 c1t7d0
crw-r----- 1 bin sys 203 0x027000 May 12 2006 c2t7d0
crw-r----- 1 bin sys 203 0x037000 May 12 2006 c3t7d0
As you can see, the device doesn't appear, or am I doing something wrong?
Regards,
05-18-2007 07:43 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
# ll -tr /dev | grep "c0t2d0"
...and the dmesg shows that the device (whichever it is) interrupted the kernel, but the interrupt was either not serviced or ignored. What is the date of that error in dmesg?
05-18-2007 07:51 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
# ioscan -fnH 0/0
~thanks
05-18-2007 07:53 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
mmscdb02:/var/adm/syslog #ll -tr /dev | grep c0t2d0
mmscdb02:/var/adm/syslog #ll -tr /dev | grep 002002
mmscdb02:/var/adm/syslog #
That dmesg line is from this morning; here is the full line:
May 18 04:12:52 mmscdb02 vmunix: SCSI: Unhandled interrupt -- lbolt: 906733648, dev: cb002002
Comments?
05-18-2007 07:54 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
Class    I   H/W Path     Driver  S/W State  H/W Type   Description
=====================================================================
ba       0   0/0          lba     CLAIMED    BUS_NEXUS  Local PCI Bus Adapter (782)
lan      0   0/0/0/0      btlan   CLAIMED    INTERFACE  HP PCI 10/100Base-TX Core
                          /dev/diag/lan0  /dev/ether0  /dev/lan0
ext_bus  0   0/0/1/0      c720    CLAIMED    INTERFACE  SCSI C896 Ultra Wide LVD
target   33  0/0/1/0.2    tgt     CLAIMED    DEVICE
tape     0   0/0/1/0.2.0  stape   CLAIMED    DEVICE     HP Ultrium 1-SCSI
                          /dev/rmt/0m     /dev/rmt/c0t2d0BEST     /dev/rmt/c0t2d0DDS
                          /dev/rmt/0mb    /dev/rmt/c0t2d0BESTb    /dev/rmt/c0t2d0DDSb
                          /dev/rmt/0mn    /dev/rmt/c0t2d0BESTn    /dev/rmt/c0t2d0DDSn
                          /dev/rmt/0mnb   /dev/rmt/c0t2d0BESTnb   /dev/rmt/c0t2d0DDSnb
target   0   0/0/1/0.7    tgt     CLAIMED    DEVICE
ctl      0   0/0/1/0.7.0  sctl    CLAIMED    DEVICE     Initiator
                          /dev/rscsi/c0t7d0
ext_bus  1   0/0/1/1      c720    CLAIMED    INTERFACE  SCSI C896 Ultra Wide Single-Ended
target   1   0/0/1/1.0    tgt     CLAIMED    DEVICE
disk     0   0/0/1/1.0.0  sdisk   CLAIMED    DEVICE     HP 36.4GST336753LC
                          /dev/dsk/c1t0d0  /dev/rdsk/c1t0d0
target   2   0/0/1/1.2    tgt     CLAIMED    DEVICE
disk     1   0/0/1/1.2.0  sdisk   CLAIMED    DEVICE     HP 36.4GST336753LC
                          /dev/dsk/c1t2d0  /dev/rdsk/c1t2d0
target   3   0/0/1/1.7    tgt     CLAIMED    DEVICE
ctl      1   0/0/1/1.7.0  sctl    CLAIMED    DEVICE     Initiator
                          /dev/rscsi/c1t7d0
ext_bus  2   0/0/2/0      c720    CLAIMED    INTERFACE  SCSI C87x Ultra Wide Single-Ended
target   4   0/0/2/0.0    tgt     CLAIMED    DEVICE
disk     2   0/0/2/0.0.0  sdisk   CLAIMED    DEVICE     HP 36.4GST336753LC
                          /dev/dsk/c2t0d0  /dev/rdsk/c2t0d0
target   5   0/0/2/0.2    tgt     CLAIMED    DEVICE
disk     3   0/0/2/0.2.0  sdisk   CLAIMED    DEVICE     HP 36.4GST336753LC
                          /dev/dsk/c2t2d0  /dev/rdsk/c2t2d0
target   6   0/0/2/0.7    tgt     CLAIMED    DEVICE
ctl      2   0/0/2/0.7.0  sctl    CLAIMED    DEVICE     Initiator
                          /dev/rscsi/c2t7d0
ext_bus  3   0/0/2/1      c720    CLAIMED    INTERFACE  SCSI C87x Fast Wide Single-Ended
target   7   0/0/2/1.2    tgt     CLAIMED    DEVICE
disk     4   0/0/2/1.2.0  sdisk   CLAIMED    DEVICE     HP DVD-ROM 305
                          /dev/dsk/c3t2d0  /dev/rdsk/c3t2d0
target   8   0/0/2/1.7    tgt     CLAIMED    DEVICE
ctl      3   0/0/2/1.7.0  sctl    CLAIMED    DEVICE     Initiator
                          /dev/rscsi/c3t7d0
tty      1   0/0/4/0      func0   CLAIMED    INTERFACE  PCI BaseSystem (103c128d)
tty      0   0/0/4/1      asio0   CLAIMED    INTERFACE  PCI Serial (103c1048)
                          /dev/GSPdiag1   /dev/mux0    /dev/tty0p2  /dev/tty0p4
                          /dev/diag/mux0  /dev/tty0p0  /dev/tty0p3
mmscdb02:/ #
05-18-2007 08:08 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
# ll -R /dev/ | grep c0t2d0
and the likely culprit is:
>>ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide LVD<<
05-18-2007 08:30 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
mmscdb02:/ #ll -R /dev/ | grep c0t2d0
crw-rw-rw- 2 bin bin 205 0x002000 Feb 5 12:32 c0t2d0BEST
crw-rw-rw- 2 bin bin 205 0x002080 Jan 2 15:34 c0t2d0BESTb
crw-rw-rw- 2 bin bin 205 0x002040 Jan 10 11:38 c0t2d0BESTn
crw-rw-rw- 2 bin bin 205 0x0020c0 Jan 2 15:34 c0t2d0BESTnb
crw-rw-rw- 1 bin bin 205 0x002001 Jan 2 15:34 c0t2d0DDS
crw-rw-rw- 1 bin bin 205 0x002081 Jan 2 15:34 c0t2d0DDSb
crw-rw-rw- 1 bin bin 205 0x002041 Jan 2 15:34 c0t2d0DDSn
crw-rw-rw- 1 bin bin 205 0x0020c1 Jan 2 15:34 c0t2d0DDSnb
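This listing suggests why the earlier search for major 203 with minor 002002 came back empty: the tape device files carry major 205, while lsdev reported 203 as the sctl driver. What the files share with the logged device number is the leading 0020 of the minor. A hedged sketch of a filter that matches on that address prefix regardless of major number follows; `files_for_address` is a made-up name, and the field positions are assumed from the `ll` output shown in this thread.

```shell
#!/bin/sh
# files_for_address CARD TARGET LUN: read `ll -R /dev` output on stdin and
# print "major minor name" for device files whose minor number starts with
# the given address. Field 5 is assumed to be the major number and field 6
# the minor (0x...), as in the listings in this thread.
files_for_address() {
    prefix=$(printf '0x%02x%x%x' "$1" "$2" "$3")
    awk -v p="$prefix" 'index($6, p) == 1 { print $5, $6, $NF }'
}

# Example with two captured lines instead of a live `ll -R /dev`:
printf '%s\n' \
    "crw-rw-rw- 2 bin bin 205 0x002000 Feb 5 12:32 c0t2d0BEST" \
    "crw-r----- 1 bin sys 203 0x007000 May 12 2006 c0t7d0" |
    files_for_address 0 2 0
```

On the server the input would come from `ll -R /dev | files_for_address 0 2 0`, which should list every device file addressed at card 0, target 2, LUN 0 whichever driver owns it.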
05-18-2007 08:31 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
and the likely culprit is:
>>ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide LVD<<
What do you mean?
Thanks. Carlos,
06-22-2007 08:55 AM
Re: Random SCSI errors on ULTRIUM I Tape Unit
After a while the problem seems to be solved. I am posting the story here so that others can benefit.
The Ultrium unit was taken out since it was not under our warranty, and the original DDS4 was put back. It worked OK for a couple of days and then the NO_HW status showed up again.
I opened an HP ticket, and an engineer went to the site and did the following:
The DDS4 drive and the Tape Array 5300 were replaced. It was OK for some minutes (in CLAIMED status), and then it went back to NO_HW.
Finally the cable and terminator were replaced. So far it has been OK for the last three days. Backups have been working without any failure, with no error messages in syslog.log. So it seems the problem was the cable and terminator.
I am now closing this case, thank you all for your valuable help.
Carlos,