10-28-2003 12:09 PM
SCSI problem help needed identifying disk
I have a SCSI problem but I'm not sure which disk is causing it. The output from dmesg is as follows.
Oct 29 12:04
...
- lbolt: 1061326722, bus: 6
scb->cdb: 28 00 03 c5 f3 80 00 00 20 00
scb->cdb: 28 00 00 1a 3d a0 00 00 60 00
SCSI: Resetting SCSI -- lbolt: 1069236335, bus: 6
SCSI: Reset detected -- lbolt: 1069236335, bus: 6
scb->cdb: 2a 00 00 00 0c 40 00 00 02 00
scb->cdb: 28 00 00 5d f9 00 00 00 60 00
SCSI: Resetting SCSI -- lbolt: 1069793138, bus: 6
SCSI: Reset detected -- lbolt: 1069793138, bus: 6
scb->cdb: 28 00 00 13 76 b0 00 00 50 00
scb->cdb: 28 00 00 10 38 20 00 00 60 00
SCSI: Resetting SCSI -- lbolt: 1074152496, bus: 6
SCSI: Reset detected -- lbolt: 1074152496, bus: 6
scb->cdb: 28 00 00 3a c1 80 00 00 10 00
scb->cdb: 28 00 00 08 5d 00 00 00 80 00
SCSI: Resetting SCSI -- lbolt: 1093971586, bus: 6
SCSI: Reset detected -- lbolt: 1093971586, bus: 6
scb->cdb: 2a 00 00 bb 9b 3c 00 00 02 00
scb->cdb: 28 00 00 2d 25 80 00 00 80 00
I'm assuming that bus 6 refers to the SCSI devices with device files such as c6t2d0. There are only three disks on this channel, at SCSI IDs 2, 12 and 13.
Any help in identifying the culprit would be greatly appreciated.
When these messages appear, the system seems to freeze for 20-30 seconds and then resumes normal operation.
Regards,
Tony.
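For anyone reading the log above: the scb->cdb lines are 10-byte SCSI command blocks. Opcode 28 is READ(10) and 2a is WRITE(10); bytes 2-5 hold the logical block address and bytes 7-8 the transfer length in blocks. Below is a minimal sketch for decoding them. The function names decode_cdb and decode_log are made up for illustration, and the log itself does not say which target each command was sent to.

```shell
# Decode one 10-byte CDB given as ten hex bytes.
# Example: decode_cdb 28 00 03 c5 f3 80 00 00 20 00
decode_cdb() {
    if [ "$#" -ne 10 ]; then
        echo "expected 10 hex bytes" >&2
        return 1
    fi
    case "$1" in
        28)    op="READ(10)" ;;
        2a|2A) op="WRITE(10)" ;;
        *)     op="opcode 0x$1" ;;
    esac
    # Bytes 2-5 (arguments 3-6) are the big-endian logical block address.
    lba=$((0x$3$4$5$6))
    # Bytes 7-8 (arguments 8-9) are the transfer length in blocks.
    len=$((0x$8$9))
    echo "$op lba=$lba blocks=$len"
}

# Decode every "scb->cdb:" line found on stdin (dmesg or syslog text).
decode_log() {
    sed -n 's/.*scb->cdb: *//p' |
    while read -r b0 b1 b2 b3 b4 b5 b6 b7 b8 b9; do
        decode_cdb "$b0" "$b1" "$b2" "$b3" "$b4" "$b5" "$b6" "$b7" "$b8" "$b9"
    done
}

# Usage: dmesg | decode_log
```

With 512-byte blocks (an assumption), the first CDB in the dmesg output decodes to a 32-block read at LBA 63304576, roughly 30 GiB into the device. Comparing such offsets with each disk's capacity may help rule out the smaller disks, provided the logged command really belongs to the failing device.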
10-28-2003 12:24 PM
Re: SCSI problem help needed identifying disk
Are there any related messages in the syslog?
If the STM diagnostic tools are installed, have a look at the event log in the /var/opt/resmon/log directory.
It may reveal more detailed symptoms.
10-28-2003 12:40 PM
Re: SCSI problem help needed identifying disk
The messages also appear in the syslog, but they show no more than the dmesg output (although in a different order).
Oct 29 11:12:06 ceeng vmunix: scb->cdb: 2a 00 00 00 cf f0 00 00 10 00
Oct 29 11:12:06 ceeng vmunix: scb->cdb: 28 00 00 5d f9 00 00 00 60 00
Oct 29 11:12:07 ceeng vmunix: SCSI: Resetting SCSI -- lbolt: 1171439936, bus: 6
Oct 29 11:12:07 ceeng vmunix: SCSI: Reset detected -- lbolt: 1171439936, bus: 6
I checked the event log as suggested, but the last entry was from August 3rd. It is however an entry for one of the disks I thought might be the problem, and was a recoverable read error.
I'll do some non-destructive STM tests on that disk and see if anything shows up. Thanks for the tip.
Regards,
Tony.
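Since the pauses seem to happen more often than the log entries, it may help to pull out just the reset events with their timestamps and compare them with the times users report freezes. This is a rough sketch based only on the message format quoted above; count_resets and reset_times are hypothetical helper names.

```shell
# Count "Reset detected" events per bus from syslog text on stdin.
count_resets() {
    awk '/SCSI: Reset detected/ {
             # The bus number is the last field, e.g. "bus: 6".
             n[$NF]++
         }
         END { for (b in n) print "bus " b ": " n[b] " reset(s)" }'
}

# Print the timestamp and bus of each reset, one per line.
reset_times() {
    awk '/SCSI: Reset detected/ { print $1, $2, $3, "bus", $NF }'
}

# Usage on HP-UX, where syslog normally lives here:
#   count_resets < /var/adm/syslog/syslog.log
#   reset_times  < /var/adm/syslog/syslog.log
```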
10-28-2003 12:49 PM
Re: SCSI problem help needed identifying disk
In parallel, have a look at the output of vgdisplay -v to see whether any disk shows as offline or any LV as stale.
ioscan can also show it while the problem is happening.
Or have a look at STM logs if online diagnostics is installed.
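The vgdisplay check can be scripted so it is easy to repeat, for example from cron. The sketch below assumes the usual "PV Name" / "PV Status" and "LV Name" / "LV Status" lines of vgdisplay -v output; check_lvm is a hypothetical name, and the exact status strings on your release may differ.

```shell
# Read "vgdisplay -v" output on stdin and report anything that is not healthy:
# physical volumes whose status is not "available", and stale logical volumes.
check_lvm() {
    awk '
        $1 == "PV" && $2 == "Name"   { pv = $3 }
        $1 == "PV" && $2 == "Status" && $3 != "available" { print "PV " pv " is " $3 }
        $1 == "LV" && $2 == "Name"   { lv = $3 }
        $1 == "LV" && $2 == "Status" && $3 ~ /stale/      { print "LV " lv " is " $3 }
    '
}

# Usage: vgdisplay -v | check_lvm
```

No output means nothing was flagged, which matches the situation where a disk is marginal but has not yet been marked unavailable.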
10-28-2003 01:02 PM
Re: SCSI problem help needed identifying disk
Then, what is the output of the following commands?
# ioscan -funCdisk
Is any disk shown as NO_HW or UNCLAIMED?
# vgdisplay -v | more
Is any LV stale, or does any disk show other odd symptoms?
If anything in the output looks abnormal, run this test:
# dd if=/dev/rdsk/cXtYdZ of=/dev/null bs=1024
- Read test of the specified disk (this dd only reads; nothing is written to the disk)
Good luck!
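To run the same read test against each disk on the suspect channel in turn, something like the following could be used. It is only a sketch: read_test is a hypothetical helper, and the device names in the usage comment are assumptions derived from bus 6 and SCSI IDs 2, 12 and 13 mentioned earlier in the thread. A larger block size than 1024 bytes is used to shorten the run.

```shell
# Read a raw device (or any file) from start to end and report the result.
# Nothing is written to the device.
read_test() {
    if dd if="$1" of=/dev/null bs=1024k 2>/dev/null; then
        echo "$1: read OK"
    else
        echo "$1: read FAILED"
        return 1
    fi
}

# Usage (device names are assumptions, check them with ioscan first):
#   for d in c6t2d0 c6t12d0 c6t13d0; do
#       read_test /dev/rdsk/$d
#   done
```

Reading a whole disk takes a while, and a marginal disk may still pass if its retries succeed, so a clean result does not prove the disk is healthy.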
10-28-2003 03:26 PM
Re: SCSI problem help needed identifying disk
Tried all suggestions but turned up blank. Couldn't trigger the messages with the dd either. We seem to get the pauses multiple times a day, but the messages in the log seem to be less frequent. I suspect one of the disks is on the way out, but it is managing to read/write after a number of retries, and succeeding most times before the timeout period. I had a similar problem on a Linux box before. Changing the offending disk fixed the problem.
There is actually a swap LV on this disk, which could be the cause of the pauses. Maybe I should transfer the swap to another disk!
I couldn't see anywhere in the error messages where it specifically points to a particular disk, which made me wonder whether there was in fact a problem with the controller.
Regards,
Tony.
10-28-2003 03:50 PM
Re: SCSI problem help needed identifying disk
Since this is a production machine, my best suggestion is not to take a chance and wait until the disk fails completely. Install the "Event Monitoring Service", which is part of the online diagnostics. It will report any power failures or disk I/O errors along with the hardware address, so you can locate the disk easily, schedule some downtime and replace it.
Don't let users get annoyed by slow performance of the server.
10-28-2003 04:12 PM
Re: SCSI problem help needed identifying disk
Unless it has died for some reason, EMS should be running, although it hasn't reported any problems since August. I agree with you about replacing the disk before it dies; it's just a matter of knowing which one :-) I ran STM, and two of the three disks on that channel showed read and write errors when I did an info. The ones playing up are Quantum Atlas 10K III drives. I have another four of them on the other channel, and all of them show zero errors. I checked the SCSI cables, and one connector, which goes to the last disk on the chain, didn't have its thumb screws tightened. I have done that now just in case, but I suspect it is something more than that.
Regards,
Tony.
10-30-2003 01:16 AM
Re: SCSI problem help needed identifying disk
11-13-2003 10:51 AM
Re: SCSI problem help needed identifying disk
I did pull the controller out and reseat it, but there was no change. It is in the correct slot (it's an L1000), so I don't have any choice but to put it in the "turbo" slots.
I suspect that the disk has been marginal for some time, and it's finally started to get to the point where the errors aren't recoverable. It's one of two disks in a mirror, so I should be able to reduce the mirror and do some more extensive tests on it now that I know which one :-)
Regards,
Tony.
11-13-2003 12:25 PM
Re: SCSI problem help needed identifying disk
Should I run mediainit, or the STM exerciser (which never seems to do much)? I'd like to get some hard evidence for a warranty claim.
Regards,
Tony.
11-13-2003 02:42 PM
Re: SCSI problem help needed identifying disk
11-13-2003 03:01 PM
Re: SCSI problem help needed identifying disk
STM > TOOLS > UTILITY > RUN > LOGTOOL > FILE > VIEW > RAW SUMMARY.
11-13-2003 03:21 PM
Re: SCSI problem help needed identifying disk
I thought the problem was probably termination or cable or controller before, as all of the disks on the 0/4/0/1 path were showing similar numbers of errors.
I'm getting a new Ultrium on Monday, and the IBM drive has already been replaced (it died completely). It's SCSI ID 13 that seems to be the main culprit, i.e. the Atlas 10K3 73GB drive (based on this morning's syslog messages).
I have added the syslog messages from this morning on the end of the summary.
Regards,
Tony.
11-13-2003 03:23 PM
Re: SCSI problem help needed identifying disk
11-13-2003 03:26 PM
Re: SCSI problem help needed identifying disk
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
11-13-2003 03:33 PM
Re: SCSI problem help needed identifying disk
11-13-2003 03:34 PM
Re: SCSI problem help needed identifying disk
Verify the disks with this command to find vg1 and pv 5.
# strings /etc/lvmtab
First vg is vg1.
5th disk is pv 5.
######################################
(239) 0/4/0/1.13.0 = QUANTUMATLAS10K3_73_WLS
(208) 0/4/0/1.2.0 = SEAGATEST318203LW
(216) 0/4/0/1.12.0 = QUANTUMATLAS10K3_73_WLS
Check the firmware on these. Increase the timeout:
# pvchange -t 160 /dev/dsk/cXtYd0
(pvchange takes the block device path under /dev/dsk, not the raw device under /dev/rdsk.)
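The hardware paths listed above end in target.LUN, which is how they relate to the cXtYdZ device file names. As an illustration only, the helper below (hwpath_to_dsf, a made-up name) builds the legacy device file name from a controller instance and a hardware path. The instance number 6 is an assumption taken from "bus: 6" in the kernel messages; ioscan -fnC disk shows the real device files and should be treated as the authority.

```shell
# Build a legacy device file name (cXtYdZ) from a controller instance number
# and a hardware path of the form <bus path>.<target>.<lun>.
hwpath_to_dsf() {
    # $1 = controller instance (e.g. 6), $2 = hardware path (e.g. 0/4/0/1.13.0)
    target=$(echo "$2" | awk -F. '{ print $(NF-1) }')
    lun=$(echo "$2" | awk -F. '{ print $NF }')
    echo "c${1}t${target}d${lun}"
}

# Usage:
#   hwpath_to_dsf 6 0/4/0/1.13.0     # the 73GB Atlas at SCSI ID 13
#   pvchange -t 160 /dev/dsk/$(hwpath_to_dsf 6 0/4/0/1.13.0)
```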
11-13-2003 03:40 PM
Re: SCSI problem help needed identifying disk
Use 'diskinfo' to find the firmware version. See if there are upgrades.
# diskinfo -v /dev/rdsk/cXtYdZ
Please attach:
sar -d 5 5
sar -u 5 5
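When looking through the sar -d output, the column to watch for a struggling disk is usually the average service time. The sketch below flags devices above a chosen threshold. It assumes avserv is the last column, as in the usual HP-UX layout (device, %busy, avque, r+w/s, blks/s, avwait, avserv); slow_disks is a hypothetical name and the threshold is arbitrary.

```shell
# Read "sar -d" output on stdin and print devices whose last column
# (assumed to be avserv, in milliseconds) exceeds the given threshold.
slow_disks() {
    # $1 = avserv threshold in ms
    awk -v limit="$1" '
        {
            # The device name is the first or second field, depending on
            # whether the line starts with a timestamp.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^c[0-9]+t[0-9]+d[0-9]+$/ && $NF + 0 > limit + 0)
                    print $i, "avserv=" $NF
        }
    '
}

# Usage: sar -d 5 5 | slow_disks 50
```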
11-13-2003 03:58 PM
Re: SCSI problem help needed identifying disk
There are only 3 devices on that controller. Two are Quantum Atlas 10K3 drives, one 18GB and one 73GB, both in a Kingston Data Silo case (they aren't "HP" disks). The third device is a Seagate 18GB disk, which is a genuine HP external disk.
The 73GB drive has one 70GB lvol on it, which is one half of a Lotus Notes mirror. When the errors started this morning, database compacting was occurring.
The 18GB Quantum drive has secondary swap, Progress 4GL compiled programs, spool files from the Progress system, and a D3 (Pick) raw partition (note that all of these are mirror copies as well).
The 18GB Seagate disk has an lvol for squid, an lvol for samba, and an lvol for the test Progress database. None of these are particularly high volume, and none are mirrored.
The sar output is probably a bit pointless at the moment as I'm running a non-destructive read/write exerciser on the 73GB disk, but I've attached it anyway.
Unfortunately, since the disks are non-HP, a firmware upgrade is out of the question. Maxtor don't seem to release new versions.
Regards,
Tony.
11-13-2003 04:01 PM
Re: SCSI problem help needed identifying disk
That's a handy script; I might put it in as a cron job. Unfortunately, at the moment no disks have actually failed, so it didn't pick anything up, but it would have been very handy a few months ago, when I went for about 2 months not realising one of the disks in a mirror had completely failed!
Regards,
Tony.
11-13-2003 04:07 PM
Re: SCSI problem help needed identifying disk
Just on the I/O timeout, I'm wondering why the other identical disks (the other half of the mirror) on the other channel of the controller aren't getting any errors. There are actually four disks on the other channel, so theoretically it should be even busier.
Regards,
Tony.
11-13-2003 05:43 PM
Re: SCSI problem help needed identifying disk
11-13-2003 07:50 PM
Re: SCSI problem help needed identifying disk
When you say the filesystem layout, do you want the sizes and relative positions on the disks, or just, for instance, bdf output? I'll do a second sar output now that the exerciser has finished, but it's a bit quiet now (6:35PM).
I've done the sar output and a bdf. It would probably be more meaningful, though, if I did it at a peak time like 11:30AM.
Regards,
Tony.
11-13-2003 07:51 PM