Re: I got this message in my syslog, not sure what it means.....

Unix Administrator_6 · ‎07-31-2000

Jul 28 17:37:59 hp1 vmunix:
Jul 28 17:37:59 hp1 vmunix: SCSI: Request Timeout -- lbolt: 405984558, dev: 1f000000
Jul 28 17:37:59 hp1 vmunix: lbp->state: 4020
Jul 28 17:37:59 hp1 vmunix: lbp->offset: ffffffff
Jul 28 17:37:59 hp1 vmunix: lbp->uPhysScript: 500000
Jul 28 17:37:59 hp1 vmunix: From most recent interrupt:
Jul 28 17:37:59 hp1 vmunix: ISTAT: 22, SIST0: 04, SIST1: 00, DSTAT: 80, DSPS: 00000006
Jul 28 17:37:59 hp1 vmunix: NCR chip register access history (most recent last): 339431571 accesses
Jul 28 17:37:59 hp1 vmunix: 247, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 1035, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 0, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 122780, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 3248557, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 0, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 1226701, ISTAT: 22
Jul 28 17:37:59 hp1 vmunix: 4, SIST0: 04
Jul 28 17:37:59 hp1 vmunix: 5, SIST1: 00
Jul 28 17:37:59 hp1 vmunix: 6, DSTAT: 80
Jul 28 17:37:59 hp1 vmunix: 6, DSPS: 00000006
Jul 28 17:37:59 hp1 vmunix: 5, SCRATCHA: ff000867
Jul 28 17:37:59 hp1 vmunix: 6, DSP: 00500058
Jul 28 17:37:59 hp1 vmunix: 3, SCRATCHA1<-00
Jul 28 17:37:59 hp1 vmunix: 3, CTEST3<-04
Jul 28 17:37:59 hp1 vmunix: 0, STEST3<-82
Jul 28 17:37:59 hp1 vmunix: lsp: 6005000
Jul 28 17:37:59 hp1 vmunix: bp->b_dev: 1f000000
Jul 28 17:37:59 hp1 vmunix: scb->io_id: d17d24
Jul 28 17:37:59 hp1 vmunix: scb->cdb: 2a 00 00 5d 12 70 00 00 10 00
Jul 28 17:37:59 hp1 vmunix: lbolt_at_timeout: 405981458, lbolt_at_start: 405981458
Jul 28 17:37:59 hp1 vmunix: lsp->state: 10d
Jul 28 17:37:59 hp1 vmunix: lbp->owner: 6005000
Jul 28 17:37:59 hp1 vmunix: scratch_lsp: 0
Jul 28 17:37:59 hp1 vmunix: Pre-DSP script dump [5c33030]:
Jul 28 17:37:59 hp1 vmunix: 78346700 0000000a 78350800 00000000
Jul 28 17:37:59 hp1 vmunix: 0e000004 005003c0 80000000 00000000
Jul 28 17:37:59 hp1 vmunix: Script dump [5c33050]:
Jul 28 17:37:59 hp1 vmunix: 9f0b0000 00000006 0a000000 005003c8
Jul 28 17:37:59 hp1 vmunix: 721a0000 00000000 c0000004 0050035c
Jul 28 17:37:59 hp1 vmunix:
Jul 28 17:37:59 hp1 vmunix: SCSI: Abort Tag -- lbolt: 405984558, dev: 1f000000, io_id: d17d24
Jul 28 17:37:59 hp1 vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f000000) is POWERFAILED

Unix Administrator_6 · ‎07-31-2000

one more line to it......

Jul 28 17:38:03 hp1 vmunix: LVM: PV 0 has been returned to vg[1]

Andy Monks · ‎07-31-2000

You've had a problem with a disk. It's the only one with a minor number of '0x000000' (check with 'll /dev/dsk | grep 0x000000').

It's probably worth running the diagnostics to see whether they have detected anything. It could also be related to your patch level.

Patrick Wessel · ‎07-31-2000

What you see is a timeout of a SCSI request. This is not necessarily a hardware problem. The most common reason is heavy IO load on the bus. Check for the latest SCSI patches on your system.
Do you know what kind of device the disk c0t0d0 is? If it is a disk array, you may want to change the PV timeout to 180 seconds.

There is no good troubleshooting with bad data

John Palmer · ‎07-31-2000

It could also be SCSI related. Was anything done to the SCSI bus at this time?

The messages seem to show that the disk was recovered within a few seconds, so you are probably OK, but I would advise that you keep checking syslog and dmesg for a while.

Regards

John

Patrick Wessel · ‎07-31-2000

You will find some more detailed information by following this link:
http://forums.itrc.hp.com/cm/QuestionAnswer/1,1150,0x9b677e990647d4118fee0090279cd0f9,00.html

There is no good troubleshooting with bad data

Anthony deRito · ‎07-31-2000

You need to figure out if this problem is related to a hardware problem or a SCSI timeout problem. If this is related to a SCSI timeout problem you will see the following message shortly after:

vmunix: LVM: PV 0 has been returned to vg[1]

This is related to a timeout on your SCSI disk. You should increase the timeout up to a maximum of 180 seconds as follows:

pvchange -t 180 /dev/dsk/[device]

Increasing the timeout will not affect I/O performance on the disk.

For the next message:

vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f000000) is POWERFAILED

Here are a few important translation tips:

1) vg[1] - this means that the volume group happens to have a filesystem mounted on it that corresponds to the 1st valid entry in /etc/fstab. If you saw a vg[8] here, it would mean the 8th valid entry in /etc/fstab.

2) dev_t=0x1f000000 - this hex value can easily be translated into a device file: 0x1f is the major number (31 decimal) and the remaining 000000 is the minor number, so scan the /dev/dsk directory for minor number 0x000000.

If the problem is related to hardware, you should investigate your hardware logs with STM. Also look at the output of dmesg and the contents of syslog.log.

Hope this helps.

Tony

Alex Mantelos_1 · ‎08-01-2000

You should also check the device to ensure you don't have a hardware issue.
The device is decoded as follows:
dev_t=0x1f000000

1f - this is a hex value; if you convert it to decimal you get 31. This is the major number of the driver for the device that produced this error. If you type: lsdev | grep 31, you will probably see that this relates to the sdisk driver, telling you that this error is from one of your disks. These types of messages can also come from SCSI tape drives.
The next two digits represent which card instance this disk is hanging off,
i.e. c0.
The third zero relates to the SCSI ID,
i.e. 0.
The fourth zero relates to the LUN ID,
i.e. 0.
The last two digits are reserved.
Therefore this decodes to c0t0d0 on your system. To verify you don't have a hardware issue you can do the following:
dd if=/dev/rdsk/c0t0d0 of=/dev/null bs=64k
If this returns without an I/O error, then more than likely it was just a timeout.
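
The decoding above can be sketched as a small shell function. This is only an illustration of the field layout described in this post (two hex digits of major number, two of card instance, one of SCSI ID, one of LUN, two reserved); decode_dev_t is a made-up helper name, not an HP-UX command.

```shell
# decode_dev_t HEX: split a dev_t such as 0x1f000000 into the fields
# described above. Hypothetical helper, not an HP-UX command.
decode_dev_t() {
    dev=$(( $1 ))                      # shell arithmetic accepts the 0x prefix
    major=$(( (dev >> 24) & 0xff ))    # first two hex digits: driver major number
    card=$(( (dev >> 16) & 0xff ))     # next two digits: card instance (the N in cN)
    target=$(( (dev >> 12) & 0xf ))    # next digit: SCSI ID (the N in tN)
    lun=$(( (dev >> 8) & 0xf ))        # next digit: LUN (the N in dN)
    echo "major=$major device=c${card}t${target}d${lun}"
}

decode_dev_t 0x1f000000   # prints: major=31 device=c0t0d0
```

The major number still has to be matched against lsdev to confirm which driver it belongs to.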

Vincente Fernandes · ‎08-01-2000

First find out the path, i.e. /dev/dsk/c?t?d?.
Run a dd on this disk:
dd if=/dev/dsk/c?t?d? of=/dev/null bs=4096k
If it comes out with an I/O error then there is a problem with the disk. You can also run STM (Support Tools Manager) if you have OnlineDiag installed on the system.
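
If you want to script the dd check, something like the following would do. It is a rough sketch only: read_check is a made-up name, and it simply reports whether dd exited cleanly after reading the whole device once. The 64k block size is the one suggested earlier in the thread; any reasonable size should work.

```shell
# read_check DEVICE: read the device (or any file) end to end and report
# whether dd finished without an error. Hypothetical helper.
read_check() {
    if dd if="$1" of=/dev/null bs=64k 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "read FAILED: $1"
        return 1
    fi
}

# Example call (substitute your own device path):
# read_check /dev/rdsk/c0t0d0
```

A clean read does not prove the disk is healthy, it only shows that every block could be read at that moment, so still look at the STM logs.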

Ray Ward · ‎08-03-2000

You can also get this message if you are having problems with Fibre Channel emitters. You will need to check the error rate on your Fibre Channel hardware (if you have any, that is).

To err is human. To really c**k things up you need a computer!

Rita C Workman · ‎08-04-2000

You have two good answers here. Anthony deRito is correct that you can up the timeout. But since these errors just started recently, Ray Ward is probably right. You need to call HP. You probably have the older version of the Fiber Card in your box. This is a known hardware problem. I had several of these and had to have them all replaced. I highly recommend doing this, since the issues will keep popping up until you do.
Regards,
