
Re: I got this message in my syslog, not sure what it means.....

 

I got this message in my syslog, not sure what it means.....

Jul 28 17:37:59 hp1 vmunix:
Jul 28 17:37:59 hp1 vmunix: SCSI: Request Timeout -- lbolt: 405984558, dev: 1f000000
Jul 28 17:37:59 hp1 vmunix: lbp->state: 4020
Jul 28 17:37:59 hp1 vmunix: lbp->offset: ffffffff
Jul 28 17:37:59 hp1 vmunix: lbp->uPhysScript: 500000
Jul 28 17:37:59 hp1 vmunix: From most recent interrupt:
Jul 28 17:37:59 hp1 vmunix: ISTAT: 22, SIST0: 04, SIST1: 00, DSTAT: 80, DSPS: 00000006
Jul 28 17:37:59 hp1 vmunix: NCR chip register access history (most recent last): 339431571 accesses
Jul 28 17:37:59 hp1 vmunix: 247, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 1035, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 0, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 122780, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 3248557, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 0, ISTAT<-20
Jul 28 17:37:59 hp1 vmunix: 1226701, ISTAT: 22
Jul 28 17:37:59 hp1 vmunix: 4, SIST0: 04
Jul 28 17:37:59 hp1 vmunix: 5, SIST1: 00
Jul 28 17:37:59 hp1 vmunix: 6, DSTAT: 80
Jul 28 17:37:59 hp1 vmunix: 6, DSPS: 00000006
Jul 28 17:37:59 hp1 vmunix: 5, SCRATCHA: ff000867
Jul 28 17:37:59 hp1 vmunix: 6, DSP: 00500058
Jul 28 17:37:59 hp1 vmunix: 3, SCRATCHA1<-00
Jul 28 17:37:59 hp1 vmunix: 3, CTEST3<-04
Jul 28 17:37:59 hp1 vmunix: 0, STEST3<-82
Jul 28 17:37:59 hp1 vmunix: lsp: 6005000
Jul 28 17:37:59 hp1 vmunix: bp->b_dev: 1f000000
Jul 28 17:37:59 hp1 vmunix: scb->io_id: d17d24
Jul 28 17:37:59 hp1 vmunix: scb->cdb: 2a 00 00 5d 12 70 00 00 10 00
Jul 28 17:37:59 hp1 vmunix: lbolt_at_timeout: 405981458, lbolt_at_start: 405981458
Jul 28 17:37:59 hp1 vmunix: lsp->state: 10d
Jul 28 17:37:59 hp1 vmunix: lbp->owner: 6005000
Jul 28 17:37:59 hp1 vmunix: scratch_lsp: 0
Jul 28 17:37:59 hp1 vmunix: Pre-DSP script dump [5c33030]:
Jul 28 17:37:59 hp1 vmunix: 78346700 0000000a 78350800 00000000
Jul 28 17:37:59 hp1 vmunix: 0e000004 005003c0 80000000 00000000
Jul 28 17:37:59 hp1 vmunix: Script dump [5c33050]:
Jul 28 17:37:59 hp1 vmunix: 9f0b0000 00000006 0a000000 005003c8
Jul 28 17:37:59 hp1 vmunix: 721a0000 00000000 c0000004 0050035c
Jul 28 17:37:59 hp1 vmunix:
Jul 28 17:37:59 hp1 vmunix: SCSI: Abort Tag -- lbolt: 405984558, dev: 1f000000, io_id: d17d24
Jul 28 17:37:59 hp1 vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f000000) is POWERFAILED

Re: I got this message in my syslog, not sure what it means.....

one more line to it......

Jul 28 17:38:03 hp1 vmunix: LVM: PV 0 has been returned to vg[1]
Andy Monks
Honored Contributor

Re: I got this message in my syslog, not sure what it means.....

You've had a problem with a disk. It's the one with the minor number 0x000000 (check with 'll /dev/dsk | grep 0x000000').

Probably worth running the diags and seeing if it's detected anything. It could also be related to your patch level.
Patrick Wessel
Honored Contributor
Solution

Re: I got this message in my syslog, not sure what it means.....

What you see is a timeout of a SCSI request. This is not necessarily a hardware problem. The most common reason is heavy I/O load on the bus. Check for the latest SCSI patches on your system.
Do you know what kind of device the disk c0t0d0 is? If it is a disk array, you may want to change the PV timeout to 180 seconds.
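
To see which request actually timed out, the scb->cdb line in the log can be decoded by hand. The following is only a minimal sketch, assuming the standard 10-byte SCSI READ(10)/WRITE(10) layout (opcode in byte 0, big-endian logical block address in bytes 2-5, big-endian block count in bytes 7-8). The function name decode_rw10_cdb is made up for this illustration and is not part of any HP-UX tool.

```python
def decode_rw10_cdb(cdb_hex):
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() ignores the spaces between bytes
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")

    # Only the two 10-byte read/write opcodes are handled in this sketch.
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    if cdb[0] not in opcodes:
        raise ValueError("not a READ(10)/WRITE(10) CDB")

    return {
        "command": opcodes[cdb[0]],
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# The CDB from the syslog entry in the original post.
print(decode_rw10_cdb("2a 00 00 5d 12 70 00 00 10 00"))
```

For the CDB in the log this reports a WRITE(10) of 16 blocks starting at block 0x5d1270, which is consistent with an ordinary write that did not complete in time.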
There is no good troubleshooting with bad data
John Palmer
Honored Contributor

Re: I got this message in my syslog, not sure what it means.....

It could also be SCSI related. Was anything done to the SCSI bus at this time?

The messages seem to show that the disk was recovered within a few seconds so you are probably OK but I would advise that you check the syslog and dmesg for a while.

Regards

John

Patrick Wessel
Honored Contributor

Re: I got this message in my syslog, not sure what it means.....

You will find some more detailed information at this link:
http://forums.itrc.hp.com/cm/QuestionAnswer/1,1150,0x9b677e990647d4118fee0090279cd0f9,00.html
There is no good troubleshooting with bad data
Anthony deRito
Respected Contributor

Re: I got this message in my syslog, not sure what it means.....

You need to figure out if this problem is related to a hardware problem or a SCSI timeout problem. If this is related to a SCSI timeout problem you will see the following message shortly after:

vmunix:LVM: pvnum=0 returned to vg[1]

This is related to a timeout on your SCSI disk. You should increase the timeout up to a maximum of 180 seconds as follows:

pvchange -t 180 /dev/dsk/[device]

Increasing the timeout will not affect I/O performance on the disk.

For the next message:

vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f000000) is POWERFAILED

Here are a few important translation tips:

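1) vg[1] - this identifies the volume group by the minor number of its group file: vg[1] is the volume group whose /dev/<vgname>/group file has minor number 0x010000, and a vg[8] here would mean the one with 0x080000. Run 'll /dev/*/group' to see which volume group that is.

2) dev_t=0x1f000000 - this hex value can be easily translated into a device file: the first two hex digits (1f, decimal 31) are the major number and the remaining six (000000) are the minor number, so scan the /dev/dsk directory for the entry with minor number 0x000000.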
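
The directory scan in tip 2 can also be scripted. The following is only a rough sketch, assuming Python is available on the box and that the major number is the top 8 bits of dev_t and the minor number the low 24 bits. The helper name find_device_files is hypothetical; os.stat, os.major and os.minor are standard library calls.

```python
import os
import stat


def find_device_files(directory, major, minor):
    """Return paths in `directory` that are device files with this major/minor."""
    matches = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        try:
            st = os.stat(path)
        except OSError:
            continue  # unreadable entry or dangling symlink: skip it
        # Only block and character special files carry a device number.
        if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
            continue
        if os.major(st.st_rdev) == major and os.minor(st.st_rdev) == minor:
            matches.append(path)
    return matches


if __name__ == "__main__":
    dev_t = 0x1F000000  # value taken from the POWERFAILED message
    if os.path.isdir("/dev/dsk"):
        # Split dev_t into major (top 8 bits) and minor (low 24 bits).
        print(find_device_files("/dev/dsk", dev_t >> 24, dev_t & 0xFFFFFF))
```

This does the same job as 'll /dev/dsk | grep 0x000000', but it also checks the major number, so it will not match a device belonging to a different driver that happens to share the minor number.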


If the problem is related to hardware, you should investigate your hardware logs with STM. Look at output of dmesg and also contents of syslog.log.

Hope this helps.

Tony

Alex Mantelos_1
Occasional Advisor

Re: I got this message in my syslog, not sure what it means.....

You should also check the device to ensure you don't have a hardware issue.
The device is decoded as follows:
dev_t=0x1f000000

1f - this is a hex value; if you convert it to decimal you get 31. This is the major number of the driver that produced this error. If you type 'lsdev | grep 31', you will probably see that this relates to the sdisk driver, telling you that this error is from one of your disks. These types of messages can also come from SCSI tape drives.
The next two digits represent which card instance this disk is hanging off,
i.e. c0.
The next digit (the third zero) is the SCSI ID,
i.e. 0.
The digit after that (the fourth zero) is the LUN ID,
i.e. 0.
The last two digits are reserved.
Therefore this decodes to c0t0d0 on your system. To verify you don't have a hardware issue you can do the following:
dd if=/dev/rdsk/c0t0d0 of=/dev/null bs=64k
If this returns without an I/O error, then more than likely it was just a timeout.
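
The decoding steps above can be sketched in a few lines of Python. This is only an illustration of the digit layout described in this post (two hex digits of major number, two of card instance, one of SCSI ID, one of LUN, two reserved); the function name decode_hpux_dev_t is made up for the example.

```python
def decode_hpux_dev_t(dev_t):
    """Split a 32-bit dev_t such as 0x1f000000 into its documented fields."""
    major = (dev_t >> 24) & 0xFF     # first two hex digits: driver major number
    instance = (dev_t >> 16) & 0xFF  # next two digits: card instance (cN)
    target = (dev_t >> 12) & 0xF     # next digit: SCSI ID (tN)
    lun = (dev_t >> 8) & 0xF         # next digit: LUN (dN)
    reserved = dev_t & 0xFF          # last two digits: reserved
    return {
        "major": major,
        "instance": instance,
        "target": target,
        "lun": lun,
        "reserved": reserved,
        "name": "c%dt%dd%d" % (instance, target, lun),
    }


info = decode_hpux_dev_t(0x1F000000)
print(info["major"], info["name"])  # prints: 31 c0t0d0
```

The resulting name is the one to plug into the dd test, as /dev/rdsk/c0t0d0.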
Vincente Fernandes
Valued Contributor

Re: I got this message in my syslog, not sure what it means.....

First find out the path i.e. /dev/dsk/c?t?d?.
Run a dd on this disk
dd if=/dev/dsk/c?t?d? of=/dev/null bs=4096k
If it comes out with an I/O error then there is a problem with the disk. You can also run STM (Support Tools Manager) if you have OnlineDiag installed on the system.
Ray Ward
New Member

Re: I got this message in my syslog, not sure what it means.....

You can also get this message if you are having problems with Fibre Channel emitters. You will need to check the error rate on your Fibre Channel hardware (if you have any, that is).
To err is human. To realy c**k things up you need a computer!
Rita C Workman
Honored Contributor

Re: I got this message in my syslog, not sure what it means.....

You have two good answers here... Anthony deRito is correct that you can up the timeout. But since these errors just started recently, Ray Ward is probably right. You need to call HP. You probably have the older version of the Fibre Channel card in your box. This is a known hardware problem. I had several of these and had to have them all replaced. I highly recommend doing this, since the issues will keep popping up until you do...
Regards,