What is going on here...

 
SOLVED
Todd McDaniel_1
Honored Contributor


I had a few errors over the weekend that I just saw...

They appear to be fixed, but I wanted some clarification from you guys... I know quite a bit about LVM stuff, but this is Greek to me.


Jul 18 06:16:11 chpcfas3 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x00000000586a9800), from raw device 0x1f0d0700 (with priority: 0, and current flags: 0x40) to raw device 0x1f120700 (with priority: 1, and current flags: 0x0).
Jul 18 06:16:11 chpcfas3 vmunix:
Jul 18 06:16:11 chpcfas3 vmunix: SCSI: Read error -- dev: b 31 0x0d0700, errno:126, resid: 2048,
Jul 18 06:16:11 chpcfas3 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 2048.
Jul 18 06:16:19 chpcfas3 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x00000000586f5080), from raw device 0x1f0d1000 (with priority: 0, and current flags: 0x40) to raw device 0x1f121000 (with priority: 1, and current flags: 0x0).
Jul 18 06:16:19 chpcfas3 vmunix: LVM: vg[3]: pvnum=0 (dev_t=0x1f120700) is POWERFAILED
Jul 18 06:16:19 chpcfas3 vmunix: LVM: vg[3]: pvnum=1 (dev_t=0x1f121000) is POWERFAILED
Jul 18 06:16:19 chpcfas3 vmunix: LVM: Restored PV 1 to VG 3.
Jul 18 06:16:37 chpcfas3 vmunix: LVM: Path (device 0x1f0d1100) to PV 0 in VG 4 Failed!
Jul 18 06:16:19 chpcfas3 vmunix: LVM: vg[3]: pvnum=0 (dev_t=0x1f120700) is POWERFAILED
Jul 18 06:16:37 chpcfas3 vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000005893b080), from raw device 0x1f0d1100 (with priority: 0, and current flags: 0x40) to raw device 0x1f121100 (with priority: 1, and current flags: 0x0).
Jul 18 06:16:37 chpcfas3 vmunix: LVM: vg[4]: pvnum=0 (dev_t=0x1f121100) is POWERFAILED
Jul 18 06:16:42 chpcfas3 vmunix: LVM: Recovered Path (device 0x1f120700) to PV 0 in VG 3.
Jul 18 06:16:42 chpcfas3 vmunix: LVM: Restored PV 0 to VG 3.
Unix, the other white meat.
Jeff Schussele
Honored Contributor
Solution

Re: What is going on here...

Hi Todd,

You're getting errors from disk devices

c13t0d7
c13t1d0
c18t0d7
c18t1d0
c13t1d1
c18t1d1

to be exact.
I suspect a timeout issue on the c13 channel, after which LVM switches over to the alternate c18 path.
You probably need to increase the timeout value from the default of 30 seconds.
Use either pvchange -t or lvchange -t.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
A. Clay Stephenson
Acclaimed Contributor

Re: What is going on here...

The system detected a problem in the primary path to your disk device (probably a disk array) and automatically switched to the alternate. Because the failure affected a number of devices (LUNS), I suspect that an array controller reset. If this is an array, you should run any available diagnostics on it.
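
To see at a glance which primary paths failed over to which alternates, the "Performed a switch" messages can be pulled out of the syslog text. The following is only a rough sketch: the function name `find_switches` is made up for illustration, the two sample lines are taken from the log above, and it assumes the wrapped syslog lines have already been rejoined so that each message sits on a single line.

```python
import re

# Matches the LVM path-switch message and captures the old and new device numbers.
# Assumes each message is on a single line (wrapped syslog lines rejoined first).
SWITCH_RE = re.compile(
    r"LVM: Performed a switch for Lun ID = \d+ \(pv = 0x[0-9a-f]+\), "
    r"from raw device (0x[0-9a-f]+) .*? to raw device (0x[0-9a-f]+)"
)

def find_switches(log_text):
    """Return a list of (from_device, to_device) pairs, in log order."""
    return [(m.group(1), m.group(2)) for m in SWITCH_RE.finditer(log_text)]

# Two messages from the log in this thread, rejoined onto single lines.
SAMPLE = (
    "Jul 18 06:16:11 chpcfas3 vmunix: LVM: Performed a switch for Lun ID = 0 "
    "(pv = 0x00000000586a9800), from raw device 0x1f0d0700 (with priority: 0, "
    "and current flags: 0x40) to raw device 0x1f120700 (with priority: 1, "
    "and current flags: 0x0).\n"
    "Jul 18 06:16:19 chpcfas3 vmunix: LVM: Performed a switch for Lun ID = 0 "
    "(pv = 0x00000000586f5080), from raw device 0x1f0d1000 (with priority: 0, "
    "and current flags: 0x40) to raw device 0x1f121000 (with priority: 1, "
    "and current flags: 0x0).\n"
)

if __name__ == "__main__":
    for old, new in find_switches(SAMPLE):
        print(f"{old} -> {new}")
```

If every switch goes from a 0x1f0d... device to a 0x1f12... device, as it does here, that points at one shared path or controller rather than at the individual LUNs.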
If it ain't broke, I can fix that.
Todd McDaniel_1
Honored Contributor

Re: What is going on here...

Okay, thanks.

It looked as though it corrected the errors, I just wanted to know what was going on.

I will have a look at the diags.
Unix, the other white meat.
RAC_1
Honored Contributor

Re: What is going on here...

Basically, these messages tell you two things.

1. A PV switched to an alternate path.

2. LVM marked the disks as POWERFAILED, meaning they stopped responding to I/O (not necessarily a literal loss of power).

Now which disks??

ll /dev/dsk/* | egrep "0x0d|0x12"

Anil
There is no substitute to HARDWORK
Todd McDaniel_1
Honored Contributor

Re: What is going on here...

One other monkey in the Wrench....

I had another server that had the same problem at the exact same time... within a minute or so.

Now these disks are connected from a SAN to these servers.

I will go over them with my HP CE...


Thanks all.
Unix, the other white meat.
A. Clay Stephenson
Acclaimed Contributor

Re: What is going on here...

I suppose that I should add how to decode those device numbers,

e.g. 0x1f0d0700

The first 2 hex digits (1f hex = 31 decimal) indicate the major device number. Do an lsdev and look for "31". You will find that major block device 31 is SCSI disk. Thus this is a /dev/dsk device node.

The next 2 hex digits (0d hex = 13 decimal) indicate the bus "instance" number or controller number; "c13" in this case.

The next hex digit (0) indicates the SCSI ID or target; "t0" in this case.

The next hex digit (7) indicates the LUN; "d7" in this case.

The remaining 2 hex digits are device driver specific.

In summary, 0x1f0d0700 decodes to
/dev/dsk/c13t0d7.
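
The same decoding can be written as a few bit shifts. This is a minimal sketch that follows only the field layout described above; the function name `decode_dev` is made up for illustration, and it assumes a block SCSI disk device (major 31), rejecting anything else rather than guessing.

```python
def decode_dev(dev: int) -> str:
    """Decode a device number such as 0x1f0d0700 into a /dev/dsk name.

    Field layout as described in this post (an assumption for any other driver):
      2 hex digits major, 2 hex digits controller instance,
      1 hex digit SCSI target, 1 hex digit LUN, 2 driver-specific hex digits.
    """
    major = (dev >> 24) & 0xFF      # first 2 hex digits: major device number
    instance = (dev >> 16) & 0xFF   # next 2 hex digits: controller instance (cNN)
    target = (dev >> 12) & 0xF      # next hex digit: SCSI ID / target (tN)
    lun = (dev >> 8) & 0xF          # next hex digit: LUN (dN)
    # The low 2 hex digits are driver specific and ignored here.
    if major != 31:
        raise ValueError(f"major {major} is not the SCSI disk block driver (31)")
    return f"/dev/dsk/c{instance}t{target}d{lun}"

if __name__ == "__main__":
    for dev in (0x1f0d0700, 0x1f120700, 0x1f0d1000, 0x1f121000):
        print(hex(dev), "->", decode_dev(dev))
```

Run against the device numbers in the log, this gives the same c13 and c18 device names listed earlier in the thread.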
If it ain't broke, I can fix that.
Todd McDaniel_1
Honored Contributor

Re: What is going on here...

Short story is my CE was doing work at this time, and it was related to that....

So no harm done.


PS:

OOPSIE!!!!

Seems I had two posts on this, sorry... the website was hanging and I killed my session and restarted IE... and did a second one without looking...

Please don't delete either, as both have suggestions that might help others.
Unix, the other white meat.