Operating System - HP-UX

Re: Syslog madness - LVM POWERFAILED?

 
A. Daniel King_1
Super Advisor

Syslog madness - LVM POWERFAILED?

Hi, folks.

I've got several troubling messages in syslog in a large EMC disk environment. It seems that 0x1f06b200 failed over to 0x1f0cb200 and then back. This happened several times.

I think this is c6t11d0 to c12t11d2. Is this right? What might cause this?

Please let me know if I am sane. Some of the actual text of the messages follows. The messages lasted from Apr 7 02:34:44 to Apr 7 03:22:35.

Apr 7 02:34:44 fiserv vmunix: LVM: Recovered Path (device 0x1f06b200) to PV 5 in VG 16.
Apr 7 02:35:05 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED
Apr 7 02:35:12 fiserv vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004890e000), from raw device 0x1f06b200 (with priority: 0, and current flags: 0xc0) to raw device 0x1f0cb200 (with priority: 1, and current flags: 0x0).
Apr 7 02:36:14 fiserv vmunix: LVM: Restored PV 5 to VG 16.
Apr 7 02:36:46 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f0cb200) is POWERFAILED
Apr 7 02:36:59 fiserv vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004890e000), from raw device 0x1f0cb200 (with priority: 1, and current flags: 0xc0) to raw device 0x1f06b200 (with priority: 0, and current flags: 0x0).
Apr 7 02:36:30 fiserv vmunix: LVM: Recovered Path (device 0x1f06b200) to PV 5 in VG 16.
Apr 7 02:37:05 fiserv vmunix: LVM: Recovered Path (device 0x1f0cb200) to PV 5 in VG 16.
Apr 7 02:38:50 fiserv vmunix: LVM: Restored PV 5 to VG 16.
Apr 7 02:39:51 fiserv vmunix: LVM: Recovered Path (device 0x1f06b200) to PV 5 in VG 16.
Apr 7 02:41:13 fiserv vmunix: LVM: Restored PV 5 to VG 16.
Apr 7 02:41:29 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED
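
For reference, the device numbers in these messages can be decoded by hand. The sketch below assumes the usual layout for legacy HP-UX disk device numbers: 8 bits of major number, 8 bits of controller instance, 4 bits of SCSI target, 4 bits of LUN, and 8 bits of driver options. That layout is an assumption, not something stated in the thread, and `decode_dev_t` is a hypothetical helper name.

```python
def decode_dev_t(dev_t: int) -> str:
    """Turn a legacy HP-UX disk dev_t into a cXtYdZ name.

    Assumed bit layout (not confirmed by the thread):
    major(8) | controller instance(8) | target(4) | LUN(4) | options(8)
    """
    instance = (dev_t >> 16) & 0xFF  # controller instance -> cX
    target = (dev_t >> 12) & 0xF     # SCSI target         -> tY
    lun = (dev_t >> 8) & 0xF         # LUN                 -> dZ
    return f"c{instance}t{target}d{lun}"


if __name__ == "__main__":
    # The two device numbers from the syslog messages above.
    for dev in (0x1F06B200, 0x1F0CB200):
        print(hex(dev), "->", decode_dev_t(dev))
```

Under that assumed layout, 0x1f06b200 comes out as c6t11d2 and 0x1f0cb200 as c12t11d2, so the two paths would differ only in the controller. An `ll` of the files under /dev/dsk, or `lssf` on a device file, is the authoritative check.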
Command-Line Junkie
12 REPLIES 12
Paula J Frazer-Campbell
Honored Contributor

Re: Syslog madness - LVM POWERFAILED?

Hi

Play safe.

Take a full backup and get the disk changed.

Paula
If you can spell SysAdmin then you is one - anon
Chris Wilshaw
Honored Contributor

Re: Syslog madness - LVM POWERFAILED?

As this is part of an EMC environment, the disk devices reported will be the primary and alternate links to a single physical disk. As both links are reporting failures, it's most likely to be a disk issue (if it was a controller problem, only one of the pair would show).

James R. Ferguson
Acclaimed Contributor
Solution

Re: Syslog madness - LVM POWERFAILED?

Hi Daniel:

One possibility is that the disk in question is a particularly busy one. In cases such as this, 'powerfail' events from the primary to secondary pv-links (and back again) are sometimes seen when the I/O timeout value is too low for the device.

The timeout for a physical volume can be altered with 'pvchange -t timeout pv_path' as noted in the 'pvchange' man pages. A 'pvdisplay' will show the current value.

Regards!

...JRF...
S.K. Chan
Honored Contributor

Re: Syslog madness - LVM POWERFAILED?

In addition to the others ..
If it's EMC, a few possibilities could cause these errors ..
1- Low timeout value on the PV
2- EMC disks are write-protected
3- Loss of a channel
Number 2 is out because I don't see any read/write-related errors. Number 1 is very likely, and you can fix it easily by raising the timeout to, say, 180 seconds (i.e. with pvchange -t). Number 3 is easy to check: if other disks on the same channel do not report any errors, you can eliminate this possibility.
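
To check number 3, it helps to count the events per device rather than eyeball the log. This is a minimal sketch, not a definitive tool: the regular expressions are written against the message text quoted in this thread only, and `summarize` and `SAMPLE` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the LVM messages quoted in this thread (assumed stable).
POWERFAILED = re.compile(
    r"pvnum=\d+ \(dev_t=(0x[0-9a-f]+)\) is POWERFAILED", re.IGNORECASE
)
SWITCH = re.compile(
    r"from raw device (0x[0-9a-f]+) .*? to raw device (0x[0-9a-f]+)",
    re.IGNORECASE,
)


def summarize(lines):
    """Count POWERFAILED events per device and switches per (from, to) pair."""
    failed = Counter()
    switches = Counter()
    for line in lines:
        m = POWERFAILED.search(line)
        if m:
            failed[m.group(1)] += 1
            continue
        m = SWITCH.search(line)
        if m:
            switches[(m.group(1), m.group(2))] += 1
    return failed, switches


# A few lines copied from the original post.
SAMPLE = [
    "Apr 7 02:35:05 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED",
    "Apr 7 02:35:12 fiserv vmunix: LVM: Performed a switch for Lun ID = 0 "
    "(pv = 0x000000004890e000), from raw device 0x1f06b200 (with priority: 0, "
    "and current flags: 0xc0) to raw device 0x1f0cb200 (with priority: 1, "
    "and current flags: 0x0).",
    "Apr 7 02:36:46 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f0cb200) is POWERFAILED",
    "Apr 7 02:41:29 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED",
]

if __name__ == "__main__":
    failed, switches = summarize(SAMPLE)
    for dev, count in failed.most_common():
        print(f"{dev}: {count} POWERFAILED event(s)")
    for (src, dst), count in switches.items():
        print(f"{src} -> {dst}: {count} switch(es)")
```

Fed the full log (on HP-UX normally /var/adm/syslog/syslog.log) instead of `SAMPLE`, the same function shows whether the events are confined to one PV or spread across every device on the channel.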
KCS_1
Respected Contributor
A. Daniel King_1
Super Advisor

Re: Syslog madness - LVM POWERFAILED?

The pvtimeout item is exactly what I got from EMC, but I wanted a forum opinion, too.

EMC also mentioned that there was an lv timeout, which I will need to investigate.

Thanks, folks.
Command-Line Junkie
James R. Ferguson
Acclaimed Contributor

Re: Syslog madness - LVM POWERFAILED?

Hi (again) Daniel:

Yes, there is a logical volume timeout. Have a look at the 'lvchange' man pages. Specifically, note that, "The actual duration of the request may exceed the specified IO_timeout value when the underlying physical volume(s) have timeouts which either exceed this IO_timeout value or are not integer multiples of this value."

Regards!

...JRF...
A. Daniel King_1
Super Advisor

Re: Syslog madness - LVM POWERFAILED?

I see all our lv timeouts as default, or infinite. I think this means we will rely upon the PV timeout, which should be fine now.
Command-Line Junkie
Eugeny Brychkov
Honored Contributor

Re: Syslog madness - LVM POWERFAILED?

The following idea may help...
Try 'balancing' across the FC host HBAs: make one HBA the primary link for half of the LUNs and the other HBA the primary link for the other half.
Eugeny
A. Daniel King_1
Super Advisor

Re: Syslog madness - LVM POWERFAILED?

Way ahead of you ... six physical paths, EMC PowerPath.
Command-Line Junkie
Tonny Sejr Kromann
Frequent Advisor

Re: Syslog madness - LVM POWERFAILED?

We had the exact same problem, but with an HP XP512 disk array. It turned out to be a defective Fibre Channel port in the XP. It took quite some time, because we tried everything else first.

The application, SAP R/3 on Informix, was also affected by bad performance, even though we have dual paths to everything.

--
Regards
Tonny
TEC-HP
Frequent Advisor

Re: Syslog madness - LVM POWERFAILED?

Hi,

We had the same problem yesterday and opened a case with EMC.
Symptoms: errors on the console and a hung database.
All hardware tests passed OK. No hardware error was found in the Symmetrix.
Only AB3E (host abort) errors were seen.
Note: AB3E errors are known as host aborts, which normally means that the host HBA issued an abort on a running SCSI command. This can happen due to host issues, connection timeouts, and so on.
No real explanation was found for this at this point.

HP-UX was rebooted, and all applications restarted OK.

The application (Oracle Database) then hung again.
Errors on the console:
Apr 23 14:12:58 ... vmunix: SCSI: Read error -- dev: b 31 0x031000, errno: 126, resid: 2048,
Apr 23 14:12:58 ... vmunix: blkno: 8,sectno: 16, offset: 8192, bcount: 2048.
Apr 23 14:12:58 ... vmunix: LVM: Performed a switch for Lun ID = 0 (pv =0x000000005ccd2800) from raw device 0x1f031000 (with priority: 0, and current flags: 0x40) to raw device 0x1f051000 (with priority: 1, and current flags: 0x0).
Apr 23 14:12:58 ... vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f051000) is POWERFAILED

EMC box: no hardware errors seen, only a lot of AB3E errors on the device mentioned in syslog.
Received the error log from 11:08. All errors indicate a bad block on block 8 of Symmetrix disk volume 120.

emc61462: "ETA emc61462: Symmetrix Error Code: 0A 5110 00 when using Solutions Enabler 5.1":
Problem with symapi system call 8140 that in some cases will not release a cache lock on the Symmetrix.
==> For 5x67 microcode : Fix 19302 is released in 5567.52.29.
It is recommended for all cases when Solutions Enabler 5.1 and higher is installed.