Re: Syslog madness - LVM POWERFAILED?

A. Daniel King_1 · ‎04-07-2003

Hi, folks.

I've got several troubling messages in syslog in a large EMC disk environment. It seems that 0x1f06b200 failed over to 0x1f0cb200 and then back. This happened several times.

I think this is c6t11d0 to c12t11d2. Is this right? What might cause this?

Please let me know if I am sane. Some of the actual text of the messages follows. The messages lasted from Apr 7 02:34:44 to Apr 7 03:22:35.

Apr 7 02:34:44 fiserv vmunix: LVM: Recovered Path (device 0x1f06b200) to PV 5 in VG 16.
Apr 7 02:35:05 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED
Apr 7 02:35:12 fiserv vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004890e000), from raw device 0x1f06b200 (with priority: 0, and current flags: 0xc0) to raw device 0x1f0cb200 (with priority: 1, and current flags: 0x0).
Apr 7 02:36:14 fiserv vmunix: LVM: Restored PV 5 to VG 16.
Apr 7 02:36:46 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f0cb200) is POWERFAILED
Apr 7 02:36:59 fiserv vmunix: LVM: Performed a switch for Lun ID = 0 (pv = 0x000000004890e000), from raw device 0x1f0cb200 (with priority: 1, and current flags: 0xc0) to raw device 0x1f06b200 (with priority: 0, and current flags: 0x0).
Apr 7 02:36:30 fiserv vmunix: LVM: Recovered Path (device 0x1f06b200) to PV 5 in VG 16.
Apr 7 02:37:05 fiserv vmunix: LVM: Recovered Path (device 0x1f0cb200) to PV 5 in VG 16.
Apr 7 02:38:50 fiserv vmunix: LVM: Restored PV 5 to VG 16.
Apr 7 02:39:51 fiserv vmunix: LVM: Recovered Path (device 0x1f06b200) to PV 5 in VG 16.
Apr 7 02:41:13 fiserv vmunix: LVM: Restored PV 5 to VG 16.
Apr 7 02:41:29 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED

Command-Line Junkie

Paula J Frazer-Campbell · ‎04-07-2003

Hi

Play safe.

Full backup and get disk changed

Paula

If you can spell SysAdmin then you is one - anon

Chris Wilshaw · ‎04-07-2003

As this is part of an EMC environment, the disk devices reported will be the primary and alternate links to a single physical disk. As both links are reporting failures, it's most likely to be a disk issue (if it was a controller problem, only one of the pair would show).
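For anyone who wants to check the device mapping by hand, here is a minimal Python sketch. It assumes the legacy HP-UX sdisk numbering (an 8-bit major, then an 8-bit card instance, 4-bit target, 4-bit LUN and 8 option bits in the minor); the function names are made up for illustration. Under that assumption both device numbers from the syslog decode to target 11, LUN 2, reached through controllers 6 and 12, which fits the primary/alternate link picture.

```python
def decode_dev_t(dev):
    """Split a 32-bit dev_t into (major, controller, target, lun).

    Assumed layout: 0xMMCCTLOO (major, card instance, target, LUN, options).
    """
    major = (dev >> 24) & 0xFF       # 0x1f = 31 in the messages above
    controller = (dev >> 16) & 0xFF  # card instance, the "c" number
    target = (dev >> 12) & 0xF       # SCSI target, the "t" number
    lun = (dev >> 8) & 0xF           # LUN, the "d" number
    return major, controller, target, lun


def legacy_name(dev):
    """Format a dev_t as a cXtYdZ style device name."""
    _, c, t, d = decode_dev_t(dev)
    return f"c{c}t{t}d{d}"


# The two device numbers reported in the syslog excerpt.
for dev in (0x1F06B200, 0x1F0CB200):
    print(hex(dev), "->", legacy_name(dev))
```

Comparing the result against 'ls -l /dev/dsk' (major and minor columns) on the host is the safer confirmation.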

James R. Ferguson · ‎04-07-2003

Hi Daniel:

One possibility is that the disk in question is a particularly busy one. In cases such as this, 'powerfail' events from the primary to secondary pv-links (and back again) are sometimes seen when the I/O timeout value is too low for the device.

The timeout for a physical volume can be altered with 'pvchange -t timeout pv_path' as noted in the 'pvchange' man pages. A 'pvdisplay' will show the current value.
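As a rough illustration of the 'pvchange -t' form above, the Python sketch below only builds and prints the command line so it can be reviewed before anyone runs it. The helper name, the device path and the 180 second value are assumptions chosen for the example, not recommendations for this system.

```python
def pvchange_timeout_cmd(pv_path, seconds):
    """Return the argv list for 'pvchange -t <seconds> <pv_path>' (not executed)."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    return ["pvchange", "-t", str(seconds), pv_path]


# Hypothetical path and value, printed for review rather than run.
cmd = pvchange_timeout_cmd("/dev/dsk/c6t11d2", 180)
print(" ".join(cmd))
```

Check the current value with 'pvdisplay' on the same path before and after making any change.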

Regards!

...JRF...

S.K. Chan · ‎04-07-2003

In addition to the others ..
If it's EMC, a few possibilities could cause these errors ..
1- Low timeout value of PV
2- EMC disks are write protected
3- Loss of channel
Number 2 is out because I don't see any write/read related errors. Number 1 is very likely, and you can fix it easily by changing the timeout value to, say, 180 sec (i.e. with pvchange -t). Number 3 can be easily determined: if there are other disks on the same channel that do not report any errors, then you can eliminate this possibility.
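One way to approach the check for possibility 3 is to group the POWERFAILED messages by controller and see which targets and LUNs behind each one are complaining. The Python sketch below is only that: it assumes the dev_t layout 0xMMCCTLOO (major, card instance, target, LUN, options), and the function name is hypothetical. The sample lines are copied from the syslog excerpt at the top of the thread.

```python
import re
from collections import defaultdict

# Matches the dev_t in lines such as "... (dev_t=0x1f06b200) is POWERFAILED".
POWERFAILED = re.compile(r"dev_t=(0x[0-9a-fA-F]+)\) is POWERFAILED")


def powerfails_by_controller(lines):
    """Map controller instance -> {(target, lun): number of POWERFAILED events}."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = POWERFAILED.search(line)
        if not match:
            continue  # skip "Recovered Path", "Restored PV" and switch messages
        dev = int(match.group(1), 16)
        controller = (dev >> 16) & 0xFF  # assumed: card instance byte
        target = (dev >> 12) & 0xF       # assumed: 4-bit SCSI target
        lun = (dev >> 8) & 0xF           # assumed: 4-bit LUN
        counts[controller][(target, lun)] += 1
    return {c: dict(devs) for c, devs in counts.items()}


sample = [
    "Apr 7 02:35:05 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED",
    "Apr 7 02:36:14 fiserv vmunix: LVM: Restored PV 5 to VG 16.",
    "Apr 7 02:36:46 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f0cb200) is POWERFAILED",
    "Apr 7 02:41:29 fiserv vmunix: LVM: vg[16]: pvnum=5 (dev_t=0x1f06b200) is POWERFAILED",
]

for controller, devices in sorted(powerfails_by_controller(sample).items()):
    print(f"c{controller}: {devices}")
```

If every failing entry behind a controller is the same target and LUN while other disks on that controller stay quiet, a lost channel looks less likely. The full syslog, not just this excerpt, would be needed to say so with any confidence.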

KCS_1 · ‎04-07-2003

hi,

Here are some other threads that are also good for solving your problem.

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000062922089

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000063235338

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000065678520

have a good day!

Easy going at all.

A. Daniel King_1 · ‎04-08-2003

The pvtimeout item is exactly what I got from EMC, but I wanted a forum opinion, too.

EMC also mentioned that there was an lv timeout, which I will need to investigate.

Thanks, folks.

Command-Line Junkie

James R. Ferguson · ‎04-08-2003

Hi (again) Daniel:

Yes, there is a logical volume timeout. Have a look at the 'lvchange' man pages. Specifically, note that, "The actual duration of the request may exceed the specified IO_timeout value when the underlying physical volume(s) have timeouts which either exceed this IO_timeout value or are not integer multiples of this value."

Regards!

...JRF...

A. Daniel King_1 · ‎04-08-2003

I see all our lv timeouts as default, or infinite. I think this means we will rely upon the PV timeout, which should be fine now.

Command-Line Junkie

Eugeny Brychkov · ‎04-08-2003

The following idea may help...
Try 'balancing' across the FC host HBAs: half of the LUNs are accessed through one HBA (meaning as the primary link), and the other half are accessed through another HBA.
Eugeny

A. Daniel King_1 · ‎04-08-2003

Way ahead of you ... six physical paths, EMC PowerPath.

Command-Line Junkie

Tonny Sejr Kromann · ‎04-10-2003

We had the exact same problem, but with an HP XP512 disk array. It turned out to be a defective Fibre Channel port in the XP. It took quite some time to find, because we tried everything else first.

The application, SAP R/3 / Informix, was also affected by poor performance, even though we have dual paths everywhere.

--
Regards
Tonny

TEC-HP · ‎04-24-2003

Hi,

Had the same problem yesterday.
A case was opened with EMC:
Error on console + hang of the database.
The call was passed to EMC.
All hardware tests passed OK. No hardware error was found in the Symmetrix.
Only AB3E (host abort) errors were seen.
Note: AB3E errors are known as host aborts. This normally means that the host HBA issues an abort on a running SCSI command. This can happen due to host issues / connection time-outs ...
No real explanation was found for this at this point.

HP-UX was rebooted, and all applications were restarted OK.

The application (Oracle Database) hung again.
Errors on console:
Apr 23 14:12:58 ... vmunix: SCSI: Read error -- dev: b 31 0x031000, errno: 126, resid: 2048,
Apr 23 14:12:58 ... vmunix: blkno: 8,sectno: 16, offset: 8192, bcount: 2048.
Apr 23 14:12:58 ... vmunix: LVM: Performed a switch for Lun ID = 0 (pv =0x000000005ccd2800) from raw device 0x1f031000 (with priority: 0, and current flags: 0x40) to raw device 0x1f051000 (with priority: 1, and current flags: 0x0).
Apr 23 14:12:58 ... vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f051000) is POWERFAILED

EMC box: No hardware errors seen. Only a lot of AB3E errors on the device mentioned in syslog.
Received the error log from 11:08. All errors indicate a bad block on block 8 of the Symm. disk vol 120.

emc61462: "ETA emc61462: Symmetrix Error Code: 0A 5110 00
when using Solutions Enabler 5.1:
Problem with symapi system call 8140 that in some cases will not release a cache lock on Symmetrix.
==> For 5x67 microcode : Fix 19302 is released in 5567.52.29.
It is recommended for all cases when Solutions Enabler 5.1 and higher is installed.
