
Sviatoslav Rimdenok
Occasional Advisor

Difference between "Path .. Failed" and "POWERFAILED"

Hello hpux gurus,

I have a question for you, since I wasn't able to find any documentation or postings that answer it.
I have the following messages in my syslog.log file.
Some of them are:
vmunix: LVM: Path (device 0x1f235600) to PV 0 in VG 16 Failed!

Others are :
vmunix: LVM: vg[2]: pvnum=0 (dev_t=0x1f1a4300) is POWERFAILED

I know how to get the logical device name, physical path, etc. from the hex device ID. My question is:

What is the difference between the messages "Path .. to PV .. in VG .. Failed!" and "... is POWERFAILED"? The disks that had the problem are SAN disks (EMC Symmetrix, to be precise) and have primary and alternate PV links configured.

My guess is that "Path .. Failed" means that a particular path has failed but the device is still accessible over another PV link, while ".. is POWERFAILED" means that all PV links have failed.

Could you please confirm this or, better, point me to technotes or documentation?

Thank you so much!

cheers,
Slava R.
12 REPLIES
G. Vrijhoeven
Honored Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

Hi,

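I am not sure about this part: if you have an alternate path defined, HP-UX will try the alternate path before giving a POWERFAILED message.

About the next part I am sure. To find the right disk / VG you can run:

# ll -R /dev/ | grep 235600
This gives you the device file (ll prints the minor number, here 0x235600, which is the low part of the dev_t 0x1f235600). Then run strings /etc/lvmtab to find the VG name.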
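
To make the hex device ID easier to read, here is a minimal sketch that splits a dev_t from the syslog message into major and minor numbers and derives a legacy cXtYdZ name. It assumes the legacy HP-UX minor layout (controller instance, then target, then LUN, then an option byte); the function name decode_dev_t is made up for this example, and the result should be checked against ll and ioscan on the real system.

```shell
# Sketch only: decode an LVM dev_t such as 0x1f235600.
# Assumption: legacy minor layout 0xIITLOO (II = controller instance,
# T = target, L = LUN, OO = option bits). decode_dev_t is a hypothetical name.
decode_dev_t() {
    dev=$(( $1 ))                      # accepts 0x-prefixed hex
    major=$(( (dev >> 24) & 0xff ))    # top byte is the major number
    minor=$(( dev & 0xffffff ))        # low three bytes are the minor number
    card=$(( (minor >> 16) & 0xff ))
    target=$(( (minor >> 12) & 0xf ))
    lun=$(( (minor >> 8) & 0xf ))
    printf 'major=%d minor=0x%06x name=c%dt%dd%d\n' \
        "$major" "$minor" "$card" "$target" "$lun"
}

decode_dev_t 0x1f235600   # prints: major=31 minor=0x235600 name=c35t5d6
```

The minor value is the string to grep for in the ll output, and the derived name can then be looked up in strings /etc/lvmtab.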


HTH,

Gideon
Stefan Farrelly
Honored Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

POWERFAIL means the physical disk was spun down (due to mech failure or power loss to the disk). LVM/diags can detect this loss of power.

PATH TO PV...FAILED means the physical connection to the disk was lost (a SCSI cable, fibre cable or SAN problem). The disk was powered up OK, but the cabling to it was disrupted.
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Sviatoslav Rimdenok
Occasional Advisor

Re: Difference between "Path .. Failed" and "POWERFAILED"

Stefan,

According to your posting, one disk in the EMC failed (since you read POWERFAILED as an actual disk failure).
I don't think a disk has failed, since this is an EMC Symmetrix platform configured as RAID 1+0.
Moreover, HP-UX hosts connected to EMC Clariion platforms (we have such a setup) sometimes log "POWERFAILED" LVM messages to syslog.log because of the trespassing mechanism. Obviously, trespassing is related to PV link failure rather than to failure of a disk inside the EMC Clariion itself.

That makes me think that LVM can wrongly assume a disk failure when it is really the path to the device that failed.

Does anyone have more ideas?

thank you!
Stefan Farrelly
Honored Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

I didn't say the disk had failed, just that it had spun down. It probably spun back up again immediately and is now working fine. This is a classic sign of a disk on its way to heaven: it spins down, then up again.
We've seen this happen on our EMCs.

Anything is possible with HP-UX and its diagnostic software misinterpreting hardware faults, but the HP diags are pretty good nowadays and I've found them to be very accurate.

You can confirm what happened by contacting EMC. The frame should have logged the event or dialed out if a disk did spin down, and EMC will be able to log into your EMC frame and confirm it for you.
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Sviatoslav Rimdenok
Occasional Advisor

Re: Difference between "Path .. Failed" and "POWERFAILED"

Thank you Stefan!

You are right - I am going to contact EMC

Thanks for your help!

cheers,
Slava R
Steven E. Protter
Exalted Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

A common problem with disk arrays is that if there is no lun0 assigned to the server, the server itself shows up as lun0. HP-UX thinks it's a disk, but it's not.

I do not think that is the case here.

You may have a bad fiber card or cable or a problem on the array.

Good idea to contact EMC

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Bart Paulusse
Respected Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

We've had this situation several times with VA7400 disk systems: first a lot of "... Failed" messages, followed by a lot of "... POWERFAILED" messages. In our case(s) this was caused by a defective controller, which made one of the two paths to the disk subsystem unavailable.
LVM doesn't see the disk system and doesn't know about the hardware in between; it just sees that it is losing contact with some disk device files.
Anyway, we never lost contact with the disks completely. They stayed accessible via the other controller / alternate link, so POWERFAILED does not mean all PV links failed.
But in any case it was a hardware problem, so you do need to contact your EMC supplier.
I hope this helps a bit.
Gerhard Roets
Esteemed Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

Hi Slava

On a side note: for the redundant paths to work, they must be configured under LVM by definition. This is a simple procedure of vgextending the VG with the alternate path. HP-UX will then pick up that this disk is part of a "current" VG and assign it as an alternate link.

***It is extremely important NOT to pvcreate the alternate paths.*** (I hope I stressed it enough ;) ), since most people use pvcreate -f for some reason.

There is also a timeout value (see man pvchange); also look at the autoswitch value, which is in pvchange as well.
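
As a rough illustration of the vgextend step described above, the sketch below only prints the commands (a dry run) rather than executing them. The names vg16 and c36t5d6 are hypothetical placeholders, and alt_link_cmds is a helper invented for this example; use the device file of the second path to the same LUN on your own system.

```shell
# Dry run: print the commands instead of running them.
# vg16 and /dev/dsk/c36t5d6 are hypothetical names; substitute your own.
alt_link_cmds() {
    vg=$1        # volume group name, e.g. vg16
    alt_dsf=$2   # device file of the second path to the SAME LUN
    # Deliberately no pvcreate: the LUN already carries its LVM header,
    # written through the primary path.
    echo "vgextend /dev/$vg $alt_dsf"
    # vgdisplay -v should afterwards list the new path as an alternate link.
    echo "vgdisplay -v /dev/$vg"
}

alt_link_cmds vg16 /dev/dsk/c36t5d6
```

Printing first and pasting afterwards gives you a chance to double-check that the device file really is the other path to the same LUN before LVM is touched.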

Regards
Gerhard
Sviatoslav Rimdenok
Occasional Advisor

Re: Difference between "Path .. Failed" and "POWERFAILED"

I talked to HP support yesterday.

Basically, they told me the following, which matches the entries LVM generated in my syslog.log:

"Path .. to PV .. in VG .. Failed" corresponds to the failure of the first PV link.
I believe this is true, since the hex device IDs in those messages correspond to the PV links going through the failed SAN switch.

".. is POWERFAILED" corresponds to events where the second available PV link failed as well (i.e. no more PV links were available).
Those messages contained hex device IDs corresponding to the second PV links, which go through an independent fabric. HP says they could have failed because of high I/O, since we have the default SCSI timeout configured for the PVs.

HP said that despite the second link failure, it looks like everything was OK, since they saw the message

"Recovered Path..." a few seconds later.

I was given some technotes that describe this issue, but I could not find them on ITRC at the moment.
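
To see the sequence HP described (first link fails, second link fails, path recovers) in your own log, a small tally like the following may help. It is only a sketch: the patterns are taken from the messages quoted in this thread, "Recovered Path" is matched as a bare substring because its full wording is not shown here, and lvm_event_summary is a name made up for this example.

```shell
# Sketch: count LVM path events in syslog lines read from stdin.
# Patterns come from the messages quoted in this thread; lvm_event_summary
# is a hypothetical helper name.
lvm_event_summary() {
    awk '
        /LVM: Path .* Failed!/ { failed++ }        # one PV link lost
        /is POWERFAILED/       { powerfailed++ }   # PV marked POWERFAILED
        /Recovered Path/       { recovered++ }     # a path came back
        END {
            printf "path_failed=%d powerfailed=%d recovered=%d\n",
                   failed, powerfailed, recovered
        }
    '
}

# Typical use on HP-UX (log path assumed to be the default):
#   lvm_event_summary < /var/adm/syslog/syslog.log
```

If powerfailed is non-zero but every event is followed by a recovery within seconds, that points towards timeouts rather than a dead disk, which is consistent with what HP support said above.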

thank you everybody,
Slava R.
Philip Kime
Regular Advisor

Re: Difference between "Path .. Failed" and "POWERFAILED"

I have seen this problem many times on EMCs and FC60s when the PV timeout is left at the default (30 seconds) for SAN disks. HP recommends setting the PV timeout (pvchange -t) higher than this for SAN disks. EMC's recommended setting is about 180 seconds, I think.
Floyd Curtis
Frequent Advisor

Re: Difference between "Path .. Failed" and "POWERFAILED"

Normally, with alternate links configured, when one path does not respond you will see "LVM performed a switch" logged in syslog.
LVM will also wait for the amount of time shown by "pvdisplay" as the IO timeout for the disk/LUN before marking the device POWERFAILED. It will periodically re-poll the disk to see whether the primary path is available again, and will then give you a "Returned to Volume group" message.

As one of the answers suggested, with EMC it is sometimes recommended to use pvchange -t 180 to set the IO timeout higher, to keep a LUN that is probably fine from being marked POWERFAILED. (This can happen while a BC is being performed, or while the array is busy with something internal such as defragmenting itself.)
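
A minimal dry-run sketch of that timeout change follows. It prints one pvchange -t command per device file, plus a pvdisplay to check the new IO timeout, without executing anything. The device files are illustrative only, pv_timeout_cmds is a name invented here, and the value 180 is the figure mentioned in this thread, not a confirmed vendor recommendation.

```shell
# Dry run: print the pvchange/pvdisplay commands for each PV device file.
# Device names are illustrative; 180 is the value suggested in this thread.
pv_timeout_cmds() {
    timeout=$1; shift
    for pv in "$@"; do
        echo "pvchange -t $timeout $pv"   # raise the IO timeout for this PV
        echo "pvdisplay $pv"              # then confirm the IO timeout shown
    done
}

pv_timeout_cmds 180 /dev/dsk/c35t5d6 /dev/dsk/c26t4d3
```

Review the printed commands, then run them by hand (or pipe the output to sh) once you are happy with the list of device files.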

Good Luck.
fwc
RolandH
Honored Contributor

Re: Difference between "Path .. Failed" and "POWERFAILED"

I have had the same problem as Philip. If you have high I/O on a disk, you can get a SCSI timeout from it, because the default value in LVM is 30 s. If you change this (pvchange -t) for certain storage products, you often eliminate the problem.

Roland
Sometimes you lose and sometimes the others win