MPIO redundant link failure

Adel Elamin
Frequent Advisor

MPIO redundant link failure

Dear ITRC members,
With reference to the attachment, we have two rp3440 servers configured for high availability with HP Serviceguard. Everything was working perfectly until we noticed that the system was performing very slowly and the storage reported that its Loop B had failed. I opened the log file on the server and found the following errors:
Apr 5 10:23:32 NSS1 vmunix: 0/3/1/0/4/0: Device at device id 0x20000 has disappeared from Name Server GPN_FT
Apr 5 10:23:32 NSS1 vmunix: (FCP type) response, or its 'Port World-Wide Name' has changed.
Apr 5 10:23:32 NSS1 vmunix: device id = loop id, for private loop devices
Apr 5 10:23:32 NSS1 vmunix: device id = nport ID, for fabric/public-loop devices
Apr 5 10:23:32 NSS1 vmunix: System won't be able to see LUNs behind this port.
Apr 5 10:23:32 NSS1 vmunix:
Apr 5 10:23:45 NSS1 vmunix: 0/3/1/0/4/0: Device at device id 0x20000 is back in Name Server GPN_FT (FCP type)
Apr 5 10:23:45 NSS1 vmunix: response, and its 'Port World-Wide Name' remains the same as
Apr 5 10:23:45 NSS1 vmunix: original.
Apr 5 10:23:45 NSS1 vmunix: device id = loop id, for private loop devices
Apr 5 10:23:45 NSS1 vmunix: device id = nport ID, for fabric/public-loop devices
Apr 5 10:23:45 NSS1 vmunix: System will be able to see LUNs behind this port
Apr 5 10:23:45 NSS1 vmunix: (might need to run 'ioscan' first).
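
To see how long each drop of this kind lasted, the "has disappeared" and "is back" name-server messages can be paired up per port and device id. The following is only a minimal sketch: it assumes the syslog line layout shown above, uses a placeholder year because syslog omits it, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumption: syslog omits the year, so a placeholder is used. Only the
# difference between two timestamps is meaningful.
PLACEHOLDER_YEAR = 2000
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches the first line of each name-server message in the excerpt above.
EVENT_RE = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d+):(\d+):(\d+)\s+\S+\s+vmunix:\s+(\S+):\s+"
    r"Device at device id (0x[0-9a-fA-F]+) "
    r"(has disappeared from|is back in) Name Server"
)


def parse_events(lines):
    """Yield (timestamp, hw_path, device_id, state) for each matching line."""
    for line in lines:
        m = EVENT_RE.match(line)
        if m is None:
            continue
        mon, day, hh, mm, ss, hw_path, dev_id, verb = m.groups()
        ts = datetime(PLACEHOLDER_YEAR, MONTHS[mon], int(day),
                      int(hh), int(mm), int(ss))
        yield ts, hw_path, dev_id, "lost" if verb.startswith("has") else "back"


def outages(lines):
    """Pair each 'lost' event with the next 'back' event on the same port."""
    pending = {}
    result = []
    for ts, hw_path, dev_id, state in parse_events(lines):
        key = (hw_path, dev_id)
        if state == "lost":
            pending[key] = ts
        elif key in pending:
            start = pending.pop(key)
            result.append((hw_path, dev_id, (ts - start).total_seconds()))
    return result


SAMPLE = [
    "Apr 5 10:23:32 NSS1 vmunix: 0/3/1/0/4/0: Device at device id 0x20000 "
    "has disappeared from Name Server GPN_FT",
    "Apr 5 10:23:45 NSS1 vmunix: 0/3/1/0/4/0: Device at device id 0x20000 "
    "is back in Name Server GPN_FT (FCP type)",
]

print(outages(SAMPLE))  # [('0/3/1/0/4/0', '0x20000', 13.0)]
```

Run against the whole syslog file, this would show whether the port drops briefly and often (which tends to point at a marginal link) or stays away for long periods.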

I took action and removed the failed link from the storage (unplugged storage Loop B), after which the system performed well (without Loop B). I requested support from HP and sent all the requested logs. They advised me to change the port speed of the SAN switch from autonegotiate to a fixed speed (2 Gb/s), which I did.
After that I plugged storage Loop B into the switch again, and after a while the storage reported that its link was OK. That sounded good, but the server doesn't detect Loop B.
I came to the site at midnight to make sure Loop B was working. I therefore removed Loop A from the storage and left Loop B plugged in, and the server reported the following:

Apr 10 01:42:46 NSS1 vmunix: LVM: VG 64 0x020000: PVLink 31 0x060000 Failed! The PV is not accessible.

I restored Loop A and left Loop B as it was, and the server reported the following:
Apr 10 01:48:40 NSS1 vmunix: LVM: VG 64 0x020000: PVLink 31 0x060000 Recovered.

That means Loop B never works.
I have updated HP support with this information, and they advised me to check the MPIO software.
I looked on the server for any MPIO software (swlist -l product) but couldn't find any installed product such as "Secure Path".
Can anyone assist me with this case?
I want to make both loops work again.

Note: the same problem exists on both nodes.
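
The PVLink messages above identify the affected path only by major and minor number. The sketch below decodes such a message into a legacy device file name. It assumes that major 31 is the sdisk block driver and that the minor number follows the legacy HP-UX layout (card instance, target, LUN, options). Both assumptions should be verified on the system itself, for example with `ls -l /dev/dsk`. The function names are hypothetical.

```python
import re

PVLINK_RE = re.compile(
    r"LVM: VG (\d+) (0x[0-9a-fA-F]+): "
    r"PVLink (\d+) (0x[0-9a-fA-F]+) (Failed|Recovered)"
)


def decode_sdisk_minor(minor):
    """Split a legacy sdisk minor number into (instance, target, lun).

    Assumed layout: bits 23-16 = interface card instance, bits 15-12 = target,
    bits 11-8 = LUN, bits 7-0 = driver options.
    """
    instance = (minor >> 16) & 0xFF
    target = (minor >> 12) & 0xF
    lun = (minor >> 8) & 0xF
    return instance, target, lun


def describe_pvlink(line):
    """Return a dict describing a PVLink syslog line, or None if no match."""
    m = PVLINK_RE.search(line)
    if m is None:
        return None
    _vg_major, vg_minor, pv_major, pv_minor, state = m.groups()
    instance, target, lun = decode_sdisk_minor(int(pv_minor, 16))
    return {
        "vg_minor": vg_minor,
        "pv_major": int(pv_major),
        "legacy_dsf": f"/dev/dsk/c{instance}t{target}d{lun}",
        "state": state,
    }


LINE = ("Apr 10 01:42:46 NSS1 vmunix: LVM: VG 64 0x020000: "
        "PVLink 31 0x060000 Failed! The PV is not accessible.")
print(describe_pvlink(LINE)["legacy_dsf"])  # /dev/dsk/c6t0d0
```

If the assumed layout holds, the path that failed when Loop A was unplugged is /dev/dsk/c6t0d0. Comparing that name with the PV entries that the volume group lists would show whether a second path through Loop B is configured at all.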

Regards
Adel
3 REPLIES
Matti_Kurkela
Honored Contributor

Re: MPIO redundant link failure

According to the picture you attached, your storage is an MSA1500cs. It can have one or two controllers: you apparently have two.

In that case, the firmware version of the MSA1500cs is important: the older V5.xx firmware operates the controllers in active/passive fashion, so only one controller at a time is operational. This would mean all the nodes must use either Loop A or Loop B: all should fail over when one does.

The newer V7.xx firmware works in active/active mode, which is much more convenient for High Availability set-ups.

The Compatibility Guide documents for your MSA seem to suggest that the "MPIO software" to use with HP-UX should be PV Links (aka Alternate Paths in LVM configuration), which is a standard HP-UX system feature. I think HP support overlooked the fact that you're using HP-UX.

http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c00868010/c00868010.pdf
http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c01064322/c01064322.pdf
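
Whether PV Links are actually configured can be read from `vgdisplay -v`, which normally marks a second path to the same disk with "Alternate Link" after the device name. As a rough sketch, the function below lists the physical volumes that have no alternate path. The sample text is illustrative only and was not taken from the poster's system; the device names in it and the function name are hypothetical.

```python
def pvs_without_alternate(vgdisplay_text):
    """Return primary PV paths that are not followed by an 'Alternate Link'."""
    primaries = []   # primary paths in order of appearance
    has_alt = {}     # primary path -> True once an alternate link is seen
    current = None
    for raw in vgdisplay_text.splitlines():
        fields = raw.split()
        if fields[:2] != ["PV", "Name"] or len(fields) < 3:
            continue
        if fields[3:] == ["Alternate", "Link"]:
            # An alternate path belongs to the most recent primary PV.
            if current is not None:
                has_alt[current] = True
        else:
            current = fields[2]
            primaries.append(current)
            has_alt[current] = False
    return [path for path in primaries if not has_alt[path]]


# Illustrative excerpt in the assumed `vgdisplay -v` layout.
SAMPLE_VGDISPLAY = """\
   --- Physical volumes ---
   PV Name                     /dev/dsk/c6t0d0
   PV Name                     /dev/dsk/c8t0d0  Alternate Link
   PV Status                   available
   PV Name                     /dev/dsk/c6t0d1
   PV Status                   available
"""

print(pvs_without_alternate(SAMPLE_VGDISPLAY))  # ['/dev/dsk/c6t0d1']
```

A disk reported here has only one configured path. Once `ioscan` shows the same LUN through the second HBA, the alternate path is usually added by running `vgextend` with the volume group and the second device file.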

Your original error messages seem to suggest a problem that (intermittently?) makes the Loop B switch unable to see the MSA. You should use the diagnostic features of the Loop B switch to see what happens when the fault is active: does the switch see the MSA at all, or does it see a wrong Port WWN for it?

Also check the fibre between the Loop B switch and the MSA physically: has the cable been kinked or otherwise damaged? Are the connectors clean?

MK
Adel Elamin
Frequent Advisor

Re: MPIO redundant link failure

Hello Matti!

Yes, my MSA has two controllers, currently configured in active/active mode, not active/standby mode.

Please take a look at the attachment, which includes the output of (show_tech_support) on the MSA. The MSA also reports Active/Active on its OCP screen.
I may upgrade the firmware of the storage, but I don't think this will solve my problem.

Yes, I think the MPIO software is PV Links, as I see it mentioned in the syslog.log file, but I have no idea how PV Links are configured or used.

I've changed the fiber cable and the SFP for Loop B of the MSA. There is light coming from the MSA to the switch and from the switch to the MSA as well.

I'm going to check whether the Loop B switch can detect the WWN of the storage, and I'll update you again. (Currently I'm in a remote location and the SAN switch MP is not connected to the LAN.)

Note: MSA Loop B used to be connected to port 0; I have moved it to port 2. There is no zone configuration either.
Note 2: I couldn't download and view the PDFs you linked.

Thanks in advance
Adel Elamin
Frequent Advisor

Re: MPIO redundant link failure

Yes!

I've just checked the Loop B SAN switch, and the switch can detect the MSA's WWN.

However, the MSA is still not detected by any server.

Regards
Adil