
AL_PA errors on HP-UX

 
Chris Olson
Occasional Contributor

AL_PA errors on HP-UX

Hello. I recently had a fibre channel problem that cost me considerable time and grief. The problem has since been fixed, but I'm still trying to fully understand what happened.

To briefly describe the situation... I had an N-class server with two A3740A HBAs attached directly to a Hitachi 7700E disk array. We're migrating to a SAN using 4 Brocade SilkWorm 2800 switches. The first step was to move the server and disks to connect through the switches. That worked fine after I turned on QuickLoop on the switches. Next I introduced a Qualstar LTO tape library to the same switch. I then started getting the following errors from the kernel:

"fcpbh: taking the following device offline because it's AL_PA is not the same as it's hard address AL_PA=0x02 (corresponding loop_id=0x7c) hard address=0xef"

The disks were unusable with NO_HW at that point. Rebooting the server resulted in a system panic with the same AL_PA error. I removed the tape library and there was no change. I even went so far as to remove the switches and reconnect the server directly to the Hitachi array again. STILL the server would panic with the AL_PA error upon reboot. Not until the Hitachi engineer did a hardware reset of the CHA board on the array did the problem go away.

Can anyone help me understand what that AL_PA error means? Does it mean there's an AL_PA conflict, and if so, why doesn't the server just negotiate a new AL_PA when it boots up? When it says "taking the following device offline", does it mean it's taking the A3740A HBA offline or the Hitachi disk LUNs offline?

Any insight would be greatly appreciated!
8 REPLIES
Frank Skibbe
Advisor

Re: AL_PA errors on HP-UX

This could be caused by several problems.
The interface must be power cycled to get a new map of the device files; a reboot is not the same as a power cycle of the server. The loop address must be in the hard address range of the server. If the disk array changed its WWN, either the array must be set back to the old WWN or the server must be power cycled. This applies only to the old A3740A card.
melvyn burnard
Honored Contributor

Re: AL_PA errors on HP-UX

HP servers and their FC Interfaces do NOT support Soft Physical Addressing (SPA). What you were seeing was the result of some piece of hardware out on the FC that had set itself up for the same HPA (Hard Physical Address) of the interface in the HP server. This then causes the software to attempt to reset the HPA to an SPA, which is not supported, and results in the panic you saw.
Hope that helps
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Chris Olson
Occasional Contributor

Re: AL_PA errors on HP-UX

Hmmm. When the servers are directly connected to the disks, they all have an AL_PA of 0x01. If they don't handle getting a new address well, does that mean that two HP-UX boxes can't be on the same loop without conflicting?
CTS Support
Advisor

Re: AL_PA errors on HP-UX

I would set up zones on the Brocade switch so that no two servers can see each other (especially if you are in QuickLoop mode!).

In other words, only allow one server to be in any one zone.

Also, I assume there are some fibre/SCSI bridges involved in your setup... check whether they are using hard addressing or soft. I remember an HP tech telling me that you have to use hard addressing, otherwise some weird things may happen!!

Chris Olson
Occasional Contributor

Re: AL_PA errors on HP-UX

-SOLUTION-
OK... I think I've finally figured out what's going on. The AL_PA errors I was getting were not talking about the HBA device on the server, but rather the disks it was seeing. The Hitachi array had its hard AL_PA set to '0xef'. When it was plugged into the switches, there was already an '0xef' address on the loop, so the Hitachi took '0x02'. Apparently it's a "feature" of HP-UX that it won't allow a device on a loop to have a soft address that doesn't match its hard address. I suppose this is to ensure you're talking to the storage you expect. When the HP-UX box booted and saw that the disks had taken a soft address, it would panic.

The fix was to set the Hitachi's hard address to something that was unique for that loop, so it wouldn't have to get a soft address.

Also, zoning the Brocade does not help in this situation. Even if nodes on the switch are in separate zones, if they're using QuickLoop they're in the same loop and must still have unique addresses within that loop. Apparently, zoning does not prevent the loop initialization traffic from propagating, only data traffic.
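To illustrate the point about zoning, here is a minimal Python sketch of a uniqueness check across everything on one QuickLoop. The port names, zone names and the 0xe8 address are made up for the example (the thread never says which device already held 0xef), and the function is hypothetical, not part of any HP-UX or Brocade tool. The zone is carried in the data but deliberately ignored by the check.

```python
from collections import defaultdict


def find_hard_address_conflicts(ports):
    """Return {hard_al_pa: [port names]} for addresses claimed by more than one port.

    `ports` is a list of (name, zone, hard_al_pa) tuples. The zone is unpacked
    but never used, because all QuickLoop ports share one address space.
    """
    by_address = defaultdict(list)
    for name, _zone, hard_al_pa in ports:
        by_address[hard_al_pa].append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}


# Hypothetical inventory of one QuickLoop (names and zones are assumptions).
quickloop_ports = [
    ("hitachi_cha", "zone_a", 0xEF),
    ("other_device", "zone_b", 0xEF),
    ("tape_library", "zone_b", 0xE8),
]

for addr, names in find_hard_address_conflicts(quickloop_ports).items():
    print(f"hard AL_PA {addr:#04x} is claimed by: {', '.join(names)}")
```

With this inventory the two ports set to 0xef are reported even though they sit in different zones, which is the situation described above.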

Also, the HP-UX box was quite happy to negotiate a '0x02' address (once the Hitachi had released that one) since the other HP-UX box had taken '0x01'. So from what I can see, the HBAs are happy getting soft addresses as long as their storage has hard addresses.
Erik Tong
Advisor

Re: AL_PA errors on HP-UX

Sounds like you have the answers. Here are a few more details FYI.

Yes, soft addresses are not allowed for storage devices. This is to prevent data corruption.

HBAs on HP-UX use soft addresses. This is allowed for hosts.

Why did the controller need to be reset before it would work?
The FC initialization protocol has 4 phases. The phases in order are (Abbreviated) Fabric (LIFA), Previous (LIPA), Hard (LIHA), Soft (LISA). When a conflict in address occurs, one of the devices must claim a soft address during the LISA phase. From then on, the port that took the soft address will try to continue to use that address by claiming it during the LIPA (previous address) phase. The only way out is to correctly set a unique address for the port (which will "reset" the addressing on a Hitachi/XP) OR remove the conflicting addressed device AND reset the port which is using the soft address.
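The "sticky" soft address described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the real FC-AL protocol: the LIFA phase is skipped, ports claim addresses in list order, and the soft address pool is a short made-up list instead of the full AL_PA table. The `Port` class and `initialize_loop` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified soft address pool (assumption); real loops use the full AL_PA table.
SOFT_POOL = [0x01, 0x02, 0x04, 0x08]


@dataclass
class Port:
    name: str
    hard: Optional[int] = None      # configured hard AL_PA, if any
    previous: Optional[int] = None  # address held before this initialization


def initialize_loop(ports):
    """Run simplified LIPA, LIHA and LISA phases in order; return {name: al_pa}."""
    taken = {}  # al_pa -> port name

    def claim(port, al_pa):
        # A port claims an address only if it has none yet and the address is free.
        if al_pa is not None and port.name not in taken.values() and al_pa not in taken:
            taken[al_pa] = port.name

    # LIFA is skipped: no port in this sketch has a fabric-assigned address.
    for port in ports:  # LIPA: reclaim the previously held address
        claim(port, port.previous)
    for port in ports:  # LIHA: claim the configured hard address
        claim(port, port.hard)
    for port in ports:  # LISA: take the first free soft address
        claim(port, next((a for a in SOFT_POOL if a not in taken), None))

    result = {name: al_pa for al_pa, name in taken.items()}
    for port in ports:  # remember the outcome for the next initialization
        port.previous = result.get(port.name)
    return result


def show(result):
    print({name: hex(al_pa) for name, al_pa in result.items()})


hba = Port("hba")                 # host adapter, no hard address
other = Port("other", hard=0xEF)  # device that already owns 0xef
array = Port("array", hard=0xEF)  # array configured with the same hard address

show(initialize_loop([hba, other, array]))  # conflict: array falls through to LISA
show(initialize_loop([hba, array]))         # conflict removed, array still reclaims in LIPA
array.previous = None                       # models resetting the array port
show(initialize_loop([hba, array]))         # array now reaches LIHA and claims 0xef
```

In this model the array keeps its soft address on the second run even though nothing else holds 0xef any more, because it claims its previous address in LIPA and never reaches LIHA. Only after the stored previous address is cleared does it take its hard address, which mirrors the reset described above.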
Vincent Fleming
Honored Contributor

Re: AL_PA errors on HP-UX

This is a little off-topic, but...

As an FYI on that tape unit: HP does not support tapes and disks in the same QuickLoop SAN, only on Fabric Login SANs. The reason is that a LIP would cause the tape unit to stop and rewind, even if it is in operation (a LIP is translated into a SCSI reset).

So, to avoid problems you should either have a separate tape SAN or switch to Fabric Login mode.

Good luck
No matter where you go, there you are.
Bill McNAMARA_1
Honored Contributor

Re: AL_PA errors on HP-UX

as an extra on this,
The panic is caused by the fc driver on sys init, usually the iosearch. The panic routine is called to ensure that when your system boots up and tries to initialise the loop, it points each device file to the correct place. You can imagine the amount of bother you would have had if a device file ended up pointing to the wrong disk when the system booted up...
When adding stuff onto an fc loop,
ioscan -fnk | grep 255
and look at all the loop IDs.
Before adding the new h/w, change the device's loop ID to something unique and power cycle it so that the change takes effect.
ioscan -fnH 8/8
or whatever hardware path the new device will be on.
insf -eH 8/8.8.0.255.1.2
or whatever hw path of the newly claimed h/w
If you don't see the new device, verify that it has got its unique loop ID... from the switch or from the device UI.

AL_PA means, btw, Arbitrated Loop Physical Address. It not being the same as the device's hard address means the device didn't get assigned the loop ID it wanted. Note that devices with soft (s/w) loop IDs are not shown in an ioscan.
(The host bus adapters have s/w loop IDs and are the only devices allowed to have them.)
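As a small illustration of reading that kernel message, the Python sketch below pulls the three numbers out of the line quoted in the original post and flags the mismatch. The regular expression is written against that one quoted message only; the exact wording on other driver versions is not known here, and the function name is hypothetical.

```python
import re

# Pattern built from the single message quoted in this thread (assumption:
# other driver versions may word it differently).
MESSAGE_PATTERN = re.compile(
    r"AL_PA=(0x[0-9a-fA-F]+)\s+\(corresponding loop_id=(0x[0-9a-fA-F]+)\)\s+"
    r"hard address=(0x[0-9a-fA-F]+)"
)


def parse_al_pa_message(line):
    """Return the AL_PA, loop ID and hard address from the message, or None."""
    match = MESSAGE_PATTERN.search(line)
    if match is None:
        return None
    al_pa, loop_id, hard = (int(value, 16) for value in match.groups())
    return {
        "al_pa": al_pa,
        "loop_id": loop_id,
        "hard_address": hard,
        # True when the device is running on an address other than its hard one.
        "soft_assigned": al_pa != hard,
    }


message = (
    "fcpbh: taking the following device offline because it's AL_PA is not "
    "the same as it's hard address AL_PA=0x02 (corresponding loop_id=0x7c) "
    "hard address=0xef"
)
print(parse_al_pa_message(message))
```

For the quoted message this reports an AL_PA of 0x02 against a hard address of 0xef, i.e. a device that wanted 0xef but ended up on a soft address.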

Later,
Bill


It works for me (tm)