Keep losing device path of SAN tape drives

 
Coolmar
Esteemed Contributor

Our IBM tape drives are attached to our SAN. We run TSM backups as well as LAN-Free backups. The problem we are having is that after any power outage or SAN issue, the tape device paths change. We then have to go through a very involved process to set the device paths back, which usually involves rebooting the server after hours. Does anyone have any idea why this keeps happening or what we can do to fix it?
Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

The SAN folks are suggesting we set up "persistent bindings" for the HBA. I am not sure that will fix our problem, because it is not just the HPs that are having the problem... all the servers are. Anyway, we should still test it. Does anyone know how to set up persistent bindings?
Steven E. Protter
Exalted Contributor

Re: Keep losing device path of SAN tape drives

Shalom,

It's probably happening due to instability of the SAN, perhaps LIP storms and other kinds of fabric network congestion.

You might want to look at the SAN and analyze it.

It could also be due to SCSI device conflicts, driver issues on the Fibre Channel cards, bad Fibre Channel cards or an intermittent problem on the tape hardware.

SEP
Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Unfortunately we have "san people" who are the only ones allowed to look at the SAN. I doubt there is a FC card problem as this happens on each and every server each time the SAN is rebooted.
Solution

Re: Keep losing device path of SAN tape drives

Sally,

There's no such thing as 'persistent binding' for HP-UX - you only get this on Windows and other OSes that track FC devices by WWN.

HP-UX tracks FC devices by domain and FCID. If the FCID of the device changes, or the domain the device is plugged into changes then the device path will change.

Domains and FCIDs are the responsibility of your SAN team - push the issue back to them...

For example - if the FC switches are Cisco, it's possible for FCIDs to be non-persistent across switch reboots - also some FC/SCSI muxes used in tape libraries can choose a different FCID every time they are restarted.

HTH

Duncan

I am an HPE Employee
TwoProc
Honored Contributor

Re: Keep losing device path of SAN tape drives

It could be that when you're having SAN problems someone is rebooting or fooling with the tape robots incorrectly. We used to get this same behavior when an admin rebooted the tape library incorrectly after a SAN episode. This happened because the fiber interface controllers (blade runner, etc.) were left "up" while the tape drives were taken offline during shutdown. It also occurred when the library was brought back up without waiting for the drives to come fully up before turning on the fiber interface controllers (blade runner or fiber-to-SCSI interface cards, boxes, etc.). This usually made the interface controller start reassigning LUNs to things.

When rebooting the library, try turning on the tape drives first and letting them spin up and check themselves out fully before turning on all of the brainier parts of the system - if you can.

To prevent this from occurring again we've taken the following steps:

Go to the tape library and document what drive assignments are made to what virtual devices (LUNS if you will) and print that out and tape it to the inside or outside of the library. Be familiar with how to make changes to this and be able to quickly put them back.

Create our own devices with the mknod command, with exactly the same major and minor numbers as the ones created by insf, and give them unique names. For example, you could use: ibmtlib_d1, ibmtlib_d2, ibmtlib_d3, etc.

Create symbolic links like "D1" and make them point to the actual drive devices we created. This gives us a nice "handle" we can assign to whatever we need in DataProtector (in your case TSM), without having to reconfigure the software each time we have an event.

Create our own device, as above, for the robot interface; "ibmtlib_robot" would work nicely.

Create a symbolic link named "robot" for the device above.

In your software, put in the symbolic link names for every device for each tape drive and robot, and remove the "actual" names.

In a nice safe place where you can remember where you put it - create a cpio backup of all of the things above that you created.

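mkdir -p /root/tapelibrary_device_backup   # cpio -p expects the target directory to exist
cd /dev/rmt
find . \( -name "ibmtlib_*" -o -name "D[1-9]" -o -name "robot" \) -print | cpio -pdmvu /root/tapelibrary_device_backup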

This is an important step! When rebooting your server, make sure that the tape library is fully UP (in the order of drives, then interfaces), then all SAN components along the path(s), and then the Unix servers (always last). We've found that "rushing" these steps tends to cause problems.

Now, if your devices ever get changed on reboot, just restore the files from the saved area back into /dev/rmt and all should be fine. If it's not, then the library drive assignment got rearranged. You should have a document attached to the outside of the tape library telling you which virtual LUN assignments to make for each tape drive to put it back the way it was. Put everything back the way it is supposed to be, and everything should start working fine. You won't have to touch your software configuration in TSM, because all you're referencing in there from here on out is symbolic links, which can be changed to point to another device anytime you want from the Unix level. Also, the reason you created your own naming convention for the devices is to greatly lessen the chances that they would be overwritten during a reboot.

If you do the above steps, then next time the most you should have to do is go to the tape library and reconfigure each tape drive LUN assignment, and 90% of the time you'll need no other changes to any other portions of your system. Also, if you are careful about the order in which you reboot your tape library, SAN and servers, you should have this occurring less and less over time.
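The mknod and symbolic link steps above can be sketched roughly as follows. This is only a dry run that prints the commands instead of running them; the function name alias_commands is made up for illustration, and MAJOR and MINOR are placeholders for the numbers you would copy from the insf-created device file (as shown by ls -l on that file).

```shell
#!/bin/sh
# Sketch: print (not run) the commands that create a privately named device
# node plus a short symlink handle. Every name and number passed in is a
# placeholder chosen for illustration, not a value from a real system.
alias_commands() {
    name=$1     # private node name, e.g. ibmtlib_d1
    link=$2     # short handle used in the backup software, e.g. D1
    major=$3    # major number copied from the insf-created device file
    minor=$4    # minor number copied from the insf-created device file
    echo "mknod /dev/rmt/$name c $major $minor"
    echo "ln -s /dev/rmt/$name /dev/rmt/$link"
}

# Hypothetical call; substitute the real major/minor before running anything.
alias_commands ibmtlib_d1 D1 MAJOR MINOR
```

Printing the commands first gives you something to review (and to tape to the library next to the LUN assignments) before anything under /dev/rmt is touched.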
We are the people our parents warned us about --Jimmy Buffett
Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Thanks John, but the paths change while the system is still up. All of a sudden we notice that all the devices are screwed up, and we have to redo all the devices (change the ioconfig) and reboot *after* everything changed in order to clean it up. So I think Duncan might have hit the nail on the head; I have sent his post to the SAN people and it is time for them to investigate. Like I mentioned before, this is happening to all the systems (AIX, Windows, etc.) whenever the SAN is rebooted or switches change, etc.

Thanks for your replies!

Re: Keep losing device path of SAN tape drives

Sally,

Do you have an example of a 'before' and 'after' hardware path for the same tape device? Looking at what changes in the hardware path should tell us whether it's a problem with changing FCIDs (as I suggested), or with changing tape LUN numbers (caused by the FC/SCSI MUX thinking the tape devices are 'new'), as John suggested.
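As a rough illustration of that comparison, here is a minimal sketch that splits two hardware paths and reports which part moved. It assumes the dot-separated layout seen in ioscan output for these drives, with the first four fields coming from the adapter and fabric side and the last three from the device side; the function name classify_path_change is hypothetical, and the sample paths are the ones Sally posts further down the thread.

```shell
#!/bin/sh
# Sketch: compare a "before" and "after" hardware path for the same drive.
# Assumption: paths look like ADAPTER.domain.port.marker.bus.target.lun,
# e.g. 0/7/0/0.22.1.255.0.0.0 (field meanings follow this thread, not a spec).
classify_path_change() {
    before=$1
    after=$2
    # Fields 1-4: adapter plus the fabric-assigned part of the address.
    fabric_before=$(printf '%s\n' "$before" | cut -d. -f1-4)
    fabric_after=$(printf '%s\n' "$after" | cut -d. -f1-4)
    # Fields 5-7: the part assigned on the device side.
    device_before=$(printf '%s\n' "$before" | cut -d. -f5-7)
    device_after=$(printf '%s\n' "$after" | cut -d. -f5-7)

    if [ "$fabric_before" != "$fabric_after" ]; then
        echo "fabric-side address changed: $fabric_before -> $fabric_after"
    fi
    if [ "$device_before" != "$device_after" ]; then
        echo "device-side address changed: $device_before -> $device_after"
    fi
    if [ "$before" = "$after" ]; then
        echo "no change"
    fi
}

classify_path_change 0/7/0/0.22.1.255.0.0.0 0/7/0/0.22.1.255.12.3.0
```

If only the device-side fields differ, the switch-assigned address held steady and attention shifts to the library or drive settings.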

HTH

Duncan

Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Well according to IBM we have to set persistent bindings on the HBA card itself. They realize that HP can't do it but it can be done on the card. I have no idea how to even attempt that and how it can be done.

Re: Keep losing device path of SAN tape drives

Sally,

I take no joy in saying this, and I'm happy to be proved wrong, but I think IBM is talking absolute cr*p.

There is NO SUCH THING AS PERSISTENT BINDING IN HPUX, and there are NO PARAMETERS YOU CAN CHANGE ON THE CARD! (Actually, if your system is an Integrity server there are one or two that can be changed relating to the EFI drivers used when booting off SANs, but none of these have anything to do with persistent binding.)

I have many many customers who use FC attached tape drives (OK, none of them use IBM tape drives, but nevertheless...) and NONE of them have this problem or had to set up any persistent binding. *Many* of them had problems with incorrect settings on the SAN or on the tape library which caused the problems you describe.

Are you able to post some 'before' and 'after' hardware paths for these tapes, so we can at least determine whether it's the FCID or LUN that is changing?

HTH

Duncan

Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

I know, and I tell them, but they just won't budge (the SAN people) because IBM tells them that all we have to do is change the persistent bindings. Here is what they say:


There are two levels of persistent binding: driver level and operating system level. On IBM servers this advanced configuration is set via HBA utilities such as SAN Surfer or FASTMSJ. Of course this varies depending on the manufacturer of the Host Bus Adapter that is being used. As an example I have included both an excerpt from and the actual document discussing HP-UX implementation on an Emulex HBA:
* At the Emulex lpfc driver level, persistent binding can guarantee that target assignments are preserved between reboots, provided the same devices are present. An FC device may bind to a predefined ID based on the FC device's WWPN, WWNN, or D_ID.
* At the OS level, this predefined ID becomes the hardware path of the FC LUN. HP-UX uses this ID to preserve the mapping between the I/O object and a set of special device files. To view the mapping, run the ioscan command.
* The lpfc.conf file, located in the /opt/lpfc/conf directory, contains persistent binding variables as well as all of the variables that control driver utilization.
Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

tape 0 0/7/0/0.22.1.255.12.3.0 atdd CLAIMED DEVICE IBM ULT3580-TD2
tape 2 0/7/0/0.22.1.255.0.0.0 atdd NO_HW DEVICE IBM ULT3580-TD2
/dev/rmt/2m /dev/rmt/2mnb /dev/rmt/c20t0d0BESTn
/dev/rmt/2mb /dev/rmt/c20t0d0BEST /dev/rmt/c20t0d0BESTnb
/dev/rmt/2mn /dev/rmt/c20t0d0BESTb


Tape 2 was the correct path of the tape drive but says NO_HW now because the path changed out of the blue. Tape 0 is now the new path to the same tape drive. Our path for drive 2 has to match TSM's .... so we have to reconfigure everything.
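To spot this situation quickly after a SAN event, something like the following sketch could filter ioscan output for tape instances in the NO_HW state. It assumes the column layout shown in the output above (class, instance, hardware path, driver, software state); the function name stale_tape_paths is made up, and the here-document simply replays the two lines from this post so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: list tape instances that ioscan reports as NO_HW.
# Assumption: column 1 is the class, column 2 the instance, column 3 the
# hardware path and column 5 the software state, as in the output above.
stale_tape_paths() {
    awk '$1 == "tape" && $5 == "NO_HW" { print $2, $3 }'
}

# On a live HP-UX host the input would come from: ioscan -fnC tape
stale_tape_paths <<'EOF'
tape 0 0/7/0/0.22.1.255.12.3.0 atdd CLAIMED DEVICE IBM ULT3580-TD2
tape 2 0/7/0/0.22.1.255.0.0.0 atdd NO_HW DEVICE IBM ULT3580-TD2
EOF
```

Each output line gives the instance number and the stale hardware path, which is the pair you would compare against the path TSM expects.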

Re: Keep losing device path of SAN tape drives

Ok, what IBM is referring to here is actual Emulex cards such as an LP10000, for which you can read the driver install and config guide here:

http://www.emulex.com/ts/downloads/hpux/rel/42005/pdf/set.pdf

And this does include data about how to set persistent binding using the Emulex driver.

Of course, almost *no-one* uses the actual Emulex cards themselves on HP9000 or Integrity systems - they use the OEM'd products from HP such as A6795A or A6826A and others... these may be Emulex cards underneath (some are, some aren't - I can't remember which), but the point is that they *don't* use the Emulex drivers - they use HP's own drivers, and as such there's nowhere to set up persistent binding.

As I indicated before - the persistence in HP-UX is to the FCID - the assignment of which is usually controlled by the SAN switch. Do you know what model of SAN switch the tape library is plugged into? If we can establish that, there are a few more settings that we might be able to check.

HTH

Duncan

Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Right...our cards are Tachyon XL2 cards. I found the document he was referring to and those Emulex cards are in Solaris systems generally.
He figures the persistent bindings will map right to the WWN of each individual tape drive rather than the SCSI ID, which is what keeps changing.
Anyway, I have a call into the SAN people regarding the switch model and I will post as soon as I get it.

Thanks for all your help Duncan,
Sally

Re: Keep losing device path of SAN tape drives

Sally,

don't bother with the switch - I don't think that's the problem.

Looking at the HW path we can use the following document to interpret what is going on:

http://docs.hp.com/en/A6795-90006/ch01s11.html?btnNext=next%A0%BB

Now looking at your HW paths we can see:

tape 0 0/7/0/0.22.1.255.12.3.0 atdd CLAIMED DEVICE IBM ULT3580-TD2
tape 2 0/7/0/0.22.1.255.0.0.0 atdd NO_HW DEVICE IBM ULT3580-TD2

so 0/7/0/0 is the FC adapter

22 is the domain ID of the switch the tape is plugged into.

1 is the n-port ID ( this together with the domain makes up the FCID)

255 suggests this device is on an FC arbitrated loop (is it a direct attached FC tape drive? that would make sense) and that HP-UX is using peripheral device addressing.

Now we're down to the nub of the problem - the last 3 components of the path change from 0.0.0 to 12.3.0. This suggests that the Loop ID for the tape drive changed from 0x00 (0.0) to 0xC3 (12.3), and this suggests that the tape drive is set up to use soft addresses rather than hard addresses (loop IDs that are hard coded and don't change). If you ask me, this is what needs to be changed to sort this out.

I've tried to attach the FC guide which talks about this - pay attention to the comment on p7 about hard coded loop IDs, and the section on p14 which talks about peripheral device addressing.
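The 0.0 to 0x00 and 12.3 to 0xC3 conversion can be reproduced with a small sketch. It assumes, following the reading in this post, that the bus field carries the high four bits of the loop ID and the target field the low four bits; the function name loop_id_from_path is made up for illustration.

```shell
#!/bin/sh
# Sketch: rebuild the loop ID from the bus and target fields of a hardware path.
# Assumption (from the reading in this post): bus is the high nibble and
# target is the low nibble of the 8-bit loop ID.
loop_id_from_path() {
    bus=$(printf '%s\n' "$1" | cut -d. -f5)      # 5th dot-separated field
    target=$(printf '%s\n' "$1" | cut -d. -f6)   # 6th dot-separated field
    printf '0x%02X\n' $(( bus * 16 + target ))
}

loop_id_from_path 0/7/0/0.22.1.255.0.0.0    # prints 0x00
loop_id_from_path 0/7/0/0.22.1.255.12.3.0   # prints 0xC3
```

Running it against the before and after paths gives two different loop IDs for the same physical drive, which is what you would expect to see with soft addressing.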

HTH

Duncan

Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Ok, thanks Duncan. I will look into it and pass it along to the SAN people and hopefully we can get this sorted out.

Thanks so much for your help!

Re: Keep losing device path of SAN tape drives

Sally,

You don't say what sort of IBM tape drive or library you have, but I found this or a similar reference in several IBM documents:

http://www.exabyte.com/support/online/documentation/drives/Ultrium_2_Setup_Operator_Service_Guide[1].pdf

Look at p19 (as numbered by Acrobat Reader - it's p7 on the actual page). This talks about setting hard loop IDs.

HTH

Duncan

Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Thanks Duncan. It is an Ultrium 3580.

I still haven't heard from the SAN folks, I think they are still completely hung up on the "persistent bindings" routine.

Re: Keep losing device path of SAN tape drives

... and here's another one - search the following pdf with the string 'loop id'


http://www.redbooks.ibm.com/redbooks/pdfs/sg245946.pdf

thx

Duncan

SVB
Occasional Advisor

Re: Keep losing device path of SAN tape drives

I agree with Duncan. The only way the last 3 digits of this HW path can change is if the loop ID of the drive itself changes.

The loop ID will change on a reinitialization of the loop if the drive is set to use soft addressing.

If I'm not mistaken, HP recommends using hard addressing at all times.
Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

This is what the SAN people have come back with:

The drives within the library are set to "auto - N", which would mean they will be set to "fabric" mode and auto negotiate the link speed. If they were set to "auto L" then this would be FC-AL, or fibre channel arbitrated loop. The WWPNs and the loop IDs never change within the library, as the library assigns the WWPNs and loop IDs to the drives internally and they never change unless they are set to manual mode by myself or a customer. These are unique numbers.

The settings can be verified via the totalstorage web interface by doing the following:

From the main panel, select "Manage Drives", then "drive summary". The next screen shot will display WWPN, drive settings and loop ID's for the installed drives within the library. All the drives should be set to "auto N".


Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Duncan....in the document that you attached above, I found on Chapter 15/Page 8 that it says "The Tachyon adapter does not support "fabric mode" ". So this may be the problem, but this document is 4 years old so hopefully it is supported now.
Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

I also found the following in the below link. This document is release notes on the Tachyon XL FC Adapter that we are using.

5. The following is a description of a problem that occurs with the Brocade 2800 switch. It is not a problem with the A6795A adapter or software. We document it here for A6795A customers who use Brocade switches in Fabric mode in a Public Loop topology.

Problem: In Fabric mode, when multiple hosts are connected on the same switch using a hub (especially two hosts), FLOGI gets timed out when a host changes from Point-to-Point to a Loop topology. This problem occurs with Fabric mode only and not with QuickLoop.

Fix: Use the telnet command, portCfgLport, to lock the port in loop mode (L_Port) when multiple hosts or targets are connected on a single switch port. This command insures that nport_id addresses will not change during temporary transitions and will not cause any subsequent FLOGI time-outs. This problem is fixed in Brocade firmware revision V2.1.9d and later releases.

The portCfgLport command is clearly documented in the Brocade release notes for V2.1.9d and in the Brocade Man pages.

http://docs.hp.com/en/J2635-90019/J2635-90019.pdf

Re: Keep losing device path of SAN tape drives

Sally - the point is that those last 3 digits of the HW path are completely governed by the storage device - nothing you can do on HPUX would change this, even if you *could* turn on persistent binding, as these last 3 digits represent bus/target/lun - which is effectively *behind* any WWN you might bind to.

The problem is on the library, or on the interface between the library and the FC switch somewhere. Again, the point to make to IBM here is that plenty of other people use FC attached tape drives with HP-UX *without* this problem!

I'm afraid I'm going on my hols now - a whole month in Australia! Lucky me. I'm disappointed I won't have the chance to see this through to resolution - maybe someone else will pick up the baton and help you resolve.

Have you tried cross-posting this question into the storage forum? You might get a different perspective there, and might just find someone else with IBM FC tape drives on HPUX

Good luck!

Duncan

Coolmar
Esteemed Contributor

Re: Keep losing device path of SAN tape drives

Thank you very much Duncan and have a great vacation!!! You have been a great help to me and hopefully I will get it resolved soon.

Take care,
Sally