
MSL 5026 libraries connected in a Master / Slave relationship into a Modular Data Router II

 
Ayman Altounji


I have a configuration with (4) Compaq servers connected to (2) 16-port 1 Gb Fibre Channel switches, which in turn connect to an MA 8000 and a Modular Data Router II. The MDR II bridges for (2) MSL 5026 libraries (4 SDLT drives, 52 slots) connected together in a Master / Slave relationship. In addition, each server has (2) Fibre Channel HBAs (the OEM'd Emulex 8000) running the latest version of Secure Path.

Note: We're also running the latest driver for the SDLT drives within the Windows 2000 OS.

The problem I'm experiencing is that during backup or restore operations, at some point a tape device will disappear from all of the SAN-attached servers, causing backups or restores on that particular device to fail.

We've been troubleshooting this for almost 4 months now, and we're running the latest HBA drivers and firmware, the latest Fibre Channel switch firmware, the latest firmware on the MDR, and the latest firmware on the MSL 5026 robotics and on all of the drives inside it.

Note: The removable storage service within Windows 2000 has been disabled.

Despite working diligently with Compaq support, we're still no closer to a resolution. There's no consistent interval or time of day at which these devices drop off, so we've had to change one variable at a time and hope each change fixes the issue. So far we've replaced the I/O modules and the indicated SDLT drives in the MSL 5026 libraries, as well as the whole MDR and all of its components. We've also tried separating the libraries to see whether the issue persists in a standalone configuration, and it does.
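To check the "no consistent interval or time of day" observation more rigorously, one option is to record the timestamp of each drop-off (for example, from the event log) and summarize the gaps between them and the hours at which they happen. The following is a minimal Python sketch under that assumption; the function name `summarize_dropoffs` is hypothetical, and the timestamps are assumed to have been collected by hand beforehand.

```python
from datetime import datetime
from statistics import median


def summarize_dropoffs(events: list[datetime]) -> dict:
    """Summarize drop-off timestamps to look for a fixed interval or time of day."""
    ts = sorted(events)
    # Hours elapsed between consecutive drop-offs.
    gaps = [(later - earlier).total_seconds() / 3600 for earlier, later in zip(ts, ts[1:])]
    return {
        "count": len(ts),
        "gaps_hours": gaps,
        "median_gap_hours": median(gaps) if gaps else None,
        "min_gap_hours": min(gaps) if gaps else None,
        "max_gap_hours": max(gaps) if gaps else None,
        # Distinct hours of the day (0-23) at which drop-offs occurred.
        "hours_of_day": sorted({t.hour for t in ts}),
    }
```

A wide spread between the minimum and maximum gap, spread across many different hours, would support the observation that the failure isn't tied to a schedule; a tight cluster would instead suggest something periodic worth chasing.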

When these tape drives appear to drop offline, we also get errors in the event log referencing KGPSA1 or KGPSA2 and reporting a device timeout. At the same time, if I bring up the TSMC and let it do a SCSI probe, it shows the MSL 5026 library with (2) drives underneath it, followed by a third drive by itself outside of the hierarchy, and lastly a missing drive, which is the one that dropped off.

Note: Normally in the TSMC, when everything is set up and running correctly, we see the MSL 5026 library with (4) drives underneath it in the same hierarchy. I can also use Legato's "inquire -l" command to do a low-level SCSI probe like the TSMC does, and it shows the same results as the TSMC, both in the error state and in the normal working state.
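Comparing the failed probe against the healthy one is essentially a set comparison: which expected drives sit under the library, which appear outside its hierarchy, and which are missing altogether. The Python sketch below assumes the TSMC or "inquire -l" output has been transcribed by hand into a simple structure; `ProbeSnapshot`, `diagnose`, `is_healthy`, and the drive names are hypothetical and do not reflect either tool's real output format.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeSnapshot:
    """Hand-transcribed result of one SCSI probe (TSMC or "inquire -l").

    Hypothetical structure: drives shown under the library go in `attached`,
    drives shown outside the library's hierarchy go in `orphaned`.
    """
    library: str
    attached: list[str] = field(default_factory=list)
    orphaned: list[str] = field(default_factory=list)


def diagnose(snapshot: ProbeSnapshot, expected: set[str]) -> dict[str, list[str]]:
    """Compare a probe snapshot against the drives the library should own."""
    seen = set(snapshot.attached) | set(snapshot.orphaned)
    return {
        # Expected drives the probe did not report at all (dropped off).
        "missing": sorted(expected - seen),
        # Expected drives that were reported, but outside the library hierarchy.
        "orphaned": sorted(set(snapshot.orphaned) & expected),
        # Reported drives that were not expected (e.g. a stale duplicate entry).
        "unexpected": sorted(seen - expected),
    }


def is_healthy(snapshot: ProbeSnapshot, expected: set[str]) -> bool:
    """Healthy means every expected drive sits under the library and nothing else shows up."""
    return all(not problems for problems in diagnose(snapshot, expected).values())
```

Fed the failure state described above (two drives under the library, one outside it, one absent), `diagnose` reports one missing and one orphaned drive, while a snapshot with all four drives under the library passes `is_healthy`.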

When this happens, we can get things back in order by going into Device Manager on all of the servers, removing all entries for the medium changer, the MDR, and each of the SDLT drives, and then forcing a rescan; all the drives then reappear as they were. If we then follow up with a backup or restore operation, however, a device will eventually drop off again after some period of time (it varies).

I've done numerous Compaq SAN installations (my organization does contracted SIS installs) with EBS packages from all of the major players out there and have never come across an issue like this.

At this point I'm confident this is a hardware issue and not a Legato issue, given what the TSMC shows me when devices drop off. Furthermore, Legato, or any EBS product for that matter, can only see what the OS sees at any given time.

The last thing we're going to swap is the SCSI cables that go to the library. We replaced them initially, but we're now going to try shorter ones to see whether that gets us any further.

Is anybody out there running MSL 5026s on an MDR or MDR II and having the same kind of issues, or can anyone think of anything else that might help me with this?

Thanks in advance for any insight or guidance that you can provide.

Sincerely,

Moakita02
Ayman Altounji

Re: MSL 5026 libraries connected in a Master / Slave relationship into a Modular Data Router II

Hi, I have seen something similar to this. First, look at all the servers: are the MSL devices visible via Control Panel > SCSI Controllers > Emulex card?

If all the servers can see the MSL units, do you have a dedicated backup server, with agents installed on the other servers?

If you do: the default configuration of the MDR allows everything to see everything, i.e., all servers can see all tape devices.

What you have to do is look up the world wide node address of the Emulex card on the backup server and set up an alias on the MDR that corresponds to that node address. Then close the MDR (I can't remember the exact command; it's something like "setdefault model closed"), add the MSL and DLT drives to the alias, and clear the SCSI map from the MDR.

This basically makes sure that no other servers can see the MSL; only the backup server can. (You have to reboot the other servers to make sure they can't see the MSL after the MDR configuration has changed.)
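The idea behind this fix is host-based access control: in the MDR's default open mode every host sees every device, while in closed mode a host sees only the devices mapped to the alias for its node address. The Python model below is a conceptual sketch of that behavior only; `RouterAccessModel` and its methods are hypothetical names and are not the MDR's command set, and any node addresses or device names used with it are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class RouterAccessModel:
    """Conceptual model of open vs. closed host access on a SCSI-to-FC router."""
    devices: set[str]
    closed: bool = False
    aliases: dict[str, str] = field(default_factory=dict)        # node address -> alias
    mappings: dict[str, set[str]] = field(default_factory=dict)  # alias -> visible devices

    def add_alias(self, node_address: str, alias: str) -> None:
        # Normalize case so "10:00:...:C9" and "10:00:...:c9" match.
        self.aliases[node_address.lower()] = alias
        self.mappings.setdefault(alias, set())

    def map_devices(self, alias: str, *devices: str) -> None:
        unknown = set(devices) - self.devices
        if unknown:
            raise ValueError(f"unknown devices: {sorted(unknown)}")
        self.mappings.setdefault(alias, set()).update(devices)

    def visible_to(self, node_address: str) -> set[str]:
        # Open mode: everything sees everything.
        if not self.closed:
            return set(self.devices)
        # Closed mode: only devices mapped to this host's alias are visible.
        alias = self.aliases.get(node_address.lower())
        return set(self.mappings.get(alias, set())) if alias else set()
```

The model deliberately ignores one real-world detail from the reply: hosts that already discovered the devices may keep them until they are rebooted or rescanned, which is why the other servers still need a reboot after the MDR configuration changes.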

This solution corrected an issue for a major client of mine, and I hope it's the solution you need. Apologies if some of the commands are wrong; I suggest you download the PDF 'Getting Started with the Compaq MDR' or the user setup guides from Compaq, where all the correct commands are documented. Good luck.