backup problems

 
Ed Volkstorf
Occasional Contributor

For the last several weeks our backup process has been a nightmare, with backups failing at least 1 out of 3 times. We have an HP SAN infrastructure and use Legato NetWorker with a Qualstar TLS412360 library. ProLiant Windows 2000/2003 servers attach to the SAN via Emulex or QLogic HBAs.

All SAN switches are HP, arranged in two fabrics. On fabric 1, two 2/16 switches attach to a 4/32, with 25 HP ProLiant servers and an EVA 5000. Also connected to the 4/32 is a fibre link to another 2/16, approx. 150 meters away, with a second EVA 5000 and the Qualstar library. The library has what Qualstar calls its FibreChannelOption, left and right bridges, which attach to that 2/16. The second fabric is the same but without the Qualstar connections.

All 2/16 switches run FabOS v3.2.1b, kernel 5.4. The 4/32s run FabOS 5.1.0, kernel 2.4.19. The 2 Legato servers attach to the 2/16 switches, meaning they are an extra hop from the library.
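To make the "extra hop" concrete, here is a rough sketch of fabric 1 as a small graph, counting inter-switch links between the switch a Legato server sits on and the switch the library bridges sit on. The switch labels (`sw216_a`, `sw216_b`, `sw432`, `sw216_remote`) are made up for illustration, and it assumes the Legato servers are on one of the two local 2/16 switches.

```python
from collections import deque

# Fabric 1 as described in this post. The switch names are hypothetical
# labels, not real switch names or domain IDs.
FABRIC_1 = {
    "sw216_a": ["sw432"],                               # local 2/16 (assumed Legato server switch)
    "sw216_b": ["sw432"],                               # second local 2/16
    "sw432": ["sw216_a", "sw216_b", "sw216_remote"],    # 4/32 in the middle
    "sw216_remote": ["sw432"],                          # remote 2/16 with the Qualstar bridges
}


def isl_hops(fabric, src, dst):
    """Return the number of inter-switch links on the shortest path, or None."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbour in fabric.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    # Legato server switch -> library switch: 2/16 -> 4/32 -> 2/16
    print(isl_hops(FABRIC_1, "sw216_a", "sw216_remote"))
```

Under these assumptions the backup traffic crosses two inter-switch links, versus one if the Legato servers were attached directly to the 4/32.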

We are periodically seeing these errors in server event logs:

sw2ait errors, different event IDs, 7, 11, 15; description cannot be found

sonyait errors, event ID 15, \Device\Tape1,2,3, etc, not ready for access

PlugPlayManager errors, ID 12, The device 'SONY SDX-700C SCSI Sequential Device' (SCSI\Sequential&Ven_SONY&Prod_SDX-700C&Rev_0206\5&2fbb0cb6&0&041) disappeared from the system without first being prepared for removal.

We believe one of the Qualstar bridges has an intermittent connection and/or a problem with its SCSI connection to its tape drives. Qualstar refuses to replace it until it fails outright. We have occasionally seen a "Fibre Channel Communications" error on the Qualstar front panel. Qualstar stated that if we do not get the error while disconnected from the SAN switch, then it's not the bridge. Another vendor we work with (who sold us everything but the newer 4/32 switches) believes there is a firmware compatibility issue between the 4/32 and 2/16 switches. Another engineer thinks the extra hop from the Legato servers (2/16 to 4/32 to 2/16) may be causing problems.

Suggestions and troubleshooting ideas from anyone would be sincerely appreciated.