StoreEver Tape Storage

SeanDArmstrong
Advisor

SAN Backup Fails with I/O Error

I have an HP MSA1000 SAN (using the latest firmware) with 3 x Windows 2000 SP4 servers attached in a dual-switch Secure Path environment.
The servers are all HP DL servers (again on the latest firmware), each using 2 x FCA2101 Fibre Channel adapters (latest firmware).

The MSA1000 has 2 x 6-port switches.

Attached to the MSA1000 is an MSL5026SL tape library (firmware 4.23, SCSI ID 0 - LUN 1).
In the library are 2 x SDLT drives (firmware V075, SCSI IDs Target 0 - LUN 2 and Target 0 - LUN 3).

The library is connected to a HP NSR1200 (Firmware 530b) configured with static mapping.

Backup Exec 8.6 Build 3878 is used on two of the servers to back up SAN and remote server data.
Both servers start their backups at 19:00 (using one drive each).
Independently of each other, the backups fail with a Device I/O error at some point during the job (it could be 100 GB in, or sometimes 400 GB in).
Sometimes a job even backs up everything, but the failures are becoming more and more frequent.

At the time of the failure, the following errors are logged in the System log on the servers:

Event Type: Error
Event Source: CPQKGPSA
Event Category: None
Event ID: 9
Date: 18/11/2004
Time: 23:30:58
User: N/A
Computer: MANCHEX1
Description:
The device, \Device\Scsi\CPQKGPSA1, did not respond within the timeout period.
Data:
0000: 0f 00 10 00 01 00 6a 00 ......j.
0008: 00 00 00 00 09 00 04 c0 .......À
0010: 01 01 00 50 00 00 00 00 ...P....
0018: e5 99 00 00 00 00 00 00 å ......
0020: 00 00 00 00 00 00 00 00 ........
0028: 01 00 00 00 00 00 00 00 ........
0030: 03 00 00 00 07 00 00 00 ........


Event Type: Warning
Event Source: Disk
Event Category: None
Event ID: 51
Date: 18/11/2004
Time: 23:31:02
User: N/A
Computer: MANCHEX1
Description:
An error was detected on device \Device\Harddisk2\DR6 during a paging operation.
Data:
0000: 04 00 22 00 01 00 72 00 .."...r.
0008: 00 00 00 00 33 00 04 80 ....3..
0010: 2d 01 00 00 00 00 00 00 -.......
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 5e fb 3c 04 00 00 00 .^û<....
0028: 01 00 00 00 01 00 00 00 ........
0030: 02 00 00 00 2a 00 00 00 ....*...
0038: 02 84 00 00 00 29 06 00 . ...)..
0040: 2a 48 02 1e 7d af 00 00 *H..}¯..
0048: 08 00 ..
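
For anyone reading the Data words above, here is a minimal Python sketch that pulls the 32-bit status code out of each dump. It assumes the dumps follow the usual Windows I/O error log packet layout, with the error code stored little-endian at offset 0x0C; that offset and the helper names (parse_dump, dword_at, describe) are assumptions for illustration, not something taken from the HP driver documentation.

```python
import struct

# Hex byte columns copied from the two events quoted above (ASCII column dropped).
EVENT_9_DATA = """
0000: 0f 00 10 00 01 00 6a 00
0008: 00 00 00 00 09 00 04 c0
0010: 01 01 00 50 00 00 00 00
0018: e5 99 00 00 00 00 00 00
"""

EVENT_51_DATA = """
0000: 04 00 22 00 01 00 72 00
0008: 00 00 00 00 33 00 04 80
0010: 2d 01 00 00 00 00 00 00
"""


def parse_dump(text):
    """Turn 'offset: xx xx ...' lines into a bytes object."""
    out = bytearray()
    for line in text.strip().splitlines():
        _, _, rest = line.partition(":")
        out += bytes(int(tok, 16) for tok in rest.split())
    return bytes(out)


def dword_at(data, offset):
    """Read a little-endian 32-bit value at the given byte offset."""
    return struct.unpack_from("<I", data, offset)[0]


def describe(data):
    """Summarise the status code (assumed to sit at offset 0x0C)."""
    code = dword_at(data, 0x0C)
    return {
        "error_code": f"0x{code:08X}",
        "event_id": code & 0xFFFF,   # low word matches the logged Event ID
        "severity": code >> 30,      # top two bits: 3 = error, 2 = warning
    }


for name, dump in (("Event 9", EVENT_9_DATA), ("Event 51", EVENT_51_DATA)):
    print(name, describe(parse_dump(dump)))
```

Under that assumption the two dumps decode to 0xC0040009 and 0x80040033, which appear to correspond to the standard Windows I/O timeout error and paging-failure warning codes. The low word of each value matches the Event ID shown in the log (9 and 51), which is a reasonable sanity check on the offset.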


Then the backup fails. The following is taken from the backup log:-


======================================================================
Job Operation - Verify
======================================================================

Verify of "\\MANCHDB1\C$ "
Backup set #1 on storage media #1
Backup set description: "Exchange"
Verify started on 18/11/2004 at 23:21:49 .

Storage device "COMPAQ 2" reported an error on a request to read data from media.

Error reported:
The request could not be performed because of an I/O device error. ^ ^ ^ ^ ^
Verify completed on 18/11/2004 at 23:37:03 .
Verified 32329 files in 2975 directories.
0 files were different.
Processed 2,691,465,633 bytes in 15 minutes and 14 seconds.
Throughput rate: 168.5 MB/min
----------------------------------------------------------------------

======================================================================
Job ended: 18 November 2004 at 23:40:19
Job completion status: Failed
======================================
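
As a quick sanity check on the figures in the log above, the reported throughput can be reproduced from the byte count and the elapsed time. This is only a sketch: it assumes Backup Exec counts a megabyte as 1024 x 1024 bytes, and the function name is made up for illustration.

```python
def throughput_mb_per_min(total_bytes, minutes, seconds):
    """Throughput in MB/min, assuming 1 MB = 1024 * 1024 bytes."""
    elapsed_min = minutes + seconds / 60
    return total_bytes / (1024 * 1024) / elapsed_min


# Figures taken from the verify section of the job log.
rate = throughput_mb_per_min(2_691_465_633, 15, 14)
print(f"{rate:.1f} MB/min, about {rate / 60:.2f} MB/s")
# prints "168.5 MB/min, about 2.81 MB/s"
```

The result agrees with the 168.5 MB/min in the log, so the verify pass was averaging under 3 MB/s over the whole set, including whatever time was lost around the timeout.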

Exactly the same thing happened to the other server last night, but at 01:30.
I have seen this with so many customers that it is becoming a joke. I have upgraded everything to the latest firmware, had the drives replaced, and replaced the fibre cables.
I have also disabled the Insight Manager Fibre Agent and the Removable Storage service on all three SAN servers.

Thanks in anticipation.
Sean Armstrong
MCSE, Master ASE StorageW
Chris Watson
Super Advisor

Re: SAN Backup Fails with I/O Error

Two things, although you probably know the first:

1. Are any SAN-attached devices being rebooted during this period?

2. Have you checked whether antivirus is interfering with the backup?
Moving along nicely
Claudio Ruzza_1
Valued Contributor

Re: SAN Backup Fails with I/O Error

Disable the Removable Storage service on ALL your Windows 2000 servers, if not already done.
Check the version of the HP Management Agents. If you have version 7.00 or earlier, remove the Fibre Array Information agent on ALL your Windows 2000 servers: Start --> Control Panel --> HP Management Agents, then find the Fibre Array Information agent in the left pane and move it to the right pane.
The agents will need to be restarted, but they restart without requiring a server reboot.
Your library, drive and NSR firmware are obsolete. Consider upgrading them.

Good luck
Claudio
SeanDArmstrong
Advisor

Re: SAN Backup Fails with I/O Error

None of the three servers are restarted during the night. I have already disabled the Insight manager Fibre agent, as per the original text.

I will try updating the firmware with L&TT now.

Does anyone have good or bad experiences using Backup Exec 8.6 with either the Veritas drivers or the HP drivers?
Claudio Ruzza_1
Valued Contributor

Re: SAN Backup Fails with I/O Error

Sean,
I apologize. I missed the last part of your original message.

My experience with Backup Exec 8.6 has been bad in general.
Backup Exec 9.1 build 4691 with Service Pack 1 and the latest hotfixes is much better, especially for device management.

However your problem seems more related to hardware.
Greg Carlson
Honored Contributor

Re: SAN Backup Fails with I/O Error

Sean,

Is the MSL separated from the SAN on a separate fabric? I've seen people have backup failures and hangs when the MSL and MSA are on the same fabric.

Ciao,
Greg
Lets Roll!
SeanDArmstrong
Advisor

Re: SAN Backup Fails with I/O Error

No, the NSR1200 router is attached to the active embedded MSA1000 switch. No zoning has been set up on the switch. Would this make any difference, given that any separate zone would also need to contain the library and the servers anyway?

I have upgraded the firmware on the MSL library and drives. I don't want to change too much at once, as the customer has 6 MSA1000 SANs in identical configurations, ALL exhibiting the same faults, and I need to know exactly what the answer is.

Upgrading to 9.1 is not an option just yet: between the 6 sites over 100 servers are backed up, so we would need to upgrade all the remote agents, SQL agents etc. BIG £££
Greg Carlson
Honored Contributor

Re: SAN Backup Fails with I/O Error

Sean,

Run L&TT and post its output.
www.hp.com/support/tapetools

Also, what fw is the 2/6 switch currently at? If it is at 1.00, then we need to upgrade it.

The latest (and last) release for the Fabric Switch 6 is MSA SAN Switch 6 firmware v101G12.

You can update the Fabric Switch 6 from the MSA 2.38 fw CD:
http://h18006.www1.hp.com/products/storageworks/softwaredrivers/msa1000/v238.html

Additionally, you stated you updated all drivers and fw. The FCA HBA drivers need to be updated from the MSA Software Support CD version 6.51.02 (I noticed your Event ID 9s).
Make sure the driver is installed from the autoplay menu of this CD. Create the latest one from here:
http://h18006.www1.hp.com/products/storageworks/softwaredrivers/msa1000/msa1k_software.html#cd

Finally, run the Online ADU v2.40.6.0 and let's see whether any hdd issues are behind your Event ID 51 errors.
Download it on the fly here (it installs and runs w/out a reboot and takes less than 5 minutes):
http://h18007.www1.hp.com/support/files/server/us/download/20061.html
*Post the ADU report here as well*

Ciao,
Greg
Lets Roll!