SAN Backup Fails with I/O Error

I have an HP MSA1000 SAN (latest firmware) with 3 x Windows 2000 SP4 servers attached in a dual-switch Secure Path environment.
The servers are all HP DL servers (again, latest firmware), each with 2 x FCA2101 Fibre adapters (latest firmware).

The MSA1000 has 2 x 6-port switches.

Attached to the MSA1000 is an MSL5026SL tape library (firmware 4.23, SCSI ID 0, LUN 1).
In the library are 2 x SDLT drives (firmware V075; SCSI IDs Target 0 LUN 2 and Target 0 LUN 3).

The library is connected to a HP NSR1200 (Firmware 530b) configured with static mapping.

Backup Exec 8.6 Build 3878 is used on two of the servers to back up SAN and remote server data.
Both servers start their backups at 19:00 (using a drive each).
The backups fail independently with a Device I/O error at some point during the job (sometimes 100 GB in, sometimes 400 GB in).
Sometimes a job even backs up everything, but more and more often they fail.

At the time of the failure, the following errors are logged in the System log on the servers:

Event Type: Error
Event Source: CPQKGPSA
Event Category: None
Event ID: 9
Date: 18/11/2004
Time: 23:30:58
User: N/A
Computer: MANCHEX1
Description:
The device, \Device\Scsi\CPQKGPSA1, did not respond within the timeout period.
Data:
0000: 0f 00 10 00 01 00 6a 00 ......j.
0008: 00 00 00 00 09 00 04 c0 .......À
0010: 01 01 00 50 00 00 00 00 ...P....
0018: e5 99 00 00 00 00 00 00 å ......
0020: 00 00 00 00 00 00 00 00 ........
0028: 01 00 00 00 00 00 00 00 ........
0030: 03 00 00 00 07 00 00 00 ........


Event Type: Warning
Event Source: Disk
Event Category: None
Event ID: 51
Date: 18/11/2004
Time: 23:31:02
User: N/A
Computer: MANCHEX1
Description:
An error was detected on device \Device\Harddisk2\DR6 during a paging operation.
Data:
0000: 04 00 22 00 01 00 72 00 .."...r.
0008: 00 00 00 00 33 00 04 80 ....3..
0010: 2d 01 00 00 00 00 00 00 -.......
0018: 00 00 00 00 00 00 00 00 ........
0020: 00 5e fb 3c 04 00 00 00 .^û<....
0028: 01 00 00 00 01 00 00 00 ........
0030: 02 00 00 00 2a 00 00 00 ....*...
0038: 02 84 00 00 00 29 06 00 . ...)..
0040: 2a 48 02 1e 7d af 00 00 *H..}¯..
0048: 08 00 ..
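The Data sections of these two events are raw dumps of the Windows IO_ERROR_LOG_PACKET that the driver logged. A minimal Python sketch, assuming the standard 40-byte packet header layout from the Windows DDK (the helper and variable names are hypothetical), decodes the fixed header: offset 0x0C holds the NTSTATUS error code (0xC0040009, IO_ERR_TIMEOUT, for the CPQKGPSA event; 0x80040033, IO_WARNING_PAGING_FAILURE, for the Disk event), and offset 0x20 holds the byte offset on the device. The trailing DumpData is driver-specific, so the sketch leaves it as raw bytes rather than guessing at its layout.

```python
import struct

# Fixed 40-byte header of the Windows IO_ERROR_LOG_PACKET, little-endian.
# The driver-specific DumpData (DumpDataSize bytes) follows it.
HEADER = struct.Struct("<BBHHHH2xIIIIIq")

# Constants as defined in the Windows DDK headers (ntiologc.h, wdm.h).
ERROR_CODES = {
    0xC0040009: "IO_ERR_TIMEOUT",
    0x80040033: "IO_WARNING_PAGING_FAILURE",
}
MAJOR_FUNCTIONS = {
    0x03: "IRP_MJ_READ",
    0x04: "IRP_MJ_WRITE",
    0x0F: "IRP_MJ_INTERNAL_DEVICE_CONTROL (IRP_MJ_SCSI)",
}
SEVERITY = ("success", "informational", "warning", "error")

# Data sections copied from the two events above, offsets and ASCII removed.
EVENT_9_DATA = """
0f 00 10 00 01 00 6a 00 00 00 00 00 09 00 04 c0
01 01 00 50 00 00 00 00 e5 99 00 00 00 00 00 00
00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
03 00 00 00 07 00 00 00
"""
EVENT_51_DATA = """
04 00 22 00 01 00 72 00 00 00 00 00 33 00 04 80
2d 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 5e fb 3c 04 00 00 00 01 00 00 00 01 00 00 00
02 00 00 00 2a 00 00 00 02 84 00 00 00 29 06 00
2a 48 02 1e 7d af 00 00 08 00
"""


def decode_header(data: bytes) -> dict:
    """Decode the fixed header and slice out the raw DumpData."""
    (major, retry, dump_size, _n_strings, _string_offset, _category,
     error_code, _unique, _final_status, sequence, _ioctl,
     device_offset) = HEADER.unpack_from(data)
    return {
        "major_function": major,
        "major_name": MAJOR_FUNCTIONS.get(major, hex(major)),
        "retry_count": retry,
        "dump_data_size": dump_size,
        "error_code": error_code,
        "error_name": ERROR_CODES.get(error_code, "unknown"),
        "severity": SEVERITY[error_code >> 30],   # NTSTATUS bits 31-30
        "facility": (error_code >> 16) & 0x0FFF,  # 4 = FACILITY_IO_ERROR_CODE
        "event_id": error_code & 0xFFFF,          # the ID Event Viewer shows
        "sequence_number": sequence,
        "device_offset": device_offset,           # byte offset on the device
        "dump_data": data[HEADER.size:HEADER.size + dump_size],
    }


if __name__ == "__main__":
    for label, dump in (("CPQKGPSA Event ID 9", EVENT_9_DATA),
                        ("Disk Event ID 51", EVENT_51_DATA)):
        h = decode_header(bytes.fromhex(dump))
        print(f"{label}: {h['error_name']} ({h['severity']}), "
              f"{h['major_name']}, device offset {h['device_offset']:#x}, "
              f"dump data: {h['dump_data'].hex(' ')}")
```

The low 16 bits of each error code match the Event IDs shown above (9 and 51), which is a quick way to confirm the header was aligned correctly.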


The backup then fails. The following is taken from the backup log:


======================================================================
Job Operation - Verify
======================================================================

Verify of "\\MANCHDB1\C$ "
Backup set #1 on storage media #1
Backup set description: "Exchange"
Verify started on 18/11/2004 at 23:21:49 .

Storage device "COMPAQ 2" reported an error on a request to read data from media.

Error reported:
The request could not be performed because of an I/O device error. ^ ^ ^ ^ ^
Verify completed on 18/11/2004 at 23:37:03 .
Verified 32329 files in 2975 directories.
0 files were different.
Processed 2,691,465,633 bytes in 15 minutes and 14 seconds.
Throughput rate: 168.5 MB/min
----------------------------------------------------------------------

======================================================================
Job ended: 18 November 2004 at 23:40:19
Job completion status: Failed
======================================
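The figures in this log are internally consistent, and both System log events above fall inside the verify pass that reported the read error. A short Python check recomputes this, assuming Backup Exec's "MB" means 2^20 bytes (that reading reproduces the logged 168.5 MB/min):

```python
from datetime import datetime

# Figures copied from the Backup Exec verify log above.
verify_start = datetime(2004, 11, 18, 23, 21, 49)
verify_end = datetime(2004, 11, 18, 23, 37, 3)
bytes_processed = 2_691_465_633

elapsed = (verify_end - verify_start).total_seconds()  # 15 min 14 s
# Assumption: Backup Exec's "MB" is 2**20 bytes.
mb_per_min = bytes_processed / 2**20 / (elapsed / 60)

# System log events from MANCHEX1 on the same night.
events = {
    "CPQKGPSA Event ID 9": datetime(2004, 11, 18, 23, 30, 58),
    "Disk Event ID 51": datetime(2004, 11, 18, 23, 31, 2),
}

print(f"elapsed {elapsed:.0f} s, throughput {mb_per_min:.1f} MB/min")
for name, when in events.items():
    inside = verify_start <= when <= verify_end
    offset = (when - verify_start).total_seconds()
    print(f"{name} at {when:%H:%M:%S}: "
          f"{'inside' if inside else 'outside'} the verify window, "
          f"{offset:.0f} s after verify start")
```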

This happened to the other server exactly the same last night, but at 01:30.
I have seen this with so many customers that it is becoming a joke. I have upgraded everything to the latest firmware, even had the drives replaced, and replaced the fibre cables.
I have disabled the Insight Manager Fibre Agent and the Removable Storage service on all three SAN servers.

Thanks in anticipation.
Sean Armstrong
MCSE, Master ASE StorageW
Chris Watson
Super Advisor

Re: SAN Backup Fails with I/O Error

Two things, although you probably know the first:

1. Are any SAN attached devices being rebooted during this period?

2. Have you checked whether antivirus is impacting the backup?
Moving along nicely
Claudio Ruzza_1
Valued Contributor

Re: SAN Backup Fails with I/O Error

Disable the Removable Storage service on ALL your Windows 2000 servers, if you have not already done so.
Check the version of the HP Management Agents. If you have version 7.00 or earlier, remove the Fibre Array Information agent on ALL your Windows 2000 servers: go to Start --> Control Panel --> HP Management Agents, find the Fibre Array Information agent in the left pane and move it to the right pane.
The agents will need to be restarted; they restart without requiring a server reboot.
Your library, drive and NSR firmware is obsolete. Consider upgrading it.

Good luck
Claudio

Re: SAN Backup Fails with I/O Error

None of the three servers is restarted during the night. I have already disabled the Insight Manager Fibre agent, as mentioned in my original post.

I will try updating the firmware with L&TT now.

Does anyone have good or bad experiences using Backup Exec 8.6 with either the Veritas drivers or the HP drivers?
Claudio Ruzza_1
Valued Contributor

Re: SAN Backup Fails with I/O Error

Sean,
I apologize; I missed the last part of your original message.

I have had bad experiences with Backup Exec 8.6 in general.
Backup Exec 9.1 build 4691 with Service Pack 1 and the latest hotfixes is much better, especially for device management.

However, your problem seems more hardware-related.
Greg Carlson
Honored Contributor

Re: SAN Backup Fails with I/O Error

Sean,

Is the MSL separated from the SAN on a separate fabric? I've seen people have backup failures and hangs when the MSL and MSA are on the same fabric.

Ciao,
Greg
Let's Roll!

Re: SAN Backup Fails with I/O Error

No, the NSR1200 router is attached to the active embedded MSA1000 switch. No zoning has been set up on the switch. Would zoning make any difference, given that any separate zone would also need to contain the library and the servers anyway?

I have upgraded the firmware on the MSL library and drives. I don't want to change too much at once: the customer has 6 MSA1000 SANs in identical configurations, ALL exhibiting the same faults, and I need to know exactly what the fix is.

Upgrading to 9.1 is not an option just yet: across the 6 sites, over 100 servers are backed up, so we would need to upgrade all the Remote Agents, SQL agents, etc. BIG £££
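On the zoning question: in Fibre Channel zoning a port can be a member of more than one zone, so the servers' HBAs can sit in both a disk zone and a tape zone while the MSA1000 and the NSR1200 never share a zone. A minimal Python sketch of that membership rule follows; the port names are hypothetical placeholders (a real zone set would use WWPNs or switch port numbers), it covers one fabric only, and it is not any switch's configuration syntax. Whether this isolation cures the failures is the suggestion made in this thread, not something the model demonstrates.

```python
# Hypothetical port names for one fabric at one site.
SERVER_HBAS = {"SERVER1_HBA", "SERVER2_HBA", "SERVER3_HBA"}
BACKUP_HBAS = {"SERVER1_HBA", "SERVER2_HBA"}  # the two Backup Exec servers
MSA_PORT = "MSA1000"
ROUTER_PORT = "NSR1200"  # the MSL5026SL library and SDLT drives sit behind it

# A port may belong to several zones, so the backup servers appear in both.
ZONES = {
    "disk_zone": SERVER_HBAS | {MSA_PORT},
    "tape_zone": BACKUP_HBAS | {ROUTER_PORT},
}


def can_communicate(a: str, b: str) -> bool:
    """Two ports can talk only if at least one zone contains both of them."""
    return any(a in members and b in members for members in ZONES.values())


def zones_of(port: str) -> list:
    """Names of all zones that list the given port."""
    return sorted(name for name, members in ZONES.items() if port in members)


if __name__ == "__main__":
    for port in sorted(SERVER_HBAS | {MSA_PORT, ROUTER_PORT}):
        print(f"{port}: {zones_of(port)}")
    print("MSA1000 <-> NSR1200:", can_communicate(MSA_PORT, ROUTER_PORT))
```

So a separate tape zone would indeed contain the library (via the router) and the backup servers, but it need not contain the MSA1000 itself.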

Greg Carlson
Honored Contributor

Re: SAN Backup Fails with I/O Error

Sean,

Run LTT and post its output.
www.hp.com/support/tapetools

Also, what firmware is the 2/6 switch currently at? If it is at 1.00, it needs to be upgraded.

The latest (last) BIOS level for the Fabric Switch 6 is MSA SAN Switch 6 firmware v101G12.

You can update the Fabric Switch 6 from the MSA 2.38 firmware CD:
http://h18006.www1.hp.com/products/storageworks/softwaredrivers/msa1000/v238.html

Additionally, you stated that you had updated all drivers and firmware. The FCA HBA drivers need to be updated from the MSA Software Support CD version 6.51.02 (I noticed your Event ID 9 errors).
Make sure the driver is installed from this CD's autoplay menu. You can create the latest CD from here:
http://h18006.www1.hp.com/products/storageworks/softwaredrivers/msa1000/msa1k_software.html#cd

Finally, run the Online ADU v2.40.6.0 and let's see whether you have any HDD issues behind your Event ID 51 errors.
Download it here (it installs and runs without a reboot and takes less than 5 minutes):
http://h18007.www1.hp.com/support/files/server/us/download/20061.html
*Post the ADU report here as well*

Ciao,
Greg
Let's Roll!

Re: SAN Backup Fails with I/O Error

Greg,
The switch(s) are at firmware v101G12.

The Fibre adapter drivers were updated to version v5.4.82.16 from the MSA1000 Support CD when I upgraded the MSA1000 firmware from 2.32 to v4.32 about 6 months ago (in an attempt to fix this problem!!)

I have installed the ADU and attached the report. Thanks for your help so far.
Greg Carlson
Honored Contributor

Re: SAN Backup Fails with I/O Error

Sean,

The latest driver is now 5-5.10a9, or 10a10 for a large SAN environment.

Verify here:
http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?pnameOID=315735&locale=en_US&taskId=135&prodTypeId=12169&prodSeriesId=315733&swEnvOID=181

As for the ADU output, the internal HDDs on the 5i controller (IDs 0 and 1) have errors from as recently as 1-5 days ago:
Port 1 ID 00 has 8 timeout errors
Port 1 ID 01 has 7 hard read errors (both could be causing problems for you)
You may have an issue with the HDDs where your NOS lives. Some HDDs in the MSA are showing errors as well; however, none of those errors are recent: the most recent errors on the MSA date from over 180 days ago.

Ciao,
Greg
Let's Roll!
Greg Carlson
Honored Contributor

Re: SAN Backup Fails with I/O Error

Sean,

Also, was any work performed on your shelf that is off SCSI B? That is port 4, where you had some bus fault errors about 180 days ago. It wouldn't hurt to verify the cabling and check for bent pins.

Ciao,
Greg

P.S. Points are appreciated; I like to see which answers helped and which ones didn't. :)
Let's Roll!
Andrew_168
Regular Advisor

Re: SAN Backup Fails with I/O Error

Sean,

There may be a couple of causes. One could be a SCSI reset occurring during the backup, which zoning would rectify. Also, I had a customer whose entire stock of tapes was ruined by the tape drives: the drives caused tape edge damage (this is a known problem). HP replaced the drives and tapes under warranty; I seem to remember the replacements being called "silver standard" drives.

If this problem has been going on for some time, I suggest trying another backup program, just in case. Data Protector is a free download with a 60-day instant-on license, or you could try Arcserve.

Rgds.
david_799
Occasional Visitor

Re: SAN Backup Fails with I/O Error

This is a bit like déjà vu. I have almost the same setup (except that I am using a clustered Veritas solution) and am having the same problems, and more, possibly because of the Veritas cluster. I went through updating everything (with Veritas and HP on the phone): firmware, drivers, even creating .udl files on the servers to check ODBC connectivity, using Filemon, etc.
I also have the added benefit of the catalogs not being written, sometimes!
I have even got the mobile phone number of one of the Veritas 3rd-line support engineers, whom I phone at home in the evenings or at weekends if need be.
Veritas admitted to me about 5 months ago that there was a problem with their product in this type of environment; they only took it seriously when American clients started to suffer the same problems. They suggested that an upgrade to 9.1 might sort my problems out (I have my doubts), and they now reckon it definitely will. One issue I have is that version 10 is now out, or has just been released.
3 weeks ago I attended an Exchange course, and 2 other people from different organisations had almost the same setup as we both seem to have (I've got an MSL5030 and Backup Exec 9); they have the same issues.
I have learned a few tricks with BEUtility.exe that seem to resolve the issues for a while, such as deleting and re-adding the media servers and resetting the primary SAN SSO server.
Hope this helps; it's just that you have exactly the same errors.