
DLT1e and 29160 - Getting SCSI timeout errors

 
BR747105
Occasional Advisor

Hi,

I currently have the following items:

DLT1e drive
Adaptec 29160 SCSI card
BackupExec 8.6
Windows NT4.0 SP6a

I have been experiencing backup failures on a regular basis. The problem is similar to the one Zahid submitted last year.

I have tried the following:

I set the transfer rate for the DLT drive's SCSI ID to its lowest setting (10MB/sec), disabled Domain Validation, and tried Autodisconnect both enabled and disabled, all with no success.

Local backups run fine, but the main backup (over 40GB) fails regularly, roughly every other day, and "SCSI timeout exceeded" errors appear in the event log.

I've contacted HP Support in the UK, who passed my support ticket on to Arimo Laine; however, I haven't had a reply, so I am resubmitting the details and latest error logs with this post.

I hope someone can help as this problem is driving me nuts!

Thanks,

Michael Wlach CCNA,MCSE
Eugeny Brychkov
Honored Contributor

Re: DLT1e and 29160 - Getting SCSI timeout errors

This drive is UW LVD, and the 29160 is a good choice for it. Unfortunately I cannot decode the bugchecks because I don't have the required docs right now, but I would ask you to check that the SCSI bus is correctly terminated. Which cable do you use (post all the numbers printed on the cable here), and which terminator (also, what's written on it)?
Eugeny
Lewis Finch
Honored Contributor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Michael,

You say local backups run fine but the main backup fails. I interpret this to mean that you are having problems backing up over the network. Is this the problem?
"You can't lead the orchestra without turning your back to the crowd"
BR747105
Occasional Advisor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Thanks for the replies so far.

Here is the information as requested:

The cable is marked as being made by Madison Cable Corp and has "LVD FAST 40 SCSI" printed on it.

The terminator has only "ET1 & ET12" printed on it.

Both items were supplied with the original retail drive.

The DLT drive is on a dedicated 29160 SCSI adapter with no other devices attached. The terminator is fitted to the rear of the external drive as per the manual.

The main backup runs at 20:00 each evening, when network activity is minimal. I understand you are suggesting that there may be some form of network connectivity problem; I have previously looked for signs of this in the event log.

The only entries I get in the event log when a failure occurs come from the SCSI adapter (e.g. \Device\ScsiPort2 failed to respond within the timeout period) and a Veritas error relating to "\device\tape0".

When looking at BackupExec's log you can see that the drive is reporting errors in writing to media due to semaphore timeouts.

Paul Dubovik
Occasional Advisor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Does BackupExec 8.6 support backups over the network?
meloni
Honored Contributor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Try running L&TT.
Have a look at the drive error logs and check the POH (power-on hours) and PCYC values.
PCYC means power cycles; check whether this value is consistent with how you use the drive.
If the number of power cycles is increasing faster than expected, open a hardware call with HP and ask for a replacement, as your power supply is probably having problems.
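
As a rough sketch of that check, two L&TT readings taken some time apart can be compared to see how many power cycles occurred beyond the ones you caused yourself. The function name and the sample readings below are made up for illustration; they are not values from this drive.

```python
def unexpected_power_cycles(earlier, later, expected_cycles=0):
    """Compare two (POH, PCYC) readings and return the number of power
    cycles beyond those the operator expected in that interval.

    All values are hypothetical inputs read by hand from a drive report.
    """
    poh0, pcyc0 = earlier
    poh1, pcyc1 = later
    if poh1 < poh0 or pcyc1 < pcyc0:
        raise ValueError("readings are out of order")
    return max(0, (pcyc1 - pcyc0) - expected_cycles)


# Illustrative readings only: 120 power-on hours apart, one planned reboot.
extra = unexpected_power_cycles((16000, 40), (16120, 47), expected_cycles=1)
print(f"Power cycles not accounted for: {extra}")
```

A non-zero result over a short interval would be the "increasing faster than expected" case described above.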
---------------X-------------------X--------------------X----------------X--------------
If everything is under control, you are going too slow (Mario Andretti)
Curtis Ballard
Honored Contributor

Re: DLT1e and 29160 - Getting SCSI timeout errors

You have at least one bad tape and bad tapes can cause the Backup Exec error you are seeing.

The LTT support ticket attached to your original message from 4/25 shows quite a few write errors around power on hour 16120. The grouping implies that there may have only been 1 bad tape and this may not be your only problem but it is a problem.

Other than the write errors the drive logs look clean so you probably need to look at things outside of the drive.

One common cause of semaphore timeout errors in Backup Exec is data starvation during network backups. A DLT1 drive requires a minimum of 200Mb/s Ethernet bandwidth. If you are trying to back up over a network and don't have that bandwidth, you need to make some improvements to your network before looking at drive issues.

Network backups to DLT, DLT1, LTO or similar performance drives should be done over gigabit Ethernet and you should use backup software that can spool jobs from multiple clients onto the same drive so that clients that don't have gigE can still be backed up without starving the drive.
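
The bandwidth argument can be sketched as a quick calculation. This is only an illustration: it takes the 200Mb/s figure quoted above as given, and the 70% link efficiency is an assumed allowance for protocol overhead, not a measured value.

```python
def link_can_feed_drive(link_mbps, required_mbps, efficiency=0.7):
    """Return True if the usable throughput of a network link meets the
    rate the tape drive needs to keep streaming.

    efficiency is an assumed fraction of the nominal link speed that is
    actually available for backup data.
    """
    usable_mbps = link_mbps * efficiency
    return usable_mbps >= required_mbps


REQUIRED_MBPS = 200  # figure quoted in this post, taken as given

for name, speed in [("Fast Ethernet", 100), ("Gigabit Ethernet", 1000)]:
    verdict = "sufficient" if link_can_feed_drive(speed, REQUIRED_MBPS) else "insufficient"
    print(f"{name} ({speed}Mb/s): {verdict}")
```

Under these assumptions a single 100Mb/s client cannot keep the drive streaming on its own, which is why spooling several clients onto one drive is suggested.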
BR747105
Occasional Advisor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Thanks for the replies so far.

Your comments have made interesting reading. I'll reply to each point separately...

1. BackupExec does support NAS devices like the Quantum SnapServer (which we use) without too many problems. We used to use ArcServe previously and did not experience failures of this kind to my knowledge. ArcServe has its own problems which is why we opted to change.

2. So far the power cycle has remained steady.

3. Regarding faulty tapes, I have often wondered about this as all the tapes we purchase are HP branded. If I could prove a tape was indeed defective how would I go about getting a replacement in the UK?

4. The NAS device we have (Snapserver 4100 160Gb) only has 10/100Mbit capability. I am not totally convinced that Gigabit is really necessary or would help.

The problem still remains: last Friday a failure occurred 9 minutes into the weekly backup.

In an attempt to eliminate certain possibilities, I have carried out the following:

a) Ensured that the Server's NIC, the Snapserver's NIC and the switch all have their speed/duplex set to 100Mbit/Full as Autonegotiate has been known to cause connectivity problems.

b) I have recreated the backup jobs in BackupExec so that the local drives in the server are backed up first and then the network drives (this was the other way around previously, BackupExec automatically sorts the file selections unless you disable it in the registry).

The aim of this is to see if failures still occur and more importantly whether they are of a random nature or occur at a specific moment in time. i.e. if a failure occurs while local drives are being backed up then network connectivity problems can be ruled out.

It will take a couple of days to observe this behaviour. I'll get back to you once I have something to report.

Thanks,

Michael Wlach CCNA,MCSE
BR747105
Occasional Advisor

Re: DLT1e and 29160 - Getting SCSI timeout errors

This is what I observed so far:

2nd May - Backup fails 9 minutes into job due to semaphore timeout. (Network share backed up first)

7th May - Recreated backup jobs in BackupExec so that the local drives are backed up first, then the network drives.

9th May - Failed 23 minutes into job due to semaphore timeout. This time during backing up the local drives.

16th May - Failed 2hrs 23 minutes into job. Error log claims failure due to CRC check failure while writing to drive.

I had noticed a pattern with Wednesday backups so I chose to substitute another tape.

I also chose to substitute the weekly tape for another one which resulted in the CRC error on the 16th.

Changing the backup routine has shown that semaphore timeouts can still occur while backing up local drives. This would suggest that the problem is rooted in the SCSI chain, and it possibly rules out network connectivity issues.
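
The reasoning here can be laid out as a small tally of the failures listed above. This is just a sketch; the phase for 16th May is recorded as "unknown" because the log entry above does not say which drives were being backed up at that point.

```python
from collections import Counter

# (date, minutes into job, what was being backed up, reported error)
failures = [
    ("2 May", 9, "network", "semaphore timeout"),
    ("9 May", 23, "local", "semaphore timeout"),
    ("16 May", 143, "unknown", "CRC failure"),  # 2hrs 23 minutes
]


def phases_with_error(records, error):
    """Count how often a given error occurred in each backup phase."""
    return Counter(phase for _, _, phase, err in records if err == error)


timeouts = phases_with_error(failures, "semaphore timeout")

# If timeouts only ever happened during the network phase, the network
# would remain the prime suspect; a timeout during a local phase argues
# against that.
network_only = set(timeouts) == {"network"}
print("Timeouts by phase:", dict(timeouts))
print("Network still the only suspect:", network_only)
```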

What's frustrating is the fact that I've almost got daily backups working fine, but still have problems with the weekly and monthly ones.

Any comments appreciated.

Regards,

Michael Wlach CCNA,MCSE
BR747105
Occasional Advisor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Well, it's been a while now, and I think Curtis is right about the source of the problem. It does appear as though some of the new tapes we purchased recently are responsible for the timeouts I've been getting.

I've proved this by re-using monthly backup tapes done a year ago and so far they're fine.

However, isn't there a guarantee from HP regarding defective media? If so, how do I go about getting replacement media in the UK?

Any help would be appreciated as I don't want to give management the bad news that we've bought £500 worth of defective media.

Thanks,

Michael Wlach CCNA, MCSE
CA990875
New Member

Re: DLT1e and 29160 - Getting SCSI timeout errors

Hello. I've been having the same problem as you with the semaphore timeouts. I just installed an Adaptec 29160 to try to rectify the problem, but it did not work. I've also got a DLT drive, and it is still failing. Did you ever find a solution?

Thanks,

pk
BR747105
Occasional Advisor

Re: DLT1e and 29160 - Getting SCSI timeout errors

Hello Patrick,

Sorry to hear you're having problems with your drive.

In trying to trace this problem I've done virtually everything I could think of, which you can see from the above posts.

The problem with semaphore timeouts is that they offer very little information, as they can occur anywhere in the SCSI chain, i.e. SCSI card > SCSI cable > DLT drive > termination.

I've managed to prove that the problem is indeed media related. It would seem that some batches of HP tapes are not very good quality if you can write to them once successfully and the next time they fail. I found this out the hard way by retiring the defective media and replacing it with media 6-12 months old. In doing this I've managed to get my daily backups running OK and now just have a problem with the weekly/monthly ones.

If you're experiencing random job failures due to semaphore timeouts then work your way through the SCSI chain and confirm that all is well, and then try changing the media.
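
One way to spot bad media is to keep a simple record of which tape each job used and whether it succeeded, then flag tapes that fail repeatedly. The sketch below assumes such a record is kept by hand; the tape labels and the history are invented for illustration and do not come from BackupExec.

```python
from collections import defaultdict


def suspect_tapes(job_history, min_failures=2):
    """Return the labels of tapes that failed at least min_failures times.

    job_history is an iterable of (tape_label, succeeded) pairs.
    """
    failures = defaultdict(int)
    for label, succeeded in job_history:
        if not succeeded:
            failures[label] += 1
    return sorted(label for label, count in failures.items() if count >= min_failures)


# Hypothetical history: labels and outcomes are made up.
history = [
    ("WED-1", False),
    ("THU-1", True),
    ("WED-1", False),
    ("WEEKLY-2", False),
    ("THU-1", True),
]
print(suspect_tapes(history))
```

A tape that fails once may just have hit a bad night; one that fails repeatedly while others on the same drive succeed is a good candidate for retirement.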

Hope that helps,

Michael Wlach CCNA, MCSE