StoreEver Tape Storage

I/O errors on DLT 7000 drives

SOLVED
John de Villiers
Frequent Advisor

I/O errors on DLT 7000 drives

We have a StorageTek library with 8 DLT 7000 drives. These drives are connected to a SAN via two FC-SCSI bridges. The SAN switch is a Brocade br16.

These all connect to 5 different HP-UX hosts. Omniback 4.0 is used for the backups.

We have had several drives replaced due to persistent I/O errors during backups. Usually we wait to see whether the error recurs on the same drive, and if it does, we log a call to have it looked at. HP then normally replaces the drive.

What I find very suspicious is that most of the I/O errors happen on drives 1, 2 and 8.
Apart from a genuinely faulty or overworked drive, what else can I look at to track down this problem?
12 REPLIES
Vincent Farrugia
Honored Contributor

Re: I/O errors on DLT 7000 drives

Hello,

Are drives 1, 2 and 8 used more than the others? If so, that might explain your problem.

HTH,
Vince
Tape Drives RULE!!!
Duncan Edmonstone
Honored Contributor

Re: I/O errors on DLT 7000 drives

1. Has the EMS dm_stape process been disabled on all five HPUX hosts?
2. Has the kernel parameter st_ats_enabled been set to 0?
3. Have all 'rewind' type tape device files been removed from /dev/rmt?
4. Have you set a consistent lock name for the tape devices? (The output from omnidownload would be useful to determine this.)
5. Have you checked that all your SAN connections are in fabric-logon mode rather than QuickLoop?

HTH

Duncan
John de Villiers
Frequent Advisor

Re: I/O errors on DLT 7000 drives

Vincent: All the tape drives have similar usage. I have tried to spread the tape allocation in Omniback as evenly as possible. None of the drives is idle between 18:00 and 06:00 the next morning; they're all running a total of 14 online SAP/Oracle backups (one of which is 1 TB in size).

Duncan: 1) Where do I check this?
2) Yes.
3) No. Is this an issue?
4) Yes, and index numbers.
5) All SAN connections are fabric. Currently only tape devices occupy the switch, so there aren't even any zones. Everything can see all the drives down all the paths, which means I see 20 drives instead of 10. That's not an issue for me: I split access to the drives across two paths (two FC cards in each host dedicated to tape). The first 4 drives go down one path and the next 4 down the other.
Duncan Edmonstone
Honored Contributor

Re: I/O errors on DLT 7000 drives

1) Where do I check this?
Just run ps -ef | grep stape; you want to see no dm_stape processes. If you do find some, they can cause intermittent I/O errors on the tape devices. The process to stop them running is a little convoluted (and also slightly out of date if you have the latest version of EMS installed), but I have attached what I have used in the past. If this doesn't work, HP should have a more up-to-date procedure.
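For anyone scripting this check across all five hosts, here is a minimal Python sketch of the same `ps -ef | grep stape` test. It assumes a Python 3 interpreter is available wherever the `ps -ef` output is collected (nothing in the thread says so), and the function name is hypothetical. It only filters lines; it does not stop anything:

```python
from typing import List


def find_dm_stape(ps_output: str) -> List[str]:
    """Return the lines of `ps -ef` output that belong to a dm_stape process.

    Roughly equivalent to `ps -ef | grep stape`, except that the grep
    process itself is skipped. Matching on the whole line (rather than a
    fixed column) is deliberate: the STIME column of `ps -ef` can contain
    a space, which makes column splitting unreliable.
    """
    return [
        line
        for line in ps_output.splitlines()
        if "dm_stape" in line and "grep" not in line
    ]
```

On a host you could feed it `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`; an empty list is the result you want to see.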

3) No. Is this an issue?
Yes. Basically, when certain tools (Ignite-UX, for instance) scan the device files of rewind tape devices, they can cause the tape to rewind even though it is in use. This only happens in SAN environments. I knocked up a startup script to always take care of this (use at your own risk, of course!); I'll have to attach it separately.
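Duncan's attached script is not reproduced here, so the following is only a hypothetical Python sketch of the first step such a script needs: picking the rewind device files out of a `/dev/rmt` listing. It assumes the standard HP-UX naming scheme, where an `n` just before an optional trailing `b` (Berkeley) marks a no-rewind device; the `D`-number keep-pattern is an assumption modelled on the custom names John mentions later. It reports candidates and deletes nothing:

```python
import re
from typing import Iterable, List

# Hypothetical keep-pattern for hand-made names such as D1 ... D10.
CUSTOM_NAME = re.compile(r"^D\d+$")


def is_rewind_device(name: str) -> bool:
    """Heuristic: does this /dev/rmt file name look like a rewind device?

    Assumes the standard HP-UX naming scheme, in which an optional trailing
    'b' selects Berkeley semantics and an 'n' immediately before it selects
    no-rewind: 0m, 0mb, c1t1d4BEST are rewind; 0mn, 0mnb, c1t1d4BESTn are not.
    """
    base = name[:-1] if name.endswith("b") else name
    return not base.endswith("n")


def rewind_candidates(names: Iterable[str]) -> List[str]:
    """Return the rewind-looking names, skipping custom Dn device files."""
    return sorted(
        n for n in names if not CUSTOM_NAME.match(n) and is_rewind_device(n)
    )
```

Something like `rewind_candidates(os.listdir("/dev/rmt"))` would give a list to review by hand before removing anything.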

HTH

Duncan
Duncan Edmonstone
Honored Contributor
Solution

Re: I/O errors on DLT 7000 drives

Here's the script...

HTH

Duncan
David Ruska
Honored Contributor

Re: I/O errors on DLT 7000 drives

Duncan,
Your EMS disabling instructions mention:

7. Rename the dm_stape binary (optional; makes sure dm_stape will not run at all)
as follows:

You don't want to do this, as dm_stape is also used as a decoder for logtool.

As you mentioned, that procedure is a bit out of date (and we wrote it, so the above error is no fault of yours :-).

Here's the preferred method to resolve the EMS issue:

Workaround #1: Prevent dm_stape polling - set POLLING_INTERVAL configuration to zero.

This is the preferred and simplest workaround to implement. This feature was introduced in the September 2001 HWE (IPR0109) bundle release. Also, in the December 2001 HWE (IPR0112) bundle this configuration is defaulted to zero (but only if the dm_stape.cfg file has not been modified by the customer after initial install).

IMPORTANT NOTE: With HWE releases earlier than September 2001, simply setting the polling interval to zero can have negative side effects and may require the installation of an interim (unofficial) patch.

Set the "POLL_INTERVAL" value in the /var/stm/config/tools/monitor/dm_stape.cfg file to 0 to stop the monitor from polling.
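If you need to make this change on several hosts, a small Python helper can rewrite the value in place. This is a sketch under stated assumptions: the thread never shows the layout of dm_stape.cfg, so the helper assumes the setting sits on its own line as `KEY value` or `KEY=value`, and since the thread spells the key both POLL_INTERVAL and POLLING_INTERVAL, the key is a parameter. Check the real file and keep a backup before writing it back:

```python
import re


def set_poll_interval(cfg_text: str, key: str = "POLL_INTERVAL", value: int = 0) -> str:
    """Return cfg_text with the value of `key` replaced by `value`.

    ASSUMPTION: the setting sits on its own line as `KEY value` or
    `KEY=value`; commented-out lines (starting with '#') are left alone.
    Raises ValueError if no matching line is found, so nothing is
    silently left unchanged.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}(?:[ \t]*=[ \t]*|[ \t]+))\S+", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), cfg_text)
    if count == 0:
        raise ValueError(f"{key} not found; edit the file by hand")
    return new_text
```

The helper only transforms text; reading the file, writing the result and restarting the monitor are left to you, and David's caveat about pre-September 2001 HWE releases still applies.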

Duncan, if you send your email address to ltt_team@hp.com, I can get you an up-to-date procedure.
The journey IS the reward.
John de Villiers
Frequent Advisor

Re: I/O errors on DLT 7000 drives

2 of the 5 servers in the SAN show this process running. On both of them the polling interval has already been set to 0.

Should the process show up or not?

The two systems it shows up on are relatively new HP-UX 11.11 boxes (N4000-55), whereas the rest are older HP-UX 11.0 boxes (V- and K-Class).
John de Villiers
Frequent Advisor

Re: I/O errors on DLT 7000 drives

Duncan, with regard to your script for removing the auto-rewind devices:

I'd like to remove all the device files and only use the ones I created manually.
I use /dev/rmt/D1...10 for my drives so that I can easily tie a drive to its position in the StorageTek.

Some of the device names that get created are the normal /dev/rmt/8mn type, and others look like /dev/rmt/C1T1D4BEST. That makes for a really confusing list when you run ioscan -fnC tape.

I'll write a script that searches for a D1..10 designator for each hardware path and, if one is found, removes all other device files except the D1..10 one. That way new drives will still show up, but drives with a custom name will only have the custom device file.
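John's plan can be sketched as a pure "what would I delete" function in Python. This is a hypothetical helper, not his script: it assumes you have already built a mapping from each hardware path to the device file names ioscan reports for it (parsing `ioscan -fnC tape` is left out because it depends on the exact output layout), and the hardware paths in any example are illustrative. It returns a plan and deletes nothing:

```python
import re
from typing import Dict, List

# John's naming convention, D1 ... D10 (hypothetical regex for it).
CUSTOM_NAME = re.compile(r"^D([1-9]|10)$")


def files_to_remove(devices_by_path: Dict[str, List[str]]) -> List[str]:
    """Plan which /dev/rmt files to delete.

    devices_by_path maps a hardware path to the device file names
    (basenames under /dev/rmt) that ioscan reports for it. If a path has
    a custom Dn file, every other file on that path is scheduled for
    removal; paths without one are untouched, so new drives still show up.
    """
    doomed: List[str] = []
    for names in devices_by_path.values():
        if any(CUSTOM_NAME.match(n) for n in names):
            doomed.extend(n for n in names if not CUSTOM_NAME.match(n))
    return sorted(doomed)
```

Keeping the planning separate from the actual `rm` makes it easy to print and eyeball the list first, which matters given that insf can recreate the files at the next boot.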

We don't use anything other than Omniback and tar to access the tapes, and tar only on very rare occasions (like sending a core dump to HP ;-) ).

Does that make sense?

John
Duncan Edmonstone
Honored Contributor

Re: I/O errors on DLT 7000 drives

John,

With regards to dm_stape still running... I'm not sure as I've never implemented this the way David describes.

Re the script to remove rewind devices: the process you describe sounds fine. You do need to have it run at boot time, as otherwise the device files may get re-created by insf. At some sites where I've implemented this, we have also introduced automated checks for rewind tape devices, as administrators or CEs may run 'insf -e' without realising the consequences.

David,

Thanks for the update! I knew there was a better way of doing this. I will drop an e-mail for your attention to the address you gave.

Thanks

Duncan
David Ruska
Honored Contributor

Re: I/O errors on DLT 7000 drives


> On both of them the polling interval has already been set to 0.

> Should the process show up or not ?

Yes, dm_stape will still be running. What it WON'T do is send commands to the devices on a regular basis (aka polling) to check that they are OK. Polling can cause conflicts with the fibre bridges, especially while a long operation (such as a rewind) is in progress: the polling commands can time out on the bridge, which can cause the HBA to reset the bus and abort a backup.

With POLLING_INTERVAL set to zero, the monitor is only involved when tape alerts or abnormal SCSI status are logged by the driver. It will still publish EMS notifications for these errors. If you wish to disable the EMS messages as well, you can follow the additional steps in Duncan's procedure.

The journey IS the reward.
Stijn V
Regular Advisor

Re: I/O errors on DLT 7000 drives

I want to disable EMS for my SAN tape drives as well, on HP-UX 11.23, but I do not have the file /var/stm/config/tools/monitor/dm_stape.cfg (where the polling interval is set to 0 to stop the monitor from polling).

I disabled EMS monitoring of my SAN tape drives like this. In

/var/stm/data/tools/monitor/disabled_instances

I added:
/storage/events/tapes/SCSI_tape/*
/storage/events/tapes/default/0_1_2_0.2.14.255.0.0.0
/storage/events/tapes/default/0_1_2_0.2.14.255.0.0.1
/storage/events/tapes/default/0_6_1_0.1.14.255.0.0.1
/storage/events/tapes/default/0_6_1_0.1.14.255.0.0.2
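The per-drive entries above appear to follow a simple pattern: the HP-UX hardware path with each `/` replaced by `_`. That mapping is inferred from these entries only, not from EMS documentation, and since these entries alone did not stop the messages here, the following Python sketch merely reproduces the list for other hardware paths; it is not a fix. Function names are hypothetical:

```python
from typing import Iterable, List


def instance_name(hw_path: str, category: str = "default") -> str:
    """Map an HP-UX hardware path to an EMS tape instance path.

    Inferred from the entries above: '/' becomes '_', the rest is unchanged,
    e.g. 0/1/2/0.2.14.255.0.0.0 -> .../default/0_1_2_0.2.14.255.0.0.0
    """
    return "/storage/events/tapes/{}/{}".format(category, hw_path.replace("/", "_"))


def disabled_instance_lines(hw_paths: Iterable[str]) -> List[str]:
    """Lines for disabled_instances: the SCSI_tape wildcard, then one per path."""
    return ["/storage/events/tapes/SCSI_tape/*"] + [instance_name(p) for p in hw_paths]
```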

But this doesn't help! If I disconnect the tape drive, I still receive EMS messages about the tape device...
Stijn V
Regular Advisor

Re: I/O errors on DLT 7000 drives

Although I do have a dm_stape.cfg file under /var/stm/data/Archive.

Should I copy this file to /var/stm/data/tools/monitor?