StoreEver Tape Storage

Re: MSL2024 1 Drive 1840 backup failure

 
CLEB
Valued Contributor

Curtis

I made the registry change and this fixed the issue. I had several weeks to a month's worth of successful backups.

This has just started to fail again recently.

The only change has been the installation of PSP 8.70.

This is the reg key:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI\8&2f5f9346&0&000400\Device Parameters\Storport]
"BusyRetryCount"=dword:000000fa
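For anyone comparing this key with the decimal values mentioned later in the thread: the `dword:` data in a .reg file is hexadecimal, so `000000fa` is 250 decimal. A minimal Python sketch of the conversion follows; `parse_reg_dword` is a hypothetical helper name, not part of any HP or Microsoft tool.

```python
def parse_reg_dword(raw: str) -> int:
    """Convert a .reg-style 'dword:XXXXXXXX' string to an integer."""
    prefix = "dword:"
    if not raw.lower().startswith(prefix):
        raise ValueError(f"not a dword value: {raw!r}")
    # The digits after 'dword:' are hexadecimal.
    value = int(raw[len(prefix):], 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"value does not fit in 32 bits: {raw!r}")
    return value


print(parse_reg_dword("dword:000000fa"))  # 250
```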
CLEB
Valued Contributor

I've had the HP Insight Storage Agents service disabled and I'm not seeing the issue now.
Curtis Ballard
Honored Contributor

If you don't mind operating with the storage agents disabled, that is fine.

If you want to turn the storage agents back on, check whether installing the PSP caused Windows to create a new registry entry for that drive. Depending on what the PSP installs, Windows can create new registry entries during the installation, and a new entry would not carry the BusyRetryCount setting.
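One way to run that check is to export the `Enum\SCSI` branch to a .reg file and look for device instances that have no `BusyRetryCount` under `Device Parameters\Storport`. The Python sketch below does this on the exported text. It assumes the export has already been read into a string, treats key and value names as case-sensitive for simplicity, and uses hypothetical function names. The second instance ID in the sample (`8&1a2b3c4d&0&000400`) is made up to stand in for a newly created entry.

```python
ENUM_SCSI = "\\Enum\\SCSI\\"
STORPORT_SUFFIX = "\\Device Parameters\\Storport"


def parse_reg_sections(reg_text):
    """Map each [key] header in exported .reg text to its {value name: data}."""
    sections = {}
    current = None
    for line in reg_text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, {})
        elif current is not None and line.startswith('"') and "=" in line:
            name, _, data = line.partition("=")
            sections[current][name.strip('"')] = data
    return sections


def device_instance(key):
    """Return '<device id>\\<instance id>' for a key under Enum\\SCSI, else None."""
    idx = key.find(ENUM_SCSI)
    if idx < 0:
        return None
    parts = key[idx + len(ENUM_SCSI):].split("\\")
    if len(parts) < 2:
        return None
    return "\\".join(parts[:2])


def instances_missing_value(reg_text, value_name="BusyRetryCount"):
    """List device instances with no Storport subkey that sets value_name."""
    seen, configured = set(), set()
    for key, values in parse_reg_sections(reg_text).items():
        inst = device_instance(key)
        if inst is None:
            continue
        seen.add(inst)
        if key.endswith(STORPORT_SUFFIX) and value_name in values:
            configured.add(inst)
    return sorted(seen - configured)


# Sample export: the first instance is the key from this thread,
# the second instance ID is invented for illustration.
SAMPLE_EXPORT = r"""
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI\8&2f5f9346&0&000400\Device Parameters\Storport]
"BusyRetryCount"=dword:000000fa

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Sequential&Ven_HP&Prod_Ultrium_4-SCSI\8&1a2b3c4d&0&000400\Device Parameters]
"""

for inst in instances_missing_value(SAMPLE_EXPORT):
    print("No BusyRetryCount set for:", inst)
```

Any instance this reports would need the same Storport value added before the storage agents are re-enabled.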
CLEB
Valued Contributor

It's not ideal as I'd like the functionality that the storage agents provide.

I've attached the two reg locations.

There is only one entry for the actual tape drive, but there is another entry for the SCSI HBA.
Curtis Ballard
Honored Contributor
Solution

Thanks for posting the registry entries. That was very helpful.

That is an unusual configuration, as there is an MSA1000 on the same controller as the tape drive. That isn't a recommended configuration and would probably be considered unsupported, but it can usually be made to work.

Since the array is on the same controller as the tape drive, it is likely that there are some additional delays happening somewhere.

I would start by setting the BusyRetryCount quite a bit higher. The 0xfa value was determined to be a good setting for a single tape drive on the controller, but having other devices on it could easily mean a much higher value is needed. I would recommend changing it to 0xffff.

All that parameter does is say how long to wait before giving up when a device is busy. Before Microsoft created the registry entry, the wait time was infinite. That was the default when this card was first made. It worked fine most of the time, but some devices could get stuck reporting Busy and cause a hang condition. To fix that, a timeout was put into the lower-layer drivers, but whoever picked it chose a value that is frequently too low.
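To illustrate the idea, here is a toy Python model of a bounded busy-retry loop. It is only a sketch of the behavior described above, not Storport's actual logic: the function name and the "device reports Busy N times, then accepts the command" model are assumptions made for the example.

```python
def send_with_busy_retry(busy_responses, busy_retry_count):
    """Toy model: the device answers Busy `busy_responses` times, then accepts.

    Returns "completed" if the command gets through within the retry budget,
    or "failed" once `busy_retry_count` retries have been used up.
    """
    busy_seen = 0
    while True:
        if busy_seen >= busy_responses:
            return "completed"      # device finally accepted the command
        if busy_seen >= busy_retry_count:
            return "failed"         # retry budget exhausted, report an error
        busy_seen += 1              # device was Busy, retry


# A device that is Busy a little longer than a budget of 0xfa (250) allows:
print(send_with_busy_retry(300, 0xFA))             # failed
print(send_with_busy_retry(300, 0xFFFF))           # completed
# A device stuck reporting Busy forever still ends with an error, not a hang:
print(send_with_busy_retry(float("inf"), 0xFFFF))  # failed
```

In this model a larger count only matters when the device stays Busy longer than the smaller count would tolerate, and a permanently stuck device is still reported as an error once the budget runs out.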

There are a couple of other registry entries that were created at the same time that you might try:

Value - BusyPauseTime
Type - DWORD
Data - 250 Decimal (default)
Range - number of milliseconds

If you change the pause time, I would recommend trying 500 or 1000.

Value - QueueFullWaitIoPercentage
Type - DWORD
Data - 25 Decimal (default)
Range - 1 to 100 percentage of time

For tape, a value of around 50 to 75 would be better, but be careful with an array attached: making this number too high could impact performance on a heavily loaded array.
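For testing, the three values can be written out as a .reg fragment rather than typed in by hand. The Python sketch below builds such a fragment. It assumes all three values sit under the same `Device Parameters\Storport` key shown earlier in the thread, and the instance ID in that key is specific to that one system, so it must be replaced with the one from your own registry. `reg_fragment` is a hypothetical helper name.

```python
# Key from earlier in this thread; the instance ID is system-specific.
STORPORT_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI"
    r"\Sequential&Ven_HP&Prod_Ultrium_4-SCSI\8&2f5f9346&0&000400"
    r"\Device Parameters\Storport"
)


def reg_fragment(key, busy_retry_count, busy_pause_ms, queue_full_pct):
    """Build .reg text that sets the three Storport DWORD values under `key`."""
    if not 1 <= queue_full_pct <= 100:
        raise ValueError("QueueFullWaitIoPercentage must be between 1 and 100")
    values = (
        ("BusyRetryCount", busy_retry_count),
        ("BusyPauseTime", busy_pause_ms),
        ("QueueFullWaitIoPercentage", queue_full_pct),
    )
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values:
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD")
        # .reg files store DWORD data as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


# Values suggested in this post: 0xffff retries, 500 ms pause, 50 percent.
print(reg_fragment(STORPORT_KEY, 0xFFFF, 500, 50))
```

Review the generated text and try it on a test system before importing it anywhere in production.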

It is a real pain messing with these registry entries, especially in a production environment, but there are too many potential interactions to calculate precisely what you need. For the retry count, too high a value does nothing except cause a slightly longer delay in reporting errors on a fatal permanent busy condition (really rare). For the pause time, you can cause a few milliseconds of extra delay in detecting the end of a busy condition, which normally means nothing but can add up if the system is really heavily loaded and busy occurs frequently. The queue full parameter won't affect tape performance at all but can affect disk.
CLEB
Valued Contributor

Thanks for all the information Curtis.

The MSA1000 is on an FC1242SR 4Gb PCI-e DC FC HBA.

The tape drive is part of the MSL2024 library which is attached via the SC11Xe which is parallel SCSI.

Perhaps there is something wrong with the registry keys.

The MSA1000 is due to be replaced with a spare MSA70 soon.

I'll have a look at making the changes you recommend. I have an exact hardware copy at another site that I can do testing on.
Curtis Ballard
Honored Contributor

Interesting that the MSA is on a different HBA. Sorry about that. In the registry output everything was on the same "Bus", but I missed that Windows has a fourth qualifier for the card, which didn't show in the registry entries.

If you are willing to experiment a bit, hearing how it goes for you would be very helpful. I have requested quite a bit of testing of this specific configuration, trying to reproduce problems like the ones you have seen with the BusyRetryCount registry entry set to 250, but none of the lab tests have experienced any failures after setting that entry. You obviously have it set, so we might be able to learn something new.

Since you indicate that you have a mirror system outside of production where you can run tests, I'll mention that HP uses a software SCSI analyzer with a client you can download and run to take low-level traces and possibly catch a SCSI bus trace of a failure. That tool is called busTRACE, and the busTRACE capture client on the following page can take traces that we can analyze back at the lab.

http://bustrace.com/downloads/free_utilities.php

If the failure happens at the physical level (HBA or on the wire), that tool won't capture it and we have to use a hardware analyzer, but frequently it captures everything we need.
CLEB
Valued Contributor

Ok I'll grab this tool.

Are there any specific instructions for using it?
CLEB
Valued Contributor

I've got the trace output, but unfortunately the backup job didn't fail, which is just typical.

I filtered on only the LSI adapter, the MSL G3, and the 1840 tape drive.
Curtis Ballard
Honored Contributor

Thanks for the update that you have the tracing utility. It looks like you've figured out how to use it. I'll continue to monitor this thread for any updates. I'd like to figure this one out and appreciate your help.