StoreEver Tape Storage

Re: "Fibre Array Information" active/Inactive

 
SOLVED
Ross Humphryes
Frequent Advisor

"Fibre Array Information" active/Inactive

Hi Folks

We have HP SAN equipment and Fibre-attached (via NSR) MSL5025SL tape libraries.

Many W2k and W2k3 servers are attached to the SAN via KGPSA-xx FC HBAs, and these systems can all see the tape devices.

We back up to tape using NetWorker, but we get quite a lot (too many, in my mind) of the following types of errors:

Source: HPQDLT EventID: 7
The device, \Device\Tape0, has a bad block.

Source: CPQKGPSA EventID: 9
The device, \Device\Scsi\CPQKGPSA1, did not respond within the timeout period.

We can't be unlucky enough to be getting so many faulty tapes. We buy new tapes, generally Maxell SDLT1, so something must be wrong in our setup.

Removable Storage Manager is disabled because Legato NetWorker has its own library controller, and all the drivers appear to be up to date.

I was wondering whether the Compaq/HP Management Agent "Fibre Array Information" needs to be disabled (set to Inactive). Could somebody explain whether the Fibre Array Information agent could be causing the problem and, if so, why?

Thanks very much
Ross
Get in my belly
Scott McIntosh_2
Honored Contributor

Re: "Fibre Array Information" active/Inactive

The Fibre Array module of CIM is well known to cause SAN backup problems and should be disabled on all systems on the SAN. Supposedly, it will one day be fixed so that it no longer floods the SAN by polling tape devices. I don't know whether that has happened yet.

Thanks,
Scott
David Ruska
Honored Contributor
Solution

Re: "Fibre Array Information" active/Inactive

> I was wondering if the Compaq/HP Management Agent "Fibre Array Information" needs to be disabled (Inactive). Could somebody explain if the Fibre Array Information agent could be causing the problem and maybe explain why?

Yes, as mentioned above this is a known issue. Please review the EBS design guide (http://h20000.www2.hp.com/bizsupport/CoreRedirect.jsp?targetPage=http%3A%2F%2Fh200005.www2.hp.com%2Fbc%2Fdocs%2Fsupport%2FSupportManual%2Fc00190922%2Fc00190922.pdf). Look at page 96 under "Known issues".

> We can't be unlucky enough to be getting so many faulty tapes. We buy new tapes and these are generally Maxell SDLT1 tapes, something must be wrong in our setup.

Are the errors happening in the same drive?

If so, perhaps you have one drive with error rate issues. If it's happening in multiple drives, you may have a drive that is damaging tapes. Have you contacted HP support regarding this issue?
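One way to answer that question is to tally the bad-block events per drive from the servers' event logs. Below is a minimal sketch, assuming the System log entries have already been exported as (server, source, event ID, device) tuples; the server names, sample entries and the 80% threshold are hypothetical. Because each server numbers its own tape devices, \Device\Tape0 on one server is not necessarily the same physical drive as \Device\Tape0 on another, so the count is keyed on (server, device), and mapping those keys to physical drives is left out.

```python
from collections import Counter

# Hypothetical exported System-log entries: (server, source, event_id, device).
# Source/EventID values match the errors quoted in the original post.
EVENTS = [
    ("SRV01", "HPQDLT", 7, r"\Device\Tape0"),
    ("SRV01", "HPQDLT", 7, r"\Device\Tape0"),
    ("SRV02", "HPQDLT", 7, r"\Device\Tape0"),
    ("SRV01", "CPQKGPSA", 9, r"\Device\Scsi\CPQKGPSA1"),
]

def bad_blocks_per_drive(events):
    """Count HPQDLT EventID 7 ("bad block") entries per (server, device)."""
    return Counter(
        (server, dev)
        for server, src, eid, dev in events
        if src == "HPQDLT" and eid == 7
    )

def classify(counts, threshold=0.8):
    """Rough heuristic with an assumed threshold: if one drive accounts for at
    least `threshold` of all bad-block events, suspect that drive's error rate;
    otherwise the errors are spread out, which may point to a drive that is
    damaging tapes."""
    total = sum(counts.values())
    if total == 0:
        return "no bad-block events"
    drive, worst = counts.most_common(1)[0]
    if worst / total >= threshold:
        return f"concentrated on {drive}: check that drive's error rate"
    return "spread across drives: look for a drive damaging tapes"

counts = bad_blocks_per_drive(EVENTS)
print(counts)
print(classify(counts))
```

The threshold is only a starting point for deciding where to run the drive assessment test first, not a diagnosis in itself.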

You can try to diagnose this yourself using the SDLT drive assessment test in HP Library & Tape Tools (www.hp.com/support/tapetools).
The journey IS the reward.
Stuart Whitby
Trusted Contributor

Re: "Fibre Array Information" active/Inactive

One follow-up to what looks like a full answer from David: is the device not responding within the timeout period on loads or unloads only? If so, check your load/unload/eject sleep settings within NetWorker. These default to 5 seconds. I know that for DLT, StorageTek recommended 4 minutes for a load sleep to allow the drive to pick up the tape leader, verify block sizing on the tape and do whatever else it needs to do before control is handed over to the application. In my (pretty extensive) experience, 30 seconds has always been plenty, btw.
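To check whether the timeouts really line up with loads and unloads, you can compare the timestamps of the CPQKGPSA EventID 9 entries against the times tapes were loaded or unloaded. Below is a minimal sketch, assuming both sets of timestamps have already been pulled out of the event log and the backup logs; the sample times and the 5-minute window are hypothetical and should be widened to match how long your drives actually take.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps: CPQKGPSA EventID 9 timeouts from the event log, and
# load/unload operations taken from the backup software's own logs.
TIMEOUTS = [datetime(2006, 5, 2, 1, 3), datetime(2006, 5, 2, 3, 40)]
LOAD_UNLOAD = [datetime(2006, 5, 2, 1, 0), datetime(2006, 5, 2, 2, 0)]

def near_load_unload(timeout, ops, window=timedelta(minutes=5)):
    """True if the timeout occurred within `window` after any load/unload."""
    return any(timedelta(0) <= timeout - op <= window for op in ops)

def split_timeouts(timeouts, ops, window=timedelta(minutes=5)):
    """Split timeouts into those just after a load/unload and the rest."""
    near = [t for t in timeouts if near_load_unload(t, ops, window)]
    other = [t for t in timeouts if not near_load_unload(t, ops, window)]
    return near, other

near, other = split_timeouts(TIMEOUTS, LOAD_UNLOAD)
print(f"{len(near)} near a load/unload, {len(other)} elsewhere")
```

If nearly all timeouts land in the first group, raising the sleep settings is worth trying; if many land elsewhere, the sleep settings are probably not the cause.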

The bad block error doesn't look good, though. It is coming straight from the tape drive, and it's something you need to sort out to have a good degree of confidence that recoveries will work. Is the drive cleaned regularly? Is it in an environment with poor air quality? Particulates in the guts of a tape drive aren't great things to have. Also, if you've cleaned it too often, the heads will get worn, and that may be causing this issue. It's also possible that it's just not happy with Maxell tapes. Try Fuji and see if they have the same problem. I've seen a case where a library was perfectly happy using Maxell tapes, but load a brand new Fuji and the cleaning light would come on almost immediately; this could be a similar sort of problem.
A sysadmin should never cross his fingers in the hope commands will work. Makes for a lot of mistakes while typing.
David Ruska
Honored Contributor

Re: "Fibre Array Information" active/Inactive

Yes, good advice from Stuart.

For load times, we've measured DLT8K drives taking over 8 minutes when they needed calibration retries, and seen unloads take up to 15 minutes when they needed to retry tape directory updates.

Also good advice on contamination potentially causing write errors. If the head gets contamination, using new tapes probably won't help. Even using cleaning tapes doesn't do much good because they tend to just move the contamination around in the drive and not really remove it. The DLT cleaning tapes are not much coarser than unpolished new media. They are great for removing oxide/binder staining on the head, but not very good for larger particle contamination (like carpet fibers), and also not good for sticky contamination like smoke or printer ink.

What type of environment are the drives being used in? Run the LTT diagnostic and collect a support ticket for each drive. Post it here and we can review the logs and device analysis results.
Ross Humphryes
Frequent Advisor

Re: "Fibre Array Information" active/Inactive

I would like to say thanks so far to all those that have replied. Your responses are all really helpful and free :-) so I hope I can help in return sometime in the future.

Scott
Our Insight Management Agents software ranges from version 5.50.0.0 to 7.10.0.0, but apparently this issue is addressed in version 7.20.0.0 and later (cheers David). We plan to update to version 7.30.0.0 as a first step and then see what we get, so that if a next step is needed we can take it.
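With agents ranging from 5.50.0.0 to 7.10.0.0, a quick inventory check can list the servers still below 7.20.0.0, the version reported in this thread as addressing the issue. Below is a minimal sketch, assuming a hypothetical mapping of server names to installed agent versions. Dotted versions are compared numerically here, because a plain string comparison would wrongly rank "7.9.0.0" above "7.20.0.0".

```python
# Version reported in the thread as addressing the Fibre Array polling issue.
FIXED_IN = "7.20.0.0"

# Hypothetical inventory: server name -> installed Insight Management Agents version.
INVENTORY = {
    "SRV01": "5.50.0.0",
    "SRV02": "7.10.0.0",
    "SRV03": "7.30.0.0",
}

def parse_version(text):
    """Turn '7.20.0.0' into (7, 20, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def needs_update(version, fixed_in=FIXED_IN):
    """True if the installed version is older than the fixed version."""
    return parse_version(version) < parse_version(fixed_in)

for server, version in sorted(INVENTORY.items()):
    status = "update" if needs_update(version) else "ok"
    print(f"{server}: {version} -> {status}")
```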

David
I would rather update the agents first and then, if the problem persists, disable the "Fibre Channel Information" agent or try that HPUtility.

We have three MSL5026 libraries on this SAN (one charcoal SDLT 160/320 and two beige SDLT 110/220). That's 5 tape devices, and the problem happens on different devices (not the same one each time). I have not contacted HP yet about the possibility of a faulty drive because I am not sure I have one, but I take your point and will investigate once I have re-assessed the situation with regard to the agents.

Stuart
I have looked at the library timeouts in NetWorker (we use 7.1.3) and have adjusted them to 30 seconds, but it doesn't change the situation: we still get EventID 9 from CPQKGPSA, "The device, \Device\Scsi\CPQKGPSA2, did not respond within the timeout period." Interestingly, though, we don't get these errors from all agents, only some, but I think once we have standardised on one agent version and on which agents are active/inactive, we should get something more consistent.

You're right, I don't like the bad block errors either, and to be honest I am not sure whether these are really faulty tapes or malfunctioning drives. The drives are not cleaned on a schedule; there is a cleaning tape in the library that is used automatically when the hardware determines cleaning is required, and I believe this is the correct way to do it. All equipment is in a properly cooled and ventilated server room. I take your point about changing brands and will take it into consideration.

David
Thanks for the info. I'll keep you all posted.

Thanks again everybody.
The Agent software is being updated this week so watch this space.