
Re: Frequent backup media failures

 
skt_skt
Honored Contributor

Frequent backup media failures

Here is what I see in the host's syslog file:


vmunix: (FCP type) response, or its 'Port World-Wide Name' has changed.

I see these errors on six HP-UX 11.11 servers (models rp7420 and rp8420). The FC cards in use are the HP AB379-60001 4Gb Dual Port PCI/PCI-X and the HP A6826-60001 2Gb Dual Port PCI/PCI-X.

I have four different firmware versions among these FC cards: B.11.11.06, B.11.11.09, B.11.11.10 and B.11.11.11 (the latest).

All hosts are connected through FC cables to a TAN (tape area network) switch, a Cisco MDS-9509 with firmware 3.0(2a). A tape library with StorageTek tape drives (9940B = R1.35.412 and 9840C = RG.35.405) is connected to the same switch.
Here is the flow:
host -> FC card -> FC cable -> switch -> FC cable -> tape drive
At the OS level I have PHSS_31326 (Tachyon TL Fibre Channel Driver Patch) and PHKL_34161 (Fibre Channel Mass Storage Patch) installed.

This error message appears once or twice a week. I strongly suspect that the high number of media failures is due to it.

Any suggestions would be appreciated.
mavrick
Regular Advisor

Re: Frequent backup media failures

Hi,
One thing you can do is apply the latest FC patches for your OS.
skt_skt
Honored Contributor

Re: Frequent backup media failures

They are already the latest.
skt_skt
Honored Contributor

Re: Frequent backup media failures

Any suggestions from anyone else?
Andrew Young_2
Honored Contributor

Re: Frequent backup media failures

Are there many tape library resets/restarts?

If possible (I am not familiar with the Cisco FC switches) is there anything in the Cisco log files?

Lastly, are all servers reporting the WWN change at the same time, and are they reporting a change of their local address or of the tape library's? Or, for that matter, of the Fibre Channel switch's?

Regards

Andrew Y
Si hoc legere scis, nimis eruditionis habes
skt_skt
Honored Contributor

Re: Frequent backup media failures

> Are there many tape library resets/restarts?

No.

> If possible (I am not familiar with the Cisco FC switches) is there anything in the Cisco log files?

The Cisco switch did not log anything the last time we had the error.

> Lastly, are all servers reporting the WWN change at the same time, and are they reporting a change of their local address or of the tape library's? Or, for that matter, of the Fibre Channel switch's?

All servers report the error at the same time. Two EMS alerts appear in syslog. The first says that the N-Port WWN has changed (according to the resdata command output), and the second appears once the port comes back online. It says you may have to run ioscan to identify all the devices behind it (again according to the resdata output).
Jun 8 11:22:08 adea147p EMS [7142]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/adapters/events/ql_adapter/0_0_14_1_1" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 468058139 -r /adapters/events/ql_adapter/0_0_14_1_1 -n 468058113 -a
Jun 8 11:36:21 adea147p EMS [7142]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/adapters/events/ql_adapter/0_0_14_1_1" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 468058139 -r /adapters/events/ql_adapter/0_0_14_1_1 -n 468058114 -a
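
For anyone collecting these alerts from several hosts, here is a minimal Python sketch that pulls the timestamp, host, resource and the suggested resdata command out of an EMS notification line. The regular expression is an assumption based only on the two sample lines above, and the name parse_ems_line is hypothetical; other EMS monitors may format their lines differently.

```python
import re

# Layout assumed from the two EMS notification lines quoted above.
EMS_RE = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+EMS\s+\[\d+\]:.*?'
    r'Value:\s+"(?P<severity>[^"]+)"\s+for Resource:\s+"(?P<resource>[^"]+)".*?'
    r'(?P<command>/opt/resmon/bin/resdata\s.*)$'
)


def parse_ems_line(line):
    """Return a dict of fields for an EMS notification line, or None if it does not match."""
    match = EMS_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["command"] = fields["command"].strip()
    return fields


if __name__ == "__main__":
    sample = (
        'Jun 8 11:22:08 adea147p EMS [7142]: ------ EMS Event Notification ------ '
        'Value: "CRITICAL (5)" for Resource: "/adapters/events/ql_adapter/0_0_14_1_1" '
        '(Threshold: >= " 3") Execute the following command to obtain event details: '
        '/opt/resmon/bin/resdata -R 468058139 -r /adapters/events/ql_adapter/0_0_14_1_1 '
        '-n 468058113 -a'
    )
    print(parse_ems_line(sample))
```

Sorting the parsed events from all hosts by timestamp makes it easier to check whether every server really raises the alert within the same minute.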

Only one switch is involved here. It is common to all the servers and the tape library, and everything is connected to it with FC cables. The drives are grouped on the tape library side (5 drives, 5 drives, and 20 drives in another group).

Regards

Andrew Young_2
Honored Contributor

Re: Frequent backup media failures

Hi.

This is often an indication of either a problem on the fibre link to the tape library, or the fibre card on the library having a fault or the library being reset.

Essentially it means the tape library is disappearing off the FC network. You should be able to replicate the symptoms by power cycling your tape library. Diagnosing where the fault lies is going to be difficult though. It could be a marginal LED on the library fibre card, a cable fault, or an electronic fault within the Cisco switch or the tape library itself.

Regards

Andrew Y
Si hoc legere scis, nimis eruditionis habes
Bill Hassell
Honored Contributor

Re: Frequent backup media failures

This is a SAN switch problem, especially evident with newer HBA cards in HP-UX. If your SAN switch is configured for public loop or arbitrated loop topology, that is the problem -- it is not supported. Also, SAN switch firmware is often forgotten...it needs updating at least once every year or so, especially when newer HBA cards are connected to old switches.


Bill Hassell, sysadmin
skt_skt
Honored Contributor

Re: Frequent backup media failures

Young,

I can try to simulate the issue as you suggested.

Hassell,

We are already covered in terms of HBA firmware (mentioned earlier in the thread). We have upgraded the switch firmware once, and we are upgrading to version 3.1(2a) today, but I am not sure it will help.

"show topology" does not return anything for me.
How else can I see the topology?
Bill Hassell
Honored Contributor

Re: Frequent backup media failures

You'll need to talk to your SAN administrator. A switch can be set up with public or private loop or as a Fabric. Each switch has very different commands to display the method used. At the very least, get the Cisco switch manual (download if needed) and it should discuss the topology choices.


Bill Hassell, sysadmin
skt_skt
Honored Contributor

Re: Frequent backup media failures

We use fabric topology:

fc1/30 is up
Port description is STK9940 L1P06D06
Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
Port WWN is 20:1e:00:0d:28:6f:1f:c0
Admin port mode is F
snmp link state traps are enabled
Port mode is F, FCID is 0x640007
Port vsan is 2
Speed is 2 Gbps
Transmit B2B Credit is 3
Receive B2B Credit is 12
Receive data field Size is 2112
Beacon is turned off
5 minutes input rate 2400 bits/sec, 300 bytes/sec, 3 frames/sec
5 minutes output rate 2272 bits/sec, 284 bytes/sec, 6 frames/sec
5502195 frames input, 261554036 bytes
0 discards, 0 errors
0 CRC, 0 unknown class
0 too long, 0 too short
332450524 frames output, 694652822120 bytes
0 discards, 0 errors
0 input OLS, 0 LRR, 0 NOS, 0 loop inits
0 output OLS, 0 LRR, 0 NOS, 0 loop inits
12 receive B2B credit remaining
3 transmit B2B credit remaining
Andrew Young_2
Honored Contributor

Re: Frequent backup media failures

Hi.

If I am reading that report correctly there are no errors on the FC switch or on the Fabric itself.
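
As a rough way to repeat that check across many ports, here is a small Python sketch that scans pasted interface output for non-zero error counters. The list of counter names is taken only from the report quoted above, the function name nonzero_counters is hypothetical, and other switch firmware versions may label counters differently.

```python
import re

# Counter names as they appear in the port report quoted above (assumed layout).
WATCHED = ("discards", "errors", "CRC", "unknown class", "too long", "too short",
           "input OLS", "output OLS", "LRR", "NOS", "loop inits")

# Matches "<number> <name>" items that are separated by commas or end the line.
COUNTER_RE = re.compile(r'(\d+)\s+([A-Za-z][A-Za-z0-9 ]*?)\s*(?:,|$)')


def nonzero_counters(show_interface_text):
    """Sum the watched counters over all lines and return only the non-zero ones."""
    totals = {}
    for line in show_interface_text.splitlines():
        for count, name in COUNTER_RE.findall(line.strip()):
            if name in WATCHED:
                totals[name] = totals.get(name, 0) + int(count)
    return {name: total for name, total in totals.items() if total}


if __name__ == "__main__":
    report = """5502195 frames input, 261554036 bytes
0 discards, 0 errors
0 CRC, 0 unknown class
0 too long, 0 too short
0 input OLS, 0 LRR, 0 NOS, 0 loop inits"""
    print(nonzero_counters(report) or "no non-zero error counters")
```

Input and output counters with the same name (for example discards) are summed together here, so a non-zero result only says that the port deserves a closer look, not which direction is affected.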

Therefore the first place I would look is at your StorageTek tape library itself. I am unfortunately not familiar with that unit.

Regards

Andrew Y
Si hoc legere scis, nimis eruditionis habes
skt_skt
Honored Contributor

Re: Frequent backup media failures

I found an article stating this:

"When a system first establishes a login (PLOGI) session with a target, an authentication procedure takes place. This authentication ensures that the system is talking to the correct device, avoiding data corruption due to user accidentally connecting another device at the same nport_id. This applies to the devices connected to the TL adapter only, Tachyon adapters do not go through the same level of authentication"

Does this apply to the following dual-port cards? Do the dual-port cards go through the same kind of authentication as TL adapters?

HP AB379-60001 4Gb Dual Port PCI/PCI-X
HP A6826-60001 2Gb Dual Port PCI/PCI-X

Bill Hassell
Honored Contributor

Re: Frequent backup media failures

> Does this apply to the following dual-port cards? Do the dual-port cards go through the same kind of authentication as TL adapters?
>
> HP AB379-60001 4Gb Dual Port PCI/PCI-X
> HP A6826-60001 2Gb Dual Port PCI/PCI-X

Yes. You will see these authentication errors in syslog messages:

vmunix: 1/0/4/1/0: Device at device id 0x31d26 has disappeared from Name Server GPN_FT
vmunix: (FCP type) response, or its 'Port World-Wide Name' has changed.
vmunix: device id = loop id, for private loop devices
vmunix: device id = nport ID, for fabric/public-loop devices
vmunix: System won't be able to see LUNs behind this port

vmunix: 0/0/4/1/0: Device at device id 0x31d26 is back in Name Server GPN_FT (FCP type)
vmunix: response, and its 'Port World-Wide Name' remains the same as
vmunix: original.
vmunix: device id = loop id, for private loop devices
vmunix: device id = nport ID, for fabric/public-loop devices
vmunix: System will be able to see LUNs behind this port
vmunix: (might need to run 'ioscan' first).

The new 2 port cards do indeed have additional authentication and may report these types of errors where older cards will not.
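
To see how long a device stays away each time, the "disappeared" and "is back" messages can be paired by host, hardware path and device id. The Python sketch below assumes the messages carry the usual syslog "Mon DD HH:MM:SS host" prefix and start on one line as in the samples above. The function name outages and the year argument are hypothetical (syslog lines carry no year, so a placeholder is used only to build a datetime).

```python
import re
from datetime import datetime

# Assumed syslog layout: "<Mon DD HH:MM:SS> <host> vmunix: <hw path>: Device at device id ..."
FCD_RE = re.compile(
    r'^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+vmunix:\s+'
    r'(?P<hw_path>[\d/]+):\s+Device at device id\s+(?P<device_id>0x[0-9a-fA-F]+)\s+'
    r'(?P<state>has disappeared from|is back in)\s+Name Server'
)


def outages(lines, year=2000):
    """Pair 'disappeared' and 'is back' messages and return (key, seconds_away) tuples."""
    gone = {}
    result = []
    for line in lines:
        match = FCD_RE.match(line.strip())
        if match is None:
            continue
        when = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
        key = (match["host"], match["hw_path"], match["device_id"])
        if match["state"].startswith("has disappeared"):
            gone[key] = when
        elif key in gone:
            # Device came back: report how long it was missing.
            result.append((key, (when - gone.pop(key)).total_seconds()))
    return result
```

An "is back" message without an earlier "disappeared" message for the same key is ignored, and events that straddle a year boundary would need extra handling.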


Bill Hassell, sysadmin
skt_skt
Honored Contributor

Re: Frequent backup media failures

I just noticed that these messages appear wherever I have the fcd driver (/dev/fcd0). None of the cards using the td driver report the same messages.

Here is the error from the fcd driver:
Jun 25 11:26:27 adea147p vmunix: 0/0/14/1/1: Device at device id 0x640405 is back in Name Server GPN_FT (FCP type)
Jun 25 11:37:23 adea147p vmunix: 0/0/14/1/1: Device at device id 0x64000a is back in Name Server GPN_FT (FCP type)

From the td driver:

Jun 22 10:06:53 adeda60p vmunix: 1/2/0/0: Unable to access previously accessed device at nport ID 0x640008.

I assume this is down to the behaviour of the different drivers; they may be reporting the same error in two different ways. I simulated the error by powering off one of the tape drives connected to the cards that use the fcd driver (HP AB379-60001 4Gb Dual Port PCI/PCI-X Fibre Channel Adapter; HP A6826-60001 2Gb Dual Port PCI/PCI-X Fibre Channel Adapter).

Does anyone else agree with this?
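
One way to relate the device ids in these messages to the switch side: a fabric N_Port ID is a 24-bit value whose three bytes are conventionally read as domain, area and port. The Python sketch below splits an id that way; the function name decode_fcid is hypothetical, and how the area and port bytes map to a physical switch port depends on the switch vendor, so treat the result as a hint only.

```python
def decode_fcid(fcid):
    """Split a 24-bit Fibre Channel address into its domain, area and port bytes."""
    value = int(fcid, 16) if isinstance(fcid, str) else int(fcid)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit FC address: {fcid!r}")
    return {
        "domain": (value >> 16) & 0xFF,  # identifies the switch in the fabric
        "area": (value >> 8) & 0xFF,
        "port": value & 0xFF,
    }


if __name__ == "__main__":
    # Device ids taken from the syslog messages in this thread.
    for fcid in ("0x640405", "0x64000a", "0x640008"):
        print(fcid, decode_fcid(fcid))
```

All three ids decode to domain 0x64 (100), which fits the single tape switch described earlier.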
Bill Hassell
Honored Contributor

Re: Frequent backup media failures

The error messages are exactly what I see with two different machines talking to the same tape SAN switch. The older Tachyon XL2 cards don't say anything when I unplug a tape drive but the A6826A dual channel cards are constantly monitoring the connections.


Bill Hassell, sysadmin
skt_skt
Honored Contributor

Re: Frequent backup media failures

The error is reported whenever a tape drive goes offline. The dual-port cards use the fcd driver and the other cards use the td driver. The two drivers react differently, and hence we are NOT seeing the same errors on all the servers.

The fcd driver reports that the drive with FCID 0xyyyyyy has disappeared and then that it is back online. Thanks to everyone who participated in the discussion.
skt_skt
Honored Contributor

Re: Frequent backup media failures

Closing with the current information. Thanks to all.