Operating System - HP-UX

Help with fcmsutil stat output

 
Rick Greene_1
Frequent Advisor


We're having some LVM issues with drives on one of two channels in a volume group. All LUNs are from an XP24000 disk array, and each LUN is visible on both channels. We have moved the server from one switch to another in our SAN fabric (soft zoning employed), so at this stage it seems likely to be an OS or server hardware issue.

I traced the problems to a specific fibre channel card, and see the following entries in the "fcmsutil stat" output for that card:

CE_FCP_FREEZE Request 6
CE_FCP_UNFREEZE Request 6
CE_FCP_UNFREEZE recvd in CS_FCP_FROZEN 6
ERQ/FCP Assists Resumed 6

These numbers are growing in sync, and are the only stats that are growing (aside from packet counts).
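
For anyone wanting to confirm which counters are moving between two samples, here is a minimal Python sketch that compares two saved copies of the "fcmsutil stat" output. It assumes each counter line is a name followed by an integer, as in the excerpt above; the function names are my own, and the real output may have header lines that this simply ignores.

```python
import re

# Assumption: a counter line is "<name> <integer>", as in the excerpt above.
COUNTER_LINE = re.compile(r"^\s*(?P<name>.+?)\s+(?P<value>\d+)\s*$")


def parse_counters(text):
    """Map counter name -> integer value for every line that looks like a counter."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def growing_counters(before, after):
    """Return {name: growth} for counters whose value rose between the two samples."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {
        name: new[name] - old[name]
        for name in new
        if name in old and new[name] > old[name]
    }


if __name__ == "__main__":
    # The four lines quoted in this post, at 6 and later at 1269.
    names = [
        "CE_FCP_FREEZE Request",
        "CE_FCP_UNFREEZE Request",
        "CE_FCP_UNFREEZE recvd in CS_FCP_FROZEN",
        "ERQ/FCP Assists Resumed",
    ]
    first = "\n".join(f"{n} 6" for n in names)
    second = "\n".join(f"{n} 1269" for n in names)
    for name, growth in growing_counters(first, second).items():
        print(name, growth)
```

With the two values reported in this thread (6 and later 1269), each of the four counters comes back with a growth of 1263, which is what "growing in sync" looks like in numbers.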

Can anyone tell me what these entries mean?
cnb
Honored Contributor

Re: Help with fcmsutil stat output

Hi,

Check the threads below and look in the EMS event log to see if anything is being generated there. If so, the entries will have an EMS event number assigned that you might be able to look up here:

http://docs.hp.com/en/diag/ems/dm_TL_adapter.htm#all


http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1272920621984+28353475&threadId=1161813

http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1272920682753+28353475&threadId=384167

Make sure you have the latest FC drivers, adapter firmware and O/S patches installed.

Rgds,

cnb
Honored Contributor

Re: Help with fcmsutil stat output

Rick,

This might be a better link for any FCMS events:

http://docs.hp.com/en/diag/ems/dm_FCMS_adapter.htm

Rgds,

Rick Greene_1
Frequent Advisor

Re: Help with fcmsutil stat output

The only two EMS events for this particular card during the time of the issues were ones I believe I generated: one by running "fcmsutil reset" and the other when I moved the card's SAN connection to a different cable and switch.

Since starting this posting, the entries listed above have increased to 1269, but we have not seen any further LVM hits, so I'm now suspecting either the cable or port that we were on.

Still would like to understand what the FCP UN/FREEZE and FCP Assists values are about.
chris huys_4
Honored Contributor

Re: Help with fcmsutil stat output

Hi Rick,

There are two likely reasons why these counters can increase.

1. Some bad component between the server and the disk array.

If this is the case, the "bad rx char" counter in fcmsutil /dev/ stat should also increase until it reaches 255. After that, the clear_stats option must be used to reset the counter so that it can increase again.

2. I/O performance issues.

Every SCSI I/O requested by the server must be returned within a set amount of time: 10 seconds for the FC HBA "scsi" layer.

If this time is exceeded, the FC HBA driver increases the timed-out I/O counter (fcmsutil /dev/ devstat all), then briefly freezes and unfreezes all I/O operations on the FC HBA card, which increases the CE_FCP_FREEZE/CE_FCP_UNFREEZE counters (fcmsutil /dev/ stat). Next the "ABTS sent" counter increases, indicating that an abort command was sent to the disk array's FC HBA to abort the timed-out I/O in question. Normally, once the I/O is aborted, the disk array sends back an acknowledgement that the abort was successful, and the "RRQ replies recvd" counter increases.
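
The two cases above can be turned into a rough triage rule. This is only a sketch of the reasoning in this post, not anything the driver provides: the function names are made up, the counters are passed in as plain name-to-value dictionaries from two samples, and the exact spelling of the "bad rx char" line in the real output is an assumption (the match is a case-insensitive substring).

```python
# Per the explanation above, "bad rx char" stops counting at 255.
SATURATION = 255


def find(counters, fragment):
    """Value of the first counter whose name contains fragment (case-insensitive), else None."""
    for name, value in counters.items():
        if fragment.lower() in name.lower():
            return value
    return None


def triage(before, after):
    """Classify growth between two samples into the two cases described above."""
    bad_old = find(before, "bad rx char")   # assumed spelling of the counter name
    bad_new = find(after, "bad rx char")
    frz_old = find(before, "CE_FCP_FREEZE")
    frz_new = find(after, "CE_FCP_FREEZE")

    if bad_new is not None and bad_new >= SATURATION:
        # A saturated counter cannot show growth; clear_stats is needed first.
        return "bad rx char saturated: clear stats and sample again"
    if bad_old is not None and bad_new is not None and bad_new > bad_old:
        return "case 1: bad component suspected"
    if frz_old is not None and frz_new is not None and frz_new > frz_old:
        return "case 2: I/O timeouts suspected"
    return "no growth in the counters examined"
```

Feeding it two samples where only the freeze counters moved returns the "case 2" answer; if "bad rx char" is also climbing, case 1 wins, since a bad link can cause timeouts as a side effect.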

Running "sar -d 1 1000" while the problem is occurring will give an indication of I/O performance issues.

Also, /var/adm/syslog/syslog.log and /var/stm/logs/os/logX.raw should contain relevant messages.

The OS version and hardware model would also be useful information.

Greetz,
Chris
Rick Greene_1
Frequent Advisor

Re: Help with fcmsutil stat output

Well, the issue just happened again, so it is not the cable or port.

/# uname -a
HP-UX corux42 B.11.11 U 9000/800 184404610 unlimited-user license
/# model
9000/800/L3000-8x

Syslog is only showing the LVM related messages (PV link lost, PV link recovered).

SAR showed a huge spike in avserv time for one of the LUNs off this controller. I think I just got lucky that I was watching at the time the problem occurred. It showed 100% busy and an avque of .5; r+w/s and blks/s were very low, but the avserv was 48000!

Watching it live now, the %busy is bouncing up and down, and the avserv is going between 10 and 120 (roughly).

We think we might have a controller going sour, so we're going to muck with the alternate paths in LVM to send all the primary I/O down the good channel.
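
Rather than relying on luck, a small Python sketch like the following could scan saved "sar -d" output for that kind of spike. It assumes the usual column order ending in device, %busy, avque, r+w/s, blks/s, avwait, avserv, and that avserv is in milliseconds, so the 10 second timeout mentioned earlier in the thread becomes a threshold of 10000. The device names and the avwait values in the sample are made up.

```python
def slow_devices(sar_text, threshold_ms=10000.0):
    """Return (device, avserv) pairs whose avserv is at or above threshold_ms."""
    hits = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Assumption: a data row ends with
        #   device %busy avque r+w/s blks/s avwait avserv
        # and may or may not start with a timestamp.
        if len(fields) < 7:
            continue
        device = fields[-7]
        try:
            values = [float(field) for field in fields[-6:]]
        except ValueError:
            continue  # header row or other non-numeric line
        avserv = values[-1]
        if avserv >= threshold_ms:
            hits.append((device, avserv))
    return hits


if __name__ == "__main__":
    # Hypothetical sample; only the 100% busy, 0.5 avque and 48000 avserv
    # figures come from the observation above.
    sample = """\
12:00:01   device   %busy   avque   r+w/s  blks/s  avwait  avserv
12:00:02   c4t0d1  100.00    0.50       1       8    0.00 48000.00
           c6t0d1   35.00    0.50      40     640    0.00    12.00
"""
    for device, avserv in slow_devices(sample):
        print(device, avserv)
```

Only the row with the 48000 avserv is reported; rows in the 10 to 120 range stay well under the threshold.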
chris huys_4
Honored Contributor

Re: Help with fcmsutil stat output

Hi Rick,

I'm betting it's an I/O performance issue.

The 48000 avserv is from an I/O that timed out and got aborted.

Either one of the XP24000 front-end FC HBAs can't deliver I/O fast enough to all the hosts that have LUNs on that front-end FC HBA, or the FC HBA of the L3000 can't take in the I/Os as fast as the XP24000 is delivering them.

In both cases, spreading the I/O load over the different FC HBAs of the L3000, as you seem to be doing, might solve the problem.

Greetz,
Chris
Rick Greene_1
Frequent Advisor

Re: Help with fcmsutil stat output

Chris-

We were spread across two fibre channel cards. Things got much better as soon as we took the one card out of the picture.

At this point, we are assuming it is a card that is going bad, and are planning to replace it.