Event id 9 (kgpsa timeout) causes W2K to crash

 
BR765533
Advisor

Hi,
I've got an EVA3000-based SAN environment with 7 servers connected through FCA2101 HBAs, and an MSL5000 for SAN-based backups. I'm using Data Protector 5.10. All servers regularly log event ID 9 errors in Event Viewer during the day, and the rate increases during backup sessions. Most backup sessions fail, reporting device I/O errors.

I've configured zoning with a backup zone and a data (EVA) zone. I've tried kgpsa driver versions 5-4.82a14 and 5-4.82a16 with the same results. I've confirmed that RESET_TPRLO is set to 1.
To make the situation more dramatic, a couple of servers (the most critical ones) keep crashing with blue screens immediately after a series of event ID 9 errors.


Thanks

Ricardo Sousa
17 replies
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Someone told me that HP recommends setting RESET_TPRLO to 2 when using driver version a14 or later for SAN-based backups. Can someone from HP confirm this? What is the correct setting? Will this setting help eliminate the error?
I've read that this can be caused by a device on the SAN issuing reset commands or polling devices. I think the zones I've configured should prevent this.
How can I verify whether this is happening in my SAN?

Thanks
Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
please check the parameter spelling: it should be 'ResetTPRLO'. Set it to 2. Also check the EmulexOption value: try the default 0xAA00, and also 0xBA00.
Please log into the switch the library is connected to and run the 'porterrshow' and 'switchshow' commands. Is every device logged into the fabric (F-Ports)? Are there excessive error counts (values in k or m) in porterrshow? Which firmware is the library running?
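To answer the question of whether every device has logged into the fabric as an F-Port, a saved switchshow capture can be scanned programmatically. The Python sketch below assumes the older Fabric OS port-line layout (`port  0: id N2 Online F-Port <WWN>`); newer firmware prints a column-based table, so the pattern would need adjusting. The function names are hypothetical.

```python
import re

# Assumed port-line layout (older Fabric OS switchshow), e.g.:
#   port  0: id N2 Online   F-Port 50:00:1f:e1:50:00:7e:28
# Newer firmware uses a column-based layout and would need a different pattern.
PORT_LINE = re.compile(r"^\s*port\s+(\d+):\s*(.*)$")


def parse_switchshow(text):
    """Return {port_number: (state, port_type)}; port_type is None if absent."""
    ports = {}
    for line in text.splitlines():
        m = PORT_LINE.match(line)
        if not m:
            continue  # skip switchName, switchState and other header lines
        tokens = m.group(2).split()
        state = tokens[2] if len(tokens) > 2 else None
        port_type = next((t for t in tokens if t.endswith("-Port")), None)
        ports[int(m.group(1))] = (state, port_type)
    return ports


def non_fabric_ports(ports):
    """Online ports that did not log in as F-Port (device) or E-Port (ISL)."""
    return sorted(p for p, (state, ptype) in ports.items()
                  if state == "Online" and ptype not in ("F-Port", "E-Port"))
```

Any port it returns (for example a loop device showing L-Port) is worth checking against the device that is supposed to be attached there.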
Eugeny
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

The library firmware is 4.14.~
The porterrshow and switchshow outputs are attached.
What exactly will setting the EmulexOption and ResetTPRLO (now spelled correctly) values you suggested do? I've read in a customer advisory that for driver version 9 and above, the setting should be 1. It also says that in version 14 and above the default setting is 1, which I can confirm.
If I set these parameters, is there any chance of losing the connection to the disk subsystem (EVA3000)? I'm asking because this is a production environment, and a critical one.

Obviously, I'm going to try it on some less critical machines first, but I would like to know so I can be prepared for the worst, just in case.
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
I understand. In that case I would suggest you open a case with HP and have the whole SAN and all devices analyzed. I don't think it's a good idea to experiment with a production environment without being sure where the problem is. If you have a test SAN, though, you can experiment on it.
Eugeny
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Have you analyzed the files I attached? You didn't answer my question about what changing those parameters would affect.
Regarding EmulexOption, I found that some machines have it set to 0xAA00 and others to 0xBA00. Should I try one, watch for results, and then try the other?

I set the values you suggested (2 and 0xAA00) on three machines, one of them the backup server. Afterwards I tested the environment by starting a backup session on one of those servers. Although the backup went well, I still got a KGPSA timeout error. For testing purposes, is this enough, or should I set the parameters on all KGPSAs?

Thank you for your interest and replies.

Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Yes, set these options on all the SAN servers, because one can affect the others. Change 0xAA00 to 0xBA00 and try again. Remember that a reboot is required after you change driver parameters.
Concerning the logs: there are many enc out errors, but no CRC or other errors (in the columns before 'enc out'). You need to run a backup and then compare the LLI numbers to see whether they're increasing.
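The before/after comparison described above can be sketched in Python: capture porterrshow once before the backup and once after, then report only the counters that grew. The column order below is an assumption based on a common porterrshow header and should be checked against the header your switch prints; the function names are hypothetical.

```python
import re

# Assumed column order; verify against the header of your switch's porterrshow.
DEFAULT_COLUMNS = ("frames_tx", "frames_rx", "enc_in", "crc_err", "too_shrt",
                   "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail",
                   "loss_sync", "loss_sig", "frjt", "fbsy")

SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
ROW = re.compile(r"^\s*(\d+):\s+(.*)$")  # data rows start with "<port>:"


def to_number(token):
    """Convert abbreviated counts such as '57', '7.0k' or '1.2m' to floats."""
    token = token.lower()
    if token and token[-1] in SUFFIX:
        return float(token[:-1]) * SUFFIX[token[-1]]
    return float(token)


def parse_porterrshow(text, columns=DEFAULT_COLUMNS):
    """Return {port: {counter_name: value}}; header and separator lines are skipped."""
    counters = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            values = [to_number(v) for v in m.group(2).split()]
            counters[int(m.group(1))] = dict(zip(columns, values))
    return counters


def increased(before, after):
    """Return {port: {counter: delta}} for every counter that grew between snapshots."""
    deltas = {}
    for port, new in after.items():
        old = before.get(port, {})
        grown = {name: new[name] - old.get(name, 0.0)
                 for name in new if new[name] > old.get(name, 0.0)}
        if grown:
            deltas[port] = grown
    return deltas
```

Because porterrshow abbreviates large counts (for example 1.2m), small increases on a port whose counters are already high may not show up in the rounded values.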
Eugeny
Jeff Drummond
Occasional Visitor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

ResetTPRLO must be set to 2 when using the a14 or a16 driver with EBS (Enterprise Backup Solution) attached to the SAN, per HP Support.
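The rule above, together with the EmulexOption values tried in this thread, can be written down as a small checklist. This is a hypothetical Python helper, not an HP tool; it assumes driver strings shaped like 5-4.82a14 and, following the earlier report in this thread that the recommendation applies to a14 and higher, treats every revision from a14 upward the same way.

```python
def check_kgpsa_settings(driver_version, ebs_on_san, reset_tprlo, emulex_option):
    """Return warnings for KGPSA settings that contradict the advice in this thread.

    driver_version is expected to look like '5-4.82a14' (assumed format).
    """
    _, sep, rev = driver_version.rpartition("a")
    if not sep or not rev.isdigit():
        raise ValueError(f"unrecognized driver version: {driver_version!r}")
    revision = int(rev)

    warnings = []
    # Advice relayed from HP Support: a14 and later with EBS on the SAN -> ResetTPRLO=2.
    if ebs_on_san and revision >= 14 and reset_tprlo != 2:
        warnings.append(
            f"ResetTPRLO is {reset_tprlo}; expected 2 for driver a{revision} with EBS on the SAN")
    # The two EmulexOption values suggested in this thread.
    if emulex_option not in (0xAA00, 0xBA00):
        warnings.append(
            f"EmulexOption is 0x{emulex_option:04X}; values tried here were 0xAA00 and 0xBA00")
    return warnings
```

For example, `check_kgpsa_settings("5-4.82a14", True, 1, 0xAA00)` returns a single warning about ResetTPRLO, which matches the original poster's starting configuration.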

Thanks!!
Jeff
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Hi,

I've set the values as suggested on all servers and rebooted them afterwards. I'm still getting the kgpsa timeout errors, followed by device I/O errors on the backups.
Do you have any other suggestions?
Have you ever seen this kind of error? Were you able to solve it?
I noticed from your profile that you are an HP employee working at the HP Competence Center. That's in Böblingen, right? I was there last July for pre-sales training.
You have lots of equipment there for testing. Has this situation been tested and solved before?

One other thing: you told me to set ResetTPRLO to 2. Shouldn't that information be available, or even supplied via a customer advisory?

I hope you have some more suggestions, because I'm running out of ideas, and HP Portugal hasn't been able (yet) to provide me with ANY suggestions, only questions.

Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
I'm doing my best to help you, but unfortunately the information provided is insufficient. A lot of things need to be checked: EVA firmware, settings (are the LUNs presented to the hosts set to Windows 2000 mode/behavior?), switch firmware, FC HBA firmware, MSL firmware, etc.
If the local support center is asking questions, please answer them as soon as possible so they can troubleshoot quickly and effectively. You can ask the local team to escalate and involve the competency center.
Eugeny
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
Here's the number to call:
808 208 113, option 3 (local call)
Try it, or email 'Suporte.Portugal@hp.com'.
Eugeny
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Hi Eugeny,

I've asked support to escalate the situation to the competency center, and I was told this had already been done. The situation is being evaluated by the Ireland Competency Center.
In case you have time to take a look, I'm attaching all the information I have at the moment. I'll post anything else I remember or gather, and any other information you consider necessary.

Thanks for all your help and availability.

Ricardo Sousa
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Eugeny,

You will find some more information that I've consolidated in the attached file.

If you think you may need more data, please ask.

Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

A colleague of mine who knows your equipment better than I do is dealing with it.
I've taken a quick look, and it seems that you upgraded the drivers but did not upgrade the FCA2101 firmware and bootware. Why? In addition, I see that ID 9 events are generated only for the segment to which the MSL is connected...
Eugeny
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Hi,

I gather from your remarks that you have spoken with your colleagues in charge of this situation. Have you shared the case? Have you reached any new conclusions or theories about the problem?
Regarding your other remarks: no, I haven't upgraded the drivers. From the start, the HBAs were installed with kgpsa driver version 5-4.82a14. I upgraded only a couple of servers to kgpsa driver version 5-4.82a16, at an early stage of my attempts to solve the problem, as I mentioned in my earlier posts. I didn't upgrade the firmware because the release notes mention nothing related to my problem, and because the other threads I've read about it date from well before that firmware version was released. I haven't ruled out a firmware upgrade; I simply haven't gotten to it yet. Do you think that may be the solution?

And yes, you are right: the timeouts only occur on the MSL segment. I thought I had mentioned that before; I noticed it from the start. Moreover, the problem only started after the MSL arrived and was connected to the SAN.

Do you happen to have any new thoughts or ideas to share?

Again I would like to thank you for your help, time and availability.

Ricardo Sousa
CA1034283
Occasional Visitor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

I have an MSA1000 with three DL360 G3 servers and one SP750 workstation, all with FCA2101s. First warning: don't upgrade the firmware. I upgraded and had to replace all the HBAs to get backups running again, using Backup Exec v9 (I also tried v8.6). I tried all the settings changes and haven't seen much change. I also tried zoning versus no zoning and saw very little difference. I've called support 7 or 8 times over 2 months and got some help. The latest suggestion was to set the cache to 0% read and 100% write (10/9/2003). Support blamed SP3 for SQL as the reason for that. My 3 servers were averaging a blue screen every 3 days. I haven't had a server crash since I changed the setting 4 days ago, a new record.
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

My problem seems to have disappeared. The only changes made so far were the following:
1. Disabled the Fibre Information Agents in Insight Manager.
2. Port 0 on both FC switches showed huge numbers of errors. The SAN Management Appliance (SMA) was/is connected to that port. Since the SMA's FC HBAs are 1 Gb, that port's speed was changed from the default (negotiate) to a fixed 1 Gb.
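On a Brocade switch, a fixed port speed like the one in step 2 is set per port. The Python sketch below only builds the command strings from a port-to-speed map; it assumes the Fabric OS portcfgspeed command with speed levels 0 (auto-negotiate), 1 (1 Gbit/s) and 2 (2 Gbit/s), and the exact argument syntax should be checked against the command reference for the installed firmware. The helper name is hypothetical.

```python
# Speed levels assumed from Brocade Fabric OS portcfgspeed:
# 0 = auto-negotiate, 1 = 1 Gbit/s, 2 = 2 Gbit/s. Verify for your firmware.
SPEED_LEVEL = {"auto": 0, "1G": 1, "2G": 2}


def portcfgspeed_commands(port_speeds):
    """Build one portcfgspeed command per port from {port: '1G' | '2G' | 'auto'}."""
    commands = []
    for port in sorted(port_speeds):
        speed = port_speeds[port]
        if speed not in SPEED_LEVEL:
            raise ValueError(f"unsupported speed {speed!r} for port {port}")
        commands.append(f"portcfgspeed {port} {SPEED_LEVEL[speed]}")
    return commands
```

For the SMA port described in step 2, `portcfgspeed_commands({0: "1G"})` returns `["portcfgspeed 0 1"]`. Mapping every port to "2G" would express the fix reported in the last reply of this thread.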

It's not absolutely clear which of the changes did the trick, but the fact is that the errors are gone, and my customer is now running a stable SAN environment.
There are still some open questions, and I know for a fact that HP support is looking into them, but those are minor "problems" and fine-tuning issues.

I'll post any further developments as they arise.

I would like to thank HP support and Eugeny in particular for all help in troubleshooting and solving this issue.
Ricardo Sousa
CA1044853
Occasional Visitor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Hello all,

We had the same situation. We have an MSA1000 with 8-port Brocade switches, Secure Path 3.0B for NetWare and 4.0C for Windows. We use FCAL2101 HBAs in 4 Windows servers and FCAL2112 HBAs for NetWare.

We experienced exactly the same situation, and our solution was to change all switch ports from negotiate ("N") to a fixed 2G, which solved the whole problem.

As for the other settings discussed in this thread, we have:

Driver Version: 5-4.82a16
BIOS Emulex:
RESET_TPRLO: 2
EmulexOption: 0xBA00

(All the above was set by default, no changes from our side)

Thanks to everyone for all the useful information.

Oliver