Event id 9 (kgpsa timeout) causes W2K to crash

 
BR765533
Advisor

Hi,
I have an EVA3000-based SAN environment with 7 servers connected through FC2101 HBAs, and an MSL5000 for SAN-based backups. I'm using Data Protector 5.10. On all servers I'm getting event id 9 errors in the Event Viewer regularly during the day, and the rate increases during backup sessions. The backup sessions fail most of the time, reporting device I/O errors.

I have zoning configured, with a backup zone and a data (EVA) zone. I've tried kgpsa driver versions 5-4.82a14 and a16 with the same results. I've confirmed that RESET_TPRLO is set to 1.
To add to the drama, a couple of servers (the most critical ones) keep crashing (blue screens) immediately after a series of event id 9 errors.


Thanks

Ricardo Sousa
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Someone told me that HP recommends setting RESET_TPRLO to 2 when using version 14 or higher of the driver for SAN-based backups. Can someone from HP confirm this? What is the correct setting? Will this setting help eliminate the error?
I've read that this can happen when someone on the SAN is issuing reset commands or polling devices. I thought the zones I've configured would obviate this problem.
How can I verify whether this is happening in my SAN?

Thanks
Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
please check the parameter spelling: it should be 'ResetTPRLO'. Set it to 2. Also check the EmulexOption value: try the default 0xAA00 and 0xBA00.
Please log in to the switch the library is connected to and run the 'porterrshow' and 'switchshow' commands. Is everyone logged in to the fabric (F-ports)? Are there any excessive error counts (k's, m's) in porterrshow? Which firmware version does the library have?
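As a rough aid for the "k's, m's" check, here is a minimal Python sketch that turns abbreviated counters into plain numbers and flags the large ones. It assumes you copy the counter values by hand into a dictionary; the names `parse_counter`, `excessive`, and the threshold of 1000 are made up for illustration and are not part of any switch tool.

```python
# Multipliers for abbreviated counters such as "1.2k" or "3m".
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_counter(text):
    """Convert a counter such as '0', '17', '1.2k' or '3m' to an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(round(float(text[:-1]) * SUFFIXES[text[-1]]))
    return int(text)


def excessive(counters, threshold=1000):
    """Return the names of counters at or above the (assumed) threshold."""
    return sorted(
        name for name, value in counters.items()
        if parse_counter(value) >= threshold
    )


# Hypothetical values copied by hand for one port.
port = {"enc_out": "2.5k", "crc_err": "0", "link_fail": "12"}
print(excessive(port))
```

What counts as "excessive" depends on how long the counters have been accumulating, so treat the threshold as a starting point only.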
Eugeny
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

The library firmware is 4.14.~
The porterrshow and the switchshow are attached.
What exactly will setting the EmulexOption and ResetTPRLO (now correctly written) values you suggested do? I've read in a customer advisory that for driver version 9 and above, the setting should be 1. It also says that in version 14 and above the default setting is 1, which I can confirm.
If I set these parameters, is there any chance of losing the connection to the disk subsystem (EVA3000)? I'm asking because this is a production environment, and a critical one.

Obviously, I'm going to try it on some less critical machines first, but I would like to know so I can be prepared for the worst, just in case.
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
I understand, and in that case I would suggest you open a case with HP and have the whole SAN and all devices analyzed. I do not think it's a good idea to experiment with a production environment without being sure where the problem is. If you have a test SAN, though, you can try the changes there.
Eugeny
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Have you analyzed the files I attached? You didn't answer my question about what changing those parameters would affect.
Regarding EmulexOption, I found that some machines have it set to 0xAA00 and others to 0xBA00. Should I try one, watch for results (if any), and then try the other?

I set the values you suggested (2 and 0xAA00) on three machines, one of them the backup server. Afterwards I tested the environment by starting a backup session on one of those servers. Although the backup went well, I still got a KGPSA timeout error. Is a test like this of any help, or should I set the parameters on all the kgpsa's?

Thank you for your interest and replies.

Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Yes, set these options on all the SAN servers, because one server can affect the others. Change 0xAA00 to 0xBA00 and try again. Remember that a reboot is required after you change driver parameters.
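To keep track of which servers still differ from the values being tried, something like the following Python sketch may help. It assumes the settings have already been collected by hand into a dictionary; the server names and the `mismatches` helper are hypothetical, and the sketch does not read anything from the servers themselves.

```python
# Values currently being tried in this thread.
TARGET = {"ResetTPRLO": 2, "EmulexOption": 0xBA00}


def mismatches(servers, target=TARGET):
    """Map each server name to its settings that differ from the target."""
    report = {}
    for name, settings in servers.items():
        diff = {
            key: settings.get(key)
            for key, want in target.items()
            if settings.get(key) != want
        }
        if diff:
            report[name] = diff
    return report


# Hypothetical inventory, filled in by hand for each server.
servers = {
    "backup": {"ResetTPRLO": 2, "EmulexOption": 0xBA00},
    "db1": {"ResetTPRLO": 1, "EmulexOption": 0xAA00},
}
for name, diff in mismatches(servers).items():
    print(name, {key: value for key, value in diff.items()})
```

A setting that is missing from a server's entry shows up as `None` in the report, so incomplete inventories are flagged as well.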
Concerning the logs: there are many enc out errors, but no CRC or other errors (in the columns before 'enc out'). You need to run a backup and then compare the LLI numbers to see whether they're increasing.
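The before/after comparison can be sketched in a few lines of Python. This assumes you note the counters as plain integers before and after the backup; the counter names and numbers below are invented for illustration.

```python
def increasing(before, after):
    """Return {counter: growth} for counters that grew between snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


# Hypothetical snapshots for one port, taken before and after a backup run.
before = {"lli": 120, "enc_out": 2500, "crc_err": 0}
after = {"lli": 180, "enc_out": 2500, "crc_err": 0}
print(increasing(before, after))
```

Counters that stay flat during the backup are left out of the result, so an empty dictionary means nothing grew.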
Eugeny
Jeff Drummond
New Member

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Per HP Support, ResetTPRLO must be set to 2 when you are using the a14 or a16 driver AND have EBS attached to the SAN.

Thanks!!
Jeff
BR765533
Advisor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Hi,

I've set the values as suggested on all servers and rebooted them afterwards. I'm still getting the kgpsa timeout errors and, subsequently, device I/O errors on the backups.
Do you have any other suggestions?
Did you ever get this kind of error? Were you able to solve it?
I noticed from your profile that you are an HP employee, working in the HP Competence Center. That is in Böblingen, right? I was there last July, at a pre-sales training.
You have lots of equipment there for testing. Has this situation been tested and solved before?

One other thing: you told me to set ResetTPRLO to 2. Wasn't that information supposed to be available, or even supplied via a customer advisory?

I hope you have some more suggestions, because I'm running out of things to try, and HP Portugal hasn't (yet) been able to provide me with ANY suggestions, only questions.

Ricardo Sousa
Eugeny Brychkov
Honored Contributor

Re: Event id 9 (kgpsa timeout) causes W2K to crash

Ricardo,
I'm doing my best to help you, but unfortunately the information provided is insufficient. There are a lot of things that need to be checked: EVA firmware, settings (are the LUNs presented to the hosts set to Windows 2000 mode/behavior?), switch firmware, FC HBA firmware, MSL firmware, etc.
If the local support center is asking questions, please answer them ASAP so that they can troubleshoot quickly and effectively. You can ask the local team to elevate the case and involve the competency center.
Eugeny