Operating System - Tru64 Unix
08-31-2009 06:47 AM
Emulex HBAs - Tru64UNIX v5.1B-4 - Errors
Hi,
I am getting a lot of Tru64UNIX v5.1B-4 errors from my KGPSA FC HBAs.
31-Aug-2009 10:36:36 [700] EMX fiber channel adaptor (KGPSA) event
31-Aug-2009 10:36:36 [700] EMX fiber channel adaptor (KGPSA) event
31-Aug-2009 10:36:36 [700] EMX fiber channel adaptor (KGPSA) event
EMX[0]: H/W Error detected - adapter failed to complete io 0xfffffc20fbd88208 (1251707796:297031 vs 1251296703:412209 = 411092884ms) ccb 0xfffffc20fbd883f8 0/56
EMX[0]: H/W Error detected - reset scheduled for failed HBA.
EMX[0]: H/W Error detected - adapter failed to complete io 0xfffffc10fbd7f708 (1251707796:297031 vs 1251296703:412209 = 411092884ms)
EMX[0]: H/W Error detected - reset scheduled for failed HBA.
EMX[0]: H/W Error detected - adapter failed to complete io 0xfffffc20fbd74208 (1251707796:297031 vs 1251296703:412209 = 411092884ms) ccb 0xfffffc20fbd743f8 0/56
This is a dual-port HBA, and I get the messages first on EMX[0] and then on EMX[1]. The HBA has not failed: if I run "hwmgr -view devices" on the system, I can still see the FC drives.
If anyone can suggest what might cause this, or where I can find further information, it would be appreciated.
Thanks
Andrew
2 replies
09-01-2009 12:28 PM
Re: Emulex HBAs - Tru64UNIX v5.1B-4 - Errors
"The HBA has not failed."
Well, maybe it has.
This specific logic was added to the driver recently in an attempt to detect a common internal HBA failure that caused hung systems. There was even a systemic failure on one model of the FCA adapters at a specific firmware revision, and there's an advisory out on that somewhere, but that's an older issue (2+ years old?).
At each heartbeat the driver looks at the top of each time-ordered hash list; if the same I/O is seen there too many times, it checks whether the allotted timeout period has expired. All I/O with a timeout of less than 256 seconds is timed within the HBA itself. If the HBA's internal timeout fails, as it sometimes does on flaky hardware, the system never gets the I/O back... ever... and this can precipitate a hang. The "error" you see is the driver complaining that the I/O timeout within the hardware failed. A soft/firmware reset of the adapter is then performed to try to recover, rather than just sitting there waiting for a hang. BTW, it is possible to turn off the reset portion of the logic, since it is controlled by a different, pre-existing config setting; in that case the software keeps complaining about the stuck hardware and keeps scheduling the reset, but the reset never happens.
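The heartbeat check described above can be modeled roughly as follows. This is a minimal Python sketch, not the EMX driver's code: the hash-by-id list layout, the SEEN_THRESHOLD value of 3, the class and method names, and the reset_enabled flag (a stand-in for the separate config that controls the reset) are all hypothetical. Only the 256-second hardware-timed limit comes from the reply, and the sketch assumes I/O with longer timeouts is timed elsewhere, so it ignores it.

```python
from collections import OrderedDict
from dataclasses import dataclass

HW_TIMED_LIMIT_S = 256  # from the thread: I/O with a timeout under 256 s is timed inside the HBA
SEEN_THRESHOLD = 3      # hypothetical: heartbeats an I/O may sit at a list head before it is checked


@dataclass
class IoRequest:
    io_id: int
    start_s: float    # time the I/O was issued (system clock, seconds)
    timeout_s: float  # timeout allotted to this I/O


class HbaWatchdog:
    """Hypothetical model of the heartbeat check described in the reply (not the EMX driver)."""

    def __init__(self, n_lists: int, reset_enabled: bool = True):
        # Each list keeps insertion order; assuming I/O is submitted in time order,
        # the first entry of each list is the oldest outstanding I/O in that list.
        self.lists = [OrderedDict() for _ in range(n_lists)]
        self.reset_enabled = reset_enabled  # hypothetical stand-in for the reset config
        self.last_head = [None] * n_lists   # io_id seen at each list head on the last heartbeat
        self.seen = [0] * n_lists           # consecutive heartbeats that io_id stayed at the head
        self.resets_done = 0

    def submit(self, io: IoRequest) -> None:
        self.lists[io.io_id % len(self.lists)][io.io_id] = io

    def complete(self, io_id: int) -> None:
        self.lists[io_id % len(self.lists)].pop(io_id, None)

    def heartbeat(self, now_s: float) -> list:
        messages = []
        reset_scheduled = False
        for i, lst in enumerate(self.lists):
            head = next(iter(lst.values()), None)
            if head is None:
                self.last_head[i], self.seen[i] = None, 0
                continue
            # Count how many heartbeats in a row the same I/O has been at the head.
            if head.io_id == self.last_head[i]:
                self.seen[i] += 1
            else:
                self.last_head[i], self.seen[i] = head.io_id, 1
            hw_timed = head.timeout_s < HW_TIMED_LIMIT_S
            expired = now_s - head.start_s > head.timeout_s
            if self.seen[i] >= SEEN_THRESHOLD and hw_timed and expired:
                # The HBA should already have timed this I/O out itself; complain.
                messages.append(f"adapter failed to complete io {head.io_id:#x}")
                messages.append("reset scheduled for failed HBA")
                reset_scheduled = True
        if reset_scheduled and self.reset_enabled:
            self.resets_done += 1  # a real reset would also retry all outstanding I/O
        return messages
```

Constructing it with reset_enabled=False mimics the behavior described for a disabled reset: every heartbeat repeats the complaint and the reset scheduling, but resets_done stays at zero.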
The various sysconfig attributes should be documented in the release notes. You can turn them off if you want, but if the hardware really is broken, the system will eventually hang on I/O that never completes.
The reset logic forces all outstanding I/O to be retried, so it "heals" the issue, but you would also expect to see messages about the hardware actually being reset, and none are shown in what you posted.
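The reset-and-retry behavior can be sketched the same way. Again this is hypothetical: a dict of outstanding I/O keyed by id, a deque as the retry queue, the function name reset_adapter, and the log text are all assumptions for illustration, not actual driver structures or messages.

```python
from collections import deque


def reset_adapter(outstanding: dict, retry_queue: deque, log: list) -> int:
    """Hypothetical soft reset: abandon everything the adapter holds and requeue it for retry."""
    requeued = list(outstanding.values())  # dicts keep insertion (issue) order
    outstanding.clear()                    # the adapter forgets all in-flight I/O
    retry_queue.extend(requeued)           # every I/O, stuck or not, is reissued
    log.append(f"adapter reset: {len(requeued)} I/O requeued for retry")
    return len(requeued)
```

The point the reply makes shows up in the sketch: the stuck I/O is retried along with everything else, and the reset itself leaves a log entry, which is why its absence from the posted messages is worth noting.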
The one hole in the logic is a glitch in the system clock: stepping the clock can make an I/O appear to have been outstanding longer than it really has and cause the logic to trigger falsely... but you need to bump the clock a lot.
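To gauge how large the apparent delay in the original post is, the timestamp pair in each message can be decoded. A minimal Python sketch, assuming (the thread does not say so) that each value is seconds:microseconds since the Unix epoch; the function name decode_pair is hypothetical. Under that reading the difference comes out to exactly the reported 411092884 ms, roughly 4.76 days, and the newer value decodes to 08:36:36 UTC, which matches the 10:36:36 log time if the system runs at UTC+2.

```python
import re
from datetime import datetime, timezone

# Matches the "(S1:U1 vs S2:U2 = Nms)" fragment of an EMX message.
PAIR = re.compile(r"\((\d+):(\d+) vs (\d+):(\d+) = (\d+)ms\)")


def decode_pair(entry: str):
    """Return (computed_ms, reported_ms, newer_utc, older_utc) for one EMX message.

    Assumption: each timestamp is seconds:microseconds since the Unix epoch.
    """
    m = PAIR.search(entry)
    if m is None:
        raise ValueError("no timestamp pair in message")
    s1, us1, s2, us2, reported_ms = map(int, m.groups())
    # Work in integer microseconds to avoid float rounding; // truncates to whole ms.
    delta_us = (s1 - s2) * 1_000_000 + (us1 - us2)
    newer = datetime.fromtimestamp(s1, tz=timezone.utc)
    older = datetime.fromtimestamp(s2, tz=timezone.utc)
    return delta_us // 1000, reported_ms, newer, older


line = ("EMX[0]: H/W Error detected - adapter failed to complete io 0xfffffc20fbd88208 "
        "(1251707796:297031 vs 1251296703:412209 = 411092884ms) ccb 0xfffffc20fbd883f8 0/56")
computed, reported, newer, older = decode_pair(line)
print(computed, reported, round(computed / 86_400_000, 2))
print(newer, older)
```

A gap of nearly five days fits either an I/O that really sat in the adapter that long or a clock bump of the size the reply warns about; the arithmetic alone can't tell the two apart.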
09-01-2009 10:28 PM
Re: Emulex HBAs - Tru64UNIX v5.1B-4 - Errors
I also think the FC HBA card may be damaged.
Did you check the logs on the FC switch?
In vino veritas, in VMS cluster
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.
© Copyright 2024 Hewlett Packard Enterprise Development LP