
AIX 5.2/Cambex PC1000/Securepath 2.0D/HSG80 i/o failure

 
Kevin Ford_2
Occasional Visitor

We are experiencing occasional issues with the above combination. The symptoms look like a disk failure, but no physical failure is logged on the SAN controller. With dual paths to the switches and Securepath present, this simply shouldn't happen. Securepath is not logging anything unusual, which suggests the problem lies further towards the SAN side.

I attach the relevant AIX errpt output and would be interested to hear if anyone has seen similar conditions or knows of a remedy.

The vendor's initial support recommendation is to increase rw_timeout, as they judge the errors to be characteristic of a long fibre connection. However, the cables are only 15 metres long, and due to the nature of the system we are loath to start taking outages just to tweak settings that shouldn't be causing the issue.
5 REPLIES
Florian Heigl (new acc)
Honored Contributor

Re: AIX 5.2/Cambex PC1000/Securepath 2.0D/HSG80 i/o failure

I only had a single MCA Cambex adapter, so I have no experience running it dual-pathed, but the errors quite definitely point to the SAN.
(as you're not using JFSunstable, err, JFS2)

Can you do some load testing on each of the paths (simply a dd to /dev/zero) and watch PortErrShow on the corresponding switch?
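A minimal sketch of such a read test, for illustration only. The function name `read_load` and the device name in the example are hypothetical. It discards the data to /dev/null, the conventional sink, and it does not choose which path carries the I/O; steering traffic onto one path is left to Secure Path and is outside this sketch.

```shell
#!/bin/sh
# Sequential read test against one device; the data read is discarded.
# Usage: read_load <device> [count_in_MiB]
read_load() {
    dev=$1
    count=${2:-1024}   # number of 1 MiB blocks to read (default 1024)
    # Read-only: the device is the input, /dev/null is the output.
    dd if="$dev" of=/dev/null bs=1024k count="$count"
}

# Example (hypothetical device name; substitute a disk that is
# reached through the path under test):
# read_load /dev/rhdisk4 2048
```

While the read runs, check the error counters for the switch port that the adapter is plugged into, before and after, and compare.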

Neither the Cambex drivers nor AIX come with reasonable FC error statistics like HP-UX has, so you'll have to search for the root issue.

Maybe something between 5.2's MPIO, the Cambex driver, and SecurePath got confused, letting the error go unnoticed. :/
yesterday I stood at the edge. Today I'm one step ahead.
CA963372
Occasional Visitor

Re: AIX 5.2/Cambex PC1000/Securepath 2.0D/HSG80 i/o failure

Kevin,

We are having a similar problem:

AIX V5.2 / p660
Secure Path 2.0D
HP/Brocade 2/32 SAN switches (fixed port speeds)
EMA 12000s with VCS 8.7x & 8.8x

We discovered that there's a restriction: in Secure Path you must have only a single path from a LUN mapped to each HBA. Observing this resolved our inability to see LUNs dynamically. Do you know if you have more than one path per HBA per LUN?

Are you mirroring your SAN disk? If so, have you applied the LVM mirror patch for AIX, bos.rte.lvm 5.2.0.50? Without this patch, the combination of problems with the HSG and AIX will result in corruption.
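One way to check the installed fileset level against 5.2.0.50 is sketched below. The helper name `level_at_least` is hypothetical, and the commented usage assumes that the level is the third colon-separated field of `lslpp -Lcq` output; verify that on your own system first.

```shell
#!/bin/sh
# Succeeds (exit 0) if dotted level $1 is at or above dotted level $2.
# Fields are compared numerically, left to right; missing fields count as 0.
level_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(need, r, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? r[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example on AIX (field position is an assumption, see above):
# have=$(lslpp -Lcq bos.rte.lvm | cut -d: -f3)
# level_at_least "$have" 5.2.0.50 && echo "bos.rte.lvm is at or above 5.2.0.50"
```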

The changes we have applied resolved the corruption issues, but we are still experiencing availability issues and I/O errors.
Kevin Ford_2
Occasional Visitor

Re: AIX 5.2/Cambex PC1000/Securepath 2.0D/HSG80 i/o failure

We're mirroring at the SAN level, and in any case our level for that fileset is 5.2.0.51. We still haven't resolved the occasional availability issues. It might be worth noting that the system was upgraded to 5.2 in early June but we didn't see issues until mid-August, which suggests a hardware or possibly uptime-related issue. I suspect the SAN hardware more and more. We're getting an engineer in, so I'll update when I know more.
Kevin Ford_2
Occasional Visitor

Re: AIX 5.2/Cambex PC1000/Securepath 2.0D/HSG80 i/o failure

Apologies if anyone was monitoring this - we've been 'escalated' and 'elevated'. I'll try to remember to add the fix if I ever learn more. We've taken the queue_depth down to 1 on the SAN disks. It was quiet for a few days before that and it's been quiet since as well, so nothing useful to add.
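For anyone wanting to try the same change, here is a hedged sketch of how a disk attribute such as queue_depth can be set with chdev. The function name `plan_chdev`, the `APPLY` switch, and the disk names are hypothetical. The `-P` flag is an assumption about how one might apply it (it records the change in the ODM so that it takes effect at the next restart, which suits disks that are in use); the post does not say which method was used. By default the function only prints the commands.

```shell
#!/bin/sh
# Prints the chdev command for each disk; executes it only when APPLY=1.
# Usage: plan_chdev <attribute> <value> <disk> [<disk> ...]
plan_chdev() {
    attr=$1
    value=$2
    shift 2
    for disk in "$@"; do
        cmd="chdev -l $disk -a $attr=$value -P"
        if [ "${APPLY:-0}" = "1" ]; then
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
}

# Show the current value first (hypothetical disk name):
# lsattr -El hdisk4 -a queue_depth
# Dry run, then apply:
# plan_chdev queue_depth 1 hdisk4 hdisk5
# APPLY=1 plan_chdev queue_depth 1 hdisk4 hdisk5
```

The same function could be pointed at rw_timeout if the vendor's suggestion is tried instead.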
Kevin Ford_2
Occasional Visitor

Re: AIX 5.2/Cambex PC1000/Securepath 2.0D/HSG80 i/o failure

Still no errors since setting the queue_depth to 1, but no confirmation that this is a supportable fix. We are sending all our system info off to HP Europe, so hopefully they will spot some minor detail that we've missed.