HPE EVA Storage

Volume not failing over with SecurePath

Hello,

I'm using Secure Path Manager 4.0c SP2 on Windows 2000 Server connected to an MSA1500cs array. One of the controllers has failed, and I was expecting to see all the volumes fail over to the other controller. But out of 8 volumes, 7 have failed over and one shows that it has not. Since the alternate path is exactly the same for all the volumes, what could cause a single volume not to fail over? Has anyone had a similar experience?

Attached is a snapshot of Secure Path Manager.
Amol Garge
Trusted Contributor

Re: Volume not failing over with SecurePath

Behrang,

This is a weird issue...

Can you attach the show tech support output?

Let's see what the controller has to say...

Re: Volume not failing over with SecurePath

Attached is the show tech support output. The strange thing about the situation is that this MSA1500 has been repeatedly failing over from one controller to the other without any apparent reason. As I write this message, the failed controller is active again and owns all the volumes without any issues. The only thing done between the last failover and the most recent one was replacing a failed hard drive.
Amol Garge
Trusted Contributor

Re: Volume not failing over with SecurePath

OK, first things first:

From the Controller logs I see:

Disk407: Box 4, Bay 07, (B:T:L 3:08:00) was replaced.

This disk is part of LUN 7. Yes, the same LUN that didn't fail over!

So, this is what must have happened:
The controller failed over for some reason, and at the same time (or a few seconds before) the disk also failed. Unfortunately, the spare had not kicked in yet, so the LUN was in degraded mode and hence did not fail over.

After the spare kicked in, parity initialization started and the controller failed back, and the LUN became visible again to that controller like magic!

I guess it was just bad timing.
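The timing argument above can be written down as a small check. This is a toy Python model of the hypothesis in this reply, not Secure Path's or the MSA firmware's actual failover logic; the function name `lun_degraded_at` and its three timestamps are hypothetical and would have to be taken from the controller and monitoring logs.

```python
from datetime import datetime
from typing import Optional


def lun_degraded_at(failover: datetime,
                    disk_failed: datetime,
                    spare_activated: Optional[datetime]) -> bool:
    """Toy model (hypothetical, not vendor logic): was the LUN degraded
    at the moment the controller failed over?"""
    if disk_failed > failover:
        # The member disk was still healthy when the failover happened.
        return False
    # Degraded if no spare had taken over yet when the failover occurred.
    return spare_activated is None or spare_activated > failover


if __name__ == "__main__":
    # Hypothetical timestamps: the disk fails 5 s before the failover,
    # and the spare only kicks in several minutes later.
    failover = datetime(2000, 1, 1, 10, 0, 5)
    disk_failed = datetime(2000, 1, 1, 10, 0, 0)
    spare = datetime(2000, 1, 1, 10, 5, 0)
    print(lun_degraded_at(failover, disk_failed, spare))  # True
```

Under this model, the disk failing well before the failover (rather than seconds before) still leaves the LUN degraded, as long as the spare had not taken over when the controller switched.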

As for why the controllers keep failing over, we need to check the whole environment.

I would recommend that you first check the switch ports where the MSA host ports are connected, and then check the FC HBA ports for encoding (ENC) or CRC errors.
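A single reading of the error counters says little, because old errors may be left over from earlier events; what matters is whether the counters are still increasing. Below is a minimal Python sketch of that comparison, assuming you record the ENC and CRC counts per port (the switch ports facing the MSA host ports, plus the HBA ports) at two points in time. The snapshot format, the port labels, and the function name `incrementing_ports` are hypothetical; no particular switch or HBA tool output is parsed.

```python
# Hypothetical input: port label -> (ENC error count, CRC error count),
# copied from the switch and HBA error-counter displays. No particular
# CLI output format is assumed here.
Counters = dict[str, tuple[int, int]]


def incrementing_ports(before: Counters, after: Counters) -> dict[str, tuple[int, int]]:
    """Return ports whose ENC or CRC counters grew between two snapshots,
    mapped to (enc_increase, crc_increase)."""
    suspects: dict[str, tuple[int, int]] = {}
    for port, (enc_now, crc_now) in after.items():
        enc_then, crc_then = before.get(port, (0, 0))
        # A counter that went down was most likely cleared; count that as 0.
        enc_inc = max(enc_now - enc_then, 0)
        crc_inc = max(crc_now - crc_then, 0)
        if enc_inc or crc_inc:
            suspects[port] = (enc_inc, crc_inc)
    return suspects


if __name__ == "__main__":
    # Hypothetical snapshots taken some time apart.
    before = {"sw1-port3": (12, 4), "sw1-port4": (0, 0), "hba0": (5, 0)}
    after = {"sw1-port3": (12, 4), "sw1-port4": (3, 150), "hba0": (5, 0)}
    print(incrementing_ports(before, after))  # {'sw1-port4': (3, 150)}
```

Only ports whose counters are actively climbing are reported, which points at the cable, SFP, or port to inspect first rather than at stale historical errors.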

Hope this helps!
Amol Garge
Trusted Contributor

Re: Volume not failing over with SecurePath

Hey,

Forgot to add one thing...

Access control list (SSP in the GUI) is disabled...

I would recommend that you enable it.

Re: Volume not failing over with SecurePath

I'm sure the disk had failed before this all started, since we run HP-OVO on the connected server and it picked up the failure before this situation. But I totally agree with you that a degraded RAID must have been the cause of this behavior (the degraded volume not failing over), even though I don't recall precisely whether this behavior is documented in the Secure Path Manager documentation. Strangely enough, since the disk replacement we haven't observed any failovers; it's been a few hours and the server has been under full load since then.

Apart from that, there have been weird messages like "Fan Failure" or "Controller Battery Low" during the failover turbulence, which cleared within minutes. So I suspect a malfunction somewhere in the fabric or a driver incompatibility.

Unfortunately, this box is running Win2K, and HP has stopped providing Win2K drivers for a lot of components, so this will be good motivation for us to invest in a W2K3 upgrade and eventually get rid of Secure Path Manager.

I have planned an overhaul of these boxes with HP, so hopefully this will help eliminate the root cause of the controller failures.

Thanks for all the comments and help!