Operating System - Tru64 Unix

Unexpected Path-failover failure

Johnny Vergeer
Occasional Advisor

Hi all,

I hope someone can shed some light on the following. During a storage migration exercise on EMC Symmetrix, we experienced a failure situation, and we recreated this problem in a lab environment:

- We had 4 paths configured to a device, and "hwmgr -show scsi -full" for the device showed all 4 paths as valid.

- Next we started running an I/O generator to this device, using "dt".

- Then on the EMC side, two of the paths were "write disabled". The write disabling of the channels was done by creating a device group and using the "symld -g xxx -SA x -p x write_disable" command.

- This resulted in "SCSI events" being reported in the system log, along with the message "A change has occurred in an error counter for device (HWID=XXX lid=4)" for the device.

- The I/O generator crashed with "dt: 'write', errno = 5 - I/O error"

- However, "hwmgr -show scsi -full" still showed the 4 paths as "valid".

- Trying to start the I/O generator again failed with the same "write I/O" error.

- Retrying these last two commands after some delay still gave the same result.

We did find that "re-zoning" on the EMC side does provide the required path failover. However, why do we see the behavior described above on Tru64?
4 REPLIES
Ivan Ferreira
Honored Contributor

Re: Unexpected Path-failover failure

Maybe it is because when you use "write disabled", the LUN is still visible but writes are prevented.

When you use rezoning, the LUNs are no longer visible on those paths, and then a "failover" is needed to maintain access.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Han Pilmeyer
Esteemed Contributor

Re: Unexpected Path-failover failure

We do path fail over when the path fails. But in this case all of the paths are fine. So we use all of them. Through the path where the LUN is write disabled, we receive a device error (can't write). This is not a path error, but a device failure. We expect to see the same device through all paths, so we don't retry using other paths (since they all end up at the same device).

This is the simple answer. Just don't do this.
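
The distinction described above can be illustrated with a small toy model. This is only a sketch of the decision logic as explained in this reply, not Tru64 driver code; the names `Path`, `ErrorKind`, `IOFailure`, and `multipath_write` are hypothetical and exist only for illustration. It assumes a write can fail in two ways: a path error (the path is gone, as after rezoning) or a device error (the LUN answers but rejects the write, as with "write disabled").

```python
from dataclasses import dataclass
from enum import Enum, auto


class ErrorKind(Enum):
    PATH = auto()    # transport problem: the path itself is gone
    DEVICE = auto()  # the device answered and rejected the request


class IOFailure(Exception):
    """Hypothetical error type carrying the kind of failure."""

    def __init__(self, kind, message):
        super().__init__(message)
        self.kind = kind


@dataclass
class Path:
    """Hypothetical model of one path to a LUN."""
    name: str
    reachable: bool = True        # False models a rezoned (invisible) path
    write_disabled: bool = False  # True models a write-disabled LUN on this path

    def write(self, data):
        if not self.reachable:
            raise IOFailure(ErrorKind.PATH, f"{self.name}: path unreachable")
        if self.write_disabled:
            raise IOFailure(ErrorKind.DEVICE, f"{self.name}: write rejected")
        return len(data)


def multipath_write(paths, data):
    """Try each path in turn, failing over only on path errors."""
    last_error = None
    for path in paths:
        try:
            return path.write(data)
        except IOFailure as err:
            if err.kind is ErrorKind.DEVICE:
                # Every path is assumed to lead to the same device,
                # so a device error is not retried on another path.
                raise
            last_error = err  # path error: fall through to the next path
    if last_error is None:
        last_error = IOFailure(ErrorKind.PATH, "no paths configured")
    raise last_error
```

In this model, a list of paths whose first entry has `reachable=False` still completes the write through the next path, while a first entry with `write_disabled=True` makes `multipath_write` raise a device error immediately even though the remaining paths would have accepted the write. That mirrors the difference between rezoning and write disabling seen in the lab test.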
Johnny Vergeer
Occasional Advisor

Re: Unexpected Path-failover failure

Thanx Ivan & Han,

Yup we found out the hard way "not to do this" :-)

Is it possible to disable a specific path from the OS side?
Han Pilmeyer
Esteemed Contributor

Re: Unexpected Path-failover failure

No, this is not possible.