Operating System - HP-UX

Re: SAN LUN WWID change, system have some problems

 
SOLVED
Gary L
Super Advisor

SAN LUN WWID change, system have some problems

Hi

We have two HP-UX servers. For backup purposes, both servers were configured with the same SAN disk; in other words, the two servers are connected to one and the same SAN disk. Details below:
server1:
SAN disk info: c17t0d0 /dev/vg08 /u09

server2:
SAN disk info: c20t0d6 /dev/vg08 /u08

server2 unmounts the SAN disk at 6:00am
server1 mounts the SAN disk at 6:50am
server1 unmounts the SAN disk at 7:50am
server2 mounts the SAN disk at 7:55am

/u08 and /u09 are the same SAN disk.

Today the SAN disk had some problems, and the SAN admin re-created the LUN, so it now has a new WWID in place of the old one. I have fixed server1's problems (spmgr delete c17t0d0, then re-created the disk in the VG via SAM and ran vgcfgrestore -n /dev/vg08 /dev/rdsk/c20t0d6); spmgr display, vgscan, ioscan, and vgdisplay -v all look correct.
But server2 still has problems. I tried to unmount the mount point and repeat the same steps as on server1 (spmgr delete ...), but ran into the following:
1. vgchange -a n /dev/vg08
vgchange: Couldn't deactivate volume group "/dev/vg08":
Device busy

2. spmgr delete c20t0d6
Error: Device mounted or otherwise busy.

3. ioscan -funC disk
disk 9 255/255/0/0.6 sdisk NO_HW DEVICE HSV110 (C)COMPAQ
/dev/dsk/c20t0d6 /dev/rdsk/c20t0d6

The status is "NO_HW" instead of "CLAIMED".

4. spmgr display
c20t0d6 Path_Status ****FAILED

5. vgscan
Couldn't stat physical volume "/dev/dsk/c20t0d6":
Invalid argument
The Volume Group /dev/vg08 was not matched with any Physical Volumes.

6. SAM
SAM could not find c20t0d6.

7. pvcreate could not be run.

I tried to run vgchange -a n /dev/vg08 followed by "spmgr delete" and "spmgr add NEW_LUN_WWID", or to re-create the disk in vg08 via SAM and run vgcfgrestore. How should I go about this?

Any suggestions would be much appreciated!
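
For readers checking a host with many disks, the ioscan check in step 3 can be scripted. This is a minimal sketch only: the function name unclaimed_disks is made up for illustration, and it assumes the column order seen in the output above (class, instance, H/W path, driver, S/W state).

```shell
#!/bin/sh
# Hypothetical helper: read `ioscan -funC disk` output on stdin and print
# the H/W path, driver and S/W state of every disk that is not CLAIMED.
# Assumes the S/W state is the fifth whitespace-separated column, as in
# the output quoted in the post. Device-file lines (/dev/dsk/...) are skipped
# because their first column is not "disk".
unclaimed_disks() {
    awk '$1 == "disk" && $5 != "CLAIMED" { print $3, $4, $5 }'
}

# Usage on the affected host (not executed here):
#   ioscan -funC disk | unclaimed_disks
```

With the entry quoted in step 3, this would print the path 255/255/0/0.6 together with its driver and the NO_HW state.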


8 REPLIES
Santosh Rao
Occasional Advisor
Solution

Re: SAN LUN WWID change, system have some problems

Hi,

What version of HP-UX are you running? Did you examine /var/adm/syslog/syslog.log to check whether there were any errors?

In particular, do the following :

#cd /var/adm/syslog
#grep replace_dsk syslog.log

If you see any entries containing replace_dsk in syslog, note the time of the logged error message and follow the instructions.

If the port WWN of the tgt port changes, the host OS will detect an authentication failure and prevent access until the admin has run a fcmsutil replace_dsk. A message is logged in syslog to that effect.

The above behaviour applies to releases before 11.31.

In 11.31, this is further enhanced: the host OS performs LUN WWID authentication and prevents access when the LUN WWID has changed. The user then needs to run scsimgr replace_wwid (11.31 onwards).
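
The syslog check above can be widened to also catch messages that name the affected device. A small sketch follows; syslog_hits is a hypothetical helper name, and the device name is whatever disk is being investigated (c20t0d6 in this thread).

```shell
#!/bin/sh
# Hypothetical helper: print log lines that mention replace_dsk
# or the given device name.
#   $1 = log file, $2 = device name (for example c20t0d6)
syslog_hits() {
    grep -E "replace_dsk|$2" "$1"
}

# Usage on the affected host (not executed here):
#   syslog_hits /var/adm/syslog/syslog.log c20t0d6
```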

Thanks,
Santosh
Gary L
Super Advisor

Re: SAN LUN WWID change, system have some problems

Hi Santosh Rao

Thank you very much for your fast reply.

Both HP-UX servers are rp boxes, and the OS version is B.11.11.
There is no error in syslog.log, including the "replace_dsk" you mentioned. There is only this: "vmunix: Device c20t0d6 busy with openCount=-1,cannot destroy path to stale details and update the new details 0/2/1/0.81.2.0.0.0.7 hsx"

What should the next step be?

thanks
Santosh Rao
Occasional Advisor

Re: SAN LUN WWID change, system have some problems

Hard to say what's going on specifically here. Sounds like some process may have that lun open and hence, spmgr delete is failing.

You may want to check for all processes to see if any process has the device open.

Another approach is to write a small C program that calls pstat_getdisk(2) on that disk's DSF and checks psd_status. If this field is 1, the LUN is open.
Gary L
Super Advisor

Re: SAN LUN WWID change, system have some problems

Hi Santosh

ps -ef | grep u08
" root 24523 1 0 14:34:13 ? 0:00 /sbin/fs/vxfs/umount /dev/vg08/lveva8 /u08"
There is a umount process that cannot be killed, even with kill -9.
I have tried many methods but could not stop this process.
How can I kill it?
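
A plain grep on the ps output can also match the grep process itself. The sketch below instead keeps only processes that have the mount point as an exact argument; procs_on_mount is a hypothetical helper name, and it assumes the PID and PPID are the second and third columns, as in the ps -ef line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID and
# PPID of every process that has the given mount point as an exact
# whitespace-separated word on its line.
#   $1 = mount point (for example /u08)
procs_on_mount() {
    awk -v mp="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i == mp) { print $2, $3; break }
    }'
}

# Usage on the affected host (not executed here):
#   ps -ef | procs_on_mount /u08
```

The helper's own awk process is not reported, because its argument is the single word mp=/u08 rather than /u08.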
leelangco_1
Frequent Advisor

Re: SAN LUN WWID change, system have some problems

Try using fuser -ku /dev/dsk/c20t0d6
Santosh Rao
Occasional Advisor

Re: SAN LUN WWID change, system have some problems

Hard to say what's going on. There could be I/Os pending to that bad LUN. You might want to identify the nport_id and the HBA DSF hardware path for that LUN (c20t0d6) and post the output of the following commands:

#fcmsutil stat|grep -v " 0$"

#fcmsutil devstat |grep -v " 0$"

#fcmsutil get local
#fcmsutil get remote
#fcmsutil get fabric

As an example, assuming the lun was on HBA /dev/td1 and the nport_id of that tgt port on which the lun was present was 0x02ae4 :

#fcmsutil /dev/td1 stat|grep -v " 0$"
#fcmsutil /dev/td1 devstat 0x02ae4|grep -v " 0$"
#fcmsutil /dev/td1 get local
#fcmsutil /dev/td1 get fabric
#fcmsutil /dev/td1 get remote 0x02ae4
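
To avoid retyping the HBA device file and nport_id, the example commands above can be generated from two arguments. This is a sketch only: fcms_diag_cmds is a hypothetical helper name, it prints the commands rather than running them, and /dev/td1 and 0x02ae4 are the example values from this post, not values taken from the affected system.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the five diagnostic commands
# from the example above for a given HBA and target port.
#   $1 = HBA device file (for example /dev/td1)
#   $2 = nport_id of the target port (for example 0x02ae4)
fcms_diag_cmds() {
    hba=$1
    nport=$2
    # \$ keeps a literal $ in the grep pattern inside the here-document.
    cat <<EOF
fcmsutil $hba stat | grep -v " 0\$"
fcmsutil $hba devstat $nport | grep -v " 0\$"
fcmsutil $hba get local
fcmsutil $hba get fabric
fcmsutil $hba get remote $nport
EOF
}

# Usage (review the printed commands before running them):
#   fcms_diag_cmds /dev/td1 0x02ae4
```

The grep -v " 0$" filter drops every line that ends in a space followed by 0, so only counters with nonzero values remain in the output.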


Also try the following to see if it fixes the issue :

#fcmsutil replace_dsk
#fcmsutil reset
Gary L
Super Advisor

Re: SAN LUN WWID change, system have some problems

Hi Leelangco and Santosh

Thank you very much for your help over the weekend.

I have tried fuser to kill the process, details as follows:
#fuser -cu /dev/dsk/c20t0d6
/dev/dsk/c20t0d6: fuser: could not obtain file system ID for file /dev/dsk/c20t0d6

what's up?

I will use fcmsutil to check the HBA soon.


thanks buddies!
Gary L
Super Advisor

Re: SAN LUN WWID change, system have some problems

Thanks everyone

I fixed the problem by rebooting the server.
Happy weekend.