NFS share directory

 
Jesse Delk
Frequent Advisor

I was sharing a directory /u01/exp between two HP-UX servers.

CT1 - server trying to view the directory
CT2 - server with the directory

I had to power down CT2, the server that holds the shared directory, and move it to another location. I have brought it back up, and I can telnet into it and ping it from CT1.

When I run bdf on CT1, it hangs at the point where it usually shows the shared directory.

I never umounted the directory on CT1 when I powered the CT2 server down.

Is there a service I need to restart on CT2 so that CT1 can see this directory? Is there something I need to restart on CT1?
12 REPLIES
Geoff Wild
Honored Contributor

Re: NFS share directory

Is it automounted or hard mounted (fstab)?

It's best to use automounts...

Try on CT1:

/sbin/init.d/nfs.client stop

/sbin/init.d/nfs.client start


BTW - which OS version are you using?

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Jesse Delk
Frequent Advisor

Re: NFS share directory

I believe it is 11.11 version 1. (Would you know the command for that off the top of your head?)

Below is the message I get when running the stop command on CT1.

/sbin/init.d/nfs.client stop

NFS_SERVER not set to one in /etc/rc.config.d/nfsconf, exiting.
killing rpc.lockd
killing rpc.statd
NFS_SERVER not set to one in /etc/rc.config.d/nfsconf, exiting.
umountall: umount : has failed.
umountall: diagnostics from umount
nfs umount: nfs_unmount: /u01/exp: is busy
umount: return error 1.
killing biod
killing automount
Geoff Wild
Honored Contributor

Re: NFS share directory

For OS, uname -a

For OE: swlist |grep OE

There is a 99% chance that you will have to reboot CT1 - as on 11.11 there is no force umount for NFS :(

After doing the /sbin/init.d/nfs.client start
try to mount the filesystem:

mount /u01/exp

If it still won't work, then I'm pretty sure a reboot is required. :(

Rgds...Geoff
Jesse Delk
Frequent Advisor

Re: NFS share directory

So I may have to reboot CT1? Is this because I did not umount the share before powering CT2 down?

Nothing I can do to force the umount?


# uname -a
HP-UX drtkcp1 B.11.11 U 9000/800 114444650 unlimited-user license

# swlist | grep OE
HPUX11i-OE B.11.11.0312 HP-UX 11i Operating Environment Component
Geoff Wild
Honored Contributor

Re: NFS share directory

You need to be using NFS Version 3 to be able to force umount a filesystem..

umount -f lets you recover from hung/stale NFS mounts without a reboot.

With 11.31, NFS gets completely revamped. By default it uses NFS V3 but also has NFS V4 on it.

Rgds...Geoff
Simon Hargrave
Honored Contributor

Re: NFS share directory

You have a stale NFS mount. It is generally better (especially when mounting read-only, where consistency is not a concern) to mount with -o soft so that it doesn't hang.

However, to fix the problem you can use the following procedure, which removes the stale mount connection on the client. BE CAREFUL, though: if you get it wrong you could kill arbitrary connections.

-----

# netstat -an | grep 2049

Now look for the 2049 connection e.g. 140.1.234.204:1023 134.202.101.25:2049

If you want to use tcp_discon_by_addr, you pass it a 24-character string containing the hex representation of the quadruple (local IP, local port, remote IP, remote port).

For example, if the connection that I want to delete is:

Local IP: 192.1.2.3 (0xc0010203)
Local Port: 1024 (0x0400)
Remote IP : 192.4.5.6 (0xc0040506)
Remote Port: 2049 (0x0801)

The "hex" string you pass to tcp_discon_by_addr is:

# ndd -set /dev/tcp tcp_discon_by_addr "c00102030400c00405060801"

NOTE: the preceding 0x that typically indicates a Hex number is NOT part of the string passed.
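
The conversion above can be scripted rather than done by hand. This is only a minimal sketch: quad_to_hex is a hypothetical helper name (not an HP-UX command), and it assumes a POSIX shell with a printf utility and plain dotted-decimal IPv4 addresses without leading zeros. It only prints the string; it does not call ndd.

```shell
# Hypothetical helper: build the 24-character hex string that
# tcp_discon_by_addr expects from a connection quadruple.
# Usage: quad_to_hex LOCAL_IP LOCAL_PORT REMOTE_IP REMOTE_PORT
quad_to_hex() {
    if [ "$#" -ne 4 ]; then
        echo "usage: quad_to_hex local_ip local_port remote_ip remote_port" >&2
        return 1
    fi
    old_ifs=$IFS
    IFS=.
    # Split both IP addresses on dots; the ports are kept whole.
    set -- $1 "$2" $3 "$4"
    IFS=$old_ifs
    if [ "$#" -ne 10 ]; then
        echo "quad_to_hex: expected dotted-decimal IPv4 addresses" >&2
        return 1
    fi
    # Four 1-byte octets, one 2-byte port, then the same for the remote side.
    printf '%02x%02x%02x%02x%04x%02x%02x%02x%02x%04x\n' "$@"
}

# The example quadruple from this post:
quad_to_hex 192.1.2.3 1024 192.4.5.6 2049
# prints c00102030400c00405060801
```

The output contains no 0x prefix, matching the note above.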
Jesse Delk
Frequent Advisor

Re: NFS share directory

I ran netstat -an; here's the part of the result I believe we are looking for. How do I get the hex value out of this?

tcp 0 1 10.1.31.100.789 172.16.121.14.2049 SYN_SENT
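
One way to derive the hex value from a line like the one above is to split each address.port field on dots and print the pieces in hex. The following is a sketch only: netstat_to_hex is a hypothetical function name, and it assumes the column layout shown above (local address in field 4, remote address in field 5, both IPv4 in address.port form) and an awk that supports printf with %x. It prints the string and leaves the ndd call to you; whether ndd accepts the result is a separate question.

```shell
# Hypothetical filter: read "netstat -an" output on stdin and print the
# tcp_discon_by_addr hex string for every TCP connection whose remote
# port is 2049 (NFS).
netstat_to_hex() {
    awk '$1 == "tcp" && $5 ~ /\.2049$/ {
        nl = split($4, l, ".")   # local:  4 octets + port
        nr = split($5, r, ".")   # remote: 4 octets + port
        if (nl != 5 || nr != 5) next
        printf "%02x%02x%02x%02x%04x%02x%02x%02x%02x%04x\n",
               l[1], l[2], l[3], l[4], l[5],
               r[1], r[2], r[3], r[4], r[5]
    }'
}

# Using the line quoted above:
echo 'tcp 0 1 10.1.31.100.789 172.16.121.14.2049 SYN_SENT' | netstat_to_hex
# prints 0a011f640315ac10790e0801
```

On a live system the input would come from netstat -an piped into the function instead of echo.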
Jesse Delk
Frequent Advisor

Re: NFS share directory

I may have figured out the hex thing. But I did not know if the letters were upper or lower case. I get the error below.


# ndd -set /dev/tcp tcp_discon_by_addr "0A011F640315AC10790E0801"
operation failed, Invalid argument

# ndd -set /dev/tcp tcp_discon_by_addr "0a011f640315ac10790e0801"
operation failed, Invalid argument
Radovan Rovny
New Member

Re: NFS share directory

Hi
Try to run Simon's procedure on connections in CLOSE_WAIT state and not only on 2049 connections.
Also check if it is possible to close the connections by executing following command:

ndd -get /dev/tcp ?|grep tcp_discon_by_addr
Dave Olker
Neighborhood Moderator

Re: NFS share directory

Hi Jesse,

When you say "moved to another location", is the NFS server now using a different IP address? If so, just forcibly tearing down the stale TCP connection between the client and server is not going to cause the filesystem to magically unmount on the client.

The server's old IP address is still stored in the client's mntinfo structure in the kernel, so the next time you access that mount point the client will use that same address. You have to figure out a way to unmount the filesystem, not just rip out the underlying TCP connection.

Since you're running 11.11 (as opposed to 11.23 or 11.31) you don't have the ability to do a forcible unmount. However, you may be able to get the client to unmount the filesystem without rebooting it.

I've written a technical paper that discusses various strategies to manually unmount filesystems in situations like these. The paper is located here:

http://docs.hp.com/en/3929/ForciblyUnmountingNFSFilesystems.pdf

Good luck,

Dave



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Jesse Delk
Frequent Advisor

Re: NFS share directory

A lot of good info in the white paper... easy enough for even me to follow.

The NFS server CT2 has a new IP address. On CT1, I changed every reference I could find so that it points to CT2's new IP address.

I cannot umount because I get the "is busy" error.

When I run netstat -in I only get two interfaces, lan0 and lo0. I don't see any that may be pointing to the old IP address (or similar) to do the ifconfig lan# portion of the white paper.

It looks as though I may have to reboot.
Dave Olker
Neighborhood Moderator

Re: NFS share directory

Hi Jesse,

> When trying to do the netstat -in I only get
> two lan0 lo0. I don't see any that may be
> pointing to the old ip address (or similar)
> to do the ifconfig lan# portion of the white
> paper.

The instructions in the technical paper are to log onto the client and use the ifconfig command to create a temporary virtual interface using the old server's IP address. In this case you would be able to assign the IP address to either your physical interface (lan0) or your loopback interface (lo0). It doesn't matter, as long as the hung NFS mount point can suddenly get a response from the IP address it's sending to.

Before rebooting, I'd give this a try.

1) Log into the NFS client
2) Use the ifconfig command to create a virtual interface using the NFS server's old IP address:

# ifconfig lo0:1 <old server IP address> up

Hopefully creating the temporary virtual interface will cause the client to stop hanging on the mounted filesystem and a bdf command will return an ESTALE (stale file handle error message) for that NFS mount point. At that point you *might* be able to unmount the filesystem successfully. Try a few times, like in the technical paper, because it might take some time for the client's buffer cache pages to flush.

If you're able to get the filesystem to unmount then remove the virtual IP interface from the loopback interface:

# ifconfig lo0:1 0
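
The "try a few times" step can be wrapped in a small retry loop. This is a sketch under assumptions, not something taken from the technical paper: retry is a hypothetical helper, the attempt count and delay are arbitrary, and it relies on a POSIX shell with $(( )) arithmetic. It sets the global variables max, delay and attempt, since POSIX sh has no local variables.

```shell
# Hypothetical helper: run a command up to MAX times, pausing DELAY
# seconds between failed attempts.
# Usage: retry MAX DELAY command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        # Stop once the last allowed attempt has failed.
        [ "$attempt" -ge "$max" ] && break
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "gave up after $max attempts" >&2
    return 1
}

# Example, as root on the NFS client, once the temporary interface is up:
#   retry 5 10 umount /u01/exp
```

If the loop gives up, the virtual interface still has to be removed by hand as shown above.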

Hope this helps,

Dave


