
NFS mount point issue

 
bin_smr
Advisor

NFS mount point issue

Hello Experts ,

Whenever a user logs in to the workstation (HP-UX 11.11, model C3700) and tries to access the mount point that was recently migrated from the old mount point, the CATIA application keeps restarting and throws the error below:

CATRACE FILE : /home/<user_id>/catraceka00075925665.log

                         ********************************
                         * ERROR NUMBER : S0010         *
                         ********************************

I also checked the log files, but they do not show any error.


However, when I place the same .model file from the new mount point in a different location such as /home/<user_id> or /tmp, it works fine: the CATIA application runs perfectly without throwing any error. Could this be a system issue or a mount point issue?

NOTE: Previously the mount point was on an AIX server and it worked fine. Since the migration the mount is served from a Solaris server, and it works fine everywhere except on this particular old workstation.


Can anybody suggest what might be causing this?

25 REPLIES
Dennis Handly
Acclaimed Contributor

Re: NFS mount point issue

>tries to access the mount point

 

What is the new mount point and is this writable by the user?

bin_smr
Advisor

Re: NFS mount point issue

I hope this is helpful. Please find the output below.

1. This is the NFS configuration on the Solaris server where the mount point is located:

bash-3.00# more /etc/default/nfs
NFSD_LISTEN_BACKLOG=32
NFSD_PROTOCOL=ALL
NFSD_SERVERS=16
LOCKD_LISTEN_BACKLOG=32
LOCKD_SERVERS=20
LOCKD_RETRANSMIT_TIMEOUT=5
GRACE_PERIOD=90
NFS_SERVER_VERSMIN=3
NFS_SERVER_VERSMAX=3
NFS_CLIENT_VERSMIN=3
NFS_CLIENT_VERSMAX=3

 

2. This is the HP-UX NFS configuration and mount point output:

bash-2.04# more /etc/rc.config.d/nfsconf
#**********************  NFSCONF  ******************************

# NFS configuration.  See nfsd(1m), mount(1m), pcnfsd(1m)
#
# @(#)B.11.11_LR
#
# NFS_CLIENT:   1 if this node is an NFS client, 0 if not
# NFS_SERVER:   1 if this node is an NFS server, 0 if not
#               Note: it is possible for one host to be a client, a server,
#               both or neither!  This system is an NFS client if you will
#               be NFS mounting remote file systems; this system is a server
#               if you will be exporting file systems to remote hosts.
#               See Also:  nfsd(1M), mount(1M).
# NUM_NFSD:     Number of NFS deamons (nfsd) to start on an NFS server.  Four
#               has been chosen as optimal.
# NUM_NFSIOD:   Number of NFS BIO daemons (biod) to start on an NFS client.
#               Four has been chosen as optimal.
# PCNFS_SERVER: 1 if this node is a server for PC-NFS requests.  This
#               variable controls the startup of the pcnfsd(1M) server.
#               See Also:  pcnfsd(1M).
#
NFS_CLIENT=1
NFS_SERVER=1
NUM_NFSD=30
NUM_NFSIOD=16
PCNFS_SERVER=0
# export feature does not work in this file since files are being
# sourced into another file rc.config and this file is being sourced
# into the startup scripts.

#
# DAEMON OPTIONS
#
# LOCKD_OPTIONS:   options to be passed to rpc.lockd  when it is started.
# STATD_OPTIONS:   options to be passed to rpc.statd  when it is started.
# MOUNTD_OPTIONS:  options to be passed to rpc.mountd when it is started.
#
LOCKD_OPTIONS=""
STATD_OPTIONS=""
MOUNTD_OPTIONS=""
#
# automount configuration
#
# AUTOMOUNT = 0 Do not start automount
# AUTOMOUNT = 1 Start Automount.
# AUTO_MASTER = filename of the master file passed to automount
# AUTO_OPTIONS = options passed to automount
#
AUTOMOUNT=1
AUTO_MASTER="/etc/auto_master"
AUTO_OPTIONS="-f $AUTO_MASTER"
#
# rpc.mountd configuration.  See mountd(1m)
#
# START_MOUNTD: 1 if rpc.mountd should be started by a system startup script.
#               0 if /etc/inetd.conf has an entry for mountd.
#       Note: rpc.mountd should be started from a system startup script,
#       however, it can be started from either nfs.server or inetd, and
#       MUST only be configured in one place.
#
START_MOUNTD=1
#
#autofs configuration.  See autofs(1m)
#
#For the 11.0 Release line both AUTOFS and the old Automount
#are delivered.  In order to invoke the AUTOFS instead of
#you must set the AUTOFS flag to 1.
#
#/usr/sbin/automount is now a script that sources in this file
#Depending on the variable AUTOFS, either AUTOFS or the old
#automount process will execute.  The nfs.client start script
#will also use this variable to start the appropriate process
#during the boot sequence.
#AUTOFS=  0 - use the old automount process.
#         1 - use the new AutoFS.
#AUTOMOUNT_OPTIONS=  - options to the AutoFS automount command
#AUTOMOUNTD_OPTIONS= - options to the AutoFS automountd daemon
#
#The AUTOMOUNT flag still needs to be set for either the old
#automount or new AutoFS to be started by the nfs.client script.
#
AUTOFS=1
AUTOMOUNT_OPTIONS=""
AUTOMOUNTD_OPTIONS=""

 

And below is the new mount point; the user has been given access to it:

10.243.0.54:/projects
                   1882687624 1292630240 590057384   69% /projects
Matti_Kurkela
Honored Contributor

Re: NFS mount point issue

On the Solaris host, please run this command and show the output:

grep /projects /etc/dfs/dfstab

 

On the HP-UX host, please run these commands and show the output:

ls -ld /projects
grep /projects /etc/fstab

 

MK
bin_smr
Advisor

Re: NFS mount point issue

Hello Matti ,

Thanks for your response. Please find the command output below.

On the Solaris host:

bash-3.00# grep /projects /etc/dfs/dfstab
bash-3.00#

The grep above didn't return any output, so I am providing the output of these commands instead, which might be helpful:

bash-3.00# more /etc/dfs/dfstab
# Do not modify this file directly.
# Use the sharemgr(1m) command for all share management
# This file is reconstructed and only maintained for backward
# compatibility. Configuration lines could be lost.
#
#       Place share(1M) commands here for automatic execution
#       on entering init state 3.
#
#       Issue the command 'svcadm enable network/nfs/server' to
#       run the NFS daemon processes and the share commands, after adding
#       the very first entry to this file.
#
#       share [-F fstype] [ -o options] [-d "<text>"] <pathname> [resource]
#       .e.g,
#       share  -F nfs  -o rw=engineering  -d "home dirs"  /export/home2

bash-3.00# df -h
projects               1.8T   1.2T   561G    69%    /projects


On HP-UX host :-

bash-2.04# ls -ld /projects/
drwxr-xr-x  81 root       root            90 Oct  2 09:41 /projects/

bash-2.04# grep /projects /etc/fstab
10.243.0.54:/projects /projects nfs rw,suid 0 0

 

One more thing I want to share: this mount point works perfectly everywhere else, on both C8000 and C3700 model workstations; the problem is only with this one system.

Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

On the HP-UX NFS client, issue the command: showmount -e <NFS server hostname>.  This will confirm the NFS client can send RPC requests to the NFS server and show the list of shared/exported filesystems.  It should also show if the NFS client is in the access lists on the server, assuming there is one.

 

Dave



bin_smr
Advisor

Re: NFS mount point issue

Hello Dave,

Please check the output below:
bash-2.04# showmount -e zeus
export list for zeus:
/impdata/caedata (everyone)
/projects/cadds5-mockups (everyone)
/projects (everyone)
/u1 (everyone)
bash-2.04#
Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

So the server (Solaris) is sharing /projects, the NFS client (HP-UX) is mounting /projects.  What specifically is not working?  Is the problem that users on the client cannot read or write files in /projects?  Is the problem that users cannot lock files in /projects?  Can you be very specific about what the exact problem is and are you able to reproduce the problem outside of your application?



bin_smr
Advisor

Re: NFS mount point issue

Yes, the server (Solaris) is sharing /projects, the NFS client (HP-UX) is mounting /projects, and users on the client can read and write files in /projects. I cannot pin the problem down specifically: on every other system the .model file (inside /projects) works perfectly with the CATIA application, but on this one system, when CATIA opens that particular project and uses the .model file, the application restarts with an S10 or S11 error. Before the migration we installed some NIS patches on all HP-UX systems.

Do I need to install more patches on that particular system to make it work? Can you help me with that?

Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

> Because for every other system that .model file (inside /projects) working perfectly
> for catia application but only in this system at the time of using catia application
> when mounting to that particular project and using the .model file it is getting restarted
> by showing some s10 or s11 error.

 

I'm not familiar enough with the Catia application to know what an S10 or S11 error is or what it means.  Have you checked with Catia to learn what these errors indicate?  This is why I recommended you try to reproduce the *symptom* outside of Catia.  Aside from the application errors, what is the actual symptom you experience?  Does the application hang?  Does it end? 

 

The last time I dealt with a Catia problem the issue turned out to be a file locking problem.  That might explain your situation, where users can read/write in /projects but Catia throws errors.

 

If you can explain exactly what the symptom is, and check with Catia to learn what an s10/s11 error means, it would help us narrow down the underlying cause of the problem.

 

Dave



bin_smr
Advisor

Re: NFS mount point issue

I have already checked on the S10/S11 errors with the CATIA application and in the CATIA log files, but I haven't found anything explaining what an S10/S11 error means.

Outside of the CATIA application everything works fine.

These are the steps to reproduce:

1. After logging in on the particular client (HP-UX, model C3700), run "catia".

2. Now, CATIA - MDL01_NEW_MODEL will start.

3. In the open window, the *.model files appear under the path /projects/prj_300_09/working/ (/projects/prj_300_09/working/*.model).

4. When I select the .model file (L92890791-DRW05_001_DRW01_001.model) to start working with it, the CATIA application keeps restarting and sometimes freezes.

I have spent a lot of time troubleshooting this without success; I was unable to find the meaning of the S10/S11 error in CATIA.

I would also like to point out that when I copy the same .model file (L92890791-DRW05_001_DRW01_001.model) from the /projects mount point to a different location on this client (HP-UX, model C3700), such as /home/<user_id> or /tmp, CATIA starts and works without any error.

Please find the attached screenshot of how the CATIA application is started, for better understanding.

Can you please look into this?

Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

As I said, I have no idea what the CATIA errors mean.  I'm surprised someone from CATIA cannot tell you what these errors mean, or at least offer some suggestions as to how to determine the root cause of the problem.

 

The fact that you're able to load the model file from a local directory but not an NFS directory leads me back to my original guess that this may be an NFS file locking problem.  Since we have nothing else to go on we might as well rule out file locking as the problem.

 

Try this:

 

Log into the NFS client system (11.11) as root and issue the commands:


# ps -e | grep rpc.lockd

# ps -e | grep rpc.statd

 

Find the process ID numbers of both rpc.lockd and rpc.statd.  Then send each of these running daemons a SIGUSR2 signal:

 

# kill -17 <lockd pid>

# kill -17 <statd pid>

 

That will enable debug logging of these daemons.  Now reproduce the CATIA problem.  Hopefully that will only take a minute or two.  Once you've reproduced the problem, send another SIGUSR2 signal to the daemons:

 

# kill -17 <lockd pid>

# kill -17 <statd pid>

 

That should disable debug logging of these daemons.  The debug logfiles should be in the /var/adm directory and they should be called something like rpc.lockd.log and rpc.statd.log.  Collect those files and attach them here.  Also, issue these commands on the 11.11 system:

 

# ls -l /etc/sm

# ls -l /etc/sm.bak

 

Copy/paste the ls output here so I can see which files reside in these two directories.  This information should at least tell us if NFS file locking is involved.

 

Regards,

 

Dave

 

 



bin_smr
Advisor

Re: NFS mount point issue

Please check the attachment; I am also pasting the output below:

bash-2.04# ps -e | grep rpc.lockd
  1084 ?         0:00 rpc.lockd
bash-2.04# ps -e | grep rpc.statd
  1078 ?         0:00 rpc.statd
bash-2.04#
bash-2.04# kill -17 1084
bash-2.04# ps -e | grep rpc.lockd
  1084 ?         0:00 rpc.lockd
bash-2.04# kill -17 1078
bash-2.04# ps -e | grep rpc.statd
  1078 ?         0:00 rpc.statd

Also, the files below are not present under /etc:

# ls -l /etc/sm

# ls -l /etc/sm.bak

One more thing I want to share: after sending the signals to the daemons I restarted the workstation, and when I logged in again with that user account CATIA worked perfectly for a while; then the same thing happened and the application restarted. I am pasting the CATIA output below, where you can see that in the first stage CATIA ran without any error, and in the second stage it fails with the restart error:

 

==========================================================================

 *** OPTIONAL LICENSES REQUESTED ***
 ELWS410  LICENSE UNAVAILABLE                                                   
 GSAS410  LICENSE UNAVAILABLE                                                   
 RASS410  LICENSE OK                                                            
 RASS410  LICENSE STILL AVAILABLE FOR    8 DAYS                                 
 SB5C410  LICENSE UNAVAILABLE                                                   
 SBDC410  LICENSE OK                                                            
 SBDC410  LICENSE STILL AVAILABLE FOR    8 DAYS                                 
 ***** LICENSING CONFIGURATION *****
 PRODUCT GRANTED RASS410  = CATIA.Hybrid Raster Editor Product                    
 PRODUCT GRANTED SBDC410  = CATIA.Mech.Solid-Based Part Design and Detailing Config
Scandir Problem into /tools/CAT425R0/cfg/code/dictionary

 


                          Ended at 09:40:28

==========================================================================


                            END OF CATIA .

==========================================================================

% catlicst
 
>>> Stop all CATIA License Servers on local node argo <<<
    no CATLICSR server currently running
 
    1 CATLICSL server currently running
 
Stopping CATIA License Server using port 2425 on node argo

> CATIA License server stop request transmitted
 
% catclear
============================================

        Submit the utility                 

==========================================================================

                             CATIA START

==========================================================================

 *** OPTIONAL LICENSES REQUESTED ***
 ELWS410  LICENSE UNAVAILABLE                                                   
 GSAS410  LICENSE UNAVAILABLE                                                   
 RASS410  LICENSE OK                                                            
 RASS410  LICENSE STILL AVAILABLE FOR    8 DAYS                                 
 SB5C410  LICENSE UNAVAILABLE                                                   
 SBDC410  LICENSE OK                                                            
 SBDC410  LICENSE STILL AVAILABLE FOR    8 DAYS                                 
 ***** LICENSING CONFIGURATION *****
 PRODUCT GRANTED RASS410  = CATIA.Hybrid Raster Editor Product                    
 PRODUCT GRANTED SBDC410  = CATIA.Mech.Solid-Based Part Design and Detailing Config
Scandir Problem into /tools/CAT425R0/cfg/code/dictionary

 

WARNING :LOG  FILENAME NOT FOUND IN DECLARATION FILES
 /home/ka000759/catia_abend.log IS USED

CATRACE FILE : /home/ka000759/catraceka0007593777.log

                         ********************************
                         * ERROR NUMBER : S0011         *
                         ********************************

                          Ended at 09:41:41

==========================================================================


                          ERROR IN CATIA .

                          RETURN CODE IS 0 .

                          ERROR CODE IS -11 .

==========================================================================


==========================================================================


                          PROCESS RESTARTING...

                          RESTART NUMBER  : 1 .

==========================================================================

 

Dennis Handly
Acclaimed Contributor

Re: NFS mount point issue

>also under /etc those below files are not there: # ls -l /etc/sm /etc/sm.bak

 

These are the old 9.x paths.  Try:

ll /var/statmon/sm /var/statmon/sm.bak

bin_smr
Advisor

Re: NFS mount point issue

Yes, you are right. Please check this:

bash-2.04# ll /var/statmon/sm
total 48
--w------- 1 root root 12 Nov 22 09:38 10.243.0.54
--w------- 1 root root 5 Nov 22 09:38 argo
--w------- 1 root root 5 Nov 22 09:38 zeus
bash-2.04# ll /var/statmon/sm.bak
total 0
Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

I looked at the log files and there is nothing obviously wrong in them.  They do show a lot of lock/test/unlock activity, which confirms my suspicion that NFS file locking is likely involved.  A couple of questions:

 

  • Which server is "10.243.0.54"? 
  • Is there a reason why system argo cannot resolve this IP address to a hostname?
  • From the collected data, do you know exactly when the problem occurs?  I could focus on that time period in the log files to see what kind of activity was happening during the failure.

Dave



bin_smr
Advisor

Re: NFS mount point issue

1. Which server is "10.243.0.54"?

Ans: The SunOS 5.10 server "zeus", which has the address 10.243.0.54.

 

bash-3.00# uname -a
SunOS zeus 5.10 Generic_142900-13 sun4v sparc SUNW,SPARC-Enterprise-T5120

 

2. Is there a reason why system argo cannot resolve this IP address to a hostname?

Ans: argo is an HP-UX workstation. I am also puzzled as to why argo cannot resolve
this IP address to a hostname, because all the log files look fine.

 

Please look at the output below; it might be helpful:

 

# nslookup
Using /etc/hosts on: argo

> 10.243.0.54
Using /etc/hosts on: argo

looking up FILES
Name: zeus
Address: 10.243.0.54
Aliases: zeus.eu.labinal.snecma

> zeus
Using /etc/hosts on: argo

looking up FILES
Name: zeus
Address: 10.243.0.54
Aliases: zeus.eu.labinal.snecma


3. From the collected data, do you know exactly when the problem occurs? I could focus on
that time period in the log files to see what kind of activity was happening during the
failure.

Ans: I believe we have already covered this. The reproduction steps and the CATIA output are the same as in my earlier posts above: CATIA ends normally at 09:40:28, and after the restart it fails at 09:41:41 with ERROR NUMBER S0011, RETURN CODE 0, ERROR CODE -11, followed by "PROCESS RESTARTING".

One more important thing I need to share: previously the NFS mount point was on an AIX server and it worked fine. After the migration the same NFS mount point is served from the Solaris server, and the only problem is with this particular client (HP-UX, model C3700); all the other client machines work perfectly. In my view this is not a problem caused by the migration itself, because before the migration we shared the NFS filesystem from the AIX server and after the migration we share the same NFS filesystem from the SunOS server; the AIX server will be decommissioned shortly, which is why we are pointing the clients at the new SunOS NFS server.

Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

Most NFS file locking problems are caused by hostname resolution problems.

 

     bash-2.04# ll /var/statmon/sm
     total 48
     --w------- 1 root root 12 Nov 22 09:38 10.243.0.54
     --w------- 1 root root 5 Nov 22 09:38 argo
     --w------- 1 root root 5 Nov 22 09:38 zeus

 

The fact that the rpc.lockd and rpc.statd daemons on the HP-UX client are able to resolve zeus at one point but fail to resolve it moments later seems like something we need to fix before moving on.  Some quick questions:

 

1) Does zeus have multiple IP addresses? 

2) Are you NFS mounting filesystems from zeus using different IP addresses? 

3) Are you NFS mounting some filesystems using the hostname "zeus" and other filesystems using zeus' IP address? 

4) How do you have your hostname resolution configured in your /etc/nsswitch.conf file?  I assume you have your "hosts" entry pointing to "files" first since that's what nslookup used in your example.

 

One thing to try:

 

A)  Make sure argo resolves the hostname and IP address for zeus to the same IP address.  (you already did this)

B)  Make sure zeus resolves the hostname and IP address for argo to the same IP address.

C)  Terminate the running rpc.lockd and rpc.statd daemons on argo

D)  Terminate the running rpc.lockd and rpc.statd daemons on zeus

E)  Remove the files in /var/statmon/sm on both systems

F)  Re-start the rpc.lockd and rpc.statd daemons on both systems

G)  Wait 1 minute for the grace period to expire

H)  On argo, try locking a file in one of the NFS filesystems mounted from zeus using a program other than CATIA.  If you do not have a simple lock/unlock program write to me at dave.olker@hp.com and I'll provide one for you. 

 

Once you've made absolutely sure that the non-CATIA program can successfully lock and unlock files on zeus, then try running CATIA again and see if it works.  If you are unable to successfully lock and unlock files on zeus using a test program then we can collect further data and figure out why.

 

Regards,

 

Dave



bin_smr
Advisor

Re: NFS mount point issue


1) Does zeus have multiple IP addresses?

-- > No, zeus doesn't have multiple IP addresses.

2) Are you NFS mounting filesystems from zeus using different IP addresses?

-- > No

3) Are you NFS mounting some filesystems using the hostname "zeus" and other filesystems using zeus' IP address?

-- > Yes, some filesystems are NFS-mounted using the hostname "zeus" and some systems use zeus's IP address.

4) How do you have your hostname resolution configured in your /etc/nsswitch.conf file?  I assume you have your "hosts" entry pointing to "files" first since that's what nslookup used in your example.

-- > Yes. This is the relevant line from the nsswitch.conf file:

hosts:        files dns nis

Note: this format is the same on every system.

----------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------


A)  Make sure argo resolves the hostname and IP address for zeus to the same IP address.  (you already did this)

Ans: I have already done this.

B)  Make sure zeus resolves the hostname and IP address for argo to the same IP address.

-- >

bash-3.00# nslookup
> argo
Server:         10.243.100.20
Address:        10.243.100.20#53

Name:   argo.eu.labinal.snecma
Address: 10.243.200.173
> 10.243.200.173
Server:         10.243.100.20
Address:        10.243.100.20#53

173.200.243.10.in-addr.arpa     name = argo.eu.labinal.snecma.


C)  Kill the running rpc.lockd and rpc.statd daemons on argo

-- >

bash-2.04# ps -e | grep rpc.lockd
  1091 ?         0:00 rpc.lockd
bash-2.04# ps -e | grep rpc.statd
  1085 ?         0:00 rpc.statd
bash-2.04# kill -17 1091
bash-2.04# kill -17 1085
bash-2.04# ps -e | grep rpc.statd
  1085 ?         0:00 rpc.statd
bash-2.04# ps -e | grep rpc.lockd
  1091 ?         0:00 rpc.lockd


D)  Kill the running rpc.lockd and rpc.statd daemons on zeus

-- > Zeus is the NFS master server; if I kill those processes, the rest of the NFS shares may get disturbed or unmounted from the clients. Please suggest what I can do here.


E)  Remove the files in /var/statmon/sm on both systems

Ans: I can remove these files from the client machine only; I can't remove them from the master server, since it is used by other machines.

F)  Re-start the rpc.lockd and rpc.statd daemons on both systems

-- > As I mentioned, I can't do this on zeus.

G)  Wait 1 minute for the grace period to expire

H)  On argo, try locking a file in one of the NFS filesystems mounted from zeus using a program other than CATIA.  If you do not have a simple lock/unlock program write to me at dave.olker@hp.com and I'll provide one for you.

-- > Yes, please provide the lock/unlock program; I will test it on one of the clients and get back to you.



Dennis Handly
Acclaimed Contributor

Re: NFS mount point issue

> Zeus is the NFS master server .. if I kill the process then rest of the NFS share may get disturbed / unmounted from client.

 

Yes, that was always the quandary when cleaning up lock daemons.  Is it better to kill everything just to fix one system?  And if you remove the process/files on zeus, what about all of the other machines that mount from zeus; does it snowball to every machine?

 

Perhaps Dave has some suggestions.

 

>I can't remove them from the master server.. since it is used by other machines

 

You can always remove just the files named argo.

Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

> > Zeus is the NFS master server .. if I terminate the process then rest of the

> > NFS share may get disturbed / unmounted from client.

 

There's no chance that an NFS share will suddenly be unmounted on the NFS client, even if you did terminate and restart the locking daemons.

 

> Yes, that was always the quandary when cleaning up lock demons.

> Is it better to terminate everything to just fix one system?

> And if you remove the process/files on zeus, what about all of the

> other machines that mount from zeus, does it snowball to every machine?

 

Here's the 10,000 foot view of how file locking works with NFS version 3.

 

Lock Request Time

 

1) Client requests a lock

    a.  Client creates an entry in /var/statmon/sm for itself and the NFS server it's locking with

2) Server starts processing the lock

    b.  Server creates an entry in /var/statmon/sm for itself and the NFS client requesting the lock

 

Over time the NFS client will have entries in /var/statmon/sm for itself and any NFS server it has requested a lock from.  Similarly, the NFS server will have /var/statmon/sm entries for itself and any NFS clients it has serviced a lock request for.

 

Lock Recovery Time

 

When a client or server reboots (or a reboot is simulated by terminating and restarting the rpc.lockd and rpc.statd daemons) the rpc.statd daemon will take any files in /var/statmon/sm and move them to /var/statmon/sm.bak and then begin the process of notifying the remote systems that a recovery event has happened.  Once the local system establishes a good connection with the rpc.statd on the remote system it removes the entry from /var/statmon/sm.bak.  That's why any files remaining in /var/statmon/sm.bak after a lock recovery are suspect.  It usually means either the remote system is no longer around, is not up, or the lock daemons are not started on that system.

 

If the local system performing the recovery is an NFS client, it is telling the remote NFS servers "I've just rebooted, so any locks you're holding for me are bogus.  Release any of my stale locks."  This makes sense in a client reboot case because any applications that were holding locks on the server are well and truly gone, so the server should release them and let others acquire them.

 

If the local system performing the recovery is an NFS server, it is telling the remote NFS clients "I've just rebooted so any locks you think you are holding with me are gone.  I'll now give you 1 minute to reclaim those locks before I open the floodgates and let any NFS client grab locks."  This is called the grace period, and only "reclaim" locks are allowed during the grace period.

 

What happens if you delete /var/statmon/sm (or sm.bak) entries and terminate/start lockd/statd?

 

If this is done on an NFS client, any applications holding locks are screwed because the locks they were holding are now gone.  To make things worse, because the /var/statmon/sm entries were deleted, the client will not send a crash recovery notification to the NFS server.  That means the NFS server could potentially be holding a bunch of locks for the client that will never be released (without terminating and restarting lockd/statd on the server).  This means the original client cannot reclaim its lock (because the stale copy of the previous lock in the server's queue is still there, so the server considers the new request a conflicting lock request), but any other NFS clients also cannot get those file locks.

 

If this is done on the NFS server, any applications on remote NFS clients are screwed because the server just destroyed all of its locks.  Because the /var/statmon/sm entries were deleted, the server has no idea which NFS clients to contact to reclaim the locks they're still holding.  This means clientA could be the legitimate lock owner but clientB could ask the server for the same lock and the server will grant it, because clientA's lock was destroyed during this process of deleting /var/statmon/sm files and restarting the daemons.

 

Good times, no?

 

 

> Perhaps Dave has some suggestions.

 

As you can see from the above description, it's not too hard to imagine scenarios where NFS file locking can get wedged resulting in lock denial, application failure, etc.  In crash & burn environments where you're just trying to get things working properly there's usually little harm in terminating/restarting the daemons and wiping out the statmon files.  That typically wipes the slate clean and lets the daemons re-establish a good connection. 

 

From the earlier description it sounded like this environment was more of a test/dev environment where things were being tested and systems were moving around.  For that reason I suggested shaking the systems like an Etch-a-Sketch and starting over.  Apparently that's not the situation.

 

Since Zeus is a production NFS server we can postpone any potentially destructive modifications to Zeus until other possibilities are ruled out.  For the time being we can focus on the NFS client.  Assuming CATIA is the only application doing file locking (which it appeared to be from the log files), you should be able to terminate/start the daemons on the client without hurting anything.  Again, you'll have to wait for 60 seconds before testing since the grace period will be in effect. 

 

I've attached a sample file locking program you can use to test outside of CATIA.  The program should compile fine with default options.  To try locking a file the syntax would be "lockprog -f <file>".  You can try running this against a local file first to ensure the program compiled correctly.  Then try running this against a file in an NFS-mounted filesystem on server Zeus.
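
For reference, here is a minimal sketch of that kind of lock/unlock test, assuming plain POSIX fcntl() locking and a hypothetical -f option as described above; the actual lockprog.c attached here may well differ.  Try it first against a local file you own (e.g. in /tmp), then against a writable file under /projects.  Over an NFSv3 mount the fcntl() lock is serviced by rpc.lockd/rpc.statd, so if the NFS case hangs or fails while the local case succeeds, the NLM/NSM path between argo and zeus is the likely culprit.

/* Minimal lock test sketch -- NOT the actual lockprog.c attached above.
 * Takes -f <file>, grabs an exclusive fcntl() lock on the whole file,
 * waits for Enter, then releases the lock.  Over an NFSv3 mount the
 * fcntl() call is handled by rpc.lockd/rpc.statd on client and server. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    char *path = NULL;
    struct flock fl;
    int c, fd;

    while ((c = getopt(argc, argv, "f:")) != -1)
        if (c == 'f')
            path = optarg;
    if (path == NULL) {
        fprintf(stderr, "usage: %s -f <file>\n", argv[0]);
        return 1;
    }
    if ((fd = open(path, O_RDWR)) < 0) {
        perror("open");
        return 1;
    }

    fl.l_type   = F_WRLCK;               /* exclusive (write) lock    */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                     /* 0 = lock the entire file  */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {  /* blocks until granted      */
        perror("fcntl(F_SETLKW)");
        return 1;
    }
    printf("lock granted on %s -- press Enter to unlock\n", path);
    getchar();

    fl.l_type = F_UNLCK;                 /* release the lock          */
    if (fcntl(fd, F_SETLK, &fl) < 0)
        perror("fcntl(F_UNLCK)");
    close(fd);
    return 0;
}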

 

Hope this helps,

 

Dave



Dennis Handly
Acclaimed Contributor

Re: NFS mount point issue

>it's not too hard to imagine scenarios where NFS file locking can get wedged

 

So I was right to be very very afraid.  :-)

bin_smr
Advisor

Re: NFS mount point issue

I tried to compile the "lockprog.c" program. The gcc build only gives a warning, but the bundled cc throws the errors below:


bash-2.04# /usr/bin/gcc /var/statmon/sm/lockprog.c -o lockprog
/usr/ccs/bin/ld: (Warning) At least one PA 2.0 object file (/var/tmp//ccEZ7jmd.o) was detected. The linked output may not run on a PA 1.x system.

bash-2.04# /usr/bin/cc /var/statmon/sm/lockprog.c -o lockprog
(Bundled) cc: "/var/statmon/sm/lockprog.c", line 235: warning 5: "const" will become a keyword.
(Bundled) cc: "/var/statmon/sm/lockprog.c", line 235: error 1000: Unexpected symbol: "const".
(Bundled) cc: "/var/statmon/sm/lockprog.c", line 273: warning 5: "const" will become a keyword.
(Bundled) cc: "/var/statmon/sm/lockprog.c", line 273: error 1000: Unexpected symbol: "const".
(Bundled) cc: "/var/statmon/sm/lockprog.c", line 344: warning 5: "const" will become a keyword.
(Bundled) cc: "/var/statmon/sm/lockprog.c", line 344: error 1000: Unexpected symbol: "const".


Also, when I try to kill the daemons with "kill -17", they are not killed:

--- >

bash-2.04# ps -e | grep rpc.lockd
  1091 ?         0:00 rpc.lockd
bash-2.04# ps -e | grep rpc.statd
  1085 ?         0:00 rpc.statd
bash-2.04# kill -17 1091
bash-2.04# kill -17 1085
bash-2.04# ps -e | grep rpc.statd
  1085 ?         0:00 rpc.statd
bash-2.04# ps -e | grep rpc.lockd
  1091 ?         0:00 rpc.lockd

Please find the compile output in the attachment and suggest what I need to do.

Dennis Handly
Acclaimed Contributor

Re: NFS mount point issue

># /usr/bin/gcc /var/statmon/sm/lockprog.c -o lockprog


This works fine; use this one.  (But you shouldn't have put the source in that directory; you may want to remove it from there.)


># /usr/bin/cc /var/statmon/sm/lockprog.c
>(Bundled) cc: error 1000: Unexpected symbol: "const".

 

The bundled K&R compiler can't compile it, so use the gcc version.
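
For illustration only (this is a hypothetical fragment, not the actual lockprog.c source), the errors above are the kind of thing the bundled K&R cc prints when it meets an ANSI-style const qualifier, which gcc handles fine:

/* Hypothetical fragment -- not the real lockprog.c lines 235/273/344.
 * The const-qualified parameter below is ANSI C; a K&R compiler such as
 * the bundled cc rejects it ("Unexpected symbol: const"), while gcc,
 * being an ANSI compiler, accepts it. */
#include <stdio.h>

static void report(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}

int main(void)
{
    report("requires an ANSI C compiler");
    return 0;
}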

 

>I am trying to kill the daemons "kill -17"... it's not killing the daemons.

# kill -17 1091

 

If you want to kill them, do a normal kill, not "kill -USR2".  The latter toggles logging.

Dave Olker
Neighborhood Moderator

Re: NFS mount point issue

No, do NOT do a normal kill. The SIGUSR2 signal is correct: it does not terminate the daemon, it merely tells the daemon to toggle debug logging on and off. We're not trying to terminate any daemons here, just capture logging.

Dave


