02-24-2006 08:50 AM
NFS monitor script causing package to fail
02-26-2006 07:49 AM
Re: NFS monitor script causing package to fail
If you can look to see if the change between ".02" and ".03" is related to the monitoring, that might speed up resolution of your problem.
02-26-2006 08:23 PM
Re: NFS monitor script causing package to fail
On every version of the Linux kernel I am aware of, the kernel routines that ps uses to traverse the /proc filesystem and return the list of pids can occasionally miss some, so ps may fail to report a process even though it is running. This is especially likely on a highly dynamic system where processes start and exit at a high rate.
Therefore, unfortunately, you cannot rely on ps to tell you whether a process is running, even if an individual process is checked using the ps -p option.
The solution is to check for processes by looking at the proc filesystem directly. For example, it is common for toolkits to use code similar to:
pid=`ps $p_pid | grep $PROC | awk '{print $1}'`
if [ -z "$pid" ]; then
or maybe
pid=`ps -p $p_pid | awk '{print $1}'`
if [ -z "$pid" ]; then
These can both fail. Instead this should be replaced by:
grep $PROC /proc/$p_pid/stat >/dev/null
if [ $? -ne 0 ]; then
and you should then not experience any trouble. By going directly to the individual process's proc file entry, we bypass the buggy kernel routines that cause the problem.
I believe the toolkits are being updated to use this method rather than ps, but I do not know whether they have all been done, and I do not know which version you are using.
I suggest checking for ps usage and replacing it with direct proc file checking. Be careful how you check the proc filesystem, since you could hit the same kernel defect; if you use code similar to that shown above, you should be fine.
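As a rough illustration of the direct /proc check described above, the test could be wrapped in a small function. This is only a sketch, not code from any toolkit: the function name is_running and the PROC_ROOT override (useful for trying it against a fake directory) are hypothetical.

```shell
# Sketch: test for a process via its own /proc entry instead of ps.
# PROC_ROOT and is_running are hypothetical names, not toolkit code.
is_running() {
    pid=$1     # process ID to test
    name=$2    # command name expected for that PID
    stat_file="${PROC_ROOT:-/proc}/$pid/stat"
    # No readable stat file means no such process.
    [ -r "$stat_file" ] || return 1
    # Field 2 of stat is the command name in parentheses, e.g. "(nfsd)".
    grep -qF "($name)" "$stat_file" 2>/dev/null
}
```

It would be called as, for example, is_running "$p_pid" "$PROC". Note that the kernel truncates the command name in the stat file to 15 characters, so longer names would need to be shortened before comparing.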
02-28-2006 07:18 AM
Re: NFS monitor script causing package to fail
for k in /proc/*
do
    if [ ! -f $k/stat ]; then
        continue
    fi
    pid=`grep "($1)" $k/stat`
    if [ ! -z "$pid" ]; then
        break
    fi
done
I haven't had time to check for any other differences. I'm wondering if it would be safe to just substitute this nfs.mon script (from A.01.03) to see if that corrects the problem. Unfortunately, I don't have the resources/time to set up a test NFS environment.
03-01-2006 01:37 AM
Re: NFS monitor script causing package to fail
In the NFS toolkit, monitoring of the NFS daemons is done at two levels. First, it checks the status of the NFS services using the rpcinfo command (e.g. rpcinfo -u 127.0.0.1 100003 2). The ps command is used to check the status of the NFS daemons only if rpcinfo fails. To find the exact problem, we need to understand why rpcinfo is failing on the production machine, so it would help to get the output of the command "rpcinfo -u 127.0.0.1 100003 2" on the production machine.
Substituting the nfs.mon script from A.01.03 into A.01.02 will not work, because nfs.mon in A.01.03 makes use of hanfs.conf, which is not present in A.01.02.
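The two-level check described in this reply might look roughly like the sketch below. It is not the toolkit's nfs.mon code: check_nfsd, RPCINFO and PROC_ROOT are hypothetical names, and the second level reads /proc rather than calling ps.

```shell
# Sketch of a two-level check: rpcinfo first, /proc only for diagnosis.
# RPCINFO, PROC_ROOT and check_nfsd are hypothetical names.
check_nfsd() {
    # Level 1: ask whether NFS (program 100003, version 2) answers over UDP.
    if ${RPCINFO:-rpcinfo} -u 127.0.0.1 100003 2 >/dev/null 2>&1; then
        echo "nfsd: responding"
        return 0
    fi
    # Level 2: rpcinfo failed, so look for an nfsd process
    # to tell "dead" from "hung".
    for stat_file in "${PROC_ROOT:-/proc}"/[0-9]*/stat; do
        if grep -qF "(nfsd)" "$stat_file" 2>/dev/null; then
            echo "nfsd: process present but not responding"
            return 1
        fi
    done
    echo "nfsd: no process found"
    return 1
}
```

Either failure path returns non-zero; the process lookup only changes the message that would be logged.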
03-01-2006 08:18 AM
Re: NFS monitor script causing package to fail
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
Since going "live" in May of 2005, we've seen this problem occur 6/15, 12/21, 2/2 and 2/24 with 3 occurrences on the primary and once on the secondary.
03-06-2006 01:08 AM
Re: NFS monitor script causing package to fail
We have not been able to reproduce the problem in our test environment, because the NFS monitoring script fails only once in a while. There may also be an issue with the NFS server itself that is causing the package to fail over. Can you please confirm that the package failover happens even though all the NFS daemons are running? Please check the NFS log files and let us know.
I feel it would be better to upgrade your toolkit to A.01.03, as it no longer uses ps.
Thanks,
Asha
03-07-2006 11:59 AM
Re: NFS monitor script causing package to fail
When we migrate to new NFS servers, we can upgrade to A.01.03 (or whatever version is available) but we can't justify it now unless this problem gets much worse.
03-07-2006 10:06 PM
Re: NFS monitor script causing package to fail
echo "217" >|/proc/sys/sunrpc/nfs_debug
echo "217" >|/proc/sys/sunrpc/nfsd_debug
After enabling, all the debugging messages of NFS will go to /var/log/messages.
NFS is built on the RPC mechanism. If the rpcinfo command fails, NFS will not work even if the NFS daemon (nfsd) is running. The monitor script nfs.mon therefore works by periodically checking the status of the NFS services using the rpcinfo command. If any service fails to respond, the script exits, causing a switch to an adoptive node.
The monitor script monitors NFS services including:
• portmap
• rpc.statd
• nfsd
• rpc.mountd
• rpc.rquotad
• lockd
If any of these services is dead or hung, nfs.mon will cause the package to fail.
So the monitoring of the NFS daemons is done using the rpcinfo command. The "ps" command is used only for logging whether the process is dead or hung.
Can you please post your package log file so that we can investigate further?
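For illustration, a per-service rpcinfo probe along the lines described in this reply could be sketched as follows. This is an assumption about the general shape, not the actual nfs.mon logic: check_all_services and RPCINFO are hypothetical names, and the numbers are the standard RPC program numbers for the services listed above.

```shell
# Sketch only: probe each RPC service once, fail on the first non-responder.
# RPCINFO and check_all_services are hypothetical names; the numbers are
# the standard RPC program numbers for the listed services.
check_all_services() {
    for entry in portmap:100000 rpc.statd:100024 nfsd:100003 \
                 rpc.mountd:100005 rpc.rquotad:100011 lockd:100021; do
        name=${entry%%:*}      # text before the colon
        prog=${entry##*:}      # text after the colon
        if ! ${RPCINFO:-rpcinfo} -u 127.0.0.1 "$prog" >/dev/null 2>&1; then
            echo "$name (program $prog) is not responding"
            return 1
        fi
    done
    return 0
}
```

A monitor loop would call this periodically and exit on a non-zero return, which is what triggers the package failover.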
03-08-2006 04:59 AM
Re: NFS monitor script causing package to fail
I've also attached the nfs package log file nfsla1.cntl.log and nfs monitor service log file hanfs.sh.log from the primary cluster server.
03-08-2006 09:22 PM
Re: NFS monitor script causing package to fail
echo "000" >|/proc/sys/sunrpc/nfs_debug
echo "000" >|/proc/sys/sunrpc/nfsd_debug
I am not sure of the above commands. Please check them.
I understand from the attached log files that the monitoring script failed for the following reasons:
1) On May 23, rpc.mountd process was not up and running.
2) On Jun 15, rpcinfo failed to find nfsd but nfsd process was running.
3) On Oct 17, rpcinfo failed to find mountd but rpc.mountd process was running.
4) On Dec 21, rpcinfo failed to find mountd but rpc.mountd process was running.
5) On Feb 24, rpcinfo failed to find nfsd but nfsd process was running.
From the log messages, I understand that rpcinfo sometimes fails to work. I feel you should increase RETRY_INTERVAL and RETRY_TIMES[] for mountd and nfsd in nfs.mon.
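To show why larger retry settings would help with an intermittent rpcinfo failure, here is a minimal retry sketch. RETRY_TIMES and RETRY_INTERVAL mirror the names mentioned above, but the scalar form, the default values and the function probe_with_retry are assumptions for illustration, not the way nfs.mon implements them.

```shell
# Sketch only: retry a probe before declaring failure.
# RETRY_TIMES and RETRY_INTERVAL mirror the names in the post; the scalar
# form, the defaults and probe_with_retry are assumptions for illustration.
probe_with_retry() {
    attempt=1
    while [ "$attempt" -le "${RETRY_TIMES:-3}" ]; do
        # "$@" is the probe command, e.g. rpcinfo -u 127.0.0.1 100005
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Wait between attempts, but not after the last one.
        [ "$attempt" -le "${RETRY_TIMES:-3}" ] && sleep "${RETRY_INTERVAL:-2}"
    done
    return 1
}
```

For example, probe_with_retry rpcinfo -u 127.0.0.1 100003 2 would report failure only after every attempt has failed, so a single transient rpcinfo miss would not fail the package.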