03-19-2004 10:42 AM
Lost in Las NFS
After several hours I tracked down a weird problem, but I can't find a final solution.
- Several Tru64 systems (5.1a) use NFS drives for the home accounts (via NIS).
- Suddenly the login "hangs" (my current guess at the cause is a change in the internal domain name).
- This only happens to users who have ksh as their login shell.
I browsed through all related articles in this forum and did a lot of searching on Google. I tested every hint I found:
- showmount
- rpcinfo
- netstat -ai
- netstat -rn
Finally I checked the login process with ps from a privileged account: the process is hanging in "lockcntl". Using lockcntl as a keyword in Google gives only a few hits, most of them related to NFS problems.
So I created some more local users with home directories on several NFS shares (served by systems that differ in OS and version). Result: the login hangs.
With a locally attached drive: no problem.
Do you have any hint on what to check next? Or does one of you have the magic rabbit?
Any help is greatly appreciated.
With kind regards
Andreas
03-19-2004 09:10 PM
Re: Lost in Las NFS
First, this occurs when ksh wants to do a flock on your history file. As the Bourne shell doesn't have a history file, it will not happen there.
You should check whether rpc.lockd and rpc.statd are running on the NFS server.
You could also post the output of the following command, executed on the NFS server:
# rpcinfo -p
Joris
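To test the locking itself without going through a ksh login, a small probe can request a lock on a scratch file in the affected home directory. This is only a sketch, assuming Python 3 is available on a Unix client; probe_lock and LockTimeout are hypothetical names, and it uses an fcntl-style lock, which may not be exactly the call ksh makes. On a hard-mounted NFS share the lock request may block uninterruptibly, in which case the alarm cannot rescue the process.

```python
import fcntl
import os
import signal
import tempfile


class LockTimeout(Exception):
    """Raised when the lock request does not return in time."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def probe_lock(directory, timeout=10):
    """Try to take and release an exclusive lock on a scratch file.

    Returns True if the lock was granted, False if the request was
    refused or did not return within `timeout` seconds.
    Must be called from the main thread (it uses SIGALRM).
    """
    fd, path = tempfile.mkstemp(prefix=".lockprobe.", dir=directory)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        # Non-blocking exclusive lock over the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except (LockTimeout, OSError):
        return False
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
        os.close(fd)
        os.unlink(path)
```

Running probe_lock against a local directory and against the NFS home directory should show whether only the NFS path fails to grant locks.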
03-19-2004 09:22 PM
Re: Lost in Las NFS
Thanks for the quick response. rpcinfo -p gives correct answers for all three NFS servers.
The servers are:
- NAN01 (a Network Appliance alpha based system)
- NAN03 (a Network Appliance NFS server)
- decbb11 (a Tru64 5.1a NFS server on an ES40)
The weird thing: they all function without any problem, but obviously ksh has a problem.
I think I'll set up tcpdump on the test system and analyze the IP traffic. Is there anything in particular to look for? Or is there any other way to analyze the process hanging in the "lockctl" state?
With kind regards
Andreas
03-19-2004 11:13 PM
Re: Lost in Las NFS
Andreas,
When you do a "mount -l -t nfs", do you see only the hostnames of the servers, or their fully qualified names? If fully qualified, is it the old or the new domain name?
Are the NFS servers in /etc/hosts mapped to the new domain?
Is /etc/resolv.conf correctly configured for the transition from the old to the new domain name?
JB.
_JB_
03-20-2004 02:24 AM
Re: Lost in Las NFS
I'll test this suggestion (mount -l -t nfs) on Monday.
I'm quite sure that all the hassle came from the untested domain name change. Yes, the site mostly uses unqualified host names.
Maybe I'll see more on Monday. Do you have any more hints? The file system IS mounted, and it can be accessed without any problem.
03-21-2004 07:21 PM
Re: Lost in Las NFS
03-22-2004 06:20 AM
Re: Lost in Las NFS
I did some more analysis (thanks for the hints so far).
I'm now able to reproduce the problem on any client on-site.
Conditions:
- The user account uses an NFS share as its home directory
- ksh is the login shell
Nothing else matters.
The box I used for further testing is a 5.1B system with the ECO1 patch kit installed.
I installed and configured tcpdump for further testing.
Here is some output:
decn02.b600de6a > nan01.nfs-v3: 116 call getattr fh 1563813.121416221.32.2782664448
nan01.b600de6a > decn02.nfs-v3: 112 reply getattr {dir size 94208 mtime 1079975652.929602000 ctime 1079975652.929602000}
decn02.b700de6a > nan01.nfs-v3: 120 call access fh 1563813.121416221.32.2782664448 want: lookup
nan01.b700de6a > decn02.nfs-v3: 120 reply access {dir size 94208 mtime 1079975652.929602000 ctime 1079975652.929602000} permitted: lookup
decn02.b800de6a > nan01.nfs-v3: 120 call access fh 1563813.121416221.32.2782664448 want: lookup
nan01.b800de6a > decn02.nfs-v3: 120 reply access {dir size 94208 mtime 1079975652.929602000 ctime 1079975652.929602000} permitted: lookup
decn02.b900de6a > nan01.nfs-v3: 128 call lookup { fh 1563813.121416221.32.2782664448 ".profile"}
nan01.b900de6a > decn02.nfs-v3: 116 reply failed, status No such file or directory: lookup dir {dir size 94208 mtime 1079975652.929602000 ctime 1079975652.929602000}
(This happens during login on the share; there is no .profile in the directory, but this isn't the problem.)
I found another interesting message exchange:
decn02.65982e02 > nan01.pmap-v2: 56 call getport prog "nlm" V4 prot UDP port 0 (DF)
nan01.65982e02 > decn02.pmap-v2: 28 reply getport 690
decn02.66982e02 > nan01.nlm-v4: 164 call lock_msg cookie 0x84 noblock,excl lock: {"decn02.xxx.de" svid 396160 l_offset 0 l_len 0} not-reclaim state 0 (DF)
I translate this into:
- DECN02 asks NAN01 for the port of program nlm (the NFS lock manager)
- NAN01 answers and offers port 690
- DECN02 sends a lock request (lock_msg, cookie 0x84), but gets no answer.
I asked the guys to check these things:
1) Does NAN01 resolve the hostname correctly?
2) Is there a firewall between these two systems?
Do you have another hint?
Thanks in advance.
The results of rpcinfo -p decn02 and rpcinfo -p nan01 look okay; both systems are running the lock manager. But I'm quite sure this isn't a problem in the boxes themselves, because the systems have been running for months without a problem.
Regards
Andreas
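Reading a long trace by eye is error-prone, so the matching of calls and replies shown above can be sketched in a few lines. This assumes the tcpdump text format seen in this thread (host.xid > host.program: length call|reply ...); the regular expression and the name unanswered_calls are illustrative only, not part of any tool. Note that NLM *_msg procedures are asynchronous: the server normally answers with a separate *_res call in the opposite direction rather than with a reply carrying the same xid, so a lock_msg in the result only means "look for what the server sent back next".

```python
import re

# Matches lines such as:
#   decn02.65982e02 > nan01.pmap-v2: 56 call getport prog "nlm" ...
LINE_RE = re.compile(
    r"^(?P<src>\S+)\.(?P<xid>[0-9a-f]+) > (?P<dst>\S+)\.(?P<prog>[a-z0-9-]+): "
    r"\d+ (?P<kind>call|reply)\b(?P<rest>.*)$"
)


def unanswered_calls(lines):
    """Return (xid, src, dst, program, procedure) for every call
    that has no later reply with the same xid."""
    pending = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # continuation lines or unrelated traffic
        if m["kind"] == "call":
            words = m["rest"].split()
            proc = words[0] if words else "?"
            pending[m["xid"]] = (m["xid"], m["src"], m["dst"], m["prog"], proc)
        else:
            pending.pop(m["xid"], None)
    return list(pending.values())
```

Fed with the three portmapper and lock manager lines quoted above, it reports only the lock_msg call from decn02 to nan01.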
03-22-2004 07:37 PM
Re: Lost in Las NFS
03-22-2004 09:23 PM
Re: Lost in Las NFS
I've found a very old article at
http://www.faqs.org/faqs/sgi/faq/apps/section-26.html
from 1995.
Date: 15 Oct 1995 00:00:01 EST
ksh(1) uses a single ~/.sh_history file for all of a given user's ksh
processes, so must be able to lock that file. Locking is robust for
local files but not over NFS. Install patch 547 (or its successor) to
fix some known NFS bugs and be sure lockd is 'chkconfig'ed on and
rpc.lockd and rpc.statd are actually running. If all else fails, set
the HISTFILE environment variable to a file on a local disk.
We'll try the HISTFILE workaround and, in parallel, analyze the locking behavior.
To check for correct behaviour I'll look into the daemon.log files of the NFS server and compare the rpcinfo output of client and server, correct? Are there any other log files to check?
Regards
Andreas
Relevant Cases I've found are:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=296907
Will this work, too?
http://www4.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&admit=-938907319+1080037159406+28353475&docId=200000063201877
03-22-2004 10:14 PM
Re: Lost in Las NFS
03-24-2004 07:07 AM
Re: Lost in Las NFS
Very important for the lock manager is correct resolution in both directions: name resolution and reverse lookup. In this case the reverse lookup for the IP address of the NFS server returned a different name. It was that easy.
Thanks a lot for all the hints.
Regards
Andreas
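The two-way check described in this post can be scripted. The following is a minimal sketch, assuming Python 3 on any host that uses the same resolver configuration as the NFS client; check_forward_reverse is a hypothetical helper name, and the resolver functions are parameters so the logic can be tried without touching DNS or NIS.

```python
import socket


def check_forward_reverse(name,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name -> address -> name and report whether they agree.

    The lookup counts as consistent if the original name appears as
    the canonical name or as one of the aliases of the reverse lookup.
    """
    address = forward(name)
    canonical, aliases, _addresses = reverse(address)
    known = {canonical.lower()} | {alias.lower() for alias in aliases}
    return {
        "name": name,
        "address": address,
        "reverse": canonical,
        "consistent": name.lower() in known,
    }
```

Calling check_forward_reverse("nan01") on the client, and the equivalent for the client's own name from the server side, should expose the kind of mismatch that caused the hang here.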
03-24-2004 07:25 PM
Re: Lost in Las NFS
Checking locking also means having a look into the lock directory and checking the names stored there as files (in the sm directory). If there are unqualified hostnames in it, you have the reason for the malfunction.
During this check you will also find any name resolution problems, because resolution is part of the checks.
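A rough sketch of that directory check, assuming Python 3 and that the status monitor stores one file per monitored host: the location of the sm directory differs between systems and is not given in this thread, so it is passed in as a parameter, and both function names are hypothetical.

```python
import os


def unqualified_names(names):
    """Return the entries that contain no dot, i.e. hostnames
    without a domain part (IP addresses and FQDNs pass)."""
    return sorted(name for name in names if "." not in name)


def scan_sm_dir(sm_dir):
    """List the sm directory (path is an assumption supplied by the
    caller) and return the unqualified hostnames found in it."""
    return unqualified_names(os.listdir(sm_dir))
```

Any name it returns is a candidate for the forward and reverse lookup check mentioned earlier in the thread.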