
HP-UX hangs while Solaris/Linux work over NFS

 
Niraj Kumar Verma
Trusted Contributor


A brief summary of the problem(s):-

On Monday 9th May the system administrator reported problems with his HP-UX
systems. No users could log in to the systems.

On investigation it appeared that the automounter was crashing. It could be restarted
but would crash again two minutes later. Everything pointed to a name resolution problem.

Initially we attempted remote access via telnet. This was very difficult because the
sessions were continually hanging and we had to re-establish connections.

Initially we thought it was an RPC problem. However, rpcinfo responded as expected. We
then started to use tusc to get an idea of what was happening at the lower levels. There
were no clear indicators.

Late Monday evening the system administrator rebooted the whole Unix network.

Early Tuesday morning the problem was still there. After getting advice from a number
of sources, further tests were carried out. A conflicting IP address was found on one machine:
a mismatch between hosts and NIS. This was corrected but did not resolve all the problems.
It was also noticed that nslookup was hanging, which again points to name resolution.
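
For anyone wanting to repeat the hosts/NIS comparison, here is a minimal sketch. It assumes both inputs are in hosts-file format (address, name, aliases); the function name hosts_mismatch and the temporary file path are made up for illustration and are not what was run on the day.

```shell
# Print "name first_ip second_ip" for every host name or alias whose address
# differs between two hosts-format files. Comments and blank lines are ignored.
hosts_mismatch() {
    awk '
        { sub(/#.*/, "") }                 # strip trailing comments
        NF < 2 { next }                    # skip blank or incomplete lines
        FNR == NR {                        # first file: remember name -> address
            for (i = 2; i <= NF; i++) first[$i] = $1
            next
        }
        {                                  # second file: report differing addresses
            for (i = 2; i <= NF; i++)
                if (($i in first) && first[$i] != $1)
                    print $i, first[$i], $1
        }
    ' "$1" "$2"
}
```

On an NIS client this could be used as "ypcat hosts > /tmp/nis_hosts" followed by "hosts_mismatch /etc/hosts /tmp/nis_hosts". It only reports names that exist in both files, so a host missing from one side will not show up.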

------------ some more tests ----

If my home account is on a local disk I can log in as a normal user
and everything is fine. I can then cd to any automounted directory.
Everything seems to work.

If my home directory is automounted my login process hangs.
We have used the following to get some debug:-

/usr/local/bin/tusc su - niraj

The output is attached below. The login process hangs at the point I have highlighted
in red, waiting for a file lock. If we press Ctrl-C the login process completes.

As far as we can tell the locking daemons on the netapp are running fine.
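
One way to check this from a client is to probe the lock-related RPC services over UDP. This is only a sketch: check_lock_services and the RPCINFO override are hypothetical names, and it assumes the services are registered under the usual names status and nlockmgr.

```shell
# Probe the NFS lock-related RPC services on a host over UDP.
# RPCINFO may be overridden (for example with a stub); it defaults to rpcinfo.
check_lock_services() {
    host=$1
    rc=0
    for prog in status nlockmgr; do
        if ${RPCINFO:-rpcinfo} -u "$host" "$prog" >/dev/null 2>&1; then
            echo "$prog: responding on $host"
        else
            echo "$prog: NOT responding on $host"
            rc=1
        fi
    done
    return $rc
}
```

For example, "check_lock_services netapp002" tests the filer. Running the same check against the HP-UX client's own host name would test the client side of the locking, which may be the more interesting half here.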

This problem is happening in a mixed Solaris, Linux and HP-UX environment.
Solaris and Linux are OK; it is only on the HP-UX boxes that we see the problem.
This applies to boxes running 10.20, 11.00 and 11.11. (The NIS master
is on a Solaris box.)


Quite a number of responses. We discovered that the file lock was on the HISTFILE.
The following line was added to /etc/profile:-

typeset -r HISTFILE=/tmp/......

This allowed us to log straight in.
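
A rough sketch of that kind of workaround, keeping the history file on local disk so the shell never asks for a lock on the NFS home directory. The helper name local_histfile and the file-name pattern are assumptions for illustration; they are not the actual value used above.

```shell
# Hypothetical helper: build a per-user history file path on a local disk.
# The directory and the ".sh_history.<user>" pattern are assumptions.
local_histfile() {
    # $1 = user name, $2 = local directory (defaults to /tmp)
    printf '%s/.sh_history.%s\n' "${2:-/tmp}" "$1"
}

# In /etc/profile this might be used as:
#   HISTFILE=$(local_histfile "$LOGNAME")
#   export HISTFILE
#   readonly HISTFILE
```

Marking the variable read-only stops a user's own .profile from pointing it back at the automounted home directory. This hides the symptom only; the underlying lock problem remains.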

We then tried cadenv. This hung until we moved .cadhist to /tmp and created a symlink
in the cadbin directory.

So we have a lock problem.

We also noticed that nslookup hung. When we used truss we could see that it was hanging on
ypwhich. When we tried ypwhich on the command line this also hung, but timed out with
the message:-

sgpic12 is not running portmapper.

rpc.bind was still listed in the process table.

we tried:-

rpcinfo -T udp sgpic12

this gave

cant contact rpc.bind

We restarted all the RPC services and everything works for the first few minutes
and then dies again. Something is killing the portmapper.
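
To pin down when the portmapper stops answering, a small polling loop can help line the failure up with syslog or the lockd debug output. This is a sketch only; probe_until_fail is a made-up helper, and the interval and retry count in the example are arbitrary.

```shell
# Run a probe command repeatedly; on the first failure print a timestamp and stop.
# Usage: probe_until_fail <interval_seconds> <max_tries> <command> [args...]
probe_until_fail() {
    interval=$1
    max=$2
    shift 2
    n=0
    while [ "$n" -lt "$max" ]; do
        if "$@" >/dev/null 2>&1; then
            n=$((n + 1))
        else
            echo "probe failed at $(date) after $n successful tries"
            return 1
        fi
        sleep "$interval"
    done
    echo "probe still succeeding after $max tries"
    return 0
}
```

For example, "probe_until_fail 10 60 rpcinfo -p sgpic12" asks the portmapper on sgpic12 for its registrations every 10 seconds, for up to 10 minutes, and reports the time of the first failure.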

We sent signal 17 to lockd (kill -17), which switches on debugging.

It has errors :-

Max Addr get retries reached: deleting host =>> netapp002 ( Network appliances filer)

which is one of the main file servers.

This is where we are now.

Any help ??

Thanks & Regards
- Niraj Verma
Niraj.Verma@philips.com
5 REPLIES
harry d brown jr
Honored Contributor

Re: HP hungs and solaris/linux work NFS



WHAT CHANGED?

Something had to change to cause this to fail.

Did someone update the Master NIS servers? Was a server added to the database?

A bunch of HP-UX boxes don't just quit behaving correctly unless something was modified. I suspect something was changed on the Solaris NIS master.

live free or die
harry d brown jr
Live Free or Die
Niraj Kumar Verma
Trusted Contributor

Re: HP hungs and solaris/linux work NFS

This started after the following modifications:

-- 1 HP-UX NIS client added
-- 5 Linux clients added
-- NIS hosts file updated
harry d brown jr
Honored Contributor

Re: HP hungs and solaris/linux work NFS


After detecting and correcting the duplicate IPs on your network, did you shut down and reboot your HP-UX boxes?

live free or die
harry d brown jr
harry d brown jr
Honored Contributor

Re: HP hungs and solaris/linux work NFS

How many servers are now using the netapp box?

Were a lot of new users also added to the NIS master?

live free or die
harry d brown jr
Niraj Kumar Verma
Trusted Contributor

Re: HP hungs and solaris/linux work NFS

Yes, we did reboot all the servers, including the filers.

There are around 100 users using it.

Only 2-3 new users were added to NIS.

-Niraj