Re: rpc.lockd problems related to package control script intervention ?

Harald Coeleveld · ‎01-23-2003

Just thought I'd share this with you all;

We at Philips Semiconductors are facing an incident where the NFS package control script logs the following actions:

########### Node "XXXXX": Halting package at Thu Jan 23 08:02:54 MET 2003 ###########
Jan 23 08:02:54 - Node "XXXXX": Remove IP address XXXXXX from subnet XXXXXX
Jan 23 08:02:54 - Node "XXXXX": Unexporting filesystem on /mnt/vg10_lvol1
Jan 23 08:02:54 - Node "XXXXX": Unexporting filesystem on /mnt/vg10_lvol2
Jan 23 08:02:54 - Node "XXXXX": Unexporting filesystem on /mnt/vg10_lvol3
Jan 23 08:02:54 - Node "XXXXX": Halting NFS service XXXXX.monitor
killing rpc.lockd pid = 857
killing rpc.statd pid = 851
Jan 23 08:02:55 - Node "XXXXX": Restarting rpc.statd
Jan 23 08:02:56 - Node "XXXXX": Restarting rpc.lockd
Jan 23 08:02:56 - Node "XXXXX": Unmounting filesystem on /dev/vg10/lvol3
WARNING: Running fuser to remove anyone using the file system directly.
/dev/vg10/lvol3:

Jan 23 08:02:59 - Node "XXXXX": Unmounting filesystem on /dev/vg10/lvol2
WARNING: Running fuser to remove anyone using the file system directly.
/dev/vg10/lvol2:

Jan 23 08:03:01 - Node "XXXXX": Unmounting filesystem on /dev/vg10/lvol1
WARNING: Running fuser to remove anyone using the file system directly.
/dev/vg10/lvol1:

Jan 23 08:03:03 - Node "XXXXX": Deactivating volume group vg10
Deactivated volume group in Exclusive Mode.
Volume group "vg10" has been successfully changed.

########### Node "XXXXX": Package halt completed at Thu Jan 23 08:03:07 MET 2003 ###########

System log file:
Jan 23 08:02:54 XXXXX CM-CMD[16853]: cmhaltpkg XXXXX
Jan 23 08:02:54 XXXXX cmcld: Request from node XXXXX to halt package XXXXX.
Jan 23 08:02:54 XXXXX cmcld: Executing '/etc/cmcluster/XXXXX/XXXXX.cntl stop' for package XXXXX, as service PKG*34314.
Jan 23 08:02:54 XXXXX CM-XXXXX[16865]: cmmodnet -r -i 161.85.253.118 161.85.253.0
Jan 23 08:02:54 XXXXX CM-XXXXX[16883]: cmhaltserv XXXXX.monitor
Jan 23 08:03:07 XXXXX LVM[16995]: vgchange -a n vg10
Jan 23 08:03:09 XXXXX cmcld: Service PKG*34314 terminated due to an exit(0).
Jan 23 08:03:09 XXXXX cmcld: Halted package XXXXX on node XXXXX.

- Note: This package switch was intentional(!)

- We noticed that rpc.lockd was not running on one of the cluster nodes, which caused serious problems (long delays before the user prompt appears at login, applications that require file locks hanging, etc.).

- A user login may take up to about 30 minutes before a prompt appears. Investigation by HP has resulted in a client patch that reduces this delay to 10 - 20 seconds.

- After restarting rpc.lockd manually, things look OK.

- We have occasionally found the rpc.lockd process not running on this HP-UX 11.0 cluster.
It was said that this occurs even without any package switching.

- System patch levels are monitored and configured precisely.

- This has been logged with HP for some time already. HP continues to investigate. (call number is not to be published on a forum I think, for now; you may contact me personally, if required.)

Anyone else encountered this behaviour ?

Kind regards,

Harald Coeleveld
UNIX sysadmin
Philips Semiconductors Nijmegen (Netherlands)
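
A stopped rpc.lockd like the one described above could be detected before users run into login hangs. The following is only a minimal sketch: the function name `missing_daemons` is hypothetical, and it assumes that the command name is the last column of `ps -e` output, as it normally is on HP-UX.

```shell
# Print every required daemon that does not appear in `ps -e` style
# output read from stdin. The function name is illustrative only.
missing_daemons() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        { seen[$NF] = 1 }    # assumption: command name is the last column
        END {
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i]
        }
    '
}

# Possible use from cron or a package monitor script:
#   ps -e | missing_daemons rpc.statd rpc.lockd
```

Reading the process list from stdin keeps the check independent of how `ps` is invoked on a given node. Any output means at least one of the lock daemons is gone.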

U.SivaKumar_2 · ‎01-23-2003

Hi,

Stop rpc.lockd

#cp /dev/null /var/adm/rpc.lockd.log

Start again

Do this on all the nodes of the cluster

See whether the problem comes back.

I also recommend restarting the rpcd daemon along with rpc.statd and rpc.lockd when switching over.

Also try increasing the grace period of rpc.lockd with the -g option and see the effect.

regards,
U.SivaKumar

Innovations are made when conventions are broken
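
The stop, clear log, start sequence suggested above could be scripted roughly as follows, so that it runs identically on every node. This is a sketch under assumptions, not the NFS toolkit's own logic: the helper names `run`, `pids_of` and `restart_lock_daemons` are hypothetical, the daemon paths are assumed to be /usr/sbin, and the kill and restart order simply mirrors the package log earlier in the thread (lockd and statd killed, statd restarted before lockd). It defaults to a dry run that only prints the commands.

```shell
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# PIDs of processes whose command name (last column of `ps -e`) is $1.
pids_of() {
    ps -e | awk -v d="$1" '$NF == d { print $1 }'
}

restart_lock_daemons() {
    grace=${1:-}    # optional rpc.lockd grace period in seconds (-g)
    for d in rpc.lockd rpc.statd; do
        for pid in $(pids_of "$d"); do
            run kill "$pid"
        done
    done
    run cp /dev/null /var/adm/rpc.lockd.log    # truncate the lockd log
    run /usr/sbin/rpc.statd                    # statd first, then lockd
    if [ -n "$grace" ]; then
        run /usr/sbin/rpc.lockd -g "$grace"
    else
        run /usr/sbin/rpc.lockd
    fi
}
```

Calling `restart_lock_daemons 60` prints the planned commands. Setting `DRY_RUN=0` as root would execute them, and that should be tried on a test node first.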

U.SivaKumar_2 · ‎01-23-2003

Before doing anything, analyse /var/adm/rpc.lockd.log for any error messages related to this problem.

regards,
U.SivaKumar

Innovations are made when conventions are broken

Harald Coeleveld · ‎01-24-2003

rpc.lockd logging is disabled by default because of the excessive amount of logging.

When logging is temporarily enabled, the following message floods the log file:

"/usr/sbin/rpc.lockd: fcntl (local_lock) : errno = 70!"

Any ideas ?

Wayne Green · ‎01-24-2003

Had problems with an MCSG package NFS-mounting the users' home directory area. Initially I couldn't unmount the NFS mount on the package node until I reset rpc.lockd and rpc.statd. I had missed this function in the NFS toolkit at first.

Now the package stops and starts OK, but logins on the other nodes sharing the NFS mount hang. I didn't have the patience to wait 30 minutes; a couple was too long for me.

I haven't been able to get to grips with exactly what is happening, so if anyone can explain, it would be useful. To rectify this I first tried nfs.server and nfs.client stop / start, which restarts all the NFS daemons. I always get a complaint about unmounting the NFS share, but after re-exporting the NFS mount on the package node this usually works.

More recently I have been recycling rpc.statd and rpc.lockd on the client nodes and waiting for the locks to be re-established. This worked well in the last test. Maybe worth a try.

I'll have a beer, thanks

U.SivaKumar_2 · ‎01-24-2003

Hi,

errno 70 indicates a stale file handle (ESTALE).

regards,

U.SivaKumar

Innovations are made when conventions are broken
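
Errno numbers are platform specific (on Linux, for example, 70 is ECOMM and ESTALE is 116), so the value from the rpc.lockd log should be looked up on the HP-UX host itself. A minimal sketch, assuming the definitions live in /usr/include/sys/errno.h; the function name `errno_name` is hypothetical:

```shell
# Print the symbolic name(s) defined with numeric value $1 in a C
# errno header read from stdin. Handles both "#define" and "#  define".
errno_name() {
    awk -v n="$1" '
        { sub(/^[ \t]*#[ \t]*/, "# ") }    # normalise spacing after "#"
        $1 == "#" && $2 == "define" && $4 == n { print $3 }
    '
}

# Possible use on the affected node:
#   errno_name 70 < /usr/include/sys/errno.h
```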

U.SivaKumar_2 · ‎01-24-2003

Hi,

Are the users' home directories NFS mounted?

If yes, change the .sh_history path to the local disk of the user's machine. Because the shell
has to take a lock on .sh_history, it will cause problems in an NFS environment.

regards,

U.SivaKumar

Innovations are made when conventions are broken
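
In ksh and the HP-UX POSIX shell the history file location is taken from the HISTFILE variable, so the relocation suggested above can be done in a login profile. This is only a sketch: the /var/tmp directory and the per-user suffix are assumptions chosen for illustration, and the variable has to be set before the shell first opens its history file (for example in /etc/profile).

```shell
# Keep the shell history on local disk instead of the NFS-mounted home
# directory. Directory and naming scheme are illustrative assumptions.
HISTFILE=/var/tmp/.sh_history.${LOGNAME:-$(id -un)}
export HISTFILE
```

A per-user file name avoids several users sharing one history file in the same local directory.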

Dietmar Konermann · ‎01-24-2003

Harald,
since you have a call open with HP, I assume that the latest NFS patch (PHNE_27217 for 11.00) is installed on your system? It solved some serious lockd/statd issues.

Best regards...
Dietmar.

"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)

Harald Coeleveld · ‎01-24-2003

The ".sh_history" files in the NFS-mounted user home directories have already been redirected to "/tmp" on the login server(s) via symbolic links
(we use thin-client systems that connect to a login server).

However, we have additional software applications that use a similar log file mechanism. Rerouting all to /tmp is not possible / desirable.

Patch PHNE_27217 for HP-UX 11.00 seems fairly new and has not (yet) been taken into account.
I will investigate...
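
Whether PHNE_27217 is already present on a node could be checked from the installed-software listing. A minimal sketch, assuming that `swlist -l patch` lists patches by name on this release; the function name `patch_listed` is hypothetical, and it matches either the bare patch name or a `PATCH.fileset` entry:

```shell
# Succeed if patch $1 appears in a software listing read from stdin,
# either as a bare name or as a "PATCH.fileset" entry.
patch_listed() {
    awk -v p="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == p || index($i, p ".") == 1) found = 1
        }
        END { exit !found }
    '
}

# Possible use on each cluster node:
#   swlist -l patch | patch_listed PHNE_27217 && echo "PHNE_27217 installed"
```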
