NFS/RPC issue in 11.31

Rikki hinn Ogurlegi
Frequent Advisor

NFS/RPC issue in 11.31

Hello all.

 

I have a two-node Serviceguard cluster running 11.23.  The packages are mostly filesystems shared via NFS and Samba to various groups of clients.  Both servers sit behind a stateful firewall (Linux/iptables).  Each client group also has its own subnet connected to the firewall, which regulates access between the client groups and from the clients to the servers.

 

When we first installed this system we found that SMB (Samba) clients had no problems accessing the shares, because each SMB instance was configured to listen only on the package IP address.  NFS clients, however, got replies from the server's fixed address: an NFS client would send a request to the MC/SG package IP, but the reply would come from the host's fixed address.  This is something the stateful firewall did not like.  After some searching I found a set of patches for 11.23 that fixed the issue.  Here is a small excerpt from the readme of one of those patches:

 

-----

PHNE_37487:
        ( QX:QXCR1000573775 SR:8606468110 CR:JAGag23452 )
        In a Serviceguard environment, the NFS client uses the NFS
        package address to send a request to the NFS server. The NFS
        server replies with it's fixed IP address. The NFS client,
        which is connected through a firewall, will not receive the
        reply if the firewall uses "stateful inspection", as the
        firewall will discard the reply.

        Resolution:
        Instead of replying with the fixed IP address, the NFS
        server replies with the NFS package IP address. This fix is
        disabled by default. To enable the fix, you must have the
        following patches installed: ARPA Transport patch
        PHNE_37670, RPC Commands and Daemons patch PHNE_36981,
        libnsl patch PHNE_37488, and Lock Manager patch PHNE_37489
        (or superseding patches). Note that PHNE_37488 has a
        dependency on NIS/NIS+ patch PHNE_37490, so that patch will
        also need to be installed. After these patches are
        installed, set the value of the kernel parameter
        rpc_svc_ippktinfo_opt to 1 with the following command:

        kctune rpc_svc_ippktinfo_opt=1
-----
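To see why the firewall drops those replies: our Linux firewall rules are roughly of the following shape (the subnet and addresses here are made up for illustration, not our real ones):

```shell
# Illustrative iptables rules only -- the 10.0.1.0/24 client subnet and
# the package IP 10.0.2.100 are hypothetical.
# Allow client traffic to the NFS package address:
iptables -A FORWARD -s 10.0.1.0/24 -d 10.0.2.100 -j ACCEPT
# Allow replies back to the clients only for tracked connections:
iptables -A FORWARD -d 10.0.1.0/24 -m state --state ESTABLISHED,RELATED -j ACCEPT
# Everything else toward the clients is dropped:
iptables -A FORWARD -d 10.0.1.0/24 -j DROP
#
# A reply sourced from the server's fixed IP does not match the conntrack
# entry (which is keyed on the package IP the client talked to), so it
# falls through to the DROP rule.
```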

 

Once these patches had all been applied, the cluster functioned perfectly for many years.

 

Now I am working on a rolling update of the machines to 11.31 and I have already installed 11.31 on one of the nodes.

I then failed one of the filesystem packages over to the 11.31 host and noticed that NFS was broken.  A quick look with tshark on the firewall revealed that the old symptoms were back.  I was unable to locate similar patches for 11.31, but the kernel tunable seems to be there:

 

1131host# uname -a
HP-UX 1131host B.11.31 U ia64 0323217814 unlimited-user license

[root@1131host ~]# kctune rpc_svc_ippktinfo_opt=1       
       * The automatic 'backup' configuration has been updated.
       * The requested changes have been applied to the currently
         running configuration.
Tunable                          Value  Expression  Changes
rpc_svc_ippktinfo_opt  (before)      0  0           Immed
                       (now)         1  1           

But it does not have the desired effect; it changes nothing in how the host responds to NFS or RPC packets.  After some googling I located an ndd parameter on 11.31 called ip_strong_es_model that is supposed to change this behavior, not only for NFS/RPC but for the whole IP stack.  The strong end-system model is quite confusing, may require multiple default gateways, and our multiple client subnets make it even more complicated.
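For reference, the parameter is read and set with ndd, something like this (persisting a setting would go through /etc/rc.config.d/nddconf):

```shell
# Show the current end-system model (0 = weak model, the default):
ndd -get /dev/ip ip_strong_es_model
# Switch to the strong end-system model -- this affects the whole
# IP stack, not just NFS/RPC, so test carefully:
ndd -set /dev/ip ip_strong_es_model 1
# To make a setting persistent across reboots, add the corresponding
# TRANSPORT_NAME/NDD_NAME/NDD_VALUE entries to /etc/rc.config.d/nddconf.
```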

 

Does anyone here know how I can get the 11.23 behavior to work or any other workaround that can fix my problem?

 

Thanks in advance,

Richard.

 

4 REPLIES
Dave Olker
HPE Pro

Re: NFS/RPC issue in 11.31

Hi Richard,

 

Which version of ONCplus are you running?  A quick look at the ONCplus release notes for the latest version (B.11.31.13) shows:

 

QXCR1000852734

NFS responds with fixed IP address and not the Serviceguard IP address.

Fixed in: B.11.31.09

 

If you're running an ONCplus version prior to B.11.31.09 I'd suggest updating to the latest version to see if that resolves the problem.  You can download the latest ONCplus versions from here:

 

https://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=ONCplus
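Once downloaded, installing the bundle would look something like this (the depot filename below is hypothetical; use whatever the download is actually called):

```shell
# Install the ONCplus bundle from a downloaded depot file
# (the path/filename here is a placeholder):
swinstall -s /tmp/ONCplus_B.11.31.13.depot ONCplus
# Then confirm the installed version and verify the fileset:
swlist ONCplus
swverify ONCplus
```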

 

Dave

 

 

 

 

Rikki hinn Ogurlegi
Frequent Advisor

Re: NFS/RPC issue in 11.31

Hello Dave.

 

Are you the same Dave who authored my favorite NFS book from HP? :)

 

Anyway, I thought I had updated most things on the machine before adding it into the cluster.

 

[root@1131host ~]# swlist  | grep -i ONCplus          
  ONCplus                               B.11.31.13     ONC+ 2.3       

[root@1131host ~]# swlist  | grep -i QPK    
  QPKAPPS                               B.11.31.1109.367a Applications Patches for HP-UX 11i v3, September 2011
  QPKBASE                               B.11.31.1109.367a Base Quality Pack Bundle for HP-UX 11i v3, September 2011

 

swverify on ONCplus gives no errors either.

 

Perhaps this fix needs to be activated somehow, like it was on 11.23?

 

Regards,

Richard

Dave Olker
HPE Pro

Re: NFS/RPC issue in 11.31

Hello Richard,

 

> Are you the same Dave who authored

> my favorite NFS book from HP? :)

 

Guilty.  :)

 

> [root@1131host ~]# kctune rpc_svc_ippktinfo_opt=1
...
> But it does not have the desired effect and changes

> nothing in how the host responds to NFS or RPC packets.

 

Really?  It works on my 11.31 systems when I try it.  Here's how my 11i v3 NFS server is set up:

 

atcux12(/) -> netstat -in
IPv4:
Name      Mtu  Network         Address         Ipkts              Ierrs Opkts              Oerrs Coll
lo0      32808 127.0.0.0       127.0.0.1       33386206           0     33386189           0     0  
lan0:1    1500 192.1.1.0       192.1.1.200     4                  0     0                  0     0  
lan1      1500 15.43.208.0     15.43.209.141   81605056           0     32244146           0     0  
lan0      1500 192.1.1.0       192.1.1.12      1606270            0     38510              0     0  

I have a physical NIC lan0 with the IP address 192.1.1.12.  I create an IP alias, similar to what Serviceguard would do, called "lan0:1" with the IP address 192.1.1.200.  I leave rpc_svc_ippktinfo_opt set to the default value (0) and from my remote NFS client I issue a "showmount -e 192.1.1.200" to get the list of exported filesystems.  Here's what a packet trace shows:

 

  3.348678   192.1.1.10 -> 192.1.1.200  Portmap 98 V2 GETPORT Call MOUNT(100005) V:1 TCP
  3.349184   192.1.1.12 -> 192.1.1.10   Portmap 70 V2 GETPORT Reply (Call In 4) Port:54650

 

The RPC packet is sent from the NFS client (192.1.1.10) to the IP alias (192.1.1.200) but the reply goes out using the physical address 192.1.1.12.  Now I use kctune to enable rpc_svc_ippktinfo_opt on the NFS server:

 

atcux12(/) -> kctune rpc_svc_ippktinfo_opt=1     
       * The automatic 'backup' configuration has been updated.
       * The requested changes have been applied to the currently
         running configuration.
Tunable                          Value  Expression  Changes
rpc_svc_ippktinfo_opt  (before)      0  Default     Immed
                       (now)         1  1          

 

I try the experiment again and see:

 

  0.000000   192.1.1.10 -> 192.1.1.200  Portmap 98 V2 GETPORT Call MOUNT(100005) V:1 TCP
  0.000582  192.1.1.200 -> 192.1.1.10   Portmap 70 V2 GETPORT Reply (Call In 1) Port:54650

Now the reply comes back using the alias IP address.  Can you please try this simple experiment and let me know if you get different behavior?
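For reference, I created the lan0:1 alias from the netstat output above roughly like this (Serviceguard does the equivalent when it starts a package; the /24 netmask is what my test setup uses):

```shell
# Add a secondary (alias) address on lan0, matching the lan0:1 entry
# in the netstat output above:
ifconfig lan0:1 192.1.1.200 netmask 255.255.255.0 up
# netstat -in should now show the alias alongside the physical address.
```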

 

Thanks,

 

Dave

 

Rikki hinn Ogurlegi
Frequent Advisor

Re: NFS/RPC issue in 11.31

Hi again Dave.

 

Boy, do I have egg on my face today...  I could not duplicate your results until I discovered that my test machine had ip_strong_es_model set to 1.  As soon as I set it back to zero, it all worked.

It was all my fault: when I first started looking for a solution, that parameter was the first (and only) thing I found with the potential to fix the issue, and I forgot to set it back to zero after experimenting with it.

 

Anyway, thanks from a very happy camper! :)

 

Richard.