StoreVirtual Storage

Strange NFS quorum witness connectivity/permission failure

 
darienn
Occasional Contributor

Strange NFS quorum witness connectivity/permission failure

Hi there,

We have used an NFS quorum witness for our 2-node VSA (12.6.00.0155.0) on the VMware vSphere 6.0 platform without issue for many years, until recently, when the CMC reported the following NFS quorum witness failure, even though we had made sure the NFS share was fine and online:

Event: E0000040A EID_QUORUM_CONFIG_STATUS_LOCK_FILE_UNREACHABLE
Severity: Critical
Component: SAN/iQ
Object Type: Management Group
IP/Hostname: VSA2
Message: Quorum Witness shared lock file is not accessible. Possible causes include loss of network connectivity or missing lock file. Further errors could cause IO to stop.

As a troubleshooting measure, we tried removing the NFS quorum witness configuration from the CMC and re-adding it. During the re-add, the CMC reported that the NFS share is unavailable and asked us to check connectivity or permissions (the full error message follows).

Failed to connect to the host for the Quorum Witness while storing configuration.
Please re-configure the Quorum Witness again.
Configuration failed, please check server connectivity or permissions.

Due to the lack of an NFS quorum witness or FOM, the following error message is also shown in the CMC:

Event: EID_QUORUM_CONFIG_STATUS_MISSING_FOM_QW E00000409
Severity: Critical
Component: SAN/iQ
Object Type: Management Group
IP/Hostname: VSA2
Message: The management group 'Group_Name_Hidden' requires a Failover Manager (FOM) or Quorum Witness (QW). A management group with only 2 storage systems requires 3 regular managers. Add another storage system to management group 'Group_Name_Hidden' and ensure '3' regular managers are started, or add a FOM or QW to management group 'Group_Name_Hidden'.

This does not seem reasonable, as we are sure the NFS share is fine: we can mount it from another Linux or Windows workstation on the same subnet (i.e. the firewall is not the cause).
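
For reference, this is roughly the kind of check we did from a workstation (the server name and export path shown here are placeholders, not our actual values):

# Confirm the export is visible, then mount it and test write access
showmount -e nfs-server.example.local
mount -t nfs nfs-server.example.local:/export/tnqw /mnt/test
touch /mnt/test/write-test && rm /mnt/test/write-test
umount /mnt/test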

Additional things we verified:

- We tried rebooting the NFS server (CentOS 6.4) but have not tried rebooting the VSA nodes.

- NFS permissions are correct (readable and writable by VSA1 and VSA2) and the NFS server already uses the no_root_squash option in /etc/exports (see the example export line after this list).

- We tried mapping the NFS share from the command line (CLIQ) on VSA1 and VSA2, and the error message is exactly the same as when doing it from the CMC GUI.

- We ran the 'ping' CLIQ command utility against the NFS server on both VSA1 and VSA2 and the results were successful.

- We ran the 'nmap' CLIQ command utility against the NFS server on both VSA1 and VSA2 and the NFS ports (111 and 2049) are reported as open.

- We ran the 'netstat -tulpen' CLIQ command utility on VSA1 and VSA2, and the connection to the NFS server still shows an ESTABLISHED state even though the quorum witness has already been unconfigured (and even after the NFS server was rebooted).

- We ran the 'service --status-all' CLIQ command utility on VSA1 and VSA2 and it reported that the NFS mountpoint /mnt/tnqw is still active (even after the NFS server was rebooted).
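
For completeness, the export line for the witness share looks roughly like the following (the path and client subnet are example values, not our exact configuration), and these standard commands can be run on the NFS server to double-check the export and which clients the server still believes have it mounted:

# Example /etc/exports entry for the quorum witness share (placeholder path/subnet)
/export/tnqw 192.168.10.0/24(rw,sync,no_root_squash)

# On the NFS server: re-publish the exports and inspect their state
exportfs -ra        # re-export everything in /etc/exports
exportfs -v         # show the active exports and their options
showmount -a        # list clients the server thinks still have mounts
rpcinfo -p          # confirm portmapper/mountd/nfs are registered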

We can no longer add the NFS quorum witness. Is temporarily adding a FOM (Failover Manager) and rebooting the VSA nodes the only way for us to connect to the NFS witness again?

Any suggestion would be much appreciated. 

2 REPLIES
BenLoveday
Frequent Advisor

Re: Strange NFS quorum witness connectivity/permission failure

We had the same issues with the NFS witness shares. In a lot of cases we went back to a full FOM, which isn't always possible. Check the latest LHOS patches; there was a recent patch related to NFS witnesses. Worst case, you might need to stand up a temporary FOM to patch before you can use the NFS witness again.

Good luck!

darienn
Occasional Contributor

Re: Strange NFS quorum witness connectivity/permission failure

Thank you for the information! May I ask which of the patches below you are referring to, as I don't see NFS mentioned in any of them?

https://support.hpe.com/hpesc/public/home/driverHome?pmasr=1&sp4ts.oid=5442251

Actually (perhaps we are lucky), this is the first time this has occurred since our implementation in 2016. For our case, I suspect it would be overkill to solve an issue that may never (or only rarely) recur by upgrading the whole system from 12.6 (our version) to 12.7, which I have heard may introduce some alerts during nightly backups. Patching within 12.6 would be easier to justify, IMHO.

On the other hand, could there be a workaround that does not require patching, or perhaps not even a reboot? :)

In case the patch does not apply to this specific NFS issue, and since, as I mentioned, this is the first occurrence in 2-3 years, we may also consider a simpler approach this time: rebooting the VSAs one by one (with a temporary FOM to handle automatic failover) and hoping that after the reboot the NFS witness becomes usable again and the problem does not recur. (Unfortunately, keeping the FOM permanently is not an option for us.)

Any suggestion would be much appreciated.