Array Setup and Networking

Issue with Peer Persistence & ASO

J-Philippe
Valued Contributor

Hi Nimble Community, I'm facing an issue that is driving me crazy. (I have of course opened a case with Nimble support, but I want to find a solution ASAP.)

Here it is.

2 Nimble arrays with Peer Persistence & ASO in a 2-node vSphere 7 cluster
ESXi hosts OK with NCM HPE-Storage-Connection-Manager-for-VMware-7.0-7.0.2-700014
2 FC fabrics with zoning
ESXi hosts in both IT rooms have access to all volumes.
Remote Recovery Point status is "Now"
All paths are OK in vSphere and Nimble
Witness can reach both arrays

VMs can move between ESXi A & ESXi B


Nominal Mode is as follows:

IT ROOM A
Nimble A
ESXi A
Volume A1 (Upstream)
Volume A2 (Upstream)
Volume B1 (Downstream)
Volume B2 (Downstream)

IT ROOM B
Nimble B
ESXi B
Volume B1 (Upstream)
Volume B2 (Upstream)
Volume A1 (Downstream)
Volume A2 (Downstream)

IT ROOM C
Witness C
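
The nominal layout above, and the role changes one would expect on the surviving array when the other array loses power, can be written down as a small model. This is only a sketch for reasoning about the two tests: the names `NOMINAL` and `roles_after_failure` are made up for illustration, and nothing here talks to a Nimble array or reflects how ASO is implemented internally.

```python
# Nominal mode: which array holds the upstream copy of each volume.
# The other array holds the downstream copy (hypothetical model only).
NOMINAL = {"A1": "Nimble A", "A2": "Nimble A", "B1": "Nimble B", "B2": "Nimble B"}
ARRAYS = ("Nimble A", "Nimble B")


def roles_after_failure(upstream_owner, failed_array):
    """Return (survivor, {volume: role change}) as seen on the surviving array."""
    survivor = next(a for a in ARRAYS if a != failed_array)
    roles = {}
    for volume, owner in upstream_owner.items():
        if owner == failed_array:
            # The surviving array held the downstream copy and takes over.
            roles[volume] = "Downstream -> Upstream"
        else:
            # The surviving array already held the upstream copy.
            roles[volume] = "Upstream (no change)"
    return survivor, roles


if __name__ == "__main__":
    for failed in ARRAYS:
        survivor, roles = roles_after_failure(NOMINAL, failed)
        print(f"{failed} down, roles on {survivor}:")
        for volume, role in sorted(roles.items()):
            print(f"  Volume {volume}: {role}")
```

The model is symmetric by construction, which is the point: on the array side, Test 1 and Test 2 should be mirror images of each other.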


Test 1:

Unplug Nimble A from power => OK: the volumes / datastores are still available, and the VMs on both ESXi hosts and on all volumes / datastores keep working.
From the VMware side, apart from some dead paths, everything is running fine.
The volumes on Nimble B change as follows:
Volume A1 (Downstream) to Volume A1 (Upstream) => Expected behaviour
Volume A2 (Downstream) to Volume A2 (Upstream) => Expected behaviour
No change on Volume B1 (Upstream) => Expected behaviour
No change on Volume B2 (Upstream) => Expected behaviour

Remote Recovery Point changes from "Now" to "date_hour_of_the_stop" => Expected behaviour


Reconnect Nimble A to power ==> wait for replication to synchronise and, once everything is up to date, do the handover on Volumes A1 and A2 to return to Nominal Mode.


A few minutes later everything is "back to green".

Test 2:
Unplug Nimble B from power => the volumes / datastores are still available, but the VMs on Volumes / Datastores B1 & B2 are NOT working.
The VM answers ping but is KO (console and vMotion are not working). There is no HA event from vSphere because the VM is still answering ping.
Tried to rescan the FC adapters => still running and nothing happens.
VMs on Volumes / Datastores A1 & A2 are working fine.

On Nimble A the volumes are OK and change as follows:
Volume B1 (Downstream) to Volume B1 (Upstream) => Expected behaviour
Volume B2 (Downstream) to Volume B2 (Upstream) => Expected behaviour
No change on Volume A1 (Upstream) => Expected behaviour
No change on Volume A2 (Upstream) => Expected behaviour

Remote Recovery Point changes from "Now" to "date_hour_of_the_stop" => Expected behaviour

Reconnect Nimble B to power ==> wait for replication to synchronise and, once everything is up to date, do the handover on Volumes B1 and B2 to return to Nominal Mode.
All VMs are working.

VMs on Volumes A1 & A2 run on ESXi A

VMs on Volumes B1 & B2 run on ESXi B

The tests are done this way to avoid restarting the VMs.

This issue is reproducible in both directions, A to B / B to A.

If any of you has an idea, or if I missed something, let me know.

Best Regards,

Jean-Philippe


3 Replies
Mahesh202
HPE Pro

Re: Issue with Peer Persistence & ASO

Hi Jean-Philippe

In Test 2, the datastores are still available but the VMs are not up and running, which means there is no break at the storage layer.
Have you installed NCM on the ESXi hosts?
Before we draw conclusions about the storage layer, we need to verify connectivity between the Nimble arrays and the ESXi hosts.

Please have a look at the Fibre Channel zoning and verify whether any ports are flapping.


Regards
Mahesh



I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
J-Philippe
Valued Contributor

Re: Issue with Peer Persistence & ASO

Hi Mahesh,

NCM has been installed on both ESXi hosts. I'm going to review the zoning, but it should be fine as the ESXi hosts have exactly the same number of paths.

PSP is NIMBLE_PSP_DIRECTED

Each datastore has 8 paths (2 Active & 6 Standby)
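
To compare that path layout before and during a test, a small tally of path states per device can help. The sketch below parses text made of `Device:` and `State:` lines. I believe `esxcli storage core path list` prints fields of this shape, but treat the format as an assumption and adapt the parser to your own output. The device name `naa.example-b1` and the helper names are hypothetical, and the sample input is generated, not captured from this cluster.

```python
from collections import Counter, defaultdict

# Illustrative input only: one hypothetical device with 8 paths, built to
# mirror the "2 Active & 6 Standby" layout, not captured from the cluster.
SAMPLE = "\n".join(
    f"Device: naa.example-b1\nState: {state}"
    for state in ["active"] * 2 + ["standby"] * 6
)


def tally_path_states(text):
    """Count path states per device from 'Device:' / 'State:' lines."""
    counts = defaultdict(Counter)
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Device:"):
            device = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and device is not None:
            counts[device][line.split(":", 1)[1].strip()] += 1
    return counts


def unexpected_devices(counts, expected):
    """Return devices whose path-state tally differs from the expected one."""
    wanted = Counter(expected)
    return {dev: dict(c) for dev, c in counts.items() if c != wanted}


if __name__ == "__main__":
    counts = tally_path_states(SAMPLE)
    for device, states in counts.items():
        print(device, dict(states))
    print("unexpected:", unexpected_devices(counts, {"active": 2, "standby": 6}))
```

Running the same tally during Test 2 for the B1 and B2 devices would show whether any path actually becomes active towards Nimble A after the failover.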

I am seeing some Link Failure and Loss of Sync errors on some SAN switch ports facing the Nimble controllers and the ESXi hosts, but I don't know whether this kind of error is "normal". There are no CRC errors on the ports concerned.
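
One way to judge whether such counters matter is to sample them twice and look only at what increased in between. Port error counters are usually cumulative since they were last cleared, so an old cable pull or reboot can leave non-zero values that never move again. The sketch below assumes the counters have already been collected into dictionaries by some means. The port names, counter names and values are invented for illustration.

```python
def counter_deltas(before, after):
    """Return {port: {counter: increase}} for counters that grew between samples."""
    growing = {}
    for port, counters in after.items():
        for name, value in counters.items():
            delta = value - before.get(port, {}).get(name, 0)
            if delta > 0:
                growing.setdefault(port, {})[name] = delta
    return growing


# Hypothetical samples taken before and after a test window.
before = {
    "port1": {"link_failure": 3, "loss_of_sync": 5, "crc_err": 0},
    "port2": {"link_failure": 1, "loss_of_sync": 2, "crc_err": 0},
}
after = {
    "port1": {"link_failure": 3, "loss_of_sync": 5, "crc_err": 0},
    "port2": {"link_failure": 3, "loss_of_sync": 6, "crc_err": 0},
}

if __name__ == "__main__":
    for port, deltas in counter_deltas(before, after).items():
        print(port, deltas)
```

In this made-up data, port1 has only historical errors while port2 is still accumulating them, which is the pattern that would point to a flapping link.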

Mahesh202
HPE Pro

Re: Issue with Peer Persistence & ASO

Hi Jean-Philippe

Please check for port flapping on the switches, and also check whether there are any errors on the FC HBAs on the Nimble side.
For the VM that answers ping but does not respond: we need to check the logs to get a better understanding, so I would suggest engaging HPE Support for log analysis.

If the console cannot be opened or reports an error, then the virtual machine might be experiencing an issue, or network connectivity to the host might be interrupted.

Regards
Mahesh.


