StoreVirtual Storage

VMware ESX what happens if a NSM node restarts

 
ChrisEgan
New Member

Hello, we have a number of ESX servers with no spare storage space on them but lots of processing power. We want to use the advanced features like smart clones on our LeftHand Networks SAN to rapidly deploy new servers as needed.

My question: since each ESX server creates a single connection to one NSM node to access a volume, what happens if that node fails? Would the virtual machine stop and need to be restarted by HA, or would it just pause while ESX connects to another node?

We run terminal servers, web servers, and database servers, so having them just die while in use would be a big issue.

Thanks
Gauche
Trusted Contributor

Re: VMware ESX what happens if a NSM node restarts

If a node fails, the iSCSI sessions going to that node just reconnect to another node in the cluster. You don't have to do anything to your VMs or ESX servers.
Adam C, LeftHand Product Manager
ChrisEgan
New Member

Re: VMware ESX what happens if a NSM node restarts

Thanks for the reply. So the virtual machine running from that connection will just pause while ESX reconnects to the other node?
Gauche
Trusted Contributor

Re: VMware ESX what happens if a NSM node restarts

Not even; the VM does not need to "pause". Disk IO will pause for a second while the iSCSI sessions reconnect, but the VM stays live.
Adam C, LeftHand Product Manager
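
To make the distinction above concrete, here is a toy model in Python (not LeftHand or VMware code). It assumes the guest OS gives up on an outstanding disk request once its own disk timeout expires; the function name and all of the numbers are hypothetical and chosen only for illustration.

```python
def io_outcome(reconnect_seconds: float, guest_disk_timeout_seconds: float) -> str:
    """Toy model of one outstanding disk request during a node failure.

    reconnect_seconds: how long the iSCSI session takes to come back
        on another node (hypothetical value).
    guest_disk_timeout_seconds: how long the guest OS waits for a disk
        request before reporting an error (hypothetical value).
    """
    if reconnect_seconds <= guest_disk_timeout_seconds:
        # The request completes late, but the guest never sees a failure.
        return "I/O resumes"
    # The guest gave up before the storage path came back.
    return "I/O error in guest"


if __name__ == "__main__":
    # Short reconnect versus a much longer outage, both with a 30 s timeout.
    print(io_outcome(1.0, 30.0))
    print(io_outcome(120.0, 30.0))
```

With a reconnect of about a second, the request simply completes late. In this model the guest only sees an error if the storage stays unreachable for longer than its own timeout.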
OH003
New Member

Re: VMware ESX what happens if a NSM node restarts

I've been searching for an answer to this as well. Last week we had a node failure at one of our sites. The site contains 2 ESX (3.5 U4) hosts and 3 NSM2120 SAS nodes (all in the same cluster). When the SAS node failed, some of the VMs also failed. We had to restart them manually from vCenter to bring them back. By failed, I mean there was no ping response, and opening the console from vCenter showed a black screen.

Is there any config parameter on the ESX side we should look at? Some sort of timeout we can increase, perhaps?

Thanks
Damon Rapp
Advisor

Re: VMware ESX what happens if a NSM node restarts

It depends on how picky your cluster is.

I have one LH cluster where you can lose an NSM node and it just reconnects to one of the other nodes in the cluster and goes on its happy way, assuming 2-way replication.

Then on my other LH cluster, you lose a node and all hell breaks loose. Any volumes served off that node cannot be seen by the ESX nodes, no matter how many rescans or reboots you do in ESX. I have even deleted all the iSCSI config files manually in ESX to try to resolve the issue. I also tried shutting down my entire second LH cluster. Eventually the LH cluster seems to fix things, but only many hours later.

So it depends on how your cluster reacts. It is supposed to work like my first cluster.

BTW, both clusters are running SAN/iQ 7 SP1 and the ESX is 3i, fully patched.

Tyler Modell
New Member

Re: VMware ESX what happens if a NSM node restarts

Adding to Damon's thoughts...

I am fortunate enough to have our setup work the same way (i.e., node failure has no/minimal impact on the VM), except with our Solaris-based VMs. All of our Windows VMs continue working as if it were nothing more than a hiccup, but the Solaris VMs exhibit the black screen in the console and require a reboot. I haven't yet figured out a way to resolve this other than to sVMotion the VM to a datastore that is not on the node that will be taken offline (assuming regular maintenance, not a random failure).

I recommend ensuring you have the back-end network set up correctly to avoid the issues others have seen. I've found that having it on GigE, separated, and with jumbo frames ensures not only the best performance during regular operating periods but also great response during node failure or maintenance.

Tyler
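
As a rough illustration of one reason jumbo frames can help on an iSCSI network, the Python sketch below compares how much of each Ethernet frame is TCP payload at a 1500-byte and a 9000-byte MTU. It assumes IPv4 and TCP headers without options and ignores iSCSI's own headers; the function name is made up for the example.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + start delimiter (8), header (14), FCS (4), interframe gap (12).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12

# IPv4 header (20) + TCP header (20), assuming no options.
IP_TCP_HEADERS = 20 + 20


def wire_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that carry TCP payload for a full-size frame."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {wire_efficiency(mtu):.1%} payload")
```

Under these assumptions the payload share rises from roughly 95% to roughly 99%. The per-frame saving is modest; the larger practical benefit is usually that fewer frames have to be processed for the same amount of data.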

Damon Rapp
Advisor

Re: VMware ESX what happens if a NSM node restarts

I did want to update my previous reply.

When we updated our clusters from SAN/iQ 7 SP1 to 8, it resolved our issues with the cluster that would fail when it lost a node. Now both clusters successfully (and correctly) don't skip a beat when they lose a node.

Thanks,

Damon