 
Frankiboy
Advisor

Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

I have a pair of P4300 nodes that have been running for 2 years with no problems.

Two months ago I installed a pair of StoreVirtual 4330 8TB nodes. They worked fine for those two months, but today all the virtual machines in our datacenter went down. I took a look at the SAN and, surprisingly, one of the nodes was in an inoperable state. The other node was fine, but our 2-node Hyper-V 2012 cluster still hit io_timeout, which shut down all the VMs because they lost contact with the SAN.

After I rebooted the inoperable node from iLO, it came up fine, but with errors saying that the write cache was disabled and charging.

 

1. Is this a known bug, and is there a fix for it?

2. Why would this time out? Isn't the point of investing in two Network RAID nodes that, no matter what happens to the first node, the customer won't notice? Instead, all our VMs went down.

8 REPLIES
Bart_Heungens
Honored Contributor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

Check the iLO IML and the other status pages for the hardware health...

 

If 1 node is down for whatever reason, the SAN volumes should stay online as long as you have quorum (a majority of managers). Do you have the FOM installed and added to the Management Group? If so, your volumes should have stayed online...

If you don't have a FOM, it is normal that I/O stops if 1 node of 2 is not reachable...
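
To make the quorum rule concrete, here is a minimal Python sketch. The function name and the manager counts are only illustrative assumptions, not anything taken from the StoreVirtual software; the point is simply that volumes stay online only while a strict majority of the managers in the Management Group is still reachable.

```python
def has_quorum(total_managers: int, reachable_managers: int) -> bool:
    """Return True when a strict majority of managers is reachable.

    Hypothetical helper for illustration only; it models the majority
    rule described above, not any actual StoreVirtual API.
    """
    if total_managers <= 0:
        raise ValueError("total_managers must be positive")
    if not 0 <= reachable_managers <= total_managers:
        raise ValueError("reachable_managers must be between 0 and total_managers")
    # A strict majority means more than half of all managers.
    return reachable_managers > total_managers // 2


# Two nodes without a FOM: losing one node leaves 1 of 2 managers.
two_nodes_no_fom = has_quorum(total_managers=2, reachable_managers=1)

# Two nodes plus a FOM: losing one node leaves 2 of 3 managers.
two_nodes_with_fom = has_quorum(total_managers=3, reachable_managers=2)
```

With 2 managers and no FOM, 1 of 2 is not a majority, so I/O stops. With 2 nodes plus a FOM, 2 of 3 is a majority, so the volumes should stay online.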

 

 

Kr,

Bart

--------------------------------------------------------------------------------
If my post was useful, click on my KUDOS! "White Star" !
a_o
Valued Contributor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

Are you running an FOM or are you running just a Virtual Manager on one of the nodes?

Are your volumes/LUNs set up to be fault tolerant?

What Network RAID (NR) level are your volumes set up with?

 

I see that Bart got in just before me.

oikjn
Honored Contributor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

I'm surprised neither of you mentioned whether MPIO and the DSM are installed and configured correctly. A misconfiguration there could easily lead to problems with LUN access during a failover.

 

BTW, I have seen cases where a cluster node has a problem that kills its performance, but not badly enough to take it offline, which then grinds the whole cluster to a halt. I would love it if HP had a configurable limit so that a node whose performance suddenly goes out of spec is automatically deemed offline.

Bart_Heungens
Honored Contributor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

If MPIO is not correctly configured (or not active at all), you should see each volume listed twice, or sometimes even four times, so I assume that part is OK... If a volume goes down, always check the number of managers first... There should be 3 or 5, following HP's best practices...

I agree about MPIO and the DSM, but for me they come in second or third place...

 

In the end it could also be a network problem where 1 leg of the network (if it is built in a redundant way) is not functional... For a complete answer, more information is needed to troubleshoot anyway...

 

Kr,

Bart

Frankiboy
Advisor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

Yes, I run a FOM, and the manager on the node is running, so 5 in total. Otherwise I would get a best practice alarm from the CMC.

 

I think I may be facing a network problem where the switch was not able to switch paths.

 

No DSM is installed, but MPIO is enabled and set up with two paths (2 iSCSI interfaces).

Frankiboy
Advisor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

The capacitor is still charging. Should I wait approx. 48 hours before I open a case?

oikjn
Honored Contributor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

Capacitor charging? Those should charge almost instantly... they are not the same as batteries.

 

Why no DSM for MPIO? It's definitely better than the standard Microsoft one, and both HP and Microsoft recommend it.

 

If the disks timed out, but did actually come back, I believe there are some tuning settings you need to adjust for disk timeouts w/ iSCSI so the disk doesn't fail before any iSCSI failover completes.  I used to have a link for them, but I can't find it now.  I just did a quick search and found http://blogs.msdn.com/b/san/archive/2011/12/02/updated-guidance-on-microsoft-mpio-settings.aspx and http://itinfras.blogspot.com/2010/05/what-is-mpio-and-best-practices-of-mpio.html which should help.
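
The idea behind those timeout settings can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the numbers in the example are hypothetical, and the real inputs would be the Windows disk timeout (the TimeOutValue registry value under HKLM\SYSTEM\CurrentControlSet\Services\Disk) and however long a failover actually takes when you measure it in your own environment.

```python
def disk_outlasts_failover(disk_timeout_s: float,
                           failover_s: float,
                           margin_s: float = 0.0) -> bool:
    """Return True if the disk timeout covers the failover time plus a margin.

    Hypothetical helper for illustration only. All values are in seconds
    and must come from your own configuration and measurements.
    """
    if min(disk_timeout_s, failover_s, margin_s) < 0:
        raise ValueError("all durations must be non-negative")
    # The disk must not be declared failed before the failover completes.
    return disk_timeout_s >= failover_s + margin_s


# Made-up example numbers, not recommendations:
# a 60 s disk timeout against a 25 s failover with a 10 s safety margin.
timeout_is_long_enough = disk_outlasts_failover(60, 25, margin_s=10)

# A 20 s disk timeout against the same 25 s failover.
timeout_is_too_short = disk_outlasts_failover(20, 25)
```

If the check comes back False, the disk can be reported as failed to the host before the iSCSI path has finished failing over, which matches the symptom of VMs going down even though the surviving node stayed up.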

 

You definitely could have a switching issue, but my guess would be something in the config of your iSCSI initiators.

Frankiboy
Advisor

Re: Storevirtual 4330 - disabled cache on one node make whole cluster inaccessible

I will open a support case then; perhaps the cache is faulty on this unit.

 

I will update this post with what HP finds out.