HPE StoreVirtual Storage / LeftHand

Manager and Failover Manager Issue

L1nklight
Valued Contributor


I've got a two-node 4500G1 cluster. Both nodes are running managers and I have a FOM running on an ESX server. I recently re-IP'd the nodes. I can see all resources online and I can ping them (the FOM and both 4500G1s), but the FOM and one of the nodes are reporting that their manager is offline.

 

I've tried several things, like rebooting all the resources, but I can't seem to get the managers online on the Failover Manager and the one 4500 node. The other 4500 node is reporting its manager as online. Again, all resources can be pinged and they all appear to be online.

 

Anyone seen this behavior before?

 

 

P.S. This thread has been moved from HP 3PAR StoreServ Storage to HP StoreVirtual Storage / LeftHand. - Hp Forum Moderator

4 REPLIES
Emilo
Trusted Contributor

Re: Manager and Failover Manager Issue

This sounds like a network issue.

Have you tried rescanning the managers?

Also can all the nodes ping each other?

 

L1nklight
Valued Contributor

Re: Manager and Failover Manager Issue

Yeah, it's a deep issue. I got off the phone with LHN and basically the VIP is locked on a node right now and it's down. The quorum is therefore down. Rebooting the node with the failed VIP does not result in the VIP being brought up on the second node. It's a huge mess right now. So basically the FOM can be pinged, node 1 can be pinged, and node 2 can be pinged. The VIP can't be pinged and it's located on node 2. Node 2 is stating that the manager is down and offline.

David_Tocker
Regular Advisor

Re: Manager and Failover Manager Issue

Did you get anywhere with this?

 

When we got our first set of nodes, I did a lot of experimentation and tried to burn the units down in any way I could before I put them into production.

Found out about FOMs the hard way while testing, had a loss of quorum issue and actually had to reload one of the units after a hard evict to repair.

 

We are based in NZ and apparently if you are not in the US timetable for support, you can't have it, which is total crap and HP should hang their heads in shame over that one. What a sorry state of affairs (you are okay if you have a 3PAR). So I had to figure it all out by myself via the pretty average documentation.

 

However, in your case, if the CMC is not pointing out an obvious quorum issue then you may need to just wait on the experts to sort it out. Have you considered dropping the 'dodgy unit' and forcing quorum using the FOM and the other node? Keep in mind the possibility of data loss, but if you haven't written anything to the nodes since the failure you should be okay...

 

 

Regards.

David Tocker
L1nklight
Valued Contributor

Re: Manager and Failover Manager Issue

Yeah man, figured it out. Here's the deal. We have HP switches. While I was messing with them this weekend I noticed that I had jumbo frames enabled on the SAN and on the VMware hosts that the SAN was hooked up to. The HP switches, however, had NO jumbo frame statements to start with. So somehow this whole setup was working without jumbo frames being explicitly configured on the switches.

Upon noticing that jumbo frames were not explicitly configured on the switch ports, I decided to turn them on. I must have configured them improperly the first time, and that's when everything went offline/sideways. So I then attempted to back out the jumbo frame statements from the ports carrying the SAN traffic. Still no dice.

I got LHN support on the phone (finally, after 24 hours of contract bickering) and they managed to SSH into the P4000 and run a ping with oversized packets. Once they saw that a standard ping was working but oversized ping packets were not, it pretty obviously pointed to a misconfiguration on the network. I went back into the switches again and reconfigured the ports for jumbo frames, but this time made the frame size a few bytes bigger. Once I did this, the network came right online, the quorum dispute went away, and the SANs resynced.
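For anyone wanting to repeat that oversized-ping test, here is a rough sketch in Python of the arithmetic behind it. It is not what support actually ran; I don't know their exact command. It assumes a 9000-byte MTU on the nodes and hosts, a Linux-style iputils `ping` (where `-M do` forbids fragmentation and `-s` sets the ICMP payload size), and a made-up target address. The function names are mine, purely for illustration. The second function shows why the switch setting may need to be a few bytes bigger than the host MTU: the Ethernet header, FCS, and any VLAN tag ride on top of the IP packet. Whether a given switch counts those bytes in its jumbo frame value varies, so check your switch documentation.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header
ETH_HEADER = 14   # bytes, destination MAC + source MAC + EtherType
ETH_FCS = 4       # bytes, frame check sequence
VLAN_TAG = 4      # bytes, 802.1Q tag on tagged ports


def icmp_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def min_switch_frame_size(mtu, tagged=True):
    """Smallest on-the-wire frame size that carries a full-MTU packet."""
    return mtu + ETH_HEADER + ETH_FCS + (VLAN_TAG if tagged else 0)


def jumbo_ping_command(target, mtu=9000, count=3):
    """Build a Linux iputils ping command with the don't-fragment bit set."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), target]


if __name__ == "__main__":
    # 10.0.0.12 is a placeholder address, not one from this thread.
    print(" ".join(jumbo_ping_command("10.0.0.12")))
    print("switch frame size should be at least", min_switch_frame_size(9000))
```

If a normal ping to the other node works but the command printed above fails or times out, something in the path is dropping jumbo frames, which is the symptom we saw.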

As best I can tell, by default the HP A5800 switches let all packets pass, whether they are jumbo or basic. Once you configure the jumboframe setting, it's a 1 or a 0 from that point on: the command has to be either enabled or disabled. And if you then disable the jumbo frame statement, the port config still shows:

undo jumboframe enable

Instead of clearing the entire statement from the port. Very peculiar. Hope this helps someone and if you need more explicit information, I can go into more detail.