StoreVirtual Storage

Re: P4300 G2 complete meltdown when switch comes online

 
Bryan McMullan
Trusted Contributor

My stack is Cisco 3750s, and the links come up after the switch fully boots (which I believe is how things are supposed to work). I guess a port could be link up, admin down... but that doesn't sound right.

Have you verified that there are no firmware updates for your switches? To me, it sounds as though the switches are acting funky.

It does sound like unplugging before powering up the switch would work, but that's still not correct behavior for the switch. A port shouldn't be link up/admin up until the switch is ready to pass traffic.

Matthew D
Frequent Advisor

Bryan - I'm looking at buying 3750s to just be done with this mess. This way I can also use 802.3ad on the SAN storage nodes (right?). I'm curious which model you are using and how single-switch faults are handled (I've never used stacking switches in the Cisco world... only the 4000 and 6000 chassis-style switches). Thanks for all your help, man.
Matthew D
Frequent Advisor

One of the iSCSI ports shows a huge number of output buffer failures. I'm confused as to why flow control is enabled (and active) but there were still output buffer failures and no PAUSE frames sent.

GigabitEthernet0/4 is up, line protocol is up
  Hardware is Gigabit Ethernet, address is 0009.4494.4784 (bia 0009.4494.4784)
  Description: iSCSI
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 8/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s
  input flow-control is on, output flow-control is on
  ARP type: ARPA, ARP Timeout 04:00:00
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue :0/40 (size/max)
  5 minute input rate 26000 bits/sec, 20 packets/sec
  5 minute output rate 33076000 bits/sec, 2883 packets/sec
     13221748 packets input, 2709740682 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 681158 pause input
     0 input packets with dribble condition detected
     323344807 packets output, 834052607 bytes, 94316 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     94682 output buffer failures, 0 output buffers swapped out
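
For anyone who wants to track these counters over time rather than eyeball them, here is a minimal Python sketch that pulls the flow-control and buffer counters out of the output above and flags the combination described in this post (output buffer failures with zero PAUSE frames sent). The function names and the choice of labels are my own for illustration and are not part of any Cisco tool; only four of the counter lines are reproduced to keep it short.

```python
import re

# Counter lines copied from the "show interface" output above.
SAMPLE = """\
0 watchdog, 0 multicast, 681158 pause input
323344807 packets output, 834052607 bytes, 94316 underruns
0 lost carrier, 0 no carrier, 0 PAUSE output
94682 output buffer failures, 0 output buffers swapped out
"""

# Labels exactly as they appear in the output (matching is case-sensitive).
LABELS = ("pause input", "PAUSE output", "packets output",
          "underruns", "output buffer failures")


def parse_counters(text):
    """Return {label: value} for each known label found in the text."""
    counters = {}
    for label in LABELS:
        match = re.search(r"(\d+) " + re.escape(label), text)
        if match:
            counters[label] = int(match.group(1))
    return counters


def tx_starved_without_pause(counters):
    """True when egress buffers failed but the port never sent a PAUSE frame."""
    return (counters.get("output buffer failures", 0) > 0
            and counters.get("PAUSE output", 0) == 0)


counters = parse_counters(SAMPLE)
failure_ratio = counters["output buffer failures"] / counters["packets output"]
print(counters)
print(f"buffer failures per packet sent: {failure_ratio:.6f}")
print("failures without PAUSE output:", tx_starved_without_pause(counters))
```

One hedged note on reading the result: 802.3x PAUSE frames are normally sent by a receiver whose ingress buffers are filling, so a port that received 681158 pause frames from its link partner could plausibly back up on the egress side without ever sending a PAUSE itself. That is one reading of the counters, not a diagnosis.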
Matthew D
Frequent Advisor

I wanted to update everyone on how I'm doing with this issue.

I replaced the 3550-12T switches with a pair of HP 2900-24Gs, stacked with a pair of 10-gig CX4 trunks. The configuration is similar to before, except that RSTP is completely disabled. Jumbo frames are still disabled, and flow control is still enabled (and active) on all iSCSI ports. The HP switch specs far exceed the older 3550s, and we noticed a measurable (but not substantial) improvement in I/O performance.

The first problem is now completely fixed. The new switches do not bring ports online before booting is complete and I can restart a switch without affecting communication.

The second problem we thought was fixed, but it appears to be only greatly reduced. After about 2.5 weeks on the new switches it looked as though we had solved the spontaneous quorum loss, but this morning it happened again. Four hours after our backup window, early in the morning, at a time with very little I/O activity, the FOM and NSMs reported no communication, quorum was lost, and then everything came up in a degraded state a few seconds later. I had to power-cycle one of the NSMs to regain quorum.

I guess my next step is getting back on the phone with HP support. Last time they looked at my logs in depth and concluded that it must be a network issue.
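
For anyone following along, the quorum arithmetic behind the outage described in this post can be sketched in a few lines of Python. It assumes the usual majority rule (more than half of the managers must be able to reach each other) and a hypothetical layout of three managers: one on each of the two NSMs plus the FOM. The function names are made up for illustration and do not come from any HP tool.

```python
def quorum_size(total_managers):
    """Smallest strict majority of the managers."""
    return total_managers // 2 + 1


def has_quorum(reachable, total_managers):
    """True when the managers that can still talk form a majority."""
    return reachable >= quorum_size(total_managers)


# Assumed layout: one manager on each of the two NSMs plus the FOM.
TOTAL_MANAGERS = 3

for reachable in range(TOTAL_MANAGERS, -1, -1):
    print(reachable, "reachable ->", has_quorum(reachable, TOTAL_MANAGERS))
```

Under those assumptions, losing contact with any single manager still leaves quorum, but a moment where no manager can reach any other (as when the FOM and NSMs all report no communication) drops every manager below the majority at once.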
ACHCHGUY
Frequent Advisor

Hi Matthew,

I had a quick word with our network gurus. (I am not a network guru myself, so I hope I explained the issue well to them and can explain their answer back.)

You cannot do the failover setup you are attempting on that model of switch unless you use the stacking cable on the back.

The only way to use UTP and create trunks is to have each P4000's NICs on the same switch.

What you describe is exactly what the network gurus here say would happen with the way you have it set up.

So, to summarise:

You can use trunked UTP cables only if each disk shelf's NICs are on the same switch, not spread across multiple switches.

If you use the stacking cables on the back, it will be OK to have a P4000 shelf's NICs spread across multiple switches.

Hope that helps.

Turning off trunk negotiation was another suggestion.
Matthew D
Frequent Advisor

Hey Steve,

So awesome of you to check with your network guys. I really appreciate that.

We are not using trunks to the P4000's. The only trunks are between the switches (because we have vMotion ports on a different VLAN). The P4000's are using ALB mode, which according to HP's configuration guide should work the way we have it connected. I understand why it wouldn't work with LACP.

We are using a pair of 10-gig stacking cables with the new switches. I wonder if we can enable LACP now. Hrmm... something to try after hours I suppose.
Matthew D
Frequent Advisor

For anyone who is still following this massively long thread... thanks!

Our problems have returned after upgrading to 2900-24G switches and 10-gig trunks. Flow control is enabled end-to-end and STP is disabled (no need for it). Quorum still gets lost on one of the two storage nodes. We've ruled out VMware being at fault, and LeftHand support continues to tell me that it must be a network problem. If I can't get something done next week, we are going to move to a different storage platform. Unfortunately we just can't get LeftHand stable in our environment, for whatever reason.
Matthew D
Frequent Advisor

Since the topic of this thread has actually been solved and the only problems that remain aren't described in said topic, I will open a new thread for the issue that remains.
JacobS_4
Occasional Advisor

Matthew,
Did you finally get the problem resolved with the LeftHand nodes losing quorum? We're having a slightly different problem, but I think there could be some similarities.

Thanks
-Jake-
Matthew D
Frequent Advisor

Hi Jake,

The question was answered on the second thread that was started. To make a long story short, the problem went unsolved for about two months. HP then released a patch to fix a bug with the Intel NIC drivers on the G2 series hardware. The patch description sounded exactly like our problem. We applied the patch and the SAN has been rock solid ever since. We've since enabled round robin load balancing and the SAN has been happy to comply. I guess we didn't need new switches after all, but they certainly don't hurt.

Hope this helps.

Matt