
Re: Polyserve Cluster unexpected reboots!

 
reachkrishna
New Member

Polyserve Cluster unexpected reboots!

Hi All,
I have a 6-node (blade) PolyServe cluster set up with the following configuration:

HP ProLiant BL460c G6
OS: Windows Server 2003 Enterprise x64 SP2
NIC: HP NC532i Dual Port 10GbE Multifunction BL-c Adapter
PolyServe Matrix Server 3.6.1
Build version 3.6.1.0574
Installed solution packs:
SSAS 3.6.2.0202
SQL2000 3.6.2.0202
MSDTC 3.6.2.0202

HP EVA SAN array as storage
Sybase Open Client 12.5.1

For the past couple of weeks, I have observed two issues:
1) Nodes get rebooted (possibly fenced) for no apparent reason. The event logs (system/application/matrix) don't say much, except for the "matrix server terminated" message.
2) While the restarted node comes back online, all the other nodes in the cluster go into a hung state for some reason. Failover doesn't happen until this node is fully back up.
The rebooted node takes a considerable amount of time to return to a fully operational state. Once that's done, the other nodes are OK too.

Has anyone here come across such a situation?
Appreciate any help/suggestions :)

thanks,
krishna

5 REPLIES 5
BlackHawkEH!
Occasional Advisor

Re: Polyserve Cluster unexpected reboots!

We had something similar with Fencing and it came down to NIC settings and binding orders for the most part.

I'll poke through my notes and see what I can find.
Binary: easy as 1, 10, 11
Emil Velez
Honored Contributor

Re: Polyserve Cluster unexpected reboots!

Check what version of the Broadcom driver you have. There are a few known issues with that.

Once the node is fenced, it should just reboot and rejoin the cluster, and the cluster should be OK without that fenced node in the meantime.

It sounds like the node gets fenced, but for some reason the cluster doesn't think it fenced the node.

Check your iLO fencing settings to make sure the credentials are correct.



reachkrishna
New Member

Re: Polyserve Cluster unexpected reboots!

Hi Guys,
Thanks for your responses.

Emil,
I checked the NIC driver, and it is the following version:
Broadcom Corp
5.0.13.0

Are you aware of this version causing issues?

regards,
Krishna
Dan Tyndale
Advisor

Re: Polyserve Cluster unexpected reboots!

If the node is fenced and you are using iLO fencing (vs. SAN fencing), then it will be rebooted. That is the expected behavior to protect the integrity of the data. What is not expected is the hang of every other node.

If this were my cluster, I would contact HP Support and have them review the data from the HPS reports (our data/log collector) from each node. Otherwise we are just guessing. Yes, blades take a long time to reboot; however, if the node is fenced, it will NOT affect the other nodes.

Some guesses:
1) Is the underlying HW firmware current or one rev back for all your blades and the enclosure? This includes HBA, Broadcom, ProCurve switches, etc. I recommend using the HP Firmware Maintenance CD and keeping all other firmware close to current for the BL460c G6; see the drivers section at www.hp.com.

2) Are your HP drivers current or one rev back for the PSP? I recommend you download the current PSP from www.hp.com, Support and Drivers, BL460c G6.

3) Is the underlying OS current for this EOL Microsoft OS?
http://support.microsoft.com/?id=935640

4) Are you running the current PolyServe hotfixes? 3.6.1 has a few, depending on your use of the product. See www.hp.com, Support and Drivers, PolyServe (case sensitive).

Caveat - before any updates to any node:
a) Mgmt console - right click - pause node
b) CMD line - net stop matrixserver
Update the node, reboot, test, then move on to the next node...
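
To keep the one-node-at-a-time order straight across a 6-node cluster, the steps above can be written out as a checklist. This is only a rough sketch: it prints the steps and does not run anything against the cluster. The node names are made up, and the wording of the update and test steps is an assumption, since what gets updated depends on the firmware, PSP, and hotfixes involved.

```python
def rolling_update_plan(nodes):
    """Build an ordered checklist that finishes one node before starting the next."""
    steps = []
    for node in nodes:
        steps += [
            # Step a) from the caveat: done by hand in the management console.
            f"{node}: management console - right click - pause node",
            # Step b) from the caveat: run at the command line on that node.
            f"{node}: CMD line - net stop matrixserver",
            # Assumed wording: the actual updates depend on your environment.
            f"{node}: apply the updates",
            f"{node}: reboot",
            f"{node}: test before moving on",
        ]
    return steps


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for number, step in enumerate(rolling_update_plan(["node1", "node2"]), start=1):
        print(f"{number}. {step}")
```

The point of building the list per node, rather than per step, is that no second node is touched until the first one has been rebooted and tested.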
Tammy Lawson
Advisor

Re: Polyserve Cluster unexpected reboots!

I had a similar issue too on my 3.6.1 cluster. I verified NIC settings, flow control, etc. per the documentation and HP support. What it ended up being was a self-inflicted problem: before we moved to PolyServe, we had implemented a procedure to gzip our SQL backups via a DOS script every night (we get charged per gig). Anywho - on PolyServe it would fence occasionally when the DOS script tried to compress a file or LUN that was currently in use. Not sure of the specifics there - just that the DOS script was the common denominator. On a side note, we now use LiteSpeed for compression and have no issues.
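
For anyone who still has to compress backups with a script, here is a minimal sketch of one way to stay away from files that may still be in use: skip anything modified recently. This is not the DOS script described above and not a confirmed fix for the fencing. The `*.bak` pattern, the one-hour idle window, and the function name are all assumptions, and a quiet modification time is only a hint that a file is idle, not a guarantee.

```python
import gzip
import shutil
import time
from pathlib import Path


def compress_idle_backups(folder, min_idle_seconds=3600, pattern="*.bak", now=None):
    """Gzip files in `folder` that have not been modified for `min_idle_seconds`.

    Recently modified files are skipped on the assumption that a backup
    may still be writing to them. Source files are left in place.
    """
    now = time.time() if now is None else now
    compressed = []
    for src in sorted(Path(folder).glob(pattern)):
        if now - src.stat().st_mtime < min_idle_seconds:
            continue  # touched too recently, possibly still in use
        dst = src.with_name(src.name + ".gz")
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        compressed.append(dst)
    return compressed
```

It would be called with the backup folder, e.g. `compress_idle_backups(r"D:\backups")` (a made-up path), after the nightly backup window has closed.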