Cluster Interconnect problem
10-28-2009 01:18 AM
My customer just upgraded their application environment, including an upgrade to OpenVMS V8.3 with ECOs. One EVA8100 and five ES40s form a cluster with a single common system disk. The cluster interconnect is the LAN (NI), using the DE602 adapters (one dual-port card in each ES40) set to FastFD.
After the cluster had been running continuously for about 10 days, 3 nodes went down today and the cluster lost quorum.
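For reference, the quorum loss follows from OpenVMS vote arithmetic. The sketch below assumes each ES40 has VOTES=1, EXPECTED_VOTES=5 and no quorum disk (the thread does not state the actual vote settings). Under those assumptions quorum is 3, so losing 3 of the 5 nodes leaves 2 votes and the surviving nodes stop normal activity.

```python
# Assumptions (not stated in the thread): VOTES=1 on every ES40,
# EXPECTED_VOTES=5, no quorum disk.

def quorum(expected_votes: int) -> int:
    """OpenVMS quorum: (EXPECTED_VOTES + 2) // 2, integer division."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """Cluster activity continues only while present votes reach quorum."""
    return votes_present >= quorum(expected_votes)


EXPECTED_VOTES = 5
for nodes_down in range(4):
    present = EXPECTED_VOTES - nodes_down
    state = "quorum kept" if has_quorum(present, EXPECTED_VOTES) else "quorum LOST"
    print(f"{nodes_down} node(s) down: {present} votes, {state}")
```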
While on site, I found many errors on the PEA0 channels in SCACP, which I don't think is normal.
My questions are:
1. How can I improve the reliability of the cluster?
2. Can I use dedicated LAN segments for cluster communication?
Thanks,
Charles Song
10-28-2009 02:26 AM
Re: Cluster Interconnect problem
The first question to answer is: why did the 3 nodes go down? What happened? Did they crash and fail to reboot? Were they hung?
Exactly which counters did you look at with SCACP? Some 'errors' may be perfectly normal.
Are both ports of the DE602 LAN card connected to the same LAN or to different LANs, and are both being used for cluster communication?
Do the switch ports and the LAN devices agree on the speed and duplex settings?
Volker.
10-28-2009 03:17 AM
Re: Cluster Interconnect problem
I would also check the OPERATOR.LOG on every node for any occurrences of "CNXMAN" and "PEA" since the last boot.
The log is SYS$MANAGER:OPERATOR.LOG (note that it is node-specific).
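If copies of each node's log are collected on another machine, a hypothetical Python filter like the one below picks out the lines mentioned above. It is a plain, case-sensitive substring match, so "PEA" can also catch unrelated lines; the file name in the usage note is a placeholder. On the OpenVMS node itself, the DCL SEARCH command does the same job directly.

```python
from typing import Iterable, List

# Strings suggested above; a plain substring match, not a log parser.
KEYWORDS = ("CNXMAN", "PEA")


def matching_lines(lines: Iterable[str], keywords=KEYWORDS) -> List[str]:
    """Return the lines (trailing newline stripped) containing any keyword."""
    return [line.rstrip("\n") for line in lines
            if any(k in line for k in keywords)]


def scan_log(path: str) -> List[str]:
    # OPERATOR.LOG is node-specific: run this once per node's copy.
    with open(path, "r", errors="replace") as f:
        return matching_lines(f)
```

Usage, with a hypothetical copied file name: `for line in scan_log("WSR1_OPERATOR.LOG"): print(line)`.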
We also need more information about the infrastructure, e.g. (as Volker suggested) how your NIC ports connect physically to the network switch(es), whether the paths between the nodes are redundant, etc.
Dave
10-28-2009 04:28 AM
Re: Cluster Interconnect problem
The customer told me that their ORACLE server could not be accessed from the client side, so they reset all 5 systems and rebooted them; the cluster re-formed and was normal again.
One port (EIA) on each ES40 is connected to the public network and the other (EIB) to a private network. The EIB port is not configured with any network protocol; it is only cabled to the switch.
At the SRM console I set EIA0_MODE and EIB0_MODE to FastFD, and I also set them in the LANCP utility.
The SCACP counters are in the attachment.
Because this is a remote customer, I have asked them to collect the OPERATOR.LOG from each ES40 for me.
Thanks
Charles Song
10-28-2009 04:51 AM
Re: Cluster Interconnect problem
The XMT:Tmo ratio does not look very healthy. It is in the 400-600 range, which means that a retransmission due to a transmit timeout happens once every 400-600 packets. I normally see values of 20000 and much higher.
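The ratio reasoning above can be sketched as a small calculation, following Volker's reading of XMT:Tmo as packets transmitted per transmit timeout. The 20000 threshold is his rule of thumb from experience, not an official limit, and the counter values are hypothetical.

```python
# Rule of thumb from the post above, not an official OpenVMS limit.
HEALTHY_RATIO = 20000


def packets_per_timeout(xmt_packets: int, xmt_timeouts: int) -> float:
    """Packets sent per transmit timeout; infinite when no timeouts occurred."""
    if xmt_timeouts == 0:
        return float("inf")
    return xmt_packets / xmt_timeouts


def looks_healthy(ratio: float, threshold: float = HEALTHY_RATIO) -> bool:
    """True when transmit timeouts are rare enough by the rule of thumb."""
    return ratio >= threshold


# Hypothetical counters: 500000 packets with 1000 timeouts gives a ratio of
# 500, i.e. one timeout-driven retransmission every 500 packets.
ratio = packets_per_timeout(500_000, 1_000)
print(ratio, looks_healthy(ratio))  # prints: 500.0 False
```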
On WSR1, EIB might have a duplex mismatch problem. Check whether the switch port is set to 100 Mbit FDX as well, and not to auto-negotiation.
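Why a port forced to FastFD facing an auto-negotiating switch port is a problem can be shown with a small model. It is a sketch under stated assumptions: both ends support 100 Mbit full duplex, and an auto-negotiating port facing a forced port falls back to parallel detection, which senses the speed but assumes half duplex (standard IEEE 802.3 behavior). `PortSetting`, `resolve` and `duplex_mismatch` are illustrative names, not part of any OpenVMS tool.

```python
from dataclasses import dataclass
from typing import Tuple

# (speed in Mbit/s, full duplex?) actually used by one end of the link
Mode = Tuple[int, bool]


@dataclass(frozen=True)
class PortSetting:
    """Auto-negotiation, or a forced speed/duplex such as FastFD."""
    auto: bool
    speed: int = 100          # only meaningful when forced
    full_duplex: bool = True  # only meaningful when forced


FASTFD = PortSetting(auto=False, speed=100, full_duplex=True)
AUTO = PortSetting(auto=True)


def _forced(p: PortSetting) -> Mode:
    return (p.speed, p.full_duplex)


def resolve(a: PortSetting, b: PortSetting) -> Tuple[Mode, Mode]:
    """Mode each end ends up in, assuming both ends support 100 Mbit FDX."""
    if a.auto and b.auto:
        return (100, True), (100, True)  # negotiate the best common mode
    if a.auto:
        # Parallel detection: a senses b's speed but falls back to half duplex.
        return (b.speed, False), _forced(b)
    if b.auto:
        return _forced(a), (a.speed, False)
    return _forced(a), _forced(b)


def duplex_mismatch(a: PortSetting, b: PortSetting) -> bool:
    """Same speed, different duplex: the link comes up but loses frames under load."""
    (speed_a, fdx_a), (speed_b, fdx_b) = resolve(a, b)
    return speed_a == speed_b and fdx_a != fdx_b


# Host forced to FastFD, switch port left on auto: mismatch.
print(duplex_mismatch(FASTFD, AUTO))  # prints: True
```

The model also shows why setting both ends to auto, or forcing both ends to the same mode, avoids the problem.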
The 'problem description' your customer has given is not really helpful. Please teach your customer to force a system crash instead of just 'resetting' the systems when something appears hung. Some PING tests could probably have helped determine which system, if any, was hung. You might need the console output to determine what went wrong.
Volker.
10-28-2009 05:02 AM
Re: Cluster Interconnect problem
If you look at the Channel Rexmit Errors, you see that they are much higher on the EIB LAN adapter from WSR1 to all of the other 4 nodes than on the EIA LAN adapter, so there seems to be some problem on that LAN (most likely the speed/duplex/auto-negotiation settings). You need to check this data on all nodes.
I don't believe that whatever happened was caused by 'cluster interconnect problems'.
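The comparison described above can be sketched as a check that one adapter is worse than the other toward every remote node, which points at that LAN segment rather than at any single remote system. The rows are assumed to be transcribed by hand from MC SCACP SHOW CHANNEL output; the numbers and the names NODE2..NODE5 are invented for illustration and are not the customer's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (local LAN adapter, remote node, Channel Rexmit Errors), transcribed by hand
Row = Tuple[str, str, int]


def consistently_worse(rows: Iterable[Row], worse: str, better: str) -> bool:
    """True if `worse` has more rexmit errors than `better` toward every remote node."""
    per_remote: Dict[str, Dict[str, int]] = defaultdict(dict)
    for adapter, remote, errors in rows:
        per_remote[remote][adapter] = errors
    return all(counts.get(worse, 0) > counts.get(better, 0)
               for counts in per_remote.values())


# Invented numbers for illustration only (not the customer's SCACP data);
# NODE2..NODE5 stand in for the other four ES40s.
rows = [
    ("EIA", "NODE2", 3), ("EIB", "NODE2", 150),
    ("EIA", "NODE3", 2), ("EIB", "NODE3", 140),
    ("EIA", "NODE4", 4), ("EIB", "NODE4", 160),
    ("EIA", "NODE5", 1), ("EIB", "NODE5", 130),
]
print(consistently_worse(rows, worse="EIB", better="EIA"))  # prints: True
```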
Volker.
10-28-2009 05:39 AM
Re: Cluster Interconnect problem
I will tell the customer to force a crash the next time the cluster stops working.
I checked and set EIx0_MODE carefully when I installed the cluster, but I could not confirm the settings on the switch side.
Why is the XMT:Tmo ratio so low, and how can I adjust it?
Another question: could I dedicate one NIC port to cluster communication? I would like to set that port to twisted-pair 10 Mbit mode; I think that mode would be more reliable.
Thanks.
Charles Song
10-28-2009 05:50 AM
Re: Cluster Interconnect problem (accepted solution)
You need to find out the switch port settings. Otherwise, you could set the OpenVMS side to auto-negotiation and hope that the switch is also set to auto. You can try this dynamically with LANCP first (i.e. LANCP> SET DEV EIB/AUTO) and check whether the duplex mode mismatch errors disappear. With LANCP SHOW DEV/INT EIB you can see the LAN driver error messages, which are otherwise only visible on the console terminal.
Setting the EIx LAN interface to 10 Mbit HDX may only make things worse.
You cannot improve the XMT:Tmo ratio except by making sure that the underlying network works reliably, especially the EIB LAN, as indicated by its higher number of Channel Rexmit Errors.
Don't blame the cluster protocol for what has happened until you understand what has really happened! The cluster communication protocol is very reliable: if there is a working channel between the systems, it will find and use it!
Volker.
10-28-2009 06:08 AM
Re: Cluster Interconnect problem
Good idea, I will change the values in LANCP and check the result on just one ES40 first. In my case, can both ports be used as cluster communication channels?
Thanks
Charles Song
10-28-2009 06:18 AM
Re: Cluster Interconnect problem
As MC SCACP SHOW CHAN shows, you have 2 channels between each pair of ES40s. One channel is formed between the EIA LAN interfaces of the two nodes and the other between their EIB LAN interfaces; the two network segments are NOT connected to each other.
As long as one of those 2 channels is functioning, the virtual circuit between the nodes stays intact and cluster messages can be exchanged.
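The rule above (the virtual circuit survives while any one channel is up) can be modelled in a few lines. This is a simplified sketch: it only models failed adapters, treating a channel on a LAN as up when both ends' adapters on that LAN are up, and ignores switch or segment failures. WSR1 comes from the thread; NODE2..NODE5 are placeholder names.

```python
from itertools import combinations
from typing import List, Set, Tuple

LANS = ("EIA", "EIB")  # two separate segments, not connected to each other


def broken_circuits(nodes: List[str],
                    failed: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Node pairs left with no working channel.

    `failed` holds (node, lan) adapters that are down. A channel on a LAN is
    up only if both ends' adapters on that LAN are up; the virtual circuit
    stays open while at least one channel is up.
    """
    broken = []
    for a, b in combinations(nodes, 2):
        channels_up = [(a, lan) not in failed and (b, lan) not in failed
                       for lan in LANS]
        if not any(channels_up):
            broken.append((a, b))
    return broken


# WSR1 is from the thread; NODE2..NODE5 are placeholder names.
nodes = ["WSR1", "NODE2", "NODE3", "NODE4", "NODE5"]
print(broken_circuits(nodes, {("WSR1", "EIB")}))  # prints: []
```

Losing WSR1's EIB alone breaks no circuit; losing EIA on one node and EIB on another breaks only the circuit between those two nodes.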
Volker.