RX2800 node crash when on same network as a Redhat 6 server
04-29-2016 05:23 AM
Hi folks,
Not sure if this is a networking issue or not. We have a two-node cluster comprising a SAN array and two RX2800s, both on OpenVMS V8.4 Update 7 and TCPIP/IP 5-7 ECO 3.
Both nodes boot into the cluster quite happily and will sit there; however, as soon as a Redhat server is introduced onto the network, both nodes bugcheck with NOTWCBWCB, Corrupted WCB list. Both nodes use an NFS share served by the Redhat system. Rolling back the Redhat system allows the nodes to reboot and then function as expected. The network cards are plugged into the PCI riser card; I haven't verified whether the issue also occurs with the network ports on the motherboard.
We run several installations of this system, and at least one of them works correctly with the updated Redhat server, the primary difference being that that particular system is a few months older than the one crashing.
A secondary concern is that at around the same time the RX2800 managed to lose its boot options and iLO passwords. I'm not sure if this is related or just plain bad luck. In any event, the system no longer boots.
thanks in advance
Brian
04-29-2016 05:29 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Hi Brian,
a NOTWCBWCB crash is most likely a software issue (pool corruption?). Could you provide the CLUE file (CLUE$COLLECT:CLUE$node_yymmdd_hhmm.LIS) as an attachment?
The TCPIP NFS client would be the most likely culprit. What happens if you just DO NOT mount the NFS shares on that Redhat NFS server from your rx2800?
Volker.
04-29-2016 05:41 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Hi Volker,
I've attached one of the CLUE files. We've rolled the Redhat server back to give the customer a working system. The manufacturers will be in the office next week to do some more testing, at which point we'll try and work out which protocol caused the issue.
The concern from my point of view is that the new version of the system works at their office but not on the live system.
cheers
Brian
04-29-2016 06:02 AM - edited 04-29-2016 06:03 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Hi Brian,
the code in [F11X]WITURN (in routine MARK_COMPLETE) walks a list of Files-11 related data structures in nonpaged pool. If it finds a packet that is NOT of the expected type (in this case a WCB, Window Control Block), it bugchecks:
BUG_CHECK (NOTWCBWCB, FATAL, 'Currupted WCB list');
So my initial analysis still holds: this is most likely pool corruption by some software component, and something in the TCPIP stack or the TCPIP NFS client is the most likely culprit.
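To illustrate why the crash surfaces in file system code even when some other component did the damage, here is a toy model in Python. It is not OpenVMS code: it assumes a singly linked list of packets that each carry a type field (`Packet`, `walk_wcb_list` and `BugCheck` are hypothetical names), and it only shows that the walker detects a bad header when it reaches it, regardless of who overwrote it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical type tag; the real structure-type codes are not reproduced here.
TYPE_WCB = "WCB"


class BugCheck(Exception):
    """Stands in for a fatal bugcheck in this model."""


@dataclass
class Packet:
    pkt_type: str                    # header field a stray write can clobber
    name: str
    next: Optional["Packet"] = None  # forward link to the next list entry


def walk_wcb_list(head: Optional[Packet]) -> list[str]:
    """Visit each packet on a WCB list; bugcheck on any foreign packet."""
    visited = []
    pkt = head
    while pkt is not None:
        if pkt.pkt_type != TYPE_WCB:
            # Mirrors the NOTWCBWCB check quoted above.
            raise BugCheck("NOTWCBWCB, Corrupted WCB list")
        visited.append(pkt.name)
        pkt = pkt.next
    return visited


if __name__ == "__main__":
    w3 = Packet(TYPE_WCB, "wcb3")
    w2 = Packet(TYPE_WCB, "wcb2", w3)
    w1 = Packet(TYPE_WCB, "wcb1", w2)
    print(walk_wcb_list(w1))
    w2.pkt_type = "??"  # simulate a stray write by some unrelated component
    try:
        walk_wcb_list(w1)
    except BugCheck as exc:
        print("bugcheck:", exc)
```

The walker itself is correct; it is simply the first code to look at the damaged packet, which is why the bugcheck names the file system structure rather than the culprit.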
Compare NFS versions (TCPIP SHOW VERS/ALL) between a working system and a failing one.
Volker.
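The version comparison suggested above can be made mechanical by capturing the output on a working and a failing system and diffing the two. This Python sketch assumes each listing has already been cut down to one `component version` pair per line; it does not parse the real `TCPIP SHOW VERSION/ALL` layout, and the component names in the example are placeholders.

```python
def parse_listing(text: str) -> dict[str, str]:
    """Map component name -> version from 'name version...' lines.

    Assumed, pre-cleaned format: one component per line, name first.
    This is NOT the literal TCPIP SHOW VERSION/ALL layout.
    """
    versions: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            versions[parts[0]] = " ".join(parts[1:])
    return versions


def diff_versions(working: dict[str, str], failing: dict[str, str]) -> list[str]:
    """List every component whose version differs or exists on one side only."""
    report = []
    for name in sorted(working.keys() | failing.keys()):
        w = working.get(name, "<missing>")
        f = failing.get(name, "<missing>")
        if w != f:
            report.append(f"{name}: working={w} failing={f}")
    return report


if __name__ == "__main__":
    # Placeholder component names and versions, for illustration only.
    working = parse_listing("COMPONENT_A V1.1\nCOMPONENT_B V2.0\n")
    failing = parse_listing("COMPONENT_A V1.2\nCOMPONENT_B V2.0\nCOMPONENT_C V3.0\n")
    for line in diff_versions(working, failing):
        print(line)
```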
04-29-2016 06:35 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Hi Volker,
I've just had a quick look at some of the other bugcheck reasons. I got:
28-APR-2016 11:54 V8.4 HP rx2800 i2 (1.60 MAINT1 CPUSPINWAIT 80193B20 SYSTEM_SYNCHRONIZATION_ 00010F20
28-APR-2016 12:08 V8.4 HP rx2800 i2 (1.60 MAINT1 FATALEXCPT MSS_20_SY_50199 80704462 LOCKING 00056C62
28-APR-2016 12:21 V8.4 HP rx2800 i2 (1.60 MAINT1 NOTFCBFCB SYS_MONITOR 80787430 F11BXQP 00043030
28-APR-2016 12:50 V8.4 HP rx2800 i2 (1.60 MAINT1 SSRVEXCEPT SIG_ACTPRO 801EFBC0 SYSTEM_SYNCHRONIZATION_ 0006CFC0
And
28-APR-2016 11:53 V8.4 HP rx2800 i2 (1.60 MAINT2 INVEXCEPTN NULL 80A33430 SECURITY 0002C430
28-APR-2016 12:08 V8.4 HP rx2800 i2 (1.60 MAINT2 NOTWCBWCB CIMDAEMON 807898A0 F11BXQP 000454A0
I think one of the nodes will have 20 or so bugchecks for Tuesday night. I suspect they're all related.
cheers
Brian
04-29-2016 02:41 PM
Re: RX2800 node crash when on same network as a Redhat 6 server
> [...] TCPIP/IP 5-7 ECO 3
If you do suspect an NFS-related problem, then you might start by
considering getting the TCPIP software up to date. I have an
ill-maintained hobbyist system with newer than that, and the
availability of newer than mine would not amaze me.
REX $ tcpip show version
HP TCP/IP Services for OpenVMS Industry Standard 64 Version V5.7 - ECO 4
on an HP rx2600 (1.50GHz/6.0MB) running OpenVMS V8.4
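If you are checking several clusters, the banner line from `TCPIP SHOW VERSION` can be compared programmatically. This Python sketch assumes the banner format shown in the output above and compares (major, minor, ECO) as a tuple; it has no idea which ECO is actually the latest, so the reference banner is whatever the newest kit available to you reports. The ECO 3 banner in the example is constructed by hand to match the level mentioned in the first post, not captured from a system.

```python
import re

# Built from the banner format shown above, e.g.
# "... Industry Standard 64 Version V5.7 - ECO 4". Assumption: a base
# release with no "- ECO n" suffix will not match and raises ValueError.
BANNER_RE = re.compile(r"Version\s+V(\d+)\.(\d+)\s*-\s*ECO\s+(\d+)")


def tcpip_level(banner: str) -> tuple[int, int, int]:
    """Return (major, minor, eco) parsed from a TCPIP SHOW VERSION banner."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError(f"unrecognised banner: {banner!r}")
    major, minor, eco = (int(g) for g in match.groups())
    return major, minor, eco


def is_behind(installed: str, reference: str) -> bool:
    """True if the installed banner reports an older level than the reference."""
    return tcpip_level(installed) < tcpip_level(reference)


if __name__ == "__main__":
    rex = ("HP TCP/IP Services for OpenVMS Industry Standard 64 "
           "Version V5.7 - ECO 4")
    failing = rex.replace("ECO 4", "ECO 3")  # hand-made banner, not captured
    print(tcpip_level(rex), is_behind(failing, rex))
```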
05-03-2016 06:54 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
OK, I've been able to experiment a bit (without a customer breathing down my neck).
It looks as though the core networking is fine: without the application started, I can manually mount the NFS share hosted on the Redhat host, ping the host, SSH to it and so on. So at that level things look OK.
As soon as I start the application up, things go awry, and the bugchecks seem to be inconsistent:
3-MAY-2016 11:22 V8.4 HP rx2800 i2 (1.60 NWRCC1 NOTFCBFCB SIG_20_SY_2674 80787430 F11BXQP 00043030
3-MAY-2016 12:14 V8.4 HP rx2800 i2 (1.60 NWRCC1 UNXSIGNAL NWRCC1_HW_IA64 00000000 <not available> 00000000
3-MAY-2016 12:27 V8.4 HP rx2800 i2 (1.60 NWRCC1 INVEXCEPTN TCPIP$RE_BG2101 80118240 SYSTEM_PRIMITIVES_MIN 00108240
3-MAY-2016 12:38 V8.4 HP rx2800 i2 (1.60 NWRCC1 SSRVEXCEPT DNFS2011ACP 80704351 LOCKING 00056B51
3-MAY-2016 12:48 V8.4 HP rx2800 i2 (1.60 NWRCC1 INVEXCEPTN DNFS2012ACP 80102820 SYSTEM_PRIMITIVES_MIN 000F2820
3-MAY-2016 13:42 V8.4 HP rx2800 i2 (1.60 NWRCC1 UNXSIGNAL SIG_20_SY_46329 90AFB7CF <not available> 00000000
The exceptions may be network-related, but I'm more concerned by the seemingly random nature of the crashes.
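One way to put a number on "seemingly random" is to tally the bugcheck codes and faulting modules from the CLUE history lines. The parser below is a Python sketch that assumes the column layout shown in these posts (including the CPU field truncated at `(1.60` and an optional process-name column); it is not based on a documented CLUE format, and `Crash`, `parse_line` and `summarize` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass

# The six CLUE history lines quoted in the post above.
HISTORY = """\
3-MAY-2016 11:22 V8.4 HP rx2800 i2 (1.60 NWRCC1 NOTFCBFCB SIG_20_SY_2674 80787430 F11BXQP 00043030
3-MAY-2016 12:14 V8.4 HP rx2800 i2 (1.60 NWRCC1 UNXSIGNAL NWRCC1_HW_IA64 00000000 <not available> 00000000
3-MAY-2016 12:27 V8.4 HP rx2800 i2 (1.60 NWRCC1 INVEXCEPTN TCPIP$RE_BG2101 80118240 SYSTEM_PRIMITIVES_MIN 00108240
3-MAY-2016 12:38 V8.4 HP rx2800 i2 (1.60 NWRCC1 SSRVEXCEPT DNFS2011ACP 80704351 LOCKING 00056B51
3-MAY-2016 12:48 V8.4 HP rx2800 i2 (1.60 NWRCC1 INVEXCEPTN DNFS2012ACP 80102820 SYSTEM_PRIMITIVES_MIN 000F2820
3-MAY-2016 13:42 V8.4 HP rx2800 i2 (1.60 NWRCC1 UNXSIGNAL SIG_20_SY_46329 90AFB7CF <not available> 00000000
"""


@dataclass
class Crash:
    when: str
    node: str
    bugcheck: str
    process: str  # empty when the line has no process-name column
    pc: str
    module: str
    offset: str


def parse_line(line: str) -> Crash:
    # "<not available>" is one field that happens to contain a space.
    tokens = line.replace("<not available>", "<not_available>").split()
    # The columns we care about start right after the truncated CPU field.
    cpu = next(i for i, tok in enumerate(tokens) if tok.startswith("("))
    node, bugcheck, *rest = tokens[cpu + 1:]
    if len(rest) == 3:  # no process name, e.g. the CPUSPINWAIT line
        rest = [""] + rest
    process, pc, module, offset = rest
    return Crash(" ".join(tokens[:2]), node, bugcheck, process, pc, module, offset)


def summarize(text: str) -> tuple[Counter, Counter]:
    crashes = [parse_line(line) for line in text.splitlines() if line.strip()]
    return (Counter(c.bugcheck for c in crashes),
            Counter(c.module for c in crashes))


if __name__ == "__main__":
    codes, modules = summarize(HISTORY)
    print("bugcheck codes:", dict(codes))
    print("modules:       ", dict(modules))
```

On this sample no single bugcheck code or faulting module accounts for more than two of the six crashes. The MAINT1/MAINT2 lines quoted earlier parse with the same code, since the missing process-name column is handled.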
05-03-2016 06:58 AM
Solution
Hi Brian,
these are the TYPICAL symptoms of nonpaged pool corruptions: crashes all over the place ! You may even be able to reproduce these crashes WITHOUT starting the application by copying files from/to the NFS share on the Redhat server.
Get and install the most recent TCPIP ECO first !
Volker.
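The "crashes all over the place" pattern can be reproduced in a toy simulation: one buggy writer overruns its own packet, and the crash is reported by whichever unrelated code next validates the damaged neighbour. This is a Python sketch under loudly stated assumptions: the packet types are illustrative, the pool is a plain list, and the returned names only imitate the NOTWCBWCB / NOTFCBFCB naming style.

```python
import random
from collections import Counter

# Toy model only: these packet types are illustrative, not a list of real
# OpenVMS structure types, and the returned names merely mimic the
# NOTWCBWCB / NOTFCBFCB naming style.
PACKET_TYPES = ["WCB", "FCB", "LKB", "IRP"]


def simulate_crash(rng: random.Random, pool_size: int = 16) -> str:
    """Return the bugcheck-style name one simulated overrun leads to."""
    # Each slot records which list it is queued on and its header type.
    pool = [{"list": t, "type": t}
            for t in (rng.choice(PACKET_TYPES) for _ in range(pool_size))]
    # A buggy component owns one packet and writes past its end, clobbering
    # the header of whichever packet the allocator happened to place next.
    owner = rng.randrange(pool_size - 1)
    pool[owner + 1]["type"] = "GARBAGE"
    # Later, code that walks each list validates headers; the first
    # mismatch it meets is where the "crash" is reported.
    for slot in pool:
        if slot["type"] != slot["list"]:
            return f"NOT{slot['list']}{slot['list']}"
    return "no crash"


def tally(runs: int, seed: int = 1) -> Counter:
    """Run the same bug many times and count the distinct crash names."""
    rng = random.Random(seed)
    return Counter(simulate_crash(rng) for _ in range(runs))


if __name__ == "__main__":
    # The same defect produces several different crash signatures.
    for name, count in tally(100).most_common():
        print(f"{name:12} {count}")
```

Because the allocator decides which packet sits next to the buggy one, the same defect yields a different crash signature from run to run, so the bugcheck name says little about the culprit.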
05-03-2016 09:14 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Brian,
I will echo Volker's recommendation. Spending time now trying to locate crash causes will be a waste of time: with pool corruption, the crashes will be random and not point to any particular cause. Upgrade TCPIP to at least the most recent patch available to you. I would also look at the release notes for more recent VMS releases to see if there are any pool-related "updates". If these problems continue after the upgrade, a more drastic investigation effort may be necessary.
Dan
05-04-2016 12:08 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Hi Folks,
I'll try the patches today, although I still have to explain to the customer why, out of four two-node RX2800 clusters and one RX2660 running the same version of the OS (including patches) and the same application software, two of the clustered systems fail with this error.
Are there any tools etc. I can use (now and in the future) to investigate these problems?
cheers
Brian