05-04-2016 12:28 AM
Re: RX2800 node crash when on same network as a Redhat 6 server
Hi Brian,
You could diagnose nonpaged pool corruption problems by setting the system parameter SYSTEM_CHECK=1 or POOLCHECK and analyzing the resulting system crashes. Crashes may become more frequent with these parameters set, but there is some hope that the corruption will be detected 'earlier', closer to the point where the pool is actually being corrupted.
You could also analyze the existing crash dumps, but you would need to save each of them in order to look for common patterns of nonpaged pool corruption.
Volker.
05-04-2016 05:25 AM
As for why some machines crash and others don't: I would suggest that this is a matter of timing. It is quite possible that the corruption is occurring on all machines, but the particular workload of the failing ones is such that the corruption gets noticed.
As an example, I had a client with software that worked at many sites without problems. At one site, the application would fail with an ACCVIO. Same software, same machine hardware. The problem was traced to a variable that was not initialized properly, which caused an unexpected memory location to be used. That particular site's parameter settings caused that location (at the high end of virtual memory) to be consumed, whereas the settings at the other sites did not. Hence the reported failure.
Here, the corrupted area may rarely be used on the other machines, depending on their workload, so the corruption (while still present) is never hit.
Dan
05-04-2016 06:37 AM
Ah, this old chestnut. May I translate the customer's request for you? This request is either "please spend more than a little time and effort to re-debug the known crashes that have already been fixed by the patches, and prove to me which one is involved here" or my favorite variation "exactly which patch do I need to install to fix this, because I can't install all mandatory patches for {reasons}."
I've gone on more than a few of these rock-fetches over the years, and the best and simplest answer is usually that somebody screwed up and didn't load the mandatory patches, and that there should be a policy of installing mandatory patches as they become available and can be tested. Once the mandatory patches are all loaded and once any subsequent crashes have been run through a crash scanner, then the system crashes get far more interesting to everybody involved.
If your customer wants to know the specific cause here, then you're going to be using the source listings for OpenVMS itself in conjunction with the system dump analyzer to determine what has apparently corrupted pool and, in this case, there's a non-trivial chance you'll be reverse-engineering the binary code for TCP/IP Services, as there are no source listings available for that. Probably the first step here is to wander around and see what's getting corrupted in pool, what's building up in pool, and what patterns might exist to the corruptions, or whether registers or some other resource are getting corrupted. (Pool corruptions and register corruptions can be some of the most wonderfully difficult bugs to locate, too: the triggers can be subtle, and the faulty code can be somewhere completely unexpected. There was an NFS floating-point register corruption from roughly a dozen years ago that is still one of my benchmarks for bizarre crashes.)
Now once you're done with the rock-fetch and know the trigger, the information you'll have gathered will usually lead to one of two outcomes: "apply the patches to fix this" or "apply the patches and submit a crashdump". Knowing the specific trigger doesn't solve any of this, unless you're also going to be creating the patch yourself. Either of the usual outcomes can be predicted with some certainty, and the investigation usually only serves to delay the actual and desired outcome of a stable system, too.
TL;DR: install the mandatory patches, escalate any subsequent crashes to HPE or VSI, and figure out why the mandatory patches and updates aren't being loaded expeditiously.
05-05-2016 12:26 AM
Hi Folks,
Applying the patches appears to have resolved the problem, although more testing is required to satisfy ourselves. The next interesting job is rolling out the patches to a number of sites.
Thanks for all your help and advice.
cheers
Brian