03-21-2011 09:55 AM
AlphaServer Memexer
There was an issue with one of our DS20e servers which is being described as "something got corrupted in memory". My manager wants me to perform a diagnostic scan so we can be sure that there is nothing physically wrong with it and rule out the hardware as the root cause.
I found the "memexer" SRM console command, which sounds like it will discover any memory issues. Since the system will need to be offline, I'd like to determine approximately how long the scan will take so I can schedule the outage accordingly. I'm running a 2 pass scan "memexer 2" on my test AlphaServer800 and it's been running for over an hour now. The AlphaServer800 only has 128MB, whereas the DS20e has 1.5GB.
Does this mean the DS20e will take 12 times as long, or does the scan usually take the same amount of time regardless of the amount of memory?
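For reference, the arithmetic behind the 12x figure can be sketched in Python. This is only a rough estimate: it assumes run time scales linearly with installed memory and that both machines test at the same rate, neither of which is confirmed anywhere in this thread. The function name and the one-hour baseline are hypothetical, chosen for illustration.

```python
def scaled_duration_hours(measured_hours, measured_mb, target_mb):
    """Linearly extrapolate a test duration from one memory size to another.

    Assumption (not confirmed for memexer): duration is proportional to
    the amount of memory and both systems test at the same speed.
    """
    return measured_hours * (target_mb / measured_mb)


small_mb = 128           # AlphaServer 800 test machine
large_mb = 1.5 * 1024    # DS20e: 1.5GB expressed in MB

ratio = large_mb / small_mb

# Hypothetical baseline: suppose the small machine needed one hour.
estimate = scaled_duration_hours(1.0, small_mb, large_mb)

print(f"size ratio: {ratio:g}x, estimated duration: {estimate:g} h")
```

If the real baseline turns out to be several hours, the same multiplication applies, so the estimate is only as good as the linear-scaling assumption.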
03-21-2011 10:55 AM
Re: AlphaServer Memexer
> in memory". [...]
Do you always believe what you're told? (I
have a bridge for sale, ...)
HELP SHOW MEMORY Examples
Look for "Bad Page List".
You do have ECC memory, right? Memory
hardware errors don't normally go unnoticed
by the OS.
As usual, showing actual commands with actual
error messages can be more helpful than vague
descriptions or interpretations.
And running physical memory tests before
having any real evidence of a physical memory
problem can waste considerable time.
> Does this mean the DS20e will take 12 times
> as long, [...]
Perhaps, if the memory speed of a DS20e is
the same as that of the other system, and if
"memexer" does exactly the same things on
both systems. Most of which seems unlikely.
03-21-2011 10:59 AM
Re: AlphaServer Memexer
The usual trigger for the "something got corrupted in memory" reports is an application bug, or (more rarely, but there are some cases known) a kernel bug. These won't show via ECC/EDC, as they're not hardware errors.
Check the error logs.
This is old gear, and you're best headed for a hardware upgrade regardless of this particular case. (If not, look at getting yourself a used DS20e as a source for spare parts and upgrades.)
As for the direct answer to your "how long?", I don't know. The last round of (gonzo) memory tests I was running were on a GS1280-class box, and those took a couple of hours. But that's not going to be particularly comparable to your box.
Put another way, run the diagnostic. It takes as long as it takes. Your boss said to do it, so... (And if you can't afford the downtime, start looking at bringing a spare server online for your environment. There's another issue here for your boss to consider.)
The memexer tool runs in the background and completes silently on success, so it may well have already finished. (If you haven't already found it, the show_status command shows you progress.)
If your version of SRM has the command available, then memexer_mp can use both processors here for testing.
Don't run parallel sets of memexer or memexer_mp as they can get tangled and report errors.
kill_diags ends the testing on command, and can be useful if you approach the end of your "guesstimated" maintenance window.
03-21-2011 11:40 AM
Re: AlphaServer Memexer
I've been running the "while true; show_status; sleep 10; done" on my test machine so I can monitor it. It's coming up on 3 hours. Under the Pass column of the show_status it's at 1280 and the Bytes Written/Read is at 135450853376 which seems high since it only has 128MB.
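To put those show_status figures in proportion, here is a minimal Python sketch that divides the reported byte count by the installed memory and by the pass count. The constant names are made up for illustration, and what show_status actually counts as a "Pass" is not stated in this thread, so treat the per-pass figure as a description of the numbers rather than of how memexer works.

```python
BYTES_MOVED = 135_450_853_376      # "Bytes Written/Read" reported by show_status
PASSES = 1280                      # "Pass" column reported by show_status
INSTALLED_BYTES = 128 * 1024**2    # 128MB in the test machine

MIB = 1024**2


def full_sweeps(bytes_moved, installed_bytes):
    """How many times the installed memory size fits into the byte count."""
    return bytes_moved / installed_bytes


def mib_per_pass(bytes_moved, passes):
    """Average MiB written/read per reported pass."""
    return bytes_moved / passes / MIB


sweeps = full_sweeps(BYTES_MOVED, INSTALLED_BYTES)
per_pass = mib_per_pass(BYTES_MOVED, PASSES)

print(f"equivalent full-memory sweeps: {sweeps:.2f}")
print(f"average MiB per reported pass: {per_pass:.2f}")
```

The byte count works out to roughly a thousand times the installed memory, which fits an exerciser that touches each location many times rather than once.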
The AlphaServer800 doesn't list memexer_mp when I run HELP, but I'd have to check the DS20e.
From what you described it sounds like the scan will end eventually (I was getting worried that it will run forever) so I'll let it continue... more to feed my curiosity than anything. :)
03-21-2011 11:45 AM
Re: AlphaServer Memexer
As stated before, with no indication from VMS that there was an error, it is likely that the problem is a programming one, not hardware. At this point, you are wasting time. Your time is better spent locating the portion of the application that encountered the problem and checking that code for errors.
Disclaimer:
There are many of us here that provide consulting services to assist with these types of problems. I am one.
03-21-2011 12:29 PM
Re: AlphaServer Memexer
But it'll probably give your boss some ammo for having a subsequent chat with whoever tossed out that "something got corrupted in memory" statement.
A memory error is either silently corrected, or it tosses a honking obvious parity error and an associated run-time snit underneath whatever was using the page. If the upstairs software using the bad page happened to be some core part of VMS, well, bye-bye VMS.
View the error logs, and see if there are any CPU, cache, memory, disk or other core hardware errors. (Don't depend on SHOW ERROR here, either, as memory errors don't get logged there until things get, um, nasty. View the error logs directly.) Any error details from the log will be more reliable than the memory exercisers; those are only particularly useful once you know you have a hardware error.
(A transient memory error won't repeat, so the memory exerciser won't find it. A failing memory component will repeat as the memory is hit and either corrected or logged as a hard error, so those errors will show up in the error logs.)
Also start instrumenting and bench-checking the code, as that's the most likely culprit for a corruption.
It's commonplace for a multiprocessor box to reveal all manner of weird and latent errors in existing application code, too, particularly when there is shared memory or any asynchronous code involved.
Given the statements so far, my bet is on an application error.
03-21-2011 01:10 PM
Re: AlphaServer Memexer
> 128MB.
Who said that it looks at any byte only once?
It's called "memexer", not "memhardlytest".
> Given the statements so far, my bet is on
> an application error.
Given the complete lack of useful evidence,
I'd tend to wait for some useful evidence.
But if I had to bet blind, my money would be
on the software. Show me an actual error
report, and I'll think harder.
http://en.wikipedia.org/wiki/Per_se
03-21-2011 06:13 PM
Re: AlphaServer Memexer
03-21-2011 07:15 PM
Re: AlphaServer Memexer
By the application, the operating system, hardware error or cosmic rays.
Where did this diagnostic come from? And what was the behavior of the system when corruption was present?
If you have 2 CPUs installed, use memexer_mp. Besides testing memory, you'll also exercise the CPUs. I once saw a new ES-45 crash with random memory errors logged. We changed memory options without resolving the error. After running memexer_mp for a weekend, we captured a console error reporting CPU cache problems. Use a device that captures console output.
Did you review error logs? Do you have hardware support on this system?
03-21-2011 07:26 PM