- Community Home
- >
- Servers and Operating Systems
- >
- Operating Systems
- >
- Operating System - OpenVMS
- >
- Processes Mysteriously Being Deleted
Categories
Company
Local Language
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Forums
Discussions
Discussions
Discussions
Forums
Discussions
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Community
Resources
Forums
Blogs
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 01:33 AM
01-22-2008 01:33 AM
Re: Processes Mysteriously Being Deleted
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 01:53 AM
01-22-2008 01:53 AM
Re: Processes Mysteriously Being Deleted
OpenVMS Developer & System Manager
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 02:48 AM
01-22-2008 02:48 AM
Re: Processes Mysteriously Being Deleted
In that case the wait is not working correctly. He asked to wait 10 minutes ...
Wim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 03:01 AM
01-22-2008 03:01 AM
Re: Processes Mysteriously Being Deleted
Is the job keeping a task-to-task open ?
If it doesn't receives any information for 75 seconds, the remote stuff stops (75 coming from our already gone 6.2 systems). May be this caused the bugcheck because the abort is about 60 seconds after the wait started.
Fwiw
Wim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 03:06 AM
01-22-2008 03:06 AM
Re: Processes Mysteriously Being Deleted
Nothing like that happens in the process.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 03:26 AM
01-22-2008 03:26 AM
Re: Processes Mysteriously Being Deleted
$ open/read/write x node::"task=xx.com".
Don't know if rms is involved but could be imo.
Or may be a spaw/nowait doing something ?
Wim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 04:26 AM
01-22-2008 04:26 AM
Re: Processes Mysteriously Being Deleted
OpenVMS Developer & System Manager
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 04:42 AM
01-22-2008 04:42 AM
Re: Processes Mysteriously Being Deleted
If the process terminated at 07:45:52, the wait started at 07:45:00 and was 10 minutes long.....then it MUST have been running the wait when the DELPRC occurred.
This is how I know it was running a DCL WAIT, regardless of what information was flushed or not.
Rob.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 04:54 AM
01-22-2008 04:54 AM
Re: Processes Mysteriously Being Deleted
I just did a test where the parent was doing a wait while a sun terminated. The termination caused the parent to do a few IO's.
Wim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 10:45 AM
01-22-2008 10:45 AM
Re: Processes Mysteriously Being Deleted
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 04:39 PM
01-22-2008 04:39 PM
Re: Processes Mysteriously Being Deleted
Is it always the GBSINVRUN job that getting the RMS Bugchecks?
If so, what is special about that job?
Are there other processes in the job? If not then we can eliminate subprocesses as having an effect.
Can you do a
SDA> set proc/id=PID of GBSINVRUN job
SDA> show process/chan
while the GBSINVRUN job is in a WAIT?
Are any special drivers or execlets loaded?
SDA> show exec/summary
Is the job copying files from other systems? If so, how are these transfers happening? Any NFS or other software providing "disks" to VMS?
Are you still using HSG80, or something else?
Jon
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 09:02 PM
01-22-2008 09:02 PM
Re: Processes Mysteriously Being Deleted
ETA$MGA
Look at the SP reported in the BUGCHECK, it does not appear to be a valid address, and it is not aligned correctly. And it has valid ASCII charaters in it.
FP x000000007FF8BC80
SP x0041474D24415445 <--
PC xFFFFFFFF804A3550
0x ASCII
45 E
54 T
41 A
24 $
4D M
47 G
41 A
It seems to me that something has trashed something that later got loaded into the SP.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-22-2008 10:49 PM
01-22-2008 10:49 PM
Re: Processes Mysteriously Being Deleted
the most important question to be answered: is there always a RMS non-fatal bugcheck associated with the forced process exit ?
Could the flushing of the logfile write have initiated the bugcheck ?
Can you look at the instruction stream around the reported PC ?
$ ANAL/SYS
SDA> EXA/INS 804A3550-40;50
Volker.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-23-2008 01:46 AM
01-23-2008 01:46 AM
Re: Processes Mysteriously Being Deleted
This looks like one of our tape drives, BETA$MGAn:
There have been around 6 different job streams that have failed, all doing completely different tasks.
Trying to find similarities is difficult because these are 3 years apart, so I can't tell you much more detail than you already know.
Even the backups we took have now expired, so I can't look at the original log files, only use what we wrote down about the problem.
Remember that the first time we saw this, we put it down to a glitch, and didn't take much notice of it.
Rob.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-23-2008 03:02 AM
01-23-2008 03:02 AM
Re: Processes Mysteriously Being Deleted
if these batch process exits are always associated with RMS non-fatal bugchecks, you don't need to activate all the auditing around process deletion. The best information you can get right now are the bugcheck entries from ERRLOG.SYS. Please look at all of them (consider to post them in an attached .TXT file) and look for similarities.
It will be very hard - if not impossible - to diagnose this problem based on this minimum amount of information. As Hein said, the only way to get real information is a system crash (setting BUGCHECKFATAL=1 dynamically once before expecting this problem to happen).
Some information may be spotted in the registers (as Jon found out !), but it would be pure luck, if you can nail the problem this way. Did all those batch jobs do anything with your Fibre Channel tapes ?
Please report the failing instruction stream, so that we can try to find out, if other registers may contain important information.
Volker.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-23-2008 03:13 AM
01-23-2008 03:13 AM
Re: Processes Mysteriously Being Deleted
Wim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-23-2008 04:16 AM
01-23-2008 04:16 AM
Re: Processes Mysteriously Being Deleted
a plain call to $DELPRC in a program running in batch yields the same retained-on-error status as you've reported. RMS explicitly does a $DELPRC call after reporting the RMS internal error with BUG_CHECK RMSBUG. I would have expected an exit status of %RMS-F-RMSBUG, but I have no way to test this.
Running Hein's cmexec_crash example gives a SYSTEM-F-ACCVIO exit status and a non-fatal bugcheck, but it does not exactly create a RMS bugcheck !
Volker.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-25-2008 12:55 PM
01-25-2008 12:55 PM
Re: Processes Mysteriously Being Deleted
This sounds (vaguely) like a context switch or condition handler tried to handle a condition and in the process, mis-aligned the stack so that you have data where addresses should have been. The crash would then occur because the real error is that I'll bet that SP address is no good.
So the RMS BUGCHECK error is merely that RMS suddenly found itself with an impossible stack, when in fact the true error should be some sort of hardware misalignment. The text leaves "45" as the low-order two hex digits in the SP register, which is clearly misaligned, can't use that as an SP on an Alpha 'cause it'll barf every time.
If you can enable a full-blown crash dump, you might be able to look at what else was running that might have hosed the EXEC stack, which is of course shared among all processes (I think I remember that from my years-ago Internals class...) - so the task that is dying is only partly the culprit and in fact might be a victim of a third-party ACP task that hoses the stack for you and leaves the mess for someone else to clean up.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-26-2008 01:37 AM
01-26-2008 01:37 AM
Re: Processes Mysteriously Being Deleted
the EXEC stack, which is of course shared among all processes
the EXEC stack is per process, as are the USER, SUPERVISOR and KERNEL stacks.
If those processes have not done anything to those tapes in EXEC (or higher) mode previously, it is hard to imagine, how the EXEC mode stack pointer could be corrupted with what looks like part of a tape device name.
Without a system crash (BUGCEHCKFATAL=1), it will be very unlikely to successfully diagnose this problem and even with a crash, it may be very hard.
Volker.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-26-2008 12:27 PM
01-26-2008 12:27 PM
Re: Processes Mysteriously Being Deleted
In the very area of the reported RMS problem the system is busily juggling the stack pointer(s), getting ready to switch it to a fresh area to field an RMS requested AST.
Here is a few more interpreted register values:
R6 x0000000000018001 RMS$_STALL (or long at IFB+IFI for SYS$OUTPUT)
R11 x000000007FFD00B8 PIO$GW_PIOIMPA
R12 x000000007FFCDA60 CTL$AG_CLIDATA
R16 x00000000000003A0 BUG$_RMSBUG
R17 xFFFFFFFFFFFFFFFA ASBALLFAIL
R19 x000000007FFD0010 PIO$GW_STATUS
R16/r17 are consistent with and RMS BUG.
R19 suggests RMS was active as well.
R11 reads Process IO, not Image IO, suggesting it is DCL releated IO.
R12 is a global value used by DCL all the time.
So far so good.
Now ASBALLFAIL is only used from RM$ASY_THREAD_START.
That in turn is only (supposed to be) called from LOCKFILE for the file lock, and RELEASE for a blocking AST on a Deferred Write buffer.
Since DCL does not do DFW, it must be a file lock.
Typically DCL will do an (RMS) file lock either. It only grabs those for $OPEN/SHARE.
Robert, Is that the case for this script?
Can you run submit it, wait for it to be waiting, and do a ANAL/SYSTEM... SET PROC BATCH_xxx... SHOW PROC/LOCK ?
Is there a resource wih name RMS$...
Does it have a blocking AST perhaps for 86829400
Do you see that lock back with:
SDA> SHOW PROC/RMS=(PIO,SFSB)
[ Also, if interested, check out things like SDA> SHOW PROC/RMS=(PIO,BDBSUM) and examine the buffer contents. ]
Anyway, If so, any other process opening the same file will trigger an RMS file lock blocking AST out of context, at any time. During an image, during a wait, anytime.
RMS takes the AST, tries to allocate a thread control structure for it. If that fails, there is no place to report back, and it crashes the process as only way to report the fact that it could not do what it was asked to do (downgrade the lock).
What may have caused this?
Well, I suspect that the process must simply have run put of Process IO memory at an unfortunate time.
I suspect that if you change the script to have it open (a few) more files, notably indexed or shared, right where it is doing that WAIT, that it might error out with DME (can you try?!)
The amount of memory RMS allocated is MIN(2, SGN$GL_KSTACKPAG) times PAGE_SIZE (8192).
So it is also possible that the system has an excessive value for SGN$GL_KSTACKPAG.
Check with SDA or MCR SYSGEN SHOW KSTACK.
3 is typical. What is it on the system?
btw... the little tester Volker refers to is in my (underappreciated :^) reply to http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1195742
Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-27-2008 02:23 PM
01-27-2008 02:23 PM
Re: Processes Mysteriously Being Deleted
Perhaps I should open another thread, but how does RMS handle indefinite postponement in the case where a process is holding a lock, and someone with CMKRNL does a
$ set process/suspend=kernel
on the process holding the lock. That would seem to prevent the blocking AST from being delivered and therefore acted upon. I would assume that the can lead to other processes blocking, but that should not end in a bugcheck. And as this is the result of a user action, the fact that it can block other processes is to be expected. (I believe that set proc/susp used to have this effect, and that the "soft suspend in SUPERVISOR mode" was to address this issue.)
Another case that could delay for a long time is if the job holding the lock is running at a low process priority, and there are other compute bound jobs that are keeping the processor(s) busy. PIXSCAN would eventually give the process some CPU, but I am not aware of any mechanism to boost the software priority when exec mode ASTs are queued. (Special Kernel mode AST's can specify a priority boost, but I am not aware of any priority boost applied to processes that are blocking EXEC mode locks.)
I see that there was just a new patch released for 7.3-2 dealing with RMS locks, global buffers and CPU busy.
ftp://ftp.itrc.hp.com/openvms_patches/alpha/V7.3-2/VMS732_RMS-V0500.txt
5.2.2 Process Hang
5.2.2.1 Problem Description:
Under very rare conditions, processes on a system may
hang while accessing a file with global buffers enabled.
One of the processes will be in a compute bound loop.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-27-2008 04:18 PM
01-27-2008 04:18 PM
Re: Processes Mysteriously Being Deleted
Yeah you should.
>> but how does RMS handle indefinite postponement in the case where a process is holding a lock, and someone with CMKRNL does a
$ set process/suspend=kernel
I would hope it waits forever possibly triggering deadlock search, but it does not seem worth my limited brain cycles.
I hope it hurts! (Doctor it hurts...)
You should get what you ask for.
OpenVMS gives you the gun.
You can choose to shoot yourself in the foor or not.
>> but that should not end in a bugcheck.
I don't see why that would bugcheck.
>> Another case that could delay for a long time is if the job holding the lock is running at a low process priority, and there are other compute bound jobs that are keeping the processor(s) busy. PIXSCAN would eventually give the process some CPU,
That is the very reason pixscan was invented.
Mind you, rms will not generally hold 'bucket' locks. It does that only if the application request Deferred write. RMS will by default hold 1, the last accessed, record locked in a write share environment.
>> I see that there was just a new patch released for 7.3-2 dealing with RMS locks, global buffers and CPU busy.
I don't see that mentioning locks, even though there will of course be locks, for there to be global buffers.
I suspect that patch is for a day 1 bug with a timing window on the global buffer replacement algoritme, which is now easier to trigger with more, and faster, CPUs.
If it is that problem, then it also requires a severely undersized number of global buffers to be likely to trigger (like just 2 global buffers, which should be a little silly no matter what).
hth,
Hein.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-28-2008 01:46 AM
01-28-2008 01:46 AM
Re: Processes Mysteriously Being Deleted
Parameter Name Current Default Min. Max. Unit Dynamic
-------------- ------- ------- ------- ------- ---- -------
KSTACKPAGES 6 1 1 768 Pages
ALPHA_ROB$
If I completely understood all of the low level stuff that you've been discussing, I'm not sure how we could take it any further, because the problem cannot be reproduced.
I've suggested we need to crash the system to get more information, but it's been decided that taking the hit on the process dying versus a full system crash is preferrable.
Unless the problem frequency increases, this is possibly one we'll never get to the bottom of. I'm hoping a move to Itanium and/or VMS 8.3 might magically cure the problem.
Rob.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-28-2008 06:41 AM
01-28-2008 06:41 AM
Re: Processes Mysteriously Being Deleted
KSTACKPAGES 6 1 1 768 Pages
So that's a little increase, but to come from a limited area (PIOPAGES). Is the reason why this was changed from the default still valid?
>> I'm not sure how we could take it any further, because the problem cannot be reproduced.
Well, if you could run the script, in a test environment perhaps, or pointing to scratch copies of files, then you can possibly get to the bottom. Either analitically with ANAL/SYS checking memory consumption, or emperically by opening a few more (indexed) files right where that wait would be. See if you get RMS$_DME. With the DME error almost there, have an other process open/share some of the files opened by DCL in the waiting process. That might just trigger it.
>> I'm hoping a move to Itanium and/or VMS 8.3 might magically cure the problem.
Not if it is was I suspect, RMS simply running out of process (versus image) memory space.
That's much like the DME error, but at a bad time.
$ help /mes/fac=rms dme
So after all this, maybe just increase (double?) PIOPAGES a bunch (200? 500?) and hope for the best !?
Hein.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-28-2008 06:50 AM
01-28-2008 06:50 AM
Re: Processes Mysteriously Being Deleted
Trying to recreate the failure is impossible. We already run this script about 8 times every day, as it forms part of our Invoicing suite.
It's also run on various test systems, so I can't see how we could possibly make it go wrong, especially as we already know the problem only happens once a year, and it's not always the same script anyway.
Rob.