01-28-2008 09:47 AM
Re: Processes Mysteriously Being Deleted
The only way I know to fix this kind of problem is to somehow know EXACTLY what is running at the time of the event, which means briefly turning on image accounting.
Also, be absolutely sure you have enough space set aside for the crash dump and for retention of the errorlog buffers so that you don't lose any more "evidence" than necessary. Keep everything logging full-out at the critical time, starting just before that job's scheduled run. Turn it off after next year's crash. (Egad, that sounds weird to say, but that's what is being described.)
Do you have ANY third-party software running as an ACP? Is there any third-party device driver talking to a non-HP / COMPAQ / DEC device? (Don't know how old your system is, but we've got some of each on our old clunker of an Alpha cluster.) Did you buy your system from DEC / COMPAQ / HP or was there an OEM in the picture?
Do you have ANY performance tools from a third party?
Even if the EXEC stack is private vs. common to all, having text in the SP is not good because it means something is terminating and you are missing the real cause of the problem - an abort is leaving the stack useless and it is the stuff that has just been popped off the stack that tells you what really happened. If it is still there, which certainly is not guaranteed.

01-28-2008 01:01 PM
"So it is also possible that the system has an excessive value for SGN$GL_KSTACKPAG."
Did you mean MAX(2, SGN$GL_KSTACKPAG) ?
The AXP V1.5 IDSM says "The Process I/O segment contains RMS data structures describing process-permanent files, those that can and usually do remain open across image activations. The SYSGEN parameter PIOPAGES specifies the size in pagelets"
If I understand what you wrote, the PIO segment is also used for the "thread control structure" you mention in [RMS takes the AST, tries to allocate a thread control structure for it. If that fails, there is no place to report back, and it crashes the process as only way to report the fact that it could not do what it was asked to do (downgrade the lock).].
PIOPAGES is expressed in 512-byte pagelets, and there are 16 pagelets in an 8192-byte Alpha page. So if RMS is allocating 6 Alpha pages, that is 96 pagelets.
Hein's suggestion to try opening some additional file in the context of the job to see if you got a DME (dynamic memory exhausted) error prompted me to do a
$ help/message dme
That has a lot of useful information. I wasn't aware there was a limit of 63 Process Permanent Files. The other interesting thing is that it mentions buffers, so if you set RMS /buff=x /block=y and x*y is a large value, and the command file is opening files, then the 2000 pagelets (125 Alpha pages) for PIOPAGES may not be as big as it seems.
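The pagelet arithmetic above can be checked with a small sketch. This is only an illustration in Python, not anything from RMS itself: the function names are hypothetical, and `buffer_pagelets` assumes each buffer is simply the block count times one 512-byte block, which ignores any per-buffer overhead.

```python
# Pagelet/page arithmetic for PIOPAGES, using the sizes quoted in the thread.
PAGELET_BYTES = 512        # one pagelet (same size as a disk block)
ALPHA_PAGE_BYTES = 8192    # one Alpha page, as quoted in the thread
PAGELETS_PER_PAGE = ALPHA_PAGE_BYTES // PAGELET_BYTES


def pages_to_pagelets(pages):
    """Convert Alpha pages to 512-byte pagelets."""
    return pages * PAGELETS_PER_PAGE


def pagelets_to_pages(pagelets):
    """Convert 512-byte pagelets to Alpha pages."""
    return pagelets / PAGELETS_PER_PAGE


def buffer_pagelets(buffer_count, block_count):
    """Rough buffer footprint per open file, in pagelets.

    Assumption: buffer_count buffers of block_count 512-byte blocks each,
    with no allowance for control structures.
    """
    return buffer_count * block_count


print(PAGELETS_PER_PAGE)        # pagelets in one Alpha page
print(pages_to_pagelets(6))     # a 6-page allocation, in pagelets
print(pagelets_to_pages(2000))  # PIOPAGES = 2000, in Alpha pages
```

Under that assumption, a hypothetical setting of 8 buffers of 32 blocks would cost `buffer_pagelets(8, 32)` = 256 pagelets per file, so a handful of open files would use a large share of 2000 pagelets.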
What would the SDA command to display available PIO segment memory be?
Rob,
I think the answer to your question "to see if anyone else had seen anything like this at yearend" is that no one (here) is aware of a similar problem to what you are seeing.
Did the increase in PIOPAGES reduce the frequency? (If you have had only 6 events in 3 years, I suppose that is not easy to say.)
Do you know how many Process-Permanent Files (PPF) are open, and if so how many are using open/share?
It still isn't clear from what you have told us whether there is an RMS bugcheck each time the process is deleted. For last December, you hinted that there were at least two unexplained process deletions, but showed us only one RMS bugcheck errorlog entry. Was that the only one? If so, then the cause of the process deletions could be totally unrelated.
And I can understand management does not want to allow a crash, especially since there is no guarantee that you will be able to find the cause even with a crash dump. However, the probability of being able to find it is much higher with a valid crash dump than with only the extremely limited context that the bugcheck errorlog entry provides.
Jon

01-28-2008 02:21 PM
I appreciate the sentiment, but in this particular case the bogus SP is 99.99% certain to be a red herring. Remember, on Alpha, register numbers are just a matter of convention/convenience.
Nothing at all special about the SP. Just a register with a nickname.
If my ramblings are correct, then RMS is busy trying to come up with a fresh value to stick into the register conveniently labeled SP, and because it failed to find a new value it had to crash out.
That crash is perfectly controlled, with reasonable-looking output. So the bad-looking SP is an effect, not a cause. This is reinforced by the fine-looking values for KSP, ESP, SSP and USP early in the error log entry.
Jon P wrote> Did you mean MAX(2, SGN$GL_KSTACKPAG) ?
Jon... there you go again. Hijacking the thread some more! Anyway... Yes of course.
> So if RMS is allocating 6 Alpha pages, that is 96 pagelets.
Yes. An ASB makes a big dent in PIOPAGES. It is the largest structure (except for some IO buffers).
>> Hein's suggestion to try opening some additional file in the context of the job
I actually tried this also, opening an indexed file several times (bucket-sized buffers) and cleaning up the remainder with sequential file opens.
Easy enough to get the DME.
Could not trigger an ASBALLFAIL for now.
The ASB lookaside probably gave one ASB.
So we'd need a second interrupt while the first is stalling.
Jon> so if set RMS /buff=x /block=y and x*y is a large value, and the command file is opening files, then the 2000 pagelets (125 Alpha Pages) for PIOPAGES may not be as big as it seems.
Yeah, but DCL/RMS does not always use the defaults. Easy enough to verify with the earlier mentioned SDA> SHOW PROC/RMS=(PIO,BDBSUM).
Jon>> What would the SDA command to display available PIO segment memory be?
SDA> READ RMSDEF.STB
SDA> FORMAT/TYPE=IMP PIO$GW_PIOIMPA
SDA> VALIDATE QUEUE PIO$GW_PIOIMPA+IMP$L_ASB_LOOKASIDE_LIST
SDA> VALIDATE QUEUE PIO$GW_PIOIMPA+IMP$L_FREEPGLH
SDA> EXA @(PIO$GW_PIOIMPA+IMP$L_FREEPGLH);8
SDA> EXA @.;8 ! Repeat for each element in queue
Total free is the sum of the 6th longwords.
But of course you'd need a single one big enough for an ASB. So if you open files A, B, C, D, E, F and close A, D and E, then it may look like there is plenty of room, but it might not be usable for an ASB.
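The point about fragmentation can be illustrated with a small sketch. It is a toy model in Python, not RMS's actual allocator: it assumes only that a request must fit in one contiguous free block, and the hole sizes and function names are hypothetical. The 96-pagelet ASB size is the 6-page figure discussed in this thread.

```python
# Toy model of a fragmented free list: total free space can exceed the
# request while no single free block is large enough to satisfy it.

def total_free(free_blocks):
    """Sum of all free block sizes (what adding up the queue entries gives)."""
    return sum(free_blocks)


def largest_free(free_blocks):
    """Size of the biggest single free block, 0 if the list is empty."""
    return max(free_blocks, default=0)


def can_allocate(free_blocks, request):
    """True only if one contiguous free block can hold the request."""
    return largest_free(free_blocks) >= request


ASB_PAGELETS = 96            # 6 Alpha pages, per the thread
free_blocks = [40, 40, 40]   # hypothetical holes left by closing A, D and E

print(total_free(free_blocks))                  # more than 96 in total
print(largest_free(free_blocks))                # but no hole reaches 96
print(can_allocate(free_blocks, ASB_PAGELETS))  # so the request fails
```

So when walking the free queue in SDA, the largest single entry matters more than the sum.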
Jon> "to see if anyone else had seen anything like this at yearend" is that no one (here) is aware of a similar problem to what you are seeing.
That's a valid and good reason for a post though! I do remember a couple of reported issues with ASBALLFAIL during my time @HP.
I think ACMS was involved at times, and I think PIOPAGES always brought relief.
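$! P1 = file to open repeatedly, P2 = optional maximum number of opens
$max = 1000 ! Fool's guard
$if p2.nes."" then max = 'p2
$i=0
$open:
$i = i + 1
$open/share=write/read/write/error=error x'i 'p1 ! Big buffers
$if i .lt. max then goto open ! Compare against max; P2 may be empty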
$error:
$error = $status
$write sys$output "Error ''error' after ''i' files. ", f$mess(error)
$more:
$open/share=write/read/write/error=next x'i sys$login:login.com ! a few more Small buffers
$i = i + 1
$if i .lt. 1000 then goto more
$next:
$error = $status
$write sys$output "Next error ''error' after ''i' files. ", f$mess(error)
$inquire/nopun ok "Close files? "
$close:
$i = i - 1
$close/nolog x'i
$if i .gt. 1 then goto close

01-28-2008 11:32 PM
If you're seeing this problem (seldom enough) with different DCL procedures, there should be something in common with those scripts PLUS some additional factor (like end-of-year load?) to trigger the problem.
Do these scripts really open files at DCL level and keep them open during the 10-minute WAIT ?
Instead of allowing a crash (with BUGCHECKFATAL=1) you could - theoretically - replace the BUG_CHECK RMSBUG instruction inside RMS with a 'BR .', sending the process into a compute loop in EXEC mode. This would just affect this process, which would have been killed anyway, but let the system survive. Then you can lower the priority or suspend that looping process and force a crash at a convenient time...
Volker.

01-29-2008 03:10 AM
The one thing we do all agree on is that there isn't enough information at the moment to determine the real cause.
I have noted everyone's suggestions, and will determine what changes we can make towards the end of the year.
Although I'm closing this rather abruptly, I genuinely do appreciate everyone's help with this, and I'll try to update you all when I get something.
Thanks, Robert.

12-15-2009 10:16 AM
It's that time of year! If you are going to prepare to put in extra monitoring, or plans to trap the process generating the bugcheck, now is the time to get ready.
Good luck,
Jon

12-21-2009 06:44 AM
We've been running a 'watcher' process for the last year. Hopefully our move to Itanium will see the end of this nasty little critter :)
Rob.

12-21-2009 02:00 PM
1 - What else was running at the time that would have sufficient privilege to kill a process? I'm particularly interested in images that are not part of VMS because I'm pondering a "victim of mistaken identity" when trying to terminate another process (i.e. attempted termination but used wrong PID or wrong process name). This is a long shot because it doesn't account for the RMS error.
2 - Does any of your software do anything highly privileged like mess with S0 space in Executive or Kernel modes? I'm wondering about a data corruption deep in VMS data structures.