losing ASTs rapidly
02-26-2009 04:07 PM
This follows on from a problem I reported earlier which I haven’t been able to address. We support an IA64 system under VMS which runs a racetrack. Other than selling bets we have to send messages to TV screens every 10s. The messages leave the host computer, go out over the Ethernet to our communications devices, and finally reach the TVs.
The problem I am getting relates to process quotas being exceeded. I have added some diagnostics to the program and I can see that the AST count is the quota in question. The count is initially 200 (which ties in with software limits) and is drained to 0 in 80s. All other quotas remain untouched.
The process in question hibernates for 10s and checks to see which of 6 TV channels to update. It then decides which individual communications devices to broadcast a message to (there may be multiple TVs connected to 1 comms device). I would expect one AST to be used for each broadcast.
I have calculated that there would be 11 devices receiving each message. The decrease I see in the AST count every 10s is 11, 44, 11, 33, 11, 33, 55 = 198. When I check the activity of that process over the same period it sends out 18 messages (11 * 18 = 198).
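As a quick sanity check on the arithmetic above, here is a minimal Python sketch. The variable names are illustrative and not taken from the application; the figures are the ones quoted in the post.

```python
# Per-interval drops in the remaining AST count, as quoted in the post.
drops = [11, 44, 11, 33, 11, 33, 55]
devices_per_message = 11   # comms devices receiving each message
messages_sent = 18         # messages logged over the same period

total_consumed = sum(drops)
expected = devices_per_message * messages_sent

# Messages implied by each 10 s interval, assuming one AST per device per message.
messages_per_interval = [d // devices_per_message for d in drops]

print(total_consumed, expected)
print(messages_per_interval, sum(messages_per_interval))
```

Every drop is a whole multiple of 11, which is consistent with one AST being charged per device per message and none of them being returned during the window.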
I have the Ethernet logs for the day and this ties in with what I have already found, i.e., every broadcast is sent in groups of 11 and I have matched this up to the comms devices.
So the behaviour of the system is consistent throughout the day, but at some point my AST count is not getting refreshed. I cannot track back and find whether the TVs were updated or not.
Is there something I am missing here where an individual process is not able to communicate with the Ethernet? I should point out that the rest of the system is ticking over just fine!
Using an analogy: after an hour or 2 in the pub the bar person has decided simply to ignore your requests while carrying on serving other customers! And you are not drunk :)
Is this a stupid question: is there a limit to how much data one process can pass to the Ethernet during its lifetime? Or a similar restriction which can only be cleared once the process is restarted? FYI when the process is restarted it carries on as normal.
02-26-2009 04:41 PM
Re: losing ASTs rapidly
In a word, there is no limit on Ethernet connectivity (if there was, one would be able to hear it without the benefit of electronic amplification). Large numbers of sites move gigabytes (or more) per day through their Ethernets without incident.
What is happening is that the AST limit for the process is being reached. When the process is terminated, all ASTs are ended. The new process starts over with a clean slate.
There are many misconceptions floating around about ASTs. Having taught many courses on their use, I can assure you that they are one of the soundest and safest mechanisms in OpenVMS (see "Introduction to AST Programming" at http://www.rlgsc.com/cets/2000/index.html ).
Guessing is dangerous. There are two ways to locate this problem:
- a review of the sources
- using ANALYZE/SYSTEM to see where all of the AST quotas have disappeared to.
It is unlikely that they have disappeared. More likely, they are being tied up with some operation that is taking a long time to expire. There are several system requests that can specify ASTs (e.g., locks, timers, IO). I would suspect that some code path related to the write cycle is doing something incorrectly and tying up pending ASTs.
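The idea of quota being "tied up" rather than lost can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not OpenVMS code: it assumes one unit of quota is held from the moment a request specifying an AST is issued until that AST is delivered, and the class and function names are hypothetical.

```python
class AstQuotaModel:
    """Toy model (assumption): one quota unit is held per outstanding AST request."""

    def __init__(self, limit):
        self.limit = limit
        self.outstanding = 0

    @property
    def remaining(self):
        return self.limit - self.outstanding

    def request(self):
        # Issuing a request that specifies an AST charges one unit of quota.
        if self.remaining == 0:
            raise RuntimeError("AST quota exceeded")
        self.outstanding += 1

    def deliver(self):
        # Delivering the AST returns the unit.
        if self.outstanding == 0:
            raise RuntimeError("no outstanding request")
        self.outstanding -= 1


def run(cycles, requests_per_cycle, delivered_per_cycle, limit=200):
    """Return (cycle in which the quota ran out, or None; remaining quota)."""
    quota = AstQuotaModel(limit)
    for cycle in range(1, cycles + 1):
        for _ in range(requests_per_cycle):
            try:
                quota.request()
            except RuntimeError:
                return cycle, quota.remaining
        for _ in range(min(delivered_per_cycle, quota.outstanding)):
            quota.deliver()
    return None, quota.remaining


healthy = run(cycles=1000, requests_per_cycle=11, delivered_per_cycle=11)
stuck = run(cycles=1000, requests_per_cycle=11, delivered_per_cycle=0)
print(healthy, stuck)
```

In this model a process whose ASTs are delivered keeps its full quota indefinitely, while one whose ASTs never complete fails during the 19th cycle of 11 requests, which matches the order of magnitude reported in the original post.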
Without reviewing the sources, it is difficult to analyze.
- Bob Gezelter, http://www.rlgsc.com
02-26-2009 05:00 PM
Re: losing ASTs rapidly
Given the quota is so low, as a first cut diagnostic, I'd increase it significantly, say 4096 (I tend to use powers of 2 for quotas, probably more voodoo than science). I'd then track the process to make sure it's not a continuous leak.
It's possible the program is running faster than whenever this worked, so it's racing with the ASTs (and winning ;-).
It's also possible that there are more things consuming ASTs. As long as the process plateaus in its maximum usage of ASTs, quotas in that order of magnitude really don't matter.
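One rough way to tell a plateau from a continuous leak, once the limit has been raised and the in-use count is being sampled, is to watch whether the baseline keeps rising. The Python sketch below is only a heuristic under assumptions: the sample values are invented for illustration, and the function names are hypothetical.

```python
def window_baselines(samples, window):
    """Minimum in-use AST count in each consecutive full window of samples."""
    return [min(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


def looks_like_leak(samples, window):
    """Heuristic: a leak shows a strictly rising baseline from window to window."""
    baselines = window_baselines(samples, window)
    return len(baselines) >= 2 and all(
        later > earlier for earlier, later in zip(baselines, baselines[1:]))


# Invented samples: bursts that drain back to zero versus a steady climb.
bursty = [0, 11, 44, 11, 0, 33, 11, 0, 55, 0, 11, 0]
climbing = [11, 22, 33, 44, 55, 66, 77, 88, 99, 110, 121, 132]

print(looks_like_leak(bursty, window=4))
print(looks_like_leak(climbing, window=4))
```

A bursty workload that returns to its baseline is a sizing question; a baseline that never comes back down points at requests whose ASTs are not being delivered.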
02-26-2009 06:31 PM
Re: losing ASTs rapidly
The activity of this program should not change much during the day.
So, I am treating this as a continuous leak. I am guessing that (as per Bob's email) there is something else in the program which is causing this, and which is not evident in the logs I am reviewing.
02-26-2009 07:36 PM
Re: losing ASTs rapidly
Are one or more of your ASTs doing synchronous I/O? Waiting for something else, like a lock, in synchronous mode? Preventing other ASTs from being delivered, so that they end up queued instead?
Cheers Richard Maher
02-26-2009 07:50 PM
Re: losing ASTs rapidly
I thought I had ventured into all code sections of this system. Here is another completely new area for me.
Thanks
02-26-2009 08:43 PM
Re: losing ASTs rapidly
Do consider using ANALYZE/SYSTEM to examine the subject process and determine why the AST Limit is depleted.
In concept, there are two possibilities:
- Some operation of extended duration (possibly essentially forever) is tying up potential ASTs.
- AST delivery is being blocked and the IO synchronization is being resolved somewhere else (correctly or incorrectly).
Looking at the process in ANALYZE/SYSTEM will tell whether the ASTs are queued for delivery (the latter) or whether the pending count is the issue.
Admittedly, this does not resolve where in the code the problem is. However, it does tell you the (first) problem you are looking for.
I would not necessarily stop with the first problem. A review of the code is in order. In my experience, if AST management is not set up correctly in one aspect, the odds are not insignificant that other aspects have not been done properly either.
The concern "if this is wrong, it is not likely the only aspect" is not specific to ASTs. If one finds a piece of code with odd pointer problems, it is not likely a single occurrence.
One common practice that I proscribe is the use of SYS$QIOW and other [W] (synchronous) forms of system services from within code that is called at AST level. It can be done, but it is most often an accident waiting to happen.
- Bob Gezelter, http://www.rlgsc.com
02-26-2009 09:04 PM
Solution
Just a thought.
Was this code ported to Itanium recently?
If so, then be on the lookout for timing presumptions that may no longer be valid. The Itanium platforms are faster, as well as often multi-processor.
Thus, a gimlet eye towards the re-use of data structures is in order. Even if the code is single-threaded (one kernel thread), it may be possible for QIO to complete very fast (since the processing of the actual IO could take place on the other CPU). This can cause an unrealized latent race condition to manifest.
- Bob Gezelter, http://www.rlgsc.com
02-26-2009 09:20 PM
Re: losing ASTs rapidly
So vague as to be next to useless I know, but seeing as how you're firing a shotgun anyway :-)
Cheers Richard
02-27-2009 12:36 AM
Re: losing ASTs rapidly
Jur.
02-27-2009 03:34 AM
Re: losing ASTs rapidly
Did this process ever work correctly on OpenVMS I64? When did it start to behave like this? What was done to the system prior to the first 'failure'?
Volker.
02-27-2009 03:48 AM
Re: losing ASTs rapidly
Consider using ANAL/SYS to collect the following information from this process once it has 'lost' a couple of ASTs:
$ ANAL/SYS
SDA> READ SYSDEF
SDA> SET PROC/ID=
SDA> SHOW PROC
SDA> SHOW PROC/PHD
SDA> FORM PCB
SDA> EXIT
Collect the output into a .TXT file and attach it to your next reply.
Volker.
03-01-2009 12:08 AM
Re: losing ASTs rapidly
The problem I face is that this issue occurs sporadically (on average once a month; the system is up 7 days a week). Due to the nature of the application I cannot interrogate the problem when it occurs.
I am going down the IOSB path. In other places we do use this facility. I can't see why we don't do it here in one of the most critical parts of the whole system!
Cheer
03-01-2009 12:16 AM
Re: losing ASTs rapidly
Does the process crash when the problem happens? Or does it just hang? I assume you have mechanisms in place to restart the process if the problem happens.
If it crashes, run it with /DUMP or issue a SET PROC/DUMP before starting the image.
If it hangs, include a SET PROC/DUMP=NOW before stopping and restarting the process.
If it just issues an error message and exits by itself, call LIB$STOP(SS$_IMGDMP); this will force an image dump.
You can then do the analysis offline in the image dump with ANAL/PROC. Most of the process-related system data is also available in the image dump.
Do you disable AST delivery somewhere in the application? And not re-enable it?
Volker.
03-01-2009 12:18 PM
Re: losing ASTs rapidly
I had not thought about disabling ASTs. Obviously this is not intentional but possible. Any ideas how I would do this?
03-01-2009 01:23 PM
Re: losing ASTs rapidly
>So, I am treating this as a continuous leak
I'd still recommend testing your program with a higher limit, just to make sure you're not experiencing a spike in load. It's unlikely to cause any resource problems, and you may find your program recovers itself.
Instead of assuming it's a leak, make sure!
03-01-2009 01:37 PM
Re: losing ASTs rapidly
Concerning Disabling ASTs.
In short, read the code. Also, scan the code base for references to $SETAST or SYS$SETAST.
Obviously, also check the routines which call the routines which invoke those routines, particularly error paths.
I make several recommendations about how to do AST programming with a fair degree of safety in my DECUS presentation [mentioned earlier in this thread].
One good rule: Always use an IOSB that cannot be recycled before the AST is processed AND never use event flags in conjunction with ASTs.
Another good rule is to include a logic check in the program to ensure that a buffer/IOSB combination is not recycled while it has a pending operation. Such a logic check often identifies an incorrect set of logic in the program long before the evidence is disturbed.
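The kind of logic check described above can be sketched in a few lines of Python. This is an illustration of the bookkeeping only, not OpenVMS code: the slot structure, the names `IoSlot`, `SlotPool` and `SlotReuseError`, and the status value are all hypothetical. In a real program the equivalent check would sit next to the code that issues the write and in the completion AST.

```python
class SlotReuseError(Exception):
    """Raised when a buffer/IOSB pair is touched in the wrong state."""


class IoSlot:
    """A buffer/IOSB pair plus a flag recording whether an operation is pending."""

    def __init__(self, slot_id):
        self.slot_id = slot_id
        self.buffer = b""
        self.iosb = None       # filled in when the operation completes
        self.pending = False


class SlotPool:
    def __init__(self, count):
        self.slots = [IoSlot(i) for i in range(count)]

    def start_write(self, slot_id, data):
        slot = self.slots[slot_id]
        if slot.pending:
            # The logic check: refuse to recycle a pair that is still in flight.
            raise SlotReuseError(f"slot {slot_id} reused while an operation is pending")
        slot.buffer = data
        slot.iosb = None
        slot.pending = True
        return slot

    def complete(self, slot_id, status):
        slot = self.slots[slot_id]
        if not slot.pending:
            raise SlotReuseError(f"slot {slot_id} completed with nothing pending")
        slot.iosb = status
        slot.pending = False


pool = SlotPool(count=2)
pool.start_write(0, b"channel 1 update")
try:
    pool.start_write(0, b"channel 2 update")   # too early: slot 0 is still in flight
    reuse_detected = False
except SlotReuseError:
    reuse_detected = True
pool.complete(0, status=1)                     # hypothetical success status
pool.start_write(0, b"channel 2 update")       # fine now
print(reuse_detected)
```

Failing loudly at the moment of reuse keeps the evidence intact, instead of letting a later completion overwrite a buffer or IOSB that already belongs to a newer request.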
- Bob Gezelter, http://www.rlgsc.com
03-02-2009 01:56 PM
Re: losing ASTs rapidly
SDA> READ SYSDEF
SDA> SHOW SUMMARY ! to get the process index
SDA> SET PROCESS/INDEX=...
SDA> VALIDATE QUEUE PCB+PCB$L_ASTQFL_U
SDA> VALIDATE QUEUE PCB+PCB$L_ASTQFL_E
And a couple of..
SDA> FORMAT PCB+PCB$L_ASTQFL_U
SDA> FORMAT @.
Also...
SDA> SHOW CALL
SDA> SHOW CALL/NEXT
Get a linker map of the program image and find where the PCs are in the source code.
This is all under the assumption that the process is indeed running out of ASTLM.
/Guenther
03-02-2009 06:58 PM
Re: losing ASTs rapidly
My system has about 35 sub processes hanging off a main process. These sub processes have varying priorities from 4 to 15.
Basically what I am trying to establish is that if one of these processes calls setast to halt ASTs, this affects the whole bunch rather than just the process itself. Does that sound correct?
03-02-2009 11:08 PM
Re: losing ASTs rapidly
AST quota is NOT a pooled quota. It is PER PROCESS not PER JOB.
You said that 'the program hangs'. What is the state of this process (or these processes) as reported by SHOW SYSTEM/PROC=xxx?
As this problem shows up only very intermittently, capturing a process dump is the most important work item. Then you can check and answer all the questions about where the outstanding ASTs may be pending, whether ASTs are disabled, etc.
Volker.
03-03-2009 12:37 AM
Re: losing ASTs rapidly
Do you really mean priorities 4 to *15*? How many CPUs have you got?
sys$setast allows you to explicitly disable/enable ASTs. You also implicitly disable ASTs while you're in an AST at the same access mode. Most of your ASTs will be in User mode, although RMS, Rdb, Tier3 etc. operate mainly in Exec mode.
So if you've got a pyramid game where one AST generates more than one additional AST (such as itself) per invocation, then you can run out of quota pretty quickly.
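The pyramid effect is easy to see in a toy Python model. It rests on assumptions: ASTs are delivered one at a time, each delivery returns one unit of quota, and each AST routine then queues a fixed number of new requests. The function name and the figures are illustrative only.

```python
def deliveries_until_exhausted(limit, fanout, max_deliveries=10_000):
    """Return the delivery at which the quota runs out, or None if it never does.

    Toy model: each delivered AST frees one unit of quota, then its
    routine queues `fanout` new AST requests.
    """
    outstanding = 1                      # the request that starts the chain
    for delivered in range(1, max_deliveries + 1):
        outstanding -= 1                 # delivery returns one unit
        if outstanding + fanout > limit:
            return delivered             # the AST routine cannot queue its requests
        outstanding += fanout
        if outstanding == 0:
            return None                  # chain died out, nothing left to deliver
    return None


self_rearming = deliveries_until_exhausted(limit=200, fanout=1)
pyramid = deliveries_until_exhausted(limit=200, fanout=2)
print(self_rearming, pyramid)
```

An AST that re-queues exactly one successor holds steady at one outstanding request, whereas a fanout of two gains one outstanding request per delivery and hits a limit of 200 at the 200th delivery in this model.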
OTOH if a server is unable to respond to an AST 'cos it can't get a time slice at its current priority then that might clog things up as well.
Not much help, but there's not a lot of hard evidence and with that architecture many, many things could go wrong - sorry.
Perhaps it's time to get someone in?
Cheers Richard Maher
03-03-2009 01:03 AM
Re: losing ASTs rapidly
I am glad that the session notes were helpful. They are not a substitute for being there. They are an attempt to summarize good rules that I have used for many years in using AST-based mechanisms successfully, and resolving problems with existing uses of ASTs.
As Volker has already noted, AST quotas are definitely per process (a sub process has its own process control block).
There is no pooling between processes in a job or group. For completeness, I will note that it is possible to run out of non-paged pool, but with today's pool sizes that is extraordinarily improbable.
I mentioned previously that there are two paths here, and it remains so:
- get a dump at the point of failure; and
- analyze the logic.
These are not exclusive forks. The process dump tells one precisely what happened, to wit the proximal cause of the failure. If nothing else, it can rule out certain possibilities. A careful code review is necessary in any event. Unless the cause turns out to be a small case of insufficient quota, where a code base has one error in AST management, it is not unlikely that there are more lurking about.
- Bob Gezelter, http://www.rlgsc.com