
Re: losing ASTs rapidly

 
SOLVED
tim lloyd_1
Frequent Advisor

losing ASTs rapidly

Hi,

This follows on from a problem I reported earlier which I haven't been able to address. We support an IA64 system under VMS which runs a racetrack. Besides selling bets, we have to send messages to TV screens every 10s. The messages leave the host computer, go out onto the Ethernet, then to our communications devices and finally to the TVs.

The problem I am getting relates to process quotas being exceeded. I have added some diagnostics to the program and I can see that the AST count is the quota in question. The count is initially 200 (which ties in with software limits) and is drained to 0 in 80s. All other quotas remain untouched.

The process in question hibernates for 10s and checks to see which of 6 TV channels to update. It then decides which individual communications devices to broadcast a message to (there may be multiple TVs connected to 1 comms device). I would expect one AST to be used for each broadcast.

I have calculated that there would be 11 devices receiving each message. The drop I see in the AST count in each 10s interval is 11, 44, 11, 33, 11, 33, 55 = 198. When I check the activity of that process over the same period, it sends out 18 messages (11 * 18 = 198).
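The figures quoted above can be cross-checked in a few lines. This is only a sketch that uses Python as a calculator; the variable names are illustrative and the numbers are the ones reported in the post.

```python
# Drop in the AST count observed in each 10 s interval (figures from the post).
drops = [11, 44, 11, 33, 11, 33, 55]
DEVICES_PER_MESSAGE = 11   # comms devices receiving each broadcast
AST_QUOTA = 200            # initial AST count for the process

total_drained = sum(drops)
messages_sent = total_drained // DEVICES_PER_MESSAGE

# Every drop should be a whole number of 11-device broadcasts.
whole_broadcasts = all(d % DEVICES_PER_MESSAGE == 0 for d in drops)

print(total_drained, messages_sent, AST_QUOTA - total_drained, whole_broadcasts)
# prints: 198 18 2 True
```

If one AST really is charged per device per broadcast, the 2 that remain are fewer than the 11 the next broadcast would need, which fits the quota error appearing at that point.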

I have the Ethernet logs for the day and they tie in with what I have already found, i.e. every broadcast is sent in groups of 11, and I have matched this up to the comms devices.

So the behaviour of the system is consistent throughout the day, but at some point my AST count is not getting refreshed. I cannot track back and find whether the TVs were updated or not.

Is there something I am missing here where an individual process is not able to communicate with the Ethernet? I should point out that the rest of the system is ticking over just fine!

Using an analogy: after an hour or 2 in the pub the bar person has decided simply to ignore your requests while carrying on serving other customers! And you are not drunk :)

Is this a stupid question: is there a limit to how much data one process can pass to the Ethernet during its lifetime? Or a similar restriction which can only be cleared once the process is restarted? FYI when the process is restarted it carries on as normal.
21 REPLIES
Robert Gezelter
Honored Contributor

Re: losing ASTs rapidly

Tim,

In a word, there is no limit on Ethernet connectivity (if there was, one would be able to hear it without the benefit of electronic amplification). Large numbers of sites move gigabytes (or more) per day through their Ethernets without incident.

What is happening is that the AST limit for the process is being reached. When the process is terminated, all ASTs are ended. The new process starts over with a clean slate.

There are many misconceptions floating around about ASTs. Having taught many courses on their use, I can assure you that they are one of the soundest and safest mechanisms in OpenVMS (see "Introduction to AST Programming" at http://www.rlgsc.com/cets/2000/index.html ).

Guessing is dangerous. There are two ways to locate this problem:

- a review of the sources
- using ANALYZE/SYSTEM to see where all of the AST quotas have disappeared to.

It is unlikely that they have disappeared. More likely, they are being tied up with some operation that is taking a long time to expire. There are several system requests that can specify ASTs (e.g., locks, timers, IO). I would suspect that some code path related to the write cycle is doing something incorrectly and tying up pending ASTs.
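The idea that the quota is tied up rather than lost can be illustrated with a toy model. The sketch below is not VMS code and not the real kernel bookkeeping; the class `AstQuota` and the function `run` are hypothetical names. It assumes one AST is charged per asynchronous request and returned when that request's completion AST is delivered.

```python
class AstQuota:
    """Toy model of a per-process AST quota (not the real VMS bookkeeping)."""

    def __init__(self, limit):
        self.limit = limit
        self.remaining = limit

    def request(self):
        """Charge one AST for a new asynchronous request; False if none are left."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True

    def deliver(self):
        """Return one AST to the quota when its completion AST is delivered."""
        if self.remaining < self.limit:
            self.remaining += 1


def run(cycles, per_cycle, completes):
    """Issue per_cycle requests each cycle; deliver them only if completes is True."""
    quota = AstQuota(200)
    for _ in range(cycles):
        issued = sum(1 for _ in range(per_cycle) if quota.request())
        if completes:
            for _ in range(issued):
                quota.deliver()
    return quota.remaining


print(run(cycles=10, per_cycle=22, completes=True))   # prints 200: quota is returned
print(run(cycles=10, per_cycle=22, completes=False))  # prints 0: quota only drains
```

In the second run nothing has "disappeared": every charged AST still belongs to a request that never finished, which is the situation ANALYZE/SYSTEM should help expose.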

Without reviewing the sources, it is difficult to analyze.

- Bob Gezelter, http://www.rlgsc.com
John Gillings
Honored Contributor

Re: losing ASTs rapidly

Tim,

Given the quota is so low, as a first cut diagnostic, I'd increase it significantly, say 4096 (I tend to use powers of 2 for quotas, probably more voodoo than science). I'd then track the process to make sure it's not a continuous leak.

It's possible the program is running faster than whenever this worked, so it's racing with the ASTs (and winning ;-).

It's also possible that there are more things consuming ASTs. As long as the process plateaus in its maximum usage of ASTs, quotas in that order of magnitude really don't matter.
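One way to tell a plateau from a continuous leak is to sample the remaining AST count at intervals and look at the recent trend. The following is a minimal sketch; `classify_ast_usage` is a hypothetical helper, and how the samples are collected (for example from the diagnostics already added to the program) is left open.

```python
def classify_ast_usage(remaining_samples, window=3):
    """Classify sampled 'ASTs remaining' values as 'leak', 'plateau' or 'unknown'."""
    if len(remaining_samples) < window + 1:
        return "unknown"    # not enough samples to judge a trend
    recent = remaining_samples[-(window + 1):]
    # Steps between consecutive samples; a negative step means quota was consumed.
    steps = [b - a for a, b in zip(recent, recent[1:])]
    if all(s < 0 for s in steps):
        return "leak"       # still falling at every recent sample
    return "plateau"        # usage stopped growing, or quota came back


print(classify_ast_usage([4096, 4085, 4041, 4030, 3997]))  # prints leak
print(classify_ast_usage([4096, 4030, 4030, 4041, 4030]))  # prints plateau
```

A longer window makes the check less sensitive to a short busy period being mistaken for a leak.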
A crucible of informative mistakes
tim lloyd_1
Frequent Advisor

Re: losing ASTs rapidly

thanks for the responses. I appreciate the AST limit is low but I would expect a max of 60-70 messages every 10s and the ethernet trace tends to back this up.

The activity of this program should not change much during the day.

So, I am treating this as a continuous leak. I am guessing that (as per Bob's email) there is something else in the program which is causing this, and which is not evident in the logs I am reviewing.
Richard J Maher
Trusted Contributor

Re: losing ASTs rapidly

Hi Tim,

Is one/some of your ASTs doing synchronous i/o? Waiting for something else like a lock in synchronous mode? Preventing other ASTs from being delivered and ending up queued instead?

Cheers Richard Maher
tim lloyd_1
Frequent Advisor

Re: losing ASTs rapidly

that's a very valid point. We do issue a qiow in a different section of the same program. There should be an error message logged if the directive fails but it is a lead that is worth chasing up.

I thought I had ventured into all code sections of this system. Here is another completely new area for me.

Thanks
Robert Gezelter
Honored Contributor

Re: losing ASTs rapidly

Tim,

Do consider using ANALYZE/SYSTEM to examine the subject process and determine why the AST Limit is depleted.

In concept, there are two possibilities:

- Some operation of extended duration (possibly essentially forever) is tying up potential ASTs.
- AST delivery is being blocked and the IO synchronization is being resolved somewhere else (correctly or incorrectly).

Looking at the process in ANALYZE/SYSTEM will tell whether the ASTs are queued for delivery (the latter) or whether the pending count is the issue.

Admittedly, this does not resolve where in the code the problem is. However, it does tell you the (first) problem you are looking for.

I would not necessarily stop with the first problem. A review of the code is in order. In my experience, if AST management is not set up correctly in one aspect, the odds are not insignificant that other aspects have not been done properly either.

The concern "if this is wrong, it is not likely the only aspect" is not specific to ASTs. If one finds a piece of code with odd pointer problems, it is not likely a single occurrence.

One common practice that I proscribe is the use of SYS$QIOW and other [W] (synchronous) forms of system services from within code that is called at AST level. It can be done, but it is most often an accident waiting to happen.
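Why a synchronous wait at AST level is risky can be shown with a small queue model. This is a sketch only, not VMS code: `simulate` is a hypothetical function, and it assumes that while one AST routine sits in a synchronous wait, no other AST for the process is delivered.

```python
from collections import deque


def simulate(ticks, arrivals_per_tick, blocked_from=None):
    """Return the number of ASTs still queued after `ticks` steps.

    Each tick, `arrivals_per_tick` completion ASTs are queued. Normally they
    are delivered in order within the same tick. From tick `blocked_from`
    onward the running AST routine is stuck in a synchronous wait, so
    nothing else is delivered and the queue only grows.
    """
    pending = deque()
    for tick in range(ticks):
        for n in range(arrivals_per_tick):
            pending.append((tick, n))
        blocked = blocked_from is not None and tick >= blocked_from
        if not blocked:
            while pending:
                pending.popleft()   # delivered: its AST goes back to the quota
    return len(pending)


print(simulate(8, 11))                  # prints 0: nothing blocks
print(simulate(8, 11, blocked_from=3))  # prints 55: five ticks of 11 are stuck
```

In the blocked case the ASTs are queued for delivery rather than outstanding, which is the distinction ANALYZE/SYSTEM is meant to reveal.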

- Bob Gezelter, http://www.rlgsc.com
Robert Gezelter
Honored Contributor
Solution

Re: losing ASTs rapidly

Tim,

Just a thought.

Was this code ported to Itanium recently?

If so, then be on the lookout for timing presumptions that may no longer be valid. The Itanium platforms are faster, as well as often multi-processor.

Thus, a gimlet eye towards the re-use of data structures is in order. Even if the code is single-threaded (one kernel thread), it may be possible for QIO to complete very fast (since the processing of the actual IO could take place on the other CPU). This can cause an unrealized latent race condition to manifest.

- Bob Gezelter, http://www.rlgsc.com
Richard J Maher
Trusted Contributor

Re: losing ASTs rapidly

And there's the other side of the coin, where you're doing *A*synchronous stuff but some condition is causing you to repeat it many times within a single AST invocation.

So vague as to be next to useless I know, but seeing as how you're firing a shotgun anyway :-)

Cheers Richard
Jur van der Burg
Respected Contributor

Re: losing ASTs rapidly

Another thing to look for: if you issue a $qio[w] you should check not only the return status from the qio but also the status in the iosb (you DO specify an iosb, don't you?). Failure to do so is a frequent cause of weird problems.
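The two-level check can be sketched as follows. This is a model of the logic rather than a real system service call: `vms_success` and `qio_outcome` are hypothetical helpers. It relies on the OpenVMS convention that a condition value with the low bit set means success. The numeric values are the ones I believe SSDEF uses and should be verified on your system.

```python
SS_NORMAL = 1      # SS$_NORMAL (assumed value, check SSDEF)
SS_EXQUOTA = 28    # SS$_EXQUOTA (assumed value, check SSDEF)
SS_TIMEOUT = 556   # SS$_TIMEOUT (assumed value, check SSDEF)


def vms_success(status):
    """OpenVMS convention: a condition value with the low bit set is a success."""
    return (status & 1) == 1


def qio_outcome(return_status, iosb_status):
    """Classify a request from the service return status and the IOSB status word."""
    if not vms_success(return_status):
        return "not queued"   # the request was never accepted
    if iosb_status == 0:
        return "pending"      # IOSB still zero: the I/O has not completed yet
    if not vms_success(iosb_status):
        return "failed"       # queued successfully, but the I/O itself failed
    return "ok"


print(qio_outcome(SS_NORMAL, SS_NORMAL))   # prints ok
print(qio_outcome(SS_EXQUOTA, 0))          # prints not queued
print(qio_outcome(SS_NORMAL, SS_TIMEOUT))  # prints failed
```

The third case is the one that is easy to miss: the service call itself returns success, so code that checks only the return status logs nothing.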

Jur.