Re: Looping installed image

Mick O'Brien · ‎06-28-2010

I have an installed image running under ACMS (DCL server with dynamic username) that intermittently loops (CPU bound, not I/O bound) and then needs to be stopped (killed).

I have looked at the code and can see no obvious reason for the loop and need to know if there is any way to 'trace' (for want of a better word) where the program is looping.

Note: The image is installed and runs against an RDB database as an ACMS DCL server under a dynamic username.

Any help appreciated.

OpenVMS V8.3
COBOL V2.8-1286
RDB 7.2-321
ACMS 5.1B

abrsvc · ‎06-28-2010

There are a number of methods for this; perhaps the simplest is to use SDA to examine the PCs of the looping process. There is a trace utility within SDA, but I have found that it is too quick. I usually create an X.COM file with 10-20 "exam PC" commands in it and set the SDA focus to the process that is looping. Do an "@X.COM" with output to a file or the screen, and map the PCs to the actual image.

While a little tedious, it gives you a gross idea of the code flow. Once you narrow that part down, the PC trace utility can give you more details.

Dan

Bhadresh · ‎06-28-2010

>I have looked at the code and can see no obvious reason for the loop and need to know if there is any way to 'trace' (for want of a better word) where the program is looping.

Run the application and collect the pc sampling data.
To Collect pc sampling:
$anal/sys
SDA>pcs load
SDA>pcs start trace
SDA>pcs stop trace

Regards,
Bhadresh

P Muralidhar Kini · ‎06-28-2010

Hi Mick,

>> I have looked at the code and can see no obvious reason for the loop and
>> need to know if there is any way to 'trace' (for want of a better word)
>> where the program is looping.
Yes, there is.
You can make use of the PC sampling feature in order to find out what the
looping process is doing. From the PC sampling output, you should see some
set of PC's getting repeatedly logged in case of a loop in the program.
Mapping such a PC back to your source code would tell you where in the
program you have the loop.

PC Sampling Usage -
$ ANALYZE/SYSTEM
SDA> PCS LOAD
SDA> PCS START TRACE
... wait for some time ...
SDA> PCS STOP TRACE
SDA> SET OUTPUT PC.DAT
SDA> PCS SHOW TRACE
SDA> PCS UNLOAD
SDA> EXIT

Then analyze the PC.DAT file for PC's that are getting repeatedly logged.
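
Counting the repeated PC's by eye gets tedious once the trace is long. The following is a minimal sketch of a tally script in Python. It assumes that each sample line in PC.DAT is whitespace-separated and that the PC sits in a known column as a hex value; the real layout of the PCS SHOW TRACE output is not shown in this thread, so the `pc_column` parameter and the sample lines in the usage note are hypothetical and need adjusting to the actual file.

```python
from collections import Counter


def parse_hex(token):
    """Parse a hex token such as 00030A10 or FFFFFFFF.80001234; return None if it is not hex."""
    try:
        return int(token.replace(".", ""), 16)
    except ValueError:
        return None


def count_pcs(lines, pc_column):
    """Tally how often each PC value appears in the given column.

    Lines that are too short, or whose token in that column is not hex
    (headers, separators), are skipped.
    """
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) <= pc_column:
            continue
        pc = parse_hex(tokens[pc_column])
        if pc is not None:
            counts[pc] += 1
    return counts


def top_pcs(lines, pc_column, n=10):
    """Return the n most frequently sampled PCs as (pc, count) pairs."""
    return count_pcs(lines, pc_column).most_common(n)
```

Usage would be along the lines of `top_pcs(open("PC.DAT"), pc_column=0)`, with the column index set to wherever the PC actually appears. A handful of addresses dominating the tally is the signature of a tight loop.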

If you know the PID of the process, you can narrow down the PC sampling by
having PCS sample only the PC's of that particular process.

i.e. In the above command,
SDA> PCS START TRACE --> Traces all PC's of all processes

Instead use
SDA> PCS START TRACE/PID=XXX --> Traces all PC's of PID XXX

This would cause PCS to sample PC's for only the process with the specified PID.

Hope this helps.

Regards,
Murali

Let There Be Rock - AC/DC

Mick O'Brien · ‎06-28-2010

Murali,

Next time the process loops I will use the code you supplied - very clearly explained. Once I get the PC trace file I will no doubt be asking more questions.

Mick

P Muralidhar Kini · ‎06-28-2010

Hi Mick,

Also, for more information on the VMS SDA extensions, refer to:
* OpenVMS SDA Extensions
http://www.connect-community.de/Events/OpenVMS2009/folien/05-sda_extensions.html

For PC sampling related information, refer to the section
"PC Sampling Utility PCS commands:" in the above link.

It also has an example of PCS usage with the corresponding output.

Regards,
Murali

Let There Be Rock - AC/DC

Hoff · ‎06-28-2010

With respect to the other respondents and their literal (and correct) replies here, the PC stuff (in conjunction with the link maps and compiler listings) will get you into the general area of the loop.

Which is interesting.

But not entirely useful.

While the tools cited are all functional, it's also easily feasible to snag a few PCs via a simple SHOW PROCESS /CONTINUOUS command.

A half-dozen PCs are often enough data. Particularly if you see a repeated PC or two somewhere in P0 space, then you're usually ready to proceed.
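
To pick out the P0 addresses from a handful of captured PCs, a small filter can help. The sketch below is in Python and relies only on the standard 32-bit region boundaries (P0 from 00000000 to 3FFFFFFF, P1 from 40000000 to 7FFFFFFF); everything else is lumped together as "other" rather than broken down further. The function names are illustrative.

```python
def address_region(pc):
    """Classify a virtual address by OpenVMS address-space region."""
    if 0x00000000 <= pc <= 0x3FFFFFFF:
        return "P0"      # program region: where the user image is mapped
    if 0x40000000 <= pc <= 0x7FFFFFFF:
        return "P1"      # control region
    return "other"       # anything above P1; not broken down further here


def p0_samples(pcs):
    """Keep only the PCs that fall in P0 space."""
    return [pc for pc in pcs if address_region(pc) == "P0"]
```

A PC that keeps landing in P0 is the one worth carrying forward to the link map.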

See where those addresses exist within the application.

How to get to the source code from a virtual address? How to translate from those PCs you have? The sequence is described here:

http://labs.hoffmanlabs.com/node/800
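
As a rough illustration of the address-to-module step (not the procedure from the linked article), the lookup can be scripted once the module base addresses and lengths have been copied out of the link map by hand. In this Python sketch the section table, the module names, and the `locate` helper are all hypothetical. It also assumes the PCs are already in the same address terms as the map; if the image was relocated at activation, its base address has to be accounted for first.

```python
from bisect import bisect_right


def locate(pc, sections):
    """Find the section containing pc.

    sections is a list of (base, length, name) tuples sorted by base.
    Returns (name, offset_into_section), or None if pc is in none of them.
    """
    bases = [base for base, _length, _name in sections]
    index = bisect_right(bases, pc) - 1   # last section starting at or below pc
    if index < 0:
        return None
    base, length, name = sections[index]
    if pc < base + length:
        return name, pc - base
    return None
```

The returned offset is what gets looked up in the compiler listing for that module, assuming the listing was produced with machine code included.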

If that source code does not point to an obvious trigger, then you can use more specific tools and techniques. In particular, run this application under the debugger. Yes, you can debug detached processes. Here's how:

http://labs.hoffmanlabs.com/node/803

You won't be able to debug an installed image, but you can set up the same privileged context for the detached process and run without the INSTALL.

If necessary, you can program the debugger if you can identify a trigger but not its cause. The debugger can then run in the background, waiting for the initial conditions for the loop to be met, and you'll then have some visibility into the run-up to the trigger.

http://labs.hoffmanlabs.com/node/848

If you're having difficulty with spotting the trigger with the debugger and with the debugger-level programming, then you can also build the application with the debugger activated in a signal handler, and send the debug signal over.

And if you're using the word "killed" or analogous in conjunction with this effort, then consider switching techniques and using the process dump mechanisms. Dial back the brute-force setting slightly. The debugging sequence and the preference for creating process dumps or crashdumps is analogous to using the >>> BOOT rather than the SRM crashdump command; sure, the immediate freak-out is fixed, but these tend to arise anew, and, well, you can choose to repeat the >>> BOOT or you can write a dump and go looking. In other words, capture a dump or other evidence, and +then+ restart.

As for how, you can use the SET PROCESS /DUMP command, or toss the appropriate $forcex signal over.

That signal can be allowed to cause the dump, or can be captured by a signal handler. There's a simple signal handler (in C) here:

http://labs.hoffmanlabs.com/node/1438

For typical non-trivial production applications, it's usually best to instrument the code, too. To integrate debugging.

And given the use of COBOL (which avoids many of the foibles of C or Macro or Bliss), my initial suspicion would be in ACMS and system service and RTL error handling; not checking the return status or IOSB, for instance. Even so-called optional IOSB arguments aren't really optional; you should always specify the IOSB, and always check the return status and the IOSB. Otherwise, errors can tend to accrete, and get weird.

The other and ancillary question would involve why the image is installed. If it is installed for security (privilege) reasons, then subsystem identifiers can be a good alternative.

Stephen Hoffman
HoffmanLabs LLC

Hein van den Heuvel · ‎06-29-2010

I start out with a quick MONI MODE... all USER? EXEC?

PC samples are typically highly useful, as others replied. I use SHOW PROC/CONT, SDA> PCS, or SDA> PRF as I see fit.
But for my (COBOL) customers the top hits are often in OTS$mumblfratz routines. So be sure to look for infrequent but lower PC values to find out where in the code the program was when it called the OTS helpers.

Or, just cut out the (PC trace) middle man and try to find out how the process got there.
How? $ ANALYZE / SYSTEM ... SET PROC ACMSxxxSPxx ... SHOW STAC/USER/SUMM

Repeat 2 - 5 times. Pattern?
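
One way to spot the pattern is to compare the snapshots mechanically. The Python sketch below assumes the frames of each stack summary have been reduced by hand to a list of routine names or PCs, ordered outermost caller first; the names in the usage note are hypothetical. The leading frames that every snapshot shares form the stable call chain, and the loop is most likely inside the last routine of that chain.

```python
def common_call_chain(snapshots):
    """Return the leading frames shared by every snapshot.

    Each snapshot is a list of frames ordered outermost caller first.
    Comparison stops at the first position where the snapshots differ
    or where the shortest snapshot ends.
    """
    if not snapshots:
        return []
    chain = []
    for frames in zip(*snapshots):
        if all(frame == frames[0] for frame in frames):
            chain.append(frames[0])
        else:
            break
    return chain
```

For example, three snapshots of `["MAIN", "PROCESS_ORDER", "HELPER_A"]`, `["MAIN", "PROCESS_ORDER", "HELPER_B"]` and `["MAIN", "PROCESS_ORDER"]` share the chain MAIN, PROCESS_ORDER, which points at PROCESS_ORDER as the place to read the source.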

Good luck!
Hein

Hein van den Heuvel · ‎06-29-2010

>> my initial suspicion would be in ACMS

No way. In this setup ACMS only launches.

But it reminded me... maybe you can use :
$ ACMS /DEBUG /SERVER ACMSxxxSPxxx

With that you can break in to a running ACMS SP process and the DBG$INPUT and DBG$OUTPUT will be managed for you.

I happen to have used it yesterday, but for a procedure server process, on a DCL server. I needed it because ACMS V5.1 on Itanium does not support its normal ACMS/DEBUG and we had (have!) an issue with TDMS not returning filled fields for a specific request with a long input list. Using %ALL worked.

http://h71000.www7.hp.com/doc/721final/6607/6607pro_017.html

Hein

Hoff · ‎06-29-2010

>> my initial suspicion would be in ACMS

Might have misread the "and"? I was pointing to the potential for errors in the error handling from all the wheels that are spinning here. Not to ACMS itself.
