06-29-2010 05:45 AM
Re: Looping installed image
I had another looping process last night and, as suggested, got out some PC details. I noticed that there were a lot of calls to FDVSHR (sorry, forgot to mention that the process uses FMS), which seems to indicate that the process is hung up on a screen somewhere. These looping processes only happen in the evening, but when I talked to the owner of the looping process this morning I was told that they definitely logged out before going home. Setting the end user's comments aside, I was wondering whether a default timeout on an FDV$GETAL call is not being handled correctly within the code, thus causing the CPU loop. I looked for an FDV$STIME call but could not find one. Does anyone know if FDV$GETAL has a default timeout?
Hein,
I like the idea of sticking the server into debug, but I suspect I would not be allowed to run debug on a production server within SLA (service level agreement), just in case it brings the rest of the application down! Instead, next time I'll do what you suggested and "...try to find out how the process got there."
Mick
06-29-2010 05:55 AM
Re: Looping installed image
I ran into a problem where a mis-handled error condition would cause the SMG routines to loop forever in a COM state. The problem in initially tracking this down was that it occurred during a process shutdown. At the time of the loop, the majority of the process context was broken down. This means that "normal" debugging techniques were useless.
Look for return-status checks that don't handle every possible status. An unhandled status can leave buffers corrupted.
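To illustrate the kind of status check meant here, the following is a minimal sketch in Python rather than the application's own language. The status codes, `read_field`, and the `get_input` callable are all hypothetical and do not model the FMS or SMG interfaces. The point is only that each status is handled explicitly and anything unrecognised ends the loop instead of being retried forever.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status codes, invented for this sketch.
    SUCCESS = 1
    TIMEOUT = 2
    DISCONNECTED = 3


def read_field(get_input, max_timeouts=3):
    """Call get_input() until it succeeds, giving up on fatal statuses.

    get_input is any callable returning a (Status, data) pair.
    """
    timeouts = 0
    while True:
        status, data = get_input()
        if status is Status.SUCCESS:
            return data
        if status is Status.TIMEOUT:
            # Retry a bounded number of times, then stop.
            timeouts += 1
            if timeouts >= max_timeouts:
                raise TimeoutError("no input after repeated timeouts")
            continue
        # Any status not handled above is treated as fatal, not retried.
        raise ConnectionError(f"input failed with {status.name}")


def dead_terminal():
    """Simulated input source whose terminal has gone away."""
    return Status.DISCONNECTED, None


try:
    read_field(dead_terminal)
except ConnectionError as err:
    print(err)
```

A loop written as "retry until SUCCESS" would spin at full CPU against `dead_terminal`, which is the shape of failure described above.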
If you can post parts of the listings here with the PC values, we can perhaps provide some specific suggestions.
Dan
06-29-2010 06:06 AM
Re: Looping installed image
>> I noticed that there were a lot of calls to FDVSHR (sorry forgot to
>> mention process uses FMS)
Now that you have the PC samples, the interesting thing would be to know the
source code to which the repeated PC values map.
Are you seeing some set of PC values getting logged repeatedly in the
PC sampling output?
If yes, which source code lines do those PC values map to?
This would give an idea of exactly where in your source code the loop is.
Regards,
Murali
06-29-2010 06:35 AM
Re: Looping installed image
>> Does anyone know if FDV$GETAL has a default timeout?
FDV$STIME can be used to specify a timeout value, but you said that you did not
find it in your code.
Also, refer to the following link:
http://h71000.www7.hp.com/doc/73final/6619/6619pro_004.html
-> Section 5.3.14.2, DECforms Timeout Values
Check whether that's applicable in your case.
Regards,
Murali
06-29-2010 07:44 AM
Re: Looping installed image
Either directly.
Or by instrumenting the code.
Or both.
The virtual addresses that other replies discuss will usually help you find the loop, but there's still work to be done in the source code to isolate the trigger for the loop.
I've seen all manner of network and lower-level errors triggering this sort of misbehavior, and those can lead application code into all manner of dark corners.
And yes, application errors are most definitely entirely in play here, too.
Old code often has issues with its error handling. The old stuff probably once worked fine, but just wasn't designed for modern systems and modern networks and modern problems.
Folks don't necessarily think about how much stuff has actually changed within the whole of the stack, and what that means for the applications.
As a testament to the degree of compatibility that has been achieved here, comparatively simple user applications that once used FMS to chat with VT100 or VT220 terminals are now using various wholly new intermediate layers out a telnet connection to some random terminal emulation on some other operating system, and you can be assured that there are whole zoos filled with the errors that can now arise here.
Networking in particular offers a well-populated zoo.
How much old COBOL code was ever implemented with the intermittent connectivity, and with the requirements for reconnections and refreshes, that can arise with mobile IP networking?
There's no magic bullet here. The PC is just the jumping off point. You're just going to have to isolate and debug your code.
And yes, this could well be an error in lower-level code, but (in most any non-trivial application environment) you're generally left to have to prove that is the case.
06-29-2010 03:21 PM
Re: Looping installed image
When a process is looping:
$ SET PROCESS/SUSPEND/ID=looping-process
Now use SDA to examine the process. See SHOW CALL, SHOW CALL/NEXT etc... to walk up the call stack. The first few frames will be related to the SUSPEND, but you should get some idea of where the process is looping. Sanity check with:
$ SET PROCESS/RESUME
$ SET PROCESS/SUSPEND
and repeat the traceback. Do this a few times to find the deepest routine common to all. That will contain the loop.
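The "deepest routine common to all" step can be sketched as below, in Python purely for illustration. It assumes you have already copied the routine names out of each SDA traceback by hand, dropped the SUSPEND-related frames, and reversed each list so the outermost caller comes first. The routine names are hypothetical.

```python
def deepest_common_frame(tracebacks):
    """Return the deepest frame shared by every traceback, or None.

    Each traceback is a list of frames ordered outermost caller first,
    so the shared part is the common prefix of all the lists.
    """
    if not tracebacks:
        return None
    common = None
    # zip stops at the shortest traceback, which is as deep as a
    # frame common to all of them can be.
    for frames in zip(*tracebacks):
        if all(frame == frames[0] for frame in frames):
            common = frames[0]
        else:
            break
    return common


# Hypothetical tracebacks from three suspend/resume cycles.
sample_tracebacks = [
    ["MAIN", "HANDLE_REQUEST", "SHOW_FORM", "RTL_A"],
    ["MAIN", "HANDLE_REQUEST", "SHOW_FORM", "RTL_B", "RTL_C"],
    ["MAIN", "HANDLE_REQUEST", "SHOW_FORM"],
]

print(deepest_common_frame(sample_tracebacks))
```

With these made-up samples the frames below `SHOW_FORM` differ from one traceback to the next, so `SHOW_FORM` is the routine to inspect for the loop.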
06-29-2010 11:53 PM
Re: Looping installed image
I got another looping process last night for the same image. I have done as suggested, got out a PC file, and attached it. I have also attached a link map, BUT note that the map was generated recently on development and does not relate to the image on production.
Hein,
Tried to use command 'SHOW STAC/USER/SUMM' but it wants a range of memory locations?
John,
I'll give your suggested investigation route a try but I noticed from 'HELP' that command required a 'starting-address'?
Regards,
Mick
06-30-2010 01:22 PM
Re: Looping installed image
That set of PC samples doesn't tell you much. The process is obviously doing stuff, but you're really not interested in anything below your own code, so ignore any of the samples in RTLs. The exceptions may be interesting. Perhaps something is failing, but your code doesn't notice?
Try SUSPENDing and examining the call stack. That should help localise the issue into your own code, somewhere you can check. Also see SDA> SHOW IMAGE to determine base addresses, and if you don't have a current MAP file, please get one!
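As a rough sketch of how a base address and a MAP file combine, the Python below subtracts the image base from a PC and looks the offset up in a sorted table of routine start offsets. The base address, routine names, and start offsets are all hypothetical; real values would come from SDA> SHOW IMAGE and from a map that matches the production image.

```python
import bisect

# Hypothetical values for illustration only.
IMAGE_BASE = 0x10000
ROUTINES = [  # (start offset from image base, routine name), sorted
    (0x7BE00, "ROUTINE_A"),
    (0x7E200, "ROUTINE_B"),
    (0x86900, "ROUTINE_C"),
]


def locate(pc, image_base, routines):
    """Map an absolute PC to (routine name, offset within routine).

    Returns None if the PC falls before the first routine. The table
    has no end addresses, so a PC past the last routine is attributed
    to that last routine.
    """
    offset = pc - image_base
    starts = [start for start, _ in routines]
    # Index of the last routine whose start is <= offset.
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return None
    start, name = routines[i]
    return name, offset - start


name, within = locate(IMAGE_BASE + 0x7E294, IMAGE_BASE, ROUTINES)
print(f"{name}+{within:X}")
```

If the map was built from a different link than the running image, the offsets will not line up and the lookup will point at the wrong routine.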
>but I noticed from 'HELP' that command
>required a 'starting-address'?
The starting address parameter is optional.
From SDA use SHOW CALL:
SDA> SHOW CALL
this will show you the current call frame. To step to the frame of the caller:
SDA> SHOW CALL/NEXT
repeat until you reach the top.
Minimally, take note of the return PC. There may be other things of interest. You can examine the instruction sequence leading up to the call with:
SDA> EXAMINE/INSTR
SDA> SHOW CALL/SUMMARY
gives one line per call frame, a bit like a traceback.
06-30-2010 01:43 PM
Re: Looping installed image
My bad. I mixed up SHOW CALL and SHOW STACK. Sorry.
btw... Are you also using the full ACMS tasks with their (TDMS?) terminal request and procedure calls into server procedure images?
The system you describe is Alpha, right? Any Itaniums? Shoot me an email?
Anyway...
Thanks for the trace.
You gave the raw one, not the statistics.
Here is the 'top 20' + user image top 10...
$ perl -ne "$pc{$1}++ if / U [0-9A-F]+ (\S+)/; }{ for (sort {$pc{$b}<=>$pc{$a}} keys %pc){ print qq($pc{$_}\t$_\n) if /COIN/ or $i++<20}" pc.dat
592 FDVSHR+3DB04
568 FDVSHR+377C4
541 FDVSHR+37680
537 FDVSHR+378F4
503 FDVSHR+377FC
503 FDVSHR+376CC
486 FDVSHR+40098
451 LIBOTS+2417C
435 MMG_STD$SWAP_PTBR_C+00838
394 FDVSHR+3DBD4
392 FDVSHR+3D9C4
371 MMG_STD$SWAP_PTBR_C+00840
342 FDVSHR+40070
293 FDVSHR+3DA84
263 MMG_STD$SWAP_PTBR_C+00830
196 FDVSHR+36C00
185 EXE$SYNCH_LOOP_C+00DF4
175 LIBOTS+24170
152 LIBOTS+20290
145 LIBRTL+76A0C
10 COIN_DCL_OSIP+7E294
8 COIN_DCL_OSIP+7C150
8 COIN_DCL_OSIP+7E448
7 COIN_DCL_OSIP+869C4
6 COIN_DCL_OSIP+7E2EC
6 COIN_DCL_OSIP+86940
5 COIN_DCL_OSIP+7E460
5 COIN_DCL_OSIP+86A54
5 COIN_DCL_OSIP+7E344
5 COIN_DCL_OSIP+7E258
5 COIN_DCL_OSIP+7BED0
4 COIN_DCL_OSIP+7C1A4
4 COIN_DCL_OSIP+7E240
4 COIN_DCL_OSIP+7E1D8
4 COIN_DCL_OSIP+86960
3 COIN_DCL_OSIP+7E2C4
3 COIN_DCL_OSIP+7E4C0
3 COIN_DCL_OSIP+7E280
3 COIN_DCL_OSIP+869A4
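For anyone who would rather not decode the one-liner, here is a rough Python sketch of the same tally, plus a roll-up per image. It reuses the pattern from the Perl command. The sample lines are hypothetical and shaped only to match that pattern, since the real layout of pc.dat is not shown here.

```python
import re
from collections import Counter

# Same pattern as the Perl one-liner: a literal " U " field, a hex
# value, then the symbolized location (image+offset).
SAMPLE_RE = re.compile(r" U [0-9A-F]+ (\S+)")


def tally(lines):
    """Count how often each symbolized PC location appears."""
    counts = Counter()
    for line in lines:
        match = SAMPLE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def by_image(counts):
    """Roll the per-location counts up to one total per image."""
    totals = Counter()
    for location, n in counts.items():
        image = location.split("+", 1)[0]
        totals[image] += n
    return totals


# Hypothetical input lines, not taken from the attached file.
sample_lines = [
    " U 0003DB04 FDVSHR+3DB04",
    " U 0003DB04 FDVSHR+3DB04",
    " U 000377C4 FDVSHR+377C4",
    " U 0007E294 COIN_DCL_OSIP+7E294",
    "a line that does not match the pattern",
]

counts = tally(sample_lines)
for location, n in counts.most_common(20):
    print(f"{n}\t{location}")
for image, n in by_image(counts).most_common():
    print(f"{n}\t{image}")
```

The per-image totals make it easier to see at a glance how few samples land in the user image compared with the shareable images.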
Hein
06-30-2010 03:02 PM
Re: Looping installed image
Would it be fair to say this code has been running for over 25 years without the looping behaviour? What's changed? New VMS version, layered products, Itanium test? When was the last change to OSIP?
Does it only happen after running normally for X hours? Quotas? Leaks?
Over time, has an internal COBOL array grown past its limits so that it is overwriting memory below? Number of transactions, number of customers, etc.?
While you're waiting for the correct oscilloscope settings, it might be worth just inspecting the code for CALLs to services that don't check the return status at all or, as Hoff mentioned, the iosb.
What username do the DCL servers run under? (Ooops! IIRC, don't answer that :-( )
Anyway it seemed time for clutching at straws :-)
Cheers Richard Maher
PS. It's very cold here :-(