
Looping installed image

 
Mick O'Brien
Advisor

Re: Looping installed image

Hein,

Thanks for the PERL script; however, there are a couple of problems:

o we do not have PERL installed anywhere
o I have never used it and it looks like Greek to me

Do you have a DCL version? If so, would it be possible to just enter a PID as a parameter and have it do all the PC capture and analysis?

Richard (or should I say Mr Blobby),

Spoke to our DBA (that's Adam), who tells me that the looping process started after he extended the statement database so as to accommodate a further 5 years of data - but if this was the problem I would expect to see issues throughout the day and not just after 6pm (there have not been any looping processes for the past 2 days, so I have not been able to get any more PC details).

We have a process that runs every hour that does an ACMS/CANCEL on users that are inactive, but I can't see any cancelled users whose details match the dynamic user names of the looping processes.

I think the issue is related to FMS and returned statuses not being checked correctly - I have a vague recollection of seeing a looping bug many years ago when the workspace loaded into the FMS form was larger than the expected value (i.e. the COIN select field was 'set up' and returned to the calling program). However, I do not understand how this could happen and NOT be reported by end-users - they would see it looping on their screen! My assumption is that it is a detached process looping around FMS calls, but I don't know how to prove that.

There is ONLY CPU usage i.e. no IO of any sort.

Mick

PS We have had the driest first 6 months of the year since 1929 and I now weigh 90.1kg (i.e. I seem to have lost 10kg since you left - no more boozy lunchtimes)
Hein van den Heuvel
Honored Contributor

Re: Looping installed image

Mick,

Perl was just a handy language for me.
Pick your poison. DCL, C, ruby, java...

Perl can be made to look like gibberish at times, but it's rather effective.

I have it installed 'everywhere'.
On my VMS systems, Linux boxes, HPUX, Windows...

For your data I ran it on my windows laptop.

Just grab a download. I like:
http://www.activestate.com/activeperl/downloads
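For anyone without Perl: the analysis half of PC sampling is mostly counting. This Python sketch is not Hein's script. It assumes you already have one hex PC value per sample (capturing them, for example from repeated SDA register displays of the looping process, is not shown) and simply ranks the PCs by how often they were hit, so the hot spot stands out. The function name `pc_histogram` is hypothetical.

```python
from collections import Counter


def pc_histogram(samples):
    """Rank sampled PCs by frequency, most frequent first.

    `samples` is assumed to be an iterable of hex PC strings, one per sample;
    blank entries are skipped. Returns (pc, count, percent) tuples.
    """
    counts = Counter(int(s, 16) for s in samples if s.strip())
    total = sum(counts.values())
    return [(pc, n, 100.0 * n / total) for pc, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample values for illustration only, not data from this thread.
    samples = ["00010020", "00010024", "00010020", "", "00010020"]
    for pc, n, pct in pc_histogram(samples):
        print(f"{pc:08X} {n:5d} {pct:6.1f}%")
```

The top entries are then mapped back to a module and offset using the linker MAP, exactly as with a single stack-frame PC.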

fwiw,
Hein.



Richard J Maher
Trusted Contributor

Re: Looping installed image

> Spoke to our DBA (that's Adam)

And I'm sure you used the term loosely ;-)

> who tells me that the looping process started after he extended
> the statement database so as to accomodate a further 5 years of data

This is normally what I'd refer to as the "Ta-Dah!" moment. I can't say for sure that this is the cause of your problem, but I'd certainly budget a few hours' investigation on the strength of STM(N?)T_DB being verballed! (Statement limits? Statement Line Limits?) Having said that, I believe you only show/retrieve one month at a time, so I guess it shouldn't be a new problem.

> - but if this was the problem I would expect to see issues through-out
> the day and not just after 6pm (there have not been any looping
> processes for the past 2 days so I have not been able to get anymore
> PC details).

Yes and no. That's like saying "Why doesn't it happen at every 6pm?". IIRC you audit the query every user does with the option and the key value, so the OSIP for Fred User at 6:00pm might have the account number detail to check and attempt a reproducer?

Then again, being in the middle of a query and clicking |X| should be easy to test as well. Are there sometimes response problems, with the user getting fed up waiting at knock-off time? (But statement retrieval was by design, and I'm sure still is, lightning fast.) Having said that, I do remember all those Rdb Freeze-Locks and Recovery Processes from other ACMS options. I'm sure Adam is "monitoring".

Possible scenario: -

1) User does a query
2) Database response poor as Rdb grabs a freeze lock and attempts to rollback txn from other impatient user.
3) User 1 gets fed up and shuts down
4) When the process comes back FDV calls start failing due to lack of terminal
5) Loop (Why hasn't ACMS killed the process or exited the image?)

> We have a process that runs every hour that does an ACMS/CANCEL
> on users that are inactive

Yep, that's one way to do it.

> I think the issue is related to FMS and returned statuses not being
> checked correctly - I have a vague recollection of seeing a looping
> bug many years ago when the workspace loaded into the FMS form was
> larger than the expected value (i.e. the COIN select field was
> 'set up' and returned to calling program).

Certainly sounds plausible. Also 6:00pm does look suspiciously like a knock-off time and, as Hein suggested, something upsetting the UI at close.

Have to wait until it happens again I suppose.

Just a thought, but if you are rebuilding, it could be worth compiling with /CHECK=(BOUNDS) or /CHECK=ALL. You could also add that FDV$set_errors_to_signal call, but there are probably half a dozen other errors that should legitimately be ignored and would end up running amok in a COBOL program not ready for errors to be signalled.

Good hunting!

Cheers Richarde Maher

PS. Please advise Zorba that he can thank me for the ability to simply add a few storage areas and then do an ALTER STORAGE MAP (with the mixed page format, clustered parent/child set-up, if it helps anybody?)

PPS. Only 5 more years room? How many times have we seen/heard "COIN will be dead in 2 years!" before?

PPPS. Hope Jae and the kids are well and you're as happy as I am that the World Cup, Wimbledon, and Tour de Boredom have wiped cricket off the map!
Mick O'Brien
Advisor

Re: Looping installed image

Chaps,

Got a looping process a couple of nights ago (after a long gap), and I have attached a zip file with the source; trace details will be attached separately.
Mick O'Brien
Advisor

Re: Looping installed image

First PC zip file
Mick O'Brien
Advisor

Re: Looping installed image

Second PC zip file (also includes PC Stat and some traceback)

Any help appreciated.
Hein van den Heuvel
Honored Contributor

Re: Looping installed image


First:

Stack Frame 0007E1D8 002F9420 FDVSHR+33420
Stack Frame 00078914 0007B9D0 COIN_DCL_OSIP+7B9D0
Stack Frame 000717AC 00078310 COIN_DCL_OSIP+78310
Stack Frame 0009A370 00070000 COIN_DCL_OSIP+70000

Using the MAP and LIS files, this gives:

00078914 ---> JSR R26, OSIP_SEL_STATEMENT
OSIP 00078310 00079047 00000D38 ( 3384.) OCTA 4

$ x=%x00078914 - %x00078310
$ show sym x
X = 1540 Hex = 00000604 Octal = 00000003004

and

OSIP_SEL_STATEMENT
0007E258 --> 2884 2884 JSR R26, FDV$GETAL
2888 LDL R14, -2136(R6)
288C LDL R15, WS-CANN
0007B9D0 0007E757 00002D88 ( 11656.) OCTA 4

Do you already not see what I did not see?
Nothing is checking the return status in R0!?

$ x=%x7E258 - %x0007B9D0
$ show sym x
X = 10376 Hex = 00002888 Octal = 00000024210
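The arithmetic above can be captured in a few lines. A minimal Python sketch, assuming the two module ranges are hand-copied from the MAP lines quoted above (a real tool would parse the .MAP file): find the module whose base/end range contains the PC, subtract the base, and the result is the offset to look for in that module's machine-code listing. Because a call frame records the return address, the offset usually lands on the instruction after the call, which is why 2888 sits just after the JSR to FDV$GETAL at 2884. The names `MODULES`, `locate` and `show` are hypothetical.

```python
# Module base/end addresses from the MAP lines quoted above (hand-copied).
MODULES = [
    ("OSIP",               0x00078310, 0x00079047),
    ("OSIP_SEL_STATEMENT", 0x0007B9D0, 0x0007E757),
]


def locate(pc, modules=MODULES):
    """Return (module, offset) for the module whose range contains pc, else None.

    The offset is what you look up in that module's .LIS machine-code listing.
    """
    for name, base, end in modules:
        if base <= pc <= end:
            return name, pc - base
    return None


def show(pc):
    hit = locate(pc)
    if hit is None:
        return f"{pc:08X}: not in any listed module"
    name, off = hit
    # Same three radixes that DCL SHOW SYMBOL prints.
    return f"{pc:08X} = {name}+{off:X} (decimal {off}, octal {off:o})"


for pc in (0x00078914, 0x0007E258):
    print(show(pc))
```

This reproduces both DCL subtractions: OSIP+604 (1540) and OSIP_SEL_STATEMENT+2888 (10376).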

Looking 'upwards' in the source, we see this is line 1324.

The source for this reads:

PERFORM WITH TEST AFTER
        UNTIL ( FMS-TERMINATOR = FDV$K_KF_NTR )
           OR ( RETURN-STATUS FAILURE )

    CALL "FDV$GETAL"
        USING BY DESCRIPTOR FMS-FORM-BUFFER
              BY REFERENCE  FMS-TERMINATOR
              BY DESCRIPTOR FMS-START-FIELD
              BY REFERENCE  FMS-START-INDEX

    IF ( RETURN-STATUS = WS-CANN )
    OR ( RETURN-STATUS SUCCESS )
    THEN
        SET RETURN-STATUS TO SUCCESS
        CALL "GETAL_TERM"
            USING FMS-START-FIELD
                  FMS-START-INDEX
    END-IF
END-PERFORM.

So... if the return status on entry is failure, say due to a failed (telnet) terminal IO, or running out of memory or something, then it will not change, and supposedly the call will fail and never set FMS-TERMINATOR = FDV$K_KF_NTR. So that could be your infinite loop.
That GETAL call needs a GIVING RETURN-STATUS, does it not?
Not sure where all the exception code comes from. Maybe that's how FDV works. Dunno.
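A toy model can make the suspected failure mode concrete. This Python sketch is an assumption-laden simulation, not the real FMS interface: FDV$GETAL is replaced by a hypothetical function that fails and never returns the Enter terminator, and `status_captured` controls whether that failure ever reaches RETURN-STATUS (via GIVING, or via an FDV$SSRV registration that actually took effect). The IF/GETAL_TERM branch is left out on the assumption that it does not change FMS-TERMINATOR.

```python
# Hypothetical stand-ins; not the real FMS/COBOL values.
SUCCESS, FAILURE = "SUCCESS", "FAILURE"
KF_NTR = "KF_NTR"                   # plays the role of FDV$K_KF_NTR


def getal_terminal_gone():
    """Hypothetical FDV$GETAL after the terminal has gone away:
    every call fails and no Enter terminator is ever returned."""
    return FAILURE, None            # (status, terminator)


def run_loop(call, status_captured, max_iter=1000):
    """Model of:
        PERFORM WITH TEST AFTER
            UNTIL (FMS-TERMINATOR = FDV$K_KF_NTR) OR (RETURN-STATUS FAILURE)

    Returns the pass on which the loop exits, or None if it is still
    looping after max_iter passes (the cap stands in for "forever").
    """
    return_status = SUCCESS         # e.g. set earlier by SET RETURN-STATUS TO SUCCESS
    terminator = None
    for n in range(1, max_iter + 1):
        status, term = call()
        if status_captured:
            return_status = status  # the call's status reaches RETURN-STATUS
        if term is not None:
            terminator = term
        # TEST AFTER: the exit condition is checked after each call
        if terminator == KF_NTR or return_status == FAILURE:
            return n
    return None


print(run_loop(getal_terminal_gone, status_captured=False))  # None
print(run_loop(getal_terminal_gone, status_captured=True))   # 1
```

With the status dropped, the loop only stops at the artificial cap, which fits the "only CPU, no IO" picture; once the status is captured, the OR ( RETURN-STATUS FAILURE ) test ends it after one pass.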

Good luck!
Hein





Mick O'Brien
Advisor

Re: Looping installed image

Hein,

Thanks for the reply, but it's mostly over my head - I am unable to follow how you get from stack frame to source. For example, you say

"Usng the MAP and LIS file this gives:

00078914 ---> JSR R26, OSIP_SEL_STATEMENT
OSIP 00078310 00079047 00000D38 ( 3384.) OCTA 4

$ x=%x00078914 - %x00078310
$ show sym x
X = 1540 Hex = 00000604 Octal = 00000003004"

...but when I search the .lis and .map files for 00078914, I cannot find the text from above. Nor do I understand why you are doing a hex subtraction, OR what the value 1540 is used for.

My other concern is that the source you identify as problematic is from a library (include) file that is heavily used so a modification to this would have wide implications for our production code.

Can I ask you to be a bit more Janet and John with your explanation and give me a bit more detail at each step?

Thanks and regards,
Mick
Hein van den Heuvel
Honored Contributor

Re: Looping installed image

>> Thanks for the reply but its mostly over my head -
That's why they pay me the big bucks! (on good days).

>> I am unable to follow how you get from stack frame to source.

OK, I'll go slower in a later reply, just not right now.
Last night, and now, I just wanted there to be a reply for you with something to go on already.

>> My other concern is that the source you identify as problematic is from a library (include) file that is heavily used so a modification to this would have wide implications for our production code.

Understood, but that sometimes happens.
New external routines can make you enter the function through a slightly different path.
In this case I was thinking: 'What if you entered with RETURN-STATUS as failure from the get-go, and nothing there to reset it to success?'

Looking around in the code a little more I now notice:

"CALL "FDV$SSRV" USING BY REFERENCE RETURN-STATUS
BY REFERENCE RETURN-STATUS."

and

1161 *This line is VERY important - so get your sticky mits off.
1162 SET RETURN-STATUS TO SUCCESS.

So maybe that's OK after all... UNLESS there was no TCA available, in which case the SSRV call fails, and again no error checking is done!
I also do not like this notion of passing the same address twice. With that, the code suggests it 'knows' which one is written first, and that it knows which one is more important. I would prefer dropping the optional IOSTAT or providing a real, separate memory location for it.
Also, SSRV can be a time bomb, as its pointers remain set after the module returns. If a stack variable had been passed (a local variable in C), that could lead to fun. Fortunately this is COBOL, so the address stays valid.
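The 'same address twice' concern is about aliasing: two output parameters share one location, so whichever is written last wins. This Python sketch models it with a one-slot `Cell` object standing in for a COBOL item passed BY REFERENCE. `FdvModel`, its `ssrv`/`getal` methods, and the primary-then-secondary write order are all hypothetical; the real FDV$SSRV semantics are in the FMS documentation.

```python
class Cell:
    """One memory location, standing in for a COBOL item passed BY REFERENCE."""
    def __init__(self, value=None):
        self.value = value


class FdvModel:
    """Hypothetical model of FDV$SSRV-style status registration (not the real API):
    ssrv() remembers where to write; later calls write their primary status
    and an optional secondary (I/O) status into those locations."""
    def __init__(self):
        self.primary = None
        self.secondary = None

    def ssrv(self, primary, secondary=None):
        self.primary, self.secondary = primary, secondary

    def getal(self, fms_status, io_status):
        # Assumed write order: primary first, then secondary.
        if self.primary is not None:
            self.primary.value = fms_status
        if self.secondary is not None:
            self.secondary.value = io_status


fdv = FdvModel()

# Same location registered twice: the later (secondary) write hides the failure.
shared = Cell("SUCCESS")
fdv.ssrv(shared, shared)
fdv.getal(fms_status="FAILURE", io_status="SUCCESS")
print(shared.value)                  # SUCCESS

# Separate locations keep both answers.
status, iostat = Cell("SUCCESS"), Cell()
fdv.ssrv(status, iostat)
fdv.getal(fms_status="FAILURE", io_status="SUCCESS")
print(status.value, iostat.value)    # FAILURE SUCCESS

# No registration (e.g. the SSRV call itself failed): nothing is ever written,
# so a RETURN-STATUS preset to SUCCESS stays SUCCESS.
unregistered = FdvModel()
never = Cell("SUCCESS")
unregistered.getal(fms_status="FAILURE", io_status="FAILURE")
print(never.value)                   # SUCCESS
```

The last case is the "no TCA" scenario: the program's status variable never sees the failure.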

Check out:

http://odl.sysworks.biz/disk$axpdocjun002/progtool/dyy4aaa3.p130.bkb#1131

Now that you potentially know where the program is looping, you (or someone helping you) can check RETURN-STATUS and the other variables with SDA while it is looping.

If you think you might go that route, then I strongly encourage you to practice without pressure first, just looking around for recognizable data, guided by the MAP and listings.

Good Luck!
Hein