Operating System - OpenVMS

Re: Looping installed image

 
Mick O'Brien
Advisor

Re: Looping installed image

John,

Thanks for the pointer on 'SHOW CALL' - I'll give that a try next time

Hein,

How do I get the statistics out?

Richard,

OSIP is 17 years old, but its current 'ACMS' version is 12 years old (it's an image [that uses FMS] called from an ACMS DCL server). The process started to loop about a year ago, but as our version of ACMS was out of support we sort of hoped that the upgrade to the current level would sort it out (it did not). The underlying code itself has not changed - the last time the image was released was November 2006 (a new version, recompiled and relinked, was released post-upgrade). The code has looped twice so far this week (Monday and Tuesday) and the alert comes out between 6pm and 6:30pm. I contacted the two users that the image was running under (remember it's a DCL server with dynamic username) and they tell me: -

1) They did not do anything unusual
2) Logged off properly (from what they can remember)

I'm going to rebuild/link the image with compile options of /list/machine and link options of /map and try to get that installed THEN get a few more PC stats out

Regards,
Mick

(PS It's supposed to be 27C here but I'm in my cardi [but no vest])
abrsvc
Respected Contributor

Re: Looping installed image

Mick,

Where are you located? Perhaps a visit might be in order.

Dan
Mick O'Brien
Advisor

Re: Looping installed image

Costa Del Coutts (London)
Hein van den Heuvel
Honored Contributor

Re: Looping installed image

Richard M... great question! Why now?
My WAG is emulated terminal sessions being disconnected and the resulting terminal IO errors leading the program astray.


>> Hein, How do I get the statistics out?

Read onwards for alternative thoughts, but the simple, and often sufficient, way is:

SDA> PCS SHOW TRACE /STAT


As with any SDA extension, you can get a quick command overview by just typing the extension name, here PCS.


For this case, PRF may get you more and better data quicker.

Beware as to how to interpret the traces...
The raw, timestamped, full trace from either PCS or PRF may lead you to believe you are looking at a natural flow through a program. Not so! They are just samples and only indicate how often each PC was hit, not in which sequence.
And the samples will pick up PCs with 'slow' memory accesses more often than register moves.
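
To make the "how often, not in which sequence" point concrete, here is a minimal Python sketch. The list of sampled PCs is made up for illustration (only the symbol names are borrowed from the sample run further down); it is not real trace output. Tallying the samples throws the recording order away, so a reordered trace produces exactly the same statistics.

```python
from collections import Counter

# Hypothetical sampled PCs, in the order a sampler might have recorded them.
samples = [
    "FDVSHR+37680",
    "COIN_DCL_OSIP+7E448",
    "FDVSHR+377C4",
    "FDVSHR+37680",
    "FDVSHR+377C4",
    "FDVSHR+37680",
]

# Tally how often each PC was seen; the recording order is discarded.
counts = Counter(samples)

# The same samples in reverse order give an identical tally.
same_tally = Counter(reversed(samples)) == counts

for pc, hits in counts.most_common():
    print(f"{hits:6d} {pc}")
print("order independent:", same_tally)
```

All the tally can say is which addresses are hot; any sequence read into it is speculation.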

One can speculate at a sequence by 'fuzzy' sorting the data roughly by counts first and by address next.

As a coarse example I changed the perl 'one-liner' presented earlier to a program and called out the anonymous simple sort function used earlier into a proper labeled subroutine.

------------------ sda_pcs_trace.pl -------
sub my_sort {
    # if the PC sample counts are close
    #   then sort ascending by address
    #   else sort descending by count
    #
    if ( abs( $pc{$a} - $pc{$b} ) <= $pc{$a} / 3 ) {
        return ( $a cmp $b );
    } else {
        return ( $pc{$b} <=> $pc{$a} );
    }
}

# tally one hit per PC for each matching trace line
while ( <> ) {
    $pc{$1}++ if / U [0-9A-F]+ (\S+)/;
}

# print every COIN entry, plus the first 20 others
for ( sort my_sort keys %pc ) {
    next unless /COIN/ or $i++ < 20;
    printf qq(%6d %s\n), $pc{$_}, $_;
}
-----------------------

Now when we run this we get 'zones' of program activity. Because of the low sample count I used a coarse 30% range to group PCs, because I wanted 3 occurrences to sort equal to 4. With more samples you will want to change the divider from 3 to 10 or some such and just eyeball whether the result speaks more clearly to you.
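
The effect of the divider can be checked with a few lines of arithmetic. This is a Python rendering of the closeness test inside my_sort, offered as a sketch only; the names `is_close` and `divider` are assumptions made for the illustration and do not appear in the script.

```python
def is_close(count_a: int, count_b: int, divider: float = 3) -> bool:
    """Same test as in my_sort: two sample counts are 'close' when
    their difference is at most count_a / divider."""
    return abs(count_a - count_b) <= count_a / divider


# With the coarse divider of 3, counts of 3 and 4 group together,
# whichever of the two the sort passes in first.
print(is_close(3, 4), is_close(4, 3))

# With a divider of 10 the same pair no longer groups together.
print(is_close(3, 4, 10))

# The threshold depends on the first count only, so the test is not
# symmetric for very small counts: 3 is 'close' to 2, but 2 is not to 3.
print(is_close(3, 2), is_close(2, 3))
```

Because the test is not symmetric, the resulting order is only a rough guide, which matches the 'eyeball it' spirit of the approach.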

Sample resorted output below.

Hope this helps some more.
Hein van den Heuvel
HvdH Performance Consulting

------------ sample stats run -----------
# perl sda_pcs_trace.pl pc.dat
541 FDVSHR+37680
503 FDVSHR+376CC
568 FDVSHR+377C4
503 FDVSHR+377FC
537 FDVSHR+378F4
592 FDVSHR+3DB04
392 FDVSHR+3D9C4
394 FDVSHR+3DBD4
486 FDVSHR+40098
451 LIBOTS+2417C
293 FDVSHR+3DA84
342 FDVSHR+40070
435 MMG_STD$SWAP_PTBR_C+00838
371 MMG_STD$SWAP_PTBR_C+00840
196 FDVSHR+36C00
145 EXCEPTION+08544
185 EXE$SYNCH_LOOP_C+00DF4
263 MMG_STD$SWAP_PTBR_C+00830
122 EXE$SYNCH_LOOP_C+00DB0
141 FDVSHR+36BAC
8 COIN_DCL_OSIP+7E448
8 COIN_DCL_OSIP+7C150
10 COIN_DCL_OSIP+7E294
7 COIN_DCL_OSIP+869C4
6 COIN_DCL_OSIP+7E2EC
5 COIN_DCL_OSIP+7E460
5 COIN_DCL_OSIP+86A54
4 COIN_DCL_OSIP+7E1D8
6 COIN_DCL_OSIP+86940
5 COIN_DCL_OSIP+7E344
5 COIN_DCL_OSIP+7BED0
5 COIN_DCL_OSIP+7E258
4 COIN_DCL_OSIP+7C1A4
4 COIN_DCL_OSIP+7E240
3 COIN_DCL_OSIP+7E280
3 COIN_DCL_OSIP+7E2C4
3 COIN_DCL_OSIP+7E4C0
3 COIN_DCL_OSIP+7E1C0
3 COIN_DCL_OSIP+869A4
2 COIN_DCL_OSIP+869A0
2 COIN_DCL_OSIP+86CB0
2 COIN_DCL_OSIP+870D0
4 COIN_DCL_OSIP+86960
2 COIN_DCL_OSIP+7C1C0
2 COIN_DCL_OSIP+869E0
2 COIN_DCL_OSIP+7C178
2 COIN_DCL_OSIP+869B0
1 COIN_DCL_OSIP+7C190
1 COIN_DCL_OSIP+7C1A0
1 COIN_DCL_OSIP+7C200
1 COIN_DCL_OSIP+7E190
1 COIN_DCL_OSIP+7E230
1 COIN_DCL_OSIP+7E284
1 COIN_DCL_OSIP+7E2A0
1 COIN_DCL_OSIP+7E2D8
1 COIN_DCL_OSIP+7E2F8
1 COIN_DCL_OSIP+7E304
1 COIN_DCL_OSIP+7E310
1 COIN_DCL_OSIP+7E350
Richard J Maher
Trusted Contributor

Re: Looping installed image

Hi Mick,

Looks like the PC analysis way is the only one likely to yield useful results but, in the meantime: -

Is there any i/o happening in the loop or just CPU?

I hear what Hein's saying but the users have always been clicking the X rather than exiting gracefully so I don't think there should be anything new there. But certainly very few people I know check every FDV$ status or have the signal error option set, so who knows?

Given what you've told us, I still have to opt for it being the data that's changed. I remember having to extend ACMS workspaces about once a year because there were now more than 100 depts, account types, widgets, whatever. Is it the same account/user that it's happening on each time? Any COBOL arrays in working-storage at all?

Can't remember what OSIP does. Is it the general account screen or the statement thing? IIRC some of these options connected with RDO as well as SQLMOD and did not always share a handle. How many database connections per user? Any transactions have NOWAIT option set or locking loops?

You say it's installed shared; are all the PSECT attributes set to NOSHR for the EXTERNAL stuff like database handles?

Sorry to be as useful as usual :-)

Cheers Richard Maher

PS. Finally a few drops of rain so I can leave the push-bike at home! I ride to work and am still pushing 100kg :-(
Mick O'Brien
Advisor

Re: Looping installed image

Hein,

Thanks for the PERL script, however there are a couple of problems: -

o we do not have PERL installed anywhere
o I have never used it and it looks like Greek to me

Do you have a DCL version? If so would it be possible to just enter a PID as a parameter and it could do all the PC capture and analysis?

Richard (or should I say Mr Blobby),

Spoke to our DBA (that's Adam) who tells me that the looping process started after he extended the statement database so as to accommodate a further 5 years of data - but if this was the problem I would expect to see issues throughout the day and not just after 6pm (there have not been any looping processes for the past 2 days so I have not been able to get any more PC details).

We have a process that runs every hour that does an ACMS/CANCEL on users that are inactive but I can't see details of users cancelled matching looping process dynamic user names.

I think the issue is related to FMS and returned statuses not being checked correctly - I have a vague recollection of seeing a looping bug many years ago when the workspace loaded into the FMS form was larger than the expected value (i.e. the COIN select field was 'set up' and returned to calling program). However, I do not understand how this could happen and NOT be reported by end-users (they would see it looping on their screen!). My assumption is that it is a detached process that is looping around FMS calls, but I don't know how to prove that.

There is ONLY CPU usage i.e. no IO of any sort.

Mick

PS We have had the driest first 6 months of the year since 1929 and I now weigh 90.1kg (i.e. I seem to have lost 10kg since you left - no more boozy lunchtimes)
Hein van den Heuvel
Honored Contributor

Re: Looping installed image

Mick,

Perl was just a handy language for me.
Pick your poison. DCL, C, ruby, java...

Perl can be made to look like gibberish at times, but it's rather effective.

I have it installed 'everywhere'.
On my VMS systems, Linux boxes, HPUX, Windows...

For your data I ran it on my windows laptop.

Just grab a download. I like:
http://www.activestate.com/activeperl/downloads

fwiw,
Hein.



Richard J Maher
Trusted Contributor

Re: Looping installed image

> Spoke to our DBA (that's Adam)

And I'm sure you used the term loosely ;-)

> who tells me that the looping process started after he extended
> the statement database so as to accommodate a further 5 years of data

This is normally what I'd refer to as the "Ta-Dah!" moment. I can't say for sure that this is the cause of your problem but I'd certainly budget a few hours investigation on the strength of STM(N?)T_DB being verballed! (Statement limits? Statement Line Limits?) Having said that I believe you only show/retrieve one month at a time so I guess it shouldn't be a new problem.

> - but if this was the problem I would expect to see issues throughout
> the day and not just after 6pm (there have not been any looping
> processes for the past 2 days so I have not been able to get anymore
> PC details).

Yes and no. That's like saying "Why doesn't it happen at every 6pm?". IIRC you audit the query every user does with the option and the key value, so the OSIP for Fred User at 6:00pm might have the account number detail to check and attempt a reproducer?

Then again, being in the middle of a query and clicking |X| should be easy to test as well. Are there sometimes response problems, with the user getting fed up waiting at knock-off time? (But statement retrieval was by design, and I'm sure still is, lightning fast.) Having said that, I do remember all those Rdb Freeze-Locks and Recovery Processes from other ACMS options. I'm sure Adam is "monitoring".

Possible scenario: -

1) User does a query
2) Database response poor as Rdb grabs a freeze lock and attempts to rollback txn from other impatient user.
3) User 1 gets fed up and shuts down
4) When the process comes back FDV calls start failing due to lack of terminal
5) Loop (Why hasn't ACMS killed the process or exited the image?)
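
Steps 4 and 5 can be sketched in a few lines. This is a Python illustration of the general failure mode, not the OSIP code: `read_field`, `wait_for_exit`, `FAKE_IO_ERROR` and the iteration cap are all hypothetical stand-ins. The only VMS fact relied on is the convention that a condition value with the low bit set means success.

```python
from typing import Tuple

SS_NORMAL = 1       # odd condition value: success
FAKE_IO_ERROR = 2   # hypothetical even value standing in for a terminal I/O failure


def succeeded(status: int) -> bool:
    """VMS convention: low bit set means success, low bit clear means failure."""
    return (status & 1) == 1


def read_field(terminal_connected: bool) -> Tuple[int, str]:
    """Hypothetical stand-in for a form read; returns (status, input text)."""
    if terminal_connected:
        return SS_NORMAL, "EXIT"
    return FAKE_IO_ERROR, ""      # terminal gone: fails immediately, no input


def wait_for_exit(terminal_connected: bool, check_status: bool,
                  max_iterations: int = 1000) -> str:
    """Keep reading until the user types EXIT.

    The iteration cap exists only so this sketch terminates; a real
    program without the status check would spin on CPU indefinitely.
    """
    for _ in range(max_iterations):
        status, text = read_field(terminal_connected)
        if check_status and not succeeded(status):
            return "aborted on error"
        if text == "EXIT":
            return "exited normally"
    return "still looping"


print(wait_for_exit(terminal_connected=True, check_status=False))
print(wait_for_exit(terminal_connected=False, check_status=False))
print(wait_for_exit(terminal_connected=False, check_status=True))
```

A read that fails instantly and is never checked gives a pure CPU loop with no I/O, which is at least consistent with what Mick reports seeing.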

> We have a process that runs every hour that does an ACMS/CANCEL
> on users that are inactive

Yep, that's one way to do it.

> I think the issue is related to FMS and returned statuses not being
> checked correctly - I have a vague recollection of seeing a looping
> bug many years ago when the workspace loaded into the FMS form was
> larger than the expected value (i.e. the COIN select field was
> 'set up' and returned to calling program).

Certainly sounds plausible. Also 6:00pm does look suspiciously like a knock-off time and, as Hein suggested, something upsetting the UI at close.

Have to wait until it happens again I suppose.

Just a thought but, if you are rebuilding, it could be worth adding /CHECK=(bounds or all). You could also set that FDV$set_errors_to_signal call, but there are probably half a dozen other errors that should be legitimately ignored and that would end up running amok in a COBOL program not ready for errors to be signalled.

Good-hunting!

Cheers Richard Maher

PS. Please advise Zorba that he can thank me for the ability to simply add a few storage areas and then do an ALTER STORAGE MAP (with the mixed page format, clustered parent/child set-up, if it helps anybody?)

PPS. Only 5 more years room? How many times have we seen/heard "COIN will be dead in 2 years!" before?

PPPS. Hope Jae and the kids are well and you're as happy as I am that the World Cup, Wimbledon, and Tour de Boredom have wiped cricket off the map!
Mick O'Brien
Advisor

Re: Looping installed image

Chaps,

Got a looping process a couple of nights ago (after a long gap) and I have attached a zip file with the source - trace details to be attached separately.
Mick O'Brien
Advisor

Re: Looping installed image

First PC zip file