Operating System - OpenVMS

Re: How to optimize subroutine placement in DCL

 
Hein van den Heuvel
Honored Contributor

Re: How to optimize subroutine placement in DCL

Hmm, If you really care about understanding it then you should use the tools I outlined.

Put a few reads in the procedure to 'break', and use a second screen to look.

Be sure to run with SET WATCH/FILE/CLASS=MAJOR to see the final read/write IOs counted.
Or pick up Volker's PROCIO.
I _think_ it will show the IO counts as they happen, but it didn't work just now on my test system.

You can also use ANALYZE/SYSTEM... SET PROC ... SHOW PROC/CHAN... READ SYSDEF ... FORMAT ---> see the read/write count, or go on to the FCB.

If you go back to my earlier reply for the SHOW RMS hints, then you may want to know the RFAs for the labels, as the first part is the BLOCK NUMBER.

I use SEARCH/NUMB to get some relevant line numbers, then DUMP/RECOR=(COUNT=1,START=) to get the RFA.
While in SDA> the RAB will show the current RFA of course and an EXAMINE of the RBF will show the line it is on.

Finally, it is critical how the executed lines fit in buffers.
If a label happens to be placed just before a buffer (8 blocks?) boundary, and the rest just on the other side, then you will see lots of (cached) IOs flipping back and forth.

Good luck!
Hein.

Willem Grooters
Honored Contributor

Re: How to optimize subroutine placement in DCL

I learned, long ago (VAX?), that when looking for GOSUB and CALL targets, DCL searches the file top to bottom, and that, for speed, you should locate them at the top of the file. The search may start from the current location (of the GOSUB or CALL), but when the label is not found there, the search still restarts at the top of the file.

I think it still works this way, because an error in the DCL code between the GOSUB or CALL and the target location may halt the procedure.
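
The search order described above can be modelled in a few lines. This is only a sketch of that recollection, written in Python rather than DCL, and not a statement of how DCL is actually implemented; the function name `find_label` and the sample procedure are made up for illustration.

```python
def find_label(lines, label, current):
    """Model of the recalled search: scan forward from the line after
    `current`, then wrap around to the top of the file.
    Returns (line_index, lines_scanned); index is None if not found."""
    target = ("$" + label + ":").lower()
    order = list(range(current + 1, len(lines))) + list(range(0, current + 1))
    for scanned, i in enumerate(order, start=1):
        # Ignore spaces and case when comparing, as a simplification.
        if lines[i].replace(" ", "").lower().startswith(target):
            return i, scanned
    return None, len(lines)

# Hypothetical six-line procedure.
lines = ["$ gosub far", "$ exit", "$near:", "$ return", "$far:", "$ return"]

print(find_label(lines, "far", 0))   # (4, 4): found ahead of the caller
print(find_label(lines, "near", 4))  # (2, 4): found only after wrapping to the top
```

In this model a target behind the caller costs a scan to the end of the file plus a scan from the top, which is the argument for keeping targets near the top or just after their callers.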

HTH


Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: How to optimize subroutine placement in DCL

Hein,

I tried your SDA stuff but it's rather technical (what exactly to watch for, what the fields mean).

Willem,

I found that too.

But I'm curious if there is a description in plain English of how it works.

Wim
Wim
Hein van den Heuvel
Honored Contributor

Re: How to optimize subroutine placement in DCL

Wim, did you try the example I suggested?

You'll find it very educational and not too hard to figure out.

>>> SHOW PROC/RMS=(PIO,NOIFB:3,RAB,BDBSUM)

The PIO tells the code in RMS$SDA to report the PROCESS structures (DCL) not the IMAGE.
RAB is RAB... See RMS REFERENCE MANUAL.
BDBSUM is a Summary for the Buffer Descriptor Blocks. Internal RMS stuff.

Here is a quick sample run (on OpenVMS V8.3!)
Test session input is highlighted with >>

$ set proc/name=test
$ set promp="test>> "
test>> @test
>>Go Far?

SDA> SHOW PROC/CHAN
:
Channel CCB Window Status File
0020 7FF7C020 82420100 test.com

SDA> READ SYSDEF
SDA> FORMAT 82420100 ! Window
:
FFFFFFFF.82420138 WCB$L_READS 00000001
:
SDA> SHOW PROC/RMS=(PIO,NOIFB:3,RAB,BDBSUM)
RAB Address: 7FFCF014
:
RFA: 00000001,000A
RBF: 7FF9FEA6
RSZ: 0026 38.

So the first question came from a record at offset 0xA in VBN 1.
To confirm:
SDA> exam 7FF9FEA6;26
%SDA-W-UNALIGNED, unaligned address 00000000.7FF9FEA6; converting to aligned address
CTER_C$ Read/promt="Go Far? " s 00000000.7FF9FEA0
ys$command x.mmand x.dog. The qu 00000000.7FF9FEC0
:
BDB/GBPB Summary
SIZE NUMB VBN BLB_PTR ADDR
00001000 00001000 00000001 00000000 000000007B0B3800

So the buffer used by RMS for DCL was 0x1000 = 4096 bytes, and all were filled.
The current VBN is 1, as expected, and the buffer address is somewhere in P1 space. We can look at that buffer, which has raw disk block data:

(Removed the HEX mumbo jumbo)
SDA> exa 000000007B0B3800;100
..$ Loop:&.$ Read/promt="Go Fa 00000000.7B0B3800
r? " sys$command x..$ if x ..$ 00000000.7B0B3820
then gosub far...$ eLse gosub n 00000000.7B0B3840

See? Every last bit can be seen and explained.
Now JUMP:

>> Go Far? y
>> Return from far?

SDA> ex FFFFFFFF.82420138 ! WCB$L_READS
FFFFFFFF.82420138: 00000000.00000003
:
RAB
RFA: 00000014,0112
:
BDB/GBPB Summary
SIZE NUMB VBN BLB_PTR ADDR
00001000 00000800 00000011 00000000 000000007B0B3800

The same buffer (address) now holds VBN 0x11 = 17 through VBN 0x14.
Only 0x800 bytes were read, hitting EOF.
The RFA points to the last VBN in the buffer. To confirm:
SDA> exa 007B0B3800+(3*200)+112;40
%SDA-W-UNALIGNED, unaligned address 00000000.7B0B3F12; converting to aligned address
: ..$read/prompt="Return from fa 00000000.7B0B3F10
r? " sys$command x..$return..... 00000000.7B0B3F30
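
The address arithmetic used in that EXAMINE can be written out as a small sketch. It is Python rather than SDA, the function name `rfa_to_address` is made up, and the only assumptions are 512-byte (0x200) blocks and that the BDB summary gives the first VBN held in the buffer.

```python
BLOCK_SIZE = 0x200  # 512-byte disk blocks

def rfa_to_address(rfa_vbn, rfa_offset, buffer_vbn, buffer_addr):
    """Memory address of the record an RFA points at, given the first
    VBN currently held in the buffer and the buffer's address."""
    blocks_into_buffer = rfa_vbn - buffer_vbn
    return buffer_addr + blocks_into_buffer * BLOCK_SIZE + rfa_offset

# RFA 00000014,0112 with VBN 0x11 loaded at 7B0B3800 (the second sample)
print(hex(rfa_to_address(0x14, 0x112, 0x11, 0x7B0B3800)))  # 0x7b0b3f12

# RFA 00000001,000A with VBN 1 loaded at 7B0B3800 (the first sample)
print(hex(rfa_to_address(0x1, 0xA, 0x1, 0x7B0B3800)))      # 0x7b0b380a
```

The first result matches the address SDA reported in its UNALIGNED warning, 7B0B3F12.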

>>Return from far? y
>>Go Far?

Extra read done...
FFFFFFFF.82420138: 00000000.00000004
And VBN 1 is back in the buffer.

Clear as mud?
Very predictable!

Now you have to realize that RMS, when asked to 'jump' to a remembered label (or when returning, or returning from an @), will NOT read starting at that VBN.
It will start the read at its natural VBN buffer boundaries.

It takes the target VBN from the RFA (here 0x14), integer-divides it by the number of blocks in a buffer (here 8), multiplies the result by that number of blocks again and adds one (because, oddly enough, the first block is 1, not 0), giving 0x11 here. So if the code after the label takes more bytes than fit in that buffer, then an extra IO will be done every single time.
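
As a worked version of that arithmetic, here is a minimal Python sketch. The function name is made up. One assumption goes beyond the text: the sketch subtracts one before dividing, so that buffers cover VBNs 1-8, 9-16, 17-24. That agrees with the two buffers seen in the session (VBN 1 and VBN 0x11), but it is an inference from those two samples, not something confirmed about RMS.

```python
def buffer_start_vbn(target_vbn, blocks_per_buffer=8):
    """First VBN of the aligned buffer that contains target_vbn.
    VBNs are 1-based, hence the -1 / +1 around the integer division
    (the -1 is an assumption, see the text above)."""
    return ((target_vbn - 1) // blocks_per_buffer) * blocks_per_buffer + 1

print(hex(buffer_start_vbn(0x14)))  # 0x11, as seen in the BDB summary
print(buffer_start_vbn(1))          # 1
print(buffer_start_vbn(8))          # 1: last block of the first buffer
print(buffer_start_vbn(9))          # 9: first block of the second buffer
```

For the VBN in the example (0x14) the result is the same with or without the subtraction; the two only differ for a target on the last block of a buffer, such as VBN 8.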

In fact, the label record itself may require 2 read IOs on a bad day!

Moving and changing 1 comment line from before a label to behind it could cause the label to be brought forward so that its start is just in the last bytes of one buffer, and push the return just over the end of the next buffer, changing 1 IO to 3.
... or vice versa.
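
The 1-versus-3 effect can be reproduced with a toy model. This is a Python sketch under simplifying assumptions: a single 4096-byte buffer aligned on 4096-byte file offsets, records given as hypothetical (byte offset, length) pairs in execution order, and every buffer change counted as one read. Real counts depend on RMS settings and caching.

```python
BUFFER_BYTES = 4096

def count_reads(records):
    """records: (file_byte_offset, length) pairs in execution order.
    Returns how many times a single aligned buffer must be reloaded."""
    current = None
    reads = 0
    for offset, length in records:
        first = offset // BUFFER_BYTES
        last = (offset + length - 1) // BUFFER_BYTES
        for buf in range(first, last + 1):
            if buf != current:
                current = buf
                reads += 1
    return reads

# Hypothetical subroutine: label, body, return statement.
good = [(4096, 12), (4108, 4070), (8178, 10)]  # all inside the second buffer
bad = [(4090, 12), (4102, 4084), (8186, 10)]   # label and return both straddle

print(count_reads(good))  # 1
print(count_reads(bad))   # 3
```

The two layouts differ by only a few bytes of placement, which is the point being made about a single comment line.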

For all who had the interest to read this far...

1) Would a slow-motion rundown to show 'what happens behind the scenes' as above be a Boot Camp session you would attend?

2) Think about sticking a DCL procedure in an indexed file. It would be ordered by key, right? Well, I did not provide a handy key in a (fixed byte offset) comment field, but just used the first 4 bytes as key. Do you see how it can work at all?

Fun? Educational? Cheers!

Hein.

Jan van den Ende
Honored Contributor

Re: How to optimize subroutine placement in DCL

@Hein:

>>>
For all who had the interest to read this far...

1) Would a slow-motion rundown to show 'what happens behind the scenes' as above be a Boot Camp session you would attend?
<<<

Well, IF I manage to get there (and at this moment chances are slim), THEN I would love to see that!
(And if I can not make it, my guess is that at the next TUD it would also gather interest, certainly mine).

Cheers.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Hein van den Heuvel
Honored Contributor

Re: How to optimize subroutine placement in DCL


Back to the underlying question.
"How to optimize subroutine placement in DCL"

The answer we derived above is that for optimal speed the goal would be to have the caller and the callee live in the same RMS buffer at execution time.

Now this is unrealistic / tedious / next to impossible to arrange.

So the next best thing is to increase the odds.
That is basic common sense for speeding up DCL, but rarely followed.

Notably an easy win:
- restrict the 'documentation' at top to a referral pointing to the bottom.

Or you could make the intro just over 4096 bytes long (counting overhead).
Of course I mean this 99% as a joke, but it could be a neat experiment. If the procedure has a clear 'main loop', for example for a 'main menu', then use filler comment lines to push its start into a fresh RMS buffer. Verify with $DUMP/BLO=(COUN=1,START=9)
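
A rough Python sketch of the padding arithmetic behind that experiment follows. It assumes 4096-byte buffers and 512-byte blocks as seen in the SDA session; the function names are made up. The byte offset meant here is the offset in the file as stored, so it includes the per-record overhead mentioned above, not just the visible text.

```python
BLOCK_SIZE = 512

def padding_to_next_buffer(offset, buffer_bytes=4096):
    """Bytes of filler needed so that something now at `offset`
    starts exactly on the next buffer boundary (0 if already there)."""
    return (-offset) % buffer_bytes

def vbn_of_offset(offset):
    """1-based virtual block number containing a file byte offset."""
    return offset // BLOCK_SIZE + 1

print(padding_to_next_buffer(3000))  # 1096
print(padding_to_next_buffer(4096))  # 0
print(vbn_of_offset(4096))           # 9
```

Byte offset 4096 falls in VBN 9, which is why START=9 in the DUMP command shows the first block of the second buffer.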

Other thoughts....
- Move all usage and maintenance comments to the very bottom, behind rarely executed ('help') routines.

- Consider a 'branch out' at the top to an 'init' section located towards the end, and from there back to the top.

- Consider putting extensive logical name and symbol definitions in an executable program.

- Keep callers and callees close. This could mean NOT putting all subroutines at the end, but sprinkling them amongst the main logic.

- Maintain 2 copies of a performance sensitive script:
1) main version, in CMS
2) executable version, trimmed down using a tool like 'DCLDIET' or 'SQUEEZE' to strip all comments, minimize whitespace, reduce lexical function names to abbreviations, and transform all variables to a0, a1,.. a9, aa, ab,.. az, b0, b1,...
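
To give an idea of the simplest of those transformations, here is a Python sketch that drops full-line comments only. It is not how DCLDIET or SQUEEZE work (their behaviour is not described here); the function name and sample lines are made up. It deliberately leaves trailing comments, data lines and continuation lines alone, since handling those safely needs real parsing of quotes and hyphens.

```python
import re

# A full-line comment: "$", optional whitespace, then "!".
COMMENT_LINE = re.compile(r"^\$\s*!")

def strip_comment_lines(lines):
    """Return the lines with full-line DCL comments removed."""
    return [line for line in lines if not COMMENT_LINE.match(line)]

source = [
    "$! x.com  Full comments at README",
    "$ goto setup",
    "$main:",
    "$ ! pick a part",
    "$ goto part_'x",
    "data line with ! inside",
]

for line in strip_comment_lines(source):
    print(line)
```

Fewer bytes between a caller and its target means better odds that both sit in the same buffer, which is the whole purpose of the trimmed copy.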

Hope this helps,
Hein.


$! x.com first, lastname. Date. Version.
$! Full comments and usage at "README"
$goto setup
$main:

:
$goto part_'x
$part_1:
:
$goto main
:
$! subroutines for part 1 and part 2
:
$part_2:
:
$part_5:
:
$! another bunch of subroutines, notably for part_4 - part_7
:
$part_z:
:
:
$setup:
:
$goto main ! Let's start for real now.
$
$help:
$type sys$input

$!README
$exit

blah blah.



John Gillings
Honored Contributor

Re: How to optimize subroutine placement in DCL

Wim,

An "optimization" I've used for a large suite of DCL is to have the main entry point check the location of the files. If they're on a real disk drive, create a RAM disk, move all the procedures to it and re-execute the new copy from the RAM disk. This gave me about a 15% speedup, even counting the work to set up and tear down the RAM disk. However, it was pre-XFC, so your mileage may vary today.

Another real example... I had a task of parsing some data to gather statistics from several dozen large log files (>1.5GB each). I wrote a DCL prototype and found it was taking about 90 minutes per file. While it was running, I reimplemented the DCL in MACRO32. The MACRO version produced identical output to the DCL, but took less than 60 seconds per file. In this case the end to end runtime of the compiled version was significantly faster than the DCL, including the development, compilation and debugging of the compiled version!

I think sometimes the real potential difference between interpreted code and compiled code is not appreciated. I'd also argue that although DCL is more easily accessible, it's often MUCH harder to write and get it correct than Pascal, FORTRAN, MACRO32 or even C and Basic. It's a fallacy to think that "anyone can do it" when there are so many subtle pitfalls like the myriad of potential single character typos to break programs, and there's no compiler to protect you. Sure, it's great to be able to knock out a page or so of script to run once or twice, but when you're getting into thousands of lines of code, or something that needs frequent modification, or that runs all day, every day, it's well worth putting in the investment to do it properly in the most appropriate language for the task.

That said, please don't get the wrong idea about my attitude to DCL. I write huge volumes of DCL, including procedures of several thousand lines, and even some that run all day, every day (but are not too performance sensitive). There are many things that can be done very easily in DCL that require lots of code in other languages (simple example, string subtraction!), BUT the DCL version often runs orders of magnitude slower.

On the other hand, a DCL advantage in *favour* of performance is to use multiple pipes in order to automatically exploit multiple CPUs and avoid temporary files. I've got lots of DCL that forms itself into complex trees of pipes, with up to a dozen processes, each doing their own subtask. What I lose in process creation overhead, I often gain many times over in avoiding I/O to and from temporary files, and all the directory & file system overhead.

> It also has the advantage that it doesn't
>create extra processes (I pase output of
>ncl, tcpip, ...).

On the contrary! I think you'll find that this could be a fruitful area of optimization. Get rid of all those temporary files! PIPE the output through SEARCH (compiled code, INSTALLed, and highly optimized by some of the best code cutters in the business, like Guy Peleg). Trim it down as much as possible, then back into your DCL parser. Yes it creates extra processes, but compare the overheads! These days avoiding disk I/O is your biggest bang for buck.

The important thing is to realise when you're running up against the inherent limitations of the language (true regardless of what language you're using), and wasting your time trying to achieve a particular objective, when the real solution is to restructure or rewrite in a more appropriate language.
A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor

Re: How to optimize subroutine placement in DCL

John,

$ if day1 .and. first_server
$ then
$   first_server="f"
$   gosub init_found
$   define/us sys$output 'ct_workf'
$   ucx show prot tcp/pa
$   open/read x 'ct_workf'
$r39:
$   read/end=e39 x x_rcd
$   x_rcd=f$ed(x_rcd,"compress")
$   it="Delay ACK:"
$   pos=f$loc(it,x_rcd)
$   if pos .ne. f$len(x_rcd)
$   then
$     f_1="t"
$     x_rcd2=f$extr(pos,80,x_rcd)
$     how=f$el(2," ",x_rcd2)
$     if f$extr(0,3,how) .nes. "dis"
$     then
$       call wms 'E' "TCP prot param ''it' must be disabled on Sybase nodes !"
$     endif
$     goto e39
$   endif
$   goto r39
$e39:
$   close x
$   del 'ct_workf'.*
$   if .not. f_1
$   then
$     call wms "W" "Output scan failed for ucx show prot tcp /pa"
$   endif
$ endif

This is an example of the code. As you can see there is protection against the item "delay ack" not being found. I know I can nest more lexicals but I don't do that, for readability. If something goes wrong, I still have the workfile to see what went wrong.

BTW : the monitoring consumes 10 cpu sec per hour and 5000 DIO (pity that the real IOs are not available in acc). This while hundreds of checks are done, including a full disk scan for versions >30.000.

BTW2 : 1 version running on 5.5, 6.2, 7.2, 7.3. Only the last 2 remain. Thus piping could not be used.

BTW3 : I said ANY system manager. ANY system manager should know basic DCL. Which I can't say of C, FORTRAN or PASCAL (I'm a COBOL guy). And the ideal programming language changes every 10 years.

For me the performance is acceptable. But if I can improve it ...

Wim
Wim
labadie_1
Honored Contributor

Re: How to optimize subroutine placement in DCL

Wim

If you just want to check whether delay ack is enabled or disabled, maybe it is faster to do the following, after the creation of your file:
$ define/us sys$output 'ct_workf'
$ ucx show prot tcp/pa

$ conv /fdl=sys$input 'ct_workf' new
record; format stream;
ctrl Z
so the file created is no longer a one-long-line file.

Then a
$ sea new "Delay ACK",enabled/match=and
and branch according to the result of the search.

The convert is necessary, because as window scale is enabled, you will always find the 2 strings (Delay ACK and enabled) without the convert.

Really a pity that with your versions of VMS, you can't have the same procedure on all nodes, as using PIPE heavily (as John Gillings said) would save a lot.

Wim Van den Wyngaert
Honored Contributor

Re: How to optimize subroutine placement in DCL

Labadie,

Compared the 2 in batch :

DCL : 95 DIO, .17 cpu, 1679 PF
Convert : 97 DIO, .19 CPU, 2371 PF

With set file/at=rfm=stm instead of convert (which is what I use in my DCL):
78 DIO, .16 CPU, 1768 PF

And most of the code searches for many things and I want to know for each item if it was found or not. DCL will be difficult to beat.

Wim
Wim