RMS DME and ISI errors from batch

PaulGavin · 11-28-2006

I have been around VMS since the 80's, including 9 years at DEC, and have never come across this set of problems. Any comments would be welcome.

One batch job started getting DME errors a few weeks ago, and another job started getting them last week. These jobs have been running for several years and have only now started failing.

AlphaServer ES40 1-CPU OpenVMS V7.2-2

The first job uses a DCL loop to selectively copy records from one file to another. Many jobs on this system do this sort of thing. The two statements just before the error, and the error itself, are:

...
$ open/read in sumgoct.in
$ open/write txt DISK$COMETS_xxx_xx:[PROD]GOCT_FW47_SUM.TXT;0
%RMS-F-DME, dynamic memory exhausted

The second job captures the output from a command so that only error messages end up in the output log. As with the first job, this technique is used in many other places, and those are not failing.

$ Define sys$output pmapf.log
$ ftp 'machine'/username="'username'"/password="'password'" -
/input=pmapf.cmd

The pmapf.log file contains the FTP command, immediately followed by the DME error, just like the other job.

If I take out the define and the code looking at the log, the command runs fine.

When I change the second job to put the FTP command itself into the pmapf.cmd file, I get a different error:

$ Open/write cmd pmapf.cmd
$ write cmd "$ ftp xxxxxx.xxx.atmel.com/username=""......
$ write cmd "ascii"
$ write cmd "put ........
$ write cmd "exit"
$ write cmd "$"
$ Close cmd
$ @pmapf.cmd/out=pmapf.log
%RMS-F-ISI, invalid internal stream identifier (ISI) value

As before, the pmapf.log file contains only the FTP command.

Any ideas out there? Thanks in advance.
Paul Gavin
pgavin@cso.atmel.com

Steven Schweda · 11-28-2006

HELP /MESSAGE DME

You might simply have process memory limits which are too small.

If DCL gets an ISI, it's having real problems in places it wasn't expecting them. Could be the same cause.

GuentherF · 11-28-2006

A DME error from RMS can have three causes:

a) You run out of process dynamic memory. Use 'SHOW PROCESS/MEMORY' to check the available space. All your process logical names are stored in this area, so any increase in process logical names could cause this. The size is controlled by the SYSGEN parameter CTLPAGES.

b) The virtual address space allowed in P0 is too small. This is controlled by the process quota PGFLQUOTA. A low PGFLQUOTA is most likely not the problem here (with a DCL copy loop), but it is worth a check.

c) Somehow the RMS default parameters were increased, either for the process ($ SHOW RMS) or system-wide (SDA>SHOW/RMS). This puts a higher demand on the virtual address space where RMS puts its buffers: either P0 space, controlled by PGFLQUOTA, or P1 space, controlled by CTLPAGES.
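The three checks above can be run from DCL roughly as follows. This is a sketch, not a complete diagnostic: it only displays values, it should be run in the affected process (for example near the top of the failing batch job), and the symbol name pgfl is illustrative.

```dcl
$ ! (a) Process dynamic memory in P1 space (sized by CTLPAGES)
$ SHOW PROCESS/MEMORY
$ ! (b) Paging file quota, which bounds P0 virtual address space
$ pgfl = F$GETJPI("", "PGFLQUOTA")
$ WRITE SYS$OUTPUT "PGFLQUOTA = ''pgfl'"
$ ! (c) Process and system RMS defaults side by side
$ SHOW RMS_DEFAULT
$ ! Current SYSGEN values for the P1 regions discussed in this thread
$ MCR SYSGEN SHOW CTLPAGES
$ MCR SYSGEN SHOW PIOPAGES
```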

The ISI error, however, smells like something else is going on. It sounds more like a bad RMS image on disk (wild guess).

/Guenther

PaulGavin · 11-28-2006

Thanks for the quick replies. Going down the DCL path, I raised the PIOPAGES parameter by 33% and got the same result. I can duplicate the error for both jobs by running the procedures interactively on different nodes.

Pgflquo is 500000
Dropped RMS_DFMBC from 120 to 32
set rms/block=16/netw=8
set rms/block=32/netw=8/sys
logout/in

Still same error. What am I missing?

BTW, when I changed '@pmapf.cmd/out=pmapf.log' to 'spawn/out=pmapf.log @pmapf.cmd', the second job runs.
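For reference, the SPAWN workaround looks like this in the procedure. It is a sketch that assumes pmapf.cmd is built exactly as shown earlier in the thread. The subprocess presumably succeeds because it starts with its own fresh P1 space and its own process-permanent files. SPAWN waits for the subprocess to finish by default.

```dcl
$ ! Failing form (%RMS-F-ISI):
$ !   $ @pmapf.cmd/output=pmapf.log
$ ! Workaround: run the same procedure in a subprocess,
$ ! sending the subprocess output to the log file
$ SPAWN/OUTPUT=pmapf.log @pmapf.cmd
```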

Thanks in advance!

Hein van den Heuvel · 11-28-2006

Hey there Paul!

(I walked through the ZK1 lab earlier today, pretty much at the very spot we last met in Nashua :-).

The ISI error can be a secondary effect, where a program ignores a $CONNECT error.
For example, I recall that in some V7.2 version a SYSUAF with global buffers triggered a DME error on the $CONNECT (as well as RMS-F-CRMP) when the system was misconfigured, which in turn caused an ISI error because the $CONNECT status was not checked.

Guenther indicated CTLPAGES, but you may also want to check PIOPAGES, and make sure no more than 63 PPFs (process-permanent files) are open (DCL OPEN, command files, DCL /OUTPUT), because the PPF design leaves only 6 bits for the IFI embedded in the escape sequence.
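One way to see how much PPF headroom a process actually has is to keep opening process-permanent files until DCL refuses. The sketch below is hypothetical: the logical names PPF_n and the labels are made up for illustration, and it should only be run on a test node. Each DCL OPEN of NL: creates a PPF, and SYS$INPUT, SYS$OUTPUT, SYS$ERROR and any open command procedures already occupy slots, so expect a count below 63. If PIOPAGES is tight, OPEN may fail with DME well before the 63-file limit is reached.

```dcl
$ ! Hypothetical probe: open NL: until OPEN fails, report, then clean up
$ count = 0
$ open_loop:
$ OPEN/WRITE/ERROR=open_failed ppf_'count' NL:
$ count = count + 1
$ GOTO open_loop
$ open_failed:
$ status = $STATUS
$ WRITE SYS$OUTPUT "OPEN failed after ''count' files: ", F$MESSAGE(status)
$ ! Close everything the probe opened
$ i = 0
$ close_loop:
$ IF i .GE. count THEN GOTO done
$ CLOSE ppf_'i'
$ i = i + 1
$ GOTO close_loop
$ done:
$ EXIT
```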

Is this FTP activation somewhere deep in a stack of open command files? The ISI could come from DCL re-opening the stream and failing to do so. (Of course it should not fail, but if you are living on the edge of CTLPAGES/PIOPAGES, maybe some fragmentation has set in.)

The spawn, of course, creates a fresh process with fresh P1 memory at its disposal, so I'm not much surprised that works. Good workaround for now.

If there is still a problem, can you try this with privileges?
Grab some CMKRNL and do a SET WATCH FILE/CLASS=MAJOR.
(To disable: SET WATCH FILE/CLASS=NONE)
Now reproduce.
This should help you identify which file triggers the problem.
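Put together, the SET WATCH suggestion might look like this. SET WATCH is undocumented; the command and class names are the ones given above, and the sketch assumes the account is authorized for CMKRNL. SET NOON keeps a procedure running after the expected failure so that tracing is switched off again.

```dcl
$ SET NOON                               ! continue after the expected error
$ SET PROCESS/PRIVILEGES=CMKRNL
$ SET WATCH FILE/CLASS=MAJOR             ! trace file system activity
$ @pmapf.cmd/OUTPUT=pmapf.log            ! reproduce the failure
$ SET WATCH FILE/CLASS=NONE              ! stop tracing
$ SET PROCESS/PRIVILEGES=NOCMKRNL
$ SET ON
```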

For debugging you might also try a $WAIT 1:0:0 just before the FTP and then ANAL/SYST from a different session.
Check things like:
SHOW PROCE/CHAN
SHOW PROCE/RMS=(FAB,RAB,BDBSUM,PIO)

[PIO requests process I/O rather than the default image I/O. An alternative is to define PIO$GW_IIOIMPA to be PIO$GW_PIOIMPA.]
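A sketch of that two-session approach. The PID xxxxxxxx is a placeholder to fill in (for example from SHOW SYSTEM), and analyzing the running system with ANALYZE/SYSTEM needs privilege (CMKRNL). The SDA commands are the ones listed above.

```dcl
$ ! Session 1: in the failing procedure, pause just before the FTP step
$ WAIT 1:00:00
$ ! Session 2: find the PID of session 1 (e.g. with SHOW SYSTEM), then:
$ ANALYZE/SYSTEM
SDA> SET PROCESS/ID=xxxxxxxx
SDA> SHOW PROCESS/CHANNEL
SDA> SHOW PROCESS/RMS=(FAB,RAB,BDBSUM,PIO)
SDA> EXIT
```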

Are there DECnet files involved?
No SET DEFAULT NODE::, right?

Cheers,
Hein.

$ help /mess/fac=rms dme

"For process permanent files (such as DCL OPEN, DCL command procedures using "@filename," SYS$INPUT, SYS$OUTPUT, SYS$ERROR, and batch log files), the size of available memory is governed by the SYSGEN parameter PIOPAGES. The number of buffers and their sizes is governed by the DCL command SET RMS. Only 63 process permanent files can be open at once; any attempt to open more such files produces this message."

labadie_1 · 11-29-2006

Check how much CTLPAGES space is free with the DCL procedure at
http://h18000.www1.hp.com/support/asktima/operating_systems/CY-1021490401-1.html

It reports what
$ show process/memory/id=xxx
would, if /ID= were valid there; as you have noticed, it is not a valid qualifier with /MEMORY.

:-)

Raise CTLPAGES and reboot

Art Wiens · 11-29-2006

Just a WAG, but if this does have something to do with a maximum number of open files, perhaps there is some "extreme" fragmentation in some of the files involved? Has anything changed with regard to which files are on which disk, or which system hosts them?

Just a thought,
Art

PaulGavin · 11-29-2006

Thanks everyone! Still working on it and will update with what we end up doing.

John Gillings · 12-05-2006

Paul,

Check the output of SHOW RMS. DME and ISI errors can result from having a non-zero multibuffer count for sequential disk files. The reason is that process-permanent files (i.e., those opened with the DCL OPEN command) survive image rundown; therefore, any RMS structures associated with them, like local buffers, must live in the process dynamic region. When that gets too full, you get DME errors.

In extreme cases for batch jobs, the memory cost of just SYS$INPUT and SYS$OUTPUT can consume enough space that the job can't even start.

Check the RMS defaults. It's usually better to have a process level SET RMS command in LOGIN or SYLOGIN than set the system wide default.
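A minimal LOGIN.COM fragment along these lines, assuming the goal is to keep the sequential multibuffer count small for this user's processes without touching the /SYSTEM values. The value 2 is only an example. As far as I know, a process value of 0 means "use the system default", so a process setting of 0 does not by itself shield a process from a large system-wide value. It can also only affect files opened after LOGIN.COM runs; a batch job's SYS$INPUT and SYS$OUTPUT are presumably already open by then.

```dcl
$ ! LOGIN.COM: process-level RMS defaults (no /SYSTEM, so other users are unaffected)
$ SET RMS_DEFAULT/SEQUENTIAL/BUFFER_COUNT=2
$ SHOW RMS_DEFAULT   ! confirm process and system values
```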

You may get a better idea of what's going on by placing a few SHOW PROCESS/MEMORY commands to monitor consumption.
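For example, bracketing the first job's OPEN statements with SHOW PROCESS/MEMORY shows how process dynamic memory usage changes as each file is opened. This is a sketch; the output file specification is shortened here, since the original includes a device and directory that are elided in the post.

```dcl
$ SHOW PROCESS/MEMORY                 ! baseline
$ OPEN/READ in sumgoct.in
$ SHOW PROCESS/MEMORY                 ! after opening the input file
$ OPEN/WRITE txt GOCT_FW47_SUM.TXT
$ SHOW PROCESS/MEMORY                 ! after opening the output file
$ CLOSE txt
$ CLOSE in
```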

A crucible of informative mistakes

Hein van den Heuvel · 12-05-2006

John, DCL processes do indeed listen to SET RMS/SYS/SEQ/BUF=x, but as far as I know not to /BLO=y.

The buffer size is fixed at a low 1 block (which makes bulk work in DCL slow). So even with a typical 4 or 8 buffers, not too much damage will be done, no?

Anyone with more than a handful of sequential file buffers by system default is either very smart or very silly, and deserves what they get.

Hein.

PaulGavin · 12-06-2006

The process RMS defaults are set to BLOCK=16, NETW=8, EXTEND=8 and all others are zero. System defaults are much higher with BLOCK=120 and EXTEND=90. Setting the process defaults lower did not make a difference. In reviewing one of the jobs, I found three possible causes of recursive GOSUBs and corrected those, but the problem persisted.

I am able to duplicate the problem on a backup node that, other than having a single CPU instead of four, is a duplicate of the node where the failure started. After more looking and monitoring, I first raised the dynamic parameter PIOPAGES by 50, with no change. I put PIOPAGES back where it was, raised CTLPAGES by 50 and rebooted, again with no change. Then I raised PIOPAGES as well, and the DME error went away.
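The dynamic PIOPAGES change can be made in SYSGEN roughly as shown. This sketch assumes CMKRNL; nnn is a placeholder, not a recommended value. WRITE ACTIVE changes only the running system, so the value is lost at the next reboot unless it is also recorded (for example in MODPARAMS.DAT), and CTLPAGES, as found above, needs a reboot. Presumably only processes created after the change get the larger area, so log in again (or resubmit the job) before retesting.

```dcl
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SHOW PIOPAGES
SYSGEN> SET PIOPAGES nnn
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT
```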

I plan to test using a smaller increase. I am concerned about what these increases might do on the other cluster nodes. The node where the DME was a problem is mission critical and averages 300+ users and 677 processes. Moving the offending jobs to another node and raising the two parameters on that node only may be an option.

Hein, I think I remember that day as well.

Hein van den Heuvel · 12-06-2006

Those are all puny default values.

blocks=16 was the default for the first two decades. For this decade VMS switched to 32. That's still small for applications writing and reading large sequential files, and your system default seems to reflect that thought. You probably should also set buffers to 4 or so. 2 is pretty much the minimum, and beyond 4 we often get rapidly diminishing returns for the average case.

extend=8 is ridiculous. A step backwards, if anything. RMS will at least give the file 2 buffers' worth, and the file system rounds that up to the nearest multiple of the cluster size.
Such a low extent is only acceptable if the application only makes lots of little files.
It is often not unreasonable to set this to 1024 or more. The system will truncate back to the nearest cluster on close.
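These suggested process defaults could be expressed as a single SET RMS_DEFAULT command, for example in LOGIN.COM. The numbers are illustrative picks from the ranges above (32 blocks, 4 buffers, an extend of 1024), not tuned values.

```dcl
$ SET RMS_DEFAULT/BLOCK_COUNT=32/SEQUENTIAL/BUFFER_COUNT=4/EXTEND_QUANTITY=1024
$ SHOW RMS_DEFAULT   ! verify the new process values
```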

>> I plan to test using a smaller increase

Don't waste too much time there.
That 50 is a small increase, unless you need to support many thousands of processes. I would say if 50 more fixed it for now, then give it 256 more (for both) and sleep more easily. It is still VIRTUAL memory. If the bulk of those hundreds of processes do not need it, they will not touch it, and no real memory will be instantiated for that flavor of process memory. And normally, unlike P0 space perhaps, there is not much competition for P1 address space, so that should not be an issue. Only processes going wild will be stopped a little later, having more resources to exhaust.
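Making the suggested 256-page increase permanent would typically go through MODPARAMS.DAT and AUTOGEN. A sketch, assuming AUTOGEN's ADD_ prefix (which adds to the value AUTOGEN would otherwise set) and that only the node running the offending jobs is changed, since each node normally has its own MODPARAMS.DAT. Review the AUTOGEN report before rebooting.

```dcl
$ ! Append to SYS$SYSTEM:MODPARAMS.DAT on the node that runs these jobs:
$ !   ADD_CTLPAGES = 256
$ !   ADD_PIOPAGES = 256
$ ! Recompute and write the parameters; the new values take effect at the next reboot
$ @SYS$UPDATE:AUTOGEN GETDATA SETPARAMS NOFEEDBACK
```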

fwiw,
Hein.

Hein van den Heuvel · 12-07-2006

Side note... I was referring to another Paul Gavin. You worked in Colorado back then. I was thinking of a bloke from Scotland who actually still works for HP, now in Cupertino doing SQL Server stuff. He did a presentation at HP Tech Forum last September in Houston, where I just missed him when I did my RMS presentations.

Hein.

PaulGavin · 12-08-2006

I met so many folks in my last few years at DEC that I cannot remember many names. I spent quite a bit of time in the Spitbrook facility in relation to various projects. The group I worked for supported global sales efforts, and we were constantly lobbying engineering for enhancements to gain any little bit of competitive edge. I do remember demonstrating a still-missing feature in RMS to a group of people in a lab at ZKO. So if someone says they met me, I am not going to dispute it.

Unfortunately I also know the fellow you are talking about and wondered if you had us confused.

Back on the topic, we plan to increase the PIOPAGES and CTLPAGES parameters during a very short shutdown period Christmas day. Thanks for all of the input.
