ISTAT 30 error

SOLVED
LM_2
Frequent Advisor

ISTAT 30 error

We have a series of applications written in FORTRAN that had been working under VMS 7.2 on an ES40.

Upon migrating to VMS 7.3-2 running on an ES45, users started encountering errors that are reported as FORTRAN status = 30.
All file protections are set to World RWED access, as are the directory structures. All user account quotas have been verified as matching those in use under 7.2.

These errors occur randomly, on different files, across several users, and cannot be replicated on demand. It appears to be a function of "time": the longer a user uses the application, the more likely the error is to occur.

Within the FORTRAN modules - we are using the following general flow

Enter the module
- Call LIB$GET_LUN
- Open a file using that LUN
- Do some work on the file
- Close the LUN
- Call LIB$FREE_LUN
Exit the module
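
A minimal sketch of that flow, for readers who want something concrete to compare against. The routine name DO_WORK, its arguments, and the -1 return value are hypothetical and not taken from the actual application; the point shown is only that CLOSE and LIB$FREE_LUN are reached on every path out of the module, including a failed OPEN.

```fortran
      SUBROUTINE DO_WORK (FNAME, IOS)
!     Hypothetical module body; names are illustrative only.
      IMPLICIT NONE
      CHARACTER*(*) FNAME
      INTEGER IOS, LUN, ISTS
      INTEGER LIB$GET_LUN, LIB$FREE_LUN
!
!     Ask the RTL for a free logical unit number.
      ISTS = LIB$GET_LUN (LUN)
      IF (.NOT. ISTS) THEN
         CALL LIB$SIGNAL (%VAL(ISTS))
!        Illustrative value only: no LUN was available.
         IOS = -1
         RETURN
      END IF
!
      OPEN (UNIT=LUN, FILE=FNAME, STATUS='OLD', IOSTAT=IOS)
      IF (IOS .EQ. 0) THEN
!        ... do some work on the file; on any error, do not
!        RETURN from here, fall through to the CLOSE instead ...
         CLOSE (UNIT=LUN)
      END IF
!
!     Free the LUN on every path, including a failed OPEN.
      ISTS = LIB$FREE_LUN (LUN)
      RETURN
      END
```

If any early RETURN inside the work section skips the CLOSE, the file and its channel stay allocated for the life of the image.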

Has anyone seen this, or does anyone have any suggestions? My programmers are ready to pull out their hair!
29 REPLIES
Robert Gezelter
Honored Contributor

Re: ISTAT 30 error

LM,

There are many possibilities. I presume that your programming staff has reviewed the complete description of a return of 30 as described in the Fortran Programmer's Guide?

Although it is not particularly common, I have seen cases where a programming practice that was in the "gray" area of conformance with the specification became an actual error in a later release.

It is difficult to diagnose this problem without additional information. A review of the sources may be in order (certainly, if the problem is occurring in some seemingly random fashion, it may also be appropriate to make some provisions to capture details of each failure so that more extensive diagnosis can be done). Disclaimer: My firm, as do the firms of several other active participants in this forum, provides services in this area.

- Bob Gezelter, http://www.rlgsc.com
Volker Halle
Honored Contributor

Re: ISTAT 30 error

LM,

no reason for your programmers to lose their hair ;-)

You need to improve your error detection and handling:

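! RMS_STS and RMS_STV must be declared INTEGER (implicit typing would make them REAL)
OPEN (..., IOSTAT=ISTAT, ...)
IF (ISTAT .NE. 0) THEN
   ! Fetch the underlying RMS status (STS) and status value (STV)
   CALL ERRSNS (, RMS_STS, RMS_STV,,)
   ! Signal both so the full message text is displayed
   CALL LIB$SIGNAL (%VAL(RMS_STS), %VAL(RMS_STV))
ENDIF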

Taken from the FORTRAN User's Guide Page F-13

http://h71000.www7.hp.com/doc/82final/6443/aa-qjrwd-te.pdf

Read more in chapter 7.3 Handling Errors

Volker.
Hoff
Honored Contributor

Re: ISTAT 30 error

There's not much to go on here.

IOSTAT 30 is OPEFAI, a file open failure.
This error appears to be a catch-all, and could be a channel or quota leak, or could conceivably be some sort of a collision during the file open operation.

I'd probably code a FOR$*OPEFAI return to call for at least a traceback (stackdump), or to fire up the debugger dynamically, to see if I could get some details on the failure.

For the former, dig around for details of the TBK$SHOW_TRACEBACK API, and for the latter dig around for the signal of SS$_DEBUG with some debugger commands. The former did eventually get documented (after V7.3-2?) while the latter has been documented for a while.

Also look around, as there are ways to get at the FAB from within Fortran, and there are fields down in there (fab$l_sts and fab$l_stv) that might shed more light on the particular failure. You might have to use USEROPEN (depending on what caused this OPEN to tip over), but you might also be able to get at the FAB and its fields via FOR$RAB. (Problem here is that the open failed, so the FAB and RAB might be gone by the time you go look; you might have to manage these structures yourself with USEROPEN to see what's really in there.)

V7.3-2 on an AlphaServer ES45 is going to be faster than V7.2 on an AlphaServer ES40, so it's quite conceivable that a latent timing bug has been exposed.

Do check for and load the Fortran and other mandatory ECOs for OpenVMS Alpha V7.3-2 (as per the normal HP support requests), should this become an escalation.

You might want to look at raising the quotas, too. Just because old and existing quotas tend to get stale.

Alternatively, get some outside eyes to take a look at the code.

Stephen Hoffman
HoffmanLabs LLC

DECxchange
Regular Advisor

Re: ISTAT 30 error

Hello,

Did you change any logicals pointing to disc drives where the files are located after the upgrade to the new ES45? That is, did you change any SCSI disc assignments?
Steven Schweda
Honored Contributor

Re: ISTAT 30 error

> My programmers are ready to pull out their
> hair!

One simple (-minded?) approach is to leave
the hair and pull out the simple Fortran
error handling. It's been a long time since
I did anything serious with Fortran (back
when it was still FORTRAN), but, as I recall,
the error messages from an unhandled error
were much more informative than the OPEFAI
you got when you did the right thing.
John Gillings
Honored Contributor

Re: ISTAT 30 error

LM

This may be what Steven was suggesting, but to be sure...

Sometimes the easiest way to find out what a FORTRAN status means is to NOT try and handle it yourself. In simple terms, remove the IOSTAT clause from your OPEN statement and let the program fail.

Obviously not great for production code, but if you're debugging you'll get the complete signal array in the resulting stack dump. Same kind of info as the code Volker is suggesting, but much easier to write the code and get it right.

If you know there are certain status codes you want to handle, code it like this:

OPEN( whatever IOSTAT=stat etc...)
IF(stat.NE.0)THEN
   ! some error
   IF(stat.EQ.error1)THEN
      ! handle error 1
   ELSE IF(stat.EQ.error2)THEN
      ! handle error 2
      ! ...etc
   ELSE
      ! unexpected error - repeat OPEN without IOSTAT
      OPEN( whatever etc...)
   ENDIF
ENDIF

If it is some kind of access problem, turning on file access failure auditing might help too.

$ SET AUDIT/ALARM/ENABLE=FILE=FAIL=ALL

A crucible of informative mistakes
DECxchange
Regular Advisor

Re: ISTAT 30 error

Hello,
You could be running into a record locking problem. Are the files opened with the SHARED keyword in the OPEN statement? One way to get around this error is to put the OPEN statement and its error checking in a loop of so many tries, with a SYS$SETIMR or LIB$WAIT in between, and then try the OPEN again. The same goes for READ. Even though the error might indicate a file OPEN error, it could also occur on a READ. But I think a READ is error code 36 instead of 30. All of these codes are spelled out in an appendix in the FORTRAN Language Reference Manual.
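
A rough sketch of such a retry loop, under some assumptions: the helper name OPEN_RETRY, the limit of five tries, and the one-second pause are all made up for illustration. It only helps if the failure is transient (for example, a sharing conflict); it will not cure a resource that is exhausted and stays exhausted.

```fortran
      SUBROUTINE OPEN_RETRY (LUN, FNAME, IOS)
!     Hypothetical helper: retry a shared OPEN a few times.
      IMPLICIT NONE
      INTEGER LUN, IOS, ITRY, ISTS, MAXTRY
      CHARACTER*(*) FNAME
      PARAMETER (MAXTRY = 5)
      INTEGER LIB$WAIT
!
      DO ITRY = 1, MAXTRY
         OPEN (UNIT=LUN, FILE=FNAME, STATUS='OLD', SHARED,
     1         IOSTAT=IOS)
!        Success: return with IOS = 0 and the file open.
         IF (IOS .EQ. 0) RETURN
!        Pause about one second before the next try. Assumes
!        the default F_floating REAL on Alpha; with IEEE
!        floating point, LIB$WAIT needs its float-type argument.
         ISTS = LIB$WAIT (1.0)
      END DO
!     All tries failed: IOS holds the last IOSTAT value.
      RETURN
      END
```

The caller would still report the final IOS (and ideally the RMS status) when all tries fail, so that persistent errors are not hidden by the loop.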

One of my Alphas has a FORTRAN compiler and is running 7.3-2. I can fire it up and test out some of your code if you'd like.

Some other thoughts. Have any of these FORTRAN programs been recompiled since the OS upgrade? Have you made sure your version of the FORTRAN compiler is compatible with the new OS?

I would think that if this were just an OS upgrade and no code changes, your code should function pretty much the same. It might be these problems existed before and users just lived with them. But now users may be blaming them on your upgrade.

However, I see you are also going from an ES40 to an ES45. Have you made sure that all of your disc drives are pointed to according to where the application is expecting to see them? That is, have you checked your logical name assignments?

Greg Miller
DECxchange
DECxchange
Regular Advisor

Re: ISTAT 30 error

Hello,
I compiled and linked a small piece of FORTRAN code that opened a nonexistent file.

      PROGRAM open_file

      IMPLICIT NONE

      INTEGER iostatus,filun

      filun = 1

      OPEN(filun,FILE='decxchange.dat',STATUS='OLD',IOSTAT=iostatus,ERR=10)

10    CONTINUE
      WRITE(6,*)'iostatus = ',iostatus

      END

$ fort open
$ link open
$ run open
iostatus = 29

I could not reproduce an error 30. I'm not exactly sure what circumstances you are running into without knowing more about your OPEN statement (what parameters you are using to open the file).

In fact, I tried using a LUN of 0, just in case your LIB$GET_LUN is not working for some reason. But I still got an error of 29.

I also experimented with creating a file with the VMS CREATE command at the $ prompt and then opening the file with FORM='FORMATTED' and then FORM='UNFORMATTED' added to the OPEN statement. However, the file would open with an iostatus of 0, meaning it opened the file OK. I know that normally you can't mix formatted and unformatted under some conditions. But I haven't fully played around with this yet. I also messed around with RECORDSIZE and RECL. But I could not get an error 30. Just 29.

Greg Miller
DECxchange
Hein van den Heuvel
Honored Contributor

Re: ISTAT 30 error

As others indicated, you really want to get the program to display the underlying RMS STS (and optional STV) fields. That will be a good, permanent, investment to the file.

And you may want to monitor process quotas for a few sample users.

>> randomly, on different files, across several users
>> the longer use utilizes the application - the more likely it is to occur.

So clearly nothing to do with logical names or disk confusion during the migration, because if those are wrong it will never work.

Also not likely anything to do with file locking, although admittedly timing may have changed.

And the error is on Open, so it can have nothing to do with a timing window on a record lock.

Those 'over time' errors tend to be QUOTA related with some form of leaking added in.

So just monitor a 'prone' user for a while.
Ideally with the SHOW PROC/CONT 'q' screen, but that's not available until OpenVMS 8.3.

But you probably have, or can google for, a DCL script with GETJPIs to display the various COUNTS vs LIMITS.
Or just use ANALYZE/SYSTEM ... SHOW PROC
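
Another option is to have the application report its own counts. The following is a sketch only: the helper name LOG_FILCNT and its output format are invented here, and it watches just one quota (remaining open files, JPI$_FILCNT, against the limit JPI$_FILLM) for the current process. Other quotas would need their own item codes, and channels assigned by means other than opening files would not show up in FILCNT.

```fortran
      SUBROUTINE LOG_FILCNT (TAG)
!     Hypothetical helper: print the remaining open-file quota.
      IMPLICIT NONE
      INCLUDE '($JPIDEF)'
      CHARACTER*(*) TAG
      INTEGER ISTS, FILCNT, FILLM
      INTEGER LIB$GETJPI
!
!     Omitted process-id and name mean "the current process".
      ISTS = LIB$GETJPI (JPI$_FILCNT,,, FILCNT)
      IF (ISTS) ISTS = LIB$GETJPI (JPI$_FILLM,,, FILLM)
      IF (.NOT. ISTS) THEN
         CALL LIB$SIGNAL (%VAL(ISTS))
         RETURN
      END IF
      WRITE (6,*) TAG, ': files left ', FILCNT, ' of ', FILLM
      RETURN
      END
```

Calling it at entry to and exit from each module (for example CALL LOG_FILCNT ('PICK entry')) should show the same remaining count each time; a count that keeps dropping points at a file that is opened but never closed.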


Running out of channels? (SYSGEN/SYSUAF)
Running out of PAGFILQUO?

And monitor Virtual memory.
Are those files using RMS GLOBAL BUFFERS?
Running out of P0 (GETJPI FREP0VA)?
GBLSECTIONS, GBLPAGES?

What else changed?
Did you switch to XFC? SHOW RMS?
Were all 7.3-2 patches applied?

Hope this helps some,
Hein van den Heuvel
Willem Grooters
Honored Contributor

Re: ISTAT 30 error

For what it's worth: if you need a continuous view of evolving process quotas, PQUOTA is a program that might help:
http://vms.process.com/scripts/fileserv/fileserv.com?PQUOTA

WG
Willem Grooters
OpenVMS Developer & System Manager
John Gillings
Honored Contributor

Re: ISTAT 30 error

re: DECxchange:

>Have any of these FORTRAN programs
>been recompiled since the OS upgrade?
>Have you made sure your version of the
>FORTRAN compiler is compatible with the
>new OS?

Just to clear up this misconception.

On OpenVMS it is NEVER necessary to recompile or relink a correct, user mode program after upgrading OpenVMS. User mode code is guaranteed to be upwards compatible (OpenVMS engineering go to great lengths to fulfill that promise). It may be necessary for some other operating systems, but not OpenVMS (indeed, I believe there are some programs in the VAX Fortran regression test suite that were compiled and linked on VMS V1.0 in 1978)

There's unlikely to be any significant benefit to be gained from recompiling (with the possible exception of forcing you to make sure you know where your sources are so you don't lose them!).

You don't even need to recompile or relink to benefit from improvements or bug fixes in run time libraries, as you will automatically use the newer version.

Since compilers are essentially user mode text processors, there is no such thing as a compiler which becomes "incompatible" with a new version of OpenVMS. Many compilers support having multiple versions installed on the same system.

The biggest benefit of most new compilers is new features, which, by definition aren't required for existing code. There may be some bugs fixed (but again, working code doesn't need them), and there may be some improvements in generated code quality (but rarely significant enough to warrant dredging out all your old code for recompiling).
A crucible of informative mistakes
LM_2
Frequent Advisor

Re: ISTAT 30 error

After implementing the ERRSNS calls as recommended - we now see this - which is what I was expecting and does confirm my suspicions that the cause of the ISTAT = 30 is the fact that the process has exhausted all available channel allocation space. I cannot find any reason for this allocation failure within the code.

IN MODULE: BLINK_WAVE_WORK_ORDER
ERROR DURING OPEN
STATUS: 30
%RMS-F-CHN, assign channel system service request failed
%SYSTEM-F-NOIOCHAN, no I/O channel available

HOWEVER - we are also seeing this within the user log files - I have not seen this before, but is most likely related to the problem we're seeing with the istat 30 issues. ( Read the HP Wizard response below) I am not saying that we have the EXACT situation - but the "%DEBUGBOOT" prefix of the message worries me. This should not happen - it is my understanding that the DEBUGBOOT handler only kicks in after all other means have failed.

%DEBUGBOOT-W-CHN, assign channel system service request failed
%DEBUGBOOT-W-CHN, assign channel system service request failed
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000030000000000, PC=000000000019DCFC, PS=0000001B
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000030000000000, PC=000000000019DCFC, PS=0000001B
Steven Schweda
Honored Contributor

Re: ISTAT 30 error

mcr sysgen show CHANNELCNT

Done an AUTOGEN lately?
Steven Schweda
Honored Contributor

Re: ISTAT 30 error

Motivation:

alp $ help /mess NOIOCHAN

NOIOCHAN, no I/O channel available

Facility: SYSTEM, System Services

Explanation: The process exceeds the number of I/O channels that can be
assigned at one time.

User Action: Deassign another channel, or close a file and retry the
operation. Check for a program error that fails to deassign
channels or close files. Also check the SYSGEN parameter
CHANNELCNT to see if it is high enough.
LM_2
Frequent Advisor

Re: ISTAT 30 error

I have a two node cluster - here is what the channel count is on both nodes:

SYSGEN> SHOW CHAN
Parameter Name            Current    Default     Min.       Max.    Unit      Dynamic
--------------            -------    -------    -------    -------  ----      -------
CHANNELCNT                   2446        256         31      65535  Channels


SYSGEN> SHOW CHAN
Parameter Name            Current    Default     Min.       Max.    Unit      Dynamic
--------------            -------    -------    -------    -------  ----      -------
CHANNELCNT                    256        256         31      65535  Channels
Volker Halle
Honored Contributor
Solution

Re: ISTAT 30 error

LM,

the SYSGEN parameter CHANNELCNT specifies the maximum number of channels a process can have assigned at one time. This parameter does get monitored by AUTOGEN. Consider checking and increasing this parameter. The parameter is not dynamic, so you have to reboot to change it.

You can check running processes with:

$ ANAL/SYS
SDA> SET PROC/id=
SDA> SHOW PROC/CHAN
SDA> EXIT

to see whether the number of open channels gets near the value of CHANNELCNT. There may be a channel leak, i.e. a process may forget to deassign channels.

Volker.
Volker Halle
Honored Contributor

Re: ISTAT 30 error

LM,

typo warning - I meant to say: CHANNELCNT does NOT get monitored by AUTOGEN feedback.

CHANNELCNT=256 seems a little bit low on the 2nd node. Consider setting it to the same value as on the first node. Could 'randomly' mean that processes on the 2nd node see the error, while those on the first node seem to work?

Volker.
LM_2
Frequent Advisor

Re: ISTAT 30 error

I have been monitoring a few users, and it looks like these users run the same "custom code" over and over again without exiting the command. The more times they run this piece of code in one session, the more likely they are to get this error message. To clarify: they are in our picking software, pick a part, then continue on to pick a different part, so they could be in the exact same command for over an hour. It seems like they reach a limit and are kicked out. That's why I am unsure about bumping up some parameters: at this point I am not sure if it's a system (sysgen) issue - or a program issue.

I am not sure if it is happening on just one particular node or not. I can watch to clarify this.
Steven Schweda
Honored Contributor

Re: ISTAT 30 error

Around here:

ALP $ mcr sysgen show CHANNELCNT
Parameter Name            Current    Default     Min.       Max.    Unit      Dynamic
--------------            -------    -------    -------    -------  ----      -------
CHANNELCNT                   4096        256         31      65535  Channels


It's been a while, but there was probably
some reason for the "MIN_CHANNELCNT = 4096"
in my MODPARAMS.DAT. For some possibilities:

search sys$help:*.RELEASE_NOTES channelcnt
Volker Halle
Honored Contributor

Re: ISTAT 30 error

LM,

the %DEBUGBOOT message arises when the image aborts due to an error and the condition handler tries to activate TRACE.EXE to print the stack trace information. If there is a resource error (insufficient I/O channels available) at that time, you will get this error message.

There should be no problems, if users execute the same code over and over again, EXCEPT if there is a coding error and some channel or file does not get closed.

Watch a user with SDA> SHOW PROC/CHAN/ID=

If there is a channel leak, you will see the number of open channels increasing.

Volker.
Steven Schweda
Honored Contributor

Re: ISTAT 30 error

> [...] I am not sure if it's a system
> (sysgen) issue - or a program issue.

Some of each, I'd guess. In any case, you
know what to look for.
Volker Halle
Honored Contributor

Re: ISTAT 30 error

LM,

can you check the value of CHANNELCNT on your old V7.2 system ?

I would also suggest using SET PROC/DUMP so that you'll get an image dump if the image incurs an improperly handled condition, but as the %DEBUGBOOT-W-CHN error already indicates, there may not be enough resources (i.e. channels) available for an image dump to be written...

Volker.
LM_2
Frequent Advisor

Re: ISTAT 30 error

I did check my old system and I had it set to

MIN_CHANNELCNT = 2446


One of my cluster nodes already has it in MODPARAMS.DAT with 2446, so I added it to my other node, and as soon as I can reboot I will see if that fixes the problem.
Hoff
Honored Contributor

Re: ISTAT 30 error

FWIW, having a requirement for 2,000 channels implies a fairly unusual application. Your description here could indicate that there are channels being allocated, but not released. In other words, you may have a channel leak somewhere.

Use ANALYZE/SYSTEM and display one of these running Fortran processes at intervals, with SHOW PROCESS/CHANNELS or other such. If you see the numbers of channels increasing over time, then increasing CHANNELCNT is a stopgap on the way to an eventual failure.

And as for rebuilding your source code, there are specific cases where rebuilding is required due to bugs in the generated code. (See the srm_check tool for details of one of the more notable cases where this occurred, http://mvb.saic.com/freeware/freewarev40/21264/ and there's a long-standing case with the VAX C code generation (VCG) and its known problems with MD5 that's not likely going to get fixed.) There are also cases where the newer compilers will better spot latent bugs, or generate better code. So there can be some advantages to rebuilding from source, even for the many cases where it's not required.