Operating System - OpenVMS

Re: $GETRMI returning SS$_SUSPENDED

 
Mark Finn
New Member

$GETRMI returning SS$_SUSPENDED

This error code is not documented as a possible return value of $GETRMI. What does it mean, and what might I be doing wrong to cause it? Thanks for any help anyone can give.
13 REPLIES
Jon Pinkley
Honored Contributor

Re: $GETRMI returning SS$_SUSPENDED

Can you show us the code?

Just the minimum to reproduce what you see?

Jon
it depends
John Gillings
Honored Contributor

Re: $GETRMI returning SS$_SUSPENDED

Mark,

You're probably not doing anything wrong. The exact reason is most likely dependent on what your item list is asking for and the state of the system at the time of your call.

Generally the system uses SS$_SUSPENDED when "something" is preventing the collection of data from a process. For example, SHOW PROCESS/CONTINUOUS will say "Process is suspended" when it's really in a MWAIT state, and thus can't respond.

Although it's not really a correct usage of SS$_SUSPENDED, it's expedient.

I'd guess that you're asking for a statistic that needs to be gathered from another process, and at the time the process is not responding. This may be a symptom of a real problem, or could be just timing (for example, the process is in RWSCS, waiting for a response from another node).

Post a summary of your item list, and maybe we can have a guess as to which item is the cause.

Of course, I'm assuming that this is an occasional, transient error. If it's repeatable, can you trim down your item list to the minimum required to get the error?

If it's transient, you need to decide what data to use for your missing sample point. Zero? Infinity? Missing? What data does $GETRMI return (if any)?
A crucible of informative mistakes
Mark Finn
New Member

Re: $GETRMI returning SS$_SUSPENDED

$GETRMI returns good status; the SS$_SUSPENDED is coming in iosb.L0.

The error happens every few minutes, so repeating it is not a problem. Unfortunately, while I could iteratively remove itemcodes until I find the culprit, it would be very impractical because I'd have to release the software each time (it's running in a production environment and is not having this problem in the development environment - of course), and each release takes months.

I can list the itemcodes. Here they are:
RMI$_CPUIDLE, RMI$_CPUINTSTK, RMI$_CPUMPSYNCH, RMI$_CPUKERNEL, RMI$_CPUEXEC, RMI$_CPUSUPER, RMI$_CPUUSER, RMI$_DIRIO, RMI$_BUFIO
Jon Pinkley
Honored Contributor
Solution

Re: $GETRMI returning SS$_SUSPENDED

Mark,

What is different between the development environment and production environment? Lack of load? Single processor vs. SMP? Different versions of VMS? Different architectures?

What version of VMS is running on your production server, and what type of processor is it?

Here's the description from the SSREF manual (July 2006, OpenVMS I64 Version 8.3 OpenVMS Alpha Version 8.3)

--------------------------------------------------------------------------------

$GETRMI

Returns system performance information about the local system. $GETRMI is an asynchronous system service and requires the $SYNCH service or another wait-state synchronous mechanism to guarantee that the required information is available. There is no synchronous wait form for this system service.
For additional information about system service completion, see the Synchronize ($SYNCH) service.


--------------------------------------------------------------------------------

Format
SYS$GETRMI [efn] [,nullarg] [,nullarg] ,itmlst [,iosb] [,astadr] [,astprm]


--------------------------------------------------------------------------------

So, if you are using only the documented functionality, it isn't clear to me what process it could be waiting on (re John Gillings' comment). It isn't like $GETJPI, where process-specific information is being retrieved, and the documentation suggests it can't return information from another node in a cluster.

The data being requested is coming from cells in S0, for the itemcodes you list.

Are param 2 and 3 specifying 0 by value, or are you treating them like a $GETSYI call?

Does your $getrmi call look similar to this (lifted from http://www.eight-cubed.com/examples/framework.php?file=sys_getrmi.c )

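r0_status = sys$getrmi (efn,       /* efn: event flag set on completion */
                        0,         /* nullarg */
                        0,         /* nullarg */
                        itemlist,  /* itmlst */
                        &iosb,     /* iosb: receives the completion status */
                        0,         /* astadr: no AST routine */
                        0);        /* astprm */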


Are you using an event flag or ast completion for notification that the data is ready? If you aren't synchronizing, that could explain why it appears to work on a lightly loaded system, but sometimes fails on your production system.

$GETRMI is much more likely to have bugs than $GETSYI, since it is relatively new (7.3-2?) and is probably used much less frequently than $GETSYI. So it is possible that there is a bug or undocumented feature. But there is also the possibility that your code has a bug, and since we can't see how you are calling the service, and what synchronization you are using, a bug there can't be ruled out.

Can you try running the program on your test system with a low priority, and generate some load with something like
sys$test:uetp.com and possibly some compute intensive processes?

Good luck,

Jon
it depends
Hein van den Heuvel
Honored Contributor

Re: $GETRMI returning SS$_SUSPENDED

I concur with Jon's initial question, and detailed follow up... show me the code!

>> This error code is not documented as a possible return value of $GETRMI.

Correct, and it looks like it is not a system service return code.

>> $GETRMI returns good status; the SS$_SUSPENDED is coming in iosb.L0.

Is the code waiting to look into the iosb until the request is done, typically after a $synch call?

What is the scope of the iosb variable?

The only system service documented to return SS$_SUSPENDED is SYS$GETJPI. Is the program also using that service?

Is the program using the same iosb for both calls?

Good luck,
Hein.



John Gillings
Honored Contributor

Re: $GETRMI returning SS$_SUSPENDED

Mark,

> $GETRMI returns good status; the SS$_SUSPENDED is coming in iosb.L0.

This is normal for an asynch service. The good return status means your call was well formed. The iosb status is the result of the request. I'm assuming you're properly synchronizing with $SYNCH, or equivalent?
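
As a minimal sketch of that two-stage check (not the poster's code): the function below tests the service's return status first, then the first longword of the IOSB, using the OpenVMS convention that a condition value with its low bit set indicates success. The `iosb_t` layout and the `classify` helper are names invented here for illustration; on a real system the two values would come from sys$getrmi and, after $SYNCH, from the IOSB.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative IOSB layout: the first longword receives the final
   completion status; the second is not used in this sketch. */
typedef struct {
    uint32_t status;
    uint32_t reserved;
} iosb_t;

/* OpenVMS condition values have the low bit set on success. */
#define VMS_SUCCESS(s) (((s) & 1u) != 0)

typedef enum {
    REQ_OK,          /* call accepted and request completed successfully */
    REQ_NOT_QUEUED,  /* the call itself was rejected (bad arguments, etc.) */
    REQ_FAILED       /* call accepted, but the request completed in error */
} req_result;

/* Hypothetical helper: r0_status is the value returned by the service,
   iosb is the block it was given, examined only after synchronization. */
static req_result classify(uint32_t r0_status, const iosb_t *iosb)
{
    if (!VMS_SUCCESS(r0_status))
        return REQ_NOT_QUEUED;
    if (!VMS_SUCCESS(iosb->status))
        return REQ_FAILED;
    return REQ_OK;
}
```

The case reported in this thread corresponds to REQ_FAILED: a successful return status with a failure status in the IOSB.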

Of your item codes, my suspects are DIRIO and BUFIO. All the CPU stuff is readily available from system data cells, but, depending on the definition, I/O counts might require gathering data from all(?) processes on the system. So even one process in an MWAIT state might give you SS$_SUSPENDED.

Looking at the time series of the data you're gathering, do you see any pattern in the returned data depending on the status? Try outputting your samples in T4 format, add a column with 0/1 depending on the state of SS$_SUSPENDED. Look at it under TLVIZ. First cut just do a "CORRELATE" against the status column.

>it would be very impractical because I'd
>have to release the software each time
>(it's running in a production environment
>and is not having this problem in the
>development environment - of course), and
>each release takes months.

So you need to write a baby program that just exercises this issue. I'd break the item list into two. One with all the CPU stuff, and one with the IO. Run it on your production system, in parallel with your production code. Yes, you'll get arguments, but do they want to answer this question or not? I'd also add items checking counters of process states. See if there are any MWAIT states reported, and if they correlate with the SS$_SUSPENDED.
A crucible of informative mistakes
Jon Pinkley
Honored Contributor

Re: $GETRMI returning SS$_SUSPENDED

With high probability, RMI$_DIRIO is coming from PMS$GL_DIRIO, and RMI$_BUFIO is coming from PMS$GL_BUFIO. If that were not the case, and instead the PHD$L_DIOCNT and PHD$L_BIOCNT fields from every PHD were being summed on each call, the values returned could decrease between calls, as some processes may have terminated since the previous call.
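
The arithmetic behind that argument can be sketched as follows. `sum_live` is a hypothetical helper, not anything in VMS: it totals per-process counts over only the processes that still exist, so the total drops as soon as one of them exits, whereas a system-wide cell that is only ever incremented cannot go down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum per-process I/O counts over the processes that are still alive.
   counts[i] is the count for process i; alive[i] is nonzero if that
   process still exists. Both arrays are invented for illustration. */
static uint64_t sum_live(const uint32_t *counts, const int *alive, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (alive[i])
            total += counts[i];
    }
    return total;
}
```

With counts of 100, 250 and 40, the sum is 390 while all three processes exist and 140 once the second one has terminated, even though no I/O was "undone".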

I think Hein's conjecture about the IOSB being shared with a $GETJPI call is much more likely.

The following is a great checklist when programs fail intermittently. It specifically addresses synchronization bugs that SMP systems tend to bring out of hiding.

http://h71000.www7.hp.com/wizard/wiz_1661.html

Since you have specified you are using an IOSB, and assuming you are using $synch, check three things. First, the IOSB must be in memory that remains valid from the initial $getrmi until the $synch completes (static storage is by far the easiest way to ensure that). Second, the $getrmi and the $synch must use the same IOSB. Third, nothing else may use the memory occupied by the IOSB: don't share this static storage with other concurrent asynch operations. For example, an asynch $getjpi using the same IOSB as a concurrent $getrmi could produce results like you see.
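
A small simulation of the shared-IOSB failure mode, under stated assumptions: `complete_request` is a hypothetical stand-in for the system writing a final status into an IOSB, and the two requests are completed one after the other rather than truly concurrently. It is a sketch of the mechanism only, not of how $GETRMI or $GETJPI are implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative IOSB layout: first longword is the completion status. */
typedef struct {
    uint32_t status;
    uint32_t reserved;
} iosb_t;

/* Hypothetical stand-in for the system delivering a completion status. */
static void complete_request(iosb_t *iosb, uint32_t final_status)
{
    iosb->status = final_status;
}

/* Static storage stays valid for the whole life of each request. */
static iosb_t rmi_iosb;     /* owned by the $GETRMI-like request only */
static iosb_t jpi_iosb;     /* owned by the $GETJPI-like request only */
static iosb_t shared_iosb;  /* the buggy variant: one block for both */

/* Buggy pattern: both requests complete into the same block, so the
   status seen for the first request is whatever was written last. */
static uint32_t rmi_status_with_shared_iosb(uint32_t rmi_final,
                                            uint32_t jpi_final)
{
    shared_iosb.status = 0;
    complete_request(&shared_iosb, rmi_final);
    complete_request(&shared_iosb, jpi_final);  /* overwrites rmi_final */
    return shared_iosb.status;
}

/* Safe pattern: each outstanding request has its own block. */
static uint32_t rmi_status_with_own_iosb(uint32_t rmi_final,
                                         uint32_t jpi_final)
{
    rmi_iosb.status = 0;
    jpi_iosb.status = 0;
    complete_request(&rmi_iosb, rmi_final);
    complete_request(&jpi_iosb, jpi_final);
    return rmi_iosb.status;
}
```

In the shared variant, a failure status delivered for the second request shows up when the caller inspects what it believes is the first request's result.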

The following are some threads that describe problems that can arise with incorrect IOSB usage.

sys$qiow(efn$c_enf,...,iosb,...) - must iosb be specified?

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1163915

ASTs corrupting stack frames in DECC 6.5 /optimize

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=942947

Good luck,

Jon
it depends
Jon Pinkley
Honored Contributor

Re: $GETRMI returning SS$_SUSPENDED

Here is good circumstantial evidence that the PMS cells are the source of RMI$_DIRIO and RMI$_BUFIO items.

The sys_getrmi_dir_buf is a slightly modified version of sys_getrmi.c from James Duff's examples. See attached .zip file that contains everything you need to recreate the source code.

Here's an example run on a 4 processor ES40 running VMS 7.3-2

OT$ analyze/system

OpenVMS (TM) system analyzer

SDA> read sys$loadable_images:sysdef
%SDA-I-READSYM, 10724 symbols read from SYS$COMMON:[SYS$LDR]SYSDEF.STB;1
SDA> eval @pms$gl_dirio
Hex = 00000000.13C1D0BB Decimal = 331468987
SDA> eval @pms$gl_bufio
Hex = 00000000.07F87943 Decimal = 133724483
SDA> spawn run sys_getrmi_dir_buf
DIRIO: 331470726
BUFIO: 133725603
SDA> eval @pms$gl_dirio
Hex = 00000000.13C1D979 Decimal = 331471225
SDA> eval @pms$gl_bufio
Hex = 00000000.07F87EBC Decimal = 133725884
SDA>

Jon
it depends
Richard J Maher
Trusted Contributor

Re: $GETRMI returning SS$_SUSPENDED

Hi Mark,

I agree with others here, insofar as something else could be stomping on your IOSB. (What is ss$_suspended in ascii perhaps?)
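
One way to test that idea is to print the longword as the four bytes it occupies in memory. The sketch below assumes a little-endian machine (as Alpha and I64 are), so the low-order byte comes first; `longword_as_ascii` is a name made up for this example, and non-printable bytes are shown as '.'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Render a 32-bit longword as the four characters it would occupy in
   memory on a little-endian machine (low-order byte first).
   Non-printable bytes are replaced by '.'. out needs room for 5 chars. */
static void longword_as_ascii(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xFFu);
        out[i] = (b >= 0x20 && b <= 0x7E) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

Passing in the numeric value of SS$_SUSPENDED from ssdef.h would show whether the status seen in the IOSB could plausibly be a fragment of text written there by something else.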

One other option may be a TCP/IP $qio, which can perfectly well return ss$_suspended. (A quick glance says I have some Multinet-specific code, though I think it has more to do with spurious ss$_shut than with suspended.)

Can't see why a $getrmi for the local system would ever use a TCP/IP call, but who knows?

Cheers Richard Maher