Non-paged dynamic memory

Wim Van den Wyngaert
Honored Contributor

Non-paged dynamic memory

I have about 60 AlphaStations with the same hardware and software. Only the name and address are different (VMS 7.2).

58 stations consume 2.5 MB of nonpaged dynamic memory; 2 consume 6.5 MB, even though they run the same programs.

I checked with SDA> SHOW POOL/SUMM.

I found the following big differences.
1) ORB normally takes 25 K but takes about 1 MB on the 2 stations
2) UCB normally takes 85 K but takes almost 2 MB on the 2 stations
3) MISC normally takes 100 K but takes 1.5 MB on the 2 stations
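
To make this kind of comparison repeatable across many stations, a small script can rank packet types by how much they grew relative to a known-good node. This is a minimal sketch in Python, assuming the per-type byte counts have been copied by hand from the SHOW POOL/SUMM output into dictionaries. The function name `pool_outliers` and the threshold `factor` are hypothetical, and the figures are the approximate ones quoted above (taking 1 K as 1024 bytes).

```python
def pool_outliers(baseline, suspect, factor=5.0):
    """Return (type, baseline_bytes, suspect_bytes, ratio) for packet types
    whose usage on the suspect node is at least `factor` times the baseline,
    sorted by ratio, largest first."""
    rows = []
    for ptype, base in baseline.items():
        used = suspect.get(ptype, 0)
        if base > 0 and used / base >= factor:
            rows.append((ptype, base, used, used / base))
    return sorted(rows, key=lambda r: r[3], reverse=True)


K = 1024
MB = 1024 * K

# Approximate figures from the post (hand-copied, not parsed from SDA output).
normal = {"ORB": 25 * K, "UCB": 85 * K, "MISC": 100 * K}
bad = {"ORB": 1 * MB, "UCB": 2 * MB, "MISC": int(1.5 * MB)}

for ptype, base, used, ratio in pool_outliers(normal, bad):
    print(f"{ptype:5} {base:>9} -> {used:>9} bytes  (x{ratio:.1f})")
```

With these rough numbers all three types exceed the threshold, with ORB growing the most in relative terms.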

What does it all mean, and how can I find the process causing it?

A virtual Duvel for whoever solves it.

Wim
26 REPLIES
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

The Duvel is for me !

I found that BYTLM had also been used up for certain processes. Then I found that 85% of the available channels of such a process were taken. Network problems caused the application to reconnect, but without releasing the network devices.

Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

let's start looking at the UCBs (these should be Unit Control Blocks for devices).

SDA> SHOW POOL/HEAD/TYPE=UCB

Press Return until the list ends. It should tell you the UCB packet count and the bytes used for all those UCBs.

Then look at some of the UCB addresses found, using SDA> SHOW DEV/ADDR= (take some addresses from the end of the UCB packet list).

Which kind of devices are these?

Then do the same with /TYPE=ORB. ORBs are Object Rights Blocks. What's the number of packets? You can format some of the ORBs with

SDA> READ SYSDEF
SDA> FORMAT

The associated device name is pointed to by ORB$L_NAME_POINTER, so after you have formatted an ORB, try:

SDA> EXAM @(.+orb$l_name_pointer);8

If the counts of UCBs, ORBs and MISCs are about the same, you'll know that these packets are all somehow related to each other.

Volker.

PS: My guess would be: something from TCPIP, but you'll find out soon...
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Volker,

SHOW DEV NET/FULL shows a lot of devices allocated by a few processes. But in TCPIP there is no device and in DECnet no connection. What could be the problem?

The application people don't know why.

Wim
(half the Duvel is for you)
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

so the UCBs/ORBs/MISCs are for which devices - TCPIP or DECnet ?

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

The evidence is already gone (I had to reset the stations). But it was DECnet-Plus. The processes connect to processes on the server node. If that fails, they retry in a loop.

But my main question is: why is the net device not deallocated? A bug in DECnet (7.3 ECO 3)?

Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

how are you going to ever document, analyze and find errors, if you 'delete the evidence' ? A forced crash would have provided the same results (after the reboot), but would have captured all the info needed...

If we assume those were NETn: devices, maybe there is a $DASSGN missing?
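
The suspected pattern, a retry loop that assigns a channel on every attempt and never gives it back when the connect fails, can be illustrated without any OpenVMS code. What follows is a toy simulation in Python, not the application's source and not the $ASSIGN/$DASSGN interface: `ChannelTable`, `try_connect`, `connect_leaky` and `connect_clean` are hypothetical names, and the limit of 8 channels is arbitrary.

```python
class ChannelTable:
    """Toy stand-in for a per-process channel table with a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = set()
        self._next = 1

    def assign(self):
        if len(self.in_use) >= self.limit:
            raise RuntimeError("no free channels")
        chan = self._next
        self._next += 1
        self.in_use.add(chan)
        return chan

    def deassign(self, chan):
        self.in_use.discard(chan)


def try_connect(chan, server_up):
    """Pretend network connect: fails while the server is unreachable."""
    if not server_up:
        raise ConnectionError("connect failed")


def connect_leaky(table, attempts, server_up=False):
    """Retry loop that forgets to release the channel after a failure."""
    for _ in range(attempts):
        chan = table.assign()
        try:
            try_connect(chan, server_up)
            return chan
        except ConnectionError:
            pass  # BUG: the channel stays assigned
    return None


def connect_clean(table, attempts, server_up=False):
    """Same loop, but every failed attempt gives its channel back."""
    for _ in range(attempts):
        chan = table.assign()
        try:
            try_connect(chan, server_up)
            return chan
        except ConnectionError:
            table.deassign(chan)
    return None


leaky, clean = ChannelTable(limit=8), ChannelTable(limit=8)
connect_leaky(leaky, attempts=5)
connect_clean(clean, attempts=5)
print(len(leaky.in_use), len(clean.in_use))  # 5 0
```

In the simulation a leaked channel is only a set entry. On the real system each one would keep a network device alive, which would be consistent with the growth in UCB, ORB and MISC packets.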

Volker.
Ian Miller.
Honored Contributor

Re: Non-paged dynamic memory

The MISC could be buffers for the buffered I/O requests associated with the DECnet-Plus devices.
But without a crash dump to enjoy, it's hard to say.
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Sorry but too late for the crash dump. I'll take one next time.

I looked at the source of the application and it seems Volker is right. There is a $DASSGN missing (or my C knowledge is too poor and I misunderstood the code).

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Volker and others,

Same problem again on a different node. This time with a crash dump.

This time all nonpaged dynamic memory is gone (and the node hung). SHOW POOL/SUMM shows 52% going to type Unknown. Any suggestions?

Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

it may not be trivial to find the underlying problem, but let's start like this:

SDA> SET OUT x.x
SDA> SHOW POOL/NONP/HEADER/TYPE=UNKNOWN
SDA> EXIT

$ EDIT/READ x.x

Go to the bottom of the file and look at the headers (first 4 longwords) of those unknown packets. Looking at the bottom of the file rests on the assumption that 'something happened' and then some code started allocating those packets in some kind of loop. Can you spot any particular pattern (size, header contents)?
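
Spotting a pattern at the end of a long listing by eye is tedious, so the same idea can be sketched as a frequency count over the last packets. The sketch assumes the packets have already been extracted from the listing into (size, type) pairs. It does not parse the SDA output format, and `tail_patterns` and the sample data are hypothetical.

```python
from collections import Counter


def tail_patterns(packets, tail=1000, top=5):
    """Count (size, type) combinations among the last `tail` packets and
    return the `top` most common ones with their counts."""
    recent = packets[-tail:]
    return Counter(recent).most_common(top)


# Made-up sample: a mixed start, then one packet shape repeating at the end.
sample = [(64, "IRP"), (128, "UCB"), (256, "MISC")] * 3 + [(192, "Unknown")] * 20

for (size, ptype), count in tail_patterns(sample, tail=20, top=3):
    print(f"size={size:<6} type={ptype:<8} count={count}")
```

A single combination dominating the tail of the list would be the kind of pattern worth posting.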

Another possibility would be to write all packets from nonpaged pool (without /TYPE=UNKNOWN) to a file and apply the same kind of 'looking around'. You may see a repeating pattern of packet types filling up pool.

Consider comparing pool usage (packet types) with the running system to determine whether the high percentage of Unknown packets is the real problem.

If you've spotted some patterns, consider including part of the SDA output in an attachment, but not a full pool listing ;-)

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

The output of the first command seems to contain some Sybase server related info.
But why/what ? I'll post the output.

Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

the summary of your SDA> SHOW POOL/NONP/TYPE=UNKNOWN output seems to indicate that Unknown packets consume only 7.4 million of the 55 million bytes of nonpaged pool.

Does CLUE MEM/STAT show expansion/allocation failures ?

Could you post SHOW POOL/NONP/SUMM ?

Don't expect a fast resolution (at least not without access to the dump).

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Volker,

I was confused. I saw 52% and I thought it was 52% of the whole NP dyn mem. I checked on the running system and it also takes about 50% unknown.

I also posted the output of SHOW MEM, as proof that the memory was gone.

VPA also showed that it had been gone from the moment VPA started during boot.

Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

your attachment looks like an SEA errlog analysis. Could you please post the SDA> SHOW MEM/POOL/SUMM output ?

If you're running VPA, you could tell, when the problem started. Did pool decrease slowly over time or did it just decrease suddenly ?

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Sorry Volker.

VPA didn't see a decrease. When it started less than 1 MB was free.

Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

correction:

SDA> SHOW POOL/NONP/SUMM

About some of the Unknown packets:

TCPIP does not use a standard OpenVMS packet header, but you can identify the TCPIP packets, which show up as Unknown:

If the pool packet size and the 4th longword of the packet header (count from right to left) have the SAME value and the VMS packet size word (low word in 3rd longword) is ZERO, it's most probably a TCPIP packet.
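
The rule of thumb above can be written down as a small predicate. This is a sketch in Python that assumes "right to left" refers to the usual SDA dump layout, in which the lowest address is shown rightmost, so the "4th longword" is the one at offset 12 and the "3rd longword" the one at offset 8. The function name `looks_like_tcpip` and the sample headers are made up for illustration.

```python
def looks_like_tcpip(pool_size, header):
    """Apply the heuristic from the post.

    pool_size -- packet size as reported in the pool listing
    header    -- first four longwords of the packet in memory order
                 (header[0] is assumed to be the rightmost longword
                 on an SDA dump line)
    """
    if len(header) < 4:
        raise ValueError("need the first four longwords of the packet")
    third, fourth = header[2], header[3]
    vms_size_word = third & 0xFFFF  # low word of the 3rd longword
    return fourth == pool_size and vms_size_word == 0


# Made-up headers, for illustration only.
print(looks_like_tcpip(0x140, [0x0, 0x0, 0x00010000, 0x140]))  # True
print(looks_like_tcpip(0x140, [0x0, 0x0, 0x00130140, 0x0]))    # False
```

As the post says, a match only makes a TCPIP packet "most probable"; the check is a filter for candidates, not proof.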

Volker.
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

let's try to re-assess the problem description:

- are we talking about unexplained static nonpaged pool consumption on 2 of your 60 AlphaStations ?
- if so, is it still ORBs, UCBs, and MISCs ?

- by the time VPA starts during startup, most of nonpaged pool is already consumed.

- and today one of those 2 stations hung and you forced a crash.

Anything else ?

Could you please attach the SDA> SHOW POOL/NONP/SUMM output ?

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Volker,

This is not the same problem. This time it concerns a GS160 node of an inter-building FDDI cluster. I found it hung and forced a crash. And 2 minutes after boot, the memory already seems to be gone.

Sorry for the mistakes but I have to use different tools to get it on my pc.

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Now HP nailed me ...
Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

looks like we need to take a 'big step back' and re-assess the whole problem:

The summary line of your SHOW POOL/NONP/SUMM command says:

Total space used: 00D7AC80 (14134400.) bytes out of 0354E000 (55894016.) bytes
in 00002ED8 (11992.) packets

Total space utilization: 25.3%

I don't think that the system hung due to a pool consumption problem. Please check SDA> CLUE MEM/STAT - any nonpaged pool failures ?
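
The figures in that summary line are easy to double-check. A short Python sketch that converts the hexadecimal values quoted above and recomputes the utilization (the helper name `parse_summary` is hypothetical):

```python
def parse_summary(used_hex, total_hex, packets_hex):
    """Convert the hex fields of the summary line and compute utilization."""
    used = int(used_hex, 16)
    total = int(total_hex, 16)
    packets = int(packets_hex, 16)
    return used, total, packets, 100.0 * used / total


used, total, packets, pct = parse_summary("00D7AC80", "0354E000", "00002ED8")
print(used, total, packets)  # 14134400 55894016 11992
print(f"{pct:.1f}%")         # 25.3%
```

The decimal values and the 25.3% match what SDA printed, so roughly three quarters of nonpaged pool was still free in the dump according to this summary.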

Now if nonpaged pool on your running GS160 is all consumed, you would want to post an SDA> SHOW POOL/NONP/SUMM from the running system. But let's make sure you correctly relate the postings etc. to the 2 different problems.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Clue mem/stat of crash
(don't refer to original problem any more)
Wim
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

Show mem of crash
Wim
Wim Van den Wyngaert
Honored Contributor

Re: Non-paged dynamic memory

show pool/nonp/sum of running system.

(has less load now : no Sybase servers active)
Wim
Volker Halle
Honored Contributor

Re: Non-paged dynamic memory

Wim,

something is not right here:

- the CLUE MEM/STAT from the crash shows 180 successful pool expansions, but no Expansion or Allocation failures at all.

- the SDA> SHOW POOL/NONP/SUMM from the crash shows 14 million of 55 million bytes used, usage=25%

- the SDA> SHOW POOL/NONP/SUMM from the currently running system shows 14 million of 50 million bytes used, usage=28%

(the above 2 observations seem to be reasonably consistent, and do not indicate a nonpaged pool shortage).

- yet SDA> SHOW MEM/POOL from the dump shows 0.7 MB free of 53 MB. I tend NOT to believe that.
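
The size of the discrepancy is easier to see when both reports are expressed as a free fraction. A rough Python sketch using only the rounded figures quoted above; `free_fraction`, `reports_agree` and the 10-point tolerance are hypothetical choices, not anything SDA provides.

```python
def free_fraction(free, total):
    """Fraction of the pool that is free; the units cancel out."""
    return free / total


def reports_agree(a, b, tolerance=0.10):
    """True if two free fractions differ by no more than `tolerance`."""
    return abs(a - b) <= tolerance


# Rounded figures from the list above.
summ_free = free_fraction(55 - 14, 55)  # SHOW POOL/NONP/SUMM, millions of bytes
mem_free = free_fraction(0.7, 53)       # SHOW MEM/POOL, MB

print(f"pool summary: {summ_free:.1%} free, SHOW MEM: {mem_free:.1%} free")
print("consistent" if reports_agree(summ_free, mem_free) else "inconsistent")
```

One report says about three quarters of the pool is free, the other barely more than one percent, so at least one of the two is not measuring what it appears to.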

Volker.