
Re: DS15 - LOCKMGRERR crash

 
Jan van den Ende
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Volker,

This node was rebooted after 159 days because of the tape MDR: the driver for $2$MGA had received a wrong SCSI bitmask. Obviously a known problem, and one that can only be cleared by a reboot. (And NO patches are coming anymore, because MDR is EOL! How did that stuff EVER qualify for use under VMS?)
24 hours after the reboot this crash happened.


Clue mem/stat:
Successful pool expansions : 0
Unsuccessful pool exp : 0
Various "Failed" stats: all are 0

SHOW PAGE/S2/FREE:
not sure how to interpret what I see.
Mapped addr:
counting down in steps of %X4000, 8000, C000, 10000, 20000 for the first couple of pages
PTE addr:
counting down in (irregular?) multiples of 4, like 18, 30, 1C, C0
PTE:
counting down in rather big steps (all ending 0000)
Count:
small numbers, single digit except the last one: 3F7
But what does that mean?


exa @LCK$AR_POOLZONE_REGION+80;20

4F9A6A - 25C1 - 445A 1A
Again, what does that mean?


"system seems to be badly tuned in general"

Care to elaborate?
Any suggestions for improvement?


Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Ian,

Indeed, that is what HELP/MESS INSVIRMEM offers as a possibility, and I already installed an extra GB of pagefile. But it makes me wonder WHY all of a sudden (after a reboot!!) so much pagefile was needed, because we monitor pagefile use, and try to never need it whatsoever.
(Then again, this IS the one small machine in the cluster).

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Jan,

EXE$GL_FLAGS: ...,pgflfrag,pgflcrit,...

This says that the page file has been severely fragmented and critically full during the uptime of the system (which is just 1 day). Look at the situation at the time of the crash with:

SDA> CLUE MEM/FILES

SDA> SHOW PAGE/S2/FREE shows the amount of free PTEs in the S2 free page list. If the lock manager needs to allocate more RSBs and LKBs, it may need to expand its pool zone in S2 space and would need some free S2 PTEs. Only the count fields would be interesting.
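
For anyone who wants to total up that Count column rather than eyeball it, here is a minimal Python sketch. It assumes each Count value is a hexadecimal number of free PTEs for that entry, so that adding them up gives the total on the list; that reading is an assumption, not something the SDA output states. The function name is made up, and the sample values are hypothetical except for 3F7, which is the last count quoted earlier in this thread.

```python
# Alpha uses 8 KB pages, and each PTE maps one page.
ALPHA_PAGE_BYTES = 8192


def free_s2_ptes(count_column):
    """Sum the hex 'Count' values copied from SHOW PAGE/S2/FREE.

    Assumption: each value is the number of free PTEs in one entry
    of the S2 free list, so the sum is the total number free.
    """
    return sum(int(value, 16) for value in count_column)


# Hypothetical small counts, followed by the 3F7 reported in the thread.
sample_counts = ["1", "2", "1", "4", "3F7"]

total = free_s2_ptes(sample_counts)
print("free S2 PTEs :", total)
print("mappable bytes:", total * ALPHA_PAGE_BYTES)
```

Free S2 PTEs are only virtual address space; the expansion still needs free physical pages behind it.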

Were there any free physical pages (SDA> SHOW PFN/FREE)?

If you've copied the LCKMGR POOLZONE counters from right to left, it would be:

hits: 4F9A6A
misses: 25C1
expansions: 445A
failures: 1A <<< normally this counter is 0
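
To put those hex counters into decimal, a small Python sketch follows. It takes the values exactly as posted and applies the field order given above (hits, misses, expansions, failures), which only holds if the dump really was copied right to left. The function name is just for illustration.

```python
FIELD_NAMES = ["hits", "misses", "expansions", "failures"]


def parse_poolzone_counters(dump):
    """Map the posted hex words onto the four counter names, in order.

    Assumption: the words appear in the order hits, misses, expansions,
    failures, as suggested in the post above.
    """
    words = dump.replace("-", " ").split()
    if len(words) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} words, got {len(words)}")
    return {name: int(word, 16) for name, word in zip(FIELD_NAMES, words)}


counters = parse_poolzone_counters("4F9A6A - 25C1 - 445A 1A")
for name, value in counters.items():
    print(f"{name:<11}{value:>10}")
```

Anything other than zero in the last field means the lock manager asked for more pool and did not get it.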

NOTE: you've seen an INSFMEM error, not an INSVIRMEM ! Lock manager resources are in S2 space, which is NOT paged, so pagefile space problems cannot cause this crash.

If this is 'the small machine' in the cluster, it might just not have had enough resources to receive the lock/resource tree being moved to it.

Volker.
Jan van den Ende
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Volker,

"NOTE: you've seen an INSFMEM error, not an INSVIRMEM"
Sorry, typo in the posting. I used the actual message in HELP.

SHOW PFN/FREE
*** List is empty ***

Looks like we pinned it down!
Maybe a budget request for more memory is in order.
A bigger pagefile has already been installed.

Thanks!

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Jan van den Ende
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Sufficiently explained.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Ian Miller.
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Perhaps you need to set the LOCKDIRWT system parameter to keep lock directory load off this node.

More memory is always a good thing.
____________________
Purely Personal Opinion
Volker Halle
Honored Contributor

Re: DS15 - LOCKMGRERR crash

Jan,

maybe - just maybe - you've run BACKUP to test access to the tape after the reboot? And BACKUP has used lots of memory and pulled over the resource tree of the disk (due to its lock activity)?

Volker.