Operating System - OpenVMS

Lock Flags lck$m_recover and lck$m_protect

SOLVED
Richard J Maher
Trusted Contributor

Hi,

A recent post here regarding how Rdb uses its "FREEZE" lock to recover from process failure (where the process had been holding critical protective locks on a given resource) got me wondering why it works like the Guide to Performance Monitoring and Tuning says it does. (To be fair there is a big disclaimer that all the nasty bits have been left out but still. . .)

But let me start the question from scratch: -

Q:) If I wanted to take out a VMS lock with $ENQ that would survive process rundown (hard or soft) how would I do it?

a) I am in inner mode (say EXEC)
b) I have all the privileges in the world.
c) Yes I know that A.N.Other process will have to take ownership of the lock and cleanup for me.

A:) Lock with lck$m_protect and recover with lck$m_recover???

i) In the mean time VMS will hold the lock on the resource ad infinitum
ii) If a new process does not resume control of the lock it's stuck forever
iii) I don't know what would happen if a Recover of a lock was attempted while the original owner was still alive?

Is being in EXEC mode enough, so that the locks live on regardless? (I think not.) So what do these flags do, or more to the point, "How do I create a lock that will survive process rundown?"

Cheers Richard.

PS. Any $getlki considerations? Who owns the lock when the creator is dead?

PPS. The way Rdb describes it in the manuals is so full of holes and race conditions that it simply cannot work.
8 REPLIES
Volker Halle
Honored Contributor
Solution

Re: Lock Flags lck$m_recover and lck$m_protect

Richard,

you need to create a System-Owned Lock if you want the lock to survive process rundown. Use the LCK$V_CVTSYS flag (from EXEC or KERNEL mode). Reserved to Digital (see IDSM Chapter 11.3.5).

The XQP is a good example of an OpenVMS component that uses system-owned locks. You can see them with SDA> SHOW LOCK - look for locks with PID: 00000000

Volker.

Richard J Maher
Trusted Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Hi Volker,

Sounds good. If you tell me what IDSM stands for and possibly a web pointer then I'll pass on the extra point :-)

I = Internal
D = Data
S = Structures
M = Manual

I = Infrastructure
D = ?

A code snippet would be nice. Yes, your XQP reference should be enough for anyone and I do have a copy of the VMS source somewhere that hopefully doesn't predate XQP, but I reckon a reasonable working example of a process dying and another taking over would be about 200 lines *of COBOL* long. Yell out the pseudo-code and I'll do it.

So a System lock in inner-mode is enough? How does the recover job take it over?

Cheers Richard
Volker Halle
Honored Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Richard,

IDSM = OpenVMS Internals and Data Structures Manual, the 'OpenVMS Bible' written by Ruth Goldenberg/Saro Saravanan - sorry no WEB pointer.

If this is all about the RDB FREEZE protocol, I've googled and found:

http://www.kuzbass.ru:8083/docs/rdb702/oraclerdb/gdpt7/gdp_profile_015.html

Chapter 3.8.3.4 seems to explain the underlying algorithm. Each process requests a CW lock on the FREEZE resource and RDB seems to check that the process has this lock granted in CW mode, whenever the process is granted a lock request for a database resource.
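To make the CW point concrete, here is a small sketch of the standard six-mode lock compatibility table (NL, CR, CW, PR, PW, EX). It only shows why many processes can hold CW on one resource at the same time while an incompatible request has to wait. The EX request used as the "freeze" trigger is an illustrative assumption; this thread does not say which mode Rdb actually uses.

```python
# Lock mode compatibility: for each requested mode, the set of
# already-granted modes it can coexist with on the same resource.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}

def can_grant(requested, granted_modes):
    """True if `requested` is compatible with every mode already granted."""
    return all(g in COMPATIBLE[requested] for g in granted_modes)

# Three database users each hold CW on the FREEZE resource.
users = ["CW", "CW", "CW"]
print(can_grant("CW", users))   # a fourth user can join
print(can_grant("EX", users))   # an incompatible request has to wait (assumed mode)
```

An incompatible request queued behind the CW holders is what would deliver blocking ASTs to them; the model above stops at the compatibility check.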

I would also assume that the RDB monitor uses some kind of lock algorithm (e.g. an EX lock for each process) to detect abnormal process termination. Aren't there all these 'received user image termination' messages in the monitor logfile?

Volker.
John Gillings
Honored Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Richard,

I think you're going way over the top with this! You don't need to create a system lock, or specify any special flags to implement process failure recovery, you just need a "deadman" mechanism and appropriate protocol within your application.

Perhaps you're also confusing "lock" and "resource". The RESOURCE will survive, cluster wide as long as there are any locks queued or granted against it.

In its simplest form, write a program (unprivileged) that starts by issuing a $ENQW for an EX mode lock against a resource name of your choosing. The process will block at the $ENQW if the lock is not granted.

Now suppose we start multiple processes, possibly on multiple cluster nodes, each running this program. The first to reach the $ENQW will have its lock granted, all the others will queue up their locks behind it (note, each process has its own lock, but against the same resource). If the first process exits the image for any reason, the granted lock will vanish, resulting in the next lock queued to the resource being granted. The next process will then complete the $ENQW and continue execution. It can then check for evidence that it needs to cleanup anything before taking over the function of the first process.
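The hand-over described above can be sketched with a toy in-memory model. This is not the VMS lock manager and makes no system service calls; the class, its method names and the resource name are all hypothetical, and only EX mode is modelled.

```python
from collections import deque

class Resource:
    """Toy model of one lock resource with EX-mode requests only."""

    def __init__(self, name):
        self.name = name
        self.holder = None       # process whose EX lock is currently granted
        self.waiting = deque()   # processes blocked in their $ENQW, in order

    def enqw(self, proc):
        """Request an EX lock; return True if it is granted immediately."""
        if self.holder is None:
            self.holder = proc
            return True
        self.waiting.append(proc)
        return False

    def process_exit(self, proc):
        """Rundown of `proc`: its lock (granted or waiting) vanishes."""
        if proc in self.waiting:
            self.waiting.remove(proc)
        elif self.holder == proc:
            # The next queued lock, if any, is granted.
            self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

res = Resource("MYAPP_DEADMAN")   # hypothetical resource name
for p in ("A", "B", "C"):
    res.enqw(p)
print(res.holder)                 # A was first, so A is "it"
res.process_exit("A")
print(res.holder)                 # B's $ENQW now completes
```

In a real program the "check for evidence that it needs to cleanup" step would run right after the point where B's request completes.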

This is symmetric as all processes execute the same code. It's more or less infinitely extendable, just fire up another process to increase the level of protection. Indeed, the process could create a new copy of itself immediately after being granted the "you're it" lock. Just make sure you've got a mechanism to kill the chain when you want/need to!

In a data base or client/server model, a client could create a resource for itself, $ENQW an EX mode lock to it, then send the resource name to the DBMS as part of the protocol of opening the data base. Some data base "master" process then $ENQs (note *asynch*) an EX mode lock against the resource, with an AST to be executed when the lock is granted. If the AST fires, the DBMS process can check to see if that process cleaned up itself, or commence recovery. Again, the resource will become free if the process holding the lock (or system it's running on) terminates for any reason.
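A minimal sketch of that client/master protocol, again as a toy model rather than real $ENQ calls. The callback stands in for the completion AST; the class, the `cleaned_up` table and the client names are hypothetical.

```python
class ClientResource:
    """Toy model: one client's private resource, EX-mode requests only."""

    def __init__(self, client):
        self.client = client
        self.holder = client     # the client took its EX lock with $ENQW
        self.pending = []        # (requester, callback) queued behind it

    def enq_async(self, requester, on_grant):
        """Asynchronous request: returns at once, callback runs on grant."""
        self.pending.append((requester, on_grant))

    def client_gone(self):
        """Client process (or its node) terminated: its lock vanishes."""
        self.holder = None
        if self.pending:
            requester, on_grant = self.pending.pop(0)
            self.holder = requester
            on_grant(self.client)

# Hypothetical bookkeeping: did each client clean up after itself?
cleaned_up = {"C1": True, "C2": False}
recovered = []

def grant_ast(client):
    # Stand-in for the master's completion AST.
    if not cleaned_up[client]:
        recovered.append(client)   # commence recovery for this client

resources = {c: ClientResource(c) for c in cleaned_up}
for r in resources.values():
    r.enq_async("MASTER", grant_ast)

resources["C1"].client_gone()      # clean exit: nothing to recover
resources["C2"].client_gone()      # abnormal exit: recovery starts
print(recovered)
```

The master never polls: it learns of a client's disappearance only because its queued lock is granted.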

Sometimes client/server models have a full handshake, that is, the client has a deadman lock on the server. For that model, you need to set the "no deadlock search" flag, because you're deliberately creating a deadlock which can only be broken by the failure of one or other process.

Properly written, you can avoid holes and race conditions; it's just a matter of designing your state machine properly.
A crucible of informative mistakes
Richard J Maher
Trusted Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Hi Volker,

You can get an up to date copy of most Rdb manuals at oracle.com/rdb. If you re-read my initial paragraph you will see that I referred to same there. I obviously acknowledge that Rdb recovery just works, and that they can't describe everything that happens during recovery, in a couple of paragraphs.

Having said that, let me say this :-)

1) Both of us have recently read the FREEZE lock description and, prima facie, a timing window issue or race condition exists. EG:

a) Process A has a lock on Row 1 and has updated Employee Salary incorrectly to 100M, holds the lock on the row but has yet to commit the changes.

b) Process B is the payroll that wants to transfer 1/26 of 100M to his bank account but can't acquire the lock that process A owns.

c) Process A drops dead for whatever reason. Now the Rdb Monitor may execute at a priority of 15, but with SMP systems and clusters (with busy busy busy interconnects), exactly how long is it before the monitor realizes that process A is dead and then triggers a BLAST to everyone else (including process B) telling them to freeze???

d) Process B's got the lock before the monitor can create a DBR and that DBR can recover. End of story?

OK, Rdb must do something to prevent this from occurring, but let's get outside the box and go El Fresco. Suppose Process A had a system-owned lock, had its CompAST and BlkAST routines in System Space somewhere and would wait for the DBR to come up and take ownership of Proc A's locks. *NO* race condition. The locks were never given up. Sonny clung to the cliff until Skippy got back!
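The difference between the two cases can be replayed as a toy timeline. This models only the ordering argued above; it deliberately leaves out whatever Rdb really does (such as the freeze protocol) to close the window. The function and the "orphan" state are hypothetical.

```python
def run(lock_survives_owner):
    """Replay the a)-d) timeline; return the order in which events happen."""
    holder, waiting, log = "A", [], []

    # b) B asks for the row lock while A still holds it.
    waiting.append("B")

    # c) A dies. An ordinary process-owned lock vanishes with it;
    #    a lock that survives rundown stays granted (orphaned).
    if lock_survives_owner:
        holder = "orphan"
    else:
        holder = waiting.pop(0)
        log.append("B granted")

    # Later the recovery process (DBR) starts. If the lock survived,
    # the DBR takes it over, undoes A's work and only then releases it.
    if holder == "orphan":
        holder = "DBR"
    log.append("DBR undo")
    if waiting:
        holder = waiting.pop(0)
        log.append("B granted")
    return log

print(run(lock_survives_owner=False))   # B gets in before the DBR
print(run(lock_survives_owner=True))    # B waits until the DBR is done
```

With a lock that outlives its owner, B's grant can only ever come after the undo, which is the whole of the "no race condition" claim.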

All comments welcome.

Cheers Richard
Richard J Maher
Trusted Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Big John,

How could we have come this far together, yet understand each other so poorly?

Yes, it's me - "TIER3, Simply a better RPC!" from VMSNOTES. Remember?

Anyway, doesn't matter. But if you had Read The Fine Manual that I sent you, you'd know that I have more than just a fleeting association with the VMS Lock Manager and how it works with everyday applications. The T3$_NOCOMSRV status being just such an example that you spoke of.

What's that you say? You have "time management" and "motivational" issues with the unsolicited (ITRC decorum button) that I sent you? A big "Must try harder!" stamp for you :-)

Anyway, please see the reply to Volker and thanks for responding in the first place. But if we can move forward on the premise that "I have not hit the HOLD button" then that would be just peachy :-)

Cheers Richard

PS. The VMS Tech journal is looking for you to describe *in detail* *exactly* what IPL has to be achieved and how to move known code to System Space for AST execution. (Yes I know there are several examples of Sched$ast (sp?) in COV that one could copy but your passion for detail will make yours all the more useful!)

PPS. As discussed I've changed "Dick" Maher to "Richard J". Did it look odd? No one in Europe called me Dick but since most people in WA know me from high school, Dick it is. Hey I been (and will be :-) called worse.

PPPS. If they ever get rid of you I'm going as well. (Not that there's a lot of choice in Perth :-) Ooops 4 Beer alert!
Volker Halle
Honored Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Richard,

I'm not a database person, but...

If process A has not yet committed the transaction, process B does not yet see the changed value (100M).

Volker.
Richard J Maher
Trusted Contributor

Re: Lock Flags lck$m_recover and lck$m_protect

Hi Volker,

I'm tired and the Winter Olympics have been snowed off and I'm waiting 15mins for the news 'cos everything else on Oz TV is rubbish!

But please try to work with me on this. I know you're a brainy guy, and DEC letting you go was a crime (as they say in Jamaica "Maximum!")

[I can't see your reply to cut and paste bits out of :-(]

The gist being, conjure up whatever scenario you like. Why can't they see the change? 'Cos there *was* a lock on the record. The UNDO logic is in the RUJ.

I don't particularly wish to discuss the bowels of Rdb recovery. I want "legacy" locks! Does anyone care to discuss how to achieve this?

Cheers Richard Maher