Operating System - OpenVMS

File Lock / Record Lock Interpretation

 
JJABEZ10
Occasional Visitor

File Lock / Record Lock Interpretation

Good Morning All,

 

We have a requirement to run the batch cycle along with the online transactions. During this time we need to monitor file locks and record-level locks. Due to locks, is there any delay in the response times of the online transactions?

 

We have collected the stats using the below command:

 

$ MONITOR RMS/FILE=FILE_NAME/ITEM=ALL/RECORD=FILE1.DAT/NODISP

 

 

We played back the recorded file; the lock stats are as below:

 

                          RMS FILE LOCKING STATISTICS

Active Streams:   1                  CUR        AVE       MIN        MAX

    New ENQ Rate                 1839.07     251.68      0.00    2899.46
    DEQ Rate                     1839.07     251.67      0.00    2899.46
    Converted ENQ Rate           3985.05     526.56      0.00    6921.88

 

 

We are not sure how to interpret the above stats. From the above stats, we can see that some locking happens on the file, but not much more than that.

 

We googled and found the following definitions:

 

New ENQ Rate - Rate of new lock (ENQ) requests (as opposed to conversions)

Converted ENQ Rate - Rate of lock (ENQ) conversion requests

DEQ Rate - Rate of unlock (DEQ) requests

 

Can someone please explain to us in detail what exactly happens here? When it says the CUR New ENQ Rate is 1839.07, what is that number counting? I assume it is not the number of locks on the file during that time, since it has a decimal value?

 

Please explain.

 

Thanks,

 

Regards,

John Jabez

Steven Schweda
Honored Contributor

Re: File Lock / Record Lock Interpretation

 
John Gillings
Honored Contributor

Re: File Lock / Record Lock Interpretation

John,

   I don't think you're measuring what you want...

 

The $ENQ system service deals with two things. "Resources" and "Locks". Resources are things for which access needs to be synchronised, like files and records. Locks are the mechanism used by processes to coordinate their actions with other processes. A particular resource will have one or more associated locks, representing different threads of execution with an interest in the resource.

 

A process which wants to access a shared resource takes out a lock against that resource. The OS resource object will be created when the first process enqueues a lock against it, and it will be destroyed when the last process dequeues its lock.

 

When a lock is requested, the process will specify a "mode" which states what level of sharing the process is requesting against the resource. These are:

 

NUL - unconditional sharing with all

CR - concurrent read - sharing with other readers and writers

CW - concurrent write - sharing with other readers and writers

PR - protected read - sharing only with other readers

PW - protected write - sharing only with concurrent readers (no other writers)

EX - exclusive - no sharing

 

A particular lock request will be granted if the mode of the request is compatible with existing granted locks on the same resource. If it's incompatible, the lock will be in "waiting" state, and the process should not proceed with the intended access  (though realise there's nothing physically preventing it). There may be a queue of locks waiting on a particular resource, and there may be multiple granted locks for the same resource, provided they're all compatible. 
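
For illustration, here is a minimal C sketch of a new lock request; the resource name and structure are invented for this example, and LCK$M_NOQUEUE is used so the call reports SS$_NOTQUEUED immediately, rather than waiting, when an incompatible lock is already granted:

/* Hedged sketch: request a PR (protected read) lock on an invented
   resource name.  LCK$M_NOQUEUE makes the request fail with
   SS$_NOTQUEUED instead of joining the waiting queue. */
#include <starlet.h>
#include <lckdef.h>
#include <ssdef.h>
#include <descrip.h>
#include <stdio.h>

int main(void)
{
    struct { unsigned short cond, reserved; unsigned int lkid; } lksb = {0};
    $DESCRIPTOR(resnam, "MYAPP_FILE1_REC42");   /* made-up resource name */

    int status = sys$enqw(0,              /* event flag               */
                          LCK$K_PRMODE,   /* requested lock mode      */
                          &lksb,          /* lock status block        */
                          LCK$M_NOQUEUE,  /* fail rather than wait    */
                          &resnam,        /* resource name descriptor */
                          0, 0, 0, 0, 0, 0, 0);

    if (status == SS$_NOTQUEUED || lksb.cond == SS$_NOTQUEUED)
    {
        printf("An incompatible lock is already granted - we would have had to wait\n");
    }
    else if ((status & 1) && (lksb.cond & 1))
    {
        printf("Granted, lock id %u\n", lksb.lkid);
        sys$deq(lksb.lkid, 0, 0, 0);      /* finished - destroy the lock */
    }
    return status;
}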

 

Creating a new lock is a relatively expensive operation, especially if it's the first lock on a particular resource, as the resource needs to be created. Similarly, destroying a lock may involve destroying the resource object.  If a process expects to interact with the resource more than once, rather than multiple create/delete cycles, it's better to create the lock once and "convert it" between modes as required.

 

Typically the initial $ENQ operation is in NUL mode, which is always granted immediately. As the process wishes to lock and unlock the resource it can CONVERT up or down to the appropriate mode. When the process has finished with the resource the lock is $DEQed, destroying the lock object, and possibly the resource object as well, if there are no other locks associated with it.
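
As a rough C sketch of that lifecycle (the resource name is invented, and the lock status block completion checks are trimmed for brevity):

/* Hedged sketch of the "NUL then convert" pattern described above. */
#include <starlet.h>
#include <lckdef.h>
#include <descrip.h>

int main(void)
{
    struct { unsigned short cond, reserved; unsigned int lkid; } lksb = {0};
    $DESCRIPTOR(resnam, "MYAPP_SOME_RECORD");   /* made-up resource name */
    int status;

    /* 1. Create the lock once, in NUL mode - always granted immediately. */
    status = sys$enqw(0, LCK$K_NLMODE, &lksb, 0, &resnam, 0,
                      0, 0, 0, 0, 0, 0);
    if (!(status & 1)) return status;

    /* 2. When the resource is actually needed, convert up to PW.
          LCK$M_CONVERT means "change the mode of the lock whose id is
          in the LKSB", not "create a new lock". */
    status = sys$enqw(0, LCK$K_PWMODE, &lksb, LCK$M_CONVERT, &resnam, 0,
                      0, 0, 0, 0, 0, 0);
    if (!(status & 1)) return status;

    /* ... read/update the protected data here ... */

    /* 3. Convert back down to NUL so other processes can get in, without
          paying for a delete/re-create of the lock and the resource. */
    status = sys$enqw(0, LCK$K_NLMODE, &lksb, LCK$M_CONVERT, &resnam, 0,
                      0, 0, 0, 0, 0, 0);

    /* 4. Finished with the resource altogether: destroy the lock
          (and, if it was the last one, the resource). */
    sys$deq(lksb.lkid, 0, 0, 0);
    return status;
}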

 

So what does all that mean to you?

 

What you're measuring is the rate (per second) of the above operations. "New ENQ rate" means the number of $ENQ calls in which a new lock was created. The "DEQ rate" is the number of calls to $DEQ, meaning the locks were destroyed.  "Converted ENQ rate" is the number of calls to $ENQ in which the mode of an existing lock was changed.

 

So, this is telling you about the RMS locking activity, which I don't think is of much interest to you. I suspect you're more interested in contention. That is, how often a lock request was delayed because of existing incompatible granted locks.

 

I don't think there's a way to drill down to that information using monitor RMS, but notice that your display says "Active Streams: 1", which means there can be no contention!

 

MONITOR LOCK will show you the system-wide "New ENQ Rate", "Converted ENQ Rate" and "DEQ Rate" as described above. It also shows the "ENQs Forced to Wait Rate" and "ENQs Not Queued Rate", which give you an indication of the system-wide contention. If that changes consistently during your batch cycle you might infer that it was caused by something in the batch jobs.

 

MONITOR LOCK also gives the "Blocking AST Rate" - when requesting a lock, you can specify an AST routine to be executed if some other process requests a lock which blocks against your granted lock. It also shows the "Deadlock Search Rate" and the "Deadlock Find Rate". A deadlock is when I'm holding a lock and waiting on another lock while you're holding the lock I'm waiting for, and waiting for the lock I'm holding. This is usually an indication of poor design, and hopefully doesn't happen.
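
Here is a small, hedged C sketch of the blocking-AST idea; the resource name is invented and no real work is done (the exact function-pointer prototype may vary with your compiler's starlet.h):

/* Hedged sketch: hold a PR lock and nominate a blocking AST that runs
   when some other process requests an incompatible mode on the same
   resource. */
#include <starlet.h>
#include <lckdef.h>
#include <descrip.h>
#include <stdio.h>

static struct { unsigned short cond, reserved; unsigned int lkid; } lksb;

/* Delivered as an AST while our granted lock is blocking someone else.
   A real application would typically convert down or $DEQ here. */
static void blocking_ast(void *astprm)
{
    printf("Lock %u is blocking another process\n", lksb.lkid);
}

int main(void)
{
    $DESCRIPTOR(resnam, "MYAPP_HOT_RESOURCE");   /* made-up resource name */

    int status = sys$enqw(0, LCK$K_PRMODE, &lksb, 0, &resnam, 0,
                          0,                     /* completion AST: none */
                          0,                     /* AST parameter        */
                          blocking_ast,          /* blocking AST routine */
                          0, 0, 0);
    if (status & 1)
        sys$hiber();   /* sit here so the AST can be delivered */
    return status;
}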

 

For more detail see the HP OpenVMS Programming Concepts Manual, Chapter 7, "Synchronizing Data Access and Program Operations".

 

 

 

A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: File Lock / Record Lock Interpretation

As per Steve, you may want to indicate roughly what is going on... OpenVMS (sure... which version?), RMS (sure, but which language?), RMS RU journalling? RMS buffer settings (global/local)? Alpha/Itanium? Cluster/single system?

 

?DUPLICATE ALTERNATE KEYS IN PLAY?

(I've analyzed and fixed situations where a single record update would cause thousands of read IOs due to millions of duplicates for a target alternate key value)

 

>> We have a requirement where we need to run the Batch cycle along with the online transactions.

 

Of course. And the batch jobs are updating, right? Are they written to WAIT for (record) locks under contention, or do they retry, or what? Do they deal with one record at a time, or do they collect locks on multiple records and $FREE them at the end of a (logical) transaction? Ditto for the online transactions: what is the contention handling design?

Batch jobs performing updates had best have HIGH or equal processing priority, such that they get CPU cycles to do the work and release locks once they get in.

 

>> During this time we need to monitor File locks & record level locks.

 

OK, monitor how? Why? Did something happen? What? Error messages?

What do you think you'll learn? What is good, what is bad?

 

>> Due to locks, is there any delay in response times of Online transactions.

 

How do they know it is locks? How do they know which file to look at?

Could delays not be caused by general high CPU or IO activity during batch activity?

Are they using RMS record lock wait (RAB$V_WAT) as they should, or doing some stupid retry algorithm with some delay factor? If there is a 1 or 2 second retry delay coded in, well then there will obviously be a slowdown, mostly for no good reason (batch jobs typically hold locks for milliseconds, not whole seconds).
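
For anyone unsure what RAB$V_WAT looks like in practice, here is a hedged C sketch; the file name and buffer size are invented, and how you reach these RAB option bits from COBOL or BASIC depends on the compiler and RTL, so check your language documentation. RAB$M_WAT makes $GET wait on a locked record instead of returning RMS$_RLK, and RAB$M_TMO with rab$b_tmo caps that wait (30 seconds here, purely as an example):

/* Hedged sketch: $GET with the record-lock-wait option set, so a locked
   record is waited for (with a 30 second ceiling) instead of being
   retried in an application loop. */
#include <rms.h>
#include <starlet.h>
#include <string.h>

int main(void)
{
    static char fname[] = "FILE1.DAT";      /* made-up file name */
    char buf[512];
    struct FAB fab = cc$rms_fab;
    struct RAB rab = cc$rms_rab;
    int status;

    fab.fab$l_fna = fname;
    fab.fab$b_fns = strlen(fname);
    fab.fab$b_fac = FAB$M_GET | FAB$M_UPD;
    fab.fab$b_shr = FAB$M_SHRGET | FAB$M_SHRPUT | FAB$M_SHRUPD;

    status = sys$open(&fab);
    if (!(status & 1)) return status;

    rab.rab$l_fab = &fab;
    rab.rab$l_ubf = buf;
    rab.rab$w_usz = sizeof(buf);
    rab.rab$l_rop = RAB$M_WAT | RAB$M_TMO;  /* wait for locked records...  */
    rab.rab$b_tmo = 30;                     /* ...but give up after 30 s   */

    status = sys$connect(&rab);
    if (status & 1)
        status = sys$get(&rab);  /* blocks on a locked record; no retry loop */

    sys$close(&fab);
    return status;
}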

 

Is the batch job running on the same node as the online applications?

If it writes to the file then it should! Otherwise it will thrash the XFC.

 

ANALYZE/SYS ... SET PROC <slow>... SHOW PROC/LOCK can be _wonderful_ to spot a lock being waited for.

However... more often than not applications just 'poll'. They try to get a lock and retry after a delay on failure.

In those situations the system does not know a process was really waiting for a lock, and deadlock detection will never be activated.

 

>> We have collected the stats using the below command,

 

$ MONITOR RMS/FILE=FILE_NAME/ITEM=ALL/RECORD=FILE1.DAT/NODISP

Active Streams:   1                  CUR        AVE       MIN        MAX

    New ENQ Rate                 1839.07     251.68      0.00    2899.46
    DEQ Rate                     1839.07     251.67      0.00    2899.46
    Converted ENQ Rate           3985.05     526.56      0.00    6921.88

 

 >> We are not sure how to interpret the above stats.

 

As presented, it is rather useless.

In fact it suggests NO issue, because with 1 active stream there is no lock contention, right?

May we assume that the file is opened shared? With GLOBAL buffers of course?

  

>> From the above stats, we can see that some locking happens on the file, but not much more than that.

 

Correct. So now you need to know what would trigger a lock and what is normal, or not.

 

So let me give you a hint...

 

When a batch job processes a file, updating record by record with primary SEQUENTIAL access, here is what happens:

 

$GET: grab the BUCKET LOCK (ENQ the first time, CVT on subsequent access to the same bucket),

look for a record (1), grab the RECORD LOCK (ENQ), release the bucket lock (CVT),

process...

$UPDATE: (re)grab the BUCKET LOCK (CVT), physically update the record with an IO, release the RECORD LOCK (DEQ), release the bucket lock (CVT),

(1) ... if no more records are found in the bucket, then DEQ the current bucket lock and ENQ the next bucket lock.

 

Therefore... for every record processed one expects 2 lock converts (bucket), 1 ENQ and 1 DEQ, plus 1/N ENQ and 1/N DEQ, where N is the number of records per bucket. This multiplies for every alternate key which is actually changed.

With (ALTERNATE) keyed access you'll get many more converts in all likelihood.

 

So it seems to me that this is in the ballpark, and therefore you are processing roughly 1700 records/sec 'current' and 250 records/second average.

You want to display /ITEM=(OPER,LOCK) to really see if the lock rates are reasonable.

250/second longer-term average is on the low end, unless each update requires a write IO.
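
If you want to sanity-check that arithmetic against your own MONITOR RMS numbers, here is a trivial sketch; the records-per-second and records-per-bucket figures are made-up inputs to play with, not measurements:

/* Back-of-envelope check of the per-record lock arithmetic above. */
#include <stdio.h>

int main(void)
{
    double R = 1800.0;  /* assumed records processed per second */
    double N = 10.0;    /* assumed records per data bucket      */

    double enq = R * (1.0 + 1.0 / N);  /* record lock + occasional new bucket lock */
    double deq = R * (1.0 + 1.0 / N);  /* matching releases                        */
    double cvt = R * 2.0;              /* bucket lock grabbed/released per record  */

    printf("expected: New ENQ %.0f/s, DEQ %.0f/s, Converted ENQ %.0f/s\n",
           enq, deq, cvt);
    return 0;
}

With those made-up inputs you get roughly 1980 ENQs/s, 1980 DEQs/s and 3600 converts/s, which is the same general shape as the CUR column shown above.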

 

Personally I don't think MONI RMS is workable, so I created my own alternative.

 

Here is sample output from that tool for 220 get+updates on a simple 3-records-per-bucket indexed file:

 

$ rms_stats -i -o:rl -r:1000 -n:1000 10k.idx
:
-- 17-MAY-2013 09:53:21.39 --
Locks:             Enqueue     Dequeue     Convert         Wait   Block-ast
-----------------+----------+-----------+-----------+------------+----------
 Shared file:             1           1           0            0           0
 Local buffer:           76          77         516            0           0
:
 Data record:           220         220           0            0
:
Get  seq:     220  key:          0  rfa:          0  bytes:      66000
Deletes:        0                   Updates     220  bytes:      66000

In the above, output lines with counters at 0 (global buffer locks, for example) were deleted.

The C source for RMS_STATS is attached.

 

Hope this helps some,

Hein van den Heuvel  ( at gmail )

Attunity.

 

 

 

JJABEZ10
Occasional Visitor

Re: File Lock / Record Lock Interpretation

Hi Hein van den Heuvel ,

I don't see the attachment. Please attach it again...
JJABEZ10
Occasional Visitor

Re: File Lock / Record Lock Interpretation

Too many questions. I'm trying to answer all the questions raised in the above replies... Please give me some time.

JJABEZ10
Occasional Visitor

Re: File Lock / Record Lock Interpretation

We are using VMS V8.3-1H1, the programs are written in BASIC or COBOL, and the files used are indexed/sequential files. There are 2 web-based applications which the Customer Service representatives use for inquiry/update transactions, and which in turn call the VMS backend for data.

 

The project requirement is as below:

 

In the current production scenario, the online (web-based) applications used by the Customer Service representatives are brought down whenever the nightly batch cycle jobs are run.

 

Now we have a requirement to keep the web-based applications up and running 23 hours a day, 6 days a week. It has been agreed that an SLA of 30 seconds is OK, as we said we would implement retry logic in our code (.BAS / .COB) to retry for up to 30 seconds in case of any record-level lock during updates.

 

I need to take a look at the option - RMS Record lock wait (RAB$V_WAT).


We have identified 38 master files that will be used by these web-based applications, and wrote the retry logic into all the programs (jobs) which use these files and run during the nightly batch cycle.

 

These changes are not yet implemented in Production and we are testing the potential impact due to these changes.

We noticed that whenever we executed the online transactions (web application) in parallel with the nightly batch cycle, there were delays in the response times of the online transactions. We used LoadRunner to simulate a production-like load.

 

Now we need to determine whether the delay in the online screens is because of locks at the file or record level, or because of some other environmental factor.

 

We are collecting the CPU / IO / memory utilization during the nightly batch cycle and could see that the CPU is saturated during the run. We are not sure whether this could be the reason for the delay.

 

For the lock analysis, when I googled I could find the following MONITOR command:

$ MONITOR RMS/FILE=FILE_NAME/ITEM=ALL/RECORD=FILE1.DAT/NODISP


While executing it we could see that there are locks, but from the available stats we really could not tell whether a lock is a file-level lock or a record-level lock.

We could see more active streams during the run; an example is below:

Active Streams:  30                    CUR        AVE        MIN        MAX
 
    New ENQ Rate                      0.00     330.44       0.00   14194.52
 
    DEQ Rate                          0.00     277.87       0.00   12993.56
 
    Converted ENQ Rate                0.00       0.54       0.00      21.32


In "Monitor Lock" command, it gives the lock information at the system level I hope. But what we need is we need to collect the lock at File / Record level.

 

Please let us know if there is any way we can achieve the above requirement.

 

I hope I've answered all your questions; if not, please let me know.

 

Thanks,

 

Regards,
John Jabez

Hein van den Heuvel
Honored Contributor

Re: File Lock / Record Lock Interpretation

>> We are using VMS V8.3-1H1, the programs are written in BASIC or COBOL, and the files used are indexed/sequential files.

Good.

 

>> In the current production scenario, the online (web-based) applications used by the Customer Service representatives are brought down whenever the nightly batch cycle jobs are run.

 

Ugly, but sometimes needed to guarantee consistent data, as RMS does not really give you read-consistent transactions.

 

>> Now we have a requirement to keep the web-based applications up and running 23 hours a day, 6 days a week.

 

 

You are getting it easy! Lots of quiet time.

 

>> It has been agreed that an SLA of 30 seconds is OK, as we said we would implement retry logic in our code (.BAS / .COB) to retry for up to 30 seconds in case of any record-level lock during updates.

 

Retry logic sucks. It increases delays needlessly, and it is inherently unfair.

Also, you need to take actions on both sides... batch and online.

 

>> I need to take a look at the option - RMS Record lock wait (RAB$V_WAT).

Yes, you should.


>> We have identified 38 master files that will be used by these web-based applications, and wrote the retry logic into all the programs (jobs) which use these files and run during the nightly batch cycle.

Ready... Fire... Aim.

 

>> These changes are not yet implemented in Production and we are testing the potential impact due to these changes.

Great.

 

>> We noticed that whenever we executed the online transactions (web application) in parallel with the nightly batch cycle, there were delays in the response times of the online transactions.

Some delays are to be expected. Are they reasonable and predictable within the SLA?

 

>> We used LoadRunner to simulate a production-like load.

Excellent!

 

>> Now we need to determine whether the delay in the online screens is because of locks at the file or record level, or because of some other environmental factor.

Yes you should.

 

>> We are collecting the CPU / IO / memory utilization during the nightly batch cycle and could see

>> that the CPU is saturated during the run. We are not sure whether this could be the reason for the delay.

well yeah duh!

 

 

>> While executing it we could see that there are locks, but from the available stats we really could not tell whether a lock is a file-level lock or a record-level lock.

A file level lock would prevent file access. Unlikely.

The biggest lock activity is often the BUCKET LOCKS, which the application can NOT directly control. They just happen.

The best way to monitor those is ANALYZE/SYSTEM and the RMS and/or LCK extensions in it.

 

>> We could see more active streams during the run; an example is below:

>> Active Streams:  30                    CUR        AVE        MIN        MAX

That makes more sense.

 

>>    New ENQ Rate                      0.00     330.44       0.00   14194.52
 >>    Converted ENQ Rate                0.00       0.54       0.00      21.32

That's an odd picture, which could possibly be explained by very random file access...

 

Regardless... you indicate 38 files.. which ones matter most?

 


>> But what we need is to collect lock information at the file / record level.

I doubt it, because you would have no clue as to what the numbers mean.

Is more better or worse? I can argue either way, but typically MORE is better, unless it reflects avoidable overhead actions.

 

>> Please let us know if there is any way we can achieve the above requirement.

Are the 'hot files' known and identified?

Was the basic file tuning checked?

Are global buffers in use as they should be?

Get a consultant, or 3 months of learning, experiments and experience, whichever is cheaper/faster for you.

 

 

>> I don't see the attachment. Please attach it again...

Hmmm, dunno. Will try again, 'hidden' as a .txt file.

 

Good luck!

Hein