Re: Using LIB$BBSSI and BBCCI for locking

John McL · ‎05-05-2009

Hoff, John

I have to say that I like these forums for the input from multiple sources and the speed of response but they're not great from a "business security" aspect and I don't want to go into too many specifics about the applications.

This consideration of locking is just one of several avenues that I'm looking at with regard to performance. We'll go from Alpha to IA64 one day, but going there with highly efficient code makes good sense.

Hoff, the application and its data files are just too big to load into memory. We did consider installing images /RESIDENT but that comes with its own set of problems (e.g. replacing them, memory fragmentation).

You both talk of ensuring that a resource name is recognised as unique to a node when we want the locking to really only apply locally. Do I understand you to mean that all the activity for the lock for the resource in question is seen by Lock Manager to be only on one node and therefore that the sending of data between nodes for this lock is very minor?

This could be something worth checking and if code modifications are required they should be minor.

Hein van den Heuvel · ‎05-05-2009

John,

It sounds like the lock activity you intend to target is just a fraction of the real lock volume.

50,000 locks per second is piddly... for a GS1280 with GHz CPUs. A 300 MHz 4-CPU ES40, on the other hand, might be stressed by it.

What is the box? How many CPUs/How fast?
Are you using a dedicated lock manager already?

What OpenVMS version?

Did you check with ANALYZE/SYSTEM and SYS$EXAMPLES:SPL.COM where MPSYNCH time is actually being burned, rather than speculating?

What are those batch jobs doing? RMS? Indexed files, or scanning through sequential files? Oracle? Do they compete for shared resources?

IMHO you need to dive a lot deeper into figuring out what the problem is before contemplating a solution.
Unless you know a lot more than you have let on so far, this LIB$BBSSI stuff sounds like a fun but random approach. So far it sounds much like a suggestion to put 2 platinum-tipped sparkplugs in a car to make it run faster... without knowing how many cylinders there are. But hey, platinum is some mighty fine raw material. That oughta help!

Grins,
Hein van den Heuvel ( at gmail dot com )
HvdH Performance Consulting

John McL · ‎05-05-2009

Thanks Hein, I'll come back to your comments later. For now though, a question or two to John and Hoff.

What determines whether a lock is seen as node-specific or cluster-wide? The flag LCK$M_SYSTEM looks more like a SOGW type of thing.

If I have the same lock name being used in code on all nodes will the lock name be regarded as cluster-wide even though my application in each case is really node-specific?

John Gillings · ‎05-05-2009

John,

>What determines whether a lock is seen as
>node-specific or cluster-wide?

Make sure you understand the difference between the LOCK and the RESOURCE. It all comes down to naming and usage.

Resources are inherently and always cluster-wide. When you request a lock against a resource, the first step is to find which node is mastering the resource. That's a directory lookup. If the resource is not mastered locally, or is new, a request always goes out to the cluster. If it's a new resource a master is decided, usually the local node. A locally mastered resource gets a local lock. A remotely mastered resource gets a local lock and a "proxy" lock on the remote node.

If all the interest in a particular resource is on a single node, then all the locking activity will be local.
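To make the steps above concrete, here is a toy model of the lookup. It is only a sketch of the description in this post, not of the real Lock Manager: the class name, the node names and the message counter are all invented for illustration, and real mastering decisions (and remastering) are more involved.

```python
class ToyCluster:
    """Toy model: a directory mapping resource name -> mastering node."""

    def __init__(self):
        self.directory = {}  # resource name -> node that masters it
        self.messages = 0    # simulated requests that went out to the cluster

    def enqueue(self, node, resource):
        """Return the kind of lock 'node' ends up holding on 'resource'."""
        master = self.directory.get(resource)
        if master is None:
            # New resource: the request goes out to the cluster and a
            # master is decided, usually the requesting node (assumed here).
            self.messages += 1
            self.directory[resource] = node
            return "local"
        if master == node:
            # Locally mastered resource: purely local lock, no traffic.
            return "local"
        # Mastered elsewhere: local lock plus a proxy lock on the master.
        self.messages += 1
        return "local+proxy"


cluster = ToyCluster()
cluster.enqueue("NODEA", "TABLE")   # first use, NODEA becomes master
cluster.enqueue("NODEA", "TABLE")   # stays local, no new message
cluster.enqueue("NODEB", "TABLE")   # NODEB needs a proxy lock on NODEA
```

In this model, only the first request and the request from the second node generate traffic, which is the point of keeping all interest in a resource on one node.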

>If I have the same lock name being used in
>code on all nodes will the lock name be
>regarded as cluster-wide even though my
>application in each case is really node-
>specific?

Yes, but it's a RESOURCE name. Locks don't have names.

I think Hoff was suggesting you add the local node name to the resource name to make sure you don't have multiple nodes declaring the same name for what are actually different resources (i.e. node-local global sections). Having an apparently single resource would "work", but you'd be unnecessarily single-threading access to all global sections across all nodes in the cluster.
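A minimal sketch of that naming idea follows. The function name, the underscore separator and the example names are assumptions made for illustration; on OpenVMS the node name would typically come from $GETSYI (item SYI$_NODENAME) and the result would be passed as the resource name to $ENQ, which limits resource names to 31 characters.

```python
MAX_RESNAM_LEN = 31  # $ENQ resource names are limited to 31 characters


def node_resource_name(node, base):
    """Build a resource name that is distinct on every node.

    'node' is the local node name (obtained elsewhere, e.g. via $GETSYI);
    'base' is the application's existing resource name.
    """
    name = f"{node}_{base}"
    if len(name) > MAX_RESNAM_LEN:
        raise ValueError(f"resource name too long: {name!r}")
    return name


# Hypothetical node and resource names:
print(node_resource_name("NODEA", "GSTABLE"))
print(node_resource_name("NODEB", "GSTABLE"))
```

Because the two names differ, each node ends up with its own resource, so locking one node's global section never blocks a process on another node.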

If you're concerned about inter-node lock traffic within the cluster, simply starting all interested applications on one node is a good start towards eliminating it.

Worst case is establishing the resource on one node, then doing all the accesses from another. Usually lock tree migration will move locks to the node with the most activity, but that can cause trouble if the activity moves around (less of an issue in the latest versions of OpenVMS, there's been a lot of work to make sure migrations behave well).

A crucible of informative mistakes

Jonathan Cronin · ‎05-06-2009

With regard to Hoff's suggestion to include the host in the resource name, you should keep in mind that you can also use parent locks to create "namespaces" to avoid collisions on resource names.

Volker Halle · ‎05-06-2009

John,

regarding lock activity and resource trees, the LCK SDA extension may provide some useful insight:

$ ANAL/SYS
SDA> LCK SHOW ACTIVE

Volker.

Hoff · ‎05-06-2009

Here's some reading on locking and lock trees and lock resource names and lock states:

http://labs.hoffmanlabs.com/node/492

And as for contact with a support organization as a sounding board, there are still folks around that know this sort of stuff and that are in the business of fielding these sorts of calls. HP very likely still offers this service, as do other entities.

Jon Pinkley · ‎05-06-2009

John McL,

This question is somewhat like asking if it is better to use RMS indexed files or SYS$IO_PERFORM. It depends.

Now to your specific questions:

(a) What's the overheads?

The overhead of a call to SYS$ENQ will under almost all circumstances be higher than a call to either LIB$BBSSI or LIB$BBCCI (assuming the memory being modified is isolated from other use). But comparing the two is like comparing a routine to update an account balance with an add instruction. The point being that SYS$ENQ does much more than LIB$BBSSI. LIB$BBxxI is a very low level operation compared to what SYS$ENQ does.

(b) What's the performance gain over normal Locks (quantified, not subjective e.g. about 100 x faster)?

In my opinion, this is an unanswerable question, since LIB$BBxxI isn't a locking solution by itself, and we have no idea how you intend to implement your locking protocol. In fact you have told us very little about the real requirements. Specifically, what do you plan to do if the "acquisition of the lock bit" was not successful? Yes, you can just keep trying, but that doesn't scale when there is high contention. And if you want any semblance of FIFO, you will need to implement some sort of queuing. Now your "simple" locking starts becoming not so simple.
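To illustrate the point, here is a sketch of the "just keep trying" approach. It is a model only: the class and function names are invented, and a Python threading.Lock stands in for the hardware interlock that LIB$BBSSI and LIB$BBCCI rely on. What it does borrow from those routines is the shape of the operation: set (or clear) the bit and report its previous state.

```python
import threading
import time


class BitLock:
    """Model of a single lock bit with interlocked set and clear."""

    def __init__(self):
        self._bit = False
        self._interlock = threading.Lock()  # stand-in for the interlock

    def test_and_set(self):
        """Set the bit; return its previous state (the LIB$BBSSI role)."""
        with self._interlock:
            previous = self._bit
            self._bit = True
            return previous

    def test_and_clear(self):
        """Clear the bit; return its previous state (the LIB$BBCCI role)."""
        with self._interlock:
            previous = self._bit
            self._bit = False
            return previous


def acquire(lock, max_tries=1000, pause=0.0001):
    """Retry until the bit was previously clear, or give up."""
    for _ in range(max_tries):
        if not lock.test_and_set():
            return True   # bit was clear, so we now own the lock
        time.sleep(pause)  # crude back-off; no fairness, no queue
    return False           # caller must decide what to do next
```

Note what is missing: waiters are served in no particular order, every waiter burns time polling, and the caller still needs a policy for the False case. Those are exactly the pieces that $ENQ already provides.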

(c) Any gotcha's I should be aware of?

Others have covered this. The biggest gotcha is that implementing a locking protocol that will work under varying conditions appears to be easy until you do it. Then you will start to rediscover all the potential problems.

>>>-------------------------------------------------------------------
There are instances where the locking is trivial - e.g. to assign space in a table in a global section (obviously on one node) - so I'm investigating whether situations like this would be better as bitlocks rather than bouncing lock information around the cluster.
...
The trivial instance that I mentioned involves looking through a table of 1000 entries. Since space is assigned once to each process the potential for contention is minor until the system is heavily loaded, but Lock Manager always sends its information around the cluster.
<<<-------------------------------------------------------------------

Are you really worried about optimizing something that is happening once per process? The cost of the initial "expensive" $ENQ that specifies the resource name is trivial compared to the cost of creating a process.

If space is assigned once to each process, and it is a fixed size, then why not just forget locking and use the process index as an index into the global section? You are guaranteed that two processes won't have the same process index, and you could write the IPID into the process-specific portion so you could detect whether the process that wrote the data in the entry is still around.
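A rough sketch of that scheme, with a plain list standing in for the table in the global section. The class, its method names and the example IPID values are hypothetical; how a process obtains its own process index and IPID, and how it finds out which IPID currently occupies a given index, is left outside the sketch.

```python
class SlotTable:
    """One fixed-size entry per process index; no lock is taken."""

    def __init__(self, size):
        # None marks an entry that has never been claimed.
        self.owner_ipid = [None] * size

    def claim(self, process_index, ipid):
        """A process writes only its own entry, so no two writers collide."""
        self.owner_ipid[process_index] = ipid
        return process_index

    def is_stale(self, process_index, ipid_now):
        """True if the entry was written by a process that has gone away.

        'ipid_now' is the IPID of the process currently occupying this
        index, or None if no process occupies it.
        """
        owner = self.owner_ipid[process_index]
        return owner is not None and owner != ipid_now


table = SlotTable(1000)
table.claim(5, 0x00010005)            # hypothetical IPID for index 5
print(table.is_stale(5, 0x00010005))  # same process still there
print(table.is_stale(5, 0x00020005))  # index reused by a newer process
```

The design choice here is to trade the search through the table, and the lock that protects it, for a direct lookup whose uniqueness the system already guarantees.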

I have to agree with John Gillings, Hoff, Hein and others. Make sure that locking is really the problem before deciding that getting rid of $ENQ will solve your problems.

If you have a copy of the VAX/VMS 5.2 IDSM, perhaps you also have a copy of "VAXcluster Principles" by Roy G. Davis. If so, read chapter 6 on the Distributed Lock Manager. That is one of the best descriptions of the Lock Manager I am aware of. It's a bit dated, but the principles of the lock manager are still basically the same as described there or in the 5.2 IDSM. There have been enhancements to functionality and optimizations in cluster messaging, lock remastering, etc. with newer versions, but the basic steps are the same.

Jon

it depends

John McL · ‎05-06-2009

(If this appears multiple times please don't blame me. Each time I hit "submit" the webpage went blank. I've waited a while between re-posts but perhaps they are queueing up somewhere.)

What started out as an idea to move some trivial (short-term) locking to bit-locks has grown into something else.

I've checked the lock in question. For historical reasons it was clusterwide, and it never got changed when that was no longer needed. I was correct in believing that the lock was causing inter-node traffic, but I hadn't considered that processes on other machines might be getting blocked because their own local copies of the data structure in question were being locked when they didn't need to be.

We'll now modify it to refer to a node-specific resource name and should see less inter-node traffic, lower MPSYNCH and processes on other nodes continuing when they would otherwise be blocked.

Just in case anyone is still considering bit-locking, what happens if the locking process aborts? How do you clear the lock? The Lock Manager will sort that out for you.

Hein and Jon, I don't think there's much point in responding to your questions (but don't worry, I'll still award points).

Bearing in mind that these forums should be a resource for anyone seeking solutions I'll keep this open for about 24 hours in case anyone wants to add anything to it.

Volker Halle · ‎05-06-2009

John,

>Just in case anyone is still considering bit-locking, what happens if the locking process aborts? How do you clear the lock?

Just an analogy: consider OpenVMS spinlocks. The system will crash with a CPUSPINWAIT bugcheck if some component acquires a spinlock and goes away without releasing it.

So you also need to design timeouts and error handling into this mechanism.
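As a sketch of what that extra design work looks like, the model below records the owner in the lock word, gives up after a timeout, and breaks the lock if the holder has died. Everything here is an assumption for illustration: the names are invented, a threading.Lock stands in for the interlocked instruction, and is_alive is a caller-supplied check (on OpenVMS it might be built on $GETJPI).

```python
import threading
import time


class OwnedBitLock:
    """Lock word that remembers which process holds it."""

    def __init__(self):
        self.owner = None
        self._interlock = threading.Lock()  # stand-in for the interlock

    def try_acquire(self, pid):
        with self._interlock:
            if self.owner is None:
                self.owner = pid
                return True
            return False

    def release(self, pid):
        with self._interlock:
            if self.owner != pid:
                raise RuntimeError("lock released by a non-owner")
            self.owner = None

    def break_if_dead(self, is_alive):
        """Clear the lock if its holder no longer exists."""
        with self._interlock:
            if self.owner is not None and not is_alive(self.owner):
                self.owner = None
                return True
            return False


def acquire_with_timeout(lock, pid, timeout, is_alive):
    """Try to take the lock, recovering from a dead holder, until timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if lock.try_acquire(pid):
            return True
        if lock.break_if_dead(is_alive):
            continue  # holder was gone; retry immediately
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.001)
```

Even this leaves open what state the protected data was left in when the holder died, which is part of why leaving the cleanup to the Lock Manager is attractive.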

Volker.
