Using LIB$BBSSI and BBCCI for locking
05-04-2009 07:25 PM
(a) What are the overheads?
(b) What's the performance gain over normal locks (quantified, not subjective; e.g. about 100x faster)?
(c) Are there any gotchas I should be aware of?
We run several hundred images and have a large number of users, so contention can be a real issue.
05-04-2009 08:41 PM
I assume you're talking about implementing your own spin locks?
LIB$BBSSI works the same on all architectures, but the underlying instructions are different.
Performance gains or losses depend on contention. If you have very little contention and short critical regions (that is, most requests for the lock are granted immediately and locks are only held for a short time), spinlocks can be exceptionally fast. But if you have high levels of contention, long critical regions, or both, performance can be terrible, with waiting processes burning CPU. The more processes join the mix, the worse things get.
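The behaviour described above is easy to see outside OpenVMS too. The sketch below is a portable C11 analogue, not the LIB$ routines themselves: `atomic_fetch_or`/`atomic_fetch_and` stand in for LIB$BBSSI/LIB$BBCCI (an interlocked set or clear that reports the bit's previous state), and the `bitlock_*` names are hypothetical. Note that a waiter spins, so it consumes CPU for as long as the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A word of lock bits; each bit is an independent lock. */
typedef atomic_uint bitlock_word;

/* Interlocked set of bit n (stand-in for LIB$BBSSI).
   Returns true if the bit was previously clear, i.e. we now own it. */
static bool bitlock_try(bitlock_word *w, unsigned n)
{
    unsigned mask = 1u << n;
    return (atomic_fetch_or_explicit(w, mask, memory_order_acquire) & mask) == 0;
}

/* Busy-wait until bit n is ours. Burns CPU the whole time it waits. */
static void bitlock_acquire(bitlock_word *w, unsigned n)
{
    unsigned mask = 1u << n;
    while (!bitlock_try(w, n)) {
        /* Spin on a plain load so waiters don't keep writing the word. */
        while (atomic_load_explicit(w, memory_order_relaxed) & mask)
            ;
    }
}

/* Interlocked clear of bit n (stand-in for LIB$BBCCI). */
static void bitlock_release(bitlock_word *w, unsigned n)
{
    unsigned mask = 1u << n;
    unsigned old = atomic_fetch_and_explicit(w, ~mask, memory_order_release);
    assert(old & mask); /* releasing a lock we don't hold is a bug */
    (void)old;
}
```

Usage is simply `bitlock_acquire(&word, n);`, a short critical region, then `bitlock_release(&word, n);`. Everything that makes spinlocks risky (priority, contention, dead holders) lives outside these few lines.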
It's impossible to give you a simple number as it's entirely dependent on load and contention.
Gotchas? There are lots and lots of them. Just looking at one: priority equalisation. Spinlocks don't work well with processes at different priorities. On uniprocessors this is fatal, as a higher priority requesting process will starve out a lower priority process holding the lock. Deadlock and dead system. On multiprocessors this isn't necessarily fatal, unless you get a low priority process holding the lock and N higher priority processes requesting it (where N is the number of available processors), but you can still end up with strange behaviour.
To work around this you should equalise priorities while requesting the lock, so there's your first overhead. Two calls to $SETPRI for each lock request. If you decide to skip this, assuming all your processes will always be at the same priority, you'd better make sure it's clearly and LOUDLY documented, or someone somewhere down the track will start a batch job at priority 3 and everything will break!
NUMA can also do odd things, causing asymmetries.
Typically synchronisation mechanisms are layered, so that the lowest level, "busy wait" mechanisms are used only for very short duration locking of data structures that implement higher level mechanisms like semaphores or VMS style locks. You can use this principle to build your own mechanisms, but you'll soon discover you're just reimplementing the lock manager.
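A minimal sketch of that layering, in portable C11 rather than OpenVMS calls: a busy-wait flag guards a record for only a few instructions, while the long-held "logical" lock is just an owner field inside the record. Everything here (`resource_t`, `resource_request`, the integer process IDs) is hypothetical. The part that would queue and wake deferred requesters is deliberately not shown, and it is exactly the part that turns into a lock manager.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag guard;  /* low-level busy-wait lock, held very briefly */
    int owner;          /* logical lock: 0 = free, else owning process id */
    int deferred;       /* count of requests that were not granted at once */
} resource_t;

static void guard_acquire(resource_t *r)
{
    while (atomic_flag_test_and_set_explicit(&r->guard, memory_order_acquire))
        ; /* spin: the guard is only held for a handful of instructions */
}

static void guard_release(resource_t *r)
{
    atomic_flag_clear_explicit(&r->guard, memory_order_release);
}

/* Returns true if pid now owns the resource. Otherwise the request is
   only counted; a real design would queue it and wake it later. */
static bool resource_request(resource_t *r, int pid)
{
    bool granted;
    guard_acquire(r);
    granted = (r->owner == 0);
    if (granted)
        r->owner = pid;
    else
        r->deferred++;
    guard_release(r);
    return granted;
}

static void resource_release(resource_t *r, int pid)
{
    guard_acquire(r);
    assert(r->owner == pid); /* only the owner may release */
    r->owner = 0;
    guard_release(r);
}
```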
This is non-trivial stuff... The worst issue is you won't necessarily know if there's some tiny timing window waiting to catch you at the least opportune time.
I'd want to see some very strong evidence that there is a real problem with the lock manager before delving into lower level synchronisation mechanisms. What do you think can be improved?
If you really want to do this, I'd recommend building a layer of code that implements an "ideal" locking API for your application, without revealing the underlying mechanism.
Implement it first using the lock manager, for simplicity and robustness. Once that's working, implement a version using spin locks and see if you get any measurable improvement.
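The "ideal" locking API suggested above could be as thin as two calls dispatching through a table of function pointers, so the mechanism can be swapped without touching callers. This is a hypothetical sketch (`app_lock`, `app_lock_ops` and the counter are illustrative names); only a busy-wait backend is shown, because a lock-manager backend wrapping $ENQ/$DEQ needs OpenVMS and would simply supply its own `acquire`/`release` pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct app_lock_ops {
    void (*acquire)(void *impl);
    void (*release)(void *impl);
} app_lock_ops;

typedef struct {
    const app_lock_ops *ops;  /* which mechanism backs this lock */
    void *impl;               /* mechanism-specific state */
    unsigned long acquires;   /* instrumentation: successful acquisitions */
} app_lock_t;

static void app_lock(app_lock_t *l)
{
    l->ops->acquire(l->impl);
    l->acquires++;            /* safe: updated while the lock is held */
}

static void app_unlock(app_lock_t *l)
{
    l->ops->release(l->impl);
}

/* Busy-wait backend; impl points at an atomic_flag. */
static void spin_acquire(void *impl)
{
    atomic_flag *f = impl;
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;
}

static void spin_release(void *impl)
{
    atomic_flag *f = impl;
    atomic_flag_clear_explicit(f, memory_order_release);
}

static const app_lock_ops spin_ops = { spin_acquire, spin_release };
```

Measuring both backends behind the same interface is what makes the "see if you get any measurable improvement" step a fair comparison.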
05-05-2009 04:35 AM
I'm with John G. here.
Though there are no details on the complexity of the application, this switch-over is likely a large project in an application with several hundred images and masses of active users, and you'll need to ensure you're removing the right roadblocks.
You'd need some evidence that locks are the limiting issue here, and a look at moving toward sharding or toward finer-grained locks would (also) be in order, as would re-architecting the locks required and the lock sequences. In particular, make sure you don't actually have a critical-path problem here: a case where critical code sequences serialize the work. (qv: Amdahl's Law, et al. http://labs.hoffmanlabs.com/node/900 and http://labs.hoffmanlabs.com/node/638 among others.)
In addition to the priority inversion deadlocks John G mentioned, there are more direct deadlocks that can (also) arise here and you'll want or need to code deadlock scans for those. (The lock manager does these scans for you.)
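A home-grown deadlock scan has to find cycles in a wait-for graph. This is a minimal sketch under a simplifying assumption: each process waits on at most one lock, and each lock has a single holder, so the graph is just a "who am I waiting for" table. The `waits_for` table and process numbering are hypothetical, and keeping that table accurate is itself work the lock manager otherwise does for you.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 8  /* hypothetical number of tracked processes */

/* waits_for[p] is the process holding the lock that p is waiting for,
   or -1 if p is not waiting. Returns true if p is part of a cycle. */
static bool in_deadlock(const int waits_for[NPROC], int p)
{
    assert(p >= 0 && p < NPROC);
    int cur = waits_for[p];
    /* Any cycle through p must close within NPROC steps. */
    for (int steps = 0; steps < NPROC && cur != -1; steps++) {
        if (cur == p)
            return true;
        cur = waits_for[cur];
    }
    return false;
}
```

A process that is blocked behind a cycle, but not part of it, reports false here; breaking the cycle releases it as well.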
Various applications use sequences of lock acquisition and conversions and releases, and use locks as notification "doorbells" in various designs; features that aren't available with bitlocks. You'll need to find any of those, and figure out how to implement the notifications.
BBCCI and BBSSI also involve the memory controller on some of the boxes; you'll need to be careful around the controller granularity and the bitlock memory locations here as you can end up with subtle lock contention. (With OpenVMS on Alpha or Itanium, you are hitting far fewer instructions than with the lock management calls, though you're getting a memory barrier or a memory fence; these calls are lighter-weight.)
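One way to keep unrelated bitlocks from sharing whatever unit the hardware interlocks on is to give each lock word its own aligned block. This C11 sketch assumes a 64-byte granule purely for illustration; the real granularity depends on the platform and should be checked against its documentation before relying on any particular value.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

#define LINE_SIZE 64  /* ASSUMED granule size; verify for your hardware */

/* Each lock word is aligned to, and padded out to, its own block, so two
   different locks never land in the same block. */
typedef struct {
    alignas(LINE_SIZE) atomic_uint word;
} padded_lock;

static_assert(sizeof(padded_lock) == LINE_SIZE, "one lock per block");

static padded_lock locks[4];  /* four independent, non-adjacent locks */
```

The cost is memory: each lock now occupies a whole block instead of one bit, which is usually a good trade for a handful of hot locks.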
This migration to bitlocks also means you're single-host from here on out, or that you are now rolling your own distributed synchronization. Or both.
For an application structural change this fundamental, I'd likely want to look at the whole design of the application, and instrument the current environment. (This work is a sizable chunk of a full platform port, in practical terms.) If a locking rewrite is on the table, then the whole design of the application is (also) on the table. And I'd also look at where I wanted to end up longer-term, whether that's an application locking layer, or a redesign of how the data rolls and roils through the application environment.
The abstraction layering John G. mentions is a classic OpenVMS application design. That's entirely reasonable here and I might well look to go further here, given the scale of the changes that are involved.
05-05-2009 04:56 AM
I have to agree with John and Hoff: Be careful. There is much potential for a large cost with debatable gain.
The first questions that I would ask are:
- Is a lot of time being spent processing locks?
- Is there a lot of contention?
If the delays are being caused by contention, then the gains by changing mechanisms are limited. The solution to contention is not to change the locking mechanism, but to take a careful look at what is protected by what lock and break that into different locks. This was seen in the changes in recent TCP/IP Services releases relating to IOLOCK8. At the user level, the concept is the same.
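Breaking one lock into several can be as simple as lock striping: each key maps to one of N locks, so operations on unrelated keys rarely contend. A minimal C11 sketch, with a hypothetical stripe count and names; the same idea applies to lock manager resource names (one resource per stripe instead of one for the whole table).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSTRIPES 16u  /* hypothetical; size it from observed contention */

static atomic_bool stripes[NSTRIPES];  /* false = free */

static unsigned stripe_of(unsigned key)
{
    return key % NSTRIPES;
}

/* Returns true if the stripe covering key was free and is now ours. */
static bool stripe_trylock(unsigned key)
{
    return !atomic_exchange_explicit(&stripes[stripe_of(key)], true,
                                     memory_order_acquire);
}

static void stripe_lock(unsigned key)
{
    while (!stripe_trylock(key))
        ; /* spin: only callers whose keys share this stripe wait here */
}

static void stripe_unlock(unsigned key)
{
    assert(atomic_load(&stripes[stripe_of(key)]));
    atomic_store_explicit(&stripes[stripe_of(key)], false,
                          memory_order_release);
}
```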
Performance monitoring using T4 or similar tools to gather statistics is paramount as a first step. If the performance monitoring shows that locking is an issue, the sequence of steps is:
- Tune Lock Manager performance
- Consider the use of Dedicated Lock Manager (a CPU in a multiprocessor dedicated to running the Lock Manager).
- Review the relationships between Lock Manager resources and whether this is creating contention needlessly
- Only then consider whether one should use low level spin-lock mechanisms
The above sequence also roughly tracks the cost and risk of each set of measures. Tuning is the lowest-risk and least expensive option; a full restructuring of the code and debugging of spin-lock mechanisms can be expensive and very demanding.
- Bob Gezelter, http://www.rlgsc.com
05-05-2009 03:07 PM
Maybe if I tell you a little more about the situation you'll better understand where I'm coming from and why I'm considering bitlocks to replace some, but certainly not all, of our locking.
We have a 4-node Alpha cluster that, during evening batch processing, has all its CPUs running at about 100% for about 6 hours. The ENQ/DEQ rate for much of this time seems to average about 50,000 per second, and MPSYNCH on one processor (or maybe one node, I can't recall right now) is around 90%.
There are instances where the locking is trivial - e.g. to assign space in a table in a global section (obviously on one node) - so I'm investigating whether situations like this would be better as bitlocks rather than bouncing lock information around the cluster.
One issue not mentioned yet is releasing locks should processes die, but that's something we can handle through our process monitoring tools, which identify dead processes and release their resources.
Yes, John G, I was planning on putting this in functions that other code calls rather than scattering and/or duplicating it across several images. That's the only sensible way to do it, both for maintenance and for tweaking internal monitoring code.
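For the table-slot case, one option is to drop the table-wide lock entirely: each slot stores its owner's process id, a slot is claimed with a single compare-and-swap, and a monitor can free slots whose owner has died by swapping that id back to zero. This is a hypothetical C11 sketch (names, table size, and the use of plain integers as process ids are assumptions); a real version in a global section may also need to cope with process ids being reused.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 1000  /* assumed table size */

static atomic_uint slot_owner[NSLOTS];  /* 0 = free, else owner pid */

/* Claims the first free slot for pid. Returns its index, or -1 if full. */
static int slot_claim(unsigned pid)
{
    assert(pid != 0);  /* 0 is reserved to mean "free" */
    for (int i = 0; i < NSLOTS; i++) {
        unsigned expected = 0;
        if (atomic_compare_exchange_strong(&slot_owner[i], &expected, pid))
            return i;
    }
    return -1;
}

/* Called by a (hypothetical) process monitor for a dead pid: frees only
   slots still owned by that pid. Returns how many were freed. */
static int slot_reclaim(unsigned dead_pid)
{
    int freed = 0;
    for (int i = 0; i < NSLOTS; i++) {
        unsigned expected = dead_pid;
        if (atomic_compare_exchange_strong(&slot_owner[i], &expected, 0u))
            freed++;
    }
    return freed;
}
```

Because every process starts scanning at slot 0, the first slots become the hot spot under load; starting the scan at an offset derived from the pid is a simple refinement.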
05-05-2009 03:12 PM
05-05-2009 03:28 PM
If you're already getting high levels of MPSYNCH using locks, chances are it would only get WORSE with spin locks covering wider critical regions. Why? At the moment, a process that is waiting for a lock request is not consuming CPU. The MPSYNCH you see is the time spent spinning on OpenVMS spinlocks, waiting to access the lock structures. Spinning for the whole duration of the lock request will be much worse.
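If a busy-wait lock is used anyway, a common mitigation is to spin only briefly and then give up the CPU. This C11 sketch uses POSIX `sched_yield()` as a stand-in for whatever wait primitive the target system offers; the spin bound is a hypothetical value. It reduces wasted CPU but does not eliminate it: unlike a process waiting in the lock manager, a yielding waiter still keeps coming back to check.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_LIMIT 1000  /* hypothetical bound; tune by measurement */

/* Acquires *lock. Spins briefly in case the holder is about to release,
   then yields the CPU between rounds. Returns the number of yields. */
static unsigned spin_then_yield(atomic_bool *lock)
{
    unsigned yields = 0;
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (!atomic_load_explicit(lock, memory_order_relaxed) &&
                !atomic_exchange_explicit(lock, true, memory_order_acquire))
                return yields;
        }
        sched_yield();  /* POSIX call, used here as a stand-in */
        yields++;
    }
}
```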
Look at the granularity of the locks, and try to subdivide the objects of contention. Reduce MPSYNCH and increase parallelism.
Another thing to consider... if these are all batch jobs, what would happen if you ran them sequentially? That might eliminate the contention altogether. It's entirely possible you will complete the sequence faster than running them all in parallel.
05-05-2009 03:55 PM
Only knowingly replicate "dumb"; don't blindly do so.
And don't blindly replicate an older application design.
There have been cases I've worked where it was far faster to load the whole data store into memory and run with it; disks and files are, after all, a convenience for restricted virtual and physical memory. Stuff was designed prior to 64-bit addressing, and when a couple of gigabytes was Big Physical Memory.
Ensure you've properly segmented your cluster and your host-local locks, too. If your global sections are host-local, then embed the host name or such into the lock resource name. Keep the locks and lock trees local.
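Embedding the node name in the resource name is just string construction; the main trap is the length limit. A hypothetical C sketch, assuming a 31-byte resource-name limit for $ENQ (check the System Services Reference for your version) and that the node name is passed in by the caller:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RESNAM_MAX 31  /* ASSUMED $ENQ resource-name limit, in bytes */

/* Builds "<host>_<object>" so locks covering host-local data (such as a
   non-shared global section) get a distinct resource per node.
   Returns 0 on success, -1 if the name would be too long. */
static int make_local_resnam(char *out, size_t outsz,
                             const char *host, const char *object)
{
    assert(out != NULL && host != NULL && object != NULL);
    size_t len = strlen(host) + 1 + strlen(object);  /* host + '_' + object */
    if (len > RESNAM_MAX || len >= outsz)
        return -1;
    snprintf(out, outsz, "%s_%s", host, object);
    return 0;
}
```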
I'd look to spend time increasing the scope of what is locked or reducing the critical-path code (once that is known), looking to tweak the current model, before I started a locking rewrite.
Then look to get rid of the allocation of space if you can. Or reduce the number of times the application needs to go after it. This could be using sharded or cached allocation of storage, or going to interlocked queues and lookaside lists of allocated or deallocated, or going after bigger hunks.
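"Going after bigger hunks" can be sketched as cached allocation: one interlocked operation hands a whole batch of entries to a process, which then allocates from its private range with no further synchronisation. A hypothetical C11 sketch (batch size, names, and the absence of a free path are all simplifications; entries left in a cache when a process exits are lost unless returned).

```c
#include <assert.h>
#include <stdatomic.h>

#define BATCH 32u  /* hypothetical number of entries claimed at once */

typedef struct {
    atomic_uint next;   /* first entry not yet handed out */
    unsigned capacity;  /* total entries in the shared pool */
} pool_t;

typedef struct {
    unsigned cur, end;  /* this process's private range [cur, end) */
} cache_t;

/* Returns an entry index, or -1 once the shared pool is exhausted. */
static long cache_alloc(pool_t *pool, cache_t *c)
{
    assert(c->cur <= c->end);
    if (c->cur == c->end) {
        /* Refill: claim up to BATCH entries with one compare-and-swap. */
        unsigned start = atomic_load(&pool->next), end;
        do {
            if (start >= pool->capacity)
                return -1;
            end = pool->capacity - start > BATCH ? start + BATCH
                                                 : pool->capacity;
        } while (!atomic_compare_exchange_weak(&pool->next, &start, end));
        c->cur = start;
        c->end = end;
    }
    return (long)c->cur++;
}
```

With a batch of 32, the shared counter is touched once per 32 allocations instead of once per allocation.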
There are tools around beyond PCA, such as the LCK extension in SDA, and DECamds/AvailMan that can be useful, too.
And do the due diligence involved with tuning; look for overloaded disk spindles and such.
05-05-2009 03:56 PM
The trivial instance that I mentioned involves looking through a table of 1000 entries. Since space is assigned once to each process the potential for contention is minor until the system is heavily loaded, but Lock Manager always sends its information around the cluster.
Modifying the granularity sounds useful but there could be significant work in modifying the code and in testing. If we merely split something into smaller portions it might be necessary to lock and access multiple portions until the desired item is found.
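Searching several portions doesn't necessarily mean holding several locks: the search can take one partition lock, look, release it, and move on, so no process ever holds two at once (and no lock-ordering deadlock can arise). A hypothetical C11 sketch that splits a 1000-entry table into 8 partitions and starts each search at a partition chosen from the pid to spread processes out:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NPART 8        /* hypothetical number of partitions */
#define PER_PART 125   /* 8 x 125 = 1000 entries */

static atomic_bool part_lock[NPART];     /* false = free */
static unsigned owner[NPART][PER_PART];  /* 0 = free entry */

static void part_acquire(int p)
{
    while (atomic_exchange_explicit(&part_lock[p], true, memory_order_acquire))
        ;
}

static void part_release(int p)
{
    atomic_store_explicit(&part_lock[p], false, memory_order_release);
}

/* Returns the global index of the entry claimed for pid, or -1 if full.
   Holds at most one partition lock at any time. */
static int find_free(unsigned pid)
{
    assert(pid != 0);
    for (int k = 0; k < NPART; k++) {
        int p = (int)((pid + (unsigned)k) % NPART);
        part_acquire(p);
        for (int i = 0; i < PER_PART; i++) {
            if (owner[p][i] == 0) {
                owner[p][i] = pid;
                part_release(p);
                return p * PER_PART + i;
            }
        }
        part_release(p);  /* partition full: release before moving on */
    }
    return -1;
}
```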
I'm already considering our options for reducing that batch processing load and trying to identify the costs and benefits of each.
One point you've not commented on is whether 50,000 ENQ/DEQ operations per second is high, normal or low when running a whole heap of batch jobs.
05-05-2009 04:09 PM
>high, normal or low when running a whole
>heap of batch jobs.
On a VAX it might be an issue, but on an Itanium it's no big deal, especially if the locking activity is local. See MONITOR DLOCK.
On the other hand, are they really ENQ/DEQ? If a single process deals with the same resource multiple times, you might consider an ENQ NL at the start, then use lock conversions to synchronize. DEQ when you've completely finished with the resource.
>but Lock Manager always sends its
>information around the cluster.
Not true. You need to check the resource name cluster wide, but once you have a lock on a locally mastered resource, there is no further external activity.
Keeping all the interested processes on a single node should keep the resource local. If the resource is a global section, then that should already be true.