
Semaphore question

 
Craig A. Sharp
Super Advisor


Hi all,

We are having performance issues on our HP-UX 11i v1 system. It is a 32-way Superdome with plenty of memory.
We are seeing unexplained load spikes and we are working with the software vendor to try to find the cause.
We have been told that we should be monitoring semaphore utilization. Using MWA, we have collected the following data on Semaphore Wait % and Wait Time. We have determined that the wait % does not correlate with the wait time. The high values that we are seeing occur during extended periods of very high load.
I am looking for an explanation of what I am seeing with these high semaphore time spikes.
Our semmni and semmns are both set at 3072.

The data is consolidated from a long time period.

Thanks,

Craig

Here is the data:

Sem Wait %; Sem WaitTm;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
10.31; 33856.691;
1.16; 3771.003;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.13; 35.777;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.56; 142.137;
0.00; 0.000;
0.00; 0.000;
7.47; 2030.291;
3.03; 0.120;
1.48; 310.002;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
6.98; 1752.705;
1.56; 346.436;
0.00; 0.000;
0.00; 0.000;
0.00; 0.000;
0.36; 91.137;
0.00; 0.000;
0.00; 0.000;
44.11; 10705.630;
14.59; 3770.927;
0.00; 0.000;
0.00; 0.000;
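
One way to check whether the two columns track each other is to divide the wait time by the wait % for every non-zero sample. The sketch below does that with the values copied from the table above. The helper name `time_per_percent` is made up for illustration, and the units of the WaitTm column are not stated in this thread, so the ratios are only meaningful relative to each other.

```python
# Non-zero samples copied from the table above: (Sem Wait %, Sem WaitTm).
samples = [
    (10.31, 33856.691),
    (1.16, 3771.003),
    (0.13, 35.777),
    (0.56, 142.137),
    (7.47, 2030.291),
    (3.03, 0.120),
    (1.48, 310.002),
    (6.98, 1752.705),
    (1.56, 346.436),
    (0.36, 91.137),
    (44.11, 10705.630),
    (14.59, 3770.927),
]


def time_per_percent(rows):
    """Return WaitTm divided by Wait % for each (pct, time) row."""
    return [wait_tm / wait_pct for wait_pct, wait_tm in rows]


ratios = time_per_percent(samples)
for (pct, tm), ratio in zip(samples, ratios):
    print(f"{pct:6.2f} %  {tm:10.3f}  ->  {ratio:9.2f} per percentage point")
```

If the columns were strictly proportional, every ratio would be the same. In this data most ratios fall between roughly 200 and 300, two samples are above 3000, and one is below 1, so the outliers are the samples worth lining up against the load spikes.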
TwoProc
Honored Contributor

Re: Semaphore question

Craig, a semaphore can be thought of as a handle to a lock. So, a lock is being taken while some update operation takes place, and you've got process(es) waiting for their turn at the locked resource (in a file).

If you look in the database and do a latch analysis, you'll see a similar problem (latch waits).

It's pretty simple: you've got busy areas in your database with a lot of contention for the same resource. You'll need to find and identify it.

I'd start with a statspack analysis from Oracle, and begin looking for the top code being called, on two fronts - total time running, and total number of executions. Your problem code is probably somewhere in there, and it probably needs tuning.

If you find it's tuned (it's probably not... just programs in general), then you've got to start looking at how the data is being striped across the data areas. You should make sure the data involved in the above criteria (run time and number of executions) is striped across a good many database files. Also, all tables and (hopefully) indexes involved should have their INITRANS levels set high enough to keep the latch contention levels low. If it's a busy table, a rough guess would be to start at 12 (vs default of 2) and work your way up from there.

Anyway, you've got database tuning to do.
We are the people our parents warned us about --Jimmy Buffett
TwoProc
Honored Contributor

Re: Semaphore question

Oh, I (for some reason) just assumed that you're using Oracle for some of what I've told you. But if you're using another database (Sybase, Informix, MySQL), your problem is still basically the same.
We are the people our parents warned us about --Jimmy Buffett
A. Clay Stephenson
Acclaimed Contributor

Re: Semaphore question

It's difficult to make much of these statistics without putting them into context (i.e., the design of a given application). If one process (or thread) is blocked on a semaphore, it generally means that it is waiting for some action to be completed or some result to be computed by another process (or thread) before it can proceed.
In short, blocking on a sema4 can be a perfectly normal thing and no amount of changing kernel tunables will help in that case. Depending upon the application design, you might see Process A blocking on a sema4 but what it is really doing is waiting for Process B to complete some task (possibly i/o bound) and set the semaphore to tell Process A to proceed.
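
The Process A / Process B pattern can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python threads and `threading.Semaphore` rather than the System V semaphores that semmni and semmns govern, and the names `run_demo`, `process_a`, and `process_b` are hypothetical. The point is that all of A's "semaphore wait" time is really B's work time.

```python
import threading
import time


def run_demo(work_seconds=0.2):
    """Thread A blocks on a semaphore until thread B finishes its task."""
    done = threading.Semaphore(0)  # starts at 0, so acquire() blocks
    result = {}

    def process_b():
        time.sleep(work_seconds)   # stand-in for I/O-bound work
        done.release()             # set the semaphore: A may proceed

    def process_a():
        start = time.monotonic()
        done.acquire()             # blocked here; shows up as semaphore wait
        result["waited"] = time.monotonic() - start

    a = threading.Thread(target=process_a)
    b = threading.Thread(target=process_b)
    a.start()
    b.start()
    a.join()
    b.join()
    return result["waited"]


waited = run_demo()
print(f"A spent {waited:.3f} s blocked on the semaphore")
```

Making B's task faster is the only thing that shortens A's wait here; nothing about the semaphore itself is the bottleneck.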





If it ain't broke, I can fix that.
Howard L. Curtis
New Member

Re: Semaphore question

Any thoughts as to why the 'wait %' and the 'wait time' each fail to correlate with the interval time as seen by MWA? We are seeing the same manifestation for a number of processes running under 11.11. As an example, MWA tells me that INTERVAL (measured in seconds) for PROCESS A is 60. The PROC_THREAD_COUNT is 1, PROC_SEM_WAIT_PCT is 0.09 (is this actually the ratio, or 0.09 of 1%?), and PROC_SEM_WAIT_TIME is 302.897 seconds.
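
The arithmetic behind that mismatch can be written out as a quick check. This sketch assumes the percentage is taken against INTERVAL multiplied by the thread count, which is a guess about how MWA defines the metric and not something confirmed in this thread. The function name `implied_wait_seconds` is hypothetical.

```python
interval_s = 60.0           # INTERVAL
threads = 1                 # PROC_THREAD_COUNT
sem_wait_pct = 0.09         # PROC_SEM_WAIT_PCT
sem_wait_time_s = 302.897   # PROC_SEM_WAIT_TIME


def implied_wait_seconds(interval, thread_count, value, as_percent):
    """Wait time implied by `value`, read as a percent or as a plain ratio.

    Assumes the base is interval * thread_count (an assumption, see above).
    """
    available = interval * thread_count
    fraction = value / 100.0 if as_percent else value
    return available * fraction


if_percent = implied_wait_seconds(interval_s, threads, sem_wait_pct, True)
if_ratio = implied_wait_seconds(interval_s, threads, sem_wait_pct, False)
exceeds_interval = sem_wait_time_s > interval_s * threads

print(f"read as percent: {if_percent:.3f} s")
print(f"read as ratio:   {if_ratio:.3f} s")
print(f"reported time exceeds the interval: {exceeds_interval}")
```

Read as a percent, 0.09 implies about 0.054 seconds of waiting; read as a ratio, about 5.4 seconds. Neither is close to 302.897 seconds, which is also more than a single thread could spend waiting in a 60-second interval, so the two metrics do not appear to be measured over the same window.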