
Latency problem

 
Peter Hug
Advisor

Latency problem

A client running my multithreaded app on an HP-UX 11.11 B system with two CPUs is experiencing latency issues.

I can't quite understand why this would be as the app runs with a lower priority than most other apps (using renice -n 10).

I would have thought that this would ensure that if any other process with a higher nice value was pending the OS would immediately switch from my app to the other app.

Memory utilisation is below 80% and I would have thought that therefore, potential delays caused by swapping memory to disk and back could be ruled out.

The only unknown factor to me is Oracle. My app heavily interacts with a particular Oracle instance which runs at a normal priority.

Still, the bottom line is that when my app is under heavy load, users experience noticeable performance degradation. The client suggested using tools like PRM or WLM, but I can't understand why these would be needed if the OS worked as expected.

One last thing, my app is always started using a shell script and piping commands. Is it possible that this could have an impact?

Many thanks for any help
Pete
Steven E. Protter
Exalted Contributor

Re: Latency problem

Hello Peter,

You may have a poorly written application that requires more processors or memory.

That is, however, speculation. You may wish to collect some performance data to determine the actual issue.

Here is a script:

http://www.hpux.ws/system.perf.sh

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
A. Clay Stephenson
Acclaimed Contributor

Re: Latency problem

"I would have thought that this would ensure that if any other process with a higher nice value was pending the OS would immediately switch from my app to the other app."

Think again. The nice value is only one component that the scheduler uses when comparing processes in the same scheduler class (e.g. Time-Sharing as opposed to Real-Time). Nice should really be thought of as a hint.

In general, when processes are not running, the scheduler raises their priorities and while they are running, their priorities are decreased. During execution the priority decreases linearly; during waiting the priority increases exponentially - most rapidly when the CPU load is low and least rapidly when the CPU load is high.

When a process other than the current process reaches a higher priority, the scheduler suspends the current process and starts running the process with the higher priority.

It's time to profile your code and determine what it is doing. The idea that you needed to (not) nice your code was a big hint that you probably should have thought of a better way.
If it ain't broke, I can fix that.
Peter Hug
Advisor

Re: Latency problem

Thanks for your hints so far guys. Your suggestions made me think that you might need a bit of an idea about the nature of my application to actually understand the problem.

I'm using the boost thread library for threads. My app runs as a daemon and has no UI. Think of my application as consisting of multiple concurrent threads, each of which calculates the next move in a chess game and then waits for the opponent's move.

The latency issue I was talking about was not between threads of my app, it was that when my app was running under heavy load, users of other applications on the same system would notice significant performance degradation.

So what I really need is an answer to this question: "How can I ensure that the threads of my process run with a priority lower than normal and behave the way a 'system idle' process does under Windows?".
A. Clay Stephenson
Acclaimed Contributor

Re: Latency problem

Okay. Note that you said that you renice'd this process with -10; that had exactly the opposite effect of what you were looking for. I would modify your code so that very early in main you do a nice(myniceval) and make it positive (10 or so). You could add a command line option or an environment variable that would allow you to set it. One of the typical things that I do with a daemon is a nice() very early on. The good news is that nice() affects all the threads in this process.
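
A minimal sketch of that early nice() call, for illustration only. The environment variable name MYAPP_NICE and both helper names are invented here, and the accepted range of 0 to 19 is an assumption that may differ on your system. Note that nice() can legitimately return -1, so errno has to be cleared and checked.

```cpp
#include <cerrno>
#include <cstdlib>    // getenv(), strtol()
#include <unistd.h>   // nice()

// Hypothetical helper: read the nice increment from the (invented)
// environment variable MYAPP_NICE, falling back to default_inc when the
// variable is unset, malformed, or outside the assumed range 0..19.
int nice_increment_from_env(int default_inc) {
    const char* s = std::getenv("MYAPP_NICE");
    if (s == nullptr || *s == '\0') return default_inc;
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > 19) return default_inc;
    return static_cast<int>(v);
}

// Lower the scheduling priority of the calling process by inc.
// Returns false only if nice() reported a real error.
bool lower_priority(int inc) {
    errno = 0;                      // -1 can be a valid return value
    int r = nice(inc);
    return !(r == -1 && errno != 0);
}
```

The idea would be to call lower_priority(nice_increment_from_env(10)) as one of the first statements in main(), before any worker threads are started.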
If it ain't broke, I can fix that.
A. Clay Stephenson
Acclaimed Contributor

Re: Latency problem

Oh, and I don't know or care how the system idle process works under Windows. Silly me, I would have thought a process named "idle" would have simply consumed a few more of the otherwise unused system resources.
If it ain't broke, I can fix that.
Peter Hug
Advisor

Re: Latency problem

I said I changed the nice value with the command:

renice -n 10 -p [pid]

Where [pid] was the process ID of my daemon. When I look at the process using the command:

ps -elf

The nice value is shown as 30, i.e. exactly as intended.

So obviously, setting the nice value is not the correct means to ensure the process only runs when the system is otherwise idle.

What is the correct way?
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Latency problem

Sorry, I should have read your initial posting more carefully, as I obviously mistranslated your "renice -n 10" to "renice -n -10". In any event, I would use the nice() system call or, better yet, the setpriority() system call. You can use the getpriority() system call to check the before and after values. Don't be afraid to use values > 10.

One of the very first things to check, because it can cause all sorts of scheduling anomalies, is the timeslice tunable. If it has been set to 1 then that could cause all sorts of strange behavior. It should be left very near 10.
It would also help to know if your threads are doing intense computation and how loaded the machine is in terms of CPU use.
One trick that you might consider is sprinkling some nanosleep() calls in your code. I'm thinking of a separate daemon that looks for heavy system usage and signals your process to start making nanosleep() calls, then ramps them back down as the load drops.
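
As a rough sketch of the setpriority()/getpriority() approach mentioned above: the helper name and the error sentinel are made up for this example, and the valid range for the priority value should be taken from the man page on your system rather than from this code.

```cpp
#include <cerrno>
#include <sys/resource.h>   // getpriority(), setpriority()

// Hypothetical error sentinel, chosen to lie outside any valid nice value.
const int kPriorityError = -1000;

// Set the nice value of the calling process to target, then read it back
// so the caller can confirm what the system actually applied.
int set_and_verify_nice(int target) {
    if (setpriority(PRIO_PROCESS, 0, target) != 0) return kPriorityError;
    errno = 0;                              // -1 can be a valid priority
    int now = getpriority(PRIO_PROCESS, 0);
    if (now == -1 && errno != 0) return kPriorityError;
    return now;
}
```

Comparing the returned value with the requested one gives the "before and after" check without having to look at ps output.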

If it ain't broke, I can fix that.
Peter Hug
Advisor

Re: Latency problem

I checked the timeslice tunable parameter and found that it is set at 10.

Yes, each thread does intensive computing (which is why I compared it to a chess program). CPU utilisation varies, but when my app is under heavy load on a test system where nothing else runs, CPU utilisation (as reported by glance) goes close to 100%.

I like your nanosleep suggestion.

I don't understand, though, why I need another daemon. I really have no idea how I can work out if the system is busy or idle. But if I can determine that, I suppose I could write another thread which periodically checks system load and, if this is too high, sets some shared variable which, when turned on, would make the worker threads insert nanosleep() calls.

What do you think?
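
The shared-variable idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the use of std::atomic instead of whatever the boost thread library offers, and the two thresholds (a high mark to switch throttling on and a lower one to switch it off, so the flag does not flap when the load hovers around a single value).

```cpp
#include <atomic>
#include <time.h>     // nanosleep(), timespec

// Set by the monitor thread, read by every worker thread.
std::atomic<bool> g_throttle(false);

// Workers call this between units of work. If throttling is on, the
// caller sleeps for pause_ns nanoseconds. Returns true if it slept.
bool maybe_yield_cpu(long pause_ns) {
    if (!g_throttle.load(std::memory_order_relaxed)) return false;
    timespec ts;
    ts.tv_sec  = pause_ns / 1000000000L;
    ts.tv_nsec = pause_ns % 1000000000L;
    nanosleep(&ts, nullptr);
    return true;
}

// One step of the monitor: switch throttling on above `high`, off below
// `low`, and leave it unchanged in between (simple hysteresis).
void update_throttle(double load, double high, double low) {
    if (load > high)      g_throttle.store(true);
    else if (load < low)  g_throttle.store(false);
}
```

The monitor thread would call update_throttle() periodically with whatever load figure it can obtain, and the workers would call maybe_yield_cpu() at convenient points in their computation.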
A. Clay Stephenson
Acclaimed Contributor

Re: Latency problem

You may be forced to use more granular controls. Do a man pthread_attr_setschedpolicy (which will actually describe many functions).
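
For orientation, here is a hedged sketch of how the functions described on that man page fit together when creating a thread with an explicit policy. The helper names are invented, the sketch uses raw pthreads rather than the boost thread library, and which policies and priorities an unprivileged process may use depends on the system.

```cpp
#include <pthread.h>
#include <sched.h>

// Hypothetical demo worker: sets the int it is handed to 1 and exits.
void* demo_worker(void* arg) {
    *static_cast<int*>(arg) = 1;
    return nullptr;
}

// Create a thread that uses `policy` at the lowest priority that policy
// allows. Returns 0 on success or an error number from the pthread calls.
int start_low_priority_thread(pthread_t* tid, int policy,
                              void* (*fn)(void*), void* arg) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;

    sched_param sp = {};
    sp.sched_priority = sched_get_priority_min(policy);

    // Without PTHREAD_EXPLICIT_SCHED the new thread inherits the
    // creator's scheduling and the attributes below are ignored.
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0) rc = pthread_attr_setschedpolicy(&attr, policy);
    if (rc == 0) rc = pthread_attr_setschedparam(&attr, &sp);
    if (rc == 0) rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```

A call such as start_low_priority_thread(&tid, SCHED_OTHER, demo_worker, &flag) uses the default time-sharing policy; other policy constants are system specific.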
If it ain't broke, I can fix that.
David Gourley
Occasional Contributor

Re: Latency problem

Oracle recommends that Oracle processes run with the SCHED_NOAGE scheduling policy. See:

http://www.dbis.informatik.uni-goettingen.de/Teaching/oracle-doc/admin-guide/appb_hp.htm#i636964

Actually what we found was that to avoid obscure scheduling problems, we ended up having to run *all* processes on the server at this priority (as otherwise Oracle processes ran at a lower priority than application processes which can cause starvation of the Oracle processes).
Peter Hug
Advisor

Re: Latency problem

I can easily work out how to suspend/resume my worker threads from another thread that monitors system load.

What I don't know is how to check system load in a way that is extremely responsive and non-intrusive. Any hints?
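
One cheap way to sample load, sketched under loud assumptions: getloadavg() is a BSD/glibc call and may not exist on HP-UX 11.11, where the pstat family (see man pstat_getdynamic) is the native interface for system-wide figures. The load average is also a smoothed value, so it is non-intrusive but not "extremely responsive". The helper names are invented for this example.

```cpp
#include <cstdlib>    // getloadavg() on glibc and the BSDs (assumption:
                      // not necessarily available on HP-UX)

// Returns the 1-minute load average, or -1.0 if it cannot be read.
double one_minute_load() {
    double avg[1];
    if (getloadavg(avg, 1) != 1) return -1.0;
    return avg[0];
}

// Normalise a load figure by the number of CPUs, so that a value near
// 1.0 means roughly "every CPU is busy". Returns the load unchanged if
// ncpu is not positive.
double load_per_cpu(double load, long ncpu) {
    return ncpu > 0 ? load / static_cast<double>(ncpu) : load;
}
```

On the two-CPU machine described in this thread, load_per_cpu(one_minute_load(), 2) would be the figure a monitor thread compares against its thresholds.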