Operating System - HP-UX
07-30-2002 01:43 PM
can't explain a sudden high load average
Typically my server runs at a load of 1, but at some point the load goes up to 30, and at times 50 ... yes, 50, while no users seem to notice or complain.
Swap, disk, and CPU usage are low in glance, while memory (4GB) is at about 90% usage. We are running Java apps that talk over the TIBCO Rendezvous bus (rvd).
I think the sudden jump in the load average happens when our "rvd" daemon has many open files (~1500 sockets), but I am only guessing.
What causes the load average to go up? I always thought it was directly related to how hard the CPUs were working, but I'm not sure that is really correct. I cannot figure out how to correlate the high load average with anything. It may be due to the "rvd" process, but I do not know what to look for: a kernel parameter, lack of RAM (even though swapping is low at best), etc.?
FYI: TIBCO told us to increase maxfiles_lim from 2048 to 8096 (glance tells me that is not the problem), increase maxdsiz from 256MB to 3GB (glance tells me this is not the issue either), and add patches (we have the ones they recommend).
hola
07-30-2002 03:40 PM
Re: can't explain a sudden high load average
Perfectly normal, if that is the way your rvd process works. Let's start with load average: it does not mean what you think. It is simply the average runqueue depth. Now that needs an explanation...
The runqueue is the table where all processes that are running or ready to run are placed. If 4 processes are ready to run and you have 4 processors, then the runqueue depth is 4 and each processor is working as hard as it can. But if 8 processes are ready to run, 4 of them will have to wait. Is that bad? Not at all, especially if the processes last for just a few milliseconds.
Network socket processes may be spawned by the hundreds within a second or two, but each process might last for just a fraction of a second. And herein is the problem with (very slow) human perception and statistics. These processes may be rescheduled several times per second based on socket activity, and the computer is happily humming along because there is plenty of time for other tasks. The Load Average looks bad but, as you've seen, it doesn't have much effect on the users. One other characteristic is that the system overhead will go up dramatically (still not bad, though).
Now the above is one case (fairly common, it turns out) where high runqueues (and high system overhead) are just a measure of very high throughput for short-lived daemons. On the other hand, if 100 users all decide to copy files at the same time, the runqueue will be high, system overhead will be fairly normal, and everyone will complain about slow response times. In this case, the high Load Average indicates that far too many programs are consuming far too much disk and CPU power, and this will affect all users.
Bill Hassell, sysadmin
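To make the "average runqueue depth" idea above concrete, here is a minimal Python sketch. It assumes the load average is a plain mean of sampled runqueue depths; how HP-UX actually samples and weights the runqueue is not described in this thread, and the sample values and the function name `runqueue_summary` are made up for illustration.

```python
def runqueue_summary(depth_samples, ncpu):
    """Summarize sampled runqueue depths for a machine with `ncpu` processors.

    Assumption: the load average is modelled as a simple mean of the samples.
    """
    if not depth_samples:
        raise ValueError("need at least one sample")
    # Processes beyond the number of processors have to wait for a CPU.
    waiting = [max(0, depth - ncpu) for depth in depth_samples]
    return {
        "load_average": sum(depth_samples) / len(depth_samples),
        "peak_waiting": max(waiting),
        "samples_with_waiting": sum(1 for w in waiting if w > 0),
    }


# Hypothetical samples: a steady system versus one with brief socket bursts.
steady = [4, 4, 4, 4]
bursty = [1, 1, 60, 1, 1, 58, 1, 1, 57, 1]

if __name__ == "__main__":
    print(runqueue_summary([8], ncpu=4))    # the "8 ready, 4 processors" case
    print(runqueue_summary(steady, ncpu=4))
    print(runqueue_summary(bursty, ncpu=4))
```

In the bursty series only three of the ten samples have anything waiting, yet they dominate the mean. That is the statistical effect described above: a few very deep but brief runqueue spikes can push the average far above what the system looks like most of the time.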
07-31-2002 09:28 AM
Re: can't explain a sudden high load average
Hi,
It is very likely that you have a situation with intensive context switching and many system calls. Check the output of vmstat and look at the 'sy' and 'cs' columns under "faults" (system calls and context switches per second; the second 'sy', under "cpu", is the system CPU percentage). You probably have sy>10k and cs>1k. Typical levels for an idle system are <400/sec.
In general, this situation means your server is running with degraded performance (although it is difficult to say by how much, since that depends on the capabilities of your server). The reason is simple: every running process is frequently interrupted, which creates additional overhead that the system spends in kernel mode. The good sign is that CPU utilization is low, meaning the system is handling the situation easily. The biggest impact should be on responsiveness (my guess is that the HP kernel isn't preemptive).
Your assumption about the Java application is right, I think. I have seen very similar behavior from buggy Java code, which generated excessive 'sy' and 'cs' while CPU was below 10% but the system load factor was close to 10.
Have a nice day,
Vytas.
Vytautas Vysniauskas
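The vmstat check suggested above could be scripted roughly as follows. This is only a sketch: it assumes the vmstat column header looks like the one in `SAMPLE` (with 'sy' and 'cs' under "faults" appearing before the 'us sy id' CPU columns), the numbers in `SAMPLE` are invented, and the function names are hypothetical. The thresholds are the sy>10k and cs>1k figures from the reply.

```python
SAMPLE = """
         procs           memory                   page                              faults       cpu
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id
   31     0     0   812345   40960    0    0     0    0     0    0     0    950  14200  2300   6  4 90
    1     0     0   812001   41012    0    0     0    0     0    0     0    410    380   210   1  1 98
"""


def parse_vmstat_faults(text):
    """Return a list of (syscalls, context_switches) tuples, one per data row."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    # The column header is the first line that names both 'sy' and 'cs'.
    header_idx = next(i for i, tok in enumerate(lines) if "sy" in tok and "cs" in tok)
    header = lines[header_idx]
    sy_col = header.index("sy")  # first 'sy' is the faults column, not cpu
    cs_col = header.index("cs")
    rows = []
    for tokens in lines[header_idx + 1:]:
        # Skip repeated headers or truncated lines.
        if len(tokens) != len(header) or not tokens[0].isdigit():
            continue
        rows.append((int(tokens[sy_col]), int(tokens[cs_col])))
    return rows


def looks_syscall_heavy(sy, cs, sy_limit=10_000, cs_limit=1_000):
    """Apply the rule of thumb from the reply: sy > 10k and cs > 1k."""
    return sy > sy_limit and cs > cs_limit


if __name__ == "__main__":
    for sy, cs in parse_vmstat_faults(SAMPLE):
        print(sy, cs, looks_syscall_heavy(sy, cs))
```

On a real system you would feed it the captured output of something like `vmstat 5 5` taken while the load average is high, and compare it with a capture taken while the load is near 1.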

Not applicable
07-31-2002 11:05 PM
Re: can't explain a sudden high load average
Not necessarily related, but probably worth considering:
a) Did you run HPjtune/HPjmeter/HPjconfig? (http://www.hp.com/go/java)
http://www.hp.com/products1/unix/java/java2/hpjtune/index.html
HPjtune is a tool that helps you analyze garbage collection performance by graphically displaying instrumentation data from the garbage collector.
HPjtune lets you view this data in the following ways:
- Several predefined graphs which show the utilization of garbage collector resources and the impact of the garbage collector on application performance.
- User-configurable graphs for access to selected garbage collection metrics.
- Separate predefined graphs for garbage collection behavior pertaining to threads.
HPjtune also includes a unique feature which allows you to use collected data to predict the effect of new garbage collector parameters on future application runs.
HPjtune displays data for any SDK and RTE for Java release 1.2.2 and 1.3 for HP-UX 11.x (PA-RISC and Itanium), HotSpot VM.
(If HPjtune helps you nail this down, please post the result here. I haven't had time to check it out so far.)
b) Have a look at your installed patches...
http://www.hp.com/products1/unix/java/infolibrary/patches.html
(this is a shot in the dark ;-) and check whether they include
11.0 PHKL_24064 eventport (/dev/poll) pseudo driver (and Deps.)
11.11 PHKL_25468 eventport (/dev/poll) pseudo driver (and Deps.)
in case you're using select() a lot (the man page is delivered along with these patches).
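As background on why heavy select() use might matter with ~1500 sockets: each select() call hands the kernel the complete set of descriptors to scan, even when only one of them has data, whereas a /dev/poll style interface keeps the registered set inside the kernel between calls. The Python sketch below only illustrates the select() calling pattern with local socket pairs. It is not rvd's actual I/O loop (the thread does not describe it), and the pair count and the index of the active socket are arbitrary.

```python
import select
import socket


def ready_with_select(watched, timeout=1.0):
    """Return the readable sockets in `watched`.

    Note that the full list is passed in on every call, no matter how few
    sockets are actually active.
    """
    readable, _, _ = select.select(watched, [], [], timeout)
    return readable


def demo(n_pairs=50, active=7):
    """Watch `n_pairs` sockets while only one of them receives data."""
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    try:
        readers = [reader for reader, _ in pairs]
        pairs[active][1].send(b"x")  # make exactly one socket readable
        ready = ready_with_select(readers)
        return len(readers), [readers.index(s) for s in ready]
    finally:
        for reader, writer in pairs:
            reader.close()
            writer.close()


if __name__ == "__main__":
    watched, ready_indexes = demo()
    print(f"passed {watched} sockets to select(), ready: {ready_indexes}")
```

With thousands of mostly idle sockets and many calls per second, that repeated scanning shows up as system-call overhead rather than user CPU time, which is why the eventport patches are worth a look if select() turns out to be on the hot path.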
The opinions expressed above are the personal opinions of the authors, not of Hewlett Packard Enterprise. By using this site, you accept the Terms of Use and Rules of Participation.