09-08-2000 01:42 PM
J7000 develops increasing load average over time
We have a J7000 (4 GB / 18 GB, system and development on VxFS filesystems; 36 GB RAID 5 storage for the mail store, on HFS; HP-UX 11, June IPR) which we are using as our student mail store. The job mix is mostly imapd and sendmail, and the number of processes is fairly steady, averaging about 180-260. When the system is first booted, load averages and performance are what you would expect from this class of machine (LA of 0.03-0.20, with occasional rises to 0.60). The problem is that over time (typically 5 to 10 days) the system seems less and less able to sustain this load: surges in the job mix, typically caused by multi-recipient listserv-type mail, push the system into high load averages for longer and longer periods, at times making it unusable. It reminds me of the symptoms you would see with a resource shortage. At that point the same job mix that used to produce sub-1 load averages produces LAs anywhere from 6 to 18. After a reboot the symptoms disappear.
I have checked with sar and the other typical Unix tools. The only thing that stands out is %system vs. %user (system sits at 30-49% while user is at 1-4%), which would make you think it is an I/O bottleneck, but no other information points to that as the problem. Some answers I have received suggested that the VxFS filesystems needed defragmentation; unfortunately, since the condition is fixed by rebooting, and no defragmentation is done at that time, I'm not sure that is the root cause of the problem. I have added most of the latest patches (June 2000 IPR, NFS megapatches, etc.) and have reached the end of the things I can tweak. BTW, we have a J5000 running 10.20 configured just the same, and it handles 3x the load with no problems. Has anyone else encountered this type of problem? Suggestions?
Hugh Smith
Computing and Communication Services
University of Guelph
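For reference, a minimal sketch of the kind of sampling described in the post above, assuming the usual HP-UX sar and vmstat options (intervals and counts are arbitrary; run it while the machine is healthy and again once it has degraded, then compare):

    sar -u 60 30     # CPU split: watch %usr vs %sys over time
    sar -q 60 30     # run-queue size and occupancy, the numbers behind the load average
    sar -d 60 30     # per-disk activity, to confirm or rule out an I/O bottleneck
    vmstat 60 30     # memory and paging picture over the same window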
09-08-2000 02:36 PM
Re: J7000 develops increasing load average over time
The run queue is the number of processes that are ready to run at the same time. It is a rough indication of the load on the machine, but only rough: because the run queue changes dozens of times per second, there are scenarios where enough short I/O-bound processes can paralyze a system even though no single process consumes much CPU time.
An example is an X Window program that keeps track of cut buffers in a window manager. Since there is no wake-up call to a remote program indicating that new data exists, such a program must poll the window manager to see if something happened, typically once a second. That's two X messages, one for the request and a second (perhaps more) for the result.
Now multiply that program by 100 copies: 200 messages per second go over the LAN, along with 100 context switches per second. The load average could go to 10 or 20, yet the system would still seem to respond rapidly.
Alternatively, you might have dozens of fairly large programs that are all started at the same time and are waiting to run because no more processors are available. If they don't all fit into RAM at once, paging (swap) begins. vmstat, specifically the po (page-out) column, will tell you if RAM is too small.
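As a concrete sketch of that check (column layout varies a little between HP-UX releases, so treat this as illustrative):

    vmstat 5 20      # sample memory/paging every 5 seconds, 20 times
    # watch the po (page-out) column: sustained non-zero values mean RAM is
    # too small for the working set; occasional brief spikes are normal.
    swapinfo -tam    # overall swap usage, as a cross-check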
There is no easy answer, as your system is probably doing a lot of things at the same time. Use Glance or top to sort the biggest consumers first and see why they are using so much time.
It's certainly possible that the programs have a memory leak; this will show up as constantly growing process sizes. That's a programming error, but it will still impact the whole system.
Bill Hassell, sysadmin
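One way to act on those last two suggestions, sketched with standard tools (the UNIX95-style ps options are assumed to be available on this HP-UX release):

    # list the largest processes by virtual size; repeat every hour or so
    # and watch for any process whose size only ever grows between samples
    UNIX95= ps -e -o vsz=,pid=,args= | sort -rn | head -20

Interactively, top or Glance sorted by size serves the same purpose: a process whose size keeps growing and never shrinks back is a leak candidate.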