High %CPU sys and context switches
03-05-2006 04:12 AM
I have been asked to analyse a problem reflected in CPU stats: a long CPU run queue on a system, daily around 23:30.
Although I do not have root access, I collected the output of sar, top, ps -eafl, and vmstat at around 23:30; the outputs are attached.
My observations and questions are as follows:
All observations are from the collected logs and relate to the period around 23:30 daily.
The vmstat output shows a large number of runnable processes in the "r" field, starting around 80, climbing above 1000, and then falling again, as well as an increased number of context switches.
The sar output shows the same: high figures in "runq-sz" along with around 99% %CPU sys.
The top output also reflects the increased number of running processes and the high %CPU sys, and shows many rm commands in the top processes list.
Even when run during that period, ps -eafl takes a very long time to respond, and I could only get output once the CPU usage dropped a bit. In short, ps also shows many processes in "R" state, and many of them are rm commands.
My observation is that at night a batch job archives older files and then removes them from the original location using a command like find ... -exec rm -f {} \;
Is it that, because so many rm commands are fired, there are many running processes, so the CPU has to do a lot of context switching between them, causing the high %CPU sys?
My queries:
1. What is the problem?
2. Is virtually no processing happening because of the context switches?
3. What are acceptable values for "runq-sz", runnable/running processes, context switches, and %CPU sys?
4. When a find command runs rm via -exec, does each rm start only after the previous one completes, or do they run in parallel? (From ps it seems they run in parallel, but I would like your comments.) If they do run in parallel, is using find with -exec a bad idea, since it is hampering the system?
Please advise.
Thanks a lot
Ninad
Solved! Go to Solution.
03-05-2006 07:39 AM
Re: High %CPU sys and context switches
Now if all of this is slowing the system down at a time when processing (and disk I/O) horsepower is needed for other tasks, then you'll need to reschedule the cleanup, or better yet, redesign the cleanup task. Perhaps a better approach is to look at the archive process itself, since it apparently involves a massive number of files, and find a less invasive way to accomplish the task.
Bill Hassell, sysadmin
03-05-2006 07:52 AM
Re: High %CPU sys and context switches
If your cleanup process does something like:
# find ... -exec rm -f {} \;
...then for *each* file meeting the specified criteria, a new process ('rm') is spawned. This can be terribly expensive.
Change the above mechanism to:
# find ... | xargs rm -f
This causes 'xargs' to assemble file names in an internal buffer and spawn one 'rm' command for each block (list) of files thus buffered. The number of processes spawned will be greatly reduced, along with your CPU queue depth and utilization.
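A minimal, self-contained sketch of the difference, assuming a POSIX shell; the directory and file names here are hypothetical stand-ins for the real cleanup target:

```shell
#!/bin/sh
# Hypothetical stand-in for the real cleanup directory.
DIR=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$DIR/old$i.dat"; done

# Per-file form: one rm process is fork/exec'd for every matching file.
#   find "$DIR" -type f -name '*.dat' -exec rm -f {} \;

# Batched form: xargs buffers many names and spawns one rm per batch.
find "$DIR" -type f -name '*.dat' | xargs rm -f

# All matching files are gone, with only a handful of processes spawned.
find "$DIR" -type f | wc -l
rmdir "$DIR"
```

With five files the difference is invisible, but with the hundreds of thousands of files a nightly archive job can touch, replacing one fork/exec per file with one per batch is exactly what shrinks the run queue.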
Regards!
...JRF...
03-05-2006 08:18 PM
Re: High %CPU sys and context switches
James,
You have described the exact syntax used by the cleanup process, and I agree that the solution you suggested should be better.
Currently the system becomes very slow to respond if I log in around that time. I am not sure whether any processing is actually being affected, but a non-responsive system is not a good state, so I will suggest the modification.
Could you also guide me on the other part of my queries, points 3 and 4, please?
Thanks,
Ninad
03-06-2006 12:18 AM
But this is too simplistic. I've personally seen a system running with 2 processors where the average run queue was 45-50 for hours, yet no one complained. How can this be? The processes creating the load were extremely short-lived polling processes: each sent one LAN packet, waited for a response, and then went to sleep for half a second. Multiply this by 400 copies of the same program and the runq (and context switches) were extremely high, but because the OS kept things moving and the processes consumed so few total cycles, no one noticed any slowdown.
By changing the polling program to run every 2 seconds (rather than every half second), the runq dropped to 5-6 and context switches dropped to about 10% of the earlier value. Other than the large kernel load numbers, the only real impact was on the LAN, about 2000 packets/second continuously, so the polling rate was adjusted to lower the network load.
In your case, the rm command (one file at a time) creates a similar load, but instead of loading the LAN with requests, it loads the filesystem with massive numbers of directory requests, and that will indeed impact overall system performance. Change the cleanup process (as James mentioned) to use xargs, which removes the files more efficiently with far fewer rm processes. Note that the apparently parallel rm's seen in ps are an artifact of an extremely fast system and a slow measurement tool (ps). Most kernel measurement tools are misleading because things change very rapidly and the tools can't keep up. Look at the number of context (program) switches: thousands per second is normal on a busy system.
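To illustrate point 4 concretely, here is a small sketch (the scratch directory and .dat names are made up) showing that find -exec ... \; waits for each spawned command to finish before moving to the next file, so the rm's really are sequential even when ps appears to show them overlapping:

```shell
#!/bin/sh
# Made-up scratch directory; the .dat files stand in for the archived files.
DIR=$(mktemp -d)
for f in a b c; do : > "$DIR/$f.dat"; done

# Each exec'd command logs a start and an end line. Because -exec ... \;
# runs the commands one at a time, every "start" is immediately followed
# by its matching "end"; truly parallel runs could interleave the lines.
find "$DIR" -name '*.dat' \
    -exec sh -c 'echo "start $1"; echo "end $1"' _ {} \; > "$DIR/order.log"

# Verify the strict start/end alternation in the log.
awk 'NR % 2 == 1 && $1 != "start" { exit 1 }
     NR % 2 == 0 && $1 != "end"   { exit 1 }' "$DIR/order.log" \
  && echo "sequential"

rm -rf "$DIR"
```

The cost, as noted above, is not parallelism but the fork/exec of one process per file; with enough files that alone fills the run queue.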
I would use the metrics as a starting point. If something seems high, ask: do these numbers actually affect performance? Are they new, or have they always been this way?
Bill Hassell, sysadmin