High Load Average

 
AUJ
Advisor

Hi Guys,

I have a problem on one of my servers: under normal conditions the system load average is below 1, but when I run a command like find or du, the load average goes up to 35.

This problem only started after the upgrade to HP-UX 11 (32-bit).

Please advise whether there are patches that need to be installed on this server.

Thanks in Advance.
AUJ
Stefan Farrelly
Honored Contributor

Re: High Load Average


To complete your upgrade you should have installed the latest patch bundles: QPK (Quality Pack) and HWE (Hardware Extensions). The latest are from March 2002, and you can download them from software.hp.com.

This may fix your problem. Generally, if the load average shoots up while you run an I/O-intensive command like find or du, there are two possibilities:

1. You have a hardware I/O problem. It could be a disk, a cable, or the controller. Are there any errors in dmesg? Run xstm (Online Diagnostics), then go to Utility -> Tools -> Logtool, view the formatted log, and check whether you are getting any hardware errors.

2. Your disks are normally being hit very hard by an application or database, and running du or find on top of that pushes the I/O load, and with it the load average, even higher. This doesn't seem to be your situation, because you say your load is low before you run du/find?

There is also a patch for find 11, but I'm not sure it's relevant. Read on:

Bill Hassel wrote on May 20, 2002:

A higher buffer cache size (perhaps 500 megs) will help, but the interactive delays you are seeing (ie, login) are largely due to sequential I/O such as copying files to another server. HP-UX gives preference to sequential I/O, so much so that it can severely delay random I/O such as a login request.

A new parameter has been introduced with recent patches: disksort_seconds. Note that the SAM Help on Context file and a section 5 man page are still missing but the patch documentation has the details. Here is an excerpt from the 11.00 patch:

PHKL_21768:

The system sometimes takes a very long time to respond to a disk read/write request (could be up to several hundred seconds) while it is busy processing other I/O requests on the same disk, especially when there are sequential file accesses going on.

This is a fairness problem with the disk sort algorithm. The disk sort algorithm is used to reduce the disk head retractions. With this algorithm, all I/O requests with the same priority are queued in non-descending order of disk block number before being processed if the queue is not empty. When requests come in faster than they can be processed, the queue becomes longer, the time needed to perform one scan (from smallest block number to largest block number of the disk) could be very long in the worst case scenarios.

It is unfair for the request which came in early but has been continuously pushed back to the end of the queue because it has a large block number or it just missed the current scan. These kind of unlucky requests could line up in the queue for as long as the time needed for processing a whole scan (which could take a few minutes). This situation usually happens when a process tries to access a disk while another process is performing sequential accesses to the same disk.

Resolution:

To prevent this problem from happening, we have to take the time aspect into consideration in the sorting algorithm. We add a time stamp for each request when it is enqueued, which is used as the second sorting key for the queue (1st key: process priority; 2nd key: enqueued time; 3rd key: block number). The granularity of the time stamp value is controlled by a new tunable "disksort_seconds".

If we set "disksort_seconds" to N (N>0), for all the requests with the same priority, we can guarantee that any given request will be processed earlier than those which come in N seconds later than this request. Within each N second period (requests have the same time stamp), all requests are sorted by non-descending block number order. By choosing the right "disksort_seconds" value, we can balance the maximum waiting time of requests and the efficiency of disk accesses. The tunable parameter can be set to 0, 1, 2, 4, 8, 16, 32, 64, 128 or 256 second(s). If "disksort_seconds" is 0 (default value), the time stamp is disabled, which means that time aspect is not taking effect.
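
The three-key ordering described in the patch text can be illustrated with a small sketch. This is not the kernel code; it only models the sort key (priority, time stamp, block number) on a static list of requests. The names Request, sort_key and service_order are made up for illustration, and it assumes that a lower priority value is served first and that the time stamp is the enqueue time divided into disksort_seconds-wide buckets.

```python
from dataclasses import dataclass


@dataclass
class Request:
    priority: int       # assumption: lower value is served first
    enqueued_at: float  # arrival time in seconds
    block: int          # disk block number


def sort_key(req, disksort_seconds):
    """Build the (priority, time stamp, block) key from the patch description."""
    if disksort_seconds > 0:
        # Requests arriving in the same N-second window share one time stamp.
        stamp = int(req.enqueued_at // disksort_seconds)
    else:
        # 0 disables the time stamp: ordering falls back to block number only.
        stamp = 0
    return (req.priority, stamp, req.block)


def service_order(requests, disksort_seconds):
    """Return the requests in the order this simplified model would serve them."""
    return sorted(requests, key=lambda r: sort_key(r, disksort_seconds))


# One early random read at a high block number, followed by a stream of
# sequential requests at low block numbers, all at the same priority.
requests = [Request(0, 0.5, 900000)] + [
    Request(0, float(t), 100 * t) for t in range(1, 6)
]

for n in (0, 1, 4):
    print(n, [r.block for r in service_order(requests, n)])
```

With the tunable at 0 the early high-block request is served last, behind every sequential request; with a value of 1 it is served first, because everything that arrived a second or more later carries a larger time stamp. A larger value such as 4 sits in between, which is the trade-off between waiting time and seek efficiency that the patch text describes.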


I'm from Palmerston North, New Zealand, but somehow ended up in London...
Leif Halvarsson_2
Honored Contributor

Re: High Load Average

Hi
I tested this (find) on my file server (A400, HP-UX 11.11, with a RAID disk system) and got a load of about 8-9%. With a lower-performance server and a very high-performance disk system, I think it is possible to get loads of 35% with find. Have you made any changes to your filesystems during the upgrade?
AUJ
Advisor

Re: High Load Average

Hi!
Thanks for your quick reply; I'll assign the points later.

By the way, the HP-UX General & Critical Release patch has been applied to this server.

Here's the output of sar -u, which shows how bad system performance is while find/du is running.

12:49:55    %usr    %sys    %wio   %idle
12:49:57       4      96       0       0
12:49:59       1      99       0       0
12:50:01       1      99       0       0
12:50:17       2      98       0       0
              17      83       0       0
Average        2      98       0       0

The bottleneck is in system (%sys) time. When I run top, the %WCPU and %CPU for that particular command are very high, and when I cancel the command the system goes back to normal.

One more thing: the type of filesystem I'm using here is HFS.
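
For anyone who wants to check figures like these without doing it by eye, here is a minimal sketch that parses the timestamped sar -u samples pasted above and averages each column. The function names parse_sar_u and column_mean are made up for illustration, and the parser assumes the five-column layout shown in this post (time, %usr, %sys, %wio, %idle); the one sample line without a timestamp is left out.

```python
SAR_OUTPUT = """\
12:49:55 %usr %sys %wio %idle
12:49:57 4 96 0 0
12:49:59 1 99 0 0
12:50:01 1 99 0 0
12:50:17 2 98 0 0
"""


def parse_sar_u(text):
    """Return one dict per numeric sample line; header lines are skipped."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # A sample line is a timestamp followed by four integer percentages.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[1:]):
            continue
        usr, sys_pct, wio, idle = (int(f) for f in fields[1:])
        samples.append({"time": fields[0], "usr": usr, "sys": sys_pct,
                        "wio": wio, "idle": idle})
    return samples


def column_mean(samples, column):
    """Arithmetic mean of one percentage column across all samples."""
    return sum(s[column] for s in samples) / len(samples)


samples = parse_sar_u(SAR_OUTPUT)
for name in ("usr", "sys", "wio", "idle"):
    print(name, column_mean(samples, name))
```

The point of separating the columns is that %sys near 100 with %wio at 0 means the CPU is busy in kernel code rather than waiting for disk I/O.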

Thanks for the Help!

AU
Stefan Farrelly
Honored Contributor

Re: High Load Average

It looks to me like you're running find/du on a large filesystem with tens of thousands of files on it (or more), hence the high CPU overhead needed to process all the filenames. Is this so?
AUJ
Advisor

Re: High Load Average

Hi Stefan,

No, actually I just ran the find command in the /var directory:

# cd /var
# find . -name "logs" -print

Even running du in /var gives the same performance.

Thanks.
AUJ
Stefan Farrelly
Honored Contributor

Re: High Load Average


Is your /var filesystem almost full?

What kind of disks are you using? (vgdisplay -v vg00)

And what's the output from: fstyp -v /dev/vg00/lvol8
(presuming /var is lvol8)
AUJ
Advisor

Re: High Load Average

Hi Stefan,

bdf
/dev/vg00/lvol8 1154320 722991 315897 70% /var

I only used /var, but even when I use a different filesystem it gives me the same result.

In this server I have two internal disks and one controller, with vg00 and vg01. There are no applications running here at the moment.

# fstyp -v /dev/vg00/lvol8
hfs
f_bsize: 8192
f_frsize: 1024
f_blocks: 1154320
f_bfree: 433545
f_bavail: 318113
f_files: 559488
f_ffree: 515129
f_favail: 515129
f_fsid: 1073741832
f_basetype: hfs
f_namemax: 255
f_magic: 95014
f_featurebits: 1
f_flag: 0
f_fsindex: 0
f_size: 1228800

Thanks.

AUJ
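
The fstyp -v figures above can be turned into usage numbers with a little arithmetic, which also bears on the earlier question about how many files the filesystem holds. A rough sketch follows; it assumes f_blocks, f_bfree and f_bavail are counted in f_frsize (1024-byte) units, as in the usual statvfs convention, and that f_files and f_ffree are the total and free inode counts. The variable names are made up for illustration.

```python
# Values copied from the fstyp -v output for /dev/vg00/lvol8 above.
fs = {
    "f_frsize": 1024,
    "f_blocks": 1154320,
    "f_bfree": 433545,
    "f_bavail": 318113,
    "f_files": 559488,
    "f_ffree": 515129,
}

# Blocks in use, in f_frsize units (assumed to be 1 KB each here).
used_blocks = fs["f_blocks"] - fs["f_bfree"]

# Percentage used, relative to what an ordinary user could use in total
# (used plus available), the way df-style tools usually report it.
pct_used = 100.0 * used_blocks / (used_blocks + fs["f_bavail"])

# Inodes in use: roughly the number of files and directories present.
used_inodes = fs["f_files"] - fs["f_ffree"]
pct_inodes = 100.0 * used_inodes / fs["f_files"]

print("used KB:    ", used_blocks)
print("used %:     ", round(pct_used, 1))
print("used inodes:", used_inodes)
print("inode %:    ", round(pct_inodes, 1))
```

On these numbers /var is not close to full and has a little over 44,000 inodes in use.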