Operating System - HP-UX

compress causing bdf/quota checks to hang

Adrian Turner
New Member

compress causing bdf/quota checks to hang

Hi all,

Whenever a compress or uncompress is started on our L2000 (4 CPUs, 3.5 GB RAM) with an attached AutoRAID disk array running HP-UX 11.0, user logins stall at the quota check and bdf commands hang.

top, sar -u, vmstat and sar -v show no problems. ioscan -kfn shows a S/W state of CLAIMED for all hardware. We are not running NFS or NIS.

We have the following related patches installed:
PHKL_20333 superseded
PHKL_22267 superseded
PHKL_23127 applied

Anyone got any ideas?

Thanks in advance,
Adrian Turner
Patrick Wallek
Honored Contributor

Re: compress causing bdf/quota checks to hang

A compress or uncompress is a fairly I/O-intensive operation. If you are working on a large file, your AutoRAID may simply be unable to service all the requests in a timely fashion.

There may be some tuning you can do to your setup, especially on the AutoRAID. How is the AutoRAID configured? Do you have dual controllers? Is the LV set up to stripe across both controllers? How much space is in the AutoRAID, and how much of it is allocated?
Roger Baptiste
Honored Contributor

Re: compress causing bdf/quota checks to hang

Hi,

How large is the file you are compressing?
After starting compress, watch it in Glance's process details to see what it is waiting on. Also run sar -d and collect disk statistics while the compress is running.

Try doing the compression with gzip instead (/opt/contrib/bin/gzip).

HTH
raj
Take it easy.
Bill Hassell
Honored Contributor
Solution

Re: compress causing bdf/quota checks to hang

The first thing to do is to run all compress tasks using nice (to give the process a lower priority). Compress is not only CPU-intensive but also very disk-intensive. By reducing the priority of what is essentially a batch job (no interaction), other more random tasks (like login and bdf) will get their place in the disk queue.
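For scripted jobs, the same idea can be sketched in Python as a small wrapper that lowers the child's priority before it starts, roughly what `nice compress file` does from the shell. The helper name `run_niced` and the default increment of 10 are illustrative assumptions, and the sketch assumes a POSIX system with Python 3.7 or later.

```python
import os
import subprocess


def run_niced(cmd, increment=10):
    """Run cmd as a child process whose nice value is raised by `increment`.

    A higher nice value means a lower scheduling priority, so the child
    yields to interactive work such as logins and bdf.
    """
    return subprocess.run(
        cmd,
        # preexec_fn runs in the child after fork() and before exec(),
        # so only the child's priority is lowered, not this script's.
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if cmd exits non-zero
    )
```

For example, `run_niced(["compress", "bigfile"])`, where `bigfile` is a placeholder name. Lowering the priority in the child keeps the calling script itself at normal priority.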

This happens because of a fairness problem in the disk sort algorithm. The disk sort algorithm is used to reduce disk head movement. With this algorithm, whenever the queue is not empty, incoming I/O requests with the same priority are inserted in non-descending order of disk block number before being processed. When requests arrive faster than they can be processed, the queue grows longer, and in the worst case the time needed to perform one scan (from the smallest to the largest block number on the disk) can be very long.

This is unfair to a request that arrived early but keeps getting pushed back to the end of the queue, either because its block number differs greatly from the previous requests or because it just missed the current scan. These unlucky requests can wait in the queue for as long as it takes to process a whole scan (which could take minutes). This situation usually arises when one process tries to access a disk while another process is performing sequential accesses to the same disk.
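A toy simulation can make the starvation concrete. It is a sketch under simplifying assumptions, not the HP-UX disk driver: the disk services one request per tick, the head sweeps upward through a block-sorted queue and wraps to the lowest block at the end, and a sequential job (standing in for compress) submits two consecutive blocks per tick, faster than the disk can keep up. The function name and its parameters are hypothetical.

```python
import bisect


def simulate_scan(disk_blocks, seq_per_tick, lone_block, lone_arrival):
    """Toy one-way elevator scan; returns the tick when the lone request is served.

    Assumptions (illustrative only): one request serviced per tick, all
    requests share one priority, and a sequential job submits
    `seq_per_tick` consecutive blocks per tick until the end of the disk.
    """
    queue = []      # pending (block, tag) pairs, kept sorted by block
    head = 0        # block number of the last request serviced
    next_seq = 0    # next block the sequential job will ask for
    tick = 0
    while True:
        for _ in range(seq_per_tick):
            if next_seq < disk_blocks:
                bisect.insort(queue, (next_seq, "seq"))
                next_seq += 1
        if tick == lone_arrival:
            bisect.insort(queue, (lone_block, "lone"))
        if queue:
            # Serve the lowest block at or above the head; if there is none,
            # the scan wraps around to the lowest block in the queue.
            i = bisect.bisect_left(queue, (head, ""))
            if i == len(queue):
                i = 0
            block, tag = queue.pop(i)
            head = block
            if tag == "lone":
                return tick
        tick += 1
```

With a 1000-block disk, `simulate_scan(1000, 2, 3, 10)` returns 1000: the lone request for block 3 lands just behind the head and waits for the entire sweep. The same request for block 50, `simulate_scan(1000, 2, 50, 10)`, is served at tick 50 because it is still ahead of the head.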

Resolution:

To prevent this problem from happening, the disk queue manager has to take the time aspect into consideration in the sorting algorithm. It now adds a time stamp for each request when it is enqueued, which is used as the second sorting key for the queue (1st key: process priority; 2nd key: enqueued time; 3rd key: block number). The granularity of the time stamp value is controlled by a new kernel tunable "disksort_seconds".

If "disksort_seconds" is set to N (N>0), then among all requests with the same priority, HP-UX guarantees that any given request will be processed before those that arrive N or more seconds after it. Within each N-second period (requests with the same time stamp), all requests are sorted in non-descending block number order.
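The three-key ordering can be sketched as a sort key. Two details are assumptions, not taken from HP documentation: a numerically smaller priority value is served first (as with HP-UX process priorities), and the time stamp is the enqueue time divided by disksort_seconds and rounded down, so requests in the same N-second window share a stamp. The sketch orders a static snapshot with a heap; it does not model the driver's actual queue, and the request names and numbers are made up.

```python
import heapq


def sort_key(priority, enqueue_time, block, disksort_seconds):
    """Return the (priority, time stamp, block) key described in the post.

    Assumptions: a smaller priority value is served first, and the time
    stamp is enqueue_time // disksort_seconds; 0 disables the stamp.
    """
    if disksort_seconds > 0:
        stamp = int(enqueue_time // disksort_seconds)
    else:
        stamp = 0  # every request gets the same stamp, so blocks decide
    return (priority, stamp, block)


def service_order(requests, disksort_seconds):
    """Order a snapshot of (name, priority, enqueue_time, block) requests."""
    heap = [
        (sort_key(prio, t, block, disksort_seconds), name)
        for name, prio, t, block in requests
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


sample = [
    ("login", 20, 0.5, 900000),  # early request far out on the disk
    ("seq1", 20, 4.2, 100),      # later sequential reads near the start
    ("seq2", 20, 4.3, 101),
]
print(service_order(sample, 0))  # ['seq1', 'seq2', 'login']
print(service_order(sample, 2))  # ['login', 'seq1', 'seq2']
print(service_order(sample, 8))  # ['seq1', 'seq2', 'login']
```

With 0 the order is purely by block, so the older login request goes last; with 2 it gets an earlier stamp and goes first; with 8 all three share one window and block order wins again. That is the balance between maximum waiting time and disk efficiency that the tunable controls.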

By choosing the right "disksort_seconds" value, HP-UX can balance the maximum waiting time of requests against the efficiency of disk access. The kernel parameter can be set to 0, 1, 2, 4, 8, 16, 32, 64, 128 or 256 second(s). If "disksort_seconds" is 0 (the default), the time stamp is disabled, which means the time aspect has no effect.
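If you script your kernel tuning, a proposed value can be checked against that list first. This is a minimal sketch; the helper name is hypothetical, and it only validates the number. It does not read or change the kernel tunable.

```python
# Values listed in the post as valid for disksort_seconds.
ALLOWED_DISKSORT_SECONDS = (0, 1, 2, 4, 8, 16, 32, 64, 128, 256)


def check_disksort_seconds(value):
    """Validate a proposed disksort_seconds value.

    Returns True if time stamping would be active, False for 0 (the
    default, which disables it). Raises ValueError for unsupported values.
    """
    if value not in ALLOWED_DISKSORT_SECONDS:
        raise ValueError(
            f"disksort_seconds must be one of {ALLOWED_DISKSORT_SECONDS}, got {value}"
        )
    return value != 0
```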

This feature was added to 10.20 via patch PHKL_23836. As with all patches, verify that this is the latest version and that all prerequisites are installed. It is a standard kernel parameter in 11.0 (although neither 10.20 nor 11.0 includes a description of it), so consider this post an update to SAM's help.


Bill Hassell, sysadmin
Adrian Turner
New Member

Re: compress causing bdf/quota checks to hang

Thanks all, especially Bill.

disksort_seconds is currently set to 0, so we'll try either 2 or 4 and see how it goes.

generic_1
Respected Contributor

Re: compress causing bdf/quota checks to hang

Hello Bill, I am having a similar problem in a very large NIS environment. I have another post about it, and restarting RPC seems to fix it. I am wondering whether this is a similar situation, although unlike this case I have no known trigger. It's hard to tell with 20k users :).
Also, what value would you recommend for a busy environment, and have you seen anything else cause this type of hang?
Bill Hassell
Honored Contributor

Re: compress causing bdf/quota checks to hang

My long post was specific to a disk-intensive operation. NIS is normally a short-lived task during authentication, so it should have no particular effect on the system. But if NIS services are overloaded by poorly designed programs, made worse by a badly overloaded network, then you can certainly expect slowdowns.

Hangs are also possible, but 99% of hang conditions are fixed with a current set of patches, not changes to the disk scheduler.


Bill Hassell, sysadmin
generic_1
Respected Contributor

Re: compress causing bdf/quota checks to hang

Do you know of any HP utility that can capture what is holding up quota or RPC, so that a patch can be created in a timely manner?