hpux 11.11 IO performance

 
sfgroups
Advisor

hpux 11.11 IO performance

We have HP-UX 11.11 running a Progress database. The server is attached to an EMC SAN over 4 paths, and we are using PowerPath software for multipathing. We are running SRDF in synchronous mode.

Sometimes my users experience performance problems in their application. CPU and memory utilization look normal, so I want to make sure it's not I/O related. I am collecting sar data every 5 minutes.


I am posting one sar sample below. Do the avwait and avserv times look normal, or do you see any problem?


device %busy avque r+w/s blks/s avwait avserv
c10t0d1 0.10 0.50 0 3 4.29 3.75
c10t0d4 0.22 0.50 0 4 5.09 6.37
c10t0d6 0.88 0.50 1 8 4.33 15.56
c10t0d7 1.17 0.50 1 8 3.61 37.50
c10t1d0 1.13 0.50 1 5 3.26 46.25
c10t1d1 0.85 0.50 1 5 3.50 29.71
c10t1d3 0.82 0.50 1 10 4.90 12.09
c10t1d4 0.23 0.50 0 4 4.99 5.70
c10t1d5 0.08 0.50 0 2 3.72 4.97
c10t1d6 0.03 0.50 0 1 9.33 3.59
c10t4d3 0.48 0.50 1 17 5.12 4.58
c10t4d4 0.07 0.50 1 17 1.15 2.61
c12t0d1 0.05 0.50 0 2 4.54 2.98
c12t0d4 0.23 0.50 0 4 4.24 8.11
c12t0d6 1.13 0.61 1 8 9.87 46.16
c12t0d7 1.12 0.50 1 7 3.65 33.24
c12t1d0 1.68 0.59 1 9 9.20 75.40
c12t1d1 0.57 0.50 0 4 3.97 26.45
c12t1d3 0.95 0.50 1 12 4.77 13.11
c12t1d4 0.25 0.50 0 3 6.00 7.78
c12t1d5 0.05 0.50 0 3 4.15 4.33
c12t1d6 0.02 0.50 0 1 1.19 12.44
c12t4d3 0.52 0.50 1 18 5.36 4.75
c12t4d4 0.15 0.50 1 19 1.19 3.31
c14t0d1 0.05 0.50 0 3 4.82 4.56
c14t0d4 0.18 0.50 0 4 4.40 6.00
c14t0d6 1.12 0.50 1 7 3.85 27.88
c14t0d7 1.00 0.50 1 8 3.94 21.72
c14t1d0 1.58 0.50 1 7 4.35 57.53
c14t1d1 0.82 0.50 1 5 4.52 34.11
c14t1d3 0.82 0.50 1 10 4.49 12.22
c14t1d4 0.28 0.50 0 4 4.80 9.73
c14t1d5 0.13 0.50 0 5 4.88 5.90
c14t1d6 0.02 0.50 0 1 2.35 8.56
c14t4d3 0.43 0.50 1 18 5.14 4.03
c14t4d4 0.22 0.50 1 21 1.57 4.09
c8t0d1 0.05 0.50 0 2 5.36 3.35
c8t0d4 0.27 0.50 0 4 4.15 8.44
c8t0d6 0.88 0.52 1 8 4.56 27.03
c8t0d7 1.05 0.50 1 6 3.05 31.60
c8t1d0 1.30 0.50 1 7 3.76 63.80
c8t1d1 0.50 0.50 0 3 4.40 21.99
c8t1d3 0.95 0.50 1 12 4.89 13.78
c8t1d4 0.35 0.50 0 4 4.79 10.31
c8t1d5 0.15 0.50 0 5 3.55 6.64
c8t1d6 0.03 0.50 0 2 1.55 10.86
c8t4d3 0.63 0.50 1 20 5.29 4.92
c8t4d4 0.20 0.50 1 17 1.41 3.97
c8t4d5 2.32 1.64 5 53 30.41 28.52
c8t4d6 2.33 2.41 6 61 38.88 26.50
c8t4d7 2.20 1.62 4 55 33.63 35.77
c10t4d5 2.27 1.32 4 56 24.40 31.56
c10t4d6 2.27 1.52 5 54 23.97 25.87
c10t4d7 1.95 2.12 3 37 39.88 34.60
c12t4d5 2.18 1.34 5 56 22.91 27.25
c12t4d6 2.38 1.65 5 56 26.50 27.83
c12t4d7 2.13 2.08 4 41 35.76 35.56
c14t4d5 2.07 1.10 4 51 18.59 27.52
c14t4d6 2.23 0.98 6 64 14.86 20.20
c14t4d7 2.13 2.28 4 50 38.47 32.20
c8t5d0 5.35 6.06 9 147 43.50 19.05
c8t5d1 5.45 6.81 9 157 49.93 20.08
c8t5d2 5.28 5.75 9 142 46.47 21.50
c8t5d3 5.06 8.35 9 153 62.01 20.40
c8t5d4 5.21 10.18 9 150 68.89 21.35
c10t5d0 5.25 7.00 9 152 51.45 20.06
c10t5d1 5.16 8.16 9 143 59.53 20.73
c10t5d2 5.70 6.22 9 152 46.96 20.43
c10t5d3 5.41 8.65 9 155 62.50 20.48
c10t5d4 5.48 8.15 10 161 58.30 20.12
c12t5d0 5.11 6.96 9 158 49.44 19.97
c12t5d1 5.68 7.16 9 147 48.15 20.49
c12t5d2 5.71 6.77 9 154 51.02 20.96
c12t5d3 5.48 7.77 9 149 57.16 21.17
c12t5d4 5.28 8.65 9 154 67.09 20.85
c14t5d0 5.51 8.40 9 157 58.92 20.14
c14t5d1 5.46 5.76 8 139 44.17 20.67
c14t5d2 5.98 7.33 9 160 52.82 20.93
c14t5d3 5.68 7.20 9 156 58.89 21.30
c14t5d4 5.45 8.43 10 155 62.46 20.24



Is there some other command I can use to analyze this issue?
Ninad_1
Honored Contributor

Re: hpux 11.11 IO performance

Though it is difficult to identify the exact cause from just one instance of sar output, what I see is that none of your disks are busy (only up to 5-6% busy), but the avserv time for many disks is around 20 ms, which seems a bit high if these are SAN disks. There is also avwait of 50-60 ms on many disks, especially the last entries, along with elevated avque. Can you check the statistics at the SAN switch or storage array end as well? You may need to increase the storage cache if it is not sufficient.
Also, if you have Glance, check glance -B (global wait states) to understand where your system is waiting the most.

Regards,
Ninad
Alzhy
Honored Contributor

Re: hpux 11.11 IO performance

You have plain and simple severe disk queuing on all of your c8, c10, c12 and c14 EMC disks. And "sar -d" is the most authoritative monitoring tool you can use to conclude you indeed have an I/O-bound system.

Possible Remedies:

1. Check the kernel parameter scsi_max_qdepth - is it within EMC's recommended setting? Normally it should be 16 for most arrays.

2. Do you stripe your DB storage? If not, try striping so the I/O load can be better distributed.

3. Check your PowerPath configuration to make sure your load balancing is optimal.

4. When this occurs, are backups running on the system at the same time that intense processing is occurring?


Hope this helps.



Hakuna Matata.
Ludovic Derlyn
Esteemed Contributor

Re: hpux 11.11 IO performance

hi,

I agree with Ninad: it is difficult to respond with only one sar result.
As a rough guide, average wait and average service should each be around 20 ms or less.
It's also important to look at avque (note that the minimum avque sar reports is 0.5).

What applications are running?
What are the mount options?

If you have a database, you can specify mincache=direct, for example.

Regards
L-DERLYN
Alzhy
Honored Contributor

Re: hpux 11.11 IO performance

I disagree; the sar output provided is conclusive enough that you have an I/O-bound system, since severe queuing is present.

I suspect contention either on the server (too many processes wanting to be served by the storage) or on the array itself - perhaps your BCV operations are getting in the way, so that service times and queues are severely affected.

Establish what causes this - use Glance/MWA to track LVOL activity and see whether other processes are doing I/O on those lvols, aside from normal DB I/O, during the periods when you have this situation.

Then check the possible remedies I mentioned.

Hakuna Matata.
sfgroups
Advisor

Re: hpux 11.11 IO performance

Thanks for the reply,

1. Here is my glance -B output:

GLOBAL WAIT STATES Users= 1611
Procs/ Procs/
Event % Time Threads Blocked On % Time Threads
--------------------------------------------------------------------------------
IPC 0.0 0.00 0.0 Cache 0.0 9.86 1.8
Job Control 0.0 0.00 0.0 CDROM IO 0.0 0.00 0.0
Message 0.0 0.00 0.0 Disk IO 0.0 0.00 0.0
Pipe 0.4 90.19 16.1 Graphics 0.0 0.00 0.0
RPC 0.0 0.00 0.0 Inode 0.0 0.00 0.0
Semaphore 0.0 5.61 1.0 IO 0.0 5.72 1.0
Sleep 47.8 10165.40 1818.5 LAN 0.0 0.00 0.0
Socket 0.1 22.36 4.0 NFS 0.0 0.00 0.0
Stream 43.7 9301.42 1663.9 Priority 0.0 1.50 0.3
Terminal 0.0 0.00 0.0 System 4.0 857.85 153.5
Other 3.7 783.92 140.2 Virtual Mem 0.0 0.00 0.0

Stream wait time looks high, but 1,600 users are logged into the system using telnet.

2. We use the VxFS file system (OnlineJFS); the mount options we use are: vxfs delaylog 0 2

3. We have already striped the volumes.

4. They are running a tty-based application.

5. The current scsi_max_qdepth is 8:
$ kmtune | grep scsi_max_qdepth
scsi_max_qdepth 8 Y 8

I think we need to increase this value to 16.
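Before bumping the value, a quick sanity check with Little's law (average outstanding I/Os = I/O rate x average time in system) against the sar figures can show whether the queue depth is the average limiter or only a burst limiter. A minimal Python sketch; the device figures are taken from the sar sample above, and the interpretation is my own illustration, not an EMC recommendation:

```python
# Little's law: average I/Os outstanding = I/O rate x avg time in system.

def outstanding_ios(rw_per_sec, avwait_ms, avserv_ms):
    """Average number of I/Os queued or in service on a device."""
    return rw_per_sec * (avwait_ms + avserv_ms) / 1000.0

# c10t5d4 from the sar sample: 10 r+w/s, avwait 58.30 ms, avserv 20.12 ms
print(round(outstanding_ios(10, 58.30, 20.12), 2))  # -> 0.78
```

On average less than one I/O is outstanding per device, so the avque values of 8-10 on the t5 devices point to bursts; a scsi_max_qdepth of 8 would cap exactly such bursts, which is consistent with raising it to 16.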
Alzhy
Honored Contributor

Re: hpux 11.11 IO performance

In that case, aside from increasing your scsi_max_qdepth, you may very well check whether Progress DB prefers its I/Os direct (no or minimal caching in the filesystem).

If so, try changing all your DB storage mount points, i.e.:

/dev/vgdb/dbdata /dbdata vxfs mincache=direct,delaylog,convosync=direct 0 0

Is this a new environment? What changes have there been since this "slowness" started? Or has it been like this all along?
Hakuna Matata.
Michael Steele_2
Honored Contributor

Re: hpux 11.11 IO performance

Hi all. The definition of a disk bottleneck is avwait greater than avserv, so I've sorted and pasted in the events that match. They appear on c8, c10, c12 and c14.

Follow the guidelines below to correct the problem.

device avwait avserv
c10t4d7 39.88 34.60
c10t5d0 51.45 20.06
c10t5d1 59.53 20.73
c10t5d2 46.96 20.43
c10t5d3 62.50 20.48
c10t5d4 58.30 20.12
c12t4d7 35.76 35.56
c12t5d0 49.44 19.97
c12t5d1 48.15 20.49
c12t5d2 51.02 20.96
c12t5d3 57.16 21.17
c12t5d4 67.09 20.85
c14t4d7 38.47 32.20
c14t5d0 58.92 20.14
c14t5d1 44.17 20.67
c14t5d2 52.82 20.93
c14t5d3 58.89 21.30
c14t5d4 62.46 20.24
c8t4d5 30.41 28.52
c8t4d6 38.88 26.50
c8t4d7 33.63 35.77
c8t5d0 43.50 19.05
c8t5d1 49.93 20.08
c8t5d2 46.47 21.50
c8t5d3 62.01 20.40
c8t5d4 68.89 21.35

strings /etc/lvmtab
to identify the volume groups associated with the disks.

lvdisplay -v /dev/vgXX/lvolX
to tell you which disks back each logical volume.

bdf
to see whether the volume group's file systems are full (> 85%).

cat /etc/fstab
to determine the file system type associated with each lvol/mount point.

How can I improve disk I/O?

1. Reduce the volume of data on the disk to less than 90%.
2. Stripe the data across disks to improve I/O speed.
3. If you are using OnlineJFS, run fsadm -e to defragment the extents.
4. If you are using HFS file systems, implement asynchronous writes by setting
the kernel parameter fs_async to 1, or consider converting to VxFS.
5. Reduce the size of the buffer cache (if %wcache is less than 90).
6. If you are using raw logical volumes, consider implementing asynchronous I/O.

The difference between asynchronous I/O and synchronous I/O is that async does
not wait for confirmation of the write before moving on to the next task. This
increases disk performance at the expense of robustness. Synchronous I/O waits
for acknowledgement of the write (or of its failure) before continuing. The
write may have physically taken place or may still be in the buffer cache, but
in either case acknowledgement has been sent; in the async case, there is no
waiting.
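The trade-off above can be observed on any POSIX system by timing a buffered write against an O_SYNC write; a generic sketch (not HP-UX specific, and the raw timings will vary with the disk and cache):

```python
import os
import tempfile
import time

def timed_write(path, flags, data):
    """Time a single write(2) issued with the given extra open(2) flags."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | flags, 0o600)
    start = time.perf_counter()
    os.write(fd, data)  # with O_SYNC, blocks until data reaches stable storage
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed

with tempfile.TemporaryDirectory() as d:
    data = b"x" * 65536
    buffered = timed_write(os.path.join(d, "buffered"), 0, data)
    synced = timed_write(os.path.join(d, "synced"), os.O_SYNC, data)
    print(f"buffered: {buffered * 1e6:.0f} us, O_SYNC: {synced * 1e6:.0f} us")
```

The buffered write typically returns as soon as the data is in the buffer cache, while the O_SYNC write waits for the acknowledgement described above.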
Support Fatherhood - Stop Family Law
sfgroups
Advisor

Re: hpux 11.11 IO performance

Hi all,

Thanks for your inputs.

1. We are not running any backups during production time.
2. This is a new environment.
3. Thanks for the good side-by-side of the sar avwait and avserv figures.



Questions:

1. What powermt command option can I use to check that I/O is being distributed across all four paths?
2. Where can I download an HP-UX build of the IOzone file system benchmark tool?

Thanks