Re: What can be done to lower maximum time for read and writes in the following situation?

blackwater · ‎08-22-2013

Our customer has reported that one of our programs after updating started working slowly. CPU utulization is not big so I decided to check where the program waits. I got four HP Caliper cstack reports. In all of the reports the "Hot Call Paths (All Threads)" section shows that the program waits most often in libc.so.1::pread. This is an example:

Hot Call Paths (All Threads)
-------------------------------------------------------------
       --% Total Hits In Only--    
       Run +     Run      Block    
Index  Block     Hits     Hits         Name    
       Hits      Only     Only     
-------------------------------------------------------------
  0      7.7      7.7       0.0      libc.so.1::__pread_sys [51]
                                       libc.so.1::pread [54]
                                       cache.so::Cache_Storage::CSFileStorage::get(CDataBuffer&,long,long) [50]

So it is pread where the algorithm waits.
Then I checked the "System Usage - Syscalls (All Threads)" section:

System Usage - Syscalls  (All Threads)
--------------------------------------------------------------------------------
Relative Time             Total    Calls --------- Time (thread secs) ----------
 System Call              Calls    / Sec   Minimum   Average   Maximum     Total
--------------------------------------------------------------------------------
Overall
 pread                   277276    13.67   0.00000   0.01649  26.53552 4570.98466
 write                   208490    10.26   0.00000   0.00058  34.93571 120.68408

To me it seems that "Maximum time spent in pread" is too big: 26.5 secs.

Then I checked the "System Usage - I/O (All Threads)":

--------------------------------------------------------------------------------
Relative Time     Count   Size/Sec --------- Size / Time (thread secs) ---------
 I/O Type                Count/Sec    Minimum    Average    Maximum        Total
--------------------------------------------------------------------------------
Overall
 Logical I/O - Disk
  read  (size)   277925     507.38          2         40     262144     11198802
        (time)               12.59    0.00000    0.01645   26.53552   4572.46887
  write (size)   202288     390.07          1         42      81668      8609647
        (time)                9.16    0.00000    0.00070   34.93571    140.97860
  total (size)   480213     897.45          1         41     262144     19808449
        (time)               21.76    0.00000    0.00982   34.93571   4713.44746
        
 Physical I/O - Disk
  read  (size)     5232    3339.01       4096      11252     131072     58871808
        (time)                0.30    0.00022    0.10258    0.54848    536.68981
  write (size)       42     135.21       4096      56758      65536      2383872
        (time)                0.00    0.00453    0.09820    0.43120      4.12429
  total (size)     5274    3474.22       4096      11614     131072     61255680
        (time)                0.30    0.00022    0.10254    0.54848    540.81409

This section puzzled me. It shows that Logical I/O read max time value is 26.53 secs
while Physical I/O read is only max time value 0.54 secs. The same problem is with writes.
So obviously such long logical preads and writes are not suitable.
Yet physical reads and writes seem to have no performance problems.
What can be done to lower maximum time for Logical read and writes?

Dennis Handly · ‎08-22-2013

How big is your filesystem cache?

While you have a large max time, the average is small.

Is there another process that is swamping your box?

Are these I/Os to local disks, NFS, etc?

blackwater · ‎08-23-2013

Dennis, thank you for your respone

> How big is your filesystem cache?
There is 191 Gb memory on the production server.
The filecache has these settings:
-bash-4.1$ /usr/sbin/kctune -q filecache_max -q filecache_min
Tunable Value Expression Changes
filecache_max 9795904307 5% Imm (auto disabled)
filecache_min 9795904307 5% Imm (auto disabled)

Is 5% optimal in this situation?

> Are these I/Os to local disks, NFS, etc?
local disks.

> While you have a large max time, the average is small.
However the program waits a lot on calls to pread().
So even though the average time is small pread() is the number one function where it waits.
That might be a reason for slowdown in the program.

> Is there another process that is swamping your box?
Not actually. There is only one process with PID 26283 that eats 18Gb physical memory. Other processes are not so big.
This is from the production server:
$ UNIX95= ps -ef -o pid,vsz,sz,args | sort -n -k 3 | tail -n 10
2180 50080 12418 /opt/wbem/lbin/cimprovagt 0 4 9 root SFMProviderModule
2983 127168 31792 /opt/perf/bin/midaemon
13627 407044 54457 ./has_serverd_mm -d
13805 442884 59991 ./has_serverd_mz -d
14183 467076 66100 ./has_serverd_cf_mm -d
13093 471172 66724 ./has_serverd -d
19033 522916 81141 ./has_serverd_uap -d
12287 2391380 548655 ./has_serverd_2 -d
17726 4665592 1165608 /export/usnmp/snmp_int/bin/snmp_int -b -p/export/usnmp/snmp_int/tmp/.snmp_int.pid
26283 75852356 18884623 ./has_serverd_1 -d

$ uptime
1:14pm up 68 days, 16:59, 11 users, load average: 0.32, 0.35, 0.38

Actually this is another difficult question.
The number of pages that are paged out is rather high (>100 or even > 200) on the production server but it is not clear to me why it happens.
For example:

-bash-4.1$ vmstat 10 10
         procs memory page faults cpu
    r b w avm free re at pi po fr de sr in sy cs us sy id
    7 325 0 23017586 21414749 0 0 0 181 0 0 0 17022 586876 9035 26 4 70
    4 327 0 23009503 21414768 0 0 0 104 0 0 0 12705 471482 3848 13 2 85
    5 326 0 23018414 21414780 0 0 0 52 0 0 0 12612 124498 4568 9 2 89
    2 332 0 23014890 21414266 0 0 0 161 0 0 0 13067 129257 4756 5 3 92
    5 326 0 23004962 21414532 0 0 0 238 0 0 0 14882 370527 6110 16 4 80
   10 279 0 23009556 21414532 0 0 0 275 0 0 0 14729 508642 7434 13 5 82
    9 169 0 23012747 21414530 0 0 0 221 0 0 0 15549 266111 7318 23 4 72
    5 162 0 23006140 21414529 0 0 0 218 0 0 0 17087 246297 8063 23 4 73
    9 120 0 23010860 21414526 0 0 0 114 0 0 0 17117 223376 8083 26 3 70
   10 226 0 23012303 21413179 0 0 0 90 0 0 0 17621 227688 7077 27 3 69

Yet "/usr/sbin/swapinfo -tam" shows that that only 34% memory is used (the total line) and plenty of free memory is available ( >167GB ):
Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 131072 0 131072 0% 0 - 1 /dev/vg00/lvol2
reserve - 89934 -89934
memory 186842 19371 167471 10%
total 317914 109305 208609 34% - 0 -

How is possible that there are lots of free memory and the number of pages that are paged out is rather high?
And vhand seems to be more active than on other servers:
-bash-4.1$ ps -o pid,cpu,pcpu,etime,time,args -p 2
PID C %CPU ELAPSED TIME COMMAND
2 53 1.27 67-14:55:59 19:10:58 vhand

Dennis Handly · ‎08-23-2013

>Other processes are not so big.

It really depends if they are doing lots of I/O.

>How is possible that there are lots of free memory and the number of pages that are paged out is rather high?

Writing to a mapped file and then flushing it. This would show up on PO but not in swapinfo.

blackwater · ‎08-23-2013

> It really depends if they are doing lots of I/O.

How do I get this information? Should I use "caliper cstack --system-usage=all -w" or

"caliper cstack --system-usage=all `UNIX95= ps -o pid -ef`" or somethin else?

blackwater · ‎08-23-2013

> It really depends if they are doing lots of I/O.

I compared scgprof report that was taken one year ago and today. Obviosly for read the average time and the maximum time became much worse today.

One year ago:

---------------

System Usage - I/O : has_serverd_1

--------------------------------------------------------------------------------

Relative Time Count Size/Sec --------- Size / Time (thread secs) ---------

I/O Type Count/Sec Minimum Average Maximum Total

--------------------------------------------------------------------------------

Overall

Logical I/O - Disk

read (size) 2119264 1586575.68 2 45 211492 96634912

(time) 34794.60 0.00000 0.00002 0.11838 39.11004

write (size) 2163982 1307654.55 2 36 65536 79646426

(time) 35528.79 0.00000 0.00000 0.03834 10.49590

total (size) 4283246 1447115.12 2 41 211492 176281338

(time) 35161.69 0.00000 0.00001 0.11838 49.60594

Today

-------

System Usage - I/O : has_serverd_1

--------------------------------------------------------------------------------

Relative Time Count Size/Sec --------- Size / Time (thread secs) ---------

I/O Type Count/Sec Minimum Average Maximum Total

--------------------------------------------------------------------------------

Overall

Logical I/O - Disk

read (size) 7201988 1424701.27 0 35 262144 258010550

(time) 39768.46 0.00000 0.00152 14.73399 10916.57038

write (size) 24377267 2819550.76 1 20 65536 510615000

(time) 134608.15 0.00000 0.00001 22.41641 229.75982

total (size) 31579255 2122126.02 0 24 262144 768625550

(time) 87188.30 0.00000 0.00035 22.41641 11146.33020

Dennis Handly · ‎08-24-2013

>> It really depends if they are doing lots of I/O.

>How do I get this information?

You use glance and probably not caliper. glance has a better system wide, non CPU view.

Santosh Abraham · ‎08-24-2013

It is pretty clear that the pread() /pwrite() time is unacceptably slow.

From the 2 outputs what stands out is that the Maximum I/O size for reads is now 256K, slightly more than the earlier ~206K. What is unique about 256K is that it is the limit at which VxFS does a discovered direct I/O. However, even if that case the 20 seconds is unacceptably slow.

Can you run kiinfo / kitrace to capture some more logs ? kiinfo with the "-pid" option can be used to trace the wait times of a system call in the kernel. Likewise kitrace may also help.

-santosh

blackwater · ‎08-27-2013

>>Other processes are not so big.

>It really depends if they are doing lots of I/O.

Finally it was found that one of shared libraries that was developed by another group and was loaded in our process started doing lots of updates what resulted in lots of I/O operations.

>> How do I get this information?

>You use glance and probably not caliper. glance has a better system wide, non CPU view.

Unfortunatelly this customer has not bought glance. Dennis, could you tell me, please, what other HP utilities can get this information?

Dennis Handly · ‎08-28-2013

>that one of shared libraries ... started doing lots of updates what resulted in lots of I/O operations.

If you have some extra I/O calls, caliper can measure these extra calls with scgprof.

>what other HP utilities can get this information?

There is vmstat(1) and iostat(1) but these don't tell you where.

But caliper can tell you that you are now calling pread more often. It's just that we were looking at how long it took, not how many calls.

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Re: What can be done to lower maximum time for read and writes in the following situation?

What can be done to lower maximum time for read and writes in the following situation?