- Community Home
- >
- Servers and Operating Systems
- >
- Operating Systems
- >
- Operating System - HP-UX
- >
- What can be done to lower maximum time for read an...
Categories
Company
Local Language
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Forums
Discussions
Discussions
Discussions
Forums
Discussions
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Community
Resources
Forums
Blogs
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-22-2013 08:27 AM
08-22-2013 08:27 AM
What can be done to lower maximum time for read and writes in the following situation?
Our customer has reported that one of our programs after updating started working slowly. CPU utulization is not big so I decided to check where the program waits. I got four HP Caliper cstack reports. In all of the reports the "Hot Call Paths (All Threads)" section shows that the program waits most often in libc.so.1::pread. This is an example:
Hot Call Paths (All Threads) ------------------------------------------------------------- --% Total Hits In Only-- Run + Run Block Index Block Hits Hits Name Hits Only Only ------------------------------------------------------------- 0 7.7 7.7 0.0 libc.so.1::__pread_sys [51] libc.so.1::pread [54] cache.so::Cache_Storage::CSFileStorage::get(CDataBuffer&,long,long) [50]
So it is pread where the algorithm waits.
Then I checked the "System Usage - Syscalls (All Threads)" section:
System Usage - Syscalls (All Threads) -------------------------------------------------------------------------------- Relative Time Total Calls --------- Time (thread secs) ---------- System Call Calls / Sec Minimum Average Maximum Total -------------------------------------------------------------------------------- Overall pread 277276 13.67 0.00000 0.01649 26.53552 4570.98466 write 208490 10.26 0.00000 0.00058 34.93571 120.68408
To me it seems that "Maximum time spent in pread" is too big: 26.5 secs.
Then I checked the "System Usage - I/O (All Threads)":
-------------------------------------------------------------------------------- Relative Time Count Size/Sec --------- Size / Time (thread secs) --------- I/O Type Count/Sec Minimum Average Maximum Total -------------------------------------------------------------------------------- Overall Logical I/O - Disk read (size) 277925 507.38 2 40 262144 11198802 (time) 12.59 0.00000 0.01645 26.53552 4572.46887 write (size) 202288 390.07 1 42 81668 8609647 (time) 9.16 0.00000 0.00070 34.93571 140.97860 total (size) 480213 897.45 1 41 262144 19808449 (time) 21.76 0.00000 0.00982 34.93571 4713.44746 Physical I/O - Disk read (size) 5232 3339.01 4096 11252 131072 58871808 (time) 0.30 0.00022 0.10258 0.54848 536.68981 write (size) 42 135.21 4096 56758 65536 2383872 (time) 0.00 0.00453 0.09820 0.43120 4.12429 total (size) 5274 3474.22 4096 11614 131072 61255680 (time) 0.30 0.00022 0.10254 0.54848 540.81409
This section puzzled me. It shows that Logical I/O read max time value is 26.53 secs
while Physical I/O read is only max time value 0.54 secs. The same problem is with writes.
So obviously such long logical preads and writes are not suitable.
Yet physical reads and writes seem to have no performance problems.
What can be done to lower maximum time for Logical read and writes?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-22-2013 12:52 PM
08-22-2013 12:52 PM
Re: What can be done to lower maximum time for read and writes in the following situation?
How big is your filesystem cache?
While you have a large max time, the average is small.
Is there another process that is swamping your box?
Are these I/Os to local disks, NFS, etc?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-23-2013 02:31 AM - edited 08-23-2013 02:42 AM
08-23-2013 02:31 AM - edited 08-23-2013 02:42 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
Dennis, thank you for your respone
> How big is your filesystem cache?
There is 191 Gb memory on the production server.
The filecache has these settings:
-bash-4.1$ /usr/sbin/kctune -q filecache_max -q filecache_min
Tunable Value Expression Changes
filecache_max 9795904307 5% Imm (auto disabled)
filecache_min 9795904307 5% Imm (auto disabled)
Is 5% optimal in this situation?
> Are these I/Os to local disks, NFS, etc?
local disks.
> While you have a large max time, the average is small.
However the program waits a lot on calls to pread().
So even though the average time is small pread() is the number one function where it waits.
That might be a reason for slowdown in the program.
> Is there another process that is swamping your box?
Not actually. There is only one process with PID 26283 that eats 18Gb physical memory. Other processes are not so big.
This is from the production server:
$ UNIX95= ps -ef -o pid,vsz,sz,args | sort -n -k 3 | tail -n 10
2180 50080 12418 /opt/wbem/lbin/cimprovagt 0 4 9 root SFMProviderModule
2983 127168 31792 /opt/perf/bin/midaemon
13627 407044 54457 ./has_serverd_mm -d
13805 442884 59991 ./has_serverd_mz -d
14183 467076 66100 ./has_serverd_cf_mm -d
13093 471172 66724 ./has_serverd -d
19033 522916 81141 ./has_serverd_uap -d
12287 2391380 548655 ./has_serverd_2 -d
17726 4665592 1165608 /export/usnmp/snmp_int/bin/snmp_int -b -p/export/usnmp/snmp_int/tmp/.snmp_int.pid
26283 75852356 18884623 ./has_serverd_1 -d
$ uptime
1:14pm up 68 days, 16:59, 11 users, load average: 0.32, 0.35, 0.38
Actually this is another difficult question.
The number of pages that are paged out is rather high (>100 or even > 200) on the production server but it is not clear to me why it happens.
For example:
-bash-4.1$ vmstat 10 10 procs memory page faults cpu r b w avm free re at pi po fr de sr in sy cs us sy id 7 325 0 23017586 21414749 0 0 0 181 0 0 0 17022 586876 9035 26 4 70 4 327 0 23009503 21414768 0 0 0 104 0 0 0 12705 471482 3848 13 2 85 5 326 0 23018414 21414780 0 0 0 52 0 0 0 12612 124498 4568 9 2 89 2 332 0 23014890 21414266 0 0 0 161 0 0 0 13067 129257 4756 5 3 92 5 326 0 23004962 21414532 0 0 0 238 0 0 0 14882 370527 6110 16 4 80 10 279 0 23009556 21414532 0 0 0 275 0 0 0 14729 508642 7434 13 5 82 9 169 0 23012747 21414530 0 0 0 221 0 0 0 15549 266111 7318 23 4 72 5 162 0 23006140 21414529 0 0 0 218 0 0 0 17087 246297 8063 23 4 73 9 120 0 23010860 21414526 0 0 0 114 0 0 0 17117 223376 8083 26 3 70 10 226 0 23012303 21413179 0 0 0 90 0 0 0 17621 227688 7077 27 3 69
Yet "/usr/sbin/swapinfo -tam" shows that that only 34% memory is used (the total line) and plenty of free memory is available ( >167GB ):
Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 131072 0 131072 0% 0 - 1 /dev/vg00/lvol2
reserve - 89934 -89934
memory 186842 19371 167471 10%
total 317914 109305 208609 34% - 0 -
How is possible that there are lots of free memory and the number of pages that are paged out is rather high?
And vhand seems to be more active than on other servers:
-bash-4.1$ ps -o pid,cpu,pcpu,etime,time,args -p 2
PID C %CPU ELAPSED TIME COMMAND
2 53 1.27 67-14:55:59 19:10:58 vhand
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-23-2013 02:42 AM
08-23-2013 02:42 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
>Other processes are not so big.
It really depends if they are doing lots of I/O.
>How is possible that there are lots of free memory and the number of pages that are paged out is rather high?
Writing to a mapped file and then flushing it. This would show up on PO but not in swapinfo.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-23-2013 03:57 AM - edited 08-23-2013 04:34 AM
08-23-2013 03:57 AM - edited 08-23-2013 04:34 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
> It really depends if they are doing lots of I/O.
How do I get this information? Should I use "caliper cstack --system-usage=all -w" or
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-23-2013 07:18 AM - edited 08-23-2013 07:19 AM
08-23-2013 07:18 AM - edited 08-23-2013 07:19 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
> It really depends if they are doing lots of I/O.
I compared scgprof report that was taken one year ago and today. Obviosly for read the average time and the maximum time became much worse today.
One year ago:
---------------
System Usage - I/O : has_serverd_1
--------------------------------------------------------------------------------
Relative Time Count Size/Sec --------- Size / Time (thread secs) ---------
I/O Type Count/Sec Minimum Average Maximum Total
--------------------------------------------------------------------------------
Overall
Logical I/O - Disk
read (size) 2119264 1586575.68 2 45 211492 96634912
(time) 34794.60 0.00000 0.00002 0.11838 39.11004
write (size) 2163982 1307654.55 2 36 65536 79646426
(time) 35528.79 0.00000 0.00000 0.03834 10.49590
total (size) 4283246 1447115.12 2 41 211492 176281338
(time) 35161.69 0.00000 0.00001 0.11838 49.60594
Today
-------
System Usage - I/O : has_serverd_1
--------------------------------------------------------------------------------
Relative Time Count Size/Sec --------- Size / Time (thread secs) ---------
I/O Type Count/Sec Minimum Average Maximum Total
--------------------------------------------------------------------------------
Overall
Logical I/O - Disk
read (size) 7201988 1424701.27 0 35 262144 258010550
(time) 39768.46 0.00000 0.00152 14.73399 10916.57038
write (size) 24377267 2819550.76 1 20 65536 510615000
(time) 134608.15 0.00000 0.00001 22.41641 229.75982
total (size) 31579255 2122126.02 0 24 262144 768625550
(time) 87188.30 0.00000 0.00035 22.41641 11146.33020
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-24-2013 12:27 AM
08-24-2013 12:27 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
>> It really depends if they are doing lots of I/O.
>How do I get this information?
You use glance and probably not caliper. glance has a better system wide, non CPU view.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-24-2013 03:15 AM
08-24-2013 03:15 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
It is pretty clear that the pread() /pwrite() time is unacceptably slow.
From the 2 outputs what stands out is that the Maximum I/O size for reads is now 256K, slightly more than the earlier ~206K. What is unique about 256K is that it is the limit at which VxFS does a discovered direct I/O. However, even if that case the 20 seconds is unacceptably slow.
Can you run kiinfo / kitrace to capture some more logs ? kiinfo with the "-pid" option can be used to trace the wait times of a system call in the kernel. Likewise kitrace may also help.
-santosh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-27-2013 11:22 PM
08-27-2013 11:22 PM
Re: What can be done to lower maximum time for read and writes in the following situation?
>>Other processes are not so big.
>It really depends if they are doing lots of I/O.
Finally it was found that one of shared libraries that was developed by another group and was loaded in our process started doing lots of updates what resulted in lots of I/O operations.
>> How do I get this information?
>You use glance and probably not caliper. glance has a better system wide, non CPU view.
Unfortunatelly this customer has not bought glance. Dennis, could you tell me, please, what other HP utilities can get this information?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-28-2013 02:25 AM
08-28-2013 02:25 AM
Re: What can be done to lower maximum time for read and writes in the following situation?
>that one of shared libraries ... started doing lots of updates what resulted in lots of I/O operations.
If you have some extra I/O calls, caliper can measure these extra calls with scgprof.
>what other HP utilities can get this information?
There is vmstat(1) and iostat(1) but these don't tell you where.
But caliper can tell you that you are now calling pread more often. It's just that we were looking at how long it took, not how many calls.