
High disk usage 100%

 
jerrym
Trusted Contributor


After migrating to EMC VMAX storage, we are seeing 100% disk utilisation on our HP-UX 11.31 ia64 servers.

I have been looking at glance and sar but cannot tell what is causing it. I found this difference in kctune settings on three of the affected servers, but I do not know whether it plays a role here.

 

kctune output on the server that is not at 100% disk usage all the time. Would changing these tunables to auto help disk I/O?

 

Tunable                    Current Usage             Current Setting
================================================================================
filecache_max              1421430784                10447491891

 

filecache_max                  10447491891  8%             Imm (auto disabled)
filecache_min                   6529682432  5%             Imm (auto disabled)

 

The other two servers, which show constant 100% usage:

 

Tunable                    Current Usage             Current Setting
================================================================================
filecache_max              2915418112                2940880650

 

filecache_max                  2940880650  1%            Imm (auto disabled)
filecache_min                  2940880650  1%            Imm (auto disabled)

 

 

 

Tunable                    Current Usage             Current Setting
=====================================================================================
filecache_max              1297690624                1306779811

 

filecache_max                  1306779811  1%            Imm (auto disabled)
filecache_min                  1306779811  1%            Imm (auto disabled)

 

 

 

 

I also ran truss on the Oracle processes that glance lists with high disk usage, and saw a lot of syscalls like these:

 

.

.

getrusage(RUSAGE_SELF, 0x9fffffffffff8d70)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff8d80)                      = 0

write(16, "0102\0\006\0\0\0\0\00601" \0\004".., 258)            = 258

read(16, "\06 \0\006\0\0\0\0\003N aa\0\0\0".., 8208)            = 54

getrusage(RUSAGE_SELF, 0x9fffffffffff9310)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff7280)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff72a0)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff72a0)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff84e0)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff84e0)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff8d70)                      = 0

getrusage(RUSAGE_SELF, 0x9fffffffffff8d80)                      = 0

write(16, "\0fd\0\006\0\0\0\0\00601" \0\004".., 253)            = 253

read(16, "\06 \0\006\0\0\0\0\003N ab\0\0\0".., 8208)            = 54

.

.
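To judge whether one call really dominates a trace like this, it can help to count the calls rather than eyeball them. The following is a minimal sketch, not part of the original investigation: the helper name `tally_syscalls` is hypothetical, and it assumes each trace line starts with the syscall name followed by an opening parenthesis, as in the excerpt above.

```python
import re
from collections import Counter

# Assumed line shape: "name(args...) = result", as in the excerpt above.
SYSCALL = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(")

def tally_syscalls(lines):
    """Count how often each syscall name appears in truss-style output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL.match(line)
        if match:  # skip blank lines and the "." elision markers
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the trace above.
sample = [
    r'getrusage(RUSAGE_SELF, 0x9fffffffffff8d70)                      = 0',
    r'getrusage(RUSAGE_SELF, 0x9fffffffffff8d80)                      = 0',
    r'write(16, "0102\0\006\0\0\0\0\00601" \0\004".., 258)            = 258',
    r'read(16, "\06 \0\006\0\0\0\0\003N aa\0\0\0".., 8208)            = 54',
    r'getrusage(RUSAGE_SELF, 0x9fffffffffff9310)                      = 0',
    '.',
]

counts = tally_syscalls(sample)
for name, n in counts.most_common():
    print(f"{name:12s} {n}")
```

A count only shows how often a call is made, not how long it takes, so a frequent but cheap call such as getrusage may still be irrelevant to disk load.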

 

 

Also, in glance, why does the relatively good server (the one that is not at 100% disk usage all the time) show "na" for logical reads/writes, while the other two bad servers do not?

 

                                  DISK REPORT                       Users=    1
Req Type       Requests   %     Rate   Bytes     Cum Req    %  Cum Rate Cum Byte
--------------------------------------------------------------------------------
Local  Logl Rds    na    na      na       na        na     na      na        na
       Logl Wts    na    na      na       na        na     na      na        na
       Phys Rds  1424  92.5  1294.5   87.5mb      1424   92.5  1294.5    87.5mb
       Phys Wts   116   7.5   105.4    9.2mb       116    7.5   105.4     9.2mb
       User      1536  99.7  1396.3  143.3mb      1536   99.7  1396.3   143.3mb
       Virt Mem     0   0.0     0.0      0kb         0    0.0     0.0       0kb
       System       4   0.3     3.6     32kb         4    0.3     3.6      32kb
       Raw          0   0.0     0.0      8kb         0    0.0     0.0       8kb
Remote Logl Rds    na    na      na       na        na     na      na        na
       Logl Wts    na    na      na       na        na     na      na        na
       Phys Rds     3 100.0     1.8      0kb         3  100.0     1.8       0kb
       Phys Wts     0   0.0     0.0      0kb         0    0.0     0.0       0kb

 

 

                                  DISK REPORT                       Users=    3
Req Type       Requests   %     Rate   Bytes     Cum Req    %  Cum Rate Cum Byte
--------------------------------------------------------------------------------
Local  Logl Rds  2865  88.2   520.9   11.1mb    203405   90.8  1241.0    1.55gb
       Logl Wts   383  11.8    69.6    2.8mb     20725    9.2   126.4   207.6mb
       Phys Rds  1150  69.2   209.0    9.0mb     71183   76.7   435.1    1.47gb
       Phys Wts   513  30.8    93.2    3.8mb     21667   23.3   132.4   216.4mb
       User      1515  91.1   275.4   11.8mb     90708   97.7   554.4    1.68gb
       Virt Mem     0   0.0     0.0      0kb       114    0.1     0.6     892kb
       System      80   4.8    14.5    424kb      1357    1.5     8.2     4.3mb
       Raw         68   4.1    12.3    537kb       671    0.7     4.1     5.0mb
Remote Logl Rds     0   0.0     0.0      0kb         6   54.5     0.0      27kb
       Logl Wts     0   0.0     0.0      0kb         5   45.5     0.0      13kb
       Phys Rds     2 100.0     0.3      0kb       165   91.2     1.0      12kb
       Phys Wts     0   0.0     0.0      0kb        16    8.8     0.0      48kb
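One thing that can be read directly off the two reports is the average size of a physical read, which is simply bytes divided by requests. The sketch below is only arithmetic on the interval figures shown above. The helper names are hypothetical, and it assumes glance's "kb", "mb" and "gb" suffixes are binary multiples, which the reports themselves do not state. Because glance rounds the byte columns, the results are approximate.

```python
# Assumption: glance unit suffixes are binary multiples (1 kb = 1024 bytes).
UNITS = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

def to_bytes(text):
    """Convert a glance-style size such as '87.5mb' to bytes."""
    number, unit = text[:-2], text[-2:].lower()
    return float(number) * UNITS[unit]

def avg_io_kb(requests, byte_text):
    """Average I/O size in KB, or 0.0 when there were no requests."""
    if requests == 0:
        return 0.0
    return to_bytes(byte_text) / requests / 1024

# Interval "Phys Rds" rows from the two reports above.
first_report = avg_io_kb(1424, "87.5mb")   # report that shows "na" for logical I/O
second_report = avg_io_kb(1150, "9.0mb")   # report with logical I/O counts

print(f"first report:  {first_report:.1f} KB per physical read")
print(f"second report: {second_report:.1f} KB per physical read")
```

On these figures the first report averages roughly 63 KB per physical read and the second roughly 8 KB. The two snapshots cover different intervals and workloads, so this is a prompt for further questions rather than a diagnosis.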

 


Re: High disk usage 100%

Do you actually have a performance problem? Are users actually complaining? glance or sar reporting 100% disk utilisation is pretty meaningless. All it tells you is that the disk was processing I/O for 100% of the interval. It doesn't mean the disk is struggling to deliver the I/O required, or that it is responding slowly. By the same token, I could say that my PC is at 100% power utilisation whenever it is turned on, because it is drawing power the whole time; that says nothing about how much current it is actually drawing.
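To put numbers on that distinction, here is a minimal sketch with made-up figures. The two disks and their values are hypothetical, not measurements from these servers. Utilisation is just busy time over interval length, whereas the average response time follows from Little's law (mean I/Os outstanding divided by throughput).

```python
def utilisation(busy_seconds, interval_seconds):
    """Fraction of the interval during which the disk had I/O in progress."""
    return busy_seconds / interval_seconds

def avg_response_ms(avg_outstanding_ios, ios_per_second):
    """Little's law: mean time in system = mean number in system / throughput."""
    return 1000.0 * avg_outstanding_ios / ios_per_second

INTERVAL = 5.0  # seconds; hypothetical sampling interval

# Two hypothetical disks, both busy for the entire interval.
disks = {
    "healthy":    {"busy": 5.0, "outstanding": 2.0,  "iops": 1300.0},
    "struggling": {"busy": 5.0, "outstanding": 60.0, "iops": 300.0},
}

for name, d in disks.items():
    util = utilisation(d["busy"], INTERVAL)
    resp = avg_response_ms(d["outstanding"], d["iops"])
    print(f"{name:11s} utilisation={util:.0%} avg response={resp:.1f} ms")
```

Both hypothetical disks report 100% utilisation, yet one answers in under 2 ms and the other takes about 200 ms. Queue length and response time are the figures that separate them.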

 

As for the data you provided...

 

- kctune of filecache_ settings - how do you think this is relevant? These control how much host memory is available for file caching. TBH, fiddling with them is unlikely to make much of a difference to disk I/O if this is Oracle (which you do mention).

 

- truss output - presumably of an ora_*_dbwr process? Again, fairly meaningless. The process looks like it is checking its own resource usage through the getrusage syscall, which is entirely reasonable and not something I would be concerned about.

 

- glance output - read the following:

 

http://www.hpug.org.uk/index.php?option=com_content&task=view&id=1264&Itemid=160

 


I am an HPE Employee
Ken Grabowski
Respected Contributor

Re: High disk usage 100%

Hi Jerry,

 

A lot of information is missing. The disk reports you're showing are cumulative. What shows when you select IOByDisk? Are individual disks heavily loaded while others are not? How does that compare with before the migration to the VMAX? I've done a couple of VMAX migrations from DMX and not had any issues. What were you using before? How many HBAs? Are you using PVLinks or PowerPath? If PowerPath, is it licensed, and is multipathing enabled to allow parallel traffic across the HBAs?

 

If you have an X Window server on your desktop, run the gpm version of Glance and look at the Disk Detail window for the busy disks. See what it shows for Queue Length and in the Wait Queue Length table.