02-03-2006 04:38 AM
Understanding System Performance
I've read things like:
http://docs.hp.com/en/1219/tuningwp.html
http://h21007.www2.hp.com/dspp/files/unprotected/devresource/Docs/TechPapers/UXPerfCookBook.pdf
I'm struggling to understand and explain %wio from sar data and to compare it with the metrics in MWA.
I also run sarcheck, and it states "no I/O bottleneck".
We've been having a minor performance issue on Friday mornings between 1 and 3 AM (when international sites are accessing our fairly large SAP/Oracle system).
The system is an RP7410 with 14GB RAM and 5 active CPUs, with a database of about 1.5 TB on a DMX 1000 in MC/SG. From an EMC point of view, the system is barely breaking a sweat.
The DBAs have noticed what appear to be I/O issues from the Oracle side.
From the system side, I see nothing out of the ordinary.
I've attached a fairly long txt file of sar/mwa data.
What I don't understand is why %wio is fairly high (>50%) at times, and yet in the mwa data there is hardly any queueing, interrupt CPU is low, etc.
I know from the man page that %wio is idle time with some process waiting for I/O (only block I/O, raw I/O, or VM pageins/swapins indicated).
The mwa data in the txt file is quite wide. It is best to paste it into Excel, then use "DATA" -> "Text to columns" with | as the delimiter.
Rgds...Geoff
02-03-2006 06:00 AM
Re: Understanding System Performance
Your best gauge of whether you have an I/O problem is "sar -d": check for disks that have a queue length in excess of 0.
Do you have vmstat output as well from sarcheck?
What you are possibly facing is an Oracle tuning issue. Possibly SGA sizing needs to be re-studied.
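As a rough illustration of that check, here is a minimal Python sketch that scans captured "sar -d" text and lists the devices whose avque exceeds a threshold. The column order (device, %busy, avque, ...), the sample figures and the function name queued_devices are assumptions made for illustration; they are not data from this thread.

```python
import re

# Matches HP-UX style device names such as c0t6d0 (assumed naming scheme).
DEVICE = re.compile(r"^c\d+t\d+d\d+$")

def queued_devices(sar_d_text, threshold=0.0):
    """Return {device: highest avque seen} for devices above the threshold."""
    worst = {}
    for line in sar_d_text.splitlines():
        fields = line.split()
        # The device name is the first field on continuation lines and the
        # second field on lines that begin with a timestamp or "Average".
        for i, field in enumerate(fields[:2]):
            if DEVICE.match(field) and len(fields) >= i + 3:
                try:
                    avque = float(fields[i + 2])  # assumed: %busy, then avque
                except ValueError:
                    break
                if avque > threshold:
                    worst[field] = max(avque, worst.get(field, 0.0))
                break
    return worst

# Invented sample, laid out the way the sketch expects.
SAMPLE = """\
01:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
01:05:00   c1t0d0    12.0     2.0      20     320     5.0     6.0
           c2t0d0     3.0     0.5       4      64     0.1     2.0
01:10:00   c1t0d0    15.0     3.5      25     400     7.0     6.5
           c2t0d0     2.0     0.5       3      48     0.1     2.1
"""

print(queued_devices(SAMPLE, threshold=1.0))
```

The threshold is a parameter because what counts as "queued" depends on how the sar build at hand reports an idle device.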
02-03-2006 06:08 AM
Re: Understanding System Performance
02-03-2006 06:42 AM
Re: Understanding System Performance
My local disks, on average over that 2.5 hour period, were nil/0.5 and 6.42/4.95.
The local disks are 15K rpm and are mirrored across the controllers.
Strange that the two vg00 disks, c28t5d0 and c0t6d0, have different queues.
vg01 contains additional swap and /var/adm/crash - which really aren't used...so that explains their lack of stats...
Rgds...Geoff
02-03-2006 06:44 AM
Re: Understanding System Performance
This would be a typical picture when a small number (1 ?) of processes is reading through a lot of not-recently-used data, for example for a report.
Basically the process will be doing read-compute-read-compute for a long time. The compute is likely smallish compared to the io completion time. There is no chance for an IO queue because the process will only generate the next read after the compute for a prior one is done.
Only parallel queries would change this.
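To put numbers on the read-compute-read-compute pattern, here is a small sketch. It assumes a single process, a fixed I/O completion time and a fixed compute burst; the 8 ms and 2 ms figures are invented, and serial_reader_profile is a hypothetical helper name.

```python
def serial_reader_profile(io_ms, compute_ms):
    """Model one process that alternates a blocking read with a compute burst."""
    cycle_ms = io_ms + compute_ms
    return {
        # Share of wall-clock time the process spends blocked on its read.
        "wait_fraction": io_ms / cycle_ms,
        # Reads completed per second by this one process.
        "reads_per_second": 1000.0 / cycle_ms,
        # The next read is only issued after the compute step, so at most
        # one I/O from this process is ever outstanding: no queue can form.
        "max_outstanding_ios": 1,
    }

profile = serial_reader_profile(io_ms=8.0, compute_ms=2.0)
print(profile)
```

With these made-up numbers the process waits 80% of the time and completes 100 reads per second, yet never has more than one I/O in flight, which is how a high wait figure and an empty disk queue can coexist.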
Now in the daytime, when the system gets busier, one such process's wait time is filled by compute cycles for other processes, and thus on a macro/system level it will be labeled 'cpu busy', but on a micro/process level the wait is still happening. Those other processes can also issue more independent, concurrent IOs, generating the IO queues.
So I concur with other observations that the system may simply be doing what it is supposed to be doing. I would however make sure to run an Oracle statspack with SNAPs bracketing the 1am - 3am window to double-check that it is simply busy. Specifically, I would verify that the average IO time is similar to the daytime average, which would suggest that the wait is normal but just more visible. And glance over the top queries, of course.
I recently helped with a system with a similar complaint about high waits. It turned out that the SAN device was shared and other systems created excessive (BACKUP) load on it. This caused the IO response time for the system we were looking at to degrade to a point where it impacted performance.
Actually.. we were kinda lucky catching that.
For us it was a 1/2 hour glitch in a 2 hour run that was done to validate a supposed performance boost.
Had the situation been reversed, with a 2 hour glitch in a 1/2 hour test, we might have falsely concluded that the performance improvement was broken. As it was, we saw the improvement in general and just needed to explain the glitch.
02-03-2006 07:04 AM
Re: Understanding System Performance
Which disk is showing the queueing of 6.42/4.95: c28t5d0, or c0t6d0, which is the internal one? Is c28t5d0 possibly in an external SCSI enclosure?
If so, what HBA is it connected to? Is it a combo U320/GigE one?
02-03-2006 08:31 AM
Re: Understanding System Performance
02-03-2006 08:41 AM
Re: Understanding System Performance
02-03-2006 08:52 AM
Re: Understanding System Performance
02-03-2006 08:59 AM
Re: Understanding System Performance
02-03-2006 09:58 AM
Re: Understanding System Performance
According to sarcheck:
The disk device c28t5d0 was busy an average of 8.79 percent of the time and had an average queue depth of 2.5 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 8.1 milliseconds. This is relatively fast. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request. The disk device c28t5d0 was reported by pvdisplay as being a 33.91 gigabyte disk. 14224 megabytes of space was reported as being free and 20496 megabytes have been allocated. This disk device was a part of volume group /dev/vg00 and contained 15 logical volumes. At least one logical volume occupied noncontiguous physical extents on the disk. Performance will suffer when logical volumes are busy and not mirrored because the disk's read/write heads are likely to travel back and forth in an inefficient manner.
Logical volume /dev/vg00/lvol6, 949 block gap
Logical volume /dev/vg00/lvol6, 1359 block gap
Logical volume /dev/vg00/lvol7, 1245 block gap
Logical volume /dev/vg00/lvol9, 724 block gap
Logical volume /dev/vg00/lvol6, 1669 block gap
Logical volume /dev/vg00/lvol6, 353 block gap
Logical volume /dev/vg00/lvol6, 663 block gap
Logical volume /dev/vg00/lvol6, 247 block gap
The disk device c0t6d0 was busy an average of 11.54 percent of the time and had an average queue depth of 1.9 (when occupied). This indicates that the device is not a performance bottleneck. The average service time reported for this device and its accompanying disk subsystem was 6.1 milliseconds. This is relatively fast. The disk device c0t6d0 was reported by pvdisplay as being a 33.91 gigabyte disk. 14224 megabytes of space was reported as being free and 20496 megabytes have been allocated. This disk device was a part of volume group /dev/vg00 and contained 15 logical volumes. At least one logical volume occupied noncontiguous physical extents on the disk.
Logical volume /dev/vg00/lvol6, 1547 block gap
So, /opt has some gaps - because it has been extended a few times...
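As a rough cross-check on the sarcheck figures quoted above, the utilization law (busy fraction = I/O rate x service time) gives the I/O rate each disk must have been handling. This is a sketch under the assumption that the reported busy percentage and service time are averages over the same interval; implied_io_rate is a hypothetical helper name.

```python
def implied_io_rate(busy_percent, service_ms):
    """Utilization law rearranged: rate = busy fraction / service time (s)."""
    return (busy_percent / 100.0) / (service_ms / 1000.0)

# Busy % and service time (ms) as quoted from sarcheck in this post.
for device, busy, service in [("c28t5d0", 8.79, 8.1), ("c0t6d0", 11.54, 6.1)]:
    print(device, round(implied_io_rate(busy, service), 1), "I/Os per second")
```

That works out to roughly 11 and 19 I/Os per second, which is consistent with sarcheck's verdict that neither disk is a bottleneck.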
Thanks for the info so far - points will be assigned at a later date...
Rgds...Geoff
02-06-2006 02:47 AM
Re: Understanding System Performance
delaylog, nodatainlog, mincache=direct, convosync=direct
for Oracle redo and data files.
What do you think?
Rgds...Geoff
02-06-2006 02:56 AM
Re: Understanding System Performance
02-06-2006 03:49 AM
Re: Understanding System Performance
Like Nelson, I agree with your suggested mount option change, but I do not expect a positive impact on the problem described.
Those options will avoid double buffering and with that reduce CPU and memory pressure, but they will not reduce IOs; they just make the (CPU) path for the IOs shorter.
There is even a risk of increased IO load if it turns out that your SGA(s) were under-allocated and the buffer cache was actively helping to avoid IOs.
Maybe I am a little slow here, but please help me understand why you think that is a problem in the first place.
Sure, you have some WIO time. So what? The system is busy waiting for an IO to come through and has nothing else to do. Great! No problem. The only way to make that better is to teach your system to look into the future and pre-fetch the data which the application is going to need next. Not a minor task.
OK, I am obviously a little sarcastic here, but seriously, is there, for example, user feedback that the end-user performance is not where it is expected to be? Has that been qualified and quantified?
Cheers,
Hein.
02-06-2006 03:56 AM
Re: Understanding System Performance
The strange thing is, it only happens on Fridays, and yet Friday is no different from any other day as far as the number and type of batch jobs go...
And yes, right now we have no mount options (a carry-over from the original Service Guard setup).
Rgds...Geoff
02-06-2006 04:04 AM
Re: Understanding System Performance
Please allow me to go off the map a little (OK a lot) -
I recently had this too, and it was a problem with... (believe it or not) entries in a rarp table! Is there maintenance (switching from primary to alternate or vice versa, backups, bouncing services or servers, etc.) on rarp (dns) machines/servers/services at this time on Friday nights, maybe for backups or regularly scheduled maintenance?
I know that's a long shot (so much so that I'm reluctant to mention it), but it may be worth checking out...
02-06-2006 04:07 AM
Re: Understanding System Performance
DNS is fine - no switching...
Rgds...Geoff
02-06-2006 04:23 AM
Re: Understanding System Performance
Since you have MeasureWare, I suggest that you get the output from a "good" time interval and a "bad" one and compare them. I would use a fairly short sampling period.
The one thing I would look very closely at is the batch jobs that are being run at this time. Is there a unique batch job that is being run? Perhaps one that ran great a few months ago until someone deleted a "useless" index? Is there database maintenance during this time? Possibly deleting and recreating an index so that queries might be sequential during this interval? Are there any vxfs snapshots at this time?
Oh, and don't overlook something that could cause this kind of problem as the machine loads -- a bad timeslice setting.
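One way to do the good-versus-bad comparison, sketched in Python on the assumption that each interval has been exported as |-delimited text with a header row (as mentioned earlier in the thread). The column names cpu_util and disk_io_rate and all of the figures are placeholders, not real MeasureWare metric names or data.

```python
import csv
import io

def column_means(export_text, delimiter="|"):
    """Average every numeric column; non-numeric cells such as times are skipped."""
    sums, counts = {}, {}
    for row in csv.DictReader(io.StringIO(export_text), delimiter=delimiter):
        for name, value in row.items():
            if name is None:          # extra cells beyond the header row
                continue
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue
            key = name.strip()
            sums[key] = sums.get(key, 0.0) + number
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def compare_intervals(good_text, bad_text):
    """Return {column: (good mean, bad mean, bad - good)} for shared columns."""
    good, bad = column_means(good_text), column_means(bad_text)
    return {k: (good[k], bad[k], bad[k] - good[k]) for k in good if k in bad}

# Invented exports for a "good" and a "bad" interval.
GOOD = "time|cpu_util|disk_io_rate\n01:00|20|100\n01:05|30|120\n"
BAD = "time|cpu_util|disk_io_rate\n01:00|25|300\n01:05|35|340\n"

for column, (g, b, delta) in compare_intervals(GOOD, BAD).items():
    print(f"{column}: good={g:.1f} bad={b:.1f} delta={delta:+.1f}")
```

Sorting the result by the size of the delta is a quick way to see which metrics actually differ between the two windows.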
02-06-2006 04:26 AM
Re: Understanding System Performance
The problem might not be on the box, and for sure not on the Oracle side of the box. The network would be the suspect. It wouldn't be a case of the old 'cleaning service' joke, would it? (Every evening at 1am a new guard/cleaning service shift begins, and they unplug a router to plug in a coffee maker, removing all traces of that activity by 3am as they go :-)
Specifically when you mention that "Sometimes, they can't even log in...".
Is that logging in to HP-UX, or maybe making an Oracle Listener connection? Do you see anything at the HP-UX level (memory, swap space, process count) which might slow down process creation?
Maybe you can come up with a silly benchmark process where you have two streams of logins every 5 or 10 minutes: one originating locally on the box, not requiring any physical network, just logical, and the other from a select international site.
Each locally measures and records the response time of every attempt, highlighting the UTC time of day when the response time is substandard.
After a few days you compare the results.
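A minimal sketch of such a probe in Python. It times a plain TCP connect as a stand-in for a real login, which is an assumption; a full login would exercise more of the stack. The function names and the one-second "slow" cut-off are placeholders.

```python
import socket
import time
from datetime import datetime, timezone

def time_connect(host, port, timeout=10.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

def probe_line(host, port, slow_after=1.0):
    """Build one log line stamped in UTC, flagging slow or failed attempts."""
    elapsed = time_connect(host, port)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    if elapsed is None:
        status, shown = "FAILED", "-"
    else:
        status = "SLOW" if elapsed > slow_after else "ok"
        shown = f"{elapsed:.3f}"
    return f"{stamp} UTC {host}:{port} {shown} {status}"
```

Calling print(probe_line(host, port)) every 5 or 10 minutes, once on the box itself and once from the remote site, and appending the output to a file on each side gives two logs that can be lined up by UTC time afterwards. The host and port to probe are site-specific and deliberately left out here.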
fwiw,
Hein.
02-06-2006 04:51 AM
Re: Understanding System Performance
The login issue is on the SAP side, i.e. they can't get into the SAPGUI.
Server logins are no issue.
There are hundreds (anywhere from 50 to almost 500!) of ftp jobs every hour - mainly incoming - never a single failure.
We also do anywhere from a dozen to over 200 print jobs an hour - no failures there either.
I too am thinking it is more Oracle related - just want to have all my ducks in a row so to speak. For example, last week, Oracle changed the number and size of the redo log files - and we didn't have the issue last Friday.
Great suggestions, keep them coming.
Thanks...Geoff
02-06-2006 05:31 AM
Re: Understanding System Performance
- check if you have a BCV operation going on in your DMX
- since your DMX is probably shared with other servers, check if the "other" servers are busy around that time.
- what else do you have running on the servers (and hence the network) around that time? 1 to 3 AM US time seems to be a favourite backup or other intense processing window for most enterprises.
02-06-2006 06:31 AM
Re: Understanding System Performance
The way we discovered it was a rarp issue was to put the host address of the client machine that was logging on to the application into the middle tier's /etc/hosts file. Once again, it's a long shot, but it's a quick test if you'd care to look into it.
02-27-2006 06:53 AM
Re: Understanding System Performance
02-27-2006 07:26 AM
Re: Understanding System Performance
# swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        4096       0    4096    0%       0       -    1  /dev/vg00/lvol2
dev       22432       0   22432    0%       0       -    2  /dev/vg01/swap
reserve       -   13996  -13996
memory    11139    1970    9169   18%
total     37667   15966   21701   42%       -       0    -
App servers:
/usr/sbin/swapinfo -tam
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        8000    2206    5794   28%       0       -    1  /dev/vg00/lvol2
dev       17360    2209   15151   13%       0       -    1  /dev/vg03/lvswap2
dev       17360    2206   15154   13%       0       -    1  /dev/vg03/lvswap3
reserve       -   13715  -13715
memory     7925    1376    6549   17%
total     50645   21712   28933   43%       -       0    -
nproc is set to 2560 across the landscape.
On DB/CI, typically around 575 processes are in use, and on the APP servers, 300. So there is plenty of room there, but good advice just the same.
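For reading the swapinfo output above, a short Python sketch that breaks the USED column down by row type. It only handles the dev, reserve, memory and total rows that appear in this post; the function name used_by_type is hypothetical, and the sample text is the first listing above.

```python
def used_by_type(swapinfo_text):
    """Sum the USED column (Mb) per row type of swapinfo -tam output."""
    used = {}
    for line in swapinfo_text.splitlines():
        fields = line.split()
        # Header lines start with "Mb" or "TYPE" and are skipped here.
        if len(fields) >= 3 and fields[0] in ("dev", "reserve", "memory", "total"):
            used[fields[0]] = used.get(fields[0], 0) + int(fields[2])
    return used

# The first listing quoted in this post.
DB_SWAPINFO = """\
dev 4096 0 4096 0% 0 - 1 /dev/vg00/lvol2
dev 22432 0 22432 0% 0 - 2 /dev/vg01/swap
reserve - 13996 -13996
memory 11139 1970 9169 18%
total 37667 15966 21701 42% - 0 -
"""

used = used_by_type(DB_SWAPINFO)
print(used)
```

For that listing the 15966 Mb total is 13996 Mb of reservation plus 1970 Mb on the memory row, with 0 Mb used on the swap devices themselves.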
Thanks...Geoff
03-01-2006 02:06 PM