Re: Help me find my bottleneck.

Aharon Chernin · ‎08-29-2002

Unix is user friendly, it's just picky about its friends.

Aharon Chernin · ‎08-29-2002

Sorry, I messed up my filesystem paste from glance. I wish I could preview my posts before posting.

All filesystems except those in vg00 are on the EMC.

IO BY FILE SYSTEM                                              Users= 961
Idx  File System       Device                Type      Logl IO      Phys IO
---------------------------------------------------------------------------
  1  /                 /dev/vg00/lvol4       vxfs  564.0/273.9   21.7/ 10.2
  2  /stand            /dev/vg00/lvol1       hfs      0.0/ 0.1     0.0/ 0.4
  3  /var              /dev/vg00/lvol8       vxfs   1683/ 2066  112.4/ 70.1
  4  /usr              /dev/vg00/lvol7       vxfs   1019/694.0    15.7/ 1.8
  5  /tmp              /dev/vg00/lvol6       vxfs     3.6/ 1.1     1.4/ 1.1
  6  /opt              /dev/vg00/lvol5       vxfs   17.5/ 21.7    7.5/ 11.5
  7  /home             /dev/vg00/lvol9       vxfs     0.5/ 0.1     1.2/ 0.1
  8  lvm swap device   /dev/vg00/lvol2       hfs      0.0/ 0.0     0.0/ 0.0
  9  lvm swap device   /dev/vg00/lvol3       hfs      0.0/ 0.0     0.0/ 0.0
 10  lvm swap device   /dev/vg00/lvol10      hfs      0.0/ 0.0     0.0/ 0.0
 11  /premdor          /dev/vg01/lv_premdor  vxfs  36576/35777  571.5/587.5
 12  /premdor/tmp      /dev/vg01/premtmp     vxfs   81.4/264.2    7.8/ 25.6
 13  /premdor/spool/uv /dev/vg01/premspool   vxfs   38.9/834.6     0.7/ 2.6
 14  /usr/opt          */vg03/lv_v_universe  vxfs   97.8/ 63.1     2.9/ 3.0
 15  /work             /dev/vg03/lv_work     vxfs     0.0/ 0.0     0.0/ 0.0
 16  /storage          /dev/vg03/lv_storage  vxfs     0.0/ 0.0     0.0/ 0.0
 17  /premdoruk        */vg02/lv_premdoruk   vxfs   9532/ 5455  292.2/267.0
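To see where the physical I/O is concentrated, the listing above can be reduced to each filesystem's share of the total. This is only a rough sketch: it assumes the first figure in each pair is the rate for the current interval, and it leaves out the three swap devices because they show no I/O.

```python
# Rates copied from the glance listing above: (mount point, logical I/O, physical I/O).
# Only the first figure of each "a/b" pair is used (ASSUMED to be the current interval).
ROWS = [
    ("/", 564.0, 21.7),
    ("/stand", 0.0, 0.0),
    ("/var", 1683.0, 112.4),
    ("/usr", 1019.0, 15.7),
    ("/tmp", 3.6, 1.4),
    ("/opt", 17.5, 7.5),
    ("/home", 0.5, 1.2),
    ("/premdor", 36576.0, 571.5),
    ("/premdor/tmp", 81.4, 7.8),
    ("/premdor/spool/uv", 38.9, 0.7),
    ("/usr/opt", 97.8, 2.9),
    ("/work", 0.0, 0.0),
    ("/storage", 0.0, 0.0),
    ("/premdoruk", 9532.0, 292.2),
]


def physical_share(rows):
    """Return (mount point, share of total physical I/O), largest share first."""
    total = sum(phys for _, _, phys in rows)
    shares = [(mount, phys / total) for mount, _, phys in rows if phys > 0]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for mount, share in physical_share(ROWS)[:3]:
        print(f"{mount:<12} {share:6.1%}")
```

On these figures /premdor and /premdoruk together carry most of the physical I/O, so those two filesystems are the natural place to start looking.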

Unix is user friendly, it's just picky about its friends.

John Bolene · ‎08-29-2002

Glance disk utilizations are not quite true numbers for fast disk that has its own cache.

The bucket that is looked at is if any I/O is outstanding on the disk at the time of measurement. With faster disks and big cache memory, the EMC and VA can handle multiple I/O's at the same time.

What you need to look at is the disk request queue in glance. If requests are queuing there, then you have too many I/O's going to that disk. Also look at total bytes transferred, read and write.
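The difference between "busy" and "queued" can be shown with a toy model. This is not how glance computes its metrics internally; `service_slots` is a made-up parameter standing in for how many I/Os a device can work on at once, and the sample counts are invented.

```python
def percent_busy(outstanding):
    """Share of samples with at least one I/O in flight (the 'bucket' described above)."""
    return 100.0 * sum(1 for n in outstanding if n > 0) / len(outstanding)


def average_queue(outstanding, service_slots):
    """Mean number of requests waiting beyond what the device can service at once.

    service_slots is a HYPOTHETICAL modelling parameter, not a glance field.
    """
    return sum(max(n - service_slots, 0) for n in outstanding) / len(outstanding)


# Hypothetical outstanding-I/O counts, one per measurement sample.
samples = [4, 6, 5, 7, 6, 5, 4, 6]

# Same load, two kinds of device: one spindle vs. a cached array LUN.
single_disk = (percent_busy(samples), average_queue(samples, service_slots=1))
array_lun = (percent_busy(samples), average_queue(samples, service_slots=8))

print("single disk: busy=%.0f%% queue=%.3f" % single_disk)
print("array LUN:   busy=%.0f%% queue=%.3f" % array_lun)
```

Both devices read 100% busy, but only the single disk builds a queue, which is why the request queue says more than the utilization figure.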

EMC also has the ability to look at stats on the box itself. These are probably not available to the average Joe. I had to request stat runs from the internal console whenever the EMC guy was doing any PM. You need to look at the read/write cache hits, should be 90% or above for good response. You may need more cache if you are doing a lot of random database read/writes. The EMC numbers are based on doing sequential read/writes and getting most of the data from cache.

The VA has stats that are available to the average Joe with supplied utilities.

It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com

A. Clay Stephenson · ‎08-29-2002

I suppose the most basic question is "Do you have a problem?" Is the system performing badly? Glance often reports over-utilized disk for arrays because the only thing that it knows (or can know) is that an awful lot of I/O is going through a given device node. It has no way of knowing that /dev/dsk/c4t5d6 is actually comprised of 10 physical disks in your array.

In general, UNIX boxes (especially database servers) tend to be waiting on I/O - what else would they be waiting for?

I have seen quite a few SA's break their arrays into more LUN's simply so that things would APPEAR to be better because less I/O is going through a given device but the actual I/O rate didn't improve at all. So again, do you have a problem? Don't overlook the fact that the vast majority of database performance problems lie with the SQL code.

Food for thought, Clay

If it ain't broke, I can fix that.

Aharon Chernin · ‎08-29-2002

Well, I am fairly sure there is a bottleneck somewhere. I understand where you guys are coming from about Glance not knowing what is really 100% utilization.

Simply put, when Glance does report 100% utilization, disk response is extremely slow, and our database application becomes extremely slow as well.

I have no doubt that we could have bad code in our applications, which would result in inefficient queries to the DB. Unfortunately, my company would rather buy faster hardware than fix their code. Well, I guess this is good for HP and EMC.

So, yes, the disks do get SLOWWWWWWWWW... As for the EMC performance software: I have EMC dialed into our box running Sym-Top, which is not customer accessible. They are telling me there are no performance issues and that I/O rates are within the norm during our peak times. I have also installed Work Load Analyzer from EMC and am about to look at its results shortly.

EMC mentioned filesystem block size as a possible cause of the low performance. What do you guys think?

v2500:/ # fstyp -v /dev/vg01/lv_premdor
vxfs
version: 3
f_bsize: 8192
f_frsize: 8192
f_blocks: 19499008
f_bfree: 3207408
f_bavail: 3207408
f_files: 946688
f_ffree: 1073792296
f_favail: 1073792296
f_fsid: 1073807363
f_basetype: vxfs
f_namemax: 254
f_magic: a501fcf5
f_featurebits: 0
f_flag: 16
f_fsindex: 6
f_size: 19499008

Unix is user friendly, it's just picky about its friends.

A. Clay Stephenson · ‎08-29-2002

Well, if this were an hfs filesystem, block size might matter, but in a vxfs filesystem block size hardly matters at all because vxfs is extent based. It tends to write in 64k chunks regardless of anything that you do. You could stripe your LVOL's across different physical paths, but PowerPath should already be taking care of that.

I have no experience with PIC but you might try the OnlineJFS mount options convosync=direct,mincache=direct,delaylog,nodatainlog - Oracle on 10.20 and 11.0 usually performs best if the datafiles and indices filesystems use those options.

You might also consider upgrading to 11.11 and staying with cooked files; with Oracle on 11.11, the more conventional mount options delaylog,nodatainlog tend to be the better performer.
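Before changing anything, it helps to know which vxfs filesystems already carry the options in question. Here is a small sketch that scans fstab-style text for missing options. The sample entries are invented (only the device and mount point names come from this thread), and it assumes the usual field order of device, mount point, type, options.

```python
# The option set suggested above for OnlineJFS.
DIRECT_IO_OPTIONS = {"convosync=direct", "mincache=direct", "delaylog", "nodatainlog"}

# HYPOTHETICAL /etc/fstab content: device and mount point names come from the
# thread, the option lists are invented for the example.
SAMPLE_FSTAB = """\
/dev/vg00/lvol4 / vxfs delaylog 0 1
/dev/vg01/lv_premdor /premdor vxfs delaylog,nodatainlog 0 2
/dev/vg00/lvol1 /stand hfs defaults 0 1
"""


def missing_options(fstab_text, wanted):
    """Return {mount_point: sorted list of missing options} for every vxfs entry."""
    report = {}
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 4 or fields[2] != "vxfs":
            continue
        have = set(fields[3].split(","))
        missing = sorted(wanted - have)
        if missing:
            report[fields[1]] = missing
    return report


if __name__ == "__main__":
    for mount, missing in missing_options(SAMPLE_FSTAB, DIRECT_IO_OPTIONS).items():
        print(mount, "is missing:", ",".join(missing))
```

Passing `{"delaylog", "nodatainlog"}` as `wanted` checks for the 11.11 option set instead.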

Finally, if the answer is hardware then solid-state disks will almost certainly fix your problem but I still suspect that the real answer is to start analyzing the code. My experience has been that if you are a really good SA and DBA you might get a 2x increase in performance (20% is much more likely) but if you need a 10x boost then the only place to look is in the code.

If it ain't broke, I can fix that.

Sandip Ghosh · ‎08-29-2002

Hi,

After looking at your outputs, one thing comes to mind. I may be wrong.
How is your buffer-hit ratio? Run sar -b 10 50 and observe %rcache, the read cache hit percentage. If it is below 95, there is probably a problem somewhere. You may also check whether your database is double buffered, i.e. going through both the Unix buffer cache and the Oracle buffer cache. It is better to let the database use the Oracle buffer only. Check with the DBA about the Oracle buffer-hit ratio. Since you have enough memory, you can increase the Oracle SGA size. If you are using 32-bit Oracle, you can also look at memory windows to increase shared memory utilisation.
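For reference, the read cache hit percentage is just the share of logical reads that did not need a physical read. Here is a minimal sketch of that arithmetic with invented sample rates. In `sar -b` output the two inputs correspond to the lread/s and bread/s columns, and 95 is the threshold suggested above.

```python
def read_cache_hit(lread, bread):
    """Percent of logical reads served from the buffer cache, or None if idle."""
    if lread <= 0:
        return None
    return 100.0 * (lread - bread) / lread


THRESHOLD = 95.0  # rule of thumb from the post above

# HYPOTHETICAL samples: (logical reads/s, physical reads/s)
samples = [(4000.0, 120.0), (3500.0, 420.0), (0.0, 0.0)]

for lread, bread in samples:
    hit = read_cache_hit(lread, bread)
    if hit is None:
        print("no read activity in this interval")
    else:
        flag = "ok" if hit >= THRESHOLD else "LOW"
        print(f"lread={lread:.0f} bread={bread:.0f} hit={hit:.1f}% {flag}")
```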

Sandip

Good Luck!!!
