Oracle ASM - more exacting, load on IO?

Alzhy · ‎01-20-2009

We recently moved to ASM. And we're seeing more load on our SAN Storage Array's ports than when we were on Cooked VxFS DierctIO'd filesystems.

Before, Our Filesystems (24 total) used to be on 8-way stripes (8 physical/SAN disks per).

On our new ASM Layout, we provide our DBAs with 48 disks. ASM suppsoedly stripes accross all these 48 disks.

Sar and Disk stats indeed point a uniform load accross these 48 disks but my service times jumped from sub-20ms to ~130ms. And my Array front end now show its port processors are doubly busy than before.

So is ASM more heavy, exacting on Storage Arrays? That it merits a relayout frontend. channel and even array group wise on the SAN Array end?

Hakuna Matata.

Steven E. Protter · ‎01-20-2009

Shalom,

I've seen these kind of results from ASM in the lab as well. ASM acts like it owns the disks, knows better, and takes over certain functions normally done by the OS.

ASM exacts a higher price an any kind of disk you give it. I believe the SAN/Disk array configuration is the better place to control disk, and am not a believer in ASM as a solution.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Hein van den Heuvel · ‎01-20-2009

As ASM wasintroduced on the system, it possibly switched from a cached file system to raw. (was it using direct-io/).
The file system may have been helping more than realized.. by using memory as cache.
In other words.. do you see less memory used in glance/vmstat since the switch?
Consider giving that now unused memory directly to oracle (SGA)?

20ms is not great.
130 ms is pretty darn slow.
Do you get user complaint now?
Is that response time measured with iostat (averaged over all disks)? Maybe this is mostly cause for example by spikes writing to archive logs which the end users do not feel? Time for redo and arch log (writer/buffer) tuning?

Can you check current and historical performance reports (statspack, awr) to figure out how Oracle perceives the IO response times? Does it 'add up'? Check read vs write, check whether particular object have vastly different Io response times.

Good luck,
Hein van den Heuvel
HvdH Performance Consulting.

Yogeeraj_1 · ‎01-20-2009

hi,

Just think of ASM as a filesystem for database stuff. That is, at the core, what it is.

A special purpose filesystem.

ASM was designed to take all of the devices and stripe over all of them.

here you have the CHOICE of redundancy: either you use ASM's mirror and stripe everything approach or external hardware redundancy.

So it you are providing your DBA with 48 disks (which is already in a RAID), you should also inform the DBA NOT to define any further level of redundancy.

can you confirm your configuration?

(RDBMS Instance: SELECT * FROM V$ASM_DISKGROUP;)

kind regards
yogeeraj

Ps. also read: http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14220/mgmt_db.htm#DCGGGJC

No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)

Duncan Edmonstone · ‎01-21-2009

Nelson,

I have to say my first thought was also that the DBAs still had redundancy enabled in ASM (effectively doubling the IO rate).

So your service times have increased by a factor of 6, what about your throughput (MB/s) and IOs per second? Do you have any stats for changes there? (e.g. from Measureware looking at GBL_DISK_PHYS_BYTE_RATE and GBL_DISK_PHYS_RATE)

Also what about MPIO - how were you doing MPIO before and how are you doing it now? Any changes (cos ASM of course does no MPIO itself).

HTH

Duncan

I am an HPE Employee

Alzhy · ‎01-21-2009

Thanks for all your inputs.

I am still in the process of gathering stats from my XP12000, System Ststs (SAR and GlancePlus/OVPA) -- but it really appears we're over the top and those service times are worrisome.

And yes my DBA says he uses external redudancy in ASM:

SQL> @$ORACLE_BASE/local/asm/asm_diskgroups

Disk Group Sector Block Allocation
Name Size Size Unit Size State Type Total Size (MB) Used Size (MB) Pct. Used
-------------------- ------- ------- ------------ ----------- ------ --------------- -------------- ---------
PRDB001_DATA 1,024 4,096 1,048,576 MOUNTED EXTERN 2,555,376 2,531,335 99.06
--------------- --------------
Grand Total: 2,555,376 2,531,335

We are using VxVM underneath each component ASM "volume" -- which is a VxVM Volume that is wholly contained on one XP12K Disk. I can monitor the entire member disks using vxtools and I see ALL 48 member volumes active ALL the time as if data is striped 48-way (which is suppsoed to be how ASM works). I can validate the cXtYdZ dveices of these volumes via SAR and the service times for reads and writes are indeed breaching 100+ ms service times.

Now compare that to our OLD cooked Filesystems which were 8-way stripes per Filesystem/Volume. The database hardly stress HALF of however many disks we had as I.O jumps (interleaves) from FS to FS. But with ASM -- it seems ALL 48 disks are always busy (striped) that I am noticing my XP12000 front end processors to be far more utilized/sterssed than we were on Cooked or even simple RAW (with 8-way stripes)..

Hakuna Matata.

Alzhy · ‎01-21-2009

I guess what I am saying (claiming) is with ASM -- all component dsisks are ALWAYS active -- hence more I/O load on a Storage Array comparatively with a COoked Filesystem or Simple/Admin cotnrolled RAW Database Layout where striping and placemnt of data is totally controlled.

Do you agree?

I am still poring through stats though.

Hakuna Matata.

Duncan Edmonstone · ‎01-21-2009

This isn't a config I see a lot of - mostly if people are using ASM I see them using a simple MPIO tool (I'd assume HDLM in your case - you have HDS disk arrays yes?) I can't think I've come across anyone doing this with VxVM used just to get the DMP functions before - I wouldn't expect it to cause any issues, but its an extra variable I don't normally see... I know Symantec have qualified it and support it but I just don't see much of it. Sorry I'm not sure that helps any!

What sort of IO is the DB doing? Are there a lot of full table scans generating sequential IO? Doing that over 48 disks on multiple ports might make that sequential IO harder for the XP to spot (confusing the cache algorithms) I'd be interested in trying a test with your DMP turned off as well (i.e. all IO to just a primary path)

HTH

Duncan

I am an HPE Employee

Alzhy · ‎01-21-2009

Duncan, so you're aware of the Oracle validated "recipe" eh? Yeap -- we own VxVM already and view using it as a plus as it makes tracking and managing things things easier... going to HDLM was cost prohibitive and we can't go to HP-UX 11.31 (yet) on these PARISC 11.11 ecossytems we have.

But do you agree with my assumptions? I will hopefully soon have stats to support this claim.

Hakuna Matata.

Duncan Edmonstone · ‎01-22-2009

Nelson,

I'm not sure I *do* agree... is the database doing anything different? Is the DBA seeing much improved performance/throughput? The total amount of IO generated regardless of filesystems, raw or ASM shouldn't be significantly different (I guess the number of IOPS could change if ASM is reading/writing with different block sizes than VxFS was? In fact that gets me thinking... maybe with VxVM in the equation the ASM IO block sizes for reads/writes aren't matching the VxVM IO raw disk block sizes for reads writes? You'd know this if your total IOPS have gone up, but your throughput in MB/s has stayed largely the same. IIRC ASM will simply read in the sizes determined by the database parameters (i.e. database block size * multi-block read count) - how does that comnpare to the sizes that VxVM is reading?)

Also are you sure that ASM isn't doing any rebalancing? It will do thgis automatically if disks are added to an ASM disk group. Does the table V$ASM_OPERATION show anr jobs running? That would generate significant additional IO.

Finally waht about the internals of the XP? When you were on VxVM with 8 way stripes, you probvably had it configured so that each LUN in a strip came from a different array group and maybe even a different DKA/ACP... now you may have LUNs from the same array group in the same disk groupo - again if you have a lot of sequential IO this could cause some challenges as the disk heads thrash back and forth between the different ASM stripes on the same physical disks (if you follow me)

Anyway some things to think about. It'll be interesting to see those IO numbers.

HTH

Duncan

I am an HPE Employee

Yogeeraj_1 · ‎01-23-2009

Hi Nelson,

I would also recommend monitoring the system IO using the Oracle Enterprise manager Database control. There are loads of performance reports which you can analyse from there.

hope this helps too!
kind regards
yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)

Hein van den Heuvel · ‎01-27-2009

I was just reading a topic in the 'storage' family which may apply here.

http://forums13.itrc.hp.com/service/forums/questionanswer.do?threadId=1278758

It shows how Linux OS choices can influence storage performance significantly (2x trhoughput) only 'trying to help'.

The same thing coudl easily be happening with ASM.
As it tries to help too much, it may make data access patterns less understood and it may rob the controller from opportunities to optimize. Given dedicated CPUS and cache, the controller has the potential to do a much better job than the OS or an ASM.

This is already suggested by Duncan when he writes: "sequential IO? Doing that over 48 disks on multiple ports might make that sequential IO harder for the XP to spot (confusing the cache algorithms)", but I though It important enough to re-itterate.

Typically I am not one for 'me too' replies, but I also want to repeat an other question by Duncan: "The total amount of IO generated regardless of filesystems, raw or ASM shouldn't be significantly different".
The work might be done slower, but supposedly the same work has to be done.
Unless...
- The workload changed as well.
- OS caches were helping and no longer are.
- odd transfer alignment actions

As I wrote earlier, IMHO it is critical to review the IO quantity and quality using the Oracle reports using Statspack or AWR.
Compare the data from busy hours on comparable days before or after. Use the REDO log bytes or COMMIT rate to normalize the data if needed be. Similar numbers of read/write? Worsened avserv time confirmed by the Oracle avg IO time report?

hth,
Hein van den Heuvel
HvdH Performance Consulting.

Alzhy · ‎01-27-2009

Awesome "QUALITY" answers gentlemen!

The stats are quite glaring -- we've more I/O alright. Our front end array stats on the XP12000 are also through the roof. 2 Changes in along with ASM implementation.

1) We went from RAID5 (7+1P) sets using 73G 15KRPM to RAID10 (2+2) 300G 10kRPM disks.
2) We're using 8 array groups (still spread accross 4 ACPs) instead of 16+

My backend stats (ACP load, LDEV perfs) look normal - ACP max peak is no more than 18%. But as I have said, front end ports are way way too busy -- greater than 80% average peak which according to HP support is indicative of a hardowrking ChIP MP (port processor).

There is CACHE between the port and the LDEVs (chunks of the RAID sets) and I have every reason to agree there may be a problem there -- cache coherency issues as you've pointed out. I also heard about a sequential I.O penalty of sorts under certain striped IO situations with the XP Line.

I am strongly urging my management to engage HP Storageworks support to look at the matter further as we still have about 4 more mega Databases to convert to ASM. I have XP Performance Advisor gathering stats but analysis is way beyond my skills sets.

It also does not help that our DBA seem to not really have in-depth monitoring - spcially OEM tools. They have PRECISE i3 but its mainly used as a Query optimising tool.

Thanks Migz.

N.

Hakuna Matata.

Alzhy · ‎01-27-2009

Duncan, Yog and Hein,

Any specific StatsPack query or OEM stats that I can pass on to my DBA?

BTW, below is a snippet of ASMvolume stats from VxVM which matches native SAR stats. The last tow are read and write service times. 1st 2 stats are read and write operations written and next 2 are read and write blocks (1KB blocks)

vol asmdata01 417 7 21960 80 128.6 47.1
vol asmdata02 299 1 16024 8 110.1 120.0
vol asmdata03 390 2 14760 24 96.1 100.0
vol asmdata04 413 4 18240 32 115.5 72.5
vol asmdata05 199 8 9880 160 122.5 90.0
vol asmdata06 344 1 17192 8 110.9 150.0
vol asmdata07 343 3 17504 88 109.4 130.0
vol asmdata08 393 0 20296 0 114.9 0.0
vol asmdata09 325 6 16736 256 101.0 116.7
vol asmdata10 308 2 13256 16 110.7 155.0
vol asmdata11 364 5 18848 56 110.9 124.0
vol asmdata12 418 1 19496 8 108.6 150.0
vol asmdata13 355 2 18672 16 126.6 105.0
vol asmdata14 277 15 13776 350 144.4 83.3
vol asmdata15 331 0 17456 0 131.7 0.0
vol asmdata16 274 0 13952 0 94.7 0.0

Hakuna Matata.

Alzhy · ‎01-27-2009

From the good book on XP12000 Best Practices it is written:

"On sequential reads the CHiP will do prefetchs from disk on order to keep the data needed in cache in advance of the host asking for it. For sequential writes the CHiP will keep full stripes in cache to avoid any penalty for parity creation. It will also manage the amount of cache being used for the writes to avoid starving other I/O of cache hits. The CHiPs can detect sequential I/O across multiple ports if the I/Os are split uniformly across those ports (for example, eight I/Os to one port and then eight to another)."

Since ASM stripes FAR and wide -- it is possible cache coherency is indeed an issue -- would you agree?

Hakuna Matata.

Hein van den Heuvel · ‎01-27-2009

>> From the good book on XP12000 Best Practices it is written:

:=)

Those write times are really rather poor. I'm used to seeing write times in the low-ms or even sub-millisecond thanks to controller write-back caching. Still, then write IO rate is so low that you

The average blocks/io is 50+, larger than a DB page size, so suggesting 'scattered' reads for table scans.

The AWR / statspack numbers I woas thinking of is the basic per-tablespace read & write counts and times and from there perhpas drilled down to specific objects in those tablespaces. That data is likely to be captures whether "Precise" is oused or not.

Cheers,
Hein.

Alzhy · ‎01-27-2009

Hein,
I was also thinking about that scattering theory too. My vx_vol_maxio is 16MB btw. Any past experience on how this is addressed sir?

All I got from my DBA is:

WARNING:Oracle process running out of OS kernel I/O resources

Hakuna Matata.

Duncan Edmonstone · ‎01-27-2009

So are you saying you have a *lot* of sequential IO?

Yes the XP can detect this when coming on multiple ports... you should in this case make sure that system mode 327 is set on the XP though (I think manufacturing set this mode by default these days anyway, but check it).

Also what DMP load balancing policy are you using? For lots of sequential IO on an XP you should use balanced mode with a hefty partitionsize (start with 32MB and even consider 64 or 128MB). If we were talking just a filesystem sat on top of that Vx volume, I'd be confident then that the XP would detect that sequential IO effectively... of course as we have ASM on top of that, I'm not so sure.

HTH

Duncan

I am an HPE Employee

Yogeeraj_1 · ‎01-29-2009

hi,

Oracle Enterprise manager Database control is installed by default. All you need to do is start the process and access the web-based interface to monitor your database.

To verify if the Database control is running, you can issue the following command as the Oracle user:

e.g.
[oracle@server1 ~]$ emctl status dbconsole
Oracle Enterprise Manager 10g Database Control Release 10.2.0.3.0
Copyright (c) 1996, 2006 Oracle Corporation. All rights reserved.
http://server1.mydb.mu:1158/em/console/aboutApplication
EM Daemon is not running.
------------------------------------------------------------------
Logs are generated in directory /u01/app/oracle/product/10.2.0/mydb/server1.mydb.mu_pfs/sysman/log
[oracle@server1 ~]$

To start, use:
emctl start dbconsole

As per example above, you can access the database console from an url similar to:
http://server1.mydb.mu:1158/em

hope this helps!

kind regards
yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)

Yogeeraj_1 · ‎01-29-2009

hi again,

Please also note that using the Database Control you can also monitor the production database instance as well the ASM instance. In your case, the graphs from the ASM instance will be really interesting to analyse.

hope this helps too!
kind regards
yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Forums

Discussions

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

Oracle ASM - more exacting, load on IO?

Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?

Re: Oracle ASM - more exacting, load on IO?