redo writes and lv_write_rate

While monitoring the system over a 20-minute period with Oracle Statspack and HP Glance advisor, I noticed that the number of redo writes per second reported by Oracle (statspack.redo_writes, ~500/s) does not come close to the number of writes Glance reports for the Oracle redo (raw) logical volumes (lv_write_rate, ~2000/s).

At the same time, the Oracle statistic for redo bytes produced per second (statspack.redo_size + redo_wastage) matches the number of bytes written per second to the redo log logical volumes as reported by Glance advisor (lv_write_byte_rate): about 6.5 MB per second.

Since a single write is small (6.5 MB / 500 ≈ 13 KB), it does not look like we are hitting max_phys_io or a similar limit.
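Because the two byte rates agree, dividing that same byte rate by each tool's write rate shows the average write size each tool is implicitly reporting. A minimal sketch using the approximate figures above, assuming "6.5M" means 6,500,000 bytes/s; the function name `avg_write_size` is hypothetical:

```shell
#!/bin/sh
# Average bytes per write = bytes written per second / writes per second.
# Integer arithmetic only; inputs are the approximate rates quoted above,
# with "6.5M" assumed to mean 6,500,000 bytes/s.
avg_write_size() {
    bytes_per_sec=$1
    writes_per_sec=$2
    echo $(( bytes_per_sec / writes_per_sec ))
}

echo "Statspack redo writes (~500/s): $(avg_write_size 6500000 500) bytes per write"
echo "Glance lv_write_rate (~2000/s): $(avg_write_size 6500000 2000) bytes per write"
echo "Ratio of write counts:          $(( 2000 / 500 ))"
```

Same bytes, four times as many writes: Glance is effectively counting writes of about 3.25 KB, a quarter of LGWR's ~13 KB average, so whatever inflates lv_write_rate is splitting or re-counting writes rather than adding data.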

Redo and everything else is striped across two EVA 5000 arrays, configured with 16 LUNs each, with a stripe size of 1 MB.

I think I remember something about the EVA mirroring all writes, but this probably would not be visible at the logical volume level.

Does anybody have any idea why lv_write_rate is about 4 times the LGWR redo write rate in this scenario?
Thank you.


Sridhar Bhaskarla
Honored Contributor

Re: redo writes and lv_write_rate

Hi,

When you said everything is striped across 2 EVAs, I assumed it is LVM striping. If so, how is each LV striped? An 'lvdisplay /dev/vgxx/lvolx' will show you the number of stripes that LV is using. I believe your LVs are striped across 4 LUNs. Hence each IO generated by Oracle is getting split into 4 IOs, one per LUN.

Although striping seems to add IO overhead on the system, the responses come back from the four LUNs almost simultaneously (and each smaller request takes less time to process). So LVM striping can offer you better performance if done well.
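Whether a single IO is split across several stripe members depends on the IO size and offset relative to the stripe interleave, not only on the number of stripes. A minimal sketch of that arithmetic, assuming a plain round-robin stripe (chunk k lives on member k mod stripes); `stripe_members_touched` is a hypothetical helper:

```shell
#!/bin/sh
# Number of distinct stripe members one IO touches, assuming a simple
# round-robin stripe where chunk k of the LV lives on member (k % stripes).
# Arguments: offset_bytes size_bytes interleave_bytes stripes
stripe_members_touched() {
    offset=$1 size=$2 interleave=$3 stripes=$4
    first=$(( offset / interleave ))
    last=$(( (offset + size - 1) / interleave ))
    chunks=$(( last - first + 1 ))
    # An IO spanning more chunks than there are stripes wraps around,
    # so it can touch each member at most once.
    if [ "$chunks" -gt "$stripes" ]; then
        chunks=$stripes
    fi
    echo "$chunks"
}

# A 13 KB (13312-byte) write, 1 MB interleave, 16 stripes:
stripe_members_touched 0 13312 1048576 16         # aligned: 1 member
stripe_members_touched 1044480 13312 1048576 16   # crosses a 1 MB boundary: 2 members
# The same write with a hypothetical 4 KB interleave:
stripe_members_touched 0 13312 4096 16            # 4 members
```

A 4-way split of every IO therefore needs an interleave that is small relative to the write size; with a 1 MB interleave a 13 KB write usually lands on a single member.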

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Oved
Advisor

Re: redo writes and lv_write_rate

Hey,

Is the DB in archive mode?
If so, is the archive volume on the same FS as the redo log?

If they are indeed on the same FS, then after the DB finishes writing a redo log file it creates an archive copy, which causes additional writes to the FS.

I am not a DBA, but I know that in my organization the redo logs are saved twice, on different FS. Are you saving another copy of each redo log file? Is it on another volume?

BTW, I have found the I/O statistics of sar to be much more accurate than those of Glance, but the catch is that sar reports at the raw device level, not the FS level. If you are measuring the same device you can use "sar -d X Y", where X is the interval in seconds and Y is the number of samples (just in case you don't know it :-) ).
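sar -d reports, per device, reads plus writes per second (r+w/s) and the transfer rate in 512-byte blocks (blks/s), so the average IO size on a device follows from those two columns. A minimal sketch, assuming the HP-UX sar -d column layout and integer column values; the helper name `avg_io_bytes` is hypothetical:

```shell
#!/bin/sh
# Collect 20 one-minute samples to cover the same 20-minute window as the
# Statspack snapshot (run on the HP-UX host; shown here only as a comment):
#   sar -d 60 20
#
# Average bytes per IO on one device, from two sar -d columns:
#   r+w/s  : reads plus writes per second
#   blks/s : 512-byte blocks transferred per second
avg_io_bytes() {
    rw_per_sec=$1
    blks_per_sec=$2
    if [ "$rw_per_sec" -eq 0 ]; then
        echo 0    # idle device: avoid dividing by zero
        return
    fi
    echo $(( blks_per_sec * 512 / rw_per_sec ))
}
```

The result only describes redo traffic if the device carries nothing but redo.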

I hope it helps,
Oved

Re: redo writes and lv_write_rate

Every redo log is on a separate LV, striped over all 16 LUNs (correction to my original post: each EVA has 8 LUNs) with a 1 MB stripe interleave.
For example:
# Create redo01 with 16 stripes (-i 16) and a 1024 KB = 1 MB interleave (-I 1024)
lvcreate -n redo01 -i 16 -I 1024 /dev/vg02
# Size it to 256 MB (-L 256), allocating extents from these 16 PVs
lvextend -L 256 /dev/vg02/redo01 \
/dev/dsk/c63t0d1 /dev/dsk/c70t0d1 \
/dev/dsk/c63t0d2 /dev/dsk/c70t0d2 \
/dev/dsk/c63t0d3 /dev/dsk/c70t0d3 \
/dev/dsk/c63t0d4 /dev/dsk/c70t0d4 \
/dev/dsk/c63t0d5 /dev/dsk/c70t0d5 \
/dev/dsk/c63t0d6 /dev/dsk/c70t0d6 \
/dev/dsk/c63t0d7 /dev/dsk/c70t0d7 \
/dev/dsk/c63t1d0 /dev/dsk/c70t1d0
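How the 256 MB divides across the 16 members can be checked with a little arithmetic. A sketch assuming a 4 MB physical extent size (the VG's actual extent size is not given in the thread) and assuming LVM rounds a striped LV up so every member gets the same whole number of extents; `striped_lv_layout` is a hypothetical helper:

```shell
#!/bin/sh
# Prints: rounded LV size (MB), size per stripe member (MB), extents per member.
# Assumes the size is rounded up to a multiple of (stripes x extent size).
# Arguments: lv_mb stripes pe_mb
striped_lv_layout() {
    lv_mb=$1 stripes=$2 pe_mb=$3
    unit=$(( stripes * pe_mb ))
    rounded=$(( (lv_mb + unit - 1) / unit * unit ))
    per_member=$(( rounded / stripes ))
    echo "$rounded $per_member $(( per_member / pe_mb ))"
}

# redo01: -L 256 over 16 stripes, assuming 4 MB extents
striped_lv_layout 256 16 4
```

Under these assumptions each LUN holds 16 MB of each redo log, i.e. sixteen 1 MB stripe chunks.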

Thank you for both replies, but they do not seem to explain it:
Sri's assumption of 4 LUNs as stripe members is incorrect. Also, since each write is only 13 KB and the stripe interleave is 1 MB, why would LVM write to all stripe members?
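Because LGWR writes each log sequentially, the redo stream crosses a stripe-chunk boundary only once per interleave's worth of data, and only a write that straddles a boundary is split in two. A rough upper bound on the extra physical writes LVM striping could add, sketched with the thread's approximate figures (6.5 MB/s assumed to mean 6,500,000 bytes/s); `extra_split_writes` is a hypothetical name:

```shell
#!/bin/sh
# Upper bound on extra writes per second caused by stripe-boundary splits
# in a purely sequential write stream: at most one extra write per
# interleave-sized chunk of data written, rounded up.
# Arguments: bytes_per_sec interleave_bytes
extra_split_writes() {
    bytes_per_sec=$1 interleave=$2
    echo $(( (bytes_per_sec + interleave - 1) / interleave ))
}

lgwr_writes=500
extra=$(extra_split_writes 6500000 1048576)
echo "extra writes/s from splits: at most $extra"
echo "expected LV writes/s:       about $(( lgwr_writes + extra ))"
```

That is roughly 507 writes/s, nowhere near the ~2000/s Glance reports, so the 1 MB LVM stripe by itself does not account for the 4x ratio.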

And since we use the "stripe and mirror everything" methodology, I cannot use sar to measure the write size for redo alone, because each LUN holds both redo and datafiles.

Please:
surely there is someone in this forum with enough knowledge of LVM to have at least a theory that fits this scenario?
Sridhar Bhaskarla
Honored Contributor

Re: redo writes and lv_write_rate

Hi,

I misread your message, thinking the striping was done on the EVA side with a 1 MB stripe size; hence the general assumption.
I know a bit about LVM but very little about the database. LVM handles requests in a pretty straightforward way, so I wonder whether Oracle itself is further splitting each write request. I would check db_block_size and see if it plays any part in it.

-Sri

Hein van den Heuvel
Honored Contributor

Re: redo writes and lv_write_rate


Interesting question. Good initial analysis.
Are you sure you are using the raw device? That would be /dev/vg02/rredo01 in Oracle.
Does iostat match Glance for the IO rates?
I guess that would be hard to correlate since it is all spread out, huh? Maybe verify at the aggregate level?
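One quick way to confirm Oracle writes through the raw (character) device rather than the block device is to test each redo member path. A minimal sketch; the member paths would come from Oracle (for example `select member from v$logfile`), and the helper name `check_raw` is hypothetical:

```shell
#!/bin/sh
# Report whether each given path is a character (raw) device.
# On HP-UX LVM the raw device for /dev/vg02/redo01 is /dev/vg02/rredo01.
# Returns non-zero if any path is not a character device.
check_raw() {
    status=0
    for dev in "$@"; do
        if [ -c "$dev" ]; then
            echo "$dev: raw (character) device"
        elif [ -b "$dev" ]; then
            echo "$dev: BLOCK device (buffered IO)"
            status=1
        else
            echo "$dev: not a device file"
            status=1
        fi
    done
    return $status
}

# Example, on the database host:
#   check_raw /dev/vg02/rredo01
```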

Are you using Secure Path? I seem to recall some trouble with Glance and EVAs due to multiple alternate SCSI paths (often 4 per HBA per unit).

I like SAME (Stripe And Mirror Everything) in general, but I feel it is NOT appropriate for redo. There is no benefit in your case, only cost.
Any single 'disk' can do 10+ MB/sec, especially if that 'disk' is behind a write-back cache and spread over a group of real disks.
I would just carve that 250 MB from a single PV. Why maximize head movement for what is a sequential write workload? (I would also use a small (8-member) disk group in the EVA for redo (and others) to further minimize the entropy.)
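Following the lvcreate/lvextend pattern from the striped example earlier in the thread, a non-striped redo LV can be confined to one PV by passing only that PV to lvextend. A dry-run sketch that only prints the commands; the helper name is hypothetical and the VG, LV name and PV path are reused purely for illustration:

```shell
#!/bin/sh
# Print (do not run) the commands to build a non-striped redo LV whose
# extents all come from a single PV. Review the output before running it.
# Arguments: vg lv_name size_mb pv_path
single_pv_redo_cmds() {
    vg=$1 lv=$2 size_mb=$3 pv=$4
    echo "lvcreate -n $lv $vg"
    echo "lvextend -L $size_mb $vg/$lv $pv"
}

single_pv_redo_cmds /dev/vg02 redo01 256 /dev/dsk/c63t0d1
```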

Also... just 250 MB per redo log? At 6.5 MB/sec it will fill up in less than a minute?! Do you need the short logs for log shipping? If not, why accept those frequent log switches and implied checkpoints? Why not create 1 or 5 GB log files and have 5 minutes to an hour between checkpoints? Much easier on the system! (It often drastically reduces undo writes.)
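The log-size argument is simple arithmetic: fill time = log size / redo rate. A sketch at the measured rate, assuming "6.5 MB/s" means 6,500,000 bytes/s and a steady rate; `log_fill_seconds` is a hypothetical name:

```shell
#!/bin/sh
# Seconds to fill one redo log at a steady redo rate (integer, rounded down).
# Arguments: log_size_mb redo_bytes_per_sec
log_fill_seconds() {
    size_mb=$1 rate=$2
    echo $(( size_mb * 1048576 / rate ))
}

# Redo rate from the thread, assumed to be 6,500,000 bytes/s.
rate=6500000
for size_mb in 256 1024 5120; do
    echo "${size_mb} MB log: $(log_fill_seconds "$size_mb" "$rate") s between switches"
done
```

At 6.5 MB/s that is about 41 s, 165 s and 825 s (roughly 14 minutes); intervals closer to an hour would require a lower average redo rate than the one measured here.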

fwiw,
Hein.