Information on XP12000 cache algorithm

SOLVED
Jean-Baptiste Broccard
Occasional Advisor

Information on XP12000 cache algorithm

Hello,

This is my very first post on the HP forum.
I've been using an XP12000 in a RAID5 configuration for a benchmark, and some questions have come up that I couldn't answer.
It seems that latency (as seen from the application side) is directly linked to the cache of the disk array. For example, when we were running an 80/20 write/read mix and the cache was 65% used, the array started to run LRU or something else that used MP, and the writes started to pend.

Does the cache reserve any room depending on the type of access (random vs. sequential, read vs. write)?

Is there any 'interesting' documentation on how the cache algorithm works?
13 REPLIES
Peter Mattei
Honored Contributor

Re: Information on XP12000 cache algorithm

You can find very interesting whitepapers at http://h18006.www1.hp.com/storage/arraywhitepapers.html

One of them covers the XP12000 and its cache in particular: http://h71028.www7.hp.com/ERC/downloads/4AA0-7924ENW.pdf

Take care
XP-Pete
I love storage
Nigel Poulton
Respected Contributor

Re: Information on XP12000 cache algorithm

If cache write pending (CWP) reaches 65%, the XP will destage data to disk at a higher priority than normal and it *may* delay acks to your host (I say "may" because I can't remember off the top of my head), and as a result your applications will notice.

Once you reach something like 69.83% the XP will go into priority destage mode and will reject IOs coming in on the front end until space is cleared in the cache. For this reason you should never see an XP12000 have CWP over 70% (30% is reserved for reads). When the XP starts to reject IO your applications will feel the pain. This is essentially write-through mode.
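
For anyone scripting alerts around these numbers, here is a minimal sketch of the two thresholds as described in this post. The function name and the state labels are made up for illustration, and the percentages are the approximate figures quoted above, not values read from the array firmware.

```python
# Thresholds as quoted in this thread, in percent of total cache.
# They are approximate figures from memory, not firmware constants.
ACCELERATED_DESTAGE_PCT = 65.0   # destaging gets a higher priority than normal
REJECT_IO_PCT = 69.83            # "something like 69.83%": front-end I/O is rejected


def cwp_state(cwp_pct: float) -> str:
    """Classify a cache-write-pending reading (hypothetical helper)."""
    if not 0.0 <= cwp_pct <= 100.0:
        raise ValueError("CWP must be a percentage between 0 and 100")
    if cwp_pct >= REJECT_IO_PCT:
        return "reject-io"
    if cwp_pct >= ACCELERATED_DESTAGE_PCT:
        return "accelerated-destage"
    return "normal"


for sample in (15.0, 65.0, 69.9):
    print(f"CWP {sample:5.1f}% -> {cwp_state(sample)}")
```

A monitoring script could call such a helper on each CWP sample and raise a warning well before the reject level is reached.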

Nigel
Talk about the XP and EVA @ http://blog.nigelpoulton.com
Jean-Baptiste Broccard
Occasional Advisor

Re: Information on XP12000 cache algorithm

"Once you reach something like 69.83% the XP will go into priority destage mode and will reject IOs coming in on the front end until space is cleared in the cache." When you say 69%, do you mean 69% of total cache utilization, or 69% of write? I never reached more than 15% WP on my disk array.

"When the XP starts to reject IO your applications will feel the pain. This is essentially write-through mode." I worked in FS mode (no directio/quickio). When you say pain, do you mean I/O wait?

Where did you get these numbers? This is very interesting to me.

JB
Nigel Poulton
Respected Contributor

Re: Information on XP12000 cache algorithm

On the XP a maximum of 70% of total cache can be assigned to writes because a minimum of 30% is "guaranteed" for reads. When I said 69.83% I meant 69% of total cache (but this is obviously 100% of the cache available for writes).
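
To make the two ways of counting explicit, here is a small sketch that converts a write-pending figure expressed as a percentage of total cache into a percentage of the write share. It assumes the 70% ceiling stated above; the function name is only illustrative.

```python
# Share of total cache that can hold writes, as stated in this thread.
WRITE_CEILING_PCT = 70.0


def pct_of_write_share(wp_pct_of_total: float) -> float:
    """Convert WP (% of total cache) into % of the cache available for writes."""
    return 100.0 * wp_pct_of_total / WRITE_CEILING_PCT


# The reject level quoted above, and the 15% peak mentioned earlier in the thread.
for wp in (69.83, 15.0):
    print(f"{wp:5.2f}% of total cache = {pct_of_write_share(wp):5.1f}% of the write share")
```

So a reading of 15% of total cache still leaves most of the write share free, which fits with writes not being rejected at that level.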

I'm not sure what you mean by FS mode. What is FS?

"....When you say pain, do you mean I/O wait?"
The XP12K does not issue waits or retries when it reaches 69% cache utilisation. It rejects I/O. The XP1024 used to issue waits, but not the XP12K. This allows your applications to handle the situation better, because they don't think the I/O has completed. If the XP did not reject the I/O, your app might think that it completed OK when actually it didn't.

PS. I don't know of any interesting documentation on cache algorithms.
Talk about the XP and EVA @ http://blog.nigelpoulton.com
Jean-Baptiste Broccard
Occasional Advisor

Re: Information on XP12000 cache algorithm

Thanks for the answers.
Do you know whether there is a *write through the cache* mode (bypassing the cache) that kicks in once a certain level of WP is reached?
Peter Mattei
Honored Contributor

Re: Information on XP12000 cache algorithm

The XP only goes into write-through mode when data integrity cannot be guaranteed, for example when one cluster is offline.
If write pending goes beyond a certain level, host I/Os will be delayed to allow write data to be destaged to disk and to free space in the write cache.

Anyway, when going to write-through, the data in the write cache first needs to be destaged in order to preserve data integrity!

Cheers
XP-Pete
I love storage
Jean-Baptiste Broccard
Occasional Advisor

Re: Information on XP12000 cache algorithm

Thanks Pete.

I still have some more questions about the XP, though.
I'm sorry for everything I'm about to ask, but I couldn't find any useful documentation on the Web about the XP extraction fields.

So I retrieved the SolidDB files from the Windows collector and extracted them into text files. Two kinds of files were extracted per table: one for the data and one for the description of the fields.
I should mention that I am quite familiar with XP storage; I just need a translation of the fields below, and sometimes a little bit of explanation.
My concerns are with the descriptions of the fields. I don't understand the following:
LDEV_PERF_DATA table:
CFW_READ_HITS -> what does cfw mean?
CACHE_INHIBIT_COUNT -> ?
BYPASS_CACHE_COUNT -> Is it the counter explained above ?
RG_UTIL_RAND_READ -> what does rg mean ?
DFW_SEQ_COUNT -> what does dfw mean?
BET_SEQUENTIAL_READS -> what does bet mean?

Cache Information:
MB_SIDEFILE_USAGE -> what does sidefile mean?
FBUS_HI -> ?
MBUS_LO -> ?

Besides, when I have AVG_READ_RESP_TIME per LUN and two LUNs in my VG, should I divide it by two to get the average latency, since the LUNs are accessed in parallel?

For AVG_READ/WRITE_RESP_TIME (per LUN), what threshold should I watch for: 0.1, 1, 10 or 100 ms?

I look forward to reading your answers!
JB
Jean-Baptiste Broccard
Occasional Advisor

Re: Information on XP12000 cache algorithm

I have some more questions about cache :)

If I have RAID5 3+1, is the parity calculated within the cache to avoid latency?

Does the cache keep room per LUN (i.e. does a LUN have guaranteed room)?

JB
Peter Mattei
Honored Contributor
Solution

Re: Information on XP12000 cache algorithm

You will find the explanation of most of the fields in the Performance Advisor manuals, for example in table 12 of the PA software user guide. See the manuals section of PA: http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?contentType=SupportManual&locale=en_US&docIndexId=179911&taskId=101&prodTypeId=12169&prodSeriesId=64823

For example, CFW means Cache Fast Write and DFW means Disk Fast Write, which is only used and relevant in the mainframe environment.

RG means Raid Group; a group of 4 or 8 disks

The sidefile is a dedicated area in the cache used for async CA (remote replication) to buffer data in order to guarantee data consistency in the case of dropped frames etc.

Also, the cache does not reserve space for each LUN; it is a shared cache, which is much more efficient.
RAID parity generation occurs in the ACP (called DKA in the XP24k).
Due to the nature of RAID5, a single-block write requires reading the parity info from disk, recalculating it, and writing both data and parity.
The XP is optimized so that it tries to keep RAID5 writes in cache as long as possible to minimize the need for parity reads.
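
To illustrate the parity point, here is a minimal sketch of textbook RAID5 XOR parity on a 3+1 stripe. It is a generic illustration under that assumption, not the XP's actual ACP/DKA implementation, and the block contents and helper name are invented. It shows that updating parity from the old data and old parity gives the same result as recomputing it from a full stripe, which is why a full stripe held in cache needs no parity read.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks byte by byte (hypothetical helper)."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


# A 3+1 stripe: three data blocks plus one parity block (sample data).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(*data)

# Single-block write (read-modify-write): the old data and the old parity
# are needed, then parity is updated with two XORs.
new_block = b"\xaa\xbb"
rmw_parity = xor_blocks(parity, data[1], new_block)

# Full-stripe write: if all data blocks are available in cache, parity can be
# computed from them directly, without reading anything back from disk.
full_stripe_parity = xor_blocks(data[0], new_block, data[2])

print(rmw_parity == full_stripe_parity)
```

Both paths give the same parity; the difference is only in how many disk reads are needed before the write can complete.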

Hope that clarifies a bit

Cheers
XP-Pete
I love storage
Jean-Baptiste Broccard
Occasional Advisor

Re: Information on XP12000 cache algorithm

Thanks for the comprehensive answer :)
Do you have any thoughts on the latency metrics?

Let me give an example:
DG_data_n1 -> lun1 + lun2 + lun3 + lun4
Average latency on each LUN between date1 and date2:
lun1 = 0.3ms
lun2 = 0.4ms
lun3 = 0.3ms
lun4 = 0.4ms

Total latency is 1.4ms
Average latency is 0.35ms
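
The two figures above can be reproduced with a few lines of Python. The sketch also adds an I/O-weighted mean, which is one common way to combine per-LUN averages when the LUNs do not carry the same load. The I/O counts are invented for illustration only; nothing in this thread provides them.

```python
from statistics import mean

# Per-LUN average latency in ms, taken from the example above.
lun_latency_ms = {"lun1": 0.3, "lun2": 0.4, "lun3": 0.3, "lun4": 0.4}

total_ms = sum(lun_latency_ms.values())
average_ms = mean(list(lun_latency_ms.values()))

# Hypothetical I/O counts per LUN over the same interval (assumed values).
lun_io_count = {"lun1": 1000, "lun2": 3000, "lun3": 1000, "lun4": 3000}
weighted_ms = sum(
    lun_latency_ms[lun] * lun_io_count[lun] for lun in lun_latency_ms
) / sum(lun_io_count.values())

print(f"sum of per-LUN averages : {total_ms:.2f} ms")
print(f"plain mean              : {average_ms:.2f} ms")
print(f"I/O-weighted mean       : {weighted_ms:.3f} ms")
```

With these made-up counts the weighted mean comes out at about 0.375 ms, slightly above the plain mean, because the busier LUNs are also the slower ones. Whether the sum means anything depends on whether a single application request really touches the LUNs one after another, which is a property of the workload.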

Which one do you think is relevant?
Besides, which threshold on this metric should I pay attention to (0.1ms, 1ms, 10ms, 100ms)?

Thanks,
JB
Peter Mattei
Honored Contributor

Re: Information on XP12000 cache algorithm

If I were you, I would not worry too much about the individual metrics as long as your array performs well.
I would rather use the "Report" function of PA to generate a report that gives you a good performance overview of your array.
Keep it, and if you encounter any performance issues you will have a baseline to compare against.

Cheers
XP-Pete

PS: From a server/OS/application standpoint, response times <10ms are OK
I love storage
Jean-Baptiste Broccard
Occasional Advisor

Re: Information on XP12000 cache algorithm

Hi Pete,
The reason I care about all these metrics is that I don't like using PA for charts: you cannot save the charts, and drawing a chart when you already have 10GB of SolidDB data is very slow, even on an 8-core machine with 32GB (!!!)
I don't like generating reports either: you cannot choose a group of disks and a precise timeframe.
So I rewrote a tool that extracts the data from a smarter database (Oracle/MySQL; Solid, as it is used here, is *dumb*), creates an I/O profile matrix per DG, and draws my own charts for a given timeframe.
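
As a rough idea of what the per-DG, per-timeframe part of such a tool can look like, here is a minimal sketch using SQLite from the Python standard library instead of Oracle/MySQL. Everything about the schema is an assumption for illustration: only the LDEV_PERF_DATA and AVG_READ_RESP_TIME names come from this thread, while the other columns, the second disk group, the timestamps and the values are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per LDEV per sample, already tagged with its DG.
conn.execute(
    "CREATE TABLE ldev_perf_data ("
    " sample_time TEXT, disk_group TEXT, ldev TEXT, avg_read_resp_time REAL)"
)
sample_rows = [
    ("2000-01-01 10:00", "DG_data_n1", "lun1", 0.3),
    ("2000-01-01 10:00", "DG_data_n1", "lun2", 0.4),
    ("2000-01-01 10:05", "DG_data_n1", "lun1", 0.5),
    ("2000-01-01 10:05", "DG_data_n2", "lun5", 1.0),
    ("2000-01-01 11:00", "DG_data_n1", "lun1", 9.0),  # outside the window below
]
conn.executemany("INSERT INTO ldev_perf_data VALUES (?, ?, ?, ?)", sample_rows)


def avg_read_latency_per_dg(db, start, end):
    """Mean read response time per disk group for start <= sample_time < end."""
    cursor = db.execute(
        "SELECT disk_group, AVG(avg_read_resp_time) FROM ldev_perf_data"
        " WHERE sample_time >= ? AND sample_time < ?"
        " GROUP BY disk_group ORDER BY disk_group",
        (start, end),
    )
    return dict(cursor.fetchall())


result = avg_read_latency_per_dg(conn, "2000-01-01 10:00", "2000-01-01 10:30")
for disk_group, latency in result.items():
    print(f"{disk_group}: {latency:.2f} ms")
```

The timestamps are stored as ISO-style text so that plain string comparison selects the timeframe; the resulting dictionary can then be handed to whatever charting library is preferred.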

I now use an XP24k. Do you know whether the WP threshold is lower there (like 40%)? WP never reaches more than 40%, so I'm wondering whether that is related to the new storage.

JB
Peter Mattei
Honored Contributor

Re: Information on XP12000 cache algorithm

Congratulations on the XP24000. This is a really great box!
It is much faster, partly due to faster HW but also due to enhanced firmware and algorithms.
So I guess your current load just does not really stress the XP24k, and it has plenty of time to destage.

Cheers
XP-Pete
I love storage