Information on XP12000 cache algorithm

Jean-Baptiste Broccard · ‎03-05-2007

Hello,

This is my very first post on the HP forum.
I've been using an XP12000 in a RAID 5 configuration for a benchmark, and some questions have come up that I couldn't answer.
It seems that the latency (from the application side) is directly linked to the cache of the disk array. For example, when we were running an 80/20 write/read mix and the cache was 65% used, the array started to run LRU or something else that used MP, and the writes started to pend.

Does the cache reserve any room depending on the type of access (random vs. sequential, read vs. write)?

Is there any 'interesting' documentation on how the cache algorithm works?

Peter Mattei · ‎03-07-2007

You can find very interesting whitepapers at http://h18006.www1.hp.com/storage/arraywhitepapers.html

There is one about the XP12000 and cache in particular: http://h71028.www7.hp.com/ERC/downloads/4AA0-7924ENW.pdf

Take care
XP-Pete

I love storage

Nigel Poulton · ‎03-07-2007

If cache write pending (CWP) reaches 65%, the XP will destage data to disk at a higher priority than normal and it *may* delay acks to your host (I say "may" because I can't remember off the top of my head). As a result, your applications will notice.

Once you reach something like 69.83% the XP will go into priority destage mode and will reject IOs coming in on the front end until space is cleared in the cache. For this reason you should never see an XP12000 have CWP over 70% (30% is reserved for reads). When the XP starts to reject IO your applications will feel the pain. This is essentially write-through mode.
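To make the thresholds above easier to reason about, here is a minimal Python sketch that classifies a CWP reading. The 65% and roughly 70% figures are the approximate values quoted in this post, not numbers taken from HP documentation, and the function and state names are hypothetical, chosen only for illustration.

```python
# Thresholds as quoted in this thread (approximate, not from HP documentation).
HIGH_PRIORITY_DESTAGE_PCT = 65.0  # destaging is said to get higher priority here
WRITE_CEILING_PCT = 70.0          # post quotes "something like 69.83%"; 30% is for reads


def cwp_state(cwp_pct: float) -> str:
    """Classify a cache-write-pending percentage (measured against total cache).

    The state names are hypothetical labels for this sketch.
    """
    if not 0.0 <= cwp_pct <= 100.0:
        raise ValueError("cwp_pct must be between 0 and 100")
    if cwp_pct >= WRITE_CEILING_PCT:
        # Per the post: incoming front-end writes are held off until cache is freed.
        return "write-ceiling"
    if cwp_pct >= HIGH_PRIORITY_DESTAGE_PCT:
        # Per the post: destage priority rises and host acks may be delayed.
        return "high-priority-destage"
    return "normal"


if __name__ == "__main__":
    for pct in (15.0, 65.0, 70.0):
        print(f"CWP {pct:5.1f}% -> {cwp_state(pct)}")
```

A monitoring script could apply the same comparison to whatever CWP value the performance tool reports and alert well before the 65% mark.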

Nigel

Talk about the XP and EVA @ http://blog.nigelpoulton.com

Jean-Baptiste Broccard · ‎03-08-2007

"Once you reach something like 69.83% the XP will go into priority destage mode and will reject IOs coming in on the front end until space is cleared in the cache." When you say 69%, do you mean 69% of total cache utilization, or 69% of write? I never reached more than 15% WP on my disk array.

"When the XP starts to reject IO your applications will feel the pain. This is essentially write-through mode." I worked in FS mode (no directio/quickio). When you say pain, do you mean wait I/O?

Where did you get these numbers? This is very interesting to me.

JB

Nigel Poulton · ‎03-12-2007

On the XP, a maximum of 70% of total cache can be assigned to writes because a minimum of 30% is "guaranteed" for reads. When I said 69.83% I meant 69% of total cache (but this is obviously 100% of the cache available for writes).
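As a worked example of that arithmetic, the sketch below converts a write-pending figure expressed against total cache into a share of the write-capable portion. It assumes the 70% write ceiling stated above; the function name is hypothetical.

```python
# Assumption from this post: at most 70% of total cache can hold pending writes.
WRITE_SHARE_OF_CACHE = 0.70


def pct_of_write_capacity(wp_pct_of_total: float) -> float:
    """Convert write pending (% of total cache) to % of the write-capable cache."""
    return wp_pct_of_total / WRITE_SHARE_OF_CACHE


if __name__ == "__main__":
    # 69.83% of total cache is almost all of the write-capable cache.
    print(round(pct_of_write_capacity(69.83), 1))
    # The 15% write pending mentioned earlier in the thread is about a fifth of it.
    print(round(pct_of_write_capacity(15.0), 1))
```

So a reading of 15% write pending leaves most of the write-capable cache free, which fits with the array not having hit the ceiling in that benchmark.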

I'm not sure what you mean by FS mode. What is FS?

"....When you say pain, do you mean wait I/O?"
The XP12K does not issue waits or retries when it reaches 69% cache utilisation. It rejects I/O. The XP1024 used to issue waits, but not the XP12K. This allows your applications to handle the situation better, because they don't think the I/O has completed. If the XP did not reject the I/O, your app might think that it completed OK when it actually didn't.

PS. I don't know of any interesting documentation on cache algorithms.

Talk about the XP and EVA @ http://blog.nigelpoulton.com

Jean-Baptiste Broccard · ‎10-26-2007

Thanks for the answers.
Do you know whether there is a *write through the cache* mode (bypassing the cache) that kicks in once a certain level of WP is reached?

Peter Mattei · ‎10-27-2007

The XP only goes into write-through mode when data integrity cannot be guaranteed, for example when one cluster is offline.
If write pending gets beyond a certain level, host I/Os will be delayed to allow write data to be destaged to disk and free up space in the write cache.

Anyway, when going to write-through you first need to destage the data in write cache in order to keep data integrity!

Cheers
XP-Pete

I love storage

Jean-Baptiste Broccard · ‎10-30-2007

Thanks Pete.

Though, I still have some more questions about the XP.
I'm sorry for everything I'm about to ask, but I couldn't find any interesting documentation on the Web about the XP extraction fields.

So I retrieved the SolidDB files from the Windows collector and extracted them into text files. Two kinds of files were extracted per table: one for the data, one for the description of the fields.
I should mention that I am quite familiar with XP storage, but I just need a translation of the fields below, and sometimes a little bit of explanation.
My concerns are with the description of the fields. I don't understand the following:
LDEV_PERF_DATA table:
CFW_READ_HITS -> what does cfw mean?
CACHE_INHIBIT_COUNT -> ?
BYPASS_CACHE_COUNT -> Is it the counter explained above ?
RG_UTIL_RAND_READ -> what does rg mean ?
DFW_SEQ_COUNT -> what does dfw mean?
BET_SEQUENTIAL_READS -> what does bet mean?

Cache Information:
MB_SIDEFILE_USAGE -> what does sidefile mean?
FBUS_HI -> ?
MBUS_LO -> ?

Besides, if I have AVG_READ_RESP_TIME per LUN and two LUNs in my VG, should I divide it by two since access is parallel (in order to get the average latency)?

For AVG_READ/WRITE_RESP_TIME, what is the threshold (when it's per LUN): 0.1, 1, 10, or 100 ms?

I look forward to reading your answers!
JB

Jean-Baptiste Broccard · ‎10-30-2007

I have some more questions about cache :)

If I have RAID 5 3+1, is the parity calculated within the cache to avoid latency?

Does the cache reserve room per LUN (i.e. does a LUN have guaranteed room)?

JB

Peter Mattei · ‎10-31-2007

You will find the explanation of most of the fields in the Performance Advisor manuals, for example in table 12 of the PA software user guide. See the manuals section of PA: http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?contentType=SupportManual&locale=en_US&docIndexId=179911&taskId=101&prodTypeId=12169&prodSeriesId=64823

For example, CFW means Cache Fast Write and DFW means Disk Fast Write, which is only used and relevant in the mainframe environment.

RG means RAID Group: a group of 4 or 8 disks.

The sidefile is a dedicated area in the cache used for async CA (remote replication) to buffer data in order to guarantee data consistency in the case of dropped frames etc.

Then, the cache does not reserve space for each LUN; it is a shared cache, which is much more efficient.
RAID parity generation occurs in the ACP (called DKA in the XP24k).
Due to the nature of RAID 5, a single block write requires reading the parity info from disk, recalculating it, and writing both data and parity.
The XP is optimized so that it tries to keep RAID 5 writes in cache as long as possible to minimize the need for parity reads.
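To illustrate why a single-block RAID 5 write needs the old parity, here is a minimal Python sketch of the generic XOR parity update (new parity = old parity XOR old data XOR new data). It is the textbook RAID 5 calculation, not the XP's actual ACP implementation, and the function names and sample bytes are made up for illustration.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Parity after overwriting one data block, without reading the other data blocks."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    # A 3+1 stripe with made-up two-byte blocks.
    d0, d1, d2 = b"\x0f\x0f", b"\xf0\x01", b"\x33\x10"
    parity = xor_blocks(d0, d1, d2)

    # Overwrite d1 only: old data and old parity are needed, d0 and d2 are not.
    new_d1 = b"\xaa\x55"
    updated = small_write_parity(d1, parity, new_d1)

    # Same result as recomputing parity over the full stripe.
    print(updated == xor_blocks(d0, new_d1, d2))
```

If all three data blocks of the stripe are still in cache when the write is destaged, the parity can be computed from them directly, which is the case the cache optimization above is aiming for.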

Hope that clarifies a bit

Cheers
XP-Pete

I love storage
