
0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

 
Hein van den Heuvel
Honored Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Good summary Murali. Thanks.

The read-ahead 3x description sounds suspect.
So it sees 3 adjacent QIOs and then adds just one read-ahead QIO? How much? Equal in size to the last IO, or some large size like 8 cache lines?

I don't think the hit rate is low.
I kinda expect 0%. The real file is 6+ GB, the primary key data perhaps 5+ GB. That could well be larger than the active maximum cache on an Alpha. Now if it is the same box Mark asked about before, then it is a 24-CPU AlphaServer GS1280 7/1300 running OpenVMS V7.3-2. So that is likely to have 48 GB or more memory, and the cache may be as high as 20 - 32 GB if not actively throttled.
So normally a 6GB file would fit, and running the downstream program shortly (hours?) after the load would find the data in the cache. But other, totally unrelated activities, maybe as silly as SEAR [*...]*.log "FATAL ERROR", could flush out those blocks, or a part of the 6GB, and cause a tremendous slowdown compared to other days/weeks.

Time to close shop for the day!
Cheers,
Hein.
Steve Reece_3
Trusted Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Hi Mark,

I'm assuming that even though you're in a cluster, the file is only being read on one node at any time? If that isn't the case then, in my experience with XFC, you'll never cache it effectively. You'd need to rely on third-party products like PerfectCache, or on the raw IO performance of the system and the disk array that's hung off it.

Steve
P Muralidhar Kini
Honored Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Hi,

From the information provided,

>> Total of 1 file, 13956192/13956192 blocks.
This is around 7GB.

>> XFC currently allocated at 2.75GB.
XFC current size is 2.75 GB

As Hein has pointed out, the file size is bigger than the XFC cache size, and hence you cannot have the entire file cached.

As and when the file is accessed, its data is brought into the cache. But because the file is larger than the XFC cache, reading more of the file may force XFC to throw out data for the same file that is already cached.

Example: when block 50 of the file is read and there is no space left in the XFC cache, XFC may have to throw out, say, block 20 of the same file in order to make room for block 50. A subsequent IO to block 20 is then a read miss, and the data has to come from disk again. This is how the read hit rate comes down.

Also, certain activities have the potential to thrash the entire XFC cache or depose the data of a file/volume.

Thrash the XFC cache:
* SEARCH/COPY or any 3rd-party backup operation

Note: VMS BACKUP does not thrash the XFC cache, because the IO it performs skips the XFC. It does this by specifying the function modifier IO$M_NOVCACHE on the IOs it issues.
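
For illustration, the IO that a BACKUP-style utility issues looks roughly like this (an untested sketch, not BACKUP's actual code; the file name is hypothetical). The file is opened RMS "user file open" (UFO) style, so that $QIO can read its virtual blocks directly:

#include <rms.h>
#include <iodef.h>
#include <starlet.h>
#include <lib$routines.h>
#include <stdio.h>

int main(void)
{
    struct FAB fab = cc$rms_fab;
    struct { unsigned short status, count; unsigned int dev; } iosb;
    static char buf[512 * 16];                /* 16 disk blocks */
    unsigned int sts;

    fab.fab$l_fna = "BIGFILE.DAT";            /* hypothetical file */
    fab.fab$b_fns = sizeof "BIGFILE.DAT" - 1;
    fab.fab$l_fop = FAB$M_UFO;                /* user file open: RMS opens, caller does the IO */

    sts = sys$open(&fab);
    if (!(sts & 1)) lib$stop(sts);

    /* Virtual-block read of VBN 1; IO$M_NOVCACHE tells XFC to stand aside */
    sts = sys$qiow(0, (unsigned short)fab.fab$l_stv,  /* UFO leaves the channel in STV */
                   IO$_READVBLK | IO$M_NOVCACHE, &iosb, 0, 0,
                   buf, sizeof buf, 1, 0, 0, 0);
    if (!(sts & 1)) lib$stop(sts);
    if (!(iosb.status & 1)) lib$stop(iosb.status);

    printf("Read %u bytes without going through the XFC\n", iosb.count);
    return 1;
}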

Depose file/volume data from the cache:
* Logical IO to the file/volume
* Cluster-wide write operations

What is the physical memory size of the system? You can get that from DCL: "$ SHOW MEM/PHY"

XFC is sized at 2.75GB. By default, XFC sizes itself to 1/2 of physical memory, so I would guess the physical memory is around 5.5GB. Is that correct?

As a side note, other things to consider from a read-hit point of view are the caches at other levels, such as the CRTL and RMS. The CRTL uses buffering, and so does RMS with its local buffers. When a file is accessed, if its data is present in the CRTL or RMS cache, the request will be satisfied from there; the request won't come down to XFC, and the XFC statistics will remain the same.

Regards,
Murali
Let There Be Rock - AC/DC
P Muralidhar Kini
Honored Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

>> I'm assuming that even though you're in
>> a cluster the file is only being read in
>> on one node at any time?
>> If this isn't the case then you'll never
>> effectively cache it in my experience
>> with XFC.

Is there any particular scenario you would like to share? The XFC caching behavior in a cluster environment should be as follows:

* Multiple readers -
  XFC does cache a file when there are multiple readers of the same file. The file will be cached on all nodes in the cluster.

* One writer, multiple readers -
  If there is only one writer node and multiple reader nodes, then XFC caches the file only on the node where the writer is present. On the nodes where the readers are present, the file won't be cached.

* Multiple readers/writers -
  Where there are multiple writers to a file, XFC won't cache that file cluster-wide, i.e. none of the nodes will cache the file.


Regards,
Murali
Let There Be Rock - AC/DC
Mark Corcoran
Frequent Advisor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Hoff:
>Records in an indexed file aren't necessarily adjacent, so there's no direct way to warm up a generic block cache given the current design of RMS.

In this particular case, the index by definition is sorted by primary key order; the actual data records are present in the file in primary key order too.

(A bit like what I was alluding to in my response to Hein - e.g. having records physically located in random order, but with the index in primary key order, so you quickly find a record in the index, but actually accessing it may involve a "lot" of disk activity, because the records are not physically adjacent on the disk.)

Obviously, XFC can't know whether or not records in an indexed file happen to be stored adjacent to each other in order (but is this something it can guess at, or be told about??)



>It would be equally interesting to toss an upgrade or a RAM disk or an SSD
As is often the case, one team looks after the O/S and layered products, whereas another team looks after the primary application.

Getting agreement to O/S related changes is often an uphill struggle, and not something that would happen any time soon (lead time for notice of changes, getting all the approvers to approve changes, yada yada).

Unfortunately, this is particularly the case when it can't be definitively stated how much of a difference it would make...



>When I go after RMS files from C, I use this code:
The person who wrote the program originally does now use RMS directly for accessing RMS-indexed files, but this is some of his earlier code; ideally it will be changed, but of course everyone is looking for a quick fix ("X is wrong; change Y to Z and that will fix it, or at least be a workaround, whilst we schedule recoding the program into the plan...").
Mark Corcoran
Frequent Advisor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Murali:
>XFC will not cache IO's to a particular file in case -
>* IO's done to the file are of size greater
> than VCC_MAX_IO_SIZE blocks.
It's the C RTL being requested to fgets() 225 bytes or stop when a Line Feed character is encountered, whichever comes first; what it actually requests "behind the scenes", I don't know.
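
For reference, the read loop is essentially just this (a minimal sketch of the sort of code in use, with a hypothetical file name, rather than the actual program):

#include <stdio.h>
#include <stdlib.h>

#define MAXREC 225                         /* longest record expected */

int main(void)
{
    char line[MAXREC + 1];                 /* +1 for the terminating NUL */
    FILE *fp = fopen("BIGFILE.DAT", "r");  /* hypothetical file name */

    if (fp == NULL) { perror("fopen"); return EXIT_FAILURE; }

    /* fgets() returns at most 225 bytes, or up to the end of a record */
    while (fgets(line, sizeof line, fp) != NULL) {
        /* ... process one record ... */
    }
    fclose(fp);
    return EXIT_SUCCESS;
}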

>file is present on a local RAMDISK
No

>The file is accessed cluster-wide and there is at least one node in the cluster that is doing a write IO to the file.
No - a single process running on a single node in the cluster accessing it for read only.

>XFC ReadAhead -
>* XFC does read ahead for a file if the SYSGEN parameter VCC$GL_READAHEAD is set to 1.
A little bit of confusion here using SDA symbols and SYSGEN params, but I'm guessing you mean VCC_READAHEAD, in which case I'll just list all the VCC settings:


$ MC SYSGEN SHOW VCC
Parameter Name   Current  Default     Min.     Max.  Unit     Dynamic
--------------   -------  -------  -------  -------  ----     -------
VCC_FLAGS              2        2        0       -1  Bitmask
VCC_MAXSIZE         6400     6400        0  3700000  Blocks
VCC_MAX_CACHE         -1       -1        0       -1  Mbytes      D
VCC_MAX_IO_SIZE      127      127        0       -1  Blocks      D
VCC_MAX_LOCKS         -1       -1       50       -1  Locks       D
VCC_READAHEAD          1        1        0        1  Boolean     D
VCC_WRITEBEHIND        1        1        0        1  Boolean     D
VCC_WRITE_DELAY       30       30        0       -1  Seconds     D
VCC_PAGESIZE           0        0        0       -1              D
VCC_RSVD               0        0        0       -1              D



>The question is why so low Hit-rate?
Well, actually, this is a funny thing....

When I was running the job yesterday afternoon, after 85 mins, the hit rate was 55%.

I ran it again this morning so that I could get the additional XFC stats from the SDA extension for you, and after 35 minutes, it was 83%.

It looks, therefore, that it might quite possibly be contention for XFC resources that is causing/contributing to the problem - under heav(y|ier) system loads, the file can't be cached as quickly, either because other files need to be (partly) removed from the cache to make space, or because there is a delay (timeout?) in XFC servicing read requests for this file.

I'm not actually sure how read requests get to the XFC, so I'm not sure whether or not such timeouts could occur...

Do all read requests go to the XFC first of all, and have to wait for the XFC to say "in cache" or "not in cache" before progressing further? (If it doesn't receive a response within X amount of time, does the read "just go to disk" rather than waiting for the XFC to respond?)




>My suspicion is that, there is some other operation on that file (or volume on which the file resides) that is causing the contents of the file to get deposed (i.e. cleared) from the cache once in a while.

What I was observing was that the number of pages of the file being cached was constantly increasing, but the hit rate was remaining at 0%.

Obviously, I couldn't really tell whether or not some old pages were being dropped out of the cache as new blocks were being added (so, if allocated pages jumped from say 1000 to 1050, it could mean 50 new pages added, or it could mean 80 new pages added and 30 old pages removed).




>Please provide the following information about the file -
>1) XFC statistics from SDA

I've attached a file which shows two sets of XFC SHOW FILE /STAT from SDA - the first is 5 minutes after starting the job (when the hit rate was still 0%), and the second is from ~66mins after the job starts (hit rate=90%)

>2) How big is the IO size issued by the application to the file
C RTL fgets() call, with a max size of 225 bytes, but like I said, I don't know what DEC C is doing under the hood...

>3) Is the file accessed cluster-wide.
No, a single process on one node in the cluster doing sequential reads only - once the file has been created, it is not used by anything other than this job which post-processes it.




Hein:
> I kinda expect 0%. The real file is 6+ GB, the primary key data perhaps 5+ GB. That could well be larger than the active maximum cache on an Alpha. Now if it is the same box Mark asked about before, then it is a 24-CPU AlphaServer GS1280 7/1300 running OpenVMS V7.3-2. So that is likely to have 48 GB or more memory, and the cache may be as high as 20 - 32 GB if not actively throttled.
Yup Hein, it's the same cluster.

>But other, totally unrelated activities, maybe as silly as SEAR [*...]*.log "FATAL ERROR", could flush out those, or a part of the 6GB, and cause a tremendous slowdown compared to other days/weeks.
Well, when the job runs, the file is not in the cache, but the number of cache pages allocated to the file increases as the job runs.

Like I said earlier, I can't really tell whether or not XFC is dropping the pages from the start of the file as it adds more pages while the file is read by the job.

On the face of it, that didn't appear to be the case, so it simply seemed that XFC was either doing read-behind in comparison to the job, or (if it is possible) the read has a timer-driven AST, so that if XFC hasn't responded within X time, the read goes "straight" to disk instead...




Steve:
>I'm assuming that even though you're in a cluster the file is only being read in on one node at any time?
One process, on one node, exclusively sequentially reading the file, several hours after it has been created (and expunged from the cache).




Murali:
>As Hein has pointed out, the file size is bigger than the XFC cache size and hence you cannot have the entire file cached.

Having the entire file cached isn't really what we want or need - just a "window" on the bit of the file we are looking at - the file is being read sequentially, so once all of the records from a bucket are processed by the job, the job has no further interest or requirement in those records, and they could happily be expunged from the XFC.


>What is the physical memory size of the system ?
The two A/S GS1280 7/1150 systems each have 56GB, and the GS1280 7/1300 has 48GB.


>When a file is accessed, if its data is present in the CRTL or RMS cache, then the request will be satisfied from there.
Indeed; since it seems that XFC can in fact detect that read-ahead caching is required for this file under the right circumstances (system load?), I'm wondering whether or not it might actually be a better idea simply to have a few large RMS global buffers for the file, to ensure some kind of caching, rather than have the potential of failed read hits on the XFC...
Ian Miller.
Honored Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Hein has said elsewhere that the only wrong answer for global buffers is zero, but for a file being accessed by only one process, do global buffers behave the same as having local buffers?

Does the code specify a multibuffer count?
If not, then it should pick up values set with SET RMS_DEFAULT, so you can experiment.
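
For example, something like this at process level (values purely illustrative):

$ SET RMS_DEFAULT/INDEXED/BUFFER_COUNT=8
$ SHOW RMS_DEFAULT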

____________________
Purely Personal Opinion
P Muralidhar Kini
Honored Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Hi Mark,

>> A little bit of confusion here using SDA
>> symbols and SYSGEN params, but I'm
>> guessing you mean VCC_READAHEAD,
Yes, I meant the VCC_READAHEAD SYSGEN parameter. (VCC$GL_READAHEAD was a typo.)

>> Do all read requests go to the XFC first
>> of all,
Yes, in case XFC caching is enabled. The application calls QIO to perform the IO operation. QIO then checks whether XFC is enabled; if yes, it calls XFC to take over the IO. If XFC is disabled on the node, QIO does not call XFC.
Once XFC is called, XFC does its own set of checks to determine whether the IO needs to skip the cache.
Some common scenarios in which XFC decides to skip the IO are:
- Caching is disabled on the volume (MOUNT/NOCACHE)
- Caching is disabled on the file (SET FILE/CACHING_ATTRIBUTE=NO_CACHING)
- Caching is disabled on the IO (using the function modifier IO$M_NOVCACHE in the QIO call)
- The IO size is greater than VCC_MAX_IO_SIZE blocks
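
The first two of these you can check from DCL (standard commands; file name hypothetical):

$ DIRECTORY/FULL BIGFILE.DAT     ! look for the "Caching attribute:" line
$ SHOW MEMORY/CACHE/FULL         ! overall XFC state, size and hit rates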


>> and they have to wait for the XFC to
>> say "in cache" or "not in cache" before
>> progressing further (if it doesn't
>> receive a response within X amount of
>> time, does the read "just go to disk"
>> rather than waiting for the XFC to
>> respond?

When XFC is handed a read IO for a file, it first checks whether the data is already available in the XFC cache. If yes, it returns the data immediately. If not, it performs a read IO to disk. In either case, the IO always goes through XFC.

However, in case the file is shared cluster-wide and some other node in the cluster is doing a write operation to the file, XFC won't be able to get a lock on the file in the desired mode. In such a case, XFC will convert the read-through into a read-around IO. Read-around means that XFC makes the IO skip the cache and go straight to disk.
As you have mentioned that there is no other node in the cluster doing write IO to the file, this scenario is eliminated.

>> (so, if allocated pages jumped from say
>> 1000 to 1050, it could mean 50 new pages
>> added, or it could mean 80 new pages
>> added and 30 old pages removed).
Yes, that's correct. Allocated pages only indicate how much of the file's data is currently in the cache.

Data that you have provided:
1) File is not accessed cluster-wide.
From this we can rule out the scenario where the file is written once in a while from some other node of the cluster.

2) IO size issued by the application.
Here also, the application does not seem to be doing an IO greater than VCC_MAX_IO_SIZE.

3) XFC SDA data.
>> XFC File stats from ~5 mins after job starts (hit rate=0%)
The data here indicates that only a few IOs were satisfied from the cache; for all the other IOs, XFC had to fetch the requested data from disk.

>> XFC File stats from ~66 mins after job starts (hit rate=90%)
Here we can see quite a number of reads being satisfied from the cache, and hence the read hit rate is higher.

My suspicion is that initially the data is not in the XFC cache, and hence the hit rate is very low. As the data fills the cache, subsequent IOs find the data in the cache, and the read hit rate increases. Sometime later, a logical IO might be performed on the volume, as a result of which the entire cached data for the volume gets cleared. The next set of reads to the file then has to fetch the data from disk again, which brings the cache hit rate back down.

Some questions -
1) Is it the case that every time the application runs, it gets a 0% hit rate in the beginning, and the hit rate increases only after some time?

2) When does the hit rate become 0? Only when the application starts accessing the data for the first time, or at other times as well?

3) Are you aware of any logical IOs being performed on that volume? If the disk is mounted cluster-wide, are any other nodes performing any logical IO to the volume?

>> everyone is looking for a quick fix "X is
>> wrong; Change Y to Z and that will fix
>> it, or at least be a workaround, whilst
>> we can schedule recoding the program into
>> the plan...
One suggested workaround would be to increase the XFC size.
The current physical memory size is 56GB (GS1280 7/1150) and 48GB (GS1280 7/1300). You had mentioned that XFC is sized at 2.75GB. One suggestion would be to increase the XFC size from the current 2.75GB to 8GB. XFC is tested with cache sizes up to 8GB, and hence you can increase the current size of XFC to 8GB for better performance.
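
As VCC_MAX_CACHE is dynamic (see the "D" in your SYSGEN listing), something along these lines should do it (a sketch only; 8192 Mbytes = 8GB, and the value should also go into MODPARAMS.DAT so that AUTOGEN preserves it across reboots):

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET VCC_MAX_CACHE 8192
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT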


Regards,
Murali
Let There Be Rock - AC/DC
Mark Corcoran
Frequent Advisor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

Ian:
>Hein has said elsewhere that the only wrong answer for global buffers is zero, but for a file being accessed by only one process, do global buffers behave the same as having local buffers?

Mea culpa. Global buffers (as I understand them) aren't actually what I meant - I meant the buffers that SET RMS_DEFAULT /BUFFER= refers to.



>Does the code specify a multi buffer count?

Although the C RTL does allow RMS options to be specified on the fopen(), the program just specifies "r".
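
If/when we do get to change the code, presumably something like this would let us tune the buffering without recoding to native RMS (an untested sketch; the keyword values are illustrative, not recommendations):

#include <stdio.h>

FILE *open_for_sequential_read(const char *name)
{
    /* DEC C fopen() passes RMS keyword strings through to the file open */
    return fopen(name, "r",
                 "mbc=127",   /* multiblock count: blocks per buffer   */
                 "mbf=4",     /* multibuffer count: number of buffers  */
                 "rop=rah");  /* record option: ask RMS for read-ahead */
}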



>If not then it should pick up values set with SET RMS_DEFAULT so you can experiment.

Great minds think alike - I was just doing some back-of-a-fag-packet calculations on what to use, and will post results back here.

The problem of course is that between tests, you have to wait for any cached part of the file to be expunged (either through normal system load, or by forcing it using something like "SEA [...]*.* blah") before you can test again.




Murali:
>Caching is disabled on Volume
Not in this case (obviously, otherwise none of the file would appear in it :-)

>Caching is disabled on file
Again, not in this case ("Caching attribute: Writethrough")

>Caching is disabled on IO
It is an fgets() call in the C RTL, but I wouldn't have thought that it would do any disabling.

>IO size is greater than VCC_MAX_IO_SIZE
Unless fgets() is doing something weird, when told to read 225 bytes.


>Sometime later, a logical IO might be performed on the volume, as a result of which the entire cached data for the volume gets cleared.
>The next set of reads to the file then has to fetch the data from disk again, which brings the cache hit rate back down.

I'm not certain what you mean in this context by a logical IO - can you give some examples?

I can understand that dismounting the volume (or a member of its shadow set) could cause this issue.

However, could "SEA [...]*.filename_type blah" really do this?

I'm not sure whether or not you mean this is a logical IO which would cause that volume's cache contents to be expunged...
...or if there is a per-volume limit in the XFC, such that, depending on that limit (and the size of the files being SEArched), the existing volume's cache contents would be expunged to make way for the files that SEARCH is processing?



>1) Is it the case that every time the application runs, it gets a 0% hit rate in the beginning, and the hit rate increases only after some time?

Hmm, unfortunately, there are no statistics available from previous daily runs (unlike many of the other jobs, it actually runs at 09:00, so I don't need to log in in the middle of the night to check).
However, from my manual testing, this appears to be the case.



>2) When does the hit rate become 0? Only when the application starts accessing the data for the first time, or at other times as well?
From my manual tests, it only appears to be when it starts accessing it for the first time (and where the start of the file is not in the cache), that the hit rate is 0.

I'm not sure how the cache-hit reporting code works, but the only way for the rate to drop to 0% whilst the job runs would be due to mathematical rounding...

(There will have been some successful hits, so successes/attempts would always yield a non-zero value unless you round it down.)
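
i.e. with integer arithmetic, something like this (a toy illustration, not the actual XFC reporting code):

#include <stdio.h>

int main(void)
{
    unsigned long hits = 499, attempts = 100000;

    /* Integer division truncates: 100 * 499 / 100000 prints 0 */
    printf("hit rate = %lu%%\n", 100 * hits / attempts);
    return 0;
}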



>Are you aware of any logical IOs being performed on that volume?
>If the disk is mounted cluster-wide, are any other nodes performing any logical IO to the volume?

It depends on what precisely you mean by logical IO.

The volume is mounted cluster-wide.

Normally, nothing else creates files on this volume; the only other activity may be a backup or defragger job that runs overnight, but it should be finished by 09:00 when this job runs.
Ian Miller.
Honored Contributor

Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?

If there was a job before this one which accessed lots of other files on the same disk then the file in question is not going to be in the XFC.

I guess your real aim is to reduce the elapsed time of this job that processes the RDB dumped tables. What has happened in previous runs is only useful in perhaps helping you to reduce the time for future runs.

You can specify RMS options on the C fopen.

If there are no RMS options specified now then you can experiment with
$ SET RMS/INDEX/BUFFER=3/BLOCK=

I wonder about the physical layout of the file, and whether CONVERT could help.

Logical I/O - different from the usual virtual I/O, which addresses a file as an array of blocks starting at 1 - logical I/O addresses a disk as an array of blocks starting at 0. Unlikely, although I wonder about the defrag job - has it finished when this job starts?
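
For illustration, a logical-block read is this sort of thing (an untested sketch; the device name is hypothetical, and it needs LOG_IO privilege):

#include <descrip.h>
#include <iodef.h>
#include <starlet.h>
#include <lib$routines.h>

int main(void)
{
    $DESCRIPTOR(dev, "DKA100:");     /* hypothetical device */
    unsigned short chan;
    struct { unsigned short status, count; unsigned int info; } iosb;
    static char buf[512];
    unsigned int sts;

    sts = sys$assign(&dev, &chan, 0, 0);
    if (!(sts & 1)) lib$stop(sts);

    /* LBN 0 is the first block of the disk itself; virtual I/O starts at VBN 1 */
    sts = sys$qiow(0, chan, IO$_READLBLK, &iosb, 0, 0,
                   buf, sizeof buf, 0, 0, 0, 0);
    if (!(sts & 1)) lib$stop(sts);

    return 1;
}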
____________________
Purely Personal Opinion