<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()? in Operating System - OpenVMS</title>
    <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613487#M18085</link>
    <description>Following on from my previous thread about CONVERT /SORT versus SORT + CONVERT /NOSORT, another&lt;BR /&gt;problem has arrived in my lap...&lt;BR /&gt;&lt;BR /&gt;A job which post-processes RDB dumped tables (RMS indexed files) to generate a file with records&lt;BR /&gt;formed from related parts of these tables has started to slow down.&lt;BR /&gt;&lt;BR /&gt;[There is one main table file which is sorted in order, and the field/columnar values on each row/record&lt;BR /&gt;determine whether or not the C program has to check other dumped table files]&lt;BR /&gt;&lt;BR /&gt;Unfortunately, there's no evidence to back this up, just people's vague recollection of how quick&lt;BR /&gt;they think it used to be.&lt;BR /&gt;&lt;BR /&gt;Looking at the job, the first thing I found is that the output file it generates was very&lt;BR /&gt;fragmented - between 5000 and 7000 fragments of 200 to 900 blocks each.&lt;BR /&gt;&lt;BR /&gt;To see if the fragmentation was the main issue, I worked around this by doing the following:&lt;BR /&gt;&lt;BR /&gt;$ SET RMS_DEFAULT /EXTEND_SIZE=65376&lt;BR /&gt;$ COPY NLA0: dev:[dir]output_filename.ext /ALLOCATION=11000000 /CONTIGUOUS&lt;BR /&gt;&lt;BR /&gt;The device on which the file is created has a cluster size of 288 blocks, and 65376 was the highest&lt;BR /&gt;multiple of 288 possible that was &amp;lt;= 65535.&lt;BR /&gt;&lt;BR /&gt;The COPY pre-allocates a contiguous file for the C program, which was updated to open the file in&lt;BR /&gt;append mode.&lt;BR /&gt;&lt;BR /&gt;After running the new job, it was obvious from the following:&lt;BR /&gt;&lt;BR /&gt;$ SHOW MEMORY /CACHE=FILE=dev:[dir]main_table.DAT&lt;BR /&gt;&lt;BR /&gt;that whilst the main input table file was being cached, virtually no reads were being serviced by&lt;BR /&gt;the XFC from read aheads, and virtually all were read throughs.&lt;BR /&gt;&lt;BR /&gt;I momentarily forgot that whilst the main input table file is an RMS indexed sequential file, it is&lt;BR /&gt;being read sequentially by the C program using simple fgets() calls.&lt;BR /&gt;&lt;BR /&gt;Thinking that the file was perhaps not sorted in order after all, I TYPE/PAGEd it (given that it&lt;BR /&gt;has ~49m records in it, I wanted some control over when my ^C would get picked up), then held the&lt;BR /&gt;RETURN key down for a good minute or so.&lt;BR /&gt;&lt;BR /&gt;The records appeared to be in order, but what I did notice was that after this, the SHOW MEMORY&lt;BR /&gt;/CACHE indicated that every single read was being serviced as a read ahead from the XFC.&lt;BR /&gt;&lt;BR /&gt;After about 90 mins, I killed the job, and found that the XFC cache hit rate was at ~90% (obviously,&lt;BR /&gt;it would never get to 100%, because of the initial ~14,000 which were treated as read throughs).&lt;BR /&gt;&lt;BR /&gt;I then ran the job again, but without using TYPE /PAGE on the main table file.&lt;BR /&gt;&lt;BR /&gt;It has now been running for almost as long as the first run, but the cache hit rate is 54%, and&lt;BR /&gt;although the read ahead counter value displayed by SHOW MEMORY /CACHE is increasing, so is the&lt;BR /&gt;read through counter - approximately 1 in 4 reads end up as READ AHEAD.&lt;BR /&gt;&lt;BR /&gt;Now, I know that this is only 2 individual runs, and hardly what you'd call exhaustive evidence...&lt;BR /&gt;&lt;BR /&gt;However, I'm going to go out on a limb here, and say that without my TYPE/PAGE, either:&lt;BR /&gt;&lt;BR /&gt;a) the C program is largely running ahead of the XFC cache in reading the file contents, so most&lt;BR /&gt;reads won't cause sequential read ahead of the file to occur (unless from outside interference,&lt;BR /&gt;such as me doing TYPE /PAGE)&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;&lt;BR /&gt;b) whatever mechanism XFC uses to detect that something is performing sequential reads isn't working (in this&lt;BR /&gt;particular scenario).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;For what it's worth, the file attributes are as follows:&lt;BR /&gt;&lt;BR /&gt;Size:     13956192/13956192   Owner:    [SYSTEM,*]&lt;BR /&gt;Created:     5-APR-2010 11:01:01.32&lt;BR /&gt;Revised:     6-APR-2010 18:24:21.06 (4)&lt;BR /&gt;Expires:    &amp;lt;None specified&amp;gt;&lt;BR /&gt;Backup:      7-APR-2010 02:34:38.75&lt;BR /&gt;Effective:  &amp;lt;None specified&amp;gt;&lt;BR /&gt;Recording:  &amp;lt;None specified&amp;gt;&lt;BR /&gt;Accessed:   &amp;lt;None specified&amp;gt;&lt;BR /&gt;Attributes: &amp;lt;None specified&amp;gt;&lt;BR /&gt;Modified:   &amp;lt;None specified&amp;gt;&lt;BR /&gt;Linkcount:  1&lt;BR /&gt;File organization:  Indexed, Prolog: 3, Using 3 keys&lt;BR /&gt;                             In 2 areas&lt;BR /&gt;Shelved state:      Online &lt;BR /&gt;Caching attribute:  Writethrough&lt;BR /&gt;File attributes:    Allocation: 13956192, Extend: 65520, Maximum bucket size: 18&lt;BR /&gt;                    Global buffer count: 0, No version limit&lt;BR /&gt;                    Contiguous best try&lt;BR /&gt;Record format:      Variable length, maximum 200 bytes, longest 0 bytes&lt;BR /&gt;Record attributes:  Carriage return carriage control&lt;BR /&gt;RMS attributes:     None&lt;BR /&gt;Journaling enabled: None&lt;BR /&gt;File protection:    System:RWED, Owner:RWED, Group:RE, World:&lt;BR /&gt;Access Cntrl List:  None&lt;BR /&gt;Client attributes:  None&lt;BR /&gt;&lt;BR /&gt;Total of 1 file, 13956192/13956192 blocks.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;The last SHOW MEMORY /CACHE command gave the following results:&lt;BR /&gt;Extended File Cache File Statistics:&lt;BR /&gt;&lt;BR /&gt;_dev:[dir]table.DAT;1 (open) &lt;BR /&gt; Caching is enabled, active caching mode is Write Through&lt;BR /&gt;    Allocated pages           5122     Total QIOs             144399&lt;BR /&gt;    Read hits                79682     Virtual reads          144399&lt;BR /&gt;    Virtual writes               0     Hit rate                   55 %&lt;BR /&gt;    Read aheads              22443     Read throughs          144399&lt;BR /&gt;    Write throughs               0     Read arounds                0&lt;BR /&gt;                                       Write arounds               0&lt;BR /&gt;&lt;BR /&gt;Total of 1 file for this volume&lt;BR /&gt;&lt;BR /&gt;Write Bitmap (WBM) Memory Summary&lt;BR /&gt;  Local bitmap count:    93     Local bitmap memory usage (MB)          8.40&lt;BR /&gt;  Master bitmap count:   96     Master bitmap memory usage (MB)         8.27&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Is the fact that the Global Buffer Count is set to 0 and/or the fact that the file is an RMS indexed&lt;BR /&gt;file being read using the C RTL fgets() partly to blame here, or is something else going on?&lt;BR /&gt;&lt;BR /&gt;Clearly, I'm reluctant to have a second concurrent job run at the same time as this main job,&lt;BR /&gt;simply to TYPE the table file, then be killed after a minute, to ensure that a sufficient&lt;BR /&gt;quantity of the file is cached to permit the XFC to service read requests.&lt;BR /&gt;&lt;BR /&gt;If anybody has any thoughts/suggestions, I'd be most grateful.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Mark&lt;BR /&gt;&lt;BR /&gt;[Grrr, hit some sequence on the keyboard, causing IE to go back a page, and lose 90% of this&lt;BR /&gt;post, so had to go back and do it from scratch again in notepad...]</description>
    <pubDate>Wed, 07 Apr 2010 10:44:03 GMT</pubDate>
    <dc:creator>Mark Corcoran</dc:creator>
    <dc:date>2010-04-07T10:44:03Z</dc:date>
    <item>
      <title>0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613487#M18085</link>
      <description>Following on from my previous thread about CONVERT /SORT versus SORT + CONVERT /NOSORT, another&lt;BR /&gt;problem has arrived in my lap...&lt;BR /&gt;&lt;BR /&gt;A job which post-processes RDB dumped tables (RMS indexed files) to generate a file with records&lt;BR /&gt;formed from related parts of these tables has started to slow down.&lt;BR /&gt;&lt;BR /&gt;[There is one main table file which is sorted in order, and the field/columnar values on each row/record&lt;BR /&gt;determine whether or not the C program has to check other dumped table files]&lt;BR /&gt;&lt;BR /&gt;Unfortunately, there's no evidence to back this up, just people's vague recollection of how quick&lt;BR /&gt;they think it used to be.&lt;BR /&gt;&lt;BR /&gt;Looking at the job, the first thing I found is that the output file it generates was very&lt;BR /&gt;fragmented - between 5000 and 7000 fragments of 200 to 900 blocks each.&lt;BR /&gt;&lt;BR /&gt;To see if the fragmentation was the main issue, I worked around this by doing the following:&lt;BR /&gt;&lt;BR /&gt;$ SET RMS_DEFAULT /EXTEND_SIZE=65376&lt;BR /&gt;$ COPY NLA0: dev:[dir]output_filename.ext /ALLOCATION=11000000 /CONTIGUOUS&lt;BR /&gt;&lt;BR /&gt;The device on which the file is created has a cluster size of 288 blocks, and 65376 was the highest&lt;BR /&gt;multiple of 288 possible that was &amp;lt;= 65535.&lt;BR /&gt;&lt;BR /&gt;The COPY pre-allocates a contiguous file for the C program, which was updated to open the file in&lt;BR /&gt;append mode.&lt;BR /&gt;&lt;BR /&gt;After running the new job, it was obvious from the following:&lt;BR /&gt;&lt;BR /&gt;$ SHOW MEMORY /CACHE=FILE=dev:[dir]main_table.DAT&lt;BR /&gt;&lt;BR /&gt;that whilst the main input table file was being cached, virtually no reads were being serviced by&lt;BR /&gt;the XFC from read aheads, and virtually all were read throughs.&lt;BR /&gt;&lt;BR /&gt;I momentarily forgot that whilst the main input table file is an RMS indexed sequential file, it is&lt;BR /&gt;being read sequentially by the C program using simple fgets() calls.&lt;BR /&gt;&lt;BR /&gt;Thinking that the file was perhaps not sorted in order after all, I TYPE/PAGEd it (given that it&lt;BR /&gt;has ~49m records in it, I wanted some control over when my ^C would get picked up), then held the&lt;BR /&gt;RETURN key down for a good minute or so.&lt;BR /&gt;&lt;BR /&gt;The records appeared to be in order, but what I did notice was that after this, the SHOW MEMORY&lt;BR /&gt;/CACHE indicated that every single read was being serviced as a read ahead from the XFC.&lt;BR /&gt;&lt;BR /&gt;After about 90 mins, I killed the job, and found that the XFC cache hit rate was at ~90% (obviously,&lt;BR /&gt;it would never get to 100%, because of the initial ~14,000 which were treated as read throughs).&lt;BR /&gt;&lt;BR /&gt;I then ran the job again, but without using TYPE /PAGE on the main table file.&lt;BR /&gt;&lt;BR /&gt;It has now been running for almost as long as the first run, but the cache hit rate is 54%, and&lt;BR /&gt;although the read ahead counter value displayed by SHOW MEMORY /CACHE is increasing, so is the&lt;BR /&gt;read through counter - approximately 1 in 4 reads end up as READ AHEAD.&lt;BR /&gt;&lt;BR /&gt;Now, I know that this is only 2 individual runs, and hardly what you'd call exhaustive evidence...&lt;BR /&gt;&lt;BR /&gt;However, I'm going to go out on a limb here, and say that without my TYPE/PAGE, either:&lt;BR /&gt;&lt;BR /&gt;a) the C program is largely running ahead of the XFC cache in reading the file contents, so most&lt;BR /&gt;reads won't cause sequential read ahead of the file to occur (unless from outside interference,&lt;BR /&gt;such as me doing TYPE /PAGE)&lt;BR /&gt;&lt;BR /&gt;or&lt;BR /&gt;&lt;BR /&gt;b) whatever mechanism XFC uses to detect that something is performing sequential reads isn't working (in this&lt;BR /&gt;particular scenario).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;For what it's worth, the file attributes are as follows:&lt;BR /&gt;&lt;BR /&gt;Size:     13956192/13956192   Owner:    [SYSTEM,*]&lt;BR /&gt;Created:     5-APR-2010 11:01:01.32&lt;BR /&gt;Revised:     6-APR-2010 18:24:21.06 (4)&lt;BR /&gt;Expires:    &amp;lt;None specified&amp;gt;&lt;BR /&gt;Backup:      7-APR-2010 02:34:38.75&lt;BR /&gt;Effective:  &amp;lt;None specified&amp;gt;&lt;BR /&gt;Recording:  &amp;lt;None specified&amp;gt;&lt;BR /&gt;Accessed:   &amp;lt;None specified&amp;gt;&lt;BR /&gt;Attributes: &amp;lt;None specified&amp;gt;&lt;BR /&gt;Modified:   &amp;lt;None specified&amp;gt;&lt;BR /&gt;Linkcount:  1&lt;BR /&gt;File organization:  Indexed, Prolog: 3, Using 3 keys&lt;BR /&gt;                             In 2 areas&lt;BR /&gt;Shelved state:      Online &lt;BR /&gt;Caching attribute:  Writethrough&lt;BR /&gt;File attributes:    Allocation: 13956192, Extend: 65520, Maximum bucket size: 18&lt;BR /&gt;                    Global buffer count: 0, No version limit&lt;BR /&gt;                    Contiguous best try&lt;BR /&gt;Record format:      Variable length, maximum 200 bytes, longest 0 bytes&lt;BR /&gt;Record attributes:  Carriage return carriage control&lt;BR /&gt;RMS attributes:     None&lt;BR /&gt;Journaling enabled: None&lt;BR /&gt;File protection:    System:RWED, Owner:RWED, Group:RE, World:&lt;BR /&gt;Access Cntrl List:  None&lt;BR /&gt;Client attributes:  None&lt;BR /&gt;&lt;BR /&gt;Total of 1 file, 13956192/13956192 blocks.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;The last SHOW MEMORY /CACHE command gave the following results:&lt;BR /&gt;Extended File Cache File Statistics:&lt;BR /&gt;&lt;BR /&gt;_dev:[dir]table.DAT;1 (open) &lt;BR /&gt; Caching is enabled, active caching mode is Write Through&lt;BR /&gt;    Allocated pages           5122     Total QIOs             144399&lt;BR /&gt;    Read hits                79682     Virtual reads          144399&lt;BR /&gt;    Virtual writes               0     Hit rate                   55 %&lt;BR /&gt;    Read aheads              22443     Read throughs          144399&lt;BR /&gt;    Write throughs               0     Read arounds                0&lt;BR /&gt;                                       Write arounds               0&lt;BR /&gt;&lt;BR /&gt;Total of 1 file for this volume&lt;BR /&gt;&lt;BR /&gt;Write Bitmap (WBM) Memory Summary&lt;BR /&gt;  Local bitmap count:    93     Local bitmap memory usage (MB)          8.40&lt;BR /&gt;  Master bitmap count:   96     Master bitmap memory usage (MB)         8.27&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Is the fact that the Global Buffer Count is set to 0 and/or the fact that the file is an RMS indexed&lt;BR /&gt;file being read using the C RTL fgets() partly to blame here, or is something else going on?&lt;BR /&gt;&lt;BR /&gt;Clearly, I'm reluctant to have a second concurrent job run at the same time as this main job,&lt;BR /&gt;simply to TYPE the table file, then be killed after a minute, to ensure that a sufficient&lt;BR /&gt;quantity of the file is cached to permit the XFC to service read requests.&lt;BR /&gt;&lt;BR /&gt;If anybody has any thoughts/suggestions, I'd be most grateful.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Mark&lt;BR /&gt;&lt;BR /&gt;[Grrr, hit some sequence on the keyboard, causing IE to go back a page, and lose 90% of this&lt;BR /&gt;post, so had to go back and do it from scratch again in notepad...]</description>
      <pubDate>Wed, 07 Apr 2010 10:44:03 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613487#M18085</guid>
      <dc:creator>Mark Corcoran</dc:creator>
      <dc:date>2010-04-07T10:44:03Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613488#M18086</link>
      <description>Hello Mark, &lt;BR /&gt;&lt;BR /&gt;That sure is a long description, and I could not always follow it the way I would have liked to, but at least we have some pertinent data. Good!&lt;BR /&gt;I'll take a first reply to clear some crud, and then try to get to the real problem.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; [There is one main table file which is sorted in order, and depending on the field/columnar values on each row/record, will&lt;BR /&gt;determine whether or not the C program has to check other dumped table files]&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;  people's vague recollection of how quick&lt;BR /&gt;they think it used to be.&lt;BR /&gt;&lt;BR /&gt;Too late now, but sprinkle your programs liberally with LIB$SHOW_TIMER!&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; the output file it generates, was very fragmented&lt;BR /&gt;&lt;BR /&gt;That can certainly cause unpredictable run times. Pre-allocate, perhaps based on input file size, and max extend (64000, 65535, whatever).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; $ SET RMS_DEFAULT /EXTEND_SIZE=65376&lt;BR /&gt;&lt;BR /&gt;Fine for a process. But too much if done system wide. Slows down tasks like unzipping many little files.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; $ COPY NLA0: dev:[dir]output_filename.ext /ALLOCATION=11000000 /CONTIGUOUS&lt;BR /&gt;&lt;BR /&gt;Excellent. If contiguous then extend size is irrelevant.&lt;BR /&gt;I used to use COPY NL: all the time myself for that purpose.&lt;BR /&gt;Since 8.3 I use the inline FDL strings:&lt;BR /&gt;&lt;BR /&gt;$cre/fdl="file; contiguous yes; allo 12345678"/log x.x&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;highest multiple of 288 possible that was &amp;lt;= 65535.&lt;BR /&gt;&lt;BR /&gt;Nice thought/touch, but largely irrelevant. OpenVMS has no choice but to round up.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;  The COPY pre-allocates a contiguous file for the C program which was updated to open the file in&lt;BR /&gt;append mode.&lt;BR /&gt;&lt;BR /&gt;Excellent&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; input table file is an RMS indexed sequential file, it is being read sequentially by the C program using simple fgets() calls.&lt;BR /&gt;&lt;BR /&gt;No matter. Those map to RMS SYS$GET calls.&lt;BR /&gt;&lt;BR /&gt;Next step is probably to SET FILE/STAT on the existing files, in an output, and use ANAL/SYS.. SHOW PROC/RMS=FSB or my RMS_STATS tool to display all counters.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; Thinking that the file was perhaps not sorted in order after all&lt;BR /&gt;&lt;BR /&gt;An indexed file is sorted by primary key. No ifs or buts about that.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;, I TYPE/PAGEd it (given that it&lt;BR /&gt;has ~49m records in it, I wanted some control over when my ^C would get picked up), then held the&lt;BR /&gt;RETURN key down for a good minute or so.&lt;BR /&gt;&lt;BR /&gt;How crude.&lt;BR /&gt;$ perl -pe "last if $. &amp;gt; 10000" &amp;gt; nl:&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; The records appeared to be in order, but what I did notice was that after this, the SHOW MEMORY&lt;BR /&gt;/CACHE indicated that every single read was being serviced as a read ahead from the XFC.&lt;BR /&gt;&lt;BR /&gt;As pre-loaded by the program.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; it would never get to 100%, because of the initial ~14,000 which were treated as read throughs).&lt;BR /&gt;&lt;BR /&gt;Read-throughs are just through the cache, not through to the disk.&lt;BR /&gt;Read to the disk = read hits + ahead.&lt;BR /&gt;&lt;BR /&gt;See HELP SHOW MEMORY... deep down:&lt;BR /&gt;           7 Read throughs       Number of Virtual Reads that are capable of being satisfied by the extended file cache.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; Size:     13956192/13956192   Owner:  &lt;BR /&gt;&lt;BR /&gt;Is that the table/driver file?&lt;BR /&gt;&lt;BR /&gt;  &lt;BR /&gt;&amp;gt;&amp;gt; Is the fact that the Global Buffer Count set to 0 and/or the fact that the file is an RMS indexed&lt;BR /&gt;file being read using the C RTL fgets() partly to blame here, or is something else going on?&lt;BR /&gt;&lt;BR /&gt;Nah.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; Clearly, I'm reluctant to have a second concurrent job run at the same time as this main job, simply to TYPE the table file, then be killed after a minute, to ensure that a sufficient&lt;BR /&gt;quantity of the file is cached to permit the XFC to service read requests.&lt;BR /&gt;&lt;BR /&gt;That's not so clear to me.&lt;BR /&gt;Clearly TYPE is a silly tool for this, but you know more than the XFC can guess.&lt;BR /&gt;So launching something to pre-read is not that crazy an idea for predictable jobs with critical run time requirements.&lt;BR /&gt;&lt;BR /&gt;I once created a 'read-ahead-and-keep-ahead' tool, just for that reason.&lt;BR /&gt;It would pre-read N buckets' worth of data, then used an RMS-compatible bucket lock with a blocking AST on the first to detect 'interest in a bucket'. When the AST triggered on bucket M, grab a lock for the next (M+1), release M, read bucket M + N + 1.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; If anybody has any thoughts/suggestions, I'd be most grateful.&lt;BR /&gt;&lt;BR /&gt;- RMS stats.&lt;BR /&gt;- Be sure to watch activity on those other files.&lt;BR /&gt;- Engage a professional in this space if it is really critical.&lt;BR /&gt;&lt;BR /&gt;Cheers,&lt;BR /&gt;Hein van den Heuvel ( at gmail dot com )&lt;BR /&gt;HvdH Performance Consulting&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 07 Apr 2010 13:27:28 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613488#M18086</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2010-04-07T13:27:28Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613489#M18087</link>
      <description>Meant to open with:&lt;BR /&gt;&lt;BR /&gt;0% hit rate is perfectly normal for&lt;BR /&gt;- files that have not been read/written in a while.&lt;BR /&gt;- files that well exceed the cache capacity and are read sequentially.&lt;BR /&gt;- when nocache is in effect.&lt;BR /&gt;- when IOs are done larger than the max-cache IO size.&lt;BR /&gt;- when concurrent updates are happening on other nodes in the cluster.&lt;BR /&gt;&lt;BR /&gt;Hein&lt;BR /&gt;</description>
      <pubDate>Wed, 07 Apr 2010 13:29:57 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613489#M18087</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2010-04-07T13:29:57Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613490#M18088</link>
      <description>At present&lt;BR /&gt;- files that well exceed the cache capacity and are read sequentially&lt;BR /&gt;&lt;BR /&gt;looks likely, but how big is the cache on this system, and what aged version of VMS is being used?</description>
      <pubDate>Wed, 07 Apr 2010 13:40:06 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613490#M18088</guid>
      <dc:creator>Ian Miller.</dc:creator>
      <dc:date>2010-04-07T13:40:06Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613491#M18089</link>
      <description>Hein:&lt;BR /&gt;&amp;gt;Too late now, but sprinkle your programs liberally with LIB$SHOW_TIMER!&lt;BR /&gt;I know how much the bean-counters like to have stats, so I always try to make sure I get timing info for various stages of programs (can also be useful for myself too).&lt;BR /&gt;&lt;BR /&gt;Alas, this is someone else's code, developered some time ago, and the concern was more with getting it working than making it perfect ;-)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; $ SET RMS_DEFAULT /EXTEND_SIZE=65376&lt;BR /&gt;&amp;gt;Fine for a process. But too much if done system wide&lt;BR /&gt;&lt;BR /&gt;Don't worry, it was only for this one job, as a test :-)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Excellent. If contiguous then extend size is irrelevant.&lt;BR /&gt;&lt;BR /&gt;I'd wondered about this - assuming that the next highest contiguous block on the disk was 130752 blocks, and the RMS extend size had been set to 65376, I'm guessing that if exactly 130752 blocks were required, then:&lt;BR /&gt;a) they'd be allocated in two logical single operations&lt;BR /&gt;b) as far as BITMAP.SYS is concerned, the fact that there are two groups of 65376 blocks is irrelevant, because they are "next to each other", so would appear as a single fragment...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt;highest multiple of 288 possible that was &amp;lt;= 65535.&lt;BR /&gt;&amp;gt;Nice thought/touch, but largely irrelevant. 
OpenVMS has no choice but to round up.&lt;BR /&gt;&lt;BR /&gt;So, if I set extent size to 65535, and the cluster size was 288 blocks, presumably extending the file should theoretically mean 65664 blocks allocated?&lt;BR /&gt;&lt;BR /&gt;I had guessed that the 65535 limit was as a result of a word being used to store the value, so I couldn't see how 65664 (17 bits) would fit...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Next step is probably to SET FILE/STAT on the existing files, in an output, and use ANAL/SYS.. SHOW PROC/RMS=FSB or my RMS_STATS tool to display all counters.&lt;BR /&gt;&lt;BR /&gt;I tried the SET FILE/STAT and a MONITOR RMS /FILE=, but to be honest, it didn't reveal very much - the only non-zero counters were the CUR, AVE andMAX $GET Call Rate (Seq).&lt;BR /&gt;&lt;BR /&gt;I knocked up a quick .EXE of my own to effectively do the same as the real one, and this was the MONITOR RMS /FILE output (as a snapshot):&lt;BR /&gt;&lt;BR /&gt;Active Streams:     1                  CUR        AVE        MIN        MAX &lt;BR /&gt;&lt;BR /&gt;    $GET Call Rate    (Seq)       19375.33    4123.63       0.00   21861.00&lt;BR /&gt;                      (Key)           0.00       0.00       0.00       0.00&lt;BR /&gt;                      (RFA)           0.00       0.00       0.00       0.00&lt;BR /&gt;    $FIND Call Rate   (Seq)           0.00       0.00       0.00       0.00&lt;BR /&gt;                      (Key)           0.00       0.00       0.00       0.00&lt;BR /&gt;                      (RFA)           0.00       0.00       0.00       0.00&lt;BR /&gt;    $PUT Call Rate    (Seq)           0.00       0.00       0.00       0.00&lt;BR /&gt;                      (Key)           0.00       0.00       0.00       0.00&lt;BR /&gt;    $READ Call Rate                   0.00       0.00       0.00       0.00&lt;BR /&gt;    $WRITE Call Rate                  0.00       0.00       0.00       0.00&lt;BR /&gt;    $UPDATE Call Rate                 0.00       0.00  
     0.00       0.00&lt;BR /&gt;    $DELETE Call Rate                 0.00       0.00       0.00       0.00&lt;BR /&gt;    $TRUNCATE Call Rate               0.00       0.00       0.00       0.00&lt;BR /&gt;    $EXTEND Call Rate                 0.00       0.00       0.00       0.00&lt;BR /&gt;    $FLUSH Call Rate                  0.00       0.00       0.00       0.00&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;As for the ANA /SYS and SHOW PROC /FSB, that didn't reveal much either:&lt;BR /&gt;&lt;BR /&gt;FSB Address: 00064000&lt;BR /&gt;-----------&lt;BR /&gt;OPEN:           1.  CLOSE:        0.&lt;BR /&gt;CONNECT:        1.  DISCONN:      0.&lt;BR /&gt;REWIND:         0.  FLUSH:        0.&lt;BR /&gt;EXTEND:         0.  blocks:       0.&lt;BR /&gt;TRUNCATE:       0.  blocks:       0.&lt;BR /&gt; &lt;BR /&gt;FIND seq:       0.  key:          0.  rfa:          0.&lt;BR /&gt;GET  seq:  159199.  key:          0.  rfa:          0.  bytes:   18296029.&lt;BR /&gt;PUT  seq:       0.  key:          0.                    bytes:          0.&lt;BR /&gt;UPDATE:         0.  bytes:        0.&lt;BR /&gt;DELETE:         0.&lt;BR /&gt; &lt;BR /&gt;READ:           0.  bytes:        0.&lt;BR /&gt;WRITE:          0.  bytes:        0.&lt;BR /&gt; &lt;BR /&gt;LOCAL  CACHE attempts:  161187.  hits:  159198.  read:    1989.  write:      0.&lt;BR /&gt;GLOBAL CACHE attempts:       0.  hits:       0.  read:       0.  write:      0.&lt;BR /&gt;GLOBAL BUFFER INTERLOCKING:&lt;BR /&gt;   GBHSH Intlck Collisions:            0        GBH Intlck Collisions:         0&lt;BR /&gt;   GBHSH Held at Rundown:              0        GBH Held at Rundown:           0&lt;BR /&gt; &lt;BR /&gt;LOCKS:            Enqueue    Dequeue    Convert       Block-ast&lt;BR /&gt; Shared file:           0.         0.         0.         0. &lt;BR /&gt; Local buffer:          0.         0.         0.         0. &lt;BR /&gt; Global buffer:         0.         0.         0.         0. &lt;BR /&gt; Shared append:         0.         0.        
 0.         0. &lt;BR /&gt; Global section:        0.         0.         0.         0. &lt;BR /&gt; Data record:           0.         0.         0. &lt;BR /&gt; &lt;BR /&gt;XQP QIO:                1.&lt;BR /&gt; &lt;BR /&gt;BUCKET SPLIT (1) :      0.  SPLIT (N) :       0.  OUTBUFQUO:        0.&lt;BR /&gt; &lt;BR /&gt;DEV1 .. DEV5:    00000000   00000000   00000000   00000000   00000000&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;An indexed file is sorted by primary key. No ifs or buts about that.&lt;BR /&gt;Ah sorry, I *think* what I meant was that the file is indexed in order, and the records are also stored in order (rather than having the a nice sequential index still pointing to "random" disk blocks).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; Size: 13956192/13956192 Owner: &lt;BR /&gt;&amp;gt;Is that the table/driver file?&lt;BR /&gt;&lt;BR /&gt;Yes, this is the primary input file, just under 14m blocks in size.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;That's not so clear to me.&lt;BR /&gt;&amp;gt;Clearly TYPE is a silly tool for this, but you know more that the XFC can guess.&lt;BR /&gt;I looked up XFC in the system management manual, and its discussion of XFC detecting sequential reads of same-size I/O requests led me to the VCC_READAHEAD SYSGEN parameter - thinking that perhaps it wasn't set, but alas it was.&lt;BR /&gt;&lt;BR /&gt;On the face of it, it appears that the executable is simply reading from the primary input file sequentially quicker than XFC can detect that that is what is happening, so although XFC is cacheing the file, it's always behind the executable (unless it gets a head start from something else, whereby the reads from executable allow XFC to keep on topping up the file into the cache).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Engage a professional in this space if it is really critical.&lt;BR /&gt;perhaps this is not the place to discuss it, but 
I never heard the story about how you and the Hoff came to part ways with HP - jumped, or pushed?  How has the private sector been treating you since?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;when concurrent updates are happening on other nodes in the cluster.&lt;BR /&gt;Not the case here - other jobs may happen to read the same primary input file, but certainly during my testing, there was just the one process accessing the file, and it was doing the sequential read.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Ian:&lt;BR /&gt;&amp;gt;looks likely but how big is the cache on this system&lt;BR /&gt;XFC currently allocated at 2.75GB.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;and what aged version of VMS is being used?&lt;BR /&gt;You know me and many other HP customers only too well ;-)     7.3-2 on this cluster.</description>
      <pubDate>Wed, 07 Apr 2010 14:51:59 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613491#M18089</guid>
      <dc:creator>Mark Corcoran</dc:creator>
      <dc:date>2010-04-07T14:51:59Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613492#M18090</link>
      <description>&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; I'd wondered about this - assuming that the next highest contiguous block on the disk was 130752 blocks, and the RMS extend size had been set to 65376, I'm guessing that if exactly 130752 blocks were required, then:&lt;BR /&gt;a) they'd be allocated in two logical single operations&lt;BR /&gt;&lt;BR /&gt;Yes.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; b) as far as BITMAP.SYS is concerned, the fact that there are two groups of 65376 blocks is irrelevant, because they are "next to each other", so would appear as a single fragment...&lt;BR /&gt;&lt;BR /&gt;They would appear as a single fragment in the MAP area for the file using them ($ DUMP/HEAD/BLOCK=COUNT=0 ). In the bitmap they would be 2 * 227 adjacent bits.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; So, if I set extent size to 65535, and the cluster size was 288 blocks, presumably extending the file should theoretically mean 65664 blocks allocated?&lt;BR /&gt;&lt;BR /&gt;Yes indeed. Because VMS has to give you 227 + 1 cluster to satisfy the extend request.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; the 65535 limit was as a result of a word being used to store the value&lt;BR /&gt;&lt;BR /&gt;Correct&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; I tried the SET FILE/STAT and a MONITOR RMS /FILE=, but to be honest, it didn't reveal very much &lt;BR /&gt;&lt;BR /&gt;IMHO the way MONI RMS presents that data is next to useless.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; As for the ANA /SYS and SHOW PROC /FSB, that didn't reveal much either:&lt;BR /&gt;&lt;BR /&gt;FSB Address: 00064000&lt;BR /&gt;:&lt;BR /&gt;GET seq: 159199. key: 0. rfa: 0. bytes: 18296029.&lt;BR /&gt;:&lt;BR /&gt;LOCAL CACHE attempts: 161187. hits: 159198. read: 1989. write: 0.&lt;BR /&gt;&lt;BR /&gt;IMHO that indicated a lot. You needed an IO about once every 80 records. So there must have been 80 records to a bucket. 
Those 1989 IOs would have gone through to the XFC to be resolved either from a prior read (ahead) or from a real IO.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; Ah sorry, I *think* what I meant was that the file is indexed in order, and the records are also stored in order (rather than having a nice sequential index still pointing to "random" disk blocks).&lt;BR /&gt;&lt;BR /&gt;Got it. Yes, for records arriving in primary key order, both CONVERT FAST-LOAD and plain-old RMS will allocate in ever-increasing adjacent buckets. A minor exception is that if the file needs to grow while doing so, then the new bucket is started in the fresh extend, potentially leaving the tail end of the current extend unused for up to the bucket size minus 1. In this case the bucket size divides evenly into the cluster size, so that's not an issue.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt; On the face of it, it appears that the executable is simply reading from the primary input file sequentially quicker than XFC can detect that that is what is happening&lt;BR /&gt;&lt;BR /&gt;I never really studied the read-ahead for XFC. RMS only does read-ahead for sequential files, not indexed, and for sequential files it 'bursts' reading a bunch, but does not keep ahead. I actually tried to implement that while in RMS engineering, but there were gotchas and I had to abandon it at the time.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; I never heard the story about how you and the Hoff came to part ways with HP - jumped, or pushed? &lt;BR /&gt;&lt;BR /&gt;I can only speak for myself. I received an early retirement opportunity which seemed too nice to refuse. It was a voluntary choice, creating optimal (financial) conditions to try working independently for a while. That was October 2005. So far so good!&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Hein&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Apr 2010 01:55:03 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613492#M18090</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2010-04-08T01:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613493#M18091</link>
      <description>Hein, I'm watching this thread with some interest so a question - two actually - for you...&lt;BR /&gt;&lt;BR /&gt;In the second last paragraph of your response immediately above this one you seem to be implying that there's no read-ahead on indexed files but there is for sequential files.  Is this correct?  &lt;BR /&gt;&lt;BR /&gt;If so, is that set by the file characteristics or by the parameters in the open statement?&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Apr 2010 02:16:07 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613493#M18091</guid>
      <dc:creator>John McL</dc:creator>
      <dc:date>2010-04-08T02:16:07Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613494#M18092</link>
      <description>Hello John&lt;BR /&gt;&lt;BR /&gt;John &amp;gt;&amp;gt;In the second last paragraph of your response immediately above this one you seem to be implying that there's no read-ahead on indexed files but there is for sequential files. Is this correct? &lt;BR /&gt;&lt;BR /&gt;Only from an RMS perspective is it not reading ahead into its buffers.&lt;BR /&gt;The XFC is blissfully ignorant as to whether RMS is doing an IO from a sequential file or an indexed file, so the XFC can, independently of RMS, trigger a read-ahead into its buffers for RMS to find the data later.&lt;BR /&gt;And behind the XFC the controller knows even less, and it can do read-aheads; behind that, the physical disk can be doing read-ahead. So the odds that you'd be waiting for a disk seek/rotation are low!&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; If so, is that set by the file characteristics or by the parameters in the open statement?&lt;BR /&gt;&lt;BR /&gt;For a sequential file you have to request RAB$V_RAH in the connect, which is part of the OPEN from an HLL perspective. It is the default for many languages. The number of buffers defines how deep the read-ahead goes.&lt;BR /&gt;&lt;BR /&gt;The RMS read-ahead (on sequential files) can probably disrupt the XFC read-ahead recognition. I never experimented with that though.&lt;BR /&gt;&lt;BR /&gt;RMS read-ahead on indexed files would not seem too hard to implement, but it was never done nor requested. Regrettably. Again, the XFC may well decide to do the read-ahead for indexed files.&lt;BR /&gt;&lt;BR /&gt;I haven't looked at the code, but it would not surprise me if the XFC found it easier to do read-ahead for IOs which line up nicely with its 16-block cache lines. But for that to happen for an indexed file, many stars need to line up! (Bucket size 2, 4, 8, 16, or 32. Cluster size a power of 2. RMS primary key data NOT in area 0, or not pre-allocated.)&lt;BR /&gt; &lt;BR /&gt;Hein&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Apr 2010 02:38:52 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613494#M18092</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2010-04-08T02:38:52Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613495#M18093</link>
      <description>Records in an indexed file aren't necessarily adjacent, so there's no direct way to warm up a generic block cache given the current design of RMS.  RMS would need to do that, or to provide hints to XFC.  Neither of which, AFAIK, exists at present.&lt;BR /&gt;&lt;BR /&gt;Whether Hein's suggested leading-traversal approach might be worth the implementation effort is interesting; I'd want to measure that cache pre-populate scheme.&lt;BR /&gt;&lt;BR /&gt;It would be equally interesting to toss an upgrade, a RAM disk, or an SSD at the problem, and measure throughput with that.  66 megabytes isn't all that much data; that'd be close to fitting entirely into the RAM in my cellphone, and would be dwarfed by what I've got stored in the flash.   Best case, this application should be limited by the spiral transfer rate of the disk.  Or by your RAM disk or SSD bandwidth.   Arguably, RMS could just be getting in the way here if you can run from analogous in-memory data structures.  (RMS doesn't have the concept of hauling an entire file into memory as one big wad, performing the required operations, and then rolling it all out as a big wad.)&lt;BR /&gt;&lt;BR /&gt;It'd be interesting to compare RMS indexed files to an application built on Apache Cassandra, too.  But that's fodder for discussion on another day.  And no, I'm not aware of a VMS port of Cassandra.&lt;BR /&gt;&lt;BR /&gt;And after that wall of text...  &lt;BR /&gt;&lt;BR /&gt;When I go after RMS files from C, I use this code:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://labs.hoffmanlabs.com/node/595" target="_blank"&gt;http://labs.hoffmanlabs.com/node/595&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;And generally not with the file I/O portions of the C RTL.  &lt;BR /&gt;&lt;BR /&gt;The C I/O has its share of considerations here; that you can even get at indexed files through a mostly-generic C API is somewhat of a remarkable implementation achievement.  
But by that same token, don't expect it to be the go-fast implementation. I might well look to haul it all into memory with a few large I/Os.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Apr 2010 03:13:20 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613495#M18093</guid>
      <dc:creator>Hoff</dc:creator>
      <dc:date>2010-04-08T03:13:20Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613496#M18094</link>
      <description>XFC will not cache IO's to a particular file&lt;BR /&gt;in case -&lt;BR /&gt;* IO's done to the file are of size greater&lt;BR /&gt;  than VCC_MAX_IO_SIZE blocks.&lt;BR /&gt;&lt;BR /&gt;* file is present on a local RAMDISK&lt;BR /&gt;&lt;BR /&gt;* The file is accessed cluster wide and &lt;BR /&gt;  there is at least one node in the cluster&lt;BR /&gt;  that is doing a write IO to the file.&lt;BR /&gt;&lt;BR /&gt;* file will be temporarily not cached if&lt;BR /&gt;  logical IO's are done to the file or the&lt;BR /&gt;  volume on which the file resides&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;XFC ReadAhead -&lt;BR /&gt;* XFC does read ahead for a file if the&lt;BR /&gt;  SYSGEN parameter VCC$GL_READAHEAD is set&lt;BR /&gt;  to 1.&lt;BR /&gt;&lt;BR /&gt;* XFC has a read ahead factor of 3 which&lt;BR /&gt;  would mean that when read ahead is being&lt;BR /&gt;  performed on a file, 1 among 4 IO's to the&lt;BR /&gt;  file will be read ahead.&lt;BR /&gt;&lt;BR /&gt;XFC ReadHits&lt;BR /&gt;* Whether the IO is read-through or&lt;BR /&gt;  read-ahead, it is still a Read IO &lt;BR /&gt;  operation that XFC has to perform and&lt;BR /&gt;  would be counted in the statistics as an IO.&lt;BR /&gt;&lt;BR /&gt;* The hit rate for the file is calculated&lt;BR /&gt;  as follows -&lt;BR /&gt;  HitRate = ReadHits/TotalIO&lt;BR /&gt;&lt;BR /&gt;  Here,&lt;BR /&gt;  ReadHits - Number of times a Read&lt;BR /&gt;             operation was satisfied from&lt;BR /&gt;             the cache&lt;BR /&gt;  Total IO - Number of Read operations&lt;BR /&gt;&lt;BR /&gt;  Both the "ReadHits" as well as "TotalIO" &lt;BR /&gt;  include read-through as well as read-ahead.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;From the information you have provided,&lt;BR /&gt;&amp;gt;&amp;gt; SHOW MEMORY /CACHE &lt;BR /&gt;&amp;gt;&amp;gt; Allocated pages           5122&lt;BR /&gt;&amp;gt;&amp;gt; Total QIOs             144399&lt;BR /&gt;&amp;gt;&amp;gt; Read hits                79682&lt;BR 
/&gt;&amp;gt;&amp;gt; Virtual reads          144399&lt;BR /&gt;&amp;gt;&amp;gt; Virtual writes               0&lt;BR /&gt;&amp;gt;&amp;gt; Hit rate                   55 %&lt;BR /&gt;&lt;BR /&gt;IO's to the file are going through the XFC&lt;BR /&gt;cache and there are some number of IO's&lt;BR /&gt;that are getting satisfied from the cache&lt;BR /&gt;and hence we are seeing the hit rate of 55%.&lt;BR /&gt;&lt;BR /&gt;The question is why so low Hit-rate?&lt;BR /&gt;My suspicion is that there is some other&lt;BR /&gt;operation on that file (or volume on which&lt;BR /&gt;the file resides) that is causing the&lt;BR /&gt;contents of the file to get deposed&lt;BR /&gt;(i.e. cleared) from the cache once in a&lt;BR /&gt;while. This would cause subsequent IO's to&lt;BR /&gt;the file to get read from the disk (read&lt;BR /&gt;miss). A couple of obvious reasons for the&lt;BR /&gt;file depose would be either logical IO's&lt;BR /&gt;to the file/volume or cluster-wide write&lt;BR /&gt;operations on the file.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Please provide the following information about the file -&lt;BR /&gt;1) XFC statistics from SDA&lt;BR /&gt;   ANAL/SYS&lt;BR /&gt;   SDA&amp;gt; XFC SHOW FILE/ID=&amp;lt;FID_IN_HEX&amp;gt;/STATS&lt;BR /&gt;   SDA&amp;gt; XFC SHOW MEM&lt;BR /&gt;&lt;BR /&gt;   NOTE: FID_IN_HEX is the FID of the file&lt;BR /&gt;        (dev:[dir]table.DAT;1) in Hex&lt;BR /&gt;&lt;BR /&gt;2) How big is the IO size issued by the&lt;BR /&gt;   application to the file&lt;BR /&gt;   (i.e. How big are the IO's that the&lt;BR /&gt;    application issues to the file. 
Are&lt;BR /&gt;    they 50 blocks or 100 blocks ....)&lt;BR /&gt;&lt;BR /&gt;3) Is the file accessed cluster-wide.&lt;BR /&gt;   If yes, what type of IO (Read/write)&lt;BR /&gt;   are performed on that file cluster-wide&lt;BR /&gt;   and how frequently&lt;BR /&gt;&lt;BR /&gt;This information could provide further&lt;BR /&gt;clues as to why the hit rate is so low&lt;BR /&gt;for the file.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Murali</description>
      <pubDate>Thu, 08 Apr 2010 03:23:55 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613496#M18094</guid>
      <dc:creator>P Muralidhar Kini</dc:creator>
      <dc:date>2010-04-08T03:23:55Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613497#M18095</link>
      <description>Good summary, Murali. Thanks.&lt;BR /&gt;&lt;BR /&gt;The read-ahead 3x description sounds suspect.&lt;BR /&gt;So it sees 3 (adjacent) QIOs and then adds just one read ahead QIO? How much? Of equal size to the last IO, or some large size like 8 cache lines?&lt;BR /&gt;&lt;BR /&gt;I don't think the hit rate is low.&lt;BR /&gt;I kinda expect 0%. The real file is 6+ GB, the primary key data perhaps 5+ GB. That could well be larger than the active maximum cache on an Alpha. Now if it is the same box Mark asked about before then it is a 24-CPU Alphaserver GS1280 7/1300, running OVMS v7.3-2. So that is likely to have 48 GB or more memory, and the cache may be as high as 20 - 32 GB if not actively throttled.&lt;BR /&gt;So normally a 6GB file would fit, and running the downstream program shortly (hours?) after the load would find the data in the cache. But other, totally unrelated activities, maybe as silly as SEAR [*...]*.log "FATAL ERROR", could flush out those, or a part of the 6GB, and cause a tremendous slowdown compared to other days/weeks.&lt;BR /&gt; &lt;BR /&gt;Time to close shop for the day!&lt;BR /&gt;Cheers,&lt;BR /&gt;Hein.&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Apr 2010 03:45:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613497#M18095</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2010-04-08T03:45:36Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613498#M18096</link>
      <description>Hi Mark,&lt;BR /&gt;&lt;BR /&gt;I'm assuming that even though you're in a cluster the file is only being read in on one node at any time?  If this isn't the case then you'll never effectively cache it in my experience with XFC.  You'd need to rely on third party products like PerfectCache or on the raw IO performance of the system and the disk array that's hung off it.&lt;BR /&gt;&lt;BR /&gt;Steve</description>
      <pubDate>Thu, 08 Apr 2010 05:37:29 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613498#M18096</guid>
      <dc:creator>Steve Reece_3</dc:creator>
      <dc:date>2010-04-08T05:37:29Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613499#M18097</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;From the information provided,&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; Total of 1 file, 13956192/13956192 blocks.&lt;BR /&gt;This is around 7GB.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; XFC currently allocated at 2.75GB.&lt;BR /&gt;XFC current size is 2.75 GB.&lt;BR /&gt;&lt;BR /&gt;As Hein has pointed out,&lt;BR /&gt;the file size is bigger than the XFC cache&lt;BR /&gt;size and hence you cannot have the entire&lt;BR /&gt;file cached.&lt;BR /&gt;&lt;BR /&gt;As and when the file is accessed, it would&lt;BR /&gt;get into the cache. But as the file size is&lt;BR /&gt;larger than the XFC cache, reading new data&lt;BR /&gt;from the large file may have to throw&lt;BR /&gt;other data for the same file out of the&lt;BR /&gt;cache.&lt;BR /&gt;Example: When block 50 of the file is read&lt;BR /&gt;and there is no space in the XFC cache, then&lt;BR /&gt;XFC may have to throw, say, block 20 of the&lt;BR /&gt;same file out of the cache in order to make&lt;BR /&gt;room for block 50.&lt;BR /&gt;Subsequent IO's to block 20 would now be a&lt;BR /&gt;read miss, and the read would now go to the&lt;BR /&gt;disk. This way the read hits will come down.&lt;BR /&gt;&lt;BR /&gt;Also certain activities have the potential&lt;BR /&gt;to thrash the entire XFC cache or depose the&lt;BR /&gt;data of a file/volume.&lt;BR /&gt;&lt;BR /&gt;Thrash XFC cache&lt;BR /&gt;* SEARCH/COPY or any 3rd party backup&lt;BR /&gt;  operation&lt;BR /&gt;&lt;BR /&gt;  Note: VMS backup does not thrash the XFC&lt;BR /&gt;  cache because the IO it performs skips the&lt;BR /&gt;  XFC cache. 
This is done by specifying the&lt;BR /&gt;  function modifier IO$M_NOVCACHE for the IO&lt;BR /&gt;  that it issues.&lt;BR /&gt;&lt;BR /&gt;Depose file/volume data from cache&lt;BR /&gt;* Logical IO to file/volume&lt;BR /&gt;* cluster-wide write operation&lt;BR /&gt;&lt;BR /&gt;What is the physical memory size of the&lt;BR /&gt;system?&lt;BR /&gt;You can get that from DCL "$SHOW MEM/PHY"&lt;BR /&gt;&lt;BR /&gt;XFC is sized at 2.75GB. By default XFC would&lt;BR /&gt;size itself to be 1/2 of physical memory.&lt;BR /&gt;So I guess the physical memory would be&lt;BR /&gt;around 5.5GB. Is that correct?&lt;BR /&gt;&lt;BR /&gt;As a side note, other things to consider&lt;BR /&gt;from a read-hits point of view are caching&lt;BR /&gt;at different levels, such as the CRTL and&lt;BR /&gt;RMS.&lt;BR /&gt;The CRTL uses buffering and so does RMS with&lt;BR /&gt;its local buffering. When a file is accessed,&lt;BR /&gt;if its data is present in the CRTL or RMS&lt;BR /&gt;cache, then the request will be satisfied&lt;BR /&gt;from there itself. The request won't come to&lt;BR /&gt;XFC and the XFC statistics would remain the&lt;BR /&gt;same.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Murali</description>
      <pubDate>Thu, 08 Apr 2010 06:49:04 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613499#M18097</guid>
      <dc:creator>P Muralidhar Kini</dc:creator>
      <dc:date>2010-04-08T06:49:04Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613500#M18098</link>
      <description>&amp;gt;&amp;gt; I'm assuming that even though you're in&lt;BR /&gt;&amp;gt;&amp;gt; a cluster the file is only being read in&lt;BR /&gt;&amp;gt;&amp;gt; on one node at any time?&lt;BR /&gt;&amp;gt;&amp;gt; If this isn't the case then you'll never&lt;BR /&gt;&amp;gt;&amp;gt; effectively cache it in my experience&lt;BR /&gt;&amp;gt;&amp;gt; with XFC.&lt;BR /&gt;&lt;BR /&gt;Is there any particular scenario that you&lt;BR /&gt;would like to share? Because the XFC caching&lt;BR /&gt;behavior in the cluster environment should&lt;BR /&gt;be as follows&lt;BR /&gt;&lt;BR /&gt;* Multiple Reader -&lt;BR /&gt;  XFC does cache a file when there are&lt;BR /&gt;  multiple readers of the same file.&lt;BR /&gt;  The file will be cached on all nodes in&lt;BR /&gt;  the cluster.&lt;BR /&gt;&lt;BR /&gt;* One Writer, Multiple Reader-&lt;BR /&gt;  In case there is only one writer node and&lt;BR /&gt;  multiple reader nodes, then XFC does&lt;BR /&gt;  caching for the file only on the node&lt;BR /&gt;  where the writer is present. On the nodes&lt;BR /&gt;  where readers are present the file won't&lt;BR /&gt;  be cached.&lt;BR /&gt;&lt;BR /&gt;* Multiple Reader/Writer&lt;BR /&gt;  Where there are multiple writers to a&lt;BR /&gt;  file, XFC won't cache that file cluster&lt;BR /&gt;  wide.&lt;BR /&gt;  i.e. no node will cache the file.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Murali</description>
      <pubDate>Thu, 08 Apr 2010 07:05:19 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613500#M18098</guid>
      <dc:creator>P Muralidhar Kini</dc:creator>
      <dc:date>2010-04-08T07:05:19Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613501#M18099</link>
      <description>Hoff:&lt;BR /&gt;&amp;gt;Records in an indexed file aren't necessarily adjacent, so there's no direct way to warm up a generic block cache given the current design of RMS.&lt;BR /&gt;&lt;BR /&gt;In this particular case, the index by definition is sorted by primary key order;  the actual data records are present in the file in primary key order too.&lt;BR /&gt;&lt;BR /&gt;(A bit like what I was alluding to in my response to Hein - e.g. having records physically located in random order, but with the index in primary key order (so you quickly find it in the index, but actually accessing the record may involve a "lot" of disk activity, because the records are not logically adjacent on the disk))&lt;BR /&gt;&lt;BR /&gt;Obviously, XFC can't know whether or not records in an indexed file happen to be stored adjacent to each other in order (but is this something it can guess at, or be told about??)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;It would be equally interesting to toss an upgrade or a RAM disk or an SSD&lt;BR /&gt;As is often the case, one team looks after the O/S and layered products, whereas another team looks after the primary application.&lt;BR /&gt;&lt;BR /&gt;Getting agreement to O/S-related changes is often an uphill struggle, and not something that would happen any time soon (lead time for notice of changes, getting all the approvers to approve changes, yada yada).&lt;BR /&gt;&lt;BR /&gt;Unfortunately, this is particularly the case when it can't be definitely stated how much of a difference it would make...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;When I go after RMS files from C, I use this code:&lt;BR /&gt;The person who wrote the program originally does now use RMS for accessing RMS-indexed files, but this is some of his earlier code;  ideally, it will be changed, but of course, everyone is looking for a quick fix "X is wrong;  change Y to Z and that will fix it, or at least be a workaround, whilst we 
can schedule recoding the program into the plan"...</description>
      <pubDate>Thu, 08 Apr 2010 08:06:00 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613501#M18099</guid>
      <dc:creator>Mark Corcoran</dc:creator>
      <dc:date>2010-04-08T08:06:00Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613502#M18100</link>
      <description>Murali:&lt;BR /&gt;&amp;gt;XFC will not cache IO's to a particular file in case -&lt;BR /&gt;&amp;gt;* IO's done to the file are of size greater&lt;BR /&gt;&amp;gt;  than VCC_MAX_IO_SIZE blocks.&lt;BR /&gt;It's the C RTL being requested to fgets() 225 bytes or stop when a Line Feed character is encountered, whichever comes first;  what it actually requests "behind the scenes", I don't know.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;file is present on a local RAMDISK&lt;BR /&gt;No&lt;BR /&gt;&lt;BR /&gt;&amp;gt;The file is accessed cluster wide and there is atleast one node in the cluster that is doing a write IO to the file.&lt;BR /&gt;No - a single process running on a single node in the cluster accessing it for read only.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;XFC ReadAhead -&lt;BR /&gt;&amp;gt;* XFC does read ahead for a file if the SYSGEN parameter VCC$GL_READAHEAD is set to 1.&lt;BR /&gt;A little bit of confusion here using SDA symbols and SYSGEN params, but I'm guessing you mean VCC_READAHEAD, in which case I'll just list all the VCC settings:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;$ MC SYSGEN SHOW VCC&lt;BR /&gt;Parameter Name           Current    Default     Min.      Max.     
Unit  Dynamic&lt;BR /&gt;--------------           -------    -------    -------   -------   ----  -------&lt;BR /&gt;VCC_FLAGS                       2          2         0         -1 Bitmask    &lt;BR /&gt;VCC_MAXSIZE                  6400       6400         0    3700000 Blocks     &lt;BR /&gt;VCC_MAX_CACHE                  -1         -1         0         -1 Mbytes     D&lt;BR /&gt;VCC_MAX_IO_SIZE               127        127         0         -1 Blocks     D&lt;BR /&gt;VCC_MAX_LOCKS                  -1         -1        50         -1 Locks      D&lt;BR /&gt;VCC_READAHEAD                   1          1         0          1 Boolean    D&lt;BR /&gt;VCC_WRITEBEHIND                 1          1         0          1 Boolean    D&lt;BR /&gt;VCC_WRITE_DELAY                30         30         0         -1 Seconds    D&lt;BR /&gt;VCC_PAGESIZE                    0          0         0         -1            D&lt;BR /&gt;VCC_RSVD                        0          0         0         -1            D&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;The question is why so low Hit-rate?&lt;BR /&gt;Well, actually, this is a funny thing....&lt;BR /&gt;&lt;BR /&gt;When I was running the job yesterday afternoon, after 85 mins, the hit rate was 55%.&lt;BR /&gt;&lt;BR /&gt;I ran it again this morning so that I could get the additional XFC stats from the SDA extension for you, and after 35 minutes, it was 83%.&lt;BR /&gt;&lt;BR /&gt;It looks therefore that it might quite possibly be contention for XFC resource that is causing/contributing to the problem - in heav(y|ier) system loads, the file can't be cached as quickly, either because other files need to be (part) removed from the cache to make space, or there is a delay (timeout?) 
in XFC servicing read requests for this file.&lt;BR /&gt;&lt;BR /&gt;I'm not actually sure how read requests get to the XFC, so I'm not sure whether or not such timeouts could occur...&lt;BR /&gt;&lt;BR /&gt;Do all read requests go to the XFC first of all, and do they have to wait for the XFC to say "in cache" or "not in cache" before progressing further (if it doesn't receive a response within X amount of time, does the read "just go to disk" rather than waiting for the XFC to respond?)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;My suspicion is that there is some other operation on that file (or volume on which the file resides) that is causing the contents of the file to get deposed (i.e. cleared) from the cache once in a while. &lt;BR /&gt;&lt;BR /&gt;What I was observing was that the number of pages of the file being cached was constantly increasing, but the hit rate was remaining at 0%.&lt;BR /&gt;&lt;BR /&gt;Obviously, I couldn't really tell whether or not some old pages were being dropped out of the cache as new blocks were being added (so, if allocated pages jumped from say 1000 to 1050, it could mean 50 new pages added, or it could mean 80 new pages added and 30 old pages removed).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Please provide the following information about the file -&lt;BR /&gt;&amp;gt;1) XFC statistics from SDA&lt;BR /&gt;&lt;BR /&gt;I've attached a file which shows two sets of XFC SHOW FILE /STAT from SDA - the first is 5 minutes after starting the job (when the hit rate was still 0%), and the second is from ~66mins after the job starts (hit rate=90%)&lt;BR /&gt;&lt;BR /&gt;&amp;gt;2) How big is the IO size issued by the application to the file&lt;BR /&gt;C RTL fgets() call, with a max size of 225 bytes, but like I said, I don't know what DEC C is doing under the hood...&lt;BR /&gt;&lt;BR /&gt;&amp;gt;3) Is the file accessed cluster-wide.&lt;BR /&gt;No, a single process on one node in 
the cluster doing sequential reads only - once the file has been created, it is not used by anything other than this job which post-processes it.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Hein:&lt;BR /&gt;&amp;gt; I kinda expect 0%. The real file is 6+ GB, the primary key data perhaps 5+ GB. That could well be larger than the active maximum cache on an Alpha. Now if it is the same box Mark asked about before then it is a 24-CPU Alphaserver GS1280 7/1300, running OVMS v7.3-2. So that is likely to have 48 GB or more memory, and the cache may be as high as 20 - 32 GB if not actively throttled. &lt;BR /&gt;Yup Hein, it's the same cluster.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;But other, totally unrelated activities, maybe as silly as SEAR [*...]*.log "FATAL ERROR", could flush out those, or a part of the 6GB, and cause a tremendous slowdown compared to other days/weeks.&lt;BR /&gt;Well, when the job runs, the file is not in the cache, but the number of cache pages allocated to the file increases as the job runs.&lt;BR /&gt;&lt;BR /&gt;Like I said earlier, I can't really tell whether or not XFC is dropping the pages from the start of the file, as it adds more pages as the file is read by the job.&lt;BR /&gt;&lt;BR /&gt;On the face of it, it didn't appear to be the case, so it simply seemed that XFC was either doing read-behind in comparison to the job, or (if it is possible) the read has a timer-driven AST so that if XFC hasn't responded within X time, the read goes "straight" to disk instead...&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Steve:&lt;BR /&gt;&amp;gt;I'm assuming that even though you're in a cluster the file is only being read in on one node at any time?&lt;BR /&gt;One process, on one node, exclusively sequentially reading the file, several hours after it has been created (and expunged from the cache).&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Murali:&lt;BR /&gt;&amp;gt;As Hein has pointed out, the file size 
is bigger than the XFC cache size and hence you cannot have the entire file cached.&lt;BR /&gt;&lt;BR /&gt;Having the entire file cached isn't really what we want or need - just a "window" on the bit of the file we are looking at - the file is being read sequentially, so once all of the records from a bucket are processed by the job, the job has no further interest or requirement in those records, and they could happily be expunged from the XFC.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;What is the physical memory size of the system?&lt;BR /&gt;The two A/S GS1280 7/1150 systems each have 56GB, and the GS1280 7/1300 has 48GB&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;When a file is accessed, if its data is present in the CRTL or RMS cache, then the request will be satisfied from there itself. &lt;BR /&gt;Indeed; since it seems that XFC can in fact detect that read-ahead caching is required for this file under the right circumstances (system load?), I'm wondering whether or not it might actually be a better idea simply to have a few large RMS global buffers for the file, to ensure some kind of caching, rather than have the potential of failed read hits on the XFC...</description>
      <pubDate>Thu, 08 Apr 2010 08:45:52 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613502#M18100</guid>
      <dc:creator>Mark Corcoran</dc:creator>
      <dc:date>2010-04-08T08:45:52Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613503#M18101</link>
      <description>Hein has said elsewhere that the only wrong answer for global buffers is zero, but for a file being accessed by only one process, do global buffers behave the same as having local buffers?&lt;BR /&gt;&lt;BR /&gt;Does the code specify a multi buffer count?&lt;BR /&gt;If not then it should pick up values set with SET RMS_DEFAULT so you can experiment.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Apr 2010 11:26:28 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613503#M18101</guid>
      <dc:creator>Ian Miller.</dc:creator>
      <dc:date>2010-04-08T11:26:28Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613504#M18102</link>
      <description>&lt;!--!*#--&gt;Hi Mark,&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; A little bit of confusion here using SDA&lt;BR /&gt;&amp;gt;&amp;gt; symbols and SYSGEN params, but I'm&lt;BR /&gt;&amp;gt;&amp;gt; guessing you mean VCC_READAHEAD,&lt;BR /&gt;Yes. I meant VCC_READAHEAD SYSGEN parameter.&lt;BR /&gt;(VCC$GL_READAHEAD was typo)&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; Do all read requests go to the XFC first &lt;BR /&gt;&amp;gt;&amp;gt; of all,&lt;BR /&gt;Yes in case XFC caching is enabled.&lt;BR /&gt;Application would call QIO to perform the IO&lt;BR /&gt;operation. QIO would then check if XFC is&lt;BR /&gt;enabled, if yes then it would call XFC to&lt;BR /&gt;take over the IO. In case XFC is disabled on&lt;BR /&gt;the node then QIO would not call XFC.&lt;BR /&gt;Once XFC is called, XFC would do its own set&lt;BR /&gt;of checks to determine if the IO needs to&lt;BR /&gt;skip the cache.&lt;BR /&gt;Some common scenarios in which XFC decides&lt;BR /&gt;to Skip the IO are&lt;BR /&gt;- Caching is disabled on Volume&lt;BR /&gt;  (MOUNT/NOCACHE)&lt;BR /&gt;- Caching is disabled on file&lt;BR /&gt;  (SET FILE/CACHING_ATTRIBUTE=NO_CACHING)&lt;BR /&gt;- Caching is disabled on IO&lt;BR /&gt;  (using function modifier IO$M_NOVCACHE in&lt;BR /&gt;   the QIO call)&lt;BR /&gt;- IO size is greater than VCC_MAX_IO_SIZE&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; and they have to wait for the XFC to&lt;BR /&gt;&amp;gt;&amp;gt; say "in cache" or "not in cache" before&lt;BR /&gt;&amp;gt;&amp;gt; progressing further (if it doesn't&lt;BR /&gt;&amp;gt;&amp;gt; receive a response within X amount of&lt;BR /&gt;&amp;gt;&amp;gt; time, does the read "just go to disk" &lt;BR /&gt;&amp;gt;&amp;gt; rather than waiting for the XFC to &lt;BR /&gt;&amp;gt;&amp;gt; respond?&lt;BR /&gt;&lt;BR /&gt;When XFC does a read IO to a file, it first&lt;BR /&gt;checks if the data is already available in&lt;BR /&gt;the XFC Cache. If YES then it returns the&lt;BR /&gt;data immediately. 
If not then it performs a&lt;BR /&gt;Read IO to disk. In any case, IO would&lt;BR /&gt;always go through XFC.&lt;BR /&gt;&lt;BR /&gt;However, in case the file is shared&lt;BR /&gt;cluster-wide and some other node in the&lt;BR /&gt;cluster is doing a write operation to the&lt;BR /&gt;file then XFC won't be able to get a lock&lt;BR /&gt;on the file in the desired mode. In such a&lt;BR /&gt;case, XFC will convert the read-through to&lt;BR /&gt;read-around IO. Read-around would mean, XFC&lt;BR /&gt;will make the IO skip the cache and let IO&lt;BR /&gt;happen to disk.&lt;BR /&gt;As you have mentioned that there is no other&lt;BR /&gt;node in cluster doing write IO to the file,&lt;BR /&gt;this scenario is eliminated.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; (so, if allocated pages jumped from say&lt;BR /&gt;&amp;gt;&amp;gt; 1000 to 1050, it could mean 50 new pages&lt;BR /&gt;&amp;gt;&amp;gt; added, or it could mean 80 new pages&lt;BR /&gt;&amp;gt;&amp;gt; added and 30 old pages removed).&lt;BR /&gt;Yes that's correct. 
Allocated pages only&lt;BR /&gt;indicates how much of the file's data is&lt;BR /&gt;currently in cache.&lt;BR /&gt;&lt;BR /&gt;Data that you have provided -&lt;BR /&gt;1) File is not accessed cluster-wide&lt;BR /&gt;   From this we can rule out the scenario&lt;BR /&gt;   where the file is written once in a while&lt;BR /&gt;   from some other node of the cluster&lt;BR /&gt;&lt;BR /&gt;2) IO Size issued by the application&lt;BR /&gt;   Here also the application does not seem&lt;BR /&gt;   to be doing an IO greater than&lt;BR /&gt;   VCC_MAX_IO_SIZE&lt;BR /&gt;&lt;BR /&gt;3) XFC SDA Data&lt;BR /&gt;&amp;gt;&amp;gt; XFC File stats from ~5 mins after job &lt;BR /&gt;&amp;gt;&amp;gt; starts (hit rate=0%)&lt;BR /&gt;The data here indicates that only a few IOs&lt;BR /&gt;were satisfied from the cache; for all other IOs XFC had to fetch the requested data&lt;BR /&gt;from the disk&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; XFC File stats from ~66 mins after job&lt;BR /&gt;&amp;gt;&amp;gt; starts (hit rate=90%)&lt;BR /&gt;Here we can see quite a number of reads&lt;BR /&gt;being satisfied from the cache and hence the&lt;BR /&gt;read hit rate is higher.&lt;BR /&gt;&lt;BR /&gt;My suspicion is that, initially data is not&lt;BR /&gt;there in the XFC cache and hence hit rate&lt;BR /&gt;is very low. As data gets filled in the&lt;BR /&gt;cache, subsequent IO will find the data in&lt;BR /&gt;cache and hence the read hit rate increases. 
Sometime later, a logical IO might have been&lt;BR /&gt;performed on the Volume as a result of which&lt;BR /&gt;entire data on the volume gets cleared.&lt;BR /&gt;Next set of reads to the file has to now&lt;BR /&gt;fetch the data again from the disk, this&lt;BR /&gt;would now reduce the cache hit rate.&lt;BR /&gt;&lt;BR /&gt;Some questions -&lt;BR /&gt;1) Is that every time the application runs,&lt;BR /&gt;   it gets 0 hit rate in the beginning and&lt;BR /&gt;   the hit rate increases after some point&lt;BR /&gt;   of time.&lt;BR /&gt;&lt;BR /&gt;2) When does the hit rate become 0.&lt;BR /&gt;   Only when application starts accessing&lt;BR /&gt;   the data for the first time or some other&lt;BR /&gt;   time also.&lt;BR /&gt;&lt;BR /&gt;3) Are you aware of any logical IO's being&lt;BR /&gt;   performed on that volume.&lt;BR /&gt;   If the disk is mounted cluster-wide then,&lt;BR /&gt;   are any other nodes performing any&lt;BR /&gt;   Logical IO to the volume.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt; everyone is looking for a quick fix "X is&lt;BR /&gt;&amp;gt;&amp;gt; wrong; Change Y to Z and that will fix&lt;BR /&gt;&amp;gt;&amp;gt; it, or at least be a workaround, whilst&lt;BR /&gt;&amp;gt;&amp;gt; we can schedule recoding the program into&lt;BR /&gt;&amp;gt;&amp;gt; the plan...&lt;BR /&gt;One suggestion for workaround would be to increase the XFC size -&lt;BR /&gt;The current Physical memory size is 56GB&lt;BR /&gt;(GS1280/1150) and 48GB(GS1280 7/1300).&lt;BR /&gt;You had mentioned that XFC is sized at&lt;BR /&gt;2.75GB. One suggestion would be to increase&lt;BR /&gt;the XFC size from the current 2.75GB&lt;BR /&gt;to 8GB. XFC is tested with memory sizes up&lt;BR /&gt;to 8GB and hence you can increase the current&lt;BR /&gt;size of XFC to 8GB for better performance.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Murali</description>
      <pubDate>Thu, 08 Apr 2010 11:58:43 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613504#M18102</guid>
      <dc:creator>P Muralidhar Kini</dc:creator>
      <dc:date>2010-04-08T11:58:43Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613505#M18103</link>
      <description>Ian:&lt;BR /&gt;&amp;gt;Hein has said elsewhere that the only wrong answer for global buffers is zero, but for a file being accessed by only one process, do global buffers behave the same as having local buffers?&lt;BR /&gt;&lt;BR /&gt;Mea culpa. Global buffers (as I understand them) aren't actually what I meant - I meant the buffers that SET RMS_DEFAULT /BUFFER= refers to.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Does the code specify a multi buffer count?&lt;BR /&gt;&lt;BR /&gt;Although the C RTL does allow specification of RMS options on the fopen(), it just specifies "r".&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;If not then it should pick up values set with SET RMS_DEFAULT so you can experiment.&lt;BR /&gt;&lt;BR /&gt;Great minds think alike - I was just doing some back-of-a-fag-packet calculations on what to use, and will post results back here.&lt;BR /&gt;&lt;BR /&gt;The problem of course is that between tests, you have to wait for any cached part of the file to be expunged (either through normal system load, or to force it using something like "SEA [...]*.* blah") before you can test again.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Murali:&lt;BR /&gt;&amp;gt;Caching is disabled on Volume&lt;BR /&gt;Not in this case (obviously, otherwise none of the file would appear in it :-)&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Caching is disabled on file&lt;BR /&gt;Again, not in this case ("Caching attribute:  Writethrough")&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Caching is disabled on IO&lt;BR /&gt;It is an fgets() call in the C RTL, but I wouldn't have thought that it would do any disabling.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;IO size is greater than VCC_MAX_IO_SIZE&lt;BR /&gt;Unless fgets() is doing something weird when told to read 225 bytes.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Sometime later, a logical IO might have been performed on the Volume as a result of which entire data on the volume gets cleared.&lt;BR 
/&gt;&amp;gt;Next set of reads to the file has to now fetch the data again from the disk, this would now reduce the cache hit rate.&lt;BR /&gt;&lt;BR /&gt;I'm not certain what you mean in this context by a logical IO - can you give some examples?&lt;BR /&gt;&lt;BR /&gt;I can understand that dismounting the volume (or a member of its shadow set) could cause this issue.&lt;BR /&gt;&lt;BR /&gt;However, could "SEA [...]*.filename_type blah" really do this?&lt;BR /&gt;&lt;BR /&gt;I'm not sure whether or not you mean this is a logical IO which would cause that volume's cache contents to be expunged...&lt;BR /&gt;...or if there is a per-volume limit in the XFC, and that depending on that limit (and the size of the files being SEArched), this would cause the existing volume's cache contents to be expunged, to make way for those files that SEARCH is processing?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;1) Is that every time the application runs, it gets 0 hit rate in the beginning and the hit rate increases after some point of time.&lt;BR /&gt;&lt;BR /&gt;Hmm, unfortunately, there are no statistics available from previous daily runs (unlike many of the other jobs, it actually runs at 09:00, so I don't need to log in in the middle of the night to check).&lt;BR /&gt;However, from my manual testing, this appears to be the case.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;2) When does the hit rate become 0. 
Only when application starts accessing the data for the first time or some other time also.&lt;BR /&gt;From my manual tests, it only appears to be when it starts accessing it for the first time (and where the start of the file is not in the cache) that the hit rate is 0.&lt;BR /&gt;&lt;BR /&gt;I'm not sure how the cache hit reporting code works, but the only way for the rate to drop to 0% whilst the job runs would be due to mathematical rounding...&lt;BR /&gt;&lt;BR /&gt;(there will have been some successful hits, so successes/attempts would always yield a non-zero value unless you round it down.)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;Are you aware of any logical IO's being performed on that volume.&lt;BR /&gt;&amp;gt;If the disk is mounted cluster-wide then, are any other nodes performing any Logical IO to the volume.&lt;BR /&gt;&lt;BR /&gt;It depends on what precisely you mean by logical IO.&lt;BR /&gt;&lt;BR /&gt;The volume is mounted cluster-wide.&lt;BR /&gt;&lt;BR /&gt;Normally, nothing else creates files on this volume; the only other activity may be a backup or defragger job that runs overnight, but it should be finished by 09:00 when this job runs.</description>
      <pubDate>Thu, 08 Apr 2010 13:04:30 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613505#M18103</guid>
      <dc:creator>Mark Corcoran</dc:creator>
      <dc:date>2010-04-08T13:04:30Z</dc:date>
    </item>
    <item>
      <title>Re: 0% read hit rate on XFC cache for RMS indexed file being read sequentially using C RTL fgets()?</title>
      <link>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613506#M18104</link>
      <description>If there was a job before this one which accessed lots of other files on the same disk, then the file in question is not going to be in the XFC.&lt;BR /&gt;&lt;BR /&gt;I guess your real aim is to reduce the elapsed time of this job that processes the RDB dumped tables. What has happened in previous runs is only useful in perhaps helping you to reduce the time for future runs.&lt;BR /&gt;&lt;BR /&gt;You can specify RMS options on the C fopen. &lt;BR /&gt;&lt;BR /&gt;If there are no RMS options specified now then you can experiment with &lt;BR /&gt;$ SET RMS/INDEX/BUFFER=3/BLOCK=&lt;BR /&gt;&lt;BR /&gt;I wonder about the physical layout of the file and whether CONVERT could help.&lt;BR /&gt;&lt;BR /&gt;logical I/O - different from the usual virtual I/O, which addresses a file as an array of blocks starting at 1 - logical I/O addresses a disk as an array of blocks starting at 0. Unlikely, although I wonder about the defrag job - has it finished when this job starts?</description>
      <pubDate>Thu, 08 Apr 2010 13:59:14 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-openvms/0-read-hit-rate-on-xfc-cache-for-rms-indexed-file-being-read/m-p/4613506#M18104</guid>
      <dc:creator>Ian Miller.</dc:creator>
      <dc:date>2010-04-08T13:59:14Z</dc:date>
    </item>
  </channel>
</rss>

