Operating System - OpenVMS

'Dir Data' - File System Caching Headache!

Edmundo T Rodriguez
Frequent Advisor

'Dir Data' - File System Caching Headache!

Hi

 

I previously brought a problem to your attention in order to get some advice, and I hope to get a solution to another one through your comments.

 

I have an Alpha ES47 (4 CPUs, 32 GB memory) system running OpenVMS 8.3, and for quite some time I have been dealing with the file system caches. I found ways to improve some of them, but at least one is causing me a headache: Dir Data (SYSGEN ACP_DIRCACHE).

 

Attached you will find a document which shows some of the stats captured since June and some of the system status output, with my calculation of how I believe it should work out.

 

I may need some enlightenment here, because I have already tuned the system a couple of times since the end of May, and the cache tuning seems to be going the other way around.

 

Please, could you provide some ideas?

 

Thank you very much.

 

11 REPLIES
Hoff
Honored Contributor

Re: 'Dir Data' - File System Caching Headache!

Please consider posting a somewhat more detailed problem statement — some information on what is wrong here; about the particular performance degradations seen, misbehavior or other symptoms, and why you're looking at XFC settings. 

 

Some idea of what's going on when the directory cache activity spikes might help, too — a BACKUP or a "crapplication" can play havoc with cache activity, for instance.   Identify XFC activity, and also look for hideously large directory files, either on the disks or created as part of local application activity.

 

Also try replicating this on OpenVMS V8.4 with more current patches, and not with software from ~eight years ago, too.

 

abrsvc
Respected Contributor

Re: 'Dir Data' - File System Caching Headache!

I'm not sure here what the problem actually is!! You have hit rates approaching 100%. Can't get much better than that. Even with the absurdly high attempt rates, the hit percentage is high. I'd be looking at what is causing these attempts rather than trying to make the cache more efficient.

To be perfectly honest, I'd suspect that the attempt rates are not correct and perhaps are more of a previously unseen problem rather than reality.

As Hoff has suggested, please describe what is running on the system when these rates are seen.

Dan
Edmundo T Rodriguez
Frequent Advisor

Re: 'Dir Data' - File System Caching Headache!

Hi Hoff/abrsvc, sorry for not explaining enough.

 

The reality is that this issue is something of a 'sequel' to, or connected with, a problem I presented in mid May, 'Performance Problem while running with 4 CPU', where most of the Monitor Mode time was (and is) reflected in 'User Mode' while there was no strong correlation with what was going on in the system.

 

After some analysis and hints by Volker, Kier, Hardly and abrsvc, we came to the conclusion that the data presented pointed to a possible bottleneck in the XFC.

 

After following that recommendation for the past two months and seeing an improvement in the overall XFC, I am still disappointed, mainly because of what 'abrsvc' describes as 'absurdly high attempt rates'; such behavior is disturbing.

 

Let me come back to the overall setup of this server: it is a 24x7x365 production system with the minimum possible time for maintenance, where we are running InterSystems Caché and GE Centricity Business (previously IDX FlowCast) with 83 databases, some of them really huge on a disk volume by themselves, plus a few disk volumes with some large directory files where files are continuously generated, read, transferred, and so on.

 

We can't migrate to 8.4, at least not yet.

 

I have been trying to reduce the fragmentation both in the Caché databases and at the disk-volume RMS level, but that is not easy when all of this takes so long and the maintenance window is too short. So I need to stay on top of this system to prevent any possible degradation, and that is what led me to keep trying to improve the overall XFC.


Hope this clarifies.

 

abrsvc
Respected Contributor

Re: 'Dir Data' - File System Caching Headache!

This may not be the right place for this discussion, as there needs to be a significant amount of investigation. User mode usually translates to "useful" work being done, as opposed to the other modes, which account for the OS overhead to accomplish that work. That being said, seeing high amounts of user mode is not, in and of itself, a concern.

If you are not getting the throughput you expect (transactions per unit time), then the executing code needs to be examined for efficiency. I don't recall the previous discussion (but I'll look it up now). Caching usually speeds events up rather than slowing them down. If the cache activity seems to slow things down, a closer examination of how I/Os are done is in order. With the high hit rates you are reporting, I don't see how you can get much better other than to avoid the I/Os altogether.

Where are you located geographically? Perhaps a site visit to gather more context is in order...

Dan
Edmundo T Rodriguez
Frequent Advisor

Re: 'Dir Data' - File System Caching Headache!

We are located in the eastern USA.

 

Now, easy: there has to be some information somebody is aware of that is connected with the main reason for the disparity between High-Hit-% and extremely High-Attempt-Rates; and I am mainly talking about the 'Dir Data' cache. The 'File Hdr' cache may be a side effect.

 

I want to clarify that I am not saying that the system is really degraded; overall it is doing OK, but it is not behaving the way I believe it should after upgrading it to 4 CPUs and 32 GB of memory and putting a bigger pagefile alone on its own disk with the pertinent performance tweaking, etc., specifically while running reporting jobs, which increase 'User Mode' dramatically.

 

We talked about and analyzed this before: frequently the Extended File Cache Read Hit-Rate goes down as low as 22%.

 

Based on what HP says,

 

 a 50% hit rate is the point at which performance gains from caching begin to exceed the overhead required to maintain the cache.

 

If that is true, and the related cache has consequently been increased and the system tuned accordingly, then

WHY is there no balancing effect on the High-Attempt-Rates?

 

To me, something is not working the way it is supposed to!

 

 

Hein van den Heuvel
Honored Contributor

Re: 'Dir Data' - File System Caching Headache!

Here is another way of looking at hit rate:

  99.99% cache hit rates (any cache!)  suggest a piss-poor application design causing it to look for the same data over and over.

  

>> disparity between High-Hit-% and extremely High-Attempt-Rates;

 

That's BS. The only thing that ties the two together is ACCESS PATTERN, not a formula.

 

The only formula is, using a helper variable miss-rate:

   Miss-Rate = (100% - Hit-Rate)

   IO-Rate = Miss-Rate * Attempt-Rate
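
As a rough illustration of that formula, here is a minimal Python sketch. The function names and the two sample workloads are made up for illustration; they are not taken from the posted statistics.

```python
def miss_rate(hit_rate_pct):
    """Fraction of cache attempts that miss, given a hit rate in percent."""
    return (100.0 - hit_rate_pct) / 100.0


def io_rate(attempt_rate, hit_rate_pct):
    """Attempts per second that fall through the cache and become real I/O."""
    return miss_rate(hit_rate_pct) * attempt_rate


# Hypothetical numbers, chosen only to show that the hit rate alone says little.
busy_but_cached = io_rate(attempt_rate=10_000, hit_rate_pct=99.9)
quiet_but_missing = io_rate(attempt_rate=100, hit_rate_pct=22.0)

print(f"99.9% hit rate at 10000 attempts/s: {busy_but_cached:.1f} I/Os per second")
print(f"22.0% hit rate at   100 attempts/s: {quiet_but_missing:.1f} I/Os per second")
```

In this made-up case the "bad" 22% cache generates more physical I/O than the "good" 99.9% one, yet neither number says whether the attempts themselves were necessary. That depends on the access pattern.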

 

>> the Extended File Cache Read Hit-Rate goes down as low as 22%

 

Possibly more BS, see opening statement.

 

Do you think 22% is good or bad? There is no telling! It can be very good indeed or very bad.

The attached log showed no information on XFC size, or access rates.

 

You indicated this was Caché, a database, presumably with a good cache, good read-ahead, and with lots of memory.

The better the DB cache works, the less it has to go to the disk, and when it goes to the disk it does so for data that either has never been touched, or was touched a long time ago.

Now if there are lots of pages in the DB cache, why is there any reason to believe that if a page is NOT there, it could possibly be in the XFC cache? (I'll give you one reason... read-ahead working differently between the two.)

High Caché internal cache hit rates would be consistent with high user mode... the data it needs is there to work with, no need for an IO... no need for system time.
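
A toy simulation can make this argument concrete. The sketch below is only an illustration under stated assumptions: both tiers are modeled as simple LRU caches (the thread does not say what replacement policy Caché or the XFC actually use), and `LRUCache`, `run_workload` and the access pattern are hypothetical names and data.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts lookup attempts and hits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.attempts = 0
        self.hits = 0

    def lookup(self, block):
        self.attempts += 1
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        return False

    def insert(self, block):
        if self.capacity == 0:
            return  # a zero-sized cache stores nothing
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


def run_workload(db_capacity, fs_capacity, accesses):
    """Send each access to the DB cache first, then the file system cache."""
    db_cache = LRUCache(db_capacity)
    fs_cache = LRUCache(fs_capacity)
    disk_reads = 0
    for block in accesses:
        if db_cache.lookup(block):
            continue  # served from the application's own cache
        if not fs_cache.lookup(block):
            disk_reads += 1  # missed both tiers, go to disk
            fs_cache.insert(block)
        db_cache.insert(block)
    return db_cache, fs_cache, disk_reads


# Hypothetical workload: 10 hot blocks read 100 times over.
accesses = list(range(10)) * 100

with_db, fs_behind_db, reads_a = run_workload(16, 16, accesses)
no_db, fs_alone, reads_b = run_workload(0, 16, accesses)

print("DB cache in front:", fs_behind_db.hits, "hits in", fs_behind_db.attempts, "FS cache attempts")
print("No DB cache:      ", fs_alone.hits, "hits in", fs_alone.attempts, "FS cache attempts")
```

In this toy model the disk sees the same number of reads either way. With a working DB cache in front, the file system cache only ever sees first-touch misses, so its hit rate collapses while the system as a whole is doing exactly what it should.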

 

Now in a cluster, a low cache rate may be very bad but easily explained.

If the dominant file access is to a cluster wide write shared file then the XFC can not help the system.

All it will do is start to build up a cache, only to walk away from it all when the other node issues a write.

 

When I tuned systems for RMS indexed file users, I always made it a point to set the right expectations, one of them being that the goal of RMS tuning, or perhaps a measure of success of that tuning, was to LOWER the XFC cache rate.

RMS can cache the data closer to the user, with less lock activity, and is cluster friendly. 

As the RMS cache and access pattern is improved, there is less to be done by the XFC. 

Seeing its hit rate, and access rate, go down is a good thing!

 

High XFC cache rates point to a stupid application which reads the same data over and over. Agreed?

 

 

Is there a T4 file to peruse? Perhaps attached to the older topic?

Have a handy pointer to the older topic? (I can probably easily find it by author)

 

Cheers,

Hein

 

Edmundo T Rodriguez
Frequent Advisor

Re: 'Dir Data' - File System Caching Headache!

The bottom line conclusion of your exposition is a little bit difficult to be applied to my problem due to a couple of facts exposed in my previous post (sorry you may have to read it)


link:

http://h30499.www3.hp.com/t5/General/Help-Performance-Problem-Can-t-find-4-CPU-are-been-loaded/m-p/6484644

 

You see, the main thing is that this type of application environment depends heavily on a big memory reservation, which most of the Caché db instance (similar to MUMPS) uses right from system startup and where all the db caching is done (all 83 db namespaces/globals have the following 'Caching attribute' enabled: No_caching, at the file level).

This is NOT a 'piss-poor application design' effect; there is no 'BS' here.

 

So it is all the rest of the application environment's processes generating I/O that really depend on (and affect) the XFC in multiple ways. All these db's are distributed across shadow sets (dual-member) in a fast SAN environment with redundant fiber connectivity.

 

The only main problem I have found building up over time is disk-volume fragmentation, where there is a lot of writing apart from the db writing, which is itself affected by multiple expansions caused by the heavy dynamic activity of users, and which eventually needs compaction.

 

Sorry, but what I perceive is that...

 

'High XFC cache rates "DOES-NOT" point to a stupid application which reads the same data over and over'

 

 Maybe I don't explain myself well enough for OpenVMS gurus!

 

Hoff
Honored Contributor

Re: 'Dir Data' - File System Caching Headache!

So in summary, the environment is running adequately for current needs, and you're off investigating potential performance problems.

 

OK.  

 

So the usual approach toward this?  Gather T4 data over a week or a month or (better) much longer, and determine the trends for the applications and the environment.  This is far beyond the cache rates, too — detecting a trend that's headed toward some I/O, CPU, memory or "wallclock processing window" limit is far more useful than any individual data point.
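
As a sketch of what "determine the trends" can mean in practice, the Python below fits a straight line to a series of collected samples and estimates how many more samples remain until a limit is reached. It assumes the values of interest have already been extracted from the T4 output into a plain list; the function names, the weekly-peak figures and the 100% limit are hypothetical.

```python
def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def samples_until_limit(values, limit):
    """Sample intervals left before the fitted line reaches the limit.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Hypothetical weekly peak CPU busy percentages, one value per week.
weekly_peak_cpu = [52.0, 55.0, 58.0, 61.0]

slope, intercept = linear_trend(weekly_peak_cpu)
weeks_left = samples_until_limit(weekly_peak_cpu, limit=100.0)
print(f"growth per week: {slope:.1f} points, weeks until 100%: {weeks_left}")
```

A straight line is the crudest possible model, and real workloads have daily and weekly cycles, but even this is more informative than a single hit-rate figure.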

 

Investigating I/O cache rates (prematurely) probably isn't the most effective approach here, unfortunately.  

 

Cache hit rates — whether high or low — are a symptom of something else that's going on here, after all.

 

You've been pointed at T4 in that previous thread.  What are your current trends?

 

As for the application portion of a typical performance investigation, that involves a detailed look at the source code, and the investment here can range from minimal to a large investment in time and effort.   Some dumb design decisions in source code can be remedied quickly and easily.  Other application design decisions can be much more intractable.   If some underlying tool — Caché or otherwise — proves to be the limit, then you'll have to check with the vendor, and check for alternative application algorithms or designs.   If you have a test environment, build the code with whatever integrated performance tools are available enabled, and look at using DECset PCA or some similar tool to profile where the application is spending most of its time.  

 

Assumptions here are often hazardous, and PCA and similar profiling can point to application code that was never even remotely considered to be a performance or CPU or I/O limit.  Surprises can happen all too often when profiling.

 

Quickest fix for many of these cases is the classic "throw hardware at the problem" — you're on an old and slow server, for instance, and your storage is very likely contemporary with that old server.   An equivalent dual-socket Itanium server is generally far faster than that old Alpha can be, and newer SAS and particularly SSD storage is very fast stuff.  You'll need to determine if your environment can be reasonably migrated, of course.

 

But until you know where your application processing time is going and what the longer-term performance trends are, the rest of the discussions I've posted here are arguably premature.  There's seldom a quick answer to a performance problem.  Not without investing time and effort in research.

 

If you want to see a systematic approach, the OpenVMS performance management manual can give you some guidelines.  That's a little old, and probably doesn't go into the depth that's really necessary here.  But it shows a general process of establishing theories and then collecting data and proving or disproving the theories.

abrsvc
Respected Contributor

Re: 'Dir Data' - File System Caching Headache!

I would agree and state more clearly that you may be focusing on the trees rather than the forest. It is not out of the realm of possibility that the workload is at or nearing the total capacity of the machine.

Also, you will only have somewhat limited ability to change the behavior of Cache. Something as simple as a database index addition or change could make a huge difference here. Please note that as the database grows, you may be hitting bottlenecks that weren't apparent before.

Dan
Volker Halle
Honored Contributor

Re: 'Dir Data' - File System Caching Headache!

Edmundo,

 

when looking at the T4 data you've posted to the other thread, I only see high Dir Data attempts between 22:58 and 23:05 on both days (19 and 20-MAY-2014). Has this changed now? Could you post current T4 data?

 

Did you try to look for big directories on your busiest disks? Use $ DIR/SIZ/SEL=SIZ=MIN=1000 disk:[*...]*.DIR

 

Volker.

Hein van den Heuvel
Honored Contributor

Re: 'Dir Data' - File System Caching Headache!

>> The bottom line conclusion of your exposition is a little bit difficult to be applied to my problem.

 

Hmm, ok. I think it applies perfectly.

 

>> due to a couple of facts exposed in my previous post (sorry you may have to read it)

I remember that now. Actually found the T4 files on my system when I tried to re-download.

There are a few scheduled jobs (every 15 minutes early in the day, and every 10 minutes during peak business hours) that seem to stress the file system some. They are probably polling a directory over and over.

This can possibly be improved with a directory AST... if there is a need to improve them.

 

>> You see, the main thing is that this type of application environment depends heavily on a big memory reservation, which most of the Caché db instance (similar to MUMPS) uses right from system startup and where all the db caching is done (all 83 db namespaces/globals have the following 'Caching attribute' enabled: No_caching, at the file level).

>> This is NOT a 'piss-poor application design' effect; there is no 'BS' here.

 

Exactly my point. In a well-written application like that you can NOT expect the (file) system to help.

Low cache rates are just an indication this plan is working. You should not expect, nor want to try, to fix that.

 

 >> The only main problem I have found building up over time is disk-volume fragmentation, where there is a lot of writing apart from the db writing, which is itself affected by multiple expansions caused by the heavy dynamic activity of users, and which eventually needs compaction.

 

Hmmm, doesn't look too bad to me.

 

>> Sorry, but what I perceive is that... 

>> 'High XFC cache rates "DOES-NOT" point to a stupid application which reads the same data over and over'

 

Well, it does, but this application, according to the T4 data, is not one of those.

 

>> Maybe I don't explain myself well enough for OpenVMS gurus!

 

You are explaining yourself fine, but you are not listening to your own explanation.

The system appears to be working fine... from a VMS perspective.

Lots of USER mode time, no lock contention, and so on, so there is little or nothing to tune from the SYSTEM perspective.

Any and all improvements will have to come from the DB (Cache) and its users.

 

What you have not explained well, or maybe I missed it, is why you are looking at the numbers you are looking at.

Those are just numbers. Numbers may be an indication that things can be improved, or may just represent work being done.

 

What are the main complaints?

What are the troubling times... the morning and afternoon camel humps from 8:30 - 17:30 when the users come in?  That (backup) batch job that really kicks in 20:40 - 22:40 (batch-dirio, shad-dirio, xfc cache-bypass-read ... and more all peak)?

 

Good luck.

Hein