Operating System - OpenVMS

 
Anders Wallin
New Member

Java performance issue on large directories

Hello,

I am using SVNKIT (Java Subversion client) on VMS Itanium V8.3. I have noticed that performance declines rapidly when checking out a large number of files into _one_ directory.

Checking out 2700 files to one directory took close to an hour. When splitting the files into 7 subdirectories the execution time dropped from one hour to 10 minutes.

Has anyone noticed this? Does anyone know what to do about it? I have done the usual stuff (installed the Java shareables resident, using pure ODS-5, etc.) but it doesn't really help.

Any input or suggestion is most welcome.
7 REPLIES
Hein van den Heuvel
Honored Contributor

Re: Java performance issue on large directories


10 minutes still sounds like 'for-ever'.

How long is the average file name?

Can you do something about order?

OpenVMS directories 'like' two modes of access:
1) add to end, and delete from end
2) removes and inserts ROUGHLY in the same sort order.
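
One way to act on the ordering question, sketched in Java since the client is a Java program: sort the names before creating the files, so that each new entry lands at or near the end of the directory. The class and method names are hypothetical, this is not SVNKit code, and the case-insensitive sort only approximates the order a directory keeps its entries in. Whether it helps on a given system is an assumption that would need measuring.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration; not part of SVNKit.
final class OrderedCreate {
    // Returns a copy of the names in case-insensitive order. This only
    // approximates the directory's own sort order (assumption).
    static List<String> sortedNames(Collection<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    // Creates empty files in sorted order, so each new entry is added
    // at or near the end of the directory.
    static void createAll(Path dir, Collection<String> names) {
        for (String name : sortedNames(names)) {
            try {
                Files.createFile(dir.resolve(name));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```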

You may want to MEASURE what the system is doing during those 10 minutes (or that hour)
using $MONI FCP,FILE
Better still, have T4 do the collection and replay.

You may need to ADJUST the SYSGEN ACP params for this application.
Notably: $MCR SYSGEN SHOW /ACP
ACP_MAXREAD
ACP_DIRCACHE

Hope this helps some,
Hein van den Heuvel ( at gmail dot com )
HvdH Performance Consulting




Craig A Berry
Honored Contributor

Re: Java performance issue on large directories

Hein covered the VMS-specific gotchas related to large directories (though I'm not sure 2700 files counts as all that large anymore). I think a more likely culprit would be SVNKIT loading tons of data into memory at the same time and causing excessive page faults. So check your working set usage, paging file usage, fault rates, etc. You can probably also use Java tools to trace what it's doing, as well as more traditional OpenVMS tools.
Anders Wallin
New Member

Re: Java performance issue on large directories

Well, I finally stumbled on the answer.

I am posting this for the benefit of other OpenVMS SVNKIT users.

I have had problems checking out/updating large directories, 2700 files or more. The check out time was 60 (sixty) minutes and the CPU time just as much.

After defining svnkit.symlinks=false the checkout time dropped from 60 minutes to 4 minutes.

I added the following change to the startup file "jsvnsetup.openvms".
It is important that the quoting is exactly as below to get the final jsvn command right.

....
$ OPT = """""-Dsvnkit.symlinks=false"""""
......

Anders Wallin
Hein van den Heuvel
Honored Contributor

Re: Java performance issue on large directories

Thanks for following up!
Excellent feedback.
Of course it raises the question of what extra work it does when symlinks are enabled. Might it be a porting problem? Did it matter whether the directory was on an ODS-2 or ODS-5 structure level disk, or is ODS-5 required perchance?!
Anyway.. thanks!
Hein.

John Gillings
Honored Contributor

Re: Java performance issue on large directories

Does "checking out" involve deleting directory entries? Could this be a manifestation of the large directory delete issue?

What is the size of the directory file itself, and does the operation involve deleting files (or directory entries)?

A crucible of informative mistakes
Craig A Berry
Honored Contributor

Re: Java performance issue on large directories

The technical explanation of what they are doing is here:

http://www.nabble.com/Re%3A-svnkit-cli-slow-on-large-dirs-%28many-files%29-p12255769.html

Basically the JDK does not have a way to check whether a file is a symlink, so they do some evil hack to figure it out. What it *sounds* like they are doing is spawning an ls command and rooting through the output. I think the polite thing to call such an approach is "architecturally challenged."

In C you would just call lstat() and check the st_mode field with the S_ISLNK macro. Seems like a few lines of JNI code would do the trick, though then they'd have platform-specific bits to maintain and distribute.

Turning off symlink support does seem like the right thing to do here.
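
For comparison, JDKs from Java 7 onward include a direct check in java.nio.file, which avoids spawning an external command. The sketch below assumes such a JDK is available on the platform, which may not hold for the Java version used in this thread. The class name is hypothetical and this is not how SVNKIT does it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; requires Java 7 or later.
final class SymlinkCheck {
    // True only if the path itself is a symbolic link (the link is not
    // followed). Regular files, directories, and missing paths give false.
    static boolean isSymlink(Path p) {
        return Files.isSymbolicLink(p);
    }

    public static void main(String[] args) {
        // Reports the result for each path given on the command line.
        for (String arg : args) {
            System.out.println(arg + " -> " + isSymlink(Paths.get(arg)));
        }
    }
}
```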
Dennis Handly
Acclaimed Contributor

Re: Java performance issue on large directories

>Craig: Basically the JDK does not have a way to check whether a file is a symlink ...
>In C you would just call lstat

On HP-UX, I see calls to readlink(2).