
poor FTP performance with large directories

VMSRox
Advisor

poor FTP performance with large directories

Hi,

We are experiencing poor ftp server performance when accessing a directory with many (30,000+) files.

This seems to be the case regardless of ftp client (tried several different Windows GUI clients, DOS command prompt ftp, VMS ftp client, Unix ftp client). This is also the case using the VMS ftp client directly on the ftp server.

The slowness is evident just in doing a ls or dir to list the directory contents. It takes about 15 minutes for the listing to finish. It is strange because at first the directory listing is fast, until it gets somewhere in the range of 10,000 - 15,000 files, then it slows drastically until the listing is complete.

I upped the ACP_DIRCACHE setting to 3072 (previously it was set to 1712), but there didn't seem to be any effect. I also defined the TCPIP$FTP_WNDSIZ logical to 8192, again with no noticeable effect.

For the sake of comparison I set up a Pathworks share on this same directory, and using a mapped drive I can retrieve a directory listing in about 1m20s. I exported the directory via NFS, and using a remote mount I am able to get a directory listing in about 8 minutes, which is slow but still about twice as fast as FTP. I monitored via TCPDUMP but didn't notice any obvious problems. I also installed the HGFTP server and it exhibited the same problems. Is this simply a problem that is inherent in FTP with large directories? Or are there some other tuning parameters that I should try?

HP TCP/IP Services for OpenVMS Alpha Version V5.4 - ECO 5 running OpenVMS V7.3-2.

Does anyone out there use FTP on large directories such as this? What type of performance do you see? In order to avoid any changes to the application we need to ftp using this directory. The obvious solution would be to move the files of interest to a smaller directory and ftp from it, but I don't know if that will be possible.

Thank you for your help.
11 REPLIES
labadie_1
Honored Contributor

Re: poor FTP performance with large directories

I think this is not a TCP/IP or network problem. Can you check how long a basic
$ dir
takes to display the 30,000 files locally?
Can you use a search list and split the 30,000 files into 10 directories of 3,000 files each, for example? How long does a dir take to display 3,000 files?
Volker Halle
Honored Contributor

Re: poor FTP performance with large directories

VMSRox,

I happen to have a directory with 29549. files on our rx2600 on a USB disk.

FTP> LS takes about 2:45 minutes
FTP> DIR takes about 4:02 minutes

$ DIR node::dna0: takes about 2:32 minutes
$ DIR/DAT=CRE/SIZ=ALL/OWN node::dna0: takes about 4:40 minutes

Output is sent to a remote screen (-VPN-DSL-Powerterm) in all cases, so it's comparable. All tests have been done between the same pair of nodes.

CPU load increases on the FTP/DECnet server node over time during this operation. High kernel mode, caused by file system (check MONI FILE,FCP).

So DECnet and FTP remote file lookups seem to perform comparably.

Volker.
Volker Halle
Honored Contributor

Re: poor FTP performance with large directories

Local operations:

$ DIR dn0: takes about 0:36 minutes
$ DIR/DAT=CRE/SIZ=ALL/OWN dn0: takes about 3:35 minutes

Output still goes to the screen.

The major difference between local DIR and FTP/DECnet is that MONI FCP shows a File Lookup Rate of 0 in the local case. The File Lookup Rate is HIGH with the FTP/DECnet test, then suddenly drops to about 50% while the Dir Data Attempt Rate (MONI FCP) jumps into the 200000s, with a Hit Rate of 100%.

Volker.
Richard Whalen
Honored Contributor

Re: poor FTP performance with large directories

The FTP NLST command is equivalent to a plain VMS DIR, which just reads the directory file and displays the information in it (a list of file names).

The FTP LIST command is equivalent to DIR/SIZE/DATE/OWNER/PROTECTION. This requires reading the directory file and opening each file referenced by it to get the specifics about the file.
Volker Halle
Honored Contributor

Re: poor FTP performance with large directories

Local operations (sending the output to NLA0:)

$ DIR dn0:/OUT=NLA0: takes about 2 seconds
$ DIR/OWN/SIZ=ALL/DAT=CRE dn0:/OUT=NLA0: takes about 1:30 minutes

As soon as information is required to be read from the file headers on disk, the performance drops significantly. Still no File Lookup operations performed for local access.

Both DECnet and FTP do lots of File Lookup operations (see MONI FCP). This seems to be the major difference.

Volker.
Volker Halle
Honored Contributor

Re: poor FTP performance with large directories

A simple F$SEARCH loop (locally) will also take a considerable amount of time, causing File Lookups. At some point in time, Dir Data attempt rate will also jump to a very high rate (>100000/sec).

For my directory, it takes 2:35 minutes.

This seems to be a comparable operation with similar performance to what FTP server and FAL seem to be doing, less the actual transfer of the data over the net.

So this does not seem to have anything to do with FTP or DECnet. Handling of large directories seems to be the problem here...

Volker.
VMSRox
Advisor

Re: poor FTP performance with large directories

Hi,

Thank you for your replies. Here are the answers to some of the questions:

labadie: doing a local VMS dir command is very fast; it takes less than 5 secs to display all files (this is on a terminal emulator session over the network). I created some search list logicals and attempted to use them via ftp. The first search list contained just over 5,000 files and the ls command was fast. The second contained around 10,000 files, and at first was fast, but then part way through slowed considerably. It slowed at about the same point in the listing as it does when doing an ls on the full directory.

Richard: it appears that the DOS command prompt ftp uses NLST by default when issuing an ls command. At the end of an ls command, the output says "226 NLST Directory transfer complete". So even with NLST I am seeing very slow performance.

Volker: I agree with your assessment. When doing "monitor file" during the ftp listing of the large directory, the "Dir Data" attempt rate is around 900 early on, while the listing is still quite fast. Then when the listing slows down, the "Dir Data" attempt rate jumps to around 10,000! Also, the Hit % starts to drop from 99% to the low 80s. That is why I thought upping the ACP_DIRCACHE setting would help, but it didn't seem to have any impact. You seem to be getting much better performance than I am, even with about the same number of files. Would you mind comparing the attached listing of ACP settings to what you have set on your server and letting me know how they compare, or posting your ACP settings? Or maybe there is some other setting that I should be considering?

Thank you again!
Volker Halle
Honored Contributor

Re: poor FTP performance with large directories

VMSrox,

I've attached the ACP settings of our rx2600 V8.2 system for comparison. ACP_DIRCACHE is (explicitly) set to 8000 on our system, as I was testing SAMBA with large directories some time ago.

My .DIR file is 2780. blocks in size. The Dir cache hit rate is still 100%, even when the dir cache attempt rate reaches between 200000 and 240000. There are nearly no disk I/Os in the case of NLST (i.e. just DIR); if you use DIR/OWN..., you need to add the I/Os for accessing each file header. The CPU (kernel mode usage) then jumps to 70-80% (on an idle rx2600!). The CPU time is mostly spent in F11BXQP (according to SDA PCS - PC sampling).

And again, a simple F$SEARCH loop provides the SAME load and results.

My 'gut feeling' is, that - once the dir cache is filled - the XQP spends a lot of CPU time searching in the cached dir data.

I've created some nice T4 data (10 seconds sampling rate) comparing an F$SEARCH(large_dir) loop and a DIR node:: - you can clearly see the same effects. If you want to download TLVIZ from the T4 home page, I can probably post the .CSV file.

Volker.
Volker Halle
Honored Contributor

Re: poor FTP performance with large directories

This is an interesting problem and very easy to reproduce:

- write a little DCL procedure to do about 100 file lookups per second in a large directory (e.g. 29549. files in a 2780 block .DIR file in my case):

$ cnt=0
$loop:
$ file=F$SEARCH("*.*;*")
$ if file .EQS. "" THEN $ EXIT
$ WRITE SYS$OUTPUT "''cnt' - ''file'"
$ WAIT 0:0:0.01 ! wait 10 ms
$ cnt=cnt+1
$ GOTO loop

- run the procedure (on an idle system) and watch with $ MONITOR FILE,FCP,MODE

In the beginning, you'll see about 80-90 File Lookups per second and about 90 Dir Data attempts/sec, Kernel mode is low.

Then (after about 10500 files - this may vary due to filename distribution, filename size etc.), you'll see a sharp increase in kernel mode time and a huge increase in Dir Data attempts/sec (up to 120000 in my case), Dir Data Hit rate is still 100%, so all dir data is in memory, but the XQP seems to burn lots of CPU cycles to 'find' the next filename.

I assume the problem lies in the ineffectiveness of the directory index cache implementation for 'large' directories, resulting in sequential searches through the directory data.
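That hypothesis can be illustrated with a toy model (this is not VMS code, and the files-per-block figure is just a rough guess from 29549. files in a 2780-block .DIR file): if each successive lookup has to re-scan the cached directory blocks from the start instead of being steered by an index, the total number of block touches grows quadratically with directory size, matching the explosion in the Dir Data attempt rate.

```python
# Toy model of directory enumeration cost. FILES_PER_BLOCK is an
# assumption (~29549 files / 2780 blocks), not a measured VMS internal.
FILES_PER_BLOCK = 11

def total_block_touches(n_files, indexed):
    """Directory blocks examined to enumerate n_files, one lookup per file."""
    touches = 0
    for i in range(n_files):
        block_of_i = i // FILES_PER_BLOCK
        if indexed:
            touches += 1               # index points straight at the right block
        else:
            touches += block_of_i + 1  # sequential scan from block 0 each time
    return touches

for n in (10_000, 29_119):
    seq = total_block_touches(n, indexed=False)
    idx = total_block_touches(n, indexed=True)
    print(f"{n} files: sequential={seq} indexed={idx} ratio={seq / idx:.0f}")
```

With an effective index the cost is linear in the file count; with sequential re-scans it is roughly quadratic, which is consistent with the listing being fast at first and then slowing drastically.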

Volker.
VMSRox
Advisor

Re: poor FTP performance with large directories

Hi Volker,

Thanks again for the help. I've modified the acp settings on our test server and am seeing similar results to what you reported, although this box is older and slower so my times are much longer than what you reported.

With the changed ACP parameters, doing an ftp ls takes about 7m23s. Running your F$SEARCH command procedure (with the WRITE SYS$OUTPUT and the WAIT commands commented out) takes about 6m16s. This is on a directory with 29,119 files.

Running the same command procedure on a directory with 10,000 files takes 14 secs. As a test, I added an additional 1,000 files to this directory and the command procedure took 1m13s. So it took about one minute to read the additional 1,000 files. Extrapolating from the time on the 10,000 files, the command procedure should take only about 42 secs to run on the original directory of 29,119 files.
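The extrapolation above can be checked with a quick calculation (times taken from this post; the linear model is the hypothetical "constant cost per file" baseline, which the real behaviour clearly does not follow):

```python
# Measured times from this thread
files_small, time_small = 10_000, 14.0           # 10,000 files in 14 s
files_large, time_large = 29_119, 6 * 60 + 16    # 29,119 files in 6m16s = 376 s

# If cost per file were constant, 29,119 files should take about this long:
linear_estimate = time_small * files_large / files_small
print(round(linear_estimate))                    # prints 41 (close to the ~42 s above)
print(round(time_large / linear_estimate, 1))    # prints 9.2 -- actual is ~9x worse
```

A ninefold gap between the linear prediction and the measured time is what a superlinear (roughly quadratic) per-file cost would produce over this size range.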

So based on this it appears that what you stated about the XQP inefficiency with large directories (over 10,000 files) is exactly correct.

It is interesting, though, that doing a directory listing via a drive mounted using Pathworks does not exhibit this problem. The directory listing took about 1m10s, and the Dir Data attempt rate never exceeded 2,000, unlike when running the command procedure or doing an ftp listing, where the Dir Data attempt rate shot up as high as 145,000. Also, simply issuing a VMS dir command on the directory is very fast and never shows high Dir Data attempt rates. We have a different directory that has over 60,000 files in it, and the dir command has no issues there either.

So VMS dir and Pathworks must have a more efficient means of listing the dir contents than does ftp or f$search.

Thanks again for the help. I will continue to look into this problem and will post any worthy info if I find any.
VMSRox
Advisor

Re: poor FTP performance with large directories

I spoke with someone in VMS engineering and was told there is a known performance issue when VMS directory files exceed 1,000 blocks in size.

According to the engineer, directories of less than 500 blocks perform fine. Between 500 and 1,000 blocks, performance degrades as the directory grows. Over 1,000 blocks, performance degrades significantly.

Performance also depends on whether VMS accesses the directory data directly or RMS does the lookup. RMS is used if any wildcarding is done. The RMS directory cache size is 128 blocks, so larger directories cause increased I/Os. Even if the entire directory is cached (via ACP_DIRCACHE), many I/Os still occur against the cached directory, which shows up as a CPU maxed out in kernel mode.
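Some rough arithmetic on those numbers (block counts are from this thread; the real RMS behaviour is more complex, so this is only an order-of-magnitude sketch of how often a 128-block window must be refilled to cover a large .DIR file):

```python
import math

RMS_CACHE_BLOCKS = 128  # RMS directory cache size cited above

# The 500- and 1,000-block thresholds cited above, plus the 2780-block
# .DIR file from earlier in this thread:
for dir_blocks in (500, 1_000, 2_780):
    refills = math.ceil(dir_blocks / RMS_CACHE_BLOCKS)
    print(f"{dir_blocks}-block directory: {refills} cache-window refills per full pass")
```

A single wildcard pass over a 2780-block directory already needs ~22 window refills; a lookup pattern that repeatedly re-walks the directory multiplies that, even when every block is sitting in ACP_DIRCACHE.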

This is paraphrased, but I think it is captured correctly.

FTP and F$SEARCH must use RMS to get directory listings, while DIR (without wildcards) and Pathworks likely do not. That explains the difference in performance.

Thank you again for all of your assistance. You folks on the VMS forums are great!