Operating System - OpenVMS

How to improve PIPE performance

 
Peter T Jackson
New Member

How to improve PIPE performance

I have a 700,000-block text file, zipped to 90,000 blocks.
Unzipping takes 11 minutes.
Searching takes 6 minutes.
Using PIPE to send the output of the unzip directly to the zip and avoid using a temporary file takes 50 minutes.

Is there any way to reduce the overhead of using PIPE?
Craig A Berry
Honored Contributor

Re: How to improve PIPE performance

I assume you mean "directly to the search" rather than "directly to the zip"? In other words, you only care about the search results and not the file itself? In any case I'm not sure why creating a temporary file is considered a bad thing, and clearly it's faster in your case.

I don't know of any way you can influence PIPE performance directly, though you might want to look at

$ help pipe description Improving_Subprocess_Performance

I doubt the recommendations there will make much difference in your case since I suspect the overhead comes from interprocess communication rather than from the initial spawning of a subprocess.

If you have a *lot* of memory, you could try unzipping to a RAM disk and doing the search there.

If the situation merits custom programming, you could start with the unzip sources and write something that searches unzipped output on the fly.
Martin P.J. Zinser
Honored Contributor

Re: How to improve PIPE performance

Hello,

assuming you are at 7.3-2 you might want to look at

http://h71000.www7.hp.com/doc/732FINAL/5763/5763pro_046.html

Although this discusses the C RTL pipe() function, the DCL PIPE command is most probably implemented in C, so setting the logicals might affect it too.

Defining DECC$PIPE_BUFFER_SIZE to 65535 might not be a bad value.

And yes, I know there is at least one person here in the forum who is much more competent to comment on this ;-)

Greetings, Martin
Craig A Berry
Honored Contributor
Solution

Re: How to improve PIPE performance

Martin,

Unless this changed recently, DCL's PIPE command uses the undocumented pipe driver and bears no relation to the CRTL's pipe() function. Both do interprocess communication, but I don't know if there's any way to adjust buffer sizes for the pipe driver.
Craig A Berry
Honored Contributor

Re: How to improve PIPE performance

For a comparison of the pipe driver with the CRTL pipe from the guy who wrote the pipe driver (MPA0:), see

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=3B30F1EA.17FB4BAE%40compaq.com.doom
Peter T Jackson
New Member

Re: How to improve PIPE performance

Thanks for the answers.

Yes, I meant "directly to search".
The files can be much larger than the one I used for my testing. The largest I have seen was over 11 Gbytes unzipped. Temporary files that large are a problem.
The system that generated that large a file had lots of memory but it was in use.

The procedure is written in DCL so that it can be easily distributed and so that security conscious customers can check that it is not a risk.
Often it will be run on the text file directly. The option to handle zipped files is there to avoid having to find the disk space to unzip them after receiving the zipped version over the internet, and so to reduce the size of the collection of files I use for testing. PIPE is simpler than trying to automate the handling of a temporary file.

I looked at the suggestions in help and played around with set RMS before asking here.

Pete
Brad McCusker
Respected Contributor

Re: How to improve PIPE performance

As others have said, DCL pipe is not implemented via C RTL pipe().

To help complete the circle of information in this thread: the C RTL pipe implementation is based on mailboxes, and the logicals DECC$PIPE_BUFFER_SIZE and DECC$PIPE_BUFFER_QUOTA translate directly to the corresponding parameters of the SYS$CREMBX call.

Brad

Brad McCusker
Software Concepts International
Craig A Berry
Honored Contributor

Re: How to improve PIPE performance

I think the only options available to you, Peter, work against your requirement that it be something available out of the box in DCL.

If that requirement can be lifted, there are all sorts of PC utilities that search within zip archives, and it might be possible to find an open source one and port it.

I also found a Perl-based solution. There's no way to know whether it's faster than PIPE without trying it in your environment, but it might be worth a look. If you want to try it you'll have to have Perl and install the following extensions:

Compress::Zlib
Archive::Zip

These can be obtained from http://search.cpan.org. They have some problems building on VMS, but it can be done. I can probably walk you through it if you're interested.

Archive::Zip comes with a sample script that does exactly what you want, i.e., searches the contents of a zip archive. The following example does a case-insensitive search for "fall" in a zipped version of SYS$MANAGER:SYLOGIN.TEMPLATE.

$ zip archive.zip sys$manager:sylogin.template
adding: [.SYSMGR]SYLOGIN.TEMPLATE (deflated 61%)
$ perl zipGrep.pl "(?i:fall)" archive.zip
sysmgr/sylogin.template:$! process logins. Each section falls through into the next section,
sysmgr/sylogin.template:$! the "Batch" section. (Note that all "Interactive" users will "fall
sysmgr/sylogin.template:$! then fall through again into the other sections.)
sysmgr/sylogin.template:$! Fall through...
sysmgr/sylogin.template:$! Fall through...
sysmgr/sylogin.template:$! Fall through...
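
For anyone who wants to see the idea without building the Perl extensions, here is a rough sketch of the same technique (search archive members as they are decompressed, with no pipe and no temporary file). It is not zipGrep.pl; it is a Python illustration using only the standard zipfile module. The function name zip_grep and the latin-1 decoding are my assumptions, and I have not timed it against PIPE.

```python
import io
import re
import zipfile


def zip_grep(archive, pattern):
    """Yield (member_name, line) for each line in the archive that matches.

    zip_grep is a hypothetical helper name, not part of any library.
    """
    regex = re.compile(pattern)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # zf.open() decompresses incrementally as the stream is read,
            # so the member is never written to disk in unzipped form.
            with zf.open(info) as raw:
                # Assumption: members are text; latin-1 never fails to decode.
                text = io.TextIOWrapper(raw, encoding="latin-1", newline=None)
                for line in text:
                    if regex.search(line):
                        yield info.filename, line.rstrip("\n")
```

Calling zip_grep("archive.zip", "(?i:fall)") and printing each result as name:line should give output in the same shape as the Perl example above. Whether it beats the 50 minutes seen with PIPE would have to be measured on the real files.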
Martin P.J. Zinser
Honored Contributor

Re: How to improve PIPE performance

Hello Craig,

which opens up the question: Do we do our own pipe() in Perl for recent versions or is it the C-RTL one if and when available ;-)

Happy new year,

Martin
Craig A Berry
Honored Contributor

Re: How to improve PIPE performance

Just to be clear, for Peter's sake, Martin's question is a tangent. I suggested Archive::Zip because it doesn't use pipes at all, not DCL's, not the CRTL's, and not Perl's home-grown ones.

It's an interesting tangent, though. Perl since 5.6 uses a homegrown pipe implementation because the one in the CRTL is so prone to hangs and deadlocks. I believe this was still true as of v7.3-1, but I should probably give it a whirl again. I have a goal of making this configurable so you can choose which pipe implementation you want, but I haven't done it yet.