Operating System - OpenVMS

Re: search file and size

 
sleone
New Member

search file and size

Hi,

I've got a lot of files, and I need to select the ones matching a given pattern while skipping any file whose size is bigger than a certain value.

I use the following search command:

search/win=0 *.msg;* /out=file_list.lis

and file_list.lis then contains, one per line, the absolute paths of the matching files.

But I don't know how to keep all the files bigger than 10 MB off the list.

Any idea?

Regards,
Salvatore
6 REPLIES
Hein van den Heuvel
Honored Contributor

Re: search file and size

You should turn that around.
Find the smaller files first, then search.
It would be wasteful to search through lots and lots of megabytes to find a hit (or not) and then discard that work.

Hein
Volker Halle
Honored Contributor

Re: search file and size

Salvatore,

it may not be possible to do what you want with one single standard OpenVMS command, but you could easily write a DCL procedure:

DIR/SEL=SIZ=MAX=20480/OUT=x.x/COL=1 *.msg;*

DEFINE SYS$OUTPUT file_list.lis

Open and read x.x and do a SEARCH 'file.msg'

for all files found in file x.x

DEASSIGN SYS$OUTPUT
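
Spelled out as a command procedure, the outline above might look roughly like this. It is only a sketch under a few assumptions: the search string is passed as P1, the size limit is given in 512-byte blocks (10 MB = 20480 blocks), and /NOHEADING/NOTRAILING is used instead of /COL=1 so that x.x holds nothing but full file specifications. The label and logical names are made up for illustration.

```dcl
$! Sketch only: P1 = string to search for (assumed calling convention)
$ max_blocks = (10 * 1024 * 1024) / 512        ! 10 MB in 512-byte blocks = 20480
$ DIRECTORY/SELECT=SIZE=MAXIMUM='max_blocks'/NOHEADING/NOTRAILING/OUTPUT=x.x *.msg;*
$ OPEN/READ list x.x                           ! one full file spec per record
$ DEFINE SYS$OUTPUT file_list.lis              ! collect the SEARCH output in the list file
$loop:
$ READ/END_OF_FILE=done list filename
$ SEARCH/WINDOW=0/NOWARNINGS 'filename' "''p1'"   ! /WINDOW=0 prints only the file name on a hit
$ GOTO loop
$done:
$ DEASSIGN SYS$OUTPUT
$ CLOSE list
$ DELETE/NOLOG x.x;*                           ! remove the temporary directory listing
```

While SYS$OUTPUT is redirected, any message written inside the loop also ends up in file_list.lis, and the case where DIRECTORY finds no files at all is not handled here.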

Volker.
Joseph Huber_1
Honored Contributor

Re: search file and size

Well, unfortunately SEARCH doesn't have the /SELECT=(SIZE=MAXIMUM:20000) option that DIRECTORY has (or does it nowadays?).
One of the "do this on a list of files" tools, namely EACH.COM, works on the directory output like this:

@EACH *.msg;/select=(size=maximum:20000) -
"pipe search/win=0 'S' && write lis S"
where lis is the listing file opened by DCL before the command (and closed afterwards); alternatively, replace the write statement with a command file that writes the argument (S) to the desired list file.

Get EACH.COM from the freeware disks or from my server:

http://wwwvms.mpp.mpg.de/vms$common/com/each.com
http://www.mpp.mpg.de/~huber
sleone
New Member

Re: search file and size

Thank you!

I will try your useful suggestions!
Hein van den Heuvel
Honored Contributor

Re: search file and size

Given your first approach, you could post-filter the search output with Perl or DCL.

I used a bunch of .COB files as a test, looking for the string 'test' with the size cutoff at 2000 bytes = 4 blocks (approx). It all worked (for me :-)

$ pipe sea *.cob test/win=0/nowarn | perl -lne "print if (-s) <= 2000" > file_list.lis

Or with a command file
------- tmp.com ----
$loop:
$read/end=done sys$pipe filename
$if f$file(filename,"EOF").le.4 then write sys$output filename
$goto loop
$done:
--------

$ pipe sea *.cob test/win=0/nowarn | @tmp/out=file_list.lis


Now let's turn the roles around:

------- tmp.com ------------
$loop:
$read/end=done sys$pipe filename
$searc /win=0/nowarn 'filename' 'p1'
$goto loop
$done:
-------------

$ pipe dir/sele=siz=max=4/nohead/notrai *.cob; | @tmp test


Myself, I use perl 'one-liners' for jobs like this:

$ perl -le "for $f (<*.cob>){ next if -s $f > 5000; open (F, ""<$f""); while (<F>) { if (/test/i) {print $f; last}}}"

As a Perl 'script', that looks like:

for $f (<*.cob>) {               # glob over the selected files
    next if -s $f > 5000;        # skip files above the size limit (in bytes)
    open (F, "<$f") or next;     # open file for read on handle F
    while (<F>) {                # loop through the lines of file F
        if (/test/i) {           # match desired string?
            print "$f\n";        # print the file name once
            last;                # stop at the first match
        }
    }
    close (F);
}

Enjoy,
Hein
John Gillings
Honored Contributor

Re: search file and size

Salvatore,
One other improvement...

search/win=0 *.msg;* /out=file_list.lis

Since you're not looking at the actual matches, once you've found one there's no need to continue searching the file, so add /LIMIT=1.
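
Applied to the command from the question, that would be something like the line below. The quoted "pattern" is only a placeholder, since the search string isn't shown in the original post.

```dcl
$! "pattern" is a placeholder for the actual search string
$ SEARCH/WINDOW=0/LIMIT=1 *.msg;* "pattern" /OUTPUT=file_list.lis
```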

Simple example: searching a bunch of command procedures for "!". Statistics without /LIMIT=1

Files searched:            142    Buffered I/O count:          546
Records searched:        30227    Direct I/O count:            144
Characters searched:    846068    Page faults:                  28
Records matched:          3198    Elapsed CPU time:  0 00:00:00.11
Lines printed:             115    Elapsed time:      0 00:00:00.13


and with /LIMIT=1

Files searched:            142    Buffered I/O count:          546
Records searched:         2409    Direct I/O count:            144
Characters searched:     75626    Page faults:                  28
Records matched:           115    Elapsed CPU time:  0 00:00:00.06
Lines printed:             115    Elapsed time:      0 00:00:00.07

Same list of files output, but half the time!

Note - if your 10 MB files are likely to have your pattern somewhere near the start, you may not need to exclude them.
A crucible of informative mistakes