
a better way to exclude items other than a series of "grep -v" piped commands

 
TwoProc
Honored Contributor


This may be an awk question, a perl question, or a grep or sed problem...

I've got a list of file name extensions I don't want to see, and I want everything else from a "find" list.

e.g.

find . -type f | grep -v "\.xml" | grep -v "\.gif" | grep -v "\.htm" | grep -v "\.html"

...

and the list goes on for about 50 file extensions.

Can anyone please demonstrate a better way to exclude a list from a find than spawning a bazillion greps?

Note: handling it all in a single grep from the command line with something like 'grep -e -v "\.gif" -e -v "\.htm"' doesn't work.

Note2: I'd be fine with putting all of the extensions in an input file of some sort, so that part doesn't have to come from the command line itself.

Thanks in advance
We are the people our parents warned us about --Jimmy Buffett
Patrick Wallek
Honored Contributor
Solution

Re: a better way to exclude items other than a series of "grep -v" piped commands

A couple of things come to mind:

1) Do it all with find:
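# find . -type f ! \( -name "*.xml" -o -name "*.gif" -o -name "*.htm" -o -name "*.html" \)

Just keep listing your extensions inside the parentheses. The '-o' means a 'logical or', and the '!' in front of the group negates it, so a file is listed only if its name matches none of the patterns. (Negating each name separately and joining the tests with '-o' would list every file, since no name ends in all of the extensions at once.) Do a 'man find' for more info.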

2) You could do it with find and grep:
# find . -type f | grep -v -E "\.xml|\.gif|\.htm|\.html"
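
A minimal sketch comparing the two approaches on a scratch directory. The file names are made up for illustration, and `mktemp -d` is an assumption: it exists on Linux but may be missing or behave differently on HP-UX.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files (names are made up).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.xml b.gif c.htm d.html e.txt f.dat

# 1) find only: negate one OR-group of unwanted names.
by_find=$(find . -type f ! \( -name "*.xml" -o -name "*.gif" \
    -o -name "*.htm" -o -name "*.html" \) | sort)

# 2) find plus a single grep: alternation, anchored at end of line.
by_grep=$(find . -type f | grep -v -E '\.(xml|gif|htm|html)$' | sort)

printf 'find only:\n%s\n' "$by_find"
printf 'find + grep:\n%s\n' "$by_grep"

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

Both variables should hold the same two names, ./e.txt and ./f.dat.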

James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi John:

Put the patterns you don't want returned in your selection in a file; something like:

# cat /tmp/excludes
.awk
.c
.htm
.html
.log
.old
.pl
.pm
.sh

Now do:

# find /tmp -type f |grep -E -v -f /tmp/excludes

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi (again) John:

Oops! In your exclusion file, if you truly want to skip files with dot (".") suffixes, escape the dot in the exclusion specification. The file contains regular expressions and a dot signifies any character when not escaped. My example '/tmp/excludes' should have looked like:


# cat /tmp/excludes
\.awk
\.c
\.htm
\.html
\.log
\.old
\.pl
\.pm
\.sh
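
A small sketch of why the escape matters. The file names are hypothetical stand-ins for find output, and `mktemp` is assumed to be available.

```shell
#!/bin/sh
# Hypothetical file names standing in for the output of find.
list=$(printf '%s\n' ./prog.c ./music ./notes.txt)

# Two one-line pattern files: unescaped dot and escaped dot.
loose_file=$(mktemp)
strict_file=$(mktemp)
printf '%s\n' '.c'  > "$loose_file"
printf '%s\n' '\.c' > "$strict_file"

# ".c" is "any character, then c", so it also hits the "ic" in music.
loose=$(printf '%s\n' "$list" | grep -E -v -f "$loose_file")

# "\.c" is "a literal dot, then c", so music survives.
strict=$(printf '%s\n' "$list" | grep -E -v -f "$strict_file")

printf 'unescaped keeps:\n%s\n' "$loose"
printf 'escaped keeps:\n%s\n' "$strict"
rm -f "$loose_file" "$strict_file"
```

With the unescaped pattern only ./notes.txt is kept; with the escaped one ./music is kept as well.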

Regards!

...JRF...

TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Patrick and James,

Thanks for the suggestions. I really like the ability to use the file with the "grep -v -E -f" command; that's exactly what I'm looking for.

Patrick - your suggestion #2 is quite cool, and I had BEEN wondering how to do an "or" in a grep for a word list *and* use it in an exclude grep function. I had only used it (and then rarely) before for include functions, and somehow never put 2+2 together to make it work for an exclude function. Thank you.

Upon reflection, the list method is going to win the day, because I want to account for all file extensions in the intended directory, and it will go into the hundreds of named extensions.

Thanks very much for the suggestions guys,

John
We are the people our parents warned us about --Jimmy Buffett
Hein van den Heuvel
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands



And to get that list of file name extensions going you might want to use:

perl -le 'while (<*>) { m/\.(.*)/; $seen{$1}++} print "\\.$_" foreach (sort keys %seen)'

Hein.
mobidyc
Trusted Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hello,

# find . -type f | grep -v "\.xml" | grep -v "\.gif" | grep -v "\.htm" | grep -v "\.html"

it's very ugly ;)

prefer a shortcut:
# touch foo.xml foo.txt foo.htm foo.html foo.avi foo.dat

# find . -type f |egrep -v "\.xml|\.gif|\.html?"
./foo.txt
./foo.avi
./foo.dat
#

and better is:
# G_ARGS="\.xml|\.gif|\.html?"
# find . -type f |egrep -v "$G_ARGS"
./foo.txt
./foo.avi
./foo.dat
#
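
If the pattern lives in a variable anyway, it can be assembled from a plain word list. This is only a sketch: the `exts` list and the sample file names are made up, and the result is anchored with `$` so that only trailing extensions match.

```shell
#!/bin/sh
# Hypothetical extension list; edit to taste.
exts="xml gif htm html"

# Join the words with "|" and wrap them as \.(a|b|c)$ for grep -E.
# $exts is left unquoted on purpose so that it splits into words.
alternation=$(printf '%s\n' $exts | paste -s -d '|' -)
G_ARGS="\\.($alternation)\$"
printf '%s\n' "$G_ARGS"

# Hypothetical file names standing in for the output of find.
kept=$(printf '%s\n' ./foo.xml ./foo.txt ./foo.htm ./foo.html ./foo.avi |
    grep -E -v "$G_ARGS")
printf '%s\n' "$kept"
```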

Regards,
Cedrick Gaillard
Best regards, Cedrick Gaillard
OldSchool
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hmm..
# cat /tmp/excludes
\.awk
\.c
\.htm
\.html
\.log
\.old
\.pl
\.pm
\.sh

You may want to "anchor" the end of line as well, esp if you want to exclude .htm but *not* .html's

something like:

\.htm$
\.log$
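
A quick sketch of the difference, using made-up file names in place of find output:

```shell
#!/bin/sh
# Hypothetical file names standing in for the output of find.
list=$(printf '%s\n' ./a.htm ./b.html ./c.log ./d.logic)

# Unanchored: "\.htm" also matches inside ".html", "\.log" inside ".logic".
unanchored=$(printf '%s\n' "$list" | grep -v -e '\.htm' -e '\.log')

# Anchored: "$" ties each pattern to the end of the name.
anchored=$(printf '%s\n' "$list" | grep -v -e '\.htm$' -e '\.log$')

printf 'unanchored keeps: [%s]\n' "$unanchored"
printf 'anchored keeps:\n%s\n' "$anchored"
```

The unanchored patterns drop all four names; the anchored ones keep ./b.html and ./d.logic.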
James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi (again) John:

> OldSchool: You may want to "anchor" the end of line as well, esp if you want to exclude .htm but *not* .html's

Yes, absolutely true! I'm afraid that I was in too much of a hurry to catch a favorite television show and assumed the appropriate nuances applied. ;-)

Regards!

...JRF...
TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

OldSchool - thanks for catching the fine point of end of line - much appreciated.

Hein -> I finished a script to do this as the initial part of the problem, but I'm intrigued by your perl script - because I've been slowly learning perl. I'm betting I will be spending quite a bit of time trying to figure out how it works, thank you. I really hate to ask - BUT... would you mind posting a little synopsis on the parts of that script and how it works, so I could learn? If you don't have the time, or it's too lengthy a request, I certainly understand, so feel free to ignore the request. Thanks again all.
We are the people our parents warned us about --Jimmy Buffett
Hein van den Heuvel
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

John,

It's actually no problem as I had it already written up two weeks ago.
And it is not perfect....

http://dcl.openvms.org/stories.php?story=07/03/22/8049141

There are more differences with the VMS version that I did not take care of.
- VMS needs to special case ";" as file version separator
- and VMS typically is case-blind for filenames (Unless ODS-5 is used)
- VMS is strict about extensions allowing just 1 dot in the file name (unless ODS-5).

=> The ; provided a 'right side' anchor for the regexp. Need to use $ for "eol" under Unix.
=> Need to allow for multiple dots.

This gives:

$ perl -le 'while (<*>) {$seen{$1}++ if /\.([^.]*$)/} print "\\.$_" foreach (sort keys %seen)'

Explanation:

-l = print new line with each print

-e = program text to follow

while (<*>) { = loop over 'globbed' list of files, putting filename in automatic variable $_

$seen{$1}++ = Increment (creating it if needed) an associative array element whose key is the last match, $1

if /\.([^.]*$)/ = Only do the aforementioned on a match: $1 captures the run of 'any non dot' characters after a period and before end of line. The first period is escaped with a backslash to make it a literal dot, not a wildcard

} = loop end

print "\\.$_" = print a backslash, a dot, and the default variable $_

foreach (sort = loop over sorted array from...

keys %seen = all the keys for associative array 'seen', stashed in default variable $_ one at a time.
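
For comparison, roughly the same list can be produced without Perl. This is a swapped-in sed/sort sketch, not Hein's method; the scratch directory and file names are made up, and `mktemp -d` is assumed to be available.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files (names are made up).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.tar.gz b.gz c.txt README

# For every name containing a dot, keep only what follows the last dot
# and print it with an escaped dot in front; sort -u removes repeats.
patterns=$(ls | sed -n 's/^.*\.\([^.]*\)$/\\.\1/p' | sort -u)
printf '%s\n' "$patterns"

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

Names without a dot, such as README, produce no pattern. Changing the sed replacement to `\\.\1$` would also anchor each pattern at end of line.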

Regards, Hein van den Heuvel

TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hein, thank you very much for the posting, now I've got something to go to the books with and review. That was graciously posted, thank you.

I was laughing to myself looking over the code aspects as you've written them; because had I written them in Perl, they would have come out being a) MUCH longer, and b) looking like someone who is used to writing in both C and ksh "wrote a combination C/ksh script in perl".
:-)
We are the people our parents warned us about --Jimmy Buffett
Dennis Handly
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

>handling it all in a single grep from the command line ...

Your only problem was you only need one -v and the -e must be right before each pattern.
And if you didn't want to quote those ".", use fgrep.

>Patrick: # find . -type f | grep -v -E "\.xml|\.gif|\.htm|\.html"

There is no reason to use the egrep hammer.
Just use -f for that file solution. Or use multiple -e:
grep -v -e "\.xml$" -e "\.gif$" ...
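
A sketch of the fgrep idea, written as `grep -F` (the POSIX spelling of fgrep). The file names are made up. One limitation to keep in mind: fixed strings cannot be anchored, because `$` would be taken literally.

```shell
#!/bin/sh
# Hypothetical file names standing in for the output of find.
list=$(printf '%s\n' ./a.xml ./b.gif ./c.txt ./xmlnotes)

# With -F every pattern is a fixed string, so "." needs no backslash.
kept=$(printf '%s\n' "$list" | grep -F -v -e '.xml' -e '.gif')
printf '%s\n' "$kept"
```

./xmlnotes survives because the literal string ".xml" does not occur in it; as an unescaped regular expression, ".xml" would have matched its "/xml".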
TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Dennis,

thanks for your response - however a bit of clarification here is needed. I said that the "grep -v -e" doesn't work, and it doesn't, at least in HPUX 11i (might work in the newer ones).

So, while the "grep -v -e" in repetition works fine in Linux, it does not on HPUX:

example:

$ cat > test
ehlllo
goodbye
bye
hello
no
yes
sayit
say
yell
scream
ice cream
ice
cream

$ cat test | grep -v -e "test" -v -e "yell" -v -e "ice"
ehlllo
goodbye
bye
hello
no
yes
sayit
say
yell
scream
ice cream
ice
cream

Notice that no lines are missing.

HOWEVER, on Linux the above test works as expected:

$ cat test | grep -v -e "test" -v -e "yell" -v -e "ice"
ehlllo
goodbye
bye
hello
no
yes
sayit
say
scream
cream

Which is why I put the question in the HPUX forum...
We are the people our parents warned us about --Jimmy Buffett
James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi John:

The '-v' switch needs to occur *once*:

# grep -v -e "test" -e "yell" -e "ice" file

By the way, you can skip the extra process (the 'cat') and let 'grep' open the file(s) specified as its argument(s) as above.
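
The same test data from earlier in the thread, run with a single '-v'. This is only a sketch: the data is fed through a variable instead of a file named 'test'.

```shell
#!/bin/sh
# The sample lines from the earlier post.
data=$(printf '%s\n' ehlllo goodbye bye hello no yes sayit say \
    yell scream 'ice cream' ice cream)

# One -v, then one -e in front of each pattern.
kept=$(printf '%s\n' "$data" | grep -v -e "test" -e "yell" -e "ice")
printf '%s\n' "$kept"
```

The three lines containing "yell" or "ice" are dropped and the other ten remain, matching the Linux output shown earlier.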

Regards!

...JRF...
OldSchool
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

"Note: handling it all in a single grep from the command line in the following fashion with something like "grep -e -v "\.gif" -e -v "\.htm" doesn't work."

Actually, as noted above, it should be:

grep -v -e "\.gif" -e "\.html" .....

w/o repeating "-v".

Unfortunately, Linux isn't Unix, it's a work-alike, developed from observed behaviour / documentation of Unix + "enhancements"
TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Well then,
it looks like the Linux version has figured out how to ignore the repeating series of "-v" to make it work...

thanks all for the clarification,
and I see that re-reading Dennis' post, he didn't repeat the "-v" over and over again.

Sorry Dennis, wish they had a do-over button on the points.
We are the people our parents warned us about --Jimmy Buffett
Dennis Handly
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

>I see that re-reading Dennis' post, he didn't repeat the "-v" over and over again.

I would assume you could repeat it and it would work, but I'm lazy. You just can't put the -v after the -e.

Ah, you're right, there is a bug in grep. They just increment vflag then they do bit stuff on it.

>Sorry Dennis, wish they had a do-over button on the points.

Ok, you can add the rest here. :-)
So you don't feel short changed, I filed a bug report on it:
CR JAGag37626:
Multiple -v in grep cause all to be ignored