Operating System - Linux

TwoProc
Honored Contributor

a better way to exclude items other than a series of "grep -v" piped commands

This may be an awk question, a perl question, or a grep or sed problem...

I've got a list of file name extensions I don't want to see, and I want everything else from a "find" list.

e.g.

find . -type f | grep -v "\.xml" | grep -v "\.gif" | grep -v "\.htm" | grep -v "\.html"

...

and the list goes on for about 50 file extensions.

Can anyone please demonstrate a better way to exclude a list from a find than spawning a bazillion greps?

Note: handling it all in a single grep from the command line with something like 'grep -e -v "\.gif" -e -v "\.htm"' doesn't work.

Note2: I'd be fine with putting all of the extensions in an input file of some sort, so that part doesn't have to come from the command line itself.

Thanks in advance
We are the people our parents warned us about --Jimmy Buffett
Patrick Wallek
Honored Contributor
Solution

Re: a better way to exclude items other than a series of "grep -v" piped commands

A couple of things come to mind:

1) Do it all with find:
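# find . -type f ! \( -name "*.xml" -o -name "*.gif" -o -name "*.htm" -o -name "*.html" \)

Just keep listing your extensions inside the parentheses. The '-o' means a 'logical or', and the '!' in front of the group means NOT any of the names inside it. (Negating each name separately and joining them with '-o' would match every file, because no file has all of those extensions at once.) Do a 'man find' for more info.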

2) You could do it with find and grep:
# find . -type f | grep -v -E "\.xml|\.gif|\.htm|\.html"
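For anyone who wants to try both approaches side by side, here is a minimal sketch. It assumes a throwaway directory from mktemp and made-up file names (none of these come from the original question), and it anchors the grep pattern with '$' so only real suffixes are excluded.

```shell
#!/bin/sh
# Throwaway tree with hypothetical file names, so nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.xml" "$demo_dir/b.gif" "$demo_dir/c.htm" \
      "$demo_dir/d.html" "$demo_dir/e.txt" "$demo_dir/f.dat"

# find only: negate the whole OR group, so a file is printed only when it
# matches none of the listed names.
find_only=$(cd "$demo_dir" && find . -type f \
    ! \( -name '*.xml' -o -name '*.gif' -o -name '*.htm' -o -name '*.html' \) |
    LC_ALL=C sort)

# find plus a single grep: one extended regex with alternation, anchored
# at the end of each path.
find_grep=$(cd "$demo_dir" && find . -type f |
    grep -v -E '\.(xml|gif|htm|html)$' | LC_ALL=C sort)

printf 'find only:\n%s\n' "$find_only"
printf 'find + grep:\n%s\n' "$find_grep"

rm -rf "$demo_dir"
```

Both variables should end up holding the same two paths, ./e.txt and ./f.dat. The find-only form avoids the extra process; the grep form is easier to extend when the list gets long.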

James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi John:

Put the patterns you don't want returned in your selection in a file; something like:

# cat /tmp/excludes
.awk
.c
.htm
.html
.log
.old
.pl
.pm
.sh

Now do:

# find /tmp -type f |grep -E -v -f /tmp/excludes

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi (again) John:

Oops! In your exclusion file, if you truly want to skip files with dot (".") suffixes, escape the dot in the exclusion specification. The file contains regular expressions, and a dot signifies any character when not escaped. My example '/tmp/excludes' should have looked like:


# cat /tmp/excludes
\.awk
\.c
\.htm
\.html
\.log
\.old
\.pl
\.pm
\.sh

Regards!

...JRF...
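
To see why the escaping matters, here is a small sketch. The directory, the file names and the two pattern files are all hypothetical, picked so that one innocent file (archive.txt) contains a "c" preceded by some other character.

```shell
#!/bin/sh
# Hypothetical scratch area; nothing here refers to a real system path.
work=$(mktemp -d)
mkdir "$work/tree"
touch "$work/tree/index.html" "$work/tree/run.sh" "$work/tree/main.c" \
      "$work/tree/archive.txt" "$work/tree/notes.txt"

# Pattern file with unescaped dots: ".c" means "any character, then c".
cat > "$work/unescaped" <<'EOF'
.html
.sh
.c
EOF

# Pattern file with escaped dots: "\.c" means a literal ".c".
cat > "$work/escaped" <<'EOF'
\.html
\.sh
\.c
EOF

kept_unescaped=$(cd "$work/tree" && find . -type f |
    grep -E -v -f "$work/unescaped" | LC_ALL=C sort)
kept_escaped=$(cd "$work/tree" && find . -type f |
    grep -E -v -f "$work/escaped" | LC_ALL=C sort)

printf 'unescaped patterns keep:\n%s\n' "$kept_unescaped"
printf 'escaped patterns keep:\n%s\n' "$kept_escaped"

rm -rf "$work"
```

With the unescaped file, archive.txt is dropped because "rc" matches ".c"; with the escaped file it is kept alongside notes.txt. One more thing to watch: an empty line in the pattern file matches every input line, so grep -v would then print nothing.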

TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Patrick and James,

Thanks for the suggestions. I really like the ability to use the file with the "grep -v -E -f" command; that's exactly what I'm looking for.

Patrick - your suggestion #2 is quite cool, and I had been wondering how to do an "or" in a grep for a word list *and* use it in an exclude grep function. I had only used it (and then rarely) before for include functions, and somehow never put 2+2 together to make it work for an exclude function. Thank you.

Upon reflection, the list method is going to win the day, because I want to account for all file extensions in the intended directory, and it will go into the hundreds of named extensions.

Thanks very much for the suggestions guys,

John
Hein van den Heuvel
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands



And to get that list of file name extensions going you might want to use:

perl -le 'while (<*>) { $seen{$1}++ if m/\.([^.]+)$/ } print "\\.$_" foreach (sort keys %seen)'

Hein.
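
For readers who would rather stay in the shell, the same idea can be sketched with sed and sort. This is not Hein's script, only a rough equivalent under a few assumptions: the sample paths are made up, the extension is taken to be whatever follows the last dot of the file name, and each pattern is emitted with a trailing '$' anchor.

```shell
#!/bin/sh
# Hypothetical sample paths standing in for the output of "find . -type f".
names=$(printf '%s\n' ./a.xml ./b.xml ./dir/c.tar.gz ./README ./d.gif)

# For each path, keep only the text after the last dot of the file name
# (no further dot or slash allowed), and print it as an escaped, anchored
# regex. Paths without an extension produce no output because of -n ... p.
# sort -u then removes the duplicates.
patterns=$(printf '%s\n' "$names" |
    sed -n 's/.*\.\([^./][^./]*\)$/\\.\1$/p' |
    LC_ALL=C sort -u)

printf '%s\n' "$patterns"
```

On a real tree the first stage would be "find . -type f" instead of the printf, and the result could be redirected into the exclusion file and then trimmed by hand to the extensions that really should be skipped.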
mobidyc
Trusted Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hello,

# find . -type f | grep -v "\.xml" | grep -v "\.gif" | grep -v "\.htm" | grep -v "\.html"

it's very ugly ;)

prefer a shortcut:
# touch foo.xml foo.txt foo.htm foo.html foo.avi foo.dat

# find . -type f |egrep -v "\.xml|\.gif|\.html?"
./foo.txt
./foo.avi
./foo.dat
#

and better is:
# G_ARGS="\.xml|\.gif|\.html?"
# find . -type f |egrep -v "$G_ARGS"
./foo.txt
./foo.avi
./foo.dat
#

Regards,
Cedrick Gaillard
OldSchool
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hmm..
# cat /tmp/excludes
\.awk
\.c
\.htm
\.html
\.log
\.old
\.pl
\.pm
\.sh

You may want to "anchor" the end of line as well, especially if you want to exclude .htm files but *not* .html files.

something like:

\.htm$
\.log$
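
A quick way to check the difference, sketched with a hypothetical list of names fed straight to grep instead of a real find:

```shell
#!/bin/sh
# Hypothetical path list; on a real system this would come from find.
names=$(printf '%s\n' ./page.htm ./page.html ./app.log ./app.log.1)

# Unanchored: "\.htm" also matches inside ".html", "\.log" inside ".log.1".
unanchored=$(printf '%s\n' "$names" | grep -E -v -e '\.htm' -e '\.log')

# Anchored: only paths that end in .htm or .log are excluded.
anchored=$(printf '%s\n' "$names" | grep -E -v -e '\.htm$' -e '\.log$')

printf 'unanchored keeps: [%s]\n' "$unanchored"
printf 'anchored keeps:\n%s\n' "$anchored"
```

The unanchored patterns exclude all four names, while the anchored ones keep ./page.html and ./app.log.1.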
James R. Ferguson
Acclaimed Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

Hi (again) John:

> OldSchool: You may want to "anchor" the end of line as well, esp if you want to exclude .htm but *not* .html's

Yes, absolutely true! I'm afraid that I was in too much of a hurry to catch a favorite television show and assumed the appropriate nuances applied. ;-)

Regards!

...JRF...
TwoProc
Honored Contributor

Re: a better way to exclude items other than a series of "grep -v" piped commands

OldSchool - thanks for catching the fine point of end of line - much appreciated.

Hein -> I finished a script to do this as the initial part of the problem, but I'm intrigued by your perl script, because I've been slowly learning perl. I'm betting I will be spending quite a bit of time trying to figure out how it works, thank you. I really hate to ask - BUT... would you mind posting a little synopsis on the parts of that script and how it works, so I could learn? If you don't have the time, or it's too lengthy of a request, I certainly understand, so feel free to ignore the request. Thanks again all.