shell script problem

 
Landen
Occasional Contributor

I was wondering how to write a script that accepts a pattern and a filename as arguments and then counts the number of occurrences of the pattern in the file. (The pattern may occur more than once on a line and consists only of alphanumeric characters and underscores.)

Any assistance would be appreciated!!

Arrivederci ...
8 REPLIES
Kevin Wright
Honored Contributor

Re: shell script problem

You could do something like:
grep "$1" "$2" | wc -l
Charles McCary
Valued Contributor

Re: shell script problem

grep -c pattern filename
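
Note that grep -c reports the number of matching lines, so a line that contains the pattern twice is still counted once. Here is a minimal sketch of the difference, assuming a grep that supports the -o option (GNU and BSD grep do; POSIX does not require it, and an older HP-UX grep may lack it). The function names are made up for illustration:

```shell
#!/bin/sh
# count_lines PATTERN FILE
# Number of lines containing PATTERN (this is what grep -c reports).
count_lines() {
    grep -c -- "$1" "$2"
}

# count_occurrences PATTERN FILE
# Number of individual matches. ASSUMPTION: grep supports -o, which
# prints each match on its own line so that wc -l can count them.
count_occurrences() {
    grep -o -- "$1" "$2" | wc -l | tr -d ' '
}
```

For a file holding the three lines "foo bar foo", "baz" and "foo", count_lines foo reports 2 while count_occurrences foo reports 3.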
Sachin Patel
Honored Contributor

Re: shell script problem

Hi,
Do you want to write it in Perl?


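#!/usr/local/bin/perl
$filename=shift; #first argument is filename
$pattern=shift; #second argument is pattern

open (FILE,"$filename") || die "can't open $filename: $!";

$count=0;
while (<FILE>) #read the file one line at a time
{
if(/$pattern/)
{
$count++; #counts matching lines; put your logic here
}
}
close(FILE);
print "$count\n";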

This will get you started.

Sachin


Is photography a hobby or another way to spend $
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: shell script problem

Hi,

This is a fairly interesting one in that you have to examine each line for possible multiple matches. My solution is to call awk as a 'here doc' and use the gsub function to substitute the target string with a nonsense string. gsub returns the number of substitutions and hence the number of pattern matches. We don't save the altered line, so no harm is done.

Usage: count.sh target file1 file2 ...
will list each file and the count of matches
or count.sh target < stdin will read stdin and count the number of pattern matches.
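
The attached script itself is not reproduced in this thread. What follows is only a minimal sketch of the gsub approach described above, not the original attachment. The function names count_stream and count_main, the "#" replacement string, and the "file: count" output format are assumptions made for illustration:

```shell
#!/bin/sh
# count.sh (sketch): count pattern occurrences with awk's gsub().
# Usage: count.sh target [file ...]

# count_stream TARGET
# Reads stdin and prints the total number of matches of TARGET.
count_stream() {
    awk -v target="$1" '
        # gsub() returns how many substitutions it made on this line.
        # The altered line is discarded, so the input is never changed.
        { n += gsub(target, "#") }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}

# count_main TARGET [FILE ...]
# With files: print "file: count" for each. Without: count stdin.
count_main() {
    target=$1
    shift
    if [ $# -eq 0 ]; then
        count_stream "$target"
    else
        for f in "$@"; do
            printf '%s: %s\n' "$f" "$(count_stream "$target" < "$f")"
        done
    fi
}

# Run only when at least a target was given on the command line.
if [ $# -gt 0 ]; then
    count_main "$@"
fi
```

Because gsub replaces non-overlapping matches from left to right, a line such as "foo_bar foo foo" contributes 3 to the count for the target foo.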

Enjoy, Clay
If it ain't broke, I can fix that.
Curtis Larson_1
Valued Contributor

Re: shell script problem

Here is a script from Unix Power Tools (O'Reilly). I'm sure you can adapt it to your situation. Basically, use tr to put each word on a separate line, then count the lines. Something like:

cat your_file | tr -cs "[:alnum:]_" "[\012*]" | grep -c your_pattern

#! /bin/sh
### wordfreq - count number of occurrences of each word in input
### Usage: wordfreq [-i] [files]
#
# ** CONFIGURATION NOTE **: See comments above second "tr" command below
#
## wordfreq counts the number of occurrences of each word in its input.
## If you give it files, it reads from them; otherwise it reads stdin.
## The -i option folds upper case into lower case (capitalized letters
## will count the same as lower-case).
#
# Adapted from "concordance", which Carl Brandauer posted to USENET.

# Different versions are a pain... :-(
case "$1" in
-i) shift
    tr1="[a-z]"
    tr2bsd="a-z'" tr2sys5="[a-z]'"
    ;;
*)  # no case conversion
    tr1="[A-Z]"
    tr2bsd="A-Za-z'" tr2sys5="[A-Z][a-z]'"
    ;;
esac

cat ${1+"$@"} | # Work around problem with "$@" in some shells
tr "[A-Z]" "$tr1" | # Convert upper case to lower if -i option
#
# NOTE: If you use Berkeley tr(1), comment out the second tr command and
# uncomment the first tr command:
#
#tr -cs "$tr2bsd" "\012" |
tr -cs "$tr2sys5" "[\012*]" | # Replace all characters not a-z or ' with
# a new line. i.e. one word per line
sort | # uniq expects sorted input
uniq -c | # Count the number of times each word appears
sort +0nr +1d # Sort first from most to least frequent,
# then alphabetically

James R. Ferguson
Acclaimed Contributor

Re: shell script problem

Hi:

Very interesting problem! Here's a small, unembellished script which returns the count of the matched pattern.

#!/usr/bin/sh
typeset P="$1"
typeset F="$2"
R=`awk -v P="$P" '{for (i=1;i <=NF;i++) ary [$i]=1}
END{ for (S in ary) if (S~P) {k=k+1};print k+0}' "$F"`
echo $R
#_end.

Call the script "my.sh" and execute this:

# ./my.sh local /etc/hosts

...This will print "1" for having found the string "localhost" in /etc/hosts.

# ./my.sh lo /etc/hosts

...will print "3" since "lo" matched "loghost" on one line, and both "localhost" and "loopback" together on another line.

Regards!

...JRF...
A. Clay Stephenson
Acclaimed Contributor

Re: shell script problem

Hi James,

Nice solution. We should probably suggest that both of our scripts feed awk from a grep command. grep is generally faster, and awk would then only need to examine the matched lines for multiple occurrences of the pattern.

Regards, Clay
If it ain't broke, I can fix that.
James R. Ferguson
Acclaimed Contributor

Re: shell script problem

Hi Clay:

Thanks. Your suggestion to use 'grep' to do the initial filtering is a nice touch. In my script's case, the array built for subsequent evaluation would be greatly reduced in size. I find that intrinsically appealing having grown up in environments where it was cheaper to invest programming time to gain performance than to "throw more hardware" at the problem.

Regards!

...JRF...