Operating System - Linux

counting the number of times a word appears in a file

 
curt larson_1
Honored Contributor

Re: counting the number of times a word appears in a file

Here is a program that might do a better job:

http://hpux.cs.utah.edu/hppd/hpux/Misc/conc-0.5/
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Curt, thanks for the two replies, but no luck with the awk; I am not sure of the correct format for writing it in a script.

Test file with 3 lines
this is a test of the smtp; but the SMTP would %smtp smtp SMTP& work
helo smtp

cat /tmp/test | tr "[:upper:]" "[:lower:]" | tr -cs "[a-z0-9]" "\012" |sort | uniq -c | sort +0nr +1d
+ cat /tmp/test
+ tr [:upper:] [:lower:]
+ tr -cs [a-z0-9] \012
+ sort
+ sort +0nr +1d
+ uniq -c
1 helo smtp
1 this is a test of the smtp; but the smtp
1 would %smtp smtp smtp& work

With your awk script I get the following error, the good ole bail out:

Tahoe: /tmp ./t2.sh
#!/bin/ksh -xv
cat /tmp/test |awk '{
for (i=1, i<=NF;i++)
num[$i]++;
}

END{
for ( word in num )
print word, num[word];
}'|grep -i smtp
+ cat /tmp/test
+ awk {
for (i=1, i<=NF;i++)
num[$i]++;
}

END{
for ( word in num )
print word, num[word];
}
+ grep -i smtp
syntax error The source line is 2.
The error context is
for >>> (i=1, <<<
awk: The statement cannot be correctly parsed.
The source line is 2.
syntax error The source line is 2.

James R. Ferguson
Acclaimed Contributor
Solution

Re: counting the number of times a word appears in a file

Hi James:

Try this:

# perl -lne '$count++ while (m/smtp/ig);END{print $count}' logfile

Regards!

...JRF...
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

James, you have come through for me more times than I can count. Thank you very much, and thanks to the rest of you guys for all the help you provided.
curt larson_1
Honored Contributor

Re: counting the number of times a word appears in a file

| tr -cs "[a-z0-9]" "\012"

should be
| tr -cs "[a-z0-9]" "\012*"
star after \012

and

The error context is
for >>> (i=1, <<<
awk: The statement cannot be correctly parsed.
i=1, <-- no comma there; it should be a semicolon ";"
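
For anyone trying the tr pipeline on a GNU/Linux box, here is a minimal sketch of the same word-frequency idea. It is not from the posts above: the function name word_freq and the sample data are made up for illustration. It assumes a tr that understands POSIX character classes and pads a short second set (GNU tr does); other tr implementations may need the bracketed repeat form such as "[\012*]" instead.

```shell
# word_freq FILE: print "count word" pairs, most frequent first.
# Hypothetical helper, not part of the original thread.
word_freq() {
    tr '[:upper:]' '[:lower:]' < "$1" |
        tr -cs '[:alnum:]' '\n' |   # each run of non-alphanumerics becomes one newline
        grep . |                    # drop the empty line a leading separator would leave
        sort | uniq -c |
        sort -k1,1nr -k2,2          # by count descending, then by word
}

# Sample data modelled on the test file quoted earlier in the thread.
sample=$(mktemp)
printf '%s\n' 'this is a test of the smtp; but the SMTP' \
              'would %smtp smtp SMTP& work' \
              'helo smtp' > "$sample"
word_freq "$sample"
rm -f "$sample"
```

Piping the result through grep smtp (or awk '$2 == "smtp"') then picks out the single line for the word of interest.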
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Curt, I got the awk statement to work, thanks. It gives me a count for each line, so I had to add a counter for that. The tr usage still wouldn't give me a correct count. But thanks again, guys...
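
The separate counts come from awk using each raw field as an array key, so "smtp;", "%smtp" and "SMTP&" each get their own entry. A rough sketch of one way to get a single total is below. The function name count_word_awk is hypothetical, and stripping every non-alphanumeric character is an assumption that suits this test file; it would not count a token like "smtp-server".

```shell
# count_word_awk WORD FILE: count tokens equal to WORD (lowercase),
# ignoring case and surrounding punctuation. Hypothetical helper.
count_word_awk() {
    awk -v w="$1" '
    {
        for (i = 1; i <= NF; i++) {       # semicolons, not commas, in the for header
            tok = tolower($i)
            gsub(/[^a-z0-9]/, "", tok)    # drop punctuation such as ";" "%" "&"
            if (tok == w) n++
        }
    }
    END { print n + 0 }' "$2"             # n + 0 prints 0 when nothing matched
}

sample=$(mktemp)
printf '%s\n' 'this is a test of the smtp; but the SMTP' \
              'would %smtp smtp SMTP& work' \
              'helo smtp' > "$sample"
count_word_awk smtp "$sample"
rm -f "$sample"
```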
James R. Ferguson
Acclaimed Contributor

Re: counting the number of times a word appears in a file

Hi (again) James:

To improve things a bit, I'd amend my expression thusly:

# perl -lne '$count++ while (m/\bsmtp\b/ig);END{print $count}' logfile

This eliminates counting the string "smtp" (in any case or mixture) *within* the bounds of another string. That is, this would *not* count "this_was_from_smtp".

Regards!

...JRF...
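
If perl is not handy, roughly the same whole-word count can be sketched with grep. This assumes a grep that supports -o and -w, as GNU grep does; the function name count_whole_word and the sample data are made up for illustration. Like \b in the perl version, -w treats the underscore as a word character, so "this_was_from_smtp" is not counted.

```shell
# count_whole_word WORD FILE: count case-insensitive whole-word matches.
# Hypothetical helper; relies on grep -o and -w (available in GNU grep).
count_whole_word() {
    # -o prints each match on its own line, so wc -l counts matches, not lines.
    grep -oiw -- "$1" "$2" | wc -l
}

sample=$(mktemp)
printf '%s\n' 'smtp and SMTP; this_was_from_smtp' 'helo smtp' > "$sample"
count_whole_word smtp "$sample"
rm -f "$sample"
```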

Sandman!
Honored Contributor

Re: counting the number of times a word appears in a file

James,

Within awk, set your field separator to the word smtp or SMTP and then count its occurrences within the input file as:

# awk -F"smtp|SMTP" 'BEGIN{cnt=0} {cnt+=(NF-1)} END{print cnt}' input_file

cheers!
Sandman!
Honored Contributor

Re: counting the number of times a word appears in a file

In fact, to make your "find && count SMTP" case-insensitive, do the following:

# awk -F"[sS][mM][tT][pP]" 'BEGIN{cnt=0} {cnt+=(NF-1)} END{print cnt}' input_file

cheers!
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Thanks Sandman, your awk version also works. I had to change cnt=0 to cnt=1 to get an accurate count.
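
One possible reason the starting value needed adjusting (a guess; the thread does not say) is an empty line in the input. With a regular-expression field separator, an empty line has NF equal to 0, so NF-1 contributes -1 to the total. The sketch below guards against that rather than changing the initial count; the function name count_by_fs and the sample data are hypothetical.

```shell
# count_by_fs FILE: count occurrences of "smtp" in any case by using it
# as the field separator. Hypothetical helper.
count_by_fs() {
    awk -F'[sS][mM][tT][pP]' '
        NF > 0 { cnt += NF - 1 }   # skip empty lines, where NF is 0 and NF-1 would be -1
        END    { print cnt + 0 }   # + 0 prints 0 rather than an empty string
    ' "$1"
}

sample=$(mktemp)
printf '%s\n' 'smtp a SMTP' '' 'helo smtp' > "$sample"
count_by_fs "$sample"
rm -f "$sample"
```

Note that this approach counts the string anywhere, including inside longer words, so it matches the first perl one-liner rather than the \b version.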