Operating System - Linux

Re: counting the number of times a word appears in a file

 
SOLVED
Belinda Dermody
Super Advisor

counting the number of times a word appears in a file

I have an extracted log file in which the word smtp or SMTP appears once or several times per line. Is there a way to count how many times smtp appears in the log? I can't just count the lines, because it appears more than once on certain lines...
20 REPLIES
Pete Randall
Outstanding Contributor

Re: counting the number of times a word appears in a file

grep -i smtp |wc -w

ought to do it.


Pete

Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Sorry Pete, I tried that earlier, it grabs the lines with smtp and counts all the words in the line...
baiju_3
Esteemed Contributor

Re: counting the number of times a word appears in a file

If the word smtp or SMTP is always set off by a delimiter, you can write a script that splits each line into words, compares each word, and increments a counter whenever the word matches.


Thanks,
BL.

Good things Just Got better (Plz,not stolen from advertisement -:) )
Pete Randall
Outstanding Contributor

Re: counting the number of times a word appears in a file

James,

ARGHH! You're right of course. I remember a similar question a month or so ago but I can't find it at the moment and don't remember what the answer was.


Pete

Jeff Schussele
Honored Contributor

Re: counting the number of times a word appears in a file

Hi James,

You can still use grep & wc -w
Save the grepped lines in a tmp file
Then just read through that file in a loop and increment a counter from the wc -w output.
Should work.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

There is no special delimiter and the fields are not fixed, there could be a % ; or even a white space before the smtp or SMTP.

Jeff, I do not understand your response. Even if I throw all the lines with either smtp or SMTP into an output file, I will still have all the other words (mail stuff) that would come up in the wc command.
curt larson_1
Honored Contributor

Re: counting the number of times a word appears in a file

a quick awk script

cat file | awk '{
for ( i=1; i<=NF; i++)
num[$i]++;
}
END {
for ( word in num )
print word, num[word];
}' | grep -i smtp
curt larson_1
Honored Contributor

Re: counting the number of times a word appears in a file

another method

cat file | tr "[:upper:]" "[:lower:]" |
tr -cs "[a-z0-9']" "\012" | sort |
uniq -c | sort +0nr +1d

Convert all uppercase to lowercase.
Replace every character that is not in a-z0-9' with a newline. That means one word per line.
Sort, because uniq expects sorted input.
uniq -c counts the number of times each word appears; the final sort then orders the result from most to least frequent, then alphabetically.
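
For anyone trying this on a current Linux box, here is a rough sketch of the same pipeline, not a tested replacement for Curt's command. It assumes GNU or POSIX tr and sort: the set is written without the outer brackets, the second tr argument uses the [c*] repeat form so it is padded to the length of the first, and the obsolete "sort +0nr +1d" keys are written as -k options. The function name word_freq and the sample file are made up for illustration; the sample lines are the test lines quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical sample data: the three test lines quoted later in the thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
this is a test of the smtp; but the SMTP
would %smtp smtp SMTP& work
helo smtp
EOF

# word_freq FILE: print "count word" pairs, most frequent first.
word_freq() {
    tr '[:upper:]' '[:lower:]' < "$1" |
        tr -cs 'a-z0-9' '[\n*]' |   # each run of other characters becomes one newline
        sort |                      # uniq needs sorted input
        uniq -c |
        sort -k1,1nr -k2,2          # by count (descending), then by word
}

# Pick out the line for the word "smtp" and print only its count.
word_freq "$sample" | awk '$2 == "smtp" { print $1 }'
rm -f "$sample"
```

On the three sample lines this should report 6 for smtp, since every occurrence ends up on a line of its own after the second tr.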
Stephen Keane
Honored Contributor

Re: counting the number of times a word appears in a file

One way (though you'll have to change it to cope with uppercase SMTP too).

Type it exactly as shown, including the '\' !!


# sed -e 's/smtp/smtp\
/g' your_file | grep -i "smtp" | wc -l

maybe use tr in there to convert SMTP to smtp?

changes "smtp" into "smtp\n" so each smtp is on a separate line, then uses grep/wc to count them. Just a thought.
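
A minimal sketch of that idea with the tr step added, assuming a POSIX sed that accepts a backslash-newline in the replacement text. The function name count_with_sed is only for illustration and is not something posted in this thread.

```shell
#!/bin/sh
# count_with_sed FILE: count occurrences of smtp, ignoring case.
count_with_sed() {
    # Lowercase everything first so SMTP and smtp look the same,
    # then break the line after every smtp so that each output line
    # holds at most one occurrence, and count the lines that have one.
    tr '[:upper:]' '[:lower:]' < "$1" |
        sed 's/smtp/smtp\
/g' |
        grep -c 'smtp'
}
```

Usage would be something like "count_with_sed your_file". grep -c counts matching lines, which is safe here only because the sed step guarantees no line carries more than one smtp.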
curt larson_1
Honored Contributor

Re: counting the number of times a word appears in a file

here is a program that might do a better job

http://hpux.cs.utah.edu/hppd/hpux/Misc/conc-0.5/
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Curt, thanks for the two replies, but no luck with the awk. I am not sure of the correct format for writing it in a script.

Test file with 3 lines
this is a test of the smtp; but the SMTP
would %smtp smtp SMTP& work
helo smtp

cat /tmp/test | tr "[:upper:]" "[:lower:]" | tr -cs "[a-z0-9]" "\012" |sort | uniq -c | sort +0nr +1d
+ cat /tmp/test
+ tr [:upper:] [:lower:]
+ tr -cs [a-z0-9] \012
+ sort
+ sort +0nr +1d
+ uniq -c
1 helo smtp
1 this is a test of the smtp; but the smtp
1 would %smtp smtp smtp& work

With your awk script I get the following error, the good ole bail out:

Tahoe: /tmp ./t2.sh
#!/bin/ksh -xv
cat /tmp/test |awk '{
for (i=1, i<=NF;i++)
num[$i]++;
}

END{
for ( word in num )
print word, num[word];
}'|grep -i smtp
+ cat /tmp/test
+ awk {
for (i=1, i<=NF;i++)
num[$i]++;
}

END{
for ( word in num )
print word, num[word];
}
+ grep -i smtp
syntax error The source line is 2.
The error context is
for >>> (i=1, <<<
awk: The statement cannot be correctly parsed.
The source line is 2.
syntax error The source line is 2.

James R. Ferguson
Acclaimed Contributor
Solution

Re: counting the number of times a word appears in a file

Hi James:

Try this:

# perl -lne '$count++ while (m/smtp/ig);END{print $count}' logfile

Regards!

...JRF...
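
Since this is the Linux board, a possible cross-check that does not need perl is sketched below. It swaps in a different technique from the one above: grep's -o option, which prints every match on its own line. That option is available in GNU grep but is an assumption for other systems. The function name count_matches is hypothetical.

```shell
#!/bin/sh
# count_matches FILE: count every occurrence of smtp, in any case.
count_matches() {
    # -o prints each match on a separate line, -i ignores case;
    # wc -l then counts matches rather than input lines.
    # tr strips the padding that some wc versions add.
    grep -o -i 'smtp' "$1" | wc -l | tr -d ' '
}
```

Run as "count_matches logfile". Like the plain m/smtp/ig pattern, this also counts smtp inside longer strings.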
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

James, you have come through for me more times than I can count. Thank you very much, and thanks to the rest of you guys for all the help you provided.
curt larson_1
Honored Contributor

Re: counting the number of times a word appears in a file

| tr -cs "[a-z0-9]" "\012"

should be
| tr -cs "[a-z0-9]" "\012*"
star after \012

and

The error context is
for >>> (i=1, <<<
awk: The statement cannot be correctly parsed.
i=1, < no comma there; it should be a semicolon ";"
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Curt, I got the awk statement to work, thanks. It gives me a count for each line, so I had to add a counter for that. The tr version still wouldn't give me a correct count. But thanks again guys...
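
The exact script with the added counter was not posted, so the following is only a guess at what such a version might look like: the for loop with semicolons, plus a running total instead of a per-word array. The function name count_smtp_fields is made up, and tolower() assumes a POSIX awk.

```shell
#!/bin/sh
# count_smtp_fields FILE: count whitespace-separated fields containing smtp.
count_smtp_fields() {
    awk '{
        # The three for-clauses are separated by semicolons, not commas.
        for (i = 1; i <= NF; i++)
            if (tolower($i) ~ /smtp/)   # field contains smtp in any case
                total++
    }
    END { print total + 0 }' "$1"       # +0 prints 0 instead of an empty line
}
```

One caveat: a field is counted once even if it contains smtp twice (for example "smtp,smtp" with no space), so this can undercount compared with the perl one-liner.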
James R. Ferguson
Acclaimed Contributor

Re: counting the number of times a word appears in a file

Hi (again) James:

To improve things a bit, I'd amend my expression thusly:

# perl -lne '$count++ while (m/\bsmtp\b/ig);END{print $count}' logfile

This eliminates counting the string "smtp" (in any case or mixture) *within* the bounds of another string. That is, this would *not* count "this_was_from_smtp".

Regards!

...JRF...

Sandman!
Honored Contributor

Re: counting the number of times a word appears in a file

James,

Within awk, set your field separator to the word smtp or SMTP and then count its occurrences within the input file as:

# awk -F"smtp|SMTP" 'BEGIN{cnt=0} {cnt+=(NF-1)} END{print cnt}' input_file

cheers!
Sandman!
Honored Contributor

Re: counting the number of times a word appears in a file

In fact, to make your "find && count SMTP" case-insensitive, do the following:

# awk -F"[sS][mM][tT][pP]" 'BEGIN{cnt=0} {cnt+=(NF-1)} END{print cnt}' input_file

cheers!
Belinda Dermody
Super Advisor

Re: counting the number of times a word appears in a file

Thanks Sandman, your awk version also works, I had to change the cnt from cnt=0 to cnt=1 to get the accurate count.
Sandman!
Honored Contributor

Re: counting the number of times a word appears in a file

James,

If "cnt" is declared in the BEGIN section of the awk construct then I don't know why it needs to be set equal to 1 and if it's defined in the ACTION section, then every new line read would clobber its contents and reset it back to 1. Could you share the awk construct you used to get the correct result?

Maybe I'm not able to understand your requirements.

thx
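
One possible explanation for the off-by-one, which nothing in this thread confirms: awk sets NF to 0 for an empty line, so NF-1 is -1 and every blank line in the input takes one off the total. A single blank line would be made up for exactly by starting at cnt=1. The sketch below shows the unguarded and a guarded form side by side; both function names are hypothetical.

```shell
#!/bin/sh
# count_by_fs_raw FILE: the field-separator approach as posted (no guard).
count_by_fs_raw() {
    awk -F'[sS][mM][tT][pP]' '{ cnt += NF - 1 } END { print cnt + 0 }' "$1"
}

# count_by_fs FILE: same idea, but skip records with no fields at all,
# so that empty lines do not subtract 1 from the total.
count_by_fs() {
    awk -F'[sS][mM][tT][pP]' 'NF > 0 { cnt += NF - 1 } END { print cnt + 0 }' "$1"
}
```

If the two functions give different numbers for the same file, the difference should equal the number of empty lines in it.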