how to get count of repeated words in a flat file

Gopi Kishore m · ‎03-04-2011

I want to know how many times a particular word is repeated in a particular flat file.

I am using the following command

grep word textfile |wc -l

word is the desired word

textfile is the file I am searching.

But the above command does not give an exact count in some scenarios: if the word is repeated within a line, that line is counted only once. Please suggest a fix.
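
To make the undercount concrete, here is a minimal sketch. The sample file is hypothetical, and it assumes a grep that supports -o (GNU and BSD grep do; -o is not in POSIX, so an older system grep may lack it):

```shell
# Hypothetical sample file: "word" occurs 3 times on 2 lines.
sample=$(mktemp)
printf 'word and word again\none more word here\n' > "$sample"

# grep prints each matching line once, so this counts lines, not occurrences.
line_count=$(grep word "$sample" | wc -l)

# -o prints every match on its own line, so wc -l now counts occurrences.
# Assumption: this grep supports -o.
match_count=$(grep -o word "$sample" | wc -l)

echo "lines: $line_count  matches: $match_count"
rm -f "$sample"
```

Note that a plain "word" pattern also matches inside longer strings such as "words".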

James R. Ferguson · ‎03-04-2011

Hi:

$ perl -nle '$n++ while m{\bword\b}g;END{print $n+0}' file

...will look for the string "word" as a whole word and count every instance in the file argument. Matches that begin at the start of a line or terminate at the end of one are counted, as well as matches elsewhere in the line. If you substituted the string "words", only matches to "words" and not "word" would be found.

Regards!

...JRF...

Dennis Handly · ‎03-04-2011

You could first start with grep to find the lines then use tr(1) or sed(1) to split up the words into separate lines, then just count that:
grep word textfile | tr '[:space:]' '\012' | grep -c word
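
Note that the final grep counts any token that contains the string, so "words" or a quoted "word" with punctuation attached is treated the same way as "word". If only whole words should count, one possible variant (a sketch on a hypothetical sample file, not a tested replacement on every platform) splits on everything that is not a letter or digit and matches whole lines:

```shell
# Hypothetical sample file with punctuation and near-misses.
sample=$(mktemp)
printf 'word, words and a sword\nthe "word" is word\n' > "$sample"

# tr -cs turns every run of non-alphanumeric characters into one newline,
# leaving one token per line; grep -x accepts only whole-line matches
# and -c counts them.
count=$(tr -cs '[:alnum:]' '\n' < "$sample" | grep -cx word)

echo "$count"
rm -f "$sample"
```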

Hein van den Heuvel · ‎03-05-2011

I like JRF's solution.

Pay close attention to the usage of the '\b' regular expression component, which takes no space itself but specifies a word boundary. Just what is needed here, it seems.

Applied to the topic text it reports '5' as the count for the word 'word', which obviously needs to be changed or become a variable for real work.

Depending on exactly what problem you are trying to solve, it may be beneficial to just count all words and then address the selected words for further processing.

Here is a 'one-liner' to demonstrate that:

$ perl -nle '$w{$_}++ for (split) }{ for (sort {$w{$b}<=>$w{$a}} keys %w) { print qq($w{$_}\t$_)}' tmp.txt
5 the
5 word
4 a
4 is
3 in
2 file
2 repeated
2 textfile
:

As you see, it also reports 5 for the word 'word'
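
For comparison, the same count-everything idea can be sketched with standard tools only. This is an alternative technique (sort and uniq -c), not the perl method above, and the sample file and variable names are hypothetical:

```shell
# Hypothetical sample file.
sample=$(mktemp)
printf 'the word is the word\na file is the file\n' > "$sample"

# One whitespace-separated token per line, then group identical tokens
# and count them; the last sort puts the highest counts first.
freq=$(tr -s '[:space:]' '\n' < "$sample" | sort | uniq -c | sort -k1,1nr -k2)

echo "$freq"

# Pick the count for one selected word out of the table.
word_count=$(echo "$freq" | awk '$2 == "word" { print $1 }')
echo "word: $word_count"
rm -f "$sample"
```

Like the split-based perl version, this splits on whitespace only, so punctuation stays attached to the tokens.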

Enjoy,
Hein

Raj D. · ‎03-05-2011

Gopi,

$ awk '{for(i=1;i<=NF;++i) if($i~ "^word$") print $i}' textfile| wc -l

Enjoy, Have fun!,
Raj.

" If u think u can , If u think u cannot , - You are always Right . "

Hein van den Heuvel · ‎03-05-2011

Raj,
That is also a fine solution, but I don't understand why you opted for a pipe. I guess I will never understand the typical Unix thinking involved. I come from VMS land, where for the longest time we did not have pipes. When we got them we understood the costs involved.

Not that it matters for occasional use like here, but why print to a pipe segment and re-count what comes out when you can just count while there and print when done?!

Might I suggest:

$ awk '{for(i=1;i<=NF;++i) if($i~ "^word$") count++} END { print count }' textfile

Of course due to the simple split by whitespace, that suffers from the same problem as my perl --> array example.

It will not recognize 'word' in *this* example line, due to the quotes.
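
If you want to stay in awk, one way around that (a sketch only, on a hypothetical sample file; the variable name w is arbitrary) is to strip leading and trailing non-alphanumeric characters from each field before comparing:

```shell
# Hypothetical sample file with quotes and punctuation around the word.
sample=$(mktemp)
printf "recognize 'word' here\nword, and word.\n" > "$sample"

# -v passes the search word into awk.
# gsub removes leading and trailing non-alphanumeric characters from a
# copy of each field, then the copy is compared as an exact string.
# count+0 prints 0 instead of an empty line when nothing matches.
count=$(awk -v w=word '{
  for (i = 1; i <= NF; i++) {
    f = $i
    gsub(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/, "", f)
    if (f == w) count++
  }
} END { print count+0 }' "$sample")

echo "$count"
rm -f "$sample"
```

This still treats something like "word-word" as a single token, so it is not a full substitute for a word-boundary match.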

Using perl you can fix that using \b to split.

$ perl -nle '$w{$_}++ for (split /\b/) }{ for (sort {$w{$b}<=>$w{$a}} keys %w) { print qq($w{$_}\t$_)}' tmp.txt

(but now it counts whitespace as words also)

Hein.

Raj D. · ‎03-05-2011

Hein,

That's great, thanks for adding the count. The pipe is not required, as the count can be done inside awk. And the perl code is nice, especially the whitespace trick.

Rgds,
Raj.

Gopi,
Please assign points once you are done.

" If u think u can , If u think u cannot , - You are always Right . "
