03-04-2011 04:55 AM
how to get count of repeated words in a flat file
I am using the following command:
grep word textfile |wc -l
where word is the desired word and textfile is the file I am searching.
But the above command does not give an exact count in some scenarios: if the word is repeated within a line, that line is counted only once. Please suggest.
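The difference between counting matching lines and counting occurrences can be sketched as below. This is only an illustration under assumptions: the helper names count_lines and count_occurrences are hypothetical, and the tr step treats every character other than letters, digits and underscore as a separator.

```shell
# count_lines WORD FILE
# What "grep word file | wc -l" measures: the number of LINES containing WORD.
count_lines() {
  grep -c -e "$1" "$2"
}

# count_occurrences WORD FILE
# Hypothetical alternative: put one token on each line, then count the
# lines that are exactly WORD (-x whole line, -F fixed string).
count_occurrences() {
  tr -cs '[:alnum:]_' '[\n*]' < "$2" | grep -c -x -F -e "$1"
}
```

On a file where one line holds the word three times and another holds it once, count_lines reports 2 while count_occurrences reports 4.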
- Tags:
- uniq
03-04-2011 06:10 AM
Re: how to get count of repeated words in a flat file
$ perl -nle '$n++ while m{\bword\b}g;END{print $n}' file
...will look for the whole word "word" and count every instance in the file argument. Matches that begin at the start of a line or end at the end of one, as well as matches anywhere in between, are counted. The \b anchors restrict matches to whole words: if you substituted the string "words", only matches to "words" and not "word" would be found.
Regards!
...JRF...
03-04-2011 06:46 PM
Re: how to get count of repeated words in a flat file
grep word textfile | tr '[:space:]' '\012' | grep -c word
03-05-2011 07:28 AM
Re: how to get count of repeated words in a flat file
Pay close attention to the use of the '\b' regular expression component, which consumes no characters itself but specifies a word boundary. That seems to be just what is needed here.
Applied to the text of the topic, it reports a count of '5' for the word 'word'. For real work, the hard-coded word obviously needs to be changed or turned into a variable.
Depending on exactly what problem you are trying to solve, it may be beneficial to just count all words and then address the selected words for further processing.
Here is a 'one-liner' to demonstrate that:
$ perl -nle '$w{$_}++ for (split) }{ for (sort {$w{$b}<=>$w{$a}} keys %w) { print qq($w{$_}\t$_)}' tmp.txt
5 the
5 word
4 a
4 is
3 in
2 file
2 repeated
2 textfile
:
As you see, it also reports 5 for the word 'word'
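For comparison, a similar frequency table can be sketched with standard tools only (tr, sort, uniq). The helper name word_freq is hypothetical. Unlike the perl one-liner, which splits on whitespace, this sketch assumes that anything other than letters, digits and underscore separates words, so its counts can differ on text with punctuation.

```shell
# word_freq FILE
# Hypothetical helper: prints "count word" lines, most frequent first.
word_freq() {
  tr -cs '[:alnum:]_' '[\n*]' < "$1" |   # one token per line
    grep -v '^$' |                        # drop the empty token a leading separator leaves
    sort | uniq -c |                      # group identical tokens and count them
    sort -rn                              # highest count first
}
```

The order of words that share the same count is left to sort and should not be relied upon.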
Enjoy,
Hein
03-05-2011 12:44 PM
Re: how to get count of repeated words in a flat file
$ awk '{for(i=1;i<=NF;++i) if($i~ "^word$") print $i}' textfile| wc -l
Enjoy, Have fun!,
Raj.
03-05-2011 01:10 PM
Re: how to get count of repeated words in a flat file
That is also a fine solution, but I don't understand why you opted for a pipe. I guess I will never understand the typical Unix thinking involved. I come from VMS land, where for the longest time we did not have pipes. When we got them, we understood the costs involved.
Not that it matters for occasional use like here, but why print to a pipe segment and re-count what comes out when you can just count while there and print when done?!
Might I suggest:
$ awk '{for(i=1;i<=NF;++i) if($i~ "^word$") count++} END { print count }' textfile
Of course due to the simple split by whitespace, that suffers from the same problem as my perl --> array example.
It will not recognize 'word' in *this* example line, due to the quotes.
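Staying within awk, one possible workaround for the quote problem is to blank out punctuation before the fields are compared. This is a sketch under assumptions: the helper name count_word_awk is hypothetical, and a "word" is taken to consist only of ASCII letters, digits and underscore.

```shell
# count_word_awk WORD FILE
# Hypothetical helper: counts fields equal to WORD after punctuation is removed.
count_word_awk() {
  awk -v w="$1" '{
      gsub(/[^A-Za-z0-9_]+/, " ")          # changing $0 makes awk re-split the fields
      for (i = 1; i <= NF; ++i) if ($i == w) count++
  } END { print count + 0 }' "$2"          # + 0 prints 0 when nothing matched
}
```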
Using perl you can fix that using \b to split.
$ perl -nle '$w{$_}++ for (split /\b/) }{ for (sort {$w{$b}<=>$w{$a}} keys %w) { print qq($w{$_}\t$_)}' tmp.txt
(but now it counts whitespace as words also)
Hein.
03-05-2011 01:34 PM
Re: how to get count of repeated words in a flat file
That's great, thanks for adding the count. The pipe is not required, as the count can be done inside awk. Thanks! And the perl code is nice, especially the whitespace trick.
Rgds,
Raj.
Gopi,
pls post points once you are done.