Operating System - HP-UX

count occurrence of a regex in a very long line

 
Mike_Ca Li
Regular Advisor

count occurrence of a regex in a very long line

I have a very long line of about 10 thousand words, many of which match "regexpression". How can I use a script command to count the matches for "regexpression"? One way is to break the long line into many shorter lines, using the space as separator, but that is not efficient. Thank you.

eg: quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog
I need the count for "jumps" for instance.
11 REPLIES
Steven E. Protter
Exalted Contributor

Re: count occurrence of a regex in a very long line

Shalom Mike,

Perhaps process the string with awk?

echo "$string" | awk -F ' ' '{print $1 ... }'

Maybe something with xargs, not sure.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: count occurrence of a regex in a very long line

Mike,

Ah, forget that stuff in my first post, unless it is unexpectedly helpful.

Store the data in a file:

grep jumps filename | wc -l

Gets you count of jumps.

SEP
Sandman!
Honored Contributor

Re: count occurrence of a regex in a very long line

How about a short one-line awk construct like...

# awk -F"jumps" '{cnt+=(NF-1)} END{print cnt}'
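This works because the pattern is used as the field separator, so a line with N matches splits into N+1 fields. One edge case is an empty line, where NF is 0 and NF-1 subtracts one from the total. Below is a minimal sketch of one way to guard against that; the `count_awk` name is made up for illustration, and the sketch does nothing about the record-length limit of older awk implementations.

```shell
#!/bin/sh
# count_awk REGEX [FILE...]: hypothetical helper, not from the thread.
# The regex becomes awk's field separator; N matches on a line give N+1 fields.
count_awk() {
    re=$1
    shift
    # NF > 0 skips empty lines, which would otherwise subtract one from the total.
    # cnt + 0 prints 0 instead of an empty string when nothing matched.
    awk -F"$re" 'NF > 0 { cnt += NF - 1 } END { print cnt + 0 }' "$@"
}

# Two matches on the first line, an empty line, one match on the last line.
printf 'quick fox jumps on the dog, then jumps again\n\njumps\n' | count_awk jumps
```

With no file argument the helper reads standard input, as in the last line.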

regards!
Mike_Ca Li
Regular Advisor

Re: count occurrence of a regex in a very long line

Thanks for the reply, SEP.
I tried:
awk -F" " '{print NF}' filename
This fails with "awk: record ... too long". Any other ideas?
Vincent Fleming
Honored Contributor
Solution

Re: count occurrence of a regex in a very long line

Steven,

Having a bad day? (this isn't like you!)

grep jumps filename | wc -l

will tell you how many lines "jumps" appears on, not how many times it appears in one line.
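To make that concrete, here is a small sketch that feeds the same three occurrences through the pipeline twice, once on a single line and once spread over three lines. The `matching_lines` helper name is an assumption for illustration only.

```shell
#!/bin/sh
# matching_lines REGEX: hypothetical helper wrapping the pipeline above; reads stdin.
matching_lines() {
    # tr -d ' ' strips the padding some wc implementations put before the number.
    grep "$1" | wc -l | tr -d ' '
}

# One line holding "jumps" three times: one matching line.
printf 'fox jumps, dog jumps, cat jumps\n' | matching_lines jumps

# The same three occurrences on three lines: three matching lines.
printf 'fox jumps\ndog jumps\ncat jumps\n' | matching_lines jumps
```

The first command reports 1 and the second reports 3, even though the word appears three times in both inputs.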

I can't think of an elegant way of doing this, given the regular expression requirement... UNIX shell tools are all pretty much line-based.

You could use 'sed' to change spaces to \r (i.e., so that each word is on its own line), then pipe it through grep|wc or awk... or something similar, but that will work only if the regex you're looking for does not contain spaces.
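A rough sketch of that limitation follows. It splits with tr rather than sed, which is purely a choice made for this sketch, and `count_split` is a made-up helper name.

```shell
#!/bin/sh
# count_split REGEX: hypothetical helper; reads text on stdin, puts each
# space-separated word on its own line, then counts the lines that match.
count_split() {
    tr ' ' '\n' | grep -c "$1"
}

line='quick brown fox jumps on the lazy dog quick brown fox jumps on the lazy dog'

# Single-word pattern: both occurrences are counted.
echo "$line" | count_split 'jumps'

# Pattern containing a space: it can never match once the words are split apart.
echo "$line" | count_split 'lazy dog'
```

The first call prints 2 and the second prints 0, although "lazy dog" occurs twice in the input.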

Of course, there's always a C program - that's ALWAYS elegant!

Regards,

Vince

No matter where you go, there you are.
James R. Ferguson
Acclaimed Contributor

Re: count occurrence of a regex in a very long line

Hi Mike:

# perl -lne '$i++ while m/jumps/g;END{print $i}'

Regards!

...JRF...
Vincent Fleming
Honored Contributor

Re: count occurrence of a regex in a very long line

Better idea, use "tr" instead of "sed"...

oh - and I also meant \n, not \r...

echo "whatever" | tr ' ' '\n' | grep jumps | wc -l
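As a slightly fuller sketch of the same pipeline: tr works on a stream of characters and never has to hold a whole line, so the input line can be arbitrarily long, and grep only ever sees one short word per line. The `count_words` name, the tab handling and the generated 10000-sentence test line are assumptions added for illustration.

```shell
#!/bin/sh
# count_words REGEX: hypothetical helper; reads stdin.
count_words() {
    # Both the space and the tab map to a newline; -s squeezes runs of
    # newlines so that repeated separators do not produce empty lines.
    tr -s ' \t' '\n\n' | grep -c "$1"
}

# Build one very long line (10000 repetitions of the sample sentence) and count.
awk 'BEGIN { for (i = 0; i < 10000; i++) printf "quick brown fox jumps on the lazy dog "; print "" }' |
    count_words jumps
```

This counts words that contain a match, so a word holding the pattern twice is still counted once.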

"tr" is smaller and more efficient than sed since it does so little...

-Vince
James R. Ferguson
Acclaimed Contributor

Re: count occurrence of a regex in a very long line

Hi (again) Mike:

I might add (of course) that you can pipe input to the perl code or specify the filename as an argument:

# perl -lne '$i++ while m/jumps/g;END{print $i}' filename

# echo ... | perl -lne '$i++ while m/jumps/g;END{print $i}'

Regards!

...JRF...
Hein van den Heuvel
Honored Contributor

Re: count occurrence of a regex in a very long line


Another Perl way that readily allows for a regexp:

perl -ne 'print scalar split (/jumps/)."\n"' filename

Of course this prints one too many.

So fix to:

perl -ne '$x=scalar split(/jumps/) -1; print "$x\n"' filename


Or, for a single total across many words on many lines:

perl -ne '$x+=scalar split(/jumps/) -1;END{print "$x\n"}' filename


Hein.
RAC_1
Honored Contributor

Re: count occurrence of a regex in a very long line

No Perl, no awk, just plain OS built-in commands.

Are the words on the line separated by a space? If yes, do as follows.

tr " " "\n" < your_file | grep -ic "your_reg_exp"
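Two details of this command are worth a quick sketch: -i makes the match case-insensitive, and -c counts matching lines (here, words), not individual matches. The sample line below is invented for illustration.

```shell
#!/bin/sh
line='Jumps jumps JUMPS jumpsjumps'

# With -i all four words match, whatever their case.
echo "$line" | tr ' ' '\n' | grep -ic 'jumps'

# Without -i only the two lowercase words match.
echo "$line" | tr ' ' '\n' | grep -c 'jumps'
```

The first command prints 4 and the second prints 2; in both cases "jumpsjumps" is counted once because it is a single line after the split.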
There is no substitute to HARDWORK
Mike_Ca Li
Regular Advisor

Re: count occurrence of a regex in a very long line

Thanks a lot for all the commands and suggestions.
I tested all the Perl and OS commands. The plain OS commands ran faster.