Operating System - HP-UX

Paul Middleton
Frequent Advisor

Script Problem - should be a simple one.

Greeting One and All,

I seem to have gone brain dead early this week. I'm trying to write a quick script that pulls all the occurrences of a word or number from a file, then extracts the first word or number from that to echo along with the count.
.
So far, I collect the occurrences by
somenum=$(cat basefile | grep 'keyword' | awk '{print $5}')
to load the word or number into 'somenum'.
.
Then, to find the number of times the target word represented by $5 is
found, I use
another=$(cat basefile|grep 'keyword'|wc -l)
because the target word is only on the same line as the keyword.
.
Now, I need to pull the first word or number from 'somenum' so I can have a ----
echo " has occurred $another times"
.
The $another, of course, is the count; the target word is, again, the one represented by $5.
.
I can't remember a way to pull the first occurrence of the target word from somenum.
.
Any help is greatly appreciated. I'm very generous with points.
.
Paul Middleton
Dilligad - Do I Look Like I Give A Damn
James R. Ferguson
Acclaimed Contributor
Solution

Re: Script Problem - should be a simple one.

Hi Paul:

# N=`echo $somenum|awk '{print $1}'`

Regards!

...JRF...
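
Putting the pieces from the question and the reply above together, a minimal sketch might look like the following. The sample file contents, the keyword ERROR, and the variable name `first` are made up for illustration; only `somenum`, `another`, `basefile`, and field 5 come from the thread.

```shell
#!/bin/sh
# Hypothetical sample data: the target word is field 5 on keyword lines.
basefile=/tmp/basefile.$$
cat > "$basefile" <<'EOF'
Mon 10:01 ERROR code 4711 disk full
Mon 10:02 INFO code 0 ok
Mon 10:05 ERROR code 4711 disk full
Tue 09:15 ERROR code 4711 disk full
EOF

keyword=ERROR   # assumed keyword, not from the thread

# Field 5 of every line containing the keyword (newline-separated).
somenum=$(grep "$keyword" "$basefile" | awk '{print $5}')

# Number of lines containing the keyword.
another=$(grep -c "$keyword" "$basefile")

# First word of $somenum. It is left unquoted on purpose so the shell
# collapses the newlines into spaces before awk sees the text.
first=$(echo $somenum | awk '{print $1}')

echo "$first has occurred $another times"
rm -f "$basefile"
```

With this sample data the script should print "4711 has occurred 3 times". Using `grep -c` instead of `grep | wc -l` also avoids the leading blanks that some `wc` implementations put in front of the count.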
Ramkumar Devanathan
Honored Contributor

Re: Script Problem - should be a simple one.

Paul,

>>>>>>>>>>>>>>>>>>>>
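#!/usr/bin/ksh

# Count how many whitespace-separated words in file $2 equal the word $1.
countwords() {
count=0
for word in `cat "$2"`
do
# Quote $1 so ksh compares it literally instead of as a pattern.
if [[ $word = "$1" ]]; then
# expr needs blanks around the + operator.
count=`expr $count + 1`
fi
done
echo $count
}

echo "the word $1 occurs `countwords $1 $2` times in file $2"

exit 0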
<<<<<<<<<<<<<<<<<<<<<
Call as follows -

count.sh word file

grep is only going to return the number of lines containing the word, however many occurrences each line has. So although it is a pain to check each and every word to arrive at the count, unless you use perl, it isn't going to be easy...

and i don't know perl very well either. ;)
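
To make the lines-versus-occurrences distinction concrete, here is a small sketch that avoids perl. It assumes awk is acceptable in place of the shell loop; the sample file and the names `lines` and `hits` are invented for the example.

```shell
#!/bin/sh
# Hypothetical sample: "foo" appears 3 times, but on only 2 lines.
file=/tmp/countdemo.$$
cat > "$file" <<'EOF'
foo bar foo
baz
foo qux
EOF

word=foo

# grep -c counts matching lines, not occurrences.
lines=$(grep -c "$word" "$file")

# awk compares every whitespace-separated field to the word exactly;
# "n + 0" makes it print 0 when there is no match at all.
hits=$(awk -v w="$word" '{ for (i = 1; i <= NF; i++) if ($i == w) n++ } END { print n + 0 }' "$file")

echo "lines=$lines occurrences=$hits"
rm -f "$file"
```

On this input the two numbers differ (2 lines, 3 occurrences), which is exactly the gap described above.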

- ramd.
HPE Software Rocks!
Ramkumar Devanathan
Honored Contributor

Re: Script Problem - should be a simple one.

A performance enhancement -

for word in `cat $2`

may be changed to

for word in `cat $2 | grep $1`

This may be a good cut in iterations where there are a lot of lines containing zero occurrences of the word.
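
A rough way to see the effect of the pre-filter is to count how many words each loop has to examine. This is only a sketch: the sample file and all variable names are hypothetical, and it uses `$(( ))` arithmetic rather than `expr` for brevity.

```shell
#!/bin/sh
# Hypothetical sample: only 2 of the 4 lines contain the word.
file=/tmp/prefilter.$$
cat > "$file" <<'EOF'
alpha beta
gamma delta
beta beta epsilon
zeta
EOF
word=beta

# Loop over every word in the file.
all=0; count_all=0
for w in $(cat "$file"); do
    all=$((all + 1))
    [ "$w" = "$word" ] && count_all=$((count_all + 1))
done

# Loop only over words from lines that grep says contain the word.
filtered=0; count_filtered=0
for w in $(grep "$word" "$file"); do
    filtered=$((filtered + 1))
    [ "$w" = "$word" ] && count_filtered=$((count_filtered + 1))
done

echo "unfiltered: $all words examined, $count_all matches"
echo "filtered:   $filtered words examined, $count_filtered matches"
rm -f "$file"
```

Both loops arrive at the same count because the exact comparison inside the loop is unchanged; grep only discards lines that could not contribute a match.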

- ramd.
HPE Software Rocks!
Paul Middleton
Frequent Advisor

Re: Script Problem - should be a simple one.

James - I changed your input to N=$(echo $somenum|awk '{print $1}') to allow for the shell I'm using and it works great.
.
ramd - You gave me an idea for another project. I can use your input for reviewing older files on our systems. Currently the script is just for new files being ftp'd in. Now I can do both old and new a lot easier.
.
Thanks to both of you for your quick response.
.
Paul Middleton
Dilligad - Do I Look Like I Give A Damn
Curtis Larson_2
Advisor

Re: Script Problem - should be a simple one.

here is another way:

tr "[A-Z]" "[a-z]" |
# convert all uppercase to lowercase; use this if you want to ignore capitalization
tr -cs "[a-z'0-9]" "\12" |
# replace all characters other than a-z, ', or 0-9 with a newline, i.e. one word per line
sort|
# uniq expects sorted input
uniq -c |
# count the number of times each word appears
sort +0nr +1d
# sort from most to least frequent, then alphabetically; or just grep for your word: grep $yourword

the quick and dirty is

num=$(tr -cs "[a-zA-Z0-9]" "\12" < "$yourfile" |
grep -xc "$yourword")

print "$yourword occurs $num times."

Adjust your tr to include whatever capitalization, digits, or punctuation is allowed.
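
For reference, here is the word-frequency pipeline run against a small made-up input. One substitution to note: this sketch uses the POSIX key syntax `sort -k1,1nr -k2d` on the assumption that it is an acceptable stand-in for `sort +0nr +1d`, since the older form is not accepted by every sort. The sample text and the names `freq`, `top_word`, and `top_count` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample text.
file=/tmp/wordfreq.$$
cat > "$file" <<'EOF'
The cat and the dog.
The end.
EOF

# lowercase, one word per line, count duplicates, most frequent first
freq=$(tr 'A-Z' 'a-z' < "$file" |
    tr -cs "a-z'0-9" '\n' |
    sort |
    uniq -c |
    sort -k1,1nr -k2d)

echo "$freq"

# Pull the most frequent word and its count from the first line.
top_word=$(echo "$freq" | head -1 | awk '{print $2}')
top_count=$(echo "$freq" | head -1 | awk '{print $1}')
echo "$top_word occurs $top_count times."
rm -f "$file"
```

The exact column padding of `uniq -c` varies between systems, so the example reads the count and the word back through awk rather than relying on fixed positions.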