Re: Script Problem - should be a simple one.

Paul Middleton · ‎04-01-2003

Greeting One and All,

I seem to have gone brain dead early this week. I'm trying to write a quick script that pulls all the occurrences of a word or number from a file, then extracts the first word or number from that to echo along with the count.
.
So far, I collect the occurrences with
somenum=$(cat basefile | grep 'keyword' | awk '{print $5}')
to load the word or number into 'somenum'.
.
Then, to find the number of times the target word represented by $5 is
found, I use
another=$(cat basefile|grep 'keyword'|wc -l)
because the target word is only on the same line as the keyword.
.
Now, I need to pull the first word or number from 'somenum' so I can have a ----
echo " has occurred $another times"
.
The $another, of course, is the count and is, again, represented by $5.
.
I can't remember a way to pull the first occurrence of the target word from somenum.
.
Any help is greatly appreciated. I'm very generous with points.
.
Paul Middleton

Dilligad - Do I Look Like I Give A Damn

James R. Ferguson · ‎04-01-2003

Hi Paul:

# N=`echo $somenum|awk '{print $1}'`

Regards!

...JRF...
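
For comparison, the two steps in the question (the first value of field 5 and the number of matching lines) can also be folded into a single awk pass. This is only a minimal sketch, not the method given above: the function name first_and_count and the sample file contents are made up for illustration, and it assumes the target word is always field 5 on lines that contain the keyword.

```shell
#!/bin/sh
# Hypothetical helper: print the first value of field 5 found on lines that
# contain the keyword, followed by the number of such lines.
first_and_count() {
    keyword=$1
    file=$2
    awk -v kw="$keyword" '
        index($0, kw) {                 # line contains the keyword (plain string match)
            if (n == 0) first = $5      # remember field 5 from the first matching line
            n++
        }
        END { printf "%s has occurred %d times\n", first, n }
    ' "$file"
}

# Example with a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' \
    'a b c keyword 42 x' \
    'nothing to see here' \
    'a b c keyword 42 y' > "$sample"
first_and_count keyword "$sample"
rm -f "$sample"
```

Reading the file once avoids running the same cat | grep pipeline twice, which may matter on large files.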

Ramkumar Devanathan · ‎04-01-2003

Paul,

>>>>>>>>>>>>>>>>>>>>
#!/usr/bin/ksh

countwords() {
# $1 = word to count, $2 = file to scan
count=0
for word in `cat "$2"`
do
if [[ $word = "$1" ]]; then
count=`expr $count + 1`
fi
done
echo $count
}

echo "the word $1 occurs `countwords $1 $2` times in file $2"

exit 0
<<<<<<<<<<<<<<<<<<<<<
Call as follows -

count.sh

grep is only going to return the number of lines containing the word, however many times it occurs on each line. So you have to check each and every word to arrive at the count, which is a pain. Unless you use perl, it isn't going to be easy...

And I don't know perl very well either. ;)

- ramd.
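
The difference between counting lines and counting occurrences is easy to check. The following is a minimal sketch, assuming whitespace-separated words and exact matches; count_lines and count_words are hypothetical names, and the awk loop is just one alternative to the shell loop in the script above.

```shell
#!/bin/sh
# Number of lines that contain the word at least once.
count_lines() {
    grep -c -- "$1" "$2"
}

# Number of whitespace-separated fields equal to the word.
count_words() {
    awk -v w="$1" '
        {
            for (i = 1; i <= NF; i++)
                if (($i "") == w)       # append "" to force a string comparison
                    n++
        }
        END { print n + 0 }             # n + 0 prints 0 when nothing matched
    ' "$2"
}

# Example with a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' 'foo bar foo' 'baz' 'foo' > "$sample"
echo "lines containing foo: $(count_lines foo "$sample")"
echo "occurrences of foo:   $(count_words foo "$sample")"
rm -f "$sample"
```

With this sample the first line holds the word twice, so the two counts differ.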

HPE Software Rocks!

Ramkumar Devanathan · ‎04-01-2003

A performance enhancement -

for word in `cat $2`

may be changed to

for word in `cat $2 | grep $1`

This may be a good cut in iterations where there are a lot of lines containing zero occurrences of the word.

- ramd.
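
Put together, the pre-filtered loop might look like the sketch below. It is written for a POSIX shell rather than ksh specifically; the name countwords_filtered is hypothetical, and it assumes the file holds plain words, because the unquoted expansion would also expand glob characters such as * if they appeared.

```shell
#!/bin/sh
# Hypothetical variant of countwords: only lines that contain the word
# are split into words and compared.
countwords_filtered() {
    wanted=$1
    file=$2
    count=0
    # grep -F treats the word as a fixed string, not a regular expression.
    for word in $(grep -F -- "$wanted" "$file"); do
        if [ "$word" = "$wanted" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example with a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' 'foo bar foo' 'baz qux' 'foo food' > "$sample"
echo "the word foo occurs $(countwords_filtered foo "$sample") times"
rm -f "$sample"
```

The equality test is still needed after the filter, since grep keeps lines where the word appears only as part of a longer word.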

HPE Software Rocks!

Paul Middleton · ‎04-01-2003

James - I changed your input to N=$(echo $somenum|awk '{print $1}') to allow for the shell I'm using and it works great.
.
ramd - You gave me an idea for another project. I can use your input for reviewing older files on our systems. Currently the script is just for new files being ftp'd in. Now I can do both old and new a lot easier.
.
Thanks to both of you for your quick response.
.
Paul Middleton

Dilligad - Do I Look Like I Give A Damn

Curtis Larson_2 · ‎04-01-2003

here is another way:

tr "[A-Z]" "[a-z]" |
# convert all uppercase to lowercase; use this step only if you want to ignore capitalization
tr -cs "[a-z'0-9]" "\12" |
# replace all characters not a-z, ', or 0-9 with a newline, ie one word per line
sort|
#uniq expects sorted input
uniq -c |
#count number of times each word appears
sort -k1nr -k2d
# sort first from most to least frequent, then alphabetically; or just grep for your word: grep $yourword
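
Wrapped in a function that reads a named file, the pipeline above might look like this. It is a sketch under a few assumptions: the name word_freq is made up, the tr sets are written without brackets as POSIX tr expects, and the final sort uses -k keys because many current sort implementations no longer accept the older +pos form.

```shell
#!/bin/sh
# Hypothetical wrapper: print "count word" pairs, most frequent first.
word_freq() {
    tr 'A-Z' 'a-z' < "$1" |      # fold to lowercase to ignore capitalization
    tr -cs "a-z0-9'" '\n' |      # anything else becomes a newline: one word per line
    sort |                       # uniq expects sorted input
    uniq -c |                    # prefix each distinct word with its count
    sort -k1,1nr -k2             # by count descending, then by word
}

# Example with a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' 'The cat and the dog.' 'The end' > "$sample"
word_freq "$sample"
rm -f "$sample"
```

The width of the count column printed by uniq -c differs between systems, so anything that parses this output should split on whitespace rather than rely on fixed columns.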

the quick and dirty is

num=$(tr -cs "[a-zA-Z0-9]" "\12" < "$yourfile" |
grep -xc "$yourword")

print "$yourword occurs $num times."

Adjust your tr to include whatever capitalization, digits, or punctuation is allowed in a word.
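
One way to make that adjustment explicit is to pass the allowed characters as a parameter. This is a minimal sketch: count_word and its third argument are hypothetical, and the default set of letters and digits is only an assumption about what counts as a word.

```shell
#!/bin/sh
# Hypothetical helper: count exact occurrences of a word.
#   $1 = word, $2 = file, $3 = characters allowed inside a word (tr set syntax)
count_word() {
    allowed=${3:-a-zA-Z0-9}
    # Every run of characters outside the allowed set becomes one newline,
    # then grep counts the lines that match the word exactly (-x) as a
    # fixed string (-F).
    tr -cs "$allowed" '\n' < "$2" | grep -xcF -- "$1"
}

# Example with a throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' "it's its it's" > "$sample"
echo "default set:     $(count_word "it's" "$sample")"
echo "with apostrophe: $(count_word "it's" "$sample" "a-zA-Z0-9'")"
rm -f "$sample"
```

With the default set the apostrophe splits the word in two, so it is only found once the apostrophe is added to the allowed characters.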
