Operating System - Linux

Easy! bash, awk, sed, perl or whatever else to sum recurrence results

 

Well, points only for the first one!

Hi, I would like to count the recurrences of a string in a text file (there can be more than one recurrence per record) and, if the value attached to the string is a number, sum those values.
For example, "red" is the recurring string and we have two different records:

red=4,blu=6,red=7
gray=5,red=2
---
red=3,13 (4+7+2=13)

P.S. The file can be large, and so can the sum.

Thanks

Leo.
Jared Middleton
Frequent Advisor


Hi Leo,

I understand the example you provided, but I'm unclear on your actual request.

Correct me if I'm wrong...

You have a potentially large file containing records with multiple name/value pairs in which the "value" (after the equal sign) may or may not be an integer?

You want a script to which you can pass a string (the name), and the script should output the number of occurrences of that string along with the sum of the corresponding values (assuming they're numeric)?

In short, please elaborate on the requirements of this "challenge".

Jared
Muthukumar_5
Honored Contributor


Use perl with hash data structures.

--
Muthu
Easy to suggest when don't know about the problem!
Muthukumar_5
Honored Contributor
Solution


Use this (replace inputfile with the name of your file; the keys are sorted so the output order is stable):

perl -lan -F/,/ -e 'for (@F) { ($key, $val) = split /=/; $count{$key}++; $sum{$key} += $val; }
END { print "$_=$count{$_},$sum{$_}" for sort keys %count; }' inputfile

Output:

blu=1,6
gray=1,5
red=3,13

--
Muthu
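
For anyone who would rather stay in the shell, roughly the same count-and-sum can be sketched with awk. This is only a sketch under assumptions: the function name count_and_sum is made up for illustration, records are taken to be comma-separated name=value pairs as in Leo's sample, fields without an "=" are skipped (the Perl one-liner counts them instead), and a value that does not start with a number adds 0 to the sum.

```shell
# count_and_sum FILE: hypothetical helper, not part of any tool named in this thread.
# Prints name=count,sum for every name found in FILE, sorted by name.
count_and_sum() {
    awk -F, '
    {
        for (i = 1; i <= NF; i++) {
            # Split each field at the first "=" into name and value.
            eq = index($i, "=")
            if (eq == 0) continue          # no "=" in this field, skip it
            name  = substr($i, 1, eq - 1)
            value = substr($i, eq + 1)
            count[name]++
            sum[name] += value             # text with no leading number counts as 0
        }
    }
    END {
        # %.0f keeps large sums from being truncated to a 32-bit integer.
        for (name in count)
            printf "%s=%d,%.0f\n", name, count[name], sum[name]
    }' "$1" | sort
}
```

It would be called as count_and_sum inputfile. The pipe through sort is there because awk does not guarantee any particular order when looping over an array.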

Re: Easy! bash, awk, sed, perl or whatever else to sum recurrence results

Thank you Jared and Muthu. I think I should have been more precise about my task. I would like to count different things in a mail log file for statistics, so I would like to pass the script the name of the thing to count (nrcpt=xx, the number of mails sent to a single domain, size=xxxxx). Then I can use this "function" in my bash script.
Thank you!
Leo.
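
A rough sketch of the kind of reusable function described above: it takes the field name as a parameter and reports how many times that field occurs and the total of its numeric values. The function name sum_field is invented for illustration, and the assumed layout (fields such as size=1234 and nrcpt=2 preceded by a space or a comma, as in Postfix-style log lines) is an assumption; a real mail log may look different. The name is used as part of a regular expression, so it should be a plain word.

```shell
# sum_field NAME FILE: hypothetical helper for illustration only.
# Counts every NAME=<digits> occurrence in FILE and sums the digits.
sum_field() {
    awk -v name="$1" '
    {
        # Prepend a space so a field at the start of the line also has a separator.
        line = " " $0
        # Consume one "<separator>NAME=<digits>" match at a time.
        while (match(line, "[ ,]" name "=[0-9]+")) {
            hit = substr(line, RSTART, RLENGTH)
            sub(/^.*=/, "", hit)           # keep only the digits after "="
            count++
            total += hit
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { printf "%s=%d,%.0f\n", name, count, total }' "$2"
}
```

In a bash script it could be called as sum_field nrcpt /var/log/maillog (the path is an assumption; use wherever your mail log lives). Requiring a space or comma before the name keeps a request for size from also matching a longer field name that merely ends in size.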
Stuart Browne
Honored Contributor


Given the nature of log files, especially mail log files, it would be better to write the entire thing in Perl.

What Muthukumar has already given you is the basis of it. You just need to select which keys you want to print, instead of looping through all of them and printing every one.

Trust me when I say that it will be easier in the long run, rather than trying to swap back-and-forth between shell and perl scripts.
One long-haired git at your service...
Muthukumar_5
Honored Contributor


Yes, as Stuart said, use the same script to count nrcpt=xx, the number of mails sent to a single domain, and size=xxxxx. It will work. Try my script above on a small example file first, and post back if you run into a problem.

--
Muthu
Vincent Fleming
Honored Contributor


You should take a look at LogWatch (I think I found it on SourceForge) - it's a set of Perl scripts that scan and summarize most of your logs, including maillog.

I've hacked mine up a bit to give me some more interesting statistics, but the basic script (which is long) works great as an example - I'm not much of a Perl guy, but it's close enough to C that anyone with some programming background should be able to grok the needed changes.

LogWatch already comes with breakdowns of message sizes (how many messages 0-10k, 10-20k, etc.), total bytes transferred, biggest relays, etc.

Good stuff - I highly recommend it.
No matter where you go, there you are.
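
To give an idea of what a message-size breakdown like the one mentioned above involves, here is a small awk sketch. It is not LogWatch's code. The function name size_histogram is invented here, the bucket width of 10240 bytes is an assumption about what "10k" means, and it assumes each message contributes one size=<bytes> field preceded by a space or a comma, as in Postfix-style log lines.

```shell
# size_histogram FILE: hypothetical helper, not taken from LogWatch.
# Counts size=<bytes> fields per 10 KiB bucket and prints the non-empty buckets.
size_histogram() {
    awk '
    {
        line = " " $0
        while (match(line, /[ ,]size=[0-9]+/)) {
            # Skip the separator and "size=" (6 characters) to get the digits.
            bytes = substr(line, RSTART + 6, RLENGTH - 6) + 0
            bucket = int(bytes / 10240)
            seen[bucket]++
            if (bucket > max) max = bucket
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END {
        # Walk the buckets in ascending order, skipping empty ones.
        for (b = 0; b <= max; b++)
            if (b in seen)
                printf "%dk-%dk: %d\n", b * 10, (b + 1) * 10, seen[b]
    }' "$1"
}
```

It would be called as size_histogram followed by the path to the log file.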
Sandman!
Honored Contributor


Hi Leonardo,

Though this thread is almost three weeks old, it's intriguing enough to take a stab at. You've probably solved it already, but just in case, here's my attempt...although belated :)

Save the attached file (containing awk commands) as "myawkscr" and run it against your input file as follows:

# awk -f myawkscr inputfile

cheers!