
Re: Script enhancement

 
SOLVED
Ferdie Castro
Advisor

Script enhancement

Hi All,

I have a set of files: filea, fileb, filec, and so on (file*; there can be more). Each file contains comma-delimited data like the example below:
2003-11-09 04:13:07,baddete,30000,5,35,2003-11-08 23:59:29
2003-11-09 04:12:43,gerry,30000,35,85,2003-11-08 23:59:14
2003-11-09 04:13:32,lance,30000,35,35,2003-11-08 23:59:49
2003-11-09 04:13:32,lance,30000,35,178,2003-11-08 23:59:49
2003-11-09 04:13:32,lance,30000,35,605,2003-11-08 23:59:49


I want to write a script that prints, for each person in $2 (for example gerry, lance, baddete), the number of records in which $1 (the date), $3 (for example 30000) and $4 (5 or 35 in the example above) are all the same. It should also print $1, $3 and $4.
Sample output:

lance 2 2003-11-09 04:13:32 30000 35
(that is: $2, the count, $1, $3, $4, meaning this event appears twice across all file*)
Can you help me here? Thanks so much.

10 REPLIES
Henrik BOYE
Occasional Advisor

Re: Script enhancement

Hi,
use awk:
awk -v FS="," '{print $1 " "$4" "$3 }' filelist
or
for i in file*
do
awk -v FS="," '{print $1 " "$4" "$3 }' "$i"
done
Graham Cameron_1
Honored Contributor

Re: Script enhancement

Ferdie

That's a tall order.

I can get you started with

for f in file?
do
awk -F, '{printf "%s 1 %s %s %s\n", $2, $1, $3, $4}' $f >> intermediate_file
done

Then you'd have to do some sorting on intermediate_file to find the duplicates and sum them.
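
One possible way to do that sorting and summing is sort piped into uniq -c, sketched below. The temporary directory and the sample rows are assumptions made for illustration; the rows are what the awk command above would write for the five sample records in the question.

```shell
# Hypothetical sample standing in for intermediate_file
# (format: name 1 date time field3 field4).
tmp=$(mktemp -d)
cat > "$tmp/intermediate_file" <<'EOF'
baddete 1 2003-11-09 04:13:07 30000 5
gerry 1 2003-11-09 04:12:43 30000 35
lance 1 2003-11-09 04:13:32 30000 35
lance 1 2003-11-09 04:13:32 30000 35
lance 1 2003-11-09 04:13:32 30000 35
EOF

# sort makes identical lines adjacent; uniq -c prefixes each distinct
# line with its count. awk then puts the count where the placeholder
# "1" was and keeps only lines that occurred more than once.
summed=$(sort "$tmp/intermediate_file" | uniq -c |
  awk '$1 > 1 { print $2, $1, $4, $5, $6, $7 }')
echo "$summed"
rm -rf "$tmp"
```

With these five rows the only line printed is for lance, whose key occurs three times.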

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
TSaliba
Trusted Contributor

Re: Script enhancement

hi

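cd dir
# file_all itself matches file*, so remove any old copy before rebuilding it
rm -f file_all
cat file* > file_all
while read -r line
do
# split each record on commas; backticks capture the field value
DATE=`echo "$line" | awk -F, '{ print $1 }'`
NAME=`echo "$line" | awk -F, '{ print $2 }'`
VAL1=`echo "$line" | awk -F, '{ print $3 }'`
VAL2=`echo "$line" | awk -F, '{ print $4 }'`

# count the lines that contain all three values
COUNT=`grep "$DATE" file_all | grep "$VAL1" | grep -c "$VAL2"`
echo "$NAME $COUNT $DATE $VAL1 $VAL2"
done < file_all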

NB: NOT TESTED

TS
jj
Ferdie Castro
Advisor

Re: Script enhancement

To simplify everything, I ran cat file* > masterfile.
Now I need the number of occurrences for each $2
where $1, $3 and $4 are the same.
The output can be
$2 occurrences= x $1 $3 $4

lance occurrences= 2 2003-11-09 04:13:32 30000 35

The problem is how to print x.
PS: it should print only entries with more than one occurrence.
If you can help me, please use awk; it is much faster.

Thanks.
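
A minimal single-pass sketch of this count in awk, using an associative array so that no sorting is needed. It assumes the name ($2) is part of the grouping key along with $1, $3 and $4; the temporary directory is hypothetical, and the sample rows are the ones from the first post.

```shell
# Hypothetical copy of masterfile built from the sample records.
tmp=$(mktemp -d)
cat > "$tmp/masterfile" <<'EOF'
2003-11-09 04:13:07,baddete,30000,5,35,2003-11-08 23:59:29
2003-11-09 04:12:43,gerry,30000,35,85,2003-11-08 23:59:14
2003-11-09 04:13:32,lance,30000,35,35,2003-11-08 23:59:49
2003-11-09 04:13:32,lance,30000,35,178,2003-11-08 23:59:49
2003-11-09 04:13:32,lance,30000,35,605,2003-11-08 23:59:49
EOF

result=$(awk -F, '
  {
    # Assumption: group by name, date, $3 and $4 together.
    key = $2 SUBSEP $1 SUBSEP $3 SUBSEP $4
    count[key]++
  }
  END {
    for (key in count)
      if (count[key] > 1) {          # print repeated events only
        split(key, f, SUBSEP)
        printf "%s occurrences= %d %s %s %s\n",
               f[1], count[key], f[2], f[3], f[4]
      }
  }' "$tmp/masterfile")
echo "$result"
rm -rf "$tmp"
```

The order of a for (key in count) loop is unspecified in awk, so pipe the output through sort if a stable order matters.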
Elmar P. Kolkman
Honored Contributor

Re: Script enhancement

I think you could do it like this:

sort -d, -k 1,3,4,2 masterfile | awk '
prev == $1 $3 $4 { count++; }
prev != $1 $3 $4 {
printf "%s occurences = %d %",
prevlab,count,prev;
prev=$1 $3 $4;
prevlab=$2;
count=0
}
END {
printf "%s occurences = %d %",
prevlab,count,prev;
}'

Depending on what you find most important, you could change the sort order from 1,3,4,2 to 2,1,3,4, meaning that you get output per column 2 instead of per date.
Every problem has at least one solution. Only some solutions are harder to find.
Ferdie Castro
Advisor

Re: Script enhancement

Hi Elmar,
An error occurred; it may be in the usage:
sort: illegal option -- ,
Usage: sort [-AbcdfiMmnru] [-T Directory] [-tCharacter] [-y kilobytes] [-o File]
[-k Keydefinition].. [[+Position1][-Position2]].. [-z recsz] [File]..
occurences = 0 %
root#
TSaliba
Trusted Contributor

Re: Script enhancement

hi
In my reply, the variable COUNT holds x,
so to print only x > 1, add the following:
if [ "$COUNT" -gt 1 ]
then
echo ....
fi
jj
Henrik BOYE
Occasional Advisor
Solution

Re: Script enhancement

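Save this awk program as ttt.awk:
# begin ttt.awk

BEGIN {
    LAST = ""
    COUNT = 0
    FS = "|"
}
# First record: start the first group.
NR == 1 {
    LAST = $0
    LAST1 = $1
    LAST2 = $2
    COUNT = 1
}
NR > 1 {
    if (LAST == $0) {
        COUNT += 1
    } else {
        # Input is sorted, so a different line closes the previous group.
        print LAST1 " " LAST2 " Count : " COUNT
        COUNT = 1
        LAST = $0
        LAST1 = $1
        LAST2 = $2
    }
}
# Print the final group, which the rules above never flush.
END {
    if (NR > 0) print LAST1 " " LAST2 " Count : " COUNT
}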

# cut here

Example, with input files tt1 and tt2:

cat tt? | awk -v FS="," '{print $2 "|"$1" "$4" "$3 }' | sort |awk -f ttt.awk
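
To keep only the events that occur more than once, as requested earlier in the thread, the output of this pipeline could be passed through one more awk filter. This is a sketch; the sample lines below are assumed output in the "name key Count : n" format that ttt.awk prints.

```shell
# Assumed sample lines in the format printed by ttt.awk.
counts='baddete 2003-11-09 04:13:07 5 30000 Count : 1
gerry 2003-11-09 04:12:43 35 30000 Count : 1
lance 2003-11-09 04:13:32 35 30000 Count : 3'

# $NF is the last field on the line, i.e. the count;
# a pattern with no action prints the matching lines.
repeated=$(printf '%s\n' "$counts" | awk '$NF > 1')
echo "$repeated"
```

In practice the filter would simply be appended to the pipeline: ... | sort | awk -f ttt.awk | awk '$NF > 1'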

Elmar P. Kolkman
Honored Contributor

Re: Script enhancement

Sorry, I found the problem too. My mistake was passing 'cut'-style arguments to sort. It should be:
sort -t "," -k 1,1 -k 3,3 -k 4,4 -k 2,2 | .....

(-t instead of -d, and one -k per key field)
Elmar P. Kolkman
Honored Contributor

Re: Script enhancement

And some 's' characters went missing from the printf statements, I see. Each format string should end in %s\n:
printf "%s occurrence = %d %s\n",prevlab,count,prev

Sorry.
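
Putting the corrections from this thread together, the sort-then-count approach might look like the sketch below. Several details are assumptions rather than part of the original script: the -F, option on awk, one -k option per sort key, starting a new group when the name changes as well as the key, the emit helper function, and printing only counts above 1 (taken from the earlier request). The temporary directory and sample file are hypothetical.

```shell
# Hypothetical copy of masterfile built from the sample records.
tmp=$(mktemp -d)
cat > "$tmp/masterfile" <<'EOF'
2003-11-09 04:13:07,baddete,30000,5,35,2003-11-08 23:59:29
2003-11-09 04:12:43,gerry,30000,35,85,2003-11-08 23:59:14
2003-11-09 04:13:32,lance,30000,35,35,2003-11-08 23:59:49
2003-11-09 04:13:32,lance,30000,35,178,2003-11-08 23:59:49
2003-11-09 04:13:32,lance,30000,35,605,2003-11-08 23:59:49
EOF

# Sort on fields 1, 3, 4, then 2, so equal events are adjacent.
grouped=$(sort -t, -k1,1 -k3,3 -k4,4 -k2,2 "$tmp/masterfile" | awk -F, '
  # emit: print the finished group if it occurred more than once.
  function emit() {
    if (count > 1)
      printf "%s occurrences = %d %s\n", prevlab, count, prev
  }
  {
    key = $1 " " $3 " " $4
    if (key != prev || $2 != prevlab) {   # a new group starts here
      emit()
      prev = key; prevlab = $2; count = 0
    }
    count++
  }
  END { emit() }')                        # flush the last group
echo "$grouped"
rm -rf "$tmp"
```

Counting inside a single rule, instead of two separate pattern rules, avoids the off-by-one that comes from resetting count to 0 after the first record of a group has already been read.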