Re: Need awk script for below requirements

Swetha reddy · ‎04-25-2006

1|222|22|45|99
1|221|33|33|88
6|333|21|65|12

I need to check for duplicate rows (keys: 1 column, 5th column).

Output:

Number of rows=3
Number of duplicate rows= 1

Note:

The code should be able to handle millions of records.

RAC_1 · ‎04-25-2006

I don't follow. What exactly do you want? If two rows have the same value in the first column, should they be treated as duplicates?

There is no substitute to HARDWORK

Peter Godron · ‎04-25-2006

Swetha,
cut -d'|' -f1 data.lis | uniq -c
would return:
2 1
1 6

which translates into:
2 records with a key of value 1
1 record with a key of 6
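
One caveat worth spelling out: uniq only collapses identical lines that are adjacent, so the counts above rely on the file already being sorted on the first column. A minimal sketch of the difference, using a hypothetical scratch file with the three sample rows reordered (not the poster's real data):

```shell
# Hypothetical scratch file: the three sample rows, deliberately reordered
# so that the two rows with key 1 are no longer adjacent.
data=$(mktemp)
printf '%s\n' '1|222|22|45|99' '6|333|21|65|12' '1|221|33|33|88' > "$data"

# Without sorting, uniq sees the keys 1, 6, 1 and reports three separate groups.
unsorted=$(cut -d'|' -f1 "$data" | uniq -c | awk 'END {print NR}')

# Sorting first brings equal keys together, giving two groups (1 and 6).
sorted=$(cut -d'|' -f1 "$data" | sort | uniq -c | awk 'END {print NR}')

echo "groups without sort: $unsorted"
echo "groups with sort: $sorted"
rm -f "$data"
```

If the input is not known to be sorted, putting sort in front of uniq is the safer default.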

Victor Fridyev · ‎04-25-2006

Hi,

If you really need to count duplicates in the first column, the right code is
cut -d'|' -f1 data.lis | sort | uniq -c

If you need to check for duplicates across more than one column, then

awk -F'|' '{printf("%s|%s\n",$1,$5)}' data.lis | sort | uniq -c
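
To get the two summary lines from the original question rather than per-key counts, the composite key can also be tallied inside awk. This is only a sketch: it assumes a "duplicate" is any row whose (column 1, column 5) pair has already appeared on an earlier row, and it runs on a throwaway file whose last row is made up. The seen array holds one entry per distinct key, so for very large files the sort-based pipeline may be gentler on memory.

```shell
# Throwaway input; the last row is made up so that one (col1, col5) pair repeats.
data=$(mktemp)
printf '%s\n' '1|222|22|45|99' '1|221|33|33|88' '6|333|21|65|12' '1|555|44|77|99' > "$data"

# seen[key]++ evaluates to 0 (false) the first time a key appears and to a
# non-zero value afterwards, so dup is incremented only for repeat occurrences.
# The comma in the subscript joins $1 and $5 with SUBSEP, keeping them apart.
summary=$(awk -F'|' '
    seen[$1, $5]++ { dup++ }
    END {
        print "Number of rows=" NR
        print "Number of duplicate rows=" (dup + 0)
    }' "$data")

echo "$summary"
rm -f "$data"
```

No sort is needed here because the array remembers every key regardless of where it appeared in the file.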

HTH

Entities are not to be multiplied beyond necessity - RTFM

Peter Godron · ‎04-25-2006

Victor,
I assumed the files were already sorted, based on a previous thread by the same poster:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1018063

Ninad_1 · ‎04-25-2006

Swetha,

It is not very clear what you treat as a duplicate when you say "keys 1 column, 5th column". Do you mean
the 1st and 5th columns, OR
the 1st to 5th columns (treat as duplicate when all fields are the same), OR
the 1st column only?

If it's the 1st column only, then the simplest thing I can think of is

norows=$(wc -l t1.dat | awk '{print $1}')
duplicate=$(echo "$norows - $(cut -f 1 -d "|" t1.dat | sort -u | wc -l | awk '{print $1}')" | bc)
echo "Number of rows=$norows"
echo "Number of duplicate rows=$duplicate"

Regards,
Ninad

Peter Nikitka · ‎04-25-2006

Hi,

Since your request is ambiguous, I am making some assumptions:
- 'duplicate' means duplicate in col1 OR col5
- a value that occurs more than twice is counted once for each repeat
- the data are in the input file /tmp/data

sort -t'|' -k1n,1 -k5n /tmp/data |
awk -F'|' 'NR==1 {c1=$1;c5=$5; next}
{if($1==c1) dup1++; else c1=$1
if ($5==c5) dup5++; else c5=$5}
END {print "Number of rows",NR;print "Number of duplicate rows",dup1+dup5}'

The first line is handled specially; otherwise it would be reported as a duplicate if its first column were empty.
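
A side note on the pipeline above: because the sort orders on column 1 first, equal column 5 values are only caught when they happen to land on adjacent lines. As a hedged alternative under the same assumptions (duplicate in col1 OR col5, each repeat counted), the two columns can be tracked separately in awk arrays without sorting. The file and the two extra rows below are invented for illustration, and the arrays hold one entry per distinct value, which may matter for very large inputs.

```shell
# Invented sample: the three rows from the question plus two rows chosen so that
# column 5 repeats (88) on lines that a sort on column 1 would not make adjacent.
data=$(mktemp)
printf '%s\n' '1|222|22|45|99' '1|221|33|33|88' '6|333|21|65|12' \
              '6|444|10|10|50' '7|555|11|11|88' > "$data"

# c1 and c5 remember the values already seen in columns 1 and 5 respectively.
# The post-increment yields 0 on first sight, so only repeats are counted.
report=$(awk -F'|' '
    c1[$1]++ { dup1++ }
    c5[$5]++ { dup5++ }
    END {
        print "Number of rows", NR
        print "Number of duplicate rows", dup1 + dup5
    }' "$data")

echo "$report"
rm -f "$data"
```

On this invented input the sketch counts two repeats in column 1 (keys 1 and 6) and one in column 5 (value 88).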

Kind regards, Peter

The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
