04-25-2006 07:09 PM
Need awk script for below requirements
1|221|33|33|88
6|333|21|65|12
I need to check for duplicate rows (keys: 1st column, 5th column).
Output:
Number of rows=3
Number of duplicate rows=1
Note:
The code should be able to handle millions of records.
04-25-2006 07:14 PM
Re: Need awk script for below requirements
04-25-2006 07:33 PM
Re: Need awk script for below requirements
cut -d'|' -f1 data.lis | uniq -c
would return:
2 1
1 6
which translates into:
2 records with a key of value 1
1 record with a key of 6
04-25-2006 07:41 PM
Re: Need awk script for below requirements
If you really need to count duplicates in the first column, the correct code is (uniq only collapses adjacent lines, so the input must be sorted first)
cut -d'|' -f1 data.lis | sort | uniq -c
If you need to check for duplicates across more than one column, then
awk -F'|' '{printf("%s|%s\n",$1,$5)}' data.lis | sort | uniq -c
HTH
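The requested summary can also be produced in a single awk pass without sorting. This is only a minimal sketch, assuming the key is the pair (column 1, column 5); the file name and the second sample row are hypothetical, made up for illustration.

```shell
# Hypothetical sample file; the second row is an assumed key duplicate of the first.
data=/tmp/dup_demo.$$
cat > "$data" <<'EOF'
1|221|33|33|88
1|999|44|55|88
6|333|21|65|12
EOF

# A row counts as a duplicate when its (column 1, column 5) pair was seen before.
# seen[] holds one entry per distinct key, so memory grows with the number
# of distinct keys rather than with the number of rows.
report=$(awk -F'|' '
    seen[$1 FS $5]++ { dup++ }
    END {
        print "Number of rows=" NR
        print "Number of duplicate rows=" (dup + 0)
    }' "$data")

echo "$report"
rm -f "$data"
```

If the number of distinct keys is too large to hold in memory, the sort | uniq pipeline is the safer choice, since sort can spill to temporary files.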
04-25-2006 07:46 PM
Re: Need awk script for below requirements
I assumed the files are sorted, based on the previous thread by the same poster:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1018063
04-25-2006 07:50 PM
Re: Need awk script for below requirements
It is not very clear what you treat as a duplicate when you say "keys 1 column ,5 th column". Do you mean
the 1st and 5th columns, OR
the 1st to 5th columns (a row is a duplicate when all fields match), OR
the 1st column only?
If it is the 1st column only, then the simplest thing I can think of is
norows=$(wc -l t1.dat | awk '{print $1}')
duplicate=$(echo "$norows - $(cut -f 1 -d "|" t1.dat | sort -u | wc -l | awk '{print $1}')" | bc)
echo "Number of rows=$norows"
echo "Number of duplicate rows=$duplicate"
Regards,
Ninad
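If the key turns out to be the 1st and 5th columns together, the same sort -u idea carries over by cutting both fields. A rough sketch under that assumption; the file name and the second sample row are hypothetical.

```shell
# Hypothetical sample file; the second row is an assumed key duplicate of the first.
file=/tmp/t1_demo.$$
printf '%s\n' '1|221|33|33|88' '1|999|44|55|88' '6|333|21|65|12' > "$file"

# $(( ... )) also strips any padding that wc -l may print.
norows=$(( $(wc -l < "$file") ))

# cut -f1,5 keeps both key columns; sort -u leaves one line per distinct key.
distinct=$(( $(cut -d'|' -f1,5 "$file" | sort -u | wc -l) ))

duplicate=$(( norows - distinct ))

echo "Number of rows=$norows"
echo "Number of duplicate rows=$duplicate"
rm -f "$file"
```

Because sort works through temporary files when the input does not fit in memory, this variant should cope with millions of records, at the cost of reading the file twice.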
04-25-2006 09:01 PM
Re: Need awk script for below requirements
Since your request is ambiguous, I make some assumptions:
- 'duplicate' means a duplicate in col1 OR col5
- a value that occurs more than twice counts once for each extra occurrence
- the data is in the input file /tmp/data
sort -t'|' -k1n,1 -k5n /tmp/data |
awk -F'|' 'NR==1 {c1=$1;c5=$5; next}
{if($1==c1) dup1++; else c1=$1
if ($5==c5) dup5++; else c5=$5}
END {print "Number of rows",NR;print "Number of duplicate rows",dup1+dup5}'
The first line is handled specially; otherwise it would be reported as a duplicate if its first column were empty.
Kind regards, Peter
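One thing to keep in mind with the pipeline above: the sort orders by column 1 first, so equal column-5 values are only guaranteed to end up on adjacent lines within one column-1 group. A possible alternative under the same assumptions is to track each column in its own awk array, which needs no sort at all. This is only a sketch; the file name and the second sample row are hypothetical.

```shell
# Hypothetical sample file standing in for /tmp/data.
data=/tmp/data_demo.$$
printf '%s\n' '1|221|33|33|88' '1|999|44|55|88' '6|333|21|65|12' > "$data"

# seen1[] and seen5[] remember the values already met in column 1 and column 5.
# Every repeat in either column adds one to the duplicate total.
result=$(awk -F'|' '
    seen1[$1]++ { dup1++ }
    seen5[$5]++ { dup5++ }
    END {
        print "Number of rows", NR
        print "Number of duplicate rows", dup1 + dup5
    }' "$data")

echo "$result"
rm -f "$data"
```

With the sample rows, column 1 repeats once (1) and column 5 repeats once (88), so the total is 2. Both arrays live in memory, so this suits files where the number of distinct values per column is manageable.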