how to take out duplicate ones and keep the sequences in the file
05-22-2007 07:23 AM
Thanks in advance
Solved! Go to Solution.
05-22-2007 07:41 AM
Re: how to take out duplicate ones and keep the sequences in the file
What about the "uniq" command?
Robert-Jan
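For context (not from the original post): `uniq` only collapses *adjacent* duplicate lines, which is why it only dedupes sorted input:

```shell
# uniq removes a duplicate only when it immediately follows its twin
printf 'a\nb\na\n' | uniq     # the separated "a" survives: a b a
printf 'a\na\nb\n' | uniq     # adjacent duplicates collapse: a b
```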
05-22-2007 07:41 AM
Re: how to take out duplicate ones and keep the sequences in the file
OLDFILE=/tmp/original_file
NEWFILE=/tmp/new_file
: > $NEWFILE
while read LINE
do
EXISTS=`grep -cx "$LINE" $NEWFILE`
if [ $EXISTS -eq 0 ]
then
# The line is not in the new file yet
echo "$LINE" >> $NEWFILE
fi
done < $OLDFILE
05-22-2007 07:44 AM
Re: how to take out duplicate ones and keep the sequences in the file
05-22-2007 07:55 AM
Re: how to take out duplicate ones and keep the sequences in the file
# perl -ne 'push @list,$_ unless $found{$_}++;END{print for (@list)}' file
Regards!
...JRF...
05-22-2007 07:57 AM
Re: how to take out duplicate ones and keep the sequences in the file
----------------------------------------
#!/usr/bin/ksh
TDIR=${TMPDIR:-/var/tmp}
UNIQUES=${TDIR}/F${$}.uniq
DUPS=${TDIR}/F${$}.dup
TFILE=${TDIR}/F${$}.tmp
trap 'rm -f ${UNIQUES} ${DUPS} ${TFILE}' 0 1 2 3 15
# Copy stdin to a temp file
rm -f ${TFILE} ${DUPS}
while read X
do
echo "${X}" >> ${TFILE}
done
# Sort temp file and find unique words
sort ${TFILE} | uniq -u > ${UNIQUES}
echo "\c" > ${DUPS} # null file
# Now read temp file; if word is unique echo it
cat ${TFILE} | while read X
do
grep -q "${X}" ${UNIQUES}
STAT=${?}
if [[ ${STAT} -eq 0 ]]
then
echo "${X}"
else
# not found in Unique file; see if it is in dups
grep -q "${X}" ${DUPS}
STAT=${?}
if [[ ${STAT} -ne 0 ]]
then # not already written; echo to stdout and insert in dups file
echo "${X}"
echo "${X}" >> ${DUPS}
fi
fi
done
exit 0
-----------------------------------------
Use it like this:
removedups.sh < infile > outfile
What it does is first copy each line of stdin to a temporary file. Next, that temporary file is sorted and passed to uniq -u to create a second temporary file containing only the unique lines. We then reread the temporary file and use grep -q to determine if the line is unique; if so, we echo it to stdout. If not, we need to determine whether this is the first time that the duplicated line has been echoed: we use grep to examine a third temporary file, and if the line is not found there, we echo it to stdout and also append it to that third file. When finished, a trap removes all the temporary files; your duplicates have been removed and the original order has been preserved.
NOTE: This still should have been done in Perl.
05-22-2007 08:23 AM
Re: how to take out duplicate ones and keep the sequences in the file
uniq oldfile > newfile
that is it
05-22-2007 08:28 AM
Re: how to take out duplicate ones and keep the sequences in the file
> uniq oldfile > newfile
that is it
*NO* it's not, unless the input file is sorted.
...JRF...
05-22-2007 08:48 AM
Re: how to take out duplicate ones and keep the sequences in the file
05-22-2007 08:50 AM
Re: how to take out duplicate ones and keep the sequences in the file
You are right. I just found out why I cannot use "uniq". Thanks.
05-22-2007 08:55 AM
Solution:
# awk '{x[$1]++;if(x[$1]==1) print $1}' inputfile
~cheers
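The accepted command above keys on the first field ($1); for deduplicating whole lines, the same idea is often written as follows (this variant is my addition, not from the thread):

```shell
# Print a line only the first time its exact text is seen;
# the array cell starts at 0 (false), so !seen[$0]++ is true once per line
printf 'red\nblue\nred\ngreen\nblue\n' | awk '!seen[$0]++'
# -> red, blue, green
```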
05-23-2007 01:17 AM
Re: how to take out duplicate ones and keep the sequences in the file
cat test.words |
grep -n '.*' |
sort -u -t: -k2 |
sort -t: -1n |
cut -d: -f2- > test.words.sansdupes
1. Prefix a line number and : to each line
2. Sort by remainder of line and remove dupes.
3. Sort by line number
4. Remove line number
Interesting,
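The four steps above can be sketched end-to-end (sample data is hypothetical, and the numeric re-sort is written as -k1,1n, the portable form suggested later in the thread):

```shell
printf 'b\na\nb\nc\na\n' |
  grep -n '.*' |          # 1. prefix each line with its number and ":"
  sort -u -t: -k2 |       # 2. sort by the text after ":", dropping duplicates
  sort -t: -k1,1n |       # 3. re-sort numerically by line number
  cut -d: -f2-            # 4. strip the line-number prefix
```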
05-23-2007 07:09 PM
Re: how to take out duplicate ones and keep the sequences in the file
Yes, that's how I would do it. Except you can refine your steps:
$ nl -ba -s: -nrz test.words | sort -t: -u -k2,2 | sort -t: -n -k1,1 |
cut -d: -f2- > test.words.sansdupes
I'm not sure why you had sort -1n. It worked, but you would be hard-pressed to prove it was legal from sort(1).
The problem with Ivan's and Clay's solutions is that they will be really slow if there are lots of lines, because they search each line against all the others.
>Clay: # Copy stdin to a temp file
This can be done with cat - > file
>echo "\c" > ${DUPS} # null file
This can be done with just: > ${DUPS}
> grep -q "${X}" ${UNIQUES}
The only advantage over Ivan's is that the uniques file is smaller.
Sandman's solution trades off memory for speed, so would be good for small files.