Operating System - HP-UX

Re: how to take out duplicate ones and keep the sequences in the file

 
SOLVED
Go to solution
Sandman!
Honored Contributor
Solution

Re: how to take out duplicate ones and keep the sequences in the file

For uniq(1) to work, the repeated lines need to be adjacent, and sorting the file first to make them adjacent would destroy the original order of the items. See the uniq(1) man page for details. The awk construct below should do what you want, so give it a try:

# awk '{x[$1]++;if(x[$1]==1) print $1}' inputfile
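As a quick sanity check (the sample data and /tmp path are made up), the one-liner prints each word only the first time it is seen, preserving input order. Note that it keys on $1, the first field; key on $0 instead if lines may contain spaces:

```shell
# Hypothetical sample with out-of-order duplicates
printf 'apple\nbanana\napple\ncherry\nbanana\n' > /tmp/inputfile

# Print each word the first time it is seen; later repeats are skipped
awk '{x[$1]++; if (x[$1]==1) print $1}' /tmp/inputfile
```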

~cheers
drb_1
Occasional Advisor

Re: how to take out duplicate ones and keep the sequences in the file

Though I personally prefer a one-line perl for such tasks, I was intrigued to discover how easily this can be done in shell.

grep -n '.*' test.words |
sort -u -t: -k2 |
sort -t: -1n |
cut -d: -f2- > test.words.sansdupes

1. Prefix a line number and : to each line
2. Sort by remainder of line and remove dupes.
3. Sort by line number
4. Remove line number
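The four steps can be exercised end to end on a throwaway file (the file names here are invented); in this sketch I've quoted the grep pattern and written the second sort in POSIX -k form:

```shell
printf 'red\ngreen\nred\nblue\ngreen\n' > /tmp/test.words

grep -n '.*' /tmp/test.words |               # 1. prefix "NUM:" to each line
  sort -u -t: -k2 |                          # 2. sort by the text, drop duplicates
  sort -t: -k1,1n |                          # 3. restore original order by number
  cut -d: -f2- > /tmp/test.words.sansdupes   # 4. strip the number

cat /tmp/test.words.sansdupes   # first occurrences, in order (with GNU sort)
```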

Interesting,
Dennis Handly
Acclaimed Contributor

Re: how to take out duplicate ones and keep the sequences in the file

>drb: 1. Prefix a line number and : to each line

Yes, that's how I would do it. Except you can refine your steps:
$ nl -ba -s: -nrz test.words | sort -t: -u -k2,2 | sort -t: -n -k1,1 |
cut -d: -f2- > test.words.sansdupes
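To see what the decorated stream looks like, nl with -ba numbers every line (including blanks), -s: uses ':' as the separator, and -nrz right-justifies the numbers with leading zeros (six digits by default, at least with GNU nl), so they also sort correctly as plain text. Sample data made up:

```shell
printf 'foo\nbar\nfoo\n' | nl -ba -s: -nrz
# 000001:foo
# 000002:bar
# 000003:foo
```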

I'm not sure why you had sort -1n. It worked, but you would be hard-pressed to prove it was legal from sort(1).

The problem with Ivan's and Clay's solutions is that they will be really slow if there are lots of lines, because they search each line against all the others.

>Clay: # Copy stdin to a temp file

This can be done with cat - > file

>echo "\c" > ${DUPS} # null file

This can be done with just: > ${DUPS}

> grep -q "${X}" ${UNIQUES}

The only advantage over Ivan's is that the uniques file is smaller.

Sandman's solution trades memory for speed, so it would be good for small files.
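For reference, the same hash-based idea is often written as an even shorter awk one-liner keyed on the whole line (so lines containing spaces are compared in full); this is my own sketch, not from the thread:

```shell
# Print a line only when its seen-counter is still zero
printf 'a b\nc d\na b\nc\n' | awk '!seen[$0]++'
# a b
# c d
# c
```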