Operating System - HP-UX

Re: how to take out duplicate ones and keep the sequences in the file

 
SOLVED
Go to solution
Sandman!
Honored Contributor
Solution

Re: how to take out duplicate ones and keep the sequences in the file

For uniq(1) to work, the repeated lines need to be adjacent, and sorting the file first to make them adjacent would destroy the original order of the items. See the uniq(1) man page for details. The awk construct below should do what you want, so give it a try:

# awk '{x[$1]++;if(x[$1]==1) print $1}' inputfile
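As a quick sanity check (the sample data and /tmp path are made up), the one-liner prints each word only the first time it is seen, preserving input order. Note that it keys on $1, the first field; key on $0 instead if lines may contain spaces:

```shell
# Hypothetical sample with out-of-order duplicates
printf 'apple\nbanana\napple\ncherry\nbanana\n' > /tmp/inputfile

# Print each word the first time it is seen; later repeats are skipped
awk '{x[$1]++; if (x[$1]==1) print $1}' /tmp/inputfile
```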

~cheers
drb_1
Occasional Advisor

Re: how to take out duplicate ones and keep the sequences in the file

Though I personally prefer a one-line perl for such tasks, I was intrigued to discover how easily this can be done in shell.

grep -n '.*' test.words |
sort -u -t: -k2 |
sort -t: -1n |
cut -d: -f2- > test.words.sansdupes

1. Prefix a line number and : to each line
2. Sort by remainder of line and remove dupes.
3. Sort by line number
4. Remove line number
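The four steps can be exercised end to end on a throwaway file (the file names here are invented); in this sketch I've quoted the grep pattern and written the second sort in POSIX -k form:

```shell
printf 'red\ngreen\nred\nblue\ngreen\n' > /tmp/test.words

grep -n '.*' /tmp/test.words |               # 1. prefix "NUM:" to each line
  sort -u -t: -k2 |                          # 2. sort by the text, drop duplicates
  sort -t: -k1,1n |                          # 3. restore original order by number
  cut -d: -f2- > /tmp/test.words.sansdupes   # 4. strip the number

cat /tmp/test.words.sansdupes   # first occurrences, in order (with GNU sort)
```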

Interesting,
Dennis Handly
Acclaimed Contributor

Re: how to take out duplicate ones and keep the sequences in the file

>drb: 1. Prefix a line number and : to each line

Yes, that's how I would do it. Except you can refine your steps:
$ nl -ba -s: -nrz test.words | sort -t: -u -k2,2 | sort -t: -n -k1,1 |
cut -d: -f2- > test.words.sansdupes
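To see what the decorated stream looks like, nl with -ba numbers every line (including blanks), -s: uses ':' as the separator, and -nrz right-justifies the numbers with leading zeros (six digits by default, at least with GNU nl), so they also sort correctly as plain text. Sample data made up:

```shell
printf 'foo\nbar\nfoo\n' | nl -ba -s: -nrz
# 000001:foo
# 000002:bar
# 000003:foo
```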

I'm not sure why you had sort -1n. It worked, but you would be hard-pressed to prove it was legal from sort(1).

The problem with Ivan's and Clay's solutions is that they will be really slow if there are lots of lines, because they search each line against all the others.

>Clay: # Copy stdin to a temp file

This can be done with cat - > file

>echo "\c" > ${DUPS} # null file

This can be done with just: > ${DUPS}

> grep -q "${X}" ${UNIQUES}

The only advantage over Ivan's is that the uniques file is smaller.

Sandman's solution trades memory for speed, so it would be good for small files.
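For reference, the same hash-based idea is often written as an even shorter awk one-liner keyed on the whole line (so lines containing spaces are compared in full); this is my own sketch, not from the thread:

```shell
# Print a line only when its seen-counter is still zero
printf 'a b\nc d\na b\nc\n' | awk '!seen[$0]++'
# a b
# c d
# c
```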