
A script

 
ivy1234
Frequent Advisor

A script

I have a file with contents like the following:

aa,bb,cc
aa,cc,dd
cc,dd,ee
ff,zz
dd,aa


The file has many lines; the values are separated by a "," sign and may be duplicated. I want to clean up the contents as follows:

1) if a value is duplicated, output it only once
2) each value should be output on a line of its own.

So my desired output is:

aa
bb
cc
dd
ee
ff
zz

Can anyone advise how to do this? Thanks.
13 REPLIES
Raj D.
Honored Contributor

Re: A script

ivy,

check this out:

$ cat file | tr "," "\n" | uniq -u

Hth,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "
ivy1234
Frequent Advisor

Re: A script

Thanks,

but it does not handle the duplicate case; the output still contains duplicates.

If a value is duplicated, I want it output only once. How can I do that? Thanks.
ivy1234
Frequent Advisor

Re: A script

Thanks.

The "| uniq -u" does not seem to work in this case.
Raj D.
Honored Contributor

Re: A script

ivy,
You can use uniq -c and cut the numeric first field,

$ cat file | tr "," "\n" | uniq -c | cut -c 1-2

I can't check it right now, as I don't have a system at hand.
Hth,
Raj.

Steven Schweda
Honored Contributor

Re: A script

man uniq
man sort

alp$ < 1474162.txt tr ',' '\n' | sort | uniq
aa
bb
cc
dd
ee
ff
zz
alp$

Try it first without the "| uniq".
Hein van den Heuvel
Honored Contributor

Re: A script


For 'uniq' to work, the stream has to be sorted first.

From the man page:
"DESCRIPTION
Discard all but one of successive identical lines from INPUT"

Here the solution with "tr | sort | uniq" probably works just fine.
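
A minimal sketch of that point, using the sample data from the original question. The helper name "sample" and the variable names are only illustrative assumptions, not anything from the thread:

```shell
# Emit the sample input from the question (helper name is hypothetical).
sample() {
  printf '%s\n' 'aa,bb,cc' 'aa,cc,dd' 'cc,dd,ee' 'ff,zz' 'dd,aa'
}

# Without sort: uniq only collapses adjacent identical lines,
# and no two equal values are adjacent here, so every repeat survives.
unsorted=$(sample | tr ',' '\n' | uniq)

# With sort: equal values become adjacent, so uniq can drop the repeats.
sorted=$(sample | tr ',' '\n' | sort | uniq)

printf '%s\n' "$unsorted" | wc -l   # 13 lines, duplicates still present
printf '%s\n' "$sorted" | wc -l     # 7 lines, one per distinct value
```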

For a modest dataset you may also want to play with Perl, which allows for trickier splitting, parsing, counting and printing.

In this simple example we set a hash entry for each word found and, at the end (after the "eskimo kiss" }{ :-), print all the keys thus established, sorted:

$ perl -lne '$x{$_}=1 for split /,/ } { print for (sort keys %x) ' x.txt
aa
bb
cc
dd
ee
ff
zz


fwiw,
Hein
Raj D.
Honored Contributor

Re: A script

Ivy,
Here you go with awk,
# cat file | tr "," "\n" | awk '!x[$0]++'


aa
bb
cc
dd
ee
ff
zz
#
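
For anyone unfamiliar with the idiom: !x[$0]++ is true only the first time a given line is seen, so awk prints each value once, in first-seen order rather than sorted order. A rough sketch of the same idea written out in full, assuming awk's -F option is used to split on commas so that tr is not needed (the function name and the array name "seen" are illustrative only):

```shell
# Print each comma-separated value the first time it appears.
dedupe_fields() {
  awk -F',' '{
    for (i = 1; i <= NF; i++)   # walk every field on the line
      if (!seen[$i]++)          # count is 0 only on first sight
        print $i
  }'
}

printf '%s\n' 'aa,bb,cc' 'aa,cc,dd' 'cc,dd,ee' 'ff,zz' 'dd,aa' | dedupe_fields
```

With the sample data the first-seen order happens to match the sorted order; on other input the two can differ.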

Enjoy, Have fun! Remember to assign points to all posts,
Raj.
Mel Burslan
Honored Contributor

Re: A script

Raj is still missing the point. uniq only detects duplicate lines that are consecutive.

aa
bb
aa
bb
bb

run through uniq, will generate:

aa
bb
aa
bb

not

aa
bb

the one liner should be something like this:

cat file | tr "," "\n" | sort | uniq

Hope this helps
________________________________
UNIX because I majored in cryptology...
Steven Schweda
Honored Contributor

Re: A script

> cat file | tr "," "\n" | sort | uniq

Geez. Why didn't _I_ think of that. No,
wait...

And my version lacked the (much hated) "cat".
And, when picoseconds count, I figure that
'x' should be faster than "x" -- no looking
for dollar signs in 'x'.
Dennis Handly
Acclaimed Contributor

Re: A script

>Steven: ... | sort | uniq

You can optimize this by using "sort -u".
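
A quick sketch to check that the two forms agree on the sample data from the question (the helper and variable names are assumptions for illustration):

```shell
# Emit the sample input from the question (helper name is hypothetical).
sample() {
  printf '%s\n' 'aa,bb,cc' 'aa,cc,dd' 'cc,dd,ee' 'ff,zz' 'dd,aa'
}

with_uniq=$(sample | tr ',' '\n' | sort | uniq)   # two processes
with_u=$(sample | tr ',' '\n' | sort -u)          # sort removes duplicates itself

# Compare the two results.
[ "$with_uniq" = "$with_u" ] && echo "same output"
```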
Raj D.
Honored Contributor

Re: A script

Mel, thanks, good thing I learned about uniq: it only detects duplicate lines that are consecutive. I sometimes used to wonder why uniq was not working properly; now it makes sense. Thank you.


Arturo Galbiati
Esteemed Contributor

Re: A script

cat file | tr ',' '\n'|sort -u
HTH,
Art
Raj D.
Honored Contributor

Re: A script

# cat file|tr "," "\n" | awk '!x[$0]++' #Enjoy!.