
Re: scripting question

 
Dan Copeland
Regular Advisor

scripting question

How can you sort on a column of a text file and remove the duplicates from that column?

I assume I can use the sort command, but I haven't had much success w/ it.

Attached is a sample of the file. I want to sort on the Serial # column and remove duplicates.

tia,
Frank
15 REPLIES
Scott Van Kalken
Esteemed Contributor

Re: scripting question

you can use sort -k 5

but I think it may trip up, because some rows have nothing in the column where others have BCV.

So essentially, some rows have fewer fields.

For removing the duplicates, you can use uniq

Scott.
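
A small sketch of the field-count problem described above. The three sample rows and the column range 21-24 are made up for illustration (the real layout is in the attachment). The second row has an empty Type column, so whitespace-delimited field 5 does not exist for it, while fixed character positions still line up.

```shell
# Hypothetical sample rows; row 2 has a blank Type column.
sample() {
  printf '%s\n' \
    '/dev/a BCV EMC SYMM 0002' \
    '/dev/b     EMC SYMM 0001' \
    '/dev/c BCV EMC SYMM 0001'
}

# Field-based extraction: field 5 is missing on the short row.
serial_by_field() { awk '{ print $5 }'; }

# Position-based extraction: the same characters on every row.
serial_by_column() { cut -c21-24; }

sample | serial_by_field              # the second line comes out empty
sample | serial_by_column | sort -u   # one line per distinct serial
```

This is why most of the later replies cut on character positions rather than on fields.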
Uncle Liew
Advisor

Re: scripting question

Hi Frank,

I think the best option is to use Microsoft Excel.

Ftp the file to your PC.
In your PC,
Remove the Headers:

Device Product Device
----------------------- --------- --------------------- ------------------
Name Type Vendor ID Rev Ser Num Cap (KB)
----------------------- --------- --------------------- ------------------


Later you can add them back in.

Open MS Excel.

Under File --> Open, choose your .txt file.

After you have successfully imported the data, cut the Serial # column and paste it into the first Excel column.

Highlight all the columns.

Under Data --> Sort.

That's it .....

Hope this helps.
Patrick Chim
Trusted Contributor

Re: scripting question

Hi,

I think it is a bit difficult to use sort here because the rows have different numbers of columns. As I see in your file, there are blank values in the TYPE field, so when you use sort some of the fields will be shifted.

I'll try to find another method, or maybe other experts here can do it with sort! :)

Regards,
Patrick
H.Merijn Brand (procura
Honored Contributor

Re: scripting question

# perl -ne '1..5 and print,next;$snr=substr($_,56,9);$x{$snr}||=$_;END{print$x{$_}for(sort keys%x)}' xx.dta
Enjoy, Have FUN! H.Merijn
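
For anyone who finds the one-liner dense, here is roughly the same logic as a shell sketch: pass the header lines through untouched, keep the first row seen for each serial number, and print the kept rows in serial order. The function name `dedupe_by_serial` and its arguments are invented for this sketch; the defaults (5 header lines, serial in columns 57-65) only mirror the one-liner's `1..5` and `substr($_,56,9)` and may not match your file.

```shell
# Hypothetical helper, not from the thread.
# $1 = input file, $2 = header lines (default 5),
# $3 = first column of the serial (default 57), $4 = its width (default 9)
dedupe_by_serial() (
  file=$1 hdr=${2:-5} start=${3:-57} len=${4:-9}
  head -n "$hdr" "$file"                      # headers pass through as-is
  tail -n +"$((hdr + 1))" "$file" |
    awk -v s="$start" -v l="$len" '
      { key = substr($0, s, l) }
      !seen[key]++ { print key "\t" $0 }      # first row per key, key prepended
    ' |
    LC_ALL=C sort |                           # order by the prepended key
    cut -f2-                                  # drop the key again
)

# Usage, assuming the data is in xx.dta:
#   dedupe_by_serial xx.dta > sorted.dta
```

Prepending the key and cutting it off afterwards avoids relying on sort's field splitting, which is the part that breaks on rows with a blank Type column.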
Patrick Chim
Trusted Contributor

Re: scripting question

Hi,

Can you try the following script:

for i in `cut -c57-66 | sort -u`
do
grep "$i" | head -1
done >

I suggest you cut off all the header and trailer lines before you run this script.

Regards,
Patrick
Supporto Unix
Occasional Advisor

Re: scripting question

Hi
If you want only that column, sorted and without duplicates, try this:

more text_file | grep "^/"|cut -c57-64|sort|uniq

bye
Bjoern Myrland
Advisor

Re: scripting question

Another one (simple ;) )

# cat |cut -c57-65|sort -u

This checks every line, so headers and irrelevant information should be removed first. You could do this with grep, for example:
# cat |grep -i '/dev'|cut -c57-65|sort -u
Robin Wakefield
Honored Contributor

Re: scripting question

Frank,

In case procura's script gives syntax errors, try:

perl -ne '1..5 and print,next;$snr=substr($_,56,9);$x{$snr}||=$_;END{for(sort keys%x){print$x{$_}}}'

Rgds, Robin.
H.Merijn Brand (procura
Honored Contributor

Re: scripting question

Robin, what version do you use? I think that 'for' as statement modifier works in 5.6.1 as well.

OTOH it might indeed be good to remember that not all of you run perl-5.8.0, and certainly not like me with the defined-or patches in :)
Enjoy, Have FUN! H.Merijn
Robin Wakefield
Honored Contributor

Re: scripting question

Hi procura, I tried it on an early 5.004 version (yeah I know), so I'm sure it's OK in later releases. I was just trying to show what needs to be done if it does fail.

Rgds, Robin
Pierce Byrne_1
Frequent Advisor

Re: scripting question

Try this, you may want to mess about with formatting but it should work
The results go to file "sortedfile"
"sorter" is the source file


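echo "HEADER INFO" > sortedfile
for snum in `grep rdsk sorter | cut -c57-64 | sort -u`
do
  # first line that mentions this serial number anywhere
  linedets=`grep "${snum}" sorter | head -n1`
  lineSnum=`echo "${linedets}" | cut -c57-64`
  # only keep it if the match really is in the serial column
  if [ "$lineSnum" = "$snum" ]
  then
    # quoted so the column spacing is preserved
    echo "$linedets" >> sortedfile
  fi
done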
Robin Wakefield
Honored Contributor

Re: scripting question

Hi Frank,

This is an awk version, inc. a sort routine:

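awk 'NR<8 { print }                      # pass the header lines through
NR>7 { a[substr($0,56,9)]=$0 }           # one row per serial number
END {
  i=0
  for (b in a) {                         # collect the keys
    array[i]=b
    i++
  }
  for (j=1; j<=i-1; ++j)                 # insertion sort on the keys
    for (k=j; array[k-1]>array[k]; --k) {
      temp=array[k]
      array[k]=array[k-1]
      array[k-1]=temp
    }
  for (i=0; i<j; i++) {                  # j now equals the number of keys
    print a[""array[i]]
  }
}' filename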

Rgds, Robin

Sean OB_1
Honored Contributor

Re: scripting question

Frank,

You can use the -u option of sort. Or the uniq command.

Sean
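
One thing to keep in mind when choosing between the two: `uniq` only collapses adjacent identical lines, so it needs sorted input, whereas `sort -u` sorts and de-duplicates in one step. Both compare whole lines unless sort is given a key, which is why the other replies extract the serial column first. A tiny sketch with made-up serial numbers:

```shell
# Hypothetical input: one duplicate, not adjacent.
serials() { printf '%s\n' 0002 0001 0002; }

serials | uniq          # unsorted input: both 0002 lines survive
serials | sort | uniq   # sorted first: duplicates are adjacent and collapse
serials | sort -u       # same result in one command
```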
john korterman
Honored Contributor

Re: scripting question

Hi Frank,
I take it that you only want to write out each serial number once, namely for the first occurrence among the items sharing that serial number. That is at least what the attached script does. However, the headings are messed up.
regards,
John K.
it would be nice if you always got a second chance
Jordan Bean
Honored Contributor

Re: scripting question


Correct me if I'm mistaken, but it looks like you're not using PowerPath. If you were, the last three digits of the serial numbers would be unique to each path. It would serve you best to sort on the first five digits.

Use this Perl script:

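#!/usr/bin/perl
use strict;
use integer;
our $h={};   # first line seen for each key
our $k;
while(<>){
next if /^\s*$/;               # skip blank lines
print,next unless m[^/];       # pass non-device lines (headers) through
# Four fixed-width groups: name, type, vendor/ID/rev, serial/capacity
@_ = unpack('A23 A10 A22 A18',$_);
# Keep the first two words of each group: 8 slots, padded with undef
@_ = map { (split(/\s+/,$_),undef)[0,1] } @_;
# Slot 4 is the vendor, slot 6 the serial number
$k = ( $_[4] eq 'EMC' )?substr($_[6],0,5):$_[6];
$h->{$k} = $_ unless defined $h->{$k};
}
foreach $k (sort keys %$h) { print $h->{$k}; }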


It breaks up the input line from syminq more than you really need, just in case you want to manipulate more than the serial number. It's probably not the most efficient, but it seems to work.
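
To see what sorting on the first five digits changes in practice, the sketch below de-duplicates the same made-up serial numbers twice, once on the full value and once on a five-character prefix. The helper name `dedupe_on` and the sample values are hypothetical and not taken from syminq output.

```shell
# Keep the first line for each distinct substring (start column, width),
# then sort the surviving lines.
dedupe_on() {
  awk -v s="$1" -v l="$2" '!seen[substr($0, s, l)]++' | LC_ALL=C sort
}

# Hypothetical serials: the first two share a 5-digit prefix.
serials() { printf '%s\n' 12345001 12345002 67890001; }

serials | dedupe_on 1 8   # full serial: all three lines are distinct
serials | dedupe_on 1 5   # 5-digit prefix: 12345001 and 12345002 collapse
```

With the shorter key, rows that differ only in the trailing digits count as duplicates and only the first one is kept.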