Sean OB_1
Honored Contributor

Scripting problem.

Hello!

I have a scripting problem I need some help with.

I need to go through a file and remove records with a duplicate key field. That part is easy; the hard part is that which record gets removed depends on the value of another field.

Here is a sample input file:

2443:oliveira:jacquelyn:f:424225:employee:234:ef4442
3465:garvey:thomas:j:554322:employee:235:ef4422
4135:swan:monroe::314642:employee:245:sw4443
4135:swan:monroe::314642:student:245:sf4345
1244:praxel:lynn:m:652424:employee:243:ef4432
1244:praxel:lynn:m:652424:student:243:sf4432
7665:anderson:thomas:j:554322:student:265:sf4422

In this file there are two duplicated people: Swan and Praxel. I need to keep the employee record if its last field doesn't start with sw.

However, if the employee record's last field starts with sw, then I need to keep the student record.

So for the above, this is the resulting file I need:

2443:oliveira:jacquelyn:f:424225:employee:234:ef4442
3465:garvey:thomas:j:554322:employee:235:ef4422
4135:swan:monroe::314642:student:245:sf4345
1244:praxel:lynn:m:652424:employee:243:ef4432
7665:anderson:thomas:j:554322:student:265:sf4422


TIA,

Sean
10 REPLIES
George A Bodnar
Trusted Contributor
Solution

Re: Scripting problem.

awk -F: '{if (match($NF,"^sw") == 0) print}' < file
Rodney Hills
Honored Contributor

Re: Scripting problem.

Here is a simple Perl script. It displays one record per unique ID; if an ID has a "sw" type record, that is the record it displays.

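# The input file name was lost from this post; here it is taken from the command line.
open(INP,"<",$ARGV[0]) or die "cannot open $ARGV[0]: $!";
while(<INP>) {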
chomp;
@a=split(":",$_);
$code=substr($a[7],0,2);
$ix=$code eq "sw" ? 1 : 0;
$hold{$a[0]}[$ix]=$_;
}
foreach $nbr (sort keys %hold) {
if ($rec=$hold{$nbr}[1]) { print $rec,"\n"; }
else { print $hold{$nbr}[0],"\n"; }
}

HTH

-- Rod Hills
There be dragons...
H.Merijn Brand (procura
Honored Contributor

Re: Scripting problem.

l1:/tmp 177 > cat xx.txt
2443:oliveira:jacquelyn:f:424225:employee:234:ef4442
3465:garvey:thomas:j:554322:employee:235:ef4422
4135:swan:monroe::314642:employee:245:sw4443
4135:swan:monroe::314642:student:245:sf4345
1244:praxel:lynn:m:652424:employee:243:ef4432
1244:praxel:lynn:m:652424:student:243:sf4432
7665:anderson:thomas:j:554322:student:265:sf4422
l1:/tmp 178 > perl -naF: -e'push@{$e{$F[1]}},[@F]}END{$,=":";for(keys%e){@x=@{$e{$_}};@x>1 and@x=grep{$_->[-1]!~/^sw/}@x;@x>1 and@x=grep{$_->[-3]=~/^e/}@x;print@$_ for@x}' xx.txt
3465:garvey:thomas:j:554322:employee:235:ef4422
4135:swan:monroe::314642:student:245:sf4345
1244:praxel:lynn:m:652424:employee:243:ef4432
7665:anderson:thomas:j:554322:student:265:sf4422
2443:oliveira:jacquelyn:f:424225:employee:234:ef4442
l1:/tmp 179 >
Enjoy, Have FUN! H.Merijn
Rodney Hills
Honored Contributor

Re: Scripting problem.

If you want a one-liner, how about-

perl -naF: -e '{next if $t{$F[0]} eq "sw";$h{$F[0]}=$_;$t{$F[0]=substr($F[7],0,2)}END{print values %h}'

This collects each record into %h, keyed on ID number (1st field), until a "sw" record is found; after that, no more records are collected for that ID. The output will not be in any sorted order.

-- Rod Hills
There be dragons...
harry d brown jr
Honored Contributor

Re: Scripting problem.


What do you want to do when the ONLY record has in the last field a string that starts with "sw" ?? Like if the following record was in the file (NO duplicate):

4465:duck:donald::884712:student:239:sw6969



live free or die
harry
Live Free or Die
H.Merijn Brand (procura
Honored Contributor

Re: Scripting problem.

Rodney, I thought he wanted to use $F[1] as the key, but maybe I'm reading it wrong.
Harry, my solution already catches that:

If for field [1] (the name) there is only one record, print it. If there is more than one, filter out the records where the last field starts with sw. If there is still more than one record for that key, use the employee one.

That yielded what he wanted.
Enjoy, Have FUN! H.Merijn
Tom Maloy
Respected Contributor

Re: Scripting problem.

Slight change to Rodney's one-liner:

perl -naF: -e '{
next if defined($t{$F[0]}) && $t{$F[0]} ne "sw";
$h{$F[0]}=$_;
$t{$F[0]}=substr($F[7],0,2);
}
END{print values %h}'

Tom
Carpe diem!
Sean OB_1
Honored Contributor

Re: Scripting problem.

Harry,

In theory there should NEVER be an ID whose only record has a sw field.

But in any case if there is only one record for the ID, then that record needs to be kept.

The basis of this is a student file and an employee file being combined into a unique file.

It is possible that a person could be an actual employee and student, in which case we want the employee record. It's also possible that a person could be a student and student worker (employee) in which case we want the student record.

Anyone in only one of the two files should be kept.
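
For reference, the rules above can be sketched in awk as a single pass that also preserves input order. This is only a sketch under assumptions: the ID (field 1) is the key, field 6 holds employee or student, and a last field starting with sw marks the student-worker employee record. The function name dedupe_people and the ranking scheme are illustrative and not taken from any reply in this thread.

```shell
#!/bin/sh
# dedupe_people FILE...: hypothetical helper, keeps one record per ID (field 1).
dedupe_people() {
    awk -F: '
    # Lower rank wins (assumed priorities, derived from the stated rules):
    #   1 = employee, 2 = student,
    #   3 = student-worker record (last field starts with "sw").
    function rank(kind, code) {
        if (code ~ /^sw/) return 3
        return (kind == "employee") ? 1 : 2
    }
    {
        r = rank($6, $NF)
        if (!($1 in best)) order[++n] = $1     # remember first-seen order of IDs
        if (!($1 in best) || r < best[$1]) {   # first record, or a better one
            best[$1] = r
            line[$1] = $0
        }
    }
    # Print the chosen record for each ID in the order the IDs first appeared.
    END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

Run it as dedupe_people xx.txt. An ID with only one record is always kept, even if its last field starts with sw, and if two records for an ID have the same rank the first one read is kept.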

Rodney Hills
Honored Contributor

Re: Scripting problem.

Harry,

Both of my solutions handle that situation. They grab whatever records are available into a hash variable and then dump out what was collected (minus duplicates, and keeping "sw" even if it is the only record for that id).

Syntax fix to my previous one liner-
perl -naF: -e '{next if $t{$F[0]} eq "sw";$h{$F[0]}=$_;$t{$F[0]}=substr($F[7],0,2)}END{print values %h}' xx.txt

(forgot a closing '}')...

-- Rod Hills
There be dragons...
Sean OB_1
Honored Contributor

Re: Scripting problem.

Well we found out that there were other issues with the data, and this script is no longer needed.

I didn't get a chance to test all of your solutions, so everyone gets 8 points for their efforts.

Thanks a lot for the help.

Sean