Operating System - HP-UX

Ahmad Munawwar
Frequent Advisor

Need help on creating script to split data

Hi,

I need some help and guidance to create a shell script that splits data into different files.

I have data in one file that looks like this:

file1:
|A|LR|
|B|LR|
|B|FO|
|C|LR|
|D|LR|
|D|FO|
|E|LR|
|F|LR|
|G|LR|
|G|FO|

I want to split the "double entry" records B, D and G into one file and the "single entry" records A, C, E and F into another file.

I would appreciate it if you could help me do so.

Regards,
Munawwar



9 replies
Steven E. Protter
Exalted Contributor

Re: Need help on creating script to split data

grep -E '^\|(A|C|E|F)\|' file1 > file2


grep -E '^\|(B|D|G)\|' file1 > file3


SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
H.Merijn Brand (procura
Honored Contributor

Re: Need help on creating script to split data

perl -pe'BEGIN{open A,">A";open B,">B"}select(/\b[BDG]\b/?B:/\b[ACEF]\b/?A:STDOUT)' file

Enjoy, Have FUN! H.Merijn
Ahmad Munawwar
Frequent Advisor

Re: Need help on creating script to split data

Hi Steven,

The thing is that I have about 30,000 such records in one file.

Actually, the first field holds a number:

A = 12345
B = 34521
C = 25431
D = 43521
E = 54213
F = 32541
G = 45123

For the duplicate data, i.e. B, D and G, both records share the same number.
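With real numeric keys and about 30,000 records, listing the keys by hand does not scale. A minimal sketch, assuming the key is always the second `|`-separated field (the leading `|` makes field 1 empty, as in the sample), derives both key lists from the data itself; the function names `dup_keys` and `single_keys` are hypothetical:

```shell
# Print the keys (field 2 of '|'-separated records) that occur more than once.
# sort groups equal keys together, which uniq needs: it only compares adjacent lines.
dup_keys() {
    cut -d'|' -f2 "$1" | sort | uniq -d
}

# Print the keys that occur exactly once.
single_keys() {
    cut -d'|' -f2 "$1" | sort | uniq -u
}
```

For the sample data, `dup_keys file1` should list B, D and G, and `single_keys file1` should list A, C, E and F, whatever order the records arrive in.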

H.Merijn Brand (procura
Honored Contributor

Re: Need help on creating script to split data

perl -pe'BEGIN{open A,">file1";open B,">file2"}select(/\b(34521|43521|45123)\b/?B:/\b(12345|25431|54213|32541)\b/?A:STDOUT)' file

TMTOWTDI

Enjoy, Have FUN! H.Merijn
Peter Godron
Honored Contributor

Re: Need help on creating script to split data

Munawwar,

#!/usr/bin/sh
# Extract the keys that occur only once
# (uniq -u only compares adjacent lines, so the input must be sorted by key)
cut -d'|' -f2 datafile.lis | uniq -u > unique.lis
# Turn each key into an anchored pattern: KEY -> ^|KEY|
sed "1,$ s/^/^|/" unique.lis > unique2.lis
sed "1,$ s/$/|/" unique2.lis > unique.lis
rm unique2.lis
# Extract the records with unique keys
grep -f unique.lis datafile.lis > unique.data
# Extract the records with duplicate keys
grep -vf unique.lis datafile.lis > dup.data
rm unique.lis

datafile.lis is the input filename.

Regards

Hein van den Heuvel
Honored Contributor

Re: Need help on creating script to split data


I like the uniq method myself.
Do you know the records are in order, and have just a single duplicate per key?

Here is some somewhat convoluted awk to do the job:

----- x.awk ---------
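# At end of input, flush the last record still held in "last"
END{if (dup){print last>>"dups"} else {print last}}
{ if ($2==key) {                  # same key as the previous record: it is a dup
      print last>>"dups";
      dup=1;
  } else {
      if (dup) {                  # key changed; previous record closed a dup group
          print last>>"dups";
          dup=0;
      } else {
          if (NR>1) {print last}; # previous record was a single (none before the first)
      }
  }
  last=$0;                        # hold this record until the next key is seen
  key=$2;
}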

It processes the previous record based on whether the current key matches the previous key or not.
It has to avoid printing anything for the first record (there is no previous one yet), and it has to special-case the END block for the very last record. Yikes.

Usage with your sample data in file 'x'

# awk -F"|" -f x.awk x
|A|LR|
|C|LR|
|E|LR|
|F|LR|

# cat dups
|B|LR|
|B|FO|
|D|LR|
|D|FO|
|G|LR|
|G|FO|
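The same "records are sorted by key" assumption can also be handled by buffering one group of equal keys at a time and writing the whole group out when the key changes, which avoids the special cases for the first and last record. A minimal sketch as a shell function; the name `split_sorted` and its output-file arguments are hypothetical, and, like x.awk, it assumes equal keys are adjacent:

```shell
# Split a '|'-separated file whose records are already grouped by key
# (field 2): groups of 2+ records go to file $2, single records go to file $3.
split_sorted() {
    awk -F'|' -v dupfile="$2" -v singlefile="$3" '
        # Write the buffered group to the right file, then empty the buffer.
        function flush(   i, out) {
            if (n == 0) return
            out = (n > 1) ? dupfile : singlefile
            for (i = 1; i <= n; i++) print buf[i] > out
            n = 0
        }
        NR == 1 || $2 != key { flush(); key = $2 }   # key changed: emit previous group
        { buf[++n] = $0 }                             # remember this record
        END { flush() }                               # emit the final group
    ' "$1"
}
```

Usage with the sample data: `split_sorted x dups singles`. Only one group is held in memory at a time, and groups of any size (not just pairs) end up in the dups file.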


If you only have 30,000 records or so, you can readily suck them all into perl and spit them back out based on whether they are dups or not:
----- x.pl -----------
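while (<>) {
    $key = (split(/\|/))[1];     # field 1 is the key; the bar must be escaped in the regex
    $records{$key} .= $_;        # append this record to any earlier ones with the same key
}
open (DUPS, ">dups") or die "Could not open dups: $!";
foreach $key (sort keys %records) {
    $_ = $records{$key};
    if (/\n\|/) {print DUPS} else {print};   # newline followed by a bar => 2+ records
}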
-----------------
So each record here gets concatenated with any prior data for its key. If there was nothing, the entry is just that new record; if there was something, the new record is appended to it.
When everything has been read, walk the keys and fetch the data for each key. If there is a newline followed by a bar in that data, it must hold more than one record: a dup!



Usage: # perl x.pl x
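For readers who want to stay in the shell, the same "keep everything in memory" idea can be sketched in awk. This is a hypothetical equivalent, not the perl script above: it keeps a count and the concatenated records per key (field 2) in associative arrays, works on unsorted input, and writes to the files named by its second and third arguments. Unlike x.pl, which sorts the keys, awk's `for (k in ...)` visits keys in an unspecified order, so the groups may come out in any order.

```shell
# Split a '|'-separated file by whether its key (field 2) occurs more than once.
# The whole file is held in memory, so the input order does not matter.
split_in_memory() {
    awk -F'|' -v dupfile="$2" -v singlefile="$3" '
        {
            count[$2]++                    # how many records share this key
            recs[$2] = recs[$2] $0 "\n"    # append the record to its key group
        }
        END {
            # Key order from "for (k in ...)" is unspecified in awk.
            for (k in recs)
                printf "%s", recs[k] > (count[k] > 1 ? dupfile : singlefile)
        }
    ' "$1"
}
```

Records that share a key are still written out together, so each dup group stays contiguous; pipe the result through `sort` if a particular order matters.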


hth,
Hein.

Ahmad Munawwar
Frequent Advisor

Re: Need help on creating script to split data

Great,

Thanks for the input... I will try tomorrow and see which one will work :-)

/munawar
H.Merijn Brand (procura
Honored Contributor

Re: Need help on creating script to split data

Let's start by being happy that you assign points, but I'd rather see the points assigned *after* you have tried, so we can see what worked and what didn't, and maybe more importantly *why* (not).

We like feedback as well. This way we can also improve ourselves.

Enjoy, Have FUN! H.Merijn
Hein van den Heuvel
Honored Contributor

Re: Need help on creating script to split data

I should learn to leave well enough alone...

Here is an alternate perl solution, suitable for much larger files. It makes two passes over the input: the first just counts the occurrences of each key, and the second prints each record to the right file based on its key's count.

----

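$file = shift @ARGV or die "please provide file";
open (IN, "<$file") or die "Could not open $file: $!";
while (<IN>) {
    $keys{(split(/\|/))[1]}++;          # pass 1: count the records for each key
}
close (IN);
open (DUPS, ">dups") or die "Could not open dups: $!";
open (IN, "<$file") or die "Could not reopen $file: $!";
while (<IN>) {                          # pass 2: route each record by its key's count
    if ($keys{(split(/\|/))[1]} > 1) {print DUPS} else {print};
}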

-----------------


variant second part:


while (<IN>) {
    $filehandle = ($keys{(split(/\|/))[1]} > 1) ? \*DUPS : \*STDOUT;
    print $filehandle $_;
}
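The two-pass idea also maps onto a single awk invocation that is given the same file twice: while reading the first copy (`NR == FNR`) it only counts keys, and while reading the second it routes each record. A minimal sketch, with a hypothetical function name and output-file arguments; like the perl version, it keeps only one counter per key in memory, and it assumes the input is a regular file that can be read twice (not a pipe):

```shell
# Two-pass split: pass 1 counts each key (field 2), pass 2 writes every record
# to file $2 if its key occurs more than once, otherwise to file $3.
split_two_pass() {
    awk -F'|' -v dupfile="$2" -v singlefile="$3" '
        NR == FNR { count[$2]++; next }                     # pass 1: count keys
        { print > (count[$2] > 1 ? dupfile : singlefile) }  # pass 2: route records
    ' "$1" "$1"
}
```

Because the second pass reads the file in its original order, both output files keep the input order, and the records do not need to be sorted by key.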



Cheers,
Hein.