topic Re: Need help on creating script to split data in Operating System - HP-UX

Need help on creating script to split data

Ahmad Munawwar — Thu, 14 Apr 2005 05:46:21 GMT

Hi,

I need some help and guidance to create a shell script to split data into different files.

I have data in one file that looks like this:

file1:
|A|LR|
|B|LR|
|B|FO|
|C|LR|
|D|LR|
|D|FO|
|E|LR|
|F|LR|
|G|LR|
|G|FO|

I want to split "double entry" B,D and G into one file and "single entry" A,C,E and F in another file.

I would appreciate it if you could help me do so.

Regards,
Munawwar

Re: Need help on creating script to split data

Steven E. Protter — Thu, 14 Apr 2005 05:52:09 GMT

Re: Need help on creating script to split data

H.Merijn Brand (procura — Thu, 14 Apr 2005 06:02:37 GMT

perl -pe'BEGIN{open A,">A";open B,">B"}select(/\b[BDG]\b/?B:/\b[ACEF]\b/?A:STDOUT)' file

Enjoy, Have FUN! H.Merijn

Re: Need help on creating script to split data

Ahmad Munawwar — Thu, 14 Apr 2005 06:18:45 GMT

Hi Steven,

The thing is that I have about 30,000 such records in one file.

Actually, the first field represents a number:

A = 12345
B = 34521
C = 25431
D = 43521
E = 54213
F = 32541
G = 45123

For the duplicated entries, i.e. B, D and G,
both lines have the same number.

Re: Need help on creating script to split data

H.Merijn Brand (procura — Thu, 14 Apr 2005 06:24:35 GMT

perl -pe'BEGIN{open A,">file1";open B,">file2"}select(/\b(34521|43521|45123)\b/?B:/\b(12345|25431|54213|32541)\b/?A:STDOUT)' file

TMTOWTDI

Enjoy, Have FUN! H.Merijn

Re: Need help on creating script to split data

Peter Godron — Thu, 14 Apr 2005 07:09:34 GMT

Munawwar,

#!/usr/bin/sh
# Extract unique only
cut -d'|' -f2 datafile.lis | uniq -u > unique.lis
# Change the format
sed "1,$ s/^/^|/" unique.lis > unique2.lis
sed "1,$ s/$/|/" unique2.lis > unique.lis
rm unique2.lis
# Extract Uniques
grep -f unique.lis datafile.lis > unique.data
# Extract Duplicates
grep -vf unique.lis datafile.lis > dup.data
rm unique.lis

datafile.lis is the input filename
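
One caveat worth spelling out for the script above: uniq only compares adjacent lines, so uniq -u finds the single-entry keys only when equal keys sit next to each other in the input. A minimal sketch of the difference, where the scratch directory (uniq_demo), the file names and the four-line sample are assumptions made up for illustration and are not part of the original script:

```shell
#!/bin/sh
# Scratch directory and sample data are hypothetical, for illustration only.
mkdir -p uniq_demo

# Key B appears twice, but not on adjacent lines.
printf '%s\n' '|B|LR|' '|A|LR|' '|B|FO|' '|C|LR|' > uniq_demo/datafile.lis

# Unsorted: uniq -u sees no adjacent repeats, so all four keys pass through.
cut -d'|' -f2 uniq_demo/datafile.lis | uniq -u > uniq_demo/keys_unsorted.lis

# Sorted first: the two B keys become adjacent and uniq -u drops them.
cut -d'|' -f2 uniq_demo/datafile.lis | sort | uniq -u > uniq_demo/keys_sorted.lis

echo "unsorted:"; cat uniq_demo/keys_unsorted.lis
echo "sorted:";   cat uniq_demo/keys_sorted.lis
```

If the real file is already ordered by key, as the sample data in this thread is, the extra sort step should not be needed.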

Regards

Re: Need help on creating script to split data

Hein van den Heuvel — Thu, 14 Apr 2005 08:17:05 GMT

I like the uniq method myself.
Do you know the records are in order, and have just a single duplicate per key?

Here is some somewhat convoluted awk to do the job:

----- x.awk ---------
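END { if (dup) { print last >> "dups" } else { print last } }   # flush the final record
{
    if ($2 == key) {                      # same key as the previous record
        print last >> "dups";
        dup = 1;
    } else {
        if (dup) {                        # previous record ended a run of duplicates
            print last >> "dups";
            dup = 0;
        } else {
            if (NR > 1) { print last };   # nothing to print before the first record
        }
    }
    last = $0;
    key = $2;
}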

It handles the previous record based on whether the current key matches the previous key.
It has to avoid printing anything before the first record, and it has to special-case the end to flush the final record. Yikes.

Usage with your sample data in file 'x'

# awk -F"|" -f x.awk x
|A|LR|
|C|LR|
|E|LR|
|F|LR|

# cat dups
|B|LR|
|B|FO|
|D|LR|
|D|FO|
|G|LR|
|G|FO|

If you just have 30,000 records or so, then you can readily suck them into perl and spit them back out based on dups or not:
----- x.pl -----------
while (<>) {
    $key = (split(/\|/))[1];          # split on a literal '|'
    $records{$key} .= $_;
}
open (DUPS, ">dups");
foreach $key (sort keys %records) {
    $_ = $records{$key};
    if (/\n\|/) {print DUPS} else {print};
}
-----------------
So here each record gets concatenated with any prior data for a given key. If there was nothing, it'll be just that new record. If there was something, it gets appended.
When all input is read, retrieve each key and the data for that key. If there is a newline + bar in the data, it must have been a dup!

Usage: # perl x.pl x
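
The same group-by-key idea can also be sketched in awk called from a shell script, for anyone who would rather not use perl. This is only a sketch under assumptions: the scratch directory (group_demo), the output names (dups, singles) and the shortened sample are made up for illustration, and it keeps keys in first-seen order instead of sorting them.

```shell
#!/bin/sh
# Scratch directory, file names and sample data are hypothetical.
mkdir -p group_demo
printf '%s\n' '|A|LR|' '|B|LR|' '|B|FO|' '|C|LR|' '|D|LR|' '|D|FO|' > group_demo/x
rm -f group_demo/dups group_demo/singles

awk -F'|' '
{
    if (!($2 in rec)) order[++n] = $2   # remember keys in first-seen order
    rec[$2] = rec[$2] $0 "\n"           # append this record to its key group
    cnt[$2]++                           # count records per key
}
END {
    for (i = 1; i <= n; i++) {
        k = order[i]
        out = (cnt[k] > 1) ? dups : singles   # pick the output file by group size
        printf "%s", rec[k] > out
    }
}' dups=group_demo/dups singles=group_demo/singles group_demo/x
```

Like x.pl, this holds every record in memory, so it suits files of the size discussed here rather than very large ones.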

hth,
Hein.

Re: Need help on creating script to split data

Ahmad Munawwar — Thu, 14 Apr 2005 10:09:16 GMT

Great,

Thanks for the input... I will try tomorrow and see which one will work :-)

/munawar

Re: Need help on creating script to split data

H.Merijn Brand (procura — Thu, 14 Apr 2005 10:14:19 GMT

Let's start with being happy that you assign points, but I'd rather see the points assigned *after* you tried, so we can see what worked and what didn't, and maybe more important *why* (not).

We like feedback as well. This way we can also improve ourselves.

Enjoy, Have FUN! H.Merijn

Re: Need help on creating script to split data

Hein van den Heuvel — Thu, 14 Apr 2005 10:42:30 GMT

I should learn to leave well enough alone...

Here is an alternate perl solution, suitable for much larger files. It makes two passes over the input. The first pass just counts the occurrences of each key. The second pass prints each record to the right file based on its key.

----

$file = shift @ARGV or die "please provide file";
open (IN,"<$file") or die "Could not open $file";
while (<IN>) {
    $keys{(split(/\|/))[1]}++;
}
open (DUPS, ">dups") or die "Could not open dups";
open (IN,"<$file") or die "Could not reopen $file";
while (<IN>) {
    if ($keys{(split(/\|/))[1]} > 1) {print DUPS} else {print};
}

-----------------

variant second part:

while (<IN>) {
    $filehandle = ($keys{(split(/\|/))[1]} > 1) ? \*DUPS : \*STDOUT;
    print $filehandle $_;
}
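
For comparison, the two-pass idea (count first, then route each record) can be sketched in awk from a shell script by naming the input file twice on the command line. The scratch directory (twopass_demo), the output names and the sample records are assumptions for illustration; the keys are the five-digit numbers mentioned earlier in the thread.

```shell
#!/bin/sh
# Scratch directory, file names and sample data are hypothetical.
mkdir -p twopass_demo
printf '%s\n' '|12345|LR|' '|34521|LR|' '|34521|FO|' '|25431|LR|' > twopass_demo/x
rm -f twopass_demo/dups twopass_demo/singles

# The input file is named twice so that awk reads it twice.
# NR == FNR is true only while the first copy is being read.
awk -F'|' '
NR == FNR { count[$2]++; next }                  # pass 1: count each key
{ print > ((count[$2] > 1) ? dups : singles) }   # pass 2: route by count
' dups=twopass_demo/dups singles=twopass_demo/singles twopass_demo/x twopass_demo/x
```

Only the per-key counts are kept in memory, and the records come out in their original order.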

Cheers,
Hein.