Organize data

 
Hi guys,

I have the following entries:

a data1 data2 data3
a data4 data5 data6
a data7 data8 data9 data10
a data11 data12 data13
d data1 data2 data3
d data4 data5 data6 data7 data8
d data9 data10 data11
b data1 data2 data3
b data4 data5 data6
b data7 data8 data9
b data10 data11 data12 data13
b data14 data15 data16

How can I reorganize it to get the following structure:

a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11

Thank you

Andre
Andre Augusto
13 REPLIES
James R. Ferguson
Acclaimed Contributor
Solution

Re: Organize data

Hi Andre:

Given the data as shown, the first thing we need to do is filter it so it can be sorted. To keep the order you want, we need to temporarily prepend a zero to the first single-digit number on each line (so that "data4" sorts before "data11"); sort the file; and then strip the added zero. We could do this like:

# perl -ple 's/(\D+)(\d)\s/${1}0${2} /' myfile | sort -k1,1 | perl -ple 's/0(\d\s)/$1/'

Next, pipe the output of the above to this:

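# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my ( $n, $tag, $oldtag ) = ( 0, undef, undef );
my @fields;
while (<>) {
    @fields = split;
    $tag    = shift @fields;    # first field is the group tag
    if ( $n == 0 ) {            # very first line: print its tag as the first header
        $n++;
        print "$tag\n";
        $oldtag = $tag;
    }
    if ( $tag ne $oldtag ) {    # tag changed: close the block and start a new one
        print "======\n", $tag, "\n";
        $oldtag = $tag;
    }
    print "@fields\n";          # the data fields without the tag
}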

Overall:

# perl -ple 's/(\D+)(\d)\s/${1}0${2} /' ./myfile | sort -k1,1 | perl -ple 's/0(\d\s)/$1/' | ./filter

a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11
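
The pad, sort and strip steps above are one instance of a general trick: add a temporary sort key, sort, then remove the key. A minimal shell sketch of the same idea, assuming the input line number is an acceptable key and that awk, sort and cut are available (the function name order_by_tag is made up for illustration):

```shell
#!/bin/sh
# order_by_tag (hypothetical name): read lines on stdin and write them
# grouped by their first field, keeping each group in input order.
order_by_tag() {
    tab=$(printf '\t')
    # 1. Decorate:   tag <TAB> input line number <TAB> original line
    # 2. Sort:       by tag, then numerically by line number
    # 3. Undecorate: drop the two helper columns again
    awk '{ print $1 "\t" NR "\t" $0 }' |
        sort -t "$tab" -k1,1 -k2,2n |
        cut -f3-
}
```

It could stand in for the perl | sort | perl part of the pipeline, for example: order_by_tag < ./myfile | ./filter. Because the key is the line number rather than a padded digit, it does not depend on how many digits the data items have.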


Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: Organize data

Hi (again) Andre:

If the above post meets your needs, this is a simple integrated script:

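# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my ( $n, $tag, $oldtag ) = ( 0, undef, undef );
my ( @list, @fields );
while (<>) {
    s/(\D+)(\d)\s/${1}0${2} /;    # pad the first single-digit number with a zero
    push @list, $_;
}
@list = sort @list;
for (@list) {
    s/0(\d\s)/$1/;                # strip the padding zero again
    @fields = split;
    $tag    = shift @fields;      # first field is the group tag
    if ( $n == 0 ) {              # very first line: print its tag as the first header
        $n++;
        print "$tag\n";
        $oldtag = $tag;
    }
    if ( $tag ne $oldtag ) {      # tag changed: close the block and start a new one
        print "======\n", $tag, "\n";
        $oldtag = $tag;
    }
    print "@fields\n";            # the data fields without the tag
}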

Then, simply run as:

# ./filter file
a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11

Regards!

...JRF...

Re: Organize data

Hi JRF...

Wow... I really appreciate your answers. Can you send a shell script version too? It is going to be part of another script that I'm writing...

Your answers help me a lot, but if you could help once more...

Regards

Andre
Andre Augusto
Hein van den Heuvel
Honored Contributor

Re: Organize data

Andre, why not just call the Perl script from the outer shell, as you would a function?
If you cannot accept a Perl solution, then please say so. Also, please reconsider, as Perl is often rather efficient and just about as easy to maintain as a shell script, even for non-Perl folks.

In your example output is there a "=====" missing after the last block, or before the first block, or is this exactly as desired?

Here is another (Perl, sorry) alternative:


#------------- group.pl --------

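#!/usr/bin/perl
use strict;
use warnings;
my ($name, %tables);
while (<>) {
    # $1 = group (first character), $2 = rest of the line;
    # skip lines that do not match so a stale $1/$2 is never reused
    next unless m/^(.)\s(.*)/;
    push @{ $tables{$1} }, $2;
}
foreach $name (sort keys %tables) {
    print "$name\n";
    foreach (@{$tables{$name}}) {
        print "$_\n";
    }
    print "=======\n";
}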

This script uses a hash with an array for each group. As data comes in, each line is pushed onto the end of the array in the hash identified by its first character.
[note: if you want the key to be a whole word, use: m/^(\S+)\s+(.*)/ ]
When all data is read, the script gets the sorted keys of the hash (those first characters), then uses each key to fetch its array and print it.

Use as:

./group.pl list.txt > group.txt

or

perl group.pl list.txt > group.txt
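
For a shell script that cannot call Perl, the same "one list per group, then print the groups by sorted key" idea can be sketched with awk arrays. This is only an illustration under a few assumptions: the function name group_with_arrays is made up, the key is the whole first word, and the keys are sorted by hand inside awk because portable awk has no built-in sort. Like group.pl, it prints the separator after every group, including the last.

```shell
#!/bin/sh
# group_with_arrays (hypothetical name): stdin -> grouped output on stdout.
group_with_arrays() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        {
            tag = $1
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")       # strip the tag, keep the data
            if (tag in lines) {
                lines[tag] = lines[tag] "\n" $0   # append to an existing group
            } else {
                keys[++n] = tag                   # remember each new tag once
                lines[tag] = $0
            }
        }
        END {
            # Insertion sort of the tags, compared as strings
            for (i = 2; i <= n; i++) {
                k = keys[i]
                for (j = i - 1; j >= 1 && (keys[j] "") > (k ""); j--)
                    keys[j + 1] = keys[j]
                keys[j + 1] = k
            }
            for (i = 1; i <= n; i++) {
                print keys[i]
                print lines[keys[i]]
                print "======="
            }
        }'
}
```

A possible call would be: group_with_arrays < list.txt > group.txt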

hth,
Hein.

Re: Organize data

Hi Hein,

Thank you very much for your alternative and explanation. I'm happy to accept any of your smart suggestions. Like you said, I can use the Perl solution, and both work very well.

I'm new to Perl, and these scripts help me learn many tips, logic and commands, and they are very efficient. I'd still like to receive a shell suggestion, because for 2 days I have been studying and trying to write one, but I didn't get it...

Any other solution will be welcome (perl, shell, etc).

The "=====" in the output is just to separate the blocks; it is not a requirement.

Thank you again

Andre
Andre Augusto
Sandman!
Honored Contributor

Re: Organize data

If you are familiar with awk(1) then you can try the construct below. Simply copy and paste it inside your shell script:

awk '{
    for(i=2;i<=NF;i++) l=l?l" "$i:$i
    x[$1]=x[$1]?x[$1]"\n"l:l
    l=""
} END {
    for(i in x) {
        if (i!=prev) print "======"
        print i"\n"x[i]
        prev=i
    }
}' file
Hasan  Atasoy
Honored Contributor

Re: Organize data

Hi Andre,

I did not test it, but this should work.

cat file | cut -b 0-1 | sort | uniq | while read VAR1
do
echo $VAR1
grep "^${VAR1} file | cut -b 1-120
echo "======="
done

Hasan.

Re: Organize data

Hello guys,

Very good help...

Hasan, I've just changed the value of "cut -b 1-120" to "cut -b 3-120" and it works fine.

Sandman, the output of your routine begins with the last block in the sequence (the one with the letter "d"):
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11
======
a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16

Can we change it, and put it in alphabetical order?

Thanks all
Andre Augusto
Hein van den Heuvel
Honored Contributor

Re: Organize data

Sandman,
Correct me if I'm wrong, but your solution seems to rely on the order in which elements are returned from an awk array.
This cannot be relied upon. My documentation explicitly states: "The order in which elements of the array are accessed (by this statement) is determined by the internal arrangement of the array elements within awk and cannot be controlled or changed". Change the leading a in the 2nd or 3rd line to 'x' and try again.
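
One way around the ordering problem is to avoid "for (i in x)" altogether: sort first, then let awk print a header whenever the first field changes. A minimal sketch, assuming your sort supports the -s (stable) option, which GNU and BSD sort have but POSIX does not require; the function name group_streaming is made up for illustration:

```shell
#!/bin/sh
# group_streaming (hypothetical name): stdin -> grouped output on stdout.
# sort -s keeps lines with the same first field in their input order.
group_streaming() {
    sort -s -k1,1 | awk '
        NF == 0 { next }                      # ignore blank lines
        !started || ($1 "") != prev {         # first line, or the tag changed
            if (started) print "======"       # separator only between groups
            print $1
            prev = $1 ""
            started = 1
        }
        {
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")   # strip the tag, keep the data
            print
        }'
}
```

Used as: group_streaming < file. The groups come out in sort order, with the separator only between blocks, as in the requested layout.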

Hasan,

Not bad. Not bad. But why not try it?

To make this work change to:

grep "^${VAR1}" file | cut -b 3-120

And optionally:

$ cut -b 0-1 file | sort -u | while read VAR1
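
Putting both corrections together, the whole loop might look like the sketch below. Two things in it are substitutions of mine rather than part of the posts above: it selects by space-separated field (cut -d ' ' -f) instead of by byte range, since some cut implementations (GNU cut, for one) reject a byte position of 0 and a field also copes with tags longer than one character; and the grep pattern ends in a space so that a tag "a" does not also pick up lines starting with "ab". It assumes single spaces between fields, no blank lines, and tags without regular-expression metacharacters. The function name group_with_grep is made up.

```shell
#!/bin/sh
# group_with_grep (hypothetical name): $1 is the input file.
group_with_grep() {
    file=$1
    # One pass to collect the distinct tags, in sorted order
    cut -d ' ' -f 1 "$file" | sort -u | while read -r tag
    do
        printf '%s\n' "$tag"
        # Lines of this group, with the tag (field 1) removed
        grep "^${tag} " "$file" | cut -d ' ' -f 2-
        echo "======="
    done
}
```

Called as: group_with_grep file. Note that the file is read once per tag, so this suits small inputs best.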


Hein.