Organize data

Andre Augusto Ferreira · ‎01-15-2008

Hi guys,

I have the following entries:

a data1 data2 data3
a data4 data5 data6
a data7 data8 data9 data10
a data11 data12 data13
d data1 data2 data3
d data4 data5 data6 data7 data8
d data9 data10 data11
b data1 data2 data3
b data4 data5 data6
b data7 data8 data9
b data10 data11 data12 data13
b data14 data15 data16

How can I reorganize them to get the following structure:

a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11

Thank you

Andre

Andre Augusto

James R. Ferguson · ‎01-15-2008

Hi Andre:

Given the data as shown, the first thing we need to do is filter it so it can be sorted. To keep the order you want, we temporarily pad the first single-digit item on each line with a leading zero, sort the file, and then strip the added zero. We could do this like:

# perl -ple 's/(\D+)(\d)\s/${1}0${2} /' myfile | sort -k1,1 | perl -ple 's/0(\d\s)/$1/'

Next, pipe the output of the above to this:

# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my ( $n, $tag, $oldtag ) = ( 0, undef, undef );
my @fields;
while (<>) {
    @fields = split;
    $tag    = shift @fields;    # first field is the group tag
    if ( $n == 0 ) {            # first line: print the first tag
        $n++;
        print "$tag\n";
        $oldtag = $tag;
    }
    if ( $tag ne $oldtag ) {    # tag changed: separator, then new tag
        print "======\n", $tag, "\n";
        $oldtag = $tag;
    }
    print "@fields\n";
}

Overall:

# perl -ple 's/(\D+)(\d)\s/${1}0${2} /' ./myfile | sort -k1,1 | perl -ple 's/0(\d\s)/$1/' | ./filter

a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11
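
Where the local sort supports a stable mode, the padding and stripping steps may not be needed. The sketch below assumes the -s option (present in GNU and BSD sort, not required by POSIX, so it may be missing on HP-UX); the function name sort_by_tag is made up for illustration. A stable sort on the first field keeps lines with the same tag in their original input order, which matches the requested output for the sample data.

```shell
# Hypothetical helper: order lines by their first field only.
# Assumption: sort accepts -s (stable), so lines that share a tag
# stay in the order in which they appeared in the input.
sort_by_tag() {
    sort -s -k1,1 "$@"
}
```

It could then replace the first three stages of the pipeline, for example: sort_by_tag myfile | ./filter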

Regards!

...JRF...

James R. Ferguson · ‎01-15-2008

Hi (again) Andre:

If the above post meets your needs, this is a simple integrated script:

# cat ./filter
#!/usr/bin/perl
use strict;
use warnings;
my ( $n, $tag, $oldtag ) = ( 0, undef, undef );
my ( @list, @fields );
while (<>) {
    s/(\D+)(\d)\s/${1}0${2} /;    # pad a single digit with a leading zero
    push @list, $_;
}
@list = sort @list;
for (@list) {
    s/0(\d\s)/$1/;                # strip the padding again
    @fields = split;
    $tag    = shift @fields;
    if ( $n == 0 ) {
        $n++;
        print "$tag\n";
        $oldtag = $tag;
    }
    if ( $tag ne $oldtag ) {
        print "======\n", $tag, "\n";
        $oldtag = $tag;
    }
    print "@fields\n";
}

Then, simply run as:

# ./filter file
a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11

Regards!

...JRF...

Andre Augusto Ferreira · ‎01-15-2008

Hi JRF...

Wow... I really appreciate your answers. Could you send a shell script version too? It is going to be part of another script that I'm writing...

Your answers helped me a lot, but if you could help once more...

Regards

Andre

Andre Augusto

Hein van den Heuvel · ‎01-15-2008

Andre, why not just call the perl script as a function from the outer shell?
If you cannot accept a perl solution, then please say so. Also, please reconsider, as it is often rather efficient and just about as easy to maintain as a shell script, even for non-perl folks.

In your example output is there a "=====" missing after the last block, or before the first block, or is this exactly as desired?

Here is another (perl, sorry) alternative:

#------------- group.pl --------

#!/usr/bin/perl
use strict;
use warnings;
my ($name, %tables);
while (<>) {
    # skip lines that do not match, so stale $1/$2 are never reused
    next unless m/^(.)\s(.*)/;
    push @{ $tables{$1} }, $2;
}
foreach $name (sort keys %tables) {
    print "$name\n";
    foreach (@{$tables{$name}}) {
        print "$_\n";
    }
    print "=======\n";
}

This script uses a hash with an array for each group. As data comes in, it is pushed onto the end of the array in the hash identified by the first character.
[note: if you want that to be a word use: m/^(\S+)\s+(.*)/ ]
When all data is read, get the sorted keys of the hash (those first chars). Then use each key to grab the corresponding array and print it.

Use as:

./group.pl list.txt > group.txt

or

perl group.pl list.txt > group.txt

hth,
Hein.

Andre Augusto Ferreira · ‎01-15-2008

Hi Hein,

Thank you very much for your alternative and explanation. I'm open to any of your smart suggestions. Like you said, I can use the perl solution, and both work very well.

I'm new to perl, and these scripts help me learn many tips, techniques and commands, and they are more efficient. I'd still like a shell suggestion, because I have spent 2 days studying and trying to write one, but I couldn't get it to work...

Any other solution will be welcome (perl, shell, etc).

The "=====" in the output is just to separate the blocks; it is not a requirement.

Thank you again

Andre

Andre Augusto

Sandman! · ‎01-15-2008

If you are familiar with awk(1) then you can try the construct below. Simply copy and paste it inside your shell script:

awk '{
    for (i = 2; i <= NF; i++) l = l ? l " " $i : $i
    x[$1] = x[$1] ? x[$1] "\n" l : l
    l = ""
} END {
    for (i in x) {
        if (i != prev) print "======"
        print i "\n" x[i]
        prev = i
    }
}' file

Hasan Atasoy · ‎01-15-2008

Hi Andre,

I did not test this, but it should work.

cat file | cut -b 0-1 | sort | uniq | while read VAR1
do
echo $VAR1
grep "^${VAR1} file | cut -b 1-120
echo "======="
done

Hasan.

Andre Augusto Ferreira · ‎01-15-2008

Hello guys,

Very helpful answers...

Hasan, I've just changed "cut -b 1-120" to "cut -b 3-120" and it works fine.

Sandman, the output of your routine begins with the last block (the one with the letter "d"):
======
d
data1 data2 data3
data4 data5 data6 data7 data8
data9 data10 data11
======
a
data1 data2 data3
data4 data5 data6
data7 data8 data9 data10
data11 data12 data13
======
b
data1 data2 data3
data4 data5 data6
data7 data8 data9
data10 data11 data12 data13
data14 data15 data16

Can we change it to put the blocks in alphabetical order?

Thanks all

Andre Augusto

Hein van den Heuvel · ‎01-15-2008

Sandman,
Correct me if I'm wrong, but your solution seems to rely on the order in which elements are returned from an awk array.
This cannot be relied upon. My documentation explicitly states: "The order in which elements of the array are accessed (by this statement) is determined by the internal arrangement of the array elements within awk and cannot be controlled or changed". Change the leading a in the 2nd or 3rd line to 'x' and try again.
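
One way around the unspecified for-in order is to record each tag the first time it is seen and sort that list explicitly before printing. The following is only a sketch under that assumption, not Sandman's code: the function name group_sorted and the small insertion sort are additions for illustration, chosen so that it should run on any POSIX awk without extensions such as asort.

```shell
# Hypothetical helper (the name group_sorted is made up).
# Groups lines by first field and prints the groups in sorted tag order.
group_sorted() {
    awk '
    NF == 0 { next }                          # ignore blank lines
    {
        tag = $1
        sub(/^[ \t]*[^ \t]+[ \t]*/, "")       # drop the tag, keep the rest
        if (tag in body) {
            body[tag] = body[tag] "\n" $0
        } else {
            keys[++n] = tag                   # remember each new tag once
            body[tag] = $0
        }
    }
    END {
        # insertion sort of the tags, since for-in order is unspecified
        for (i = 2; i <= n; i++) {
            k = keys[i]
            for (j = i - 1; j >= 1 && keys[j] > k; j--) keys[j + 1] = keys[j]
            keys[j + 1] = k
        }
        for (i = 1; i <= n; i++) {
            if (i > 1) print "======"         # separator between blocks only
            print keys[i]
            print body[keys[i]]
        }
    }' "$@"
}
```

Called as group_sorted file, this prints the a, b and d blocks in that order with a separator between them.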

Hasan,

Not bad. Not bad. But why not try it?

To make this work change to:

grep "^${VAR1}" file | cut -b 3-120

And optionally:

$ cut -b 0-1 file | sort -u | while read VAR1
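
Putting these corrections together, the whole loop might look like the sketch below. It deviates from the lines above in two labeled ways: it selects the tag with cut -d ' ' -f 1 instead of byte positions (POSIX cut numbers positions from 1, so a range starting at 0 may be rejected on some systems), and it prints everything after the first field rather than bytes 3-120. It assumes the fields are separated by a single space and that tags contain no regular expression metacharacters. The function name group_with_grep is made up.

```shell
# Hypothetical helper (the name group_with_grep is made up).
# $1 is the input file; it is read once per distinct tag.
group_with_grep() {
    cut -d ' ' -f 1 "$1" | sort -u | while read -r tag
    do
        echo "$tag"
        # lines starting with this tag, with the tag itself removed
        grep "^${tag} " "$1" | cut -d ' ' -f 2-
        echo "======="
    done
}
```

Usage would be group_with_grep file. Note that the file is scanned once for every tag, so the single-pass perl and awk versions are likely to be faster on large inputs.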

Hein.
