Operating System - HP-UX
1828239 Members
2406 Online
109975 Solutions
New Discussion

Group and summarize data in shell script (like in SQL)

 
SOLVED
Go to solution
Scott Lindstrom_2
Regular Advisor

Group and summarize data in shell script (like in SQL)

I have some data that looks like this:

/dev/rdsk/c8t8d3 EMC SYMMETRIX 2798B000 14284801 2798B000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c8t8d5 EMC SYMMETRIX 2798D000 14284801 2798D000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c8t8d7 EMC SYMMETRIX 2798F000 14284801 2798F000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c8t9d1 EMC SYMMETRIX 27991000 14284801 27991000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c8t9d3 EMC SYMMETRIX 27993000 14284801 27993000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c8t9d5 EMC SYMMETRIX 27995000 14284801 27995000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c8t9d7 EMC SYMMETRIX 27997000 14284801 27997000 BCV FIBRE SYMM4 unprotected
/dev/rdsk/c18t2d5 EMC SYMMETRIX 27601008 21582721 27601008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t2d7 EMC SYMMETRIX 27603008 21582721 27603008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t3d1 EMC SYMMETRIX 27605008 21582721 27605008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t3d3 EMC SYMMETRIX 27607008 21582721 27607008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t3d5 EMC SYMMETRIX 27609008 21582721 27609008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t3d7 EMC SYMMETRIX 2760C008 21582721 2760C008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t4d1 EMC SYMMETRIX 2760E008 21582721 2760E008 REG FIBRE SYMM4 RAID-S
/dev/rdsk/c18t4d2 EMC SYMMETRIX 2712B008 21582721 2712B008 REG FIBRE SYMM4 RAID-S

I'd like to be able to read this file and output some messages like:

There are 7 disks that are 14284801 bytes each totalling 99993607 bytes
There are 8 disks that are 21582721 bytes each totalling 172661768 bytes

I thought perhaps there was some shell, perl, or whatever command I could use in the ksh script to sort, count, and summarize on the 5th field (the disk size). Can anyone point me in the correct direction to look?

Scott
7 REPLIES 7
James R. Ferguson
Acclaimed Contributor

Re: Group and summarize data in shell script (like in SQL)

Hi Scott:

One way:

# cat ./groupit
#!/usr/bin/perl
use strict;
use warnings;
my ( $size, %disk_sizes, %disk_counts );
while (<>) {
($size) = (split)[4];
$disk_sizes {$size}+= $size;
$disk_counts{$size}++;
}
for $size ( sort keys %disk_sizes ) {
printf "There are %d disks that are %d bytes each totalling %d bytes\n",
$disk_counts{$size}, $size, $disk_sizes{$size};
}
1;

...run as:

# ./groupit file

Regards!

...JRF...
Scott Lindstrom_2
Regular Advisor

Re: Group and summarize data in shell script (like in SQL)

James - that looks fantastic! The only problem I have is perhaps an overflow situation perhaps? Here is my output:

There are 542 disks that are 14284801 bytes each totalling -1 bytes
There are 18 disks that are 21582721 bytes each totalling 388488978 bytes
There are 1 disks that are 2880 bytes each totalling 2880 bytes

The numbers are all correct except for the "-1". It should be 7742362142. Is that due to some overflow in perl, or something else?

Scott
James R. Ferguson
Acclaimed Contributor
Solution

Re: Group and summarize data in shell script (like in SQL)

Hi (again) Scott:

The "%d" specification can't handle values larger than 32-bits so use:

# cat ./groupit
#!/usr/bin/perl
use strict;
use warnings;
my ( $size, %disk_sizes, %disk_counts );
while (<>) {
($size) = (split)[4];
$disk_sizes {$size}+= $size;
$disk_counts{$size}++;
}
for $size ( sort keys %disk_sizes ) {
printf "There are %d disks that are %.0f bytes each totalling %.0f bytes\n",
$disk_counts{$size}, $size, $disk_sizes{$size};
}
1;

Regards!

...JRF...
Scott Lindstrom_2
Regular Advisor

Re: Group and summarize data in shell script (like in SQL)

James - that solved that!

In the meantime I am going to look into how I can display the output as "7.2 GB". (answer/1024/1024/1024)

(But if you want to chime in the meantime with the way to do it I would be forever grateful) :-)

Scott
James R. Ferguson
Acclaimed Contributor

Re: Group and summarize data in shell script (like in SQL)

Hi Scott:

Change the 'printf' to:

printf "There are %d disks that are %.0f bytes each totalling %7.2f GB\n",
$disk_counts{$size}, $size, $disk_sizes{$size}/1024/1024/1024;


...just the way you thought you would!

...JRF...
Scott Lindstrom_2
Regular Advisor

Re: Group and summarize data in shell script (like in SQL)

A million thanks James!
Dennis Handly
Acclaimed Contributor

Re: Group and summarize data in shell script (like in SQL)

Here is something using awk:

sort -k5,5 file | awk '
function print_entry() {
print "There are", count, "disks that are", last, "bytes each totaling",
last*count/(1024*1024), "Mb"
}
function enter_entry() {
last = $5
count = 1
}
BEGIN { getline; enter_entry() }
{
if ($5 == last) {
++count
next
}
print_entry()
enter_entry()
}
END { print_entry() }'