statistics by file extension (sic) required
Operating System - HP-UX
02-13-2008 05:19 PM
Hi
I wonder if you guys could help me, please?
My boss has given me a task that I thought would be straightforward but is proving too difficult for me. I know Unix doesn't use file extensions as such, but files are often created that end in *.dbf, *.doc, *.txt, etc. He wants a scan of a whole server that shows statistics for each file extension: the capacity used by that extension (preferably in GB) and the number of files with that extension.
I came across quite a good example when I googled for it, but it was written for a Mac, and the sed command seems to behave slightly differently there, since it gives an error when run on HP-UX. This is the syntax I found:
find / -fstype local -type f 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed -Ee 's/^.*\/\.?//' -e 's/.*(\.[^.]*)/\1/' -e 's/^[^.]*$/NONE/' | sort | uniq -c | sort +0nr
The full item can be seen at http://ask.metafilter.com/21222/Most-idespread-file-format
I would much appreciate your help. Thanks.
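A minimal sketch of the same pipeline without "sed -E", assuming the HP-UX error comes from sed not accepting -E (the BSD/GNU switch for extended regular expressions). The three substitutions are rewritten as POSIX basic regular expressions (\{0,1\} instead of ?, and \( \) instead of ( )), and the obsolescent "sort +0nr" is written as "sort -nr". The function name count_extensions is hypothetical, and this has not been tried on HP-UX itself.

```sh
# Hypothetical helper: reads one path per line on stdin and prints
# "count extension" lines, most common first.
# Same three substitutions as the Mac version, but POSIX BRE only (no sed -E):
#   1. strip the directory part and a leading dot (hidden files)
#   2. keep only the last ".ext"
#   3. label names that still have no dot as NONE
count_extensions() {
    tr '[:upper:]' '[:lower:]' |
    sed -e 's/^.*\/\.\{0,1\}//' \
        -e 's/.*\(\.[^.]*\)/\1/' \
        -e 's/^[^.]*$/NONE/' |
    sort | uniq -c | sort -nr
}
```

Usage: find / -type f 2>/dev/null | count_extensions
The Mac version's "-fstype local" is left out here; find(1) on your system should describe how to keep the scan to local filesystems.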
02-13-2008 06:42 PM
Re: statistics by file extension (sic) required
Someone might already have a script that does this but...
You can easily adjust the given command to work on HP-UX.
find /etc -type f | sed -e 's?/.*/??' | grep "\." | sed -e 's/.*\.//' | sort | uniq -c | sort -nr | more
(The two seds and the grep can probably be simplified into one sed, but I don't have my reference book with me right now. I am sure someone will correct me.)
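Picking up the note above, here is one way to fold the two sed calls and the grep into a single sed, as an untested sketch (ext_count is a hypothetical name). With -n, sed prints nothing by default, and the p flag prints a line only when the substitution succeeded, so paths whose last component has no dot drop out just as they did with the grep. Because the dot is looked for only in the last path component, a leading "./" from "find ." is not mistaken for an extension. It assumes each path contains a "/", as find output normally does.

```sh
# Hypothetical helper: reads one path per line, prints "count extension".
# The single sed keeps only the text after the last dot of the file name
# (the part after the last "/"); -n plus the p flag drops names with no dot.
ext_count() {
    sed -n 's?^.*/[^/]*\.\([^./]*\)$?\1?p' |
    sort | uniq -c | sort -nr
}
```

Usage: find /etc -type f | ext_count | more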
The capacity, as you say, is more involved. You could start with something like this:
find /etc -type f -exec ll {} \; | awk '{print $5" "$9}' | sed -e 's?/.*/??' > /size-name
You now have a long listing of the size and the name of each file.
From here on you have at least two options.
1) Use an Excel spreadsheet to sort the filenames and sum up the sizes
2) Write a script that greps for each extension and uses the "bc" or "dc" calculator to add up the file sizes
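Option 2 can also be done in a single awk pass instead of grepping per extension and adding with "bc" or "dc". This swaps in awk's associative arrays, so treat it as an alternative sketch, not the approach described above. It assumes the two-column "size name" format written to /size-name, takes the text after the last dot of the name as the extension (lower-cased), and reports sizes in GB (2^30 bytes). sum_by_ext is a hypothetical name; like the $9 in the command above, it assumes file names contain no spaces.

```sh
# Hypothetical helper: reads "size name" lines (as in /size-name) and prints
# "count GB extension", largest count first. Uses awk arrays instead of
# grep + bc/dc. Assumes names contain no spaces.
sum_by_ext() {
    awk '{
        ext = $2
        # Strip everything up to the last dot; no dot means no extension.
        if (sub(/.*\./, "", ext)) ext = tolower(ext); else ext = "NONE"
        count[ext]++
        bytes[ext] += $1
    }
    END {
        for (e in count)
            printf "%8d %10.3f GB  %s\n", count[e], bytes[e] / (1024 * 1024 * 1024), e
    }' | sort -nr
}
```

Usage: sum_by_ext < /size-name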
02-13-2008 06:55 PM
Solution
I know I replied to a similar question (for OpenVMS) here before, but I cannot readily find it. Anyway... here is a Perl 'one-liner':
find . | perl -ne 'chomp; $t=(/[^.\/]\.(\w+)$/)?$1:"?"; $c{$t}++; $s{$t}+=-s $_ }{ for (sort keys %c) { printf "%6d %5.1fmb %s\n", $c{$_}, $s{$_}/1048576, $_}'
But it is better written as a (Perl) script...
---- by_file_extention.pl -------
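use strict;
my ($extention, %size, %count);
while (<>) {                # one path per line, e.g. from find
    chomp;
    # extension = word characters after the final dot, lower-cased
    $extention=lc((/[^.\/]\.(\w+)$/)?$1:"? no extention");
    $count{$extention}++;
    $size{$extention}+=-s;  # -s with no argument tests $_: size in bytes
}
for (sort keys %count) {
    # count, size in MB, extension
    printf "%6d %5.1fmb %s\n", $count{$_}, $size{$_}/(1024*1024), $_;
}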
For the usage example I just used a local directory structure. And I report 'mb', not GB. That is an easy edit: change the label and add an extra *1024 to the divisor.
And for the 'unknown' extension you may want to code up a -d (directory) test and label directories as such. Left as an exercise!
find . | perl by_file_extention.pl
50 3.1mb ? no extention
5 0.0mb awk
5 0.0mb c
1 0.0mb dos
2 0.0mb el
2 1.1mb exe
:
02-13-2008 09:03 PM
Re: statistics by file extension (sic) required
Here is a version that adds directory logic and a grand total.
Note: those elements are labeled with ~~ so that they sort towards the end.
Example:
$ find /opt | perl by_file_extension.pl
find: ... some messages for STDERR open ...
25 0.1 0
:
1318 0.0 pl
1834 0.0 pm
568 0.0 png
:
5 0.0 zip
4743 0.0 ~~ Directory
3869 1.0 ~~ No extension
51994 3.7 ~~~ Grand Total ~~~
Updated source
----------- by_file_extension.pl -----
use strict;
my ($extension, %size, %count);
while (<>) {
    chomp;
    # lower-cased extension, else a "~~" label that sorts after real extensions
    $extension=(/[^.\/]\.(\w+)$/) ? lc($1) : (-d) ? "~~ Directory" : "~~ No extension";
    $count{$extension}++;
    $size{$extension}+=-s;
    $count{"~~~ Grand Total ~~~"}++;
    $size{"~~~ Grand Total ~~~"}+=-s;
}
for (sort keys %count) {
    # count, size in GB (2**30 bytes), extension
    printf "%8d %7.1f %s\n", $count{$_}, $size{$_}/(2**30), $_;
}
So I match the file names found by find with:
/[^.\/]\.(\w+)$/
So it looks for:
[^.\/] = NOT (a dot or a slash), so a 'hidden' file name such as .profile is not taken as an extension.
\. = a dot
(\w+) = one or more 'word' characters (letters, digits and _), remembered in $1
$ = at the end of the line.
If it matches, the lower-case form of $1 (the word) is used as the key in an associative array (hash).
So .EXE is counted together with .exe.
If it does not match, the script checks whether the name is a directory (-d) and picks an artificial extension name based on the result.
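To try the regular expression against sample paths without Perl, here is a sketch that transcribes it to a POSIX sed basic regular expression, with \w written as [[:alnum:]_]. classify_path is a hypothetical helper that mirrors the script's decision for a single path: the lower-cased extension, else "~~ Directory" if the -d test is true, else "~~ No extension". It is an approximation; Perl's \w and [[:alnum:]_] can differ for non-ASCII characters.

```sh
# Hypothetical helper mirroring the Perl logic for one path:
# Perl /[^.\/]\.(\w+)$/ written as a POSIX sed BRE, \w as [[:alnum:]_].
# "|" is the s delimiter so the "/" inside the bracket expression is safe.
classify_path() {
    ext=$(printf '%s\n' "$1" |
          sed -n 's|.*[^./]\.\([[:alnum:]_]\{1,\}\)$|\1|p' |
          tr '[:upper:]' '[:lower:]')
    if [ -n "$ext" ]; then
        printf '%s\n' "$ext"            # extension, lower-cased like lc($1)
    elif [ -d "$1" ]; then
        printf '%s\n' '~~ Directory'    # the -d test
    else
        printf '%s\n' '~~ No extension'
    fi
}
```

For example, classify_path /opt/app/SETUP.EXE prints "exe", and classify_path /etc/passwd prints "~~ No extension".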
Enjoy!
Hein.