Q about huge filesystem handling
12-02-2008 11:18 PM
I've come across quite a standard headache here and want to hear any good suggestions on how to handle it. Our environment has deployed NAS technology, which is "widely" NFS-shared by a large number of Unix hosts, and no doubt the filesystems are growing bigger and bigger. If I want to write a script to report those "ascii" file sizes, how should I tackle this? The commands that come to mind at the moment are things like find / -name "log" and file.
12-02-2008 11:29 PM
Use the command:
# bdf
For file and directory sizes you can use:
# du
For example, if you need the names of files whose size is greater than 100 blocks (find's -size counts 512-byte blocks; use +100c for bytes), use:
find . -size +100 -print
I think this might be useful for you.
Thank you.
12-02-2008 11:36 PM
Re: Q about huge filesystem handling
1. It would run for a month or even a year to finish :)
2. The command's internal buffer will get full while working through such a huge list.
12-03-2008 12:03 AM
Re: Q about huge filesystem handling
> full during execution of such huge list

Avoiding problems like that is what "find" is good for.

You might start by writing a script to do what you need done, and testing it on something smaller than the entire universe. That should give you some idea how long it would need to work on the whole thing. Then, if it really is too slow, you can worry about making it faster.

> [...] any "advanced" function I can use to get those info?

I don't know what an '"ascii" file size' is, so it's not entirely clear to me what, exactly, "those info" means.
12-03-2008 12:26 AM
Re: Q about huge filesystem handling
Thanks. By an "ascii" file, I mean one that the file command reports like this:
file < a file >
rc: ascii text
My ultimate goal is to total the sizes of those ASCII text files and estimate what percentage could be saved by gzip'ing them (binary files are excluded).
12-03-2008 05:38 AM
Re: Q about huge filesystem handling
If I understand correctly, you want to examine every file under the root ('/') file system looking only for ASCII files (i.e. non-binary, "text" files) and report their names and character sizes.
If that is all you want, Perl does this effortlessly:
# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -T $_},@ARGV)' /
Notice that I passed "/" as the argument to the script. You can specify any number of directories (e.g. "/var /usr /apps") or you can 'cd' to a directory and simply specify "." for the current directory argument.
Traversing the entire filesystem ("/") isn't inexpensive, but if that's your requirement, the above will perform about as well as you can expect without limiting the scope.
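If the one-liner is hard to read, the same logic can be spelled out as a small standalone script. This is only a sketch of the one-liner above (untested; the script name is arbitrary):

#!/usr/bin/perl
# list_text_files.pl -- print the size and name of every "text" file under
# the directories named on the command line (same logic as the one-liner above).
use strict;
use warnings;
use File::Find;

find(
    sub {
        # -T applies Perl's heuristic test for a text (non-binary) file.
        if ( -T $_ ) {
            printf "%10d %s\n", ( stat($_) )[7], $File::Find::name;
        }
    },
    @ARGV ? @ARGV : '.'    # e.g. /var /usr /apps, or default to the current directory
);

Run it as, for example: perl list_text_files.pl /var /usr /apps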
Regards!
...JRF...
12-03-2008 06:32 AM
Re: Q about huge filesystem handling
> of those ascii text files, and make an estimation on how many % can be saved by "gzip" it.

The compression fraction you'll get from gzip (or any other program) will depend on what's in the file(s) to be compressed. Estimating without actually compressing some typical files may be quite inaccurate.

> ( Binary is excluded )

Why? Binary files are often quite compressible. Again, actual tests are more reliable than my guesses.
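If you want an empirical number rather than a guess, one way is to gzip a handful of representative files and compare byte counts. Here is a rough, untested sketch; it simply shells out to the ordinary gzip(1) command, and the script name and sample path are only illustrations:

#!/usr/bin/perl
# gzip_ratio.pl -- report how much gzip would save on the files named on the
# command line, e.g.:  perl gzip_ratio.pl /var/adm/syslog/syslog.log
use strict;
use warnings;

my ( $orig_total, $gz_total ) = ( 0, 0 );
for my $file (@ARGV) {
    next unless -f $file && -T $file;           # regular text files only
    my $orig = -s $file;                        # original size in bytes
    my $gz   = 0;
    # Compress to stdout and count the compressed bytes without keeping them.
    if ( open my $pipe, "gzip -c \Q$file\E |" ) {
        my $buf;
        $gz += length $buf while read( $pipe, $buf, 65536 );
        close $pipe;
    }
    $orig_total += $orig;
    $gz_total   += $gz;
    printf "%s: %d -> %d bytes (%.1f%% saved)\n",
        $file, $orig, $gz, $orig ? 100 * ( $orig - $gz ) / $orig : 0;
}
printf "Total: %d -> %d bytes (%.1f%% saved)\n",
    $orig_total, $gz_total,
    $orig_total ? 100 * ( $orig_total - $gz_total ) / $orig_total : 0;

A sample of your larger log files should give a far better estimate than any rule of thumb.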
12-03-2008 08:13 AM
Re: Q about huge filesystem handling
12-04-2008 07:01 AM
Re: Q about huge filesystem handling
For completeness and safety, I should have explicitly limited your searches to regular files, although in most cases the test for "text" files would have returned that subset. This avoids any problems if you scan device files:
# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -f $_ && -T _},@ARGV)' /path
Regards!
...JRF...
12-08-2008 12:00 AM
Re: Q about huge filesystem handling
Thx for yr gr8 advice again (and thx to all as well). Btw, I did try to run it on one of the comparably small volumes, and it iterated for almost a day to finish :)
However, the difficult part is that the output is too long, and even Excel cannot open it (too many entries, as u can expect). So if I want to further "limit my scope" to filter those not updated for 1/2 a year by timestamp, how do I tune yr Perl script? Thx. (sorry, I am a novice at Perl) :)
12-08-2008 12:13 AM
Re: Q about huge filesystem handling
For find(1), you can use: -mtime -183
This matches only files modified within the last half year; use +183 for the reverse (files not modified for more than 183 days).
12-08-2008 05:34 AM
Re: Q about huge filesystem handling
> filter out those not updated for 1/2 year by timestamp
By this, I assume that you want files that are older than about 183 days. Simply do:
# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -f $_ && -M _ > 183 && && -T _},@ARGV)' /path
The '-M' has units of days and represents the last modification timestamp ('mtime'). Fractional days are allowed, and you can compare '>=', '>', '==', '<=' or '<' to a value.
By the way, since this is a professional forum, please use professional English and skip the chat-room, instant-message abbreviations.
Regards!
...JRF...
12-08-2008 05:38 AM
Re: Q about huge filesystem handling
Oops! I have an extra "and". The last post should be:
# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -f $_ && -M _ > 183 && -T _},@ARGV)' /path
Regards!
...JRF...
12-08-2008 07:05 AM
Re: Q about huge filesystem handling
So don't let a billy-boy tool do a man's job!
Do you really want to know those file names?
Why would you stick this stuff in Excel when your HP-UX script can properly report everything you want to know right there?
Just change the Perl script so graciously offered and make it add up sizes instead of reporting individual files.
Something along the lines of (untested)...
use strict;
use File::Find;

my ( $files, $target_files, $bytes ) = ( 0, 0, 0 );

find( sub {
    $files++;                                  # everything visited
    if ( -f $_ && -M _ > 183 && -T $_ ) {      # regular text files older than ~6 months
        $bytes += ( stat($_) )[7];             # add the size in bytes
        $target_files++;
    }
}, shift );                                    # first command-line argument = start directory

printf "%d bytes in %d/%d files.\n", $bytes, $target_files, $files;
With that in place it will be trivial to generate an array of files and byte counts based on age: over a year, 1/2 year to 1 year, less than 1/2 year.
It will also be easy enough to group those text files into, for example, '.log', '.csv', and 'others', so you can use better empirical compression data points.
Of course, you will want that Perl find subroutine to become a proper, named subroutine if you add any more complexity.
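A rough, untested sketch of that grouping (the age buckets and the extension list are only examples to adapt):

#!/usr/bin/perl
# Bucket text files by age and extension, totalling bytes and counts per bucket.
use strict;
use warnings;
use File::Find;

my ( %bytes, %count );

find(
    sub {
        return unless -f $_ && -T _;            # regular text files only
        my $age    = -M _;                      # days since last modification
        my $bucket = $age > 365 ? 'over 1 year'
                   : $age > 183 ? '1/2 to 1 year'
                   :              'under 1/2 year';
        my ($ext) = /(\.[^.\/]+)$/;             # trailing extension, e.g. '.log'
        $ext = 'other' unless defined $ext && $ext =~ /^\.(?:log|csv)$/i;
        $bytes{$bucket}{$ext} = ( $bytes{$bucket}{$ext} || 0 ) + ( stat(_) )[7];
        $count{$bucket}{$ext}++;
    },
    @ARGV ? @ARGV : '.'
);

for my $bucket ( sort keys %bytes ) {
    for my $ext ( sort keys %{ $bytes{$bucket} } ) {
        printf "%-15s %-6s %12d bytes in %d files\n",
            $bucket, $ext, $bytes{$bucket}{$ext}, $count{$bucket}{$ext};
    }
}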
Good luck!
Hein.
12-08-2008 07:03 PM
Re: Q about huge filesystem handling
Or you can take the approach that every file stored is important and must never be removed, and just keep adding more terabytes. It would certainly improve the performance of the system.
Personally, I would require each Unix system administrator to manage their disk space as if it were not shared. In Unix, ordinary users are restricted to creating files in their home directories and in /tmp and /var/tmp. Simple quotas take care of the home directories, and the tmp directories are, well, temporary. Just slash and burn old files and oversize files. Until disk costs are measured in a few dollars per petabyte, managing disk space will still be the sysadmin's responsibility.
Bill Hassell, sysadmin