Operating System - HP-UX

Gordon_3
Regular Advisor

Q about huge filesystem handling

Hi all,

I have come across quite a standard headache here and want to hear any good suggestions on how to handle it. Our environment has deployed NAS technology that is widely NFS-shared by a large number of Unix hosts, and no doubt the file system size is growing bigger and bigger. If I want to write a script to report the sizes of those "ascii" files, how should I tackle it? The commands that come to mind at the moment are "find / -name "log"", file, ls -l, and so on, but considering the FS size I don't think that's feasible. Is there any "advanced" function I can use to get that info?
Gordon
saravanan08
Valued Contributor
Solution

Re: Q about huge filesystem handling

For the file system size, use:

# bdf

For file and directory sizes you can use:

# du

For file sizes: for example, if you need the names of files whose size is greater than 100 bytes, use:

# find . -size +100c -print

(Without the "c" suffix, -size counts 512-byte blocks rather than bytes. A sketch combining these with file(1) is shown below.)

I think this might be useful for you.

Thank you.
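
A minimal sketch combining these commands with file(1) to pick out only the text files (the path and the 100-byte threshold are illustrative; on a very large tree this will be slow, because file(1) has to run on every candidate, and filenames containing spaces or colons would need more care):

# find /path -type f -size +100c -exec file {} \; | grep 'ascii text' | cut -d: -f1 | xargs ls -l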

Gordon_3
Regular Advisor

Re: Q about huge filesystem handling

Hi, yeah, I also thought of those commands, but the tricky part here is that I need to traverse the whole FS starting from "/" (maybe with find), then, based on that list, grep the names containing "log" and run ls -l on them. However, the size we are talking about here is very big, I would guess more than 100k files and terabytes of data, so the usual Unix commands like find may hit issues:

1. They could run for a month or even a year to finish :)
2. The command's internal buffer could fill up while processing such a huge list
Gordon
Steven Schweda
Honored Contributor

Re: Q about huge filesystem handling

> 2. The internal buffer of the cmd will get
> full during execution of such huge list

Avoiding problems like that is what "find" is
good for.

You might start by writing a script to do
what you need done, and testing it on
something smaller than the entire universe.
That should give you some idea how long it
would need to work on the whole thing. Then,
if it really is too slow, you can worry about
making it faster.
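
For example, to get a feel for the traversal
cost on one subtree before committing to "/"
(timex(1) reports the elapsed time; /var is
only an illustrative starting point):

# timex find /var -type f | wc -l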

> [...] any "advanced" function I can use to
> get those info?

I don't know what an '"ascii" file size' is,
so it's not entirely clear to me what,
exactly, "those info" means.
Gordon_3
Regular Advisor

Re: Q about huge filesystem handling

Hi Steven,

Thanks. By "ascii" file I mean one that file(1) reports like this:

# file <a file>
rc: ascii text

My ultimate goal is to total the size of those ascii text files and estimate what percentage could be saved by gzip'ing them. (Binary files are excluded.)
Gordon
James R. Ferguson
Acclaimed Contributor

Re: Q about huge filesystem handling

Hi Gordon:

If I understand correctly, you want to examine every file under the root ('/') file system looking only for ASCII files (i.e. non-binary, "text" files) and report their names and character sizes.

If that is all you want, Perl does this effortlessly:

# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -T $_},@ARGV)' /

Notice that I passed "/" as the argument to the script. You can specify any number of directories (e.g. "/var /usr /apps") or you can 'cd' to a directory and simply specify "." for the current directory argument.

Traversing the entire filesystem ("/") isn't inexpensive, but if that's your requirement, the above will perform about as well as you can expect without limiting the scope.
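
If only a grand total is wanted rather than a per-file listing, a minimal variation along the same lines (a sketch using the same File::Find approach; /path is illustrative) accumulates the sizes instead of printing each one:

# perl -MFile::Find -e 'find(sub{$t += (stat($_))[7] if -T $_},@ARGV); printf "%d bytes\n",$t' /path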

Regards!

...JRF...
Steven Schweda
Honored Contributor

Re: Q about huge filesystem handling

> Coz my ultimate goal is to count the size
> of those ascii text files, and make an
> estimation on how many % can be saved by
> "gzip" it.

The compression fraction you'll get from gzip
(or any other program) will depend on what's
in the file(s) to be compressed. Estimating
without actually compressing some typical
files may be quite inaccurate.

> ( Binary is excluded )

Why? Binary files are often quite
compressible. Again, actual tests are more
reliable than my guesses.
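
For example, a quick way to test the ratio on one representative file without modifying it (the path is only an example):

# wc -c < /var/adm/syslog/syslog.log
# gzip -c /var/adm/syslog/syslog.log | wc -c

Comparing the two counts gives the actual compression for that file.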
OldSchool
Honored Contributor

Re: Q about huge filesystem handling

The downside is that, since this disk is NFS-shared with a number of servers, you may have little idea of the impact on other users or applications if you simply go looking for every "ascii" file and start zipping them.
James R. Ferguson
Acclaimed Contributor

Re: Q about huge filesystem handling

Hi (again) Gordon:

For completeness and safety, I should have explicitly limited your search to regular files, although in most cases the test for "text" files would have returned that subset. This avoids any problems if you scan device files:

# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -f $_ && -T _},@ARGV)' /path

Regards!

...JRF...
Gordon_3
Regular Advisor

Re: Q about huge filesystem handling

Hi James,

Thanks for your great advice again (and thanks to everyone else too). By the way, I did try running it on one of the comparably small volumes, and it took almost a whole day to finish :)

However, the difficulty now is that the output is too long; even Excel cannot open it (too many entries, as you can expect). So if I want to further limit my scope, filtering by timestamp for files not updated for half a year, how would I tune your Perl script? Thanks. (Sorry, I am a novice at Perl.) :)
Gordon
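
For reference, one way to add such an age filter to the same one-liner (a sketch, assuming "not updated for half a year" means a modification time older than about 183 days; Perl's -M test gives the file age in days, and /path is illustrative):

# perl -MFile::Find -e 'find(sub{printf "%10d %s\n",(stat($_))[7],$File::Find::name if -f $_ && -T _ && -M _ > 183},@ARGV)' /path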