Operating System - HP-UX

How to delete a large number of small files?

 
SOLVED
yongye_1
Advisor

How to delete a large number of small files?

Hi,

I have a directory containing about 300k small files (around 200 GB). I wonder if there is a way to delete them quickly. It took me 3+ hours to delete them by removing the directory directly (rm -rf directory).

OS: HP-UX B.11.11.
Server: Superdome.
Storage: XP1024.

Regards,
Yongye
19 REPLIES
Ivan Krastev
Honored Contributor

Re: How to delete a large number of small files?

You can try to delete it in reverse order:

#rm z*
#rm y*
#rm x*
...

#rm 1*
#rm 0*



regards,
ivan
Peter Godron
Honored Contributor

Re: How to delete a large number of small files?

Hi,
if they are all contained on one lvol, newfs the lvol and restore the backup of the files you do need.
Is 3 hours for 200 GB that bad?
Dennis Handly
Acclaimed Contributor

Re: How to delete a large number of small files?

>You can try to delete it in reverse order:
#rm z*

I'm not sure if reverse would help but instead of doing the directory search (z*) 26*2+10 times, you may want to use ls, then sort -r then xargs to remove a bunch at a time:

ls | sort -r | xargs -n40 rm -rf

I'm not sure this would be any faster than rm -rf?
Matti_Kurkela
Honored Contributor

Re: How to delete a large number of small files?

When doing any file operations in a directory with hundreds of thousands of files, you must remember the following:

1.) Using shell wildcards directly is bad, because it causes the filenames to be expanded by the shell. The expansion will make the command line extremely long, maybe longer than the shell can handle. Furthermore, the expanded list will be sorted alphabetically, which may take non-trivial time and memory when the number of files is huge.

2.) Using the 'ls' command is almost as bad: it will also try to sort the filenames before displaying the output, which makes it a waste of resources. In a huge directory, the system needs to skip back and forth in the directory structure, which takes time unless the directory structure fits entirely in the caches. If this is a production server which already has a significant workload, you may have problems.

When dealing with huge directories like this, the tool of choice is 'find'. It will browse through the directory structure in the order files are stored on the disk, *without sorting the list in any way*. Proceeding in the disk order (vs. alphabetical order) will allow the directory manipulation operation to be done as one pass through the directory structure.
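A minimal sketch of that one-pass find approach (the throwaway directory and the batch size here are illustrative, not from the post):

```shell
# Stand-in for the huge directory; in practice point find at the real path.
dir=$(mktemp -d)
touch "$dir/f1" "$dir/f2" "$dir/f3"

# find emits names in directory (on-disk) order, with no sorting;
# xargs batches them so rm is forked once per ~500 names, not per file.
find "$dir" -type f -print | xargs -n 500 rm -f

ls -A "$dir"    # prints nothing: the files are gone
rmdir "$dir"
```

This avoids both the shell's wildcard expansion and ls's alphabetical sort. (Plain xargs splits on whitespace, so filenames containing spaces would need extra care.)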

Theoretically, the "rm -rf directory" without any wildcards should be as fast as this kind of a recursive removal operation can possibly be.

Unfortunately, the slowness may be caused by the directory structure being fragmented to multiple parts, so most of the time is spent seeking back and forth across the disk. The VxFS filesystem normally handles fragmentation quite well, but heavy usage with lots of small files being created and deleted may cause fragmentation. If you want to defragment a VxFS, see "man fsadm_vxfs".


If you have a need to completely clean huge directories like this frequently, consider making the respective directories into separate filesystems: unmounting the filesystem, running a mkfs on it and then mounting the now-empty filesystem may well be much faster than 'rm -rf', when the number of files in one directory is large. This will also eliminate any fragmentation.
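As a hedged sketch of that filesystem-recreation approach (the mount point and device names below are placeholders, not from the post):

```shell
# WARNING: this destroys everything on the filesystem -- back up anything needed.
# /scratch and /dev/vg01/(r)lvol_scratch are hypothetical names; adjust to yours.
umount /scratch
newfs -F vxfs /dev/vg01/rlvol_scratch   # recreate an empty VxFS on the raw device
mount /dev/vg01/lvol_scratch /scratch   # remount the now-empty filesystem
```

One bulk mkfs replaces ~300k individual unlinks, and it also resets any directory fragmentation.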
MK
Steven E. Protter
Exalted Contributor
Solution

Re: How to delete a large number of small files?

Shalom,

My solution (one of many):

ls -1 > list

while read -r filename
do
    rm -f "$filename"
done < list

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Yogeeraj_1
Honored Contributor

Re: How to delete a large number of small files?

hi Yongye,

my two preferred ways are:
1. create a new file system and move the rest of the files to the new file system (adjust the mount points accordingly).

2. perform delete using parallel commands


hope this helps!

kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
James R. Ferguson
Acclaimed Contributor

Re: How to delete a large number of small files?

Hi:

If you are removing *everything*, as noted, use 'mkfs' to destroy and recreate the entire filesystem. Do this even if you have to followup with a list of empty directories to create.

I would believe that you are probably more selective in your deletion, however. Thus, my choice would be to use a small Perl script that makes one pass through the filesystem directory, removing candidates that meet the criteria, "in place".

# perl -MFile::Find -le 'find(sub{unlink if -f && -M >= 30},"/mypath")'

This simple script searches the directory (or directories) of your choice, looking for *files* that have not been modified in 30 or more days and unlink()'s (removes) them.

Modify the age and path according to your needs.

Regards!

...JRF...
yongye_1
Advisor

Re: How to delete a large number of small files?

Hi SEP,

I tried your solution. It took 1+ hour to delete them.

228GB/293885 files
real 1:04:01.4
user 8:33.8
sys 53:40.6

Can you explain why it took less time than rm -rf directory? Thanks.

Hi others,

I want to thank you for providing the solutions. I have given you points, and I will try your solutions next time and report the results.

Yongye

Dave Hutton
Honored Contributor

Re: How to delete a large number of small files?

My guess why it is faster: rm -rf probably builds a list of files before removing them.

Steven's approach does the list-building work up front and then tells rm directly which files to remove.

I know that sometimes when you have a lot of files and you run rm * you get "arg list too long", so you know it's building a list. My guess is that even without a wildcard it still builds the list.
A. Clay Stephenson
Acclaimed Contributor

Re: How to delete a large number of small files?

I would be very surprised if building a list beforehand results in a consistent 3X performance increase. I suspect that this was not an apples-to-apples comparison, in that the number of files was different, system loads were different, or the filesystem itself was more active. At most, I would expect that pre-building the list only results in an extremely modest improvement, because almost all of the time is spent searching the directory (which is a linear search) and locking the directory while the updates are done. The real answer to your problem is rethinking the design, because that many files in one directory is never a good idea.
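One common redesign along those lines, as a rough sketch (the directory names and the one-character bucketing scheme are illustrative, not from the post):

```shell
# Spread files across subdirectories keyed by the first character of the
# name, so no single directory ever holds hundreds of thousands of entries.
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/alpha" "$src/beta" "$src/bravo"

for f in "$src"/*; do
    name=$(basename "$f")
    bucket=$(printf '%s' "$name" | cut -c1)   # 'a', 'b', ...
    mkdir -p "$dst/$bucket"
    mv "$f" "$dst/$bucket/"
done

ls "$dst"        # one small subdirectory per leading character
rm -rf "$src" "$dst"
```

With the files spread out, both the linear directory searches and the directory-lock contention shrink, and any one bucket can be removed quickly.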
If it ain't broke, I can fix that.
Ana María
Occasional Contributor

Re: How to delete a large number of small files?

Hi there.
To empty that kind of directories containing too many small files I use a simple find:
find . -size +100c -exec rm {} \;
You can increase/decrease the size number.
Anyway, I assume there is nothing faster than deleting the entire folder.
James R. Ferguson
Acclaimed Contributor

Re: How to delete a large number of small files?

Hi Yongye:

As a matter of interest, I've experimented a bit. I would agree with Clay, that most approaches probably yield similar times.

I simply wrote a small shell script to create 5,000 files in '/tmp' on a test server. I then compared (using 'timex') the execution times for the Perl script I suggested; to Dennis's divide-and-conquer solution; to a simple 'rm -rf /path'.

Statistically the execution times are the same with the 'rm -rf' perhaps being slightly faster.

I believe that the search time (in this case) was negligible compared to the time required to lock the directory for updates.

Note that the *slowest* approach that one can take in situations like this is to use 'find' with '-exec'. This spawns a new process for *every* file to be removed. The use of a piped 'xargs' eliminates this. After all, birthing isn't called "labor" without reason.

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: How to delete a large number of small files?

>JRF: Note that the *slowest* approach that one can take in situations like this is to use 'find' with '-exec'. This spawns a new process for *every* file to be removed.

Not if you use '-exec ... +'; it works like xargs.
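The two batching forms side by side, on a throwaway directory (both empty it in a handful of rm invocations rather than one per file):

```shell
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# Classic pipe: xargs packs many names onto each rm command line.
find "$dir" -type f -print | xargs rm -f

touch "$dir/a" "$dir/b" "$dir/c"
# POSIX '+' terminator: find itself batches the names, like xargs.
find "$dir" -type f -exec rm -f {} +

ls -A "$dir"    # nothing left
rmdir "$dir"
```

Contrast with '-exec rm {} \;', which forks one rm per file and is the slow form discussed above.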

Here is a silly idea. What if you run /etc/unlink on the directory? Would using fsck to recover the lost disk space be faster, since there are no locks?
James R. Ferguson
Acclaimed Contributor

Re: How to delete a large number of small files?

Hi Dennis:

> ..if you use -exec ... +, it is like xargs.

Ah, yes, now I recall that is the benefit of using "+" as the '-exec cmd' terminator. Thanks for that correction. It's documented in the 'find' manpages too. Old habits sometimes die hard.

As for using '/usr/sbin/unlink' --- that's an interesting concept as you noted. Perhaps if it were exercised for *files* and not directories, or at least files *first* and then empty directories, then some speed could be gained. The manpages for 'link(1M)' note the appropriate caution.

Regards!

...JRF...
Dennis Handly
Acclaimed Contributor

Re: How to delete a large number of small files?

>JRF: The manpages for 'link(1M)' note the appropriate caution.

There is no "appropriate caution" since we are trying to fool it and create orphans.

Probably removing the directory would just put the files in lost+found, we would have to tell fsck not to do that. And with files, unlink would just remove them and act as rm(1).

But if someone is going to try the newfs solution, they could try this first?
James R. Ferguson
Acclaimed Contributor

Re: How to delete a large number of small files?

Hi (again) Dennis:

My thinking was that if we "...exercised [unlink] for *files* and not directories, or at least files *first* and then empty directories", then your suggestion might gain speed.

I had not thought of trying to leave orphaned files. That was my reference to "appropriate caution".

In my mind, still central to this whole thread's discussion is whether or not the author has a directory in which only *some* of its files (but not subdirectories?) are to be removed, or if the entire directory can be recursively removed en masse.

Thanks for the thought provoking discussion points!

Regards!

...JRF...
Yogeeraj_1
Honored Contributor

Re: How to delete a large number of small files?

Hi JRF/Dennis,

This is indeed a very interesting discussion topic; maybe we can create a new thread and post our test results and findings so that this is beneficial to others in the future.

We all come across such issues during our administration tasks at some points in time.

just some thoughts!

kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)