
bulk delete of files

 
Mangal Pandey
Occasional Contributor

bulk delete of files

I'm getting around 1 million files daily to delete from the system. I receive all these entries through a file where every line is the absolute path of a file to be deleted. Currently I'm reading this file line by line and deleting the files one by one.
It is taking too much time to process a million entries. Is there any way I can delete files in bulk from the system?
A. Clay Stephenson
Acclaimed Contributor

Re: bulk delete of files

You could obviously read perhaps 50 entries at a time using xargs and send them in groups to the rm command, but that is only going to help marginally. The fundamental problem is the locking of the directory while each and every update is done.
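A minimal sketch of the batching idea above, assuming the input list contains one absolute path per line with no embedded spaces (the paths and list file name here are made up for the demo):

```shell
# Throwaway demo: create a few files and build a list of their absolute paths
mkdir -p /tmp/bulkdel_demo
for i in 1 2 3 4 5
do
    touch "/tmp/bulkdel_demo/file$i"
done
ls /tmp/bulkdel_demo/* > /tmp/filelist.txt   # one absolute path per line

# Feed the paths to rm in batches of 50 instead of forking one rm per file
xargs -n 50 rm -f < /tmp/filelist.txt
```

With a real million-entry list you would point xargs at that list instead; the batch size is a tuning knob, not a magic number.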
If it ain't broke, I can fix that.
Dennis Handly
Acclaimed Contributor

Re: bulk delete of files

Are these all small files?

As Clay said, the most you could do is use xargs to cut down the cost of invoking rm 1 million times.

Are these files scattered all over the system, or in a very few directories?

If the former, you may want to try doing separate xargs/rms for separate directories.
I'm not sure whether sorting the file list first will help or hurt.
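One way to sketch the "separate xargs/rms for separate directories" idea: derive the distinct parent directories from the list, then run one xargs/rm pass per directory. All paths and names below are invented for the demo, and filenames are assumed to contain no spaces:

```shell
# Throwaway demo: files spread over two directories, plus a deletion list
mkdir -p /tmp/deldemo/a /tmp/deldemo/b
touch /tmp/deldemo/a/x1 /tmp/deldemo/a/x2 /tmp/deldemo/b/y1
printf '%s\n' /tmp/deldemo/a/x1 /tmp/deldemo/a/x2 /tmp/deldemo/b/y1 > /tmp/dellist.txt

# Strip the filename component to get each parent directory, de-duplicate,
# then delete each directory's entries in its own xargs/rm stream
for d in $(sed 's|/[^/]*$||' /tmp/dellist.txt | sort -u)
do
    grep "^$d/[^/]*\$" /tmp/dellist.txt | xargs rm -f
done
```

Grouping like this keeps each rm working within a single directory, which is the point of the suggestion above.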
kenj_2
Advisor

Re: bulk delete of files

Hi Mangal -

Any concurrent access to the same file system where you are deleting files will slow things down. Directory locking is managed with shared read locks and exclusive write locks: two processes can read the directory at the same time, but a process that wants to update the directory must wait for the readers to finish before it can obtain the exclusive write lock, and while it holds that lock all other readers and writers block.

With early JFS (3.1) using a full path to access files was very inefficient. Later JFS versions don't have this problem and have other improvements in directory management.

If you could get these incoming files organized into subdirectories then it would be efficient to do parallel processing.
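If the files do land in separate subdirectories, the parallelism Ken describes can be as simple as one background worker per subdirectory. A sketch, with throwaway demo paths (in real use each worker would get its own slice of the deletion list):

```shell
# Throwaway demo: two subdirectories, each cleaned by its own worker
mkdir -p /tmp/pardemo/d1 /tmp/pardemo/d2
touch /tmp/pardemo/d1/f1 /tmp/pardemo/d1/f2 /tmp/pardemo/d2/g1

# One background rm stream per subdirectory, then wait for all of them.
# Because each worker touches only its own directory, the workers do not
# contend for the same directory lock.
for d in /tmp/pardemo/d1 /tmp/pardemo/d2
do
    ( ls "$d" | sed "s|^|$d/|" | xargs rm -f ) &
done
wait
```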

Ken Johnson


Hein van den Heuvel
Honored Contributor

Re: bulk delete of files


a) retain what you need daily and blow away the whole filesystem weekly?

b) are you using a shell script to read and remove? That will fork a process for each delete, unless you manage to aggregate arguments (with xargs or otherwise). Try a program that reads the file and calls the delete service (the unlink() system call) directly, instead of invoking the delete tool (rm).

For example, the following two-word Perl program might well be faster than your script:

$ perl -ne 'chomp; unlink' list-of-files

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Peter Nikitka
Honored Contributor

Re: bulk delete of files

Hi,

if your filenames do not contain a space or TAB character, try a simple
xargs rm -f
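If some names do contain whitespace, plain xargs will split them into separate arguments. A portable (if slower) fallback is to read the list line by line and quote each path; the demo path below is invented:

```shell
# Throwaway demo: a path containing a space
mkdir -p /tmp/spacedemo
touch "/tmp/spacedemo/has space.txt"
printf '%s\n' "/tmp/spacedemo/has space.txt" > /tmp/spacelist.txt

# Plain `xargs rm -f` would split this name into two arguments;
# reading line by line keeps each path intact (one rm per file, slower)
while IFS= read -r path
do
    rm -f "$path"
done < /tmp/spacelist.txt
```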
mfG Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"