Finding duplicate filenames

Preet Dhillon · ‎10-15-2001

I'd like to find all files with the same names in a given filesystem. The difficulty is that I don't know what the filenames are; they could be anything. Also, the files could live in dozens of different directories. Is there a way of listing all duplicate files with the full paths to where they are?
Many thanks in advance.

Nothing succeeds like excess

G. Vrijhoeven · ‎10-15-2001

Hi Preet:

to list duplicate names:

find . -type f | awk -F / '{ print $NF }' | sort | uniq -d

hope this will help
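
The one-liner above prints only the repeated names. A possible extension, sketched here and not taken from any reply in the thread, keeps the full paths by grouping them per name in a single awk pass. The function name list_dup_paths is made up for illustration, and the sketch assumes that paths contain no newline characters.

```shell
#!/bin/sh
# Sketch: print the full path of every file whose name occurs more than once.
# list_dup_paths is a hypothetical helper name, not part of any standard tool.
list_dup_paths() {
    find "$1" -type f | awk -F / '
        {
            count[$NF]++                        # occurrences of this file name
            paths[$NF] = paths[$NF] $0 "\n"     # all full paths seen for it
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    printf "%s", paths[name]
        }' | sort
}
```

Calling list_dup_paths /some/dir would then list each duplicated name once per location. Spaces in names are not a problem here because only "/" is used as the field separator.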

Sridhar Bhaskarla · ‎10-15-2001

Preet,

Give this a try

#!/usr/bin/ksh
# List the full paths of all files whose names occur more than once under a directory.

if [ $# -ne 1 ]
then
echo "Usage: $0 <full path to the directory/filesystem>"
exit 1
fi

DIR=$1

if [ ! -d "$DIR" ]
then
echo "$DIR: no such directory"
exit 1
fi

# Start with an empty result file
rm -f duplicates

find "$DIR" -type f > list$$
# -F / sets the separator before the first line is read
awk -F / '{print $NF}' list$$ | sort | uniq -d > files$$
while IFS= read -r FILE
do
# Keep only the paths whose last component is exactly $FILE
awk -F / -v name="$FILE" '$NF == name' list$$ >> duplicates
done < files$$

rm -f files$$ list$$
echo "Check the file \"duplicates\" in the current dir"

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

Wim Rombauts · ‎10-15-2001

I think you'd best start with a "find" and put the output in a text file. Then you can work on the text file to find duplicate filenames.

The find command looks like this:
find . -print > filenames.txt

With 'sed', you can convert this file to a list with the filename without its path and the path itself on the same line. Then sort this file.
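
As a rough illustration of the find, sed and sort steps just described (a sketch under assumptions, not the poster's actual commands): the helper name name_then_path is hypothetical, a tab is chosen here as the separator, and file names are assumed to contain no tabs or newlines.

```shell
#!/bin/sh
# Sketch of the find + sed + sort idea: emit "name<TAB>full path", sorted by name.
# name_then_path is a hypothetical helper name.
name_then_path() {
    tab=$(printf '\t')
    # ".*/" is greedy, so \1 captures only the last path component;
    # "&" re-emits the whole original path after the tab.
    find "$1" -type f -print |
        sed "s|.*/\(.*\)|\1${tab}&|" |
        LC_ALL=C sort
}
```

In the sorted output, entries that share a name sit on adjacent lines, so duplicates can be spotted by eye or picked out by comparing the first field of neighbouring lines.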

Sridhar Bhaskarla · ‎10-15-2001

Preet,

You can cut and paste my script. It gives the full paths to the duplicate files in a file called "duplicates" in the current directory.

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

harry d brown jr · ‎10-15-2001

This will "GRAB" the file name from the path, put the file name first, and then the directory after it. That way you get what you want: duplicate files in different directories.

find /var -type f|sed "s/\(^\/.*\/\)\(.*$\)/\2 \1/"|sort >filenames.out

Live Free or Die

Stefan Farrelly · ‎10-15-2001

And the winner is....Vrijhoeven

His answer handles spaces in file and directory names (very important!) but needs some massaging to be complete:

1. cd ; find . -print > /tmp/tempfile
2. for i in $(find . -type f | awk -F / '{print $NF}'|sort|uniq -d)
do
grep "/${i}" /tmp/tempfile
done

Tested it, works great.

Im from Palmerston North, New Zealand, but somehow ended up in London...

Frank Slootweg · ‎10-15-2001

Maybe this is a start. I think this does what you want. It lists *all* files, but it lists the duplicate ones first.

$ cat ../doit
#! /usr/bin/sh

find . -type f |
{
while read filename
do
echo `dirname $filename` `basename $filename`
done
} | sort -k 2,2

$ ls -R
dir1 dir2 dir3

./dir1:
a c d

./dir2:
a b e

./dir3:
b c f

$ ../doit
./dir1 a
./dir2 a
./dir2 b
./dir3 b
./dir1 c
./dir3 c
./dir1 d
./dir2 e
./dir3 f
:;

sort -k 2,2

Sridhar Bhaskarla · ‎10-15-2001

Not to point out but...

Winner's script doesn't work in this scenario.
My directory structure is

.
./test1
./test1/test
./test1/sri
./test2
./test2/test
./test2/new
./test3
./test3/test
./test3/sri
./file

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

Frank Slootweg · ‎10-15-2001

OOPS! Typos!
Please ignore the stuff after "./dir3 f", i.e. the silly ":;" and the standalone "sort -k 2,2 ".

Stefan Farrelly · ‎10-15-2001

oops, change the last grep to;

grep "/${i}$" /tmp/tempfile

This will find the filenames at the end of the line. Works with the above test1,2,3 directory structure now.

Im from Palmerston North, New Zealand, but somehow ended up in London...
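
One caveat with grep patterns of this kind, shown as a small made-up illustration (the three paths below are hypothetical): a pattern anchored only at the end of the line also matches names that merely end in the same text, so keeping the slash in front of the name is the safer choice.

```shell
#!/bin/sh
# Illustration only: the paths written to the list are invented for the demo.
list=$(mktemp)
printf '%s\n' ./test1/test ./test2/test ./test3/mytest > "$list"

# Anchoring only at the end of the line also counts ./test3/mytest.
loose=$(grep -c 'test$' "$list")

# Requiring the preceding slash restricts the match to the whole last component.
strict=$(grep -c '/test$' "$list")

echo "loose: $loose, strict: $strict"
rm -f "$list"
```

Run as written, this prints "loose: 3, strict: 2". Names containing regular-expression characters such as "." can still over-match with grep, which an exact string comparison in awk would avoid.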

Robin Wakefield · ‎10-15-2001

Hi,

This method avoids having to run find twice:

===========================================
find . -type f | while read file ; do
echo $(dirname "$file")'\t'$(basename "$file")
done |
sort -t" " -k 2 - > /tmp/listfiles
uniq -f 1 -u /tmp/listfiles |
join -t" " -v 2 -1 2 -2 2 -o 2.1,2.2 - /tmp/listfiles |
sed 's+ +/+'
=============================================

Both join and sort commands have the tab character between the double quotes.

Rgds, Robin.
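
Since literal tab characters are easily lost when a script is copied from a web page, here is a sketch of the same dirname/basename, sort, uniq and join pipeline with the tab held in a variable. It is an adaptation, not the poster's exact script: the function name list_dups_join and the temporary file handling are assumptions, paths are assumed to contain no tabs or newlines, and directory names are assumed to contain no spaces because uniq -f splits fields on blanks.

```shell
#!/bin/sh
# Sketch: list full paths of files whose names occur more than once,
# using a tab-separated "directory<TAB>name" list and join.
# list_dups_join is a hypothetical helper name.
list_dups_join() {
    tab=$(printf '\t')
    tmp=$(mktemp) || return 1
    # Build "directory<TAB>name" lines, sorted by name.
    find "$1" -type f | while IFS= read -r file; do
        printf '%s\t%s\n' "$(dirname "$file")" "$(basename "$file")"
    done | LC_ALL=C sort -t "$tab" -k 2 > "$tmp"
    # uniq -u keeps the names that occur once; join -v 2 then prints the
    # lines of the full list whose name is NOT among them, i.e. the duplicates.
    LC_ALL=C uniq -f 1 -u "$tmp" |
        LC_ALL=C join -t "$tab" -v 2 -1 2 -2 2 -o 2.1,2.2 - "$tmp" |
        sed "s|${tab}|/|"
    rm -f "$tmp"
}
```

LC_ALL=C is set so that sort and join agree on the ordering of the join field.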

Stefan Farrelly · ‎10-15-2001

Hey Robin,

Are you sure your script works with spaces in directory and file names? Usually dirname and basename don't work with spaces in names.

Im from Palmerston North, New Zealand, but somehow ended up in London...

Robin Wakefield · ‎10-15-2001

Hi Stefan,

Seems OK, as long as it's quoted.

# basename /tmp/a b c
# NOTHING !!

# dirname "/tmp/a b c"
/tmp
# basename "/tmp/a b c"
a b c

I tried a few variations, and it seemed to behave itself.

Cheers, Robin.

Robin Wakefield · ‎10-15-2001

...and there's a tab between the 1st pair of "+" characters in the final sed...Robin

Stefan Farrelly · ‎10-15-2001

Thanks for checking that Robin. OK, now we have 2 answers which do the business.

Preet - don't forget to assign points.

Im from Palmerston North, New Zealand, but somehow ended up in London...

harry d brown jr · ‎10-17-2001

Here is a perl script (attachment) I found that will actually find DUPLICATE files, and not just duplicate file names.

credits:

http://www.geocities.com/fcheck2000/download.html

Live Free or Die
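
The attached Perl script is not reproduced in this thread. As a rough shell sketch of the same general idea (an assumption about the approach, not the attachment's method), files can be grouped by their cksum output so that only files with matching checksum and size are listed. The name list_same_content is hypothetical, and matching checksums only suggest identical content, so a check with cmp is advisable before acting on the result.

```shell
#!/bin/sh
# Sketch: list files that share the same CRC and byte count as reported by cksum.
# list_same_content is a hypothetical helper name.
list_same_content() {
    find "$1" -type f -exec cksum {} + |
        awk '{
            key = $1 " " $2                      # CRC and size in bytes
            count[key]++
            lines[key] = lines[key] $0 "\n"      # keep the full cksum line
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    printf "%s", lines[key]
        }'
}
```

Each output line has the checksum, the size and the path, so files with equal content appear with identical leading columns.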
