Operating System - HP-UX

Finding duplicate filenames

 
SOLVED
Preet Dhillon
Advisor

I'd like to find all files that share a name within a given filesystem. The difficulty is that I don't know what the filenames are; they could be anything. The files could also live in dozens of different directories. Is there a way of listing all duplicate files with the full paths to where they are?
Many thanks in advance.
Nothing succeeds like excess
16 REPLIES
G. Vrijhoeven
Honored Contributor
Solution

Re: Finding duplicate filenames

Hi Preet:

to list duplicate names:

find . -type f | awk -F/ '{ print $NF }' | sort | uniq -d

hope this will help
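
The one-liner above prints only the duplicated names, not where they live. A minimal sketch of one way to get the full paths in a single pass, assuming a POSIX awk and no newlines in file names. The function name list_duplicate_paths is made up for illustration, and the order in which the groups come out is not defined:

```shell
# list_duplicate_paths DIR
# (Hypothetical helper, not from the thread.) Print the full path of every
# regular file under DIR whose base name occurs more than once.
list_duplicate_paths() {
    find "$1" -type f |
    awk -F/ '
        {
            count[$NF]++                      # occurrences of this base name
            paths[$NF] = paths[$NF] $0 "\n"   # remember every full path
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    printf "%s", paths[name]
        }'
}
```

Something like `list_duplicate_paths /var` would then list the paths. Because awk keeps the whole line in $0, spaces in directory or file names should not matter here.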
Sridhar Bhaskarla
Honored Contributor

Re: Finding duplicate filenames

Preet,

Give this a try

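#!/usr/bin/ksh
# Write the full paths of all files whose name occurs more than once
# under the given directory to the file "duplicates".

if [ $# -ne 1 ]
then
    echo "Usage: $0 <full path to the directory/filesystem>"
    exit 1
fi

DIR=$1

if [ ! -d "$DIR" ]
then
    echo "$DIR: no such directory"
    exit 1
fi

# Start with an empty result file
rm -f duplicates

find "$DIR" -type f > list$$
# -F/ sets the separator before the first line is read;
# assigning FS inside the action is too late for line 1
awk -F/ '{print $NF}' list$$ | sort | uniq -d > files$$
while IFS= read -r FILE
do
    # Fixed-string grep narrows the candidates; tmp$$ is rewritten each time
    grep -F -e "$FILE" list$$ > tmp$$
    while IFS= read -r ENTRY
    do
        ONE=${ENTRY##*/}
        if [ "$ONE" = "$FILE" ]
        then
            echo "$ENTRY" >> duplicates
        fi
    done < tmp$$
done < files$$

rm -f tmp$$ files$$ list$$
echo "Check the file \"duplicates\" in the current dir"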

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Wim Rombauts
Honored Contributor

Re: Finding duplicate filenames

I think you had best start with a "find" and put the output in a text file. Then you can work on the text file to find duplicate filenames.

The find command looks like this:
find . -print > filenames.txt

With 'sed', you can convert this file into a list that has the filename without its path, followed by the path itself, on the same line. Then sort this file.
Sridhar Bhaskarla
Honored Contributor

Re: Finding duplicate filenames

Preet,

You can cut and paste my script. It gives the full paths to the duplicate files in a file called "duplicates" in the current directory.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
harry d brown jr
Honored Contributor

Re: Finding duplicate filenames


This will "GRAB" the file name from the path, put the file name first, and then the directory after it. That way you get what you want: duplicate file names in different directories end up next to each other.

find /var -type f|sed "s/\(^\/.*\/\)\(.*$\)/\2 \1/"|sort >filenames.out
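
The sorted list still contains every file. A rough sketch of how it could be narrowed down to the duplicated names only, under a few assumptions of mine: a tab instead of a space separates name and directory (so names with spaces stay in one field), sort is keyed on the name field, and the helper name show_duplicates_from_sorted_list is purely illustrative:

```shell
# show_duplicates_from_sorted_list DIR
# (Hypothetical helper.) Build a "name<TAB>directory/" list with sed,
# sort it by name, and keep only the names that appear more than once.
show_duplicates_from_sorted_list() {
    tab=$(printf '\t')
    find "$1" -type f |
    sed "s|^\(.*/\)\([^/]*\)\$|\2${tab}\1|" |
    sort -t"$tab" -k 1,1 |
    awk -F"$tab" '
        { name = $1 "" }                 # force a string comparison
        NR > 1 && name == prev {
            if (!shown) print held       # first line of the group, printed once
            print
            shown = 1
            next
        }
        { prev = name; held = $0; shown = 0 }'
}
```

Called as `show_duplicates_from_sorted_list /var`, this prints one "name, tab, directory" line per copy of each duplicated name.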
Live Free or Die
Stefan Farrelly
Honored Contributor

Re: Finding duplicate filenames


And the winner is....Vrijhoeven

His answer handles spaces in file and directory names (very important!) but needs some massaging to be complete:

1. cd ; find . -print > /tmp/tempfile
2. for i in $(find . -type f | awk -F / '{print $NF}'|sort|uniq -d)
do
grep "/${i}" /tmp/tempfile
done

Tested it, works great.
Im from Palmerston North, New Zealand, but somehow ended up in London...
Frank Slootweg
Honored Contributor

Re: Finding duplicate filenames

Maybe this is a start. I think this does what you want. It lists *all* files, sorted by file name, so the duplicate ones end up on adjacent lines.

$ cat ../doit
#! /usr/bin/sh

find . -type f |
{
while read -r filename
do
echo "$(dirname "$filename")" "$(basename "$filename")"
done
} | sort -k 2,2

$ ls -R
dir1 dir2 dir3

./dir1:
a c d

./dir2:
a b e

./dir3:
b c f

$ ../doit
./dir1 a
./dir2 a
./dir2 b
./dir3 b
./dir1 c
./dir3 c
./dir1 d
./dir2 e
./dir3 f
:;


sort -k 2,2
Sridhar Bhaskarla
Honored Contributor

Re: Finding duplicate filenames

Not to point out but...

Winner's script doesn't work in this scenario.
My directory structure is

.
./test1
./test1/test
./test1/sri
./test2
./test2/test
./test2/new
./test3
./test3/test
./test3/sri
./file
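
A small reproduction of the problem, as a sketch only. It assumes test1, test2 and test3 are directories, everything else is a regular file, and that mktemp -d is available for a scratch directory. The grep pattern "/test" is a substring match, so it also hits the directory names and everything underneath them; this should report 9 matching lines even though only three files are named test:

```shell
# Recreate the tree from the post in a scratch directory.
scratch=$(mktemp -d)
mkdir "$scratch/test1" "$scratch/test2" "$scratch/test3"
touch "$scratch/test1/test" "$scratch/test1/sri" \
      "$scratch/test2/test" "$scratch/test2/new" \
      "$scratch/test3/test" "$scratch/test3/sri" "$scratch/file"

# Same listing step as in the earlier reply, written outside the tree.
( cd "$scratch" && find . -print ) > "$scratch.list"

# "test" is a duplicated name, but "/test" also matches ./test1, ./test2/new, ...
matches=$(grep -c "/test" "$scratch.list")
echo "lines matched by /test: $matches"

rm -rf "$scratch" "$scratch.list"
```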

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Frank Slootweg
Honored Contributor

Re: Finding duplicate filenames

OOPS! Typos!
Please ignore the stuff after "./dir3 f", i.e. the silly ":;" and the standalone "sort -k 2,2 ".
Stefan Farrelly
Honored Contributor

Re: Finding duplicate filenames


oops, change the last grep to;

grep "${i}$" /tmp/tempfile

This will find the filenames at the end of the line. Works with the above test1,2,3 directory structure now.
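
Anchoring only at the end of the line still matches longer names with the same ending (for example "mytest" when looking for "test"), and characters such as "." in a name are treated as regular-expression wildcards. One possible way around both, sketched here with a made-up helper name paths_named, is to compare the last path component as a plain string in awk:

```shell
# paths_named NAME LISTFILE
# (Hypothetical helper.) Print the lines of LISTFILE whose last path
# component is exactly NAME; no regular-expression matching is involved.
paths_named() {
    NAME=$1 awk -F/ '($NF "") == ENVIRON["NAME"]' "$2"
}
```

For instance, `paths_named test /tmp/tempfile` would print only the entries whose final component is exactly "test". The name is passed through the environment so that awk does not interpret backslashes in it.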
Im from Palmerston North, New Zealand, but somehow ended up in London...
Robin Wakefield
Honored Contributor

Re: Finding duplicate filenames

Hi,

This method avoids having to run find twice:

===========================================
TAB=$(printf '\t')    # a literal tab, used as the field separator below
find . -type f | while read -r file ; do
printf '%s\t%s\n' "$(dirname "$file")" "$(basename "$file")"
done |
sort -t"$TAB" -k 2 - > /tmp/listfiles
uniq -f 1 -u /tmp/listfiles |
join -t"$TAB" -v 2 -1 2 -2 2 -o 2.1,2.2 - /tmp/listfiles |
sed "s+$TAB+/+"
=============================================

Both join and sort commands have the tab character between the double quotes.

Rgds, Robin.
Stefan Farrelly
Honored Contributor

Re: Finding duplicate filenames

Hey Robin,

Are you sure your script works with spaces in directory and file names? Usually dirname and basename don't work with spaces in names.
Im from Palmerston North, New Zealand, but somehow ended up in London...
Robin Wakefield
Honored Contributor

Re: Finding duplicate filenames

Hi Stefan,

Seems OK, as long as it's quoted.

# basename /tmp/a b c
# NOTHING !!

# dirname "/tmp/a b c"
/tmp
# basename "/tmp/a b c"
a b c

I tried a few variations, and it seemed to behave itself.

Cheers, Robin.
Robin Wakefield
Honored Contributor

Re: Finding duplicate filenames

...and there's a tab between the 1st pair of "+" characters in the final sed...Robin
Stefan Farrelly
Honored Contributor

Re: Finding duplicate filenames


Thanks for checking that Robin. OK, now we have 2 answers which do the business.

Preet - don't forget to assign points.
Im from Palmerston North, New Zealand, but somehow ended up in London...
harry d brown jr
Honored Contributor

Re: Finding duplicate filenames

Here is a perl script (attachment) I found that will actually find DUPLICATE files, and not just duplicate file names.

credits:

http://www.geocities.com/fcheck2000/download.html
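
The attachment is not reproduced here. As a rough illustration of the idea only (this is not the Perl script), files with identical content can be grouped by the checksum and size that cksum reports. The helper name same_content_groups is hypothetical, and "find ... -exec ... {} +" is assumed to be supported. A matching CRC and size is a strong hint rather than proof, so cmp can confirm a pair:

```shell
# same_content_groups DIR
# (Hypothetical helper.) Group regular files under DIR by cksum CRC and size;
# print each group of two or more paths, followed by a blank line.
same_content_groups() {
    find "$1" -type f -exec cksum {} + |
    awk '
        {
            key = $1 " " $2                    # CRC and byte count
            path = $0
            sub(/^[0-9]+ [0-9]+ /, "", path)   # keep the path, spaces included
            count[key]++
            paths[key] = paths[key] path "\n"
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    printf "%s\n", paths[key]
        }'
}
```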

Live Free or Die