
Script Help

 
Cem Tugrul
Esteemed Contributor

Script Help

Hi forum,

Let's say I have a directory which includes thousands of files, and I want to
compare each file's contents with the others, one by one, and find the
duplicate files (files with identical contents/records).

Help....
Our greatest duty in this life is to help others. And please, if you can't
26 REPLIES
Steven E. Protter
Exalted Contributor

Re: Script Help

The command is probably diff

diff file1 file2

You can build a script to read file lists and create diff output.

Do you need help setting up such a looping script?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rodney Hills
Honored Contributor

Re: Script Help

I might do the following-

cksum * | sort

This runs cksum on all the files and then sorts by checksum value. Files with the same contents sort together with the same checksum value.

HTH

-- Rod Hills
There be dragons...
Fred Ruffet
Honored Contributor

Re: Script Help

Do you mean you want to suppress duplicate files, or duplicate lines across different files?

Case 1 corresponds to what SEP says (the diff solution).

In case 2, you could cat all the files through the sort and uniq commands and get one file with unrepeated records.
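
A minimal sketch of case 2, assuming the files are plain text in the current directory (the output path is just an example, and lives outside the directory so it isn't picked up by the glob):

cat * | sort | uniq > /tmp/unique_records.out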

Regards,

Fred
--

"Reality is just a point of view." (P. K. D.)
Ivajlo Yanakiev
Respected Contributor

Re: Script Help

You need a loop:

for i in *
do
    for n in *
    do
        # skip comparing a file with itself
        [ "$i" = "$n" ] && continue
        diff "$i" "$n" >> /tmp/whatever
    done
done

(Each pair is still compared twice, and identical files produce no diff output, so the pairs that write nothing are the matches.)



Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi forum,
Thanks for all the answers...
Yes, I need help setting up such a looping script, urgently...
Please help...

And Fred, I need to compare the contents (records) of all the
files and find out "Ohh, these are the same files"...
But my file names are different, so maybe
the best approach is file size...
Our greatest duty in this life is to help others. And please, if you can't
Rodney Hills
Honored Contributor

Re: Script Help

If you are looking for files that are the same, what about my "cksum" solution?

It would be better than checking file size.

The "diff" solutions others have given show how files differ.

A little more explanation of what you have and why you are looking for "sameness" would help.

-- Rod Hills
There be dragons...
H.Merijn Brand (procura)
Honored Contributor
Solution

Re: Script Help

Would my answer in this thread be the start of your solution?

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=749983

You can extend it to report duplicates like:

use Digest::MD5 qw( md5_hex );
use Digest::SHA1 qw( sha1_hex );
use File::Find;

my %arr;
find (sub {
    -f or return;          # only look at plain files
    local $/;              # slurp mode: read each file in one go
    open my $p, "< $_" or die "$_: $!\n";
    my $f = <$p>;
    # combine two digests so accidental collisions are negligible
    my $sum = md5_hex ($f) . sha1_hex ($f);
    if (exists $arr{$sum}) {
        print "File $File::Find::name is the same as file $arr{$sum}\n";
        # unlink $_;       # uncomment to delete the duplicate
        return;
    }
    $arr{$sum} = $File::Find::name;
}, ".");

Enjoy, Have FUN! H.Merijn
Fred Ruffet
Honored Contributor

Re: Script Help

I think Rodney's solution is very good. Using diff between every combination of two files would make you parse each file a huge number of times, whereas running cksum once per file and working on a checksum file parses each file only once.
It should look like this:
cksum * > cksum.tmp
sort cksum.tmp > cksum.out
Then you can look at cksum.out. If two consecutive lines have the same checksum, they are the same file.
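
To report the duplicate pairs automatically, a small awk sketch over cksum.out could be (assuming the usual "checksum size filename" cksum output and no spaces in the file names):

awk '$1 == c && $2 == s { print f " and " $3 " appear identical" }
     { c = $1; s = $2; f = $3 }' cksum.out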

Regards,

Fred
--

"Reality is just a point of view." (P. K. D.)
Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi,

Let's say i have a directory as below;

-rw------- 1 cemt bsp 6 Dec 3 08:17 a.txt
-rw------- 1 cemt bsp 6 Dec 3 08:17 b.txt
-rw------- 1 cemt bsp 6 Dec 3 08:18 c.txt
-rw------- 1 cemt bsp 9 Dec 3 08:22 d.txt
-rw------- 1 cemt bsp 6 Dec 3 08:22 e.txt

Now I try to find out which files are the same. If you are a magician, you can easily
say that "a.txt" and "c.txt" are the same file!
Why?
Before we even cat these 5 files we can ignore "d.txt", because its size is different from the others.
So let's cat each file:

$ cat a.txt
11111
$ cat b.txt
22222
$ cat c.txt
11111
$ cat e.txt
33333

And we decide that "a.txt" and "c.txt" are the same (repeated) file... is that clear?

Now, I have more than 2000 files and need to find the repeated files in a directory.
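
Applying Rodney's cksum approach to these five files (the checksum values below are illustrative, not real output):

$ cksum * | sort
1190697967 6 a.txt
1190697967 6 c.txt
2380520371 6 b.txt
3057824311 6 e.txt
4072937409 9 d.txt

a.txt and c.txt share a checksum (and size), so they are the duplicates.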


Our greatest duty in this life is to help others. And please, if you can't