
Script Help

 
Cem Tugrul
Esteemed Contributor

Script Help

Hi forum,

Let's say I have a directory that contains thousands of files, and I want to
compare each file's contents with all the others, one by one, to find repeated
files (files with the same contents/records).

Help....
Our greatest duty in this life is to help others. And please, if you can't
26 REPLIES
Steven E. Protter
Exalted Contributor

Re: Script Help

The command is probably diff

diff file1 file2

You can build a script to read file lists and create diff output.

Do you need help setting up such a looping script?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rodney Hills
Honored Contributor

Re: Script Help

I might do the following-

cksum * | sort

This would run checksum on all the files then sort by checksum value. Those files that were the same would sort together with the same checksum value.
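
As a rough sketch of how the sorted output could be reduced to only the repeated files: the helper name list_dups and the awk filter below are illustrative additions, not part of the original suggestion. It assumes cksum prints "checksum size name" (the POSIX format) and that no file name starts with "-" or contains a newline.

```shell
# list_dups DIR: print the cksum lines of files in DIR whose checksum and
# byte count occur more than once (hypothetical helper, for illustration).
list_dups() {
    (
        cd "$1" || exit 1
        # Errors for directories and unreadable files are discarded.
        cksum * 2>/dev/null |
        sort -k1,1n -k2,2n |
        awk '
            { key = $1 " " $2 }
            key == prev_key { if (!shown) print prev_line; print; shown = 1 }
            key != prev_key { shown = 0 }
            { prev_key = key; prev_line = $0 }
        '
    )
}
```

Calling list_dups . in the directory in question prints each group of matching lines together. A matching checksum and size make identical contents very likely but not certain, so cmp can be used to confirm a pair.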

HTH

-- Rod Hills
There be dragons...
Fred Ruffet
Honored Contributor

Re: Script Help

Do you mean you want to suppress duplicate files, or duplicate lines across different files?

Case 1 corresponds to what SEP says (the diff solution).

In case 2, you could cat all the files through the sort and uniq commands and get one file with no repeated records.
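
A minimal sketch of case 2, assuming one record per line. The function names unique_records and repeated_records are made up for this example.

```shell
# unique_records DIR: every distinct record (line) found in DIR's files, once.
unique_records() {
    ( cd "$1" && cat ./* | sort -u )
}

# repeated_records DIR: only the records that occur more than once overall.
# A record repeated inside a single file is reported as well.
repeated_records() {
    ( cd "$1" && cat ./* | sort | uniq -d )
}
```

For example, unique_records . > /tmp/all_records would collect the de-duplicated records in one file (the output path is only an example).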

Regards,

Fred
--

"Reality is just a point of view." (P. K. D.)
Ivajlo Yanakiev
Respected Contributor

Re: Script Help

You need a loop:

for i in *
do
  for n in *
  do
    # skip comparing a file with itself; each pair is still visited twice
    [ "$i" = "$n" ] || diff "$i" "$n" >> /tmp/whatever
  done
done



Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi forum,
Thanks for all the answers...
Yes, I need help setting up such a looping script, urgently...
Please help...

And Fred, I need to compare the contents (records) of all the
files and find out which of them
are the same files...
But my file names are all different, so maybe
the best approach is to go by file size...
Rodney Hills
Honored Contributor

Re: Script Help

If you are looking for files that are the same, what about my "cksum" solution?

It would be better than checking file size.

The "diff" solutions others have given show how the files differ.

Maybe a little more explanation of what you have and why you are looking for "sameness" would help.

-- Rod Hills
There be dragons...
H.Merijn Brand (procura
Honored Contributor
Solution

Re: Script Help

Would my answer in this thread be the start to your solution?

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=749983

You can extend it to report duplicates like:

use Digest::MD5 qw( md5_hex );
use Digest::SHA1 qw( sha1_hex );
use File::Find;
my %arr;    # combined digest => first file seen with that content
find (sub {
    -f or return;                   # regular files only
    local $/;                       # slurp mode: read the whole file at once
    open my $p, "<", $_ or die "$_: $!\n";
    my $f = <$p>;
    my $sum = md5_hex ($f) . sha1_hex ($f);
    if (exists $arr{$sum}) {
        print "File $File::Find::name is the same as file $arr{$sum}\n";
        # unlink $_;                # uncomment to delete the duplicate
        return;
        }
    $arr{$sum} = $File::Find::name;
    }, ".");

Enjoy, Have FUN! H.Merijn
Fred Ruffet
Honored Contributor

Re: Script Help

I think Rodney's solution is very good. Using diff on every combination of two files makes you parse each file a huge number of times, whereas running cksum once on each file and then working on the checksum file parses each file only once.
It should look like this:
cksum * > cksum.tmp
sort cksum.tmp > cksum.out
Then you can look at cksum.out. If two consecutive lines show the same checksum and size, the two files have the same contents.
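
To put a number on "huge": a quick calculation, assuming roughly 2000 files (the figure mentioned elsewhere in this thread), where each distinct pair is compared exactly once.

```shell
n=2000                               # assumed file count
pairs=$(( n * (n - 1) / 2 ))         # distinct pairs a pairwise diff must visit
echo "pairwise comparisons: $pairs"  # prints: pairwise comparisons: 1999000
echo "cksum runs: $n"                # prints: cksum runs: 2000
```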

Regards,

Fred
--

"Reality is just a point of view." (P. K. D.)
Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi,

Let's say i have a directory as below;

-rw------- 1 cemt bsp 6 Dec 3 08:17 a.txt
-rw------- 1 cemt bsp 6 Dec 3 08:17 b.txt
-rw------- 1 cemt bsp 6 Dec 3 08:18 c.txt
-rw------- 1 cemt bsp 9 Dec 3 08:22 d.txt
-rw------- 1 cemt bsp 6 Dec 3 08:22 e.txt

Now I am trying to find out which files are the same. If you are a magician you can easily
say that "a.txt" and "c.txt" are the same file!
Why?
Before we cat these 5 files we can already ignore "d.txt", because its size is different from the others. So
let's cat each remaining file:

$ cat a.txt
11111
$ cat b.txt
22222
$ cat c.txt
11111
$ cat e.txt
33333

And so we decide that "a.txt" and "c.txt" are the same (a repeated file). Is that clear?

Now I have more than 2000 files and am trying to find the repeated files in a directory.
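
The manual procedure above (only compare files of the same size, then look at the contents) can be sketched in plain shell. This is an illustration only: the function name same_by_size_and_cmp is invented here, and it assumes file names without newlines or leading/trailing blanks.

```shell
# same_by_size_and_cmp DIR: report files in DIR that are byte-for-byte
# identical to an earlier file of the same size (hypothetical helper).
same_by_size_and_cmp() {
    (
        cd "$1" || exit 1
        # Emit "size name" for every regular file, then group by size.
        for f in *; do
            [ -f "$f" ] || continue
            printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
        done | sort -n |
        {
            prev_size=
            set --                  # distinct files seen so far for this size
            while read -r size name; do
                if [ "$size" != "$prev_size" ]; then
                    set --          # new size: start a fresh group
                    prev_size=$size
                fi
                dup=
                for rep in "$@"; do
                    if cmp -s "./$rep" "./$name"; then
                        echo "$name is the same as $rep"
                        dup=1
                        break
                    fi
                done
                [ -n "$dup" ] || set -- "$@" "$name"
            done
        }
    )
}
```

With the five files listed above, same_by_size_and_cmp . would report "c.txt is the same as a.txt". Files whose size is unique, like d.txt, are never passed to cmp.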


Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi Procura,
thanks also for your link, but I do not have any experience with perl.
How can I solve this problem with basic ux
scripting (if possible)?
Greetings,
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

Did you try to put that snippet of perl in a file and execute it?

I've attached the (slightly modified) script, and here is an example:


lt09:/tmp 103 # find_dups.pl
File /tmp/mcop-merijn/Arts_PlayObjectFactory is the same as file /tmp/mcop-merijn/Arts_SoundServerV2
File /tmp/mcop-merijn/Arts_SoundServer is the same as file /tmp/mcop-merijn/Arts_SoundServerV2
File /tmp/mcop-merijn/Arts_SimpleSoundServer is the same as file /tmp/mcop-merijn/Arts_SoundServerV2
File /tmp/pics/zelfmoordactie.jpg is the same as file /tmp/blah/zelfmoordactie.jpg
File /tmp/pics/xmas.jpg is the same as file /tmp/blah/xmas.jpg


Enjoy, Have FUN! H.Merijn
Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi Procura,

I wanted to run it on my dir but got
this error:
baan01:/users/rvs/usrdat/fave_org#perl find_dup.pl
Can't locate strict.pm in @INC (@INC contains: /opt/perl5/lib/5.00502/PA-RISC1.1 /opt/perl5/lib/5.00502 /opt/perl5/lib/site_perl/5.005/PA-RISC1.1 /opt/perl5/lib/site_perl/5.005 .) at find_dup.pl line 3.
BEGIN failed--compilation aborted at find_dup.pl line 3.
Cem Tugrul
Esteemed Contributor

Re: Script Help

baan01:/#whereis perl
perl: /usr/contrib/bin/perl /opt/perl/bin/perl /opt/baf/bin/perl /opt/perl/man/man1/perl.1

Do I have to change the perl path in your script?
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

strict.pm SHOULD be there. There might be a small problem in the installation. Could you check

# which perl

and

# perl -v

and make sure /usr/contrib/bin is NOT somewhere in the front of your $PATH. /usr/contrib/bin usually contains old perls from HP.
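
A small sketch for seeing every perl on the PATH in the order the shell searches it. The helper perls_in_path is an illustrative name, not an existing command.

```shell
# perls_in_path [PATHSTRING]: print each executable "perl" in search order;
# the first line is the one a bare "perl" command would run.
perls_in_path() (
    IFS=:
    for d in ${1:-$PATH}; do
        [ -n "$d" ] || d=.          # an empty PATH entry means the current directory
        if [ -f "$d/perl" ] && [ -x "$d/perl" ]; then
            echo "$d/perl"
        fi
    done
)
```

If the first line printed is /usr/contrib/bin/perl, the old HP perl is the one being picked up.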

But with the perl you have, it is very unlikely that the second Digest module (Digest::SHA1) is installed.

Would you consider installing perl-5.8.5 from my site? It's more complete, includes a lot of modules (including the three I use in the example script) and is pretty recent.

I don't expect you to learn perl in a week or so, but once you uncover its power, you will learn to love it fast!

Assuming you have a HP-UX 11.11 system, and no need for 64bit database connections just follow the instructions on http://mirrors.develooper.com/hpux/#Perl

My HP ITRC site pages can be found at (please use LA as primary choice):

USA Los Angeles http://mirrors.develooper.com/hpux/
SGP Singapore https://www.beepz.com/personal/merijn/
USA Chicago http://ww.hpux.ws/
NL Hoofddorp http://www.cmve.net/~merijn/

Enjoy, have FUN! H.Merijn
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

Please move /usr/contrib/bin to the END of your $PATH. It'll save you a lot of annoyances in the future.

It's old cruft!

Check /etc/PATH for it too
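
For the current session, moving a directory to the end could look roughly like this. The helper path_to_end is a made-up name, and a permanent change still belongs in .profile or /etc/PATH.

```shell
# path_to_end DIR [PATHSTRING]: print PATHSTRING (default: $PATH) with DIR
# removed from wherever it appears and appended once at the end.
path_to_end() (
    dir=$1
    IFS=:
    new=
    for d in ${2:-$PATH}; do
        [ "$d" = "$dir" ] && continue
        new=${new:+$new:}$d
    done
    echo "${new:+$new:}$dir"
)
```

Usage would be something like: PATH=$(path_to_end /usr/contrib/bin); export PATH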

Enjoy, Have FUN! H.Merijn
Cem Tugrul
Esteemed Contributor

Re: Script Help

$ which perl
/usr/contrib/bin/perl
$ perl -v

This is perl, version 5.005_02 built for PA-RISC1.1

Copyright 1987-1998, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
Cem Tugrul
Esteemed Contributor

Re: Script Help

Hi Procura,
I have downloaded perl
perl-5.8.5-gcc-3.4.1-11.11-elf64
and my server specs (from print_manifest) are:
Model: 9000/800/L1000-36
Main Memory: 2048 MB
Processors: 2
OS mode: 64 bit
Your system was installed with HP-UX version B.11.11.
How do I install perl 5.8.5?
Greetings,
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

Note that that is a 64bit perl

# cd /opt
# bzip2 -d < /tmp/perl-5.8.5-gcc-3.4.1-11.11-elf64.tbz | tar xf -
# export PATH=/opt/perl64/bin:$PATH
# perl -v

Enjoy, Have FUN! H.Merijn
Cem Tugrul
Esteemed Contributor

Re: Script Help

oppsss,
this time i do not have bzip2:-(
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

http://mirrors.develooper.com/hpux/bzip2-1.0.2-pa2.0 (just fetch and move to 'bzip2' in your $PATH somewhere)

or http://hpux.connect.org.uk/hppd/hpux/Misc/bzip2-1.0.2/

Enjoy, Have FUN! H.Merijn
Cem Tugrul
Esteemed Contributor

Re: Script Help

hi Procura,
Finally;

baan03:/#perl -v

This is perl, v5.8.5 built for PA-RISC2.0-LP64
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2004, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

and then;
baan03:/#perl -V
Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=hpux, osvers=11.11, archname=PA-RISC2.0-LP64
uname='hp-ux r3 b.11.11 u 9000800 1909236376 unlimited-user license '
config_args='-Dusedevel -Dcc=gcc64 -Duse64bitall -des'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=define uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc64', ccflags ='-mpa-risc-2-0 -D_HPUX_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/pa20_64/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-g -O',
cppflags='-mpa-risc-2-0 -D_HPUX_SOURCE -mpa-risc-2-0 -D_HPUX_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/pa20_64/include'
ccversion='', gccversion='3.4.1', gccosandvers='hpux11.11'
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=87654321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='/usr/bin/ld', ldflags =' -L/usr/local/pa20_64/lib -L/lib/pa20_64'
libpth=/usr/local/pa20_64/lib /lib/pa20_64 /lib /usr/lib /usr/ccs/lib /usr/local/lib
libs=-lcl -lpthread -lnsl -lnm -lgdbm -ldb -ldl -ldld -lm -lsec -lc
perllibs=-lcl -lpthread -lnsl -lnm -ldl -ldld -lm -lsec -lc
libc=/lib/pa20_64/libc.sl, so=sl, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_hpux.xs, dlext=sl, d_dlsymun=undef, ccdlflags='-Wl,-E '
cccdlflags='-fPIC', lddlflags='-b -L/usr/local/pa20_64/lib -L/lib/pa20_64'


Characteristics of this binary (from libperl):
Compile-time options: DEBUGGING USE_64_BIT_INT USE_64_BIT_ALL USE_LARGE_FILES
Locally applied patches:
Built under hpux
Compiled at Jul 24 2004 22:05:26
@INC:
/opt/perl64/lib/5.8.5/PA-RISC2.0-LP64
/opt/perl64/lib/5.8.5
/opt/perl64/lib/site_perl/5.8.5/PA-RISC2.0-LP64
/opt/perl64/lib/site_perl/5.8.5
/opt/perl64/lib/site_perl
.

Do you see any problem with the installation?
Is everything OK or not?
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

Brilliant!

Now make sure that /opt/perl64/bin is at the start of your $PATH and try that script again.

Enjoy, Have FUN! H.Merijn
Cem Tugrul
Esteemed Contributor

Re: Script Help

baan03:/home/fave_test#env
_=/usr/bin/env
MANPATH=/usr/share/man/%L:/usr/share/man:/usr/contrib/man/%L:/usr/contrib/man:/usr/local/man/%L:/usr/local/man:/opt/mx/share/man:/opt/upgrade/share/man/%L:/opt/upgrade/share/man:/opt/resmon/share/man:/usr/dt/share/man:/opt/pd/share/man/%L:/opt/pd/share/man:/opt/pd/share/man/%L:/opt/pd/share/man:/opt/pd/share/man/%L:/opt/pd/share/man:/opt/perf/man/%L:/opt/perf/man:/opt/ignite/share/man/%L:/opt/ignite/share/man:/opt/hparray/share/man/%L:/opt/hparray/share/man:/opt/graphics/common/man://opt/perl/man:/opt/prm/man/%L:/opt/prm/man:/opt/scr/share/man:/opt/wlm/share/man/%L:/opt/wlm/share/man:/opt/omni/lib/man:/opt/hpsmc/shc/man
PATH=/usr/sbin:/oracle/9.2.0/bin:/baan/bse/bin:/usr/bin:/usr/ccs/bin:/usr/contrib/bin:/opt/mx/bin:/opt/hparray/bin:/opt/nettladm/bin:/opt/upgrade/bin:/opt/fcms/bin:/opt/resmon/bin:/opt/pd/bin:/opt/perf/bin:/opt/ignite/bin:/opt/netscape:/usr/bin/X11:/usr/contrib/bin/X11:/opt/graphics/common/bin://opt/perl/bin:/opt/prm/bin:/opt/scr/bin:/usr/sbin/diag/contrib:/opt/wlm/bin:/opt/omni/bin:/sbin:/home/root


How do I put /opt/perl64/bin at the start of my PATH?
And what do I need to change in your script?
baan03:/home/fave_test#more find_dup.pl
#!/opt/perl64/bin/perl
use strict;
use warnings;

use Digest::MD5 qw( md5_hex );
use Digest::SHA1 qw( sha1_hex );
use Cwd qw( getcwd );
use File::Find;

my @dir = @ARGV && -d $ARGV[0] ? @ARGV : getcwd;
my %arr;
find (sub {
-f or return;
local $/;
open my $p, "< $_" or die "$_: $!\n";
my $f = <$p>;
my $sum = md5_hex ($f) . sha1_hex ($f);
if (exists $arr{$sum}) {
print "File $File::Find::name is the same as file $arr{$sum}\n";
# unlink $_;
return;
}
$arr{$sum} = $File::Find::name;
}, @dir);
H.Merijn Brand (procura
Honored Contributor

Re: Script Help

Your PATH is constructed at login.
Look in your .profile (sh and ksh) or .cshrc (csh and tcsh). If you see a line like

export PATH=/usr/sbin:/oracle/9.2.0/bin:/baan/bse/bin:/usr/bin:/usr/.....

edit it to also contain /opt/perl64/bin: somewhere up front, like

export PATH=/usr/sbin:/opt/perl64/bin:/oracle/9.2.0/bin:/baan/bse/bin:/usr/bin:/usr/.....

But it is likely that most of the PATH settings are derived from the content of the file /etc/PATH, which should reflect the default $PATH for all users. If you can be root, which you probably can because you were able to install /opt/perl64, add /opt/perl64/bin to /etc/PATH and log in again.
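
For the .profile route (sh and ksh), a guarded variant is sketched below. The function name path_prepend is an assumption of this example. It leaves PATH alone if the directory is already listed anywhere, so an entry sitting at the end would not be moved to the front.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present: leave PATH alone
        *) PATH=$1:$PATH; export PATH ;;
    esac
}

path_prepend /opt/perl64/bin
```

Because of the guard, PATH does not grow when .profile is sourced more than once.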

Then start the script from the folder you want to check.

You can do a quick check by passing a folder name:

# find_dups.pl /tmp

You have changed the script OK

Enjoy, Have FUN! H.Merijn