- how can I compare lines by words. ...
02-10-2006 04:53 AM
Is there a way to compare two files line by line?
For example, I have filea:
dpd0
dpd6 mdmdr tgtd
dpd6 sd0r dpd6a fld0r
and fileb:
dpd0
dpd6 tgtd mdmdr1
dpd6 sd0r fld0r dpd6a
I have tried diff and sdiff, but the problem is line 3: it has the same words in both files, just in a different order, so the diff commands see the lines as different. I would like them to be considered the same.
Line 2 really is different. For lines like that, I would just like the line and the line number, to compare against a spreadsheet that I have.
Of course, the real files may be bigger, with longer lines and a different number of words per line. I have come to the conclusion that shell may not do this too well, and I am having a hard time poking around with perl.
Thanks
Richard
02-10-2006 05:00 AM
Re: how can I compare lines by words. ...
This sounds much like the problem I helped with a few days ago:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=998340
At the very least it will give you a good starting point for a perl script.
Hein.
02-10-2006 05:11 AM
Re: how can I compare lines by words. ...
I know there are a couple of us here working on this, because I asked around for help and no one has gotten back to me.
Let me take a look at that.
Richard
02-10-2006 05:19 AM
Solution
This assumes the files are basically the same and only the line contents might vary.
You may need to sort the files first, and/or build in logic to read individual file records until they line up again.
It helps to know more about the files to do so.
For example... is that first word sort of a key, indicative of a zone in the file?
$ type 1.tmp
dpd0
dpd6 mdmdr tgtd
dpd6 sd0r dpd6a fld0r
$ type 2.tmp
dpd0
dpd6 tgtd mdmdr1
dpd6 sd0r fld0r dpd6a
$ perl tmp.pl 1.tmp 2.tmp
word mdmdr1 not found on line 2 in 2.tmp
word mdmdr not found on line 2 in 1.tmp
Files are NOT the same
$
$ type tmp.pl
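# Compare two files line by line, ignoring the order of words within a line.
$f1 = shift @ARGV or die "Must provide first filename";
$f2 = shift @ARGV or die "Must provide second file name";
open F1, "<$f1" or die "Could not read file $f1";
open F2, "<$f2" or die "Could not read file $f2";
while (<F1>) {
    $line++;
    chomp;
    undef %words;
    # Record every word on the current line of the first file.
    foreach $word (split) {
        $words{$word}++;
    }
    # Read the matching line of the second file (empty if that file is shorter).
    $_ = <F2>;
    $_ = "" unless defined;
    foreach $word (split) {
        if ($words{$word}) {
            delete $words{$word};
        } else {
            # This word appears only on the second file's line.
            print "word $word not found on line $line in $f2\n";
            $not = " NOT";
        }
    }
    # Whatever is left appears only on the first file's line.
    foreach $word (keys %words) {
        print "word $word not found on line $line in $f1\n";
        $not = " NOT";
    }
}
print "Files are${not} the same\n";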
$
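A side note on the comparison logic above, which only matters if a word can repeat within a line: a matched word is deleted from the hash outright, so its count is ignored. With "a a b" in the first file and "a b" in the second, the lines compare as equal, while the reverse order reports a difference. The sketch below is not from the thread. It shows two symmetric alternatives in plain shell, and all four helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of the thread's scripts.

# words_multiset LINE: print the words of LINE one per line, sorted,
# keeping duplicates.
words_multiset() {
    # $1 is left unquoted on purpose so the shell splits it on whitespace.
    # set -f keeps a word such as "*" from being expanded to file names.
    ( set -f; for w in $1; do printf '%s\n' "$w"; done | sort )
}

# words_set LINE: same as above, but each distinct word is printed once.
words_set() {
    words_multiset "$1" | sort -u
}

# same_multiset A B: true when both lines hold the same words
# the same number of times.
same_multiset() {
    [ "$(words_multiset "$1")" = "$(words_multiset "$2")" ]
}

# same_set A B: true when both lines hold the same distinct words,
# regardless of how often each one occurs.
same_set() {
    [ "$(words_set "$1")" = "$(words_set "$2")" ]
}
```

For the sample data either helper treats line 3 as equal and line 2 as different. They only disagree on lines with repeated words, so which one is right depends on whether a repeat is meaningful in your files.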
02-11-2006 03:29 AM
Re: how can I compare lines by words. ...
If performance is a concern, then the script above should probably first check for a simple match between the lines before splitting them into words.
Depending on the exact needs and data attributes, I would probably (re)write it as a 'diff' post-filter:
# diff f1 f2 | perl diff-word-compare.pl
So diff would do the bulk of the work, and the perl filter could remove sections seen as different, but with the same words and thus considered the same for the intended usage.
fwiw,
Hein.
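The "simple match first" idea can be sketched in a few lines of shell. This is only an illustration under assumptions: lines_equivalent is a hypothetical helper name, and it is not the diff-word-compare.pl filter mentioned above, which is not shown in the thread.

```shell
#!/bin/sh
# lines_equivalent A B: hypothetical helper; true when A and B contain
# the same words, in any order.
lines_equivalent() {
    # Fast path: identical text needs no splitting or sorting at all.
    [ "$1" = "$2" ] && return 0
    # Slow path: split each line into words, sort them, compare the results.
    # $1 and $2 are left unquoted so the shell splits them on whitespace;
    # set -f (inside the subshell only) prevents file name expansion.
    a=$(set -f; printf '%s\n' $1 | sort)
    b=$(set -f; printf '%s\n' $2 | sort)
    [ "$a" = "$b" ]
}
```

When most lines in the two files are identical, nearly every call returns from the first test and the sort pipelines are never started, which is where the saving would come from.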
02-13-2006 02:08 AM
Re: how can I compare lines by words. ...
Would you mind taking this conversation offline?
Can you send me an email at
richard@rleon.net
02-13-2006 02:26 AM
Re: how can I compare lines by words. ...
Once upon a time I had a similar need. I had approximately 2500
files and wanted to find out whether any of them were duplicates (files with the same contents) in my directory.
I found a script like this:
# more find_dup.pl
#!/opt/perl64/bin/perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use Digest::SHA1 qw( sha1_hex );
use Cwd qw( getcwd );
use File::Find;
# Search the directories given on the command line, or the current directory.
my @dir = @ARGV && -d $ARGV[0] ? @ARGV : getcwd;
my %arr;    # checksum => first file seen with that content
find (sub {
    -f or return;
    local $/;    # slurp mode: read the whole file in one go
    open my $p, "< $_" or die "$_: $!\n";
    my $f = <$p>;
    my $sum = md5_hex ($f) . sha1_hex ($f);
    if (exists $arr{$sum}) {
        print "File $File::Find::name is the same as file $arr{$sum}\n";
        # unlink $_;
        return;
    }
    $arr{$sum} = $File::Find::name;
}, @dir);
Hope it helps!!!
Good Luck,
02-13-2006 08:00 AM
Re: how can I compare lines by words. ...
What you need is to sort each line, then compare.
I have made a POSIX shell script that might solve it for you.
#!/bin/sh
#Input files
FILEA="./filea"
FILEB="./fileb"
# Simple check to see if identical
diff $FILEA $FILEB >/dev/null && {
    print "Files are identical"
    exit 0
}
# Check which files have most lines
LINES_A="`cat $FILEA |wc -l`"
LINES_B="`cat $FILEB |wc -l`"
typeset -i LINES_A LINES_B
[[ $LINES_B -gt $LINES_A ]] && let LINES_A=$LINES_B
# Open files for read
exec 3<$FILEA
exec 4<$FILEB
let LINE=0
until [[ "$LINE" = "$LINES_A" ]]
do
    let LINE=${LINE}+1
    read -u3 LINEA
    read -u4 LINEB
    SORTA="`echo $LINEA |tr ' ' '\012'| sort | tr '\012' ' '`"
    SORTB="`echo $LINEB |tr ' ' '\012'| sort | tr '\012' ' '`"
    print "Matching line : $LINE - \c"
    [[ "$SORTA" = "$SORTB" ]] && print "EQUAL" || print "DIFFER"
done
/Tor-Arne
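The original question asked only for the differing lines and their line numbers. A minimal variation on the sort-then-compare approach could report just those. This is a sketch, not a tested HP-UX script: sort_words and report_word_diffs are hypothetical names, and the tab-separated output is an assumption made so the result is easy to paste into a spreadsheet.

```shell
#!/bin/sh
# sort_words LINE: hypothetical helper; print the words of LINE sorted,
# one per line.
sort_words() {
    # $1 is left unquoted so the shell splits it on whitespace;
    # set -f stops a word such as "*" from expanding to file names.
    ( set -f; printf '%s\n' $1 | sort )
}

# report_word_diffs FILE_A FILE_B: hypothetical helper. For every line whose
# words differ, print the line number and both lines, separated by tabs.
# A line missing from the shorter file is treated as an empty line.
# Returns 0 when no line differs, 1 otherwise.
report_word_diffs() {
    n=0
    differ=0
    exec 3<"$1" 4<"$2"
    while :; do
        more_a=1; more_b=1
        # read fails at end of file; a final line without a trailing
        # newline is still delivered, so keep it if it is non-empty.
        IFS= read -r line_a <&3 || [ -n "$line_a" ] || more_a=0
        IFS= read -r line_b <&4 || [ -n "$line_b" ] || more_b=0
        # Stop once both files are exhausted.
        [ "$more_a" -eq 0 ] && [ "$more_b" -eq 0 ] && break
        n=$((n + 1))
        if [ "$(sort_words "$line_a")" != "$(sort_words "$line_b")" ]; then
            printf '%s\t%s\t%s\n' "$n" "$line_a" "$line_b"
            differ=1
        fi
    done
    exec 3<&- 4<&-
    return "$differ"
}
```

Called as report_word_diffs ./filea ./fileb on the sample files from the first post, this prints a single row for line 2 (the line number, then "dpd6 mdmdr tgtd", then "dpd6 tgtd mdmdr1") and nothing for lines 1 and 3.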
02-13-2006 08:52 AM
Re: how can I compare lines by words. ...
$line = join " " => sort split m/ / => $line, -1;
or in one blow
"@{[sort split/ /,$linea]}" eq "@{[sort split/ /,$lineb]}" and print "Lines are equal\n";
Enjoy, Have FUN! H.Merijn
02-13-2006 12:20 PM
Re: how can I compare lines by words. ...
http://www.perl.com/doc/FMTEYEWTK/sort.html
F = Far
M = More
T = Than
E = Everything
Y = You
E = Ever
W = Want
T = To
K = Know
- about sort
/Tor-Arne
02-14-2006 08:06 AM
Re: how can I compare lines by words. ...
Thanks everyone