Operating System - HP-UX

Edgar Brito
Advisor

Verify two files are exactly equal to each other.

I made a copy of a file. How can I compare the two and be sure that they are identical?
Pete Randall
Outstanding Contributor

Re: Verify two files are exactly equal to each other.

Run diff:

diff file1 file2

man diff for details.


Pete
Ivan Krastev
Honored Contributor

Re: Verify two files are exactly equal to each other.

Dave Hutton
Honored Contributor

Re: Verify two files are exactly equal to each other.

Use sum or cksum (I think sum is being obsoleted):

sum filea fileb
cksum filea fileb
Robert-Jan Goossens
Honored Contributor

Re: Verify two files are exactly equal to each other.

Hi,

Use cksum. It is typically used to verify data integrity when copying files between systems.

Regards,
Robert-Jan
James R. Ferguson
Acclaimed Contributor

Re: Verify two files are exactly equal to each other.

Hi:

As noted, 'diff' for ASCII (text) files. For binary files you can use 'cksum' or for a bigger, better hammer, use 'md5'.

Regards!

...JRF...
Hein van den Heuvel
Honored Contributor

Re: Verify two files are exactly equal to each other.

If the files are available from a single system, then 'diff' is the only and right way to go, for binary files as well as text files.
Why 'risk' a false positive from a checksum?

For files on different systems, perhaps even with a different platform, a checksum tool like cksum or md5 is appropriate.
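
A minimal sketch of such a checksum comparison, assuming a POSIX shell and the standard cksum utility. The function name same_cksum is made up for illustration and is not a standard command:

```shell
# same_cksum FILE1 FILE2
# Hypothetical helper: succeeds (exit 0) when both files report the same
# CRC and byte count from POSIX cksum, fails with 1 when they differ,
# and returns 2 when either file cannot be read.
same_cksum() {
    # Reading from stdin keeps the file name out of cksum's output,
    # so the two results can be compared as plain strings.
    sum1=$(cksum < "$1") || return 2
    sum2=$(cksum < "$2") || return 2
    [ "$sum1" = "$sum2" ]
}
```

When the files live on different systems, the same idea applies by hand: run cksum on each side and compare the CRC and size columns. Matching checksums make a difference very unlikely but, as noted above, cannot rule it out.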

Food for thought: when was the last time you, or anyone out here, saw a file being different after a successful copy?
I appreciate the 'check and double-check' attitude, but in reality a simple byte-count (ls -l) will probably catch 99.9999% of the cases. My advice? Don't bother!

And if you want to check seriously, then you must make sure that the (OS) and controller caches are flushed, because a compare against recently copied files might not read anything from the disk itself and rely on cached data instead. So if the data was mysteriously read wrong the first time, the same wrong data could be used the second time.

Enjoy!
Hein.
TwoProc
Honored Contributor

Re: Verify two files are exactly equal to each other.

Hein,

Re: last time I saw...

On an Intel box? Often enough to worry about it.

On an HP box? Can't even remember a case...
We are the people our parents warned us about --Jimmy Buffett
James R. Ferguson
Acclaimed Contributor

Re: Verify two files are exactly equal to each other.

Hi (again):

> Hein: If the files are available from a single system, then 'diff'...Binary files as well as text files.

Ah, thanks, I never remember that 'diff' doesn't care.

> Hein: Why 'risk' a false positive from a checksum?

While I would consider that very unlikely, that's why I suggested 'md5'.

> Hein: For files on different systems, perhaps even with a different platform, a checksum tool like cksum or md5 is appropriate.

This assumes that the systems share a common newline indication for text files. If the end-of-line characters are different (e.g. UNIX vs. Windows or MAC: \012 versus \015\012 versus \015, respectively) then this is going to alter the results.
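
To illustrate the newline point, here is a rough sketch, assuming a POSIX shell with tr and cksum. The helper name cksum_text is invented for this example:

```shell
# cksum_text FILE
# Hypothetical helper: checksum a text file with carriage returns (\015)
# stripped, so a UNIX copy (\012 line ends) and a Windows copy
# (\015\012 line ends) of the same text produce the same CRC and size.
# Limitations: it does not handle files that use \015 alone as the line
# ending, and it must not be used on binary files, where \015 is data.
cksum_text() {
    tr -d '\015' < "$1" | cksum
}
```

Plain cksum on the two copies reports different values (the byte counts alone differ), while cksum_text reports the same value for both. For binary files, transfer them in binary mode and checksum them unmodified.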

> Hein: And if you wanted to check seriously then you must make sure that the (OS) and controller caches are flushed...

Hmmm. that is interesting.

Regards!

...JRF...
Steven Schweda
Honored Contributor

Re: Verify two files are exactly equal to each other.

> diff file1 file2

> use cksum

If you care only _whether_ they're different,
but not about what the differences are, then
"cmp" can be much nicer than plain "diff".

"cmp" appears to quit when it finds the first
difference, while "diff" can waste a lot of
time processing (large) different files.
Similarly, a checksum calculation always
requires reading all of each file.

Also, a lame "diff" program can require you
to use "> /dev/null" to avoid confusing your
terminal when dealing with non-text files.

> When is the last time you, or anyone out
> here ever saw a file being different after
> a successful copy?

It's not very hard to get a partial FTP or
HTTP download which fails with no symptom
other than a corrupt (partial) file.
Sometimes it's the server's problem,
sometimes the client's.
Hein van den Heuvel
Honored Contributor

Re: Verify two files are exactly equal to each other.

[Dang, have to re-enter. Got the bogus "Due to the presence of characters known to be used in Cross Site Scripting attacks, access is forbidden. This web site does not allow Urls which might include embedded HTML tags.". Balony! (sp?). Normally I have a copy in a paste buffer, but not this time.]

Steven wrote> "cmp" appears to quit when it finds the first difference, while "diff" can waste

Ah! Thanks for that.
This is why I keep reading even seemingly trivial questions.
Someone might point out a new solution, or at least remind me of long since forgotten alternatives.

Hein> ever saw a file being different after a successful copy?
Steven> It's not very hard to get a partial FTP or

That's why I wrote copy, not transfer.
And I suppose copies with NFS involved are a little suspect also.
it's all too easy to get

Hein> (OS) and controller caches are flushed...
JRF> Hmmm. that is interesting.

Well, for a good-sized file (1 GB or more) the tails of the files will chase the heads out of the caches. But files of even a few hundred MB might just fit in the cache, unless the cache is trained to recognize long sequential I/O patterns (IIRC the Tru64 UBC special-cased this).

So you really want to at least issue 'sync' before a compare. The sync might actually trigger a report for IO errors for a long since (30 seconds :-) completed copy.
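
As a rough sketch of that sequence, assuming a POSIX shell with cp, sync and cmp available (copy_and_verify is a made-up name for illustration):

```shell
# copy_and_verify SRC DST
# Hypothetical helper: copy, ask the kernel to flush pending writes, then
# compare the two files byte for byte.
# Caveat: sync only pushes dirty buffers toward the disk, and on some
# systems it may return before the writes have completed. It does not
# evict cached data, so the compare may still be served from memory.
copy_and_verify() {
    cp "$1" "$2" || return 2    # 2 = the copy itself failed
    sync
    cmp -s "$1" "$2"            # 0 = identical, 1 = different
}
```

This only covers the "at least issue sync" part. Forcing the compare to read from the disk itself depends on the OS and the controller and is not attempted here.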

Cheers,
Hein.



Dennis Handly
Acclaimed Contributor

Re: Verify two files are exactly equal to each other.

>JRF: For binary files you can use 'cksum'

As Hein says, you can also use diff(1) but you'll need to suppress the nasty warning when different.

For Steven's cmp(1), you'll want to use -s to suppress the output.
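
A minimal sketch of the cmp -s approach, assuming a POSIX shell. The function name files_identical is only a placeholder:

```shell
# files_identical FILE1 FILE2
# Hypothetical wrapper around POSIX cmp. With -s nothing is printed, so
# the result is carried only by the exit status:
#   0  = the files are identical
#   1  = the files differ
#   >1 = trouble, such as a missing or unreadable file
files_identical() {
    cmp -s "$1" "$2"
}
```

In a script this would be used as `if files_identical file1 file2; then ...`, where file1 and file2 stand for your own file names. It works the same way for text and binary files.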

>Hein: [Dang, have to re-enter.

With firefox, just go back and resubmit.