Operating System - HP-UX

check data integrity: checksum, md5sum, ...

 
Franz P
Advisor

check data integrity: checksum, md5sum, ...

We are moving our data from one HP-UX host to another. Our auditing department asks for some checks/documentation to prove data integrity. We just verified that an md5sum check of a 25 GB Oracle data file takes about 45 minutes. Since we have a number of these files and a shortage of time, we are trying to find a quicker utility to check data integrity. "checksum" is even slower than "md5sum". Are there any fast utilities you would recommend?
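
For reference, a minimal sketch of the straightforward approach we are using now (the paths and host names below are just placeholders): build a checksum list on each host and diff the two lists.

    # on each host (paths are placeholders), build a checksum list
    cd /oradata
    md5sum *.dbf > /tmp/md5.`hostname`

    # copy one list to the other host, then compare
    diff /tmp/md5.oldhost /tmp/md5.newhost
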
6 REPLIES
Steven E. Protter
Exalted Contributor

Re: check data integrity: checksum, md5sum, ...

Shalom,

You are already using the best.

I would not expect md5sum to be fast on a 25 GB datafile.

I personally would be satisfied if the number of bytes were the same after transfer.

Those tools are designed to check the integrity of binaries to prevent them from being altered in transit, not data.

You might find an SQL script is better for confirming your data. A little report or something. It would certainly be faster.
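
A rough sketch of that idea, assuming sqlplus is available on both hosts (the connect string, table, and columns below are placeholders, not your schema): run the same small report on both sides and diff the output files.

    # run on the old and the new host, then diff the two report files
    sqlplus -s scott/tiger@ORCL <<EOF > /tmp/report.`hostname`
    set pagesize 0
    set feedback off
    select count(*) from emp;
    select min(empno), max(empno), sum(sal) from emp;
    exit
    EOF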

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Franz P
Advisor

Re: check data integrity: checksum, md5sum, ...

Shalom Steven,

thank you for your post.

>> I personally would be satisfied if the number of bytes were the same after transfer.

So would I, but my employer is an insurance company, and they are not that easily satisfied... ;-)

We are experimenting with some threading options for md5sum, which might speed it up a little bit.
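
One simple thing we are trying, for example, is checksumming several files concurrently from the shell (file names below are placeholders). Whether this actually helps depends on whether disk I/O or CPU is the bottleneck.

    # start one md5sum per file in the background, then wait for all of them
    cd /oradata
    for f in system01.dbf undo01.dbf users01.dbf
    do
        md5sum "$f" > /tmp/"$f".md5 &
    done
    wait
    cat /tmp/*.dbf.md5 > /tmp/all.md5
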
A. Clay Stephenson
Acclaimed Contributor

Re: check data integrity: checksum, md5sum, ...

One that springs to mind is "sum" without any arguments. Its algorithm is much simpler than that of md5 or cksum, so it will not catch as many errors as the utilities you mention, but it is computationally faster. It's probably a reasonable compromise and will be faster than any other.

The main weakness of sum is that it cannot detect transposed bytes, but that should be a low risk in your environment.
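
For reference, usage is trivial; run the same command on both hosts and compare the resulting lists of checksums and block counts (the path is a placeholder):

    # run on each host and diff the resulting lists
    cd /oradata
    sum *.dbf > /tmp/sum.`hostname`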

The only other thing that springs to mind would be a statistical approach that reads blocks of data chosen either in a fixed-interval fashion or in a pseudo-random manner from the source and destination files, does a cksum of the selected data, and compares the results.
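
A rough sketch of that sampling idea with dd and cksum (the block size, interval, iteration count, and path are arbitrary placeholders sized for a roughly 25 GB file); run the same loop against the source and destination copies and compare the two cksum lines:

    # sample a 1 MB block every 100 MB and checksum the sampled data
    FILE=/oradata/system01.dbf
    i=0
    while [ $i -lt 250 ]
    do
        dd if=$FILE bs=1048576 skip=$((i * 100)) count=1 2>/dev/null
        i=$((i + 1))
    done | cksum
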
If it ain't broke, I can fix that.
Fred Ruffet
Honored Contributor

Re: check data integrity: checksum, md5sum, ...

I don't know if this is really a good idea, but it comes to mind: what about doing a diff through an NFS mount? You mount the old FS on a temporary directory via NFS and diff the files. It depends on your network capabilities, but you could give it a try.
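
A sketch of how that might look on HP-UX (the host name, export, and paths are placeholders); for binary datafiles, cmp may be a better fit than diff. Note it still pulls every byte across the network, so it is bound by network speed.

    # mount the old host's filesystem read-only via NFS
    mkdir -p /mnt/olddata
    mount -F nfs -o ro oldhost:/oradata /mnt/olddata

    # compare each copied file byte for byte
    for f in /oradata/*.dbf
    do
        cmp "$f" /mnt/olddata/`basename "$f"` && echo "$f OK"
    done

    umount /mnt/olddata
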
--

"Reality is just a point of view." (P. K. D.)
Franz P
Advisor

Re: check data integrity: checksum, md5sum, ...

Thank you guys for your responses.

Since we have three environments (test, integration, and production), we will use the strong md5sum check only for our production files (this is where we make our money ;-) ) and use "weaker" checks such as sum for the test and integration environments.

Thank you for your attention. Your help was much appreciated!
Franz P
Advisor

Re: check data integrity: checksum, md5sum, ...

n/a