comparing using diff or something else
09-17-2002 05:09 AM
I am trying to decide what method to use to compare two rather large files.
The two files have roughly 2.5 million records in each and each record consists of about 10 fields of approximately 30 characters each.
I want to 'diff' these files (or compare them in some other way) and produce a log of the discrepancies.
Any ideas?
Thanks a million,
John
09-17-2002 05:22 AM
Re: comparing using diff or something else
Perhaps "comm" will do the job. The files need to be sorted. Then comm can report
- Lines common to both files
- Lines only in the first file
- Lines only in the second file
in any combination. Have a look at "man comm".
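For anyone who wants to see what that three-way split amounts to, here is a minimal Python sketch of the same idea. It is not how comm is implemented; the function name `comm_columns` and the sample records are made up for illustration. It assumes both inputs are already sorted in the same order that Python uses to compare strings (roughly what `LC_ALL=C sort` gives for ASCII data).

```python
from typing import Iterable, Iterator, Tuple


def comm_columns(a: Iterable[str], b: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (column, line) pairs for two sorted inputs.

    Column 1: line only in a. Column 2: line only in b. Column 3: line in both.
    Hypothetical helper; the numbering just mirrors comm's three output columns.
    """
    it_a, it_b = iter(a), iter(b)
    line_a, line_b = next(it_a, None), next(it_b, None)

    # Walk both inputs in step, always advancing the side with the smaller line.
    while line_a is not None and line_b is not None:
        if line_a == line_b:
            yield 3, line_a
            line_a, line_b = next(it_a, None), next(it_b, None)
        elif line_a < line_b:
            yield 1, line_a
            line_a = next(it_a, None)
        else:
            yield 2, line_b
            line_b = next(it_b, None)

    # Whatever is left on either side has no partner on the other side.
    while line_a is not None:
        yield 1, line_a
        line_a = next(it_a, None)
    while line_b is not None:
        yield 2, line_b
        line_b = next(it_b, None)


if __name__ == "__main__":
    old = ["1|a", "2|b", "4|d"]  # made-up sample records
    new = ["2|b", "3|c", "4|d"]
    for column, line in comm_columns(old, new):
        print(column, line)
```

Because it only holds one line from each input at a time, the same loop works on open file objects, so memory use should not grow with the size of the files.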
09-17-2002 05:24 AM
Re: comparing using diff or something else
Given that the files are very large, you probably will need 'bdiff' which is 'diff' for "b"ig files. You might also look at 'cmp'.
See the man pages for more information on each of the above.
Regards!
...JRF...
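As a rough illustration of the kind of answer 'cmp' gives (where two files first differ), here is a small Python sketch. It is not the cmp implementation; `first_difference` and its parameters are hypothetical names, and it assumes regular files (or in-memory streams) where a read returns a full chunk until end of file.

```python
import io
from typing import BinaryIO, Optional, Tuple


def first_difference(f1: BinaryIO, f2: BinaryIO,
                     chunk_size: int = 1 << 16) -> Optional[Tuple[int, int]]:
    """Return (byte_offset, line_number), both 1-based, of the first difference.

    Returns None when the two streams are identical. If one stream is a
    prefix of the other, the position just past the shorter one is returned.
    """
    offset = 0  # bytes known to be identical so far
    line = 1    # current line number, counted in the first stream
    while True:
        block1 = f1.read(chunk_size)
        block2 = f2.read(chunk_size)
        if block1 == block2:
            if not block1:  # both streams ended at the same point
                return None
            offset += len(block1)
            line += block1.count(b"\n")
            continue
        # The blocks differ: find the first index where they disagree.
        common = min(len(block1), len(block2))
        i = 0
        while i < common and block1[i] == block2[i]:
            i += 1
        line += block1[:i].count(b"\n")
        return offset + i + 1, line


if __name__ == "__main__":
    a = io.BytesIO(b"abc\ndef\n")  # made-up sample data
    b = io.BytesIO(b"abc\ndXf\n")
    print(first_difference(a, b))
```

This only tells you where the first mismatch is, which is why it suits a quick "are these the same?" check better than a full log of discrepancies.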
09-17-2002 05:28 AM
Re: comparing using diff or something else
For a file that is not going to grow beyond 2.5 million records, are cmp and comm suitable?
It's just that I prefer these two methods over diff.
Thanks again,
John
09-17-2002 05:45 AM
Re: comparing using diff or something else
In answer to your last question regarding the suitability of 'comm' and 'cmp' for million-record files, my advice is simply to try it.
"Your-mileage-may-vary" always applies. I have no experience using these utilities on files this large.
It is noteworthy, however, that 'bdiff', 'cmp' and 'comm' are described as being capable of handling largefiles. See the section "Text Processing Commands" in the "Large Files White Paper":
http://docs.hp.com/hpux/onlinedocs/os/lgfiles4.pdf
Regards!
...JRF...
09-17-2002 05:51 AM
Re: comparing using diff or something else
script (e.g. Perl) to analyze the deltas in some meaningful way.
09-17-2002 06:01 AM
Re: comparing using diff or something else
The characters are textual data, pipe delimited. The data is address data, i.e.
131|real street|Richmond|London|UK ....etc
So based on this, you are suggesting bdiff is the man/woman for the job.
Bummer, I don't get on with diff.
Thanks a bunch for your help, guys!
John
09-19-2002 05:41 AM
Solution
After the data is prepared, you may want to try writing a Perl script to get a more valuable answer from the results than bdiff or comm can give. If comm or bdiff is good enough, then ignore the rest of this message.
It *appears* your data is from some flat file database or spreadsheet, and you are looking for what has changed within a listing OR what has been removed from either list. Here's the PSEUDOCODE of what to accomplish. I'll use the word "key" to mean something like an account number: a value that is unique to each record. Both files are assumed to be sorted by that key.
read from a
read from b
repeat
    if a == b
        print a to MATCHED file
        read from a
        read from b
    else
        # Is this a modified entry? If so, print
        # out both a and b entries for my review.
        #
        # This is the only reason to write a
        # script instead of using bdiff or comm
        # if you don't need this specific data,
        # don't bother with the scripting.
        #
        if key[a] == key[b]
            print "WAS " a to MODIFIED file
            print "NOW " b to MODIFIED file
            read from a
            read from b
        else
            # non matching keys, so move on
            if key[a] < key[b]
                print a to NOT_IN_B file
                read from a
            else
                print b to NOT_IN_A file
                read from b
            endif
        endif
    endif
until (end of a) or (end of b)
while not (end of a)
    print a to NOT_IN_B file
    read from a
endwhile
while not (end of b)
    print b to NOT_IN_A file
    read from b
endwhile
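The pseudocode above could be sketched in Python roughly as follows (Perl would look much the same). This is only one way to do it, under some assumptions that are mine and not from the thread: the key is taken to be the first pipe-delimited field, both inputs are sorted by that key as plain strings, and keys are unique within each file. The names `first_field` and `compare_sorted` and the sample records are made up for illustration.

```python
from typing import Callable, Iterable, Iterator, Tuple


def first_field(record: str) -> str:
    """Hypothetical key: the first pipe-delimited field of a record."""
    return record.split("|", 1)[0]


def compare_sorted(a: Iterable[str], b: Iterable[str],
                   key: Callable[[str], str] = first_field
                   ) -> Iterator[Tuple[str, str]]:
    """Yield (category, text) pairs for two inputs sorted by key.

    Categories follow the pseudocode: MATCHED, MODIFIED, NOT_IN_B, NOT_IN_A.
    """
    it_a, it_b = iter(a), iter(b)
    rec_a, rec_b = next(it_a, None), next(it_b, None)

    while rec_a is not None and rec_b is not None:
        if rec_a == rec_b:
            yield "MATCHED", rec_a
            rec_a, rec_b = next(it_a, None), next(it_b, None)
        elif key(rec_a) == key(rec_b):
            # Same key, different contents: report both versions.
            yield "MODIFIED", "WAS " + rec_a
            yield "MODIFIED", "NOW " + rec_b
            rec_a, rec_b = next(it_a, None), next(it_b, None)
        elif key(rec_a) < key(rec_b):
            yield "NOT_IN_B", rec_a
            rec_a = next(it_a, None)
        else:
            yield "NOT_IN_A", rec_b
            rec_b = next(it_b, None)

    # Drain whichever input still has records left.
    while rec_a is not None:
        yield "NOT_IN_B", rec_a
        rec_a = next(it_a, None)
    while rec_b is not None:
        yield "NOT_IN_A", rec_b
        rec_b = next(it_b, None)


if __name__ == "__main__":
    old = ["100|real street|Richmond", "101|high road|Leeds", "103|mill lane|York"]
    new = ["100|real street|Richmond", "101|high road|Bradford", "102|new row|Bath"]
    for category, text in compare_sorted(old, new):
        print(category, text)
```

Since the function yields results one at a time instead of building lists, the caller can write each category to its own file while reading the two inputs line by line, which matters at 2.5 million records. Note that keys are compared as strings here, so "10" sorts before "9"; if the key is numeric, sort and compare it numerically on both sides.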