<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: comparing using diff or something else in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807639#M83479</link>
    <description>Hi (again) John:&lt;BR /&gt;&lt;BR /&gt;In answer to your last question about whether 'comm' and 'cmp' are suitable for multi-million-record files, my advice is simply to try them.&lt;BR /&gt;&lt;BR /&gt;"Your mileage may vary" always applies.  I have no experience with these utilities on files this large.&lt;BR /&gt;&lt;BR /&gt;It is noteworthy, however, that 'bdiff', 'cmp' and 'comm' are described as being capable of handling largefiles.  See the section "Text Processing Commands" in the "Large Files White Paper":&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://docs.hp.com/hpux/onlinedocs/os/lgfiles4.pdf" target="_blank"&gt;http://docs.hp.com/hpux/onlinedocs/os/lgfiles4.pdf&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
    <pubDate>Tue, 17 Sep 2002 12:45:20 GMT</pubDate>
    <dc:creator>James R. Ferguson</dc:creator>
    <dc:date>2002-09-17T12:45:20Z</dc:date>
    <item>
      <title>comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807635#M83475</link>
      <description>Chaps,&lt;BR /&gt;&lt;BR /&gt;I am trying to decide what method to use to compare two rather large files.&lt;BR /&gt;&lt;BR /&gt;The two files have roughly 2.5 million records each, and each record consists of about 10 fields of approximately 30 characters each.&lt;BR /&gt;&lt;BR /&gt;I want to 'diff' these files (or compare them in some other way) and produce a log of the discrepancies.&lt;BR /&gt;&lt;BR /&gt;Any ideas?&lt;BR /&gt;&lt;BR /&gt;Thanks a million&lt;BR /&gt;John</description>
      <pubDate>Tue, 17 Sep 2002 12:09:29 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807635#M83475</guid>
      <dc:creator>u856100</dc:creator>
      <dc:date>2002-09-17T12:09:29Z</dc:date>
    </item>
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807636#M83476</link>
      <description>Hi&lt;BR /&gt;&lt;BR /&gt;Perhaps "comm" will do the job. The files need to be sorted first. Then comm can report&lt;BR /&gt;- Lines common to both files&lt;BR /&gt;- Lines only in the first file&lt;BR /&gt;- Lines only in the second file&lt;BR /&gt;in any combination. Have a look at "man comm".</description>
      <pubDate>Tue, 17 Sep 2002 12:22:37 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807636#M83476</guid>
      <dc:creator>Leif Halvarsson_2</dc:creator>
      <dc:date>2002-09-17T12:22:37Z</dc:date>
    </item>
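    <!--
A minimal Python sketch of what `comm` does with two sorted inputs: one merge pass that places every line in one of the three columns listed above. The function name `comm_columns` is hypothetical; on the command line, `comm -23 a b`, `comm -13 a b` and `comm -12 a b` print the lines unique to the first file, unique to the second file, and common to both. The sketch assumes both inputs are sorted in the same order, which is exactly why `comm` needs sorted files: with that guarantee, a single forward pass is enough.

```python
from typing import Iterable, List, Tuple


def comm_columns(
    first: Iterable[str], second: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Classify lines of two sorted inputs the way comm(1) does.

    Returns (only_in_first, only_in_second, common).
    """
    only_first: List[str] = []
    only_second: List[str] = []
    common: List[str] = []
    it_a, it_b = iter(first), iter(second)
    a = next(it_a, None)
    b = next(it_b, None)
    while a is not None and b is not None:
        if a == b:
            common.append(a)          # comm column 3
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:
            only_first.append(a)      # comm column 1
            a = next(it_a, None)
        else:
            only_second.append(b)     # comm column 2
            b = next(it_b, None)
    # Whatever is left in either input has no partner in the other.
    while a is not None:
        only_first.append(a)
        a = next(it_a, None)
    while b is not None:
        only_second.append(b)
        b = next(it_b, None)
    return only_first, only_second, common


old = ["1|a st|London", "2|b st|Leeds", "3|c st|York"]
new = ["1|a st|London", "3|c st|Hull", "4|d st|Bath"]
print(comm_columns(old, new))
# → (['2|b st|Leeds', '3|c st|York'], ['3|c st|Hull', '4|d st|Bath'], ['1|a st|London'])
```

Python compares strings by code point, which matches files sorted with `LC_ALL=C sort`; under a different collation the sketch and `comm` may disagree. Note that record 3, whose town changed, shows up once in each of the first two columns: comm sees a changed record as one removal plus one addition.
    -->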
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807637#M83477</link>
      <description>Hi John:&lt;BR /&gt;&lt;BR /&gt;Given that the files are very large, you will probably need 'bdiff', which is 'diff' for "b"ig files.  You might also look at 'cmp'.&lt;BR /&gt;See the man pages for more information on each of the above.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
      <pubDate>Tue, 17 Sep 2002 12:24:05 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807637#M83477</guid>
      <dc:creator>James R. Ferguson</dc:creator>
      <dc:date>2002-09-17T12:24:05Z</dc:date>
    </item>
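    <!--
To show what `cmp` actually reports, here is a rough Python sketch of its default behavior: it stops at the first byte that differs and gives that position (POSIX specifies the output form `file1 file2 differ: char N, line M`). So `cmp` answers "are these identical, and if not, where do they first diverge?" rather than producing a log of every discrepancy. The function name `first_difference` and the 64 KiB chunk size are assumptions; reading in fixed-size chunks keeps memory use flat however large the files are.

```python
import os
import tempfile
from typing import Optional, Tuple


def first_difference(
    path_a: str, path_b: str, chunk_size: int = 1 << 16
) -> Optional[Tuple[int, int]]:
    """Return the 1-based (byte, line) of the first difference, like cmp(1).

    Returns None if the files are identical.  If one file is a prefix of
    the other, the position just past the end of the shorter file is
    returned (cmp reports that case as an EOF on the shorter file).
    """
    offset = 0  # bytes already compared and found equal
    lines = 0   # newlines seen in that equal prefix
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a == b:
                if not a:  # both files exhausted at the same point
                    return None
                offset += len(a)
                lines += a.count(b"\n")
                continue
            # Locate the first mismatching index within this chunk pair.
            n = min(len(a), len(b))
            i = next((k for k in range(n) if a[k] != b[k]), n)
            return offset + i + 1, lines + a[:i].count(b"\n") + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pa, pb = os.path.join(d, "a.txt"), os.path.join(d, "b.txt")
        with open(pa, "wb") as f:
            f.write(b"131|real street|Richmond|London|UK\n132|other street|Kew|London|UK\n")
        with open(pb, "wb") as f:
            f.write(b"131|real street|Richmond|London|UK\n132|other road|Kew|London|UK\n")
        print(first_difference(pa, pb))  # → (46, 2)
```

Because it stops at the first mismatch, this (like `cmp`) is a quick identity check. `cmp -l` does list every differing byte, but byte positions say little once a single inserted or deleted record shifts everything after it.
    -->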
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807638#M83478</link>
      <description>Thanks for your answers, chaps.&lt;BR /&gt;&lt;BR /&gt;For files that are not going to grow beyond 2.5 million records, are cmp and comm suitable?&lt;BR /&gt;&lt;BR /&gt;It's just that I prefer these two methods over diff.&lt;BR /&gt;&lt;BR /&gt;Thanks again&lt;BR /&gt;John</description>
      <pubDate>Tue, 17 Sep 2002 12:28:17 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807638#M83478</guid>
      <dc:creator>u856100</dc:creator>
      <dc:date>2002-09-17T12:28:17Z</dc:date>
    </item>
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807639#M83479</link>
      <description>Hi (again) John:&lt;BR /&gt;&lt;BR /&gt;In answer to your last question about whether 'comm' and 'cmp' are suitable for multi-million-record files, my advice is simply to try them.&lt;BR /&gt;&lt;BR /&gt;"Your mileage may vary" always applies.  I have no experience with these utilities on files this large.&lt;BR /&gt;&lt;BR /&gt;It is noteworthy, however, that 'bdiff', 'cmp' and 'comm' are described as being capable of handling largefiles.  See the section "Text Processing Commands" in the "Large Files White Paper":&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://docs.hp.com/hpux/onlinedocs/os/lgfiles4.pdf" target="_blank"&gt;http://docs.hp.com/hpux/onlinedocs/os/lgfiles4.pdf&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
      <pubDate>Tue, 17 Sep 2002 12:45:20 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807639#M83479</guid>
      <dc:creator>James R. Ferguson</dc:creator>
      <dc:date>2002-09-17T12:45:20Z</dc:date>
    </item>
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807640#M83480</link>
      <description>You failed to mention one very important aspect of the problem. Are you comparing textual data, and are the records linefeed-separated? If both conditions are true, then bdiff is probably the weapon of choice, but if this is binary data, then the task becomes more difficult and may actually require a custom script (e.g. Perl) to analyze the deltas in some meaningful way.</description>
      <pubDate>Tue, 17 Sep 2002 12:51:42 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807640#M83480</guid>
      <dc:creator>A. Clay Stephenson</dc:creator>
      <dc:date>2002-09-17T12:51:42Z</dc:date>
    </item>
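    <!--
Clay's question can be checked mechanically before choosing a tool. Below is a rough heuristic sketch, assuming "text" means newline-terminated records with no NUL bytes; that definition is this sketch's assumption, not a test that `bdiff` itself performs. The function name `looks_like_line_text` and the 1 MiB sample size are hypothetical.

```python
import os
import tempfile


def looks_like_line_text(path: str, sample_size: int = 1 << 20) -> bool:
    """Rough check that a file suits line-oriented tools (comm, bdiff, diff).

    Heuristic only: the file counts as text if the first sample_size bytes
    contain no NUL byte and the file ends with a newline.
    """
    with open(path, "rb") as f:
        head = f.read(sample_size)
        if b"\0" in head:
            return False
        if not head:
            return True  # an empty file is trivially line-oriented
        f.seek(-1, os.SEEK_END)  # look at the very last byte
        return f.read(1) == b"\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        text_path = os.path.join(d, "addr.txt")
        bin_path = os.path.join(d, "addr.bin")
        with open(text_path, "wb") as f:
            f.write(b"131|real street|Richmond|London|UK\n")
        with open(bin_path, "wb") as f:
            f.write(b"131\0\x01\x02")
        print(looks_like_line_text(text_path), looks_like_line_text(bin_path))
        # → True False
```

A file that fails this check is not necessarily unusable with line tools, but it is a hint that a custom script may be the better route.
    -->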
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807641#M83481</link>
      <description>Hi Clay,&lt;BR /&gt;&lt;BR /&gt;The data is textual and pipe-delimited. It is address data, e.g.&lt;BR /&gt;&lt;BR /&gt;131|real street|Richmond|London|UK  ...etc.&lt;BR /&gt;&lt;BR /&gt;So based on this, you are suggesting bdiff is the man/woman for the job.&lt;BR /&gt;&lt;BR /&gt;Bummer, I don't get on with diff.&lt;BR /&gt;&lt;BR /&gt;Thanks a bunch for your help, guys!&lt;BR /&gt;&lt;BR /&gt;John</description>
      <pubDate>Tue, 17 Sep 2002 13:01:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807641#M83481</guid>
      <dc:creator>u856100</dc:creator>
      <dc:date>2002-09-17T13:01:41Z</dc:date>
    </item>
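    <!--
Because the records are pipe-delimited, a discrepancy log can name the fields that changed instead of just printing two whole lines. A minimal sketch, assuming a literal `|` never appears inside a field; the function name `changed_fields` is hypothetical. Field numbers are 1-based, matching awk's numbering (`awk -F'|' '{print $3}'` prints the town in John's example).

```python
from typing import List, Tuple


def changed_fields(old: str, new: str, sep: str = "|") -> List[Tuple[int, str, str]]:
    """Return (1-based field number, old value, new value) for each differing field."""
    a = old.rstrip("\n").split(sep)
    b = new.rstrip("\n").split(sep)
    # Pad the shorter record so added or dropped trailing fields are reported too.
    width = max(len(a), len(b))
    a += [""] * (width - len(a))
    b += [""] * (width - len(b))
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(changed_fields("131|real street|Richmond|London|UK",
                     "131|real street|Kew|London|UK"))
# → [(3, 'Richmond', 'Kew')]
```

This only helps once the two versions of a record have been paired up, for example by a key in the first field.
    -->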
    <item>
      <title>Re: comparing using diff or something else</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807642#M83482</link>
      <description>One of the most painful parts of diffing large files is that they are usually unsorted.  I highly recommend that you first massage your data so that any "keys", such as account numbers, are in the first field (awk would be a good choice for moving the fields around), and then sort both files on that key.&lt;BR /&gt;&lt;BR /&gt;Once the data is prepared, you may want to try writing a Perl script to get more useful results than bdiff or comm can give.  If comm or bdiff is good enough, then ignore the rest of this message.&lt;BR /&gt;&lt;BR /&gt;It *appears* your data comes from a flat-file database or spreadsheet, and you are looking for what has changed within an entry OR what has been removed from either list.  Here's the PSEUDOCODE of what to accomplish.  I'll use the word "key" to mean an account number, i.e. something that is unique to each record.&lt;BR /&gt;&lt;BR /&gt;  read from a&lt;BR /&gt;  read from b&lt;BR /&gt;  while not (end of a) and not (end of b)&lt;BR /&gt;    if a == b&lt;BR /&gt;      print a to MATCHED file&lt;BR /&gt;      read from a&lt;BR /&gt;      read from b&lt;BR /&gt;    else&lt;BR /&gt;      # Is this a modified entry?  If so, print&lt;BR /&gt;      # out both the a and b entries for review.&lt;BR /&gt;      #&lt;BR /&gt;      # This is the only reason to write a&lt;BR /&gt;      # script instead of using bdiff or comm;&lt;BR /&gt;      # if you don't need this specific data,&lt;BR /&gt;      # don't bother with the scripting.&lt;BR /&gt;      #&lt;BR /&gt;      if key[a] == key[b]&lt;BR /&gt;        print "WAS " a to MODIFIED file&lt;BR /&gt;        print "NOW " b to MODIFIED file&lt;BR /&gt;        read from a&lt;BR /&gt;        read from b&lt;BR /&gt;      else&lt;BR /&gt;        # non-matching keys, so move on&lt;BR /&gt;        if key[a] &amp;lt; key[b]&lt;BR /&gt;          print a to NOT_IN_B file&lt;BR /&gt;          read from a&lt;BR /&gt;        else&lt;BR /&gt;          print b to NOT_IN_A file&lt;BR /&gt;          read from b&lt;BR /&gt;        endif&lt;BR /&gt;      endif&lt;BR /&gt;    endif&lt;BR /&gt;  endwhile&lt;BR /&gt;  while not (end of a)&lt;BR /&gt;    print a to NOT_IN_B file&lt;BR /&gt;    read from a&lt;BR /&gt;  endwhile&lt;BR /&gt;  while not (end of b)&lt;BR /&gt;    print b to NOT_IN_A file&lt;BR /&gt;    read from b&lt;BR /&gt;  endwhile</description>
      <pubDate>Thu, 19 Sep 2002 12:41:05 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/comparing-using-diff-or-something-else/m-p/2807642#M83482</guid>
      <dc:creator>Brian Kinney</dc:creator>
      <dc:date>2002-09-19T12:41:05Z</dc:date>
    </item>
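    <!--
Brian's pseudocode translates fairly directly into Python. This is a minimal sketch, not his implementation. It assumes pipe-delimited records whose first field is a unique key, with both inputs already sorted by that key in the same byte order Python uses (for example with `LC_ALL=C sort -t'|' -k1,1`). The names `compare_sorted` and `key_first` are hypothetical: `key_first` stands in for the awk step that moves the key to the front, and the four result lists correspond to the MATCHED, MODIFIED, NOT_IN_A and NOT_IN_B files.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def key_first(record: str, key_field: int, sep: str = "|") -> str:
    """Move the 1-based key_field to the front (the awk preparation step)."""
    fields = record.rstrip("\n").split(sep)
    key = fields.pop(key_field - 1)
    return sep.join([key] + fields)


def compare_sorted(
    file_a: Iterable[str], file_b: Iterable[str], sep: str = "|"
) -> Dict[str, List[str]]:
    """Merge-compare two key-sorted record streams, following the pseudocode."""
    out: Dict[str, List[str]] = {
        "MATCHED": [], "MODIFIED": [], "NOT_IN_A": [], "NOT_IN_B": []
    }

    def key(record: str) -> str:
        # Assumption: the unique key is the first delimited field.
        return record.split(sep, 1)[0]

    it_a: Iterator[str] = (line.rstrip("\n") for line in file_a)
    it_b: Iterator[str] = (line.rstrip("\n") for line in file_b)
    a: Optional[str] = next(it_a, None)  # "read from a"
    b: Optional[str] = next(it_b, None)  # "read from b"
    while a is not None and b is not None:
        if a == b:
            out["MATCHED"].append(a)
            a, b = next(it_a, None), next(it_b, None)
        elif key(a) == key(b):
            # Same key, different contents: a modified entry.
            out["MODIFIED"].append("WAS " + a)
            out["MODIFIED"].append("NOW " + b)
            a, b = next(it_a, None), next(it_b, None)
        elif key(a) < key(b):
            out["NOT_IN_B"].append(a)
            a = next(it_a, None)
        else:
            out["NOT_IN_A"].append(b)
            b = next(it_b, None)
    # Drain whichever input still has records left.
    while a is not None:
        out["NOT_IN_B"].append(a)
        a = next(it_a, None)
    while b is not None:
        out["NOT_IN_A"].append(b)
        b = next(it_b, None)
    return out


old = ["1|a st|London", "2|b st|Leeds", "3|c st|York"]
new = ["1|a st|London", "3|c st|Hull", "4|d st|Bath"]
for name, rows in compare_sorted(old, new).items():
    print(name, rows)
# → MATCHED ['1|a st|London']
# → MODIFIED ['WAS 3|c st|York', 'NOW 3|c st|Hull']
# → NOT_IN_A ['4|d st|Bath']
# → NOT_IN_B ['2|b st|Leeds']
print(key_first("London|131|real street", 2))  # → 131|London|real street
```

Passing open file objects (for example `compare_sorted(open("a.sorted"), open("b.sorted"))`) streams the records, but the result lists are still held in memory. For 2.5 million records you would more likely write each category to its own file inside the loop, exactly as the pseudocode does.
    -->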
  </channel>
</rss>

