<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: awk parsing 2 files help in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180006#M162743</link>
    <description>Hein has a point, though it's not that difficult to implement the original specs in the AWK solution. It's a matter of keeping track of whether the last line that came from File2 has been used for output.</description>
    <pubDate>Thu, 05 Feb 2004 01:13:39 GMT</pubDate>
    <dc:creator>Elmar P. Kolkman</dc:creator>
    <dc:date>2004-02-05T01:13:39Z</dc:date>
    <item>
      <title>awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179992#M162729</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I have two big files, what I want is to get those fields that match from my 1st and 2nd files and those that did not match.&lt;BR /&gt;&lt;BR /&gt;File1:&lt;BR /&gt;&lt;BR /&gt;xxx 10 hello&lt;BR /&gt;yyy 20 hello&lt;BR /&gt;xxx 20 hello&lt;BR /&gt;&lt;BR /&gt;File2:&lt;BR /&gt;&lt;BR /&gt;xxx thanks 10&lt;BR /&gt;xxx please 20&lt;BR /&gt;zzz thanks 10&lt;BR /&gt;&lt;BR /&gt;OUTPUT:&lt;BR /&gt;&lt;BR /&gt;xxx 10 thanks hello&lt;BR /&gt;xxx 20 please hello&lt;BR /&gt;zzz 10 thanks hello&lt;BR /&gt;zzz 10 thanks&lt;BR /&gt;yyy 20        hello&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Fields 1 and 2 of file1 should match fields 2 and 4 of file2. &lt;BR /&gt;&lt;BR /&gt;thanks.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 02 Feb 2004 06:17:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179992#M162729</guid>
      <dc:creator>Florinda Adato</dc:creator>
      <dc:date>2004-02-02T06:17:18Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179993#M162730</link>
      <description>ooops, output should be:&lt;BR /&gt;&lt;BR /&gt;xxx 10 thanks hello&lt;BR /&gt;xxx 20 please hello&lt;BR /&gt;zzz 10 thanks _____&lt;BR /&gt;yyy 20 ______ hello</description>
      <pubDate>Mon, 02 Feb 2004 06:24:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179993#M162730</guid>
      <dc:creator>Florinda Adato</dc:creator>
      <dc:date>2004-02-02T06:24:10Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179994#M162731</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;can you clarify this a bit more? I still don't know what you want.&lt;BR /&gt;&lt;BR /&gt;Michael&lt;BR /&gt;</description>
      <pubDate>Mon, 02 Feb 2004 06:39:01 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179994#M162731</guid>
      <dc:creator>Michael Schulte zur Sur</dc:creator>
      <dc:date>2004-02-02T06:39:01Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179995#M162732</link>
      <description>Hi,&lt;BR /&gt;Have a look at the "join" command instead; it matches fields from two files and prints out selected fields from both of the files.&lt;BR /&gt;</description>
      <pubDate>Mon, 02 Feb 2004 07:08:39 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179995#M162732</guid>
      <dc:creator>Leif Halvarsson_2</dc:creator>
      <dc:date>2004-02-02T07:08:39Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179996#M162733</link>
      <description>Not sure what you're trying to do either, but I don't think awk is the tool to compare 2 large files.&lt;BR /&gt;If "join", as suggested, is no good, try the man pages for "comm", "uniq".&lt;BR /&gt;&lt;BR /&gt;-- Graham</description>
      <pubDate>Mon, 02 Feb 2004 08:08:06 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179996#M162733</guid>
      <dc:creator>Graham Cameron_1</dc:creator>
      <dc:date>2004-02-02T08:08:06Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179997#M162734</link>
      <description>The question looks almost the same as a previous one of yours; modifying the solution from that thread could do the trick, if I understand the question correctly.&lt;BR /&gt;&lt;A href="http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=372886" target="_blank"&gt;http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=372886&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Peter</description>
      <pubDate>Mon, 02 Feb 2004 08:30:41 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179997#M162734</guid>
      <dc:creator>Hoefnix</dc:creator>
      <dc:date>2004-02-02T08:30:41Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179998#M162735</link>
      <description>&lt;BR /&gt;&amp;gt; I have two big files&lt;BR /&gt;&lt;BR /&gt;Define big! For less than 10MB or so I would definitely just write a Perl (not awk!) script that remembers all lines and columns and prints them out (optionally sorted) after everything is read. For an example see below.&lt;BR /&gt;&lt;BR /&gt;For files larger than 1000MB you would need to pre-sort and do a classic merge join&lt;BR /&gt;(read one, read the other until it is larger than the first, read the first until it is larger than the other, and so on). That is readily done with awk (as long as the input is sorted, unlike your example!).&lt;BR /&gt;&lt;BR /&gt;While you sort, or in addition to sorting, you could perhaps re-arrange the join fields such that the standard join tool can do the final work.&lt;BR /&gt;&lt;BR /&gt;hth,&lt;BR /&gt;Hein.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt; Fields 1 and 2 of file1 should match fields 2 and 4 of file2. &lt;BR /&gt;&lt;BR /&gt;You meant 1 and 2 matching 1 and 3, right?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;# File1 lines: key1 key2 value&lt;BR /&gt;open (FILE, "&amp;lt;File1");&lt;BR /&gt;while (&amp;lt;FILE&amp;gt;) {&lt;BR /&gt; chomp;&lt;BR /&gt; ($k1,$k2,$c) = split;&lt;BR /&gt; $x1{$k1." ".$k2} = "------";&lt;BR /&gt; $x2{$k1." ".$k2} = $c;&lt;BR /&gt; }&lt;BR /&gt;close (FILE);&lt;BR /&gt;&lt;BR /&gt;# File2 lines: key1 value key2&lt;BR /&gt;open (FILE, "&amp;lt;File2");&lt;BR /&gt;while (&amp;lt;FILE&amp;gt;) {&lt;BR /&gt; chomp;&lt;BR /&gt; ($k1,$c,$k2) = split;&lt;BR /&gt; $x1{$k1." ".$k2} = $c;&lt;BR /&gt; $x2{$k1." ".$k2} = "------" unless ($x2{$k1." ".$k2});&lt;BR /&gt; }&lt;BR /&gt;close (FILE);&lt;BR /&gt;&lt;BR /&gt;foreach $k (sort keys %x1) {&lt;BR /&gt; print "$k $x1{$k} $x2{$k}\n";&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;xxx 10 thanks hello&lt;BR /&gt;xxx 20 please hello&lt;BR /&gt;yyy 20 ------ hello&lt;BR /&gt;zzz 10 thanks ------&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;without the sort in the foreach you'd get:&lt;BR /&gt;&lt;BR /&gt;xxx 10 thanks hello&lt;BR /&gt;zzz 10 thanks ------&lt;BR /&gt;yyy 20 ------ hello&lt;BR /&gt;xxx 20 please hello&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 02 Feb 2004 11:55:54 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179998#M162735</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2004-02-02T11:55:54Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179999#M162736</link>
      <description>Ok, let's see if I understand. Both files are big, so reading them into memory like we did with the AWK solution the previous time is not an option. So we need to find another solution.&lt;BR /&gt;&lt;BR /&gt;Next, the order of the output. Should it be alphabetically ordered or is the order unimportant? If so, I could think of a nice solution, so please give more info on this.&lt;BR /&gt;&lt;BR /&gt;And, to keep procura from 'nagging' about the solution I have in mind: can empty lines be ignored? Do only lines with 3 fields matter or exist?</description>
      <pubDate>Tue, 03 Feb 2004 02:47:28 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3179999#M162736</guid>
      <dc:creator>Elmar P. Kolkman</dc:creator>
      <dc:date>2004-02-03T02:47:28Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180000#M162737</link>
      <description>One more thing: is it possible you have multiple combinations of field 1 and field 2 or field 1 and field 3 in the files, for instance in File1:&lt;BR /&gt;xxx 10 hello&lt;BR /&gt;xxx 20 hello&lt;BR /&gt;xxx 10 bybye&lt;BR /&gt;yyy 10 oopsy&lt;BR /&gt;&lt;BR /&gt;Or in File2:&lt;BR /&gt;xxx thanks 10&lt;BR /&gt;xxx please 10&lt;BR /&gt;xxx please 20&lt;BR /&gt;&lt;BR /&gt;That way all solutions with putting 1 of the files in memory will fail, and a new solution should be written.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Feb 2004 01:42:45 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180000#M162737</guid>
      <dc:creator>Elmar P. Kolkman</dc:creator>
      <dc:date>2004-02-04T01:42:45Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180001#M162738</link>
      <description>Hi Elmar,&lt;BR /&gt;&lt;BR /&gt;Sorry for the late response... Let me clarify my question.&lt;BR /&gt;&lt;BR /&gt;File1 is the main file, meaning every row from this file will be part of the output, for example:&lt;BR /&gt;&lt;BR /&gt;File1:&lt;BR /&gt;xxx thanks 10&lt;BR /&gt;xxx thanks 20&lt;BR /&gt;yyy please 10&lt;BR /&gt;zzz help 10&lt;BR /&gt;&lt;BR /&gt;File1 fields 1 and 3 have to be matched with File2 fields 1 and 2. Rows that match will get another field, which comes from File2. So if the contents of File2 are:&lt;BR /&gt;&lt;BR /&gt;xxx 10 hello&lt;BR /&gt;xxx 20 hello&lt;BR /&gt;zzz 10 ok&lt;BR /&gt;zzz 20 ok&lt;BR /&gt;&lt;BR /&gt;Then, if the fields did not match, I have to put in a default field of "notmatched". The output will be:&lt;BR /&gt;&lt;BR /&gt;10 xxx thanks hello&lt;BR /&gt;20 xxx thanks hello&lt;BR /&gt;10 yyy please notmatched&lt;BR /&gt;10 zzz help   notmatched&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I hope I'm clear enough this time. &lt;BR /&gt;&lt;BR /&gt;Thank you very much for the help. The first solution you gave me was really great and it made my script really fast. :-)&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Feb 2004 02:37:18 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180001#M162738</guid>
      <dc:creator>Florinda Adato</dc:creator>
      <dc:date>2004-02-04T02:37:18Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180002#M162739</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;try the attachment.&lt;BR /&gt;&lt;BR /&gt;Michael</description>
      <pubDate>Wed, 04 Feb 2004 04:07:26 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180002#M162739</guid>
      <dc:creator>Michael Schulte zur Sur</dc:creator>
      <dc:date>2004-02-04T04:07:26Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180003#M162740</link>
      <description>Well, let's see if I can come up with a solution. But first a note: your output order has changed...&lt;BR /&gt;&lt;BR /&gt;Now for the solution. What I suggest is to combine the files again, but this time we do it a bit differently. I have not tested this on large files, but the script would become:&lt;BR /&gt;&lt;BR /&gt;( awk '{printf "1 %s %s %s\n",$1,$3,$2}' &amp;lt; File2 ; awk '{printf "2 %s %s %s\n",$1,$2,$3}' &amp;lt; File1 ) | sort -k 2,3 -k 1 | awk '$1=="1" { last1=$2;last2=$3;last3=$4 }&lt;BR /&gt;$1=="2" { if ($2==last1 &amp;amp;&amp;amp; $3==last2)&lt;BR /&gt;{ printf "%s %s %s %s\n",$2,$3,$4,last3 }&lt;BR /&gt;else {printf "%s %s %s NOTMATCHED\n",$2,$3,$4}}'&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Feb 2004 05:18:08 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180003#M162740</guid>
      <dc:creator>Elmar P. Kolkman</dc:creator>
      <dc:date>2004-02-04T05:18:08Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180004#M162741</link>
      <description>Hi,&lt;BR /&gt;The problem, as it is described, can be much simplified with some pre-processing of the data. By reordering and merging the matching fields into one field in each file, you can do a simple join and then split the fields in the output. Try the following:&lt;BR /&gt;&lt;BR /&gt;awk '{ printf "%s#%s %s\n", $1, $3, $2 }' xxx | sort &amp;gt;xxx1&lt;BR /&gt;awk '{ printf "%s#%s %s\n", $1, $2, $3 }' yyy | sort &amp;gt;yyy1&lt;BR /&gt;join  -1 1 -2 1 -o 1.1,1.2,2.2 -a 1 xxx1 yyy1 | tr "#" " "&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;It is not a final solution but may give you some ideas.</description>
      <pubDate>Wed, 04 Feb 2004 05:42:33 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180004#M162741</guid>
      <dc:creator>Leif Halvarsson_2</dc:creator>
      <dc:date>2004-02-04T05:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180005#M162742</link>
      <description>&lt;BR /&gt;Bah humbug.&lt;BR /&gt;&lt;BR /&gt;This is a completely different requirement description from the initial one:&lt;BR /&gt;&lt;BR /&gt;&amp;gt; ooops, output should be:&lt;BR /&gt;&amp;gt;&lt;BR /&gt;&amp;gt; xxx 10 thanks hello&lt;BR /&gt;&amp;gt; xxx 20 please hello&lt;BR /&gt;&amp;gt; zzz 10 thanks _____&lt;BR /&gt;&amp;gt; yyy 20 ______ hello &lt;BR /&gt;&lt;BR /&gt;That 'zzz' line could only have originated from file 2.&lt;BR /&gt;&lt;BR /&gt;Now you tell us that file 1 is the driver, and the 'unmatched' can only appear in the last output column.&lt;BR /&gt;&lt;BR /&gt;Much simpler! Boring even, and essentially answered in all prior replies.&lt;BR /&gt;&lt;BR /&gt;Kindly ask the right question and study the replies!&lt;BR /&gt;&lt;BR /&gt;Cheers,&lt;BR /&gt;Hein.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 04 Feb 2004 10:07:40 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180005#M162742</guid>
      <dc:creator>Hein van den Heuvel</dc:creator>
      <dc:date>2004-02-04T10:07:40Z</dc:date>
    </item>
    <item>
      <title>Re: awk parsing 2 files help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180006#M162743</link>
      <description>Hein has a point, though it's not that difficult to implement the original specs in the AWK solution. It's a matter of keeping track of whether the last line that came from File2 has been used for output.</description>
      <pubDate>Thu, 05 Feb 2004 01:13:39 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/awk-parsing-2-files-help/m-p/3180006#M162743</guid>
      <dc:creator>Elmar P. Kolkman</dc:creator>
      <dc:date>2004-02-05T01:13:39Z</dc:date>
    </item>
  </channel>
</rss>

