<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: selecting lines from huge files in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276019#M688463</link>
    <description>Hi (again) Henk:&lt;BR /&gt;&lt;BR /&gt;OK, here's another approach, adapted to your use of a second file to define the patterns to match:&lt;BR /&gt;&lt;BR /&gt;# cat ./match.pl&lt;BR /&gt;#!/usr/bin/perl&lt;BR /&gt;use strict;&lt;BR /&gt;use warnings;&lt;BR /&gt;my @tokens;&lt;BR /&gt;my @strings;&lt;BR /&gt;die "Usage: $0 tokenfile file ...\n" unless @ARGV &amp;gt; 0;&lt;BR /&gt;my $tokenf = shift;&lt;BR /&gt;open( FH, "&amp;lt;", $tokenf ) or die "Can't open '$tokenf': $!\n";&lt;BR /&gt;chomp( @tokens = &amp;lt;FH&amp;gt; );&lt;BR /&gt;close FH;&lt;BR /&gt;push @strings, $_ for @tokens;&lt;BR /&gt;while (&amp;lt;&amp;gt;) {&lt;BR /&gt;    for my $match (@strings) {&lt;BR /&gt;        if (m/^.{2}$match/) { #...adjust as needed&lt;BR /&gt;            print "$_";&lt;BR /&gt;            last;&lt;BR /&gt;        }&lt;BR /&gt;    }&lt;BR /&gt;}&lt;BR /&gt;1;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;...run as:&lt;BR /&gt;&lt;BR /&gt;# ./match.pl file_of_tokens file&lt;BR /&gt;&lt;BR /&gt;That is, the "file_of_tokens" is your attachment of strings to be matched in "file".&lt;BR /&gt;&lt;BR /&gt;Once again, you say position-3 and I counted that as position-2 (zero relative) so you may need to adjust the code above as annotated.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
    <pubDate>Thu, 25 Sep 2008 14:24:59 GMT</pubDate>
    <dc:creator>James R. Ferguson</dc:creator>
    <dc:date>2008-09-25T14:24:59Z</dc:date>
    <item>
      <title>selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276015#M688459</link>
      <description>Hi all &lt;BR /&gt;&lt;BR /&gt;I have got files (&amp;gt;1.000.000 lines) with lines like:&lt;BR /&gt;10000000000000666447024  1887282889               2000828080826 W+000000000,00UR&lt;BR /&gt;&lt;BR /&gt;Now I have got to select all lines containing &lt;BR /&gt;certain numbers in characters 3 to 17. &lt;BR /&gt;&lt;BR /&gt;My file containing these numbers is 1.400.000 lines, &lt;BR /&gt;looking like &lt;BR /&gt;..&lt;BR /&gt;000000001853208&lt;BR /&gt;000000001853210&lt;BR /&gt;000000001853211&lt;BR /&gt;000000001853214&lt;BR /&gt;..&lt;BR /&gt;&lt;BR /&gt;I am looking for an efficient and quick way.&lt;BR /&gt;(I tried using for-loops/while-loops, but that was not effective.) &lt;BR /&gt;&lt;BR /&gt;Any solutions? perl? awk?</description>
      <pubDate>Thu, 25 Sep 2008 11:45:25 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276015#M688459</guid>
      <dc:creator>Henk Geurts</dc:creator>
      <dc:date>2008-09-25T11:45:25Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276016#M688460</link>
      <description>Hi&lt;BR /&gt;&lt;BR /&gt;Where should this number be?&lt;BR /&gt;Should the last two characters of each line be between 3 and 17 inclusive?&lt;BR /&gt;&lt;BR /&gt;Regards</description>
      <pubDate>Thu, 25 Sep 2008 11:59:05 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276016#M688460</guid>
      <dc:creator>Oviwan</dc:creator>
      <dc:date>2008-09-25T11:59:05Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276017#M688461</link>
      <description>Hi Henk:&lt;BR /&gt;&lt;BR /&gt;This is similar to your previous query:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1270250" target="_blank"&gt;http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1270250&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;That said, one way (using Perl) would be (by example):&lt;BR /&gt;&lt;BR /&gt;# perl -ne '$region=substr($_,2,15);print if ($region==1853208 or $region==1853210)' file&lt;BR /&gt;&lt;BR /&gt;When using Perl (in lieu of 'awk') things are zero-relative.  Hence, character #2 in Perl would be character-3 in 'awk', and characters 3 to 17 are the 15 characters starting at offset 2.&lt;BR /&gt;&lt;BR /&gt;If you post more specific match requirements we might compose a better approach.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
      <pubDate>Thu, 25 Sep 2008 12:02:25 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276017#M688461</guid>
      <dc:creator>James R. Ferguson</dc:creator>
      <dc:date>2008-09-25T12:02:25Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276018#M688462</link>
      <description>Thanks. &lt;BR /&gt;Once again I should make myself more clear.&lt;BR /&gt;&lt;BR /&gt;I attached a short version of the "number" file (K_NO). &lt;BR /&gt;I would like each line of this file to be checked against each line of the other file. When it matches characters 3-17 of a line in this other file -&amp;gt; print the complete line of this other file. &lt;BR /&gt;</description>
      <pubDate>Thu, 25 Sep 2008 12:33:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276018#M688462</guid>
      <dc:creator>Henk Geurts</dc:creator>
      <dc:date>2008-09-25T12:33:36Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276019#M688463</link>
      <description>Hi (again) Henk:&lt;BR /&gt;&lt;BR /&gt;OK, here's another approach, adapted to your use of a second file to define the patterns to match:&lt;BR /&gt;&lt;BR /&gt;# cat ./match.pl&lt;BR /&gt;#!/usr/bin/perl&lt;BR /&gt;use strict;&lt;BR /&gt;use warnings;&lt;BR /&gt;my @tokens;&lt;BR /&gt;my @strings;&lt;BR /&gt;die "Usage: $0 tokenfile file ...\n" unless @ARGV &amp;gt; 0;&lt;BR /&gt;my $tokenf = shift;&lt;BR /&gt;open( FH, "&amp;lt;", $tokenf ) or die "Can't open '$tokenf': $!\n";&lt;BR /&gt;chomp( @tokens = &amp;lt;FH&amp;gt; );&lt;BR /&gt;close FH;&lt;BR /&gt;push @strings, $_ for @tokens;&lt;BR /&gt;while (&amp;lt;&amp;gt;) {&lt;BR /&gt;    for my $match (@strings) {&lt;BR /&gt;        if (m/^.{2}$match/) { #...adjust as needed&lt;BR /&gt;            print "$_";&lt;BR /&gt;            last;&lt;BR /&gt;        }&lt;BR /&gt;    }&lt;BR /&gt;}&lt;BR /&gt;1;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;...run as:&lt;BR /&gt;&lt;BR /&gt;# ./match.pl file_of_tokens file&lt;BR /&gt;&lt;BR /&gt;That is, the "file_of_tokens" is your attachment of strings to be matched in "file".&lt;BR /&gt;&lt;BR /&gt;Once again, you say position-3 and I counted that as position-2 (zero relative) so you may need to adjust the code above as annotated.&lt;BR /&gt;&lt;BR /&gt;Regards!&lt;BR /&gt;&lt;BR /&gt;...JRF...</description>
      <pubDate>Thu, 25 Sep 2008 14:24:59 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276019#M688463</guid>
      <dc:creator>James R. Ferguson</dc:creator>
      <dc:date>2008-09-25T14:24:59Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276020#M688464</link>
      <description>&lt;P&gt;With such large files, you don't want to use "grep -f" or for/while loops.&lt;BR /&gt;&lt;BR /&gt;Instead, you could consider sorting both files and then doing a "merge" to do the selection. This would mean you would have to change your selection file to get the keys in the same columns.&lt;BR /&gt;&lt;BR /&gt;Some other threads about large numbers of records:&lt;BR /&gt;&lt;A target="_blank" href="http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1110743"&gt;http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1110743&lt;/A&gt;&lt;BR /&gt;&lt;A target="_blank" href="http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1136435"&gt;http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1136435&lt;/A&gt;&lt;BR /&gt;&lt;A target="_blank" href="http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1165850"&gt;http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1165850&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Or write a customized program to do what JRF's perl script does.&lt;/P&gt;</description>
      <pubDate>Sat, 10 Sep 2011 19:03:58 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276020#M688464</guid>
      <dc:creator>Dennis Handly</dc:creator>
      <dc:date>2011-09-10T19:03:58Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276021#M688465</link>
      <description>&lt;BR /&gt;Give grep a try&lt;BR /&gt;&lt;BR /&gt;# skey = desired match columns 3 to 17&lt;BR /&gt;&lt;BR /&gt;skey="000000001853208"&lt;BR /&gt;&lt;BR /&gt;# cat inputfile to grep&lt;BR /&gt;# ^ beginning of line&lt;BR /&gt;# .. any first two characters&lt;BR /&gt;# ${skey} what we are really looking for&lt;BR /&gt;&lt;BR /&gt;cat inputfile |&lt;BR /&gt;  grep "^..${skey}" &amp;gt;outputfile&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 26 Sep 2008 18:25:39 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276021#M688465</guid>
      <dc:creator>Ken Martin_3</dc:creator>
      <dc:date>2008-09-26T18:25:39Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276022#M688466</link>
      <description>&amp;gt;Ken: Give grep a try&lt;BR /&gt;&lt;BR /&gt;If you read Henk's comments about 1 million lines and 1.4 million selections, my reply and the URLs I provided, you wouldn't dare use grep -f.  That's on the order of 1E12 compares.</description>
      <pubDate>Fri, 26 Sep 2008 21:50:21 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276022#M688466</guid>
      <dc:creator>Dennis Handly</dc:creator>
      <dc:date>2008-09-26T21:50:21Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276023#M688467</link>
      <description>Dennis,&lt;BR /&gt;&lt;BR /&gt;Yes, I see your point.&lt;BR /&gt;&lt;BR /&gt;Now thinking back I too had problems reading very large files but can't remember how I did it.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;</description>
      <pubDate>Sun, 28 Sep 2008 12:48:40 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276023#M688467</guid>
      <dc:creator>Ken Martin_3</dc:creator>
      <dc:date>2008-09-28T12:48:40Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276024#M688468</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Maybe "comm" could do this for you. But I'm guessing since you want to search and match specific types of lines you'll probably need to do some sort of regular expression.&lt;BR /&gt;&lt;BR /&gt;sed and awk can do this as well as grep/egrep but they're all quite "slow" in doing it when the files are so large.&lt;BR /&gt;&lt;BR /&gt;If the differences between the files will minimize the output given, I would do something like this:&lt;BR /&gt;# comm -2 File1 File2 | egrep "[0]+[0-9]+([3-9]|1[0-7])$"&lt;BR /&gt;&lt;BR /&gt;The regexp is searching for anything that starts with 1 or more zeros, then 1 or more digits between 0-9. The last part is the magic where it searches for the value between 3-17 (by saying that either 3-9 or 10-17 is okay). I haven't tested this so I'm not sure it works :P please correct me if I missed something.&lt;BR /&gt;&lt;BR /&gt;Best regards&lt;BR /&gt;Fredrik Eriksson</description>
      <pubDate>Mon, 29 Sep 2008 08:05:33 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276024#M688468</guid>
      <dc:creator>Fredrik.eriksson</dc:creator>
      <dc:date>2008-09-29T08:05:33Z</dc:date>
    </item>
    <item>
      <title>Re: selecting lines from huge files</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276025#M688469</link>
      <description>&amp;gt;Fredrik: Maybe "comm" could do this for you.&lt;BR /&gt;&lt;BR /&gt;Yes, if the files are sorted and have the same contents; neither is the case here.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;match specific types of lines you'll probably need to do some sort of regular expression.&lt;BR /&gt;&lt;BR /&gt;These are unique keys.  Unless you mean to use the RE to just shift the key position.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;awk can do this but ... quite "slow" in doing it when the files are so large.&lt;BR /&gt;&lt;BR /&gt;You are confused.  If you sort the two input files and reformat the records, it would be a simple linear pass.&lt;BR /&gt;I'm not sure how good awk's associative arrays are but that may also work.&lt;BR /&gt;&lt;BR /&gt;&amp;gt;value between 3-17&lt;BR /&gt;&lt;BR /&gt;That was columns 3 through 17.</description>
      <pubDate>Mon, 29 Sep 2008 09:16:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/selecting-lines-from-huge-files/m-p/4276025#M688469</guid>
      <dc:creator>Dennis Handly</dc:creator>
      <dc:date>2008-09-29T09:16:36Z</dc:date>
    </item>
  </channel>
</rss>

