Community Home > Servers and Operating Systems > Operating Systems > Operating System - HP-UX > Re: selecting lines from huge files
09-25-2008 04:45 AM
selecting lines from huge files
I have files (>1.000.000 lines) with lines like:
10000000000000666447024 1887282889 2000828080826 W+000000000,00UR
Now I have to select all lines containing certain numbers in characters 3 to 17.
My file containing these numbers is 1.400.000 lines, looking like:
..
000000001853208
000000001853210
000000001853211
000000001853214
..
I am looking for an efficient and quick way (I tried for-loops/while-loops, but that was not effective).
Any solutions? perl? awk?
- Tags:
- huge files
09-25-2008 04:59 AM
Re: selecting lines from huge files
Where should this number be?
The last two characters of each line should be between 3 and 17 inclusive?
Regards
09-25-2008 05:02 AM
Re: selecting lines from huge files
This is similar to your previous query:
http://forums12.itrc.hp.com/service/forums/questionanswer.do?threadId=1270250
That said, one way (using Perl), by example, would be:
# perl -ne '$region=substr($_,2,7);print if ($region==1853208 or $region==1853210)' file
When using Perl (in lieu of 'awk'), things are zero-relative. Hence, character #2 in Perl would be character #3 in 'awk'.
If you post more specific match requirements we might compose a better approach.
Regards!
...JRF...
- Tags:
- Perl
09-25-2008 05:33 AM
Re: selecting lines from huge files
Once again I should make myself more clear.
I attached a short version of the "number" file (K_NO).
I would like each line of this file to be checked against each line of the other file. When it matches characters 3-17 of that other file, print the complete line of that other file.
09-25-2008 07:24 AM
Re: selecting lines from huge files
OK, here's another approach that adapts to your use of a second file to define the patterns to match:
# cat ./match.pl
#!/usr/bin/perl
use strict;
use warnings;
my @tokens;
my @strings;
die "Usage: $0 tokenfile file ...\n" unless @ARGV > 0;
my $tokenf = shift;
open( FH, "<", $tokenf ) or die "Can't open '$tokenf': $!\n";
chomp( @tokens = <FH> );
close FH;
push @strings, $_ for @tokens;
while (<>) {
    for my $match (@strings) {
        if (m/^.{2}$match/) {    #...adjust as needed
            print "$_";
            last;
        }
    }
}
1;
...run as:
# ./match.pl file_of_tokens file
That is, the "file_of_tokens" is your attachment of strings to be matched in "file".
Once again, you say position-3 and I counted that as position-2 (zero-relative), so you may need to adjust the code above as annotated.
Regards!
..JRF...
09-25-2008 11:15 AM - edited 09-10-2011 12:03 PM
Re: selecting lines from huge files
With such large files, you don't want to use "grep -f" nor for/while.
With such large files, you could consider sorting both files then doing a "merge" to do the selection. This would mean you would have to change your selection file to get the keys in the same columns.
Some other threads about large number of records:
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1110743
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1136435
http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1165850
Or write a customized program to do what JRF's perl script does.
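A minimal sketch of that sort-and-merge idea, using tiny stand-in files in place of the real K_NO and data file (the file names, key width, and two-character prefix are assumptions taken from this thread):

```shell
# Stand-ins for the real 1.4M-key file (K_NO) and the >1M-line data file.
printf '000000001853208\n000000001853210\n' > K_NO
printf '10000000001853208024 1887282889 2000828080826 W+000000000,00UR\n' >  datafile
printf '10000000009999999024 1887282889 2000828080826 W+000000000,00UR\n' >> datafile

# Prefix every data line with its columns 3-17 as a tab-separated join key,
# sort both files on that key, then let join(1) do a single linear merge.
TAB="$(printf '\t')"
awk -v OFS='\t' '{ print substr($0, 3, 15), $0 }' datafile | sort -t "$TAB" -k1,1 > data.sorted
sort K_NO > keys.sorted
join -t "$TAB" -o 2.2 keys.sorted data.sorted > matched.txt   # matching lines only
```

After the merge, matched.txt holds only the data lines whose columns 3-17 appear in K_NO; the cost is two sorts plus one linear pass, instead of comparing every line against every key.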
09-26-2008 11:25 AM
Re: selecting lines from huge files
Give grep a try:
# skey = desired match, columns 3 to 17
skey="000000001853208"
# ^        beginning of line
# ..       any first two characters
# ${skey}  what we are really looking for
cat inputfile | grep "^..${skey}" > outputfile
09-26-2008 02:50 PM
Re: selecting lines from huge files
If you read Henk's comments about 1 million lines and 1.4 million selections, and my reply and the URLs I provided, you wouldn't dare use grep -f. That's on the order of 1E12 compares.
09-28-2008 05:48 AM
Re: selecting lines from huge files
Yes, I see your point.
Thinking back, I too had problems reading very large files, but I can't remember how I solved it.
Thanks
09-29-2008 01:05 AM
Re: selecting lines from huge files
Maybe "comm" could do this for you. But I'm guessing since you want to search and match specific types of lines you'll probably need to do some sort of regular expression.
sed and awk can do this as well as grep/egrep but they're all quite "slow" in doing it when the files are so large.
If the differences between the files minimize the output given, I would do something like this:
# comm -2 File1 File2 | egrep "[0]+[0-9]+[[3-9]|1[0-7]]$"
The regexp is searching for anything that starts with one or more zeros, then one or more numeric values between 0-9. The last part is the magic where it searches for the value between 3-17 (by saying that either 3-9 or 10-17 is okay). I haven't tested this so I'm not sure it works :P please correct me if I missed something.
Best regards
Fredrik Eriksson
09-29-2008 02:16 AM
Re: selecting lines from huge files
Yes, if the files were sorted and had the same contents; neither is the case here.
>match specific types of lines you'll probably need to do some sort of regular expression.
These are unique keys. Unless you mean to use the RE to just shift the key position.
>awk can do this but ... quite "slow" in doing it when the files are so large.
You are confused. If you sort the two input files, and reformat the records, it would be a simple linear pass.
I'm not sure how good awk's associative arrays are but that may also work.
>value between 3-17
That was columns 3 through 17.
- Tags:
- awk
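For what it's worth, the associative-array route mentioned above can be sketched like this (same hypothetical stand-in files as earlier in the thread; K_NO is assumed to hold one 15-digit key per line):

```shell
# Stand-ins for the real key file and data file.
printf '000000001853208\n000000001853210\n' > K_NO
printf '10000000001853208024 1887282889 2000828080826 W+000000000,00UR\n' >  datafile
printf '10000000009999999024 1887282889 2000828080826 W+000000000,00UR\n' >> datafile

# First pass (NR == FNR) loads every key into a hash; second pass prints a
# data line when its columns 3-17 are in the hash. One linear pass over each
# file, no sorting needed, at the cost of holding 1.4M keys in memory.
awk 'NR == FNR { keys[$0] = 1; next }
     substr($0, 3, 15) in keys' K_NO datafile > matched.txt
```

This keeps the files in their original order and avoids both the sort step and the 1E12-compare nested loop.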