07-27-2009 04:05 AM
Check input file rows present or not present in output file
I have an input file with 5 lines as below:
B 412648-B21 20090701 TRUDIN QBL1
B 412648-B21 20090701 WAECDF QBL1
B 412648-B21 20090701 ZARDDF QBL1
B 412648-B21 20090701 ZARDDP QBL1
B 412648-B21 20090701 ZAUDDF QBL1
I have an o/p file with 5 lines as below:
B 412648-B21 20090701 TRUDIN +0000000000000255.00 F 20090701 01 A QBL1
B 412648-B21 20090701 WAECDF +0000000000000195.00 F 20090701 01 A QBL1
B 412648-B21 20090701 ZARDDF +0000000000000000.00 N A QBL1
B 412648-B21 20090701 ZARDDP +0000000000001710.00 F 20090701 01 A QBL1
B 412648-B21 20090701 ZAUDDF +0000000000000245.00 F 20090701 01 A QBL1
I need to find out whether the lines (data) in the input file are present in the output file. Could you please let me know how this can be done efficiently, so that the check does not take long to run?
Note: only the first 32 characters of each line in the input file need to be checked against the output file.
Regards,
Sathish
07-27-2009 04:17 AM
Re: Check input file rows present or not present in output file
> only first 1-32 characters of the line (input file) needs to be checked with output file
And in your sample input that would span the beginning of the line through the "L" character in the last field. Is that correct?
What if your output file had a record like:
B 412648-B21 20090701 ZAUDDF +0000000000000245.00 F 20090701 01 A XXX2
Would that be considered a match or not? (I would think not).
Regards!
...JRF...
07-27-2009 04:22 AM
Re: Check input file rows present or not present in output file
By the way, Sathish:
You have unevaluated answers to many of your questions, including, but not limited to, your most recent two:
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1357438
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1304828
It would be appropriate to follow these guidelines:
http://forums11.itrc.hp.com/service/forums/helptips.do?#28
Regards!
...JRF...
07-27-2009 04:22 AM
Re: Check input file rows present or not present in output file
It should be checked up to the end of the 4th field in each line, i.e.:
B 412648-B21 20090701 TRUDIN
B 412648-B21 20090701 WAECDF
B 412648-B21 20090701 ZARDDF
B 412648-B21 20090701 ZARDDP
B 412648-B21 20090701 ZAUDDF
-sathish
07-27-2009 04:30 AM
Re: Check input file rows present or not present in output file
You could do something like this.
Snip the last field from your input file:
# awk '{$NF="";print}' inputfile > tokenfile
Then use the file of tokens to match your output:
# grep -Ff tokenfile outputfile
Regards!
...JRF...
07-27-2009 05:03 AM
Re: Check input file rows present or not present in output file
I need to print the missing lines that are available in input file but not in output file.
Could you please let me know how can this be done (with the faster execution time) ?
-sathish
07-27-2009 05:38 AM
Re: Check input file rows present or not present in output file
> I need to print the missing lines that are available in input file but not in output file.
# grep -v -Ff tokenfile outputfile
> Could you please let me know how can this be done (with the faster execution time) ?
'grep' is going to be quite fast unless you have very large input files (with tokens to be matched).
Regards!
...JRF...
07-27-2009 06:01 AM
Re: Check input file rows present or not present in output file
I would need to use an input file with 50,000 lines. I just wanted to check whether there are any other options (apart from grep) that would make it much faster.
07-27-2009 06:13 AM
Re: Check input file rows present or not present in output file
> I would need to use an input file with 50,000 lines. I just wanted to check whether there are any other options (apart from grep) that would make it much faster.
The question really becomes how often are you doing these comparisons? Are you really matching 50,000 tokens to N-many lines?
I might guess that given definitive information about the _scale_ of the input and output we might craft a faster solution than the 'grep' offering I have made.
Regards!
...JRF...
07-27-2009 08:43 AM
Re: Check input file rows present or not present in output file
For a large number of lines, sorting the two files first may make it faster. But since the lines in your two files don't match exactly, you can't use comm(1) directly. You would have to use awk, perl or a compiled program.
If the input file fits in memory, you could read it into a map/hash, then compare the output file against it.
Note: With the grep -v or the map solution, you can only determine which lines in the output file aren't in the input file, not (at least not easily) which lines in the input file are missing from the output.
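To get at the other direction (input lines that have no counterpart in the output), one option is to load the keys of the output file into the hash instead and then scan the input file. Below is a minimal sketch of that idea, not a tested production script. It assumes the key is the first four whitespace-separated fields, as Sathish described earlier in the thread, and that the output keys fit in memory. The file names and the ZZZZZZ row are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: the temp directory, file names and the ZZZZZZ row are
# illustrative assumptions, not data from the thread.
work=$(mktemp -d)

cat > "$work/inputfile" <<'EOF'
B 412648-B21 20090701 TRUDIN QBL1
B 412648-B21 20090701 WAECDF QBL1
B 412648-B21 20090701 ZZZZZZ QBL1
EOF

cat > "$work/outputfile" <<'EOF'
B 412648-B21 20090701 TRUDIN +0000000000000255.00 F 20090701 01 A QBL1
B 412648-B21 20090701 WAECDF +0000000000000195.00 F 20090701 01 A QBL1
EOF

# First file (outputfile): remember the four-field key of every line.
# Second file (inputfile): print each line whose key was never seen.
awk '
    { key = $1 " " $2 " " $3 " " $4 }
    FILENAME == ARGV[1] { seen[key] = 1; next }
    !(key in seen)
' "$work/outputfile" "$work/inputfile" > "$work/missing.txt"

cat "$work/missing.txt"
```

Each file is read once and every lookup is a hash probe, so the running time should grow roughly linearly with the number of lines, and neither file needs to be sorted.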
07-29-2009 03:50 AM
Re: Check input file rows present or not present in output file
Please find below the exact requirement that we have:
My i/p file looks like:
B L1983A B1N 20090701 HUECDP QBLH
B L1983A B1N 20090701 HUHFDP QBL1
My o/p file looks like:
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBL1
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1
1) I need to compare characters 1-38 of each line in the i/p file with the o/p file. Where a line matches, I need to replace the last field in the o/p line with the corresponding value from the i/p file.
i.e. the above output should change to:
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBLH
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1
Note that the last field in the first line changed from QBL1 to QBLH (the same as the value in the i/p file).
2) If some lines present in the i/p file are missing from the o/p file, those lines need to be captured in a new file.
Note: We might need to test with 5,000, 10,000, 20,000 and even 50,000 lines, so the execution time of the script also needs to be checked.
07-29-2009 04:13 AM
Re: Check input file rows present or not present in output file
Are your files sorted? If not, do you care if the output is sorted?
A close upper bound on the running time would be the time needed to sort both files.
07-29-2009 04:13 AM
Re: Check input file rows present or not present in output file
07-29-2009 01:26 PM
Re: Check input file rows present or not present in output file
Then this is a simple no-brainer and the running time is linear. Just do a "simple merge" of the two files and compare the records as you go.
It is probably easy to do in C or perl, and only a little harder in awk, since there are two input and two output files.
07-29-2009 06:17 PM
Re: Check input file rows present or not present in output file
$ awk '{new = $NF; getline < "b.txt"; regexp = $NF "$"; sub(regexp,new); print}' a.txt
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBLH
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1
That remembers the last field of the line from the first file, reads the corresponding line from the other file, replaces its last field with the one from the first, and prints the result.
If the lines are sorted but potentially NOT equal, then you will need to add some code to read ahead in whichever file has fallen behind until it has caught up.
if file a is
10
12
and file b is
10
11
12
then the program has to skip that line 11 from b.
if file a is
10
11
13
and file b is
10
12
13
then the program needs to skip a line from each file before processing.
Below is an example of how to solve such a problem in awk.
Note, I used 28 instead of 38 in the example, because that's how the data showed up in the forum, and while you indicated 4 fields, you actually showed 5, so that's not to be trusted either.
Also please note how you wasted James's time by being imprecise initially.
You did NOT just need to find matching lines, for which GREP is perfect; you also needed data from EACH provided file, for which GREP is useless.
hope this helps,
Hein.
-------------- update.awk ----------------
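# Merge-style pass over two files sorted on the first 28 characters.
# a.txt arrives on stdin; b.txt is read with getline.
BEGIN { a_skip = b_skip = c_lines = 0 }
{
    a_match = substr($0, 1, 28)
    a_last  = $NF
    # Advance whichever file has fallen behind until the keys agree.
    while (a_match != b_match) {
        if (a_match > b_match) {
            b_skip++
            if ((getline < "b.txt") != 1) { exit }
            b_match = substr($0, 1, 28)
            b_last  = $NF
            c = $0
        }
        if (a_match < b_match) {
            a_skip++
            if (getline != 1) { exit }
            a_match = substr($0, 1, 28)
            a_last  = $NF
        }
    }
    # Keys agree: replace the last field of the b.txt line with the one from a.txt.
    regexp = b_last "$"
    sub(regexp, a_last, c)
    print c
    c_lines++
    b_skip--    # the b.txt line just used was a match, not a skip
}
END {
    print c_lines " printed to C. " a_skip " skipped from a, " b_skip " from b." > "/dev/stderr"
}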
-------------- sample execution ----------
/cygdrive/c/temp
$ awk -f update.awk < a.txt > c.txt
2 printed to C. 1 skipped from a, 0 from b.
/cygdrive/c/temp
$ cat c.txt
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBLH
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1
07-29-2009 10:02 PM
Re: Check input file rows present or not present in output file
Sathish said lines could be missing.
>for which GREP is perfect
grep might be terrible for 50 K records.
Here is my awk merge example with checking:
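awk -v file=i_file -v err_file=err.out '
# o_file is the main input; i_file is read with getline into "save".
# Both files are assumed to be sorted on the first 28 characters.
BEGIN { save = ""; EOF = 0 }
{
    if (save == "") {
        # getline is parenthesized so the comparison cannot bind to the file name.
        if (EOF || (getline save < file) <= 0) {
            print "Missing in I file:", $0 > err_file
            EOF = 1
            save = ""
            next
        }
    }
    # i_file lines that sort before the current o_file line have no partner.
    while (substr(save, 1, 28) < substr($0, 1, 28)) {
        print "Missing in O file:", save > err_file
        if ((getline save < file) <= 0) {
            print "Missing in I file:", $0 > err_file
            EOF = 1
            save = ""
            next
        }
    }
    if (substr(save, 1, 28) == substr($0, 1, 28)) {
        $NF = substr(save, 30)    # take the last field from the i_file line
        print $0
        save = ""
        next
    }
    print "Missing in I file:", $0 > err_file
}
END {
    # Whatever is left in i_file has no matching o_file line.
    if (save != "")
        print "Missing in O file:", save > err_file
    while ((getline save < file) > 0) {
        print "Missing in O file:", save > err_file
    }
} ' o_file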
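For comparison, the same two requirements can also be sketched with a hash lookup rather than a merge, which avoids the need for sorted files as long as i_file fits in memory. This is only an illustration under assumptions: the key is taken to be the first five fields (what the sample data shows, rather than a fixed character count), unmatched o_file lines are passed through unchanged, and the HUXXXX row and the missing.txt and updated.txt names are invented for the demo.

```shell
#!/bin/sh
# Sketch only: temp directory, file names and the HUXXXX row are
# illustrative assumptions, not data from the thread.
work=$(mktemp -d)

cat > "$work/i_file" <<'EOF'
B L1983A B1N 20090701 HUECDP QBLH
B L1983A B1N 20090701 HUHFDP QBL1
B L1983A B1N 20090701 HUXXXX QBL1
EOF

cat > "$work/o_file" <<'EOF'
B L1983A B1N 20090701 HUECDP +0000000000000000.00 F 20090701 01 A QBL1
B L1983A B1N 20090701 HUHFDP +0000000000000000.00 F 20090701 01 A QBL1
EOF

: > "$work/missing.txt"    # make sure the file exists even if nothing is missing

awk -v missing="$work/missing.txt" '
    { key = $1 " " $2 " " $3 " " $4 " " $5 }
    # i_file: remember the last field for each key.
    FILENAME == ARGV[1] { last[key] = $NF; next }
    # o_file: on a key match, overwrite the last field.
    # Assigning to $NF rebuilds the line with single spaces between fields.
    key in last { $NF = last[key]; found[key] = 1 }
    { print }
    END {
        # i_file keys never seen in o_file; iteration order is unspecified.
        for (key in last)
            if (!(key in found))
                print key, last[key] > missing
    }
' "$work/i_file" "$work/o_file" > "$work/updated.txt"

cat "$work/updated.txt"
cat "$work/missing.txt"
```

Unlike the merge, this keeps one array entry per i_file line in memory, and the missing lines come out in no particular order, so it may be worth piping missing.txt through sort if order matters.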