Operating System - HP-UX

Re: extracting a portion from a file

Anand_30
Regular Advisor

Graham,

I have one more question:

Now the script extracts the portion of the file that starts with the pattern 'abc' and ends with the pattern 'def'.

But in certain cases I have

abc
...
...
def
def

In this case the script first extracts 'abc' to the first 'def' and then again 'abc' to the second 'def'.
But my requirement is that when there are 2 consecutive lines having the pattern 'def' the script should extract from 'abc' till the second 'def' is reached.

Can the awk script be modified to fulfill this requirement?

Thanks,
Andy


Anand_30
Regular Advisor

Rod,

How do I use your perl script from inside a shell script, with the input filename as a parameter? Also, I want the output files to have the same name as the input file, with the counter as the extension.

Thanks,
Anand.
Rodney Hills
Honored Contributor

Try:

$ thefile="/whatever/myfile"
$ perl -ne 'if (/abc/) { @a=(); $p=1; } ; { push(@a,$_); }; if (/def/) { if ($p) { $n++; open OUT,">${ARGV}.$n"; print OUT @a; close OUT; @a=(); $p=0 }}' "$thefile"

Set "thefile" to the name of the input file.
The perl variable "$ARGV" holds the name of the file currently being read from the command line, and the script uses it to build the output file names (myfile.1, myfile.2, ...).

HTH

-- Rod Hills
There be dragons...
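Since Andy prefers to stay with shell and awk, here is a minimal sketch of the same idea as a shell function that takes the input file(s) as parameters and names each output after its input file with a counter extension, much as Rod's `${ARGV}.$n` does. The name `extract_blocks` is hypothetical; like the one-liner, it assumes a block starts at a line matching `abc` (starting over if another `abc` turns up) and ends at the first following line matching `def`, so repeated `def` lines are not handled here.

```shell
#!/bin/sh
# extract_blocks FILE... : hypothetical helper, a sketch only.
# Writes each abc...def block of FILE to FILE.1, FILE.2, ...
extract_blocks() {
    awk '
        FNR == 1  { n = 0; inblock = 0; buf = "" }   # new input file: reset state
        /abc/     { inblock = 1; buf = "" }          # (re)start collecting a block
        inblock   { buf = buf $0 "\n" }
        inblock && /def/ {                            # end of block: write FILE.n
            n++
            out = FILENAME "." n
            printf "%s", buf > out
            close(out)
            inblock = 0
        }
    ' "$@"
}
```

Call it from your own script as `extract_blocks /whatever/myfile`; the blocks land in `/whatever/myfile.1`, `/whatever/myfile.2`, and so on.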
Graham Cameron_1
Honored Contributor

>>But my requirement is that when there are 2 consecutive lines having the pattern 'def' the script should extract from 'abc' till the second 'def' is reached.

Sounds simple: at /def/ just keep reading until there are no more /def/s.
But what if the next record is an /abc/?
I know of no awk statement that lets you restart processing of the current record, so I have had to rework the script to call a function, as below.

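function do_abc () {
    split("", list)      # empty out the array
    el = 0
    list[++el] = $0
}
/^abc/ {
    do_abc()
    next
}
(el > 0) { list[++el] = $0 }
/^def/ && (el > 0) {
    for (;;) {
        if ((getline) <= 0) { $0 = ""; break }   # end of input: stop reading ahead
        if ($0 !~ /^def/) break
        list[++el] = $0
    }
    nf = sprintf("%s.%d", fname, ++matchcount)   # fname assumed set by the caller (e.g. -v)
    for (i = 1; i <= el; i++)
        print list[i] >> nf
    el = 0                                       # block written: stop collecting
    if ($0 ~ /^abc/) do_abc()
}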

Hope that helps, and hope you learned something about awk.

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
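Graham's script reads ahead with `getline`, which is why the line that ends a run of `def`s has to be pushed back through `do_abc()` by hand. A different technique, sketched here under the same `^abc`/`^def` assumptions, avoids reading ahead altogether: remember that a `def` run is in progress and write the block only when the first non-`def` line arrives (or input ends). That line then falls through to the normal rules, so nothing has to be restarted. The function name `split_def_runs` is hypothetical, and it builds output names from `FILENAME` instead of a separate `fname` variable.

```shell
#!/bin/sh
# split_def_runs FILE... : hypothetical helper, a sketch only.
# A block runs from the most recent ^abc line through a run of one or
# more consecutive ^def lines; blocks go to FILE.1, FILE.2, ...
split_def_runs() {
    awk '
        function write_block() {           # write the held block to NAME.n
            n++
            out = name "." n
            printf "%s", buf > out
            close(out)
            buf = ""; inblock = 0; indef = 0
        }
        FNR == 1 { if (indef) write_block(); n = 0; inblock = 0; buf = "" }
        indef && !/^def/ { write_block() }  # first line after the def run
        /^abc/   { inblock = 1; buf = ""; name = FILENAME }
        inblock  { buf = buf $0 "\n" }
        inblock && /^def/ { indef = 1 }
        END      { if (indef) write_block() }   # input ended inside a def run
    ' "$@"
}
```

Because every `abc` empties the buffer, a block always starts at the last `abc` before its `def` lines.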
Michael Schulte zur Sur
Honored Contributor

Hi, here is an adapted, streamlined version.

BEGIN { COUNT = 0; DOPRINT = 0 }
/abc/ {
    if (DOPRINT == 0) {
        DOPRINT = 1; COUNT++
        OUTPUTFILE = FILENAME "." COUNT
    }
}
{ if (DOPRINT == 1) print $0 > OUTPUTFILE }
/def/ {
    DOPRINT = 0; el = 0
    while (el == 0) {
        if ((getline) <= 0) break      # stop at end of input
        if (($0 != "") && ($0 != "abc")) print $0 > OUTPUTFILE
        if ($0 != "def") el = 1
        if ($0 == "abc") {
            DOPRINT = 1; COUNT++
            OUTPUTFILE = FILENAME "." COUNT
            print $0 > OUTPUTFILE
        }
    }
}

greetings,

Michael
Michael Schulte zur Sur
Honored Contributor

Well, one more change:

use: awk -f andy.awk file(s)

With this version you can use multiple files and each is split into files with its own name and counter.

I hope this does everything you need.

BEGIN { DOPRINT = 0; FNAME = "" }
{ if (FILENAME != FNAME) { FNAME = FILENAME; COUNT = 0 } }   # new input file: restart the counter
/abc/ {
    if (DOPRINT == 0) {
        DOPRINT = 1; COUNT++
        OUTPUTFILE = FILENAME "." COUNT
    }
}
{ if (DOPRINT == 1) print $0 > OUTPUTFILE }
/def/ {
    DOPRINT = 0; el = 0
    while (el == 0) {
        if ((getline) <= 0) break      # stop at end of input
        if (($0 != "") && ($0 == "def")) print $0 > OUTPUTFILE;
        if ($0 != "def") el = 1
        if ($0 == "abc") {
            DOPRINT = 1
            if (FILENAME == FNAME) COUNT++; else { FNAME = FILENAME; COUNT = 1 }
            OUTPUTFILE = FILENAME "." COUNT
            print $0 > OUTPUTFILE
        }
    }
}
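A slightly simpler way to restart the counter for each input file is to test `FNR`, which awk resets to 1 at the start of every file (`NR` keeps counting across files). The sketch below only prints the output names that such a per-file counter would produce, to show the idiom without writing anything; `list_outputs` is a hypothetical name.

```shell
#!/bin/sh
# list_outputs FILE... : hypothetical helper, a sketch only.
# Prints FILE.1, FILE.2, ... for each ^abc line, counting per file.
list_outputs() {
    awk '
        FNR == 1 { count = 0 }            # FNR restarts at 1 in each input file
        /^abc/   { count++; print FILENAME "." count }
    ' "$@"
}
```

For example, `list_outputs file1 file2` prints `file1.1`, `file1.2`, ... followed by `file2.1`, ..., each file numbered from 1.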
Michael Schulte zur Sur
Honored Contributor

Hi,

in the last script, change
if (($0 != "") && ($0 == "def")) print $0 > OUTPUTFILE;
to
if ($0 == "def") print $0 > OUTPUTFILE;

greetings,

Michael
Anand_30
Regular Advisor

Thanks a lot Graham & Michael,

I tried Graham's new solution, but it still gives the same output as before: it extracts 'abc' to the first 'def' and then again 'abc' to the second 'def'.

-Andy

Michael Schulte zur Sur
Honored Contributor

Hi,

have you tried mine? It should work nicely. I put a lot of sweat into it. ;-))

greetings,

Michael
Anand_30
Regular Advisor

Thanks a lot Michael,

I tried your script too, but it is not working correctly. It picks up all the other 'abc' lines as well as the 'abc' to 'def' portions. For example:

abc
...
...
abc
abc
..
..
abc
..
..
def

If the file is like this, your script extracts all the abc's along with the portion from 'abc' to 'def'. But my requirement is that the script should extract only the portion from 'abc' to 'def'.

Thanks again for your help.

-Andy

Rodney Hills
Honored Contributor

If any of our solutions are close, you might want to assign points as a way to let us know if we are on the right track...

It's the nice thing to do :-)

-- Rod Hills
Graham Cameron_1
Honored Contributor

Andy

This is my last shot.
Like Michael I have put a lot of sweat into it.
If it still isn't right then maybe we are misunderstanding your data and your requirement.

Here goes, andy2.awk
--
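function write_block() {     # write the saved block to the next numbered file
    nf = sprintf("%s.%d", fname, ++matchcount)
    for (i = 1; i <= el; i++)
        print list[i] >> nf
    defgot = 0
    el = 0
}
!/^def/ && (defgot > 0) { write_block() }

/^abc/ {
    split("", list)   # empty out the array
    el = 0
    list[++el] = $0
    next
}
(el > 0) { list[++el] = $0 }
/^def/ && (el > 0) { defgot++ }
END { if (defgot > 0) write_block() }   # input ended right after the def lines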
--
-- Graham
Mark Grant
Honored Contributor

Ok, I thought I'd have a go too :)

Seems to me that the problem is that we need to search from the first "abc" to the last "def", which as far as I can tell means we have to read in the entire file first.

Save the following as "test.pl" and run it like this (assuming your data is in "testfile"):

"cat testfile | ./test.pl abc def"

#!/usr/bin/perl

$START=$ARGV[0];
$END=$ARGV[1];
$PRINT="n";

@DATA=<STDIN>;

# Drop lines from the end of the data, up to and including the last $END

while(@DATA && $i ne $END){
    $i=pop(@DATA);
    chomp($i);
}

# Ok, now we only need to print everything after $START

foreach $i (@DATA){
    chomp($i);
    if($PRINT eq "y"){
        print "$i\n";
    }
    if($i eq $START){
        $PRINT="y";
    }
}
Never precede any demonstration with anything more predictive than "watch this"
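Reading the whole file is one option; a streaming version is also possible, sketched here assuming exact-line matches as in Mark's `eq` tests. Once the first `abc` is seen, lines are held back until a `def` confirms them, so only the lines after the most recent `def` are ever buffered, and whatever is still held at end of input is dropped. Unlike Mark's script, this keeps the `abc` and `def` lines themselves; `first_to_last` is a hypothetical name.

```shell
#!/bin/sh
# first_to_last START END : hypothetical helper, a sketch only.
# Reads standard input and prints from the first line equal to START
# through the last line equal to END, buffering only the lines seen
# since the most recent END.
first_to_last() {
    awk -v startline="$1" -v endline="$2" '
        !started && $0 == startline { started = 1 }
        started {
            held = held $0 "\n"                 # not yet known to be inside the range
            if ($0 == endline) { printf "%s", held; held = "" }   # confirmed: print it
        }
    '
}
```

Usage: `first_to_last abc def < testfile`.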
Michael Schulte zur Sur
Honored Contributor

Hi Andy,

another try. Well, since the beginning, the requirements have changed quite a lot. But I won't give up!!

greetings,

Michael

BEGIN { DOPRINT = 0; FNAME = ""; COUNT = 1; LCOUNT = 0 }
function doabc() {
    if (OUTPUTFILE != "") close(OUTPUTFILE)   # close (and flush) the previous output file first
    if (FILENAME != FNAME) { FNAME = FILENAME; COUNT = 1 }
    OUTPUTFILE = FILENAME "." COUNT
    system("> " OUTPUTFILE)                  # create or empty the output file
    DOPRINT = 1
    LCOUNT = 0
}
function printline() {
    print $0 > OUTPUTFILE; LCOUNT++
}
{ if (FILENAME != FNAME) { FNAME = FILENAME; COUNT = 1 } }
/abc/ { doabc() }
{ if (DOPRINT == 1) printline() }
/def/ {
    if (DOPRINT == 1) {
        DOPRINT = 0; el = 0
        while (el == 0) {
            if ((getline) <= 0) break         # stop at end of input
            if ($0 == "def") printline()
            if ($0 != "def") el = 1
        }
        COUNT++
        if ($0 == "abc") { doabc(); printline() }
    }
}
Anand_30
Regular Advisor

Hi All,

Thanks a lot for your help. All the solutions I assigned 10 points to worked. I have used Graham's solution since its execution time was marginally lower than Michael's.

Rod's solution was also good and it worked pretty fast, but I wanted to use shell script and awk rather than Perl.

Thanks again to all of you for helping me in solving my problem.

-Andy.


Anand_30
Regular Advisor

Hi,

I am back again with one more question on extracting portions of a file based on a pattern match.

Now my requirement is:

1. Extract the portion of the file starting with, say, "abc" and ending with "def".
2. Then, in each "abc.....def" extract, search for different patterns and write the lines that match them to an output file.

Can anyone please help me.

Thanks,
Andy
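One way to approach the second step, sketched under the assumption that a block runs from a line matching `^abc` to the next line matching `^def`: keep an in-block flag and print only the lines inside a block that match a regular expression passed in with `-v`. The function name `grep_in_blocks` is hypothetical.

```shell
#!/bin/sh
# grep_in_blocks REGEX FILE... : hypothetical helper, a sketch only.
# Prints the lines inside abc...def blocks that match REGEX.
grep_in_blocks() {
    pat=$1
    shift
    awk -v pat="$pat" '
        /^abc/ { inblock = 1 }               # block starts (this line is inside)
        inblock && $0 ~ pat { print }        # matching line inside a block
        /^def/ { inblock = 0 }               # block ends after this line
    ' "$@"
}
```

Several patterns can be combined as an awk alternation, for example `grep_in_blocks 'ERROR|WARN' myfile > myfile.matches`. Note that `-v` processes backslash escapes, so a regex containing backslashes may need them doubled.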