
Re: extracting a portion from a file

 
SOLVED
Anand_30
Regular Advisor

extracting a portion from a file

Hi,

I have a file of around 500 lines. I want to extract a portion of the file, say from the line with the pattern 'abc' to the line with the pattern 'efg', which might span around 100 lines, and redirect the output to a file. If the file contains more than one occurrence of the combination, the script should extract all of them and write each one into a separate file.

Can anyone please help me with this problem?

Thanks,
Andy
40 REPLIES 40
James A. Donovan
Honored Contributor

Re: extracting a portion from a file

I think something like this will do the job....

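EXTRACT=N
NN=1
while IFS= read -r LINE
do

# an abc line switches extraction on; grep output is discarded
if echo "$LINE" | grep abc > /dev/null; then
EXTRACT=Y
OUTFILE=file${NN}
fi

if [ "X${EXTRACT}" = "XY" ]; then
# append with >>, a plain > would overwrite the file on every line
echo "$LINE" >> "$OUTFILE"

# the efg line has been written above, so the block can be closed now
if echo "$LINE" | grep efg > /dev/null; then
EXTRACT=N
NN=$(expr $NN + 1)
fi
fi

done < mysourcefile.dat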
Remember, wherever you go, there you are...
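
The same flag-based loop can probably be written without starting a grep for every line, by using the shell's own `case` pattern matching. This is only a sketch: the sample input, the temporary directory (created with `mktemp -d`, which may not exist in this form on every system) and the `block.N` output names are made up for illustration.

```shell
# Sketch only: sample.dat and the block.N names are illustrative assumptions.
workdir=$(mktemp -d)
printf '%s\n' 'header' 'abc start' 'one' 'efg end' \
              'between' 'abc again' 'two' 'efg done' > "$workdir/sample.dat"

extract=N
nn=1
while IFS= read -r line
do
    # An abc line switches extraction on
    case $line in
        *abc*) extract=Y ;;
    esac

    if [ "$extract" = Y ]; then
        # >> appends, so earlier lines of the block are kept
        printf '%s\n' "$line" >> "$workdir/block.$nn"

        # The efg line is written first, then the block is closed
        case $line in
            *efg*) extract=N; nn=$((nn + 1)) ;;
        esac
    fi
done < "$workdir/sample.dat"
```

With this input the loop should leave two files, block.1 and block.2, each running from its abc line to its efg line.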
Michael Schulte zur Sur
Honored Contributor

Re: extracting a portion from a file

Hi,

here is an awk script:
BEGIN{COUNT=0;DOPRINT=0;}
/abc/ {DOPRINT=1;COUNT=COUNT+1;OUTPUTFILE="text." COUNT;}
{
if (DOPRINT==1)
{
print $0 > OUTPUTFILE;
}
}
/efg/ {DOPRINT=0;}
END{}

greetings,

Michael
Graham Cameron_1
Honored Contributor
Solution

Re: extracting a portion from a file

Michael's solution will work, but there is an easier way with awk.

awk '/abc/,/efg/' yourfile > newfile

will print to a single file, but if you want a new file each time you hit abc it's a bit more involved.

awk '
/abc/ {nf=sprintf("newfile.%d", ++matchcount)}
/abc/,/efg/ {print >>nf}
' yourfile

This will create files called newfile.1, newfile.2 etc.

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
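
To see what the two-rule version above produces, here is a small run against made-up input. The temporary directory and the contents of yourfile are assumptions for the demonstration; the awk program itself is the one from the post.

```shell
# Sketch only: the input lines are invented to show two separate blocks.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'skip me' 'abc 1' 'keep 1' 'efg 1' \
              'skip me too' 'abc 2' 'keep 2' 'efg 2' > yourfile

awk '
/abc/       { nf = sprintf("newfile.%d", ++matchcount) }  # new output name at each abc
/abc/,/efg/ { print >> nf }                               # every line of the range goes there
' yourfile

cat newfile.1
cat newfile.2
```

Note that the first rule fires on every abc line, including one that appears inside a range that is already open, so in that case the rest of the block would continue in the next file.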
Kent Ostby
Honored Contributor

Re: extracting a portion from a file

create a file called extract.awk that includes the following:

BEGIN {myflag=0;}
/abc/ {myflag=1;}
myflag==1 {print $0}
/def/ {exit;}

Now run this as follows:

awk -f extract.awk < inputfile > outputfile

NOTE: The above will quit after the first occurrence of def following the first occurrence of abc.

If you want to find multiple sections in a file (i.e. the lines from "abc" to "def", then skip some lines until another "abc" / "def" section is found), just change "exit" above to "myflag=0".

Best regards,

Kent M. Ostby
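
A minimal sketch of the multiple-section variant described above, with "exit" replaced by "myflag=0". The input lines and the temporary directory are made up for illustration, and the program is given inline rather than through extract.awk.

```shell
# Sketch only: inputfile contents are invented; all sections go to one output file.
workdir=$(mktemp -d)
printf '%s\n' 'intro' 'abc one' 'body one' 'def one' 'gap' \
              'abc two' 'body two' 'def two' 'outro' > "$workdir/inputfile"

awk '
/abc/  { myflag = 1 }   # start of a section
myflag { print }        # print while inside a section, including the def line
/def/  { myflag = 0 }   # end of a section, keep reading instead of exiting
' "$workdir/inputfile" > "$workdir/outputfile"

cat "$workdir/outputfile"
```

The order of the three rules matters: the def line is printed before the flag is cleared, so it still ends up in the output.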
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Rory R Hammond
Trusted Contributor

Re: extracting a portion from a file

Somebody should give Graham C. 10 points.
Rory
There are a 100 ways to do things and 97 of them are right
Michael Schulte zur Sur
Honored Contributor

Re: extracting a portion from a file

Hi Graham,

in case you haven't noticed, mine is an awk script. ;-)
But yours is elegant and highly optimized, very efficient, a Borg would say. ;-)

greetings,

Michael
Anand_30
Regular Advisor

Re: extracting a portion from a file

Thanks all for your response. I faced a problem while using Graham's and Kent's solution. In my file there are multiple instances of the pattern 'abc' but few combinations of 'abc' and 'def'. I want the script only to extract the lines which start at 'abc' and end at 'def'. But currently the script extracts all the lines that start with 'abc' also.

Looking forward for some help.

Thanks,
Andy
R. Allan Hicks
Trusted Contributor

Re: extracting a portion from a file

Are you saying that your file has:

abc{other stuff}def

lines scattered in it and you only want the _lines_ that begin with abc and end in def?

grep -e "^abc.*def$" myfile

Will provide those lines.

However, if your file is like

abc (stuff)
more stuff
...
...
def (stuff)

Then the scripts that have already been posted should work.

If you expect to have

abc stuff
more stuff
def
more stuff
abc stuff
more stuff
def

and you want each block listed, then try modifying the submitted scripts so that an abc match turns on the print flag and a def turns it off.

In perl it would look like

#!/usr/local/bin/perl -w

use IO::File;

$printit=0;

$filename=shift(@ARGV);
$file = new IO::File;
$file->open("<$filename");

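while( defined($linein=$file->getline) ){

# an abc line switches printing on
if ($linein =~ /abc/){
$printit=1;
}

if ($printit == 1){
# $linein still carries its newline, so no extra "\n" is added
print $linein;
# the def line is printed first, then printing is switched off
if($linein=~ /def/){
$printit=0;
}
}
}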
$file->close;


run it with my_script my_file

Hope this helps



"Only he who attempts the absurd is capable of achieving the impossible
Anand_30
Regular Advisor

Re: extracting a portion from a file

My file is like this:

abc
some stuff
...
..
def
abc
some stuff
abc
some stuff
abc
some stuff
abc
some stuff
...
..
def

I want the script to extract the portion of the file which starts with the pattern 'abc' and ends with the pattern 'def'. But at present the scripts also extract the lines from the 'abc' occurrences that have no matching 'def'.

In some files there is only one occurrence of 'abc' and 'def'. In those cases the scripts work fine.

-Andy


Rodney Hills
Honored Contributor

Re: extracting a portion from a file

If you're saying your file has multiple "abc" lines that may or may not have matching "efg" lines, and you only want the "abc" lines that are closest to the "efg":

Example-
1 abc one
2 abc two
3 something
4 def one

And you want lines 2 - 4, then maybe this one line perl script can do it.

perl -ne 'if (/abc/) { @a=(); $p=1; } ; { push(@a,$_); }; if (/efg/) { print @a if $p; @a=(); $p=0 }' yourfile

HTH

-- Rod Hills
There be dragons...
Rodney Hills
Honored Contributor

Re: extracting a portion from a file

Looks like I confused "def" and "efg". Just replace all "def" with "efg" in my response.

-- Rod Hills
There be dragons...
Anand_30
Regular Advisor

Re: extracting a portion from a file

Thanks Rod,

This seems to work fine but I want each extract to be put in a separate file. How can I accomplish that.

-Andy
Rodney Hills
Honored Contributor

Re: extracting a portion from a file

This will open a new file for each "block"-

perl -ne 'if (/abc/) { @a=(); $p=1; } ; { push(@a,$_); }; if (/efg/) { if ($p) { $n++; open OUT ">outfile.$n"; print @a; close OUT; @a=(); $p=0 }' yourfile

-- Rod Hills
There be dragons...
Rodney Hills
Honored Contributor

Re: extracting a portion from a file

Oops, I needed another "}"

perl -ne 'if (/abc/) { @a=(); $p=1; } ; { push(@a,$_); }; if (/efg/) { if ($p) { $n++; open OUT ">outfile.$n"; print @a; close OUT; @a=(); $p=0 }}' yourfile

-- Rod Hills
There be dragons...
Anand_30
Regular Advisor

Re: extracting a portion from a file

I am getting the following error while executing the script:

Missing comma after first argument to open function at -e line 1, near "">outfile.$n";"
Execution of -e aborted due to compilation errors

Can you please help me out in this

-Andy
Graham Cameron_1
Honored Contributor

Re: extracting a portion from a file

Michael

Mine is an awk script too.

Andy, so you want to
- start capturing when you hit abc
- if you hit another abc, start capturing again.
- when you hit a def, print all lines since the most recent abc

Here goes. I am doing this via a script file rather than in-line:

Create a file, eg andy.awk, containing:

/^abc/ {
split("",list) #empty out the array.
el=0
list [++el] = $0
next }
(el > 0) { list [++el] = $0 }
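/^def/ && (el > 0) {
nf=sprintf("newfile.%d", ++matchcount)
for (i=1;i<=el;i++)
print list[i] >> nf
close(nf)
split("",list) #reset, so lines after def are not carried into the next file.
el=0
}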

Invoke this with awk -f andy.awk yourfile, it will create newfile.1, newfile.2 etc.


Hope that closes it.

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
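
Here is a sketch of the buffer-and-flush idea run against input shaped like Andy's sample, where an abc block is abandoned before the next abc starts. The input lines and the temporary directory are made up; closing the file and clearing the buffer after each flush are assumptions added here so that stray lines after a def do not leak into a later file.

```shell
# Sketch only: yourfile contents are invented to include an abandoned abc block.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'abc' 'first block' 'def' \
              'abc' 'abandoned' \
              'abc' 'second block' 'def' 'trailing' > yourfile

awk '
/^abc/ {                      # restart the buffer at every abc
    split("", list); el = 0
    list[++el] = $0
    next
}
el > 0 { list[++el] = $0 }    # buffer lines only while inside a candidate block
/^def/ && el > 0 {            # flush the buffer when the block is closed
    nf = sprintf("newfile.%d", ++matchcount)
    for (i = 1; i <= el; i++) print list[i] > nf
    close(nf)
    split("", list); el = 0
}
' yourfile
```

The abandoned block is dropped because the following abc empties the buffer before any def has been seen.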
Michael Schulte zur Sur
Honored Contributor

Re: extracting a portion from a file

Hi,

BEGIN{COUNT=0;DOPRINT=0;}
/abc/ {
if (DOPRINT==0)
{
DOPRINT=1;
COUNT=COUNT+1;
OUTPUTFILE="text." COUNT;
}
}
{
if (DOPRINT==1)
{
print $0 > OUTPUTFILE;
}
}
/efg/ {DOPRINT=0;}
END{}

this should do the trick.

greetings,

Michael
H.Merijn Brand (procura
Honored Contributor

Re: extracting a portion from a file

Into a single file:

# perl -ne '/abc/.../efg/ and print' file>excerpt

What I don't see in your quest is what should happen when matches overlap

abc
x
abc
efg
efg

that's the diff in perl between /pat/../pat/ and /pat/.../pat/, both not very well known options. Unless you state what your wish is about overlapping matches, it's hard to split into different files.

Basics could be like:

if (/abc/../def/) {
unless ($out) { open $out, ">file$." or die "file$.: $!" }
print $out $_;
}
else {
$out and close $out;
undef $out;
}

Which is far easier than the previously posted perl solutions. Rodney's solutions are correct and simple (as usual), but I think I should at least draw attention to these lesser known, but very powerful features.

Enjoy, have FUN! H.Merijn
Enjoy, Have FUN! H.Merijn
Michael Schulte zur Sur
Honored Contributor

Re: extracting a portion from a file

Hi Andy,

have you checked on the latest posts?

greetings,

Michael
Rodney Hills
Honored Contributor

Re: extracting a portion from a file

To fix my script-

perl -ne 'if (/abc/) { @a=(); $p=1; } ; { push(@a,$_); }; if (/efg/) { if ($p) { $n++; open OUT,">outfile.$n"; print OUT @a; close OUT; @a=(); $p=0 }}' yourfile

I guess I should have tested it before posting. Sorry...

-- Rod Hills
There be dragons...
Anand_30
Regular Advisor

Re: extracting a portion from a file

Thanks everyone for your help. I have used the solution provided by Graham and it has worked absolutely fine.

Thanks again for all your help.

-Andy.
Anand_30
Regular Advisor

Re: extracting a portion from a file

Hi All,

I am back again with one more question:

I have used Graham's solution to solve my problem. I am using awk -f andy.awk in my SHELL script. I have nearly 50 files for which I need to extract the 'abc' 'def' portion. I have put it in a loop and using awk -f andy.awk $file. Now, I have to name the output file created in andy.awk (newfile.1, newfile.2 ....) as $file.1, $file.2, etc...

Can anyone please help me in this problem.

Thanks,
Andy





Rodney Hills
Honored Contributor

Re: extracting a portion from a file

If you don't care about the original file name being part of the new file name, then you could-

cat * | awk -f andy.awk

This way all the files would appear as one stream into the script, thus creating unique filenames.

HTH

-- Rod Hills
There be dragons...
Graham Cameron_1
Honored Contributor

Re: extracting a portion from a file

Andy

Change the awk script slightly to have the filename passed into it.

Change the line
nf=sprintf("newfile.%d", ++matchcount)
to
nf=sprintf("%s.%d", fname, ++matchcount)

and invoke with
awk -v fname="$file" -f andy.awk "$file"
it will create $file.1, $file.2, etc...

-- Graham
Computers make it easier to do a lot of things, but most of the things they make it easier to do don't need to be done.
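
Put together, the loop over the input files might look like the sketch below. The two .dat files, the temporary directory and the stand-in andy.awk written by the here-document are all assumptions for the demonstration; the stand-in follows the buffering script from earlier in the thread with the fname change applied, plus a buffer reset after each flush.

```shell
# Sketch only: one.dat, two.dat and this stand-in andy.awk are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'abc' 'a1' 'def' > one.dat
printf '%s\n' 'abc' 'b1' 'def' 'abc' 'b2' 'def' > two.dat

cat > andy.awk <<'EOF'
/^abc/ { split("", list); el = 0; list[++el] = $0; next }
el > 0 { list[++el] = $0 }
/^def/ && el > 0 {
    nf = sprintf("%s.%d", fname, ++matchcount)   # fname comes from -v on the command line
    for (i = 1; i <= el; i++) print list[i] > nf
    close(nf)
    el = 0
}
EOF

# Each awk run is a separate process, so the numbering restarts at 1 for every file
for file in *.dat
do
    awk -v fname="$file" -f andy.awk "$file"
done

ls
```

Quoting "$file" keeps the loop working for file names that contain spaces.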