Extract portion of file

Rahul_13
Advisor

Extract portion of file

Hi,

I have a file like

abc
...
...
...
...
def
...
...
...
...
abc
...
...
...
def
...
...
...

I need a script that extracts the lines from each "abc" line through the following "def" line and writes each of these sections to its own file.

For example, if my file has 3 occurrences of the combination "abc"...."def", then the script should produce 3 output files, each holding one "abc"..."def" section.

Can anyone please help me.

Thanks,
Rahul
22 replies
Keith Bryson
Honored Contributor

Re: Extract portion of file

Hi Rahul

Here is my script (not pretty, but it works):

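# Copy every "abc"..."def" block of /test into /tmp/output.1, /tmp/output.2, ...
FILENAMELOOP=1
STARTLOGGING="N"
# Read /test one line at a time (IFS= and -r keep each line exactly as written)
while IFS= read -r LINE
do
  FILENAME=/tmp/output.$FILENAMELOOP
  case "$LINE" in
  abc*)
    # Start of a block: create a new output file
    STARTLOGGING="Y"
    printf '%s\n' "$LINE" >"$FILENAME"
    ;;
  def*)
    # End of a block: write the line, then move on to the next file number
    if [ "$STARTLOGGING" = "Y" ]
    then
      printf '%s\n' "$LINE" >>"$FILENAME"
      FILENAMELOOP=$((FILENAMELOOP + 1))
    fi
    STARTLOGGING="N"
    ;;
  *)
    # Inside a block: copy the line; outside a block: ignore it
    if [ "$STARTLOGGING" = "Y" ]
    then
      printf '%s\n' "$LINE" >>"$FILENAME"
    fi
    ;;
  esac
done </test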

You then end up with /tmp/output.1, /tmp/output.2, etc.
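For comparison, the same split can be done in a single awk pass without a shell loop, which is usually much faster on a large file. This is a sketch under assumptions, not a tested drop-in: it assumes a block starts on a line beginning with "abc" and ends on the next line beginning with "def", and the function name `split_blocks` and its output-prefix argument are made up for illustration.

```sh
# split_blocks INPUT PREFIX   (hypothetical helper)
# Copies each "abc"..."def" block of INPUT into PREFIX.1, PREFIX.2, ...
# Lines outside any block are ignored.
split_blocks() {
    awk -v prefix="$2" '
        /^abc/ && out == "" { n++; out = prefix "." n }   # start a new block
        out != ""           { print > out }               # copy lines inside a block
        /^def/ && out != "" { close(out); out = "" }      # end of block
    ' "$1"
}
```

For example, split_blocks /test /tmp/output should leave the blocks in /tmp/output.1, /tmp/output.2, and so on.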

Good luck - Keith
Arse-cover at all costs
Keith Bryson
Honored Contributor

Re: Extract portion of file

Forgot to mention, put your file in /test!!

Keith
Peter Godron
Honored Contributor

Re: Extract portion of file

Rahul,
a solution in perl:
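#!/usr/contrib/bin/perl
use strict;
use warnings;
# Initialise the file counter to 1
my $handle = 1;
# True while we are between an "abc" line and its "def" line
my $in_block = 0;
# While there is data
while (<>)
{
    # Found a header: open the next numbered output file
    if (/^abc/)
    {
        my $file = "$handle.txt";
        open FILE, ">", $file or die "unable to open $file: $!";
        $in_block = 1;
    }
    # Copy only the lines that belong to a block
    print FILE $_ if $in_block;
    # Found the end: close this file and advance the counter by one
    if (/^def/ && $in_block)
    {
        close FILE;
        $handle++;
        $in_block = 0;
    }
}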
Save the script as a.pl, make it executable with chmod 755 a.pl,
and run it as a.pl filename
Regards
Muthukumar_5
Honored Contributor

Re: Extract portion of file

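#!/bin/ksh
# $1 = filename
# Line numbers of each block start ("abc") and block end ("def")
set -A arr1 $(grep -n '^abc' "$1" | awk -F":" '{ print $1 }')
set -A arr2 $(grep -n '^def' "$1" | awk -F":" '{ print $1 }')

cnt=0

# ${#arr1[*]} is the number of blocks; ksh arrays start at index 0
while [[ $cnt -lt ${#arr1[*]} ]]
do
    # Write block cnt+1 to its own file, named <filename>.1, <filename>.2, ...
    let n=cnt+1
    sed -n "${arr1[$cnt]},${arr2[$cnt]}p" "$1" > "$1.$n"
    let cnt=cnt+1
done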

This assumes every "abc" line is followed by its matching "def" line before the next "abc".
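The same index-then-slice idea can be sketched without ksh arrays: one awk pass records the start and end line numbers of each block, and sed then prints each range. This is an illustration under the same assumption (blocks do not nest and each "abc" has a "def"); `block_ranges` and `print_blocks` are hypothetical helper names.

```sh
# block_ranges FILE: print "start,end" line numbers for each abc...def block
block_ranges() {
    awk '
        /^abc/ && start == 0 { start = NR }
        /^def/ && start != 0 { print start "," NR; start = 0 }
    ' "$1"
}

# print_blocks FILE: print every block, one sed call per range
print_blocks() {
    block_ranges "$1" | while IFS=, read -r first last
    do
        sed -n "${first},${last}p" "$1"
    done
}
```

Redirecting inside the loop (for example to a file named after the block number) would give one file per block, as in the script above.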

hth.
Easy to suggest when don't know about the problem!
Pete Randall
Outstanding Contributor

Re: Extract portion of file

From "Handy One-Liners for Sed" (attached):

# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive
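Applied to the markers in this thread, the range address prints from each "abc" line through the next "def" line, inclusive, then starts looking for "abc" again. A minimal sketch; the `extract_sections` name is hypothetical, and the `^` anchors assume the markers begin their lines.

```sh
# extract_sections FILE: print all abc...def sections (inclusive) as one stream
extract_sections() {
    sed -n '/^abc/,/^def/p' "$1"
}
```

If the last "abc" has no matching "def", sed prints from that "abc" to the end of the file.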

Pete
Rahul_13
Advisor

Re: Extract portion of file

Hi,

My requirement has changed a little. I have to extract the "abc....def" sections from the file and then search those extracts for lines matching a particular pattern. The "abc....def" sections no longer need to be extracted into separate files.

But the lines matching the pattern in the "abc.....def" extracts should be output to a file.

Please help

Thanks,
Rahul
Keith Bryson
Honored Contributor

Re: Extract portion of file

Hi Rahul

See Pete Randall's sed solution - this works fine:

sed -n '/abc/,/def/p' /tmp/inputfile >/tmp/outputfile

/tmp/outputfile will then contain the abc...def sections, as you have specified.

Regards - Keith
Peter Godron
Honored Contributor

Re: Extract portion of file

Rahul,
can you please clarify what you mean by
"search for some lines matching particular pattern from the extract"?
You could then use grep for the pattern matching, on either a single file or multiple files.
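To search only inside the sections and send the hits to one file, the sed range can feed grep directly, with no intermediate files. A sketch; `grep_sections` and its two arguments are hypothetical, and the pattern is treated as a basic grep regular expression.

```sh
# grep_sections FILE REGEX: print the lines matching REGEX that lie inside
# an abc...def section of FILE
grep_sections() {
    sed -n '/^abc/,/^def/p' "$1" | grep -e "$2"
}
```

For example, grep_sections /tmp/inputfile 'ANI' > /tmp/outputfile would collect only the ANI lines found inside the sections.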

Rahul_13
Advisor

Re: Extract portion of file

Please find attached a sample extract from my file. The input file is about 2 MB, and there are hundreds of "message EventEstablished"............................
"AttributeCallType" occurrences in it.

The requirement is that I have to read through the input file and output the values of "ANI" and "ORIGCONN", the timestamp (first line of the extract) and a few more patterns from each occurrence of "EventEstablished"............................
"AttributeCallType". I also need to output a blank when any of the required patterns is missing from an "EventEstablished"............................
"AttributeCallType" extract.

I could do it by extracting each occurrence of
"EventEstablished"............................
"AttributeCallType" into a separate file and then parsing the files. But I would need to create hundreds of files and then process each one individually.

I want to do it without creating a separate file for each extract. Can it be done on the fly?

Please help.

Thanks,
Rahul

Rodney Hills
Honored Contributor

Re: Extract portion of file

In perl-

# replace yourfile with the name of your input file
open(INP,"<yourfile") or die "Could not open yourfile: $!";
while(<INP>) {
if (/^@([:.\d]+)/) { $time=$1;}
elsif (/'ANI'\s+'([^']*)'/) { $ani=$1;}
elsif (/'ORIGCONN'\s+'([^']*)'/) { $origconn=$1;}
elsif (/AttributeCallType\s+(\d+)/) { $act=$1;
print join(",",$time,$ani,$origconn,$act),"\n";
}
}

HTH

-- Rod Hills
There be dragons...
Rahul_13
Advisor

Re: Extract portion of file

Hi Rod,

Thanks for your help.

But the actual input file is not the one I attached in my earlier message.
The actual file is about 2 MB and contains hundreds of "message EventEstablished"............."AttributeCallType" occurrences as well as lots of other data.

I have to go through each "message EventEstablished"............."AttributeCallType" extract and then do the pattern matching for "ANI", "CallType", "Time" and others.

I cannot store each "message EventEstablished"............."AttributeCallType" extract in a separate file, as that would take up lots of space.

Thanks,
Rahul
Rodney Hills
Honored Contributor

Re: Extract portion of file

My routine will scan a file of any size. It prints one line for each group of data, including the time and the other data you want. The result can be redirected to a single file.

one file in. one file out...

-- Rod Hills
Rahul_13
Advisor

Re: Extract portion of file

Hi Rod,

I am getting the error:

syntax error at line 1 : `(' unexpected.

while executing your script.

Also, the input file has many other blocks besides "message EventEstablished"............."AttributeCallType" that also contain ANI, ORIGCONN and the other patterns, but I want data extracted only from the "message EventEstablished"............."AttributeCallType" blocks.

Please help.

Thanks,
Rahul




Rodney Hills
Honored Contributor

Re: Extract portion of file

This is a perl script, so to run it, enter-
perl thescript

Be sure you have at least version 5 also. To get the version, enter-
perl -V

The script I supplied collects the data as it sees it, then only dumps it out when "AttributeCallType" is found.

If there are other values you want to dump out, just add an "elsif" statement for each one.

If an AttributeCallType line may not have a matching start time, then the script will have to be modified.

-- Rod Hills
Hein van den Heuvel
Honored Contributor

Re: Extract portion of file

Ah.. I guess I did not read closely enough.
At first I really thought you wanted multiple files. In that case I would use a perl script like:

------ x.p -------------------------------
$yourfile = shift @ARGV or die "Please provide name of file to read";
open(INP,"<$yourfile") or die "Could not open '$yourfile' for read";
while(<INP>) {
if (/^@([:.\d]+)/) {
$time=$1;
$begin++;
$file++;
$ani = $origconn = " ";
$outfile = $yourfile . "_$file";
open (OUT,">$outfile") or die "Could not open '$outfile' for output";
}
next unless ($begin);
$ani=$1 if (/'ANI'\s+'([^']*)'/);
$origconn=$1 if (/'ORIGCONN'\s+'([^']*)'/);
if (/AttributeCallType\s+(\d+)/) {
$act=$1;
print OUT join(",",$time,$ani,$origconn,$act),"\n";
close OUT;
$begin = 0;
}
}

Now execute with: perl x.p xxx
This will create xxx_1, xxx_2,...
Each file will contain the summary line (time,ANI,ORIGCONN,calltype) for one matching block.

Now if you just want one line per block, then try the following variation on Rod's solution:


---- x.p --------------
while(<>) {
if (/^@([:.\d]+).*EventEstablished$//) {
$time=$1;
$ani = $origconn = " ";
}
next unless ($time);
$ani=$1 if (/'ANI'\s+'([^']*)'/);
$origconn=$1 if (/'ORIGCONN'\s+'([^']*)'/);
if (/AttributeCallType\s+(\d+)/) {
$act=$1;
print join(",",$time,$ani,$origconn,$act),"\n";
undef $time;
}
}

perl x.p < filename
or
perl x.p filename
or
cat filename | perl x.p

will now give a line per block for example:
perl x.p < x
10:37:47.3208,11111111111,03be0123ae27821d,2
10:37:47.3208, ,03be0123ae27821d,2
10:37:47.3208,333333333333,03be0123ae27821d,2

Here the first block had 'ani' = 111..., the second had no 'ani', and the last had 'ani' = 333...


The trick is to use a 'time' seen on a line ending in EventEstablished as a flag.
Skip everything until the flag is set (next unless ...).
Print, and clear the flag, when the end (AttributeCallType) is seen.
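For readers without Perl at hand, the same flag technique can be sketched in awk. This is an illustration under assumptions, not Hein's script: it assumes the start line looks like "@10:37:47.3208 ... EventEstablished", values sit on lines such as 'ANI' '11111111111', and the block ends at "AttributeCallType <number>"; the function name `summarize_events` is hypothetical.

```sh
# summarize_events FILE...   (hypothetical helper)
# Prints one CSV line (time,ANI,ORIGCONN,calltype) per block that starts on an
# "@time ... EventEstablished" line and ends on an "AttributeCallType N" line.
# Values missing from a block are printed as a single blank, as in the Perl.
summarize_events() {
    awk -v q="'" '
        # Value = 4th piece when the line is split on the quote character q
        function qval(line,    f) { split(line, f, q); return f[4] }

        # Start of a block: set the flag and clear the values of the last block
        /^@[0-9:.]+.*EventEstablished$/ {
            match($0, /^@[0-9:.]+/)
            time = substr($0, 2, RLENGTH - 1)
            ani = " "; origconn = " "
            inblock = 1
            next
        }
        !inblock { next }                 # skip everything outside a block
        $0 ~ (q "ANI" q)      { ani = qval($0) }
        $0 ~ (q "ORIGCONN" q) { origconn = qval($0) }
        # End of a block: print the collected values and clear the flag
        /AttributeCallType[ \t]+[0-9]+/ {
            match($0, /AttributeCallType[ \t]+[0-9]+/)
            act = substr($0, RSTART, RLENGTH)
            sub(/^AttributeCallType[ \t]+/, "", act)
            print time "," ani "," origconn "," act
            inblock = 0
        }
    ' "$@"
}
```

Running summarize_events filename > out.csv should give one comma-separated line per EventEstablished block, in the same field order as the Perl version; blocks of other event types are skipped because the flag is never set for them.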

hth,
Hein,



Rahul_13
Advisor

Re: Extract portion of file

Hi Rod,

The script works absolutely fine now. But my input file looks like:

message EventEstablished
.....
.....
.....
AttributeCallType
......
.....
....
....
message EventEstablished
.....
.....
.....
AttributeCallType
..........
.......
........
.........

message EventEstablished
.....
.....
.....
AttributeCallType
.....
.....
.....

My requirement is that I have to extract each occurrence of "message EventEstablished"............."AttributeCallType" from the input file and then run your script on it to get the required data.

But I don't want to write each "message EventEstablished"............."AttributeCallType" extract to a separate file and then run your script on each individual file.

Is there a way to read through the input file, put each occurrence of "message EventEstablished"............."AttributeCallType" into a separate array, and then search those arrays for "ANI", the timestamp, etc.?

Thanks,
Rahul



Rodney Hills
Honored Contributor

Re: Extract portion of file

My script takes care of both grouping the data and extracting the required values from each group. There is no need to store the groups in separate files or arrays.

What final result do you want from this extract? Usually people want the data grouped one line per group, comma separated, so it can be imported into an Excel spreadsheet.
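If you still want each whole block in memory before searching it, as asked above, awk can buffer the block's lines in an array and scan that array once the end marker arrives, still without intermediate files. A sketch; `search_blocks`, the marker patterns and the "block N:" output format are assumptions for illustration.

```sh
# search_blocks FILE REGEX   (hypothetical helper)
# Buffers each "message EventEstablished" ... "AttributeCallType" block in an
# array, then prints the lines of that block that match REGEX, prefixed with
# the block number.
search_blocks() {
    awk -v re="$2" '
        /message EventEstablished/ { n = 0; inblock = 1 }   # start: empty the buffer
        inblock { buf[++n] = $0 }                           # keep every block line
        inblock && /AttributeCallType/ {                    # end: search the buffer
            blocks++
            for (i = 1; i <= n; i++)
                if (buf[i] ~ re) print "block " blocks ": " buf[i]
            inblock = 0
        }
    ' "$1"
}
```

Lines that match REGEX outside the blocks are never printed, because they are not buffered.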

-- Rod Hills
Hein van den Heuvel
Honored Contributor

Re: Extract portion of file

Rahul wrote on Jan 28, 2005 18:53:36 GMT ..

"Is there a way in which I could read through the input file and then put each occurance of "message EventEstablished"............."AttributeCallType" in separate arrays and then search through the arrays for "ANI", Timestamp etc."

the answer I posted a few minutes before your question solves this

:-).

Hein.

while(<>) {
if (/^@([:.\d]+).*EventEstablished$//) {
$time=$1;
$ani = $origconn = " ";
}
next unless ($time);
$ani=$1 if (/'ANI'\s+'([^']*)'/);
$origconn=$1 if (/'ORIGCONN'\s+'([^']*)'/);
if (/AttributeCallType\s+(\d+)/) {
$act=$1;
print join(",",$time,$ani,$origconn,$act),"\n";
undef $time;
}
}



Rahul_13
Advisor

Re: Extract portion of file

Hi Rod,

My input file has lots of other events apart from EventEstablished, and they have the same format.

When I execute your script, the output file has the timestamp, ANI, ORIGCONN, etc. from all the events in the input file.

But my requirement is to collect the timestamp, ANI, etc. only for the EventEstablished event, which starts with
"timestamp..........message EventEstablished" and ends with "AttributeCallType".

However, the output format your script generates is exactly what I need.

Hein,

I tried to execute your script and got the following error:

syntax error at x.p line 2, near "/) "
syntax error at x.p line 14, near "}"
Execution of x.p aborted due to compilation errors.

Thanks,
Rahul




Hein van den Heuvel
Honored Contributor

Re: Extract portion of file

Ooops. Cut & Paste error.

>>> if (/^@([:.\d]+).*EventEstablished$//) {

should be

if (/^@([:.\d]+).*EventEstablished$/) {


That double slash should be single.
The version I posted earlier worked;
I added the extra test for 'EventEstablished' at the end of the line.
The '$' anchors the match to the end of the line, just as '^' anchors it to the start.
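The effect of the two anchors is easy to check on a few sample lines: with '$', the pattern only matches when EventEstablished is the last thing on the line, and with '^@' the line must start with the timestamp. A sketch with made-up sample lines; `is_event_start` is a hypothetical helper using the same regular expression in grep -E syntax.

```sh
# is_event_start LINE: succeed if LINE starts with "@time" and ends with
# EventEstablished (same anchors as the Perl pattern)
is_event_start() {
    printf '%s\n' "$1" | grep -Eq '^@[0-9:.]+.*EventEstablished$'
}
```

For instance, '@10:37:47.3208 message EventEstablished' matches, while the same line with anything after EventEstablished does not.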

Hein.

Rahul_13
Advisor

Re: Extract portion of file

Hi Hein,

Your solution has worked absolutely fine. I have to do one more thing. I have to execute this script not for one input file but for all the files in a directory.

Also, the input file names look like

filename.20050124_133711_431.log.

I have to extract the date part "20050124" from each input file name and include it in the output along with ANI, ORIGCONN, etc.

Please help.

Thanks,
Rahul
Hein van den Heuvel
Honored Contributor
Solution

Re: Extract portion of file

Easy. Perl loves to 'glob' file names, or just snarf them from the argument list.

Here is how I cloned a test file 3 times over and then fed the copies into the script:

> ls -1 x*log
x.12345678_aap.log
x.23456789_noot.log
x.34567890_mies.log
>
> perl x.p x*log
12345678,10:37:47.3208,11111111111,03be0123ae27821d,2
12345678,10:37:47.3208, ,03be0123ae27821d,2
12345678,10:37:47.3208,333333333333,03be0123ae27821d,2
23456789,10:37:47.3208,11111111111,03be0123ae27821d,2
23456789,10:37:47.3208, ,03be0123ae27821d,2
23456789,10:37:47.3208,333333333333,03be0123ae27821d,2
34567890,10:37:47.3208,11111111111,03be0123ae27821d,2
34567890,10:37:47.3208, ,03be0123ae27821d,2
34567890,10:37:47.3208,333333333333,03be0123ae27821d,2

Now for the script. It expands the file names given on the command line (<@ARGV> is treated as a glob) into $_, the default variable.
Next it matches each name against the pattern you described.
If it matches, it remembers the date part:
\d = a decimal digit
{8} = exactly eight of them
Then it opens the file and loops through it.
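The same eight-digit date can be pulled out of a file name in the shell, for example when looping over a directory. A sketch, assuming names of the form filename.20050124_133711_431.log; `log_date` is a hypothetical helper, and it prints nothing for names that do not match.

```sh
# log_date NAME: print the 8-digit date that follows a "." and precedes "_"
# in a name like filename.20050124_133711_431.log
log_date() {
    printf '%s\n' "$1" | sed -n 's/^.*\.\([0-9]\{8\}\)_.*\.log$/\1/p'
}
```

For example, for f in /some/dir/*.log; do log_date "$f"; done would list one date per matching file; the directory name there is only a placeholder.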

> cat x.p
while (<@ARGV>){
    # Only handle names like name.YYYYMMDD_....log; $1 is the 8-digit date
    if (/\.(\d{8})_.*\.log$/) {
        $date = $1;
        open (INP, "<$_") or die "Could not open file '$_' for input: $!";
        undef $time;    # do not carry an unfinished block over from the previous file
        while(<INP>) {
            if (/^@([:.\d]+).*EventEstablished$/) {
                $time=$1;
                $ani = $origconn = " ";
            }
            next unless ($time);
            $ani=$1 if (/'ANI'\s+'([^']*)'/);
            $origconn=$1 if (/'ORIGCONN'\s+'([^']*)'/);
            if (/AttributeCallType\s+(\d+)/) {
                $act=$1;
                print join(",",$date,$time,$ani,$origconn,$act),"\n";
                undef $time;
            }
        }
        close INP;
    }
}


Cheers,
Hein.