Operating System - HP-UX
Anu Mathew
Valued Contributor

Awk and rss feeds: awk question:

Greetings..!!

I'm trying to parse some files from a shell script. The general structure of the file is:

..../snip/...

<item>
stuff
yada yada yada
</item>
<item>
....
blah
</item>
<item>
.. and so forth...

I want to extract the text between each line that begins with <item> and the next line that begins with </item>, save each extraction to a file, invoke a function to process that file, and so on, until no such sections are left in the input file.

I've this so far:

cat in.file | awk '/^<item>/{
getline;
while ( $0 !~ "^<\/item>" ) {
print $0
getline;
}
exit ;
}' >> /tmp/out.$$

This extracts only the first section of the input file between <item> and </item>.

Without the "exit;", it extracts every section of the input file that starts with <item> and ends with </item>, but all of them go into the same output file.

Any help will be much appreciated.

--AM
Rodney Hills
Honored Contributor

Re: Awk and rss feeds: awk question:

Rather than copying each section to a separate file, why not process each section as it appears? For instance:

awk -f awkprog in.file

awkprog contains-
/^<item>/{ITEM=1};
/^<\/item>/{ITEM=0};
ITEM == 1 { print "processing " $0 " now"}
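
As a rough sketch of where the same flag idea could go, each section can be piped to an external command as it is read. Everything named here is an assumption for illustration: the demo directory, the sample data, and "sort -r", which only stands in for whatever per-section script you actually want to run. Unlike the program above, this variant skips the tag lines themselves.

```shell
# Hypothetical demo input; the directory and file name are made up.
dir=${TMPDIR:-/tmp}/awk_pipe_demo.$$
mkdir -p "$dir"
printf '<item>\nstuff\nyada yada yada\n</item>\n<item>\nblah\n</item>\n' > "$dir/in.file"

# cmd is a placeholder for the real per-section command.
result=$(awk -v cmd="sort -r" '
/^<item>/   { inside = 1; next }              # opening tag: start a section, skip the tag line
/^<\/item>/ { inside = 0; close(cmd); next }  # closing tag: end the pipe so the command runs now
inside      { print | cmd }                   # feed the section lines to the command
' "$dir/in.file")

echo "$result"
```

Closing the pipe at each closing tag is what makes the command run once per section rather than once for the whole file.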

If you plan to handle sections nested within sections, several levels deep, then Perl might be a better tool for parsing and processing the file.

my 2 cents

Rod Hills
There be dragons...
James R. Ferguson
Acclaimed Contributor

Re: Awk and rss feeds: awk question:

Hi:

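You can redirect output from within 'awk'. It is good practice to close files once you are done with them. Note that close() is a function, so its argument goes in parentheses. For example:

# awk '{print $0 >> "/tmp/output"};END{close("/tmp/output")}' /etc/hosts

...would copy '/etc/hosts' to '/tmp/output' (appending if '/tmp/output' already exists).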
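
To show the redirect-and-close pattern with more than one output file, here is a minimal sketch. The demo directory, the input data, and the rule of naming each output after the first field are all invented for the example.

```shell
# Hypothetical demo input; the directory and file names are made up.
dir=${TMPDIR:-/tmp}/awk_close_demo.$$
mkdir -p "$dir"
printf 'a 1\nb 2\na 3\n' > "$dir/in.txt"

# Route each line to a file named after its first field, then close every file at the end.
awk -v dir="$dir" '
{
    out = dir "/" $1 ".txt"     # output name built from field 1
    print $2 > out              # ">" truncates on first open, later prints append
    seen[out] = 1               # remember which files were opened
}
END {
    for (f in seen) close(f)    # close() takes the same string used in the redirect
}' "$dir/in.txt"
```

With many distinct outputs it is usually safer to close each file as soon as it is finished, since an awk may only keep a limited number of files open at once.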

Regards!

...JRF...
Anu Mathew
Valued Contributor

Re: Awk and rss feeds: awk question:

Thanks.

Rodney, that would be somewhat similar to:

awk '/<item>/,/<\/item>/ {print "processing " $0 " now"}' inputfile

For every occurrence of <item> and </item>, I want to grab the text in between them, either to run a script on it or to save it in files named "out.1", "out.2", and so on.
James R. Ferguson
Acclaimed Contributor
Solution

Re: Awk and rss feeds: awk question:

Hi Anu:

Using your posted data, Rod's suggestion and mine, see if something like the following gives you what you want:

#!/usr/bin/awk -f
/^<item>/{ITEM=1;n++};
/^<\/item>/{ITEM=0;close("/tmp/out" n)};
{if (ITEM == 1) {
print "processing " $0 " in section=" n > ("/tmp/out" n)
}}

...your output will appear (in this case) in three files, named "/tmp/out1", "/tmp/out2" and "/tmp/out3".
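
If you want only the lines between the tags (without the "processing" prefix or the <item> line itself) and then a shell function run on each file, something along these lines might do. It is a sketch under assumptions: the demo directory, the sample data, the "out." prefix, and process_section are all placeholders, not anything from your real setup.

```shell
# Hypothetical demo input; the directory and file names are made up.
dir=${TMPDIR:-/tmp}/awk_split_demo.$$
mkdir -p "$dir"
printf 'header\n<item>\nstuff\nyada yada yada\n</item>\n<item>\nblah\n</item>\ntrailer\n' > "$dir/in.file"

# Write the body of each <item> section to out.1, out.2, ...
awk -v prefix="$dir/out." '
/^<item>/   { n++; out = prefix n; next }                   # new section: pick the next file name
/^<\/item>/ { if (out != "") close(out); out = ""; next }   # section done: close its file
out != ""   { print > out }                                 # inside a section: save the line
' "$dir/in.file"

# process_section is a placeholder for the real per-file processing.
process_section() {
    echo "processing $1"
    cat "$1"
}

for f in "$dir"/out.*; do
    process_section "$f"
done
```

Lines outside any section (here "header" and "trailer") are simply dropped.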

Regards!

...JRF...
Srini Jay
Valued Contributor

Re: Awk and rss feeds: awk question:

Hi Anu,
Hope this might help:

cat in.file | awk '/^<item>/{
i++;
# Read lines until the closing tag; stop at end of input so the loop cannot spin forever.
while ( (getline) > 0 && $0 !~ /^<\/item>/ ) {
print $0 >> ("out." i)
}
close("out." i)
}'

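In case 'i' doesn't start from 1, initialize it explicitly (note that BEGIN must be in upper case):

cat in.file | awk 'BEGIN{i=0}/^<item>/{
i++;
# Read lines until the closing tag; stop at end of input so the loop cannot spin forever.
while ( (getline) > 0 && $0 !~ /^<\/item>/ ) {
print $0 >> ("out." i)
}
close("out." i)
}'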