Re: Awk and rss feeds: awk question:

Anu Mathew · 12-14-2005

Greetings..!!

I'm trying to parse some files from a shell script. The general structure of the file is:

..../snip/...
<item>
stuff
yada yada yada
</item>
<item>
....
blah
</item>
.. and so forth...

I want to extract the text between lines that begin with <item> and lines that begin with </item>, save each extracted section to its own file, invoke a function to process that file, and repeat until there are no more such sections left in the input file.

Here's what I have so far:

awk '/^<item>/ {
    # the <item> line itself is skipped; copy lines up to </item>
    while ((getline) > 0 && $0 !~ /^<\/item>/)
        print $0
    exit    # stop after the first section
}' in.file >> /tmp/out.$$

This extracts only the first section of in.file between <item> and </item>.

Without the "exit;", it extracts all the sections in in.file that start with <item> and end with </item>, but they all end up in the same /tmp/out.$$ file.
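One caveat with loops built on getline: at end of input, getline returns 0 and leaves $0 unchanged, so if the last <item> is never closed, an unguarded loop of the form `while ($0 !~ ...) { print; getline }` keeps printing the same line forever. A minimal sketch of a loop that also checks getline's return value, using made-up sample input:

```sh
#!/bin/sh
# Sketch: a getline loop that also stops at end of input, so a final
# <item> with no closing </item> cannot make awk loop forever.
result=$(printf '%s\n' '<item>' 'first' '</item>' '<item>' 'unterminated' |
awk '/^<item>/ {
    n++
    # (getline) > 0 is false at EOF (0) or on error (-1)
    while ((getline) > 0 && $0 !~ /^<\/item>/)
        print n ": " $0
}')
printf '%s\n' "$result"
```

The second section has no closing tag, yet awk still terminates after printing its one line.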

Any help will be much appreciated.

--AM

Rodney Hills · 12-14-2005

Rather than copy each section to a separate file, why not process each section as it appears? For instance:

awk -f awkprog in.file

where awkprog contains:
/^<item>/{ITEM=1};
/^<\/item>/{ITEM=0};
ITEM == 1 { print "processing " $0 " now"}
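Along the same lines, a minimal sketch that buffers each section in memory and handles it as soon as its </item> line arrives, so no temporary files are needed. The handle() function and the sample input are hypothetical placeholders for the real processing and data:

```sh
#!/bin/sh
# Sketch: handle each <item>...</item> section as soon as its closing tag
# is read, with no temporary files. handle() is a hypothetical awk
# function standing in for the real processing.
result=$(printf '%s\n' '<item>' 'stuff' 'yada yada yada' '</item>' \
                       '<item>' 'blah' '</item>' |
awk '
function handle(n, body, count) {
    # Placeholder work: report the section number and its line count;
    # body holds the section text if the real handler needs it.
    printf "section %d has %d line(s)\n", n, count
}
/^<item>/   { n++; body = ""; count = 0; inside = 1; next }
/^<\/item>/ { if (inside) handle(n, body, count); inside = 0; next }
inside      { body = body $0 "\n"; count++ }
')
printf '%s\n' "$result"
```

The `next` statements keep the tag lines themselves out of the buffered text.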

If you plan to handle multiple levels of sections within sections, then Perl might be a better tool for parsing and processing the file.

my 2 cents

Rod Hills

There be dragons...

James R. Ferguson · 12-14-2005

Hi:

You can redirect output from within 'awk' itself. As is always good practice, remember to close files you are no longer using. For example:

# awk '{print $0 >> "/tmp/output"};END{close("/tmp/output")}' /etc/hosts

...would append a copy of '/etc/hosts' to '/tmp/output'.
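Inside awk, '>' truncates a file only when awk first opens it during a run (later prints in the same run append), while '>>' always appends, so rerunning a '>>' command keeps adding to the old output. A small sketch contrasting the two when the same awk command is run twice (the file names in the temporary directory are hypothetical):

```sh
#!/bin/sh
# Sketch: run the same awk copy twice with ">" and with ">>".
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'line one' 'line two' > "$workdir/input"

for run in 1 2; do
    awk -v out="$workdir/copy" '{ print > out } END { close(out) }' "$workdir/input"
    awk -v out="$workdir/append" '{ print >> out } END { close(out) }' "$workdir/input"
done

wc -l < "$workdir/copy"     # 2: each run starts a fresh copy
wc -l < "$workdir/append"   # 4: both runs' output accumulated
```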

Regards!

...JRF...

Anu Mathew · 12-14-2005

Thanks.

Rodney, that would be somewhat similar to:

awk '/<item>/,/<\/item>/ {print "processing " $0 " now"}' inputfile

For every <item> ... </item> pair, I want to grab the text between them and either run a script on it or save it as files "out.1", "out.2", etc.
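For the "run a script" case, awk can also pipe each section straight into an external command and close the pipe at </item>, which starts a fresh copy of the command for the next section. A minimal sketch; "wc -l" is only a stand-in for your own script, and the sample input is made up:

```sh
#!/bin/sh
# Sketch: pipe each <item>...</item> section into an external command.
# "wc -l" is only a stand-in for your own script (an assumption).
result=$(printf '%s\n' '<item>' 'stuff' 'yada yada yada' '</item>' \
                       '<item>' 'blah' '</item>' |
awk -v cmd='wc -l' '
/^<item>/   { inside = 1; next }
# close() waits for this run of cmd to finish, so the next
# section starts a fresh copy of the command
/^<\/item>/ { if (inside) close(cmd); inside = 0; next }
inside      { print | cmd }
')
printf '%s\n' "$result"
```

Each run of the command writes to awk's own standard output, so here the line counts of the two sections appear in order.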

James R. Ferguson · 12-14-2005

Hi Anu:

Using your posted data, Rod's suggestion and mine, see if something like the following gives you what you want:

#!/usr/bin/awk -f
/^<item>/{ITEM=1;n++};
/^<\/item>/{ITEM=0;close("/tmp/out" n)};
{if (ITEM == 1) {
print "processing " $0 " in section=" n > ("/tmp/out" n)
}}

...your output will appear (in this case) in three files, named "/tmp/out1", "/tmp/out2" and "/tmp/out3".

Regards!

...JRF...

Srini Jay · 12-15-2005

Hi Anu,
Hope this might help:

awk '/^<item>/{
i++;
while ((getline) > 0 && $0 !~ /^<\/item>/) {
print $0 >> ("out." i)
}
close("out." i)
}' in.file

In case 'i' doesn't start from 1, initialize it explicitly in a BEGIN block:

awk 'BEGIN{i=0}/^<item>/{
i++;
while ((getline) > 0 && $0 !~ /^<\/item>/) {
print $0 >> ("out." i)
}
close("out." i)
}' in.file
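Pulling the thread together, a minimal sketch of the full workflow from the original question: split each section into its own numbered file, then call a processing function on each one. It assumes every <item> and </item> tag starts its own line; the process_section function, the temporary directory and the out.N names are hypothetical placeholders.

```sh
#!/bin/sh
# Sketch: split every <item>...</item> section into its own numbered file,
# then call a processing function on each file. process_section, the
# temporary directory and the out.N names are hypothetical placeholders.
workdir=$(mktemp -d) || exit 1

# Sample input; assumes every tag starts its own line.
printf '%s\n' '<channel>' '<item>' 'stuff' 'yada yada yada' '</item>' \
              '<item>' 'blah' '</item>' '</channel>' > "$workdir/in.file"

# Hypothetical processing step: report how many lines the file holds.
process_section() {
    printf '%s: %s line(s)\n' "$1" "$(wc -l < "$1" | tr -d ' ')"
}

# Pass 1: write the body of section n (tags excluded) to out.n.
# printf "" creates the file even if the section turns out to be empty.
awk -v dir="$workdir" '
/^<item>/   { n++; file = dir "/out." n; printf "" > file; inside = 1; next }
/^<\/item>/ { if (inside) close(file); inside = 0; next }
inside      { print > file }
' "$workdir/in.file"

# Pass 2: process out.1, out.2, ... in order until one is missing.
i=1
while [ -f "$workdir/out.$i" ]; do
    process_section "$workdir/out.$i"
    i=$((i + 1))
done
```

Splitting and processing are kept in separate passes so the processing step can be any shell function or script; if no files are needed at all, the buffered in-awk approach is simpler.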
