topic Re: extract fields .... in Operating System - Linux

extract fields ....

lawrenzo_1 — Fri, 04 Apr 2008 15:15:12 GMT

Hello all,

I am trying to extract certain fields from each row in a file; however, the data I require isn't always in the same field.

Sample output:

16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4

The data I require is the date and the HOST and USER fields; however, the first "*" dictates 1-4 fields depending on what has been written to the file, so the fields I need are not at fixed positions.

As far as I am aware, the application that writes the file cannot be changed.

Any help is much appreciated.

Thanks

Re: extract fields ....

lawrenzo_1 — Fri, 04 Apr 2008 15:40:38 GMT

I have been reading about and looking at the awk "match" function:

awk '{match($0, HOST)};{print substr($0, RSTART, RLENGHT)}' file

but this doesn't work?

I know I will have to build the statement up, but any ideas to get me started?
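A likely reason the attempt above prints only empty lines: without quotes, HOST is an (empty) awk variable rather than the string "HOST", so match() matches the empty string at the start of the line; and RLENGHT is a misspelling of RLENGTH, so it is also an empty variable and substr() is asked for zero characters. A minimal corrected sketch follows; the wrapper name host_chunk and the HOST/USER values in the sample line are made up for illustration.

```shell
#!/bin/sh
# Corrected form of the match() attempt: the regex is quoted (here as a
# regex constant) and RLENGTH is spelled correctly.
host_chunk() {
    awk '{
        # "HOST=" followed by everything up to, but not including, the next ")"
        if (match($0, /HOST=[^)]*/))
            print substr($0, RSTART, RLENGTH)
    }' "$@"
}

# Hypothetical sample line; the HOST and USER values are made up.
sample='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=dbhost01)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
printf '%s\n' "$sample" | host_chunk    # prints HOST=dbhost01
```

Leaving the closing ")" out of the pattern means it never becomes part of the match, so no RLENGTH-1 adjustment is needed.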

Re: extract fields ....

Hein van den Heuvel — Fri, 04 Apr 2008 15:42:02 GMT

I would use perl... but then, I use perl for just about anything.

$ perl -ne '$user=$1 if /\(USER=([^)]+)/; $host=$1 if /\(HOST=([^)]+)/; print qq($user $host\n)' some-file

For an explanation, let's take the chunk:

$user=$1 if /\(USER=([^)]+)/

We make variable $user become the first remembered group ($1) if we match:

\(USER= : a piece of the string starting with an escaped (literal) opening parenthesis

([^)]+) : remember "()" a series "[]+" of characters that are NOT a closing parenthesis "^)"

Enjoy,
Hein.
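For comparison, a rough awk counterpart of the perl chunk, using the same \(USER=([^)]+) idea with match(). POSIX awk's match() has no capture groups (gawk adds a three-argument form), so this sketch skips the "(NAME=" prefix with RSTART/RLENGTH arithmetic instead of using $1. The helper val(), the wrapper user_host and the sample values are hypothetical.

```shell
#!/bin/sh
# Rough awk counterpart of the perl one-liner: print USER and HOST values.
user_host() {
    awk '
    # Hypothetical helper: value after "(name=" up to the next ")", or ""
    # when the field is missing or empty. re is a local (extra parameter).
    function val(line, name,    re) {
        re = "\\(" name "=[^)]+"
        if (match(line, re))
            # skip the "(" + name + "=" prefix, i.e. length(name) + 2 chars
            return substr(line, RSTART + length(name) + 2, RLENGTH - length(name) - 2)
        return ""
    }
    { print val($0, "USER"), val($0, "HOST") }' "$@"
}

# Hypothetical sample line; the HOST and USER values are made up.
sample='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=dbhost01)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
printf '%s\n' "$sample" | user_host    # prints oracle dbhost01
```

Unlike the perl one-liner, which keeps $user and $host from the previous line when a line has no match, this prints empty values for such lines.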

Re: extract fields ....

lawrenzo_1 — Fri, 04 Apr 2008 15:46:52 GMT

Thanks, Hein, for the perl solution; however, I was looking for an awk / sed example so that I can manipulate and adapt it if necessary!

Thanks

Chris .....

any awk?

Re: extract fields ....

Hein van den Heuvel — Fri, 04 Apr 2008 16:03:00 GMT

Lawrenzo... given the style of questions you frequently ask, you may want to look into picking up perl (or better).

Yes, awk's match() should allow you to get there, but I find it cumbersome to use.

I would solve it by going very generic first, then asking for a detail:
take the whole line, look for (name=value) chunks and add each chunk to an array.

Here is a final solution:

$ awk '{split("",y); split($0,pairs,/\)+\(/); for (i in pairs) {split(pairs[i],x,"="); y[x[1]]=x[2]}; print y["USER"],y["HOST"]}' file-name

You (hopefully) see you could pick any component to print.
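To pick components without editing the script each time, the names could be passed in with -v. A sketch, assuming a space-separated list in a variable called keys (a hypothetical name, as is the wrapper pick), built on the same ")(" split:

```shell
#!/bin/sh
# Generic pairs approach with the wanted names passed in via -v keys="...".
pick() {
    keys=$1; shift
    awk -v keys="$keys" '{
        split("", y)                       # clear values left over from the previous line
        n = split($0, pairs, /\)+\(/)      # chunks between ")(" or "))("
        for (i = 1; i <= n; i++) {
            eq = index(pairs[i], "=")
            if (eq)                        # name is left of the first "=", value is the rest
                y[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
        }
        m = split(keys, k, " ")            # space-separated list of names to print
        out = ""
        for (j = 1; j <= m; j++)
            out = out (j > 1 ? OFS : "") y[k[j]]
        print out
    }' "$@"
}

# Hypothetical sample line; the HOST and USER values are made up.
sample='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=dbhost01)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
printf '%s\n' "$sample" | pick "USER HOST"    # prints oracle dbhost01
```

With the same line, pick "SERVICE ARGUMENTS" would print LISTENER 64. The last chunk keeps the trailing text after VERSION=..., so asking for VERSION would return 153094144 followed by that text.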

Here are partial solutions to help you, and other interested readers, understand what is happening:

First, see if we can find 'pairs' by splitting the line on ")(" while tolerating "))(":

$ awk '{split($0,pairs,/\)+\(/); for (i in pairs) print pairs[i]}' x
HOST=
USER=
COMMAND=status
ARGUMENTS=64
:

Now pick apart each pair and print members.

$ awk '{split($0,pairs,/\)+\(/); for (i in pairs) {split(pairs[i],x,"="); print x[1],x[2]}}' x
HOST
USER
COMMAND status
ARGUMENTS 64

And finally, use those members to build a generic array, as in the final solution.

Hein.

Re: extract fields ....

Hein van den Heuvel — Fri, 04 Apr 2008 16:11:42 GMT

fwiw... it picks up those pieces even better if you simply allow for any number of opening parens and at least one close:

$ awk '{split($0,pairs,/\)+\(*/); for (i in pairs) print pairs[i]}' x
HOST=
USER=
COMMAND=status
ARGUMENTS=64
SERVICE=LISTENER
VERSION=153094144

Arguably the nested initial names are better picked up by allowing an even more generic "any close followed by any open":

$ awk '{split($0,pairs,/\)*\(*/); for (i in pairs) print pairs[i]}' x
DATA=
CID=
PROGRAM=
HOST=
USER=
COMMAND=status
ARGUMENTS=64
SERVICE=LISTENER
VERSION=153094144

Just food for thought.
ZERO points please.

Hein.
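Two portability points about the generic splits above: for (i in pairs) visits array elements in an unspecified order, and a pattern such as \)*\(* can match the empty string, which different awk implementations may treat differently in split(). A sketch that avoids both by swapping in the pattern [()]+ (any run of parentheses; this is not the pattern Hein used) and a numeric loop. The wrapper name list_pairs and the sample values are made up.

```shell
#!/bin/sh
# List the name=value chunks of each line, in their original order.
list_pairs() {
    awk '{
        n = split($0, pairs, /[()]+/)   # any run of "(" and/or ")" separates chunks
        for (i = 1; i <= n; i++)        # numeric loop keeps the original order
            if (index(pairs[i], "="))   # skip text such as "16:39:27 * "
                print pairs[i]
    }' "$@"
}

# Hypothetical sample line; the HOST and USER values are made up.
sample='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=dbhost01)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
printf '%s\n' "$sample" | list_pairs
```

For the sample line this prints DATA=, CID=, PROGRAM=, HOST=dbhost01, USER=oracle, COMMAND=status, ARGUMENTS=64, SERVICE=LISTENER and VERSION=153094144, one per line and in that order.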

Re: extract fields ....

Dennis Handly — Fri, 04 Apr 2008 23:16:33 GMT

>the data I require is the date, HOST and USER fields however the first "*" dictates 1-4 fields depending on what has been written to the file.

I'm not sure what you mean about that "*"?

You can use awk and pretend you are programming in C. Is the date enclosed in <> or is that just your metachars? If not in <>, how many fields?

awk '
{
# extract date stuff
...
# find HOST
h = index($0, "HOST=")
t = substr($0, h)
e = index(t, ")")
host = substr(t, 1, e-1)
# find USER
u = index($0, "USER=")
t = substr($0, u)
e = index(t, ")")
user = substr(t, 1, e-1)
print date, host, user
}' file

You could then write a function that does the work for HOST and USER:

awk '
function find_it(line, str,    h, t, e) {
h = index(line, str)
t = substr(line, h)
e = index(t, ")")
return substr(t, 1, e-1)
}
{
# extract date stuff
...
host = find_it($0, "HOST=")
user = find_it($0, "USER=")
print date, host, user
}' file
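Two details of find_it() are worth tightening. awk has no local declarations, so unless they are listed as extra parameters, h, t and e are globals. Also, when str is not on the line, index() returns 0 and the following substr()/index() calls end up returning text from the wrong part of the line. A sketch addressing both, which (unlike the function above) returns only the value without the "HOST=" prefix. The shell wrapper find_fields and the sample values are hypothetical, and the date handling is left out as in the original.

```shell
#!/bin/sh
find_fields() {
    awk '
    # Value after str up to the next ")", or "" when str is not on the line.
    # h, t and e are extra parameters, which makes them local to the function.
    function find_it(line, str,    h, t, e) {
        h = index(line, str)
        if (h == 0)
            return ""
        t = substr(line, h + length(str))   # text just after "HOST=" etc.
        e = index(t, ")")
        return e ? substr(t, 1, e - 1) : t
    }
    {
        host = find_it($0, "HOST=")
        user = find_it($0, "USER=")
        print host, user
    }' "$@"
}

# Hypothetical sample line; the HOST and USER values are made up.
sample='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=dbhost01)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
printf '%s\n' "$sample" | find_fields    # prints dbhost01 oracle
```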

Of course you can go with your first thought about using match:
awk '
{
match($0, /HOST=[^)]*\)/)
print substr($0, RSTART, RLENGTH-1)
}' file

Re: extract fields ....

Peter Nikitka — Sat, 05 Apr 2008 06:26:21 GMT

Hi,

I would consider using the char '=' as a delimiter and split out the value.
Something like this:
awk -F= '{for(i=2;i
mfG Peter
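Peter's example is cut off above, so here is one way the '=' delimiter idea could work (a sketch, not his code): with FS set to "=", each field from $2 onwards begins with the value whose name sits at the end of the previous field, after its last "(". The wrapper name eq_fields and the sample values are assumptions for illustration.

```shell
#!/bin/sh
eq_fields() {
    awk -F= '{
        host = user = ""
        for (i = 2; i <= NF; i++) {
            name = $(i-1)
            sub(/.*\(/, "", name)    # name: text after the last "(" of the previous field
            val = $i
            sub(/\).*/, "", val)     # value: text before the first ")" of this field
            if (name == "HOST") host = val
            else if (name == "USER") user = val
        }
        print host, user
    }' "$@"
}

# Hypothetical sample line; the HOST and USER values are made up.
sample='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=dbhost01)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
printf '%s\n' "$sample" | eq_fields    # prints dbhost01 oracle
```

This breaks down if a value itself contains "=", which the (name=value) format does not rule out.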

Re: extract fields ....

lawrenzo_1 — Mon, 07 Apr 2008 07:33:35 GMT

OK, that's great, and thanks all for the help.

Re: extract fields ....

lawrenzo_1 — Mon, 07 Apr 2008 08:27:24 GMT

This is what I used in the end:

awk '{match($0, /HOST=[^)]*\)/); host = substr($0, RSTART, RLENGTH-1); match($0, /USER=[^)]*\)/); user = substr($0, RSTART, RLENGTH-1); print $1, host, user}' file

04-APR-2008 HOST= USER=

cheers

Chris.

Re: extract fields ....

lawrenzo_1 — Mon, 07 Apr 2008 08:28:03 GMT

done