
extract fields ....

 
lawrenzo_1
Super Advisor


Hello all,

I am trying to extract certain fields from each row in a file; however, the data I require isn't always in the same field:

sample output

16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4

The data I require is the date, HOST and USER fields; however, the first "*" dictates 1-4 fields, depending on what has been written to the file.

As far as I am aware the application that writes the file cannot be changed.

any help is much appreciated

Thanks


hello
lawrenzo_1
Super Advisor

Re: extract fields ....

I have been reading about the awk "match" function:

awk '{match($0, HOST)};{print substr($0, RSTART, RLENGHT)}' file

but this doesn't work.

I know I will have to build the statement up, but any ideas to get me started?
hello
Hein van den Heuvel
Honored Contributor
Solution

Re: extract fields ....

I would use perl, but then I use perl for just about anything.

$ perl -ne '$user=$1 if /\(USER=([^\)]+)/;$host=$1 if /\(HOST=([^\)]+)/; print qq($user $host\n)' some-file

For an explanation, let's take the chunk:

$user=$1 if /\(USER=([^)]+)/

We make variable $user become the first remembered variable ($1) if we match

\(USER= : piece of string starting with escaped parenthesis

([^)]+) : remember "()", some series of "[]+", NOT closing parenthesis "^)"
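For readers who prefer sed, the same "remember what follows NAME= up to the closing parenthesis" idea can be sketched as below. This is only an illustration, not part of the original answer: the helper name value_of is made up, and the line is the sample from the question. It uses "*" rather than "+", so an empty value such as HOST= still matches and yields an empty string, whereas "+" needs at least one character.

```shell
# Sample line copied from the question.
line='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'

# value_of NAME (hypothetical helper): print the text between "(NAME=" and
# the next ")". In sed's basic regex syntax, \( ... \) is the "remember"
# group and a bare ( or ) is a literal parenthesis.
value_of() {
    printf '%s\n' "$line" | sed -n "s/.*($1=\([^)]*\)).*/\1/p"
}

command_value=$(value_of COMMAND)
host_value=$(value_of HOST)
printf 'COMMAND=[%s] HOST=[%s]\n' "$command_value" "$host_value"
```

On the sample line COMMAND comes back as "status" and HOST comes back empty, because nothing was written between "HOST=" and ")".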

Enjoy,
Hein.
lawrenzo_1
Super Advisor

Re: extract fields ....

Thanks Hein for the perl solution; however, I was looking for an awk / sed example so that I can manipulate and adapt it if necessary!

Thanks

Chris .....

any awk?
hello
Hein van den Heuvel
Honored Contributor

Re: extract fields ....

Lawrenzo... given the style of questions you frequently ask you may want to look into picking up perl (or better).

Yes, awk's match() should allow you to get there, but I find it cumbersome to use.

I would solve it by going very generic first, then ask for a detail.
Take that whole line, look for (name=value) chunks and add each chunk to an array.

Here is a final solution:

$ awk '{split($0,pairs,"\)+\("); for (i in pairs) {split(pairs[i],x,"="); y[x[1]]=x[2]}; print y["USER"],y["HOST"]}' file-name


You (hopefully) see you could pick any component to print.

Here are partial solutions to help you, and other interested readers, understand what is happening:

First see if we can find 'pairs' by splitting the line on ")(" while tolerating "))(".

$ awk '{split($0,pairs,"\)+\("); for (i in pairs) print pairs[i]}' x
HOST=
USER=
COMMAND=status
ARGUMENTS=64
:

Now pick apart each pair and print members.

$ awk '{split($0,pairs,"\)+\("); for (i in pairs) {split(pairs[i],x,"="); print x[1],x[2]}}' x
HOST
USER
COMMAND status
ARGUMENTS 64

And finally use those members to build a generic array as per solution.
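One thing to watch when the file has many lines: the array y is never emptied, so a name that is missing on one line would still show the value from an earlier line. A minimal sketch of the same idea that clears the array per line is shown below. The second input line is invented for illustration, the "|" separators are only there to make empty values visible, and the separator "[()]+" is a swapped-in choice that needs no backslashes.

```shell
# First line is the sample from the question; the second is a made-up line
# without a SERVICE pair, to show that no stale value leaks through.
line1='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'
line2='16:40:01 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=stop)) * data * 1'

result=$(printf '%s\n%s\n' "$line1" "$line2" | awk '{
    # Forget the pairs collected from the previous line.
    for (k in y) delete y[k]

    # Split on any run of parentheses.
    n = split($0, pairs, /[()]+/)
    for (i = 1; i <= n; i++) {
        # Keep only chunks that look like name=value.
        eq = index(pairs[i], "=")
        if (eq > 0)
            y[substr(pairs[i], 1, eq - 1)] = substr(pairs[i], eq + 1)
    }
    printf "%s|%s|%s\n", $1, y["COMMAND"], y["SERVICE"]
}')
printf '%s\n' "$result"
```

Swap COMMAND and SERVICE for USER and HOST to get the fields asked for in the question.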

Hein.
Hein van den Heuvel
Honored Contributor

Re: extract fields ....

fwiw... it picks up those pieces even better if you simply allow for any number of opening parens and at least one close:

$ awk '{split($0,pairs,"\)+\(*"); for (i in pairs) print pairs[i]}' x
HOST=
USER=
COMMAND=status
ARGUMENTS=64
SERVICE=LISTENER
VERSION=153094144

Arguably the nested initial names are better picked up by allowing an even more generic "any close followed by any open":

$ awk '{split($0,pairs,"\)*\(*"); for (i in pairs) print pairs[i]}' x
DATA=
CID=
PROGRAM=
HOST=
USER=
COMMAND=status
ARGUMENTS=64
SERVICE=LISTENER
VERSION=153094144
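Two caveats may be worth keeping in mind here. "for (i in pairs)" does not promise any particular order, and a separator in which every part is optional can also match an empty string, which awk implementations may not all treat the same way. A hedged sketch that sidesteps both, by requiring at least one parenthesis and walking the pieces by number, using the sample line from the question:

```shell
line='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'

pieces=$(printf '%s\n' "$line" | awk '{
    # "[()]+" is one or more parentheses of either kind.
    n = split($0, pairs, /[()]+/)
    # Piece 1 is the text before the first "(" and piece n is the text
    # after the last ")", so only the pieces in between are name=value.
    for (i = 2; i < n; i++)
        print pairs[i]
}')
printf '%s\n' "$pieces"
```

On the sample line this lists the nine name=value chunks from DATA= through VERSION=153094144, in the order they appear.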

Just food for thought.
ZERO points please.

Hein.
Dennis Handly
Acclaimed Contributor

Re: extract fields ....

>the data I require is the date, HOST and USER fields however the first "*" dictates 1-4 fields depending on what has been written to the file.

I'm not sure what you mean about that "*"?

You can use awk and pretend you are programming in C. Is the date enclosed in <> or that's just your metachars? If not in <>, how many fields?

awk '
{
    # extract date stuff
    ...
    # find HOST
    h = index($0, "HOST=")
    t = substr($0, h)
    e = index(t, ")")
    host = substr(t, 1, e-1)
    # find USER
    u = index($0, "USER=")
    t = substr($0, u)
    e = index(t, ")")
    user = substr(t, 1, e-1)
    print date, host, user
}' file

You could then write a function that does the work for HOST and USER:

awk '
function find_it(line, str) {
    h = index(line, str)
    t = substr(line, h)
    e = index(t, ")")
    return substr(t, 1, e-1)
}
{
    # extract date stuff
    ...
    host = find_it($0, "HOST=")
    user = find_it($0, "USER=")
    print date, host, user
}' file
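Note that find_it returns the label together with the value (for example "HOST=" followed by the host name). If only the value is wanted, a variant along the following lines might do. It is a sketch, not the code from this post: the function name value_of and the key MISSING are made up, the extra names in the parameter list are just awk's way of declaring local variables, and the "|" separators only make empty values visible.

```shell
line='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'

result=$(printf '%s\n' "$line" | awk '
# value_of (hypothetical name): text between "(key=" and the next ")".
function value_of(text, key,    start, rest, stop) {
    start = index(text, "(" key "=")
    if (start == 0)
        return ""                      # key is not on this line
    # Skip over "(", the key and "=" so only the value remains in front.
    rest = substr(text, start + length(key) + 2)
    stop = index(rest, ")")
    if (stop == 0)
        return rest                    # no closing parenthesis found
    return substr(rest, 1, stop - 1)
}
{
    printf "%s|%s|%s|%s\n", $1, value_of($0, "COMMAND"), value_of($0, "HOST"), value_of($0, "MISSING")
}')
printf '%s\n' "$result"
```

Searching for "(HOST=" rather than "HOST=" is a design choice: it avoids picking up a longer name that merely ends in HOST, should the file ever contain one.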

Of course you can go with your first thought about using match:
awk '
{
    match($0, "HOST=[^\)]*\)")
    print substr($0, RSTART, RLENGTH-1)
}' file
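As for why the very first attempt in this thread gave nothing useful: there HOST is written without quotes, so awk reads it as an empty variable rather than as text to search for, and RLENGHT is a misspelling of RLENGTH, so it is empty as well. Below is a small sketch of the match approach with a regex constant, which avoids backslashes inside a quoted string (some awk versions warn about those). It uses SERVICE from the sample line because HOST and USER are empty there; the variable names are illustrative.

```shell
line='16:39:27 * (DATA=(CID=(PROGRAM=)(HOST=)(USER=))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153094144)) * data * 1 2 3 4'

result=$(printf '%s\n' "$line" | awk '{
    service = ""
    # "[^)]*" stops at the first ")" so the parenthesis is never part of
    # the match; match() sets RSTART and RLENGTH when it succeeds.
    if (match($0, /SERVICE=[^)]*/)) {
        skip = length("SERVICE=")
        service = substr($0, RSTART + skip, RLENGTH - skip)
    }
    print $1, service
}')
printf '%s\n' "$result"
```

Testing the return value of match() keeps a line without the name from printing leftovers of an earlier match.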
Peter Nikitka
Honored Contributor

Re: extract fields ....

Hi,

I would consider using the char '=' as a delimiter and split out the value.
Something like this:
awk -F= '{for(i=2;i
Regards, Peter
The Universe is a pretty big place, it's bigger than anything anyone has ever dreamed of before. So if it's just us, seems like an awful waste of space, right? Jodie Foster in "Contact"
lawrenzo_1
Super Advisor

Re: extract fields ....

OK, that's great, and thanks all for the help.
hello
lawrenzo_1
Super Advisor

Re: extract fields ....

this is what I used in the end:

awk '{match($0, "HOST=[^\)]*\)")} {host = substr($0, RSTART, RLENGTH-1)};{match($0, "USER=[^\)]*\)")} {user = substr($0, RSTART, RLENGTH-1)};{print $1,host,user}' file

04-APR-2008 HOST= USER=

cheers

Chris.
hello
lawrenzo_1
Super Advisor

Re: extract fields ....

done
hello