Operating System - Linux

Re: join problem with awk/printf

 
Scott Lindstrom_2
Regular Advisor

join problem with awk/printf

I have a script that outputs the result of the last backup for each host in this format:

hostnamea retcode policy_name date time

I now have a new requirement to join this with a file that contains the description of what runs on that host, e.g.:

hostnamea HR dev, DR dev

Up until now, I have been successful using join, and awk with printf. But now that the second file has a freeform 'second' field, I am having problems. Any ideas on how I can end up with the following output (formatted with printf):

hostnamea retcode policy_name date time HR dev, DR dev

TIA,
Scott
harry d brown jr
Honored Contributor

Re: join problem with awk/printf

Can you post examples of what you mean by "freeform"? I suspect that you mean it can have any number of words.

live free or die
harry d brown jr
Live Free or Die
Scott Lindstrom_2
Regular Advisor

Re: join problem with awk/printf

This phrase was an example:
HR dev, DR dev

(ie, HR development, Data Repository development)

Yes - the remainder of the line after the hostname can contain anything, including spaces and commas. That is where my problem lies.

Scott
harry d brown jr
Honored Contributor
Solution

Re: join problem with awk/printf

If you are saying that the second line in the file contains something like this:

hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff

then you can use "sed" or cut to grab the hostname out:

sed:
sed "s/^\([A-Za-z0-9]*\) \(.*\)/\1/"

cut:
cut -d" " -f1

to grab the additional stuff use cut again:

cut -d" " -f2-
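As a quick illustration of the two cut calls, here is a minimal sketch run against the sample line above. The variable names (line, host, desc) are made up for the example.

```shell
# Sample line taken from the post above.
line='hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff'

# Field 1 is the hostname; fields 2 onward are the free-form description.
host=$(printf '%s\n' "$line" | cut -d" " -f1)
desc=$(printf '%s\n' "$line" | cut -d" " -f2-)

printf 'host=[%s]\n' "$host"
printf 'desc=[%s]\n' "$desc"
```

Note that cut treats every single space as a delimiter, so if the hostname is followed by a run of spaces, the description will keep the extra leading spaces.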

If you want to transform the various strings like "HR development" into "HR dev" and "Data Repository development" into "DR dev" then that poses another challenge, especially if this is a free-form field that some user is typing the information into, and especially if they can't spell.

live free or die
harry d brown jr
Live Free or Die
Scott Lindstrom_2
Regular Advisor

Re: join problem with awk/printf

The second file is exactly as you state:

hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff

The problem is as soon as I use join I lose formatting. So I use awk with printf, but then I lose anything after the first word in field2 (I would only get "HR" output).

Basically I need to join and pipe into an awk printf when file2 has a variable number of fields.

Here is what I'm playing with that does not work:

join -j1 1 -j2 1 /tmp/std_backup_list3 /tmp/swinfo | awk '{printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", $1, $2, $3, $4, $5, $6, $7}'

Scott
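For what it's worth, the reason only "HR" comes through is that awk splits the joined line on whitespace, so $6 holds just the first word of the description and the rest lands in $7, $8 and so on. Below is a minimal sketch of one way around it, assuming the first five fields always come from the backup file. The temporary file names and the sample data are made up for illustration, and join is called with its default of joining on field 1 of both files.

```shell
# Hypothetical sample data standing in for the two real files.
tmp=$(mktemp -d)
printf '%s\n' 'hostnamea 0 policy_name 07/10/2006 12:36:09' > "$tmp/backups"
printf '%s\n' 'hostnamea HR dev, DR dev' > "$tmp/descriptions"

# join expects both inputs to be sorted on the join field (field 1 by default).
result=$(join "$tmp/backups" "$tmp/descriptions" |
awk '{
    # Fields 1-5 come from the backup file; everything after is description.
    desc = $6
    for (i = 7; i <= NF; i++)
        desc = desc " " $i
    printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", $1, $2, $3, $4, $5, desc
}')

printf '%s\n' "$result"
rm -rf "$tmp"
```

Rebuilding the description in a loop means runs of spaces inside it collapse to single spaces, which is probably fine for a field that is only being displayed.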
harry d brown jr
Honored Contributor

Re: join problem with awk/printf

So the "joined file" has a first line that contains the host name
and a second line that contains some free-form stuff, like this:

---------------------
hostnamea
HR development, Data Repository development, crazy stuff, more crazy stuff
---------------------

If this is the case, then try this:

"join stuff here" |
awk ' BEGIN { firsttime = 1 }
{
    if ( firsttime == 1 ) {
        hostis = $0
        firsttime = 0
    } else {
        print hostis, $0
        exit
    }
}
'

live free or die
harry d brown jr

[root@vpart1 /var/appl/perlscripts]# ./daher
hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff
[root@vpart1 /var/appl/perlscripts]#
Live Free or Die
harry d brown jr
Honored Contributor

Re: join problem with awk/printf

I was a little confused, but now I think this:

[root@vpart1 /var/appl/perlscripts]# cat dah1
hostnameA 0 policy_name date time
hostnameB 1 bad_policy nodate never
hostnameC 2 old_policy someday sometime
hostnameZ 8 good_policy goodday goodtime


[root@vpart1 /var/appl/perlscripts]# cat dah2
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB stupid stuff, more stupid stuff
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ Security Respository stuff, more backup stuff


[root@vpart1 /var/appl/perlscripts]# cat daher

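sort -k 1 dah1 dah2 |
awk ' BEGIN { firsttime = 1 }
{
    if ( firsttime == 1 ) {
        # Remember the backup-result line and wait for its description
        std_hostis = $1
        std_retcode = $2
        std_policy_name = $3
        std_date = $4
        std_time = $5
        firsttime = 0
    } else {
        if ( std_hostis == $1 ) {
            # Same host again: this is the description line.
            # Strip the leading hostname so it is not printed twice.
            desc = $0
            sub(/^[^ \t]+[ \t]+/, "", desc)
            printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", std_hostis, std_retcode, std_policy_name, std_date, std_time, desc
            firsttime = 1
        } else {
            # Different host: the previous line had no partner, start over
            std_hostis = $1
            std_retcode = $2
            std_policy_name = $3
            std_date = $4
            std_time = $5
            firsttime = 0
        }
    }
}
'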

live free or die
harry d brown jr
Live Free or Die
Scott Lindstrom_2
Regular Advisor

Re: join problem with awk/printf

Harry -

That looks like what I need! Let me give it a try and let you know.

Thanks!

Scott
Sandman!
Honored Contributor

Re: join problem with awk/printf

IMHO you need not use join or printf to get the proper formatting. Try the awk construct below, it does what you're trying to accomplish.

The file containing "hostnamea retcode policy_name date time" must precede the file containing "hostnamea HR dev, DR dev", otherwise the output will be...
"hostnamea HR dev, DR dev retcode policy_name date time"
instead of...
"hostnamea retcode policy_name date time HR dev, DR dev"

===============================================
awk '{
if(x[$1]=="")
x[$1]=$0
else
for(i=2;i<=NF;++i)
x[$1]=x[$1]" "$i
} END{for(i in x) print x[i]}' firstfile secondfile
===============================================
~hope it helps
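
One caveat with the construct above: awk does not guarantee any particular order for "for (i in x)", so the hosts may come out shuffled. A minimal sketch of a variant that prints hosts in the order they were first seen is below. The order array, the counter n, and the sample files are additions made up for this example and are not part of the original construct.

```shell
# Hypothetical sample data; firstfile is deliberately not sorted.
tmp=$(mktemp -d)
printf '%s\n' 'hostnameB 1 bad_policy nodate never' \
              'hostnameA 0 policy_name date time' > "$tmp/firstfile"
printf '%s\n' 'hostnameA HR dev, DR dev' \
              'hostnameB stupid stuff' > "$tmp/secondfile"

merged=$(awk '
{
    if (!($1 in x)) {
        # First time this host is seen: keep the whole line, note its position
        x[$1] = $0
        order[++n] = $1
    } else {
        # Host seen before: append everything after the hostname
        for (i = 2; i <= NF; ++i)
            x[$1] = x[$1] " " $i
    }
}
END {
    # Walk the hosts in first-seen order instead of using for (i in x)
    for (k = 1; k <= n; ++k)
        print x[order[k]]
}' "$tmp/firstfile" "$tmp/secondfile")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

As in the original, a host that appears in only one of the two files is still printed, just without the other half.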
Scott Lindstrom_2
Regular Advisor

Re: join problem with awk/printf

Harry - I think because my data is a bit different, the sort behaves differently, and the script gives the wrong results. The output of your sort command is like this:

sort -k 1 dah1 dah2
hostnameA 0 policy_name date time
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB 1 bad_policy nodate never
hostnameB stupid stuff, more stupid stuff
hostnameC 2 old_policy someday sometime
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ 8 good_policy goodday goodtime
hostnameZ Security Respository stuff, more backup stuff

Always the backup results file followed by the host description file.

My sort output looks more like this, regardless of which file is specified first in the sort command:

host1 (leading spaces) DTP QTP
host1 0 STD_host1 07/10/2006 12:36:09
host2 (leading spaces) BW DTP QTP
host2 0 STD_host2 07/11/2006 01:57:38
host3 (leading spaces) Non-SAP Development
host3 0 STD_host3 07/10/2006 12:26:33

Unless I can get the sort to operate the same, I think I need to move on from this task. I thank you for all your assistance; this has been a learning experience for me!

Scott
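
A likely explanation for the different ordering: "sort -k 1" uses a key that runs from field 1 to the end of the line, so the text after the hostname takes part in the comparison, and in the C locale a space sorts before a digit. That would put the description lines (hostname followed by several spaces) ahead of the backup lines. One thing that may be worth trying is to restrict the key to the hostname and ask for a stable sort, so lines with the same hostname stay in the order the files were given. This is only a sketch, assuming GNU sort (the -s option is not in POSIX) and using made-up sample files.

```shell
# Hypothetical sample data modelled on the host1/host2 lines above.
tmp=$(mktemp -d)
printf '%s\n' 'host2 0 STD_host2 07/11/2006 01:57:38' \
              'host1 0 STD_host1 07/10/2006 12:36:09' > "$tmp/backups"
printf '%s\n' 'host1      DTP QTP' \
              'host2      BW DTP QTP' > "$tmp/descriptions"

# -k1,1 compares the hostname only; -s keeps equal keys in input order,
# so each backup line stays ahead of its description line.
sorted=$(LC_ALL=C sort -s -k1,1 "$tmp/backups" "$tmp/descriptions")

printf '%s\n' "$sorted"
rm -rf "$tmp"
```

With the backup file listed first, the output should then have the same shape as Harry's sort output, and can be piped into his awk script unchanged.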