join problem with awk/printf

Scott Lindstrom_2 — Thu, 06 Jul 2006 14:01:36 GMT

I have a script that outputs the result of the last backup for each host in this format:

hostnamea retcode policy_name date time

I now have a new requirement to join this with a file that contains a description of what runs on each host, e.g.:

hostnamea HR dev, DR dev

Up until now, I have been successful using join, and awk with printf. But now that the second file has a freeform 'second' field, I am having problems. Any ideas on how I can end up with the following output (formatted with printf):

hostnamea retcode policy_name date time HR dev, DR dev

TIA,
Scott

Re: join problem with awk/printf

harry d brown jr — Thu, 06 Jul 2006 14:06:43 GMT

Can you post exampleS of what you mean by "freeform" ? I suspect that you mean it can have any number of words.

live free or die
harry d brown jr

Re: join problem with awk/printf

Scott Lindstrom_2 — Thu, 06 Jul 2006 14:09:41 GMT

This phrase was an example:
HR dev, DR dev

(ie, HR development, Data Repository development)

Yes - the remainder of the line after the hostname can contain anything, including spaces and commas. That is where my problem lies.

Scott

Re: join problem with awk/printf

harry d brown jr — Thu, 06 Jul 2006 14:25:58 GMT

If you are saying that a line in the second file contains something like this:

hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff

then you can use "sed" or cut to grab the hostname out:

sed:
sed "s/^\([A-Za-z0-9]*\) \(.*\)/\1/"

cut:
cut -d" " -f1

to grab the additional stuff use cut again:

cut -d" " -f2-

If you want to transform the various strings like "HR development" into "HR dev" and "Data Repository development" into "DR dev", then that poses another challenge, especially if this is a free-form field that some user is typing the information into, and especially if they can't spell.

live free or die
harry d brown jr
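
A minimal sketch of the cut approach described above, assuming each description line starts with the hostname followed by a single space. The sample line is the one quoted in this thread; the variable names are illustrative only.

```shell
# Sample description line, as quoted in this thread.
line='hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff'

# Field 1 is the hostname; fields 2 onward are the free-form description.
host=$(printf '%s\n' "$line" | cut -d' ' -f1)
desc=$(printf '%s\n' "$line" | cut -d' ' -f2-)

printf 'host=[%s]\ndesc=[%s]\n' "$host" "$desc"
```

Because cut treats every single space as a delimiter, this only behaves well when exactly one space follows the hostname.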

Re: join problem with awk/printf

Scott Lindstrom_2 — Thu, 06 Jul 2006 14:33:09 GMT

The second file is exactly as you state:

hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff

The problem is as soon as I use join I lose formatting. So I use awk with printf, but then I lose anything after the first word in field2 (I would only get "HR" output).

Basically I need to join and pipe into an awk printf when file2 has a variable number of fields.

Here is what I'm playing with that does not work:

join -j1 1 -j2 1 /tmp/std_backup_list3 /tmp/swinfo | awk '{printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", $1, $2, $3, $4, $5, $6, $7}'

Scott
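
One way to keep the whole description when formatting joined output is to glue fields 6 through NF back together before calling printf. This is only a sketch: the joined line below is made-up sample data in the layout join would produce, and it assumes the backup file always contributes exactly five fields.

```shell
# One joined line, in the layout join would produce (sample data, not real output).
joined='hostnamea 0 policy_name 07/10/2006 12:36:09 HR dev, DR dev'

formatted=$(printf '%s\n' "$joined" | awk '{
    # Fields 1-5 come from the backup file; everything from field 6 on
    # is the free-form description, so glue it back together.
    desc = $6
    for (i = 7; i <= NF; i++) desc = desc " " $i
    printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", $1, $2, $3, $4, $5, desc
}')
printf '%s\n' "$formatted"
```

Runs of blanks inside the description collapse to single spaces, because awk has already split the line into fields.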

Re: join problem with awk/printf

harry d brown jr — Thu, 06 Jul 2006 14:48:07 GMT

So the "joined file" has a first line that contains the host name
and a second line that contains some free-form stuff, like this:

---------------------
hostnamea
HR development, Data Repository development, crazy stuff, more crazy stuff
---------------------

If this is the case, then try this:

"join stuff here" |
awk ' BEGIN { firsttime = 1 }
{
    if ( firsttime == 1 ) {
        hostis = $0
        firsttime = 0
    } else {
        print hostis, $0
        exit
    }
}
'

live free or die
harry d brown jr

[root@vpart1 /var/appl/perlscripts]# ./daher
hostnamea HR development, Data Repository development, crazy stuff, more crazy stuff
[root@vpart1 /var/appl/perlscripts]#

Re: join problem with awk/printf

harry d brown jr — Thu, 06 Jul 2006 15:43:35 GMT

I was a little confused, but now I think this:

[root@vpart1 /var/appl/perlscripts]# cat dah1
hostnameA 0 policy_name date time
hostnameB 1 bad_policy nodate never
hostnameC 2 old_policy someday sometime
hostnameZ 8 good_policy goodday goodtime

[root@vpart1 /var/appl/perlscripts]# cat dah2
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB stupid stuff, more stupid stuff
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ Security Respository stuff, more backup stuff

[root@vpart1 /var/appl/perlscripts]# cat daher

sort -k 1 dah1 dah2 |
awk ' BEGIN { firsttime = 1 }
{
    if ( firsttime == 1 ) {
        std_hostis = $1
        std_retcode = $2
        std_policy_name = $3
        std_date = $4
        std_time = $5
        firsttime = 0
    } else {
        if ( std_hostis == $1 ) {
            printf "%-10s\t%s\t%-30s\t%s %s %-40s\n", std_hostis, std_retcode, std_policy_name, std_date, std_time, $0
            firsttime = 1
        } else {
            std_hostis = $1
            std_retcode = $2
            std_policy_name = $3
            std_date = $4
            std_time = $5
            firsttime = 0
        }
    }
}
'

live free or die
harry d brown jr

Re: join problem with awk/printf

Scott Lindstrom_2 — Thu, 06 Jul 2006 15:49:31 GMT

Harry -

That looks like what I need! Let me give it a try and let you know.

Thanks!

Scott

Re: join problem with awk/printf

Sandman! — Thu, 06 Jul 2006 17:10:18 GMT

IMHO you need not use join or printf to get the proper formatting. Try the awk construct below, it does what you're trying to accomplish.

The file containing "hostnamea retcode policy_name date time" must precede the file containing "hostnamea HR dev, DR dev", otherwise the output will be...
"hostnamea HR dev, DR dev retcode policy_name date time"
instead of...
"hostnamea retcode policy_name date time HR dev, DR dev"

===============================================
awk '{
    if (x[$1] == "")
        x[$1] = $0
    else
        for (i = 2; i <= NF; ++i)
            x[$1] = x[$1] " " $i
} END { for (i in x) print x[i] }' firstfile secondfile
===============================================
~hope it helps
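
A note on the construct above: awk does not specify the order in which "for (i in x)" visits array elements, so the merged lines may come out in any order. The sketch below is a variation, not Sandman's original, that remembers the order in which hosts are first seen. The file contents are sample data and the "order" array is a name introduced here for illustration.

```shell
tmp=$(mktemp -d)
cat > "$tmp/firstfile" <<'EOF'
hostnameB 1 bad_policy nodate never
hostnameA 0 policy_name date time
EOF
cat > "$tmp/secondfile" <<'EOF'
hostnameA HR dev, DR dev
hostnameB stupid stuff, more stupid stuff
EOF

merged=$(awk '
    # First sighting of a host: keep the whole line and remember its position.
    !($1 in x) { x[$1] = $0; order[++n] = $1; next }
    # Later sightings: append everything after the hostname.
    { for (i = 2; i <= NF; i++) x[$1] = x[$1] " " $i }
    # Print in first-seen order instead of the unspecified "for (i in x)" order.
    END { for (k = 1; k <= n; k++) print x[order[k]] }
' "$tmp/firstfile" "$tmp/secondfile")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

With this sample data the output follows the backup file: hostnameB first, then hostnameA, each with its description appended.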

Re: join problem with awk/printf

Scott Lindstrom_2 — Tue, 11 Jul 2006 12:01:42 GMT

Harry - I think because my data is a bit different, the sort works differently, and the script gives the wrong results. The output of your sort command is like this:

sort -k 1 dah1 dah2
hostnameA 0 policy_name date time
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB 1 bad_policy nodate never
hostnameB stupid stuff, more stupid stuff
hostnameC 2 old_policy someday sometime
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ 8 good_policy goodday goodtime
hostnameZ Security Respository stuff, more backup stuff

Always the backup results file followed by the host description file.

My sort output looks more like this, regardless of which file is specified first in the sort command:

host1 (leading spaces) DTP QTP
host1 0 STD_host1 07/10/2006 12:36:09
host2 (leading spaces) BW DTP QTP
host2 0 STD_host2 07/11/2006 01:57:38
host3 (leading spaces) Non-SAP Development
host3 0 STD_host3 07/10/2006 12:26:33

Unless I can get the sort to operate the same, I think I need to move on from this task. I thank you for all your assistance; this has been a learning experience for me!

Scott
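
The ordering difference described above is consistent with "sort -k 1" comparing from the first field to the end of the line, so the blanks after the hostname take part in the comparison. A possible workaround, sketched below, is to restrict the key to the hostname and ask for a stable sort. This assumes a sort that supports -s (GNU coreutils does; check your platform), and the two files are small samples modelled on the data in this thread.

```shell
tmp=$(mktemp -d)
# Backup results (sample rows from this thread).
cat > "$tmp/backup" <<'EOF'
host2 0 STD_host2 07/11/2006 01:57:38
host1 0 STD_host1 07/10/2006 12:36:09
EOF
# Host descriptions, with extra spaces after the hostname.
cat > "$tmp/desc" <<'EOF'
host1     DTP QTP
host2     BW DTP QTP
EOF

# -k1,1 limits the key to the hostname; -s keeps equal keys in input order,
# so each backup line stays ahead of its description line.
sorted=$(LC_ALL=C sort -s -k1,1 "$tmp/backup" "$tmp/desc")
printf '%s\n' "$sorted"
rm -rf "$tmp"
```

The backup file has to be named first on the command line, since the stable sort preserves that order for lines with the same hostname.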

Re: join problem with awk/printf

Sandman! — Tue, 11 Jul 2006 15:04:02 GMT

Hi Scott,

I'm inclined to pursue this a wee bit more, owing to the intriguing nature of the problem and because IMHO I think I've finally hit the nail on the head :)

1. sort each of the files individually on the first field
# sort -k1,1 /tmp/std_backup_list3 > /tmp/std_backup_list3.out
# sort -k1,1 /tmp/swinfo > /tmp/swinfo.out

2. join the sorted output files from above into a single output file
# join -1 1 -2 1 /tmp/std_backup_list3.out /tmp/swinfo.out > /tmp/all.out

~cheers
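
The two steps above can be tried on sample data as follows. This is a sketch under a few assumptions: the file names under the temporary directory are made up, LC_ALL=C is set so that sort and join agree on ordering, and -a 1 (a standard join option, not part of the commands above) is added so that hosts without a description are still listed.

```shell
tmp=$(mktemp -d)
cat > "$tmp/backup" <<'EOF'
host2 0 STD_host2 07/11/2006 01:57:38
host1 0 STD_host1 07/10/2006 12:36:09
host3 0 STD_host3 07/10/2006 12:26:33
EOF
cat > "$tmp/desc" <<'EOF'
host2     BW DTP QTP
host1     DTP QTP
EOF

# Use one collation for both sort and join, or join may complain about order.
export LC_ALL=C
sort -k1,1 "$tmp/backup" > "$tmp/backup.out"
sort -k1,1 "$tmp/desc"   > "$tmp/desc.out"

# -a 1 also prints backup lines that have no matching description.
joined=$(join -a 1 -1 1 -2 1 "$tmp/backup.out" "$tmp/desc.out")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

join writes its output fields separated by single spaces, so the extra blanks after the hostname in the description file disappear.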

Re: join problem with awk/printf

Greg Vaidman — Tue, 11 Jul 2006 16:30:41 GMT

Re: join problem with awk/printf

Hein van den Heuvel — Tue, 11 Jul 2006 23:07:28 GMT

Here is another approach, similar to Sandman's...

It treats s.txt as a reference file to 'cross' with.

The file b.txt is the backup log.

Awk does all the work, by storing records from the software file in an associative array.

No need to sort... the data will be in the backup log order:

C:\Temp>type s.txt
hostnameA HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameB stupid stuff, more stupid stuff
hostnameD weird stuff, more weird stuff
hostnameE eerie stuff, more eerie stuff
hostnameZ Security Respository stuff, more backup stuff

C:\Temp>type b.txt
hostnameA 0 policy_name date time
hostnameZ 8 good_policy goodday goodtime
hostnameB 1 bad_policy nodate never
hostnameC 2 old_policy someday sometime

C:\Temp>awk 'NR==FNR {key=$1; sub(key,""); S[key]=$0}
NR!=FNR {printf "%-10s\t%s\t%-30s\t%s %s %s\n", $1, $2, $3, $4, $5, S[$1]}' s.txt b.txt

hostnameA 0 policy_name date time HR development, Data Repository development, crazy stuff, more crazy stuff
hostnameZ 8 good_policy goodday goodtime Security Respository stuff, more backup stuff
hostnameB 1 bad_policy nodate never stupid stuff, more stupid stuff
hostnameC 2 old_policy someday sometime

The awk script decides which file a line comes from by comparing NR, the record number across all input, with FNR, the record number within the current file. If they are the same, then the line is from the first file.

fwiw,
Hein.
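
A small variation on the command above, offered as a sketch rather than a replacement: sub(key, "") treats the hostname as a regular expression and leaves the blank after it in place, so this version clears $1 and trims the leading space instead. The file contents are shortened sample data from this thread, written to a temporary directory.

```shell
tmp=$(mktemp -d)
cat > "$tmp/s.txt" <<'EOF'
hostnameA HR development, Data Repository development
hostnameB stupid stuff, more stupid stuff
EOF
cat > "$tmp/b.txt" <<'EOF'
hostnameA 0 policy_name date time
hostnameB 1 bad_policy nodate never
hostnameC 2 old_policy someday sometime
EOF

report=$(awk '
    # First file: blank out the hostname, trim the leading space, keep the rest.
    NR == FNR { key = $1; $1 = ""; sub(/^ /, ""); S[key] = $0; next }
    # Second file: print the five backup fields, then the stored description.
    { printf "%-10s\t%s\t%-30s\t%s %s %s\n", $1, $2, $3, $4, $5, S[$1] }
' "$tmp/s.txt" "$tmp/b.txt")
printf '%s\n' "$report"
rm -rf "$tmp"
```

Assigning to $1 makes awk rebuild the line with single spaces between fields, so any runs of blanks inside a description are collapsed. Hosts with no entry in s.txt, such as hostnameC here, are printed with an empty description.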