
Re: awk

 
KT_4
New Member

awk

I am using awk to extract the 2nd column (a list of file names) out of a file like this:

07:37:46 file1.txt 02/12/06
07:35:48 file 2.txt 02/12/06
07:36:55 file3.txt 02/12/06

Notice that the 2nd record has a space as part of the file name, so awk treats it as a field separator. Is there a way to get awk to include the space so that I get the entire file name extracted from the 2nd record?
14 REPLIES
Robert Bennett_3
Respected Contributor
Solution

Re: awk

try - awk '{print $2,$3}' infile > outfile
"All there is to thinking is seeing something noticeable which makes you see something you weren't noticing which makes you see something that isn't even visible." - Norman Maclean
Indira Aramandla
Honored Contributor

Re: awk

Hi KT,

Since your text file has lines with the common words "file" and "txt",

you can do a global replacement that puts a comma in front of "file" and after "txt". To do this, open the file in vi and run
:1,$s/file/,file/g and then save the file.
Then run :1,$s/txt/txt,/g and save the file again.

Eg:-
Original file contents
07:37:46 file1.txt 02/12/06
07:35:48 file 2.txt 02/12/06
07:36:55 file3.txt 02/12/06

The modified file contents
07:37:46 ,file1.txt, 02/12/06
07:35:48 ,file 2.txt, 02/12/06
07:36:55 ,file3.txt, 02/12/06

Now you can use awk to list the file names as follows.
awk -F, '{ print $2 }' filename

This will output the file names.
file1.txt
file 2.txt
file3.txt
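
The same idea can probably be done without editing the file by hand. A minimal sketch, assuming (as in the sample) that every name starts with "file" and ends with "txt"; the function name names_by_comma is made up for illustration:

```shell
# Hypothetical helper: insert the commas on the fly with sed,
# then let awk split on them. Reads the named files, or stdin if none.
# Assumes "file" and "txt" each occur once per line, as in the sample data.
names_by_comma() {
    sed -e 's/file/,file/' -e 's/txt/txt,/' "$@" | awk -F, '{ print $2 }'
}

# Example run on two of the sample lines
printf '%s\n' '07:37:46 file1.txt 02/12/06' \
              '07:35:48 file 2.txt 02/12/06' | names_by_comma
```

This leaves the original file untouched, but it breaks as soon as a name does not follow the file...txt pattern.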


Indira A


Never give up, Keep Trying
Muthukumar_5
Honored Contributor

Re: awk

Hi,

we can do it as follows:

# cat > testfile
07:37:46 file1.txt 02/12/06
07:35:48 file 2.txt 02/12/06
07:36:55 file3.txt 02/12/06

# awk '{ if (NF==4) { print $2, $3 } else { print $2 } }' testfile
file1.txt
file 2.txt
file3.txt

It checks whether an input line contains 4 fields. If so, it prints the 2nd and 3rd fields joined by a blank; otherwise it prints only the 2nd.

HTH.
Easy to suggest when don't know about the problem!
Thierry Poels_1
Honored Contributor

Re: awk

how about a cryptic sed :)

sed 's;^\(..:..:.. \)\(.*\)\( ../../..\)$;\2;' yourfile

regards,
Thierry.
All unix flavours are exactly the same . . . . . . . . . . for end users anyway.
Jdamian
Respected Contributor

Re: awk

Hi.

I prefer to think a little before answering. I assume the following:

a) first field is ALWAYS a timestamp in the format HH:MM:SS
b) last field is ALWAYS a date in the format xx/yy/zz...
then first and last fields won't contain blanks.

c) everything between first and last fields is a file name.

My first proposal would be:

awk '{ NAME=""; for(i=2; i
It works fine if blank chars in file name are single (i.e. not more than one blank) and file name doesn't begin nor end in blank chars -- I assume that JUST a blank char is used as separator in the original line.

A better proposal would remove the first field (plus the blank char after it) and the last field (plus the blank char before it) from the original line, leaving only the file name.
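
The one-liner for the first proposal appears cut off in this thread. A sketch of how that approach might look, joining fields 2 through NF-1 with single blanks (this is a reconstruction under assumptions a) to c) above, not necessarily the original command; join_middle_fields is a made-up name):

```shell
# Hypothetical helper: rebuild the name from every field between
# the first (time) and the last (date), separated by one blank each.
join_middle_fields() {
    awk '{
        NAME = ""
        for (i = 2; i < NF; i++)
            NAME = NAME (i > 2 ? " " : "") $i   # blank only between parts
        print NAME
    }' "$@"
}

# Example run on the record with a blank in the name
printf '%s\n' '07:35:48 file 2.txt 02/12/06' | join_middle_fields
```

Because awk has already split the line on runs of whitespace, repeated blanks inside a name would come out as a single blank, which is the limitation mentioned above.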
Keith Bryson
Honored Contributor

Re: awk

Assuming your file is in /tmp/testfile:

cat /tmp/testfile | while read a
do
templength=`echo $a | wc -m`
let length=templength-10
echo $a | cut -c10-$length
done

This assumes that you always have a time record at the start and a date record at the end. Both are always the same length, so the script strips a fixed number of characters from the front and the end of each line (the time or date plus the blank next to it).
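
The same fixed-width idea could also be written as a single awk call. A minimal sketch, assuming the time is exactly 8 characters (HH:MM:SS), the date is exactly 8 characters, each is separated from the name by one blank, and there is no leading or trailing whitespace; strip_fixed_width is a made-up name:

```shell
# Hypothetical helper: the name starts at column 10 (8 chars of time + 1 blank)
# and the line ends with 1 blank + 8 chars of date, so 18 chars in total
# are not part of the name.
strip_fixed_width() {
    awk '{ print substr($0, 10, length($0) - 18) }' "$@"
}

# Example run
printf '%s\n' '07:35:48 file 2.txt 02/12/06' | strip_fixed_width
```

Since it works on character positions rather than fields, blanks inside the name, even several in a row, are kept as they are.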

All the best - Keith
Arse-cover at all costs
Jdamian
Respected Contributor

Re: awk

my ultimate proposal is:

awk '
{
LINE=$0 # needed to preserve original line and fields.
sub("^[[:blank:]]*" $1 ".", "", LINE) # remove first field plus a separator char

sub("." $NF "[[:blank:]]*$", "", LINE) # remove a separator char plus last field

print LINE
}'

one doubt: I'm not sure if char class [[:blank:]] is more suitable than [[:space:]].
Gordon  Morrison
Trusted Contributor

Re: awk

Why be AWKward? I have always found AWK syntax to be tortuously finicky and impossible to remember, and I have always found that there is another way (usually built into ksh) that does the same thing in a much easier and friendlier way.
Read a line from your file into a variable (in this case, ${line}), then let ksh do it:

temp=${line#* }
fname=${temp% *}

This works whether or not there is a space in the filename on $line.
What does this button do?
Thierry Poels_1
Honored Contributor

Re: awk

yep, nice one.

while read line
do
temp=${line#* }
fname=${temp% *}
echo $fname
done < tt

10 points to Gordon.
All unix flavours are exactly the same . . . . . . . . . . for end users anyway.
Elmar P. Kolkman
Honored Contributor

Re: awk

Simple, (almost) human readable perl version would be:

perl -e 'while (@input=split(" ",<>)) {
shift @input;
pop @input;
print join(" ",@input), "\n";
}' yourfile

(put it in an array, drop the first and last element and join it again.)
Every problem has at least one solution. Only some solutions are harder to find.
Hein van den Heuvel
Honored Contributor

Re: awk

Since your lines are all well formed with that 'time name date' format, a simple REGexpr will grab the name:


perl -ne 'print "$1\n" if (/\d\d\s+(.*)\s+\d\d/)' x


This says:
\d\d\s+ = find two digits followed by whitespace
(.*) = remember everything
\s+\d\d = until whitespace followed by two digits is seen.

fwiw,
Hein.
Leif Halvarsson_2
Honored Contributor

Re: awk

Hi,
Perhaps there is no easy solution. Can there be more than one whitespace character in a name, and do the names always begin with "file" and end with "txt"? Probably not.

How is the input file created? Is it possible to change the field separator from space to tab? If it is, then it is easy to change the spaces to "_" using tr (and there will always be only 3 fields).
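
A rough sketch of that idea, assuming the file could be produced with tabs between the three fields (the sample in this thread is not, so this is hypothetical; names_from_tsv is a made-up name):

```shell
# Hypothetical helper, reads tab-separated records from stdin.
# tr turns the blanks inside names into "_" and leaves the tabs alone,
# so awk's default splitting then always sees exactly 3 fields.
names_from_tsv() {
    tr ' ' '_' | awk '{ print $2 }'
}

# Example run on a tab-separated record
printf '07:35:48\tfile 2.txt\t02/12/06\n' | names_from_tsv
```

Note that the names come out with "_" in place of blanks, so they would need to be mapped back if the real file names are required.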
KT_4
New Member

Re: awk

Thank you all who replied!
Rafael Santander
New Member

Re: awk

Let's say the input data was the following:
"time filename date
07:37:46 file1.txt 02/12/06
07:35:48 file2.txt 02/12/06
07:36:55 file3.txt 02/12/06
"

We want to pull out only the filenames, without the header. An additional twist, if you're up for it, is to write out the filenames as comma-separated items.

IE: file1.txt, file2.txt, file3.txt

I have the following:

awk '{print $2}' | sed 's/filename//g'

This line only removes the filename string but leaves a blank line where it was. Is it possible to completely remove the line?

thanks
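
One possible way to handle both parts, as a sketch only: let awk skip the first record (NR > 1) instead of blanking it out afterwards with sed, and build the comma-separated list in the same pass. It assumes the header is always the first line and that names contain no blanks, as in the sample above; names_csv is a made-up name:

```shell
# Hypothetical helper: skip the header line and any empty lines,
# then print field 2 of each remaining record on one line, joined by ", ".
names_csv() {
    awk 'NR > 1 && NF { printf "%s%s", sep, $2; sep = ", " }
         END          { print "" }' "$@"
}

# Example run on the sample data
printf '%s\n' 'time filename date' \
              '07:37:46 file1.txt 02/12/06' \
              '07:35:48 file2.txt 02/12/06' \
              '07:36:55 file3.txt 02/12/06' | names_csv
```

If only the header needs to go and one name per line is fine, awk 'NR > 1 { print $2 }' should be enough; with sed, deleting the line (sed '/filename/d') rather than substituting an empty string avoids the leftover blank line.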