
parsing the FTP log

 
Rick Garland
Honored Contributor

parsing the FTP log

Hi all:

Working with FTP logs and parsing the data. I'm running into trouble when spaces are used in the file names. I have 2 example lines below. The 1st line has spaces in filename, the 2nd does not. How can I parse and report the filename when it has spaces?

Wed Oct 21 10:05:10 2009 1 10.20.29.32 0 /incoming/SR 1060258/Sanford & Sons Company 1060258.zip b _ i a IEUser@ ftp 0 * i

/incoming/SR1060258/Sanford_&_Sons_Company_1060258.zip b _ i a IEUser@ ftp 0 * i
Steven Schweda
Honored Contributor

Re: parsing the FTP log

> How can I parse and report the filename
> when it has spaces?

How do _you_ know where the end of the file
name is? Look for the last "@" and work
backward from there? I think that "sed"
could do that.
Rick Garland
Honored Contributor

Re: parsing the FTP log

That's part of the problem. The filename could have multiple spaces and with anonymous logins.
Steven Schweda
Honored Contributor

Re: parsing the FTP log

> The filename could have multiple spaces and
> with anonymous logins.

What's harder about _multiple_ spaces?

I don't do enough with an FTP server on HP-UX
to be particularly familiar with its log file
format, but (judging from these two example
lines) there seem to be some items with
reliable forms at the beginning of the line,
and some items with reliable forms at the end
of the line. I'd tend to expect whatever's
in between to be the file name.

I don't immediately see what would make this
particularly difficult. Is there more than
the obvious variability? (No spaces in the
anonymous FTP ID string, right?)
James R. Ferguson
Acclaimed Contributor

Re: parsing the FTP log

Hi Rick:

You could snip out the filename (with or without spaces) based on the fact that there is a standard number of fields defined for lines in '/var/adm/syslog/xferlog'.

This will report only the file name:

# cat ./snip_xferlog
#!/usr/bin/perl
use strict;
use warnings;
my ( @F, @left, @right );
while (<>) {
    @F = split;
    # The first eight and last nine fields are fixed; the rest is the file name.
    (@left)  = ( @F[ 0 .. 7 ] );
    (@right) = ( @F[ -9 .. -1 ] );
    for ( 0 .. @left - 1 ) {
        shift @F;
    }
    for ( 0 .. @right - 1 ) {
        pop @F;
    }
    print "@F\n";
}
1;

The idea is to snip off (shift) the first eight fields along with the last nine fields (using pop), leaving the middle fields (of however many).

Run as:

# ./snip_xferlog /var/adm/syslog/xferlog

or:

# LINE="Wed Oct 21 10:05:10 2009 1 10.20.29.32 0 /incoming/SR 1060258/Sanford & Sons Company 1060258.zip b _ i a IEUser@ ftp 0 * i"

# echo "${LINE}" | ./snip_xferlog

Regards!

...JRF...
Raj D.
Honored Contributor

Re: parsing the FTP log

Rick,
To parse the data and get the file names, you can use the code below.

Whether or not the file name contains spaces, the output will contain the file name:


# cat your_log_file | sed -e 's/\/incoming/\"\/incoming/g' -e 's/zip/zip\"/g' | awk -F'["]' '{print $2}'


Cheers,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "
Raj D.
Honored Contributor

Re: parsing the FTP log

Starting and Ending pattern taken: "/incoming" & "zip" respectively, then marked with a delimiter and separated from the log file with awk:



Output would be like:

/incoming/SR 1060258/Sanford & Sons Company 1060258.zip

/incoming/SR 1060258/Sanford & Dddds Company 1060258.zip
/incoming/SR 1060258/Zinford & Ucccs Company 1060258.zip
/incoming/SR 1060258/Caliord & Brros Company 1060258.zip

/incoming/SR1060258/Caliord&TTT_No_space_files_y1060258.zip
/incoming/SR1060258/CaliordTTT&TTTBrosCompany1060258.zip
/incoming/SR1060258/CaliordTTT&TTTBrosNo_space_Company 1060258.zip


Cheers,
Raj.

" If u think u can , If u think u cannot , - You are always Right . "
Steven Schweda
Honored Contributor

Re: parsing the FTP log

> Starting and Ending pattern taken:
> "/incoming" & "zip" respectively [...]

You're serious? Well, that handles at least
one file name, I suppose.
Raj D.
Honored Contributor

Re: parsing the FTP log

Steven,

Assuming all the ftp activity is from the /incoming directory.
If there is a file name not starting with /incoming, the code cannot capture that.

rgds,
Raj
" If u think u can , If u think u cannot , - You are always Right . "
Steven Schweda
Honored Contributor

Re: parsing the FTP log

> Assuming [...]

You're assuming more than I would.

> If there is a file name not starting with
> /incoming, the code cannot capture that.

And what about the ending?

As I said, "at least one file name". So, you
really _were_ serious. Scary.
Raj D.
Honored Contributor

Re: parsing the FTP log

Steven,

Sorry for the typo in earlier post, my bad! KBD problem..

Well, I understand that the script given above (with start/end pattern matching) is not correct. Thanks for pointing that out.





Here is the correct one:

# a={'/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'} ; cat logfile | sed $a | cut -d' ' -f10- | sed $a| cut -d' ' -f9-



#[ Where logfile is the filename.]
# The above command will parse the entire file and report the filename(s).


Now it is not "at least one file name", and not scary. :)


Cheers,
Raj.

" If u think u can , If u think u cannot , - You are always Right . "
Raj D.
Honored Contributor
Solution

Re: parsing the FTP log

# a={'/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'} ; cat logfile | sed $a | cut -d' ' -f10- | sed $a| cut -d' ' -f9-

/incoming/SR 1060258/Sanford & Sons Company 1060258.zip
/incoming/SR1060258/Sanford_&_Sons_Company_1060258.zip
" If u think u can , If u think u cannot , - You are always Right . "
Rick Garland
Honored Contributor

Re: parsing the FTP log

Raj:

Many thanks. I took some of the items you presented and was able to make it work. Essentially, at the end of the filename listing are the 'b _ i a' values. These are always the same (binary vs ascii, incoming vs outgoing, anonymous vs account).

Question, could you do a little explaining on the sed syntax you presented? It works and I may still use it, but I would like to know what it is doing.

Again, many thanks!
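
The trailing 'b _ i a' values mentioned above sit at fixed positions counted from the end of the line, so they can be labelled without knowing how many words the file name has. The following is only a sketch: the function name xfer_flags is made up for illustration, and the field meanings assume the wu-ftpd style xferlog layout (transfer type, special action flag, direction, access mode, user name, service, authentication method, authenticated user ID, completion status), which should be checked against the xferlog man page on the server in question.

```shell
#!/bin/sh
# Sketch: label some of the nine trailing xferlog fields by counting back from NF.
# Assumption: wu-ftpd style xferlog layout; xfer_flags is a hypothetical helper name.
xfer_flags() {
    awk '{
        printf "type=%s action=%s direction=%s access=%s user=%s status=%s\n",
            $(NF-8), $(NF-7), $(NF-6), $(NF-5), $(NF-4), $NF
    }' "$@"
}

# Sample line taken from the first post in this thread.
line='Wed Oct 21 10:05:10 2009 1 10.20.29.32 0 /incoming/SR 1060258/Sanford & Sons Company 1060258.zip b _ i a IEUser@ ftp 0 * i'
printf '%s\n' "$line" | xfer_flags
```

Because every field is addressed relative to NF, spaces inside the file name do not shift the result.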
James R. Ferguson
Acclaimed Contributor

Re: parsing the FTP log

Hi (again):

I'm curious Rick, why the Perl solution (and the explanation provided) was unsatisfactory.

Regards!

...JRF...
Rick Garland
Honored Contributor

Re: parsing the FTP log

JRF:

The perl solution works, I did some playing and found it working great. However, this project I am working on is being passed off to another admin and perl is not in her toolbox.

I am having to digress to shell.
James R. Ferguson
Acclaimed Contributor

Re: parsing the FTP log

Hi (again) Rick:

> The perl solution works, I did some playing and found it working great. However, this project I am working on is being passed off to another admin and perl is not in her toolbox.

Every server of any modern vintage has Perl installed!

You could do this easily in 'awk' given that you have a fixed format (the 'xferlog')!

# awk '{for (i=1;i<9;i++) {$i=""};for (i=0;i<9;i++) {$(NF-i)=""};gsub(/^[ ]*/,"");print}' xferlog

Regards!

...JRF...
James R. Ferguson
Acclaimed Contributor

Re: parsing the FTP log

Hi:

Oops, we should trim both leading and trailing spaces to be clean:

# awk '{for (i=1;i<9;i++) {$i=""};for (i=0;i<9;i++) {$(NF-i)=""};gsub(/^[ ]*/,"");gsub(/[ ]*$/,"");print}' xferlog

Regards!

...JRF...
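
One side effect of the awk approach is worth knowing: assigning to a field makes awk rebuild the record with single spaces, so a file name containing runs of several spaces comes out with them collapsed. If that matters, the same "eight fields off the front, nine off the back" idea can be sketched with plain sed, which leaves the middle of the line untouched. The function name xfer_name and the sample file name with double spaces are invented for this illustration, and the sketch assumes fields are separated by spaces only (no tabs).

```shell
#!/bin/sh
# Sketch: strip 8 leading and 9 trailing space-separated fields with sed,
# leaving the file name exactly as logged (runs of spaces are preserved).
# xfer_name is a hypothetical helper name; fields are assumed space-separated.
xfer_name() {
    sed -e 's/^\([^ ][^ ]*  *\)\{8\}//' \
        -e 's/\(  *[^ ][^ ]*\)\{9\}$//' "$@"
}

# Invented sample: same layout as the thread's example, with double spaces in the name.
line='Wed Oct 21 10:05:10 2009 1 10.20.29.32 0 /incoming/SR 1060258/Sanford  &  Sons.zip b _ i a IEUser@ ftp 0 * i'
printf '%s\n' "$line" | xfer_name
```

It can be run against the log directly, for example: xfer_name /var/adm/syslog/xferlog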
Rick Garland
Honored Contributor

Re: parsing the FTP log

True, you can find Perl just about anywhere. The issue with her toolbox is that she doesn't know it.
Raj D.
Honored Contributor

Re: parsing the FTP log

Rick,
The code a={'/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'} reverses the character order of each line using sed, so that the cut command gets clean fields to cut from the left.
First it reverses the characters and cuts the rhs fields (which have become the lhs), then it reverses again and cuts the lhs fields just before the "/incoming.../" field starts, leaving only the file names.

Raj.

" If u think u can , If u think u cannot , - You are always Right . "
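
The reverse, cut, reverse, cut sequence described above may be easier to follow one stage at a time. This sketch uses the same sed program as $a (without the surrounding braces) inside a function named rev_line, which is a name chosen here for illustration; the sample line is the thread's example with a shortened, invented file name. Like the original, it assumes fields are separated by exactly one space.

```shell
#!/bin/sh
# Sketch: the reverse / cut / reverse / cut pipeline, one stage at a time.
# rev_line (hypothetical name) holds the same sed program as $a in the post.
rev_line() {
    sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
}

# Sample line with a shortened, invented file name containing one space.
line='Wed Oct 21 10:05:10 2009 1 10.20.29.32 0 /incoming/SR 1060258/a b.zip b _ i a IEUser@ ftp 0 * i'

# Stage 1: reverse each line, so the nine trailing fields become fields 1-9.
s1=$(printf '%s\n' "$line" | rev_line)
# Stage 2: drop those nine fields.
s2=$(printf '%s\n' "$s1" | cut -d' ' -f10-)
# Stage 3: reverse back; the line now ends with the file name.
s3=$(printf '%s\n' "$s2" | rev_line)
# Stage 4: drop the eight leading fields, leaving only the file name.
s4=$(printf '%s\n' "$s3" | cut -d' ' -f9-)

printf '%s\n' "$s1" "$s2" "$s3" "$s4"
```

Printing all four stages shows why the reversal is needed: cut can only count fields from the left, so the line is flipped to bring the fixed trailing fields to the front.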
Raj D.
Honored Contributor

Re: parsing the FTP log

Rick,
Also as you said :
> Essentially, at the end of the filename listing are the 'b _ i a' values. These are always the same:

That is a very good point as far as the log file is concerned.
I found another approach, and the code below is the shortest and works best so far. Please take a look:



# (cut -d' ' -f9- |awk -F "b _ i" '{print $1}') < logfile



Cheers,
Raj.
" If u think u can , If u think u cannot , - You are always Right . "
Raj D.
Honored Contributor

Re: parsing the FTP log

# (cut -d' ' -f9- |awk -F "b _ i" '{print $1}') < logfile

/incoming/SR 1060258/Sanford & Sons Company 1060258.zip

/incoming/SR1060258/Sanford_&_Sons_Company_1060258.zip

" If u think u can , If u think u cannot , - You are always Right . "
Steven Schweda
Honored Contributor

Re: parsing the FTP log

> [...] awk -F "b _ i" [...]

And, once again, you _know_ that this string
will not appear in the file name? Counting
tokens back from the end still sounds safer
to me.


> Oct 21, 2009 18:04:14 GMT 0 pts
> Oct 21, 2009 20:22:37 GMT 0 pts
> Oct 21, 2009 20:36:43 GMT 0 pts

Let me guess, ...
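
The concern about splitting on "b _ i" can be shown with a made-up log line whose file name happens to contain that text. This is only a sketch: the file name "lab _ images.zip" and the function names by_delimiter and by_field_count are invented for the demonstration, and the field-counting version assumes the eight-leading, nine-trailing field layout discussed in this thread.

```shell
#!/bin/sh
# Sketch: parse a hypothetical line whose file name contains "b _ i",
# once by splitting on that delimiter and once by counting fields from both ends.
by_delimiter() {
    cut -d' ' -f9- | awk -F "b _ i" '{print $1}'
}
by_field_count() {
    awk '{
        # Fields 9 .. NF-9 make up the file name (joined with single spaces).
        name = $9
        for (i = 10; i <= NF - 9; i++) name = name " " $i
        print name
    }'
}

# Invented test line; only the file name differs from the thread's example.
line='Wed Oct 21 10:05:10 2009 1 10.20.29.32 0 /incoming/lab _ images.zip b _ i a IEUser@ ftp 0 * i'
printf '%s\n' "$line" | by_delimiter     # truncated at the first "b _ i"
printf '%s\n' "$line" | by_field_count   # full file name
```

The delimiter version stops at the first "b _ i" it sees, which here is inside the file name, while counting tokens back from the end is unaffected by the name's content.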