Operating System - HP-UX
Urgent Question about grepping thru the logs
09-17-2009 02:04 PM
I have a list of fraudulent IPs (~2000) that I need to search for in my Apache web logs. I have the logs (~450) from all my web servers in one place, covering the last 3 months. What would be the best way to grep for those IPs in the gzipped logs?
Please help!
Thanks,
Allan
Solved!
- Tags:
- grep
09-17-2009 02:32 PM
Re: Urgent Question about grepping thru the logs
09-17-2009 02:57 PM
Solution
You might create a file of your IP addresses, one per line, called '/tmp/IPS', and then do:
#!/usr/bin/sh
cd /path_to_logs
for FILE in *
do
    echo ">>> '${FILE}' <<<"
    gzcat "${FILE}" | grep -f /tmp/IPS
done
Regards!
...JRF...
09-17-2009 05:10 PM
Re: Urgent Question about grepping thru the logs
Any way to speed it up?
Thanks,
Allan.
09-17-2009 05:17 PM
Re: Urgent Question about grepping thru the logs
The gzcat step is probably consuming all your CPU time by decompressing each file. If you've got the room, you can speed things up by removing it (i.e., storing the logs uncompressed).
09-17-2009 05:50 PM
Re: Urgent Question about grepping thru the logs
Thanks,
Allan
09-17-2009 06:39 PM
Re: Urgent Question about grepping thru the logs
Given the number of lines, you may want to help grep a little, if you can, by not having just the IPs there, but perhaps ANCHORING them to the beginning of the line (^aa.bb.cc.dd) to allow for a quicker yea-or-nay decision.
(If appropriate... you did not share any log layout.)
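The anchoring (plus escaping the dots, which Hein raises again below) could be sketched like this; the file names are made up for the demo, and it only helps if the IP really does start each log line:

```shell
# Hypothetical sketch: turn a plain IP list into dot-escaped patterns
# anchored to the start of the line, for use with grep -f.
printf '1.2.3.4\n10.20.30.40\n' > /tmp/IPS

# Escape each '.' so it no longer matches any character, then prepend '^'.
sed -e 's/\./\\./g' -e 's/^/^/' /tmp/IPS > /tmp/IPS.anchored

cat /tmp/IPS.anchored
# ^1\.2\.3\.4
# ^10\.20\.30\.40
```

With '^' in front, grep can reject most non-matching lines after looking at only the first character or two.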
As expressed earlier, it is not unlikely to be gzcat which consumes the most resources. You really should verify that (with top?).
If 'grep' is the top consumer, then consider re-writing in AWK or PERL: initially load those 2000 IPs into an associative array, then read the log, find the IP, and look it up in the array.
Something roughly like:
$ cat > IP.tmp
1.2.3.4
2.3.4.5
4.5.6.7
$ cat > LOG.tmp
aap 5.6.7.8
noot 1.2.3.4
mies 4.3.2.1
$ awk 'BEGIN {while (getline ip < "IP.tmp"){ips[ip]=1}} $2 in ips' LOG.tmp
noot 1.2.3.4
Good luck!
Hein van den Heuvel
HvdH Performance Consulting
09-17-2009 06:40 PM
Re: Urgent Question about grepping thru the logs
How many processors are in your server? If you have fewer than 6, you may do more harm than good.
I would run one fewer script than the number of processors in the system (4 processors -- 3 scripts running).
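One way to sketch that "one scan per spare processor" idea is with xargs. Note the hedges: -P is a GNU/BSD xargs extension that HP-UX xargs may lack, gzip -dc stands in for gzcat, and all paths and sample data here are invented for the demo:

```shell
# Build a tiny demo: two gzipped "logs" and an IP list (all names made up).
mkdir -p /tmp/parallel_demo
printf 'noot 1.2.3.4\naap 5.6.7.8\n' > /tmp/parallel_demo/web1.log
printf 'mies 9.9.9.9\n'              > /tmp/parallel_demo/web2.log
gzip -f /tmp/parallel_demo/web1.log /tmp/parallel_demo/web2.log
printf '1.2.3.4\n9.9.9.9\n' > /tmp/parallel_demo/IPS

# -n 1 hands each worker a single file; -P 3 keeps at most three scans
# running at once (one fewer than a hypothetical 4-CPU box).
# gzip -dc is the portable spelling of gzcat.
find /tmp/parallel_demo -name '*.log.gz' -print \
  | xargs -n 1 -P 3 sh -c 'gzip -dc "$1" | grep -F -f /tmp/parallel_demo/IPS' _
# prints the matching line from each log (order may vary between runs)
```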
09-17-2009 07:03 PM
Re: Urgent Question about grepping thru the logs
> re-writing in AWK or PERL [...]
Sometimes it pays to write a real computer program in a real, compiled programming language. C, for example, is popular these days. Or so I hear. (I think that it even has arrays.)
Sorry if this sounds too radical.
09-17-2009 07:30 PM
Re: Urgent Question about grepping thru the logs
> program in a real, compiled programming language.
:-)
Yes. And hashed lookups and all that good stuff. Thank you, Steve. We needed that quick sanity check.
Actually, it would not surprise me if awk just did a linear search for array keys, which would suck (CPU).
Best I know, Perl builds an index tree, but that may be wishful thinking. I have never needed to find out. But some day...
Hein.
09-17-2009 11:06 PM
Re: Urgent Question about grepping thru the logs
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6179 root 25 0 68140 7824 652 R 100 0.0 242:11.62 grep
6739 root 25 0 64828 4516 648 R 100 0.0 93:35.20 grep
6771 root 25 0 67736 7420 652 R 100 0.0 78:34.72 grep
6915 root 18 0 67864 7492 652 R 100 0.0 15:52.94 grep
6919 root 25 0 68136 7828 652 R 100 0.0 14:33.67 grep
6800 root 25 0 68140 7780 652 R 100 0.0 65:17.68 grep
6799 root 18 0 4116 484 324 S 0 0.0 0:04.15 zcat
If PERL or AWK can help in speeding this up, can someone specify how, so that gzcat and grep can be replaced?
09-18-2009 01:31 AM
Re: Urgent Question about grepping thru the logs
You can replace the grep but you'll still need the gzcat.
You can try fgrep so it doesn't need to do pattern matching. (Otherwise you would also have to quote the "." in your IPs.)
The grep source shows it does read the -f file into memory.
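A minimal sketch of the fgrep suggestion, with invented demo file names: -F (the modern spelling of fgrep) treats each pattern as a fixed string, so the dots in the IPs stop acting as single-character wildcards:

```shell
# Demo data: one genuine hit and one line that a regex dot would
# wrongly match (file names are made up).
printf 'noot 1.2.3.4\n1x2x3x4 y\n' > /tmp/fgrep_demo.log
printf '1.2.3.4\n' > /tmp/fgrep_demo.ips

# Fixed-string match: only the literal "1.2.3.4" line is found.
grep -F -f /tmp/fgrep_demo.ips /tmp/fgrep_demo.log
# noot 1.2.3.4

# Plain grep -f would also match "1x2x3x4", since '.' matches any character.
```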
09-18-2009 03:45 AM
Re: Urgent Question about grepping thru the logs
My example was supposed to show that.
The assumption in that example was that you can readily find the IP address in the log file as being in a fixed 'word' or 'column'.
For the purpose of the example I used the second field/word, represented by the '$2 in ips'
As I hinted already, you'll need to provide us with a representative snippet from your log to help you grab the IP out of it.
And do heed Dennis's advice (and mine) to carefully construct the grep search file as quick-failing regular expressions with as few wildcards as possible, notably the '.' in the IP addresses.
For example, using my sample files and adding 2 records:
$ cat >> LOG.tmp
1.2.3.4 noot
1x2x3x4
now returns:
$ grep -f IP.tmp LOG.tmp
noot 1.2.3.4
1.2.3.4 noot
1x2x3x4
The two new lines should probably NOT be found. And they probably would not occur in the real log, but in the meantime grep is wasting CPU cycles trying to look for them!
Hein.
09-18-2009 05:01 AM
Re: Urgent Question about grepping thru the logs
> Dennis: You can try fgrep so it doesn't need to do pattern matching. (Otherwise you would also have to quote the "." in your IPs.)
Yes, I realized that too after I posted before dinner last night. The quoting can be handled by Perl (below).
> Dennis: The grep source shows it does read the -f file into memory.
I have often wondered about that! I assumed that it would, for speed. Thanks very much for looking.
Anyway, here's a quick Perl script that might speed things up. The dot characters in the "token" file of IP addresses are escaped "automatically". The script stops analyzing the pattern list as soon as a match to a line in the file is found and moves along to the next line of the file.
# cat ./ngrep
#!/usr/bin/perl
use strict;
use warnings;
my @tokens;
sub loadtokens {
my ($file) = @_;
local $/;
my $fh;
open( $fh, '<', $file ) or die "Can't open '$file'\n";
$_ = <$fh>;
@tokens = split;
}
my $tokenfile = shift or die "Token file expected\n";
loadtokens $tokenfile;
while (@ARGV) {
my $fh;
my $file = shift;
unless ( open( $fh, "gzcat -c $file|" ) ) {
warn "Can't open '$file'\n";
next;
}
while (<$fh>) {
PATTERN:
for my $pattern (@tokens) {
if (m/\Q$pattern/) {
print $file, ': ', $_;
last PATTERN;
}
}
}
close $fh;
}
1;
...run the script like:
# ./ngrep tokenfile file1 file2 file3...
The "tokenfile" should contain your IP addresses to be matched, one per line. The list of files on the command line are your logs to be analyzed.
You might try a timing test with just a few logs the original way and then with this code. I haven't had time to benchmark this.
Regards!
...JRF...
09-18-2009 11:26 PM
Re: Urgent Question about grepping thru the logs
How about C++ using STL?
My strtok(3) loop can probably be optimized better.
Insert:
if (len < min_len) min_len = len;
if (len > max_len) max_len = len;
const char *p = strdup(buf);
result = IP_set.insert(p);
Search:
const char *p = strtok(buf2, " \t[]!@#$%^&*()_-=+{}|\\;:'\",<>/?");
while (p) {
len = strlen(p);
if (len >= min_len && len <= max_len) {
if (IP_set.find(p) != IP_set.end()) {
printf("Found %s, in: %s\n", p, buf);
}
}
p = strtok(NULL, " \t[]!@#$%^&*()_-=+{}|\\;:'\",<>/?");
}
Using some random large files:
$ a.out itrc_IP.data itrc_IP.scan /stand/vmunix /var/adm/wtmp*
Duplicate key: 15.00.00.00
Elements in the container: 2
Found 15.00.00.00, in: abc 15.00.00.00 def
Found 16.00.00.00, in: sam[16.00.00.00] 1.2.3.4
Found 15.00.00.00, in: 15.00.00.00 is bad
files scanned 5, lines scanned 364676
tokens scanned 544502, tokens looked up 397339
Changing to read from stdin would be easy, just don't close it.