PERL for HTML file parsing
02-27-2007 04:51 PM
The report is attached (named "input html.doc"). Its source is also attached in "report source code.txt".
I just want to separate the data, so that the first line, for example, reads:
NHTEST-3848498958-NHTEST-10.2-no-baloo a
and so on for the whole report.
I have a Perl script, also attached, named "perl coding for parsing.txt". It gives the required output.
Now suppose I have more than one file, say 20 reports in HTML format, and I have to compare the values of all the tables across the different report files (for example, compare the buffer cache values from each report).
How can I do that? Please give me some ideas.
I need a script to do this in Unix shell or Perl. Can you help me with this?
Regards.
Waiting for your reply.
I have used:
sed -n "s/.*Buffer Cache:<\/TD><[^>]*> *\([0-9,]*[A-Za-z]*\)<\/TD><[^>]*> *\([0-9,]*[A-Za-z]*\).*/\1 \2/p" report.txt
It gives the correct values for "Buffer Cache", but because the tags differ it cannot give the correct values for "Redo Size". I think I can only do this with a script, so please help.
02-27-2007 06:07 PM
Re: PERL for HTML file parsing
I just don't get it: if you have a text file, why struggle with HTML? A text file has no tags or formatting info, so just grep out the needed values ("Redo size") and compare them.
02-27-2007 06:21 PM
Re: PERL for HTML file parsing
Then how do I compare different values from different text files?
02-27-2007 06:41 PM
Re: PERL for HTML file parsing
One to one is fine: just grep the needed values out of all the files and compare them. For example, you can write a script that processes one file. Its output is a single line containing the needed values ("Redo size", "Logical reads" and so on) separated by '\t', a comma, or whatever you want. Then run this script against all the text files and collect the output in another file.
For example:
#!/bin/sh
OUTPUT='./output.txt'
cat /dev/null > "$OUTPUT"
for FILE in `find . -name "*.txt" ! -name output.txt`;
do
script_process "$FILE" >> "$OUTPUT"
done;
In this example "processing_script" is a Perl script that greps out the needed values.
That's all.
02-27-2007 08:23 PM
Re: PERL for HTML file parsing
What is script_process $FILE >> $OUTPUT?
You wrote "processing_script" as the Perl script name.
Also, do I have to write the required item in -name?
Can you add comments to your script so that it is a little easier for me?
02-27-2007 08:42 PM
Re: PERL for HTML file parsing
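Yes, I mean that script_process is a processing script written in Perl. It takes one argument, the name of the file to process, greps the values, and its output is redirected to the file $OUTPUT.
You should also give the path to the processing script. The correct version is:
#!/bin/sh
# file that collects the extracted values
OUTPUT='./output.txt'
# empty the output file before starting
cat /dev/null > "$OUTPUT"
# loop over every .txt file except the output file itself
for FILE in `find . -name "*.txt" ! -name output.txt`;
do
# run the Perl script on one file and append its output
./script_process "$FILE" >> "$OUTPUT"
done;
The command find . -name "*.txt" prints the list of .txt files in the current directory and its subdirectories; ! -name output.txt keeps the output file out of that list. You can point find at another directory. It is just an example of how to tell your script which files to process.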
02-27-2007 09:00 PM
Re: PERL for HTML file parsing
But what I actually need is the Perl script that greps the values out of the text file.
02-27-2007 09:03 PM
Re: PERL for HTML file parsing
If "report source code.txt" is HTML, you must convert it to text. It can be done like this:
for each table in the HTML document
match the string that contains the entire table
eliminate the cell (td) and row (tr) tags, replacing them with "\t" and "\n" respectively. Of course, cut off the table tag pair as well.
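The conversion steps above can be sketched in shell with awk instead of Perl. This is only a minimal sketch under assumptions that are not from the thread: the function name html_rows_to_tsv is made up, every row is assumed to be closed with </tr>, every cell with </td> or </th>, and tables are assumed not to be nested.

```shell
# html_rows_to_tsv: read HTML on stdin, print one line per table row
# with the cells separated by tabs. (Hypothetical helper name.)
html_rows_to_tsv() {
    awk '
    { doc = doc " " $0 }                       # join all input lines into one string
    END {
        gsub(/&nbsp;/, " ", doc)               # treat non-breaking spaces as blanks
        n = split(doc, rows, /<\/[Tt][Rr]>/)   # one chunk per table row
        for (i = 1; i <= n; i++) {
            row = rows[i]
            gsub(/<\/[Tt][DdHh]>/, "\t", row)  # end of a cell becomes a tab
            gsub(/<[^>]*>/, "", row)           # drop every remaining tag
            gsub(/[ ]*\t[ ]*/, "\t", row)      # trim blanks around the tabs
            sub(/^[ \t]+/, "", row)            # trim leading whitespace
            sub(/[ \t]+$/, "", row)            # trim trailing whitespace and tab
            if (row != "") print row           # skip chunks that held no text
        }
    }'
}
```

It could be used as html_rows_to_tsv < report.html > report.txt, after which plain grep works on report.txt.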
02-27-2007 09:38 PM
Re: PERL for HTML file parsing
Then it will be easy for me, and I can parse any HTML file just by giving it as an argument.
02-27-2007 10:38 PM
Re: PERL for HTML file parsing
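#!/usr/local/bin/perl
#open the file given as the first argument
open SRC, "$ARGV[0]" or die "cannot open $ARGV[0]: $!";
#set line delim to undef
#so the whole file is read as one string
$/= undef;
#read data
$data=<SRC>;
#close file
close SRC;
#take the table part of the string
#(pattern lost in the post; assumed: first <table to last </table>)
$data =~ /<table.*<\/table>/is;
$data = $&;
#get rid of html
#(patterns marked "assumed" were lost in the post and are guesses)
$data =~ s/<table[^>]*>//ig;    #assumed
$data =~ s/<br[^>]*>/\n/ig;     #assumed
$data =~ s/<\/table>//ig;
$data =~ s/<tr[^>]*>//ig;       #assumed
$data =~ s/><\/th>/>Column\t/ig;
$data =~ s/<\/th>/\t/ig;
$data =~ s/<\/TD><\/TR>//ig;
$data =~ s/<\/td>/\t/ig;
$data =~ s/<\/tr>//ig;
$data =~ s/<th[^>]*>//ig;       #assumed
$data =~ s/<td[^>]*>//ig;       #assumed
$data =~ s/\x20{2,}/\t/ig;
$data =~ s/&nbsp;/\t/ig;        #assumed
$data =~ s/\t{2,}/\t/ig;
#for example we want redo size
$data =~ /redo size:\s{1,}([\d\.\,]{1,})\s{1,}([\d\.\,]{1,}).*/is;
#output result, one line per file
print $1, "\t", "$2", "\n";
open SRC, "$ARGV[0]";
$/= undef;
$data=<SRC>;
close SRC;
$html = HTML::TokeParser->new(\$data);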
02-27-2007 10:45 PM
Re: PERL for HTML file parsing
The sample script ends with
#output result, one line per file
print $1, "\t", "$2", "\n";
These lines:
open SRC, "$ARGV[0]";
$/= undef;
$data=<SRC>;
close SRC;
$html = HTML::TokeParser->new(\$data);
are an example of how to read a file into a string and create a parser object over that string.
02-27-2007 10:57 PM
Re: PERL for HTML file parsing
#!/usr/local/bin/perl
use strict;
use HTML::TokeParser;
Then I ran it as:
perl html_parse.pl html
where the script name is "html_parse.pl" and "html" is the name of my report file.
It still gives a compilation error. Please make the required change in your script to avoid the error.
Error:
Global symbol "$html" requires explicit package name at html_parse.pl line 49.
Execution of html_parse.pl aborted due to compilation errors.
02-27-2007 11:04 PM
Re: PERL for HTML file parsing
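The working script is:
#!/usr/local/bin/perl
#open the file given as the first argument
open SRC, "$ARGV[0]" or die "cannot open $ARGV[0]: $!";
#set line delim to undef
#so the whole file is read as one string
$/= undef;
#read data
$data=<SRC>;
#close file
close SRC;
#take the table part of the string
#(pattern lost in the post; assumed: first <table to last </table>)
$data =~ /<table.*<\/table>/is;
$data = $&;
#get rid of html
#(patterns marked "assumed" were lost in the post and are guesses)
$data =~ s/<table[^>]*>//ig;    #assumed
$data =~ s/<br[^>]*>/\n/ig;     #assumed
$data =~ s/<\/table>//ig;
$data =~ s/<tr[^>]*>//ig;       #assumed
$data =~ s/><\/th>/>Column\t/ig;
$data =~ s/<\/th>/\t/ig;
$data =~ s/<\/TD><\/TR>//ig;
$data =~ s/<\/td>/\t/ig;
$data =~ s/<\/tr>//ig;
$data =~ s/<th[^>]*>//ig;       #assumed
$data =~ s/<td[^>]*>//ig;       #assumed
$data =~ s/\x20{2,}/\t/ig;
$data =~ s/&nbsp;/\t/ig;        #assumed
$data =~ s/\t{2,}/\t/ig;
#for example we want redo size
$data =~ /redo size:\s{1,}([\d\.\,]{1,})\s{1,}([\d\.\,]{1,}).*/is;
#output result, one line per file
print $1, "\t", "$2", "\n";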
02-27-2007 11:13 PM
Re: PERL for HTML file parsing
You have defined the method for getting "redo size", but it is not valid for other parameters (the tags actually differ from case to case), so those values cannot be obtained.
So how do I make a generalised script? I can't run a different script for each parameter. There should be only one script by which the different parameter values can be obtained.
02-27-2007 11:45 PM
Re: PERL for HTML file parsing
Find this line in my script:
$data =~ /redo size:\s{1,}([\d\.\,]{1,})\s{1,}([\d\.\,]{1,}).*/is;
The block before it eliminates the HTML, so by the time execution reaches this line the variable $data contains plain text, and you just have to write expressions to match the other values.
The redo size example tells the regexp engine:
find the words "redo size:"
after these words there will be some whitespace
then a sequence of digits, commas and dots
then whitespace again
then another sequence of digits, commas and dots
I have enclosed each sequence of digits, commas and dots in round brackets. This means the matched text goes into the predefined Perl variables $1, $2 and so on: the first sequence into $1 and the second into $2.
For more on this, look up how to extract matches with Perl. Google it and you'll find a lot about it.
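The same capture-group idea can be tried from the shell with sed, where \( \) mark the groups and \1, \2 refer back to them. A minimal sketch, assuming the HTML has already been reduced to plain text; the function name redo_size is made up for illustration.

```shell
# redo_size: read plain text on stdin and print the two numbers
# that follow "redo size:" on the same line. (Hypothetical helper name.)
redo_size() {
    # [0-9.,][0-9.,]*           one or more digits, dots or commas
    # [[:space:]][[:space:]]*   one or more whitespace characters
    sed -n 's/.*[Rr]edo size:[[:space:]][[:space:]]*\([0-9.,][0-9.,]*\)[[:space:]][[:space:]]*\([0-9.,][0-9.,]*\).*/\1 \2/p'
}
```

For example, redo_size < report.txt prints the two values separated by a space, or nothing when the line is not found.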
In the script you can use a variable to hold the result:
#!/usr/local/bin/perl
open SRC, "$ARGV[0]" or die "cannot open $ARGV[0]: $!";
$/= undef;
$data=<SRC>;
close SRC;
#take the table part of the string
#(pattern lost in the post; assumed: first <table to last </table>)
$data =~ /<table.*<\/table>/is;
$data = $&;
#get rid of html
#(patterns marked "assumed" were lost in the post and are guesses)
$data =~ s/<table[^>]*>//ig;    #assumed
$data =~ s/<br[^>]*>/\n/ig;     #assumed
$data =~ s/<\/table>//ig;
$data =~ s/<tr[^>]*>//ig;       #assumed
$data =~ s/><\/th>/>Column\t/ig;
$data =~ s/<\/th>/\t/ig;
$data =~ s/<\/TD><\/TR>//ig;
$data =~ s/<\/td>/\t/ig;
$data =~ s/<\/tr>//ig;
$data =~ s/<th[^>]*>//ig;       #assumed
$data =~ s/<td[^>]*>//ig;       #assumed
$data =~ s/\x20{2,}/\t/ig;
$data =~ s/&nbsp;/\t/ig;        #assumed
$data =~ s/\t{2,}/\t/ig;
#variable that collects all extracted values
$result="";
#match redo size Per Second Per Transaction
$data =~ /redo size:\s{1,}([\d\.\,]{1,})\s{1,}([\d\.\,]{1,}).*/is;
$result=$result."\t".$1."\t".$2;
#match Soft Parse %
$data =~ /Soft Parse %:\s{1,}([\d\.\,]{1,}).*/is;
$result=$result."\t".$1;
# and so on
#
#
#add matching for other values here
#
#
#output result, one line per file
print $result, "\n";
02-28-2007 12:00 AM
Re: PERL for HTML file parsing
I got your point, but:
1) Your script is lengthy (that's OK, no problem), but setting all the parameters for all the different values from the tables is not good programming practice.
2) Can we print the "redo size" values from 20 HTML files simultaneously?
That is the main requirement. Only then can I compare the values in the different reports.
I have to get the "redo size" values from all the HTML reports in the output by running the script.
So please look into the matter.
02-28-2007 12:25 AM
Re: PERL for HTML file parsing
2) I'm trying to tell you that I have been looking into exactly that from the beginning :)
You have a script that takes the specified values out of a specified file. So run this script for every file in the set and save the results somewhere. That way you will have your redo size extracted from 20 files simultaneously :) Strictly speaking "serially" :) but all in one place.
You can write a batch script to do this, like this:
#!/bin/sh
OUTPUT='./output.txt'
cat /dev/null > "$OUTPUT"
for FILE in `find . -name "*.txt" ! -name output.txt`;
do
./script_process "$FILE" >> "$OUTPUT"
done;
When it completes, the file output.txt will contain the values from the files.
I wrote it in shell, but this batch script can be written in Perl too.
The idea is: write one script to process a single file, and write a second script to call the first one for every file in the set.
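The two-script idea above could look like this in plain sh. A sketch only: collect_values and its argument order are made-up names, and the extractor is assumed to take a file name and print one line of values. Unlike the loop in the post, it also writes the file name in front of each line so the rows can be told apart later.

```shell
# collect_values EXTRACTOR OUTPUT FILE...
# Run EXTRACTOR on each FILE and write "file<TAB>values" lines to OUTPUT.
# (Hypothetical helper; EXTRACTOR stands for something like ./script_process.pl.)
collect_values() {
    extractor=$1
    output=$2
    shift 2
    : > "$output"                          # start with an empty output file
    for file in "$@"; do
        # one line per report: the file name, a tab, then the extractor output
        printf '%s\t%s\n' "$file" "$("$extractor" "$file")" >> "$output"
    done
}
```

A call could look like collect_values ./script_process.pl output.tsv report1.txt report2.txt report3.txt; naming the reports explicitly keeps the output file out of the input list.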
02-28-2007 05:37 PM
Re: PERL for HTML file parsing
Please tell me the command-line statements.
I think I first have to write a Perl script named "script_process.pl",
then write the shell script named "file_process.sh".
Then, if the name of the report file is report.html,
just tell me how to execute them one by one to get the correct answer.
02-28-2007 06:07 PM
Re: PERL for HTML file parsing
If you did not get it, let's try another way.
Suppose you have 3 reports:
report1.txt
report2.txt
report3.txt
Then you create an empty output.txt and run:
./script_process.pl report1.txt >> output.txt
./script_process.pl report2.txt >> output.txt
./script_process.pl report3.txt >> output.txt
After that, each line in output.txt will contain the values extracted from one reportN.txt.
The second script, which you called file_process.sh, does just that: it finds the files and calls script_process.pl for each report file, one by one. Read more about Perl and shell. Then you can open output.txt in Excel or another spreadsheet program that understands tab-delimited files, compare what you wish, build diagrams and so on.
02-28-2007 06:54 PM
Re: PERL for HTML file parsing
But I have to set more than 300-400 parameter tags to get their values.
That's really not possible. Your script is good for getting two or three required values.
Anyway, thank you very much for sharing your knowledge and helping a lot. Actually I was confused because the names of your scripts were not given.
But if it's possible to write a generalised script for getting the values, then please help me.
sed -n "s/.*Buffer Cache:<\/TD><[^>]*> *\([0-9,]*[A-Za-z]*\)<\/TD><[^>]*> *\([0-9,]*[A-Za-z]*\).*/\1 \2/p" report.txt
This command (which I am using now) also gives the correct values for one variable, but it does not work for the others. There, too, I have to change the tag properties every time, so I need a generalised script.
Thanks.
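One possible way to avoid writing a separate pattern per parameter, sketched in shell with awk rather than Perl: turn every table row into tab-separated cells and pick rows by the text of their first cell. The assumptions here are not from the thread: the function name row_values is made up, the parameter label is assumed to sit in the first cell of its row, and every row is assumed to be closed with </tr> and every cell with </td> or </th>. Real reports may need adjustments.

```shell
# row_values LABEL FILE...
# For each HTML report, print "file<TAB>label<TAB>value..." for every table
# row whose first cell equals LABEL, ignoring case. (Hypothetical helper name.)
row_values() {
    label=$1
    shift
    for file in "$@"; do
        awk -v want="$label" -v name="$file" '
        { doc = doc " " $0 }                       # join all lines into one string
        END {
            gsub(/&nbsp;/, " ", doc)               # non-breaking spaces become blanks
            n = split(doc, rows, /<\/[Tt][Rr]>/)   # one chunk per table row
            for (i = 1; i <= n; i++) {
                row = rows[i]
                gsub(/<\/[Tt][DdHh]>/, "\t", row)  # end of a cell becomes a tab
                gsub(/<[^>]*>/, "", row)           # drop every remaining tag
                gsub(/[ ]*\t[ ]*/, "\t", row)      # trim blanks around the tabs
                sub(/^[ \t]+/, "", row)            # trim leading whitespace
                sub(/[ \t]+$/, "", row)            # trim trailing whitespace and tab
                split(row, cells, "\t")            # cells[1] is the row label
                if (tolower(cells[1]) == tolower(want)) print name "\t" row
            }
        }' "$file"
    done
}
```

A call such as row_values "Redo size:" report1.html report2.html would then list that row from each report, one line per file, and the same function works for any other label without a new pattern.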
02-28-2007 08:29 PM
Re: PERL for HTML file parsing
and the source code text file of the HTML file is "code.txt".
Then run it as:
perl parse_html.pl code.txt
It will show you the parsed result.
But for multiple files I got stuck.