topic Re: HTML Data Extraction By perl scripting in Operating System - HP-UX

HTML Data Extraction By perl scripting

Dodo_5 — Tue, 13 Feb 2007 01:51:21 GMT

I have an HTML report file; it is in the attachment (only a part of the whole report is attached).
I just want to separate the data, so that the first line would be:

NHTEST-3848498958-NHTEST-10.2-no-baloo a
and so on for the whole report.

How can I separate the data from tables in that kind of format using Perl (or Unix) scripting?

Please help, guys. Please write the script as a whole, otherwise it will be difficult for me to understand.
It's urgent, please.

Re: HTML Data Extraction By perl scripting

Oviwan — Tue, 13 Feb 2007 03:18:36 GMT

Hey

You can make another snapshot:
9i: @?/rdbms/admin/spreport
10g: @?/rdbms/admin/awrrpt.sql

then choose text as output format, this is easier to modify.

Regards

Re: HTML Data Extraction By perl scripting

Peter Godron — Tue, 13 Feb 2007 03:20:05 GMT

Hi,
surely TCS has the resource/experience to do this ?!

Take the report in html format, pull out the table rows (marked by TR and /TR).
Then remove all HTML markers and what you have left is the table data without HTML markers.
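
A minimal sketch of those two steps in plain Perl, using only regular expressions and no extra modules. The sub name extract_rows and the sample table (including its column headings) are made up for illustration; the values are taken from elsewhere in this thread. Regex matching like this is only reasonable for simple, machine-generated tables without nesting; for anything messier a real HTML parser is the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one array reference of cell texts per <tr>...</tr> block.
# (extract_rows is a hypothetical helper name, not from any module.)
sub extract_rows {
    my ($html) = @_;
    my @rows;
    # /s lets "." cross newlines, /i accepts TR or tr, /g walks every row.
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr\s*>}gis) {
        my $row = $1;
        my @cells;
        while ($row =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]\s*>}gis) {
            my $text = $1;
            $text =~ s/<[^>]*>//g;     # remove any tags left inside the cell
            $text =~ s/&nbsp;/ /gi;    # treat non-breaking spaces as blanks
            $text =~ s/^\s+|\s+$//g;   # trim both ends
            $text =~ s/\s+/ /g;        # collapse runs of whitespace
            push @cells, $text;
        }
        push @rows, \@cells if @cells;
    }
    return @rows;
}

# Made-up sample input; the real report layout may differ.
my $html = <<'HTML';
<table border="1">
<tr><th>DB Name</th><th>DB Id</th><th>Instance</th><th>Release</th><th>RAC</th><th>Host</th></tr>
<tr><td>NHTEST</td><td align="right">3848498958</td><td>NHTEST</td><td>10.2.0.2.0</td><td>NO</td><td>baloo_a</td></tr>
</table>
HTML

# If a file name is given on the command line, read that local file instead.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    local $/;                          # slurp mode: read the whole file at once
    $html = <$fh>;
    close $fh;
}

# One output line per table row, cells joined with "-".
print join('-', @$_), "\n" for extract_rows($html);
```

Run without arguments, it prints the header row followed by NHTEST-3848498958-NHTEST-10.2.0.2.0-NO-baloo_a; run with a file name, it processes that local HTML file.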

Quick check on the web:
http://www.thescripts.com/forum/thread49414.html
http://www.wdvl.com/Authoring/Languages/Perl/PerlfortheWeb/summarizer.html
http://www.unix.org.ua/orelly/perl/cookbook/ch20_07.htm
http://cpan.uwinnipeg.ca/htdocs/HTML-Strip/HTML/Strip.html

Please also read:
http://forums1.itrc.hp.com/service/forums/helptips.do?#33 on how to reward any useful answers given to your questions.

Re: HTML Data Extraction By perl scripting

Dodo_5 — Tue, 13 Feb 2007 04:18:46 GMT

When I tried to run the scripts, it showed:
Can't locate HTML/TableExtract.pm in @INC (@INC contains: /usr/lib/perl5/5.8.5/i386-linux-thread-multi

Actually, I don't have admin rights on the machine.
Can you please help me write a Perl script to extract the data from the table in an HTML file that exists on my PC (not a URL; it is a local file)?

Re: HTML Data Extraction By perl scripting

Dodo_5 — Tue, 13 Feb 2007 06:50:41 GMT

Go through the source of the HTML file. Please send me a solution, it's urgent.
It is a part of the whole report.

Re: HTML Data Extraction By perl scripting

James R. Ferguson — Tue, 13 Feb 2007 07:47:09 GMT

Hi:

> ...how to separate the data from tables in that kind of format using Perl (or Unix) scripting. I don't have administrator rights on my PC, so please send a script without commands that need admin privileges... Please help, guys. Please write the script as a whole, otherwise it will be difficult for me to understand.
> It's urgent, please...

Without payment for doing your job, I don't think anyone is going to write a solution that earns you your pay.

use Perl;

That said, however, you don't need administrator rights to install modules locally in directories to which you have write access.

http://www.cpan.org/modules/INSTALL.html
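
As a rough sketch of how a script can pick up modules from a private directory: the location $HOME/perl5 below is only an assumed install prefix (any directory you can write to will do), and have_module is a hypothetical helper name. With a module built using "perl Makefile.PL INSTALL_BASE=$HOME/perl5" followed by "make" and "make install", the .pm files end up under $HOME/perl5/lib/perl5, and "use lib" adds that directory to @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed private install location; adjust to wherever the modules were installed.
my $local_lib;
BEGIN { $local_lib = ($ENV{HOME} || '.') . '/perl5/lib/perl5'; }
use lib $local_lib;    # prepends the directory to @INC at compile time

# Report whether a module can be loaded with the extended @INC.
# (have_module is a hypothetical helper name.)
sub have_module {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;    # HTML::TokeParser -> HTML/TokeParser.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(HTML::TokeParser HTML::TableExtract)) {
    printf "%-20s %s\n", $module, have_module($module) ? 'found' : 'not found';
}
```

If a module still shows as "not found", the "Can't locate ... in @INC" error will persist until it is installed under the directory named in $local_lib (or that variable is changed to match the real location).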

As for parsing the HTML, you should look at modules like: HTML::Parser, HTML::FormatText, HTML::LinkExtor just to name a few. Fetch what you need from CPAN:

http://www.cpan.org/

Regards!

...JRF...

Re: HTML Data Extraction By perl scripting

Ralph Grothe — Tue, 13 Feb 2007 07:58:38 GMT

Hi,

I'd like to second James's statement about doing your chores.
Parsing tagged markup is a bit more involved, especially if it's not well formed.
But there exist standard Perl HTML parsers for the task.
Basically there seem to be two avenues.
Either use HTML::TreeBuilder
http://search.cpan.org/~petek/HTML-Tree-3.23/lib/HTML/TreeBuilder.pm
or HTML::TokeParser
http://search.cpan.org/~gaas/HTML-Parser-3.56/lib/HTML/TokeParser.pm
If you can afford the time, I would suggest trying both modules to get an idea of the different ways HTML can be treated.
Of course almost every scripting language should have HTML parsers for this purpose.

Re: HTML Data Extraction By perl scripting

Dodo_5 — Tue, 13 Feb 2007 07:59:58 GMT

thanks but expecting a little bit more from you experts.

Re: HTML Data Extraction By perl scripting

Ralph Grothe — Tue, 13 Feb 2007 10:04:19 GMT

Ok Dodo,
Without any guarantee that this will be of any value, I tinkered up the tiny attached script, which uses Perl and the HTML::TokeParser module.
You will have to check for yourself what exactly your HTML looks like and what you really need to parse.
E.g. my script produces this:

$ ./shp.pl
NHTEST
3848498958
NHTEST
1
10.2.0.2.0
NO
baloo_a
Begin Snap:
1728
02-Feb-07 20:00:35
20
3.1
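
The attached script itself is not reproduced in this thread. A minimal sketch of the same idea, assuming HTML::TokeParser (part of the HTML-Parser distribution from CPAN) is installed, could look like the following. The sub name table_cells and the inline sample table are made up for illustration, and the real shp.pl may well have worked differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the text of every non-empty <td>/<th> cell, in document order.
# $source is either a file name or a reference to a string holding HTML.
# (table_cells is a hypothetical helper name.)
sub table_cells {
    my ($source) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Cannot open HTML source: $!";
    my @cells;
    # get_tag skips ahead to the next <td> or <th> start tag.
    while (my $tag = $p->get_tag('td', 'th')) {
        # Collect the text up to the matching end tag, whitespace trimmed.
        my $text = $p->get_trimmed_text('/' . $tag->[0]);
        push @cells, $text if length $text;
    }
    return @cells;
}

# Made-up sample input; pass a file name as the first argument to
# parse a local HTML file instead.
my $sample = <<'HTML';
<table>
<tr><td>NHTEST</td><td>3848498958</td><td>NHTEST</td><td>1</td></tr>
<tr><td>10.2.0.2.0</td><td>NO</td><td>baloo_a</td></tr>
</table>
HTML

my $source = @ARGV ? $ARGV[0] : \$sample;
print "$_\n" for table_cells($source);
```

This prints one cell per line, much like the output above. To get one line per table row instead, the loop could also watch for "tr" tags and join the cells collected since the previous one.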

Re: HTML Data Extraction By perl scripting

Dodo_5 — Wed, 14 Feb 2007 01:10:19 GMT

Really amazing... you have done a great job. I am very thankful to you.
For now it's really enough...
I think I will get more work out of this script. Keep in touch.
Thanks to all...