How to automate opening Webpage and copying the contents into a file in the text format using perl

sowm18
Occasional Contributor


Hi all,

I want to automate opening a web page and copying its contents into a file in plain-text format. Please let me know how to do this on Linux.

Regards,
BS
6 REPLIES
Paul Cross_1
Respected Contributor
Solution

Re: How to automate opening Webpage and copying the contents into a file in the text format using perl

If you want to retain the HTML tags, wget will do the trick. Otherwise, you could pipe its output to html2text or a similar tool to strip out the tags. If you need to do this inside Perl, wget can be called from a Perl script, with HTML::Strip used to remove the tags.
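
A minimal sketch of the Perl variant described above, assuming wget is on the PATH and the HTML::Strip module from CPAN is installed. The script name, function names, and argument order are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Strip;    # CPAN module, not part of core Perl

# Fetch a page by running wget and reading its standard output.
# The list form of open() avoids passing the URL through a shell.
sub fetch_with_wget {
    my ($url) = @_;
    open(my $wget, '-|', 'wget', '-q', '-O', '-', $url)
        or die "Cannot run wget: $!\n";
    local $/;                       # slurp the whole response at once
    my $html = <$wget>;
    close($wget) or die "wget failed for $url (exit status $?)\n";
    return $html;
}

# Remove the markup and return what is left as plain text.
sub html_to_text {
    my ($html) = @_;
    my $stripper = HTML::Strip->new();
    my $text     = $stripper->parse($html);
    $stripper->eof;                 # reset the parser state
    return $text;
}

# Usage (hypothetical script name): perl page2txt.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my ($url, $outfile) = @ARGV;
    open(my $out, '>', $outfile) or die "Cannot write $outfile: $!\n";
    print {$out} html_to_text(fetch_with_wget($url));
    close($out) or die "Cannot close $outfile: $!\n";
}
```

Keeping the fetch and the stripping in separate functions means either half can be swapped out later, for example replacing wget with an LWP call.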
Yogeeraj_1
Honored Contributor

Re: How to automate opening Webpage and copying the contents into a file in the text format using perl

Hi BS,

Try the solution Paul proposed above (wget),

followed by this "simple-minded" approach, which works for most files:

#!/usr/bin/perl -p0777
s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
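
The filter above reads the whole input in one go (-0777) and prints it back with the tags removed (-p). As a rough sketch, the same substitution can also be wrapped in a function for use inside a larger script. The function name and the sample string are made up for illustration, and the pattern still ignores HTML comments, script bodies, and entities such as &amp;.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove anything that looks like an HTML tag. The second alternative
# lets a '>' appear inside a quoted attribute value without ending the tag.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    return $html;
}

# Hypothetical sample input, only for illustration.
my $sample = qq{<p class="a>b">Hello, <b>world</b>!</p>\n};
print strip_tags($sample);
```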


hope this helps!
kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Court Campbell
Honored Contributor

Re: How to automate opening Webpage and copying the contents into a file in the text format using perl

http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP/Simple.pm
"The difference between me and you? I will read the man page." and "Respect the hat." and "You could just do a search on ITRC, you don't need to start a thread on a topic that's been answered 100 times already." Oh, and "What. no points???"
sowm18
Occasional Contributor

Re: How to automate opening Webpage and copying the contents into a file in the text format using perl

Hi again,

If the web page requires authentication with a username and password, how do I take care of that?

Regards,
BS
Steven Schweda
Honored Contributor

Re: How to automate opening Webpage and copying the contents into a file in the text format using perl

> [...] how to take care of that.

wget -h

Look for options like:

[...]
HTTP options:
--http-user=USER set http user to USER.
--http-password=PASS set http password to PASS.
[...]
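
A minimal sketch of passing those two options to wget from a Perl script. The environment variable names WEB_USER and WEB_PASS, the script name, and the helper function are assumptions made for this example. Note that the file written this way still contains raw HTML, and that a password given on the command line can be visible to other users in the process list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the wget command as a list so that no shell quoting is needed.
sub wget_auth_command {
    my ($url, $outfile, $user, $pass) = @_;
    return ('wget', '-q',
            "--http-user=$user",
            "--http-password=$pass",
            '-O', $outfile, $url);
}

# Usage (hypothetical): WEB_USER=me WEB_PASS=secret perl fetch_auth.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my ($url, $outfile) = @ARGV;
    my ($user, $pass)   = @ENV{qw(WEB_USER WEB_PASS)};
    die "Set WEB_USER and WEB_PASS in the environment\n"
        unless defined $user && defined $pass;
    system(wget_auth_command($url, $outfile, $user, $pass)) == 0
        or die "wget failed (exit status $?)\n";
}
```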
James R. Ferguson
Acclaimed Contributor

Re: How to automate opening Webpage and copying the contents into a file in the text format using perl

Hi:

If your objective is only to snapshot a webpage and copy it to a file in text format, 'wget' is probably the simplest way.

http://hpux.cs.utah.edu/hppd/hpux/Gnu/wget-1.11.1/

The LWP::Simple module from CPAN provides the 'getstore' function to do the same:

http://search.cpan.org/~gaas/libwww-perl-5.812/lib/LWP/Simple.pm
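
A short sketch of the getstore route, assuming libwww-perl is installed. The wrapper function and the command-line handling are hypothetical. Like wget, getstore saves the page as served, so the tags are still in the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Download $url into $file and return the HTTP status code.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    die "Download of $url failed with status $status\n"
        unless is_success($status);
    return $status;
}

# Usage (hypothetical script name): perl savepage.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    save_page(@ARGV);
}
```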

Have a look, too, at:

http://search.cpan.org/~gaas/libwww-perl-5.812/lwpcook.pod

If you are serious about parsing HTML, though, I suggest you look at the HTML::TreeBuilder module beginning with:

http://search.cpan.org/~petek/HTML-Tree-3.23/lib/HTML/Tree.pm

and:

http://search.cpan.org/~petek/HTML-Tree-3.23/lib/HTML/TreeBuilder.pm
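
As a starting point, a minimal sketch that parses a page with HTML::TreeBuilder and keeps only its text, assuming the HTML-Tree distribution is installed. The function name and the sample markup are made up for illustration. The as_text method joins the text nodes without adding separators, so headings and paragraphs run together; a formatter would be needed for nicer layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # from the HTML-Tree distribution on CPAN

# Parse the markup into a tree and return only its text nodes.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = $tree->as_text;
    $tree->delete;        # release the tree once the text is extracted
    return $text;
}

# Hypothetical sample input, only for illustration.
my $sample = '<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>';
print html_to_text($sample), "\n";
```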

Regards!

...JRF...