Operating System - HP-UX

Re: Perl script to retrieve values from Web URL.

 
SOLVED
Gulam Mohiuddin
Regular Advisor

Perl script to retrieve values from Web URL.

I need to write a Perl script which will connect to the URL "http://www.peelsite.ca" with user id "peeladmin" and password "peel".

After a successful login, we need to retrieve the value of the field "TotalHits" and store it in a flat file on our HP-UX server, where we have the latest Perl installed.

Thanks,

Gulam.
Everyday Learning.
6 REPLIES
Steven E. Protter
Exalted Contributor

Re: Perl script to retrieve values from Web URL.

Shalom Gulam,

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1072865

This data is recovered by just such a script. You may be able to adapt the code and make it work for you.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Gulam Mohiuddin
Regular Advisor

Re: Perl script to retrieve values from Web URL.

But where can I get the code? I couldn't find any.

Thanks,

Gulam.
Everyday Learning.
H.Merijn Brand (procura
Honored Contributor

Re: Perl script to retrieve values from Web URL.

Does http://peeladmin:peel@www.peelsite.ca work?

After that, maybe WWW::Mechanize is the tool for you:

http://search.cpan.org/~petdance/WWW-Mechanize-1.20/

Enjoy, Have FUN! H.Merijn
Ralph Grothe
Honored Contributor
Solution

Re: Perl script to retrieve values from Web URL.

Hi Gulam,

I would use LWP for this.
But if you have to go through a more involved request-response cycle, probably with additional HTML scraping in between,
then WWW::Mechanize is indeed probably more appropriate.
I suppose that you can't access WWW sites directly but have to go through a proxy in your company?
In that case you would most likely also need to authenticate with your WWW proxy.
Admittedly this is a bit trickier than using LWP::Simple.
You need to create an extra HTTP::Headers object and populate it with authorization headers for your WWW proxy as well as for all the realms of the "secured" URLs that you wish to GET or POST with your user agent.
Let me demonstrate this in the Perl debugger,
because there we can immediately see which HTTP headers are set.

$ perl -MLWP -MHTTP::Headers -de0

Loading DB routines from perl5db.pl version 1.19
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1): 0
DB<1> $h=HTTP::Headers->new

DB<2> $h->proxy_authorization_basic(qw(your_proxy_user your_proxy_passwd))
DB<3> x $h
0 HTTP::Headers=HASH(0x4055347c)
'proxy-authorization' => 'Basic eW91cl9wcm94eV91c2VyOnlvdXJfcHJveHlfcGFzc3dk'


The intimidatingly cryptic string in the proxy-authorization header field is nothing more than the Base64 encoding of "your_proxy_user:your_proxy_passwd".
We can easily verify this:

DB<4> use MIME::Base64

DB<5> x decode_base64('eW91cl9wcm94eV91c2VyOnlvdXJfcHJveHlfcGFzc3dk')
0 'your_proxy_user:your_proxy_passwd'
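
Outside the debugger, the same round trip can be sketched in a few lines of plain Perl. This is only an illustration with the same bogus credentials; the variable names are made up for the example, and MIME::Base64 ships with Perl.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Placeholder credentials, the same bogus values as in the debugger session.
my ($user, $passwd) = ('your_proxy_user', 'your_proxy_passwd');

# Basic auth value = "Basic " followed by Base64("user:password").
# The empty second argument suppresses the trailing newline that
# encode_base64 appends by default.
my $header_value = 'Basic ' . encode_base64("$user:$passwd", '');
print "Proxy-Authorization: $header_value\n";

# Decoding the token gives the credentials back in clear text.
(my $token = $header_value) =~ s/^Basic //;
my $decoded = decode_base64($token);
print "$decoded\n";
```

Base64 is an encoding, not encryption, so anybody who can see the header can read the password.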


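Once this header is attached to a request, our user agent sends it along, so the WWW proxy gets the credentials it would otherwise demand with a 407 (Proxy Authentication Required) response.

But you still need to authorize for the URL you want to GET.
So extend the HTTP header: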

DB<6> $h->authorization_basic(qw(url_to_get_user url_to_get_passwd))

DB<7> x $h
0 HTTP::Headers=HASH(0x4055347c)
'authorization' => 'Basic dXJsX3RvX2dldF91c2VyOnVybF90b19nZXRfcGFzc3dk'
'proxy-authorization' => 'Basic eW91cl9wcm94eV91c2VyOnlvdXJfcHJveHlfcGFzc3dk'


Now you need to signal your user agent that it should send the request to your WWW proxy instead of to the non-reachable target URL.
First we need to create a user agent

DB<8> $ua=LWP::UserAgent->new

DB<11> $ua->proxy([qw(http ftp)] => 'http://url.of.your.www-proxy:8888/')

DB<12> x $ua
0 LWP::UserAgent=HASH(0x4059ddd0)
'agent' => 'libwww-perl/5.68'
'from' => undef
'max_size' => undef
'no_proxy' => ARRAY(0x4059dd58)
empty array
'parse_head' => 1
'protocols_allowed' => undef
'protocols_forbidden' => undef
'proxy' => HASH(0x402b0648)
'ftp' => 'http://url.of.your.www-proxy:8888/'
'http' => 'http://url.of.your.www-proxy:8888/'
'requests_redirectable' => ARRAY(0x4055cd38)
0 'GET'
1 'HEAD'
'timeout' => 180
'use_eval' => 1


Alternatively you can set environment variables like http_proxy and call the $ua->env_proxy method.
Let's try:

DB<19> $ENV{http_proxy}='http://some.other.proxy-url:7777/'

DB<20> $ua->env_proxy

DB<21> x $ua->{proxy}
0 HASH(0x405ecf64)
'ftp' => 'http://url.of.your.www-proxy:8888/'
'http' => 'http://some.other.proxy-url:7777/'

DB<22>


See, it swallowed it.


Now we need to put all the parts together into an HTTP request.

DB<28> use HTTP::Request::Common


DB<30> $req=HTTP::Request->new(GET => 'http://url.you.want/', $h)

DB<31> x $req
0 HTTP::Request=HASH(0x40606bf0)
'_content' => ''
'_headers' => HTTP::Headers=HASH(0x40606be4)
'authorization' => 'Basic dXJsX3RvX2dldF91c2VyOnVybF90b19nZXRfcGFzc3dk'
'proxy-authorization' => 'Basic eW91cl9wcm94eV91c2VyOnlvdXJfcHJveHlfcGFzc3dk'
'_method' => 'GET'
'_uri' => URI::http=SCALAR(0x402a0620)
-> 'http://url.you.want/'

Finally you can send the request with your user agent.
Of course this won't work in my debugger session because so far I have used only bogus data.

DB<32> $res=$ua->request($req)

DB<33> x $res->is_success
0 ''
DB<34> x $res
0 HTTP::Response=HASH(0x407ecfb4)
'_content' => ''
'_headers' => HTTP::Headers=HASH(0x407ecffc)
'client-date' => 'Fri, 10 Nov 2006 10:57:08 GMT'
'_msg' => 'Can\'t connect to some.other.proxy-url:7777 (Bad hostname \'some.other.proxy-url\')'
'_rc' => 500
'_request' => HTTP::Request=HASH(0x40606bf0)
'_content' => ''
'_headers' => HTTP::Headers=HASH(0x40606be4)
'authorization' => 'Basic dXJsX3RvX2dldF91c2VyOnVybF90b19nZXRfcGFzc3dk'
'proxy-authorization' => 'Basic eW91cl9wcm94eV91c2VyOnlvdXJfcHJveHlfcGFzc3dk'
'user-agent' => 'libwww-perl/5.68'
'_method' => 'GET'
'_uri' => URI::http=SCALAR(0x402a0620)
-> 'http://url.you.want/'
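
For a non-interactive run, the debugger steps above could be collected into a script roughly like the following. Take it as a sketch under assumptions: every value in %cfg is a placeholder, the sub names build_request, build_agent and fetch are invented for this example, and the 30 second timeout is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Headers;
use HTTP::Request;

# All values below are placeholders; substitute your own.
my %cfg = (
    proxy      => 'http://url.of.your.www-proxy:8888/',
    proxy_user => 'your_proxy_user',
    proxy_pass => 'your_proxy_passwd',
    url        => 'http://url.you.want/',
    url_user   => 'url_to_get_user',
    url_pass   => 'url_to_get_passwd',
);

# Build the GET request carrying both Basic auth headers.
sub build_request {
    my (%c) = @_;
    my $h = HTTP::Headers->new;
    $h->proxy_authorization_basic($c{proxy_user}, $c{proxy_pass});
    $h->authorization_basic($c{url_user}, $c{url_pass});
    return HTTP::Request->new(GET => $c{url}, $h);
}

# Create a user agent that routes http and ftp through the proxy.
sub build_agent {
    my (%c) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->proxy([qw(http ftp)] => $c{proxy});
    return $ua;
}

# Send the request; return the body, or die with the status line.
sub fetch {
    my (%c) = @_;
    my $res = build_agent(%c)->request(build_request(%c));
    die 'Request failed: ' . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Usage (commented out because the data above is bogus):
# print fetch(%cfg);
```

Checking is_success and reporting status_line gives a readable error such as the "500 Can't connect" seen in the dump above, instead of silently working on an empty body.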




Madness, thy name is system administration
Chris Fleming_1
New Member

Re: Perl script to retrieve values from Web URL.

I would look at using curl and the curl perl libraries to do this. (http://search.cpan.org/~crisb/WWW-Curl-2.0/easy.pm.in)

However it may be easier to just use the command line version and then process the output in your perl script.

curl -u name:password www.mysite.com

will do what you need, although it can also do more complex "login and navigation" with session support if required.
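
Processing the curl output in Perl might look roughly like this. It is a sketch only: the sub names and the output format of the flat file are invented for the example, and since nobody here has seen the page, the way "TotalHits" is located is a guess (the first run of digits following the text "TotalHits"). Adjust the pattern to the real HTML.

```perl
use strict;
use warnings;

# Run curl and return its output. The list form of open bypasses the
# shell, so the password is not subject to shell interpretation.
sub fetch_with_curl {
    my ($user, $passwd, $url) = @_;
    open my $curl, '-|', 'curl', '-s', '-u', "$user:$passwd", $url
        or die "Cannot run curl: $!\n";
    local $/;                     # slurp all output at once
    my $page = <$curl>;
    close $curl or die "curl exited with status $?\n";
    return $page;
}

# ASSUMPTION: the value is the first run of digits after the text
# "TotalHits", e.g. name="TotalHits" value="4711" or "TotalHits: 4711".
sub extract_total_hits {
    my ($page) = @_;
    return $page =~ /TotalHits\D*?(\d+)/ ? $1 : undef;
}

# Append one "timestamp value" line to the flat file.
sub append_to_file {
    my ($file, $value) = @_;
    open my $out, '>>', $file or die "Cannot open $file: $!\n";
    print {$out} scalar(localtime), " $value\n";
    close $out or die "Cannot close $file: $!\n";
}

# Usage (the file name is a placeholder):
# my $page = fetch_with_curl('peeladmin', 'peel', 'http://www.peelsite.ca');
# my $hits = extract_total_hits($page);
# die "TotalHits not found\n" unless defined $hits;
# append_to_file('/tmp/totalhits.log', $hits);
```

Note that curl -u sends HTTP Basic authentication; if the site logs you in through an HTML form instead, this will not be enough.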

The curl man page is at:
http://curl.haxx.se/docs/manpage.html

And you will also find links for downloading curl here.

Cheers
Chris
Ralph Grothe
Honored Contributor

Re: Perl script to retrieve values from Web URL.

Well, if you don't want to write a script yourself,
simply run the lwp-request script that comes with LWP (but wget can be used equally easily),
e.g.

$ lwp-request -m get -C url_user:url_passwd http://requested.url.tld/some/url/path > if_its_not_html.dump

Btw, the source of lwp-request is a good example of how you can subclass LWP::UserAgent and override get_basic_credentials to return the required authorization parameters for each visited realm.
Simply invoke "perldoc -m lwp-request" and search for get_basic_credentials in the pager.
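
In case you don't want to dig through the source, here is a minimal sketch of such a subclass. It is not the code from lwp-request; the package name My::UserAgent, the lookup table and the realm name "Some Realm" are all made up for illustration.

```perl
use strict;
use warnings;

package My::UserAgent;
use parent 'LWP::UserAgent';

# Hypothetical lookup table: "host:port" => { realm => [user, password] }.
my %credentials = (
    'requested.url.tld:80' => { 'Some Realm' => [ 'url_user', 'url_passwd' ] },
);

# LWP calls this method when a server (or proxy) answers with a Basic
# authentication challenge. Returning an empty list means "no credentials".
sub get_basic_credentials {
    my ($self, $realm, $uri, $is_proxy) = @_;
    my $by_realm = $credentials{ $uri->host_port } or return;
    my $entry    = $by_realm->{$realm}             or return;
    return @$entry;
}

package main;

my $ua = My::UserAgent->new;
# my $res = $ua->get('http://requested.url.tld/some/url/path');
```

With this in place the user agent retries a challenged request on its own, so you don't have to build the authorization header by hand.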
Madness, thy name is system administration