Re: Extracting data in between strings...using perl

 
H.Merijn Brand (procura)
Honored Contributor

What Hein said, but then parsing HTML with regexes is still NOT safe.

Have a look at the HTML::TreeBuilder module.

--8<--- test.pl
#!/pro/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;

my $content = <<'EOH';

This page is used to hold your data while you are being authorized for your
request.

You will be forwarded to continue the authorization process. If
this does not happen automatically, please click the Continue button below.
STARThttps://test.ip.com/siteminderagent/forms/login.fcc?TYPE=33554433&REALMOID=06-00034bb7-e037-116f-8241-808d67a50008&GUID=&SMAUTHREASON=0&METHOD=POST&SMAGENTNAME=$SM$8OJVwItP%2fV8GXRhL%2fhch6KJt3EvC2AWLQ7%2bWLfTgx3%2bWD7k%2buJc3dVSFPOr1jTxg&TARGET=$SM$%2fEND="HIDDEN"
NAME="SMPostPreserve"
VALUE="S1NJbjNmby81VzRqMmo0cTNuWm9NdFo3cVpZSlF6enpMc2laNWZrcnRudlhWVEUzM0xUTHVPR1Y3REpwNnUwM1ZVd1IySFdQZkRDRmpUQldrV01ybk9pcEFBZnpzNmg4RG1yQ0lRQUNzbTFMekdiUG9Eck02M2NUcis4RG5YQ3l2dkZHOGp4WDRPbHJJTFdJOXUvbnFBPT0END="SUBMIT"
VALUE="Continue">

EOH

$tree->parse_content ($content);
# print $tree->as_HTML (undef, " ", {});

# Walk every <form> element and search its re-serialized HTML
foreach my $f ($tree->look_down (_tag => "form")) {
    # print "FORM:\n", $f->as_HTML (undef, " ", {});
    $f->as_HTML (undef, " ", {}) =~ m{\bstart(.*?)end}i and
        print "START in FORM: $1\n";
}
-->8---

# perl test.pl
START in FORM: https://test.ip.com/siteminderagent/forms/login.fcc?type="33554433&REALMOID=06-00034bb7-e037-116f-8241-808d67a50008&GUID=&SMAUTHREASON=0&METHOD=POST&SMAGENTNAME=$SM$8OJVwItP%2fV8GXRhL%2fhch6KJt3EvC2AWLQ7%2bWLfTgx3%2bWD7k%2buJc3dVSFPOr1jTxg&TARGET=$SM$%2f
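
Going one step further than the test above, the form's attributes can be read straight from the tree, without running a regex over as_HTML at all. What follows is only a minimal sketch: the sub name extract_forms and the sample markup (including the example.com URL and the field value) are made up for illustration, since the page's real tags are not shown in this thread. It relies on look_down and attr from HTML::Element, which HTML::TreeBuilder inherits.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical helper: returns one hashref per <form> in the given HTML,
# holding the form's action URL and its named <input> fields.
sub extract_forms {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content ($html);

    my @forms;
    foreach my $form ($tree->look_down (_tag => "form")) {
        my %fields;
        foreach my $input ($form->look_down (_tag => "input")) {
            my $name = $input->attr ("name");
            defined $name or next;      # e.g. an unnamed submit button
            $fields{$name} = $input->attr ("value");
        }
        push @forms, { action => $form->attr ("action"), fields => \%fields };
    }

    $tree->delete;      # release the tree once the values are copied out
    return @forms;
}

# Made-up sample markup, NOT the page from the original question
my $sample = <<'EOH';
<html><body>
<form action="https://example.com/login.fcc?TYPE=1" method="post">
<input type="hidden" name="SMPostPreserve" value="abc123">
<input type="submit" value="Continue">
</form>
</body></html>
EOH

foreach my $form (extract_forms ($sample)) {
    print "ACTION: $form->{action}\n";
    print "  $_ = $form->{fields}{$_}\n" for sort keys %{$form->{fields}};
}
```

Because the parser hands back attribute values directly, there is no need for START/END markers or for guessing how the HTML will be re-serialized.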

Enjoy, Have FUN! H.Merijn [ who does not think this is clean HTML ]
jmckinzie
Super Advisor

I wound up using regular expressions, but I am looking forward to messing with HTML::TreeBuilder, even though it is very, very difficult.