<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic sed/awk/perl help in Operating System - HP-UX</title>
    <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506105#M640153</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have the following piece of output from a wget command, and I want to strip the HTML tags:&lt;/P&gt;&lt;PRE&gt;    &amp;lt;td&amp;gt;INACCESSIBLE&amp;lt;/td&amp;gt;
    &amp;lt;td&amp;gt;hostname:port &amp;lt;a title="inspect" href="dest/hostname:port"&amp;gt;[I]&amp;lt;/a&amp;gt; &amp;lt;a title="debugger" href="http://hostname:port/debug"&amp;gt;[D]&amp;lt;/a&amp;gt; &amp;lt;a title="browser" href="http://hostname/Browser.jsp/Url=http%3A//hostname%3A8501/oa/ww&amp;amp;amp;serviceUrl=http%3A//hostname%3A1port/service"&amp;gt;[SB]&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;&lt;/PRE&gt;&lt;P&gt;I want the output to look something like this:&lt;/P&gt;&lt;P&gt;INACCESSIBLE&lt;/P&gt;&lt;P&gt;hostname:port&lt;/P&gt;&lt;P&gt;I have managed it with a chain of awk/sed commands, but it would be great if it could be done in a single command.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Allan&lt;/P&gt;</description>
    <pubDate>Tue, 24 Jan 2012 01:25:10 GMT</pubDate>
    <dc:creator>allanm77</dc:creator>
    <dc:date>2012-01-24T01:25:10Z</dc:date>
    <item>
      <title>sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506105#M640153</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have the following piece of output from a wget command, and I want to strip the HTML tags:&lt;/P&gt;&lt;PRE&gt;    &amp;lt;td&amp;gt;INACCESSIBLE&amp;lt;/td&amp;gt;
    &amp;lt;td&amp;gt;hostname:port &amp;lt;a title="inspect" href="dest/hostname:port"&amp;gt;[I]&amp;lt;/a&amp;gt; &amp;lt;a title="debugger" href="http://hostname:port/debug"&amp;gt;[D]&amp;lt;/a&amp;gt; &amp;lt;a title="browser" href="http://hostname/Browser.jsp/Url=http%3A//hostname%3A8501/oa/ww&amp;amp;amp;serviceUrl=http%3A//hostname%3A1port/service"&amp;gt;[SB]&amp;lt;/a&amp;gt;&amp;lt;/td&amp;gt;&lt;/PRE&gt;&lt;P&gt;I want the output to look something like this:&lt;/P&gt;&lt;P&gt;INACCESSIBLE&lt;/P&gt;&lt;P&gt;hostname:port&lt;/P&gt;&lt;P&gt;I have managed it with a chain of awk/sed commands, but it would be great if it could be done in a single command.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Allan&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2012 01:25:10 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506105#M640153</guid>
      <dc:creator>allanm77</dc:creator>
      <dc:date>2012-01-24T01:25:10Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506159#M640154</link>
      <description>&lt;P&gt;&amp;gt;Want the output something like this:&lt;/P&gt;&lt;P&gt;&amp;gt;hostname:port&lt;/P&gt;&lt;P&gt;Don't you want something like this?&lt;/P&gt;&lt;P&gt;hostname:port [I] [D] [SB]&lt;/P&gt;&lt;P&gt;Or do you just want to ignore everything inside the &amp;lt;a ...&amp;gt; ... &amp;lt;/a&amp;gt; blocks?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2012 04:40:03 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506159#M640154</guid>
      <dc:creator>Dennis Handly</dc:creator>
      <dc:date>2012-01-24T04:40:03Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506209#M640155</link>
      <description>Yes, that would do too, as long as the HTML works out and points to the right links.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Allan.</description>
      <pubDate>Tue, 24 Jan 2012 06:04:48 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5506209#M640155</guid>
      <dc:creator>allanm77</dc:creator>
      <dc:date>2012-01-24T06:04:48Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5507097#M640156</link>
      <description>&lt;P&gt;Dennis, Perl would do too, but sed/awk is preferable.&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2012 20:06:55 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5507097#M640156</guid>
      <dc:creator>allanm77</dc:creator>
      <dc:date>2012-01-24T20:06:55Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5507629#M640157</link>
      <description>&lt;P&gt;I'd advise against digging into HTML with regular expressions, in any language. You'll end up changing them every time the HTML changes. Been there, done that.&lt;/P&gt;&lt;P&gt;There are several fine ways to "convert" HTML to something else. If you just want text, use the proper tools (e.g. &lt;A href="http://hpux.connect.org.uk/hppd/hpux/Networking/WWW/lynx-2.8.7/" target="_blank"&gt;lynx&lt;/A&gt;):&lt;/P&gt;&lt;PRE&gt;$ lynx -dump file.html
$ lynx -dump http://some.host.com/index.htm&lt;/PRE&gt;&lt;P&gt;If you want to use a scripting language, &lt;A href="https://metacpan.org/module/WWW::Mechanize" target="_blank"&gt;WWW::Mechanize&lt;/A&gt;, &lt;A href="https://metacpan.org/module/LWP" target="_blank"&gt;LWP&lt;/A&gt;, &lt;A href="https://metacpan.org/module/LWP::Simple" target="_blank"&gt;LWP::Simple&lt;/A&gt;, &lt;A href="https://metacpan.org/module/LWP::UserAgent" target="_blank"&gt;LWP::UserAgent&lt;/A&gt;, and &lt;A href="https://metacpan.org/module/HTML::TreeBuilder" target="_blank"&gt;HTML::TreeBuilder&lt;/A&gt; are your friends in Perl:&lt;/P&gt;&lt;PRE&gt;#!/usr/bin/perl

use strict;
use warnings;

use LWP::Simple;
use HTML::TreeBuilder;

# get () returns undef if the page cannot be fetched
my $site = get ("http://h30499.www3.hp.com/t5/Languages-and-Scripting/bd-p/itrc-150") or die "Cannot fetch page\n";
my $tree = HTML::TreeBuilder-&amp;gt;new;
$tree-&amp;gt;parse_content ($site) or die "Cannot parse as HTML\n";

# Print the whole page formatted
print $tree-&amp;gt;as_HTML (undef, "  ", {});
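
# Hypothetical adaptation for the question above (a sketch, not part of
# the original script): print the text of each &amp;lt;td&amp;gt; cell while
# skipping the &amp;lt;a&amp;gt; ... &amp;lt;/a&amp;gt; links inside it. This assumes the
# fetched page uses the &amp;lt;td&amp;gt; layout shown in the question; the tree
# is left unmodified, so the &amp;lt;a&amp;gt; loop below still sees every link.
for my $td ($tree-&amp;gt;look_down (_tag =&amp;gt; "td")) {
    # Text children are plain strings; element children (e.g. &amp;lt;a&amp;gt;) are refs
    my $text = join "", grep { !ref } $td-&amp;gt;content_list;
    $text =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
    print "TD: $text\n";
    }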

# Print all &amp;lt;a&amp;gt; tags pointing to something with scripting in it
for ($tree-&amp;gt;look_down (_tag =&amp;gt; "a", href =&amp;gt; qr{scripting}i)) {
    print "A: ", $_-&amp;gt;as_text, "\t=&amp;gt; ", $_-&amp;gt;attr ("href"), "\n";
    }&lt;/PRE&gt;</description>
      <pubDate>Wed, 25 Jan 2012 08:17:17 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5507629#M640157</guid>
      <dc:creator>H.Merijn Brand (procura</dc:creator>
      <dc:date>2012-01-25T08:17:17Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5508627#M640158</link>
      <description>&lt;P&gt;Thanks, Merijn!&lt;/P&gt;&lt;P&gt;But the problem is that the wget call is part of a script, so a Perl one-liner or a sed/awk combination would be preferable.&lt;/P&gt;&lt;P&gt;Allan.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2012 19:44:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5508627#M640158</guid>
      <dc:creator>allanm77</dc:creator>
      <dc:date>2012-01-25T19:44:36Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5508835#M640159</link>
      <description>&lt;P&gt;I used sed to fix this:&lt;/P&gt;&lt;P&gt;sed 's/&amp;lt;[^&amp;gt;]*&amp;gt;//g'&lt;/P&gt;&lt;P&gt;The trailing g removes every tag on a line, not just the first one.&lt;/P&gt;&lt;P&gt;Thanks, all.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2012 23:29:36 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5508835#M640159</guid>
      <dc:creator>allanm77</dc:creator>
      <dc:date>2012-01-25T23:29:36Z</dc:date>
    </item>
    <item>
      <title>Re: sed/awk/perl help</title>
      <link>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5508847#M640160</link>
      <description>The "get ()" call in the Perl example does almost exactly what wget does. My Perl scriptlet is just an example; modify it to your heart's content.&lt;BR /&gt;One-liners to analyze HTML? Forget it! (It works once, but you'll end up with very, very long lines just to keep it on a single line.)</description>
      <pubDate>Wed, 25 Jan 2012 23:47:06 GMT</pubDate>
      <guid>https://community.hpe.com/t5/operating-system-hp-ux/sed-awk-perl-help/m-p/5508847#M640160</guid>
      <dc:creator>H.Merijn Brand (procura</dc:creator>
      <dc:date>2012-01-25T23:47:06Z</dc:date>
    </item>
  </channel>
</rss>

