Re: sed/awk/perl help
01-23-2012 05:14 PM - edited 01-23-2012 05:25 PM
Hi All,
I have the following piece of output from a wget command, and I want to strip the HTML tags.
<td>INACCESSIBLE</td> <td>hostname:port <a title="inspect" href="dest/hostname:port">[I]</a> <a title="debugger" href="http://hostname:port/debug">[D]</a> <a title="browser" href="http://hostname/Browser.jsp/Url=http%3A//hostname%3A8501/oa/ww&serviceUrl=http%3A//hostname%3A1port/service">[SB]</a></td>
I want the output to look something like this:
INACCESSIBLE
hostname:port
I have been successful with a chain of awk/sed commands, but it would be great if it could be done in a single command.
Thanks,
Allan
01-23-2012 08:38 PM - edited 01-23-2012 08:40 PM
>Want the output something like this:
>hostname:port
Don't you want something like this?
hostname:port [I] [D] [SB]
Or do you just want to ignore anything in the <a ...> ... </a> blocks?
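The two readings of the question can be sketched with plain sed. This is only an illustration under assumptions: the function names `strip_tags` and `drop_links` are made up here, the sample is the single line quoted in the original post, and the patterns assume each tag and each `<a ...> ... </a>` block sits on one line with no nested markup. It is likely to break as soon as the HTML changes shape.

```shell
#!/bin/sh
# Sample line copied from the original post; in the real script it would come from wget.
sample='<td>INACCESSIBLE</td> <td>hostname:port <a title="inspect" href="dest/hostname:port">[I]</a> <a title="debugger" href="http://hostname:port/debug">[D]</a> <a title="browser" href="http://hostname/Browser.jsp/Url=http%3A//hostname%3A8501/oa/ww&serviceUrl=http%3A//hostname%3A1port/service">[SB]</a></td>'

# Reading 1 (hypothetical helper): remove every tag but keep the link text.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# Reading 2 (hypothetical helper): drop whole <a ...>...</a> blocks first,
# then remove the remaining tags and any trailing blanks left behind.
drop_links() {
    sed -e 's/<a [^>]*>[^<]*<\/a>//g' \
        -e 's/<[^>]*>//g' \
        -e 's/[ ]*$//'
}

printf '%s\n' "$sample" | strip_tags   # INACCESSIBLE hostname:port [I] [D] [SB]
printf '%s\n' "$sample" | drop_links   # INACCESSIBLE hostname:port
```

Both variants keep the two cells on one line; splitting them onto separate lines needs one more step.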
01-23-2012 10:04 PM
Thanks,
Allan.
01-24-2012 12:03 PM - edited 01-24-2012 12:06 PM
Dennis, Perl would do too, but sed/awk is preferable.
thx.
01-25-2012 12:17 AM
I'd advise against digging into HTML with regular expressions (in whatever language). You'll end up changing them every time the HTML changes. Been there, done that.
There are several fine ways to "convert" HTML to something else. If you want just text, use the proper tools (e.g. lynx):
    $ lynx -dump file.html
    $ lynx -dump http://some.host.com/index.htm
If you want to use a scripting language, WWW::Mechanize, LWP, LWP::Simple, LWP::UserAgent, and HTML::TreeBuilder are your friends in Perl:
    #!/usr/bin/perl
    use strict;
    use warnings;

    use LWP::Simple;
    use HTML::TreeBuilder;

    my $site = get ("http://h30499.www3.hp.com/t5/Languages-and-Scripting/bd-p/itrc-150");
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content ($site) or die "Cannot parse as HTML\n";

    # Print the whole page formatted
    print $tree->as_HTML (undef, " ", {});

    # Print all <a> tags pointing to something with scripting in it
    for ($tree->look_down (_tag => "a", href => qr{scripting}i)) {
        print "A: ", $_->as_text, "\t=> ", $_->attr ("href"), "\n";
    }
01-25-2012 11:44 AM
Thanks Merjin!
But the problem is that the wget is part of a script, so a Perl one-liner or a sed/awk combination would be preferable.
Allan.
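For what it's worth, the single command being asked for could look roughly like the awk sketch below. It is not a general HTML parser. It assumes every input line has exactly the shape of the sample in the first post (two `<td>` cells, with the wanted value being the first word of the second cell), and the function name `two_cells` is made up for illustration.

```shell
#!/bin/sh
# Sample line copied from the first post; in the real script it would come from wget.
sample='<td>INACCESSIBLE</td> <td>hostname:port <a title="inspect" href="dest/hostname:port">[I]</a> <a title="debugger" href="http://hostname:port/debug">[D]</a> <a title="browser" href="http://hostname/Browser.jsp/Url=http%3A//hostname%3A8501/oa/ww&serviceUrl=http%3A//hostname%3A1port/service">[SB]</a></td>'

# Hypothetical helper: split each line on <td> and </td>, so that
#   $2 is the content of the first cell and $4 the content of the second.
# split() on " " then isolates the first whitespace-delimited word of the
# second cell, which drops the <a ...> ... </a> blocks that follow it.
two_cells() {
    awk -F'</?td>' '{ split($4, a, " "); print $2; print a[1] }'
}

printf '%s\n' "$sample" | two_cells
# INACCESSIBLE
# hostname:port
```

Inside the script, the wget output would be piped into the same awk command in place of the `printf`. Because the field positions are hard-coded, any extra cell or line break in the HTML will silently produce wrong output.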
01-25-2012 03:29 PM
01-25-2012 03:47 PM
One-liners to analyze HTML? Forget it! It works once, but you'll end up with very, very long lines if you insist on sticking to a single line.