Kathleen
Regular Advisor

Script help

I am trying to get html tags removed from a text document.

I have a script I am using and it works fine unless the html tag runs across multiple lines....ie

name="PostalCode"/>

How do I make the statement join the lines so it treats them as one, or continue after finding the first < until it finds the closing > and delete those and everything in between? To get it to work on one line, I am using
sed "s/<[^>]*>/ /g"
it works great. Just not when multiple lines are involved. Please help!
10 REPLIES 10
Sridhar Bhaskarla
Honored Contributor

Re: Script help

Hi,

Did you try deletion like this.

sed '/\/d' data

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
S.K. Chan
Honored Contributor
Solution

Re: Script help

Try this instead ..
sed -e :a -e 's/<[^>]*>//g;/
I did a quick test and it seems to work.
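The command above is cut off in this copy of the thread, so here is a minimal sketch of the usual sed idiom for tags that span lines, written out in long form with comments. It is an assumption about what was intended, not a recovery of the exact command posted; the function name strip_tags is made up for illustration.

```shell
# strip_tags (hypothetical name): remove <...> tags, including tags
# that start on one line and end on a later one.
strip_tags() {
    sed '
# Label used as the loop target.
:a
# Remove every complete tag currently in the pattern space.
s/<[^>]*>//g
# If a "<" is left over, the tag is still open:
# append the next input line and go back to the label.
/</{
N
ba
}' "$@"
}

# Example: a tag split across two lines.
printf '%s\n' 'Zip: <input type="text"' 'name="PostalCode"/> 12345' | strip_tags
```

The loop works because [^>] also matches the newline that N puts into the pattern space, so the substitution can reach from the < on one line to the > on the next.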
Kathleen
Regular Advisor

Re: Script help

I tried both. I am still seeing some of the info in my text file. Here is one example I am trying with also.




When I try the second script, it does remove the

Thanks
S.K. Chan
Honored Contributor

Re: Script help

I cut and pasted what you had into a file "test", added a few more lines to test it, and it worked for me.
# cat test|sed -e :a -e 's/<[^>]*>//g;/
Are you sure it's not a typo?
Kathleen
Regular Advisor

Re: Script help

I am not sure what is going wrong if yours is working. Here is what I have...including first the output, the text file I have, and then the script.

[/tmp] # cat kmv_test| /tmp/html_kmv2
a:hover { color: #990000; text-decoration: none }
a:visited { color: navy }
a:visited:hover{ color: #990000; text-decoration: none } -->
[/tmp] # cat kmv_test


[/tmp] # cat /tmp/html_kmv2
#!/bin/ksh

while read line
do
echo $line |sed -e :a -e 's/<[^>]*>//g;/
done
#
exit 0
Sridhar Bhaskarla
Honored Contributor

Re: Script help

Kathleen,

I missed your idea.

Credit goes to Chan and his script does work.

The problem with your approach is that the 'while' loop hands sed one line at a time, so sed never sees the rest of a tag that continues on the next line.

You have to apply the sed statement to the entire file at once.


-Sri
You may be disappointed if you fail, but you are doomed if you don't try
S.K. Chan
Honored Contributor

Re: Script help

That's because you're reading the file in line by line. The sed statement I gave processes the whole file at one go.
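To make the difference concrete, here is a small sketch that runs the same sed program both ways on a two-line sample. The sed program is the commonly used form of this idiom and is an assumption, since the command is cut off in this copy of the thread; the sample text and variable names are made up.

```shell
# Assumed sed program: strip tags, pulling in the next line while a "<" is open.
strip='s/<[^>]*>//g;/</N;//ba'

tmp=$(mktemp)
printf '%s\n' 'A <b' 'class="x">B' > "$tmp"

# Line by line: each sed process gets a single line, so N has no
# next line to append and the split tag is never completed.
per_line=$(while IFS= read -r line; do
    printf '%s\n' "$line" | sed -e :a -e "$strip"
done < "$tmp")

# Whole file: one sed process reads every line, so N can join them.
whole=$(sed -e :a -e "$strip" "$tmp")

rm -f "$tmp"

printf 'line by line:\n%s\n' "$per_line"
printf 'whole file:\n%s\n' "$whole"
```

What the line-by-line run prints for the unfinished tag depends on the sed implementation (some print the partial line, some drop it), but in neither case is the tag removed.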
Kathleen
Regular Advisor

Re: Script help

Ok. Just so you know...I will get points assigned and I REALLY appreciate the help. My question is, how do I get this to work if I am reading in a whole file (not by the command line but as a converter script)...without using a while statement? Thanks again for all of the help.
S.K. Chan
Honored Contributor

Re: Script help

Something simple like this.. the script takes in a single file and processes it. If you want to capture the processed file just redirect it like so ..
# ./myscript datafile > new-datafile

The "myscript" script ..

#!/bin/ksh

[[ $# != 1 ]] && { echo "Usage: $0 ";exit 1; }

cat $1|sed -e :a -e 's/<[^>]*>//g;/
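Since the last line is cut off in this copy, here is a hedged sketch of the same idea as a function: check for exactly one argument, then let a single sed process read the whole file. The sed program is the commonly used form of this idiom and is assumed, not recovered from the post; the name strip_html_file is made up.

```shell
# strip_html_file (hypothetical name): print FILE with <...> tags removed.
strip_html_file() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: strip_html_file FILE" >&2
        return 1
    fi
    # Passing the file name to sed directly avoids the extra cat, and
    # one sed process sees all lines, so multi-line tags can be joined.
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba' "$1"
}
```

It would be used the same way as the script, for example: strip_html_file datafile > new-datafile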