Operating System - HP-UX

Luk Vandenbussche
Honored Contributor

Script to select text from a line

Hi,

I have the following question.
I have a file with several lines in it.
Each line contains the address of a website, but not at a fixed position. I need a script or command to extract this address.

For example, the file contains:

aeratarawww.hp.comqmlkmlk
qdfdwww.google.commmkkm
arewww.gmail.com kmkùkù

I need only the following as output:

www.hp.com
www.google.com
www.gmail.com

Please advise
Masatake Hanayama
Trusted Contributor

Re: Script to select text from a line

The following is not perfect, but may work for the example.

sed -e 's/^.*www\./www\./' -e 's/\.com.*$/\.com/' filename
Dennis Handly
Acclaimed Contributor

Re: Script to select text from a line

If they start with www. and end in .com you can use:
sed -e 's/.*\(www\..*\.com\).*$/\1/' filename
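
A sed substitution like this passes lines without a match through unchanged. The sketch below is one possible variant that prints only lines where a site was found, using -n together with the p flag. It assumes a single label with no dots between "www." and ".com"; the function name www_com_only is made up for illustration.

```shell
# Hypothetical helper: print only the www.<label>.com part of each line,
# and print nothing for lines that contain no such string.
# Assumption: exactly one dot-free label sits between "www." and ".com".
www_com_only() {
    # -n suppresses default output; the trailing p prints only when the
    # substitution actually matched.
    sed -n 's/.*\(www\.[^.]*\.com\).*/\1/p' "$@"
}
```

It can be called as "www_com_only filename" or fed from a pipe.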
john korterman
Honored Contributor

Re: Script to select text from a line

Hi Luk,

try something like this, using your file as $1:

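#!/usr/bin/sh
# Usage: pass the input file as $1
while read -r line
do
    STRING1=${line%%www.*}          # everything before the first "www."
    STRING2=${line#"$STRING1"}      # the line with that leading text removed
    STRING3=${STRING2##*.com}       # everything after the last ".com"
    echo "${STRING2%"$STRING3"}"    # strip the trailing text, leaving www...com
done < "$1"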



regards,
John K.
it would be nice if you always got a second chance
Rasheed Tamton
Honored Contributor

Re: Script to select text from a line

Hi,

# awk 'BEGIN {FS="."} {printf "www.%s.com\n", $2}' filename
www.hp.com
www.google.com
www.gmail.com

Rasheed Tamton
Honored Contributor

Re: Script to select text from a line

Hi,

With more error checking.

awk -F"." '/www\./ && /\.com/ {print "www."$2".com"}' filename


If the file will definitely contain the strings in the form "www.xxx.com", then here is a simpler version:

awk -F"." '{print "www."$2".com"}' filename
Hein van den Heuvel
Honored Contributor

Re: Script to select text from a line

Luk, what problem are you really trying to solve? If the data comes from a particular file (index.dat) or log, then others may have dealt with that before.

I find it unlikely that the www just starts after random text and the .com runs into random text, as your example suggests. Is there really no separator of any sort (whitespace?) Is there a field length/offset indicator elsewhere in the structure?

If the text is all we have, are the www, the .com, and a single field between them a complete description, or does the code also have to trigger on .org, .net and so on?

Cheers,
Hein.
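
If the answer turns out to be "www., one field, and a short list of endings", a rough sketch along these lines might do. It uses awk's match() function, which sets RSTART and RLENGTH to the leftmost-longest match. The ending list (com, org, net) and the function name extract_hosts are assumptions for illustration, not something given in the original question.

```shell
# Hypothetical helper: print the first www.<label>.<tld> found on each line.
# Assumptions: one label without dots or spaces, and a TLD of com, org or net.
extract_hosts() {
    awk '{
        # match() returns non-zero on success and sets RSTART/RLENGTH
        if (match($0, /www\.[^. ]+\.(com|org|net)/))
            print substr($0, RSTART, RLENGTH)
    }' "$@"
}
```

Lines without a match are skipped silently; extending the alternation covers further endings.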

spex
Honored Contributor

Re: Script to select text from a line

Hi Luk,

This sed script is a bit more generic than some of the other solutions:

$ cat urls
aeratarawww.hp.comqmlkmlk
qdfdwww.google.commmkkm
arewww.gmail.com kmkyky
jkddswww.hello.orgafsdjk
wjekwjws1.hp.netfds7u8
weewwemail.dsdsdswww.netsdfjkldskdsj
wwewuiweweb.ieee.orgfdjkdjdk
www.ibm.com

$ sed -e 's/^.*\(...\..*\....\).*$/\1/' urls
www.hp.com
www.google.com
www.gmail.com
www.hello.org
ws1.hp.net
ail.dsdsdswww.net
web.ieee.org
www.ibm.com

PCS
Luk Vandenbussche
Honored Contributor

Re: Script to select text from a line

Thanks for the help