Operating System - HP-UX

Re: script/cmd required for avoiding the lines containing a search word

 
SOLVED
Karthikeyan_5
Frequent Advisor

script/cmd required for avoiding the lines containing a search word

Hi All,

I want a script or command that will skip every line in my input file containing the word "mathi" on its own, but not lines where it is only part of a longer word such as "mathivanan" or "abcmathitest". My sample input file looks like this:

-------------------------------
1 mathi a bcdtest
2xyz mathivanan hopeso
3 hanry
4karthik s s mathi
5 Michael Schulte mathiabc
6 Pete Randall testmathi
7Klaas D. Eenkhoorn mathi
8john korterman aa mathi
9 Dave La Mar smathia
-------------------------------

After dropping the lines that contain exactly the word "mathi", I want the output to be:

-------------------------------
2xyz mathivanan hopeso
3 hanry
5 Michael Schulte mathiabc
6 Pete Randall testmathi
7Klaas D. Eenkhoorn mathi
9 Dave La Mar smathia
-------------------------------

I hope I made my requirement clear. I want to know how I can get this with "grep" or with any script that will do the job for me.

Thanks in advance.

Regards,
karthik
30 REPLIES
John Carr_2
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word



grep -v mathi filename > newfile

-v means: print only the lines that do not contain the given search string.

John.
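A quick check of what this does on Karthik's data (the sample lines are held in a shell variable, "sample", purely for the demonstration). Since grep matches the string "mathi" anywhere in a line, lines with longer words such as "mathivanan" are dropped too:

```sh
#!/bin/sh
# Karthik's sample input, kept in a variable only for this demonstration.
sample='1 mathi a bcdtest
2xyz mathivanan hopeso
3 hanry
4karthik s s mathi
5 Michael Schulte mathiabc
6 Pete Randall testmathi
7Klaas D. Eenkhoorn mathi
8john korterman aa mathi
9 Dave La Mar smathia'

# grep -v keeps only lines that do NOT contain "mathi" anywhere, so
# "mathivanan", "mathiabc", "testmathi" and "smathia" disappear as well.
printf '%s\n' "$sample" | grep -v mathi
# prints only: 3 hanry
```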
Mark Grant
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

perl -n -e 'if(not /\bmathi\b/){print}' testfile

But in your example, line 7 also contains the word "mathi", so presumably it should be dropped too.
Never precede any demonstration with anything more predictive than "watch this"
John Carr_2
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi

ignore my first grep; it removes every line that contains mathi anywhere. Use this instead:

grep -v " mathi " filename > newfile

:-)

John.
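This version needs a space on both sides of "mathi", so an occurrence at the very start or end of a line slips through. A small check on three of Karthik's lines:

```sh
#!/bin/sh
# Three sample lines: mathi in the middle, mathi at the end of the line,
# and mathi only as part of a longer word.
printf '%s\n' '1 mathi a bcdtest' '4karthik s s mathi' '2xyz mathivanan hopeso' |
  grep -v " mathi "
# Only the first line is removed; "4karthik s s mathi" survives because
# no space follows the final "mathi".
```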
john korterman
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi,
try the attached script, passing your input file as its first argument ($1).

regards,
John K
it would be nice if you always got a second chance
Hein van den Heuvel
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Please help me understand why you want line 7 in the output but not line 4. What sets them apart? Is this a problem with the forum stripping spaces? If so, I would recommend ATTACHING the sample input as a file as well as pasting it.

The grep solution should be:

grep -E -v '(^| )mathi( |$)' x

or, if you anticipate tabs as well as spaces:
grep -E -v '(^|[ ])mathi([ ]|$)' x
where each pair of brackets [] must contain both a space and a tab (\t, typed as ^V^I, i.e. Ctrl-V then Tab); the forum only displays the space.

I prefer the perl solution, though, because of its powerful "\b" word-boundary assertion.
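Typing a literal tab inside the brackets is easy to get wrong, and the forum hides it anyway. A sketch that builds the space-and-tab bracket expression with printf instead (the variable names "tab" and "ws" are only illustrative):

```sh
#!/bin/sh
# A real tab character; $(...) strips only trailing newlines, so the tab stays.
tab=$(printf '\t')
# Bracket expression matching either a space or a tab.
ws="[ $tab]"

# Same logic as the pattern above: mathi preceded by start-of-line or a
# blank, and followed by a blank or end-of-line. The \$ keeps the $ anchor
# literal inside the double quotes.
printf 'keep mathivanan\ndrop\tmathi\tline\ndrop mathi\nmathi\n' |
  grep -E -v "(^|$ws)mathi($ws|\$)"
# prints only: keep mathivanan
```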


Too bad you did not search for, or failed to find, the following threads, which discuss very similar problems:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=292228

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=284829

hth,
Hein.
Michael Schulte zur Sur
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi Karthik,

please read the statement made in the other thread about Hein's comment.

As far as the solution is concerned: according to your specification, line 7 shouldn't show up. I am attaching a modified version of the awk script I used in the other thread. Perhaps it helps you.
Run it with:
awk -f yourscriptname filetosearch

greetings,

Michael
Michael Schulte zur Sur
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Oops, where did the attachment go?

Michael
Karthikeyan_5
Frequent Advisor

Re: script/cmd required for avoiding the lines containing a search word

Hi All,

First, thanks for all of your help and responses.

Also, "line 7" shouldn't be in my sample output; sorry, that was a mistake.

Perl is not installed on our HP-UX server, so I couldn't try it out. I also couldn't use the scripts attached by "john korterman" and "Michael Schulte", because I am new to scripting. The command given by "Hein van den Heuvel", on the other hand, was very simple:

"grep -E -v '(^| )mathi( |$)' x"

But Hein, can you please tell me what (^| ) and ( |$) mean? I suppose they mean start and end, but why are the opening "(" and closing ")" parentheses required?

Also, I am very sorry, but my sample OUTPUT requirement has changed:

1) First, from the input file I should drop all the lines containing the word "mathi" on its own. The output is as follows:

-------------------------------
2xyz mathivanan hopeso
3 hanry
5 Michael Schulte mathiabc
6 Pete Randall testmathi
9 Dave La Mar smathia
-------------------------------

2) Second, from the above output it should further DELETE every other line (i.e. all EVEN-numbered lines of the above output). The sample output for this is:

-------------------------------
2xyz mathivanan hopeso
5 Michael Schulte mathiabc
9 Dave La Mar smathia
-------------------------------

3) Finally, in the above output, each second line should be appended to the line before it, separated by a SPACE. The sample output is as follows:

-------------------------------
2xyz mathivanan hopeso 5 Michael Schulte mathiabc
9 Dave La Mar smathia
-------------------------------

That's it. Thanks in advance.

Regards,
karthik
Jean-Louis Phelix
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi,

First, grep has a '-w' option that matches only whole "words", and it works in your case. So a simple script could be:

#!/usr/bin/sh
grep -vw mathi yourfile | awk '
{
line=$0
getline
getline
printf "%s %s\n", line, $0
getline
}'

The output should be what you need.

Regards.
It works for me (© Bill McNAMARA ...)
John Carr_2
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word


grep -w

I have not seen this before, and it's not in my version of 11.00. Can anyone clarify this option, please?

John.
Pete Randall
Outstanding Contributor

Re: script/cmd required for avoiding the lines containing a search word

John,

From "man grep" on 11i:

-w Select only those lines containing matches
that form whole words. The test is that the
matching substring must either be at the
beginning of the line, or preceded by a non-
word constituent character. Similarly, it
must be either at the end of the line or
followed by a non-word constituent character.
Word-constituent characters are letters,
digits, and the underscore.
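For a grep without -w (John reports it missing on 11.00), the rule quoted above can be spelled out by hand: "mathi" must be preceded by start-of-line or a non-word character, and followed by a non-word character or end-of-line, where word characters are letters, digits and underscore. A sketch, assuming an ERE-capable grep -E; the variable name "nw" is illustrative, and LC_ALL=C is set so the ranges mean plain ASCII:

```sh
#!/bin/sh
LC_ALL=C; export LC_ALL   # make A-Z and a-z mean plain ASCII letters
# One non-word-constituent character: not a letter, digit or underscore.
nw='[^A-Za-z0-9_]'

# Emulates "grep -vw mathi": drop lines where mathi is a whole word.
printf '%s\n' '1 mathi a bcdtest' 'x.mathi.y' 'mathi_1' '6 Pete Randall testmathi' |
  grep -E -v "(^|$nw)mathi($nw|\$)"
# prints:
# mathi_1
# 6 Pete Randall testmathi
```

Unlike a space-only pattern, this also drops a line such as "x.mathi.y", which matches the -w description, while "mathi_1" is kept because the underscore counts as a word character.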


Pete
Jean-Louis Phelix
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi,

I think it comes with 11.11.

From grep(1), HP-UX Release 11i, November 2000 (Hewlett-Packard Company):

-w Select only those lines containing matches
that form whole words. The test is that the
matching substring must either be at the
beginning of the line, or preceded by a non-
word constituent character. Similarly, it
must be either at the end of the line or
followed by a non-word constituent character.
Word-constituent characters are letters,
digits, and the underscore.
It works for me (© Bill McNAMARA ...)
Michael Schulte zur Sur
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi Karthik,

(^| )mathi( |$)

It means that mathi has to be at the start of the line or preceded by a space, and followed by a space or the end of the line.
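On the question of why the parentheses are needed: in an extended regular expression, | has the lowest precedence, so without grouping it splits the whole pattern instead of choosing only between ^ and a space. A small demonstration using grep -c, which counts matching lines:

```sh
#!/bin/sh
# Grouped: | only chooses between ^ and " " (and between " " and $).
printf '%s\n' 'x mathi y' 'mathivanan' | grep -E -c '(^| )mathi( |$)'
# -> 1 (only "x mathi y" matches)

# Ungrouped: the alternatives become "^", " mathi " and "$". A bare ^
# matches every line, so every line counts, and grep -v would print nothing.
printf '%s\n' 'x mathi y' 'mathivanan' | grep -E -c '^| mathi |$'
# -> 2
```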

greetings,

Michael
John Carr_2
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Can you spare a little time to assign points to all the people who gave up their time to help you? :-)

company name:
country: india
personal quote:
certification:
ITRC member since: May 28, 2003
last contribution date: December 15, 2003
I have assigned points to 9 of 86 responses to my questions.
Michael Schulte zur Sur
Honored Contributor
Solution

Re: script/cmd required for avoiding the lines containing a search word

Hi Karthik,

What about this attachment? Put it in a file, let's say kart.awk, then run:
awk -f kart.awk filetogrep

greetings,

Michael
Karthikeyan_5
Frequent Advisor

Re: script/cmd required for avoiding the lines containing a search word

Hi All,

First of all, thanks for all of your responses.

Jean-Louis Phelix:

I am using HP-UX 10.2, so I think the "-w" option of "grep" is ruled out.

Michael Schulte:

----------------------
(^| )mathi( |$)

It means, mathi is supposed to be at the start of a line or preceded by a space and followed by a space or end of line.
----------------------
Assume that in my case the word MATHI is always preceded and followed by a space, and is never at the start or end of a line. In that case, if I put ' mathi ', i.e.:

grep -v ' mathi ' filename

I am not getting the required output. Can you please explain? Also, as I mentioned in my previous post, I am new to scripting, and the script you gave is somewhat confusing for me because I am not familiar with scripting. Sorry, my dear...

John Carr:

I'll surely assign points to those who have helped me. Assigning points only comes into the picture at the end of the topic, right? So why are you hurrying, my dear friend? As I mentioned in my later post, my output requirement has changed: only one part of it is complete, and the other 2 parts remain.

Waiting to see all of your responses.

regards,
karthik
Jean-Louis Phelix
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi,

OK, without the -w option, this one could work:

#!/usr/bin/sh
grep -v -E '^mathi | mathi | mathi$' /tmp/a | awk '
{
line=$0
getline
getline
printf "%s %s\n", line, $0
getline
}'

Regards.
It works for me (© Bill McNAMARA ...)
Michael Schulte zur Sur
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi Karthik,

you don't need to change Hein's grep. grep tries every combination of what you give it, and grep -E -v '(^| )mathi( |$)' is equivalent to:
grep -E -v '^mathi ' or
grep -E -v ' mathi$' or
grep -E -v '^mathi$' or
grep -E -v ' mathi '.

But grep is not sufficient here.
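The equivalence can be checked directly. Passing the four combinations as separate -e patterns drops the same lines as the grouped pattern, because with -v grep discards a line as soon as any one pattern matches it. The function names "grouped" and "listed" are illustrative:

```sh
#!/bin/sh
# Hein's grouped pattern.
grouped() { grep -E -v '(^| )mathi( |$)'; }
# The four combinations listed above, given as separate basic patterns.
listed() { grep -v -e '^mathi ' -e ' mathi$' -e '^mathi$' -e ' mathi '; }

input='mathi at the start
at the end mathi
mathi
in the mathi middle
mathivanan stays'

printf '%s\n' "$input" | grouped   # prints: mathivanan stays
printf '%s\n' "$input" | listed    # prints: mathivanan stays
```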

Have you tried the awk script in my attachment?

Michael

Elmar P. Kolkman
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

If you use Hein's grep, getting only the even line numbers can be done by using awk:
grep -Ev '(^| )mathi( |$)' input | awk 'NR%2==0'

For the odd lines, you would change the 0 to a 1.

As for part 3, you should then change the awk:
awk 'NR%4==0 { printf "%s",$0 }
NR%4==2 {printf " %s\n",$0}
END { if (NR%4 < 2) {print}}
'
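Karthik's part 2 keeps the odd-numbered lines (1, 3, 5, ...), whereas the NR%4 rules above work on the even-numbered ones. A sketch that gives each of his three steps its own stage, run on his sample (the function name "karthik_steps" and the "sample" variable are illustrative stand-ins for his script and input file):

```sh
#!/bin/sh
sample='1 mathi a bcdtest
2xyz mathivanan hopeso
3 hanry
4karthik s s mathi
5 Michael Schulte mathiabc
6 Pete Randall testmathi
7Klaas D. Eenkhoorn mathi
8john korterman aa mathi
9 Dave La Mar smathia'

karthik_steps() {
  # Part 1: drop lines where mathi is a space-separated word (Hein's grep).
  # Part 2: keep lines 1, 3, 5, ... of what is left.
  # Part 3: join those lines in pairs with a space; print an unpaired
  #         last line on its own.
  grep -E -v '(^| )mathi( |$)' |
    awk 'NR % 2 == 1' |
    awk 'NR % 2 == 1 { held = $0; next }
         { print held " " $0 }
         END { if (NR % 2 == 1) print held }'
}

printf '%s\n' "$sample" | karthik_steps
# prints:
# 2xyz mathivanan hopeso 5 Michael Schulte mathiabc
# 9 Dave La Mar smathia
```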

Every problem has at least one solution. Only some solutions are harder to find.
Karthikeyan_5
Frequent Advisor

Re: script/cmd required for avoiding the lines containing a search word

Hi All,

I am very sorry for responding so late.

Jean-Louis Phelix:

I tried your script, but the output is not what I wanted. Still, I am interested to know what exactly it does, i.e. a description of your script.

Michael Schulte:

I tried your awk script, and its output is exactly what I wanted. But as I am a beginner at scripting, I am more interested in the STEP-by-STEP details of your AWK script, i.e. an explanation of how it works.

In your previous post, you stated that grep -E -v '(^| )mathi( |$)' is equivalent to:
......
......
......
grep -E -v ' mathi '.

In my case, the above code should be enough to get my first output, right? But when I try executing the following command:

grep -E -v ' mathi ' filename

I get the complete contents of the input file. Why is this happening?

Elmar P. Kolkman:
--------------------------------------------
As for part 3, you should then change the awk:
awk 'NR%4==0 { printf "%s",$0 }
NR%4==2 {printf " %s\n",$0}
END { if (NR%4 < 2) {print}}
'
--------------------------------------------
Up to the first 2 parts it's working, but for the third part, when I tried your script above, it gave an error. Can you make any necessary changes and attach the script as a file (something like test.awk), so that I can run it with

$awk -f test.awk input

Also, I would like an explanation of the 3rd-part script alone.

At the end I'll surely assign points, so please be patient.

regards,
karthik
Jean-Louis Phelix
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi

Strange... I cut and pasted your data file, and:

$ cat /tmp/b
#!/usr/bin/sh
grep -v -E '^mathi | mathi | mathi$' /tmp/a | awk '
{
line=$0
getline
getline
printf "%s %s\n", line, $0
getline
}'
$ sh /tmp/b
2xyz mathivanan hopeso 5 Michael Schulte mathiabc
9 Dave La Mar smathia
$

I assume that your data are in /tmp/a ...

grep -v -E '^mathi | mathi | mathi$' means: suppress lines beginning with 'mathi ', containing ' mathi ', or ending with ' mathi'. (A line consisting of nothing but 'mathi' is not caught by any of the three.)

In awk :

line=$0   saves the current record (record 1) in the variable line
getline   reads the next record (record 2)
getline   reads the next record again, overwriting record 2 with record 3
printf    prints line (record 1) and the current record (record 3)
getline   reads the next record (record 4) to skip it; the loop then continues with record 5
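One detail in this loop: when the input runs out in the middle of a group, getline returns 0, and $0 is then not something to rely on (in awks that leave it unchanged, such as gawk, the last line would be printed twice). A variant that checks getline's return value, shown on five numbered test lines; the function name "pair_groups" is illustrative:

```sh
#!/bin/sh
pair_groups() {
  awk '
  {
      first = $0                                  # record 1 of a group of four
      if ((getline) <= 0) { print first; next }   # read record 2 (discarded); at end of input, flush first
      if ((getline) <= 0) { print first; next }   # read record 3 (kept); same end-of-input check
      printf "%s %s\n", first, $0                 # records 1 and 3 on one line
      getline                                     # read record 4 so it is skipped
  }'
}

printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' | pair_groups
# prints:
# line 1 line 3
# line 5
```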

Regards.
It works for me (© Bill McNAMARA ...)
Michael Schulte zur Sur
Honored Contributor

Re: script/cmd required for avoiding the lines containing a search word

Hi Karthik,

let me explain what my script does.
The first if catches any combination of mathi: the hat (^) means start of string, .* means any number of characters, the space in the brackets followed by + means one or more spaces, and $ means end of string.
This boils down to finding mathi surrounded by one or more spaces on either or both sides, or mathi alone on a line, and ignoring that line (next). If I find a suitable line and it is the first line of a pair (SL, "second line", is 0), I print it without a newline and discard the next line; otherwise I print it with a newline and discard the next line.

Well, I could have moved the getline after the if, since both branches have it, but...
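A sketch of the same idea as a file for awk -f, run on Karthik's sample. It uses the regular expression described above (start of line or one or more spaces before mathi, one or more spaces or end of line after it), but counts the kept lines instead of discarding the next input line with getline. It is not Michael's attachment, which does not appear in the thread; the file name kart.awk and the scratch directory are assumptions:

```sh
#!/bin/sh
sample='1 mathi a bcdtest
2xyz mathivanan hopeso
3 hanry
4karthik s s mathi
5 Michael Schulte mathiabc
6 Pete Randall testmathi
7Klaas D. Eenkhoorn mathi
8john korterman aa mathi
9 Dave La Mar smathia'

dir=${TMPDIR:-/tmp}/kart.$$            # scratch directory for the demo
mkdir "$dir" || exit 1
printf '%s\n' "$sample" > "$dir/input"

cat > "$dir/kart.awk" <<'EOF'
# Part 1: skip lines where mathi stands alone: at the start of the line or
# after one or more spaces, and at the end or before one or more spaces.
/^(.*[ ]+)?mathi([ ]+.*)?$/ { next }
{
    kept++
    if (kept % 2 == 0) next          # Part 2: drop even-numbered kept lines
    if (pending) {                   # Part 3: second line of a pair
        print held " " $0
        pending = 0
    } else {                         # first line of a pair: hold it
        held = $0
        pending = 1
    }
}
END { if (pending) print held }      # an unpaired last line
EOF

result=$(awk -f "$dir/kart.awk" "$dir/input")
printf '%s\n' "$result"
rm -rf "$dir"
# prints:
# 2xyz mathivanan hopeso 5 Michael Schulte mathiabc
# 9 Dave La Mar smathia
```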

I hope I could make it clear to you,

Michael
Karthikeyan_5
Frequent Advisor

Re: script/cmd required for avoiding the lines containing a search word

Hi Jean-Louis & Michael Schulte,

Thanks for your responses with explanations.

Jean-Louis:

This time I just copied and pasted your script, and it's working fine.

But in my case the search string is " mathi " (i.e. with a space before and after the string). In this case, if we use the following grep command:

grep -v -E ' mathi ' /tmp/a

that should be enough, right? But when I execute the above GREP command, I get the complete input file contents as my output, i.e. the same as the "cat" command.

Can you please explain why this is happening?

Michael Schulte:

You have not said anything about the following, which I asked in my previous post:

----------------------
In your previous post, you stated that grep -E -v '(^| )mathi( |$)' is equivalent to:
......
......
......
grep -E -v ' mathi '.

In my case, the above code should be enough to get my first output, right? But when I try executing the following command:

grep -E -v ' mathi ' filename

I get the complete contents of the input file. Why is this happening?
----------------------

Still, I don't want to delay assigning points.

Also, can you please give me links to very GOOD scripting docs (Unix shell scripting, awk and sed) that start from the basics, so that I can learn them, preferably in printable format? Or if you have one, can you please attach it?

Waiting to see your replies.

regards,
karthik