
Re: Shell script.

 
LEJARRE Patrick
Advisor

Shell script.

I would like to extract from an input text file the lines that match the following specifications:
The extracted lines should contain a given string
AND
The line before the first occurrence in the input file should also be extracted
AND
All the extracted lines should be deleted from the input file.
The result should be placed into an output text file.

I think that an "awk" guru could help me best.
Andreas Voss
Honored Contributor
Solution

Re: Shell script.

Hi,

here you are:

awk '{
if($0 ~ "pattern")
{
print $0;
print prev;
}
prev=$0;
}'

You have to modify "pattern" for your requirements.

Cheers

Andrew
Andreas Voss
Honored Contributor

Re: Shell script.

Sorry, a little mistake in the output order.
Instead of:
print $0
print prev
use:
print prev
print $0
LEJARRE Patrick
Advisor

Re: Shell script.

hi
your answer helped me a lot but the lines containing the search string are printed twice:

awk '{if ($0 ~ "pattern") {print prev;print $0;} prev=$0;}' inputfile > outputfile

I solved the problem with the following additional commands:

head -1 outputfile > outputfile1
line=`wc -l outputfile | awk '{print $1}'`
line=`expr $line - 1`
tail -$line outputfile | sort | uniq > outputfile2
cat outputfile1 outputfile2 > outputfile

I don't know whether it is possible to do it within the awk command, but it works perfectly.

Thanks.
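
On the question of doing it within awk: the duplicates appear when two matching lines are adjacent, because the first one is printed again as "prev" of the second. A minimal sketch of an awk-only variant follows. It prints the line before the first match once, then every matching line once, in file order. The sample data, the pattern "foo", the flag name "seen" and the scratch directory are assumptions made for illustration, not part of the original posts.

```shell
#!/bin/sh
# Scratch directory and sample input (hypothetical names, for illustration only).
workdir=${TMPDIR:-/tmp}/extract_demo.$$
mkdir -p "$workdir"
printf '%s\n' 'alpha' 'beta' 'foo one' 'foo two' 'gamma' 'foo three' > "$workdir/inputfile"

awk -v pat='foo' '
    $0 ~ pat {
        # Print the preceding line only for the first match,
        # and only if a preceding line exists.
        if (!seen && NR > 1) print prev
        seen = 1
        print
    }
    { prev = $0 }
' "$workdir/inputfile" > "$workdir/outputfile"

cat "$workdir/outputfile"
```

Because nothing is printed twice, the head/tail/sort/uniq post-processing should no longer be needed, and the extracted lines keep their original order instead of being sorted.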
LEJARRE Patrick
Advisor

Re: Shell script.

The initial goals aren't fulfilled yet.
The extracted lines should be deleted from the input file.
LEJARRE Patrick
Advisor

Re: Shell script.

Hi Andrew !
The solution is below.

Take a look at the -v option of awk to assign a parameter to be passed to the command.

Pat

---------------------------------------------
Shell script
---------------------------------------------

#!/sbin/sh

# Initialisation
search_string=$1
inputfile=$2
outputfile=$3
outputfile1=${outputfile}_tmp1
outputfile2=${outputfile}_tmp2

# Main
awk -v search_string=$search_string '{if ($0 ~ search_string) {print prev;print $0;} prev=$0;}' $inputfile > $outputfile

head -1 ${outputfile} > $outputfile1
nb_lines=`wc -l ${outputfile} | awk '{print $1}'`
nb_lines=`expr $nb_lines - 1`

tail -$nb_lines ${outputfile} | sort | uniq > $outputfile2
cat $outputfile1 $outputfile2 > ${outputfile}

sed -e 's/ /?/g' ${outputfile} > ${outputfile}$$
mv ${outputfile}$$ ${outputfile}
sed -e 's/"/@/g' ${outputfile} > ${outputfile}$$
mv ${outputfile}$$ ${outputfile}

sed -e 's/ /?/g' ${inputfile} > ${inputfile}$$
mv ${inputfile}$$ ${inputfile}
sed -e 's/"/@/g' ${inputfile} > ${inputfile}$$
mv ${inputfile}$$ ${inputfile}

ligne=`head -1 ${outputfile}`

grep -v "$ligne" ${inputfile} > ${inputfile}$$
mv ${inputfile}$$ ${inputfile}

grep -v "$search_string" ${inputfile} > ${inputfile}$$
mv ${inputfile}$$ ${inputfile}

sed -e 's/?/ /g' ${outputfile} > ${outputfile}$$
mv ${outputfile}$$ ${outputfile}
sed -e 's/@/"/g' ${outputfile} > ${outputfile}$$
mv ${outputfile}$$ ${outputfile}

sed -e 's/?/ /g' ${inputfile} > ${inputfile}$$
mv ${inputfile}$$ ${inputfile}
sed -e 's/@/"/g' ${inputfile} > ${inputfile}$$
mv ${inputfile}$$ ${inputfile}
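
The -v option mentioned above can be tried in isolation. This is only a sketch with made-up input; the one thing it adds to the script is double quotes around the shell variable, which I assume is wanted when the search string may contain spaces.

```shell
#!/bin/sh
# Hypothetical search string containing a space.
search_string='disk full'

# -v assigns the awk variable before any input is read.
# Quoting "$search_string" keeps the value as a single word for the shell.
result=$(printf '%s\n' 'ok' 'error: disk full' 'ok again' |
    awk -v search_string="$search_string" '$0 ~ search_string { print NR ": " $0 }')

echo "$result"
```

Inside awk the value is used as a regular expression by the ~ operator, so characters such as "." or "*" in the search string keep their regex meaning.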
James R. Ferguson
Acclaimed Contributor

Re: Shell script.

Pat:

Let's use Andreas' suggestion amended as follows:

awk '{
if ($0 ~ /pattern/)
{
print $0 >> "/tmp/file.1";
print prev >> "/tmp/file.1";
}
else
{
print $0 >> "/tmp/file.2";
prev=$0;
}
}' /tmp/file

I think you will find that /tmp/file.2 is the input file stripped according to your mandate.

...JRF...
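
For comparison, here is a sketch of a single-pass split that takes only the line before the first occurrence, as the original question asks. Each non-matching line is held back for one step, since it may still turn out to be the line before the first match. The sample data, the pattern "foo" and the names extracted, kept, seen and have_prev are assumptions for illustration.

```shell
#!/bin/sh
# Scratch directory and sample input (hypothetical names, for illustration only).
workdir=${TMPDIR:-/tmp}/split_demo.$$
mkdir -p "$workdir"
printf '%s\n' 'alpha' 'beta' 'foo one' 'gamma' 'foo two' 'delta' > "$workdir/file"

awk -v pat='foo' -v extracted="$workdir/file.1" -v kept="$workdir/file.2" '
    $0 ~ pat {
        # First match: the held-back line goes to the extracted file.
        if (!seen && have_prev) { print prev > extracted; have_prev = 0 }
        seen = 1
        print > extracted
        next
    }
    {
        # Flush the previously held line to the kept file, then hold this one.
        if (have_prev) print prev > kept
        prev = $0
        have_prev = 1
    }
    END { if (have_prev) print prev > kept }
' "$workdir/file"

cat "$workdir/file.1"
cat "$workdir/file.2"
```

Inside awk, ">" truncates each output file once at the start of the run and then appends, so repeated runs do not accumulate lines the way ">>" would.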
Alan Riggs
Honored Contributor

Re: Shell script.

How about:

INFILE=xxxx
OUTFILE=yyyyy

awk '{if($0 ~ "AND")
{
print prev
print $0
}
prev=$0
}' $INFILE > $OUTFILE
while read LINE
do
grep -v "$LINE" $INFILE > tmpfile
mv tmpfile $INFILE
done < $OUTFILE
curt larson
Frequent Advisor

Re: Shell script.

#!/usr/bin/ksh

mfile=file_before_deletions

file1=mfile_after_deletions
file2=deletions_from_mfile

pattern=your_pattern

count=$(grep -c $pattern $mfile)

#if count = 0 no lines to delete
#
if (( $count == 0 )) ;then
cp $mfile $file1
touch $file2
else
#get line number of first occurrence
#
line_num=$(grep -n $pattern $mfile | awk -F: '{print $1;exit;}')

#if the first occurrence is the first line
#you can't delete the previous line
#so just delete the lines with the pattern
#
if (( $line_num == 1 )) ;then
grep -v $pattern $mfile > $file1
grep $pattern $mfile > $file2
else
#
# get the line number of the line
# before the first occurrence
#
(( line_num = $line_num - 1 ))

# use sed to delete line then grep for all rest
#
sed -e "${line_num}d" $mfile | grep -v $pattern > $file1

#for deletions file use sed to print the line
#before the first occurrence then grep the rest
#
sed -ne "${line_num}p" $mfile > $file2
grep $pattern $mfile >> $file2
fi
fi
nobody else has this problem
curt larson
Frequent Advisor

Re: Shell script.

And if you know there is at least one occurrence and it isn't on the first line, then this will also work:

ex -s +"/$pattern/ | .- d | 1,\$ global /$pattern/d | w > $file_with_deletions | q!" your_file
LEJARRE Patrick
Advisor

Re: Shell script.

Hi Curt !
I read your script closely. I find it a little more complicated than Andreas's and Alan's scripts. I'm afraid your script would consume more time and CPU than theirs.

Thank you very much for your contribution!
Pat.
curt larson
Frequent Advisor

Re: Shell script.

Well, my script is more complicated for a couple of reasons:

1) I extract only the line before the first occurrence of the pattern, as specified, not the line before every occurrence of the pattern as Alan's does.

2) My script handles the situation where the pattern occurs in the first line. Alan's will output an empty line, because prev has no value at that time, and its grep -v loop will later remove lines that were never extracted (an empty grep pattern matches every line, not only blank ones). I don't think this was your intention.
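
The first-line case in point 2 can be reproduced with a small sketch. The two-line input and the pattern "foo" are made up for illustration; the awk program is the one-line form used earlier in the thread.

```shell
#!/bin/sh
# The pattern sits on line 1, so prev is still unset when the match is printed.
output=$(printf '%s\n' 'foo first' 'bar' |
    awk '{ if ($0 ~ "foo") { print prev; print $0 } prev = $0 }')

# Two lines come out: an empty one, then the matching line.
printf '%s\n' "$output" | wc -l

# Feeding that empty line to grep -v as a pattern keeps nothing,
# because an empty pattern matches every line.
kept=$(printf '%s\n' 'foo first' '' 'bar' | grep -v '')
printf 'kept: [%s]\n' "$kept"
```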

As for time and CPU, my script does at most 3 greps and an awk on the file (double that if you count creating the file of deleted lines, which could be dropped if it isn't needed), whereas with only 2 matches in the file Alan's script does 4 greps and an awk. With more matches Alan's does even more greps, increasing by 2 greps for every line with a pattern match.

But, if you'd like to do it even faster:

#!/usr/bin/ksh

line_num=$(grep -n $pattern $mfile | awk -F: '{print $1;exit;}')

if [[ -n $line_num ]] ;then
if (( $line_num > 1 )) ;then

(( line_num = $line_num - 1 ))
# double quotes so the shell expands the variables; backslash continues the command
sed -e "${line_num}d" \
-e "/$pattern/d" $mfile > $tmpfile

else
sed -e "/$pattern/d" $mfile > $tmpfile
fi
mv $tmpfile $mfile
fi

Sorry, but I have a different opinion than you.
curt larson
Frequent Advisor

Re: Shell script.

And if you really want to spend the time maintaining an awk script:

#!/usr/bin/ksh


cat infile | awk '

BEGIN { First = "no";
Match = "no";
getline;
if ( $0 ~ /pattern/ ) {
First = "yes";
Match = "yes";
}
prev = $0;
}
{
if ( $0 ~ /pattern/ ) {
if ( First == "no" ) {
First = "yes";
}
else {
if ( Match == "no" ) {
print prev; }
}
Match = "yes";
}
else {
if ( Match == "no" ) {
print prev; }
Match = "no";
}
prev = $0
}
END { if ( Match == "no" ) print prev; }
' > outfile

mv outfile infile

Myself, I find the greps, seds, and eds simpler to understand.