jackfiled
Advisor

Script help

I need your help!

I need to find the strings that appear in both of two files.
One file has about 1000 lines and the other about 550.
I want the output to print the strings the two files have in common, so that I can update them.

For example, one file has strings like:

webbbs
ryujin
hanbangapple
yumso
sagua
nojuck
ryujin
gajossal
scfarm
kyulnara
dearpia
cwpodofarm
chungma
mealon
samyu...


and the other has:

andonghoney
ansungfarm
apeace
appletop
apsanjayoun
bawoofarm
bitgolfarm
hanbangapple
bonghwangfarm
celesti
chuksukfarm
chungmaewon
chungpoongfarm
chunmafarm
dearpia
...
As you can see, hanbangapple and dearpia are in both files.

I want to find out that the duplicated strings are hanbangapple and dearpia, so that I can erase them from one of the files.

What script would do this? Any tips?
Stuart Browne
Honored Contributor
Solution

Re: Script help

So you want to remove words which are duplicated in the files from one of the files.

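Using something like:

{ sort -u file1; sort -u file2; } | sort | uniq -d

to list the words that appear in both files. Running sort -u on each file first stops a word that is repeated inside one file (like ryujin in the first example) from being reported as common.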

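Then use sed, awk, or your favourite text-manipulation tool to remove them, e.g.:

for WORD in $({ sort -u file1; sort -u file2; } | sort | uniq -d)
do
    # ^ and $ anchor the pattern, so only lines that are exactly WORD are
    # deleted (a bare /chungma/ would also delete "chungmaewon")
    sed -e "/^${WORD}\$/d" < file1 > file1.out
    mv file1.out file1
done

or something along those lines.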
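For the listing step, comm(1) is another option: it compares two sorted files line by line, and -12 suppresses the lines unique to each file, leaving only the lines common to both. A minimal sketch follows; the function name common_lines is hypothetical, and it assumes a mktemp(1) that creates the file and prints its name, as the GNU and BSD versions do.

```sh
#!/bin/sh
# common_lines FILE1 FILE2  (hypothetical helper name)
# Print each line that appears in both files, once, in sorted order.
common_lines() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # comm needs sorted input; -u also drops repeats inside a single file
    sort -u "$1" > "$tmp1"
    sort -u "$2" > "$tmp2"
    # -1 hides lines only in FILE1, -2 hides lines only in FILE2,
    # so only the common lines (column 3) are printed
    comm -12 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

For example, common_lines file1 file2 > common.txt writes the shared strings to common.txt, which can then drive the deletion loop.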
One long-haired git at your service...
Francisco J. Soler
Honored Contributor

Re: Script help

Hi,
If you have enough memory (I think you do, since the files are small), you can store every line of the first file, unmodified, in an array, then have awk read the other file and print only the lines that are not in the array.

For example, run:

awk -f script.awk file1 file2 > file3

where file3 is file2 without the strings that are in file1.

---- script.awk -----------

# While reading the first file (ARGV[1]), store each line as an array key.
# Comparing FILENAME with ARGV[1] also works when file1 is empty.
FILENAME == ARGV[1] {
    a[$0] = 1
    next
}
# For the second file, print the line only if the first file did not have it
# (a key lookup, instead of looping over every stored line).
!($0 in a) {
    print
}
------------- end script -------------
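If you want file2 updated in place instead of written to file3, one way is to let awk write to a temporary file and move it over the original only when awk succeeds. This sketch inlines an awk program that does the same job as script.awk; the helper name strip_common and the FILE2.tmp.PID temporary name are hypothetical choices. It tests FILENAME against ARGV[1] rather than the common NR == FNR idiom, because with an empty first file NR == FNR would stay true and silently empty file2.

```sh
#!/bin/sh
# strip_common FILE1 FILE2  (hypothetical helper name)
# Delete from FILE2, in place, every line that also appears in FILE1.
strip_common() {
    tmp="$2.tmp.$$"   # temporary file next to FILE2 (hypothetical naming)
    # First file: remember its lines. Second file: print unseen lines only.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !($0 in seen)' "$1" "$2" > "$tmp" &&
        mv "$tmp" "$2" ||
        { rm -f "$tmp"; return 1; }   # leave FILE2 untouched on failure
}
```

Usage: strip_common file1 file2 removes hanbangapple and dearpia from file2 and keeps the remaining lines in their original order.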

Cheers.
Frank.
Linux?. Yes, of course.
Muthukumar_5
Honored Contributor

Re: Script help

Hi,

Use grep to do this. Take whichever file has fewer lines, read it line by line, and check whether each line also appears in the other file. At the end, the input (shorter) file is rewritten without the common strings.

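#!/usr/bin/ksh
# forum.ksh
set -x   # trace each command; remove for quiet output

file1=$1
file2=$2

newfile=/tmp/stringcheck.log

# Remove the output file if it exists, then start with an empty one
rm -f "$newfile"
touch "$newfile"

# Read the shorter file; search in the other one
if [[ $(wc -l < "$file1") -lt $(wc -l < "$file2") ]]; then
    input=$file1
    other=$file2
else
    input=$file2
    other=$file1
fi

while IFS= read -r line; do
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    # Searching only the other file: the line is always in $input itself
    if grep -qxF -- "$line" "$other"; then
        echo "$line is in $file1 and $file2"
    else
        echo "$line" >> "$newfile"
    fi
done < "$input"

# Make the input file without the common strings
cp "$newfile" "$input"

## end ###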
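The loop above starts one grep per line of the shorter file. grep can instead read every pattern from a file with -f, so a single command does the whole job. A minimal sketch, assuming a POSIX grep; the function name remove_common is hypothetical:

```sh
#!/bin/sh
# remove_common FILE1 FILE2  (hypothetical helper name)
# Print the lines of FILE1 that do not appear, as whole lines, in FILE2.
#   -v  keep only lines that match none of the patterns
#   -x  a pattern must match the entire line ("chungma" won't hit "chungmaewon")
#   -F  patterns are fixed strings, not regular expressions
#   -f  read the patterns from FILE2, one per line
remove_common() {
    grep -vxF -f "$2" "$1"
}
```

To update a file in place, redirect the output to a temporary file and mv it back, as the replies above do. Note that grep exits with status 1 when it prints nothing (every line was common), so chaining the mv with && would skip it in that case.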

Regards,
Muthukumar
Easy to suggest when don't know about the problem!