Operating System - HP-UX

Creating a new file from two source files minus duplicates

 
SOLVED
Russell Gould
Advisor


If I have a file called file1 which contains:
header
blue
purple
trailer

and file2 contains
header
red
purple
green
black
blue
trailer

i.e. every line of file1 appears somewhere in file2, but not necessarily in the same order.
Is it possible to strip each line of file1 out of file2 and create a file3, which in this case would contain

header
red
green
black
trailer

i.e. no blue or purple!

Our 'real' files contain thousands of records and we are battling with sort, paste, merge, head, tail, etc.

The reason is that file1 has already been loaded into a database and loading duplicates will cause a problem, which is why I need to create a file3 that is file2 without the file1 lines in it!

Thanks in advance

Russell
It's not a problem, it's an opportunity !
8 REPLIES
John Palmer
Honored Contributor

Re: Creating a new file from two source files minus duplicates

Hi,

Provided the two files are sorted, you can use the comm command: its first column lists the lines unique to file1 and its second column those unique to file2.

man comm

Regards,
John
Ian Lochray
Respected Contributor
Solution

Re: Creating a new file from two source files minus duplicates

grep -v -f file1 file2 > file3
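
Two caveats apply to the one-liner above. The lines of file1 are used as regular expressions and match anywhere in a line, so "blue" would also remove a line such as "bluegreen"; and because header and trailer are in file1 as well, they are removed from the output. The following is a minimal sketch of one way around both, assuming the sample data from the question and a grep that supports the POSIX options -F and -x. The scratch file name body is only illustrative.

```shell
# Recreate the sample files from the question (illustrative data only).
printf '%s\n' header blue purple trailer > file1
printf '%s\n' header red purple green black blue trailer > file2

# -F: treat each line of file1 as a fixed string, not a regular expression
# -x: a pattern must match a whole line, not part of one
# -v: print the lines of file2 that match none of the patterns
# -f: read the patterns from file1
grep -v -x -F -f file1 file2 > body

# header and trailer are in file1 too, so grep dropped them; add them back.
{ head -n 1 file2; cat body; tail -n 1 file2; } > file3
rm -f body

cat file3
```

With the sample data, file3 ends up with header, red, green, black and trailer, in the order they appear in file2.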
Sridhar Bhaskarla
Honored Contributor

Re: Creating a new file from two source files minus duplicates

Hi Russell,

How about this small script?

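#!/usr/bin/ksh

# The header and trailer occur in both files, so save them and add them back.
HEAD=$(head -1 file1)
TAIL=$(tail -1 file1)

print -r -- "$HEAD" > result

# Append the lines of file1 that do not occur in file2.
# "while read" keeps lines that contain spaces in one piece.
while IFS= read -r LINE
do
    # -F: fixed string, -x: whole-line match; the quotes protect special characters
    grep -F -x -e "$LINE" file2 > /dev/null 2>&1
    if [ $? != 0 ]
    then
        print -r -- "$LINE" >> result
    fi
done < file1

# Append the lines of file2 that do not occur in file1.
while IFS= read -r LINE
do
    grep -F -x -e "$LINE" file1 > /dev/null 2>&1
    if [ $? != 0 ]
    then
        print -r -- "$LINE" >> result
    fi
done < file2

print -r -- "$TAIL" >> result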


Your result should have what you wanted.

-Sri


You may be disappointed if you fail, but you are doomed if you don't try
Volker Borowski
Honored Contributor

Re: Creating a new file from two source files minus duplicates

Hi,

I'm not sure about your "header" and "trailer" lines; they might make things difficult, since you want to keep those duplicated lines.

You could involve the "uniq" command and perhaps re-insert "header" and "trailer" afterwards.
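
One possible reading of this suggestion, as a sketch rather than a definitive recipe: feeding file1 to sort twice guarantees that every file1 line occurs at least twice in the sorted stream, so uniq -u (print only lines that are not repeated) leaves just the lines found only in file2. This assumes file2 has no repeated lines of its own that must be kept, and the body comes out in sorted order, not in file2's original order. The sample data is taken from the question.

```shell
# Recreate the sample files from the question (illustrative data only).
printf '%s\n' header blue purple trailer > file1
printf '%s\n' header red purple green black blue trailer > file2

{
    # Re-insert the header, which uniq -u removes as a duplicate.
    head -n 1 file2
    # file1 is listed twice on purpose: its lines can never be unique.
    sort file1 file1 file2 | uniq -u
    # Re-insert the trailer.
    tail -n 1 file2
} > file3

cat file3
```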

I never thought about Ian's approach, but it looks charmingly simple :-)

Volker

Saravanan Kaliappan
Occasional Advisor

Re: Creating a new file from two source files minus duplicates

Hi Russell,

First sort your source files, i.e. file1 and file2.

Then try this:

a. comm -13 file1 file2
gives lines that appear only in file2

b. comm -23 file1 file2
gives lines that appear only in file1

I think method a will solve your problem: it eliminates all the duplicate lines.
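
Put together, the sort and comm steps might look like the sketch below. It assumes the sample data from the question and that the first and last line of each file are the header and trailer; those are set aside before sorting and re-added afterwards, since comm -13 would otherwise drop them as common lines. The .sorted file names are only illustrative.

```shell
# Recreate the sample files from the question (illustrative data only).
printf '%s\n' header blue purple trailer > file1
printf '%s\n' header red purple green black blue trailer > file2

# Drop the first and last line (header and trailer), then sort,
# because comm expects sorted input.
sed '1d;$d' file1 | sort > file1.sorted
sed '1d;$d' file2 | sort > file2.sorted

{
    head -n 1 file2
    # -13 suppresses column 1 (only in file1) and column 3 (in both),
    # leaving the lines that appear only in file2.
    comm -13 file1.sorted file2.sorted
    tail -n 1 file2
} > file3
rm -f file1.sorted file2.sorted

cat file3
```

As with any sort-based approach, the body of file3 is in sorted order, which is fine for a database load but does not preserve the original order of file2.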

Regards
Saravanan Kaliappan


Russell Gould
Advisor

Re: Creating a new file from two source files minus duplicates

Thanks,
This is a useful flag which I hadn't really seen before. Very helpful, cheers.

Unfortunately, I have established that our problem is a little more complex. We have since loaded the data into an Oracle database, which gave us more flexibility to manipulate the data and has resolved our problem.

Thanks to all for your help though.
Steven Sim Kok Leong
Honored Contributor

Re: Creating a new file from two source files minus duplicates

Hi,

If you are using GNU grep, then the following will provide you with the output you want:

# head -1 file2; grep -vf file1 file2; tail -1 file2

With the -vf options, grep prints every line of file2 that matches none of the patterns read from file1.

To redirect it to file3,

# head -1 file2; grep -vf file1 file2; tail -1 file2 > file3

Hope this helps. Regards.

Steven Sim Kok Leong
Robin Wakefield
Honored Contributor

Re: Creating a new file from two source files minus duplicates

Hi Russell,

Just a small point - you'll need to bracket Steven's command, i.e.

(head -1 file2; grep -vf file1 file2; tail -1 file2) > file3
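
A small sketch of why the brackets matter, assuming the sample data from the question (the output file names ungrouped and grouped are only illustrative, and head -n 1 is the POSIX spelling of head -1). Without grouping, the redirection belongs to the last command only, so the first two commands still write to the terminal.

```shell
# Recreate the sample files from the question (illustrative data only).
printf '%s\n' header blue purple trailer > file1
printf '%s\n' header red purple green black blue trailer > file2

# Ungrouped: only the output of tail is redirected;
# head and grep still print to standard output.
head -n 1 file2; grep -vf file1 file2; tail -n 1 file2 > ungrouped

# Grouped in a subshell: the output of all three commands is redirected.
( head -n 1 file2; grep -vf file1 file2; tail -n 1 file2 ) > grouped

# Compare the line counts of the two results.
wc -l ungrouped grouped
```

With the sample data, ungrouped holds only the trailer line, while grouped holds the header, the three remaining colours and the trailer.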

Rgds, Robin