Operating System - HP-UX

Distinct lines from FileA

 
panchpan
Regular Advisor

Distinct lines from FileA

I have two files with thousands of lines each. I would like to find ONLY the lines of fileA that are not present in fileB.
Maybe something like:
lines=`wc -l < fileA`
i=1
while [ $i -le $lines ]
do
if line$i NOTfound in fileB then >> NOTPRESENT.txt
i=`expr $i + 1`
done

Please suggest!!!
12 REPLIES
James R. Ferguson
Acclaimed Contributor
Solution

Re: Distinct lines from FileA

Hi:

Use 'comm'. See the manpages (and examples therein) for 'comm(1)'.
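
As a minimal sketch of what that looks like in practice (the sample file names and contents below are made up for illustration, and the scratch directory under /tmp is an assumption): comm expects both inputs to be sorted, and -23 suppresses the "only in second file" and "in both files" columns, leaving the lines found only in the first file.

```shell
# Sketch only: the sample data and the scratch directory are hypothetical.
work=${TMPDIR:-/tmp}/comm_demo.$$
mkdir -p "$work"

printf 'apple\nbanana\ncherry\n' > "$work/fileA"
printf 'banana\ndate\n' > "$work/fileB"

# comm expects both inputs sorted in the current collating sequence.
sort "$work/fileA" > "$work/fileA.sorted"
sort "$work/fileB" > "$work/fileB.sorted"

# -2 drops lines found only in fileB, -3 drops lines common to both,
# which leaves column 1: lines found only in fileA.
only_in_a=$(comm -23 "$work/fileA.sorted" "$work/fileB.sorted")
echo "$only_in_a"

rm -rf "$work"
```

With this sample data, only the lines of fileA that have no match in fileB should be printed.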

Regards!

...JRF...
panchpan
Regular Advisor

Re: Distinct lines from FileA

Thank you - I will try this.
Also, please advise how I can get the output printed with line numbers.
Sandman!
Honored Contributor

Re: Distinct lines from FileA

The command below prints numbered lines that are found in fileA but not in fileB, provided both files are already sorted.

# comm -23 fileA fileB | nl -nln -s' '
spex
Honored Contributor

Re: Distinct lines from FileA

#!/usr/bin/sh

F1=A
F1S=A.sorted
F2=B
F2S=B.sorted
F1U=A.unique

sort ${F1} > ${F1S}
sort ${F2} > ${F2S}
comm -23 ${F1S} ${F2S} > ${F1U}
# -x matches whole lines only, so a unique line cannot match as a substring of a shared line
grep -F -x -n -f ${F1U} ${F1}

rm -f ${F1S} ${F2S} ${F1U}

exit 0
panchpan
Regular Advisor

Re: Distinct lines from FileA

for serv_cnt in `cat /home/mfgeb/pp/serv.txt`
do
j=$serv_cnt
echo "===[ Directory List for Server: $serv_cnt ]==="
ssh $serv_cnt "find /mfgdata/ -type d -exec ls -ld {} \; 2>/dev/null "
done >> $j.mfgdata-ux-dir.ls

How can I dynamically generate output files with different names?
spex
Honored Contributor

Re: Distinct lines from FileA

I suggest expounding upon your question in a new thread.
panchpan
Regular Advisor

Re: Distinct lines from FileA

Hello.
I ran the command below:

comm -23 data-ux-9th.lst data-lx-9th.lst | nl -nln -s' ' > not-in-lx-data.lst

and found that the output file still contains these entries:

data/amfp2p/log/api/IPCSH
data/amfp2p/log/scm
data/amfp2p/log/audit
data/amfp2p/log/ecgei

These lines are present in both files, though. Do I need to sort the files beforehand? Please suggest how I should run the comm command.
spex
Honored Contributor

Re: Distinct lines from FileA

Yes, you need to sort the files beforehand. Hence the following lines in my script above:

sort ${F1} > ${F1S}
sort ${F2} > ${F2S}

Also, in this scenario, 'nl' numbers the lines of output piped from the 'comm' command; it does not report the positions those lines occupy in fileA. I suggest using the script I supplied.
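
To make the difference between the two kinds of numbering concrete, here is a small sketch. The sample data and the scratch directory are hypothetical, and the -x flag on grep is my own addition to force whole-line matches.

```shell
# Sketch only: the sample data and the scratch directory are hypothetical.
work=${TMPDIR:-/tmp}/nl_demo.$$
mkdir -p "$work"

printf 'cherry\nbanana\napple\n' > "$work/fileA"
printf 'banana\n' > "$work/fileB"

sort "$work/fileA" > "$work/fileA.sorted"
sort "$work/fileB" > "$work/fileB.sorted"
comm -23 "$work/fileA.sorted" "$work/fileB.sorted" > "$work/fileA.unique"

# nl simply counts 1..N over the sorted output of comm.
numbered_output=$(nl -nln -s' ' "$work/fileA.unique")

# grep -n reports where each unique line sits in the original, unsorted fileA.
numbered_original=$(grep -F -x -n -f "$work/fileA.unique" "$work/fileA")

echo "$numbered_output"
echo "$numbered_original"

rm -rf "$work"
```

Here 'apple' is numbered 1 by nl because it sorts first, while grep -n reports it as line 3, its position in the original fileA.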
Hein van den Heuvel
Honored Contributor

Re: Distinct lines from FileA

I have a little Perl script handy which does roughly what you desire. It takes two file names as arguments and compares them. The files do NOT have to be sorted. The lines of the first file are stored in memory as keys of an associative array, the values being the line numbers. This obviously does NOT handle duplicate lines (SMOP to make it do that :-), and it might run into memory problems with large (many MB) input files. 10,000 lines should be fine. 1,000,000 probably not.

Try it?

Hein

---- compare_file_lines.pl ----------
#
# Open first file and remember all lines with their line numbers in array f1
#
$name = shift @ARGV or die "Must provide first filename";
open FILE, "<$name" or die "Could not read file $name";
while (<FILE>) {
chomp;
next if /^$/; # skip blanks
next if /^\s+#/; # skip comments
$f1{$_}=$.;
}
close FILE; # reset line number

#
# open and loop through second file.
# delete each corresponding element from the array
# Report if missing (could be duplicate in second file)
#
$name = shift @ARGV or die "Must provide second filename";
open FILE, "<$name" or die "Could not read file $name";
while (<FILE>) {
chomp;
next if /^$/; # skip blanks
next if /^\s+#/; # skip comments
next if delete $f1{$_};
print "2:$. not in 1:$_\n";
}


#
# All lines remaining in array must not have been in second file
#
foreach (sort {$f1{$a} <=> $f1{$b}} keys %f1) {
print "1:$f1{$_} not in 2:$_\n";
}

-------------------------


# perl compare_file_lines.pl left right


Hein.
Sandman!
Honored Contributor

Re: Distinct lines from FileA

As already mentioned, you need to sort(1) the files before putting them through comm(1). See the comm(1) man page for details. I had assumed that fileA and fileB were already sorted. Here's a script you can use for what you are trying to do; hope it helps:


#!/usr/bin/sh

sort fileA > fileA.srtd
sort fileB > fileB.srtd
comm -23 fileA.srtd fileB.srtd | nl -nln -s' '
Dennis Handly
Acclaimed Contributor

Re: Distinct lines from FileA

>please advice how can i get the output printed with line numbers?

Line numbers starting from 1? Or starting from the original file1?

Sandman did the former. spex did the latter, but his fgrep -n -f will kill you on performance if the number of records is large.

If you really want to do this, you need to number each file with nl and an ID, then sort, then use awk to look for lines only in the first file.

Let me know if you are interested.
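
A rough sketch of that tag, sort, and scan idea, under several assumptions: awk's NR stands in for nl when numbering the lines, the IDs "A" and "B" are arbitrary tags, the sample data and scratch directory are hypothetical, and LC_ALL=C is my own choice to keep the sort byte-exact. It is one possible reading of the approach, not a tested production script.

```shell
# Sketch only: the sample data and the scratch directory are hypothetical.
work=${TMPDIR:-/tmp}/uniq_demo.$$
mkdir -p "$work"

printf 'cherry\nbanana\napple\ncherry\n' > "$work/fileA"
printf 'banana\ndate\n' > "$work/fileB"

TAB=$(printf '\t')

only_in_a=$(
  {
    # Tag every line as ID <TAB> original line number <TAB> text.
    awk '{ print "A\t" NR "\t" $0 }' "$work/fileA"
    awk '{ print "B\t" NR "\t" $0 }' "$work/fileB"
  } |
  # Group identical text together; within a group, B records sort before A.
  LC_ALL=C sort -t"$TAB" -k3 -k1,1r |
  awk -F"\t" '
    # Recover the text exactly, even if it contains tabs itself.
    { text = substr($0, length($1) + length($2) + 3) }
    # Remember the most recent text seen in fileB.
    $1 == "B" { seen = text; have = 1; next }
    # An A record with no preceding B record of the same text is unique to fileA.
    !(have && text == seen) { print $2 ": " text }
  ' |
  # Restore the original fileA order.
  sort -n
)
echo "$only_in_a"

rm -rf "$work"
```

This makes one sorted pass over the combined data instead of matching every pattern against every line, and duplicate lines in fileA are each reported with their own line number.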
panchpan
Regular Advisor

Re: Distinct lines from FileA

BIG THANKS !!!