Finding the same content in two files.
10-19-2006 02:22 AM
I want to find or list the content that two files have in common.
Example: there are two files, Test1 and Test2. If both files contain the same word, such as "example", then running grep, diff, or cmp should display "example" along with every other word the two files share.
Which command gives me this information?
* It should also work on Solaris.
Regards,
Sudhakaran.K
10-19-2006 02:28 AM
Re: Finding the same content in two files.
By "word" I presume that you mean a whitespace (space, tab or newline) delimited string of characters.
If that's the case, you can create a simple Perl or awk script that uses hashes (associative arrays) to hold every word of each file, then process the hashes according to what you want to report.
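For illustration, the hash idea could look roughly like this in awk. This is only a sketch, not code from the thread: the function name common_words is made up, and it assumes a POSIX awk (on Solaris that would probably be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk). The pass=1 and pass=2 operands are ordinary awk command-line variable assignments that tell the script which file it is currently reading.

```shell
#!/bin/sh
# common_words FILE1 FILE2  (hypothetical helper name, not from the thread)
# Prints each whitespace-delimited word of FILE2 that also occurs in FILE1,
# once, in the order it first appears in FILE2.
common_words() {
    awk '
        # First file: remember every word as a key of the array "seen".
        pass == 1 { for (i = 1; i <= NF; i++) seen[$i] = 1; next }
        # Second file: print a word the first time it matches, then mark
        # it with 2 so it is not printed again.
        {
            for (i = 1; i <= NF; i++)
                if (seen[$i] == 1) { print $i; seen[$i] = 2 }
        }
    ' pass=1 "$1" pass=2 "$2"
}
```

Called as common_words a b, it reads each file once and keeps only the words of the first file in memory.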
Regards!
...JRF...
10-19-2006 02:33 AM
Re: Finding the same content in two files.
I want the output without spaces, newlines, or tabs.
If the two files share any content, it should be shown.
Regards,
Sudhakaran.K
10-19-2006 02:43 AM
Re: Finding the same content in two files.
now is the time for all
therefore
10-19-2006 03:02 AM
Re: Finding the same content in two files.
The way I read the request is:
1. Create an index of all the individual words in file1
2. Create an index of all the individual words in file2
3. Compare the two indexes and report which words are common and which are not found in both
If this is not correct, could you please supply an example file1 and file2 and the expected output?
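The three steps above can be sketched with standard tools. This is a minimal sketch under assumptions, not a tested production script: word_index and compare_words are hypothetical names, a "word" is taken to be any whitespace-delimited string, and the temporary index files are written under ${TMPDIR:-/tmp}.

```shell
#!/bin/sh
# word_index FILE  (hypothetical helper)
# Steps 1 and 2: print the sorted, de-duplicated words of FILE, one per line.
word_index() {
    awk '{ for (i = 1; i <= NF; i++) print $i }' "$1" | sort -u
}

# compare_words FILE1 FILE2  (hypothetical helper)
# Step 3: compare the two indexes with comm, which needs sorted input.
compare_words() {
    idx1=${TMPDIR:-/tmp}/idx1.$$
    idx2=${TMPDIR:-/tmp}/idx2.$$
    word_index "$1" > "$idx1"
    word_index "$2" > "$idx2"
    echo "common:"
    comm -12 "$idx1" "$idx2"    # suppress columns 1 and 2: words in both
    echo "only in $1:"
    comm -23 "$idx1" "$idx2"    # suppress columns 2 and 3: words only in FILE1
    echo "only in $2:"
    comm -13 "$idx1" "$idx2"    # suppress columns 1 and 3: words only in FILE2
    rm -f "$idx1" "$idx2"
}
```

Each input file is read once, and comm walks the two sorted indexes in a single pass.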
10-19-2006 03:52 AM
Re: Finding the same content in two files.
(The files have to be sorted, though.)
comm - select or reject lines common to two sorted files
For details, see man comm.
Chris
10-19-2006 05:59 AM
Re: Finding the same content in two files.
Here is an example using 'comm'. The term 'same' is taken here to mean 'same line'.
sort test1 >test1.sorted
sort test2 | comm -12 test1.sorted -
This prints all lines that are common to both files.
Kind regards, Peter
10-19-2006 04:49 PM
Re: Finding the same content in two files.
Here I am explaining exactly what I want.
_________________________________________
File "a" contents:
apple
orange
egg
_________________________________________
File "b" contents:
sun
orange
space
_________________________________________
My output should be: "orange"
_________________________________________
So I want to see all the words that appear in both files.
Regards,
Sudhakaran.K
10-19-2006 05:07 PM
Re: Finding the same content in two files.
sort file1> f1
sort file2> f2
diff f1 f2
You could also use "sort -u" to remove duplicates.
10-19-2006 05:37 PM
Re: Finding the same content in two files.
Using the diff command after sorting does not work.
I do not get the words that are the same in both files.
Regards,
Sudhakaran.K
10-19-2006 06:20 PM
Solution
My previous solution will show exactly the requested output!
Kind regards, Peter
10-19-2006 06:44 PM
Re: Finding the same content in two files.
Yes, it works. I want to use this command in my working environment. The example scenario is:
The first file contains the word "solaris"
The second file contains the words "install solaris"
In this scenario your command does not work!
It does not display the word "solaris" in the output.
Please suggest a solution.
Regards,
Sudhakaran.K
10-19-2006 08:22 PM
Re: Finding the same content in two files.
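Based on my previous post:
$ cat a.pl
#!/usr/bin/perl
# Print every word of the input, one per line.
while (<>) {
    # Split on runs of non-word characters and skip the empty leading
    # field that split returns when a line starts with a separator.
    foreach (grep { length } split(/\W+/)) {
        print "$_\n";
    }
}
$ cat a.sh
#!/usr/bin/sh
# a.pl must be executable and sit in the current directory
# Create the index for file a
./a.pl a > a.out
# Remove any duplicates out of index a
sort -uo a.sor a.out
rm a.out
# Create the index for file b
./a.pl b > b.out
# Remove any duplicates out of index b
sort -uo b.sor b.out
rm b.out
# Print data that is common to both files
comm -12 a.sor b.sor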
10-19-2006 08:30 PM
Re: Finding the same content in two files.
This runs fine on my HP-UX 11i:
/tmp/> cat a
apple
orange
egg
/tmp/> cat b
sun
orange
space
> grep -f a b
orange
This command searches file b for every word listed in file a and shows the matches.
I suggest using the bigger file as the first file. See man grep for further information.
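One caveat worth knowing: with plain -f, grep treats every line of the first file as a regular expression and prints any line of the second file that contains a match, so "egg" would also report a line such as "eggplant", and an empty line in the pattern file matches everything. A hedged sketch of a stricter, whole-line variant follows. It assumes a grep that supports the POSIX options -F and -x (on Solaris that is likely /usr/xpg4/bin/grep rather than /usr/bin/grep); common_lines is a made-up helper name.

```shell
#!/bin/sh
# common_lines PATTERNS FILE  (hypothetical helper)
# Print the lines of FILE that are exactly equal to some line of PATTERNS.
#   -F  treat each pattern as a fixed string, not a regular expression
#   -x  require the pattern to match the whole line
#   -f  read the patterns from a file, one per line
common_lines() {
    grep -Fx -f "$1" "$2"
}
```

Like the comm example earlier in the thread, this compares whole lines, so it does not find "solaris" inside a line that reads "install solaris".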
HTH,
Art
10-20-2006 02:28 AM
Re: Finding the same content in two files.
If you want to create output of each whitespace delimited string, do this, for example:
# perl -nlaF -e 'print for (@F)' /etc/hosts
Splitting on non-"word" characters with '\W' decomposes things like IP addresses. Compare the above output to:
# perl -nl -e '@F=split(/\W+/);print for (@F)' /etc/hosts
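The same difference can be reproduced without Perl. The sketch below is an illustration only: split_ws and split_nonword are hypothetical names, and the character class [^A-Za-z0-9_] is an ASCII approximation of Perl's \W. Both functions read standard input and assume a POSIX awk.

```shell
#!/bin/sh
# split_ws  (hypothetical helper)
# Print one whitespace-delimited token per line.
split_ws() {
    awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# split_nonword  (hypothetical helper)
# Use every run of characters other than letters, digits and underscore as
# a field separator, and skip the empty fields this can produce.
split_nonword() {
    awk -F'[^A-Za-z0-9_]+' '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }'
}
```

Feeding the line "127.0.0.1 localhost" to split_ws gives two tokens, while split_nonword breaks the address into 127, 0, 0 and 1 before printing localhost.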
Regards!
...JRF...
10-20-2006 06:15 AM
Re: Finding the same content in two files.
Try this:
>perl -ne 'chomp; foreach (split) {if ($test) {print "$_\n" if delete $w{$_}} else {$w{$_}++}}; $test++ if eof' x.txt y.txt
In slow motion...
#>perl -ne '          Start perl, looping over the input(s), using the next string as the program
#  chomp;             Drop the newline
#  foreach (split) {  Loop over the words in each input line, split on whitespace
#  if ($test) {       $test is set once the first file has given EOF
#  print "$_\n" if    Print the word, but only if...
#  delete $w{$_}      ...an entry was deleted, so the word was present in the first file
#  } else {           EOF not seen yet
#  $w{$_}++}};        Remember each word in hash %w
#  $test++ if eof'    Switch gears when EOF is seen
#  x.txt y.txt        Sample input files
hth,
Hein.
10-20-2006 07:45 PM
Re: Finding the same content in two files.
# common.sh file1 file2
=======================================================
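#!/usr/bin/sh
# Writes the words that occur in both file1 and file2 to the file "cwords".
trap 'rm -f tmpfile' 0
while read line
do
    # Split the current line of file1 into one word per line
    echo "$line" | awk '{for(i=1;i<=NF;i++) print $i}' > tmpfile
    while read word
    do
        # Match the word at the start of a line or after whitespace, followed
        # by whitespace, a period, or the end of the line
        grep -E "(^|[[:space:]])$word([[:space:].]|\$)" "$2" > /dev/null
        if [ $? -eq 0 ]; then
            echo "$word" >> awords
        fi
    done < tmpfile
done < "$1"
if [ -s "awords" ]; then
    sort -uk1,1 awords > cwords && rm awords
fi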
=======================================================
10-22-2006 03:28 PM
Re: Finding the same content in two files.
It sure looks like your solution would work from a functional perspective, but from a performance perspective it is just horrible!
The old n-square problem at its best.
- awk launched for every line in the first file.
- grep launched for every word in the first file.
- The second file read completely, over and over again, for every word in the first file.
Yikes!
Please review Peter's solution, which nicely reduces each input to unique words first and then uses a simple tool to read and compare each sorted word list once.
Cheers!
Hein.
10-22-2006 06:41 PM
Re: Finding the same content in two files.
I think we should give Sudhakaran some time to try out the various solutions and report back with his comments.