Operating System - HP-UX

Ratzie
Super Advisor

sh script - find string in two different files and compare

I think I am going to have a hard time explaining this one, so sorry in advance...

 

I have a file that contains multiple entries:

.BEGIN
UPDATE/5552166619,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR
 
.BEGIN
UPDATE/5552194161,,,
.DELETE_ALL RELATED
ACCOUNT/52728912
.INSERT_RELATED/
RELATED/diffemail@myemail.net
.END_INSERT
.EOR

This goes on and on...

I have another, almost identical file that may or may not contain the same UPDATE/<10 digits> entry.

What I want to do is take each UPDATE/<10 digits> value, look it up in the second file and, if it exists, compare the ACCOUNT values to see whether they differ.

I can get the TN (UPDATE number) part:

grep UPDATE * | awk -F/ '{print $2}' | sed 's/,,,//g' | sort -u > file

Then do a:

for tn in `cat file`
do
   grep "$tn" second.file
   # ...
done

But I have no idea how to capture the ACCOUNT information from one file and compare it with the second...

Appreciate the help.
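For the TN-extraction step in the question, a single awk call can do the work of the grep | awk | sed pipeline. This is a minimal sketch, assuming every UPDATE line starts at column 1 and looks like UPDATE/<number>,,, as in the samples; `list_updates` and the demo directory are hypothetical names, not anything from the thread.

```sh
#!/bin/sh
# Sketch: print each distinct UPDATE number found in the given files.
# Splitting on "/" or "," (-F'[/,]') makes the number field $2.

list_updates() {
    # /^UPDATE\// keeps only lines that begin with "UPDATE/".
    # With no file arguments, awk reads standard input instead.
    awk -F'[/,]' '/^UPDATE\// { print $2 }' "$@" | sort -u
}

# Demo with a throwaway directory (hypothetical name)
demo=${TMPDIR:-/tmp}/tn_demo.$$
mkdir -p "$demo"
printf '%s\n' .BEGIN UPDATE/5552166619,,, ACCOUNT/52727963 .EOR \
              .BEGIN UPDATE/5552194161,,, ACCOUNT/52728912 .EOR > "$demo/file1"
list_updates "$demo/file1" > "$demo/file"   # same role as "file" in the question
cat "$demo/file"                            # prints 5552166619 and 5552194161, one per line
rm -rf "$demo"
```

Because awk does the field splitting and the ",,," stripping in one pass, the separate sed step is no longer needed.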

Patrick Wallek
Honored Contributor

Re: sh script - find string in two different files and compare

What do you think of this:

 

# cat file1
.BEGIN
UPDATE/5552166619,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR

.BEGIN
UPDATE/5552194161,,,
.DELETE_ALL RELATED
ACCOUNT/52728912
.INSERT_RELATED/
RELATED/diffemail@myemail.net
.END_INSERT
.EOR

# cat file2
.BEGIN
UPDATE/5552166619,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR

.BEGIN
UPDATE/5552194161,,,
.DELETE_ALL RELATED
ACCOUNT/92728912
.INSERT_RELATED/
RELATED/diffemail@myemail.net
.END_INSERT
.EOR


# cat script
#!/usr/bin/sh

for UPDATE in $(grep UPDATE file1 | awk -F \/ '{print $2}' | sed 's/,,,//g')
do
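FILE1ACCT=$(sed -n "/^UPDATE\/${UPDATE},/{n;n;p;}" file1 | awk -F \/ '{print $2}')
FILE2ACCT=$(sed -n "/^UPDATE\/${UPDATE},/{n;n;p;}" file2 | awk -F \/ '{print $2}')
# compare as strings so an empty FILE2ACCT (update not in file2) is not a syntax error
if [ "${FILE1ACCT}" = "${FILE2ACCT}" ] ; then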
   echo "The Account numbers are the same in FILE1 and FILE2 for update number ${UPDATE}"
   echo "Update # = ${UPDATE} ; FILE1 ACCT# = ${FILE1ACCT} ; FILE2 ACCT# = ${FILE2ACCT}"
   echo ""
else
   echo "The Account numbers are DIFFERENT in FILE1 and FILE2 for update number ${UPDATE}"
   echo "Update # = ${UPDATE} ; FILE1 ACCT# = ${FILE1ACCT} ; FILE2 ACCT# = ${FILE2ACCT}"
   echo ""
fi
done

And here's what it looks like when the script is run:

# ./script
The Account numbers are the same in FILE1 and FILE2 for update number 5552166619
Update # = 5552166619 ; FILE1 ACCT# = 52727963 ; FILE2 ACCT# = 52727963

The Account numbers are DIFFERENT in FILE1 and FILE2 for update number 5552194161
Update # = 5552194161 ; FILE1 ACCT# = 52728912 ; FILE2 ACCT# = 92728912

The key is the 'sed -n' statement above.

It searches each file for the UPDATE # taken from file1 (hopefully no update number ever occurs more than once in a file) and prints the line two below it in both file1 and file2: each 'n' moves to the next line, and 'p' prints the second one, which holds the account number. This assumes that the ACCOUNT line is always exactly 2 lines below the UPDATE line.
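To see the {n;n;p;} trick on its own, here is a small sketch that wraps it in a function. It is an illustration rather than Patrick's script: `account_for` is a hypothetical name, and anchoring the pattern on ^UPDATE/<number>, is an added assumption so the number cannot match inside an ACCOUNT or RELATED line, or as the prefix of a longer update number.

```sh
#!/bin/sh
# account_for UPDATE_NUMBER FILE   (hypothetical helper)
# Prints the account number stored two lines below the matching UPDATE line.
account_for() {
    # sed -n : print nothing unless told to
    # n;n    : replace the pattern space with the next input line, twice
    # p      : print that line (the ACCOUNT/... line)
    sed -n "/^UPDATE\/$1,/{n;n;p;}" "$2" | awk -F/ '{ print $2 }'
}

demo=${TMPDIR:-/tmp}/sed_demo.$$
mkdir -p "$demo"
printf '%s\n' .BEGIN UPDATE/5552166619,,, '.DELETE_ALL RELATED' \
              ACCOUNT/52727963 .EOR > "$demo/file1"
account_for 5552166619 "$demo/file1"   # prints 52727963
rm -rf "$demo"
```

The trailing comma in the pattern is what keeps update 1 from also matching UPDATE/11,,, in the same file.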

Ratzie
Super Advisor

Re: sh script - find string in two different files and compare

I will try, but file2 is what's tricking me: I need to search a directory that has multiple files in it for the TN, then pull the account and check it.
Patrick Wallek
Honored Contributor

Re: sh script - find string in two different files and compare

Is the FILE1 file in the same directory as the other files you need to check?

Patrick Wallek
Honored Contributor

Re: sh script - find string in two different files and compare

OK, file1 is the same as above and is in the /root/pw directory.

I have created 2 other files called file3 and file4 in the /root/pw/test directory.

Here are the files, the script and the results:

# pwd
/root/pw

# cat test/file3
.BEGIN
UPDATE/1234567890,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR

.BEGIN
UPDATE/5552194161,,,
.DELETE_ALL RELATED
ACCOUNT/92728912
.INSERT_RELATED/
RELATED/diffemail@myemail.net
.END_INSERT
.EOR


# cat test/file4
.BEGIN
UPDATE/5552166619,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR

.BEGIN
UPDATE/2345678901,,,
.DELETE_ALL RELATED
ACCOUNT/92728912
.INSERT_RELATED/
RELATED/diffemail@myemail.net
.END_INSERT
.EOR


# cat script
#!/usr/bin/sh

for UPDATE in $(grep UPDATE file1 | awk -F \/ '{print $2}' | sed 's/,,,//g')
do
FILE1ACCT=$(sed -n "/^UPDATE\/${UPDATE},/{n;n;p;}" file1 | awk -F \/ '{print $2}')
UPDATEFILE=$(grep -l "^UPDATE/${UPDATE}," /root/pw/test/*)
FILE2ACCT=$(sed -n "/^UPDATE\/${UPDATE},/{n;n;p;}" ${UPDATEFILE} | awk -F \/ '{print $2}')
if [ "${FILE1ACCT}" = "${FILE2ACCT}" ] ; then
   echo "The Account numbers are the same in FILE1 and ${UPDATEFILE} for update number ${UPDATE}"
   echo "Update # = ${UPDATE} ; FILE1 ACCT# = ${FILE1ACCT} ; ${UPDATEFILE} ACCT# = ${FILE2ACCT}"
   echo ""
else
   echo "The Account numbers are DIFFERENT in FILE1 and ${UPDATEFILE} for update number ${UPDATE}"
   echo "Update # = ${UPDATE} ; FILE1 ACCT# = ${FILE1ACCT} ; ${UPDATEFILE} ACCT# = ${FILE2ACCT}"
   echo ""
fi
done


# ./script
The Account numbers are the same in FILE1 and /root/pw/test/file4 for update number 5552166619
Update # = 5552166619 ; FILE1 ACCT# = 52727963 ; /root/pw/test/file4 ACCT# = 52727963

The Account numbers are DIFFERENT in FILE1 and /root/pw/test/file3 for update number 5552194161
Update # = 5552194161 ; FILE1 ACCT# = 52728912 ; /root/pw/test/file3 ACCT# = 92728912

The 'grep -l' in the script searches the files in /root/pw/test and returns the name of the file that contains the same UPDATE number. The sed statement for FILE2ACCT then looks for the ACCT# in the file returned by the 'grep -l' command.
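The 'grep -l' step assumes exactly one file in /root/pw/test holds each update number. If that might not hold, a loop over grep's output handles zero, one, or several matches. This is a sketch, assuming file names without spaces; `find_in_dir` is a hypothetical helper, not part of the posted script.

```sh
#!/bin/sh
# find_in_dir UPDATE_NUMBER DIRECTORY   (hypothetical helper)
# For every file in DIRECTORY that contains the update number, print
# "<file> <account>". Prints nothing if no file matches.
find_in_dir() {
    # grep -l lists matching file names; 2>/dev/null hides the error
    # grep gives if the directory is empty and the glob stays literal
    for f in $(grep -l "^UPDATE/$1," "$2"/* 2>/dev/null); do
        acct=$(sed -n "/^UPDATE\/$1,/{n;n;p;}" "$f" | awk -F/ '{ print $2 }')
        echo "$f $acct"
    done
}

demo=${TMPDIR:-/tmp}/grep_demo.$$
mkdir -p "$demo"
printf '%s\n' UPDATE/5552194161,,, '.DELETE_ALL RELATED' ACCOUNT/92728912 > "$demo/file3"
printf '%s\n' UPDATE/5552166619,,, '.DELETE_ALL RELATED' ACCOUNT/52727963 > "$demo/file4"
find_in_dir 5552194161 "$demo"   # one line: <demo dir>/file3 92728912
find_in_dir 1234567890 "$demo"   # no match, prints nothing
rm -rf "$demo"
```

Since the loop body runs once per matching file, a duplicated update number shows up as two output lines instead of being silently merged into one sed call.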

Patrick Wallek
Honored Contributor

Re: sh script - find string in two different files and compare

I have just added a check so that if an UPDATE # from file1 is NOT found in any file in the /root/pw/test directory, the script continues with the next one. My previous versions just hung.

NEW FILE1

# cat file1
.BEGIN
UPDATE/4567890123,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR

.BEGIN
UPDATE/5552166619,,,
.DELETE_ALL RELATED
ACCOUNT/52727963
.INSERT_RELATED/
RELATED/myemail@email.net
.END_INSERT
.EOR

.BEGIN
UPDATE/5552194161,,,
.DELETE_ALL RELATED
ACCOUNT/52728912
.INSERT_RELATED/
RELATED/diffemail@myemail.net
.END_INSERT
.EOR



NEW SCRIPT

# cat script
#!/usr/bin/sh

for UPDATE in $(grep UPDATE file1 | awk -F \/ '{print $2}' | sed 's/,,,//g')
do
FILE1ACCT=$(sed -n "/^UPDATE\/${UPDATE},/{n;n;p;}" file1 | awk -F \/ '{print $2}')
UPDATEFILE=$(grep -l "^UPDATE/${UPDATE}," /root/pw/test/*)
# only compare when the update number was found in some file
if [ -n "${UPDATEFILE}" ] ; then
   FILE2ACCT=$(sed -n "/^UPDATE\/${UPDATE},/{n;n;p;}" ${UPDATEFILE} | awk -F \/ '{print $2}')
   if [ "${FILE1ACCT}" = "${FILE2ACCT}" ] ; then
      echo "The Account numbers are the same in FILE1 and ${UPDATEFILE} for update number ${UPDATE}"
      echo "Update # = ${UPDATE} ; FILE1 ACCT# = ${FILE1ACCT} ; ${UPDATEFILE} ACCT# = ${FILE2ACCT}"
      echo ""
   else
      echo "The Account numbers are DIFFERENT in FILE1 and ${UPDATEFILE} for update number ${UPDATE}"
      echo "Update # = ${UPDATE} ; FILE1 ACCT# = ${FILE1ACCT} ; ${UPDATEFILE} ACCT# = ${FILE2ACCT}"
      echo ""
   fi
fi
done


# ./script
The Account numbers are the same in FILE1 and /root/pw/test/file4 for update number 5552166619
Update # = 5552166619 ; FILE1 ACCT# = 52727963 ; /root/pw/test/file4 ACCT# = 52727963

The Account numbers are DIFFERENT in FILE1 and /root/pw/test/file3 for update number 5552194161
Update # = 5552194161 ; FILE1 ACCT# = 52728912 ; /root/pw/test/file3 ACCT# = 92728912
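The hang in the earlier versions has a simple cause: when 'grep -l' matches nothing, the unquoted ${UPDATEFILE} expands to no argument at all, so sed receives no file name and reads standard input, waiting at the terminal. Below is a sketch of the same guard that also reports the misses instead of skipping them silently; `check_update` and its message text are hypothetical.

```sh
#!/bin/sh
# check_update UPDATE_NUMBER DIRECTORY   (hypothetical helper)
# Returns 0 and names the file if the update number is found; otherwise
# prints a "not found" line and returns 1 so the caller can move on.
check_update() {
    updatefile=$(grep -l "^UPDATE/$1," "$2"/* 2>/dev/null)
    if [ -z "$updatefile" ]; then
        echo "Update # $1 not found in $2"
        return 1
    fi
    echo "Update # $1 found in $updatefile"
}

demo=${TMPDIR:-/tmp}/guard_demo.$$
mkdir -p "$demo"
printf '%s\n' UPDATE/5552166619,,, '.DELETE_ALL RELATED' ACCOUNT/52727963 > "$demo/file4"
for UPDATE in 4567890123 5552166619; do
    check_update "$UPDATE" "$demo" || continue
    # ... compare the account numbers here, as in the script above ...
done
rm -rf "$demo"
```

Here 4567890123 produces a "not found" line and the loop continues, which is the behaviour the new script gets silently.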

 

Dennis Handly
Acclaimed Contributor

Re: sh script - find string in two different files and compare

Here is something a little easier to understand, and it performs well because it uses a hash (an awk associative array) and reads each file only once:

awk -v master=file1 '
# finds the number after "/" and before any ","
# (n and fields are extra parameters so they stay local to the function)
function crack_number(field,    n, fields) {
   n = split(field, fields, "[/,]")
#   print "found", n, "fields:", fields[2]
   return fields[2] ""  # make sure it is a string
}
BEGIN {
# create a map from update # to account #
# parenthesize getline so that "> 0" tests its return value
while ((getline < master) > 0) {
   if ($1 ~ "UPDATE") {
      update = crack_number($1)
      continue
   }
   if ($1 ~ "ACCOUNT") {
      account = crack_number($1)
#      print update "|" account
      map[update] = account
      continue
   }
}
close(master)
update = ""   # do not carry the last master update into the other files
}
/UPDATE/ {
   update = crack_number($1)
   next
}
/ACCOUNT/ {
   account = crack_number($1)
   if (update == "") {
      print "No update # for account", account
      next
   }
   # "update in map" tests the key without creating an empty entry
   if (!(update in map)) {
#      print "update number", update, "in", FILENAME, "skipped"
      update = ""
      next
   }
   account_m = map[update]
   if (account == account_m) {
      print "The Account numbers are the same in FILE1 and", FILENAME, "for update number", update
      print "Update # =", update "; FILE1 ACCT# =", account_m, "; FILE2 ACCT# =", account
   } else {
      print "The Account numbers are DIFFERENT in FILE1 and", FILENAME, "for update number", update
      print "Update # =", update "; FILE1 ACCT# =", account_m, "; FILE2 ACCT# =", account
   }
   print ""
   update = ""
}' file3 file4
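Another way to get a one-pass comparison swaps the awk hash for the standard sort and join utilities: reduce every file to sorted "update account" pairs, then let join line up matching update numbers. This is a sketch, not something posted in the thread; `pairs` and `compare_accounts` are hypothetical names, it assumes each UPDATE line precedes its ACCOUNT line, and unlike the awk version it reports the numbers but not which file they came from.

```sh
#!/bin/sh
# pairs FILE... : print "update account" for each record, sorted by update #
pairs() {
    awk -F'[/,]' '
        /^UPDATE\//  { update = $2; next }
        /^ACCOUNT\// { if (update != "") print update, $2; update = "" }
    ' "$@" | LC_ALL=C sort -k1,1
}

# compare_accounts MASTER FILE... : join on the update # and flag differences
compare_accounts() {
    master=$1; shift
    m=${TMPDIR:-/tmp}/pairs_m.$$
    o=${TMPDIR:-/tmp}/pairs_o.$$
    pairs "$master" > "$m"
    pairs "$@" > "$o"
    # join output per match: update master_account other_account
    LC_ALL=C join "$m" "$o" |
        awk '{ print $1, ($2 == $3 ? "same" : "DIFFERENT"), $2, $3 }'
    rm -f "$m" "$o"
}

demo=${TMPDIR:-/tmp}/join_demo.$$
mkdir -p "$demo"
printf '%s\n' UPDATE/5552166619,,, ACCOUNT/52727963 \
              UPDATE/5552194161,,, ACCOUNT/52728912 > "$demo/file1"
printf '%s\n' UPDATE/1234567890,,, ACCOUNT/52727963 \
              UPDATE/5552194161,,, ACCOUNT/92728912 > "$demo/file3"
printf '%s\n' UPDATE/5552166619,,, ACCOUNT/52727963 \
              UPDATE/2345678901,,, ACCOUNT/92728912 > "$demo/file4"
compare_accounts "$demo/file1" "$demo/file3" "$demo/file4"
# prints:
# 5552166619 same 52727963 52727963
# 5552194161 DIFFERENT 52728912 92728912
rm -rf "$demo"
```

Both sort and join run with LC_ALL=C so they agree on the collating order, which join requires of its sorted inputs.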