shell script problem..

amonamon · ‎10-01-2007

Hello, can anyone give a fast and good solution to this?
I have 2 files:
fileA:

629122
629145
629111
629000
629999
629787
629444

fileB:
123569
789999
629122
629145
777777
629111
000000
629999
629787
629444
989898
777777

and result should be :

629000

because it is the only line that is in fileA and not in fileB.

Any help? I tried something with comm, but no use.

Oviwan · ‎10-01-2007

hey

check the diff command:
$ diff fileA fileB

Regards

MarkSyder · ‎10-01-2007

One of those "scrap of paper" moments - I haven't tried this but think it will work:

for i in `cat fileA`
do
    # discard grep's output; only its exit status matters here
    if grep "$i" fileB > /dev/null
    then
        :
    else
        echo "$i"
    fi
done

Mark Syder (like the drink but spelt different)

The triumph of evil requires only that good men do nothing

john korterman · ‎10-01-2007

Hi,

try this first:
$ sort fileA >./fileA.s
$ sort fileB >./fileB.s

$ comm -23 fileA.s fileB.s
629000

regards,
John K.

it would be nice if you always got a second chance

amonamon · ‎10-01-2007

No...
The problem is that fileB is very big, and so is fileA.

The files are not sorted.
Most of the content of fileA is in fileB, but there are 10-20 lines in fileA that fileB does not have.

I think I got it..
sort -b fileA > fileA.sort
sort -b fileB > fileB.sort

comm -23 fileA.sort fileB.sort
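
For reference, the same sort, sort, comm steps can be wrapped in a small function. This is only a sketch: the name only_in_first is made up for illustration, it assumes a mktemp command is available, and it uses sort -u and LC_ALL=C (not in the commands above) so that duplicates are dropped and sort and comm agree on one collation order.

```shell
#!/bin/sh
# Sketch: print lines that are in the first file but not in the second.
# only_in_first is a hypothetical helper name, not from the thread.
only_in_first() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # LC_ALL=C makes sort and comm use the same byte-wise ordering.
    LC_ALL=C sort -u "$1" > "$a_sorted"
    LC_ALL=C sort -u "$2" > "$b_sorted"
    # -2 drops lines unique to the second file, -3 drops lines common to both.
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Sample data from the first post, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 629122 629145 629111 629000 629999 629787 629444 > "$workdir/fileA"
printf '%s\n' 123569 789999 629122 629145 777777 629111 000000 \
    629999 629787 629444 989898 777777 > "$workdir/fileB"

only_in_first "$workdir/fileA" "$workdir/fileB"
```

With the sample data this prints 629000, the one line of fileA that fileB lacks.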

Ian Lochray · ‎10-01-2007

grep -v -f fileB fileA

MarkSyder · ‎10-01-2007

Ian,

Your solution will show every line that doesn't match. If one line matches and the other (say) 3000 don't, it will show him those 3000 lines.

Mark

The triumph of evil requires only that good men do nothing

Dennis Handly · ‎10-01-2007

>can anyone give fast and good solution on this:

Fast to run? Or fast to implement?

>Mark: I haven't tried this but think it will work:

Running grep once per line will have terrible performance for large files. Using that cat(1) in the for loop will also tokenize each line, so it would fail if a line has more than one field.

John has the right solution with sort and comm. If there are duplicates, you may or may not want to use sort -u.

>noo... Problem is because file B is very big and also fileA

Were you complaining about the sort, sort, comm solution?

>Mark: Ian, Your solution will show every line that doesn't match.

Isn't that what amonamon wants? It gives 629000. It will have all the lines of fileA that are not in fileB.
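
To illustrate the tokenizing point, here is a rough sketch of a loop that reads whole lines instead of words. The function name missing_lines and the sample data are invented for the example. It still runs one grep per line, so it only fixes the multi-field problem, not the performance one.

```shell
#!/bin/sh
# Sketch: read the first file line by line instead of word by word.
# missing_lines is a hypothetical helper name.
missing_lines() {
    while IFS= read -r line
    do
        # -F: fixed string, -x: must match the whole line, -q: no output.
        grep -qxF -e "$line" "$2" || printf '%s\n' "$line"
    done < "$1"
}

# Made-up sample data with more than one field per line.
workdir=$(mktemp -d)
printf '%s\n' 'alpha one' 'beta two' 'gamma three' > "$workdir/fileA"
printf '%s\n' 'alpha one' 'gamma three' 'beta' > "$workdir/fileB"

missing_lines "$workdir/fileA" "$workdir/fileB"
```

This prints "beta two" as a single line, whereas a for loop over `cat fileA` would have tested "beta" and "two" separately.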

Dennis Handly · ‎10-01-2007

If fileA and fileB are about 2000 records, I get:
grep:
real 0m0.02s user 0m0.03s sys 0m0.01s

sort, sort, comm:
real 0m0.07s user 0m0.01s sys 0m0.01s

With 30,000 records they break even, and beyond that comm is better.

MarkSyder · ‎10-01-2007

Dennis,

I am assuming that Ian's solution would involve a loop like my earlier suggestion. The first iteration of the loop would do grep -v 123569 fileA. This would output every line in fileA that did not contain 123569. Similarly, the second iteration would output every line that did not contain 789999 etc.

As Ian has written it, amonamon would actually be searching for the string fileB in fileA.

Mark

The triumph of evil requires only that good men do nothing

Dennis Handly · ‎10-01-2007

>Mark: I am assuming that Ian's solution would involve a loop like my earlier suggestion.

Any loop would be IN grep. That's what grep -f does: it takes each line of fileB as a string to search for.

>The first iteration of the loop would do grep -v 123569 fileA.

No. grep looks at each record of fileA and if it matches any line of fileB, it isn't printed. Perhaps you have your gedanken for-loops interchanged. :-)

>As Ian has written it, amonamon would actually be searching for the string fileB in fileA.

That's not what "grep -v -f file" does. Did you try it?
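
A small demonstration of what was described above, as a sketch only. The sample data is made up (6291220 is added to show substring matching), and the second command with -x and -F is an extra suggestion rather than something posted in this thread.

```shell
#!/bin/sh
# Made-up sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 629122 629000 6291220 629444 > "$workdir/fileA"
printf '%s\n' 123569 629122 629444 777777 > "$workdir/fileB"

# Patterns are read from fileB; -v prints the lines of fileA matching none of them.
# 6291220 is suppressed too, because the pattern 629122 matches inside it.
grep -v -f "$workdir/fileB" "$workdir/fileA"

# Stricter variant: -F treats patterns as fixed strings, -x requires a
# whole-line match, so 6291220 is now reported as missing from fileB.
grep -v -x -F -f "$workdir/fileB" "$workdir/fileA"
```

The first command prints 629000; the second prints 629000 and 6291220.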

Ernesto Cappello · ‎10-02-2007

Hi Amonamon, this is the command:

diff fileA fileB | grep "^<" | awk '{print $2}'

to satisfy your request.

Best regards.
Ernesto

MarkSyder · ‎10-02-2007

Dennis,

No, I missed the -f!

I've never used it and have not tried it but will bear it in mind for future use.

Mark

The triumph of evil requires only that good men do nothing

Sandman! · ‎10-02-2007

> Any help? I tried something with comm, but no use.

Did you sort the files before filtering your criteria thru comm(1) i.e.

# sort fileA > sfA
# sort fileB > sfB
# comm -23 sfA sfB
629000

blah2blah · ‎10-02-2007

A faster algorithm would be to use awk or perl to read in file b and put everything in an array or hash, then read file a and check whether each entry is in the hash; if it isn't, print it.

Something like this. Maybe someone that knows perl can correct my mistakes.

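open(FILEHANDLE, "<", "fileB") or die "fileB: $!";
while (<FILEHANDLE>) {
    chomp;
    $seen{$_} = 1;                    # remember every line of fileB
}
close(FILEHANDLE);

open(FILEHANDLE, "<", "fileA") or die "fileA: $!";
while (<FILEHANDLE>) {
    chomp;
    print "$_\n" unless $seen{$_};    # in fileA but not in fileB
}
close(FILEHANDLE);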
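
The same hash idea can also be sketched with awk from the shell. Treat this as an untuned illustration: hash_diff is a made-up name, and the NR == FNR trick assumes fileB is not empty (with an empty fileB, awk would treat fileA as the first file and print nothing).

```shell
#!/bin/sh
# Sketch: load the second file into an awk array, then filter the first file.
# hash_diff is a hypothetical helper name.
hash_diff() {
    # awk reads "$2" first; NR == FNR is true only while reading that first file.
    awk 'NR == FNR { seen[$0] = 1; next }   # remember each line of the second file
         !($0 in seen)                      # print lines never seen there
        ' "$2" "$1"
}

# Sample data from the first post, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 629122 629145 629111 629000 629999 629787 629444 > "$workdir/fileA"
printf '%s\n' 123569 789999 629122 629145 777777 629111 000000 \
    629999 629787 629444 989898 777777 > "$workdir/fileB"

hash_diff "$workdir/fileA" "$workdir/fileB"
```

Neither file needs to be sorted, and the output keeps the order of fileA. The whole of fileB has to fit in memory, though.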

Sandman! · ‎10-02-2007

never mind my post...John Korterman beat me to it by a mile :)
