 
Pavel_Adamus
Occasional Visitor

Strange output from DIFFERENCES in DCL

Hi there,

 

I got a result from the DIFFERENCES command that I don't understand.

 

 

DIFF/PARALLEL/WIDTH=30 FILE_01 FILE_02

 

 

Output:

 

-----------------------------
File 01      | File 02
------- 4 ----------- 4 -----
15           | 12
20           | 13
25           | 14
30           | 30
35           | 35
             | 36
             | 37
-----------------------------

Number of difference sections-
found: 1
Number of difference records -
found: 7

 

 

Input:

 

File_01  File_02

 01       01
 05       05
 10       10
 15       12
 20       13
 25       14
 30       30
 35       35
 40       36
 45       37
 50       40
          45
          50

 

 

 

It looks as if it reads the files in sections, but even when I put a different record (XX) between 35 and 40 in FILE_01, the result was no better:

 

 

-----------------------------
File 01      | File 02
------- 4 ----------- 4 -----
15           | 12
20           | 13
25           | 14
30           | 30
35           | 35
XX           | 36
             | 37
-----------------------------

Number of difference sections-
found: 1
Number of difference records -
found: 7

 

 

Do you have any idea what's wrong?

 

 

Hoff
Honored Contributor

Re: Strange output from DIFFERENCES in DCL

DIFFERENCES is simplistic in its processing, and the default processing is arguably tailored for and intended for text files.  

 

See /WINDOW=x and /MATCH=y to adjust what DIFFERENCES does here, and see if you can convince DIFFERENCES to display something closer to what you want.

 

For a file such as the data shown, I'd probably use either a DCL file loop or maybe the MERGE command. You don't provide any background on the problem or your goals, but you might not really be looking for DIFFERENCES here so much as for tools to sort or merge the data, or to look for duplicates in the merge.

 

I'm guessing here, based on the data shown.  If I've guessed wrong, please consider providing some details on what you think should happen here, and on the data involved.

Pavel_Adamus
Occasional Visitor

Re: Strange output from DIFFERENCES in DCL

Thanks for your reply.

 

I need a simple file difference, like in the picture:

[picture not preserved]

I'm just wondering why DIFF marks the rows with values 30 and 35 as different when they are not.

 

-----------------------------
File 01      | File 02
------- 4 ----------- 4 -----
15           | 12
20           | 13
25           | 14
30           | 30
35           | 35
             | 36
             | 37
-----------------------------

H.Becker
Honored Contributor

Re: Strange output from DIFFERENCES in DCL

As already mentioned, have a look at /MATCH:

 

$ help diff/match
  /MATCH

        /MATCH=size

     Specifies the number of records that should indicate matching
     data after a difference is found. By default, after the
     DIFFERENCES command finds unmatched records, it assumes that the
     files once again match after it finds three sequential records
     that match. Use the /MATCH qualifier to override the default
     match size of 3.

     You can increase the /MATCH qualifier value if you feel that
     the DIFFERENCES command is incorrectly matching sections of
     the master and revision input files after it has detected a
     difference.

You may want to add /MATCH=1, which gives output like:

 

------- 4 ----------- 4 -----
15            |  12          
20            |  13          
25            |  14          
------- 9 ----------- 9 -----
              |  36          
              |  37          
-----------------------------

 

Hoff
Honored Contributor

Re: Strange output from DIFFERENCES in DCL

There is what you seem to think DIFFERENCES does, and what it actually does.

 

When DIFFERENCES finds a mismatch, it looks at a range of text records within a window, and then at the number of records that must match before the processing is assumed to be resynchronized across the two files.

 

Once DIFFERENCES has found differences, it then goes looking for not-differences to resume its processing.

 

DIFFERENCES is not "anchored" by the record's position within the file, as might be expected with a simple comparison across the same record in two arrays in a program, or across the same record in two RMS relative files. Because DIFFERENCES expects that records might be added into a file, it will try to resynchronize and then resume matching the two files. If the file contents are arbitrary text records that start out identical but file 2 gets records added, then record 100 in file 1 might eventually be found to match record 500 in file 2, assuming somebody added 400 records into file 2, and that the records starting at 100 in file 1 and at 500 in file 2 meet or exceed the /MATCH specification. Once enough records are matched, irrespective of their relative position in the file, DIFFERENCES will then continue reading records in file 1 and in file 2 looking for the next difference, which might be at (say) 200 in file 1 and 600 in file 2, where somebody might have added another batch of records into file 2.

 

Again, DIFFERENCES does not implement record-index-based comparisons; it looks at the records themselves.
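
As a rough illustration of that resynchronization idea, here is a simplified sketch in Python. It is not the actual DIFFERENCES algorithm: the function names, the nearest-first search order, and the rule that a resync needs `match` identical records in a row are assumptions made for the example, and /WINDOW is not modeled at all. It does show why a run of only two matching records (30, 35) is swallowed into the difference section when three are required.

```python
# Simplified model of "resynchronize after a mismatch" (illustrative only).
FILE_01 = "01 05 10 15 20 25 30 35 40 45 50".split()
FILE_02 = "01 05 10 12 13 14 30 35 36 37 40 45 50".split()


def find_resync(master, revision, i, j, match):
    """Find the nearest positions at or after (i, j) where `match`
    consecutive records agree. Returns the ends of both files if none."""
    max_skip = (len(master) - i) + (len(revision) - j)
    for skip in range(1, max_skip + 1):        # total records skipped
        for di in range(skip + 1):
            dj = skip - di
            a = master[i + di:i + di + match]
            b = revision[j + dj:j + dj + match]
            if len(a) == match and a == b:
                return i + di, j + dj
    return len(master), len(revision)


def difference_sections(master, revision, match=3):
    """Return a list of sections as
    (master_record_no, master_records, revision_record_no, revision_records),
    with 1-based record numbers. `match` must be at least 1."""
    sections = []
    i = j = 0
    while i < len(master) or j < len(revision):
        if i < len(master) and j < len(revision) and master[i] == revision[j]:
            i += 1
            j += 1
            continue
        # Mismatch: everything up to the resync point is one section.
        ni, nj = find_resync(master, revision, i, j, match)
        sections.append((i + 1, master[i:ni], j + 1, revision[j:nj]))
        i, j = ni, nj
    return sections


if __name__ == "__main__":
    for size in (3, 1):
        print("match =", size)
        for section in difference_sections(FILE_01, FILE_02, match=size):
            print("  ", section)
```

With the data from this thread, `match=3` gives a single section starting at record 4 in both files (15 through 35 against 12 through 37), because the first run of three identical records is 40, 45, 50. With `match=1` the sketch resynchronizes at 30 and yields two sections, one at record 4 and one at record 9, which lines up with the /MATCH=1 output shown earlier in the thread.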

 

DIFFERENCES works well for paragraphs of text, but generally stinks for sorting and merging data such as the example text that you're using. (I'm not sure if you're just using numbers here, or if you're actually trying to process numbers with DIFFERENCES.)

 

Please read the DIFFERENCES documentation, and experiment with /WINDOW and /MATCH. Or roll your own processing with some DCL, or with SORT / MERGE, depending on what you're up to.
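
If the data really is a sorted list of keys, the roll-your-own route can be as small as a merge-style pass over both files. The sketch below is in Python rather than DCL, purely to show the logic. The function name and the three-way result are made up for the example, and it assumes both inputs are already sorted and that the records compare correctly as strings (true for the zero-padded values in this thread).

```python
# Merge-style comparison of two sorted record lists (illustrative only).
FILE_01 = "01 05 10 15 20 25 30 35 40 45 50".split()
FILE_02 = "01 05 10 12 13 14 30 35 36 37 40 45 50".split()


def merge_compare(master, revision):
    """Walk two sorted lists in step and classify each record as
    present only in master, only in revision, or in both."""
    only_master, only_revision, both = [], [], []
    i = j = 0
    while i < len(master) and j < len(revision):
        if master[i] == revision[j]:
            both.append(master[i])
            i += 1
            j += 1
        elif master[i] < revision[j]:
            only_master.append(master[i])
            i += 1
        else:
            only_revision.append(revision[j])
            j += 1
    # Whatever is left in either list has no partner in the other.
    only_master.extend(master[i:])
    only_revision.extend(revision[j:])
    return only_master, only_revision, both


if __name__ == "__main__":
    only_1, only_2, common = merge_compare(FILE_01, FILE_02)
    print("only in FILE_01:", only_1)
    print("only in FILE_02:", only_2)
```

For the two files posted above, this reports 15, 20, 25 as present only in FILE_01 and 12, 13, 14, 36, 37 as present only in FILE_02. Unlike DIFFERENCES, it compares by key value, so 30 and 35 are never reported.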

 

For an example of some explicit DCL that does some roughly similar matching to what DIFFERENCES does, please see the http://www.digiater.nl/openvms/freeware/v80/hoffman_examples/diff_directories.com example from the OpenVMS freeware. This tool looks for differences in the entries present in two different directories and, like DIFFERENCES, tries to resynchronize its processing after finding differences, once matching filenames are found in both directories.