- live browsing of Self object memory -

lobby globalscomparator

A comparator can compare two indexable collections to find
their longest common subsequence [LCS] (this is the same as what Unix "diff"
does).

To compare two arbitrary collections, use the following expression:
    comparator compare: firstCollection With: secondCollection
This will return a new comparator.  To find out which elements of
firstCollection are in the LCS, send the message matchVector to the
comparator.  This will return a vector the same size as firstCollection,
which will contain the indices of corresponding elements of secondCollection.
To get the inverse mapping (from secondColelction to firstCollection),
send the message inverseMatchVector.
To get a sequence of indices of non-matching elements of firstCollection
(secondCollection) send noMatches (inverseNoMatches).
  
The following expression performs the same function as Unix diff,
writing results to stdout:
    (comparator
      compareFileNamed: fileName1
         WithFileNamed: fileName2) printIt.
This is an extractable application: see the `diffApplication' object
for details.
      
If you have UI1 loaded, the following expression will provide a visual
comparison of the contents of the named files (like diff):
    (comparator
      compareFileNamed: fileName1
         WithFileNamed: fileName2) openView
                 
See the comment in modules diff for bibliographic information.

CopyDowns: vector

CreatorPath: globals comparator

Module: diff

parent* = traits clonable

accessing

collection1 = ( coll1)
collection2 = ( coll2)
diff = 'I implement diff'I added this slot for browsing (implementors of diff). -- Ungar July 2002

computing differences

inverseMatchVector =
( 
    matches ifNil: [inverseMatches: nil].
    inverseMatches ifNil: [
        inverseMatches:  vector copySize: coll2 size.
        matchVector do: [| :el. :index |
            el ifNotNil: [inverseMatches at: el Put: index]
        ].
    ].
    inverseMatches)
inverseNoMatches = ( nilIndexes: inverseMatchVector)
matchVector =
( 
    matches ifNil: [computeMatchVector].
    matches)
noMatches = ( nilIndexes: matchVector)
buildEquivalenceClasses =
( | coll2map |

    "Map each element of coll2 to the sequence of indexes it occupies."
    coll2map:  dictionary copy desiredMinCapacity:  (end2 - start2 max: 1) min: 256.
    start2 to: end2 Do: [
        | :index. |
        coll2map
                   if:  (coll2 at: index)
          IsPresentDo:  [| :seq |      seq add: index]
          IfAbsentPut:  [ sequence copyAddLast: index]
                AndDo:  []
    ].
    coll2map)
buildLinks:Seq:Map: =
( 
    start1 to: end1 Do: [ | :i. |
        coll2map
                   if:  (coll1 at: i)
          IsPresentDo:  [| :res |
                         res reverseDo: [| :j. k. |
                           k:  thresh replaceNextLargerWith: j.
                           k ifNotNil: [
                             | lnk.  |
                             lnk:  prlink copy.
                             lnk next:
                               (k = 0
                                 ifTrue: [nil]
                                  False: [links at: k pred]).
                             lnk first: i.
                             lnk last: j.
                             links at: k Put: lnk
                           ]
                         ]
                        ]
           IfAbsentDo: []
    ])
coll1 = nil
coll2 = nil
computeMatchVector =
( | coll2map. links. thresh |
    "Here's the heart of the matter.  See the original paper for an
     explanation and proof (ref in module comment)."

    start1:   coll1 firstKey.
    end1:     coll1 lastKey.
    start2:   coll2 firstKey.
    end2:     coll2 lastKey.
    matches:  vector copySize: coll1 size.

    "first trim the ends of common elements"
    trimStart.
    trimEnd.

    "Now compute equivalence classes of positions of elements."
    coll2map: buildEquivalenceClasses.

    thresh:  sortedSequence copy.
    thresh desiredMinCapacity: end2 - start2.
    links:
        vector copySize:
            (end1 - start1) succ  min:  (end2 - start2) succ.

    "Build the DAG of subsequences."
    buildLinks: links Seq: thresh Map: coll2map.

    "Scan the DAG for the longest common subsequence."
    scanLinks: links Seq: thresh)
end1 = nil
end2 = nil
hashObj = comparator hashObj
inverseMatches = nil
linesFromTextLines: =
( 
    t lines asVector copyMappedBy:
      [| :l | "l canonicalize here to use canonical strings"
         hashObj copySet: l])
Convert text lines to a vector of string-like objects (which do not canonicalize when hashed).
matches = nil
nilIndexes: =
( | seq |
    seq: sequence copy.
    v do: [| :v. :k |
      v ifNil: [seq add: k]
    ].
    seq)
Answer a sequence of the indexes of nil elements in v.
prlink = comparator prlink
scanLinks:Seq: =
( 
    thresh isEmpty  ifFalse: [| lnk |
        lnk:   links at:  thresh size pred.
        [|:exit|
            matches 
                 at:  lnk first
                Put:  lnk last.
            lnk next ifNil: exit.
            lnk:  lnk next
        ] loopExit
    ])
Walk the longest chain reading out the longest subsequence.
start1 = nil
start2 = nil
trimEnd =
( 
    [|:exit|
        start1 <= end1  ifFalse: exit.
        start2 <= end2  ifFalse: exit.
        (coll1 at: end1) = (coll2 at: end2)  ifFalse: exit.
        matchVector at: end1 - coll1 firstKey  Put: end2.
        end1:  end1 pred.
        end2:  end2 pred
    ] loopExit)
trimStart =
( 
    [|:exit|
        start1 <= end1  ifFalse: exit.
        start2 <= end2  ifFalse: exit.
        (coll1 at: start1) = (coll2 at: start2)  ifFalse: exit.
        matchVector at: start1 - coll1 firstKey  Put: start2.
        start1:  start1 succ.
        start2:  start2 succ
    ] loopExit)

copying

compare:With: =
( | comp |
    comp: copy.
    comp collection1: c1.
    comp collection2: c2.
    comp)
compareFileNamed:WithFileNamed: = ( compareFileNamed: f1 WithFileNamed: f2 IfFail: raiseError)Create a comparator to compare the files named f1 and f2.
compareFileNamed:WithFileNamed:IfFail: =
( | cmp. d1. d2 |
    d1: os_file openForReading: f1 IfFail: [|:e| ^failBlock value: f1 With: e].
    d2: os_file openForReading: f2 IfFail: [|:e| ^failBlock value: f2 With: e].
    cmp:
     compare:  (linesFromTextLines: d1 contents asTextLines)
        With:  (linesFromTextLines: d2 contents asTextLines).
    d1 close.
    d2 close.
    cmp)
Create a comparator to compare the files named f1 and f2. Evaluate failBlock if the files cannot be read (with the offending filename and the error string).

diff application

printIt =
( | i0. i1. j0. j1. m. mLast |
    m: matchVector copy. "will be modified"
    i0:  m firstKey.
    mLast: m lastKey.

    [i0 <= mLast] whileTrue: [

      [   (i0 <= mLast)
       && [(m at: i0) = (m at: i0 pred IfAbsent: [collection2 firstKey pred]) succ]
      ] whileTrue: [
          i0: i0 succ.
      ].
      j0: (m at: i0 pred IfAbsent: [collection2 firstKey pred]) succ.
      i1: i0 pred.

      [   (i1 < mLast)
       && [(m at: i1 succ) isNil]
      ] whileTrue: [
          i1: i1 succ.
      ].
      j1:  (m at: i1 succ IfAbsent: [collection2 lastKey succ]) pred.
      m at: i1 Put: j1.

      reportChangeFromA: i0 To: i1 FromB: j0 To: j1.

      i0: i1 succ.
    ])
Print the output in a manner compatible with Unix diff.
diffApplication = comparator diffApplication
printRangeFrom:To: =
( 
    "+1 for compatibility with diff"
    (a min: b) succ print.
    a < b ifTrue: [',' print.  b succ print])
reportChangeFromA:To:FromB:To: =
( 
    (a > b) && [c > d] ifTrue: [^self].

    printRangeFrom: a To: b.
    (a > b ifTrue: 'a' False: [c > d ifTrue: 'd' False: 'c']) print.
    printRangeFrom: c To: d.
    '\n' print.

    a to: b Do: [| :i |
      '< ' print.
      (collection1 at: i) string printLine.
    ].
    (a <= b) && [c <= d] ifTrue: ['---\n' print].
    c to: d Do: [| :i |
      '> ' print.
      (collection2 at: i) string printLine.
    ].

    self)

initialization

collection1: = ( coll1: c. matches: nil)
collection2: = ( coll2: c. matches: nil)

tests