1*cda5da8dSAndroid Build Coastguard Worker""" 2*cda5da8dSAndroid Build Coastguard WorkerModule difflib -- helpers for computing deltas between objects. 3*cda5da8dSAndroid Build Coastguard Worker 4*cda5da8dSAndroid Build Coastguard WorkerFunction get_close_matches(word, possibilities, n=3, cutoff=0.6): 5*cda5da8dSAndroid Build Coastguard Worker Use SequenceMatcher to return list of the best "good enough" matches. 6*cda5da8dSAndroid Build Coastguard Worker 7*cda5da8dSAndroid Build Coastguard WorkerFunction context_diff(a, b): 8*cda5da8dSAndroid Build Coastguard Worker For two lists of strings, return a delta in context diff format. 9*cda5da8dSAndroid Build Coastguard Worker 10*cda5da8dSAndroid Build Coastguard WorkerFunction ndiff(a, b): 11*cda5da8dSAndroid Build Coastguard Worker Return a delta: the difference between `a` and `b` (lists of strings). 12*cda5da8dSAndroid Build Coastguard Worker 13*cda5da8dSAndroid Build Coastguard WorkerFunction restore(delta, which): 14*cda5da8dSAndroid Build Coastguard Worker Return one of the two sequences that generated an ndiff delta. 15*cda5da8dSAndroid Build Coastguard Worker 16*cda5da8dSAndroid Build Coastguard WorkerFunction unified_diff(a, b): 17*cda5da8dSAndroid Build Coastguard Worker For two lists of strings, return a delta in unified diff format. 18*cda5da8dSAndroid Build Coastguard Worker 19*cda5da8dSAndroid Build Coastguard WorkerClass SequenceMatcher: 20*cda5da8dSAndroid Build Coastguard Worker A flexible class for comparing pairs of sequences of any type. 21*cda5da8dSAndroid Build Coastguard Worker 22*cda5da8dSAndroid Build Coastguard WorkerClass Differ: 23*cda5da8dSAndroid Build Coastguard Worker For producing human-readable deltas from sequences of lines of text. 24*cda5da8dSAndroid Build Coastguard Worker 25*cda5da8dSAndroid Build Coastguard WorkerClass HtmlDiff: 26*cda5da8dSAndroid Build Coastguard Worker For producing HTML side by side comparison with change highlights. 27*cda5da8dSAndroid Build Coastguard Worker""" 28*cda5da8dSAndroid Build Coastguard Worker 29*cda5da8dSAndroid Build Coastguard Worker__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher', 30*cda5da8dSAndroid Build Coastguard Worker 'Differ','IS_CHARACTER_JUNK', 'IS_LINE_JUNK', 'context_diff', 31*cda5da8dSAndroid Build Coastguard Worker 'unified_diff', 'diff_bytes', 'HtmlDiff', 'Match'] 32*cda5da8dSAndroid Build Coastguard Worker 33*cda5da8dSAndroid Build Coastguard Workerfrom heapq import nlargest as _nlargest 34*cda5da8dSAndroid Build Coastguard Workerfrom collections import namedtuple as _namedtuple 35*cda5da8dSAndroid Build Coastguard Workerfrom types import GenericAlias 36*cda5da8dSAndroid Build Coastguard Worker 37*cda5da8dSAndroid Build Coastguard WorkerMatch = _namedtuple('Match', 'a b size') 38*cda5da8dSAndroid Build Coastguard Worker 39*cda5da8dSAndroid Build Coastguard Workerdef _calculate_ratio(matches, length): 40*cda5da8dSAndroid Build Coastguard Worker if length: 41*cda5da8dSAndroid Build Coastguard Worker return 2.0 * matches / length 42*cda5da8dSAndroid Build Coastguard Worker return 1.0 43*cda5da8dSAndroid Build Coastguard Worker 44*cda5da8dSAndroid Build Coastguard Workerclass SequenceMatcher: 45*cda5da8dSAndroid Build Coastguard Worker 46*cda5da8dSAndroid Build Coastguard Worker """ 47*cda5da8dSAndroid Build Coastguard Worker SequenceMatcher is a flexible class for comparing pairs of sequences of 48*cda5da8dSAndroid Build Coastguard Worker any type, so long as the sequence elements are hashable. The basic 49*cda5da8dSAndroid Build Coastguard Worker algorithm predates, and is a little fancier than, an algorithm 50*cda5da8dSAndroid Build Coastguard Worker published in the late 1980's by Ratcliff and Obershelp under the 51*cda5da8dSAndroid Build Coastguard Worker hyperbolic name "gestalt pattern matching". The basic idea is to find 52*cda5da8dSAndroid Build Coastguard Worker the longest contiguous matching subsequence that contains no "junk" 53*cda5da8dSAndroid Build Coastguard Worker elements (R-O doesn't address junk). The same idea is then applied 54*cda5da8dSAndroid Build Coastguard Worker recursively to the pieces of the sequences to the left and to the right 55*cda5da8dSAndroid Build Coastguard Worker of the matching subsequence. This does not yield minimal edit 56*cda5da8dSAndroid Build Coastguard Worker sequences, but does tend to yield matches that "look right" to people. 57*cda5da8dSAndroid Build Coastguard Worker 58*cda5da8dSAndroid Build Coastguard Worker SequenceMatcher tries to compute a "human-friendly diff" between two 59*cda5da8dSAndroid Build Coastguard Worker sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the 60*cda5da8dSAndroid Build Coastguard Worker longest *contiguous* & junk-free matching subsequence. That's what 61*cda5da8dSAndroid Build Coastguard Worker catches peoples' eyes. The Windows(tm) windiff has another interesting 62*cda5da8dSAndroid Build Coastguard Worker notion, pairing up elements that appear uniquely in each sequence. 63*cda5da8dSAndroid Build Coastguard Worker That, and the method here, appear to yield more intuitive difference 64*cda5da8dSAndroid Build Coastguard Worker reports than does diff. This method appears to be the least vulnerable 65*cda5da8dSAndroid Build Coastguard Worker to syncing up on blocks of "junk lines", though (like blank lines in 66*cda5da8dSAndroid Build Coastguard Worker ordinary text files, or maybe "<P>" lines in HTML files). That may be 67*cda5da8dSAndroid Build Coastguard Worker because this is the only method of the 3 that has a *concept* of 68*cda5da8dSAndroid Build Coastguard Worker "junk" <wink>. 69*cda5da8dSAndroid Build Coastguard Worker 70*cda5da8dSAndroid Build Coastguard Worker Example, comparing two strings, and considering blanks to be "junk": 71*cda5da8dSAndroid Build Coastguard Worker 72*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(lambda x: x == " ", 73*cda5da8dSAndroid Build Coastguard Worker ... "private Thread currentThread;", 74*cda5da8dSAndroid Build Coastguard Worker ... "private volatile Thread currentThread;") 75*cda5da8dSAndroid Build Coastguard Worker >>> 76*cda5da8dSAndroid Build Coastguard Worker 77*cda5da8dSAndroid Build Coastguard Worker .ratio() returns a float in [0, 1], measuring the "similarity" of the 78*cda5da8dSAndroid Build Coastguard Worker sequences. As a rule of thumb, a .ratio() value over 0.6 means the 79*cda5da8dSAndroid Build Coastguard Worker sequences are close matches: 80*cda5da8dSAndroid Build Coastguard Worker 81*cda5da8dSAndroid Build Coastguard Worker >>> print(round(s.ratio(), 3)) 82*cda5da8dSAndroid Build Coastguard Worker 0.866 83*cda5da8dSAndroid Build Coastguard Worker >>> 84*cda5da8dSAndroid Build Coastguard Worker 85*cda5da8dSAndroid Build Coastguard Worker If you're only interested in where the sequences match, 86*cda5da8dSAndroid Build Coastguard Worker .get_matching_blocks() is handy: 87*cda5da8dSAndroid Build Coastguard Worker 88*cda5da8dSAndroid Build Coastguard Worker >>> for block in s.get_matching_blocks(): 89*cda5da8dSAndroid Build Coastguard Worker ... print("a[%d] and b[%d] match for %d elements" % block) 90*cda5da8dSAndroid Build Coastguard Worker a[0] and b[0] match for 8 elements 91*cda5da8dSAndroid Build Coastguard Worker a[8] and b[17] match for 21 elements 92*cda5da8dSAndroid Build Coastguard Worker a[29] and b[38] match for 0 elements 93*cda5da8dSAndroid Build Coastguard Worker 94*cda5da8dSAndroid Build Coastguard Worker Note that the last tuple returned by .get_matching_blocks() is always a 95*cda5da8dSAndroid Build Coastguard Worker dummy, (len(a), len(b), 0), and this is the only case in which the last 96*cda5da8dSAndroid Build Coastguard Worker tuple element (number of elements matched) is 0. 97*cda5da8dSAndroid Build Coastguard Worker 98*cda5da8dSAndroid Build Coastguard Worker If you want to know how to change the first sequence into the second, 99*cda5da8dSAndroid Build Coastguard Worker use .get_opcodes(): 100*cda5da8dSAndroid Build Coastguard Worker 101*cda5da8dSAndroid Build Coastguard Worker >>> for opcode in s.get_opcodes(): 102*cda5da8dSAndroid Build Coastguard Worker ... print("%6s a[%d:%d] b[%d:%d]" % opcode) 103*cda5da8dSAndroid Build Coastguard Worker equal a[0:8] b[0:8] 104*cda5da8dSAndroid Build Coastguard Worker insert a[8:8] b[8:17] 105*cda5da8dSAndroid Build Coastguard Worker equal a[8:29] b[17:38] 106*cda5da8dSAndroid Build Coastguard Worker 107*cda5da8dSAndroid Build Coastguard Worker See the Differ class for a fancy human-friendly file differencer, which 108*cda5da8dSAndroid Build Coastguard Worker uses SequenceMatcher both to compare sequences of lines, and to compare 109*cda5da8dSAndroid Build Coastguard Worker sequences of characters within similar (near-matching) lines. 110*cda5da8dSAndroid Build Coastguard Worker 111*cda5da8dSAndroid Build Coastguard Worker See also function get_close_matches() in this module, which shows how 112*cda5da8dSAndroid Build Coastguard Worker simple code building on SequenceMatcher can be used to do useful work. 113*cda5da8dSAndroid Build Coastguard Worker 114*cda5da8dSAndroid Build Coastguard Worker Timing: Basic R-O is cubic time worst case and quadratic time expected 115*cda5da8dSAndroid Build Coastguard Worker case. SequenceMatcher is quadratic time for the worst case and has 116*cda5da8dSAndroid Build Coastguard Worker expected-case behavior dependent in a complicated way on how many 117*cda5da8dSAndroid Build Coastguard Worker elements the sequences have in common; best case time is linear. 118*cda5da8dSAndroid Build Coastguard Worker """ 119*cda5da8dSAndroid Build Coastguard Worker 120*cda5da8dSAndroid Build Coastguard Worker def __init__(self, isjunk=None, a='', b='', autojunk=True): 121*cda5da8dSAndroid Build Coastguard Worker """Construct a SequenceMatcher. 122*cda5da8dSAndroid Build Coastguard Worker 123*cda5da8dSAndroid Build Coastguard Worker Optional arg isjunk is None (the default), or a one-argument 124*cda5da8dSAndroid Build Coastguard Worker function that takes a sequence element and returns true iff the 125*cda5da8dSAndroid Build Coastguard Worker element is junk. None is equivalent to passing "lambda x: 0", i.e. 126*cda5da8dSAndroid Build Coastguard Worker no elements are considered to be junk. For example, pass 127*cda5da8dSAndroid Build Coastguard Worker lambda x: x in " \\t" 128*cda5da8dSAndroid Build Coastguard Worker if you're comparing lines as sequences of characters, and don't 129*cda5da8dSAndroid Build Coastguard Worker want to synch up on blanks or hard tabs. 130*cda5da8dSAndroid Build Coastguard Worker 131*cda5da8dSAndroid Build Coastguard Worker Optional arg a is the first of two sequences to be compared. By 132*cda5da8dSAndroid Build Coastguard Worker default, an empty string. The elements of a must be hashable. See 133*cda5da8dSAndroid Build Coastguard Worker also .set_seqs() and .set_seq1(). 134*cda5da8dSAndroid Build Coastguard Worker 135*cda5da8dSAndroid Build Coastguard Worker Optional arg b is the second of two sequences to be compared. By 136*cda5da8dSAndroid Build Coastguard Worker default, an empty string. The elements of b must be hashable. See 137*cda5da8dSAndroid Build Coastguard Worker also .set_seqs() and .set_seq2(). 138*cda5da8dSAndroid Build Coastguard Worker 139*cda5da8dSAndroid Build Coastguard Worker Optional arg autojunk should be set to False to disable the 140*cda5da8dSAndroid Build Coastguard Worker "automatic junk heuristic" that treats popular elements as junk 141*cda5da8dSAndroid Build Coastguard Worker (see module documentation for more information). 142*cda5da8dSAndroid Build Coastguard Worker """ 143*cda5da8dSAndroid Build Coastguard Worker 144*cda5da8dSAndroid Build Coastguard Worker # Members: 145*cda5da8dSAndroid Build Coastguard Worker # a 146*cda5da8dSAndroid Build Coastguard Worker # first sequence 147*cda5da8dSAndroid Build Coastguard Worker # b 148*cda5da8dSAndroid Build Coastguard Worker # second sequence; differences are computed as "what do 149*cda5da8dSAndroid Build Coastguard Worker # we need to do to 'a' to change it into 'b'?" 150*cda5da8dSAndroid Build Coastguard Worker # b2j 151*cda5da8dSAndroid Build Coastguard Worker # for x in b, b2j[x] is a list of the indices (into b) 152*cda5da8dSAndroid Build Coastguard Worker # at which x appears; junk and popular elements do not appear 153*cda5da8dSAndroid Build Coastguard Worker # fullbcount 154*cda5da8dSAndroid Build Coastguard Worker # for x in b, fullbcount[x] == the number of times x 155*cda5da8dSAndroid Build Coastguard Worker # appears in b; only materialized if really needed (used 156*cda5da8dSAndroid Build Coastguard Worker # only for computing quick_ratio()) 157*cda5da8dSAndroid Build Coastguard Worker # matching_blocks 158*cda5da8dSAndroid Build Coastguard Worker # a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k]; 159*cda5da8dSAndroid Build Coastguard Worker # ascending & non-overlapping in i and in j; terminated by 160*cda5da8dSAndroid Build Coastguard Worker # a dummy (len(a), len(b), 0) sentinel 161*cda5da8dSAndroid Build Coastguard Worker # opcodes 162*cda5da8dSAndroid Build Coastguard Worker # a list of (tag, i1, i2, j1, j2) tuples, where tag is 163*cda5da8dSAndroid Build Coastguard Worker # one of 164*cda5da8dSAndroid Build Coastguard Worker # 'replace' a[i1:i2] should be replaced by b[j1:j2] 165*cda5da8dSAndroid Build Coastguard Worker # 'delete' a[i1:i2] should be deleted 166*cda5da8dSAndroid Build Coastguard Worker # 'insert' b[j1:j2] should be inserted 167*cda5da8dSAndroid Build Coastguard Worker # 'equal' a[i1:i2] == b[j1:j2] 168*cda5da8dSAndroid Build Coastguard Worker # isjunk 169*cda5da8dSAndroid Build Coastguard Worker # a user-supplied function taking a sequence element and 170*cda5da8dSAndroid Build Coastguard Worker # returning true iff the element is "junk" -- this has 171*cda5da8dSAndroid Build Coastguard Worker # subtle but helpful effects on the algorithm, which I'll 172*cda5da8dSAndroid Build Coastguard Worker # get around to writing up someday <0.9 wink>. 173*cda5da8dSAndroid Build Coastguard Worker # DON'T USE! Only __chain_b uses this. Use "in self.bjunk". 174*cda5da8dSAndroid Build Coastguard Worker # bjunk 175*cda5da8dSAndroid Build Coastguard Worker # the items in b for which isjunk is True. 176*cda5da8dSAndroid Build Coastguard Worker # bpopular 177*cda5da8dSAndroid Build Coastguard Worker # nonjunk items in b treated as junk by the heuristic (if used). 178*cda5da8dSAndroid Build Coastguard Worker 179*cda5da8dSAndroid Build Coastguard Worker self.isjunk = isjunk 180*cda5da8dSAndroid Build Coastguard Worker self.a = self.b = None 181*cda5da8dSAndroid Build Coastguard Worker self.autojunk = autojunk 182*cda5da8dSAndroid Build Coastguard Worker self.set_seqs(a, b) 183*cda5da8dSAndroid Build Coastguard Worker 184*cda5da8dSAndroid Build Coastguard Worker def set_seqs(self, a, b): 185*cda5da8dSAndroid Build Coastguard Worker """Set the two sequences to be compared. 186*cda5da8dSAndroid Build Coastguard Worker 187*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher() 188*cda5da8dSAndroid Build Coastguard Worker >>> s.set_seqs("abcd", "bcde") 189*cda5da8dSAndroid Build Coastguard Worker >>> s.ratio() 190*cda5da8dSAndroid Build Coastguard Worker 0.75 191*cda5da8dSAndroid Build Coastguard Worker """ 192*cda5da8dSAndroid Build Coastguard Worker 193*cda5da8dSAndroid Build Coastguard Worker self.set_seq1(a) 194*cda5da8dSAndroid Build Coastguard Worker self.set_seq2(b) 195*cda5da8dSAndroid Build Coastguard Worker 196*cda5da8dSAndroid Build Coastguard Worker def set_seq1(self, a): 197*cda5da8dSAndroid Build Coastguard Worker """Set the first sequence to be compared. 198*cda5da8dSAndroid Build Coastguard Worker 199*cda5da8dSAndroid Build Coastguard Worker The second sequence to be compared is not changed. 200*cda5da8dSAndroid Build Coastguard Worker 201*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, "abcd", "bcde") 202*cda5da8dSAndroid Build Coastguard Worker >>> s.ratio() 203*cda5da8dSAndroid Build Coastguard Worker 0.75 204*cda5da8dSAndroid Build Coastguard Worker >>> s.set_seq1("bcde") 205*cda5da8dSAndroid Build Coastguard Worker >>> s.ratio() 206*cda5da8dSAndroid Build Coastguard Worker 1.0 207*cda5da8dSAndroid Build Coastguard Worker >>> 208*cda5da8dSAndroid Build Coastguard Worker 209*cda5da8dSAndroid Build Coastguard Worker SequenceMatcher computes and caches detailed information about the 210*cda5da8dSAndroid Build Coastguard Worker second sequence, so if you want to compare one sequence S against 211*cda5da8dSAndroid Build Coastguard Worker many sequences, use .set_seq2(S) once and call .set_seq1(x) 212*cda5da8dSAndroid Build Coastguard Worker repeatedly for each of the other sequences. 213*cda5da8dSAndroid Build Coastguard Worker 214*cda5da8dSAndroid Build Coastguard Worker See also set_seqs() and set_seq2(). 215*cda5da8dSAndroid Build Coastguard Worker """ 216*cda5da8dSAndroid Build Coastguard Worker 217*cda5da8dSAndroid Build Coastguard Worker if a is self.a: 218*cda5da8dSAndroid Build Coastguard Worker return 219*cda5da8dSAndroid Build Coastguard Worker self.a = a 220*cda5da8dSAndroid Build Coastguard Worker self.matching_blocks = self.opcodes = None 221*cda5da8dSAndroid Build Coastguard Worker 222*cda5da8dSAndroid Build Coastguard Worker def set_seq2(self, b): 223*cda5da8dSAndroid Build Coastguard Worker """Set the second sequence to be compared. 224*cda5da8dSAndroid Build Coastguard Worker 225*cda5da8dSAndroid Build Coastguard Worker The first sequence to be compared is not changed. 226*cda5da8dSAndroid Build Coastguard Worker 227*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, "abcd", "bcde") 228*cda5da8dSAndroid Build Coastguard Worker >>> s.ratio() 229*cda5da8dSAndroid Build Coastguard Worker 0.75 230*cda5da8dSAndroid Build Coastguard Worker >>> s.set_seq2("abcd") 231*cda5da8dSAndroid Build Coastguard Worker >>> s.ratio() 232*cda5da8dSAndroid Build Coastguard Worker 1.0 233*cda5da8dSAndroid Build Coastguard Worker >>> 234*cda5da8dSAndroid Build Coastguard Worker 235*cda5da8dSAndroid Build Coastguard Worker SequenceMatcher computes and caches detailed information about the 236*cda5da8dSAndroid Build Coastguard Worker second sequence, so if you want to compare one sequence S against 237*cda5da8dSAndroid Build Coastguard Worker many sequences, use .set_seq2(S) once and call .set_seq1(x) 238*cda5da8dSAndroid Build Coastguard Worker repeatedly for each of the other sequences. 239*cda5da8dSAndroid Build Coastguard Worker 240*cda5da8dSAndroid Build Coastguard Worker See also set_seqs() and set_seq1(). 241*cda5da8dSAndroid Build Coastguard Worker """ 242*cda5da8dSAndroid Build Coastguard Worker 243*cda5da8dSAndroid Build Coastguard Worker if b is self.b: 244*cda5da8dSAndroid Build Coastguard Worker return 245*cda5da8dSAndroid Build Coastguard Worker self.b = b 246*cda5da8dSAndroid Build Coastguard Worker self.matching_blocks = self.opcodes = None 247*cda5da8dSAndroid Build Coastguard Worker self.fullbcount = None 248*cda5da8dSAndroid Build Coastguard Worker self.__chain_b() 249*cda5da8dSAndroid Build Coastguard Worker 250*cda5da8dSAndroid Build Coastguard Worker # For each element x in b, set b2j[x] to a list of the indices in 251*cda5da8dSAndroid Build Coastguard Worker # b where x appears; the indices are in increasing order; note that 252*cda5da8dSAndroid Build Coastguard Worker # the number of times x appears in b is len(b2j[x]) ... 253*cda5da8dSAndroid Build Coastguard Worker # when self.isjunk is defined, junk elements don't show up in this 254*cda5da8dSAndroid Build Coastguard Worker # map at all, which stops the central find_longest_match method 255*cda5da8dSAndroid Build Coastguard Worker # from starting any matching block at a junk element ... 256*cda5da8dSAndroid Build Coastguard Worker # b2j also does not contain entries for "popular" elements, meaning 257*cda5da8dSAndroid Build Coastguard Worker # elements that account for more than 1 + 1% of the total elements, and 258*cda5da8dSAndroid Build Coastguard Worker # when the sequence is reasonably large (>= 200 elements); this can 259*cda5da8dSAndroid Build Coastguard Worker # be viewed as an adaptive notion of semi-junk, and yields an enormous 260*cda5da8dSAndroid Build Coastguard Worker # speedup when, e.g., comparing program files with hundreds of 261*cda5da8dSAndroid Build Coastguard Worker # instances of "return NULL;" ... 262*cda5da8dSAndroid Build Coastguard Worker # note that this is only called when b changes; so for cross-product 263*cda5da8dSAndroid Build Coastguard Worker # kinds of matches, it's best to call set_seq2 once, then set_seq1 264*cda5da8dSAndroid Build Coastguard Worker # repeatedly 265*cda5da8dSAndroid Build Coastguard Worker 266*cda5da8dSAndroid Build Coastguard Worker def __chain_b(self): 267*cda5da8dSAndroid Build Coastguard Worker # Because isjunk is a user-defined (not C) function, and we test 268*cda5da8dSAndroid Build Coastguard Worker # for junk a LOT, it's important to minimize the number of calls. 269*cda5da8dSAndroid Build Coastguard Worker # Before the tricks described here, __chain_b was by far the most 270*cda5da8dSAndroid Build Coastguard Worker # time-consuming routine in the whole module! If anyone sees 271*cda5da8dSAndroid Build Coastguard Worker # Jim Roskind, thank him again for profile.py -- I never would 272*cda5da8dSAndroid Build Coastguard Worker # have guessed that. 273*cda5da8dSAndroid Build Coastguard Worker # The first trick is to build b2j ignoring the possibility 274*cda5da8dSAndroid Build Coastguard Worker # of junk. I.e., we don't call isjunk at all yet. Throwing 275*cda5da8dSAndroid Build Coastguard Worker # out the junk later is much cheaper than building b2j "right" 276*cda5da8dSAndroid Build Coastguard Worker # from the start. 277*cda5da8dSAndroid Build Coastguard Worker b = self.b 278*cda5da8dSAndroid Build Coastguard Worker self.b2j = b2j = {} 279*cda5da8dSAndroid Build Coastguard Worker 280*cda5da8dSAndroid Build Coastguard Worker for i, elt in enumerate(b): 281*cda5da8dSAndroid Build Coastguard Worker indices = b2j.setdefault(elt, []) 282*cda5da8dSAndroid Build Coastguard Worker indices.append(i) 283*cda5da8dSAndroid Build Coastguard Worker 284*cda5da8dSAndroid Build Coastguard Worker # Purge junk elements 285*cda5da8dSAndroid Build Coastguard Worker self.bjunk = junk = set() 286*cda5da8dSAndroid Build Coastguard Worker isjunk = self.isjunk 287*cda5da8dSAndroid Build Coastguard Worker if isjunk: 288*cda5da8dSAndroid Build Coastguard Worker for elt in b2j.keys(): 289*cda5da8dSAndroid Build Coastguard Worker if isjunk(elt): 290*cda5da8dSAndroid Build Coastguard Worker junk.add(elt) 291*cda5da8dSAndroid Build Coastguard Worker for elt in junk: # separate loop avoids separate list of keys 292*cda5da8dSAndroid Build Coastguard Worker del b2j[elt] 293*cda5da8dSAndroid Build Coastguard Worker 294*cda5da8dSAndroid Build Coastguard Worker # Purge popular elements that are not junk 295*cda5da8dSAndroid Build Coastguard Worker self.bpopular = popular = set() 296*cda5da8dSAndroid Build Coastguard Worker n = len(b) 297*cda5da8dSAndroid Build Coastguard Worker if self.autojunk and n >= 200: 298*cda5da8dSAndroid Build Coastguard Worker ntest = n // 100 + 1 299*cda5da8dSAndroid Build Coastguard Worker for elt, idxs in b2j.items(): 300*cda5da8dSAndroid Build Coastguard Worker if len(idxs) > ntest: 301*cda5da8dSAndroid Build Coastguard Worker popular.add(elt) 302*cda5da8dSAndroid Build Coastguard Worker for elt in popular: # ditto; as fast for 1% deletion 303*cda5da8dSAndroid Build Coastguard Worker del b2j[elt] 304*cda5da8dSAndroid Build Coastguard Worker 305*cda5da8dSAndroid Build Coastguard Worker def find_longest_match(self, alo=0, ahi=None, blo=0, bhi=None): 306*cda5da8dSAndroid Build Coastguard Worker """Find longest matching block in a[alo:ahi] and b[blo:bhi]. 307*cda5da8dSAndroid Build Coastguard Worker 308*cda5da8dSAndroid Build Coastguard Worker By default it will find the longest match in the entirety of a and b. 309*cda5da8dSAndroid Build Coastguard Worker 310*cda5da8dSAndroid Build Coastguard Worker If isjunk is not defined: 311*cda5da8dSAndroid Build Coastguard Worker 312*cda5da8dSAndroid Build Coastguard Worker Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where 313*cda5da8dSAndroid Build Coastguard Worker alo <= i <= i+k <= ahi 314*cda5da8dSAndroid Build Coastguard Worker blo <= j <= j+k <= bhi 315*cda5da8dSAndroid Build Coastguard Worker and for all (i',j',k') meeting those conditions, 316*cda5da8dSAndroid Build Coastguard Worker k >= k' 317*cda5da8dSAndroid Build Coastguard Worker i <= i' 318*cda5da8dSAndroid Build Coastguard Worker and if i == i', j <= j' 319*cda5da8dSAndroid Build Coastguard Worker 320*cda5da8dSAndroid Build Coastguard Worker In other words, of all maximal matching blocks, return one that 321*cda5da8dSAndroid Build Coastguard Worker starts earliest in a, and of all those maximal matching blocks that 322*cda5da8dSAndroid Build Coastguard Worker start earliest in a, return the one that starts earliest in b. 323*cda5da8dSAndroid Build Coastguard Worker 324*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, " abcd", "abcd abcd") 325*cda5da8dSAndroid Build Coastguard Worker >>> s.find_longest_match(0, 5, 0, 9) 326*cda5da8dSAndroid Build Coastguard Worker Match(a=0, b=4, size=5) 327*cda5da8dSAndroid Build Coastguard Worker 328*cda5da8dSAndroid Build Coastguard Worker If isjunk is defined, first the longest matching block is 329*cda5da8dSAndroid Build Coastguard Worker determined as above, but with the additional restriction that no 330*cda5da8dSAndroid Build Coastguard Worker junk element appears in the block. Then that block is extended as 331*cda5da8dSAndroid Build Coastguard Worker far as possible by matching (only) junk elements on both sides. So 332*cda5da8dSAndroid Build Coastguard Worker the resulting block never matches on junk except as identical junk 333*cda5da8dSAndroid Build Coastguard Worker happens to be adjacent to an "interesting" match. 334*cda5da8dSAndroid Build Coastguard Worker 335*cda5da8dSAndroid Build Coastguard Worker Here's the same example as before, but considering blanks to be 336*cda5da8dSAndroid Build Coastguard Worker junk. That prevents " abcd" from matching the " abcd" at the tail 337*cda5da8dSAndroid Build Coastguard Worker end of the second sequence directly. Instead only the "abcd" can 338*cda5da8dSAndroid Build Coastguard Worker match, and matches the leftmost "abcd" in the second sequence: 339*cda5da8dSAndroid Build Coastguard Worker 340*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") 341*cda5da8dSAndroid Build Coastguard Worker >>> s.find_longest_match(0, 5, 0, 9) 342*cda5da8dSAndroid Build Coastguard Worker Match(a=1, b=0, size=4) 343*cda5da8dSAndroid Build Coastguard Worker 344*cda5da8dSAndroid Build Coastguard Worker If no blocks match, return (alo, blo, 0). 345*cda5da8dSAndroid Build Coastguard Worker 346*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, "ab", "c") 347*cda5da8dSAndroid Build Coastguard Worker >>> s.find_longest_match(0, 2, 0, 1) 348*cda5da8dSAndroid Build Coastguard Worker Match(a=0, b=0, size=0) 349*cda5da8dSAndroid Build Coastguard Worker """ 350*cda5da8dSAndroid Build Coastguard Worker 351*cda5da8dSAndroid Build Coastguard Worker # CAUTION: stripping common prefix or suffix would be incorrect. 352*cda5da8dSAndroid Build Coastguard Worker # E.g., 353*cda5da8dSAndroid Build Coastguard Worker # ab 354*cda5da8dSAndroid Build Coastguard Worker # acab 355*cda5da8dSAndroid Build Coastguard Worker # Longest matching block is "ab", but if common prefix is 356*cda5da8dSAndroid Build Coastguard Worker # stripped, it's "a" (tied with "b"). UNIX(tm) diff does so 357*cda5da8dSAndroid Build Coastguard Worker # strip, so ends up claiming that ab is changed to acab by 358*cda5da8dSAndroid Build Coastguard Worker # inserting "ca" in the middle. That's minimal but unintuitive: 359*cda5da8dSAndroid Build Coastguard Worker # "it's obvious" that someone inserted "ac" at the front. 360*cda5da8dSAndroid Build Coastguard Worker # Windiff ends up at the same place as diff, but by pairing up 361*cda5da8dSAndroid Build Coastguard Worker # the unique 'b's and then matching the first two 'a's. 362*cda5da8dSAndroid Build Coastguard Worker 363*cda5da8dSAndroid Build Coastguard Worker a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.bjunk.__contains__ 364*cda5da8dSAndroid Build Coastguard Worker if ahi is None: 365*cda5da8dSAndroid Build Coastguard Worker ahi = len(a) 366*cda5da8dSAndroid Build Coastguard Worker if bhi is None: 367*cda5da8dSAndroid Build Coastguard Worker bhi = len(b) 368*cda5da8dSAndroid Build Coastguard Worker besti, bestj, bestsize = alo, blo, 0 369*cda5da8dSAndroid Build Coastguard Worker # find longest junk-free match 370*cda5da8dSAndroid Build Coastguard Worker # during an iteration of the loop, j2len[j] = length of longest 371*cda5da8dSAndroid Build Coastguard Worker # junk-free match ending with a[i-1] and b[j] 372*cda5da8dSAndroid Build Coastguard Worker j2len = {} 373*cda5da8dSAndroid Build Coastguard Worker nothing = [] 374*cda5da8dSAndroid Build Coastguard Worker for i in range(alo, ahi): 375*cda5da8dSAndroid Build Coastguard Worker # look at all instances of a[i] in b; note that because 376*cda5da8dSAndroid Build Coastguard Worker # b2j has no junk keys, the loop is skipped if a[i] is junk 377*cda5da8dSAndroid Build Coastguard Worker j2lenget = j2len.get 378*cda5da8dSAndroid Build Coastguard Worker newj2len = {} 379*cda5da8dSAndroid Build Coastguard Worker for j in b2j.get(a[i], nothing): 380*cda5da8dSAndroid Build Coastguard Worker # a[i] matches b[j] 381*cda5da8dSAndroid Build Coastguard Worker if j < blo: 382*cda5da8dSAndroid Build Coastguard Worker continue 383*cda5da8dSAndroid Build Coastguard Worker if j >= bhi: 384*cda5da8dSAndroid Build Coastguard Worker break 385*cda5da8dSAndroid Build Coastguard Worker k = newj2len[j] = j2lenget(j-1, 0) + 1 386*cda5da8dSAndroid Build Coastguard Worker if k > bestsize: 387*cda5da8dSAndroid Build Coastguard Worker besti, bestj, bestsize = i-k+1, j-k+1, k 388*cda5da8dSAndroid Build Coastguard Worker j2len = newj2len 389*cda5da8dSAndroid Build Coastguard Worker 390*cda5da8dSAndroid Build Coastguard Worker # Extend the best by non-junk elements on each end. In particular, 391*cda5da8dSAndroid Build Coastguard Worker # "popular" non-junk elements aren't in b2j, which greatly speeds 392*cda5da8dSAndroid Build Coastguard Worker # the inner loop above, but also means "the best" match so far 393*cda5da8dSAndroid Build Coastguard Worker # doesn't contain any junk *or* popular non-junk elements. 394*cda5da8dSAndroid Build Coastguard Worker while besti > alo and bestj > blo and \ 395*cda5da8dSAndroid Build Coastguard Worker not isbjunk(b[bestj-1]) and \ 396*cda5da8dSAndroid Build Coastguard Worker a[besti-1] == b[bestj-1]: 397*cda5da8dSAndroid Build Coastguard Worker besti, bestj, bestsize = besti-1, bestj-1, bestsize+1 398*cda5da8dSAndroid Build Coastguard Worker while besti+bestsize < ahi and bestj+bestsize < bhi and \ 399*cda5da8dSAndroid Build Coastguard Worker not isbjunk(b[bestj+bestsize]) and \ 400*cda5da8dSAndroid Build Coastguard Worker a[besti+bestsize] == b[bestj+bestsize]: 401*cda5da8dSAndroid Build Coastguard Worker bestsize += 1 402*cda5da8dSAndroid Build Coastguard Worker 403*cda5da8dSAndroid Build Coastguard Worker # Now that we have a wholly interesting match (albeit possibly 404*cda5da8dSAndroid Build Coastguard Worker # empty!), we may as well suck up the matching junk on each 405*cda5da8dSAndroid Build Coastguard Worker # side of it too. Can't think of a good reason not to, and it 406*cda5da8dSAndroid Build Coastguard Worker # saves post-processing the (possibly considerable) expense of 407*cda5da8dSAndroid Build Coastguard Worker # figuring out what to do with it. In the case of an empty 408*cda5da8dSAndroid Build Coastguard Worker # interesting match, this is clearly the right thing to do, 409*cda5da8dSAndroid Build Coastguard Worker # because no other kind of match is possible in the regions. 410*cda5da8dSAndroid Build Coastguard Worker while besti > alo and bestj > blo and \ 411*cda5da8dSAndroid Build Coastguard Worker isbjunk(b[bestj-1]) and \ 412*cda5da8dSAndroid Build Coastguard Worker a[besti-1] == b[bestj-1]: 413*cda5da8dSAndroid Build Coastguard Worker besti, bestj, bestsize = besti-1, bestj-1, bestsize+1 414*cda5da8dSAndroid Build Coastguard Worker while besti+bestsize < ahi and bestj+bestsize < bhi and \ 415*cda5da8dSAndroid Build Coastguard Worker isbjunk(b[bestj+bestsize]) and \ 416*cda5da8dSAndroid Build Coastguard Worker a[besti+bestsize] == b[bestj+bestsize]: 417*cda5da8dSAndroid Build Coastguard Worker bestsize = bestsize + 1 418*cda5da8dSAndroid Build Coastguard Worker 419*cda5da8dSAndroid Build Coastguard Worker return Match(besti, bestj, bestsize) 420*cda5da8dSAndroid Build Coastguard Worker 421*cda5da8dSAndroid Build Coastguard Worker def get_matching_blocks(self): 422*cda5da8dSAndroid Build Coastguard Worker """Return list of triples describing matching subsequences. 423*cda5da8dSAndroid Build Coastguard Worker 424*cda5da8dSAndroid Build Coastguard Worker Each triple is of the form (i, j, n), and means that 425*cda5da8dSAndroid Build Coastguard Worker a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in 426*cda5da8dSAndroid Build Coastguard Worker i and in j. New in Python 2.5, it's also guaranteed that if 427*cda5da8dSAndroid Build Coastguard Worker (i, j, n) and (i', j', n') are adjacent triples in the list, and 428*cda5da8dSAndroid Build Coastguard Worker the second is not the last triple in the list, then i+n != i' or 429*cda5da8dSAndroid Build Coastguard Worker j+n != j'. IOW, adjacent triples never describe adjacent equal 430*cda5da8dSAndroid Build Coastguard Worker blocks. 431*cda5da8dSAndroid Build Coastguard Worker 432*cda5da8dSAndroid Build Coastguard Worker The last triple is a dummy, (len(a), len(b), 0), and is the only 433*cda5da8dSAndroid Build Coastguard Worker triple with n==0. 434*cda5da8dSAndroid Build Coastguard Worker 435*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, "abxcd", "abcd") 436*cda5da8dSAndroid Build Coastguard Worker >>> list(s.get_matching_blocks()) 437*cda5da8dSAndroid Build Coastguard Worker [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)] 438*cda5da8dSAndroid Build Coastguard Worker """ 439*cda5da8dSAndroid Build Coastguard Worker 440*cda5da8dSAndroid Build Coastguard Worker if self.matching_blocks is not None: 441*cda5da8dSAndroid Build Coastguard Worker return self.matching_blocks 442*cda5da8dSAndroid Build Coastguard Worker la, lb = len(self.a), len(self.b) 443*cda5da8dSAndroid Build Coastguard Worker 444*cda5da8dSAndroid Build Coastguard Worker # This is most naturally expressed as a recursive algorithm, but 445*cda5da8dSAndroid Build Coastguard Worker # at least one user bumped into extreme use cases that exceeded 446*cda5da8dSAndroid Build Coastguard Worker # the recursion limit on their box. So, now we maintain a list 447*cda5da8dSAndroid Build Coastguard Worker # ('queue`) of blocks we still need to look at, and append partial 448*cda5da8dSAndroid Build Coastguard Worker # results to `matching_blocks` in a loop; the matches are sorted 449*cda5da8dSAndroid Build Coastguard Worker # at the end. 450*cda5da8dSAndroid Build Coastguard Worker queue = [(0, la, 0, lb)] 451*cda5da8dSAndroid Build Coastguard Worker matching_blocks = [] 452*cda5da8dSAndroid Build Coastguard Worker while queue: 453*cda5da8dSAndroid Build Coastguard Worker alo, ahi, blo, bhi = queue.pop() 454*cda5da8dSAndroid Build Coastguard Worker i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi) 455*cda5da8dSAndroid Build Coastguard Worker # a[alo:i] vs b[blo:j] unknown 456*cda5da8dSAndroid Build Coastguard Worker # a[i:i+k] same as b[j:j+k] 457*cda5da8dSAndroid Build Coastguard Worker # a[i+k:ahi] vs b[j+k:bhi] unknown 458*cda5da8dSAndroid Build Coastguard Worker if k: # if k is 0, there was no matching block 459*cda5da8dSAndroid Build Coastguard Worker matching_blocks.append(x) 460*cda5da8dSAndroid Build Coastguard Worker if alo < i and blo < j: 461*cda5da8dSAndroid Build Coastguard Worker queue.append((alo, i, blo, j)) 462*cda5da8dSAndroid Build Coastguard Worker if i+k < ahi and j+k < bhi: 463*cda5da8dSAndroid Build Coastguard Worker queue.append((i+k, ahi, j+k, bhi)) 464*cda5da8dSAndroid Build Coastguard Worker matching_blocks.sort() 465*cda5da8dSAndroid Build Coastguard Worker 466*cda5da8dSAndroid Build Coastguard Worker # It's possible that we have adjacent equal blocks in the 467*cda5da8dSAndroid Build Coastguard Worker # matching_blocks list now. Starting with 2.5, this code was added 468*cda5da8dSAndroid Build Coastguard Worker # to collapse them. 469*cda5da8dSAndroid Build Coastguard Worker i1 = j1 = k1 = 0 470*cda5da8dSAndroid Build Coastguard Worker non_adjacent = [] 471*cda5da8dSAndroid Build Coastguard Worker for i2, j2, k2 in matching_blocks: 472*cda5da8dSAndroid Build Coastguard Worker # Is this block adjacent to i1, j1, k1? 473*cda5da8dSAndroid Build Coastguard Worker if i1 + k1 == i2 and j1 + k1 == j2: 474*cda5da8dSAndroid Build Coastguard Worker # Yes, so collapse them -- this just increases the length of 475*cda5da8dSAndroid Build Coastguard Worker # the first block by the length of the second, and the first 476*cda5da8dSAndroid Build Coastguard Worker # block so lengthened remains the block to compare against. 477*cda5da8dSAndroid Build Coastguard Worker k1 += k2 478*cda5da8dSAndroid Build Coastguard Worker else: 479*cda5da8dSAndroid Build Coastguard Worker # Not adjacent. Remember the first block (k1==0 means it's 480*cda5da8dSAndroid Build Coastguard Worker # the dummy we started with), and make the second block the 481*cda5da8dSAndroid Build Coastguard Worker # new block to compare against. 482*cda5da8dSAndroid Build Coastguard Worker if k1: 483*cda5da8dSAndroid Build Coastguard Worker non_adjacent.append((i1, j1, k1)) 484*cda5da8dSAndroid Build Coastguard Worker i1, j1, k1 = i2, j2, k2 485*cda5da8dSAndroid Build Coastguard Worker if k1: 486*cda5da8dSAndroid Build Coastguard Worker non_adjacent.append((i1, j1, k1)) 487*cda5da8dSAndroid Build Coastguard Worker 488*cda5da8dSAndroid Build Coastguard Worker non_adjacent.append( (la, lb, 0) ) 489*cda5da8dSAndroid Build Coastguard Worker self.matching_blocks = list(map(Match._make, non_adjacent)) 490*cda5da8dSAndroid Build Coastguard Worker return self.matching_blocks 491*cda5da8dSAndroid Build Coastguard Worker 492*cda5da8dSAndroid Build Coastguard Worker def get_opcodes(self): 493*cda5da8dSAndroid Build Coastguard Worker """Return list of 5-tuples describing how to turn a into b. 494*cda5da8dSAndroid Build Coastguard Worker 495*cda5da8dSAndroid Build Coastguard Worker Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple 496*cda5da8dSAndroid Build Coastguard Worker has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the 497*cda5da8dSAndroid Build Coastguard Worker tuple preceding it, and likewise for j1 == the previous j2. 498*cda5da8dSAndroid Build Coastguard Worker 499*cda5da8dSAndroid Build Coastguard Worker The tags are strings, with these meanings: 500*cda5da8dSAndroid Build Coastguard Worker 501*cda5da8dSAndroid Build Coastguard Worker 'replace': a[i1:i2] should be replaced by b[j1:j2] 502*cda5da8dSAndroid Build Coastguard Worker 'delete': a[i1:i2] should be deleted. 503*cda5da8dSAndroid Build Coastguard Worker Note that j1==j2 in this case. 504*cda5da8dSAndroid Build Coastguard Worker 'insert': b[j1:j2] should be inserted at a[i1:i1]. 505*cda5da8dSAndroid Build Coastguard Worker Note that i1==i2 in this case. 506*cda5da8dSAndroid Build Coastguard Worker 'equal': a[i1:i2] == b[j1:j2] 507*cda5da8dSAndroid Build Coastguard Worker 508*cda5da8dSAndroid Build Coastguard Worker >>> a = "qabxcd" 509*cda5da8dSAndroid Build Coastguard Worker >>> b = "abycdf" 510*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, a, b) 511*cda5da8dSAndroid Build Coastguard Worker >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): 512*cda5da8dSAndroid Build Coastguard Worker ... print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % 513*cda5da8dSAndroid Build Coastguard Worker ... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) 514*cda5da8dSAndroid Build Coastguard Worker delete a[0:1] (q) b[0:0] () 515*cda5da8dSAndroid Build Coastguard Worker equal a[1:3] (ab) b[0:2] (ab) 516*cda5da8dSAndroid Build Coastguard Worker replace a[3:4] (x) b[2:3] (y) 517*cda5da8dSAndroid Build Coastguard Worker equal a[4:6] (cd) b[3:5] (cd) 518*cda5da8dSAndroid Build Coastguard Worker insert a[6:6] () b[5:6] (f) 519*cda5da8dSAndroid Build Coastguard Worker """ 520*cda5da8dSAndroid Build Coastguard Worker 521*cda5da8dSAndroid Build Coastguard Worker if self.opcodes is not None: 522*cda5da8dSAndroid Build Coastguard Worker return self.opcodes 523*cda5da8dSAndroid Build Coastguard Worker i = j = 0 524*cda5da8dSAndroid Build Coastguard Worker self.opcodes = answer = [] 525*cda5da8dSAndroid Build Coastguard Worker for ai, bj, size in self.get_matching_blocks(): 526*cda5da8dSAndroid Build Coastguard Worker # invariant: we've pumped out correct diffs to change 527*cda5da8dSAndroid Build Coastguard Worker # a[:i] into b[:j], and the next matching block is 528*cda5da8dSAndroid Build Coastguard Worker # a[ai:ai+size] == b[bj:bj+size]. So we need to pump 529*cda5da8dSAndroid Build Coastguard Worker # out a diff to change a[i:ai] into b[j:bj], pump out 530*cda5da8dSAndroid Build Coastguard Worker # the matching block, and move (i,j) beyond the match 531*cda5da8dSAndroid Build Coastguard Worker tag = '' 532*cda5da8dSAndroid Build Coastguard Worker if i < ai and j < bj: 533*cda5da8dSAndroid Build Coastguard Worker tag = 'replace' 534*cda5da8dSAndroid Build Coastguard Worker elif i < ai: 535*cda5da8dSAndroid Build Coastguard Worker tag = 'delete' 536*cda5da8dSAndroid Build Coastguard Worker elif j < bj: 537*cda5da8dSAndroid Build Coastguard Worker tag = 'insert' 538*cda5da8dSAndroid Build Coastguard Worker if tag: 539*cda5da8dSAndroid Build Coastguard Worker answer.append( (tag, i, ai, j, bj) ) 540*cda5da8dSAndroid Build Coastguard Worker i, j = ai+size, bj+size 541*cda5da8dSAndroid Build Coastguard Worker # the list of matching blocks is terminated by a 542*cda5da8dSAndroid Build Coastguard Worker # sentinel with size 0 543*cda5da8dSAndroid Build Coastguard Worker if size: 544*cda5da8dSAndroid Build Coastguard Worker answer.append( ('equal', ai, i, bj, j) ) 545*cda5da8dSAndroid Build Coastguard Worker return answer 546*cda5da8dSAndroid Build Coastguard Worker 547*cda5da8dSAndroid Build Coastguard Worker def get_grouped_opcodes(self, n=3): 548*cda5da8dSAndroid Build Coastguard Worker """ Isolate change clusters by eliminating ranges with no changes. 549*cda5da8dSAndroid Build Coastguard Worker 550*cda5da8dSAndroid Build Coastguard Worker Return a generator of groups with up to n lines of context. 551*cda5da8dSAndroid Build Coastguard Worker Each group is in the same format as returned by get_opcodes(). 552*cda5da8dSAndroid Build Coastguard Worker 553*cda5da8dSAndroid Build Coastguard Worker >>> from pprint import pprint 554*cda5da8dSAndroid Build Coastguard Worker >>> a = list(map(str, range(1,40))) 555*cda5da8dSAndroid Build Coastguard Worker >>> b = a[:] 556*cda5da8dSAndroid Build Coastguard Worker >>> b[8:8] = ['i'] # Make an insertion 557*cda5da8dSAndroid Build Coastguard Worker >>> b[20] += 'x' # Make a replacement 558*cda5da8dSAndroid Build Coastguard Worker >>> b[23:28] = [] # Make a deletion 559*cda5da8dSAndroid Build Coastguard Worker >>> b[30] += 'y' # Make another replacement 560*cda5da8dSAndroid Build Coastguard Worker >>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes())) 561*cda5da8dSAndroid Build Coastguard Worker [[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)], 562*cda5da8dSAndroid Build Coastguard Worker [('equal', 16, 19, 17, 20), 563*cda5da8dSAndroid Build Coastguard Worker ('replace', 19, 20, 20, 21), 564*cda5da8dSAndroid Build Coastguard Worker ('equal', 20, 22, 21, 23), 565*cda5da8dSAndroid Build Coastguard Worker ('delete', 22, 27, 23, 23), 566*cda5da8dSAndroid Build Coastguard Worker ('equal', 27, 30, 23, 26)], 567*cda5da8dSAndroid Build Coastguard Worker [('equal', 31, 34, 27, 30), 568*cda5da8dSAndroid Build Coastguard Worker ('replace', 34, 35, 30, 31), 569*cda5da8dSAndroid Build Coastguard Worker ('equal', 35, 38, 31, 34)]] 570*cda5da8dSAndroid Build Coastguard Worker """ 571*cda5da8dSAndroid Build Coastguard Worker 572*cda5da8dSAndroid Build Coastguard Worker codes = self.get_opcodes() 573*cda5da8dSAndroid Build Coastguard Worker if not codes: 574*cda5da8dSAndroid Build Coastguard Worker codes = [("equal", 0, 1, 0, 1)] 575*cda5da8dSAndroid Build Coastguard Worker # Fixup leading and trailing groups if they show no changes. 576*cda5da8dSAndroid Build Coastguard Worker if codes[0][0] == 'equal': 577*cda5da8dSAndroid Build Coastguard Worker tag, i1, i2, j1, j2 = codes[0] 578*cda5da8dSAndroid Build Coastguard Worker codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2 579*cda5da8dSAndroid Build Coastguard Worker if codes[-1][0] == 'equal': 580*cda5da8dSAndroid Build Coastguard Worker tag, i1, i2, j1, j2 = codes[-1] 581*cda5da8dSAndroid Build Coastguard Worker codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n) 582*cda5da8dSAndroid Build Coastguard Worker 583*cda5da8dSAndroid Build Coastguard Worker nn = n + n 584*cda5da8dSAndroid Build Coastguard Worker group = [] 585*cda5da8dSAndroid Build Coastguard Worker for tag, i1, i2, j1, j2 in codes: 586*cda5da8dSAndroid Build Coastguard Worker # End the current group and start a new one whenever 587*cda5da8dSAndroid Build Coastguard Worker # there is a large range with no changes. 588*cda5da8dSAndroid Build Coastguard Worker if tag == 'equal' and i2-i1 > nn: 589*cda5da8dSAndroid Build Coastguard Worker group.append((tag, i1, min(i2, i1+n), j1, min(j2, j1+n))) 590*cda5da8dSAndroid Build Coastguard Worker yield group 591*cda5da8dSAndroid Build Coastguard Worker group = [] 592*cda5da8dSAndroid Build Coastguard Worker i1, j1 = max(i1, i2-n), max(j1, j2-n) 593*cda5da8dSAndroid Build Coastguard Worker group.append((tag, i1, i2, j1 ,j2)) 594*cda5da8dSAndroid Build Coastguard Worker if group and not (len(group)==1 and group[0][0] == 'equal'): 595*cda5da8dSAndroid Build Coastguard Worker yield group 596*cda5da8dSAndroid Build Coastguard Worker 597*cda5da8dSAndroid Build Coastguard Worker def ratio(self): 598*cda5da8dSAndroid Build Coastguard Worker """Return a measure of the sequences' similarity (float in [0,1]). 599*cda5da8dSAndroid Build Coastguard Worker 600*cda5da8dSAndroid Build Coastguard Worker Where T is the total number of elements in both sequences, and 601*cda5da8dSAndroid Build Coastguard Worker M is the number of matches, this is 2.0*M / T. 602*cda5da8dSAndroid Build Coastguard Worker Note that this is 1 if the sequences are identical, and 0 if 603*cda5da8dSAndroid Build Coastguard Worker they have nothing in common. 604*cda5da8dSAndroid Build Coastguard Worker 605*cda5da8dSAndroid Build Coastguard Worker .ratio() is expensive to compute if you haven't already computed 606*cda5da8dSAndroid Build Coastguard Worker .get_matching_blocks() or .get_opcodes(), in which case you may 607*cda5da8dSAndroid Build Coastguard Worker want to try .quick_ratio() or .real_quick_ratio() first to get an 608*cda5da8dSAndroid Build Coastguard Worker upper bound. 609*cda5da8dSAndroid Build Coastguard Worker 610*cda5da8dSAndroid Build Coastguard Worker >>> s = SequenceMatcher(None, "abcd", "bcde") 611*cda5da8dSAndroid Build Coastguard Worker >>> s.ratio() 612*cda5da8dSAndroid Build Coastguard Worker 0.75 613*cda5da8dSAndroid Build Coastguard Worker >>> s.quick_ratio() 614*cda5da8dSAndroid Build Coastguard Worker 0.75 615*cda5da8dSAndroid Build Coastguard Worker >>> s.real_quick_ratio() 616*cda5da8dSAndroid Build Coastguard Worker 1.0 617*cda5da8dSAndroid Build Coastguard Worker """ 618*cda5da8dSAndroid Build Coastguard Worker 619*cda5da8dSAndroid Build Coastguard Worker matches = sum(triple[-1] for triple in self.get_matching_blocks()) 620*cda5da8dSAndroid Build Coastguard Worker return _calculate_ratio(matches, len(self.a) + len(self.b)) 621*cda5da8dSAndroid Build Coastguard Worker 622*cda5da8dSAndroid Build Coastguard Worker def quick_ratio(self): 623*cda5da8dSAndroid Build Coastguard Worker """Return an upper bound on ratio() relatively quickly. 624*cda5da8dSAndroid Build Coastguard Worker 625*cda5da8dSAndroid Build Coastguard Worker This isn't defined beyond that it is an upper bound on .ratio(), and 626*cda5da8dSAndroid Build Coastguard Worker is faster to compute. 627*cda5da8dSAndroid Build Coastguard Worker """ 628*cda5da8dSAndroid Build Coastguard Worker 629*cda5da8dSAndroid Build Coastguard Worker # viewing a and b as multisets, set matches to the cardinality 630*cda5da8dSAndroid Build Coastguard Worker # of their intersection; this counts the number of matches 631*cda5da8dSAndroid Build Coastguard Worker # without regard to order, so is clearly an upper bound 632*cda5da8dSAndroid Build Coastguard Worker if self.fullbcount is None: 633*cda5da8dSAndroid Build Coastguard Worker self.fullbcount = fullbcount = {} 634*cda5da8dSAndroid Build Coastguard Worker for elt in self.b: 635*cda5da8dSAndroid Build Coastguard Worker fullbcount[elt] = fullbcount.get(elt, 0) + 1 636*cda5da8dSAndroid Build Coastguard Worker fullbcount = self.fullbcount 637*cda5da8dSAndroid Build Coastguard Worker # avail[x] is the number of times x appears in 'b' less the 638*cda5da8dSAndroid Build Coastguard Worker # number of times we've seen it in 'a' so far ... kinda 639*cda5da8dSAndroid Build Coastguard Worker avail = {} 640*cda5da8dSAndroid Build Coastguard Worker availhas, matches = avail.__contains__, 0 641*cda5da8dSAndroid Build Coastguard Worker for elt in self.a: 642*cda5da8dSAndroid Build Coastguard Worker if availhas(elt): 643*cda5da8dSAndroid Build Coastguard Worker numb = avail[elt] 644*cda5da8dSAndroid Build Coastguard Worker else: 645*cda5da8dSAndroid Build Coastguard Worker numb = fullbcount.get(elt, 0) 646*cda5da8dSAndroid Build Coastguard Worker avail[elt] = numb - 1 647*cda5da8dSAndroid Build Coastguard Worker if numb > 0: 648*cda5da8dSAndroid Build Coastguard Worker matches = matches + 1 649*cda5da8dSAndroid Build Coastguard Worker return _calculate_ratio(matches, len(self.a) + len(self.b)) 650*cda5da8dSAndroid Build Coastguard Worker 651*cda5da8dSAndroid Build Coastguard Worker def real_quick_ratio(self): 652*cda5da8dSAndroid Build Coastguard Worker """Return an upper bound on ratio() very quickly. 653*cda5da8dSAndroid Build Coastguard Worker 654*cda5da8dSAndroid Build Coastguard Worker This isn't defined beyond that it is an upper bound on .ratio(), and 655*cda5da8dSAndroid Build Coastguard Worker is faster to compute than either .ratio() or .quick_ratio(). 656*cda5da8dSAndroid Build Coastguard Worker """ 657*cda5da8dSAndroid Build Coastguard Worker 658*cda5da8dSAndroid Build Coastguard Worker la, lb = len(self.a), len(self.b) 659*cda5da8dSAndroid Build Coastguard Worker # can't have more matches than the number of elements in the 660*cda5da8dSAndroid Build Coastguard Worker # shorter sequence 661*cda5da8dSAndroid Build Coastguard Worker return _calculate_ratio(min(la, lb), la + lb) 662*cda5da8dSAndroid Build Coastguard Worker 663*cda5da8dSAndroid Build Coastguard Worker __class_getitem__ = classmethod(GenericAlias) 664*cda5da8dSAndroid Build Coastguard Worker 665*cda5da8dSAndroid Build Coastguard Worker 666*cda5da8dSAndroid Build Coastguard Workerdef get_close_matches(word, possibilities, n=3, cutoff=0.6): 667*cda5da8dSAndroid Build Coastguard Worker """Use SequenceMatcher to return list of the best "good enough" matches. 668*cda5da8dSAndroid Build Coastguard Worker 669*cda5da8dSAndroid Build Coastguard Worker word is a sequence for which close matches are desired (typically a 670*cda5da8dSAndroid Build Coastguard Worker string). 671*cda5da8dSAndroid Build Coastguard Worker 672*cda5da8dSAndroid Build Coastguard Worker possibilities is a list of sequences against which to match word 673*cda5da8dSAndroid Build Coastguard Worker (typically a list of strings). 674*cda5da8dSAndroid Build Coastguard Worker 675*cda5da8dSAndroid Build Coastguard Worker Optional arg n (default 3) is the maximum number of close matches to 676*cda5da8dSAndroid Build Coastguard Worker return. n must be > 0. 677*cda5da8dSAndroid Build Coastguard Worker 678*cda5da8dSAndroid Build Coastguard Worker Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities 679*cda5da8dSAndroid Build Coastguard Worker that don't score at least that similar to word are ignored. 680*cda5da8dSAndroid Build Coastguard Worker 681*cda5da8dSAndroid Build Coastguard Worker The best (no more than n) matches among the possibilities are returned 682*cda5da8dSAndroid Build Coastguard Worker in a list, sorted by similarity score, most similar first. 683*cda5da8dSAndroid Build Coastguard Worker 684*cda5da8dSAndroid Build Coastguard Worker >>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"]) 685*cda5da8dSAndroid Build Coastguard Worker ['apple', 'ape'] 686*cda5da8dSAndroid Build Coastguard Worker >>> import keyword as _keyword 687*cda5da8dSAndroid Build Coastguard Worker >>> get_close_matches("wheel", _keyword.kwlist) 688*cda5da8dSAndroid Build Coastguard Worker ['while'] 689*cda5da8dSAndroid Build Coastguard Worker >>> get_close_matches("Apple", _keyword.kwlist) 690*cda5da8dSAndroid Build Coastguard Worker [] 691*cda5da8dSAndroid Build Coastguard Worker >>> get_close_matches("accept", _keyword.kwlist) 692*cda5da8dSAndroid Build Coastguard Worker ['except'] 693*cda5da8dSAndroid Build Coastguard Worker """ 694*cda5da8dSAndroid Build Coastguard Worker 695*cda5da8dSAndroid Build Coastguard Worker if not n > 0: 696*cda5da8dSAndroid Build Coastguard Worker raise ValueError("n must be > 0: %r" % (n,)) 697*cda5da8dSAndroid Build Coastguard Worker if not 0.0 <= cutoff <= 1.0: 698*cda5da8dSAndroid Build Coastguard Worker raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,)) 699*cda5da8dSAndroid Build Coastguard Worker result = [] 700*cda5da8dSAndroid Build Coastguard Worker s = SequenceMatcher() 701*cda5da8dSAndroid Build Coastguard Worker s.set_seq2(word) 702*cda5da8dSAndroid Build Coastguard Worker for x in possibilities: 703*cda5da8dSAndroid Build Coastguard Worker s.set_seq1(x) 704*cda5da8dSAndroid Build Coastguard Worker if s.real_quick_ratio() >= cutoff and \ 705*cda5da8dSAndroid Build Coastguard Worker s.quick_ratio() >= cutoff and \ 706*cda5da8dSAndroid Build Coastguard Worker s.ratio() >= cutoff: 707*cda5da8dSAndroid Build Coastguard Worker result.append((s.ratio(), x)) 708*cda5da8dSAndroid Build Coastguard Worker 709*cda5da8dSAndroid Build Coastguard Worker # Move the best scorers to head of list 710*cda5da8dSAndroid Build Coastguard Worker result = _nlargest(n, result) 711*cda5da8dSAndroid Build Coastguard Worker # Strip scores for the best n matches 712*cda5da8dSAndroid Build Coastguard Worker return [x for score, x in result] 713*cda5da8dSAndroid Build Coastguard Worker 714*cda5da8dSAndroid Build Coastguard Worker 715*cda5da8dSAndroid Build Coastguard Workerdef _keep_original_ws(s, tag_s): 716*cda5da8dSAndroid Build Coastguard Worker """Replace whitespace with the original whitespace characters in `s`""" 717*cda5da8dSAndroid Build Coastguard Worker return ''.join( 718*cda5da8dSAndroid Build Coastguard Worker c if tag_c == " " and c.isspace() else tag_c 719*cda5da8dSAndroid Build Coastguard Worker for c, tag_c in zip(s, tag_s) 720*cda5da8dSAndroid Build Coastguard Worker ) 721*cda5da8dSAndroid Build Coastguard Worker 722*cda5da8dSAndroid Build Coastguard Worker 723*cda5da8dSAndroid Build Coastguard Worker 724*cda5da8dSAndroid Build Coastguard Workerclass Differ: 725*cda5da8dSAndroid Build Coastguard Worker r""" 726*cda5da8dSAndroid Build Coastguard Worker Differ is a class for comparing sequences of lines of text, and 727*cda5da8dSAndroid Build Coastguard Worker producing human-readable differences or deltas. Differ uses 728*cda5da8dSAndroid Build Coastguard Worker SequenceMatcher both to compare sequences of lines, and to compare 729*cda5da8dSAndroid Build Coastguard Worker sequences of characters within similar (near-matching) lines. 730*cda5da8dSAndroid Build Coastguard Worker 731*cda5da8dSAndroid Build Coastguard Worker Each line of a Differ delta begins with a two-letter code: 732*cda5da8dSAndroid Build Coastguard Worker 733*cda5da8dSAndroid Build Coastguard Worker '- ' line unique to sequence 1 734*cda5da8dSAndroid Build Coastguard Worker '+ ' line unique to sequence 2 735*cda5da8dSAndroid Build Coastguard Worker ' ' line common to both sequences 736*cda5da8dSAndroid Build Coastguard Worker '? ' line not present in either input sequence 737*cda5da8dSAndroid Build Coastguard Worker 738*cda5da8dSAndroid Build Coastguard Worker Lines beginning with '? ' attempt to guide the eye to intraline 739*cda5da8dSAndroid Build Coastguard Worker differences, and were not present in either input sequence. These lines 740*cda5da8dSAndroid Build Coastguard Worker can be confusing if the sequences contain tab characters. 741*cda5da8dSAndroid Build Coastguard Worker 742*cda5da8dSAndroid Build Coastguard Worker Note that Differ makes no claim to produce a *minimal* diff. To the 743*cda5da8dSAndroid Build Coastguard Worker contrary, minimal diffs are often counter-intuitive, because they synch 744*cda5da8dSAndroid Build Coastguard Worker up anywhere possible, sometimes accidental matches 100 pages apart. 745*cda5da8dSAndroid Build Coastguard Worker Restricting synch points to contiguous matches preserves some notion of 746*cda5da8dSAndroid Build Coastguard Worker locality, at the occasional cost of producing a longer diff. 747*cda5da8dSAndroid Build Coastguard Worker 748*cda5da8dSAndroid Build Coastguard Worker Example: Comparing two texts. 749*cda5da8dSAndroid Build Coastguard Worker 750*cda5da8dSAndroid Build Coastguard Worker First we set up the texts, sequences of individual single-line strings 751*cda5da8dSAndroid Build Coastguard Worker ending with newlines (such sequences can also be obtained from the 752*cda5da8dSAndroid Build Coastguard Worker `readlines()` method of file-like objects): 753*cda5da8dSAndroid Build Coastguard Worker 754*cda5da8dSAndroid Build Coastguard Worker >>> text1 = ''' 1. Beautiful is better than ugly. 755*cda5da8dSAndroid Build Coastguard Worker ... 2. Explicit is better than implicit. 756*cda5da8dSAndroid Build Coastguard Worker ... 3. Simple is better than complex. 757*cda5da8dSAndroid Build Coastguard Worker ... 4. Complex is better than complicated. 758*cda5da8dSAndroid Build Coastguard Worker ... '''.splitlines(keepends=True) 759*cda5da8dSAndroid Build Coastguard Worker >>> len(text1) 760*cda5da8dSAndroid Build Coastguard Worker 4 761*cda5da8dSAndroid Build Coastguard Worker >>> text1[0][-1] 762*cda5da8dSAndroid Build Coastguard Worker '\n' 763*cda5da8dSAndroid Build Coastguard Worker >>> text2 = ''' 1. Beautiful is better than ugly. 764*cda5da8dSAndroid Build Coastguard Worker ... 3. Simple is better than complex. 765*cda5da8dSAndroid Build Coastguard Worker ... 4. Complicated is better than complex. 766*cda5da8dSAndroid Build Coastguard Worker ... 5. Flat is better than nested. 767*cda5da8dSAndroid Build Coastguard Worker ... '''.splitlines(keepends=True) 768*cda5da8dSAndroid Build Coastguard Worker 769*cda5da8dSAndroid Build Coastguard Worker Next we instantiate a Differ object: 770*cda5da8dSAndroid Build Coastguard Worker 771*cda5da8dSAndroid Build Coastguard Worker >>> d = Differ() 772*cda5da8dSAndroid Build Coastguard Worker 773*cda5da8dSAndroid Build Coastguard Worker Note that when instantiating a Differ object we may pass functions to 774*cda5da8dSAndroid Build Coastguard Worker filter out line and character 'junk'. See Differ.__init__ for details. 775*cda5da8dSAndroid Build Coastguard Worker 776*cda5da8dSAndroid Build Coastguard Worker Finally, we compare the two: 777*cda5da8dSAndroid Build Coastguard Worker 778*cda5da8dSAndroid Build Coastguard Worker >>> result = list(d.compare(text1, text2)) 779*cda5da8dSAndroid Build Coastguard Worker 780*cda5da8dSAndroid Build Coastguard Worker 'result' is a list of strings, so let's pretty-print it: 781*cda5da8dSAndroid Build Coastguard Worker 782*cda5da8dSAndroid Build Coastguard Worker >>> from pprint import pprint as _pprint 783*cda5da8dSAndroid Build Coastguard Worker >>> _pprint(result) 784*cda5da8dSAndroid Build Coastguard Worker [' 1. Beautiful is better than ugly.\n', 785*cda5da8dSAndroid Build Coastguard Worker '- 2. Explicit is better than implicit.\n', 786*cda5da8dSAndroid Build Coastguard Worker '- 3. Simple is better than complex.\n', 787*cda5da8dSAndroid Build Coastguard Worker '+ 3. Simple is better than complex.\n', 788*cda5da8dSAndroid Build Coastguard Worker '? ++\n', 789*cda5da8dSAndroid Build Coastguard Worker '- 4. Complex is better than complicated.\n', 790*cda5da8dSAndroid Build Coastguard Worker '? ^ ---- ^\n', 791*cda5da8dSAndroid Build Coastguard Worker '+ 4. Complicated is better than complex.\n', 792*cda5da8dSAndroid Build Coastguard Worker '? ++++ ^ ^\n', 793*cda5da8dSAndroid Build Coastguard Worker '+ 5. Flat is better than nested.\n'] 794*cda5da8dSAndroid Build Coastguard Worker 795*cda5da8dSAndroid Build Coastguard Worker As a single multi-line string it looks like this: 796*cda5da8dSAndroid Build Coastguard Worker 797*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(result), end="") 798*cda5da8dSAndroid Build Coastguard Worker 1. Beautiful is better than ugly. 799*cda5da8dSAndroid Build Coastguard Worker - 2. Explicit is better than implicit. 800*cda5da8dSAndroid Build Coastguard Worker - 3. Simple is better than complex. 801*cda5da8dSAndroid Build Coastguard Worker + 3. Simple is better than complex. 802*cda5da8dSAndroid Build Coastguard Worker ? ++ 803*cda5da8dSAndroid Build Coastguard Worker - 4. Complex is better than complicated. 804*cda5da8dSAndroid Build Coastguard Worker ? ^ ---- ^ 805*cda5da8dSAndroid Build Coastguard Worker + 4. Complicated is better than complex. 806*cda5da8dSAndroid Build Coastguard Worker ? ++++ ^ ^ 807*cda5da8dSAndroid Build Coastguard Worker + 5. Flat is better than nested. 808*cda5da8dSAndroid Build Coastguard Worker """ 809*cda5da8dSAndroid Build Coastguard Worker 810*cda5da8dSAndroid Build Coastguard Worker def __init__(self, linejunk=None, charjunk=None): 811*cda5da8dSAndroid Build Coastguard Worker """ 812*cda5da8dSAndroid Build Coastguard Worker Construct a text differencer, with optional filters. 813*cda5da8dSAndroid Build Coastguard Worker 814*cda5da8dSAndroid Build Coastguard Worker The two optional keyword parameters are for filter functions: 815*cda5da8dSAndroid Build Coastguard Worker 816*cda5da8dSAndroid Build Coastguard Worker - `linejunk`: A function that should accept a single string argument, 817*cda5da8dSAndroid Build Coastguard Worker and return true iff the string is junk. The module-level function 818*cda5da8dSAndroid Build Coastguard Worker `IS_LINE_JUNK` may be used to filter out lines without visible 819*cda5da8dSAndroid Build Coastguard Worker characters, except for at most one splat ('#'). It is recommended 820*cda5da8dSAndroid Build Coastguard Worker to leave linejunk None; the underlying SequenceMatcher class has 821*cda5da8dSAndroid Build Coastguard Worker an adaptive notion of "noise" lines that's better than any static 822*cda5da8dSAndroid Build Coastguard Worker definition the author has ever been able to craft. 823*cda5da8dSAndroid Build Coastguard Worker 824*cda5da8dSAndroid Build Coastguard Worker - `charjunk`: A function that should accept a string of length 1. The 825*cda5da8dSAndroid Build Coastguard Worker module-level function `IS_CHARACTER_JUNK` may be used to filter out 826*cda5da8dSAndroid Build Coastguard Worker whitespace characters (a blank or tab; **note**: bad idea to include 827*cda5da8dSAndroid Build Coastguard Worker newline in this!). Use of IS_CHARACTER_JUNK is recommended. 828*cda5da8dSAndroid Build Coastguard Worker """ 829*cda5da8dSAndroid Build Coastguard Worker 830*cda5da8dSAndroid Build Coastguard Worker self.linejunk = linejunk 831*cda5da8dSAndroid Build Coastguard Worker self.charjunk = charjunk 832*cda5da8dSAndroid Build Coastguard Worker 833*cda5da8dSAndroid Build Coastguard Worker def compare(self, a, b): 834*cda5da8dSAndroid Build Coastguard Worker r""" 835*cda5da8dSAndroid Build Coastguard Worker Compare two sequences of lines; generate the resulting delta. 836*cda5da8dSAndroid Build Coastguard Worker 837*cda5da8dSAndroid Build Coastguard Worker Each sequence must contain individual single-line strings ending with 838*cda5da8dSAndroid Build Coastguard Worker newlines. Such sequences can be obtained from the `readlines()` method 839*cda5da8dSAndroid Build Coastguard Worker of file-like objects. The delta generated also consists of newline- 840*cda5da8dSAndroid Build Coastguard Worker terminated strings, ready to be printed as-is via the writelines() 841*cda5da8dSAndroid Build Coastguard Worker method of a file-like object. 842*cda5da8dSAndroid Build Coastguard Worker 843*cda5da8dSAndroid Build Coastguard Worker Example: 844*cda5da8dSAndroid Build Coastguard Worker 845*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(True), 846*cda5da8dSAndroid Build Coastguard Worker ... 'ore\ntree\nemu\n'.splitlines(True))), 847*cda5da8dSAndroid Build Coastguard Worker ... end="") 848*cda5da8dSAndroid Build Coastguard Worker - one 849*cda5da8dSAndroid Build Coastguard Worker ? ^ 850*cda5da8dSAndroid Build Coastguard Worker + ore 851*cda5da8dSAndroid Build Coastguard Worker ? ^ 852*cda5da8dSAndroid Build Coastguard Worker - two 853*cda5da8dSAndroid Build Coastguard Worker - three 854*cda5da8dSAndroid Build Coastguard Worker ? - 855*cda5da8dSAndroid Build Coastguard Worker + tree 856*cda5da8dSAndroid Build Coastguard Worker + emu 857*cda5da8dSAndroid Build Coastguard Worker """ 858*cda5da8dSAndroid Build Coastguard Worker 859*cda5da8dSAndroid Build Coastguard Worker cruncher = SequenceMatcher(self.linejunk, a, b) 860*cda5da8dSAndroid Build Coastguard Worker for tag, alo, ahi, blo, bhi in cruncher.get_opcodes(): 861*cda5da8dSAndroid Build Coastguard Worker if tag == 'replace': 862*cda5da8dSAndroid Build Coastguard Worker g = self._fancy_replace(a, alo, ahi, b, blo, bhi) 863*cda5da8dSAndroid Build Coastguard Worker elif tag == 'delete': 864*cda5da8dSAndroid Build Coastguard Worker g = self._dump('-', a, alo, ahi) 865*cda5da8dSAndroid Build Coastguard Worker elif tag == 'insert': 866*cda5da8dSAndroid Build Coastguard Worker g = self._dump('+', b, blo, bhi) 867*cda5da8dSAndroid Build Coastguard Worker elif tag == 'equal': 868*cda5da8dSAndroid Build Coastguard Worker g = self._dump(' ', a, alo, ahi) 869*cda5da8dSAndroid Build Coastguard Worker else: 870*cda5da8dSAndroid Build Coastguard Worker raise ValueError('unknown tag %r' % (tag,)) 871*cda5da8dSAndroid Build Coastguard Worker 872*cda5da8dSAndroid Build Coastguard Worker yield from g 873*cda5da8dSAndroid Build Coastguard Worker 874*cda5da8dSAndroid Build Coastguard Worker def _dump(self, tag, x, lo, hi): 875*cda5da8dSAndroid Build Coastguard Worker """Generate comparison results for a same-tagged range.""" 876*cda5da8dSAndroid Build Coastguard Worker for i in range(lo, hi): 877*cda5da8dSAndroid Build Coastguard Worker yield '%s %s' % (tag, x[i]) 878*cda5da8dSAndroid Build Coastguard Worker 879*cda5da8dSAndroid Build Coastguard Worker def _plain_replace(self, a, alo, ahi, b, blo, bhi): 880*cda5da8dSAndroid Build Coastguard Worker assert alo < ahi and blo < bhi 881*cda5da8dSAndroid Build Coastguard Worker # dump the shorter block first -- reduces the burden on short-term 882*cda5da8dSAndroid Build Coastguard Worker # memory if the blocks are of very different sizes 883*cda5da8dSAndroid Build Coastguard Worker if bhi - blo < ahi - alo: 884*cda5da8dSAndroid Build Coastguard Worker first = self._dump('+', b, blo, bhi) 885*cda5da8dSAndroid Build Coastguard Worker second = self._dump('-', a, alo, ahi) 886*cda5da8dSAndroid Build Coastguard Worker else: 887*cda5da8dSAndroid Build Coastguard Worker first = self._dump('-', a, alo, ahi) 888*cda5da8dSAndroid Build Coastguard Worker second = self._dump('+', b, blo, bhi) 889*cda5da8dSAndroid Build Coastguard Worker 890*cda5da8dSAndroid Build Coastguard Worker for g in first, second: 891*cda5da8dSAndroid Build Coastguard Worker yield from g 892*cda5da8dSAndroid Build Coastguard Worker 893*cda5da8dSAndroid Build Coastguard Worker def _fancy_replace(self, a, alo, ahi, b, blo, bhi): 894*cda5da8dSAndroid Build Coastguard Worker r""" 895*cda5da8dSAndroid Build Coastguard Worker When replacing one block of lines with another, search the blocks 896*cda5da8dSAndroid Build Coastguard Worker for *similar* lines; the best-matching pair (if any) is used as a 897*cda5da8dSAndroid Build Coastguard Worker synch point, and intraline difference marking is done on the 898*cda5da8dSAndroid Build Coastguard Worker similar pair. Lots of work, but often worth it. 899*cda5da8dSAndroid Build Coastguard Worker 900*cda5da8dSAndroid Build Coastguard Worker Example: 901*cda5da8dSAndroid Build Coastguard Worker 902*cda5da8dSAndroid Build Coastguard Worker >>> d = Differ() 903*cda5da8dSAndroid Build Coastguard Worker >>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1, 904*cda5da8dSAndroid Build Coastguard Worker ... ['abcdefGhijkl\n'], 0, 1) 905*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(results), end="") 906*cda5da8dSAndroid Build Coastguard Worker - abcDefghiJkl 907*cda5da8dSAndroid Build Coastguard Worker ? ^ ^ ^ 908*cda5da8dSAndroid Build Coastguard Worker + abcdefGhijkl 909*cda5da8dSAndroid Build Coastguard Worker ? ^ ^ ^ 910*cda5da8dSAndroid Build Coastguard Worker """ 911*cda5da8dSAndroid Build Coastguard Worker 912*cda5da8dSAndroid Build Coastguard Worker # don't synch up unless the lines have a similarity score of at 913*cda5da8dSAndroid Build Coastguard Worker # least cutoff; best_ratio tracks the best score seen so far 914*cda5da8dSAndroid Build Coastguard Worker best_ratio, cutoff = 0.74, 0.75 915*cda5da8dSAndroid Build Coastguard Worker cruncher = SequenceMatcher(self.charjunk) 916*cda5da8dSAndroid Build Coastguard Worker eqi, eqj = None, None # 1st indices of equal lines (if any) 917*cda5da8dSAndroid Build Coastguard Worker 918*cda5da8dSAndroid Build Coastguard Worker # search for the pair that matches best without being identical 919*cda5da8dSAndroid Build Coastguard Worker # (identical lines must be junk lines, & we don't want to synch up 920*cda5da8dSAndroid Build Coastguard Worker # on junk -- unless we have to) 921*cda5da8dSAndroid Build Coastguard Worker for j in range(blo, bhi): 922*cda5da8dSAndroid Build Coastguard Worker bj = b[j] 923*cda5da8dSAndroid Build Coastguard Worker cruncher.set_seq2(bj) 924*cda5da8dSAndroid Build Coastguard Worker for i in range(alo, ahi): 925*cda5da8dSAndroid Build Coastguard Worker ai = a[i] 926*cda5da8dSAndroid Build Coastguard Worker if ai == bj: 927*cda5da8dSAndroid Build Coastguard Worker if eqi is None: 928*cda5da8dSAndroid Build Coastguard Worker eqi, eqj = i, j 929*cda5da8dSAndroid Build Coastguard Worker continue 930*cda5da8dSAndroid Build Coastguard Worker cruncher.set_seq1(ai) 931*cda5da8dSAndroid Build Coastguard Worker # computing similarity is expensive, so use the quick 932*cda5da8dSAndroid Build Coastguard Worker # upper bounds first -- have seen this speed up messy 933*cda5da8dSAndroid Build Coastguard Worker # compares by a factor of 3. 934*cda5da8dSAndroid Build Coastguard Worker # note that ratio() is only expensive to compute the first 935*cda5da8dSAndroid Build Coastguard Worker # time it's called on a sequence pair; the expensive part 936*cda5da8dSAndroid Build Coastguard Worker # of the computation is cached by cruncher 937*cda5da8dSAndroid Build Coastguard Worker if cruncher.real_quick_ratio() > best_ratio and \ 938*cda5da8dSAndroid Build Coastguard Worker cruncher.quick_ratio() > best_ratio and \ 939*cda5da8dSAndroid Build Coastguard Worker cruncher.ratio() > best_ratio: 940*cda5da8dSAndroid Build Coastguard Worker best_ratio, best_i, best_j = cruncher.ratio(), i, j 941*cda5da8dSAndroid Build Coastguard Worker if best_ratio < cutoff: 942*cda5da8dSAndroid Build Coastguard Worker # no non-identical "pretty close" pair 943*cda5da8dSAndroid Build Coastguard Worker if eqi is None: 944*cda5da8dSAndroid Build Coastguard Worker # no identical pair either -- treat it as a straight replace 945*cda5da8dSAndroid Build Coastguard Worker yield from self._plain_replace(a, alo, ahi, b, blo, bhi) 946*cda5da8dSAndroid Build Coastguard Worker return 947*cda5da8dSAndroid Build Coastguard Worker # no close pair, but an identical pair -- synch up on that 948*cda5da8dSAndroid Build Coastguard Worker best_i, best_j, best_ratio = eqi, eqj, 1.0 949*cda5da8dSAndroid Build Coastguard Worker else: 950*cda5da8dSAndroid Build Coastguard Worker # there's a close pair, so forget the identical pair (if any) 951*cda5da8dSAndroid Build Coastguard Worker eqi = None 952*cda5da8dSAndroid Build Coastguard Worker 953*cda5da8dSAndroid Build Coastguard Worker # a[best_i] very similar to b[best_j]; eqi is None iff they're not 954*cda5da8dSAndroid Build Coastguard Worker # identical 955*cda5da8dSAndroid Build Coastguard Worker 956*cda5da8dSAndroid Build Coastguard Worker # pump out diffs from before the synch point 957*cda5da8dSAndroid Build Coastguard Worker yield from self._fancy_helper(a, alo, best_i, b, blo, best_j) 958*cda5da8dSAndroid Build Coastguard Worker 959*cda5da8dSAndroid Build Coastguard Worker # do intraline marking on the synch pair 960*cda5da8dSAndroid Build Coastguard Worker aelt, belt = a[best_i], b[best_j] 961*cda5da8dSAndroid Build Coastguard Worker if eqi is None: 962*cda5da8dSAndroid Build Coastguard Worker # pump out a '-', '?', '+', '?' quad for the synched lines 963*cda5da8dSAndroid Build Coastguard Worker atags = btags = "" 964*cda5da8dSAndroid Build Coastguard Worker cruncher.set_seqs(aelt, belt) 965*cda5da8dSAndroid Build Coastguard Worker for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes(): 966*cda5da8dSAndroid Build Coastguard Worker la, lb = ai2 - ai1, bj2 - bj1 967*cda5da8dSAndroid Build Coastguard Worker if tag == 'replace': 968*cda5da8dSAndroid Build Coastguard Worker atags += '^' * la 969*cda5da8dSAndroid Build Coastguard Worker btags += '^' * lb 970*cda5da8dSAndroid Build Coastguard Worker elif tag == 'delete': 971*cda5da8dSAndroid Build Coastguard Worker atags += '-' * la 972*cda5da8dSAndroid Build Coastguard Worker elif tag == 'insert': 973*cda5da8dSAndroid Build Coastguard Worker btags += '+' * lb 974*cda5da8dSAndroid Build Coastguard Worker elif tag == 'equal': 975*cda5da8dSAndroid Build Coastguard Worker atags += ' ' * la 976*cda5da8dSAndroid Build Coastguard Worker btags += ' ' * lb 977*cda5da8dSAndroid Build Coastguard Worker else: 978*cda5da8dSAndroid Build Coastguard Worker raise ValueError('unknown tag %r' % (tag,)) 979*cda5da8dSAndroid Build Coastguard Worker yield from self._qformat(aelt, belt, atags, btags) 980*cda5da8dSAndroid Build Coastguard Worker else: 981*cda5da8dSAndroid Build Coastguard Worker # the synch pair is identical 982*cda5da8dSAndroid Build Coastguard Worker yield ' ' + aelt 983*cda5da8dSAndroid Build Coastguard Worker 984*cda5da8dSAndroid Build Coastguard Worker # pump out diffs from after the synch point 985*cda5da8dSAndroid Build Coastguard Worker yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi) 986*cda5da8dSAndroid Build Coastguard Worker 987*cda5da8dSAndroid Build Coastguard Worker def _fancy_helper(self, a, alo, ahi, b, blo, bhi): 988*cda5da8dSAndroid Build Coastguard Worker g = [] 989*cda5da8dSAndroid Build Coastguard Worker if alo < ahi: 990*cda5da8dSAndroid Build Coastguard Worker if blo < bhi: 991*cda5da8dSAndroid Build Coastguard Worker g = self._fancy_replace(a, alo, ahi, b, blo, bhi) 992*cda5da8dSAndroid Build Coastguard Worker else: 993*cda5da8dSAndroid Build Coastguard Worker g = self._dump('-', a, alo, ahi) 994*cda5da8dSAndroid Build Coastguard Worker elif blo < bhi: 995*cda5da8dSAndroid Build Coastguard Worker g = self._dump('+', b, blo, bhi) 996*cda5da8dSAndroid Build Coastguard Worker 997*cda5da8dSAndroid Build Coastguard Worker yield from g 998*cda5da8dSAndroid Build Coastguard Worker 999*cda5da8dSAndroid Build Coastguard Worker def _qformat(self, aline, bline, atags, btags): 1000*cda5da8dSAndroid Build Coastguard Worker r""" 1001*cda5da8dSAndroid Build Coastguard Worker Format "?" output and deal with tabs. 1002*cda5da8dSAndroid Build Coastguard Worker 1003*cda5da8dSAndroid Build Coastguard Worker Example: 1004*cda5da8dSAndroid Build Coastguard Worker 1005*cda5da8dSAndroid Build Coastguard Worker >>> d = Differ() 1006*cda5da8dSAndroid Build Coastguard Worker >>> results = d._qformat('\tabcDefghiJkl\n', '\tabcdefGhijkl\n', 1007*cda5da8dSAndroid Build Coastguard Worker ... ' ^ ^ ^ ', ' ^ ^ ^ ') 1008*cda5da8dSAndroid Build Coastguard Worker >>> for line in results: print(repr(line)) 1009*cda5da8dSAndroid Build Coastguard Worker ... 1010*cda5da8dSAndroid Build Coastguard Worker '- \tabcDefghiJkl\n' 1011*cda5da8dSAndroid Build Coastguard Worker '? \t ^ ^ ^\n' 1012*cda5da8dSAndroid Build Coastguard Worker '+ \tabcdefGhijkl\n' 1013*cda5da8dSAndroid Build Coastguard Worker '? \t ^ ^ ^\n' 1014*cda5da8dSAndroid Build Coastguard Worker """ 1015*cda5da8dSAndroid Build Coastguard Worker atags = _keep_original_ws(aline, atags).rstrip() 1016*cda5da8dSAndroid Build Coastguard Worker btags = _keep_original_ws(bline, btags).rstrip() 1017*cda5da8dSAndroid Build Coastguard Worker 1018*cda5da8dSAndroid Build Coastguard Worker yield "- " + aline 1019*cda5da8dSAndroid Build Coastguard Worker if atags: 1020*cda5da8dSAndroid Build Coastguard Worker yield f"? {atags}\n" 1021*cda5da8dSAndroid Build Coastguard Worker 1022*cda5da8dSAndroid Build Coastguard Worker yield "+ " + bline 1023*cda5da8dSAndroid Build Coastguard Worker if btags: 1024*cda5da8dSAndroid Build Coastguard Worker yield f"? {btags}\n" 1025*cda5da8dSAndroid Build Coastguard Worker 1026*cda5da8dSAndroid Build Coastguard Worker# With respect to junk, an earlier version of ndiff simply refused to 1027*cda5da8dSAndroid Build Coastguard Worker# *start* a match with a junk element. The result was cases like this: 1028*cda5da8dSAndroid Build Coastguard Worker# before: private Thread currentThread; 1029*cda5da8dSAndroid Build Coastguard Worker# after: private volatile Thread currentThread; 1030*cda5da8dSAndroid Build Coastguard Worker# If you consider whitespace to be junk, the longest contiguous match 1031*cda5da8dSAndroid Build Coastguard Worker# not starting with junk is "e Thread currentThread". So ndiff reported 1032*cda5da8dSAndroid Build Coastguard Worker# that "e volatil" was inserted between the 't' and the 'e' in "private". 1033*cda5da8dSAndroid Build Coastguard Worker# While an accurate view, to people that's absurd. The current version 1034*cda5da8dSAndroid Build Coastguard Worker# looks for matching blocks that are entirely junk-free, then extends the 1035*cda5da8dSAndroid Build Coastguard Worker# longest one of those as far as possible but only with matching junk. 1036*cda5da8dSAndroid Build Coastguard Worker# So now "currentThread" is matched, then extended to suck up the 1037*cda5da8dSAndroid Build Coastguard Worker# preceding blank; then "private" is matched, and extended to suck up the 1038*cda5da8dSAndroid Build Coastguard Worker# following blank; then "Thread" is matched; and finally ndiff reports 1039*cda5da8dSAndroid Build Coastguard Worker# that "volatile " was inserted before "Thread". The only quibble 1040*cda5da8dSAndroid Build Coastguard Worker# remaining is that perhaps it was really the case that " volatile" 1041*cda5da8dSAndroid Build Coastguard Worker# was inserted after "private". I can live with that <wink>. 1042*cda5da8dSAndroid Build Coastguard Worker 1043*cda5da8dSAndroid Build Coastguard Workerimport re 1044*cda5da8dSAndroid Build Coastguard Worker 1045*cda5da8dSAndroid Build Coastguard Workerdef IS_LINE_JUNK(line, pat=re.compile(r"\s*(?:#\s*)?$").match): 1046*cda5da8dSAndroid Build Coastguard Worker r""" 1047*cda5da8dSAndroid Build Coastguard Worker Return True for ignorable line: iff `line` is blank or contains a single '#'. 1048*cda5da8dSAndroid Build Coastguard Worker 1049*cda5da8dSAndroid Build Coastguard Worker Examples: 1050*cda5da8dSAndroid Build Coastguard Worker 1051*cda5da8dSAndroid Build Coastguard Worker >>> IS_LINE_JUNK('\n') 1052*cda5da8dSAndroid Build Coastguard Worker True 1053*cda5da8dSAndroid Build Coastguard Worker >>> IS_LINE_JUNK(' # \n') 1054*cda5da8dSAndroid Build Coastguard Worker True 1055*cda5da8dSAndroid Build Coastguard Worker >>> IS_LINE_JUNK('hello\n') 1056*cda5da8dSAndroid Build Coastguard Worker False 1057*cda5da8dSAndroid Build Coastguard Worker """ 1058*cda5da8dSAndroid Build Coastguard Worker 1059*cda5da8dSAndroid Build Coastguard Worker return pat(line) is not None 1060*cda5da8dSAndroid Build Coastguard Worker 1061*cda5da8dSAndroid Build Coastguard Workerdef IS_CHARACTER_JUNK(ch, ws=" \t"): 1062*cda5da8dSAndroid Build Coastguard Worker r""" 1063*cda5da8dSAndroid Build Coastguard Worker Return True for ignorable character: iff `ch` is a space or tab. 1064*cda5da8dSAndroid Build Coastguard Worker 1065*cda5da8dSAndroid Build Coastguard Worker Examples: 1066*cda5da8dSAndroid Build Coastguard Worker 1067*cda5da8dSAndroid Build Coastguard Worker >>> IS_CHARACTER_JUNK(' ') 1068*cda5da8dSAndroid Build Coastguard Worker True 1069*cda5da8dSAndroid Build Coastguard Worker >>> IS_CHARACTER_JUNK('\t') 1070*cda5da8dSAndroid Build Coastguard Worker True 1071*cda5da8dSAndroid Build Coastguard Worker >>> IS_CHARACTER_JUNK('\n') 1072*cda5da8dSAndroid Build Coastguard Worker False 1073*cda5da8dSAndroid Build Coastguard Worker >>> IS_CHARACTER_JUNK('x') 1074*cda5da8dSAndroid Build Coastguard Worker False 1075*cda5da8dSAndroid Build Coastguard Worker """ 1076*cda5da8dSAndroid Build Coastguard Worker 1077*cda5da8dSAndroid Build Coastguard Worker return ch in ws 1078*cda5da8dSAndroid Build Coastguard Worker 1079*cda5da8dSAndroid Build Coastguard Worker 1080*cda5da8dSAndroid Build Coastguard Worker######################################################################## 1081*cda5da8dSAndroid Build Coastguard Worker### Unified Diff 1082*cda5da8dSAndroid Build Coastguard Worker######################################################################## 1083*cda5da8dSAndroid Build Coastguard Worker 1084*cda5da8dSAndroid Build Coastguard Workerdef _format_range_unified(start, stop): 1085*cda5da8dSAndroid Build Coastguard Worker 'Convert range to the "ed" format' 1086*cda5da8dSAndroid Build Coastguard Worker # Per the diff spec at http://www.unix.org/single_unix_specification/ 1087*cda5da8dSAndroid Build Coastguard Worker beginning = start + 1 # lines start numbering with one 1088*cda5da8dSAndroid Build Coastguard Worker length = stop - start 1089*cda5da8dSAndroid Build Coastguard Worker if length == 1: 1090*cda5da8dSAndroid Build Coastguard Worker return '{}'.format(beginning) 1091*cda5da8dSAndroid Build Coastguard Worker if not length: 1092*cda5da8dSAndroid Build Coastguard Worker beginning -= 1 # empty ranges begin at line just before the range 1093*cda5da8dSAndroid Build Coastguard Worker return '{},{}'.format(beginning, length) 1094*cda5da8dSAndroid Build Coastguard Worker 1095*cda5da8dSAndroid Build Coastguard Workerdef unified_diff(a, b, fromfile='', tofile='', fromfiledate='', 1096*cda5da8dSAndroid Build Coastguard Worker tofiledate='', n=3, lineterm='\n'): 1097*cda5da8dSAndroid Build Coastguard Worker r""" 1098*cda5da8dSAndroid Build Coastguard Worker Compare two sequences of lines; generate the delta as a unified diff. 1099*cda5da8dSAndroid Build Coastguard Worker 1100*cda5da8dSAndroid Build Coastguard Worker Unified diffs are a compact way of showing line changes and a few 1101*cda5da8dSAndroid Build Coastguard Worker lines of context. The number of context lines is set by 'n' which 1102*cda5da8dSAndroid Build Coastguard Worker defaults to three. 1103*cda5da8dSAndroid Build Coastguard Worker 1104*cda5da8dSAndroid Build Coastguard Worker By default, the diff control lines (those with ---, +++, or @@) are 1105*cda5da8dSAndroid Build Coastguard Worker created with a trailing newline. This is helpful so that inputs 1106*cda5da8dSAndroid Build Coastguard Worker created from file.readlines() result in diffs that are suitable for 1107*cda5da8dSAndroid Build Coastguard Worker file.writelines() since both the inputs and outputs have trailing 1108*cda5da8dSAndroid Build Coastguard Worker newlines. 1109*cda5da8dSAndroid Build Coastguard Worker 1110*cda5da8dSAndroid Build Coastguard Worker For inputs that do not have trailing newlines, set the lineterm 1111*cda5da8dSAndroid Build Coastguard Worker argument to "" so that the output will be uniformly newline free. 1112*cda5da8dSAndroid Build Coastguard Worker 1113*cda5da8dSAndroid Build Coastguard Worker The unidiff format normally has a header for filenames and modification 1114*cda5da8dSAndroid Build Coastguard Worker times. Any or all of these may be specified using strings for 1115*cda5da8dSAndroid Build Coastguard Worker 'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'. 1116*cda5da8dSAndroid Build Coastguard Worker The modification times are normally expressed in the ISO 8601 format. 1117*cda5da8dSAndroid Build Coastguard Worker 1118*cda5da8dSAndroid Build Coastguard Worker Example: 1119*cda5da8dSAndroid Build Coastguard Worker 1120*cda5da8dSAndroid Build Coastguard Worker >>> for line in unified_diff('one two three four'.split(), 1121*cda5da8dSAndroid Build Coastguard Worker ... 'zero one tree four'.split(), 'Original', 'Current', 1122*cda5da8dSAndroid Build Coastguard Worker ... '2005-01-26 23:30:50', '2010-04-02 10:20:52', 1123*cda5da8dSAndroid Build Coastguard Worker ... lineterm=''): 1124*cda5da8dSAndroid Build Coastguard Worker ... print(line) # doctest: +NORMALIZE_WHITESPACE 1125*cda5da8dSAndroid Build Coastguard Worker --- Original 2005-01-26 23:30:50 1126*cda5da8dSAndroid Build Coastguard Worker +++ Current 2010-04-02 10:20:52 1127*cda5da8dSAndroid Build Coastguard Worker @@ -1,4 +1,4 @@ 1128*cda5da8dSAndroid Build Coastguard Worker +zero 1129*cda5da8dSAndroid Build Coastguard Worker one 1130*cda5da8dSAndroid Build Coastguard Worker -two 1131*cda5da8dSAndroid Build Coastguard Worker -three 1132*cda5da8dSAndroid Build Coastguard Worker +tree 1133*cda5da8dSAndroid Build Coastguard Worker four 1134*cda5da8dSAndroid Build Coastguard Worker """ 1135*cda5da8dSAndroid Build Coastguard Worker 1136*cda5da8dSAndroid Build Coastguard Worker _check_types(a, b, fromfile, tofile, fromfiledate, tofiledate, lineterm) 1137*cda5da8dSAndroid Build Coastguard Worker started = False 1138*cda5da8dSAndroid Build Coastguard Worker for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n): 1139*cda5da8dSAndroid Build Coastguard Worker if not started: 1140*cda5da8dSAndroid Build Coastguard Worker started = True 1141*cda5da8dSAndroid Build Coastguard Worker fromdate = '\t{}'.format(fromfiledate) if fromfiledate else '' 1142*cda5da8dSAndroid Build Coastguard Worker todate = '\t{}'.format(tofiledate) if tofiledate else '' 1143*cda5da8dSAndroid Build Coastguard Worker yield '--- {}{}{}'.format(fromfile, fromdate, lineterm) 1144*cda5da8dSAndroid Build Coastguard Worker yield '+++ {}{}{}'.format(tofile, todate, lineterm) 1145*cda5da8dSAndroid Build Coastguard Worker 1146*cda5da8dSAndroid Build Coastguard Worker first, last = group[0], group[-1] 1147*cda5da8dSAndroid Build Coastguard Worker file1_range = _format_range_unified(first[1], last[2]) 1148*cda5da8dSAndroid Build Coastguard Worker file2_range = _format_range_unified(first[3], last[4]) 1149*cda5da8dSAndroid Build Coastguard Worker yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm) 1150*cda5da8dSAndroid Build Coastguard Worker 1151*cda5da8dSAndroid Build Coastguard Worker for tag, i1, i2, j1, j2 in group: 1152*cda5da8dSAndroid Build Coastguard Worker if tag == 'equal': 1153*cda5da8dSAndroid Build Coastguard Worker for line in a[i1:i2]: 1154*cda5da8dSAndroid Build Coastguard Worker yield ' ' + line 1155*cda5da8dSAndroid Build Coastguard Worker continue 1156*cda5da8dSAndroid Build Coastguard Worker if tag in {'replace', 'delete'}: 1157*cda5da8dSAndroid Build Coastguard Worker for line in a[i1:i2]: 1158*cda5da8dSAndroid Build Coastguard Worker yield '-' + line 1159*cda5da8dSAndroid Build Coastguard Worker if tag in {'replace', 'insert'}: 1160*cda5da8dSAndroid Build Coastguard Worker for line in b[j1:j2]: 1161*cda5da8dSAndroid Build Coastguard Worker yield '+' + line 1162*cda5da8dSAndroid Build Coastguard Worker 1163*cda5da8dSAndroid Build Coastguard Worker 1164*cda5da8dSAndroid Build Coastguard Worker######################################################################## 1165*cda5da8dSAndroid Build Coastguard Worker### Context Diff 1166*cda5da8dSAndroid Build Coastguard Worker######################################################################## 1167*cda5da8dSAndroid Build Coastguard Worker 1168*cda5da8dSAndroid Build Coastguard Workerdef _format_range_context(start, stop): 1169*cda5da8dSAndroid Build Coastguard Worker 'Convert range to the "ed" format' 1170*cda5da8dSAndroid Build Coastguard Worker # Per the diff spec at http://www.unix.org/single_unix_specification/ 1171*cda5da8dSAndroid Build Coastguard Worker beginning = start + 1 # lines start numbering with one 1172*cda5da8dSAndroid Build Coastguard Worker length = stop - start 1173*cda5da8dSAndroid Build Coastguard Worker if not length: 1174*cda5da8dSAndroid Build Coastguard Worker beginning -= 1 # empty ranges begin at line just before the range 1175*cda5da8dSAndroid Build Coastguard Worker if length <= 1: 1176*cda5da8dSAndroid Build Coastguard Worker return '{}'.format(beginning) 1177*cda5da8dSAndroid Build Coastguard Worker return '{},{}'.format(beginning, beginning + length - 1) 1178*cda5da8dSAndroid Build Coastguard Worker 1179*cda5da8dSAndroid Build Coastguard Worker# See http://www.unix.org/single_unix_specification/ 1180*cda5da8dSAndroid Build Coastguard Workerdef context_diff(a, b, fromfile='', tofile='', 1181*cda5da8dSAndroid Build Coastguard Worker fromfiledate='', tofiledate='', n=3, lineterm='\n'): 1182*cda5da8dSAndroid Build Coastguard Worker r""" 1183*cda5da8dSAndroid Build Coastguard Worker Compare two sequences of lines; generate the delta as a context diff. 1184*cda5da8dSAndroid Build Coastguard Worker 1185*cda5da8dSAndroid Build Coastguard Worker Context diffs are a compact way of showing line changes and a few 1186*cda5da8dSAndroid Build Coastguard Worker lines of context. The number of context lines is set by 'n' which 1187*cda5da8dSAndroid Build Coastguard Worker defaults to three. 1188*cda5da8dSAndroid Build Coastguard Worker 1189*cda5da8dSAndroid Build Coastguard Worker By default, the diff control lines (those with *** or ---) are 1190*cda5da8dSAndroid Build Coastguard Worker created with a trailing newline. This is helpful so that inputs 1191*cda5da8dSAndroid Build Coastguard Worker created from file.readlines() result in diffs that are suitable for 1192*cda5da8dSAndroid Build Coastguard Worker file.writelines() since both the inputs and outputs have trailing 1193*cda5da8dSAndroid Build Coastguard Worker newlines. 1194*cda5da8dSAndroid Build Coastguard Worker 1195*cda5da8dSAndroid Build Coastguard Worker For inputs that do not have trailing newlines, set the lineterm 1196*cda5da8dSAndroid Build Coastguard Worker argument to "" so that the output will be uniformly newline free. 1197*cda5da8dSAndroid Build Coastguard Worker 1198*cda5da8dSAndroid Build Coastguard Worker The context diff format normally has a header for filenames and 1199*cda5da8dSAndroid Build Coastguard Worker modification times. Any or all of these may be specified using 1200*cda5da8dSAndroid Build Coastguard Worker strings for 'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'. 1201*cda5da8dSAndroid Build Coastguard Worker The modification times are normally expressed in the ISO 8601 format. 1202*cda5da8dSAndroid Build Coastguard Worker If not specified, the strings default to blanks. 1203*cda5da8dSAndroid Build Coastguard Worker 1204*cda5da8dSAndroid Build Coastguard Worker Example: 1205*cda5da8dSAndroid Build Coastguard Worker 1206*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(context_diff('one\ntwo\nthree\nfour\n'.splitlines(True), 1207*cda5da8dSAndroid Build Coastguard Worker ... 'zero\none\ntree\nfour\n'.splitlines(True), 'Original', 'Current')), 1208*cda5da8dSAndroid Build Coastguard Worker ... end="") 1209*cda5da8dSAndroid Build Coastguard Worker *** Original 1210*cda5da8dSAndroid Build Coastguard Worker --- Current 1211*cda5da8dSAndroid Build Coastguard Worker *************** 1212*cda5da8dSAndroid Build Coastguard Worker *** 1,4 **** 1213*cda5da8dSAndroid Build Coastguard Worker one 1214*cda5da8dSAndroid Build Coastguard Worker ! two 1215*cda5da8dSAndroid Build Coastguard Worker ! three 1216*cda5da8dSAndroid Build Coastguard Worker four 1217*cda5da8dSAndroid Build Coastguard Worker --- 1,4 ---- 1218*cda5da8dSAndroid Build Coastguard Worker + zero 1219*cda5da8dSAndroid Build Coastguard Worker one 1220*cda5da8dSAndroid Build Coastguard Worker ! tree 1221*cda5da8dSAndroid Build Coastguard Worker four 1222*cda5da8dSAndroid Build Coastguard Worker """ 1223*cda5da8dSAndroid Build Coastguard Worker 1224*cda5da8dSAndroid Build Coastguard Worker _check_types(a, b, fromfile, tofile, fromfiledate, tofiledate, lineterm) 1225*cda5da8dSAndroid Build Coastguard Worker prefix = dict(insert='+ ', delete='- ', replace='! ', equal=' ') 1226*cda5da8dSAndroid Build Coastguard Worker started = False 1227*cda5da8dSAndroid Build Coastguard Worker for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n): 1228*cda5da8dSAndroid Build Coastguard Worker if not started: 1229*cda5da8dSAndroid Build Coastguard Worker started = True 1230*cda5da8dSAndroid Build Coastguard Worker fromdate = '\t{}'.format(fromfiledate) if fromfiledate else '' 1231*cda5da8dSAndroid Build Coastguard Worker todate = '\t{}'.format(tofiledate) if tofiledate else '' 1232*cda5da8dSAndroid Build Coastguard Worker yield '*** {}{}{}'.format(fromfile, fromdate, lineterm) 1233*cda5da8dSAndroid Build Coastguard Worker yield '--- {}{}{}'.format(tofile, todate, lineterm) 1234*cda5da8dSAndroid Build Coastguard Worker 1235*cda5da8dSAndroid Build Coastguard Worker first, last = group[0], group[-1] 1236*cda5da8dSAndroid Build Coastguard Worker yield '***************' + lineterm 1237*cda5da8dSAndroid Build Coastguard Worker 1238*cda5da8dSAndroid Build Coastguard Worker file1_range = _format_range_context(first[1], last[2]) 1239*cda5da8dSAndroid Build Coastguard Worker yield '*** {} ****{}'.format(file1_range, lineterm) 1240*cda5da8dSAndroid Build Coastguard Worker 1241*cda5da8dSAndroid Build Coastguard Worker if any(tag in {'replace', 'delete'} for tag, _, _, _, _ in group): 1242*cda5da8dSAndroid Build Coastguard Worker for tag, i1, i2, _, _ in group: 1243*cda5da8dSAndroid Build Coastguard Worker if tag != 'insert': 1244*cda5da8dSAndroid Build Coastguard Worker for line in a[i1:i2]: 1245*cda5da8dSAndroid Build Coastguard Worker yield prefix[tag] + line 1246*cda5da8dSAndroid Build Coastguard Worker 1247*cda5da8dSAndroid Build Coastguard Worker file2_range = _format_range_context(first[3], last[4]) 1248*cda5da8dSAndroid Build Coastguard Worker yield '--- {} ----{}'.format(file2_range, lineterm) 1249*cda5da8dSAndroid Build Coastguard Worker 1250*cda5da8dSAndroid Build Coastguard Worker if any(tag in {'replace', 'insert'} for tag, _, _, _, _ in group): 1251*cda5da8dSAndroid Build Coastguard Worker for tag, _, _, j1, j2 in group: 1252*cda5da8dSAndroid Build Coastguard Worker if tag != 'delete': 1253*cda5da8dSAndroid Build Coastguard Worker for line in b[j1:j2]: 1254*cda5da8dSAndroid Build Coastguard Worker yield prefix[tag] + line 1255*cda5da8dSAndroid Build Coastguard Worker 1256*cda5da8dSAndroid Build Coastguard Workerdef _check_types(a, b, *args): 1257*cda5da8dSAndroid Build Coastguard Worker # Checking types is weird, but the alternative is garbled output when 1258*cda5da8dSAndroid Build Coastguard Worker # someone passes mixed bytes and str to {unified,context}_diff(). E.g. 1259*cda5da8dSAndroid Build Coastguard Worker # without this check, passing filenames as bytes results in output like 1260*cda5da8dSAndroid Build Coastguard Worker # --- b'oldfile.txt' 1261*cda5da8dSAndroid Build Coastguard Worker # +++ b'newfile.txt' 1262*cda5da8dSAndroid Build Coastguard Worker # because of how str.format() incorporates bytes objects. 1263*cda5da8dSAndroid Build Coastguard Worker if a and not isinstance(a[0], str): 1264*cda5da8dSAndroid Build Coastguard Worker raise TypeError('lines to compare must be str, not %s (%r)' % 1265*cda5da8dSAndroid Build Coastguard Worker (type(a[0]).__name__, a[0])) 1266*cda5da8dSAndroid Build Coastguard Worker if b and not isinstance(b[0], str): 1267*cda5da8dSAndroid Build Coastguard Worker raise TypeError('lines to compare must be str, not %s (%r)' % 1268*cda5da8dSAndroid Build Coastguard Worker (type(b[0]).__name__, b[0])) 1269*cda5da8dSAndroid Build Coastguard Worker for arg in args: 1270*cda5da8dSAndroid Build Coastguard Worker if not isinstance(arg, str): 1271*cda5da8dSAndroid Build Coastguard Worker raise TypeError('all arguments must be str, not: %r' % (arg,)) 1272*cda5da8dSAndroid Build Coastguard Worker 1273*cda5da8dSAndroid Build Coastguard Workerdef diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', 1274*cda5da8dSAndroid Build Coastguard Worker fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n'): 1275*cda5da8dSAndroid Build Coastguard Worker r""" 1276*cda5da8dSAndroid Build Coastguard Worker Compare `a` and `b`, two sequences of lines represented as bytes rather 1277*cda5da8dSAndroid Build Coastguard Worker than str. This is a wrapper for `dfunc`, which is typically either 1278*cda5da8dSAndroid Build Coastguard Worker unified_diff() or context_diff(). Inputs are losslessly converted to 1279*cda5da8dSAndroid Build Coastguard Worker strings so that `dfunc` only has to worry about strings, and encoded 1280*cda5da8dSAndroid Build Coastguard Worker back to bytes on return. This is necessary to compare files with 1281*cda5da8dSAndroid Build Coastguard Worker unknown or inconsistent encoding. All other inputs (except `n`) must be 1282*cda5da8dSAndroid Build Coastguard Worker bytes rather than str. 1283*cda5da8dSAndroid Build Coastguard Worker """ 1284*cda5da8dSAndroid Build Coastguard Worker def decode(s): 1285*cda5da8dSAndroid Build Coastguard Worker try: 1286*cda5da8dSAndroid Build Coastguard Worker return s.decode('ascii', 'surrogateescape') 1287*cda5da8dSAndroid Build Coastguard Worker except AttributeError as err: 1288*cda5da8dSAndroid Build Coastguard Worker msg = ('all arguments must be bytes, not %s (%r)' % 1289*cda5da8dSAndroid Build Coastguard Worker (type(s).__name__, s)) 1290*cda5da8dSAndroid Build Coastguard Worker raise TypeError(msg) from err 1291*cda5da8dSAndroid Build Coastguard Worker a = list(map(decode, a)) 1292*cda5da8dSAndroid Build Coastguard Worker b = list(map(decode, b)) 1293*cda5da8dSAndroid Build Coastguard Worker fromfile = decode(fromfile) 1294*cda5da8dSAndroid Build Coastguard Worker tofile = decode(tofile) 1295*cda5da8dSAndroid Build Coastguard Worker fromfiledate = decode(fromfiledate) 1296*cda5da8dSAndroid Build Coastguard Worker tofiledate = decode(tofiledate) 1297*cda5da8dSAndroid Build Coastguard Worker lineterm = decode(lineterm) 1298*cda5da8dSAndroid Build Coastguard Worker 1299*cda5da8dSAndroid Build Coastguard Worker lines = dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm) 1300*cda5da8dSAndroid Build Coastguard Worker for line in lines: 1301*cda5da8dSAndroid Build Coastguard Worker yield line.encode('ascii', 'surrogateescape') 1302*cda5da8dSAndroid Build Coastguard Worker 1303*cda5da8dSAndroid Build Coastguard Workerdef ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK): 1304*cda5da8dSAndroid Build Coastguard Worker r""" 1305*cda5da8dSAndroid Build Coastguard Worker Compare `a` and `b` (lists of strings); return a `Differ`-style delta. 1306*cda5da8dSAndroid Build Coastguard Worker 1307*cda5da8dSAndroid Build Coastguard Worker Optional keyword parameters `linejunk` and `charjunk` are for filter 1308*cda5da8dSAndroid Build Coastguard Worker functions, or can be None: 1309*cda5da8dSAndroid Build Coastguard Worker 1310*cda5da8dSAndroid Build Coastguard Worker - linejunk: A function that should accept a single string argument and 1311*cda5da8dSAndroid Build Coastguard Worker return true iff the string is junk. The default is None, and is 1312*cda5da8dSAndroid Build Coastguard Worker recommended; the underlying SequenceMatcher class has an adaptive 1313*cda5da8dSAndroid Build Coastguard Worker notion of "noise" lines. 1314*cda5da8dSAndroid Build Coastguard Worker 1315*cda5da8dSAndroid Build Coastguard Worker - charjunk: A function that accepts a character (string of length 1316*cda5da8dSAndroid Build Coastguard Worker 1), and returns true iff the character is junk. The default is 1317*cda5da8dSAndroid Build Coastguard Worker the module-level function IS_CHARACTER_JUNK, which filters out 1318*cda5da8dSAndroid Build Coastguard Worker whitespace characters (a blank or tab; note: it's a bad idea to 1319*cda5da8dSAndroid Build Coastguard Worker include newline in this!). 1320*cda5da8dSAndroid Build Coastguard Worker 1321*cda5da8dSAndroid Build Coastguard Worker Tools/scripts/ndiff.py is a command-line front-end to this function. 1322*cda5da8dSAndroid Build Coastguard Worker 1323*cda5da8dSAndroid Build Coastguard Worker Example: 1324*cda5da8dSAndroid Build Coastguard Worker 1325*cda5da8dSAndroid Build Coastguard Worker >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), 1326*cda5da8dSAndroid Build Coastguard Worker ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) 1327*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(diff), end="") 1328*cda5da8dSAndroid Build Coastguard Worker - one 1329*cda5da8dSAndroid Build Coastguard Worker ? ^ 1330*cda5da8dSAndroid Build Coastguard Worker + ore 1331*cda5da8dSAndroid Build Coastguard Worker ? ^ 1332*cda5da8dSAndroid Build Coastguard Worker - two 1333*cda5da8dSAndroid Build Coastguard Worker - three 1334*cda5da8dSAndroid Build Coastguard Worker ? - 1335*cda5da8dSAndroid Build Coastguard Worker + tree 1336*cda5da8dSAndroid Build Coastguard Worker + emu 1337*cda5da8dSAndroid Build Coastguard Worker """ 1338*cda5da8dSAndroid Build Coastguard Worker return Differ(linejunk, charjunk).compare(a, b) 1339*cda5da8dSAndroid Build Coastguard Worker 1340*cda5da8dSAndroid Build Coastguard Workerdef _mdiff(fromlines, tolines, context=None, linejunk=None, 1341*cda5da8dSAndroid Build Coastguard Worker charjunk=IS_CHARACTER_JUNK): 1342*cda5da8dSAndroid Build Coastguard Worker r"""Returns generator yielding marked up from/to side by side differences. 1343*cda5da8dSAndroid Build Coastguard Worker 1344*cda5da8dSAndroid Build Coastguard Worker Arguments: 1345*cda5da8dSAndroid Build Coastguard Worker fromlines -- list of text lines to compared to tolines 1346*cda5da8dSAndroid Build Coastguard Worker tolines -- list of text lines to be compared to fromlines 1347*cda5da8dSAndroid Build Coastguard Worker context -- number of context lines to display on each side of difference, 1348*cda5da8dSAndroid Build Coastguard Worker if None, all from/to text lines will be generated. 1349*cda5da8dSAndroid Build Coastguard Worker linejunk -- passed on to ndiff (see ndiff documentation) 1350*cda5da8dSAndroid Build Coastguard Worker charjunk -- passed on to ndiff (see ndiff documentation) 1351*cda5da8dSAndroid Build Coastguard Worker 1352*cda5da8dSAndroid Build Coastguard Worker This function returns an iterator which returns a tuple: 1353*cda5da8dSAndroid Build Coastguard Worker (from line tuple, to line tuple, boolean flag) 1354*cda5da8dSAndroid Build Coastguard Worker 1355*cda5da8dSAndroid Build Coastguard Worker from/to line tuple -- (line num, line text) 1356*cda5da8dSAndroid Build Coastguard Worker line num -- integer or None (to indicate a context separation) 1357*cda5da8dSAndroid Build Coastguard Worker line text -- original line text with following markers inserted: 1358*cda5da8dSAndroid Build Coastguard Worker '\0+' -- marks start of added text 1359*cda5da8dSAndroid Build Coastguard Worker '\0-' -- marks start of deleted text 1360*cda5da8dSAndroid Build Coastguard Worker '\0^' -- marks start of changed text 1361*cda5da8dSAndroid Build Coastguard Worker '\1' -- marks end of added/deleted/changed text 1362*cda5da8dSAndroid Build Coastguard Worker 1363*cda5da8dSAndroid Build Coastguard Worker boolean flag -- None indicates context separation, True indicates 1364*cda5da8dSAndroid Build Coastguard Worker either "from" or "to" line contains a change, otherwise False. 1365*cda5da8dSAndroid Build Coastguard Worker 1366*cda5da8dSAndroid Build Coastguard Worker This function/iterator was originally developed to generate side by side 1367*cda5da8dSAndroid Build Coastguard Worker file difference for making HTML pages (see HtmlDiff class for example 1368*cda5da8dSAndroid Build Coastguard Worker usage). 1369*cda5da8dSAndroid Build Coastguard Worker 1370*cda5da8dSAndroid Build Coastguard Worker Note, this function utilizes the ndiff function to generate the side by 1371*cda5da8dSAndroid Build Coastguard Worker side difference markup. Optional ndiff arguments may be passed to this 1372*cda5da8dSAndroid Build Coastguard Worker function and they in turn will be passed to ndiff. 1373*cda5da8dSAndroid Build Coastguard Worker """ 1374*cda5da8dSAndroid Build Coastguard Worker import re 1375*cda5da8dSAndroid Build Coastguard Worker 1376*cda5da8dSAndroid Build Coastguard Worker # regular expression for finding intraline change indices 1377*cda5da8dSAndroid Build Coastguard Worker change_re = re.compile(r'(\++|\-+|\^+)') 1378*cda5da8dSAndroid Build Coastguard Worker 1379*cda5da8dSAndroid Build Coastguard Worker # create the difference iterator to generate the differences 1380*cda5da8dSAndroid Build Coastguard Worker diff_lines_iterator = ndiff(fromlines,tolines,linejunk,charjunk) 1381*cda5da8dSAndroid Build Coastguard Worker 1382*cda5da8dSAndroid Build Coastguard Worker def _make_line(lines, format_key, side, num_lines=[0,0]): 1383*cda5da8dSAndroid Build Coastguard Worker """Returns line of text with user's change markup and line formatting. 1384*cda5da8dSAndroid Build Coastguard Worker 1385*cda5da8dSAndroid Build Coastguard Worker lines -- list of lines from the ndiff generator to produce a line of 1386*cda5da8dSAndroid Build Coastguard Worker text from. When producing the line of text to return, the 1387*cda5da8dSAndroid Build Coastguard Worker lines used are removed from this list. 1388*cda5da8dSAndroid Build Coastguard Worker format_key -- '+' return first line in list with "add" markup around 1389*cda5da8dSAndroid Build Coastguard Worker the entire line. 1390*cda5da8dSAndroid Build Coastguard Worker '-' return first line in list with "delete" markup around 1391*cda5da8dSAndroid Build Coastguard Worker the entire line. 1392*cda5da8dSAndroid Build Coastguard Worker '?' return first line in list with add/delete/change 1393*cda5da8dSAndroid Build Coastguard Worker intraline markup (indices obtained from second line) 1394*cda5da8dSAndroid Build Coastguard Worker None return first line in list with no markup 1395*cda5da8dSAndroid Build Coastguard Worker side -- indice into the num_lines list (0=from,1=to) 1396*cda5da8dSAndroid Build Coastguard Worker num_lines -- from/to current line number. This is NOT intended to be a 1397*cda5da8dSAndroid Build Coastguard Worker passed parameter. It is present as a keyword argument to 1398*cda5da8dSAndroid Build Coastguard Worker maintain memory of the current line numbers between calls 1399*cda5da8dSAndroid Build Coastguard Worker of this function. 1400*cda5da8dSAndroid Build Coastguard Worker 1401*cda5da8dSAndroid Build Coastguard Worker Note, this function is purposefully not defined at the module scope so 1402*cda5da8dSAndroid Build Coastguard Worker that data it needs from its parent function (within whose context it 1403*cda5da8dSAndroid Build Coastguard Worker is defined) does not need to be of module scope. 1404*cda5da8dSAndroid Build Coastguard Worker """ 1405*cda5da8dSAndroid Build Coastguard Worker num_lines[side] += 1 1406*cda5da8dSAndroid Build Coastguard Worker # Handle case where no user markup is to be added, just return line of 1407*cda5da8dSAndroid Build Coastguard Worker # text with user's line format to allow for usage of the line number. 1408*cda5da8dSAndroid Build Coastguard Worker if format_key is None: 1409*cda5da8dSAndroid Build Coastguard Worker return (num_lines[side],lines.pop(0)[2:]) 1410*cda5da8dSAndroid Build Coastguard Worker # Handle case of intraline changes 1411*cda5da8dSAndroid Build Coastguard Worker if format_key == '?': 1412*cda5da8dSAndroid Build Coastguard Worker text, markers = lines.pop(0), lines.pop(0) 1413*cda5da8dSAndroid Build Coastguard Worker # find intraline changes (store change type and indices in tuples) 1414*cda5da8dSAndroid Build Coastguard Worker sub_info = [] 1415*cda5da8dSAndroid Build Coastguard Worker def record_sub_info(match_object,sub_info=sub_info): 1416*cda5da8dSAndroid Build Coastguard Worker sub_info.append([match_object.group(1)[0],match_object.span()]) 1417*cda5da8dSAndroid Build Coastguard Worker return match_object.group(1) 1418*cda5da8dSAndroid Build Coastguard Worker change_re.sub(record_sub_info,markers) 1419*cda5da8dSAndroid Build Coastguard Worker # process each tuple inserting our special marks that won't be 1420*cda5da8dSAndroid Build Coastguard Worker # noticed by an xml/html escaper. 1421*cda5da8dSAndroid Build Coastguard Worker for key,(begin,end) in reversed(sub_info): 1422*cda5da8dSAndroid Build Coastguard Worker text = text[0:begin]+'\0'+key+text[begin:end]+'\1'+text[end:] 1423*cda5da8dSAndroid Build Coastguard Worker text = text[2:] 1424*cda5da8dSAndroid Build Coastguard Worker # Handle case of add/delete entire line 1425*cda5da8dSAndroid Build Coastguard Worker else: 1426*cda5da8dSAndroid Build Coastguard Worker text = lines.pop(0)[2:] 1427*cda5da8dSAndroid Build Coastguard Worker # if line of text is just a newline, insert a space so there is 1428*cda5da8dSAndroid Build Coastguard Worker # something for the user to highlight and see. 1429*cda5da8dSAndroid Build Coastguard Worker if not text: 1430*cda5da8dSAndroid Build Coastguard Worker text = ' ' 1431*cda5da8dSAndroid Build Coastguard Worker # insert marks that won't be noticed by an xml/html escaper. 1432*cda5da8dSAndroid Build Coastguard Worker text = '\0' + format_key + text + '\1' 1433*cda5da8dSAndroid Build Coastguard Worker # Return line of text, first allow user's line formatter to do its 1434*cda5da8dSAndroid Build Coastguard Worker # thing (such as adding the line number) then replace the special 1435*cda5da8dSAndroid Build Coastguard Worker # marks with what the user's change markup. 1436*cda5da8dSAndroid Build Coastguard Worker return (num_lines[side],text) 1437*cda5da8dSAndroid Build Coastguard Worker 1438*cda5da8dSAndroid Build Coastguard Worker def _line_iterator(): 1439*cda5da8dSAndroid Build Coastguard Worker """Yields from/to lines of text with a change indication. 1440*cda5da8dSAndroid Build Coastguard Worker 1441*cda5da8dSAndroid Build Coastguard Worker This function is an iterator. It itself pulls lines from a 1442*cda5da8dSAndroid Build Coastguard Worker differencing iterator, processes them and yields them. When it can 1443*cda5da8dSAndroid Build Coastguard Worker it yields both a "from" and a "to" line, otherwise it will yield one 1444*cda5da8dSAndroid Build Coastguard Worker or the other. In addition to yielding the lines of from/to text, a 1445*cda5da8dSAndroid Build Coastguard Worker boolean flag is yielded to indicate if the text line(s) have 1446*cda5da8dSAndroid Build Coastguard Worker differences in them. 1447*cda5da8dSAndroid Build Coastguard Worker 1448*cda5da8dSAndroid Build Coastguard Worker Note, this function is purposefully not defined at the module scope so 1449*cda5da8dSAndroid Build Coastguard Worker that data it needs from its parent function (within whose context it 1450*cda5da8dSAndroid Build Coastguard Worker is defined) does not need to be of module scope. 1451*cda5da8dSAndroid Build Coastguard Worker """ 1452*cda5da8dSAndroid Build Coastguard Worker lines = [] 1453*cda5da8dSAndroid Build Coastguard Worker num_blanks_pending, num_blanks_to_yield = 0, 0 1454*cda5da8dSAndroid Build Coastguard Worker while True: 1455*cda5da8dSAndroid Build Coastguard Worker # Load up next 4 lines so we can look ahead, create strings which 1456*cda5da8dSAndroid Build Coastguard Worker # are a concatenation of the first character of each of the 4 lines 1457*cda5da8dSAndroid Build Coastguard Worker # so we can do some very readable comparisons. 1458*cda5da8dSAndroid Build Coastguard Worker while len(lines) < 4: 1459*cda5da8dSAndroid Build Coastguard Worker lines.append(next(diff_lines_iterator, 'X')) 1460*cda5da8dSAndroid Build Coastguard Worker s = ''.join([line[0] for line in lines]) 1461*cda5da8dSAndroid Build Coastguard Worker if s.startswith('X'): 1462*cda5da8dSAndroid Build Coastguard Worker # When no more lines, pump out any remaining blank lines so the 1463*cda5da8dSAndroid Build Coastguard Worker # corresponding add/delete lines get a matching blank line so 1464*cda5da8dSAndroid Build Coastguard Worker # all line pairs get yielded at the next level. 1465*cda5da8dSAndroid Build Coastguard Worker num_blanks_to_yield = num_blanks_pending 1466*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('-?+?'): 1467*cda5da8dSAndroid Build Coastguard Worker # simple intraline change 1468*cda5da8dSAndroid Build Coastguard Worker yield _make_line(lines,'?',0), _make_line(lines,'?',1), True 1469*cda5da8dSAndroid Build Coastguard Worker continue 1470*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('--++'): 1471*cda5da8dSAndroid Build Coastguard Worker # in delete block, add block coming: we do NOT want to get 1472*cda5da8dSAndroid Build Coastguard Worker # caught up on blank lines yet, just process the delete line 1473*cda5da8dSAndroid Build Coastguard Worker num_blanks_pending -= 1 1474*cda5da8dSAndroid Build Coastguard Worker yield _make_line(lines,'-',0), None, True 1475*cda5da8dSAndroid Build Coastguard Worker continue 1476*cda5da8dSAndroid Build Coastguard Worker elif s.startswith(('--?+', '--+', '- ')): 1477*cda5da8dSAndroid Build Coastguard Worker # in delete block and see an intraline change or unchanged line 1478*cda5da8dSAndroid Build Coastguard Worker # coming: yield the delete line and then blanks 1479*cda5da8dSAndroid Build Coastguard Worker from_line,to_line = _make_line(lines,'-',0), None 1480*cda5da8dSAndroid Build Coastguard Worker num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0 1481*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('-+?'): 1482*cda5da8dSAndroid Build Coastguard Worker # intraline change 1483*cda5da8dSAndroid Build Coastguard Worker yield _make_line(lines,None,0), _make_line(lines,'?',1), True 1484*cda5da8dSAndroid Build Coastguard Worker continue 1485*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('-?+'): 1486*cda5da8dSAndroid Build Coastguard Worker # intraline change 1487*cda5da8dSAndroid Build Coastguard Worker yield _make_line(lines,'?',0), _make_line(lines,None,1), True 1488*cda5da8dSAndroid Build Coastguard Worker continue 1489*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('-'): 1490*cda5da8dSAndroid Build Coastguard Worker # delete FROM line 1491*cda5da8dSAndroid Build Coastguard Worker num_blanks_pending -= 1 1492*cda5da8dSAndroid Build Coastguard Worker yield _make_line(lines,'-',0), None, True 1493*cda5da8dSAndroid Build Coastguard Worker continue 1494*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('+--'): 1495*cda5da8dSAndroid Build Coastguard Worker # in add block, delete block coming: we do NOT want to get 1496*cda5da8dSAndroid Build Coastguard Worker # caught up on blank lines yet, just process the add line 1497*cda5da8dSAndroid Build Coastguard Worker num_blanks_pending += 1 1498*cda5da8dSAndroid Build Coastguard Worker yield None, _make_line(lines,'+',1), True 1499*cda5da8dSAndroid Build Coastguard Worker continue 1500*cda5da8dSAndroid Build Coastguard Worker elif s.startswith(('+ ', '+-')): 1501*cda5da8dSAndroid Build Coastguard Worker # will be leaving an add block: yield blanks then add line 1502*cda5da8dSAndroid Build Coastguard Worker from_line, to_line = None, _make_line(lines,'+',1) 1503*cda5da8dSAndroid Build Coastguard Worker num_blanks_to_yield,num_blanks_pending = num_blanks_pending+1,0 1504*cda5da8dSAndroid Build Coastguard Worker elif s.startswith('+'): 1505*cda5da8dSAndroid Build Coastguard Worker # inside an add block, yield the add line 1506*cda5da8dSAndroid Build Coastguard Worker num_blanks_pending += 1 1507*cda5da8dSAndroid Build Coastguard Worker yield None, _make_line(lines,'+',1), True 1508*cda5da8dSAndroid Build Coastguard Worker continue 1509*cda5da8dSAndroid Build Coastguard Worker elif s.startswith(' '): 1510*cda5da8dSAndroid Build Coastguard Worker # unchanged text, yield it to both sides 1511*cda5da8dSAndroid Build Coastguard Worker yield _make_line(lines[:],None,0),_make_line(lines,None,1),False 1512*cda5da8dSAndroid Build Coastguard Worker continue 1513*cda5da8dSAndroid Build Coastguard Worker # Catch up on the blank lines so when we yield the next from/to 1514*cda5da8dSAndroid Build Coastguard Worker # pair, they are lined up. 1515*cda5da8dSAndroid Build Coastguard Worker while(num_blanks_to_yield < 0): 1516*cda5da8dSAndroid Build Coastguard Worker num_blanks_to_yield += 1 1517*cda5da8dSAndroid Build Coastguard Worker yield None,('','\n'),True 1518*cda5da8dSAndroid Build Coastguard Worker while(num_blanks_to_yield > 0): 1519*cda5da8dSAndroid Build Coastguard Worker num_blanks_to_yield -= 1 1520*cda5da8dSAndroid Build Coastguard Worker yield ('','\n'),None,True 1521*cda5da8dSAndroid Build Coastguard Worker if s.startswith('X'): 1522*cda5da8dSAndroid Build Coastguard Worker return 1523*cda5da8dSAndroid Build Coastguard Worker else: 1524*cda5da8dSAndroid Build Coastguard Worker yield from_line,to_line,True 1525*cda5da8dSAndroid Build Coastguard Worker 1526*cda5da8dSAndroid Build Coastguard Worker def _line_pair_iterator(): 1527*cda5da8dSAndroid Build Coastguard Worker """Yields from/to lines of text with a change indication. 1528*cda5da8dSAndroid Build Coastguard Worker 1529*cda5da8dSAndroid Build Coastguard Worker This function is an iterator. It itself pulls lines from the line 1530*cda5da8dSAndroid Build Coastguard Worker iterator. Its difference from that iterator is that this function 1531*cda5da8dSAndroid Build Coastguard Worker always yields a pair of from/to text lines (with the change 1532*cda5da8dSAndroid Build Coastguard Worker indication). If necessary it will collect single from/to lines 1533*cda5da8dSAndroid Build Coastguard Worker until it has a matching pair from/to pair to yield. 1534*cda5da8dSAndroid Build Coastguard Worker 1535*cda5da8dSAndroid Build Coastguard Worker Note, this function is purposefully not defined at the module scope so 1536*cda5da8dSAndroid Build Coastguard Worker that data it needs from its parent function (within whose context it 1537*cda5da8dSAndroid Build Coastguard Worker is defined) does not need to be of module scope. 1538*cda5da8dSAndroid Build Coastguard Worker """ 1539*cda5da8dSAndroid Build Coastguard Worker line_iterator = _line_iterator() 1540*cda5da8dSAndroid Build Coastguard Worker fromlines,tolines=[],[] 1541*cda5da8dSAndroid Build Coastguard Worker while True: 1542*cda5da8dSAndroid Build Coastguard Worker # Collecting lines of text until we have a from/to pair 1543*cda5da8dSAndroid Build Coastguard Worker while (len(fromlines)==0 or len(tolines)==0): 1544*cda5da8dSAndroid Build Coastguard Worker try: 1545*cda5da8dSAndroid Build Coastguard Worker from_line, to_line, found_diff = next(line_iterator) 1546*cda5da8dSAndroid Build Coastguard Worker except StopIteration: 1547*cda5da8dSAndroid Build Coastguard Worker return 1548*cda5da8dSAndroid Build Coastguard Worker if from_line is not None: 1549*cda5da8dSAndroid Build Coastguard Worker fromlines.append((from_line,found_diff)) 1550*cda5da8dSAndroid Build Coastguard Worker if to_line is not None: 1551*cda5da8dSAndroid Build Coastguard Worker tolines.append((to_line,found_diff)) 1552*cda5da8dSAndroid Build Coastguard Worker # Once we have a pair, remove them from the collection and yield it 1553*cda5da8dSAndroid Build Coastguard Worker from_line, fromDiff = fromlines.pop(0) 1554*cda5da8dSAndroid Build Coastguard Worker to_line, to_diff = tolines.pop(0) 1555*cda5da8dSAndroid Build Coastguard Worker yield (from_line,to_line,fromDiff or to_diff) 1556*cda5da8dSAndroid Build Coastguard Worker 1557*cda5da8dSAndroid Build Coastguard Worker # Handle case where user does not want context differencing, just yield 1558*cda5da8dSAndroid Build Coastguard Worker # them up without doing anything else with them. 1559*cda5da8dSAndroid Build Coastguard Worker line_pair_iterator = _line_pair_iterator() 1560*cda5da8dSAndroid Build Coastguard Worker if context is None: 1561*cda5da8dSAndroid Build Coastguard Worker yield from line_pair_iterator 1562*cda5da8dSAndroid Build Coastguard Worker # Handle case where user wants context differencing. We must do some 1563*cda5da8dSAndroid Build Coastguard Worker # storage of lines until we know for sure that they are to be yielded. 1564*cda5da8dSAndroid Build Coastguard Worker else: 1565*cda5da8dSAndroid Build Coastguard Worker context += 1 1566*cda5da8dSAndroid Build Coastguard Worker lines_to_write = 0 1567*cda5da8dSAndroid Build Coastguard Worker while True: 1568*cda5da8dSAndroid Build Coastguard Worker # Store lines up until we find a difference, note use of a 1569*cda5da8dSAndroid Build Coastguard Worker # circular queue because we only need to keep around what 1570*cda5da8dSAndroid Build Coastguard Worker # we need for context. 1571*cda5da8dSAndroid Build Coastguard Worker index, contextLines = 0, [None]*(context) 1572*cda5da8dSAndroid Build Coastguard Worker found_diff = False 1573*cda5da8dSAndroid Build Coastguard Worker while(found_diff is False): 1574*cda5da8dSAndroid Build Coastguard Worker try: 1575*cda5da8dSAndroid Build Coastguard Worker from_line, to_line, found_diff = next(line_pair_iterator) 1576*cda5da8dSAndroid Build Coastguard Worker except StopIteration: 1577*cda5da8dSAndroid Build Coastguard Worker return 1578*cda5da8dSAndroid Build Coastguard Worker i = index % context 1579*cda5da8dSAndroid Build Coastguard Worker contextLines[i] = (from_line, to_line, found_diff) 1580*cda5da8dSAndroid Build Coastguard Worker index += 1 1581*cda5da8dSAndroid Build Coastguard Worker # Yield lines that we have collected so far, but first yield 1582*cda5da8dSAndroid Build Coastguard Worker # the user's separator. 1583*cda5da8dSAndroid Build Coastguard Worker if index > context: 1584*cda5da8dSAndroid Build Coastguard Worker yield None, None, None 1585*cda5da8dSAndroid Build Coastguard Worker lines_to_write = context 1586*cda5da8dSAndroid Build Coastguard Worker else: 1587*cda5da8dSAndroid Build Coastguard Worker lines_to_write = index 1588*cda5da8dSAndroid Build Coastguard Worker index = 0 1589*cda5da8dSAndroid Build Coastguard Worker while(lines_to_write): 1590*cda5da8dSAndroid Build Coastguard Worker i = index % context 1591*cda5da8dSAndroid Build Coastguard Worker index += 1 1592*cda5da8dSAndroid Build Coastguard Worker yield contextLines[i] 1593*cda5da8dSAndroid Build Coastguard Worker lines_to_write -= 1 1594*cda5da8dSAndroid Build Coastguard Worker # Now yield the context lines after the change 1595*cda5da8dSAndroid Build Coastguard Worker lines_to_write = context-1 1596*cda5da8dSAndroid Build Coastguard Worker try: 1597*cda5da8dSAndroid Build Coastguard Worker while(lines_to_write): 1598*cda5da8dSAndroid Build Coastguard Worker from_line, to_line, found_diff = next(line_pair_iterator) 1599*cda5da8dSAndroid Build Coastguard Worker # If another change within the context, extend the context 1600*cda5da8dSAndroid Build Coastguard Worker if found_diff: 1601*cda5da8dSAndroid Build Coastguard Worker lines_to_write = context-1 1602*cda5da8dSAndroid Build Coastguard Worker else: 1603*cda5da8dSAndroid Build Coastguard Worker lines_to_write -= 1 1604*cda5da8dSAndroid Build Coastguard Worker yield from_line, to_line, found_diff 1605*cda5da8dSAndroid Build Coastguard Worker except StopIteration: 1606*cda5da8dSAndroid Build Coastguard Worker # Catch exception from next() and return normally 1607*cda5da8dSAndroid Build Coastguard Worker return 1608*cda5da8dSAndroid Build Coastguard Worker 1609*cda5da8dSAndroid Build Coastguard Worker 1610*cda5da8dSAndroid Build Coastguard Worker_file_template = """ 1611*cda5da8dSAndroid Build Coastguard Worker<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 1612*cda5da8dSAndroid Build Coastguard Worker "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 1613*cda5da8dSAndroid Build Coastguard Worker 1614*cda5da8dSAndroid Build Coastguard Worker<html> 1615*cda5da8dSAndroid Build Coastguard Worker 1616*cda5da8dSAndroid Build Coastguard Worker<head> 1617*cda5da8dSAndroid Build Coastguard Worker <meta http-equiv="Content-Type" 1618*cda5da8dSAndroid Build Coastguard Worker content="text/html; charset=%(charset)s" /> 1619*cda5da8dSAndroid Build Coastguard Worker <title></title> 1620*cda5da8dSAndroid Build Coastguard Worker <style type="text/css">%(styles)s 1621*cda5da8dSAndroid Build Coastguard Worker </style> 1622*cda5da8dSAndroid Build Coastguard Worker</head> 1623*cda5da8dSAndroid Build Coastguard Worker 1624*cda5da8dSAndroid Build Coastguard Worker<body> 1625*cda5da8dSAndroid Build Coastguard Worker %(table)s%(legend)s 1626*cda5da8dSAndroid Build Coastguard Worker</body> 1627*cda5da8dSAndroid Build Coastguard Worker 1628*cda5da8dSAndroid Build Coastguard Worker</html>""" 1629*cda5da8dSAndroid Build Coastguard Worker 1630*cda5da8dSAndroid Build Coastguard Worker_styles = """ 1631*cda5da8dSAndroid Build Coastguard Worker table.diff {font-family:Courier; border:medium;} 1632*cda5da8dSAndroid Build Coastguard Worker .diff_header {background-color:#e0e0e0} 1633*cda5da8dSAndroid Build Coastguard Worker td.diff_header {text-align:right} 1634*cda5da8dSAndroid Build Coastguard Worker .diff_next {background-color:#c0c0c0} 1635*cda5da8dSAndroid Build Coastguard Worker .diff_add {background-color:#aaffaa} 1636*cda5da8dSAndroid Build Coastguard Worker .diff_chg {background-color:#ffff77} 1637*cda5da8dSAndroid Build Coastguard Worker .diff_sub {background-color:#ffaaaa}""" 1638*cda5da8dSAndroid Build Coastguard Worker 1639*cda5da8dSAndroid Build Coastguard Worker_table_template = """ 1640*cda5da8dSAndroid Build Coastguard Worker <table class="diff" id="difflib_chg_%(prefix)s_top" 1641*cda5da8dSAndroid Build Coastguard Worker cellspacing="0" cellpadding="0" rules="groups" > 1642*cda5da8dSAndroid Build Coastguard Worker <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup> 1643*cda5da8dSAndroid Build Coastguard Worker <colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup> 1644*cda5da8dSAndroid Build Coastguard Worker %(header_row)s 1645*cda5da8dSAndroid Build Coastguard Worker <tbody> 1646*cda5da8dSAndroid Build Coastguard Worker%(data_rows)s </tbody> 1647*cda5da8dSAndroid Build Coastguard Worker </table>""" 1648*cda5da8dSAndroid Build Coastguard Worker 1649*cda5da8dSAndroid Build Coastguard Worker_legend = """ 1650*cda5da8dSAndroid Build Coastguard Worker <table class="diff" summary="Legends"> 1651*cda5da8dSAndroid Build Coastguard Worker <tr> <th colspan="2"> Legends </th> </tr> 1652*cda5da8dSAndroid Build Coastguard Worker <tr> <td> <table border="" summary="Colors"> 1653*cda5da8dSAndroid Build Coastguard Worker <tr><th> Colors </th> </tr> 1654*cda5da8dSAndroid Build Coastguard Worker <tr><td class="diff_add"> Added </td></tr> 1655*cda5da8dSAndroid Build Coastguard Worker <tr><td class="diff_chg">Changed</td> </tr> 1656*cda5da8dSAndroid Build Coastguard Worker <tr><td class="diff_sub">Deleted</td> </tr> 1657*cda5da8dSAndroid Build Coastguard Worker </table></td> 1658*cda5da8dSAndroid Build Coastguard Worker <td> <table border="" summary="Links"> 1659*cda5da8dSAndroid Build Coastguard Worker <tr><th colspan="2"> Links </th> </tr> 1660*cda5da8dSAndroid Build Coastguard Worker <tr><td>(f)irst change</td> </tr> 1661*cda5da8dSAndroid Build Coastguard Worker <tr><td>(n)ext change</td> </tr> 1662*cda5da8dSAndroid Build Coastguard Worker <tr><td>(t)op</td> </tr> 1663*cda5da8dSAndroid Build Coastguard Worker </table></td> </tr> 1664*cda5da8dSAndroid Build Coastguard Worker </table>""" 1665*cda5da8dSAndroid Build Coastguard Worker 1666*cda5da8dSAndroid Build Coastguard Workerclass HtmlDiff(object): 1667*cda5da8dSAndroid Build Coastguard Worker """For producing HTML side by side comparison with change highlights. 1668*cda5da8dSAndroid Build Coastguard Worker 1669*cda5da8dSAndroid Build Coastguard Worker This class can be used to create an HTML table (or a complete HTML file 1670*cda5da8dSAndroid Build Coastguard Worker containing the table) showing a side by side, line by line comparison 1671*cda5da8dSAndroid Build Coastguard Worker of text with inter-line and intra-line change highlights. The table can 1672*cda5da8dSAndroid Build Coastguard Worker be generated in either full or contextual difference mode. 1673*cda5da8dSAndroid Build Coastguard Worker 1674*cda5da8dSAndroid Build Coastguard Worker The following methods are provided for HTML generation: 1675*cda5da8dSAndroid Build Coastguard Worker 1676*cda5da8dSAndroid Build Coastguard Worker make_table -- generates HTML for a single side by side table 1677*cda5da8dSAndroid Build Coastguard Worker make_file -- generates complete HTML file with a single side by side table 1678*cda5da8dSAndroid Build Coastguard Worker 1679*cda5da8dSAndroid Build Coastguard Worker See tools/scripts/diff.py for an example usage of this class. 1680*cda5da8dSAndroid Build Coastguard Worker """ 1681*cda5da8dSAndroid Build Coastguard Worker 1682*cda5da8dSAndroid Build Coastguard Worker _file_template = _file_template 1683*cda5da8dSAndroid Build Coastguard Worker _styles = _styles 1684*cda5da8dSAndroid Build Coastguard Worker _table_template = _table_template 1685*cda5da8dSAndroid Build Coastguard Worker _legend = _legend 1686*cda5da8dSAndroid Build Coastguard Worker _default_prefix = 0 1687*cda5da8dSAndroid Build Coastguard Worker 1688*cda5da8dSAndroid Build Coastguard Worker def __init__(self,tabsize=8,wrapcolumn=None,linejunk=None, 1689*cda5da8dSAndroid Build Coastguard Worker charjunk=IS_CHARACTER_JUNK): 1690*cda5da8dSAndroid Build Coastguard Worker """HtmlDiff instance initializer 1691*cda5da8dSAndroid Build Coastguard Worker 1692*cda5da8dSAndroid Build Coastguard Worker Arguments: 1693*cda5da8dSAndroid Build Coastguard Worker tabsize -- tab stop spacing, defaults to 8. 1694*cda5da8dSAndroid Build Coastguard Worker wrapcolumn -- column number where lines are broken and wrapped, 1695*cda5da8dSAndroid Build Coastguard Worker defaults to None where lines are not wrapped. 1696*cda5da8dSAndroid Build Coastguard Worker linejunk,charjunk -- keyword arguments passed into ndiff() (used by 1697*cda5da8dSAndroid Build Coastguard Worker HtmlDiff() to generate the side by side HTML differences). See 1698*cda5da8dSAndroid Build Coastguard Worker ndiff() documentation for argument default values and descriptions. 1699*cda5da8dSAndroid Build Coastguard Worker """ 1700*cda5da8dSAndroid Build Coastguard Worker self._tabsize = tabsize 1701*cda5da8dSAndroid Build Coastguard Worker self._wrapcolumn = wrapcolumn 1702*cda5da8dSAndroid Build Coastguard Worker self._linejunk = linejunk 1703*cda5da8dSAndroid Build Coastguard Worker self._charjunk = charjunk 1704*cda5da8dSAndroid Build Coastguard Worker 1705*cda5da8dSAndroid Build Coastguard Worker def make_file(self, fromlines, tolines, fromdesc='', todesc='', 1706*cda5da8dSAndroid Build Coastguard Worker context=False, numlines=5, *, charset='utf-8'): 1707*cda5da8dSAndroid Build Coastguard Worker """Returns HTML file of side by side comparison with change highlights 1708*cda5da8dSAndroid Build Coastguard Worker 1709*cda5da8dSAndroid Build Coastguard Worker Arguments: 1710*cda5da8dSAndroid Build Coastguard Worker fromlines -- list of "from" lines 1711*cda5da8dSAndroid Build Coastguard Worker tolines -- list of "to" lines 1712*cda5da8dSAndroid Build Coastguard Worker fromdesc -- "from" file column header string 1713*cda5da8dSAndroid Build Coastguard Worker todesc -- "to" file column header string 1714*cda5da8dSAndroid Build Coastguard Worker context -- set to True for contextual differences (defaults to False 1715*cda5da8dSAndroid Build Coastguard Worker which shows full differences). 1716*cda5da8dSAndroid Build Coastguard Worker numlines -- number of context lines. When context is set True, 1717*cda5da8dSAndroid Build Coastguard Worker controls number of lines displayed before and after the change. 1718*cda5da8dSAndroid Build Coastguard Worker When context is False, controls the number of lines to place 1719*cda5da8dSAndroid Build Coastguard Worker the "next" link anchors before the next change (so click of 1720*cda5da8dSAndroid Build Coastguard Worker "next" link jumps to just before the change). 1721*cda5da8dSAndroid Build Coastguard Worker charset -- charset of the HTML document 1722*cda5da8dSAndroid Build Coastguard Worker """ 1723*cda5da8dSAndroid Build Coastguard Worker 1724*cda5da8dSAndroid Build Coastguard Worker return (self._file_template % dict( 1725*cda5da8dSAndroid Build Coastguard Worker styles=self._styles, 1726*cda5da8dSAndroid Build Coastguard Worker legend=self._legend, 1727*cda5da8dSAndroid Build Coastguard Worker table=self.make_table(fromlines, tolines, fromdesc, todesc, 1728*cda5da8dSAndroid Build Coastguard Worker context=context, numlines=numlines), 1729*cda5da8dSAndroid Build Coastguard Worker charset=charset 1730*cda5da8dSAndroid Build Coastguard Worker )).encode(charset, 'xmlcharrefreplace').decode(charset) 1731*cda5da8dSAndroid Build Coastguard Worker 1732*cda5da8dSAndroid Build Coastguard Worker def _tab_newline_replace(self,fromlines,tolines): 1733*cda5da8dSAndroid Build Coastguard Worker """Returns from/to line lists with tabs expanded and newlines removed. 1734*cda5da8dSAndroid Build Coastguard Worker 1735*cda5da8dSAndroid Build Coastguard Worker Instead of tab characters being replaced by the number of spaces 1736*cda5da8dSAndroid Build Coastguard Worker needed to fill in to the next tab stop, this function will fill 1737*cda5da8dSAndroid Build Coastguard Worker the space with tab characters. This is done so that the difference 1738*cda5da8dSAndroid Build Coastguard Worker algorithms can identify changes in a file when tabs are replaced by 1739*cda5da8dSAndroid Build Coastguard Worker spaces and vice versa. At the end of the HTML generation, the tab 1740*cda5da8dSAndroid Build Coastguard Worker characters will be replaced with a nonbreakable space. 1741*cda5da8dSAndroid Build Coastguard Worker """ 1742*cda5da8dSAndroid Build Coastguard Worker def expand_tabs(line): 1743*cda5da8dSAndroid Build Coastguard Worker # hide real spaces 1744*cda5da8dSAndroid Build Coastguard Worker line = line.replace(' ','\0') 1745*cda5da8dSAndroid Build Coastguard Worker # expand tabs into spaces 1746*cda5da8dSAndroid Build Coastguard Worker line = line.expandtabs(self._tabsize) 1747*cda5da8dSAndroid Build Coastguard Worker # replace spaces from expanded tabs back into tab characters 1748*cda5da8dSAndroid Build Coastguard Worker # (we'll replace them with markup after we do differencing) 1749*cda5da8dSAndroid Build Coastguard Worker line = line.replace(' ','\t') 1750*cda5da8dSAndroid Build Coastguard Worker return line.replace('\0',' ').rstrip('\n') 1751*cda5da8dSAndroid Build Coastguard Worker fromlines = [expand_tabs(line) for line in fromlines] 1752*cda5da8dSAndroid Build Coastguard Worker tolines = [expand_tabs(line) for line in tolines] 1753*cda5da8dSAndroid Build Coastguard Worker return fromlines,tolines 1754*cda5da8dSAndroid Build Coastguard Worker 1755*cda5da8dSAndroid Build Coastguard Worker def _split_line(self,data_list,line_num,text): 1756*cda5da8dSAndroid Build Coastguard Worker """Builds list of text lines by splitting text lines at wrap point 1757*cda5da8dSAndroid Build Coastguard Worker 1758*cda5da8dSAndroid Build Coastguard Worker This function will determine if the input text line needs to be 1759*cda5da8dSAndroid Build Coastguard Worker wrapped (split) into separate lines. If so, the first wrap point 1760*cda5da8dSAndroid Build Coastguard Worker will be determined and the first line appended to the output 1761*cda5da8dSAndroid Build Coastguard Worker text line list. This function is used recursively to handle 1762*cda5da8dSAndroid Build Coastguard Worker the second part of the split line to further split it. 1763*cda5da8dSAndroid Build Coastguard Worker """ 1764*cda5da8dSAndroid Build Coastguard Worker # if blank line or context separator, just add it to the output list 1765*cda5da8dSAndroid Build Coastguard Worker if not line_num: 1766*cda5da8dSAndroid Build Coastguard Worker data_list.append((line_num,text)) 1767*cda5da8dSAndroid Build Coastguard Worker return 1768*cda5da8dSAndroid Build Coastguard Worker 1769*cda5da8dSAndroid Build Coastguard Worker # if line text doesn't need wrapping, just add it to the output list 1770*cda5da8dSAndroid Build Coastguard Worker size = len(text) 1771*cda5da8dSAndroid Build Coastguard Worker max = self._wrapcolumn 1772*cda5da8dSAndroid Build Coastguard Worker if (size <= max) or ((size -(text.count('\0')*3)) <= max): 1773*cda5da8dSAndroid Build Coastguard Worker data_list.append((line_num,text)) 1774*cda5da8dSAndroid Build Coastguard Worker return 1775*cda5da8dSAndroid Build Coastguard Worker 1776*cda5da8dSAndroid Build Coastguard Worker # scan text looking for the wrap point, keeping track if the wrap 1777*cda5da8dSAndroid Build Coastguard Worker # point is inside markers 1778*cda5da8dSAndroid Build Coastguard Worker i = 0 1779*cda5da8dSAndroid Build Coastguard Worker n = 0 1780*cda5da8dSAndroid Build Coastguard Worker mark = '' 1781*cda5da8dSAndroid Build Coastguard Worker while n < max and i < size: 1782*cda5da8dSAndroid Build Coastguard Worker if text[i] == '\0': 1783*cda5da8dSAndroid Build Coastguard Worker i += 1 1784*cda5da8dSAndroid Build Coastguard Worker mark = text[i] 1785*cda5da8dSAndroid Build Coastguard Worker i += 1 1786*cda5da8dSAndroid Build Coastguard Worker elif text[i] == '\1': 1787*cda5da8dSAndroid Build Coastguard Worker i += 1 1788*cda5da8dSAndroid Build Coastguard Worker mark = '' 1789*cda5da8dSAndroid Build Coastguard Worker else: 1790*cda5da8dSAndroid Build Coastguard Worker i += 1 1791*cda5da8dSAndroid Build Coastguard Worker n += 1 1792*cda5da8dSAndroid Build Coastguard Worker 1793*cda5da8dSAndroid Build Coastguard Worker # wrap point is inside text, break it up into separate lines 1794*cda5da8dSAndroid Build Coastguard Worker line1 = text[:i] 1795*cda5da8dSAndroid Build Coastguard Worker line2 = text[i:] 1796*cda5da8dSAndroid Build Coastguard Worker 1797*cda5da8dSAndroid Build Coastguard Worker # if wrap point is inside markers, place end marker at end of first 1798*cda5da8dSAndroid Build Coastguard Worker # line and start marker at beginning of second line because each 1799*cda5da8dSAndroid Build Coastguard Worker # line will have its own table tag markup around it. 1800*cda5da8dSAndroid Build Coastguard Worker if mark: 1801*cda5da8dSAndroid Build Coastguard Worker line1 = line1 + '\1' 1802*cda5da8dSAndroid Build Coastguard Worker line2 = '\0' + mark + line2 1803*cda5da8dSAndroid Build Coastguard Worker 1804*cda5da8dSAndroid Build Coastguard Worker # tack on first line onto the output list 1805*cda5da8dSAndroid Build Coastguard Worker data_list.append((line_num,line1)) 1806*cda5da8dSAndroid Build Coastguard Worker 1807*cda5da8dSAndroid Build Coastguard Worker # use this routine again to wrap the remaining text 1808*cda5da8dSAndroid Build Coastguard Worker self._split_line(data_list,'>',line2) 1809*cda5da8dSAndroid Build Coastguard Worker 1810*cda5da8dSAndroid Build Coastguard Worker def _line_wrapper(self,diffs): 1811*cda5da8dSAndroid Build Coastguard Worker """Returns iterator that splits (wraps) mdiff text lines""" 1812*cda5da8dSAndroid Build Coastguard Worker 1813*cda5da8dSAndroid Build Coastguard Worker # pull from/to data and flags from mdiff iterator 1814*cda5da8dSAndroid Build Coastguard Worker for fromdata,todata,flag in diffs: 1815*cda5da8dSAndroid Build Coastguard Worker # check for context separators and pass them through 1816*cda5da8dSAndroid Build Coastguard Worker if flag is None: 1817*cda5da8dSAndroid Build Coastguard Worker yield fromdata,todata,flag 1818*cda5da8dSAndroid Build Coastguard Worker continue 1819*cda5da8dSAndroid Build Coastguard Worker (fromline,fromtext),(toline,totext) = fromdata,todata 1820*cda5da8dSAndroid Build Coastguard Worker # for each from/to line split it at the wrap column to form 1821*cda5da8dSAndroid Build Coastguard Worker # list of text lines. 1822*cda5da8dSAndroid Build Coastguard Worker fromlist,tolist = [],[] 1823*cda5da8dSAndroid Build Coastguard Worker self._split_line(fromlist,fromline,fromtext) 1824*cda5da8dSAndroid Build Coastguard Worker self._split_line(tolist,toline,totext) 1825*cda5da8dSAndroid Build Coastguard Worker # yield from/to line in pairs inserting blank lines as 1826*cda5da8dSAndroid Build Coastguard Worker # necessary when one side has more wrapped lines 1827*cda5da8dSAndroid Build Coastguard Worker while fromlist or tolist: 1828*cda5da8dSAndroid Build Coastguard Worker if fromlist: 1829*cda5da8dSAndroid Build Coastguard Worker fromdata = fromlist.pop(0) 1830*cda5da8dSAndroid Build Coastguard Worker else: 1831*cda5da8dSAndroid Build Coastguard Worker fromdata = ('',' ') 1832*cda5da8dSAndroid Build Coastguard Worker if tolist: 1833*cda5da8dSAndroid Build Coastguard Worker todata = tolist.pop(0) 1834*cda5da8dSAndroid Build Coastguard Worker else: 1835*cda5da8dSAndroid Build Coastguard Worker todata = ('',' ') 1836*cda5da8dSAndroid Build Coastguard Worker yield fromdata,todata,flag 1837*cda5da8dSAndroid Build Coastguard Worker 1838*cda5da8dSAndroid Build Coastguard Worker def _collect_lines(self,diffs): 1839*cda5da8dSAndroid Build Coastguard Worker """Collects mdiff output into separate lists 1840*cda5da8dSAndroid Build Coastguard Worker 1841*cda5da8dSAndroid Build Coastguard Worker Before storing the mdiff from/to data into a list, it is converted 1842*cda5da8dSAndroid Build Coastguard Worker into a single line of text with HTML markup. 1843*cda5da8dSAndroid Build Coastguard Worker """ 1844*cda5da8dSAndroid Build Coastguard Worker 1845*cda5da8dSAndroid Build Coastguard Worker fromlist,tolist,flaglist = [],[],[] 1846*cda5da8dSAndroid Build Coastguard Worker # pull from/to data and flags from mdiff style iterator 1847*cda5da8dSAndroid Build Coastguard Worker for fromdata,todata,flag in diffs: 1848*cda5da8dSAndroid Build Coastguard Worker try: 1849*cda5da8dSAndroid Build Coastguard Worker # store HTML markup of the lines into the lists 1850*cda5da8dSAndroid Build Coastguard Worker fromlist.append(self._format_line(0,flag,*fromdata)) 1851*cda5da8dSAndroid Build Coastguard Worker tolist.append(self._format_line(1,flag,*todata)) 1852*cda5da8dSAndroid Build Coastguard Worker except TypeError: 1853*cda5da8dSAndroid Build Coastguard Worker # exceptions occur for lines where context separators go 1854*cda5da8dSAndroid Build Coastguard Worker fromlist.append(None) 1855*cda5da8dSAndroid Build Coastguard Worker tolist.append(None) 1856*cda5da8dSAndroid Build Coastguard Worker flaglist.append(flag) 1857*cda5da8dSAndroid Build Coastguard Worker return fromlist,tolist,flaglist 1858*cda5da8dSAndroid Build Coastguard Worker 1859*cda5da8dSAndroid Build Coastguard Worker def _format_line(self,side,flag,linenum,text): 1860*cda5da8dSAndroid Build Coastguard Worker """Returns HTML markup of "from" / "to" text lines 1861*cda5da8dSAndroid Build Coastguard Worker 1862*cda5da8dSAndroid Build Coastguard Worker side -- 0 or 1 indicating "from" or "to" text 1863*cda5da8dSAndroid Build Coastguard Worker flag -- indicates if difference on line 1864*cda5da8dSAndroid Build Coastguard Worker linenum -- line number (used for line number column) 1865*cda5da8dSAndroid Build Coastguard Worker text -- line text to be marked up 1866*cda5da8dSAndroid Build Coastguard Worker """ 1867*cda5da8dSAndroid Build Coastguard Worker try: 1868*cda5da8dSAndroid Build Coastguard Worker linenum = '%d' % linenum 1869*cda5da8dSAndroid Build Coastguard Worker id = ' id="%s%s"' % (self._prefix[side],linenum) 1870*cda5da8dSAndroid Build Coastguard Worker except TypeError: 1871*cda5da8dSAndroid Build Coastguard Worker # handle blank lines where linenum is '>' or '' 1872*cda5da8dSAndroid Build Coastguard Worker id = '' 1873*cda5da8dSAndroid Build Coastguard Worker # replace those things that would get confused with HTML symbols 1874*cda5da8dSAndroid Build Coastguard Worker text=text.replace("&","&").replace(">",">").replace("<","<") 1875*cda5da8dSAndroid Build Coastguard Worker 1876*cda5da8dSAndroid Build Coastguard Worker # make space non-breakable so they don't get compressed or line wrapped 1877*cda5da8dSAndroid Build Coastguard Worker text = text.replace(' ',' ').rstrip() 1878*cda5da8dSAndroid Build Coastguard Worker 1879*cda5da8dSAndroid Build Coastguard Worker return '<td class="diff_header"%s>%s</td><td nowrap="nowrap">%s</td>' \ 1880*cda5da8dSAndroid Build Coastguard Worker % (id,linenum,text) 1881*cda5da8dSAndroid Build Coastguard Worker 1882*cda5da8dSAndroid Build Coastguard Worker def _make_prefix(self): 1883*cda5da8dSAndroid Build Coastguard Worker """Create unique anchor prefixes""" 1884*cda5da8dSAndroid Build Coastguard Worker 1885*cda5da8dSAndroid Build Coastguard Worker # Generate a unique anchor prefix so multiple tables 1886*cda5da8dSAndroid Build Coastguard Worker # can exist on the same HTML page without conflicts. 1887*cda5da8dSAndroid Build Coastguard Worker fromprefix = "from%d_" % HtmlDiff._default_prefix 1888*cda5da8dSAndroid Build Coastguard Worker toprefix = "to%d_" % HtmlDiff._default_prefix 1889*cda5da8dSAndroid Build Coastguard Worker HtmlDiff._default_prefix += 1 1890*cda5da8dSAndroid Build Coastguard Worker # store prefixes so line format method has access 1891*cda5da8dSAndroid Build Coastguard Worker self._prefix = [fromprefix,toprefix] 1892*cda5da8dSAndroid Build Coastguard Worker 1893*cda5da8dSAndroid Build Coastguard Worker def _convert_flags(self,fromlist,tolist,flaglist,context,numlines): 1894*cda5da8dSAndroid Build Coastguard Worker """Makes list of "next" links""" 1895*cda5da8dSAndroid Build Coastguard Worker 1896*cda5da8dSAndroid Build Coastguard Worker # all anchor names will be generated using the unique "to" prefix 1897*cda5da8dSAndroid Build Coastguard Worker toprefix = self._prefix[1] 1898*cda5da8dSAndroid Build Coastguard Worker 1899*cda5da8dSAndroid Build Coastguard Worker # process change flags, generating middle column of next anchors/links 1900*cda5da8dSAndroid Build Coastguard Worker next_id = ['']*len(flaglist) 1901*cda5da8dSAndroid Build Coastguard Worker next_href = ['']*len(flaglist) 1902*cda5da8dSAndroid Build Coastguard Worker num_chg, in_change = 0, False 1903*cda5da8dSAndroid Build Coastguard Worker last = 0 1904*cda5da8dSAndroid Build Coastguard Worker for i,flag in enumerate(flaglist): 1905*cda5da8dSAndroid Build Coastguard Worker if flag: 1906*cda5da8dSAndroid Build Coastguard Worker if not in_change: 1907*cda5da8dSAndroid Build Coastguard Worker in_change = True 1908*cda5da8dSAndroid Build Coastguard Worker last = i 1909*cda5da8dSAndroid Build Coastguard Worker # at the beginning of a change, drop an anchor a few lines 1910*cda5da8dSAndroid Build Coastguard Worker # (the context lines) before the change for the previous 1911*cda5da8dSAndroid Build Coastguard Worker # link 1912*cda5da8dSAndroid Build Coastguard Worker i = max([0,i-numlines]) 1913*cda5da8dSAndroid Build Coastguard Worker next_id[i] = ' id="difflib_chg_%s_%d"' % (toprefix,num_chg) 1914*cda5da8dSAndroid Build Coastguard Worker # at the beginning of a change, drop a link to the next 1915*cda5da8dSAndroid Build Coastguard Worker # change 1916*cda5da8dSAndroid Build Coastguard Worker num_chg += 1 1917*cda5da8dSAndroid Build Coastguard Worker next_href[last] = '<a href="#difflib_chg_%s_%d">n</a>' % ( 1918*cda5da8dSAndroid Build Coastguard Worker toprefix,num_chg) 1919*cda5da8dSAndroid Build Coastguard Worker else: 1920*cda5da8dSAndroid Build Coastguard Worker in_change = False 1921*cda5da8dSAndroid Build Coastguard Worker # check for cases where there is no content to avoid exceptions 1922*cda5da8dSAndroid Build Coastguard Worker if not flaglist: 1923*cda5da8dSAndroid Build Coastguard Worker flaglist = [False] 1924*cda5da8dSAndroid Build Coastguard Worker next_id = [''] 1925*cda5da8dSAndroid Build Coastguard Worker next_href = [''] 1926*cda5da8dSAndroid Build Coastguard Worker last = 0 1927*cda5da8dSAndroid Build Coastguard Worker if context: 1928*cda5da8dSAndroid Build Coastguard Worker fromlist = ['<td></td><td> No Differences Found </td>'] 1929*cda5da8dSAndroid Build Coastguard Worker tolist = fromlist 1930*cda5da8dSAndroid Build Coastguard Worker else: 1931*cda5da8dSAndroid Build Coastguard Worker fromlist = tolist = ['<td></td><td> Empty File </td>'] 1932*cda5da8dSAndroid Build Coastguard Worker # if not a change on first line, drop a link 1933*cda5da8dSAndroid Build Coastguard Worker if not flaglist[0]: 1934*cda5da8dSAndroid Build Coastguard Worker next_href[0] = '<a href="#difflib_chg_%s_0">f</a>' % toprefix 1935*cda5da8dSAndroid Build Coastguard Worker # redo the last link to link to the top 1936*cda5da8dSAndroid Build Coastguard Worker next_href[last] = '<a href="#difflib_chg_%s_top">t</a>' % (toprefix) 1937*cda5da8dSAndroid Build Coastguard Worker 1938*cda5da8dSAndroid Build Coastguard Worker return fromlist,tolist,flaglist,next_href,next_id 1939*cda5da8dSAndroid Build Coastguard Worker 1940*cda5da8dSAndroid Build Coastguard Worker def make_table(self,fromlines,tolines,fromdesc='',todesc='',context=False, 1941*cda5da8dSAndroid Build Coastguard Worker numlines=5): 1942*cda5da8dSAndroid Build Coastguard Worker """Returns HTML table of side by side comparison with change highlights 1943*cda5da8dSAndroid Build Coastguard Worker 1944*cda5da8dSAndroid Build Coastguard Worker Arguments: 1945*cda5da8dSAndroid Build Coastguard Worker fromlines -- list of "from" lines 1946*cda5da8dSAndroid Build Coastguard Worker tolines -- list of "to" lines 1947*cda5da8dSAndroid Build Coastguard Worker fromdesc -- "from" file column header string 1948*cda5da8dSAndroid Build Coastguard Worker todesc -- "to" file column header string 1949*cda5da8dSAndroid Build Coastguard Worker context -- set to True for contextual differences (defaults to False 1950*cda5da8dSAndroid Build Coastguard Worker which shows full differences). 1951*cda5da8dSAndroid Build Coastguard Worker numlines -- number of context lines. When context is set True, 1952*cda5da8dSAndroid Build Coastguard Worker controls number of lines displayed before and after the change. 1953*cda5da8dSAndroid Build Coastguard Worker When context is False, controls the number of lines to place 1954*cda5da8dSAndroid Build Coastguard Worker the "next" link anchors before the next change (so click of 1955*cda5da8dSAndroid Build Coastguard Worker "next" link jumps to just before the change). 1956*cda5da8dSAndroid Build Coastguard Worker """ 1957*cda5da8dSAndroid Build Coastguard Worker 1958*cda5da8dSAndroid Build Coastguard Worker # make unique anchor prefixes so that multiple tables may exist 1959*cda5da8dSAndroid Build Coastguard Worker # on the same page without conflict. 1960*cda5da8dSAndroid Build Coastguard Worker self._make_prefix() 1961*cda5da8dSAndroid Build Coastguard Worker 1962*cda5da8dSAndroid Build Coastguard Worker # change tabs to spaces before it gets more difficult after we insert 1963*cda5da8dSAndroid Build Coastguard Worker # markup 1964*cda5da8dSAndroid Build Coastguard Worker fromlines,tolines = self._tab_newline_replace(fromlines,tolines) 1965*cda5da8dSAndroid Build Coastguard Worker 1966*cda5da8dSAndroid Build Coastguard Worker # create diffs iterator which generates side by side from/to data 1967*cda5da8dSAndroid Build Coastguard Worker if context: 1968*cda5da8dSAndroid Build Coastguard Worker context_lines = numlines 1969*cda5da8dSAndroid Build Coastguard Worker else: 1970*cda5da8dSAndroid Build Coastguard Worker context_lines = None 1971*cda5da8dSAndroid Build Coastguard Worker diffs = _mdiff(fromlines,tolines,context_lines,linejunk=self._linejunk, 1972*cda5da8dSAndroid Build Coastguard Worker charjunk=self._charjunk) 1973*cda5da8dSAndroid Build Coastguard Worker 1974*cda5da8dSAndroid Build Coastguard Worker # set up iterator to wrap lines that exceed desired width 1975*cda5da8dSAndroid Build Coastguard Worker if self._wrapcolumn: 1976*cda5da8dSAndroid Build Coastguard Worker diffs = self._line_wrapper(diffs) 1977*cda5da8dSAndroid Build Coastguard Worker 1978*cda5da8dSAndroid Build Coastguard Worker # collect up from/to lines and flags into lists (also format the lines) 1979*cda5da8dSAndroid Build Coastguard Worker fromlist,tolist,flaglist = self._collect_lines(diffs) 1980*cda5da8dSAndroid Build Coastguard Worker 1981*cda5da8dSAndroid Build Coastguard Worker # process change flags, generating middle column of next anchors/links 1982*cda5da8dSAndroid Build Coastguard Worker fromlist,tolist,flaglist,next_href,next_id = self._convert_flags( 1983*cda5da8dSAndroid Build Coastguard Worker fromlist,tolist,flaglist,context,numlines) 1984*cda5da8dSAndroid Build Coastguard Worker 1985*cda5da8dSAndroid Build Coastguard Worker s = [] 1986*cda5da8dSAndroid Build Coastguard Worker fmt = ' <tr><td class="diff_next"%s>%s</td>%s' + \ 1987*cda5da8dSAndroid Build Coastguard Worker '<td class="diff_next">%s</td>%s</tr>\n' 1988*cda5da8dSAndroid Build Coastguard Worker for i in range(len(flaglist)): 1989*cda5da8dSAndroid Build Coastguard Worker if flaglist[i] is None: 1990*cda5da8dSAndroid Build Coastguard Worker # mdiff yields None on separator lines skip the bogus ones 1991*cda5da8dSAndroid Build Coastguard Worker # generated for the first line 1992*cda5da8dSAndroid Build Coastguard Worker if i > 0: 1993*cda5da8dSAndroid Build Coastguard Worker s.append(' </tbody> \n <tbody>\n') 1994*cda5da8dSAndroid Build Coastguard Worker else: 1995*cda5da8dSAndroid Build Coastguard Worker s.append( fmt % (next_id[i],next_href[i],fromlist[i], 1996*cda5da8dSAndroid Build Coastguard Worker next_href[i],tolist[i])) 1997*cda5da8dSAndroid Build Coastguard Worker if fromdesc or todesc: 1998*cda5da8dSAndroid Build Coastguard Worker header_row = '<thead><tr>%s%s%s%s</tr></thead>' % ( 1999*cda5da8dSAndroid Build Coastguard Worker '<th class="diff_next"><br /></th>', 2000*cda5da8dSAndroid Build Coastguard Worker '<th colspan="2" class="diff_header">%s</th>' % fromdesc, 2001*cda5da8dSAndroid Build Coastguard Worker '<th class="diff_next"><br /></th>', 2002*cda5da8dSAndroid Build Coastguard Worker '<th colspan="2" class="diff_header">%s</th>' % todesc) 2003*cda5da8dSAndroid Build Coastguard Worker else: 2004*cda5da8dSAndroid Build Coastguard Worker header_row = '' 2005*cda5da8dSAndroid Build Coastguard Worker 2006*cda5da8dSAndroid Build Coastguard Worker table = self._table_template % dict( 2007*cda5da8dSAndroid Build Coastguard Worker data_rows=''.join(s), 2008*cda5da8dSAndroid Build Coastguard Worker header_row=header_row, 2009*cda5da8dSAndroid Build Coastguard Worker prefix=self._prefix[1]) 2010*cda5da8dSAndroid Build Coastguard Worker 2011*cda5da8dSAndroid Build Coastguard Worker return table.replace('\0+','<span class="diff_add">'). \ 2012*cda5da8dSAndroid Build Coastguard Worker replace('\0-','<span class="diff_sub">'). \ 2013*cda5da8dSAndroid Build Coastguard Worker replace('\0^','<span class="diff_chg">'). \ 2014*cda5da8dSAndroid Build Coastguard Worker replace('\1','</span>'). \ 2015*cda5da8dSAndroid Build Coastguard Worker replace('\t',' ') 2016*cda5da8dSAndroid Build Coastguard Worker 2017*cda5da8dSAndroid Build Coastguard Workerdel re 2018*cda5da8dSAndroid Build Coastguard Worker 2019*cda5da8dSAndroid Build Coastguard Workerdef restore(delta, which): 2020*cda5da8dSAndroid Build Coastguard Worker r""" 2021*cda5da8dSAndroid Build Coastguard Worker Generate one of the two sequences that generated a delta. 2022*cda5da8dSAndroid Build Coastguard Worker 2023*cda5da8dSAndroid Build Coastguard Worker Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract 2024*cda5da8dSAndroid Build Coastguard Worker lines originating from file 1 or 2 (parameter `which`), stripping off line 2025*cda5da8dSAndroid Build Coastguard Worker prefixes. 2026*cda5da8dSAndroid Build Coastguard Worker 2027*cda5da8dSAndroid Build Coastguard Worker Examples: 2028*cda5da8dSAndroid Build Coastguard Worker 2029*cda5da8dSAndroid Build Coastguard Worker >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), 2030*cda5da8dSAndroid Build Coastguard Worker ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) 2031*cda5da8dSAndroid Build Coastguard Worker >>> diff = list(diff) 2032*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(restore(diff, 1)), end="") 2033*cda5da8dSAndroid Build Coastguard Worker one 2034*cda5da8dSAndroid Build Coastguard Worker two 2035*cda5da8dSAndroid Build Coastguard Worker three 2036*cda5da8dSAndroid Build Coastguard Worker >>> print(''.join(restore(diff, 2)), end="") 2037*cda5da8dSAndroid Build Coastguard Worker ore 2038*cda5da8dSAndroid Build Coastguard Worker tree 2039*cda5da8dSAndroid Build Coastguard Worker emu 2040*cda5da8dSAndroid Build Coastguard Worker """ 2041*cda5da8dSAndroid Build Coastguard Worker try: 2042*cda5da8dSAndroid Build Coastguard Worker tag = {1: "- ", 2: "+ "}[int(which)] 2043*cda5da8dSAndroid Build Coastguard Worker except KeyError: 2044*cda5da8dSAndroid Build Coastguard Worker raise ValueError('unknown delta choice (must be 1 or 2): %r' 2045*cda5da8dSAndroid Build Coastguard Worker % which) from None 2046*cda5da8dSAndroid Build Coastguard Worker prefixes = (" ", tag) 2047*cda5da8dSAndroid Build Coastguard Worker for line in delta: 2048*cda5da8dSAndroid Build Coastguard Worker if line[:2] in prefixes: 2049*cda5da8dSAndroid Build Coastguard Worker yield line[2:] 2050*cda5da8dSAndroid Build Coastguard Worker 2051*cda5da8dSAndroid Build Coastguard Workerdef _test(): 2052*cda5da8dSAndroid Build Coastguard Worker import doctest, difflib 2053*cda5da8dSAndroid Build Coastguard Worker return doctest.testmod(difflib) 2054*cda5da8dSAndroid Build Coastguard Worker 2055*cda5da8dSAndroid Build Coastguard Workerif __name__ == "__main__": 2056*cda5da8dSAndroid Build Coastguard Worker _test() 2057