// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html

// edits.h
// created: 2016dec30 Markus W. Scherer

#ifndef __EDITS_H__
#define __EDITS_H__

#include "unicode/utypes.h"

#if U_SHOW_CPLUSPLUS_API

#include "unicode/uobject.h"

/**
 * \file
 * \brief C++ API: C++ class Edits for low-level string transformations on styled text.
 */

U_NAMESPACE_BEGIN

class UnicodeString;

/**
 * Records lengths of string edits but not replacement text. Supports replacements, insertions, deletions
 * in linear progression. Does not support moving/reordering of text.
 *
 * There are two types of edits: <em>change edits</em> and <em>no-change edits</em>. Add edits to
 * instances of this class using {@link #addReplace(int32_t, int32_t)} (for change edits) and
 * {@link #addUnchanged(int32_t)} (for no-change edits). Change edits are retained with full granularity,
 * whereas adjacent no-change edits are always merged together. In no-change edits, there is a one-to-one
 * mapping between code points in the source and destination strings.
 *
 * After all edits have been added, instances of this class should be considered immutable, and an
 * {@link Edits::Iterator} can be used for queries.
 *
 * There are four flavors of Edits::Iterator:
 *
 * <ul>
 * <li>{@link #getFineIterator()} retains full granularity of change edits.
 * <li>{@link #getFineChangesIterator()} retains full granularity of change edits, and when calling
 * next() on the iterator, skips over no-change edits (unchanged regions).
 * <li>{@link #getCoarseIterator()} treats adjacent change edits as a single edit. (Adjacent no-change
 * edits are automatically merged during the construction phase.)
 * <li>{@link #getCoarseChangesIterator()} treats adjacent change edits as a single edit, and when
 * calling next() on the iterator, skips over no-change edits (unchanged regions).
 * </ul>
 *
 * For example, consider the string "abcßDeF", which case-folds to "abcssdef". This string has the
 * following fine edits:
 * <ul>
 * <li>abc ⇨ abc (no-change)
 * <li>ß ⇨ ss (change)
 * <li>D ⇨ d (change)
 * <li>e ⇨ e (no-change)
 * <li>F ⇨ f (change)
 * </ul>
 * and the following coarse edits (note how adjacent change edits get merged together):
 * <ul>
 * <li>abc ⇨ abc (no-change)
 * <li>ßD ⇨ ssd (change)
 * <li>e ⇨ e (no-change)
 * <li>F ⇨ f (change)
 * </ul>
 *
 * The "fine changes" and "coarse changes" iterators will step through only the change edits when their
 * `Edits::Iterator::next()` methods are called. They are identical to the non-change iterators when
 * their `Edits::Iterator::findSourceIndex()` or `Edits::Iterator::findDestinationIndex()`
 * methods are used to walk through the string.
 *
 * For examples of how to use this class, see the test `TestCaseMapEditsIteratorDocs` in
 * UCharacterCaseTest.java.
 *
 * An Edits object tracks a separate UErrorCode, but ICU string transformation functions
 * (e.g., case mapping functions) merge any such errors into their API's UErrorCode.
 *
 * @stable ICU 59
 */
class U_COMMON_API Edits final : public UMemory {
public:
    /**
     * Constructs an empty object.
     * @stable ICU 59
     */
    Edits() :
            array(stackArray), capacity(STACK_CAPACITY), length(0), delta(0), numChanges(0),
            errorCode_(U_ZERO_ERROR) {}
    /**
     * Copy constructor.
     * @param other source edits
     * @stable ICU 60
     */
    Edits(const Edits &other) :
            array(stackArray), capacity(STACK_CAPACITY), length(other.length),
            delta(other.delta), numChanges(other.numChanges),
            errorCode_(other.errorCode_) {
        copyArray(other);
    }
    /**
     * Move constructor, might leave src empty.
     * This object will have the same contents that the source object had.
     * @param src source edits
     * @stable ICU 60
     */
    Edits(Edits &&src) noexcept :
            array(stackArray), capacity(STACK_CAPACITY), length(src.length),
            delta(src.delta), numChanges(src.numChanges),
            errorCode_(src.errorCode_) {
        moveArray(src);
    }

    /**
     * Destructor.
     * @stable ICU 59
     */
    ~Edits();

    /**
     * Assignment operator.
     * @param other source edits
     * @return *this
     * @stable ICU 60
     */
    Edits &operator=(const Edits &other);

    /**
     * Move assignment operator, might leave src empty.
     * This object will have the same contents that the source object had.
     * The behavior is undefined if *this and src are the same object.
     * @param src source edits
     * @return *this
     * @stable ICU 60
     */
    Edits &operator=(Edits &&src) noexcept;

    /**
     * Resets the data but may not release memory.
     * @stable ICU 59
     */
    void reset() noexcept;

    /**
     * Adds a no-change edit: a record for an unchanged segment of text.
     * Normally called from inside ICU string transformation functions, not user code.
     * @stable ICU 59
     */
    void addUnchanged(int32_t unchangedLength);
    /**
     * Adds a change edit: a record for a text replacement/insertion/deletion.
     * Normally called from inside ICU string transformation functions, not user code.
     * @stable ICU 59
     */
    void addReplace(int32_t oldLength, int32_t newLength);
    /**
     * Sets the UErrorCode if an error occurred while recording edits.
     * Preserves older error codes in the outErrorCode.
     * Normally called from inside ICU string transformation functions, not user code.
     * @param outErrorCode Set to an error code if it does not contain one already
     *                  and an error occurred while recording edits.
     *                  Otherwise unchanged.
     * @return true if U_FAILURE(outErrorCode)
     * @stable ICU 59
     */
    UBool copyErrorTo(UErrorCode &outErrorCode) const;

    /**
     * How much longer is the new text compared with the old text?
     * @return new length minus old length
     * @stable ICU 59
     */
    int32_t lengthDelta() const { return delta; }
    /**
     * @return true if there are any change edits
     * @stable ICU 59
     */
    UBool hasChanges() const { return numChanges != 0; }

    /**
     * @return the number of change edits
     * @stable ICU 60
     */
    int32_t numberOfChanges() const { return numChanges; }

    /**
     * Access to the list of edits.
     *
     * At any moment in time, an instance of this class points to a single edit: a "window" into a span
     * of the source string and the corresponding span of the destination string. The source string span
     * starts at {@link #sourceIndex()} and runs for {@link #oldLength()} chars; the destination string
     * span starts at {@link #destinationIndex()} and runs for {@link #newLength()} chars.
     *
     * The iterator can be moved between edits using the `next()`, `findSourceIndex(int32_t, UErrorCode &)`,
     * and `findDestinationIndex(int32_t, UErrorCode &)` methods.
     * Calling any of these methods mutates the iterator to make it point to the corresponding edit.
     *
     * For more information, see the documentation for {@link Edits}.
     *
     * @see getCoarseIterator
     * @see getFineIterator
     * @stable ICU 59
     */
    struct U_COMMON_API Iterator final : public UMemory {
        /**
         * Default constructor, empty iterator.
         * @stable ICU 60
         */
        Iterator() :
                array(nullptr), index(0), length(0),
                remaining(0), onlyChanges_(false), coarse(false),
                dir(0), changed(false), oldLength_(0), newLength_(0),
                srcIndex(0), replIndex(0), destIndex(0) {}
        /**
         * Copy constructor.
         * @stable ICU 59
         */
        Iterator(const Iterator &other) = default;
        /**
         * Assignment operator.
         * @stable ICU 59
         */
        Iterator &operator=(const Iterator &other) = default;

        /**
         * Advances the iterator to the next edit.
         * @param errorCode ICU error code. Its input value must pass the U_SUCCESS() test,
         *                  or else the function returns immediately. Check for U_FAILURE()
         *                  on output or use with function chaining. (See User Guide for details.)
         * @return true if there is another edit
         * @stable ICU 59
         */
        UBool next(UErrorCode &errorCode) { return next(onlyChanges_, errorCode); }

        /**
         * Moves the iterator to the edit that contains the source index.
         * The source index may be found in a no-change edit
         * even if normal iteration would skip no-change edits.
         * Normal iteration can continue from a found edit.
         *
         * The iterator state before this search logically does not matter.
         * (It may affect the performance of the search.)
         *
         * The iterator state after this search is undefined
         * if the source index is out of bounds for the source string.
         *
         * @param i source index
         * @param errorCode ICU error code. Its input value must pass the U_SUCCESS() test,
         *                  or else the function returns immediately. Check for U_FAILURE()
         *                  on output or use with function chaining. (See User Guide for details.)
         * @return true if the edit for the source index was found
         * @stable ICU 59
         */
        UBool findSourceIndex(int32_t i, UErrorCode &errorCode) {
            return findIndex(i, true, errorCode) == 0;
        }

        /**
         * Moves the iterator to the edit that contains the destination index.
         * The destination index may be found in a no-change edit
         * even if normal iteration would skip no-change edits.
         * Normal iteration can continue from a found edit.
         *
         * The iterator state before this search logically does not matter.
         * (It may affect the performance of the search.)
         *
         * The iterator state after this search is undefined
         * if the destination index is out of bounds for the destination string.
         *
         * @param i destination index
         * @param errorCode ICU error code. Its input value must pass the U_SUCCESS() test,
         *                  or else the function returns immediately. Check for U_FAILURE()
         *                  on output or use with function chaining. (See User Guide for details.)
         * @return true if the edit for the destination index was found
         * @stable ICU 60
         */
        UBool findDestinationIndex(int32_t i, UErrorCode &errorCode) {
            return findIndex(i, false, errorCode) == 0;
        }

        /**
         * Computes the destination index corresponding to the given source index.
         * If the source index is inside a change edit (not at its start),
         * then the destination index at the end of that edit is returned,
         * since there is no information about index mapping inside a change edit.
         *
         * (This means that indexes to the start and middle of an edit,
         * for example around a grapheme cluster, are mapped to indexes
         * encompassing the entire edit.
         * The alternative, mapping an interior index to the start,
         * would map such an interval to an empty one.)
         *
         * This operation will usually but not always modify this object.
         * The iterator state after this search is undefined.
         *
         * @param i source index
         * @param errorCode ICU error code. Its input value must pass the U_SUCCESS() test,
         *                  or else the function returns immediately. Check for U_FAILURE()
         *                  on output or use with function chaining. (See User Guide for details.)
         * @return destination index; undefined if i is not 0..string length
         * @stable ICU 60
         */
        int32_t destinationIndexFromSourceIndex(int32_t i, UErrorCode &errorCode);

        /**
         * Computes the source index corresponding to the given destination index.
         * If the destination index is inside a change edit (not at its start),
         * then the source index at the end of that edit is returned,
         * since there is no information about index mapping inside a change edit.
         *
         * (This means that indexes to the start and middle of an edit,
         * for example around a grapheme cluster, are mapped to indexes
         * encompassing the entire edit.
         * The alternative, mapping an interior index to the start,
         * would map such an interval to an empty one.)
         *
         * This operation will usually but not always modify this object.
         * The iterator state after this search is undefined.
         *
         * @param i destination index
         * @param errorCode ICU error code. Its input value must pass the U_SUCCESS() test,
         *                  or else the function returns immediately. Check for U_FAILURE()
         *                  on output or use with function chaining. (See User Guide for details.)
         * @return source index; undefined if i is not 0..string length
         * @stable ICU 60
         */
        int32_t sourceIndexFromDestinationIndex(int32_t i, UErrorCode &errorCode);

        /**
         * Returns whether the edit currently represented by the iterator is a change edit.
         *
         * @return true if this edit replaces oldLength() units with newLength() different ones.
         *         false if oldLength units remain unchanged.
         * @stable ICU 59
         */
        UBool hasChange() const { return changed; }

        /**
         * The length of the current span in the source string, which starts at {@link #sourceIndex}.
         *
         * @return the number of units in the original string which are replaced or remain unchanged.
         * @stable ICU 59
         */
        int32_t oldLength() const { return oldLength_; }

        /**
         * The length of the current span in the destination string, which starts at
         * {@link #destinationIndex}, or in the replacement string, which starts at
         * {@link #replacementIndex}.
         *
         * @return the number of units in the modified string, if hasChange() is true.
         *         Same as oldLength if hasChange() is false.
         * @stable ICU 59
         */
        int32_t newLength() const { return newLength_; }

        /**
         * The start index of the current span in the source string; the span has length
         * {@link #oldLength}.
         *
         * @return the current index into the source string
         * @stable ICU 59
         */
        int32_t sourceIndex() const { return srcIndex; }

        /**
         * The start index of the current span in the replacement string; the span has length
         * {@link #newLength}. Well-defined only if the current edit is a change edit.
         *
         * The *replacement string* is the concatenation of all substrings of the destination
         * string corresponding to change edits.
         *
         * This method is intended to be used together with operations that write only replacement
         * characters (e.g. operations specifying the \ref U_OMIT_UNCHANGED_TEXT option).
         * The source string can then be modified in-place.
         *
         * @return the current index into the replacement-characters-only string,
         *         not counting unchanged spans
         * @stable ICU 59
         */
        int32_t replacementIndex() const {
            // TODO: Throw an exception if we aren't in a change edit?
            return replIndex;
        }

        /**
         * The start index of the current span in the destination string; the span has length
         * {@link #newLength}.
         *
         * @return the current index into the full destination string
         * @stable ICU 59
         */
        int32_t destinationIndex() const { return destIndex; }

#ifndef U_HIDE_INTERNAL_API
        /**
         * A string representation of the current edit represented by the iterator for debugging. You
         * should not depend on the contents of the return string.
         * @internal
         */
        UnicodeString& toString(UnicodeString& appendTo) const;
#endif  // U_HIDE_INTERNAL_API

    private:
        friend class Edits;

        Iterator(const uint16_t *a, int32_t len, UBool oc, UBool crs);

        int32_t readLength(int32_t head);
        void updateNextIndexes();
        void updatePreviousIndexes();
        UBool noNext();
        UBool next(UBool onlyChanges, UErrorCode &errorCode);
        UBool previous(UErrorCode &errorCode);
        /** @return -1: error or i<0; 0: found; 1: i>=string length */
        int32_t findIndex(int32_t i, UBool findSource, UErrorCode &errorCode);

        const uint16_t *array;
        int32_t index, length;
        // 0 if we are not within compressed equal-length changes.
        // Otherwise the number of remaining changes, including the current one.
        int32_t remaining;
        UBool onlyChanges_, coarse;

        int8_t dir;  // iteration direction: back(<0), initial(0), forward(>0)
        UBool changed;
        int32_t oldLength_, newLength_;
        int32_t srcIndex, replIndex, destIndex;
    };

    /**
     * Returns an Iterator for coarse-grained change edits
     * (adjacent change edits are treated as one).
     * Can be used to perform simple string updates.
     * Skips no-change edits.
     * @return an Iterator that merges adjacent changes.
     * @stable ICU 59
     */
    Iterator getCoarseChangesIterator() const {
        return Iterator(array, length, true, true);
    }

    /**
     * Returns an Iterator for coarse-grained change and no-change edits
     * (adjacent change edits are treated as one).
     * Can be used to perform simple string updates.
     * Adjacent change edits are treated as one edit.
     * @return an Iterator that merges adjacent changes.
     * @stable ICU 59
     */
    Iterator getCoarseIterator() const {
        return Iterator(array, length, false, true);
    }

    /**
     * Returns an Iterator for fine-grained change edits
     * (full granularity of change edits is retained).
     * Can be used for modifying styled text.
     * Skips no-change edits.
     * @return an Iterator that separates adjacent changes.
     * @stable ICU 59
     */
    Iterator getFineChangesIterator() const {
        return Iterator(array, length, true, false);
    }

    /**
     * Returns an Iterator for fine-grained change and no-change edits
     * (full granularity of change edits is retained).
     * Can be used for modifying styled text.
     * @return an Iterator that separates adjacent changes.
     * @stable ICU 59
     */
    Iterator getFineIterator() const {
        return Iterator(array, length, false, false);
    }

    /**
     * Merges the two input Edits and appends the result to this object.
     *
     * Consider two string transformations (for example, normalization and case mapping)
     * where each records Edits in addition to writing an output string.<br>
     * Edits ab reflect how substrings of input string a
     * map to substrings of intermediate string b.<br>
     * Edits bc reflect how substrings of intermediate string b
     * map to substrings of output string c.<br>
     * This function merges ab and bc such that the additional edits
     * recorded in this object reflect how substrings of input string a
     * map to substrings of output string c.
     *
     * If unrelated Edits are passed in where the output string of the first
     * has a different length than the input string of the second,
     * then a U_ILLEGAL_ARGUMENT_ERROR is reported.
     *
     * @param ab reflects how substrings of input string a
     *     map to substrings of intermediate string b.
     * @param bc reflects how substrings of intermediate string b
     *     map to substrings of output string c.
     * @param errorCode ICU error code. Its input value must pass the U_SUCCESS() test,
     *                  or else the function returns immediately. Check for U_FAILURE()
     *                  on output or use with function chaining. (See User Guide for details.)
     * @return *this, with the merged edits appended
     * @stable ICU 60
     */
    Edits &mergeAndAppend(const Edits &ab, const Edits &bc, UErrorCode &errorCode);

private:
    void releaseArray() noexcept;
    Edits &copyArray(const Edits &other);
    Edits &moveArray(Edits &src) noexcept;

    void setLastUnit(int32_t last) { array[length - 1] = (uint16_t)last; }
    int32_t lastUnit() const { return length > 0 ? array[length - 1] : 0xffff; }

    void append(int32_t r);
    UBool growArray();

    static const int32_t STACK_CAPACITY = 100;
    uint16_t *array;
    int32_t capacity;
    int32_t length;
    int32_t delta;
    int32_t numChanges;
    UErrorCode errorCode_;
    uint16_t stackArray[STACK_CAPACITY];
};

U_NAMESPACE_END

#endif /* U_SHOW_CPLUSPLUS_API */

#endif  // __EDITS_H__