"""text_file

provides the TextFile class, which gives an interface to text files
that (optionally) takes care of stripping comments, ignoring blank
lines, and joining lines with backslashes."""

import sys, io


class TextFile:
    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your
       comment character), skip blank lines, join adjacent lines by
       escaping the newline (i.e. backslash at end of line), strip
       leading and/or trailing whitespace.  All of these are optional
       and independently controllable.

       Provides a 'warn()' method so you can generate warning messages that
       report physical line number, even if the logical line in question
       spans multiple physical lines.  Also provides 'unreadline()' for
       implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile(filename=None, file=None, **options)

       It raises RuntimeError if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It is
       recommended that you supply at least 'filename', so that TextFile
       can include it in warning messages.  If 'file' is not supplied,
       TextFile creates its own using 'io.open()'.

       The options are all boolean (except 'errors'), and affect the value
       returned by 'readline()':
         strip_comments [default: true]
           strip from "#" to end-of-line, as well as any whitespace
           leading up to the "#" -- unless it is escaped by a backslash
         lstrip_ws [default: false]
           strip leading whitespace from each line before returning it
         rstrip_ws [default: true]
           strip trailing whitespace (including line terminator!)
           from each line before returning it
         skip_blanks [default: true]
           skip lines that are empty *after* stripping comments and
           whitespace.  (If both lstrip_ws and rstrip_ws are false,
           then some lines may consist solely of whitespace: these will
           *not* be skipped, even if 'skip_blanks' is true.)
         join_lines [default: false]
           if a backslash is the last non-newline character on a line
           after stripping comments and whitespace, join the following line
           to it to form one "logical line"; if N consecutive lines end
           with a backslash, then N+1 physical lines will be joined to
           form one logical line.
         collapse_join [default: false]
           strip leading whitespace from lines that are joined to their
           predecessor; only matters if (join_lines and not lstrip_ws)
         errors [default: 'strict']
           error handler used to decode the file content

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin file
       object's 'readline()' method!
       In particular, 'readline()' returns None for end-of-file: an empty
       string might just be a blank line (or an all-whitespace line), if
       'rstrip_ws' is true but 'skip_blanks' is not."""

    default_options = { 'strip_comments': 1,
                        'skip_blanks':    1,
                        'lstrip_ws':      0,
                        'rstrip_ws':      1,
                        'join_lines':     0,
                        'collapse_join':  0,
                        'errors':         'strict',
                      }

    def __init__(self, filename=None, file=None, **options):
        """Construct a new TextFile object.  At least one of 'filename'
           (a string) and 'file' (a file-like object) must be supplied.
           The keyword argument options are described above and affect
           the values returned by 'readline()'."""
        if filename is None and file is None:
            raise RuntimeError("you must supply either or both of 'filename' and 'file'")

        # set values for all options -- either from client option hash
        # or fallback to default_options
        for opt in self.default_options.keys():
            if opt in options:
                setattr(self, opt, options[opt])
            else:
                setattr(self, opt, self.default_options[opt])

        # sanity check client option hash
        for opt in options.keys():
            if opt not in self.default_options:
                raise KeyError("invalid TextFile option '%s'" % opt)

        if file is None:
            self.open(filename)
        else:
            self.filename = filename
            self.file = file
            self.current_line = 0   # assuming that file is at BOF!

        # 'linebuf' is a stack of lines that will be emptied before we
        # actually read from the file; it's only populated by an
        # 'unreadline()' operation
        self.linebuf = []

    def open(self, filename):
        """Open a new file named 'filename'.  This overrides both the
           'filename' and 'file' arguments to the constructor."""
        self.filename = filename
        self.file = io.open(self.filename, 'r', errors=self.errors)
        self.current_line = 0

    def close(self):
        """Close the current file and forget everything we know about it
           (filename, current line number)."""
        file = self.file
        self.file = None
        self.filename = None
        self.current_line = None
        file.close()

    def gen_error(self, msg, line=None):
        outmsg = []
        if line is None:
            line = self.current_line
        outmsg.append(self.filename + ", ")
        if isinstance(line, (list, tuple)):
            outmsg.append("lines %d-%d: " % tuple(line))
        else:
            outmsg.append("line %d: " % line)
        outmsg.append(str(msg))
        return "".join(outmsg)

    def error(self, msg, line=None):
        raise ValueError("error: " + self.gen_error(msg, line))

    def warn(self, msg, line=None):
        """Print (to stderr) a warning message tied to the current logical
           line in the current file.  If the current logical line in the
           file spans multiple physical lines, the warning refers to the
           whole range, e.g. "lines 3-5".  If 'line' is supplied, it
           overrides the current line number; it may be a list or tuple to
           indicate a range of physical lines, or an integer for a single
           physical line."""
        sys.stderr.write("warning: " + self.gen_error(msg, line) + "\n")

    def readline(self):
        """Read and return a single logical line from the current file (or
           from an internal buffer if lines have previously been "unread"
           with 'unreadline()').
           If the 'join_lines' option is true, this may involve reading
           multiple physical lines concatenated into a single string.
           Updates the current line number, so calling 'warn()' after
           'readline()' emits a warning about the physical line(s) just
           read.  Returns None on end-of-file, since the empty string can
           occur if 'rstrip_ws' is true but 'skip_blanks' is not."""
        # If any "unread" lines waiting in 'linebuf', return the top
        # one.  (We don't actually buffer read-ahead data -- lines only
        # get put in 'linebuf' if the client explicitly does an
        # 'unreadline()'.)
        if self.linebuf:
            line = self.linebuf[-1]
            del self.linebuf[-1]
            return line

        buildup_line = ''

        while True:
            # read the line, make it None if EOF
            line = self.file.readline()
            if line == '':
                line = None

            if self.strip_comments and line:

                # Look for the first "#" in the line.
                # If none, never mind.  If we find one and it's the first
                # character, or is not preceded by "\", then it starts a
                # comment -- strip the comment, strip whitespace before it,
                # and carry on.  Otherwise, it's just an escaped "#", so
                # unescape it (and any other escaped "#"'s that might be
                # lurking in there) and otherwise leave the line alone.

                pos = line.find("#")
                if pos == -1: # no "#" -- no comments
                    pass

                # It's definitely a comment -- either "#" is the first
                # character, or it's elsewhere and unescaped.
                elif pos == 0 or line[pos-1] != "\\":
                    # Have to preserve the trailing newline, because it's
                    # the job of a later step (rstrip_ws) to remove it --
                    # and if rstrip_ws is false, we'd better preserve it!
                    # (NB. this means that if the final line is all comment
                    # and has no trailing newline, we will think that it's
                    # EOF; I think that's OK.)
                    eol = '\n' if line.endswith('\n') else ''
                    line = line[0:pos] + eol

                    # If all that's left is whitespace, then skip line
                    # *now*, before we try to join it to 'buildup_line' --
                    # that way constructs like
                    #   hello \\
                    #   # comment that should be ignored
                    #   there
                    # result in "hello there".
                    if line.strip() == "":
                        continue
                else: # it's an escaped "#"
                    line = line.replace("\\#", "#")

            # did previous line end with a backslash?
            # then accumulate
            if self.join_lines and buildup_line:
                # oops: end of file
                if line is None:
                    self.warn("continuation line immediately precedes "
                              "end-of-file")
                    return buildup_line

                if self.collapse_join:
                    line = line.lstrip()
                line = buildup_line + line

                # careful: pay attention to line number when incrementing it
                if isinstance(self.current_line, list):
                    self.current_line[1] = self.current_line[1] + 1
                else:
                    self.current_line = [self.current_line,
                                         self.current_line + 1]
            # just an ordinary line, read it as usual
            else:
                if line is None: # eof
                    return None

                # still have to be careful about incrementing the line number!
                if isinstance(self.current_line, list):
                    self.current_line = self.current_line[1] + 1
                else:
                    self.current_line = self.current_line + 1

            # strip whitespace however the client wants (leading and
            # trailing, or one or the other, or neither)
            if self.lstrip_ws and self.rstrip_ws:
                line = line.strip()
            elif self.lstrip_ws:
                line = line.lstrip()
            elif self.rstrip_ws:
                line = line.rstrip()

            # blank line (whether we rstrip'ed or not)?
            # skip to next line if appropriate
            if (line == '' or line == '\n') and self.skip_blanks:
                continue

            if self.join_lines:
                # endswith() is used rather than indexing so that an empty
                # line (possible when skip_blanks is false) cannot raise
                # IndexError
                if line.endswith('\\'):
                    buildup_line = line[:-1]
                    continue

                if line.endswith('\\\n'):
                    buildup_line = line[0:-2] + '\n'
                    continue

            # well, I guess there's some actual content there: return it
            return line

    def readlines(self):
        """Read and return the list of all logical lines remaining in the
           current file."""
        lines = []
        while True:
            line = self.readline()
            if line is None:
                return lines
            lines.append(line)

    def unreadline(self, line):
        """Push 'line' (a string) onto an internal buffer that will be
           checked by future 'readline()' calls.
           Handy for implementing a parser with line-at-a-time lookahead."""
        self.linebuf.append(line)
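

# Illustration only, not part of the original module: a minimal standalone
# sketch of the "#" rule that 'readline()' applies when 'strip_comments' is
# true.  'strip_comment' is a hypothetical helper name introduced here; the
# class itself performs this step inline and then goes on to strip
# whitespace, skip blanks and join lines, none of which is shown.

```python
def strip_comment(line):
    """Apply the comment rule to one physical line (hypothetical helper).

    Only the *first* "#" is inspected, as in 'readline()':
      - no "#": the line is returned unchanged;
      - first "#" unescaped: everything from it onward is dropped, but a
        trailing newline is kept so a later rstrip step can decide its fate;
      - first "#" escaped with a backslash: every "\\#" is unescaped and
        the rest of the line is left alone.
    """
    pos = line.find("#")
    if pos == -1:
        return line
    if pos == 0 or line[pos - 1] != "\\":
        eol = "\n" if line.endswith("\n") else ""
        return line[:pos] + eol
    return line.replace("\\#", "#")


if __name__ == "__main__":
    for sample in ("a = 1  # note\n", "color = \\#fff\n", "plain\n"):
        print(repr(sample), "->", repr(strip_comment(sample)))
```

# Whitespace before the "#" survives this step; in the class it is removed
# later, and only if 'rstrip_ws' is true.  Because only the first "#" is
# examined, a line whose first "#" is escaped keeps any later, unescaped
# "#" as literal text.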